Gellish
             A standard data and knowledge representation
                        language and ontology

                           Are Data Models becoming Superfluous?


                                                       by
                                           Ir. Andries van Renssen
                                     Shell Global Solutions International
                                            Andries.vanRenssen@shell.com


Abstract
Data storage and data communication lack a common standard universal data model as well
as a common data language and knowledge base with a taxonomy of concepts and a
grammar for data exchange messages. This article presents a solution to this problem in the
form of the new Gellish language and knowledge base, as an extension of the standard data
models and ontology of two new ISO standards. The article presents Gellish as a language
for neutral data exchange between systems, that can replace data models, and that provides
an extendable ontology with standard reference data for customization and harmonization of
systems. The definition of Gellish includes the public domain (“open data”) Gellish
knowledge base with definitions of a large number of concepts and product models.
The article also illustrates that a single Gellish Table, in a database or a data exchange
file, is sufficient to express a wide range of kinds of facts about classes as well as facts
about individual objects.
Keywords: knowledge representation, data exchange, language, data models,
standards, ontology, semantic web, knowledge base, classification system


Table of Contents
1 Introduction
   1.1 Standard data models, ontologies and reference data
2 The Gellish language and ontology
3 Storage and exchange of data as well as semantics in Gellish
4 Interpretation of expressions
5 Experiences and applications
6 Conclusions
7 References




Gellish                                                              1                                                   13/04/2010
1 Introduction
Currently, each software system stores its data using its own data model and usually
communicates with other systems through a dedicated interface data structure, which means
that it applies a dedicated interface data model. The large variety of data models makes data
exchange between systems costly, because the data must be converted from the semantics of
one data model to those of the other. This demonstrates the urgent need for widely
applicable common standard data models.
Often systems can be ‘customized’ by adding ‘reference data’ as instances, such as the
definition of equipment types, document types, activity types, property types, pick lists, etc.
However, reference data usually differ per implementation, even when the database
structures are equal, as is the case with several implementations of the same CAD, CAE,
PDM, PLM, ERP or CRM system. The consequence is that data in those implementations
still cannot be compared, integrated or exchanged without costly data conversion processes.
This illustrates the urgent need for a common dictionary, classification system or taxonomy
of reference data, because there is currently no standard user data language.
In current systems there is a separation between the world of data models and the world
of instances. Data models are developed by IT specialists (data modelers) who document
them using either proprietary tools or a standard data modeling language, such as
EXPRESS (ISO 10303-11) or UML, languages that are especially designed to define data
models. Once a data model is defined in such a language, the data model acts as another
language in which the reference data as well as the user data have to be expressed. The use
of two different languages, one for the model and one for the user data, illustrates the barrier
between the two worlds. It is as if the definition of the English language were expressed in
Chinese. On top of this, each programmer and each reference data producer is free to
define his own terminology using those data definition languages!
The result of the current state of the art is that data storage is done in a Babylonian mix of
data models and reference data ‘languages’ with the consequence that exchange of data
between systems is impossible, except where dedicated bilateral translators are created not
only for each pair of data models, but also for the data content ‘languages’.
The current situation is sketched by Smith and Welty (2001) as follows: “Out of the
apparent chaos, some coherence is beginning to emerge. Gradually, computer scientists are
beginning to recognize that the provision, once for all, of a common, robust reference
ontology – a shared taxonomy of entities – might provide significant advantages over the
ad-hoc, case-by-case methods previously used”.
Several attempts have been made to develop an ‘upper ontology’, such as SUMO by Niles
and Pease (2001), the IEEE Standard Upper Ontology, SUO (2001), the Cyc ontology, Lenat
(1995) and GOL, Degen et al (2001). However, none of them integrates the upper level
ontology with a lower level ontology of reference data. In other words, they do not integrate
a generic data model with reference data and a language for the description of knowledge
and of individual objects and processes.
This article presents a solution to the above-mentioned issues in the form of the Gellish
language. Gellish satisfies the criteria for proper ontologies as expressed by Degen et al
(2001 par 6.1), but is not limited to an upper ontology. It includes and extends concept
definitions that also appear in other sources such as ISO standards and IEC standards, and
knowledge stemming from industry standards and proprietary sources. It is extendable just
as any natural language. Its taxonomy and knowledge base use unique identifiers for
concepts, thus allowing for synonyms and multiple names in various languages. The latter
      enables the expression of propositions about facts in one natural language and automatic
      translation and presentation in any other natural language.
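      The role of unique identifiers in translation can be sketched as follows. This is a
      minimal illustration, not part of the Gellish definition: the dictionary layout and the
      function names are hypothetical, the Dutch names are assumptions, and the UIDs are
      taken from the examples further on in this article.

```python
# Names per concept UID and language. UIDs 130206 and 1225 come from the
# examples in this article; the Dutch names are illustrative assumptions.
CONCEPT_NAMES = {
    130206: {"en": "pump", "nl": "pomp"},
    1225: {"en": "is classified as a", "nl": "is geclassificeerd als een"},
}

# Individual objects keep their own name (tag) in every language.
INDIVIDUAL_NAMES = {111: "P-101"}


def name_of(uid, language):
    """Return the name of a UID in the requested language."""
    if uid in INDIVIDUAL_NAMES:
        return INDIVIDUAL_NAMES[uid]
    return CONCEPT_NAMES[uid][language]


def render(fact, language):
    """Present a (left UID, relation type UID, right UID) fact as text."""
    return " ".join(name_of(uid, language) for uid in fact)


fact = (111, 1225, 130206)
print(render(fact, "en"))
print(render(fact, "nl"))
```

      Because the fact itself holds only UIDs, adding a language amounts to adding names;
      the stored fact does not change.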
      Gellish eliminates the traditional barrier between the data model definitions of classes and
      the data instances. The Gellish language demonstrates that this barrier is not necessary and
      that there are clear advantages when class definitions, reference data and user data are
      expressed in one and the same language.
1.1   Standard data models, ontologies and reference data
      There are several developments of standard lower level ontologies and reference data
      libraries, stimulated among others by requirements of the e-commerce ‘market places’ and
      the developments around The Semantic Web promoted by Lee et al (2000) and the Web
      Ontology Language OWL.
      Examples are, among others, the UNSPSC code (http://www.unspsc.org/), Ecl@ss
      (http://www.eclass.de/) and Trade Ranger
      (http://www.trade-ranger.com/EN/Pages/ContentStandards.asp). These standards have
      their value mainly in the standardization of terminology, but they do not provide a
      standard language or a standard data model for general use. Their semantic expression
      power is limited, because they apply only a few relation types and lack integration with
      a rich upper ontology.
      There have also been several attempts to develop standard data models for data exchange
      or for data storage. Some of them are proprietary, but others are in the public domain. Those
      standard data models are defined independent of a particular system, and are therefore called
      ‘neutral’. Those standard data models are usually developed for a particular application
      domain instead of being limited to a particular system.
      Examples of standard data models are the STEP family of standards in ISO 10303, such as a
      graphics data model AP203, a data model for the automotive industry (AP214), one for
      piping systems (AP227), one under development for the defense industry (AP239, PLCS),
      etc. The integration of all those data models into one overall data model is not yet fully
      achieved. Although the scopes of these valuable standard data models are wide, they are still
      limited to particular application areas and do not yet provide a general ‘common language’.
      A further step towards a data model with a generic scope was the development and
      publication of the Epistle Core Data Model (2001), in which development the author of this
      article participated. From that, two new ISO standards were derived, ISO 15926-2 and its
      counterpart within the STEP family (AP221). Although these generic data models stem
      from the process industries, they have the generic nature of an upper level ontology, which
      makes them applicable in other application domains as well.
      To become practically applicable in a particular application domain, these generic data
      models need a standard ‘reference data library’ or lower level ontology, in order to add
      standard definitions of application domain specific concepts and to specialize the generic
      data model. The author coordinated the development of such a standard reference data
      library, called STEPlib. This is a main source for the common standard library ISO
      15926-4.
      Then it was discovered that the top of the specialization hierarchy of standard data in the
      library coincided with the entities, attributes and relations in the generic data model. This
      led to the inclusion of the data model in the library. In other words, the upper level ontology
      was combined with a lower level ontology. The insight that information should be contained
      in relations and not in objects, led to the birth of the Gellish language, which is based on
      standard relation types, expressed by natural language ‘phrases’.




2 The Gellish language and ontology
Gellish is a public domain standard data and knowledge representation language and
ontology that is defined in STEPlib. It does not have the barrier between the user data
and the IT data model data. It contains the concepts of the above-mentioned generic data
models and integrates and extends them with standard reference data and a knowledge base
with product and process models. The ontology also includes the definition of a large
number of standard fact types (or relation types) that define the grammar of the Gellish
language. It contains the definitions of over 20,000 concepts arranged in a specialization
hierarchy of classes. These concepts can be interpreted as entity types, attribute types and
relationship types or as a classification system or taxonomy. This makes Gellish equivalent
to a very large data model.
In addition, STEPlib contains a large number of relations between the concepts. They
define the content of the knowledge base of product models and process models.
Gellish is not object oriented, but fact oriented. The basic Gellish object is therefore a fact.
Each (atomic) fact is expressed as a relation between (two) objects.
For example, fact 1 is expressed by a particular relation between objects with unique
identifiers (UIDs) 100 and 101. This expression (1, 100, 101) illustrates the structure of
each basic Gellish expression. Gellish requires that both the objects and the fact be
classified explicitly by standard classes, including standard relation types. The standard
classes are predefined in the Gellish ontology. In addition, objects may have a name.
This explicit classification enables software to interpret the expression correctly.
Gellish and the above mentioned ISO standards are both based on the understanding that
there appears to exist a limited set of application independent standard relation types that are
sufficient to model all kinds of products and processes. Gellish standardizes these relation
types. The relation types also define the role types that the related objects play in the
relations with each other. The variety and extendibility of standard relation types define the
semantic expression capabilities of Gellish.
A large part of the Gellish relation types is defined in the ISO standards and an extended set
is defined in the TOP part of the Gellish language definition (STEPlib).
A standard implementation of Gellish is defined as a Gellish Table. In a Gellish Table the
basic Gellish expression becomes:
Left hand    Left hand     Fact   Relation   Relation type    Right hand   Right hand
object UID   object name   UID    type UID   name             object UID   object name
100          thing-1       1      2850       is related to    101          thing-2


In a Gellish Table one (atomic) fact is represented by one record, which holds a relation
between two object UIDs, the names of the objects and the classification of the fact. The
classification of the objects is done via separate classification facts in additional records.
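As a minimal sketch, one record of a Gellish Table can be modelled as a small data structure.
The class name, field names and helper function below are hypothetical and chosen only to
mirror the column headings above; they are not part of the Gellish definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GellishFact:
    """One record of a Gellish Table: one atomic fact."""
    left_uid: int
    left_name: str
    fact_uid: int
    relation_type_uid: int
    relation_type_name: str
    right_uid: int
    right_name: str


def as_sentence(fact):
    """Present the fact in its natural language form."""
    return f"{fact.left_name} {fact.relation_type_name} {fact.right_name}"


# The basic expression from the table above.
basic = GellishFact(100, "thing-1", 1, 2850, "is related to", 101, "thing-2")
print(as_sentence(basic))
```

The triple (fact UID, left hand object UID, right hand object UID) carries the fact; the names
are there for human readers, and the relation type UID classifies the fact.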
Some examples of facts from a particular application domain, which illustrate the use of
standard Gellish relation types, are:
Left hand    Left hand       Fact   Relation   Relation type name       Right hand   Right hand        Scale
object UID   object name     UID    type UID                            object UID   object name
130091       diesel engine   2      1146       is a specialization of   130108       engine
104          M-1             3      1225       is classified as a       130091       diesel engine
130802       cylinder        4      1146       is a specialization of   730063       artifact
107          C-1             5      1225       is classified as a       130802       cylinder
107          C-1             6      1190       is part of               104          M-1
107          C-1             7      1727       has aspect               108          volume of C-1
108          volume of C-1   8      1225       is classified as a       550140       internal volume
108          volume of C-1   9      2044       is quantified as         922235       1800              cm3
104          M-1             10     4760       is subject of            110          order-1


   •      Note: for human readability, the relation type UID is omitted from the tables below.
The above table illustrates:
   -      Standard Gellish relation types, that classify the facts, and that determine the
          expression capabilities and semantics of Gellish.
   -      Examples from the large number of standard object types that are predefined in
          Gellish. For example: engine, diesel engine, cylinder, artifact, internal volume, 1800
          and cm3.
   -      The way in which new object types can be added, as in facts 2 and 4. Diesel engine
          and cylinder already exist in Gellish, but if they had not existed, they could have
          been added in this way.
   -      It is possible in Gellish to express facts, such as the volume of C-1, without the need
          to pre-model such a fact in a data model. Such a fact type can nevertheless be
          defined in Gellish, after which this particular instance can be verified against the
          definition. It can also be defined to be obligatory in a particular context, after
          which the instances can be validated for completeness and compliance.
   -      One table is suitable to express many kinds of facts.
Note: The table above presents just an example of some of the capabilities of Gellish. For
      example, Gellish also allows expressing in which language the facts are expressed,
      whether the objects are real or imaginary, what the communicative intent is, who the
      author of a proposition is, who the addressee is, etc.
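To illustrate that one table suffices for many kinds of facts, the example rows can be held in
a single list and queried by relation type. This is a sketch only: the tuple layout and the
function names are assumptions made for illustration, not a prescribed Gellish interface.

```python
# Rows of the example table: (left UID, left name, fact UID, relation type UID,
# relation type name, right UID, right name, scale).
FACTS = [
    (130091, "diesel engine", 2, 1146, "is a specialization of", 130108, "engine", None),
    (104, "M-1", 3, 1225, "is classified as a", 130091, "diesel engine", None),
    (130802, "cylinder", 4, 1146, "is a specialization of", 730063, "artifact", None),
    (107, "C-1", 5, 1225, "is classified as a", 130802, "cylinder", None),
    (107, "C-1", 6, 1190, "is part of", 104, "M-1", None),
    (107, "C-1", 7, 1727, "has aspect", 108, "volume of C-1", None),
    (108, "volume of C-1", 8, 1225, "is classified as a", 550140, "internal volume", None),
    (108, "volume of C-1", 9, 2044, "is quantified as", 922235, "1800", "cm3"),
    (104, "M-1", 10, 4760, "is subject of", 110, "order-1", None),
]


def related(left_name, relation_type_name):
    """Right hand objects (with scale) related to an object by a relation type."""
    return [(row[6], row[7]) for row in FACTS
            if row[1] == left_name and row[4] == relation_type_name]


def parts_of(whole_name):
    """Names of the objects that are part of the given whole."""
    return [row[1] for row in FACTS
            if row[4] == "is part of" and row[6] == whole_name]


print(related("C-1", "is classified as a"))
print(parts_of("M-1"))
print(related("volume of C-1", "is quantified as"))
```

Class-level facts (fact 2 and 4) and facts about individual objects sit in the same list and are
distinguished only by their relation type.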
3 Storage and exchange of data as well as semantics in Gellish
In this section I will describe how knowledge, data and semantics are represented in
Gellish.
The generic nature of Gellish allows expressing any complex network of facts. For example
it allows expressing that:
- physical objects (of any kind) have properties (of any kind),
- properties have values,
- physical objects have parts,
- physical objects participate in activities or processes in particular roles,
- etc.
But for clarity I will use a specific example, being the fact that:
- a particular pump (‘P-1’) is pumping a particular stream (‘S-1’).
In a conventional database it is required to declare some entity types and attribute types that
define the semantics in the form of a data model. In the case of the example, the data model
could consist of the entity types ‘pump’, ‘process’ and ‘stream’, each with some attributes.
In Gellish, the concepts ‘pump’, ‘process’ and ‘stream’ are not entity types, but they are
concepts that are defined via facts that are expressed as relations in a generic knowledge
base.
The knowledge base is built on a structure that only ‘knows’ the minimum number of
‘basic semantic concepts’ and contains the definition of a large number of concepts. The
minimum set of ‘basic semantic concepts’ comprises the fundamental ontological axioms of
Gellish in a structure that should be known and understood and which is sufficient for the
definition of additional semantic concepts. That structure is presented in figure 1.
For the definition of a new concept (‘anything’) it is required to define such a coherent
structure of elementary facts. Each elementary fact is expressed as a relation between two
concepts, represented by the blue boxes in figure 1. In other words, each new concept
requires the creation of a structure as presented in figure 1.

[Figure 1 (diagram): the boxes ‘anything’ (object-1, object-2), ‘playing a role’ (plays /
played by), ‘role (of something in relation)’ (role-1, role-2), ‘requirement of role’ (in /
requires) and ‘relation’ (relation-1), with ‘is a’ links to ‘kind of thing’.]

                        Figure 1, Structure of basic semantic concepts
The minimum set of ‘basic semantic concepts’ that are the axioms of Gellish and whose
meaning should be understood is:
          - anything
          - role
          - relation / relations
               - plays role
               - requires role
               - is / is a         (is classified as a)
          - individual thing / individual things
          - kind of thing / kinds of things
          - single thing / plural thing


The structure of figure 1 holds for facts about classes as well as facts about individual
objects (instances) or relations, but also for single objects as well as for plural objects. In
other words, object-1 and object-2 in figure 1 can be either a single or plural individual
object, relation or class. The lines in the top left corners of the boxes indicate that the
structure is a typical instance.
Any other ‘atomic fact’ is expressed as such a structure. In other words, any atomic fact is
expressed as an ‘atomic relation’ between two or more ‘objects’ and by the classification of
the ‘objects’, the ‘roles’ and the ‘relation’. This implies that an atomic fact is expressed by a
structure of nine (9) relations, formed by the blue boxes in figure 1 (note that 4 of the 5
boxes appear twice in an atomic fact).
For example the fact that impeller O1 is part of centrifugal pump O2 is expressed in Gellish
by the following 4 elementary relations:
   - O1     plays role            R1
   - R1     is required by        C1
   - C1     requires role         R2
   - R2     is played by          O2
These 4 relations relate 5 objects. To interpret them correctly the following 5 additional
classification relations are required:
   - O1     is classified as an   impeller
   - R1     is classified as a    part
   - C1     is classified as a    composition relation (“is part of”)
   - R2     is classified as a    whole
   - O2     is classified as a    centrifugal pump
In practical implementations it appears that the explicit identification of the roles and their
classification can be omitted, because they follow from the classification of the relation
and the definition of the relation type.
Therefore the above relations are usually summarized in 3 Gellish atomic expressions as
follows:
   - O1     is classified as an impeller
   - O1     is part of          O2
   - O2     is classified as a centrifugal pump
From this example it can be seen that the 5 kinds of things with which the 5 objects are
classified need to be present in or added to the semantics of the Gellish knowledge base in
order to ensure that the fact can be interpreted correctly.
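The relationship between the summarized form and the full structure of nine relations can be
sketched as an expansion function. The function name, the `ROLES` lookup and the generated
identifiers R1, C1 and R2 are assumptions for illustration; the role names ‘part’ and ‘whole’
are those of the impeller example above, and “is classified as a” is used uniformly for
simplicity.

```python
# Roles required by each relation type (first role, second role).
# Only the composition relation from the example above is listed.
ROLES = {"is part of": ("part", "whole")}


def expand(left, left_kind, relation_type, right, right_kind):
    """Expand one summarized expression into 4 elementary and 5 classification relations."""
    first_role_kind, second_role_kind = ROLES[relation_type]
    r1, c1, r2 = "R1", "C1", "R2"  # identifiers of the two roles and the relation
    elementary = [
        (left, "plays role", r1),
        (r1, "is required by", c1),
        (c1, "requires role", r2),
        (r2, "is played by", right),
    ]
    classification = [
        (left, "is classified as a", left_kind),
        (r1, "is classified as a", first_role_kind),
        (c1, "is classified as a", relation_type),
        (r2, "is classified as a", second_role_kind),
        (right, "is classified as a", right_kind),
    ]
    return elementary + classification


for relation in expand("O1", "impeller", "is part of", "O2", "centrifugal pump"):
    print(relation)
```

The roles never have to be entered by the user: they are derived from the relation type, which
is why the three summarized expressions carry the same information.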
The awareness that a knowledge base of predefined concepts is required for a correct
interpretation of Gellish expressions resulted in the development of the top-down
hierarchical definition of the Gellish knowledge base of concepts, including also relation
types, as available in STEPlib.
Knowledge representation: relations between classes
Any fact type that extends the semantics is expressed as a relation between kinds of things.
For example, assume that the concept ‘centrifugal pump’ needs to be added. Then the
following two atomic relations define that concept:
   1. A specialization relation that defines that:
          centrifugal pump is a specialization of               pump
   2. A relation that defines that a centrifugal pump by definition uses the centrifugal
      principle:
          centrifugal pump has by definition as aspect centrifugal.
These relations build respectively on the definition of the concept ‘pump’ and ‘centrifugal’.
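A minimal sketch of how such a definition could be added to a knowledge base follows. The
`KnowledgeBase` class and its `define` method are hypothetical; the only rule taken from the
text is that a new concept builds on concepts that are already defined.

```python
class KnowledgeBase:
    """Toy knowledge base: a set of concept names plus relations between them."""

    def __init__(self, known_concepts):
        self.concepts = set(known_concepts)
        self.relations = []

    def define(self, new_concept, relations):
        """Add a concept, defined by (relation type, existing concept) pairs."""
        # Check first, so that a failed definition leaves the knowledge base unchanged.
        for _relation_type, right in relations:
            if right not in self.concepts:
                raise KeyError(f"undefined concept: {right}")
        for relation_type, right in relations:
            self.relations.append((new_concept, relation_type, right))
        self.concepts.add(new_concept)


# Assumption: 'pump' and 'centrifugal' are already present in the knowledge base.
kb = KnowledgeBase({"anything", "pump", "centrifugal"})
kb.define("centrifugal pump", [
    ("is a specialization of", "pump"),
    ("has by definition as aspect", "centrifugal"),
])
print(kb.relations)
```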
4 Interpretation of expressions
In current database technology the semantic interpretation of an expression is done via the
fact that any object is implicitly classified by being an ‘instance’ of an entity of which the
semantics are defined.


For example, assume that P1 is an instance of an attribute called ‘name’ of an entity called
‘pump’. This probably means that P1 is the name of a thing that is classified as a pump,
although this meaning comprises two facts that are usually not defined in a computer
interpretable way. It should be noted that if there are no other attributes, this data structure
does not allow the classification of P1 as a centrifugal pump.
In Gellish all semantics is made explicit by the creation of explicit classification relations
between the elements in the expression and the Gellish concepts (classes of objects,
including relations). This replaces the instantiation relations and eliminates the need to
define a data model with entities and attributes, such as the entity ‘pump’ and the attribute
‘name’. This is illustrated in figure 2.


[Figure 2 (diagram): the individual objects ‘P-101’ (111), ‘S-1’ (113) and ‘pumping S-1’
(112) and the relations ‘is performer of pumping S-1’ (11) and ‘is subject in pumping S-1’
(12) are each linked by an ‘is classified as a’ relation (13, 15 and 14 for the three objects)
to the Gellish concepts ‘pump’ (130206), ‘liquid stream’ (730083), ‘pumping’ (192512),
‘is performer of’ and ‘is subject in’. Green shaded area = Gellish ontology (STEPlib).]

  Figure 2, Linking a Gellish expression to Gellish concepts through classification
Figure 2 illustrates the expression “P-101 is pumping S-1” (in dark yellow). The ‘pumping
S-1’ process is an interaction between the fluid S-1 and the pump P-101. The pump has the
role as performer and the liquid has the role as subject in the pumping process. The blue
boxes in the green shaded area represent the Gellish concepts, being instances in the Gellish
knowledge base, STEPlib. The explicit classification relations with the concepts in those
blue boxes provide the semantics for the interpretation of the expression.
In a Gellish Table this becomes:
Left hand    Left hand     Fact   Relation type name   Right hand   Right hand
object UID   object name   UID                         object UID   object name
111          P-101         11     is performer of      112          pumping S-1
113          S-1           12     is subject in        112          pumping S-1
111          P-101         13     is classified as a   130206       pump
112          pumping S-1   14     is classified as a   192512       pumping
113          S-1           15     is classified as a   730083       liquid stream


Such a set of rows in a Gellish Table can be exchanged between Gellish enabled software
packages in any kind of table, such as an MS-Access database table, an Oracle or DB2 table,
an XLS spreadsheet, an XML file (e.g. according to ISO 10303-28) or a file in STEP physical
file format (ISO 10303-21). Further details are described in ref. 1.
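Because the rows are plain tabular data, exchange amounts to writing and reading a table. The
sketch below uses CSV text as a stand-in for “any kind of table”; CSV is not one of the
formats named above, and the column names and function names are assumptions made for
illustration.

```python
import csv
import io

COLUMNS = ["left_uid", "left_name", "fact_uid",
           "relation_type_name", "right_uid", "right_name"]

ROWS = [
    (111, "P-101", 11, "is performer of", 112, "pumping S-1"),
    (113, "S-1", 12, "is subject in", 112, "pumping S-1"),
    (111, "P-101", 13, "is classified as a", 130206, "pump"),
    (112, "pumping S-1", 14, "is classified as a", 192512, "pumping"),
    (113, "S-1", 15, "is classified as a", 730083, "liquid stream"),
]


def to_csv(rows):
    """Serialize Gellish Table rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Read the rows back, restoring the UID columns to integers."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader)  # skip the header line
    return [(int(r[0]), r[1], int(r[2]), r[3], int(r[4]), r[5]) for r in reader]


exchanged = from_csv(to_csv(ROWS))
print(exchanged == ROWS)
```

The receiving system needs no data model for ‘pump’ or ‘pumping’: it interprets the rows by
looking up the UIDs in the shared Gellish knowledge base.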
Note that the shaded light yellow boxes all have the same name: “is classified as a”.
However, they are different individual classification relations. Each of those relations has a
unique identifier (13, 14 and 15). The name in the shaded box indicates that each is
(implicitly) “conceptualized” to be a classification relation. In other words, each of them is
an “is classified as a” relation.
For a correct interpretation of the Gellish concepts they need to be defined in a computer
interpretable way. This is done via specialization/generalization relations as is illustrated in
figure 3. These specialization relations form one hierarchical network terminating at the top,
called ‘anything’. This generic top supports the wide applicability of Gellish, as any missing
concept can be added to Gellish as a subtype of an existing concept.
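How software might use the specialization hierarchy can be sketched as a walk from a concept
up to ‘anything’. The dictionaries and function names are hypothetical. Two simplifications
are assumed: every concept has a single supertype, and the intermediate levels between
‘physical object’, ‘relation’, ‘activity’ and ‘anything’ are collapsed.

```python
# Concept -> supertype ("is a specialization of").
# Assumption: intermediate levels below 'anything' are left out.
SUPERTYPE = {
    "pump": "physical object",
    "liquid stream": "physical object",
    "pumping": "activity",
    "is performer of": "relation",
    "is subject in": "relation",
    "physical object": "anything",
    "activity": "anything",
    "relation": "anything",
}

# Individual object -> classifying concept ("is classified as a").
CLASSIFICATION = {"P-101": "pump", "S-1": "liquid stream", "pumping S-1": "pumping"}


def ancestors(concept):
    """All supertypes of a concept, from nearest up to 'anything'."""
    chain = []
    while concept in SUPERTYPE:
        concept = SUPERTYPE[concept]
        chain.append(concept)
    return chain


def is_kind_of(individual, concept):
    """True if the individual is classified by the concept or by one of its subtypes."""
    classifier = CLASSIFICATION[individual]
    return concept == classifier or concept in ancestors(classifier)


print(ancestors("pump"))
print(is_kind_of("P-101", "physical object"))
```

A concept that is missing from the hierarchy can be added with one more entry in `SUPERTYPE`,
which corresponds to one additional “is a specialization of” fact.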


[Figure 3 (diagram): the Gellish concepts ‘pump’, ‘is performer of’, ‘liquid stream’,
‘is subject in’ and ‘pumping’ of figure 2 are each linked as subtype, by an
‘is a specialization of’ relation, to a supertype among ‘physical object’, ‘relation’ and
‘activity’, which in turn lead up to ‘anything’ at the top. The figure also shows
‘individual thing’, ‘entity’ and an ‘is an instance of’ relation, and distinguishes individual
things from kinds of things. Green area = Gellish ontology.]



            Figure 3, Definition of Gellish concepts in a specialization hierarchy
In practice there are several intermediate levels of specialization between, for example,
‘pump’ and ‘physical object’, and between ‘physical object’ and ‘anything’. Furthermore,
there are classes of physical objects defined as subtypes of ‘physical object’. These can be
extended by further specializations, such as standard components (e.g. from ASME, BSI or
DIN standards) and manufacturer catalogue items (e.g. manufacturer models and types).
Figure 3 contains eight facts expressed as eight “is a specialization of” relations, each of
which is a separate relation between classes. Similarly to what is described above about the
“is classified as a” relation, this illustrates that the term ‘is a specialization of’ is not the
name of each of those relations, but it is a name of the Gellish concept (the class) that is the
conceptualization of those relations.
The knowledge about the meaning of the concepts ‘pump’, ‘is performer of’, ‘liquid stream’,
‘is subject in’ and ‘pumping’ is defined in the Gellish ontology STEPlib. Some of that
knowledge is illustrated in the following facts, which include some intermediate facts not
shown in Figure 3 (the UIDs and names are taken from STEPlib, except for the UIDs of the
facts):
Left hand    Left hand         Fact   Relation type name       Right hand   Right hand
object UID   object name       UID                             object UID   object name
130206       pump              16     is a specialization of   730044       physical object
4761         is performer of   17     is a specialization of   4767         is involved in
4761         is performer of   18     requires as role-1 a     640020       performer
730044       physical object   19     can have as role as a    640020       performer
4761         is performer of   20     requires as role-2 a     4773         involver
730083       liquid stream     21     is a specialization of   730045       stream
4760         is subject in     22     is a specialization of   4767         is involved in
192512       pumping           23     is a specialization of   190168       process


This knowledge is inherited by lower level concepts from higher concepts in the hierarchy.
If an individual object is classified as a member of such a class, then the knowledge applies
to the individual object as a constraint on the specific aspects of that object.
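
The inheritance described above can be sketched in a few lines of Python. This is only an
illustration, assuming that the rows of the Gellish Table are held as plain tuples; the
function names `supertypes` and `applicable_facts` are hypothetical and are not part of
Gellish or STEPlib.

```python
from collections import defaultdict

# Rows copied from the table of facts 16-23:
# (left UID, left name, fact UID, relation type name, right UID, right name)
FACTS = [
    (130206, "pump", 16, "is a specialization of", 730044, "physical object"),
    (4761, "is performer of", 17, "is a specialization of", 4767, "is involved in"),
    (4761, "is performer of", 18, "requires as role-1 a", 640020, "performer"),
    (730044, "physical object", 19, "can have as role as a", 640020, "performer"),
    (4761, "is performer of", 20, "requires as role-2 a", 4773, "involver"),
    (730083, "liquid stream", 21, "is a specialization of", 730045, "stream"),
    (4760, "is subject in", 22, "is a specialization of", 4767, "is involved in"),
    (192512, "pumping", 23, "is a specialization of", 190168, "process"),
]

SPECIALIZATION = "is a specialization of"


def supertypes(uid, facts):
    """Return the UIDs of all direct and indirect supertypes of a class."""
    parents = defaultdict(set)
    for left, _, _, rel, right, _ in facts:
        if rel == SPECIALIZATION:
            parents[left].add(right)
    found, todo = set(), [uid]
    while todo:
        for parent in parents[todo.pop()]:
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found


def applicable_facts(uid, facts):
    """Facts stated for the class itself or inherited from its supertypes."""
    scope = {uid} | supertypes(uid, facts)
    return [f for f in facts if f[0] in scope and f[3] != SPECIALIZATION]


# 'pump' (130206) inherits fact 19 from 'physical object' (730044).
for fact in applicable_facts(130206, FACTS):
    print(fact)
```

With only these eight rows, ‘pump’ inherits the single fact that a physical object can have a
role as a performer; a fuller knowledge base would add the intermediate levels mentioned
earlier.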
5 Experiences and applications
Gellish is applied to express
- information about individual objects,
- knowledge about kinds of objects,
- requirements for data and documents in particular contexts about individual objects and
about kinds of objects.
These three applications are related to each other, as is illustrated in Figure 4.




[Figure 4, diagram not reproduced: Product / Requirements / Knowledge models.
- Product Model (relation types: has / is): Dongting, SGP, Coal gasification facility, U-1300,
  K-1301 system, K-1301, LubOil-100.
- Requirements Model (relation types: shall have a / shall be a, in the context of a):
  SHELLlib, DEP xxx, ‘shall comply with’, ‘shall have a’.
- Knowledge Model (relation types: can have a / can be a): STEPlib, compressor, luboil
  system, capacity.
K-1301 is classified as a compressor.
Copyright: Shell Global Solutions International B.V.]

                                                Figure 4, Three types of Gellish Models
The left hand of Figure 4 represents a Product Model that illustrates a Gellish model of a
process plant (the thick black lines represent composition relations). The relation types in a
product model generally start with ‘is’ or ‘has’. For example, K-1301 system is part of
U-1300 and K-1301 is classified as a compressor.
The right hand Knowledge Model illustrates the content of the STEPlib knowledge base.
The relation types in a knowledge model generally start with ‘can be a’ or ‘can have a’. For
example, a compressor can have a capacity and a lubrication oil system can be part of a
compressor.
The middle part of Figure 4 illustrates a proprietary Requirements Model that expresses
which data has to be present in a particular context. The relation types in a requirements
model generally start with ‘shall be a’ or ‘shall have a’.
For example, we developed requirements models that express that, in the context of the
‘handover’ of data from design to operations, a compressor shall have a capacity and shall be
compliant with design guide xx. This is expressed in Gellish as follows:
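Left hand    Left hand     Fact   Relation type name        Right hand   Right hand
object UID   object name   UID                              object UID   object name
130069       compressor    24     shall have a              551564       capacity
130069       compressor    25     shall be compliant with   5490386      DEP 31….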


When data about a compressor is handed over, this Gellish specification makes it possible to
verify the completeness of that data automatically, with the verification being driven by the
requirements model. This is illustrated in Figure 5.



Figure 5, Automated verification of a design against a requirements model
The right hand side of Figure 5 illustrates the content of the SHELLlib knowledge base,
which is a proprietary extension of STEPlib that is also expressed in Gellish. It illustrates
how the knowledge in STEPlib and SHELLlib is inherited via the specialization hierarchy.
Although P-101 is classified as a centrifugal pump, a requirement that is defined for a pump
in general can automatically be made applicable to P-101, because of the defined inheritance
via the specialization hierarchy.
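
A rough Python sketch of such a requirements-driven completeness check is given below. It
assumes that facts are reduced to (left object, relation type, right object) triples. Only
the classification of P-101 as a centrifugal pump comes from the text; the requirement on
‘pump’, the object P-102 and the capacity facts are hypothetical illustration data, and the
function names are not part of any Gellish tool.

```python
SPECIALIZATION = "is a specialization of"
CLASSIFICATION = "is classified as a"

# Knowledge model: assumed fragment of the specialization hierarchy.
KNOWLEDGE = [
    ("centrifugal pump", SPECIALIZATION, "pump"),
]
# Requirements model: hypothetical requirement defined for pumps in general.
REQUIREMENTS = [
    ("pump", "shall have a", "capacity"),
]
# Product model: only the first row is taken from the text.
PRODUCT = [
    ("P-101", CLASSIFICATION, "centrifugal pump"),
    ("P-102", CLASSIFICATION, "centrifugal pump"),      # hypothetical
    ("P-101", "has", "capacity of P-101"),              # hypothetical
    ("capacity of P-101", CLASSIFICATION, "capacity"),  # hypothetical
]


def with_supertypes(kind):
    """Return the class together with all its direct and indirect supertypes."""
    result, todo = {kind}, [kind]
    while todo:
        current = todo.pop()
        for left, rel, right in KNOWLEDGE:
            if rel == SPECIALIZATION and left == current and right not in result:
                result.add(right)
                todo.append(right)
    return result


def kinds_of(individual):
    """All classes that apply to an individual, including inherited ones."""
    kinds = set()
    for left, rel, right in PRODUCT:
        if left == individual and rel == CLASSIFICATION:
            kinds |= with_supertypes(right)
    return kinds


def missing_aspects(individual):
    """Required aspects ('shall have a') that the product data does not provide."""
    present = set()
    for left, rel, right in PRODUCT:
        if left == individual and rel == "has":
            present |= kinds_of(right)
    applicable = kinds_of(individual)
    return sorted(
        aspect
        for kind, rel, aspect in REQUIREMENTS
        if kind in applicable and rel == "shall have a" and aspect not in present
    )


for pump in ("P-101", "P-102"):
    print(pump, "is missing:", missing_aspects(pump))
```

The requirement is stated once, for ‘pump’, and reaches both individuals through their
classification as centrifugal pumps; no requirement has to be repeated per subtype.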
The specialization hierarchy also enables intelligent queries. For example, search engines
can perform intelligent searches on subtypes of keywords: a document that is recorded as
containing information about a line shaft pump can also be found when documents about
‘centrifugal pump’ are searched for. Likewise, a query on ‘pump’ can also find P-101,
because it is classified as a centrifugal pump.
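
The subtype-aware search can be sketched as follows, again in Python and under stated
assumptions: the hierarchy fragment is the one implied by the examples in the text, the
index is a simple list of (object, class) pairs, and ‘document D-1’ is a hypothetical
document name.

```python
# (subtype, supertype) pairs implied by the search examples in the text.
SPECIALIZATIONS = [
    ("centrifugal pump", "pump"),
    ("line shaft pump", "centrifugal pump"),
]
# (object, class it is classified as or recorded to be about).
INDEX = [
    ("P-101", "centrifugal pump"),
    ("document D-1", "line shaft pump"),  # hypothetical document
]


def expand(keyword):
    """Return the keyword plus all of its direct and indirect subtypes."""
    terms, todo = {keyword}, [keyword]
    while todo:
        current = todo.pop()
        for sub, sup in SPECIALIZATIONS:
            if sup == current and sub not in terms:
                terms.add(sub)
                todo.append(sub)
    return terms


def search(keyword):
    """Find objects whose recorded class is the keyword or one of its subtypes."""
    terms = expand(keyword)
    return sorted(obj for obj, kind in INDEX if kind in terms)


print(search("pump"))
print(search("line shaft pump"))
```

A query on ‘pump’ is first widened to its subtypes and then matched against the index, so
objects recorded under a more specific class are still found.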
An example of a commercial application of Gellish is a Gellish Browser developed by Mi2.
The Browser can read (and write) data expressed in the Gellish language and is able to
present any knowledge about classes of objects and any data about individual objects. It was
expected that an implementation of Gellish would have serious performance issues. To test
this, the Browser was loaded with over 60,000 facts, originating from different systems, but
all expressed in a Gellish Table. These facts included the Gellish knowledge base, extended
with a Shell proprietary standards database, data about documents, a materials catalogue, an
equipment list and material balances of the design of a process plant.
The Browser turned out to have excellent performance.




We also customized an implementation of the Eigner PLM product lifecycle management
system and loaded the same data into that system. This system also performed well.
We are currently working on the customization of existing systems so that they can export
data in a Gellish Table. The Browser can then be used to view data from various systems,
and data can be imported and integrated with other data in the Eigner PLM system.
It is our intention to use a Gellish Table, among other things, as a data exchange language
for the hand-over of design data between engineering contractors and plant owners and for
data about catalogue items and items delivered by suppliers.
Further work will explore the use of Gellish for the exchange of messages by intelligent
agent software, acting as nodes in the Semantic Web, for example business communication
messages about transactions in e-procurement.
6 Conclusions
The above illustrates that the current practice of defining data models separately from
reference data and user data is unnecessary. Integration of data model concepts with
reference data and user data in one consistent language can provide a single common
standard language for data storage and exchange that can significantly reduce development
costs and can simplify data communication.
A common use of the small data model of Figure 2, together with the common use of the
Gellish ontology, makes it possible to express and interpret a very wide scope of types of
facts. This is possible because the explicit classification relations provide interpretation
rules for the expressions, for which the relation types as well as the object types are
defined in Gellish. It is only required to have the concepts defined in the Gellish knowledge
base and to refer to them as in the basic structure, using the ‘basic semantic axioms’
mentioned above.
The above illustrates that:
   -      It is possible for a common standard knowledge base of concepts and relations
          between concepts to replace many data models.
   -      A solution based on the Gellish knowledge base of concepts is more flexible than
          fixed data models, and it is easier to add semantics to the database.
   -      The Gellish knowledge base of concepts provides an application independent
          language with a semantic basis that is equivalent to a very large data model. If
          sufficient concepts of an application domain are present or added, then data models
          for such an application domain can become superfluous.
   -      The Gellish knowledge base, using the inheritance capabilities of the specialization
          hierarchy, provides extendable product models for many types of objects.
   -      The implementations have proven that a Gellish knowledge base can be
          implemented with good performance.
   -      The implementations have proven that neutral format data exchange using a Gellish
          Table is a feasible solution.
As Gellish is in the public domain, proposals for extensions of the Gellish language are
invited.
7 References
          1. Andries van Renssen, “The Gellish Table and its Formats”. A definition of the
             Gellish Table and its implementation syntax for Gellish messages.
             www.steplib.com.
          2. Andries van Renssen, “Guide on STEPlib”. This guide describes how STEPlib
             is defined and how to extend the Gellish language and knowledge base.
             www.steplib.com.
          3. STEPlib, the Gellish knowledge base. This is a set of Gellish Tables (available in
             Excel and in MS Access). The upper level ontology part is documented in the
             TOPini part. www.steplib.com.
          4. Tim Berners-Lee, James Hendler and Ora Lassila, 'The Semantic Web',
             Scientific American, May 2001;
             http://www.sciam.com/issue.cfm?issueDate=May-01.
          5. OWL, Web Ontology Language Overview. http://www.w3.org/TR/owl-features/
          6. Ian Niles and Adam Pease (2001), “Towards a Standard Upper Ontology”, in:
             Formal Ontology in Information Systems, ISBN 1-58113-377-4.
          7. SUO (2001), The IEEE Standard Upper Ontology website, http://suo.ieee.org.
          8. Lenat, D. (1995), “Cyc: A Large-Scale Investment in Knowledge Infrastructure”,
             Communications of the ACM, 38, no 11 (November 1995).
          9. Wolfgang Degen, Barbara Heller, Heinrich Herre and Barry Smith (2001),
             “GOL: A General Ontological Language”, in: Formal Ontology in Information
             Systems, ISBN 1-58113-377-4.
          10. The Epistle Core Data Model (2001),
              http://www.btinternet.com/~chris.angus/epistle/specifications/ecm/ecm_400.html




Gellish A Standard Data And Knowledge Representation Language And Ontology

  • 1. Gellish A standard data and knowledge representation language and ontology Are Data Models becoming Superfluous? by Ir. Andries van Renssen Shell Global Solutions International Andries.vanRenssen@shell.com Abstract Data storage and data communication lack a common standard universal data model as well as a common data language and knowledge base with a taxonomy of concepts and a grammar for data exchange messages. This article presents a solution to this problem in the form of the new Gellish language and knowledge base, as an extension of the standard data models and ontology of two new ISO standards. The article presents Gellish as a language for neutral data exchange between systems, that can replace data models, and that provides an extendable ontology with standard reference data for customization and harmonization of systems. The definition of Gellish includes the public domain (“open data”) Gellish knowledge base with definitions of a large number of concepts and product models. It illustrates that a single Gellish Table in a database or data exchange file, is sufficient to express a wide range of kinds of facts about classes as well as facts about individual objects. 
Keywords: knowledge representation, data exchange, language, data models, standards, ontology, semantic web, knowledge base, classification system Table of Content 1 Introduction...........................................................................................................................2 1.1 Standard data models, ontologies and reference data.....................................................3 2 The Gellish language and ontology.......................................................................................4 3 Storage and exchange of data as well as semantics in Gellish..............................................5 4 Interpretation of expressions.................................................................................................7 5 Experiences and applications...............................................................................................10 6 Conclusions..........................................................................................................................13 7 References............................................................................................................................13 Gellish 1 13/04/2010
  • 2. 1 Introduction Currently, each software system stores its data using its own data model and communicates with other systems usually using a dedicated interface data structure, which means that it applies a dedicated interface data model. The large variety of data models cause that data exchange between systems is costly because of the required conversion of the data from the semantics of one data model to the other. This demonstrates the urgent need for widely applicable common standard data models. Often systems can be ‘customized’ by adding ‘reference data’ as instances, such as the definition of equipment types, document types, activity types, property types, pick lists, etc. However, reference data are usually different per implementation, even when database structures of different systems are equal, such as is the case with several implementations of the same system. This also holds for different implementations of the same system, such as a CAD, CAE, PDM, PLM, ERP or CRM system. The consequence is that data in those implementations can still not be compared, integrated or exchanged without costly data conversion processes. This illustrates the urgent need for a common dictionary, classification system or taxonomy of reference data, because there is currently no standard user data language. In the current systems there is a separation between the world of data models and the world of instances. Data models are developed by IT specialists (data modelers) who document them using either proprietary tools or using a standard data modeling language, such as EXPRESS (ISO 10303-11) or UML, which languages are especially designed to define data models. Once a data model is defined in such a language, the data model acts as another language in which the reference data as well as the user data has to be expressed. The use of two different languages, one for the model, one for the user data, illustrates the barrier between the two worlds. 
It is as if the definition of the English language were expressed in Chinese. On top of this, each programmer and each reference data producer is free to define his own terminology using those data definition languages. The result of the current state of the art is that data storage is done in a Babylonian mix of data models and reference data ‘languages’, with the consequence that exchange of data between systems is impossible, except where dedicated bilateral translators are created, not only for each pair of data models but also for the data content ‘languages’.

The current situation is sketched by Smith and Welty (2001) as follows: “Out of the apparent chaos, some coherence is beginning to emerge. Gradually, computer scientists are beginning to recognize that the provision, once for all, of a common, robust reference ontology – a shared taxonomy of entities – might provide significant advantages over the ad-hoc, case-by-case methods previously used”.

Several attempts have been made to develop an ‘upper ontology’, such as SUMO by Niles and Pease (2001), the IEEE Standard Upper Ontology, SUO (2001), the Cyc ontology, Lenat (1995), and GOL, Degen et al (2001). However, none of them integrates the upper level ontology with a lower level ontology of reference data. In other words, they do not integrate a generic data model with reference data and a language for the description of knowledge and of individual objects and processes.

This article presents a solution to the above-mentioned issues in the form of the Gellish language. Gellish satisfies the criteria for proper ontologies as expressed by Degen et al (2001, par. 6.1), but is not limited to an upper ontology. It includes and extends concept definitions that also appear in other sources, such as ISO and IEC standards, and knowledge stemming from industry standards and proprietary sources. It is extendable just as any natural language. Its taxonomy and knowledge base use unique identifiers for
concepts, thus allowing for synonyms and multiple names in various languages. The latter enables the expression of propositions about facts in one natural language and their automatic translation and presentation in any other natural language.

Gellish eliminates the traditional barrier between the data model definitions of classes and the data instances. The Gellish language demonstrates that this barrier is not necessary and that there are clear advantages when class definitions, reference data and user data are expressed in one and the same language.

1.1 Standard data models, ontologies and reference data
There are several developments of standard lower level ontologies and reference data libraries, stimulated among others by requirements of the e-commerce ‘market places’ and by the developments around the Semantic Web promoted by Berners-Lee et al (2001) and the Web Ontology Language OWL. Examples are the UNSPSC code (http://www.unspsc.org/), Ecl@ss (http://www.eclass.de/) and Trade Ranger (http://www.trade-ranger.com/EN/Pages/ContentStandards.asp). These standards have their value mainly in the standardization of terminology, but they do not provide a standard language or a standard data model for general use: their semantic expression power is limited because they apply only a few relation types and lack integration with a rich upper ontology.

There have also been several attempts to develop standard data models for data exchange or for data storage. Some of them are proprietary, but others are in the public domain. Those standard data models are defined independently of a particular system and are therefore called ‘neutral’. They are usually developed for a particular application domain rather than for a particular system.
Examples of standard data models are the STEP family of standards in ISO 10303, such as a graphics data model (AP203), a data model for the automotive industry (AP214), one for piping systems (AP227), one under development for the defense industry (AP239, PLCS), etc. The integration of all those data models into one overall data model is not yet fully achieved. Although the scopes of these valuable standard data models are wide, they are still limited to particular application areas and do not yet provide a general ‘common language’.

A further step towards a data model with a generic scope was the development and publication of the Epistle Core Data Model (2001), in the development of which the author of this article participated. From that, two new ISO standards were derived: ISO 15926-2 and its counterpart within the STEP family (AP221). Although these generic data models stem from the process industries, they have the generic nature of an upper level ontology, which makes them applicable in other application domains as well.

To become practically applicable in a particular application domain, these generic data models need a standard ‘reference data library’ or lower level ontology, in order to add standard definitions of application domain specific concepts and to specialize the generic data model. The author coordinated the development of such a standard reference data library, called STEPlib. This is a main source for the common standard library ISO 15926-4.

Then it was discovered that the top of the specialization hierarchy of standard data in the library coincided with the entities, attributes and relations in the generic data model. This led to the inclusion of the data model in the library. In other words, the upper level ontology was combined with a lower level ontology.
The insight that information should be contained in relations and not in objects, led to the birth of the Gellish language, which is based on standard relation types, expressed by natural language ‘phrases’.
2 The Gellish language and ontology
Gellish is a public domain standard data and knowledge representation language and ontology that is defined in STEPlib. It does not have the barrier between the user data and the IT data model data. It contains the concepts of the above-mentioned generic data models and integrates and extends them with standard reference data and a knowledge base with product and process models. The ontology also includes the definition of a large number of standard fact types (or relation types) that define the grammar of the Gellish language. It contains the definition of over 20,000 concepts arranged in a specialization hierarchy of classes. These concepts can be interpreted as entity types, attribute types and relationship types, or as a classification system or taxonomy. This makes Gellish equivalent to a very large data model. In addition, STEPlib contains a large number of relations between the concepts. They define the content of the knowledge base of product models and process models.

Gellish is not object oriented, but fact oriented. The basic Gellish object is therefore a fact. Each (atomic) fact is expressed as a relation between (two) objects. For example, fact 1 is expressed by a particular relation between objects with unique identifiers (UIDs) 100 and 101. This expression (1, 100, 101) illustrates the structure of each basic Gellish expression. Gellish requires that both the objects and the fact are classified explicitly by standard classes, including standard relation types. The standard classes are predefined in the Gellish ontology. In addition, objects may have a name. This enables software to interpret the expression correctly.

Gellish and the above-mentioned ISO standards are both based on the understanding that there appears to exist a limited set of application independent standard relation types that are sufficient to model all kinds of products and processes.
Gellish standardizes these relation types. The relation types also define the role types that the related objects play in the relations with each other. The variety and extendibility of standard relation types define the semantic expression capabilities of Gellish. A large part of the Gellish relation types is defined in the ISO standards and an extended set is defined in the TOP part of the Gellish language definition (STEPlib).

A standard implementation of Gellish is defined as a Gellish Table. In a Gellish Table the basic Gellish expression becomes:

Left hand object UID | Left hand object name | Fact UID | Relation type UID | Relation type name | Right hand object UID | Right hand object name
100 | thing-1 | 1 | 2850 | is related to | 101 | thing-2

In a Gellish Table one (atomic) fact is represented by one record, being a relation between two object UIDs, together with the names of the objects and the classification of the fact. The classification of the objects is done via separate classification facts in additional records.

Some examples of facts from a particular application domain, which illustrate the use of standard Gellish relation types, are:

Left hand object UID | Left hand object name | Fact UID | Relation type UID | Relation type name | Right hand object UID | Right hand object name | Scale
130091 | diesel engine | 2 | 1146 | is a specialization of | 130108 | engine |
104 | M-1 | 3 | 1225 | is classified as a | 130091 | diesel engine |
130802 | cylinder | 4 | 1146 | is a specialization of | 730063 | artifact |
107 | C-1 | 5 | 1225 | is classified as a | 130802 | cylinder |
107 | C-1 | 6 | 1190 | is part of | 104 | M-1 |
107 | C-1 | 7 | 1727 | has aspect | 108 | volume of C-1 |
108 | volume of C-1 | 8 | 1225 | is classified as a | 550140 | internal volume |
108 | volume of C-1 | 9 | 2044 | is quantified as | 922235 | 1800 | cm3
104 | M-1 | 10 | 4760 | is subject of | 110 | order-1 |

Note: for human readability, the relation type UID is omitted from the tables below.

The above table illustrates:
- Standard Gellish relation types, which classify the facts and determine the expression capabilities and semantics of Gellish.
- Examples from the large number of standard object types that are predefined in Gellish, for example: engine, diesel engine, cylinder, artifact, internal volume, 1800 and cm3.
- The way in which new object types can be added, as in facts 2 and 4. Diesel engine and cylinder already exist in Gellish, but if they had not existed, they could have been added in this way.
- It is possible in Gellish to express facts, such as the volume of C-1, without the need for such a fact to be pre-modeled in a data model. Such a fact type can nevertheless be defined in Gellish, after which this particular instance can be verified against that definition. It can also be defined to be obligatory in a particular context, after which the instances can be validated for completeness and compliance.
- One table is suitable to express many kinds of facts.

Note: the table above presents just an example of some of the capabilities of Gellish. For example, Gellish also allows expressing in which language the facts are expressed, whether the objects are real or imaginary, what the communicative intent is, who the author and the addressee of a proposition are, etc.

3 Storage and exchange of data as well as semantics in Gellish
In this section I will describe how knowledge, data and semantics are represented in Gellish.
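As a reference point for this section, the example Gellish Table of section 2 can be written down as plain records. The following is only a minimal sketch in Python and not part of the Gellish definition: the `Fact` record, its field names and the helpers `right_hand_names` and `parts_of` are hypothetical names chosen for illustration, while the UIDs and names are those of facts 2 to 10 in the example table.

```python
from typing import List, NamedTuple


class Fact(NamedTuple):
    """One row of a Gellish Table (field names are illustrative assumptions)."""
    lh_uid: int      # left hand object UID
    lh_name: str     # left hand object name
    fact_uid: int    # UID of the fact itself
    rel_uid: int     # relation type UID
    rel_name: str    # relation type name
    rh_uid: int      # right hand object UID
    rh_name: str     # right hand object name
    scale: str = ""  # optional scale, e.g. a unit of measure


# Facts 2 to 10 from the example table in section 2.
TABLE = [
    Fact(130091, "diesel engine", 2, 1146, "is a specialization of", 130108, "engine"),
    Fact(104, "M-1", 3, 1225, "is classified as a", 130091, "diesel engine"),
    Fact(130802, "cylinder", 4, 1146, "is a specialization of", 730063, "artifact"),
    Fact(107, "C-1", 5, 1225, "is classified as a", 130802, "cylinder"),
    Fact(107, "C-1", 6, 1190, "is part of", 104, "M-1"),
    Fact(107, "C-1", 7, 1727, "has aspect", 108, "volume of C-1"),
    Fact(108, "volume of C-1", 8, 1225, "is classified as a", 550140, "internal volume"),
    Fact(108, "volume of C-1", 9, 2044, "is quantified as", 922235, "1800", "cm3"),
    Fact(104, "M-1", 10, 4760, "is subject of", 110, "order-1"),
]


def right_hand_names(table: List[Fact], lh_uid: int, rel_uid: int) -> List[str]:
    """Names of all objects that `lh_uid` is related to by relation type `rel_uid`."""
    return [f.rh_name for f in table if f.lh_uid == lh_uid and f.rel_uid == rel_uid]


def parts_of(table: List[Fact], whole_uid: int) -> List[str]:
    """Names of all objects recorded as 'is part of' (UID 1190) the given whole."""
    return [f.lh_name for f in table if f.rel_uid == 1190 and f.rh_uid == whole_uid]


if __name__ == "__main__":
    print(right_hand_names(TABLE, 104, 1225))  # classification of M-1
    print(parts_of(TABLE, 104))                # parts of M-1
```

The point of the sketch is that class-level facts (2 and 4) and facts about individual objects (3 and 5 to 10) sit in the same list and are queried in the same way; no entity type for ‘engine’ or ‘cylinder’ had to be declared first.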
The generic nature of Gellish allows any complex network of facts to be expressed. For example, it allows expressing that:
- physical objects (of any kind) have properties (of any kind),
- properties have values,
- physical objects have parts,
- physical objects participate in activities or processes in particular roles,
- etc.
For clarity, however, I will use a specific example, namely the fact that:
- a particular pump (‘P-1’) is pumping a particular stream (‘S-1’).

In a conventional database it is required to declare some entity types and attribute types that define the semantics in the form of a data model. In case of the example, the data model
could for example consist of the entity types ‘pump’, ‘process’ and ‘stream’, each with some attributes.

In Gellish, the concepts ‘pump’, ‘process’ and ‘stream’ are not entity types; they are concepts that are defined via facts that are expressed as relations in a generic knowledge base. The knowledge base is built on a structure that only ‘knows’ a minimum number of ‘basic semantic concepts’ and contains the definition of a large number of concepts. The minimum set of ‘basic semantic concepts’ comprises the fundamental ontological axioms of Gellish, in a structure that should be known and understood and that is sufficient for the definition of additional semantic concepts. That structure is presented in figure 1. For the definition of a new concept (‘anything’) it is required to define such a coherent structure of elementary facts. Each elementary fact is expressed as a relation between two concepts, represented by the blue boxes in figure 1. In other words, each new concept requires the creation of a structure as presented in figure 1.

[Figure 1 shows boxes labelled ‘anything’ (object-1, object-2), ‘role’ (role-1, role-2), ‘relation’ (relation-1) and ‘kind of thing’, connected by ‘is a’, ‘plays’ / ‘played by’ and ‘requires’ relations.]
Figure 1, Structure of basic semantic concepts

The minimum set of ‘basic semantic concepts’ that are the axioms of Gellish, and whose meaning should be understood, is:
- anything
- role
- relation / relations
- plays role
- requires role
- is / is a (is classified as a)
- individual thing / individual things
- kind of thing / kinds of things
- single thing / plural thing

The structure of figure 1 holds for facts about classes as well as for facts about individual objects (instances) or relations, and for single objects as well as for plural objects. In other words, object-1 and object-2 in figure 1 can each be a single or plural individual object, relation or class.
The lines in the top left corners of the boxes indicate that the structure is a typical instance. Any other ‘atomic fact’ is expressed as such a structure. In other words, any atomic fact is expressed as an ‘atomic relation’ between two or more ‘objects’ and by the classification of
the ‘objects’, the ‘roles’ and the ‘relation’. This implies that an atomic fact is expressed by a structure of nine (9) relations, formed by the blue boxes in figure 2 (note that 4 of the 5 boxes appear twice in an atomic fact). For example, the fact that impeller O1 is part of centrifugal pump O2 is expressed in Gellish by the following 4 elementary relations:
- O1 plays role R1
- R1 is required by C1
- C1 requires role R2
- R2 is played by O2
These 4 relations relate 5 objects. To interpret them correctly, the following 5 additional classification relations are required:
- O1 is classified as an impeller
- R1 is classified as a part
- C1 is classified as a composition relation (“is part of”)
- R2 is classified as a whole
- O2 is classified as a centrifugal pump

In practical implementations it appears that the explicit identification of the roles and their classification can be omitted, because they follow from the classification of the relation and the definition of the relation type. Therefore the above relations are usually summarized in 3 Gellish atomic expressions as follows:
- O1 is classified as an impeller
- O1 is part of O2
- O2 is classified as a centrifugal pump

From this example it can be seen that the 5 kinds of things with which the 5 objects are classified need to be present in, or added to, the semantics of the Gellish knowledge base in order to ensure that the fact can be interpreted correctly. The awareness that a knowledge base of predefined concepts is required for a correct interpretation of Gellish expressions resulted in the development of the top-down hierarchical definition of the Gellish knowledge base of concepts, including relation types, as available in STEPlib.

Knowledge representation: relations between classes
Any fact type that extends the semantics is expressed as a relation between kinds of things. For example, assume that the concept ‘centrifugal pump’ needs to be added.
Then the following two atomic relations define that concept:
1. A specialization relation that defines that: centrifugal pump is a specialization of pump.
2. A relation that defines that a centrifugal pump by definition uses the centrifugal principle: centrifugal pump has by definition as aspect centrifugal.
These relations build on the definitions of the concepts ‘pump’ and ‘centrifugal’ respectively.

4 Interpretation of expressions
In current database technology, the semantic interpretation of an expression relies on the fact that any object is implicitly classified by being an ‘instance’ of an entity whose semantics are defined.
For example, assume that P1 is an instance of an attribute called ‘name’ of an entity called ‘pump’. This probably means that P1 is the name of a thing that is classified as a pump, although this meaning comprises two facts that are usually not defined in a computer interpretable way. It should be noted that if there are no other attributes, this data structure does not allow the classification of P1 as a centrifugal pump.

In Gellish all semantics are made explicit by the creation of explicit classification relations between the elements in the expression and the Gellish concepts (classes of objects, including relations). This replaces the instantiation relations and eliminates the need to define a data model with entities and attributes, such as the entity ‘pump’ and the attribute ‘name’. This is illustrated in figure 2.

[Figure 2: the green shaded area is the Gellish ontology (STEPlib), containing the concepts 130206 ‘pump’, ‘is performer of’, 730083 ‘liquid stream’, ‘is subject in’ and 192512 ‘pumping’. The individual objects 111 ‘P-101’, 112 ‘pumping S-1’ and 113 ‘S-1’ and the individual relations 11 ‘is performer of pumping S-1’ and 12 ‘is subject in pumping S-1’ are each linked to a concept by an ‘is classified as a’ relation (among them 13, 14 and 15), with the roles ‘classifier’ and ‘classified’; the objects and relations are linked through the roles ‘player’ and ‘requirer’.]
Figure 2, Linking a Gellish expression to Gellish concepts through classification

Figure 2 illustrates the expression “P-101 is pumping S-1” (in dark yellow). The ‘pumping S-1’ process is an interaction between the fluid S-1 and the pump P-101. The pump has the role of performer and the liquid has the role of subject in the pumping process. The blue boxes in the green shaded area represent the Gellish concepts, being instances in the Gellish knowledge base, STEPlib.
The explicit classification relations with the concepts in those blue boxes provide the semantics for the interpretation of the expression. In a Gellish Table this becomes:

Left hand object UID | Left hand object name | Fact UID | Relation type name | Right hand object UID | Right hand object name
111 | P-101 | 11 | is performer of | 112 | pumping S-1
113 | S-1 | 12 | is subject in | 112 | pumping S-1
111 | P-101 | 13 | is classified as a | 130206 | pump
112 | pumping S-1 | 14 | is classified as a | 192512 | pumping
113 | S-1 | 15 | is classified as a | 730083 | liquid stream

Such a set of rows in a Gellish Table can be exchanged between Gellish enabled software packages in any kind of table, such as an MS-Access database table, an Oracle or DB2 table,
an XLS spreadsheet, an XML file (e.g. according to ISO 10303-28) or a file in STEP physical file format (ISO 10303-21). Further details are described in ref. 1.

Note that the shaded light yellow boxes all have the same name: “is classified as a”. However, they are different individual classification relations. Each of those relations has a unique identifier (13, 14 and 15). The name in the shaded box indicates that each is (implicitly) “conceptualized” to be a classification relation. In other words, each of them is an “is classified as a” relation.

For a correct interpretation of the Gellish concepts, they need to be defined in a computer interpretable way. This is done via specialization/generalization relations, as is illustrated in figure 3. These specialization relations form one hierarchical network terminating at the top, called ‘anything’. This generic top supports the wide applicability of Gellish, as any missing concept can be added to Gellish as a subtype of an existing concept.

[Figure 3: specialization hierarchy in the Gellish ontology (green area). Labels in the figure: ‘anything’, ‘individual thing’, ‘kind of thing’, ‘physical object’, ‘relation’, ‘activity’, ‘pump’, ‘is performer of’, ‘liquid stream’, ‘is subject in’ and ‘pumping’, connected by “is a specialization of” relations from subtype to supertype. The individual objects ‘P-101’, ‘S-1’ and ‘pumping S-1’ and the individual relations ‘performer of pumping S-1’ and ‘subject in pumping S-1’ are connected to the concepts by “is classified as a” relations.]
Figure 3, Definition of Gellish concepts in a specialization hierarchy

In practice there are several intermediate levels of specialization between e.g. ‘pump’ and ‘physical object’ and ‘anything’, etc. Furthermore, there are classes of physical objects defined as subtypes of ‘physical object’. These can be extended by specializations, such as standard components (e.g. from ASME, BSI or DIN standards), and also by specializations such as manufacturer catalogue items (e.g. manufacturer models and types).

Figure 3 contains eight facts expressed as eight “is a specialization of” relations, each of which is a separate relation between classes. Similarly to what is described above about the “is classified as a” relation, this illustrates that the term ‘is a specialization of’ is not the name of each of those relations, but a name of the Gellish concept (the class) that is the conceptualization of those relations.

The knowledge about the meaning of the concepts pump, ‘is performer of’, liquid stream, ‘is subject in’ and pumping is defined in the Gellish ontology STEPlib. Some of that is
illustrated in the following facts, which include some intermediate facts not shown in figure 3 (the UIDs and names are taken from STEPlib, except for the UIDs of the facts):

Left hand object UID | Left hand object name | Fact UID | Relation type name | Right hand object UID | Right hand object name
130206 | pump | 16 | is a specialization of | 730044 | physical object
4761 | is performer of | 17 | is a specialization of | 4767 | is involved in
4761 | is performer of | 18 | requires as role-1 a | 640020 | performer
730044 | physical object | 19 | can have as role as a | 640020 | performer
4761 | is performer of | 20 | requires as role-2 a | 4773 | involver
730083 | liquid stream | 21 | is a specialization of | 730045 | stream
4760 | is subject in | 22 | is a specialization of | 4767 | is involved in
192512 | pumping | 23 | is a specialization of | 190168 | process

This knowledge is inherited from higher concepts in the hierarchy by lower level concepts. If an individual object is classified to be of such a class, then the knowledge is applicable to the individual object as a constraint for the specific aspects of the individual object.

5 Experiences and applications
Gellish is applied to express:
- information about individual objects,
- knowledge about kinds of objects,
- requirements for data and documents in particular contexts, about individual objects and about kinds of objects.
These three applications are related to each other, as is illustrated in Figure 4.
[Figure 4: “Product / Requirements / Knowledge models”, three columns titled Product Model (‘has’ / ‘is’), Requirements Model (‘shall have a’ / ‘shall be a’, in the context of a ...) and Knowledge Model (‘can have a’ / ‘can be a’). Labels in the figure include: Dongting, coal gasification facility, SGP, U-1300, K-1301 system, K-1301, LubOil-100, luboil system, compressor, capacity, DEP xxx, SHELLlib and STEPlib, with the relations ‘is classified as a’, ‘shall comply with’, ‘shall have a’ and ‘can have a’. Copyright: Shell Global Solutions International B.V.]
Figure 4, Three types of Gellish Models

The left hand side of Figure 4 represents a Product Model that illustrates a Gellish model of a process plant (the thick black lines represent composition relations). The relation types in a product model generally start with ‘is’ or ‘has’. For example, K-1301 system is part of U-1300 and K-1301 is classified as a compressor.

The right hand Knowledge Model illustrates the content of the STEPlib knowledge base. The relation types in a knowledge model generally start with ‘can be a’ or ‘can have a’. For example, a compressor can have a capacity and a lubrication oil system can be part of a compressor.

The middle part of Figure 4 illustrates a proprietary Requirements Model that expresses which data has to be present in a particular context. The relation types in a requirements model generally start with ‘shall be a’ or ‘shall have a’. For example, we developed requirements models that express that, in the context of ‘handover’ of data from design to operations, a compressor shall have a capacity and, in the same context, a compressor shall be compliant with design guide xx. This is expressed in Gellish as follows:

130069 | compressor | 24 | shall have a | 551564 | capacity
130069 | compressor | 25 | shall be compliant with | 5490386 | DEP 31….

When data about a compressor is handed over, this Gellish specification makes it possible to verify the completeness of that data automatically, with the verification being driven by the requirements model.
This is illustrated in figure 5.
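The idea of a verification that is driven by the requirements model can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the applications described here: the function names, the simplified (left, relation, right) triples, the subtype ‘centrifugal compressor’, the individual ‘K-1302’ and the aspect ‘capacity of K-1301’ are all hypothetical. Only ‘compressor shall have a capacity’ (fact 24), ‘K-1301 is classified as a compressor’ and the ‘has aspect’ relation type are taken from the text.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Specialization hierarchy as subtype -> supertype.
# "centrifugal compressor" is a hypothetical subtype added for illustration.
SUPERTYPE: Dict[str, str] = {
    "centrifugal compressor": "compressor",
    "compressor": "physical object",
}

# Requirements model: kind of thing -> kinds of aspect it shall have (fact 24).
SHALL_HAVE: Dict[str, List[str]] = {"compressor": ["capacity"]}

# Product model facts as simplified (left, relation type name, right) triples.
# K-1302 and "capacity of K-1301" are hypothetical.
FACTS: List[Tuple[str, str, str]] = [
    ("K-1301", "is classified as a", "compressor"),
    ("K-1301", "has aspect", "capacity of K-1301"),
    ("capacity of K-1301", "is classified as a", "capacity"),
    ("K-1302", "is classified as a", "centrifugal compressor"),
]


def with_supertypes(kind: Optional[str]) -> Iterator[str]:
    """Yield a kind followed by all of its supertypes, bottom-up."""
    while kind is not None:
        yield kind
        kind = SUPERTYPE.get(kind)


def kinds_of(obj: str) -> List[str]:
    """Kinds by which an individual object is explicitly classified."""
    return [r for l, rel, r in FACTS if l == obj and rel == "is classified as a"]


def required_aspects(obj: str) -> List[str]:
    """Aspect kinds required for obj, including those inherited from supertypes."""
    required: List[str] = []
    for kind in kinds_of(obj):
        for ancestor in with_supertypes(kind):
            for aspect in SHALL_HAVE.get(ancestor, []):
                if aspect not in required:
                    required.append(aspect)
    return required


def missing_aspects(obj: str) -> List[str]:
    """Required aspect kinds for which obj has no classified aspect."""
    aspects = [r for l, rel, r in FACTS if l == obj and rel == "has aspect"]
    present: Set[str] = {k for a in aspects for k in kinds_of(a)}
    return [a for a in required_aspects(obj) if a not in present]


if __name__ == "__main__":
    for item in ("K-1301", "K-1302"):
        print(item, "is missing:", missing_aspects(item))
```

In this sketch K-1301 passes the check, while the hypothetical K-1302 is reported as lacking a capacity even though the requirement was stated for ‘compressor’ and K-1302 is classified only by a subtype. A fuller version would also accept aspects that are classified by a subtype of the required aspect kind, and would take the context of the requirement into account.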
Figure 5, Automated verification of a design against a requirements model

The right hand side of figure 5 illustrates the content of the SHELLlib knowledge base, which is a proprietary extension of STEPlib that also uses Gellish. It illustrates how the knowledge in STEPlib and SHELLlib is inherited via the specialization hierarchy. Although P-101 is classified as a centrifugal pump, a requirement that is defined for a pump in general can automatically be made applicable to P-101, because of the defined inheritance via the specialization hierarchy.

The specialization hierarchy also enables intelligent queries. For example, search engines can perform intelligent searches on subtypes of keywords: a document that is recorded to contain information about a line shaft pump can also be found when documents about ‘centrifugal pump’ are searched for, and a query on ‘pump’ can also find P-101, being classified as a centrifugal pump.

An example of a commercial application of Gellish is a Gellish Browser developed by Mi2. The Browser can read (and write) data expressed in the Gellish language and is able to present any knowledge about classes of objects and any data about individual objects. It was expected that an implementation of Gellish would have serious performance issues. Therefore the Browser was loaded with over 60,000 facts, originating from different systems, but all expressed in a Gellish Table. These facts included the Gellish knowledge base, extended with a Shell proprietary standards database, data about documents, a materials catalogue, an equipment list and material balances of the design of a process plant. Its performance proved to be excellent.
We also customized an implementation of the Eigner PLM product lifecycle management system and loaded the same data into that system. This system also performed well.

We are currently working on the customization of existing systems so that they can export data in a Gellish Table. The Browser can then be used to view data from various systems, and data can be imported and integrated with other data in the Eigner PLM system. It is our intention to use a Gellish Table, among others, as a data exchange language for the hand-over of design data between engineering contractors and plant owners and for data about catalogue items and items delivered by suppliers.

Further work will explore the use of Gellish for the exchange of messages by intelligent agent software, acting as nodes in the Semantic Web, for example business communication messages about transactions in e-procurement.

6 Conclusions
The above illustrates that the current practice of defining data models separately from reference data and user data is unnecessary. Integration of data model concepts with reference data and user data in one consistent language can provide a single common standard language for data storage and exchange that can significantly reduce development costs and simplify data communication.

A common use of the little data model of figure 2, together with the common use of the Gellish ontology, makes it possible to express and interpret a very wide scope of types of facts. This is possible because the explicit classification relations provide interpretation rules for the expressions for which the relation types as well as the object types are defined in Gellish. It is only required to have the concepts defined in the Gellish knowledge base and to refer to them as in the basic structure, using the ‘basic semantic axioms’ mentioned above.
The above illustrates that:
- A common standard knowledge base of concepts and relations between concepts can replace many data models.
- A Gellish knowledge base of concepts is more flexible than fixed data models, and it is easier to add semantics to the database.
- The Gellish knowledge base of concepts provides an application independent language with a semantic basis that is equivalent to a very large data model. If sufficient concepts of an application domain are present or added, then data models for such an application domain can become superfluous.
- The Gellish knowledge base, using the inheritance capabilities of the specialization hierarchy, provides extendable product models for many types of objects.
- The implementations have shown that a Gellish knowledge base can be implemented with good performance.
- The implementations have shown that neutral format data exchange using a Gellish Table is a feasible solution.

As Gellish is in the public domain, proposals for extensions of the Gellish language are invited.

7 References
1. Andries van Renssen, “The Gellish Table and its Formats”. A definition of the Gellish Table and its implementation syntax for Gellish messages. www.steplib.com.
2. Andries van Renssen, “Guide on STEPlib”. This guide describes how STEPlib is defined and how to extend the Gellish language and knowledge base. www.steplib.com.
3. STEPlib, the Gellish knowledge base. This is a set of Gellish Tables (available in Excel and in MS Access). The upper level ontology part is documented in the TOPini part. www.steplib.com.
4. Tim Berners-Lee, James Hendler and Ora Lassila, “The Semantic Web”, Scientific American, May 2001; http://www.sciam.com/issue.cfm?issueDate=May-01.
5. OWL, Web Ontology Language Overview. http://www.w3.org/TR/owl-features/
6. Ian Niles and Adam Pease (2001), “Towards a Standard Upper Ontology”, in: Formal Ontology in Information Systems, ISBN 1-58113-377-4.
7. SUO (2001), The IEEE Standard Upper Ontology website, http://suo.ieee.org.
8. Lenat, D. (1995), “Cyc: A Large-Scale Investment in Knowledge Infrastructure”, Communications of the ACM, 38, no 11 (November 1995).
9. Wolfgang Degen, Barbara Heller, Heinrich Herre and Barry Smith (2001), “GOL: A General Ontological Language”, in: Formal Ontology in Information Systems, ISBN 1-58113-377-4.
10. The Epistle Core Data Model (2001), http://www.btinternet.com/~chris.angus/epistle/specifications/ecm/ecm_400.html