This document summarizes a transcript from the PEMT '06 conference discussing challenges with terminology across disciplines and proposes approaches to address ambiguities. It notes how knowledge evolution has led to specialized terminology that may only be understood by experts, hindering cross-disciplinary communication. Defining terms unambiguously is important for knowledge management. The document provides examples of ambiguous terms like homonyms and synonyms and proposes establishing a transparent, inter-disciplinary lexicon using fundamental disciplines like physics and mathematics to prioritize terms. It emphasizes the need to review scientific terminology to remove ambiguity and proposes criteria to clearly define terms.
This presentation aims to show why homonyms and synonyms must be eliminated from technical vocabularies. It illustrates the misinformation that results from lexical disorder in cross-disciplinary knowledge transfer, standards setting and global business communication. Examples of homonyms and synonyms that have been observed to cause misinterpretation are presented, and the genuine need for a transparent multidisciplinary lexicon is advocated. A definition of the term "definition" itself is presented, and exemplary definitions are provided as models of transparent lexical terms. It is recommended that a hierarchy of terminology be adopted, giving priority to the most fundamental disciplines and ensuring that the other disciplines conform. A properly defined term is an information probability intensifier.
A Natural Logic for Artificial Intelligence, and its Risks and Benefits (gerogepatton)
This paper is a multidisciplinary project proposal, submitted in the hopes that it may garner enough interest to launch it with members of the AI research community along with linguists and philosophers of mind and language interested in constructing a semantics for a natural logic for AI. The paper outlines some of the major hurdles in the way of “semantics-driven” natural language processing based on standard predicate logic and sketches out the steps to be taken toward a “natural logic”, a semantic system explicitly defined on a well-regimented (but indefinitely expandable) fragment of a natural language that can, therefore, be “intelligently” processed by computers, using the semantic representations of the phrases of the fragment.
Swoogle: Showcasing the Significance of Semantic Search (IDES Editor)
The World Wide Web hosts vast repositories of information. Retrieving the required information from the Internet is a great challenge, since computer applications understand only the structure and layout of web pages and have no access to their intended meaning. The Semantic Web is an effort to enhance the Internet so that computers can process, interpret and exchange the information presented on the WWW, helping humans find the essential knowledge they need. The application of ontologies is the predominant approach driving the evolution of the Semantic Web. The aim of our work is to illustrate how Swoogle, a semantic search engine, helps make computers and the WWW more interoperable and intelligent. In this paper, we discuss issues related to traditional and semantic web searching, and we outline how an understanding of the semantics of the search terms can be used to provide better results. The experimental results establish that semantic search provides more focused results than traditional search.
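The claim that understanding the semantics of search terms yields better results can be illustrated with a minimal sketch. This is not Swoogle's actual mechanism; the documents, the concept map and the substring matching rule are all invented for illustration:

```python
# Illustrative sketch: a keyword search misses documents that use
# synonyms of the query term, while a "semantic" search that expands
# the query via a small concept map finds them.

DOCS = {
    1: "the automobile industry uses robots",
    2: "car manufacturing relies on automation",
}

# Hypothetical mini-ontology: each concept lists equivalent surface terms.
CONCEPTS = {"car": {"car", "automobile", "auto"}}

def keyword_search(term):
    """Literal substring match only."""
    return [d for d, text in DOCS.items() if term in text]

def semantic_search(term):
    """Expand the query with equivalent terms before matching."""
    terms = CONCEPTS.get(term, {term})
    return [d for d, text in DOCS.items() if any(t in text for t in terms)]

print(keyword_search("car"))   # finds only the literal match, doc 2
print(semantic_search("car"))  # also finds the "automobile" document
```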
Classical logic has a serious limitation: it cannot cope with the vagueness and uncertainty into which most modes of human reasoning fall. To provide a foundation for human knowledge representation and reasoning in the presence of vagueness, imprecision, and uncertainty, fuzzy logic should be able to deal with linguistic hedges, which play a very important role in the modification of fuzzy predicates. In this paper, we extend fuzzy logic in the narrow sense with graded syntax, introduced by Novák et al., with many hedge connectives. In one case, no hedge has a dual; in the other case, each hedge can have its own dual. The resulting logics are shown to also have Pavelka-style completeness.
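As a concrete illustration of how hedges modify fuzzy predicates, here is a minimal sketch using Zadeh's classical truth-function modifiers ("very" as concentration, "somewhat" as dilation). The hedge connectives studied in the paper are more general than these powers, and the membership function below is invented:

```python
def tall(height_cm):
    """Toy membership function for the fuzzy predicate 'tall'."""
    return max(0.0, min(1.0, (height_cm - 160) / 40))

def very(mu):
    """Concentration: sharpens the predicate (Zadeh's classical 'very')."""
    return mu ** 2

def somewhat(mu):
    """Dilation: relaxes the predicate (Zadeh's classical 'somewhat')."""
    return mu ** 0.5

h = 180  # membership 0.5 in 'tall'
print(tall(h))            # 0.5
print(very(tall(h)))      # 0.25 -- harder to qualify as "very tall"
print(somewhat(tall(h)))  # ~0.707 -- easier to qualify as "somewhat tall"
```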
Lecture 2: From Semantics To Semantic-Oriented Applications (Marina Santini)
From the "Natural Language Processing" LinkedIn group:
John Kontos, Professor of Artificial Intelligence
I wonder whether translating into formal logic is nothing more than transliteration, which simply isolates the part of the text that can be reasoned upon using the simple inference mechanism of formal logic. The real problem, I think, lies with the part of the text that CANNOT be translated, on the one hand, and the part that changes its meaning due to advances in civilization, on the other. My own proposal is to leave NL text alone and try building inference mechanisms for the UNTRANSLATED text, depending on the task requirements.
All the best,
John
Semantic Rules Representation in Controlled Natural Language in FluentEditor (Cognitum)
Abstract. The purpose of this paper is to present a representation of semantic rules (SWRL) in controlled natural language (English), in order to make the rules easier to understand for humans interacting with a machine. The rule representation is implemented in FluentEditor, an ontology editor with controlled natural language (CNL). The representation can be used in many domains where people interact with machines and use specialized interfaces to define knowledge in a system (a semantic knowledge base), e.g. representing medical knowledge and guidelines, procedures in crisis management, or the management of any coordination process. Such knowledge bases can support decision making in any discipline, provided the knowledge is stored in a proper semantic form.
Conceptual Interoperability and Biomedical Data (Jim McCusker)
The goals of conceptual interoperability are:
Make similar but distinct data resources available for search, conversion, and inter-mapping in a way that mirrors human understanding of the data being searched.
Make data resources that use cross-cutting models (HL7-RIM, provenance models, etc.) interoperable with domain-specific models without explicit mappings between them.
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING (cscpconf)
In the last decade, ontologies have played a key technological role in information sharing and agent interoperability across application domains. In the Semantic Web domain, ontologies are used to face the great challenge of representing the semantics of data, in order to bring the web to its full power and hence achieve its objective. However, using ontologies as common, shared vocabularies requires a certain degree of interoperability between them. To meet this requirement, ontology mapping is an unavoidable solution. Indeed, ontology mapping builds a meta-layer that allows different applications and information systems to access and share their information, after resolving the various forms of syntactic, semantic and lexical mismatch. In the contribution presented in this paper, we integrate a semantic aspect based on an external lexical resource, WordNet, to design a new algorithm for fully automatic ontology mapping. This fully automatic character is the main difference between our contribution and most existing semi-automatic ontology-mapping algorithms, such as Chimaera, Prompt, Onion and Glue. To enhance the performance of our algorithm, the mapping-discovery stage combines two sub-modules: the former analyses the concepts' names and the latter analyses their properties. Each of these sub-modules is itself based on a combination of lexical and semantic similarity measures.
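The two-sub-module idea can be sketched as follows, under simplifying assumptions: a plain string ratio stands in for the lexical measure, a Jaccard overlap of property names stands in for the property analysis, and the WordNet-based semantic measure and the actual weighting scheme of the algorithm are not reproduced here:

```python
from difflib import SequenceMatcher

def lexical_sim(a, b):
    """Lexical sub-module stand-in: string similarity of concept names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def property_sim(props_a, props_b):
    """Property sub-module stand-in: Jaccard overlap of property names."""
    if not props_a and not props_b:
        return 1.0
    return len(props_a & props_b) / len(props_a | props_b)

def mapping_score(name_a, props_a, name_b, props_b, w=0.5):
    """Combine the two sub-module scores (w is an illustrative weight)."""
    return w * lexical_sim(name_a, name_b) + (1 - w) * property_sim(props_a, props_b)

# Hypothetical concepts from two ontologies being mapped
score = mapping_score("Author", {"name", "email"},
                      "Writer", {"name", "email", "pen_name"})
print(round(score, 3))
```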
PHPnw (England) User Group - Concepts, Spaces and Thresholds and why they matter (Peter Jones)
A presentation (at short notice) on
THRESHOLD CONCEPTS
- Educational research
CONCEPTUAL SPACES
- Cognitive Sciences and Linguistics
HODGES' HEALTH CAREER - CARE DOMAINS - MODEL
- Nursing, Healthcare Education
With discussion relating to programming and Drupal.
The spread and abundance of electronic documents requires automatic techniques for extracting useful information from the text they contain. The availability of conceptual taxonomies can be of great help, but manually building them is a complex and costly task. Building on previous work, we propose a technique to automatically extract conceptual graphs from text and reason with them. Since automated learning of taxonomies needs to be robust with respect to missing or partial knowledge and flexible with respect to noise, this work proposes a way to deal with these problems. The case of poor data/sparse concepts is tackled by finding generalizations among disjoint pieces of knowledge. Noise is
handled by introducing soft relationships among concepts rather than hard ones, and applying a probabilistic inferential setting. In particular, we propose to reason on the extracted graph using different kinds of relationships among concepts, where each arc/relationship is associated to a number that represents its likelihood among all possible worlds, and to face the problem of sparse knowledge by using generalizations among distant concepts as bridges between disjoint portions of knowledge.
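A toy illustration of arcs carrying likelihoods follows; this is not the paper's actual probabilistic inference procedure, and the concepts, relations and numbers are invented:

```python
# Each arc/relationship in the extracted graph is associated with a
# number representing its likelihood; the plausibility of a relation
# derived along a path is taken here (simplistically) as the product
# of the arc likelihoods.

EDGES = {
    ("dog", "is-a", "mammal"): 0.95,
    ("mammal", "is-a", "animal"): 0.98,
    ("dog", "near", "kennel"): 0.6,
}

def path_likelihood(*arcs):
    """Multiply arc likelihoods along a chain of relationships."""
    p = 1.0
    for arc in arcs:
        p *= EDGES[arc]
    return p

# Soft transitive inference: "dog is-a animal" with likelihood 0.95 * 0.98
print(path_likelihood(("dog", "is-a", "mammal"), ("mammal", "is-a", "animal")))
```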
In this paper we present the SMalL Ontology for malicious software classification, the SMalL Java Application for antivirus system comparison, and the SMalL knowledge-based file format for malware-related attacks. We believe that our ontology can aid the development of malware-prevention software by offering a common knowledge base and a clear classification of existing malicious software. The application is a prototype showing how this ontology might be used in conjunction with known antivirus capabilities to offer a comprehensive comparison.
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconference (Daniel Lewis)
At the computational intelligence unconference 2014, Marcelo Funes-Gallanzi presented Simplish, a system for the conversion of text into Simple English. Here are his slides.
ONTOLOGICAL MODEL FOR CHARACTER RECOGNITION BASED ON SPATIAL RELATIONS (sipij)
In this paper, we present a set of spatial relations between concepts describing an ontological model for a new character-recognition process. Our main idea is based on the construction of a domain ontology modelling the Latin script. This ontology is composed of a set of concepts and a set of relations. The concepts represent the graphemes extracted by segmenting the manipulated document, and the relations are of two types: is-a relations and spatial relations. In this paper we are interested in describing the second type of relations and their implementation in Java code.
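The idea of an ontology combining is-a and spatial relations between grapheme concepts can be sketched as follows. The paper's implementation is in Java; this Python sketch, with invented grapheme names and relations, is only illustrative:

```python
# A tiny ontology: concepts (graphemes) linked by "is-a" and by
# spatial relations such as "above".

ONTOLOGY = {
    "is-a": [("vertical-bar", "grapheme"), ("dot", "grapheme")],
    "above": [("dot", "vertical-bar")],  # e.g. the dot over the letter 'i'
}

def related(relation, a, b):
    """True if the ontology asserts relation(a, b)."""
    return (a, b) in ONTOLOGY.get(relation, [])

print(related("above", "dot", "vertical-bar"))  # True: matches an 'i' shape
print(related("above", "vertical-bar", "dot"))  # False: spatial order matters
```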
Advertising Sci-Fi novel The 4th Birth.pdf (Sead Spuzic)
This is a promotional pamphlet for a Sci-Fi novel that's enhanced with hyperlinks leading to validated scientific information. It's crafted to inspire young students—and anyone thirsting for knowledge—to learn by following their curiosity. The core aim of the storytelling is to enlighten and educate readers on significant subjects. The topics covered span a broad spectrum, reaching as far as the cutting-edge advancements in futurology.
This inspiring novel is grounded in the concept of education driven by the curiosity of learners. This science-fiction narrative incorporates hyperlinks that guide readers towards a variety of credible scientific and educational resources. This feature affords a level of freedom and choice that traditional print novels simply cannot offer.
The first story The 4th Birth is (just seemingly) about the Lemurians, an ancient race which appears to have existed prior to and during the time of the equally mysterious empire of Atlantis. Some authors believe that Lemurians developed their civilisation (also called Lapita and Mu - Motherland) some 70,000 to 80,000 years ago, mainly in the South-West Pacific, between China and Australia.
Lemurians were living through alternating periods of peace and prosperity, conflicts and crises over the millennia. During this time, they made considerable advances in culture, politics, sciences and technology causing only minor ecological catastrophes. At the peak of their civilisation, the Lemurian people were both technically advanced and very spiritual. However, they were unaware that the indifferent Nature was leading their world towards an ultimate cataclysm. Fortunately, alongside the Lemurians and several neighbouring nations that worked hard to enslave one another, another civilisation (if one is to believe the fragments that appear in certain legends), much older and hence somewhat more mature, was witnessing this course of events.
The Visitors, who belonged to an entirely different phylogeny - a highly developed race from some system of evolution in the infinitely distant past - became aware of signs of the rarest phenomenon in the Universe: Intelligence. For several centuries the Visitors' involvement was that of invisible observers; they did not intervene or become involved with the subjects of their study. But when the course of Planet Earth turned towards the catastrophe, they decided to step in and help save at least some members of the human race.
Engineering Design is an iterative decision-making process used to devise a component, product, process, or system to meet the needs and functions desired by the user in a sustainable manner.
Engineering Design is a decision-making process (often iterative or recursive) in which the sciences are applied to modify or create something to meet predefined objectives (specifications). Basic stages of the design process include the establishment of objectives and criteria, analysis, synthesis, and the definition of actual manufacturing techniques and routes, as well as the modes of usage, maintenance and disposal.
Following the concerns prompted by the lack of technological expertise, it is proposed that education be further enhanced by promoting entrepreneurial links between Manufacturing and Academe. Students should be fully employed in real manufacturing systems over an extended period of their study. There should be no dilution of academic disciplines; however, university education should be counterbalanced by direct industrial experience.
A statistical approach to defect detection and discrimination has been applied to the case of hot-rolled steel. The probability distribution of pixel intensities is estimated from a small set of defect-free images, and this distribution is used to select pixels with unlikely values as candidates for defects. True defects are discriminated from random noise pixels by a dynamic thresholding procedure, which tracks the behaviour of clusters of selected pixels as the threshold level varies.
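The statistical idea can be sketched as follows. The distribution, the k-sigma selection rule and the sample pixel row are illustrative assumptions, and the paper's dynamic cluster-tracking threshold is only hinted at in the comments:

```python
import random
import statistics

random.seed(0)
# "Defect-free" training pixels: intensities around 128 (illustrative).
clean = [random.gauss(128, 5) for _ in range(10_000)]
mu, sigma = statistics.mean(clean), statistics.stdev(clean)

def is_candidate(intensity, k=4.0):
    """Flag a pixel as a defect candidate if it lies k sigmas from the mean."""
    return abs(intensity - mu) > k * sigma

# A dynamic thresholding step would then keep only candidates that form
# clusters (here, the dark run at indices 3..5) rather than isolated noise.
row = [128, 130, 127, 60, 58, 61, 129]
candidates = [i for i, v in enumerate(row) if is_candidate(v)]
print(candidates)
```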
Processes based on fluidity and solidification, or simply “casting”, include manufacturing techniques whereby molten material is poured or forced into a mould and allowed to harden. Appropriate variants of this technique are particularly suitable for the economical production of complex shapes, ranging from mass-produced parts for the automotive industry to one-of-a-kind production of jewellery or massive machinery.
Presented at the World Conference on Educational Sciences http://www.wces2009.org/
February 04-07, Nicosia, Cyprus
Abstract: http://spuzic.synthasite.com/knowledge_-basics.php
Live presentation (Youtube): http://au.youtube.com/watch?v=ZYwYGXuVhqo
Contribution to Knowledge Management:
Cross-disciplinary Terminology
PEMT ’06 transcript
Sead Spužić a), Kazem Abhary a), Clement Stevens b) and Faik Uzunović c)
a) University of South Australia, b) KFUPM University, c) University of Zenica
Key words: Disambiguation, definition, homonymy, informatics, knowledge, lexicon,
management, multidisciplinary, terminology, thesaurus, synonymy
Abstract:
The evolution of knowledge has imposed branching into disciplines that use terms understood
“correctly” only by experts. Globalisation favours cross-disciplinary and transparent
communication. However, these trends have uncovered impediments such as prolixity, ambiguity and
jargon. The Internet enables communication at the speed of light, thus exposing other limits to
knowledge transfer, such as misinformation and misunderstanding. Knowledge is transferred by
interaction of language with other models (e.g. figures) and by demonstrations. Transparency of
terminology is critical for knowledge management. This treatise presents an axiomatic definition of
the term 'definition' itself, in order to enable a rational analysis of lexical elements - the words. The
examples of confusing terms (homonyms, synonyms) are discussed. The use of an inter-disciplinary and transparent lexicon is proposed, adopting a hierarchy of terms that allocates priority to the fundamental disciplines (mathematics, physics). A scientific lexicon should attribute to
each definition a unique set of words. The need to expunge scientific and technical language of
ambiguity is urgent and the comprehensive review and “cleaning” of scientific terms is a task that
demands the gathering together of appropriate institutions. The key to disambiguation of scientific
language is in defining quantifiable criteria.
Intro
In this eternal and infinite ambient, our fate depends on our knowledge. Although Man is the most
developed amongst the phenomena we know, this does not guarantee our survival. Rather than
indulging in this relative perception, it is reasonable to assume that our ambient - the Universe - may
soon bring in new challenges that stretch beyond our current capabilities. A rational strategy is to
speed up both the development and transport of our knowledge. This can be enhanced by sharing
and disseminating knowledge to adequate human resources, not to mention broader possibilities.
Academe and other structures concerned with broadening and disseminating knowledge are indeed
conscious of the need for globalization and disambiguation of this invaluable treasure. Reference
publications [1-20] used in this transcript present only a very limited and arbitrary selection amongst
numerous sources which present evidence of this awareness, for example by virtue of initiatives
such as European University Association [2], The Bologna Process [3] or UNESCO Thesaurus
[13,14].
One of the principal media for the transport of knowledge is language; its basic elements are the words
(terms). A lexicon (an alphabetically arranged list of words setting forth their meanings and etymology) is a direct hyponym of the term ‘knowledge’ [12].
The category of so-called "closed class words" has a fixed, limited number of words which
themselves have permanent, final form and meaning; new words are rarely added. The members
include: pronouns, prepositions, determiners and conjunctions [10].
In the case of so-called "open class words", containing nouns, verbs, adjectives and adverbs, new
words can be added as they become necessary [10]. However, much too frequently, new words are
added although old words, providing a satisfactory meaning, do already exist. This causes
synonymy (e.g. "open class words" are also called "lexical words", while "closed class words" are
also termed “structural” or “function” or “grammatical” words). Or, vice versa, new meanings are
attached to words used in another discipline to denote a differing concept, thus causing homonymy
(e.g. the word 'discipline' could mean a branch of knowledge: "in what discipline is his doctorate?";
"anthropology is the discipline focused on the study of human beings"; but the same word could mean
‘punishment’ or at least ‘orderly prescribed conduct or pattern of behaviour’).
“There does not seem to be a consensus about what many of the basic terms mean, or which is the
overarching concept, … under which other terms might be presumed to be subsets. …. (C)learly, the
multiplicity of definitions for the same concepts, false synonyms and so forth show that the world of
scholarship needs an approach to definitions of sufficient dimensionality.” [4]
“The recent globalisation trends show that, on all fronts - education, marketing, industry, science,
social standard infrastructure, health - we need a common, well defined, language. Workers in all
disciplines are expected to function effectively in global trans-disciplinary communities.”[5]
“The comprehensive review and ‘cleaning’ of scientific terminology is certainly an immense task
that demands the gathering together of competent institutions. The need to expunge scientific and
technical language of ambiguity and prolixity is urgent and becoming increasingly so.” [8]
Rational analysis of lexical elements - the words - and their relations requires at least an axiomatic
definition of the term 'definition' itself. Hence an initial definition is proposed herewith, in the
following section. In addition, a number of examples of ambiguous terms are presented along with an
attempt to propose their disambiguation. In doing so, an effort is made to avoid homonymy,
synonymy, circularity and other ambiguities. The presented examples are rather arbitrarily chosen
illustrative cases of ambiguous terms and proposed clarifications; an attempt is made to lay down
the initial formulations thus opening the floor for further documented improvements.
Theoretical concepts are presented in references [9] and [20]; for convenience, some key definitions
are cited in full extent.
Definition (excerpt cited from [20])
Minimum Intent: The following definition of a term 'definition' is presented as a reference, (a metric,
a comparator, a norm) that must not be violated when defining scientific and engineering terms.
Axioms:
1) ‘Something’ is a term that has a most general meaning, it can mean anything (but it does not
automatically include ‘everything’).
2) 'Ambient' is everything in the vicinity of, and, to a certain degree, within something.
3) ‘Event’ is something that can be distinguished from ambient.
4) ‘Relation’ is something involving, at least, two events.
5) ‘System’ is constituted by at least two relations; this implies that a system also includes, at least,
two events.
6) ‘Phenomenon’ is a generic term (hypernym) for the above terms, providing that one or several
human senses indicate (directly or indirectly) the existence of the so termed system, relation, event,
ambient, or something else.
7) All other terms used within this theorem - apart from the term “definition” and the terms listed in
inverted commas under 1) to 6) above - are already intrinsically known; understanding of each of
these terms does not contradict any other term, and it does not violate logic. Note: most of these
terms will be defined once the definition of the term ‘definition’ is agreed upon; some proposed
definitions are given in this discourse.
Theorem:
"Definition" is a fixed, static form (a model; a concept; an appearance of something as distinguished
from the substance of which it is made; something autonomous from its own representation, imprint,
or description) of some relation(s) that significantly increases the probability of realisation of an
intended (premeditated) change of some phenomenon (or phenomena). Such a change is to be
achieved by an entity that is capable of utilising this definition for such a specified purpose. A
definition cannot be generated or used without the existence of a system which is organised and
structured above a certain level of chaos. However, once it is generated and recorded, a definition can
continue to exist (to be recorded) without the existence of the mentioned entity. A definition should
be complemented with a minimum intent statement: a context that delimits a minimum domain of
purposes for which it can be used. This statement does not exclude the possibility of using the same
definition correctly for some other purpose. However, this extended use must not violate (contradict)
already established meaning; e.g. this must not cause synonymy or homonymy.
A definition must be complemented with axioms, with one or more examples, and, when needed and
possible, with figures and animated representations.
Definitions are necessary bits needed to construct and communicate the subject of knowledge. A
definition is built by means of its structural components: pieces of information. Information is built
by virtue of its construction elements (signals of various kinds); the most frequently used include
figures and terms. Terms include symbols, numbers and words, and although they can be transferred
by means of figures, they can also be transferred by means of sounds which are registered by
hearing senses. It is worth noting that information media can be mutually translated, i.e. visual info
can be translated into information received by tactile or hearing senses. History of media used to
record an information and a definition shows a variety of options. Alphabetic writing (in which
consonant and vowel sounds are presented by letters or other symbols such as Braille characters and
Morse codes) is the most widespread system, but it is not the earliest, nor is it the only one. Writing
has evolved from an extension of pictures that iconically represented some thing or action and then
the word that bore that meaning. This approach led to so-called character script, such as that of
Chinese, in which each word is represented by a separate symbol.
There is no reason for restricting definition to alphanumeric records only; indeed, the figures
(including drawings) are very efficient in carrying comprehensive information. Many sciences have
accepted ideograms to convey sophisticated notions. For example, in mathematics, symbols π, ∞,
‰, ∫, ≥, represent erudite concepts. Optimal solution is a combination of text and figures (an
animation and sound may be added, when necessary).
Aspects of information metrics were discussed by C E Shannon [1] who furthered the principles of
information theory and endowed the word information with a measure, so-called info-entropy:
H = − Σ p(i) · log p(i), summed over i = 1, …, n ..................................... (1)
where:
H = info-entropy (the expected value of self-information),
p(i) = probability of understanding i-th interpretation of the presented term
n = number of possible interpretations of the presented term.
"The choice of a logarithmic base corresponds to the choice of a unit for information measure. If the
base 2 is used the resulting units may be called binary digits, or more briefly bits. A device with two
stable positions, such as a relay or a flip-flop circuit, can store one bit of information. N such
devices can store N bits ..." [1]. Using Napier's logarithm in Eq (1) appears more logical; however
transformation from base e = 2.7183... to base 2 is a simple matter of introducing an appropriate
constant ln2 = 0.6931.
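Eq (1) and the base conversion can be sketched numerically; the helper below is an illustrative sketch (the function name is ours, not taken from reference [1]):

```python
import math

def info_entropy(probs, base=2):
    """Expected self-information: H = -sum of p(i) * log_base(p(i))."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Four equally likely interpretations of a term: H = log2(4) = 2 bits.
print(info_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0

# The same entropy computed with Napier's logarithm (units: nats),
# converted back to bits by dividing by ln 2 = 0.6931...
h_nats = info_entropy([0.25] * 4, base=math.e)
print(h_nats / math.log(2))  # 2.0
```

A term with a single possible interpretation gives H = 0: the term carries no residual uncertainty.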
A ‘measure’ is a phenomenon used to enable a comparison of (the groups of) other phenomena.
‘Comparison’ is a definition indicating whether (or to what degree) one phenomenon differs from
other phenomena. When comparison indicates that phenomena are sufficiently identical, phenomena
can be counted using numbers. A ‘number’ is a generic measure.
Examples of definitions:
Readily available sources (e.g. dictionaries) define the term “figure” in various ways: (a) a number
symbol, (b) numeral, (c) digit, (d) a geometric form (e.g. a line, triangle, or sphere) especially when
considered as a set of geometric elements (e.g. points) in space of a given number of dimensions, (e)
a diagram or pictorial illustration of textual matter, (f) a short coherent group of notes (sounds) that
may constitute a part of a melody.[8,9]
The first two above definitions, (a) and (b), can themselves be taken as synonymic. The terms
"figure" and "numeral" are synonyms, because both are defined in the same way as follows: "figure"
("numeral") is a conventional symbol (a figure or character) used to represent a number. The
definitions given in (c), (d), (e) and (f) above, have different meanings. Thus the term "figure",
attributed to each of these four cases appears to be a homonym.
By ignoring presence of any noise and assuming 4 equally likely homonyms, according to Eq (1)
information entropy of term ‘figure’ is calculated to be equal to 2 bits.
(Minimum intent:) The following definitions are presented to provide examples of how synonyms,
homonyms and other ambiguities can be avoided:
“Figure”: (n) an arrangement of points made within two-dimensional space to present a visual static
impression (a perception) of something (e.g., a figure printed on a book page, showing a front view
of a home).
Still assuming the absence of any noise, after eliminating one of the four homonymic ‘figure’ terms, its information entropy drops by about 20% (falls to 1.585 bits). If we eliminate all homonyms, information entropy drops to zero, and the term “figure” becomes self-explanatory, in the absence of other noise.
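These figures follow from Eq (1) with equally likely interpretations, where H reduces to log2(n); a quick numerical check (a sketch, assuming no other noise):

```python
import math

def equiprobable_entropy(n):
    """Info-entropy, in bits, of a term with n equally likely interpretations."""
    return math.log2(n)

print(equiprobable_entropy(4))  # 2.0 bits: four homonymic senses of 'figure'
print(equiprobable_entropy(3))  # ~1.585 bits after eliminating one homonym
print(equiprobable_entropy(1))  # 0.0 bits: a fully disambiguated term
```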
The following definitions are presented for the sake of making the term 'figure' more distinguishable
from some of its homonyms:
“Numeral” is any of the elements that can be combined to form numbers in a number system (e.g.,
decimal system, binary system, hexadecimal system, etc.). “Digit” is a figure representing a
numeral; examples: "0", "1", "A". "Number" is a most general measure, systematically ordered and
analysed within mathematics. "Cypher" is a figure representing a number; examples: "10001001",
"10,001.001".
The important capacity of terms - to represent complex information, definitions and even complete theories - can be seriously hindered by ambiguity, homonymy and synonymy. Thus, for example, by
saying that a system is adiabatic, a number of physiochemical relations is ascribed to this system,
assuming that the receiver of this information knows the meaning of term "adiabatic system", i.e.
assuming availability of a disambiguated definition of this term.
The tendency of professional institutions focused on a particular field (e.g. information technology) to
use certain lexical phrases more frequently should not lead to usurpation of a single term, segregated
and disconnected from the original phrase. Local use of an abbreviation is a better solution in such a
case. In distinguished cases, introducing a new term is an appropriate solution; in such events it is
recommended to consult academe before offering such a new term to public scrutiny.
'Knowledge' is a system of 'disciplines' and their relations. A 'discipline' is a system of 'theories',
boundary 'hypotheses' and their relations. 'Theory' is a system of 'definitions'. 'Hypothesis' is a
system of assumptions that may, or may not become definitions. If proven to be fallacies, the
relevant hypotheses should be rejected from the parent discipline, and replaced by other boundary
hypotheses. The rationale for introducing boundary hypotheses at the edge of verified knowledge can be elucidated by analogy with the justification for recording an additional ‘uncertain’ digit when reporting measurements by means of significant digits. Any boundary becomes more vague the closer we approach it. Inasmuch as language, with its essentially mathematical crust, is the elite medium of knowledge, at its best it will reflect these shades of vague
boundaries. However, we must act in the direction of overcoming, not artificially enlarging these
ambiguities.
Reference database
The scientific community and broader society are well aware of problems caused by inconsistencies
in defining the key elements in the English language – words [4-19]. Accordingly, a number of projects
have been launched aiming at contributing to the disambiguation of English terms; examples include
The American Heritage Book of English Usage, WordNet and UNESCO Thesaurus.
The American Heritage Book of English Usage presents current problems in English usage to enable
an informed selection of terms. It suggests answers to questions such as: Has a particular usage
been criticized for substantial reason in the past? What are the linguistic and social issues involved?
Have people frequently applied this usage in the past? This source employs The Usage Panel and
Usage Ballots to collate opinions of the American Heritage Usage Panel (158 members), which has
been in existence since 1964. While the ballots are not scientific surveys in that they are not
conducted under controlled circumstances with stringent questioning criteria, they are nonetheless
carefully worded to get useful responses. The examples discussed by ballots are sentences adapted
from actual citations, presenting a number of cases, giving a specific usage in a variety of different
linguistic environments. Many words have a number of meanings, and experience has shown that
the panel’s opinions about a usage can vary considerably. [11]
In most ballots the panelists are asked whether they find a particular word or construction to be
acceptable or not in formal standard English. In reality, many shades of acceptability do exist. What
one panelist approves enthusiastically, another may accept only cautiously. A compromise has been
made, deciding that it is not practical to differentiate degrees of approval or disapproval. For certain
controversial usages a question allows for the option of indicating acceptability in informal contexts
and for an indication of preferences or for providing alternative ways of saying something. [11]
The fact that a word has a lengthy history of use by many provides a compelling argument for its
continued use today. But sometimes historical precedent clashes with contemporary attitudes. In
these cases, both sides of the controversy are presented, and the historical precedent given priority
even if it contradicts the judgments of the majority of panelists. On the other hand, some
expressions have become so stigmatized that even the history may not save them from provoking a
negative response in a good portion of readers. In these cases, a warning is provided about the
consequences of using a stigmatized usage. [11]
With ballot responses going back to the 1960s, the issue of historical perspective needs to be
addressed. The book offers results from surveys done in 1987 and later, while the results of an
earlier survey are presented whenever it is felt they 'can help in adjudicating an issue'. [11]
Taking into account a “non-scientific” survey may seem odd; however, rationales for maintaining
strong links with “common” English language are manifold:
- Efforts to improve scientific terminology would be hampered without analysing language
heritage, including its fallacies;
- Knowledge dissemination requires presenting its theories and hypotheses using the language
of common sense;
- Our existence without knowledge would be impossible; without the arts it would become
distorted.
WordNet is an online ('electronic') lexical reference system (database) [12]. WordNet was
developed, starting in 1985, by the Cognitive Science Laboratory at Princeton University under
the direction of Professor G. A. Miller. This online lexical reference system is designed on the basis
of current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and
adverbs are organized into synonym sets (synsets), each representing one underlying lexical
concept. Synsets are used as basic units of semantic meaning and linked to a large collection of
semantic relations including hyponymy and antonymy. In other words, WordNet organizes lexical
information in terms of word meanings, rather than word forms. The purpose is manifold: word
sense identification, information retrieval, selectional preferences of verbs, and lexical chains.
WordNet can be used to produce a combination of dictionary and thesaurus that is more intuitively
usable, and to support automatic text analysis and artificial intelligence applications, such as
machine translation.
An extensive bibliography [17] listing research publications that refer to the WordNet lexical
database adds evidence to the significance of the related problems. For example, WordNet is used in
numerous research projects aiming at text classification by means of artificial intelligence, based on the
hierarchy of hypernyms, hyponyms, holonyms, meronyms and sister terms. [18, 19]
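The hypernym hierarchy that such classifiers exploit can be illustrated with a plain mapping; the mini-hierarchy below is an invented toy example loosely echoing this paper's own definitions, not WordNet's actual data:

```python
# Toy hypernym map: each term points to its direct hypernym (a more general term).
HYPERNYM = {
    "digit": "figure",
    "figure": "symbol",
    "symbol": "signal",
    "signal": "phenomenon",
}

def hypernym_chain(term):
    """Walk from a term up to its most general recorded hypernym."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

print(hypernym_chain("digit"))
# ['digit', 'figure', 'symbol', 'signal', 'phenomenon']
```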
UNESCO (United Nations Educational Scientific and Cultural Organization) Thesaurus: The
UNESCO Thesaurus is a controlled and structured list of terms used in subject analysis of
publications in the fields of education, culture, natural sciences, social and human sciences,
communication and information. It covers the major fields of knowledge that constitute the scope of
UNESCO. It is continuously updated (it contains 7,000 terms in English, 8,600 terms in French and
6,800 in Spanish); its multidisciplinary ‘terminology’ reflects the evolution of the UNESCO
activities. According to its own definition, 'thesaurus is a controlled and dynamic documentary
language containing semantically and generically related terms, which comprehensively covers a
specific domain of knowledge'. 'Knowledge is information that is presented within a particular
context, yielding insight on application in that context, by members of a community.' This source
further defines 'information' as 'data that has been organized in such a way that it achieves meaning,
in a generalized way'. These definitions are both ambiguous and incomplete. In addition they are
based on undefined terms: ‘meaning’, ‘organized’ and ‘data’. [13, 14]
The UNESCO Thesaurus allows subject terms to be expressed consistently, with increasing
specificity, and in relation to other subjects. It can be used to facilitate subject indexing in libraries,
archives and similar institutions. [13,14]
As in other subject thesauri, the terms in the UNESCO Thesaurus are linked together by three types
of relationships:
(i) Hierarchical relationships, which link terms to other terms expressing more general and
more specific concepts - i.e. broader terms and narrower terms. Hierarchically related terms
are grouped under general subdivisions (known as "microthesauri"), which in turn are
grouped into the areas of knowledge covered by the Thesaurus.
(ii) Associative relationships, which link terms to similar terms (related terms) where the
relationship between the terms is non-hierarchical.
(iii) Equivalence relationships, which link "non-preferred" terms to synonyms or quasi-
synonyms which act as "preferred" terms.
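Equivalence relationships of type (iii) amount to a lookup from non-preferred terms to preferred ones; a minimal sketch (the term pairs are illustrative, borrowed from this paper's later proposals rather than from the UNESCO Thesaurus itself):

```python
# "USE" references: non-preferred term -> preferred term.
# Preferred terms map to themselves so that every known term resolves.
PREFERRED = {
    "metallic materials": "metallics",
    "metallics": "metallics",
    "composite materials": "composites",
    "composites": "composites",
}

def resolve(term):
    """Return the preferred term for a (possibly non-preferred) entry."""
    return PREFERRED[term.lower()]

print(resolve("Metallic materials"))  # metallics
print(resolve("composites"))          # composites
```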
Main sections of the Thesaurus (e.g. Education, Science, Culture) link to the microthesaurus
headings in each section. Each microthesaurus heading links to an alphabetical list of the preferred
terms which are entered under that microthesaurus. Each term includes a link to a display of its
broader terms, narrower terms, related terms, scope notes, non-preferred terms, French equivalent
and Spanish equivalent.
The UNESCO Thesaurus also includes scope notes which explain the meaning and application of
terms, and French and Spanish equivalents of English preferred terms. [13,14]
There are a number of other thesauri under, and beyond, the umbrella of the United Nations system.
However, this reference database has not prevented the appearance of homonymy, synonymy and
ambiguity in educational and scientific publications.
Examples
Several remarks will be useful before presenting examples of synonymy, homonymy and ambiguity.
We address the 'open class words' (also called 'lexical words') within one language only: the
scientific English. Non-coincidental homonymy or synonymy is not discussed in this treatise since it
is assumed that context and syntax attribute a sufficiently clear meaning to this class of words. The
recommended definitions are presented to provide examples of disambiguation; in each case, the
axioms are identical to those presented in section Definition above.
The presented examples, comments and suggestions are provided for the purpose of illustrating the
problems and sketching the possible solutions. Decisions about eliminating homonyms and adopting
definitions can only be made by a broad multidisciplinary academic consortium. A number of
examples are also provided in sources [8, 9, 16 and 20].
i) Terms ‘Terminology’ and ‘Term’
Source WordNet defines ‘terminology’ as a system of words used to name things in a particular
discipline; examples: "legal terminology", "biological terminology". The term ‘nomenclature’ is listed
as a synonym. Its direct hypernym is the term ‘word’ [21].
Sources [13,14] recognise the term ‘terminology’ as a descriptor and suggest its application in phrases such as ‘communication terminology’, ‘scientific terminology’ and ‘educational terminology’; they also list usage such as ‘technical terminology’. The same sources define the term ‘glossary’ as ‘A vocabulary, not necessarily in alphabetic order, with definitions or explanations for all terms.’
It is suggested herewith that the term ‘terminology’ be applied in analogy to the terms ‘technology’ (science of techniques), ‘biology’ (science of the biosphere), ‘anthropology’ (science of humankind), ‘psychology’ (science of the psyche), etc.; hence: ‘Terminology’ (n) is the science of terms.
WordNet suggests 8 meanings for the term ‘term’. The most differing options are:
a) term (n) is a (special class of) words used for some particular thing, e.g. "he learned many
medical terms"
b) 'a limited period of time'
The following definition of the word ‘term’ is proposed hereby:
‘Term’ (n) is a word that denotes something. It is instructive to elaborate on the difference between
the term ‘term’ and its hypernym, ‘word’. The term ‘word’ denotes all grammatical variations of
nouns, verbs, adjectives, adverbs, pronouns, conjunctions, prepositions and interjections. ‘Term’ is
the lexical model, a concise one-word representation of an event, relation, phenomenon, system,
discipline, theory, or something else. Examples of terms: ‘material’, ‘probability’, ‘element’.
Example: "'The term preschooler signals another change in our expectations of children. While
toddler refers to physical development, preschooler refers to a social and intellectual activity: going
to school.' Attribution: Lawrence Kutner (20th century), U.S. child psychologist and author.
Toddlers and Preschoolers, introduction (1994). " [15]
ii) Term “Material”
The WordNet [12] provides 5 options. The most differing meanings include:
a) Material (n) is information (data or ideas or observations) that can be used or reworked into a
finished form; "the archives provided rich material for a definitive biography".
b) Material (n) is the tangible substance that goes into the makeup of a physical object; "coal is a
hard black material"; "wheat is the material they use to make bread".
UNESCO Thesaurus [13,14] provides the following descriptors (suggested as preferred terms):
‘Audiovisual materials’, ‘Building materials’, ‘Composite materials’, ‘Dangerous materials’,
‘Reference materials’, ‘Materials engineering’, ‘Machine readable materials’, ‘Bookform materials’.
The same source recommends the following descriptor: ‘International circulation of materials’ with
a scope note: 'Use only in relation to agreements that aim to facilitate the international exchange of
materials intended for educational, scientific and cultural purposes.' [13,14]
It appears that “material” is a generic term, sometimes subordinated to hypernyms “matter” and
“substance”. It is recommended herewith to cease this subordination, and introduce the following
generic meaning: Material (n) is a physiochemical phenomenon that is detected by various sensors as
solid, liquid or gas. In most cases each of these states can be reversibly transformed into other states
under appropriate combination of temperature and pressure, e.g. water can be cooled to solidify as
ice. The smallest fractions of material are atoms and ions, the largest are celestial objects (such as
Earth or Mars). Beyond these limits, terms such as “subatomic particles”, “electromagnetic field”,
“gamma radiation”, “solar system”, “galaxy” etc are used, none of them to be referred to as
“material”. This is not to say that there are no subatomic particles present within material; also,
current evidence suggests that material exists within the galaxies and other phenomena across the
universe. The quantity of material is measured in moles.
Phrases such as “teaching materials” should be understood as “materials prepared for educational
purposes”. For example, graphite powder stored in special beakers for teaching demonstrations in
laboratories, liquid nitrogen kept in special cryogenic storage vessels (dewars), can be termed
“teaching materials”. On the contrary, items such as textbooks, lecture notes or tutorial sheets
generally are not to be termed “teaching materials”, if their content – discourse and treatise – is
discussed, rather than addressing the material itself, such as paper or compact disks.
Phrases such as “metallic materials” and “composite materials” have synonyms. “Metallic
materials” are in numerous publications termed “metals”, which violates the meaning of this term
established in chemistry. “Composite materials” are frequently termed “composites”, which is in
good concord with terminology that uses expression “ceramics” (for “ceramic materials”) and
“polymers” (for “polymeric materials”). In terms of applications, the overwhelming majority of
cases relates to solid forms of each of the discussed categories. Thus, it is proposed herewith to
introduce a new term “metallics”, which conforms well with terms “ceramics”, “polymers”,
“composites” and “solids”. It is recommended herewith to abandon lengthy homonyms such as
“metallic materials”, “composite materials”, “polymeric materials” and “ceramic materials”.
Metallics (n) are materials characterized by dominance of one or more metallic attributes, which is
usually the consequence of metals (i.e. metallic bond) occupying the largest fraction of the solid
structure; for example steels, bronze, gold, all belong to metallics.
Another example of the appropriate use of term “material”: "'The asphaltum contains an exactly
requisite amount of sulphides for production of rubber tires. This brown material also contains
ichthyol...’; Attribution: State of Utah, U.S. public relief program (1935-1943). Utah: A Guide to the
State (The WPA Guide to Utah), p.124, in »Mining«, Hastings House (1941) - Of a material found
near the Great Salt Lake.” [15]
iii) Term “Probability”
UNESCO Thesaurus [13,14] recommends use of the term “Probability Theory”; it is unclear whether
“Probability Theory” is recommended instead of the term “Probability”.
The WordNet [12] provides the following definition: “Probability (n) is a measure of how likely it is
that some event will occur; a number expressing the ratio of favorable cases to the whole number of
cases possible) ‘the probability that an unbiased coin will fall with the head up is 0.5’”. Use of
the phrase “how likely” means explaining one term by simply introducing another term of the same
category (running in the loop, so-called circularity).
Therefore, the following definition of “probability” is proposed hereby: ‘Probability’ (n) is a measure that can be used, in the absence of other measures, to define whether or not an event has happened, will happen, or is happening. For example, in the case of equally probable events, probability can be quantified by means of the ratio 0 ≤ a/b ≤ 1,
where a = number of counted events for which the probability is to be established;
b = a + c (total number of events that can be counted in the observed ambient);
c = number of remaining optional events (equally probable) in the observed ambient.
The probability of an impossible event is 0, the probability of a certain (inevitable) event is 1.
Example: ‘The probability that an unbiased die will fall with a face showing two (spots) is 1/6”, see
Fig 1. Note: Theory of mathematical statistics provides comprehensive treatises defining probability.
Fig 1: A die: the number of spots
on each side varies from 1 to 6.
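The ratio defined above can be sketched as a small computation. This is an illustrative sketch only; the function name and the use of exact fractions are assumptions made here, not part of the proposed definition:

```python
from fractions import Fraction

def probability(a, c):
    """Probability of an event among equally probable outcomes.

    a = number of counted events whose probability is to be established
    c = number of remaining (equally probable) events in the observed ambient
    b = a + c = total number of events that can be counted
    """
    b = a + c
    return Fraction(a, b)

# An unbiased die: one favourable face, five remaining faces.
p = probability(1, 5)          # Fraction(1, 6)

# An impossible event has probability 0; a certain event has probability 1.
impossible = probability(0, 6)  # Fraction(0, 1)
certain = probability(6, 0)     # Fraction(1, 1)
```

Exact fractions are used so that the ratio a/b is represented without rounding, in keeping with the die example above.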
iv) Term ‘Element’
UNESCO Thesaurus [13, 14] distinguishes four descriptors:
- chemical element
- elementary particles
- structural elements (buildings)
- trace elements
and two additional usage phrases:
- elementary schools
- elementary education
So one certainly would not recommend usage such as ‘Chemical elements such as Uranium, can
damage structural elements, due to long-term emission of elementary particles, even when present as
trace elements’, especially not in elementary education.
The WordNet [12] provides 7 options. The most differing meanings include:
a) ‘Element’ (n) is any of the more than 100 known substances (of which 92 occur naturally) that
cannot be separated into simpler substances and that singly or in combination constitute all matter
b) ‘Element’ (n) is a component, constituent, an artifact that is one of the individual parts of which a
composite entity is made up; especially a part that can be separated from or attached to a system,
e.g. "spare element for cars".
The WordNet use cited under (a) above is promoted herewith as the preferable use of this term. The
use presented under (b) can be substituted by the terms ‘component’ or ‘constituent’. UNESCO
descriptors such as ‘chemical element’ and ‘structural element’ are too lengthy, while the phrase
‘elementary education’ is too ambiguous.
v) Term ‘Bit’
UN/CEFACT (United Nations Centre for Trade Facilitation and Electronic Business)
TRADE/CEFACT/2005/24 Recommendation No. 20 - Units of Measure used in International Trade:
Common Code [14a] defines ‘bit’ as “a unit of information equal to one binary digit.” Another UN-
published source [14b] defines a ‘bit’ as “a binary digit that can assume a value of 0
or 1.” Both definitions concur with the definition promoted in the scientific discipline of information science.
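The “unit of information” sense of ‘bit’ can be made concrete via Shannon’s measure [1]: the entropy of a fair coin toss is exactly one bit. The function below is an illustrative sketch, not part of any cited definition:

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), measured in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin toss carries exactly one bit of information.
coin = entropy_bits([0.5, 0.5])      # 1.0 bit

# Four equally likely outcomes carry two bits.
die_quarter = entropy_bits([0.25] * 4)  # 2.0 bits

# A certain event carries no information.
certain = entropy_bits([1.0])        # 0.0 bits
```

This ties the ‘bit’ back to the probability measure discussed above: one binary digit suffices to record the outcome of one equiprobable two-way event.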
The WordNet [12] provides 10 options. Apart from the above, the most differing meanings include:
a) Bit (n) is the cutting part of a drill; usually pointed and threaded and is replaceable in a brace or
bitstock or drill press; for example: "he looked around for the right size bit";
b) Bit (n) is an indefinitely short time; "wait just a moment"; "it only takes a minute"; "in just a bit".
This is a typical example of fashionable disciplines in demand usurping terms, thus ignoring the
cultural heritage and increasing language ambiguity. It is proposed herewith that the
appropriate nomenclature be decided by means of an informed and educated consensus.
Conclusions
In this Knowledge Age, enhanced by the means of artificial intelligence, both communication speed and
misinformation waste multiply at critical rates. Particularly obstructive is the increase in information
entropy due to the accumulation of homonyms and synonyms, combined with other causes of ambiguity.
Universities appear to be the institutions that carry the responsibility for initiating projects aimed at
the disambiguation of scientific English.
Artificial intelligence is invaluable in endeavours aimed at the disambiguation of scientific and
engineering terminology, but human intelligence lays down the superior criteria.
In addition, the momentous efforts expended in approaching the problems of language prolixity,
ambiguity and translation by means of artificial intelligence may be significantly reduced by
eliminating amassed homonyms and synonyms and by introducing more transparent definitions
of key terms by virtue of human intelligence.
Promoting a transparent, cross-disciplinary scientific and engineering terminology by means of
establishing a cross-disciplinary academic consortium would make a significant contribution to
disseminating and broadening the stock of knowledge. This coordinated effort must take into account
the lexical heritage by means of intelligent and common-sense consideration of the historically
established use of the English language.
Authors, editors and publishers would have a competent source of lexical references, and readers
would find such a lexis a useful guide in their search for knowledge.
Reference Publications
[1] Shannon C E ‘A Mathematical Theory of Communication’ The Bell System Technical Journal,
Vol. 27, p. 379, (July 1948); http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html
(accessed on 13 October 2005)
[2] The European University Association http://www.eua.be/eua/en/about_eua.jspx
(accessed on 13 October 2005)
[3] "Information on the Bologna Process" by Admissions Officers' and Credential Evaluators'
professional section of the EAIE - European Association for International Education
http://www.aic.lv/ace/ace_disk/Bologna/index.htm (accessed on 13 October 2005)
[4] McCarty S "Cultural, Disciplinary and Temporal Contexts of e-Learning and English as a
Foreign Language", eLearn MAGAZINE published by ACM - Association for Computing
Machinery; http://www.elearnmag.org/subpage.cfm?section=research&article=4-1 (accessed on 23
Sept 2005)
[5] Downey G L, Lucena J C, Moskal B, Bigley T, Hays C, Jesiek B, Kelly L, Lehr J, Miller J, and
Nichols-Belo A "Engineering Cultures: Expanding the Engineering Method for Global Problem
Solvers" (Editors: D Radcliffe and J Humphries) Proceedings 4th ASEE/AaeE Global Colloquium
on Engineering Education, Sydney, 26-29 September 2005
[6] Miller G A "Ambiguous Words" (Originally published March 2001 at Impacts Magazine)
http://www.kurzweilai.net/meme/frame.html?main=/articles/art0186.html
(accessed on 13 October 2005)
[7] Lohmann J R and Wepfer W J "Preparing and Sustaining Engineers for Global Practice,"
presented at 9th World Conference on Continuing Engineering Education, Tokyo, Japan, 2004
[8] Spuzic S, Abhary K, Stevens C, Fabris N, Rice J, Nouwens F “Contribution to Crossdisciplinary
Lexicon” Proceedings (ed. D Radcliffe and J Humphries) 4th ASEE/AaeE Global Colloquium on
Engineering Education, 26th – 29th September 2005
[9] Spuzic S and Nouwens F "A Contribution to Defining the Term ‘Definition’", Issues in
Informing Science and Information Technology Education, Volume 1 (2004) p. 645
[10] Thorne S "Mastering Advanced English Language" 1997 Palgrave Master Series, Macmillan
Press, London
[11] The American Heritage® Book of English Usage - A Practical and Authoritative Guide to
Contemporary English; http://www.bartleby.com/ and http://www.bartleby.com/64/
(accessed on 13 October 2005)
[12] "WordNet lexical database for the English language" developed by Cognitive Science
Laboratory at Princeton University, under direction of G A Miller; http://wordnet.princeton.edu/
(accessed on 13 October 2005)
[13] The UNESCO Thesaurus, http://databases.unesco.org/thesaurus/ (accessed on 13 October
2005)
[14a] “UN glossaries UN interpreters’ resource page… “ http://un-interpreters.org/glossaries.html
& http://databases.unesco.org/thesaurus/other.html (accessed on 13 October 2005)
[14b] “Handbook on geographic information systems and digital mapping” Department of
Economic and Social Affairs, Statistics Division, Studies and Methods, UN Publications, New York,
2000
[15] “The Columbia World of Quotations” Columbia University Press, 1996
[16] Spuzic S "An Initiative in Improving Knowledge Transfer in Engineering Education",
Proceedings from the 2nd Asia-Pacific Forum on Engineering and Technology Education, The
University of Sydney, 4-7 July 1999, edited by Z Pudlowski, page 41
[17] "WordNet bibliography"; J Rosenzweig & R Mihalcea (Last update: September 11, 2004)
http://engr.smu.edu/~rada/wnb/ (accessed on 21 October 2005)
[18] O'Hara T and Wiebe J “Classifying functional relations in Factotum via WordNet hypernym
associations'' In: Proceedings of the 4th Intl. Conference on Intelligent Text Processing and
Computational Linguistics (CICLing-2003) , Mexico City, 2003
http://www.cs.nmsu.edu/~tomohara/factotum-roles/factotum-roles.html
(accessed on 21 October 2005)
[19] Matwin S, Scott S, ”Text Classification Using WordNet Hypernyms”, Computer Science
Dept., University of Ottawa, 1998 http://acl.ldc.upenn.edu/W/W98/W98-0706.pdf
(accessed on 21 October 2005)
[20] Spuzic S, Abhary K, Stevens C “A Contribution to Lexis Disambiguation” (work in progress)
to be published in 2005