Lexicon Disambiguation1



The evolution of knowledge has imposed branching into disciplines that use terms understood “correctly” only by experts. Globalisation, however, favours cross-disciplinary and transparent communication. Universities appear to be the institutions that carry the responsibility for initiating projects aimed at the disambiguation of scientific English.

Published in: Education, Business

Contribution to Knowledge Management: Cross-disciplinary Terminology

PEMT ’06 transcript

Sead Spužić a), Kazem Abhary a), Clement Stevens b) and Faik Uzunović c)
a) University of South Australia; b) KFUPM University; c) University of Zenica

Key words: Disambiguation, definition, homonymy, informatics, knowledge, lexicon, management, multidisciplinary, terminology, thesaurus, synonymy

Abstract: The evolution of knowledge has imposed branching into disciplines that use terms understood “correctly” only by experts. Globalisation favours cross-disciplinary and transparent communication. However, these trends have uncovered impediments such as prolixity, ambiguity and jargon. The Internet enables communication at the speed of light, thus exposing other limits to knowledge transfer, such as misinformation and misunderstanding. Knowledge is transferred by the interaction of language with other models (e.g. figures) and by demonstrations. Transparency of terminology is critical for knowledge management. This treatise presents an axiomatic definition of the term 'definition' itself, in order to enable a rational analysis of lexical elements - the words. Examples of confusing terms (homonyms, synonyms) are discussed. A trend towards an interdisciplinary and transparent lexicon is proposed, adopting a hierarchy of terms that gives priority to the fundamental disciplines (mathematics, physics). A scientific lexicon should attribute to each definition a unique set of words. The need to expunge ambiguity from scientific and technical language is urgent, and the comprehensive review and “cleaning” of scientific terms is a task that demands the gathering together of appropriate institutions. The key to the disambiguation of scientific language lies in defining quantifiable criteria.

Intro

In this eternal and infinite ambient, our fate depends on our knowledge. Although Man is the most developed amongst the phenomena we know, this does not guarantee our survival.
Rather than indulging in this relative perception, it is reasonable to assume that our ambient - the Universe - may soon bring new challenges that stretch beyond our current capabilities. A rational strategy is to speed up both the development and the transport of our knowledge. This can be enhanced by sharing and disseminating knowledge to adequate human resources, not to mention broader possibilities. Academe and other structures concerned with broadening and disseminating knowledge are indeed conscious of the need for globalization and disambiguation of this invaluable treasure. The reference publications [1-20] used in this transcript are only a very limited and arbitrary selection from the numerous sources that give evidence of this awareness, for example through initiatives such as the European University Association [2], the Bologna Process [3] or the UNESCO Thesaurus [13,14].
One of the principal media for the transport of knowledge is language; its basic elements are the words (terms). Lexicon (alphabetically arranged lists of words setting forth their meanings and etymology) is a direct hyponym of the term ‘knowledge’ [12]. The category of so-called "closed class words" has a fixed, limited number of words which themselves have a permanent, final form and meaning; new words are rarely added. The members include pronouns, prepositions, determiners and conjunctions [10]. In the case of so-called "open class words", containing nouns, verbs, adjectives and adverbs, new words can be added as they become necessary [10].

However, much too frequently, new words are added although old words providing a satisfactory meaning already exist. This causes synonymy (e.g. "open class words" are also called "lexical words", while "closed class words" are also termed “structural”, “function” or “grammatical” words). Or, vice versa, new meanings are attached to words used in another discipline to denote a different concept, thus causing homonymy (e.g. the word 'discipline' could mean a branch of knowledge: "in what discipline is his doctorate?"; "anthropology is the discipline focused on study of human beings"; but the same word could mean ‘punishment’ or at least ‘orderly prescribed conduct or pattern of behaviour’).

“There does not seem to be a consensus about what many of the basic terms mean, or which is the overarching concept, … under which other terms might be presumed to be subsets. …. (C)learly, the multiplicity of definitions for the same concepts, false synonyms and so forth show that the world of scholarship needs an approach to definitions of sufficient dimensionality.” [4]

“The recent globalisation trends show that, on all fronts - education, marketing, industry, science, social standard infrastructure, health - we need a common, well defined, language.
Workers in all disciplines are expected to function effectively in global trans-disciplinary communities.” [5]

“The comprehensive review and ‘cleaning’ of scientific terminology is certainly an immense task that demands the gathering together of competent institutions. The need to expunge scientific and technical language of ambiguity and prolixity is urgent and becoming increasingly so.” [8]

Rational analysis of lexical elements - the words - and their relations requires at least an axiomatic definition of the term 'definition' itself. Hence an initial definition is proposed in the following section. In addition, a number of examples of ambiguous terms are presented along with an attempt to propose their disambiguation. In doing so, an effort is made to avoid homonymy, synonymy, circularity and other ambiguities. The presented examples are rather arbitrarily chosen illustrative cases of ambiguous terms and proposed clarifications; an attempt is made to lay down the initial formulations, thus opening the floor for further documented improvements. Theoretical concepts are presented in references [9] and [20]; for convenience, some key definitions are cited in full.

Definition (excerpt cited from [20])

Minimum Intent: The following definition of the term 'definition' is presented as a reference (a metric, a comparator, a norm) that must not be violated when defining scientific and engineering terms.

Axioms:
1) ‘Something’ is a term that has a most general meaning; it can mean anything (but it does not automatically include ‘everything’).
2) 'Ambient' is everything in the vicinity of, and, to a certain degree, within something.
3) ‘Event’ is something that can be distinguished from ambient.
4) ‘Relation’ is something involving, at least, two events.
5) ‘System’ is constituted by at least two relations; this implies that a system also includes, at least, two events.
6) ‘Phenomenon’ is a generic term (hypernym) for the above terms, provided that one or several of the human senses indicate (directly or indirectly) the existence of the so termed system, relation, event, ambient, or something else.
7) All other terms used within this theorem - apart from the term “definition” and the terms listed in inverted commas under 1) to 6) above - are already intrinsically known; the understanding of each of these terms does not contradict any other term, and it does not violate logic.

Note: most of these terms will be defined once the definition of the term ‘definition’ is agreed upon; some proposed definitions are given in this discourse.

Theorem: "Definition" is a fixed, static form (a model; a concept; an appearance of something as distinguished from the substance of which it is made; something autonomous from its own representation, imprint, or description) of some relation(s) that significantly increases the probability of realisation of an intended (premeditated) change of some phenomenon (or phenomena). Such a change is to be achieved by an entity that is capable of utilising this definition for such a specified purpose. A definition cannot be generated or used without the existence of a system that is organised and structured above a certain level of chaos. However, once it is generated and recorded, a definition can continue to exist (to be recorded) without the existence of the mentioned entity.

A definition should be complemented with a minimum intent statement: a context that delimits a minimum domain of purposes for which it can be used. This statement does not exclude the possibility of using the same definition correctly for some other purpose. However, this extended use must not violate (contradict) an already established meaning; e.g. it must not cause synonymy or homonymy.
A definition must be complemented with axioms, with one or more examples, and, when needed and possible, with figures and animated representations. Definitions are the necessary bits needed to construct and communicate the subject of knowledge. A definition is built by means of its structural components: pieces of information. Information is built by virtue of its construction elements (signals of various kinds); the most frequently used include figures and terms. Terms include symbols, numbers and words, and although they can be transferred by means of figures, they can also be transferred by means of sounds, which are registered by the hearing senses. It is worth noting that information media can be mutually translated, i.e. visual information can be translated into information received by the tactile or hearing senses.

The history of the media used to record information and definitions shows a variety of options. Alphabetic writing (in which consonant and vowel sounds are represented by letters or other symbols such as Braille characters and Morse codes) is the most widespread system, but it is not the earliest, nor is it the only one. Writing has evolved from an extension of pictures that iconically represented some thing or action and then the word that bore that meaning. This approach led to so-called character script, such as that of Chinese, in which each word is represented by a separate symbol.

There is no reason for restricting definitions to alphanumeric records only; indeed, figures (including drawings) are very efficient in carrying comprehensive information. Many sciences have accepted ideograms to convey sophisticated notions. For example, in mathematics, the symbols π, ∞, ‰, ∫, ≥ represent erudite concepts. The optimal solution is a combination of text and figures (animation and sound may be added when necessary).
Aspects of information metrics were discussed by C E Shannon [1], who furthered the principles of information theory and endowed the word information with a measure, so-called info-entropy:

$$H = -\sum_{i=1}^{n} p(i)\,\log_2 p(i) \qquad (1)$$

where
H = info-entropy (the expected value of self-information),
p(i) = probability of understanding the i-th interpretation of the presented term,
n = number of possible interpretations of the presented term.

"The choice of a logarithmic base corresponds to the choice of a unit for information measure. If the base 2 is used the resulting units may be called binary digits, or more briefly bits. A device with two stable positions, such as a relay or a flip-flop circuit, can store one bit of information. N such devices can store N bits ..." [1]. Using Napier's logarithm in Eq (1) appears more logical; however, the transformation from base e = 2.7183... to base 2 is a simple matter of introducing an appropriate constant, ln 2 = 0.6931.

A ‘measure’ is a phenomenon used to enable a comparison of (the groups of) other phenomena. ‘Comparison’ is a definition indicating whether (or to what degree) one phenomenon differs from other phenomena. When comparison indicates that phenomena are sufficiently identical, the phenomena can be counted using numbers. A ‘number’ is a generic measure.

Examples of definitions: Readily available sources (e.g. dictionaries) define the term “figure” in various ways: (a) a number symbol, (b) numeral, (c) digit, (d) a geometric form (e.g. a line, triangle, or sphere), especially when considered as a set of geometric elements (e.g. points) in space of a given number of dimensions, (e) a diagram or pictorial illustration of textual matter, (f) a short coherent group of notes (sounds) that may constitute a part of a melody [8,9]. The first two definitions above, (a) and (b), can themselves be taken as synonymic.
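As a hedged illustration of Eq (1), the following minimal Python sketch evaluates the info-entropy of a term that has n equally likely interpretations, for instance n = 4, 3 and 1. The names `info_entropy` and `uniform` are illustrative assumptions and do not come from the cited sources; noise is ignored and base-2 logarithms are assumed, so the results are expressed in bits.

```python
import math


def info_entropy(probabilities):
    """Info-entropy H of Eq (1) in bits: H = -sum(p(i) * log2(p(i)))."""
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Interpretations with p(i) = 0 contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def uniform(n):
    """Assumption for this sketch: n equally likely interpretations."""
    return [1.0 / n] * n


h4 = info_entropy(uniform(4))   # a term with four equally likely meanings
h3 = info_entropy(uniform(3))   # one meaning eliminated
h1 = info_entropy(uniform(1))   # a fully disambiguated term
drop = (h4 - h3) / h4           # relative reduction in entropy

print(f"H(4) = {h4:.3f} bits")      # 2.000 bits
print(f"H(3) = {h3:.3f} bits")      # 1.585 bits
print(f"H(1) = {h1:.3f} bits")      # 0.000 bits
print(f"reduction = {drop:.1%}")    # 20.8%
```

Under these assumptions, a term with a single remaining interpretation carries zero info-entropy, and each eliminated interpretation lowers the value. For unequal probabilities, the same function can be called with any list of p(i) that sums to 1.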
The terms "figure" and "numeral" are synonyms, because both are defined in the same way: a "figure" ("numeral") is a conventional symbol (a figure or character) used to represent a number. The definitions given in (c), (d), (e) and (f) above have different meanings. Thus the term "figure", attributed to each of these four cases, appears to be a homonym. Ignoring the presence of any noise and assuming 4 equally likely homonyms, the information entropy of the term ‘figure’ according to Eq (1) is 2 bits.

(Minimum intent:) The following definitions are presented as examples of how synonyms, homonyms and other ambiguities can be avoided:

“Figure”: (n) an arrangement of points made within two-dimensional space to present a visual static impression (a perception) of something (e.g., a figure printed on a book page, showing a front view of a home).

Still assuming the absence of any noise, after eliminating one of the four homonymic ‘figure’ terms, its information entropy drops by about 21% (to 1.585 bits). If we eliminate all homonyms, the information entropy drops to zero, and the term “figure” becomes self-explanatory in the absence of other noise.

The following definitions are presented for the sake of making the term 'figure' more distinguishable from some of its homonyms:
“Numeral” is any of the elements that can be combined to form numbers in a number system (e.g., decimal system, binary system, hexadecimal system, etc.).
“Digit” is a figure representing a numeral; examples: "0", "1", "A".
"Number" is a most general measure, systematically ordered and analysed within mathematics.
"Cypher" is a figure representing a number; examples: "10001001", "10,001.001".

An important capacity of terms - to represent complex information, a definition or even a complete theory - can be seriously hindered by ambiguity, homonymy and synonymy. For example, by saying that a system is adiabatic, a number of physicochemical relations are ascribed to this system, assuming that the receiver of this information knows the meaning of the term "adiabatic system", i.e. assuming the availability of a disambiguated definition of this term.

The tendency of professional institutions focused on a particular field (e.g. information technology) to use certain lexical phrases more frequently should not lead to the usurpation of a single term, segregated and disconnected from the original phrase. Local use of an abbreviation is a better solution in such a case. In distinguished cases, introducing a new term is an appropriate solution; in such events it is recommended to consult academe before offering the new term to public scrutiny.

'Knowledge' is a system of 'disciplines' and their relations. A 'discipline' is a system of 'theories', boundary 'hypotheses' and their relations. A 'theory' is a system of 'definitions'. A 'hypothesis' is a system of assumptions that may, or may not, become definitions. If proven to be fallacies, the relevant hypotheses should be rejected from the parent discipline and replaced by other boundary hypotheses.
The rationale for introducing boundary hypotheses at the edge of verified knowledge can be elucidated by analogy with the justification for recording an additional ‘uncertain’ digit when measurements are recorded by means of significant digits. Any boundary becomes more vague the closer we approach it. Inasmuch as language, with its essentially mathematical crust, is an elite medium of knowledge, at its best it will reflect these shades of vague boundaries. However, we must act in the direction of overcoming, not artificially enlarging, these ambiguities.

Reference database

The scientific community and broader society are well aware of the problems caused by inconsistencies in defining the key elements of the English language – words [4-19]. Accordingly, a number of projects have been launched with the aim of contributing to the disambiguation of English terms; examples include The American Heritage Book of English Usage, WordNet and the UNESCO Thesaurus.

The American Heritage Book of English Usage presents current problems in English usage to enable an informed selection of terms. It suggests answers to questions such as: Has a particular usage been criticized for substantial reason in the past? What are the linguistic and social issues involved? Have people frequently applied this usage in the past? This source employs the Usage Panel and Usage Ballots to collate the opinions of the American Heritage Usage Panel (158 members), which has been in existence since 1964. While the ballots are not scientific surveys, in that they are not conducted under controlled circumstances with stringent questioning criteria, they are nonetheless carefully worded to get useful responses. The examples discussed by ballots are sentences adapted from actual citations, presenting a number of cases and giving a specific usage in a variety of different linguistic environments. Many words have a number of meanings, and experience has shown that the panel’s opinions about a usage can vary considerably. [11]
In most ballots the panelists are asked whether they find a particular word or construction to be acceptable or not in formal standard English. In reality, many shades of acceptability exist. What one panelist approves enthusiastically, another may accept only cautiously. A compromise has been made, deciding that it is not practical to differentiate degrees of approval or disapproval. For certain controversial usages a question allows for the option of indicating acceptability in informal contexts, for an indication of preferences, or for providing alternative ways of saying something. [11]

The fact that a word has a lengthy history of use by many provides a compelling argument for its continued use today. But sometimes historical precedent clashes with contemporary attitudes. In these cases, both sides of the controversy are presented, and the historical precedent is given priority even if it contradicts the judgments of the majority of panelists. On the other hand, some expressions have become so stigmatized that even their history may not save them from provoking a negative response in a good portion of readers. In these cases, a warning is provided about the consequences of using a stigmatized usage. [11]

With ballot responses going back to the 1960s, the issue of historical perspective needs to be addressed. The book offers results from surveys done in 1987 and later, while the results of an earlier survey are presented whenever it is felt that this can 'help in adjudicating an issue'. [11]

Taking into account a “non-scientific” survey may seem odd; however, the rationales for maintaining strong links with the “common” English language are manifold:
- Efforts to improve scientific terminology would be hampered without analysing the language heritage, including its fallacies;
- Knowledge dissemination requires presenting its theories and hypotheses using the language of common sense;
- Our existence without knowledge would be impossible; without the arts it would become distorted.

WordNet is an online ('electronic') lexical reference system (database) [12]. WordNet has been developed since 1985 by the Cognitive Science Laboratory at Princeton University under the direction of Professor G. A. Miller. This online lexical reference system is designed on the basis of current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets (synsets), each representing one underlying lexical concept. Synsets are used as basic units of semantic meaning and are linked by a large collection of semantic relations, including hyponymy and antonymy. In other words, WordNet organizes lexical information in terms of word meanings rather than word forms. The purpose is manifold: word sense identification, information retrieval, selectional preferences of verbs, and lexical chains. WordNet can be used to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications, such as machine translation. An extensive bibliography [17], listing research publications that refer to the WordNet lexical database, adds evidence of the significance of the related problems. For example, WordNet is used in numerous studies aiming at text classification by means of artificial intelligence, based on the hierarchy of hypernyms, hyponyms, holonyms, meronyms and sister terms [18, 19].

UNESCO (United Nations Educational, Scientific and Cultural Organization) Thesaurus: The UNESCO Thesaurus is a controlled and structured list of terms used in subject analysis of publications in the fields of education, culture, natural sciences, social and human sciences, communication and information. It covers the major fields of knowledge that constitute the scope of UNESCO. It is continuously updated (it contains 7,000 terms in English, 8,600 terms in French and 6,800 in Spanish); its multidisciplinary ‘terminology’ reflects the evolution of the UNESCO activities.

According to its own definition, a 'thesaurus is a controlled and dynamic documentary language containing semantically and generically related terms, which comprehensively covers a specific domain of knowledge'. 'Knowledge is information that is presented within a particular context, yielding insight on application in that context, by members of a community.' This source further defines 'information' as 'data that has been organized in such a way that it achieves meaning, in a generalized way'. These definitions are both ambiguous and incomplete. In addition, they are based on undefined terms: ‘meaning’, ‘organized’ and ‘data’. [13, 14]

The UNESCO Thesaurus allows subject terms to be expressed consistently, with increasing specificity, and in relation to other subjects. It can be used to facilitate subject indexing in libraries, archives and similar institutions. [13,14] As in other subject thesauri, the terms in the UNESCO Thesaurus are linked together by three types of relationships:
(i) Hierarchical relationships, which link terms to other terms expressing more general and more specific concepts - i.e. broader terms and narrower terms. Hierarchically related terms are grouped under general subdivisions (known as "microthesauri"), which in turn are grouped into the areas of knowledge covered by the Thesaurus.
(ii) Associative relationships, which link terms to similar terms (related terms) where the relationship between the terms is non-hierarchical.
(iii) Equivalence relationships, which link "non-preferred" terms to synonyms or quasi-synonyms which act as "preferred" terms.

The main sections of the Thesaurus (e.g. Education, Science, Culture) link to the microthesaurus headings in each section. Each microthesaurus heading links to an alphabetical list of the preferred terms which are entered under that microthesaurus. Each term includes a link to a display of its broader terms, narrower terms, related terms, scope notes, non-preferred terms, French equivalent and Spanish equivalent. The UNESCO Thesaurus also includes scope notes, which explain the meaning and application of terms, and French and Spanish equivalents of English preferred terms. [13,14]

There are a number of other thesauri under, and beyond, the umbrella of the United Nations system. However, these reference databases have not prevented the appearance of homonymy, synonymy and ambiguity in educational and scientific publications.

Examples

Several remarks will be useful before presenting examples of synonymy, homonymy and ambiguity. We address the 'open class words' (also called 'lexical words') within one language only: scientific English. Non-coincidental homonymy or synonymy is not discussed in this treatise, since it is assumed that context and syntax attribute a sufficiently clear meaning to this class of words. The recommended definitions are presented to provide examples of disambiguation; in each case, the axioms are identical to those presented in the section Definition above.
The presented examples, comments and suggestions are provided for the purpose of illustrating the problems and sketching possible solutions. Decisions about eliminating homonyms and adopting definitions can only be made by a broad multidisciplinary academic consortium. A number of examples are also provided in sources [8, 9, 16 and 20].

i) Terms ‘Terminology’ and ‘Term’

WordNet defines ‘terminology’ as a system of words used to name things in a particular discipline; examples: "legal terminology"; "biological terminology". The term ‘nomenclature’ is listed as a synonym. The direct hypernym is the term ‘word’ [21]. Sources [13,14] recognise the term ‘terminology’ as a descriptor and suggest its application in phrases such as ‘communication terminology’, ‘scientific terminology’, ‘educational terminology’; they list usage such as ‘technical terminology’. The same sources define the term ‘glossary’ as ‘A vocabulary not necessarily in alphabetic order, with definitions or explanations for all terms.’

It is suggested herewith that the term ‘terminology’ be applied in analogy to the terms ‘technology’ (science of techniques), ‘biology’ (science of biosphere), ‘anthropology’ (science of humankind), ‘psychology’ (science of the psyche), etc.; hence:

‘Terminology’ (n) is the science of terms.

WordNet suggests 8 meanings for the term ‘term’. The most differing options are:
a) term (n) is a (special class of) words used for some particular thing, e.g. "he learned many medical terms"
b) 'a limited period of time'

The following definition of the word ‘term’ is proposed hereby:

‘Term’ (n) is a word that denotes something.

It is instructive to elaborate on the difference between the term ‘term’ and its hypernym, ‘word’. The term ‘word’ denotes all grammatical variations of nouns, verbs, adjectives, adverbs, pronouns, conjunctions, prepositions and interjections. ‘Term’ is the lexical model, a concise one-word representation of an event, relation, phenomenon, system, discipline, theory, or something else.
Examples of terms: ‘material’, ‘probability’, ‘element’.

Example: "The term preschooler signals another change in our expectations of children. While toddler refers to physical development, preschooler refers to a social and intellectual activity: going to school." Attribution: Lawrence Kutner (20th century), U.S. child psychologist and author. Toddlers and Preschoolers, introduction (1994). [15]

ii) Term “Material”

WordNet [12] provides 5 options. The most differing meanings include:
a) Material (n) is information (data or ideas or observations) that can be used or reworked into a finished form; "the archives provided rich material for a definitive biography".
b) Material (n) is the tangible substance that goes into the makeup of a physical object; "coal is a hard black material"; "wheat is the material they use to make bread".

The UNESCO Thesaurus [13,14] provides the following descriptors (suggested as preferred terms): ‘Audiovisual materials’, ‘Building materials’, ‘Composite materials’, ‘Dangerous materials’, ‘Reference materials’, ‘Materials engineering’, ‘Machine readable materials’, ‘Bookform materials’.
The same source recommends the following descriptor: ‘International circulation of materials’, with a scope note: 'Use only in relation to agreements that aim to facilitate the international exchange of materials intended for educational, scientific and cultural purposes.' [13,14]

It appears that “material” is a generic term, sometimes subordinated to the hypernyms “matter” and “substance”. It is recommended herewith to cease this subordination and introduce the following generic meaning:

Material (n) is a physicochemical phenomenon that is detected by various sensors as solid, liquid or gas. In most cases each of these states can be reversibly transformed into the other states under an appropriate combination of temperature and pressure, e.g. water can be cooled to solidify as ice. The smallest fractions of material are atoms and ions; the largest are celestial objects (such as Earth or Mars). Beyond these limits, terms such as “subatomic particles”, “electromagnetic field”, “gamma radiation”, “solar system”, “galaxy” etc. are used, none of them to be referred to as “material”. This is not to say that there are no subatomic particles present within material; also, current evidence suggests that material exists within the galaxies and other phenomena across the universe. Quantity of material is measured in moles.

Phrases such as “teaching materials” should be understood as “materials prepared for educational purposes”. For example, graphite powder stored in special beakers for teaching demonstrations in laboratories, or liquid nitrogen kept in special cryogenic storage vessels (dewars), can be termed “teaching materials”. On the contrary, items such as textbooks, lecture notes or tutorial sheets generally are not to be termed “teaching materials” if their content – discourse and treatise – is discussed, rather than the material itself, such as paper or compact disks.

Phrases such as “metallic materials” and “composite materials” have synonyms.
“Metallic materials” are in numerous publications termed “metals”, which violates the meaning of this term established in chemistry. “Composite materials” are frequently termed “composites”, which is in good concord with terminology that uses expression “ceramics” (for “ceramic materials”) and “polymers” (for “polymeric materials”). In terms of applications, the overwhelming majority of cases is related to solid forms of each of discussed categories. Thus, it is proposed herewith, to introduce a new term “metallics”, which conforms well with terms “ceramics”, “polymers”, “composites” and “solids”. It is recommended herewith abandoning lengthy homonyms such as “metallic materials”, “composite materials”, “polymeric materials” and “ceramic materials”. Metallics (n) are materials characterized by dominance of one or more metallic attributes, which is usually the consequence of metals (i.e. metallic bond) occupying the largest fraction of the solid structure; for example steels, bronze, gold, all belong to metallics. Another example of the appropriate use of term “material”: "'The asphaltum contains an exactly requisite amount of sulphides for production of rubber tires. This brown material also contains ichthyol...’; Attribution: State of Utah, U.S. public relief program (1935-1943). Utah: A Guide to the State (The WPA Guide to Utah), p.124, in »Mining«, Hastings House (1941) - Of a material found near the Great Salt Lake.” [15] iii) Term “Probability” UNESCO Thesaurus [13,14] recommends use of term “Probability Theory”; it is unclear whether “Probability Theory” is recommended instead using term “Probability”. The WordNet [12] provides the following definition: “Probability (n) is a measure of how likely it is that some event will occur; a number expressing the ratio of favorable cases to the whole number of cases possible) ‘the probability that an unbiased coin will fall with the head up is 0.5’”. Use of 9
  10. 10. phrase “how likely” means explaining one term by simply introducing another term of the same category (running in a loop, so-called circularity). Therefore, the following definition of “probability” is proposed hereby: ‘Probability’ (n) is a measure that can be used, in the absence of other measures, to define whether or not an event has happened, is happening or will happen. For example, in the case of equally probable events, probability can be quantified by means of the ratio 0 ≤ a/b ≤ 1, where a = number of counted events for which the probability is to be established; b = a + c (total number of events that can be counted in the observed ambient); c = number of remaining optional events (equally probable) in the observed ambient. The probability of an impossible event is 0; the probability of a certain (inevitable) event is 1. Example: ‘The probability that an unbiased die will fall with a face showing two (spots) is 1/6’, see Fig 1. Note: The theory of mathematical statistics provides comprehensive treatises defining probability. Fig 1: A die: the number of spots on each side varies from 1 to 6. iv) Term ‘Element’ The UNESCO Thesaurus [13, 14] distinguishes four descriptors: ‘chemical element’, ‘elementary particles’, ‘structural elements (buildings)’ and ‘trace elements’; and two additional usage phrases: ‘elementary schools’ and ‘elementary education’. So one certainly would not recommend usage such as ‘Chemical elements such as uranium can damage structural elements, due to long-term emission of elementary particles, even when present as trace elements’, especially not in elementary education. The WordNet [12] provides 7 options. 
The most differing meanings include: a) ‘Element’ (n) is any of the more than 100 known substances (of which 92 occur naturally) that cannot be separated into simpler substances and that singly or in combination constitute all matter; b) ‘Element’ (n) is a component, constituent, an artifact that is one of the individual parts of which a composite entity is made up; especially a part that can be separated from or attached to a system, e.g. "spare element for cars". The WordNet use cited under (a) above is promoted herewith as the preferable use of this term. The use presented under (b) above can be substituted by the terms ‘component’ or ‘constituent’. UNESCO descriptors such as ‘chemical element’ and ‘structural element’ are too lengthy, while the phrase ‘elementary education’ is too ambiguous.
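The ratio a/b used in the definition of probability proposed under item iii) can be checked with a few lines of Python. This is only an illustrative sketch: the function name `probability` and its argument names are assumptions introduced here, and the sketch covers only the case of equally probable events described in the text.

```python
from fractions import Fraction


def probability(a: int, c: int) -> Fraction:
    """Return the ratio a/b, with b = a + c, for equally probable events.

    a: number of counted events for which the probability is sought
    c: number of remaining (equally probable) optional events
    """
    if a < 0 or c < 0:
        raise ValueError("event counts must be non-negative")
    b = a + c  # total number of events that can be counted
    if b == 0:
        raise ValueError("at least one event must be possible")
    return Fraction(a, b)


# Unbiased die: one face shows two spots, the five other faces do not.
p_two = probability(a=1, c=5)

# Impossible event (no face shows seven spots) and certain event (any face 1 to 6).
p_impossible = probability(a=0, c=6)
p_certain = probability(a=6, c=0)

print(p_two, p_impossible, p_certain)  # prints: 1/6 0 1
```

Exact fractions are used instead of floating-point numbers so that the result 1/6 appears in the same form as in the die example.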
  11. 11. v) Term ‘Bit’ UN/ECE (United Nations Centre for Trade Facilitation and Electronic Business) TRADE/CEFACT/2005/24 Recommendation No. 20 - Units of Measure used in International Trade Common Code in [14a] defines ‘bit’ as “a unit of information equal to one binary digit.” Another UN-published source [14b] defines a ‘bit’ as “a binary digit that can assume a value of 0 or 1.” Both definitions concur with the definition promoted in the scientific discipline of Information Science. The WordNet [12] provides 10 options. Apart from the above, the most differing meanings include: a) Bit (n) is the cutting part of a drill; usually pointed and threaded and is replaceable in a brace or bitstock or drill press; for example: "he looked around for the right size bit"; b) Bit (n) is an indefinitely short time; "wait just a moment"; "it only takes a minute"; "in just a bit". This is a typical example of newly fashionable, in-demand disciplines usurping existing terms, thereby ignoring the cultural heritage and increasing the ambiguity of language. It is proposed herewith to decide on the appropriate nomenclature by means of an informed and educated consensus. Conclusions In this Knowledge Age, enhanced by the means of artificial intelligence, both communication speed and misinformation waste multiply at critical rates. Particularly obstructive is the increase in information entropy due to the accumulation of homonyms and synonyms combined with other causes of ambiguity. Universities appear to be the institutions that carry the responsibility for initiating projects aiming at the disambiguation of scientific English. Artificial intelligence is invaluable in endeavors aiming at the disambiguation of scientific and engineering terminology, but human intelligence lays down superior criteria. 
In addition, the momentous effort spent on approaching the problems of language prolixity, ambiguity and translation by means of artificial intelligence may be significantly reduced by eliminating amassed homonyms and synonyms and by introducing more transparent definitions of key terms by virtue of human intelligence. Promoting a transparent, cross-disciplinary scientific and engineering terminology by establishing a cross-disciplinary academic consortium would be a significant contribution to the dissemination and broadening of the stock of knowledge. This coordinated effort must take into account lexical heritage by means of intelligent and common-sense consideration of the historically established use of the English language. Authors, editors and publishers would have a competent source of lexical references, and readers would find such a lexis a useful guide in their search for knowledge.
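As a rough illustration of how amassed homonyms and synonyms could be located before a human panel decides what to keep, the sketch below audits a small lexicon. The lexicon is a hypothetical toy example: its entries paraphrase senses discussed in this paper and are not taken from WordNet or the UNESCO Thesaurus, and the names `ENTRIES` and `audit` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy lexicon: (term, sense) pairs paraphrasing examples from the text.
ENTRIES = [
    ("bit", "binary digit"),
    ("bit", "cutting part of a drill"),
    ("bit", "indefinitely short time"),
    ("element", "substance that cannot be separated into simpler substances"),
    ("element", "individual part of a composite entity"),
    ("component", "individual part of a composite entity"),
    ("metallics", "materials dominated by metallic attributes"),
    ("metallic materials", "materials dominated by metallic attributes"),
]


def audit(entries):
    """Return (homonyms, synonyms) found in a list of (term, sense) pairs.

    homonyms: terms that carry more than one sense
    synonyms: senses that are expressed by more than one term
    """
    senses_by_term = defaultdict(set)
    terms_by_sense = defaultdict(set)
    for term, sense in entries:
        senses_by_term[term].add(sense)
        terms_by_sense[sense].add(term)
    homonyms = {t: sorted(s) for t, s in senses_by_term.items() if len(s) > 1}
    synonyms = {s: sorted(t) for s, t in terms_by_sense.items() if len(t) > 1}
    return homonyms, synonyms


homonyms, synonyms = audit(ENTRIES)
for term, senses in homonyms.items():
    print(f"homonym: {term!r} has {len(senses)} senses")
for sense, terms in synonyms.items():
    print(f"synonyms for {sense!r}: {terms}")
```

Such an audit only flags candidates; choosing which term or sense should survive remains a matter for the informed consensus proposed above.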
  12. 12. Reference Publications [1] Shannon C E ‘A Mathematical Theory of Communication’ The Bell System Technical Journal, Vol. 27, p. 379, (July 1948); http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html (accessed on 13 October 2005) [2] The European University Association http://www.eua.be/eua/en/about_eua.jspx (accessed on 13 October 2005) [3] "Information on the Bologna Process" by Admissions Officers' and Credential Evaluators' professional section of the EAIE - European Association for International Education http://www.aic.lv/ace/ace_disk/Bologna/index.htm (accessed on 13 October 2005) [4] McCarty S "Cultural, Disciplinary and Temporal Contexts of e-Learning and English as a Foreign Language", eLearn MAGAZINE published by ACM - Association for Computing Machinery; http://www.elearnmag.org/subpage.cfm?section=research&article=4-1 (accessed on 23 Sept 2005) [5] Downey G L, Lucena J C, Moskal B, Bigley T, Hays C, Jesiek B, Kelly L, Lehr J, Miller J, and Nichols-Belo A "Engineering Cultures: Expanding the Engineering Method for Global Problem Solvers" (Editors: D Radcliffe and J Humphries) Proceedings 4th ASEE/AaeE Global Colloquium on Engineering Education, Sydney, 26-29 September 2005 [6] Miller G A "Ambiguous Words" (Originally published March 2001 at Impacts Magazine) http://www.kurzweilai.net/meme/frame.html?main=/articles/art0186.html (accessed on 13 October 2005) [7] Lohmann J R and Wepfer W J "Preparing and Sustaining Engineers for Global Practice," presented at 9th World Conference on Continuing Engineering Education, Tokyo, Japan, 2004 [8] Spuzic S, Abhary K, Stevens C, Fabris N, Rice J, Nouwens F “Contribution to Crossdisciplinary Lexicon” Proceedings (ed. D Radcliffe and J Humphries) 4th ASEE/AaeE Global Colloquium on Engineering Education, 26th – 29th September 2005 [9] Spuzic S and Nouwens F "A Contribution to Defining the Term ‘Definition’", Issues in Informing Science and Information Technology Education, Volume 1 (2004) p. 
645 [10] Thorne S "Mastering Advanced English Language", Palgrave Master Series, Macmillan Press, London, 1997 [11] The American Heritage® Book of English Usage - A Practical and Authoritative Guide to Contemporary English; http://www.bartleby.com/ and http://www.bartleby.com/64/ (accessed on 13 October 2005) [12] "WordNet lexical database for the English language" developed by Cognitive Science Laboratory at Princeton University, under direction of G A Miller; http://wordnet.princeton.edu/ (accessed on 13 October 2005)
  13. 13. [13] The UNESCO Thesaurus, http://databases.unesco.org/thesaurus/ (accessed on 13 October 2005) [14a] “UN glossaries UN interpreters’ resource page…” http://un-interpreters.org/glossaries.html & http://databases.unesco.org/thesaurus/other.html (accessed on 13 October 2005) [14b] “Handbook on geographic information systems and digital mapping” Department of Economic and Social Affairs, Statistics Division, Studies and Methods, UN Publications, New York, 2000 [15] “The Columbia World of Quotations” Columbia University Press, 1996 [16] Spuzic S "An Initiative in Improving Knowledge Transfer in Engineering Education", Proceedings from the 2nd Asia-Pacific Forum on Engineering and Technology Education, The University of Sydney, 4-7 July 1999, edited by Z Pudlowski, page 41 [17] "WordNet bibliography"; J Rosenzweig & R Mihalcea (Last update: September 11, 2004) http://engr.smu.edu/~rada/wnb/ (accessed on 21 October 2005) [18] O'Hara T and Wiebe J “Classifying functional relations in Factotum via WordNet hypernym associations” In: Proceedings of the 4th Intl. Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2003), Mexico City, 2003 http://www.cs.nmsu.edu/~tomohara/factotum-roles/factotum-roles.html (accessed on 21 October 2005) [19] Matwin S, Scott S, “Text Classification Using WordNet Hypernyms”, Computer Science Dept., University of Ottawa, 1998 http://acl.ldc.upenn.edu/W/W98/W98-0706.pdf (accessed on 21 October 2005) [20] Spuzic S, Abhary K, Stevens C “A Contribution to Lexis Disambiguation” (work in progress) to be published in 2005