Using ontology for natural language processing
                      Crăcăoanu Constantin Sergiu
                              January 21, 2012


                                    Abstract
         Natural language processing is a set of methods and techniques used
     to mediate human-machine communication. To make this possible we have to
     define a communication format and software able to analyse, understand
     and give an appropriate response. At the communication level a formal
     representation of knowledge is needed, and this can be provided by an
     ontology.

Keywords: natural language processing, ontology, artificial intelligence




1     Introduction
An ontology is a formal model for representing knowledge and is based on a
conceptualization: the objects, concepts and other entities that are assumed
to exist in a knowledge area, together with the relations that hold among them.

Depending on its purpose, context, coverage and the way it is used, an
ontology can be general, middle-level or specific.

Natural language processing is considered a sub-field of artificial
intelligence. Its main goal is to make systems smart enough to draw inferences
and respond with a correct and complete answer when requested by a user. Using
ontologies in natural language processing is a relatively new direction in
artificial intelligence.

An artificial neural network is a computational model inspired by biological
neural networks. It is able to learn and is used to solve problems that need an
answer based on the previous experience of the system.


2     Technical
The NLP view described in this article uses a combination of general and
specific ontologies. There are two basic methods of creating an ontology: from
scratch or by reusing existing ontologies. There are at least three ways of
combining ontologies: inclusion, restriction and refinement.

Our approach has three parts:

    • a general ontology based on lexemes. The Suggested Upper Merged
      Ontology (SUMO) is currently the best candidate because, together with
      its domain ontologies, it forms the largest formal public ontology in
      existence today and it is the only formal ontology that has been mapped
      to all of the WordNet lexicon. WordNet is a large lexical database of
      English. Nouns, verbs, adjectives and adverbs are grouped into sets of
      cognitive synonyms (synsets), each expressing a distinct concept.

    • a middle-level or specific ontology.

    • a program that mediates between human language and machine language by
      using the two types of ontologies.


3     Architecture
The next diagram shows the relation between human language, ontologies and
natural language processing.




Human knowledge is mapped to a middle-level or specific ontology. This
ontology uses the general ontology when needed. The NLP program chooses the
correct ontology for the current domain and applies the corresponding
algorithm. Then the knowledge is translated into machine language. It is widely
agreed today that NLP has not yet reached its goal of making machines
understand human language by drawing inferences, but for now we receive an
answer to our current request and sometimes computers seem smart.
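
The lookup order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Ontology` class, the `choose_ontology` function, the registry and all concept names are hypothetical and are not part of SUMO, WordNet or any other real resource.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ontology:
    """A toy ontology: a named mapping from terms to concept labels."""
    name: str
    concepts: dict[str, str] = field(default_factory=dict)
    parent: Optional[Ontology] = None  # more general ontology to fall back on

    def lookup(self, term: str) -> Optional[str]:
        """Resolve a term here first, then in more general ontologies."""
        layer: Optional[Ontology] = self
        while layer is not None:
            if term in layer.concepts:
                return f"{layer.name}:{layer.concepts[term]}"
            layer = layer.parent
        return None


def choose_ontology(domain: str, registry: dict[str, Ontology]) -> Ontology:
    """Pick the ontology registered for a domain (hypothetical registry)."""
    return registry[domain]


# Invented example data: one general ontology and one specific ontology.
general = Ontology("general", {"money": "CurrencyMeasure", "car": "Vehicle"})
banking = Ontology("banking", {"balance": "AccountBalance"}, parent=general)
registry = {"banking": banking}

ontology = choose_ontology("banking", registry)
print(ontology.lookup("balance"))  # resolved in the specific ontology
print(ontology.lookup("money"))    # falls back to the general ontology
print(ontology.lookup("weather"))  # unknown term, nothing is found
```

The `parent` link is one possible way of expressing that a specific ontology uses the general ontology only when it cannot answer on its own.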


4     How it can be used
This section describes two concrete cases of using ontologies with natural lan-
guage processing.

The first example is using ontologies for automatic translation from one
language to another (for example, from German to English). For this, five
ontologies can be used:

    • an ontology mapped to the lexicon of language A

    • an ontology with the grammar rules of language A

    • an ontology mapped to the lexicon of language B

    • an ontology with the grammar rules of language B

    • an ontology containing a dictionary that maps language A to language B

These ontologies are merged to form a new ontology. An algorithm that combines
and uses the resulting ontology is also needed. The program that implements
this algorithm takes a text in language A as input and produces the
corresponding text in language B as output.
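
As a rough illustration of this idea, the sketch below merges toy stand-ins for the ontologies listed above and translates word by word. All names (`merge`, `translate`) and all data are assumptions made for the example; a real system would need reordering, morphology and disambiguation, which are not modelled here.

```python
# Toy stand-ins for the merged ontologies; every entry is invented.
lexicon_a = {"der": "DET", "hund": "NOUN", "bellt": "VERB"}   # language A lexicon
grammar_a = {("DET", "NOUN", "VERB")}                          # allowed patterns in A
lexicon_b = {"the": "DET", "dog": "NOUN", "barks": "VERB"}    # language B lexicon
grammar_b = {("DET", "NOUN", "VERB")}                          # allowed patterns in B
dictionary_ab = {"der": "the", "hund": "dog", "bellt": "barks"}


def merge() -> dict:
    """Combine the separate resources into one structure (the 'new ontology')."""
    return {
        "lexicon_a": lexicon_a,
        "grammar_a": grammar_a,
        "lexicon_b": lexicon_b,
        "grammar_b": grammar_b,
        "dictionary": dictionary_ab,
    }


def translate(text: str, merged: dict) -> str:
    """Translate a sentence from language A to language B, word by word."""
    words = text.lower().split()
    # Check the input against the lexicon and grammar of language A.
    tags_a = tuple(merged["lexicon_a"][word] for word in words)
    if tags_a not in merged["grammar_a"]:
        raise ValueError("sentence does not match any known pattern of language A")
    # Map each word through the dictionary, keeping the word order unchanged.
    translated = [merged["dictionary"][word] for word in words]
    # Check the result against the lexicon and grammar of language B.
    tags_b = tuple(merged["lexicon_b"][word] for word in translated)
    if tags_b not in merged["grammar_b"]:
        raise ValueError("result does not match any known pattern of language B")
    return " ".join(translated)


merged = merge()
print(translate("Der Hund bellt", merged))
```

Unknown words raise a `KeyError` in this sketch, which marks the places where the ontology would have to be extended.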

The resulting ontology can be extended by using an artificial neural network:
after each training translation the system can learn new rules that must be
taken into account.

A second example of using an ontology is an automatic speech recognition and
generation system. Such a system can be used, for example, in the automotive
industry or in banking services. One ontology is needed to generate two
grammars, one for speech recognition and one for speech generation. The system
asks the user questions, gives or receives answers and executes commands.
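
A minimal sketch of deriving both grammars from a single ontology is given below. It works on text rather than audio, and the ontology content, the function names and the reply templates are all hypothetical; the source does not prescribe any particular format.

```python
# Hypothetical banking ontology: each command concept lists the phrases that
# trigger it and a template for the spoken reply.
ontology = {
    "CheckBalance": {
        "phrases": ["check my balance", "how much money do i have"],
        "reply": "Your balance is {amount}.",
    },
    "TransferMoney": {
        "phrases": ["transfer money", "send money"],
        "reply": "Transferred {amount}.",
    },
}


def recognition_grammar(onto: dict) -> dict[str, str]:
    """Grammar for recognition: maps each accepted phrase to its concept."""
    return {
        phrase: concept
        for concept, entry in onto.items()
        for phrase in entry["phrases"]
    }


def generation_grammar(onto: dict) -> dict[str, str]:
    """Grammar for generation: maps each concept to its reply template."""
    return {concept: entry["reply"] for concept, entry in onto.items()}


def handle(utterance: str, recog: dict[str, str], gen: dict[str, str], **slots) -> str:
    """Recognise an utterance and generate the matching reply."""
    concept = recog.get(utterance.lower().strip())
    if concept is None:
        return "Sorry, I did not understand."
    return gen[concept].format(**slots)


recog = recognition_grammar(ontology)
gen = generation_grammar(ontology)
print(handle("Check my balance", recog, gen, amount="100 EUR"))
```

Because both grammars are generated from the same source, adding a concept to the ontology extends recognition and generation at the same time.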


5     How to extend ontologies
To completely define our solution for mediating human-machine communication
we also need to define a way of extending ontologies. According to Huang et
al. [4], three elements must be determined to define the semantics of an
ontology concept: the concept's name, its properties and its relationships.

The proposed solution for extending ontologies is based on an artificial
neural network. Every ontology is represented by a directed graph G. Every
graph lies in a plane of its own: nodes are connected horizontally within the
same type of ontology (general, middle or specific), but the graphs are also
connected vertically, since a specific ontology is based on a middle ontology
and a middle ontology uses the general ontology.



Figure 1: there are three planes, one for the general ontology, one for the
               middle ontologies and one for the specific ontologies

Graph description:
$G = \{V_G, E_G\}$
$V_G$ is the set of nodes; every node has two views:

   • it represents a concept: it has a name

   • it is a perceptron: all inferences about this concept are represented by
     the formula $\sum_{i=1}^{n} x_i w_i c_k$, where $x_i$ is the value of the
     $i$-th input, $w_i$ is the weight of this input and $c_k$ is the context;
     $x_i, w_i \in [0, 1]$ and $c_k \in \{0, 1\}$; the learning rule is
     $w_i = w_i + (T - A)\, x_i$, where $T$ is the correct result that the
     neuron should have produced and $A$ is the actual output of the neuron;
     $c_k = 1$ only if there are at least $\Theta$ inferences that influence
     each other

$E_G$ is the set of edges; every edge represents a property or a relationship.

Properties and relationships are the equivalent of inferences, which are
grouped into subsets that influence each other.

So far we have defined an artificial neural network, but the main purpose is
to be able to extend the ontology, and this is done through training sessions.
After each session the knowledge grows, and training can stop when the trained
system is smart enough for a specified set of requirements.
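
The node definition and a training session can be sketched as follows. Several points are assumptions that the text does not fix: the actual output A is taken to be the raw weighted sum (no threshold function), weights are clamped to [0, 1] after each update, the context flag is computed from a simple count, and all names (`ConceptNode`, `context_flag`, the sample concepts) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


def clamp01(value: float) -> float:
    """Keep a weight inside the interval [0, 1]."""
    return max(0.0, min(1.0, value))


def context_flag(related_inferences: int, theta: int) -> int:
    """c_k = 1 only if at least theta inferences influence each other."""
    return 1 if related_inferences >= theta else 0


@dataclass
class ConceptNode:
    """A graph node that is both a named concept and a perceptron."""
    name: str
    weights: list[float]

    def output(self, inputs: list[float], context: int) -> float:
        """Weighted sum of the inputs, multiplied by the context flag c_k."""
        return context * sum(x * w for x, w in zip(inputs, self.weights))

    def train_step(self, inputs: list[float], context: int, target: float) -> float:
        """Apply w_i = w_i + (T - A) * x_i once and return the error T - A."""
        error = target - self.output(inputs, context)
        self.weights = [clamp01(w + error * x) for w, x in zip(self.weights, inputs)]
        return error


# Nodes are concepts; edges carry a property or relationship label.
nodes = {
    "Dog": ConceptNode("Dog", [0.2, 0.2]),
    "Animal": ConceptNode("Animal", [0.5]),
}
edges = [("Dog", "is_a", "Animal")]

# One short training session for the "Dog" node with invented inputs.
c_k = context_flag(related_inferences=3, theta=2)
sample_inputs = [1.0, 0.5]
for _ in range(10):
    nodes["Dog"].train_step(sample_inputs, c_k, target=0.5)
print(nodes["Dog"].output(sample_inputs, c_k))
```

In this sketch a training session stops after a fixed number of steps; a stopping rule tied to a set of requirements, as described above, would replace the fixed loop.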


6     Conclusions
Natural language processing can successfully use ontologies to mediate
human-machine communication. The final goal for this research domain is to
transform natural language processing into natural language understanding by
the machine.
A complete natural language understanding system must be able to:
    • Paraphrase an input text
    • Translate text into another language
    • Answer questions about the contents of the text
    • Draw inferences from the text
The first three objectives have been accomplished to a reasonable degree, but
the fourth remains only a concept that might become reality if NLP uses
ontologies for constructing inferences.


References
 [1] Dr. Elizabeth D. Liddy, Natural Language Processing, Encyclopedia of
     Library and Information Science, Second Edition, DOI:
     10.1081/E-ELIS-120008664
 [2] Dario Bianchi and Agostino Poggi, Ontology Based Automatic Speech
     Recognition and Generation for Human-Agent Interaction, University of
     Modena and Reggio Emilia, Italy, June 14-June 16, ISBN: 0-7695-2183-5
 [3] Ru-Yng Chang, Chu-Ren Huang, Feng-Ju Lo, Sueming Chang, From General
     Ontology to Specialized Ontology: A study based on a single author
     historical corpus
 [4] Jingshan Huang, Jiangbo Dang, Jose M. Vidal, Michael N. Huhns, Ontology
     Matching Using an Artificial Neural Network to Learn Weights
 [5] Teaching Activity of Sabin-Corneliu Buraga,
     http://profs.info.uaic.ro/~busaco/teach
 [6] http://research.microsoft.com
 [7] http://www.w3.org/TR/wordnet-rdf
 [8] http://www.informaworld.com
 [9] http://www.ontologyportal.org
[10] http://wordnet.princeton.edu
[11] http://www.computer.org/portal/web/csdl/doi/10.1109/ENABL.2004.47



