SlideShare a Scribd company logo
1 of 10
Download to read offline
AUTOMATIC DOMAIN SPECIFIC
  TERMINOLOGY EXTRACTION USING A
      DECISION SUPPORT SYSTEM

    Imran Sarwar Bajwa1 , Imran Siddique2 and M. Abbas Choudhary3
            1 Department of Computer Science and Information Technology
                   The Islamia University of Bahawalpur, Pakistan
                             Phone: +92 (62) 9255466
                          E-mail: imransbajwa@yahoo.com

                    2 Facuty of Computer an EmergingSciences
    3 Balochistan University of Information Technology and Management Sciences
                                   Quetta, Pakistan.
                      Phone: +92 (81) 9202463 Fax: +92 (81) 9201064
                            E-mail: imran@buitms.edu.pk
                            E-mail:abbas@buitms.edu.pk




Abstract:      Speech languages or Natural languages contents are major
               tools of communication. This research paper presents a
               natural language processing based automated system for
               understanding speech language text. A new rule based model
               is presented for analyzing the natural languages and
               extracting the relative meanings from the given text. User
               writes the natural language scenario in simple English in a
               few paragraphs and the designed system has an obvious
               capability of analyzing the given script by the user. After
               composite analysis and extraction of associated information,
               the designed system gives particular meanings to an
               assortment of speech language text on the basis of its
               context. The designed system uses standard speech language
               rules that are clearly defined for all speech languages as
English, Urdu, Chinese, Arabic, French, etc. The designed
              system provides a quick and reliable way to comprehend
              speech language context and generates respective meanings.
              The application with such abilities can be more intelligent
              and pertinent specifically for the user to save the time.
Keywords: Terminology Extraction, Automatic text understanding,
          Speech language processing, Information extraction

1. INTRODUCTION
In daily life, natural languages or speech languages are main source of
commutation. Particular and typical set of words and vocabulary
collections are used for each field of life and they are quiet exclusive to
their use. In computer science, Natural Language Processing (NLP) is the
field used for the analysis of speech language text [4]. NLP introduces
many new concepts and new terminologies to describe its models,
techniques and processes. Some of these terms come directly from
linguistics and the science of perception, while others were invented to
describe discoveries that did not fit into any previous category [3]. Natural
Language understanding is a hefty field to grasp. Understanding and
comprehending a speech language means to know that what concepts a
word or a particular phrase stands under a particular perspective. To give
meanings to a particular sentence a system should know how to link the
concepts together in a meaningful way. It's satirical that the speech
languages are easiest for human beings in terms of learning, understanding
and using, on the other hand hardest for a computer to understand.
In modern era, computer machines have attained the ability of solving
complex mathematical and statistical problems with speed and grace but
still they are inefficient to comprehend the basics of spoken and written
languages. The designed automated system can be useful in various
business and technical software by only acquiring the requirements from
the user. The designed system will extract the required information from
the given text and provide to the computer for further automated
processing. Applications can be automated generation of UML diagrams,
query processing, web mining, web template designing, user interface
designing, etc. As far as this software is concerns the time, it takes to
explore all the facilities and services, should is far less than a minute and
this information is adequately useful for the engineers and computer users.

2. RELATED WORK
Natural languages have been an area of interest from last one century. In
the lat nineteen sixties and seventies, so many researchers as Noam
Chomsky (1965) [9], Maron, M. E. and Kuhns, J. L (1960) [10], Chow, C.,
& Liu, C (1968) [11] contributed in the area of information retrieval from
natural languages. They contributed for analysis and understanding of the
natural languages, but still there was lot of effort required for better
understanding and analysis. Some authors concentrated in this area in
eighties and nineties as Losee, R. M (1998) [6], Salton, G., & McGill, M
(1995) [8], Maron, M. E. & Kuhns, J. (1997) [9]. These authors worked for
lexical ambiguity [5]and information retrieval [7], probabilistic indexing
[9], data bases handling [3] and so many other related areas.
In this research paper, a newly designed rule based framework has been
proposed that is able to read the English language text and extract its
meanings after analyzing and extracting related information. The conducted
research provides a robust solution to the addressed problem. The
functionality of the conducted research was domain specific but it can be
enhanced easily in the future according to the requirements. Current
designed system incorporate the capability of mapping user requirements
after reading the given requirements in plain text and drawing the set of
speech language contents.

3. SPEECH LANGUAGE ANALYSIS
The understanding and multi-aspect processing of the natural languages
that are also termed as "speech languages", is actually one of the arguments
of greater interest in the field artificial intelligence field [6]. The natural
languages are irregular and asymmetrical. Traditionally, natural languages
are based on un-formal grammars. There are the geographical,
psychological and sociological factors which influence the behaviours of
natural languages [17]. There are undefined set of words and they also
change and vary area to area and time to time.Due to these variations and
inconsistencies, the natural languages have different flavours as English
language has more than half dozen renowned flavours all over the world.
These flavours have different accents, set of vocabularies and phonological
aspects. These ominous and menacing discrepancies and inconsistencies in
natural languages make it a difficult task to process them as compared to
the formal languages [13].
In the process of analyzing and understanding the natural languages,
various problems are usually faced by the researchers. The problems
connected to the greater complexity of the natural language are verb’s
conjugation, verbal inflexion & paradigms, active & passive vice, lexical
amplitude, problem of ambiguity, etc. All these problems are significant to
handle but problem of ambiguity is difficult to handle. Ambiguity could be
easily solved at the syntax and semantic level by using a sound and robust
rule-based system. During the interpretation process of the English
language text, the text is distinguished in four classes to handle the
ambiguity as: lexical analysis, syntactic analysis, semantic analysis and
pragmatic analysis. Such distinction is not exhaustive but it is useful to
focus on the problem and for practical purposes [7].
3.1.   Lexical Analysis:
The problem of words written or pronounced not correctly is omitted in this
problem scenario. These kinds of errors are simply solvable through the
comparison with the expressions contained in a dictionary. Lexical
ambiguity is created when a same word assumes various meanings [3]. In
this case that ambiguity is generated from the fact that which meanings will
be incorporated in which scenario. As an example we consider the "cold"
adjective in the following sentences as "That room is cold." and "That
person is cold." It turns out obvious that the same "cold" adjective assumes,
in the two phrases different meanings. In the first sentence it indicate a
temperature, in the second one a particular character of a person.
3.2.   Syntactic Analysis:
Syntax analysis is performed on world level to recognize the word
category. The syntactic analysis of the programs would have to be in a
position to isolate subject, verbs, objects and various complements. It is
little complex procedure. For example, "Imran eats the apple." In this
example, the actual meanings are that “Mario eats apple” but ambiguity can
be asserted if this sentence is conceived as “the apple eats Mario”.
3.3.   Semantic Analysis:
 To analyze a phrase from the semantic point of view means to give it a
meaning. This should let you understand we arrived to a crucial point.
Semantic ambiguities are most common due to the fact that generally a
computer is not in a position to distinguish the logical situations as "The
bus hit the pole while it was moving." All of us would surely interpret the
phrase like "The car, while moving, hit the pole.", while nobody would be
dreamed to attribute to the sentence the meant "the car hit the pole while
the pole was moving ".
3.4.   Pragmatic Analysis:
Pragmatic ambiguities born when the communication happens between two
persons who do not share the same context. As following example as "I
will arrive to the airport at 8 o'clock". In this example, if the subject person
belongs to a different continent, the meanings can be totally changed.

4. DESIGNED SYSTEM ARCHITECTURE
The designed speech language contents system has ability to draw speech
language contents after reading the text scenario provided by the user. This
system draws in five modules: Text input acquisition, text understanding,
knowledge extraction, generation of speech language contents and finally
multi-lingual code generation as shown in following fig.
4.1.   Text input acquisition
This module helps to acquire input text scenario. User provides the
business scenario in from of paragraphs of the text. This module reads the
input text in the form characters and generates the words by concatenating
the input characters. This module is the implementation of the lexical
phase. Lexicons and tokens are generated in this module.
4.2.    Contents Interpretation
This module reads the input from module 1 in the form of words. These
words are categorized into various classes as verbs, helping verbs, nouns,
pronouns, adjectives, prepositions, conjunctions, etc.

    Text                        Required
                   Text                       Terms
  Document                       Terms

       Text Input Acquisition

 Understanding
     Text

       Contents Interpretation

   Analyzing
    Terms

        Knowledge Retrieval


   Matching
    Terms                                   Domain
                                           Knowledge
       Terminology Extraction


  Required        Terms
   Output


Figure 1: Architecture of the designed Decision Support System for Domain Specific Terminology
Extraction


4.3.    Knowledge Retrieval
This module, extracts different objects and classes and their respective
attributes on the basses of the input provided by the preceding module.
Nouns are symbolized as classes and objects and their associated attributes
are termed as attributes.
4.4.   Terminology Extraction
This module finally uses speech language contents symbols to constitute
the various speech language contents by combining available symbols
according to the information extracted of the previous module.

5. Used Methodology
Conventional natural language processing based systems user rule based
systems. Agents are another way to address this problem [8]. In the
research, a rule-based algorithm has been used which has robust ability to
read, understand and extract the desired information. First of all basic
elements of the language grammar are extracted as verbs, nouns, adjectives,
etc then on the basis of this extracted information further processing is
performed. In linguistic terms, verbs often specify actions, and noun
phrases the objects that participate in the action [16]. Each noun phrase's
then role specifies how the object participates in the action. As in the
following example:
"Ahmed hit a ball with a racket."
A procedure that understands such a sentence must discover that Role is the
agent because he performs the action of hitting, that the ball as the thematic
object because it is the object hit, and that the racket is an instrument
because it is the tool with which hitting is done. Thus, sentence analysis
requires, in part, the answers to these actions: The number of thematic roles
embraced by various theories varies probably. Some people use about half-
dozen thematic roles [9]. Others use more times as many. The exact
number does not matter much, as long as they will great enough to expose
natural constraints on how verbs and thematic-instances form sentences.
Agent: The agent causes the action to occur as in "Ahmed hit the ball,"
Ahmed is agent who performs the task. But in this example a passive
sentence, the agent also may appear in a prepositional "The ball was hit by
Ahmed.''
Co-agent: The word with may introduce a join phrase that serves a partner
in the principal agent. They two carry out the action together "Ahmed
played tennis with Ali."
Beneficiary: The beneficiary is the person for whom an action has bee
performed: "Ahmed bought the balls for Ali." In this sentence Ali is
beneficiary.
Thematic object: The thematic object is the object the sentence is really
all about— typically the object, undergoing a change. Often the thematic
object is the same as the syntactic direct object, as "Ahmed hit the ball."
Here the ball is thematic object.
Conveyance: The conveyance is something in which or on which travels:
“Aslam always goes by train.”
Trajectory: Motion from source to destination takes place over at
trajectory. ID contrast to the other role possibilities, several prepositions
can serve to introduce trajectory noun phrases: "Ahmed and Aslam went in
through the front door."
Location: The location is where an action occurs. As in the trajectory role,
several prepositions are possible, which conveys meanings in addition to
serve as a signal that a location noun phrase is "Ahmed and Ali studied in
the library, at a desk, by the wall, a picture, near the door."
Time: Time specifies when an action occurs. Prepositions such at, before,
and after introduce noun phrases serving as time role fill "Ahmed and Ali
left before noon."
Duration: Duration specifies how long an action takes. Preposition such as
for indicate duration. "Ali and Zia jogged for an hour.”

6. CONCLUSION
The designed system for Automatic Domain Specific Terminology
Extraction using a Decision Support System is a robust framework for
inferring the appropriate meanings from a given text. The accomplished
research is related to the understanding of the human languages. Human
being needs specific linguistic knowledge generating and understanding
speech language contents. It is difficult for computers to perform this task.
The speech language context understanding using a rule based framework
has ability to read user provided text, extract related information and
ultimately give meanings to the extracted contents. The designed system
very effective and have high accuracy up to 90 %. The designed system
depicts the meanings of the given sentence or paragraph efficiently. An
elegant graphical user interface has also been provided to the user for
entering the Input scenario in a proper way and generating speech language
contents.

7. FUTURE WORK
The designed system analyzes and extracts the contents of a speech
language as English. Current designed system only works for the active-
vice sentences. Passive-vice sentences are still to work for to make the
system more efficient and effective for various business applications. There
is also periphery of improvements in the algorithms for the better extraction
of language parts and understanding the meanings of the speech language
context. Current accuracy of generating meanings from a text is about 85%
to 90%. It can be enhanced up to 95% or even more by improving the
algorithms and inducing the ability of learning in the system.

8. REFERENCES
[1] - Andrake M.A. and Bork P. “Automated Extraction of Information in
  Molecular Biology” FEBS Letters, pp. 12-17, 2000.
[2] -Blaschke C, Andrade M. A, Ouzounis C. and Valencia,A. “Automatic
  extraction of biological information from scientific text: protein–protein
  interactions”. Ismb, pp. 60–67, 1999.
[3] - C. A. Thompson, R. J. Mooney and L. R. Tang, “Learning to parse natural
  language database queries into logical form”, in: Workshop on Automata
  Induction, Grammatical Inference and Language Acquisition, 1997.
[4] - Pustejovsky J, Castaño J., Zhang J, Kotecki M, Cochran B. “Robust relational
  parsing over biomedical literature: Extracting inhibit relations”. Pac. Symp.
  Biocomput, 362-73, 2002.
[5] - Krovetz, R., & Croft, W. B. “Lexical ambiguity and information retrieval”,
  ACM Transactions on Information Systems, 10, pp. 115–141, 1992.
[6] - Losee, R. M. “Parameter estimation for probabilistic document retrieval
  models”. Journal of the American Society for Information Science, 39(1), pp. 8–
  16, 1988.
[7] - Partee, B. H., Meulen, A. t., &Wall, R. E. “Mathematical Methods in
  Linguistics”. Kluwer, Dordrecht, The Netherlands. 1999.
[8] - Salton, G., & McGill, M. “Introduction to Modern Information Retrieval”
  McGraw-Hill, New York., 1995
[9] - Maron, M. E. & Kuhns, J. L. “On relevance, probabilistic indexing, and
  information retrieval” Journal of the ACM, 7, 216–244, 1997.
[10] - Chomsky, N. “Aspects of the Theory of Syntax. MIT Press, Cambridge,
  Mass, 1965.
[11] - Chow, C., & Liu, C. “Approximating discrete probability distributions with
  dependence trees”. IEEE Transactions on Information Theory, IT-14(3), 462–
  467, 1968.
[12] - C. A. Thompson, R. J. Mooney and L. R. Tang, Learning to parse natural language
  database queries into logical form, Workshop on Automata Induction, Grammatical
  Inference and Language Acquisition (1997).
[13] - Taffet, Mary D. (2001). GENTECH 2001 Scholarship Proposal: Automatic Tagging
  of Genealogical Data to Enhance Web-based Retrieval. Available at:
  <http://web.syr.edu/~mdtaffet/GENTECH_Scholarship_Proposal.htm>.
[14] - Nerbonne, John (2003): "Natural language processing in computer-assisted language
  learning". In: Mitkov, R. (ed.): The Oxford Handbook of Computational Linguistics.
  Oxford: 670-698.
[15] - Menzel, Wolfgang/Schröder, Ingo (1998): "Error diagnosis for language learning
  systems". Proceedings of NLP+IA'98 1: 526-530.
[16] - Etchegoyhen, Thierry/ Wehrle, Thomas (1998): "Overview of GBGen: A large-scale,
  domain independent syntactic generator". In: Hovy, Eduard (ed.): Proceedings of the 9th
  International Workshop on Natural Language Generation. Niagara-on-the-Lake: 288-291.
[17] - Granger, Sylviane (2003): "Error-tagged Learner Corpora and CALL: A Promising
  Synergy". CALICO Journal 20/3: 465-480.
[18] - Menzel, Wolfgang/Schröder, Ingo (1998): "Error diagnosis for language learning
  systems". Proceedings of NLP+IA'98 1: 526-530. Anne Vandeventer Faltin: Natural
  Language Tools for CALL 153
[19] - Nerbonne, John (2003): "Natural language processing in computer-assisted language
  learning". In: Mitkov, R. (ed.): The Oxford Handbook of Computational Linguistics.
  Oxford: 670-698.

More Related Content

What's hot

IRJET - A Review on Chatbot Design and Implementation Techniques
IRJET -  	  A Review on Chatbot Design and Implementation TechniquesIRJET -  	  A Review on Chatbot Design and Implementation Techniques
IRJET - A Review on Chatbot Design and Implementation TechniquesIRJET Journal
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing SystemIRJET Journal
 
Object-Oriented Design
Object-Oriented DesignObject-Oriented Design
Object-Oriented Designadil raja
 
Intelligent personal assistants
Intelligent personal assistantsIntelligent personal assistants
Intelligent personal assistantsFabiolaPanetti
 
INTELLIGENT-MULTIDIMENSIONAL-DATABASE-INTERFACE
INTELLIGENT-MULTIDIMENSIONAL-DATABASE-INTERFACEINTELLIGENT-MULTIDIMENSIONAL-DATABASE-INTERFACE
INTELLIGENT-MULTIDIMENSIONAL-DATABASE-INTERFACEMohamed Reda
 
IRJET- Chatbot using NLP and Deep Learning
IRJET-  	  Chatbot using NLP and Deep LearningIRJET-  	  Chatbot using NLP and Deep Learning
IRJET- Chatbot using NLP and Deep LearningIRJET Journal
 
IRJET- Prediction of Human Facial Expression using Deep Learning
IRJET- Prediction of Human Facial Expression using Deep LearningIRJET- Prediction of Human Facial Expression using Deep Learning
IRJET- Prediction of Human Facial Expression using Deep LearningIRJET Journal
 
NL Interface for Database - EJSR 20(4)
NL Interface for Database - EJSR 20(4)NL Interface for Database - EJSR 20(4)
NL Interface for Database - EJSR 20(4)IT Industry
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...kevig
 
Speaker independent visual lip activity detection for human - computer inte...
Speaker   independent visual lip activity detection for human - computer inte...Speaker   independent visual lip activity detection for human - computer inte...
Speaker independent visual lip activity detection for human - computer inte...eSAT Journals
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
 
The upsurge of deep learning for computer vision applications
The upsurge of deep learning for computer vision applicationsThe upsurge of deep learning for computer vision applications
The upsurge of deep learning for computer vision applicationsIJECEIAES
 
A Metamodel and Graphical Syntax for NS-2 Programing
A Metamodel and Graphical Syntax for NS-2 ProgramingA Metamodel and Graphical Syntax for NS-2 Programing
A Metamodel and Graphical Syntax for NS-2 ProgramingEditor IJCATR
 
Senjuti Kundu - Resume
Senjuti Kundu - ResumeSenjuti Kundu - Resume
Senjuti Kundu - ResumeSenjuti Kundu
 
IRJET- Review of Chatbot System in Hindi Language
IRJET-  	  Review of Chatbot System in Hindi LanguageIRJET-  	  Review of Chatbot System in Hindi Language
IRJET- Review of Chatbot System in Hindi LanguageIRJET Journal
 
Curriculumn Viate Educationshakti
Curriculumn Viate EducationshaktiCurriculumn Viate Educationshakti
Curriculumn Viate EducationshaktiShakti Singh
 
VTU FINAL YEAR PROJECT REPORT Front pages
VTU FINAL YEAR PROJECT REPORT Front pagesVTU FINAL YEAR PROJECT REPORT Front pages
VTU FINAL YEAR PROJECT REPORT Front pagesathiathi3
 

What's hot (20)

IRJET - A Review on Chatbot Design and Implementation Techniques
IRJET -  	  A Review on Chatbot Design and Implementation TechniquesIRJET -  	  A Review on Chatbot Design and Implementation Techniques
IRJET - A Review on Chatbot Design and Implementation Techniques
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing System
 
Object-Oriented Design
Object-Oriented DesignObject-Oriented Design
Object-Oriented Design
 
Intelligent personal assistants
Intelligent personal assistantsIntelligent personal assistants
Intelligent personal assistants
 
INTELLIGENT-MULTIDIMENSIONAL-DATABASE-INTERFACE
INTELLIGENT-MULTIDIMENSIONAL-DATABASE-INTERFACEINTELLIGENT-MULTIDIMENSIONAL-DATABASE-INTERFACE
INTELLIGENT-MULTIDIMENSIONAL-DATABASE-INTERFACE
 
resume -1-
resume -1-resume -1-
resume -1-
 
IRJET- Chatbot using NLP and Deep Learning
IRJET-  	  Chatbot using NLP and Deep LearningIRJET-  	  Chatbot using NLP and Deep Learning
IRJET- Chatbot using NLP and Deep Learning
 
IRJET- Prediction of Human Facial Expression using Deep Learning
IRJET- Prediction of Human Facial Expression using Deep LearningIRJET- Prediction of Human Facial Expression using Deep Learning
IRJET- Prediction of Human Facial Expression using Deep Learning
 
NL Interface for Database - EJSR 20(4)
NL Interface for Database - EJSR 20(4)NL Interface for Database - EJSR 20(4)
NL Interface for Database - EJSR 20(4)
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
 
Speaker independent visual lip activity detection for human - computer inte...
Speaker   independent visual lip activity detection for human - computer inte...Speaker   independent visual lip activity detection for human - computer inte...
Speaker independent visual lip activity detection for human - computer inte...
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
 
The upsurge of deep learning for computer vision applications
The upsurge of deep learning for computer vision applicationsThe upsurge of deep learning for computer vision applications
The upsurge of deep learning for computer vision applications
 
A Metamodel and Graphical Syntax for NS-2 Programing
A Metamodel and Graphical Syntax for NS-2 ProgramingA Metamodel and Graphical Syntax for NS-2 Programing
A Metamodel and Graphical Syntax for NS-2 Programing
 
Senjuti Kundu - Resume
Senjuti Kundu - ResumeSenjuti Kundu - Resume
Senjuti Kundu - Resume
 
C++ notes.pdf
C++ notes.pdfC++ notes.pdf
C++ notes.pdf
 
IRJET- Review of Chatbot System in Hindi Language
IRJET-  	  Review of Chatbot System in Hindi LanguageIRJET-  	  Review of Chatbot System in Hindi Language
IRJET- Review of Chatbot System in Hindi Language
 
CV_Khalid - TI
CV_Khalid - TICV_Khalid - TI
CV_Khalid - TI
 
Curriculumn Viate Educationshakti
Curriculumn Viate EducationshaktiCurriculumn Viate Educationshakti
Curriculumn Viate Educationshakti
 
VTU FINAL YEAR PROJECT REPORT Front pages
VTU FINAL YEAR PROJECT REPORT Front pagesVTU FINAL YEAR PROJECT REPORT Front pages
VTU FINAL YEAR PROJECT REPORT Front pages
 

Similar to Domain Specific Terminology Extraction (ICICT 2006)

NL Context Understanding 23(6)
NL Context Understanding 23(6)NL Context Understanding 23(6)
NL Context Understanding 23(6)IT Industry
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGEScsandit
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESLinda Garcia
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguisticsAdnanBaloch15
 
Natural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and DifficultiesNatural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and Difficultiesijtsrd
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...Scott Bou
 
Exploiting rules for resolving ambiguity in marathi language text
Exploiting rules for resolving ambiguity in marathi language textExploiting rules for resolving ambiguity in marathi language text
Exploiting rules for resolving ambiguity in marathi language texteSAT Journals
 
Syracuse UniversitySURFACEThe School of Information Studie.docx
Syracuse UniversitySURFACEThe School of Information Studie.docxSyracuse UniversitySURFACEThe School of Information Studie.docx
Syracuse UniversitySURFACEThe School of Information Studie.docxdeanmtaylor1545
 
Natural Language Processing: State of The Art, Current Trends and Challenges
Natural Language Processing: State of The Art, Current Trends and ChallengesNatural Language Processing: State of The Art, Current Trends and Challenges
Natural Language Processing: State of The Art, Current Trends and Challengesantonellarose
 
Teachbot teaching robot_using_artificial
Teachbot teaching robot_using_artificialTeachbot teaching robot_using_artificial
Teachbot teaching robot_using_artificialCamillaTonanzi
 
INTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAMINTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAMijcsa
 
Software feasibility study to
Software feasibility study toSoftware feasibility study to
Software feasibility study toijdms
 
The role of linguistic information for shallow language processing
The role of linguistic information for shallow language processingThe role of linguistic information for shallow language processing
The role of linguistic information for shallow language processingConstantin Orasan
 
Natural language processing and sanskrit
Natural language processing and sanskritNatural language processing and sanskrit
Natural language processing and sanskritIAEME Publication
 
Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...ijctcm
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingMariana Soffer
 

Similar to Domain Specific Terminology Extraction (ICICT 2006) (20)

NL Context Understanding 23(6)
NL Context Understanding 23(6)NL Context Understanding 23(6)
NL Context Understanding 23(6)
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
Natural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and DifficultiesNatural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and Difficulties
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...
 
NLPinAAC
NLPinAACNLPinAAC
NLPinAAC
 
Exploiting rules for resolving ambiguity in marathi language text
Exploiting rules for resolving ambiguity in marathi language textExploiting rules for resolving ambiguity in marathi language text
Exploiting rules for resolving ambiguity in marathi language text
 
Syracuse UniversitySURFACEThe School of Information Studie.docx
Syracuse UniversitySURFACEThe School of Information Studie.docxSyracuse UniversitySURFACEThe School of Information Studie.docx
Syracuse UniversitySURFACEThe School of Information Studie.docx
 
Natural Language Processing: State of The Art, Current Trends and Challenges
Natural Language Processing: State of The Art, Current Trends and ChallengesNatural Language Processing: State of The Art, Current Trends and Challenges
Natural Language Processing: State of The Art, Current Trends and Challenges
 
Teachbot teaching robot_using_artificial
Teachbot teaching robot_using_artificialTeachbot teaching robot_using_artificial
Teachbot teaching robot_using_artificial
 
Nlp
NlpNlp
Nlp
 
INTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAMINTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAM
 
Software feasibility study to
Software feasibility study toSoftware feasibility study to
Software feasibility study to
 
The role of linguistic information for shallow language processing
The role of linguistic information for shallow language processingThe role of linguistic information for shallow language processing
The role of linguistic information for shallow language processing
 
Applsci 09-02758
Applsci 09-02758Applsci 09-02758
Applsci 09-02758
 
Natural language processing and sanskrit
Natural language processing and sanskritNatural language processing and sanskrit
Natural language processing and sanskrit
 
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
 
Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 

More from IT Industry

The News Today 24 (https://thenewstoday24.com/)
The News Today 24 (https://thenewstoday24.com/)The News Today 24 (https://thenewstoday24.com/)
The News Today 24 (https://thenewstoday24.com/)IT Industry
 
Meaning Extraction - IJCTE 2(1)
Meaning Extraction - IJCTE 2(1)Meaning Extraction - IJCTE 2(1)
Meaning Extraction - IJCTE 2(1)IT Industry
 
Virtual Telemedicine (IJITWE 5(1))
Virtual Telemedicine (IJITWE 5(1))Virtual Telemedicine (IJITWE 5(1))
Virtual Telemedicine (IJITWE 5(1))IT Industry
 
NL to OCL Transformation (EDOC 2010)
NL to OCL Transformation (EDOC 2010)NL to OCL Transformation (EDOC 2010)
NL to OCL Transformation (EDOC 2010)IT Industry
 
BPM & SOA for Small Business Enterprises (ICIME 2009)
BPM & SOA for Small Business Enterprises (ICIME 2009)BPM & SOA for Small Business Enterprises (ICIME 2009)
BPM & SOA for Small Business Enterprises (ICIME 2009)IT Industry
 
Web Layout Mining - JECS 29(2)
Web Layout Mining - JECS 29(2)Web Layout Mining - JECS 29(2)
Web Layout Mining - JECS 29(2)IT Industry
 
Web User Forms (ICOMMS 2006)
Web User Forms (ICOMMS 2006)Web User Forms (ICOMMS 2006)
Web User Forms (ICOMMS 2006)IT Industry
 
Image Classification (icast 2006)
Image Classification  (icast 2006)Image Classification  (icast 2006)
Image Classification (icast 2006)IT Industry
 
Reuse Software Components (IMS 2006)
Reuse Software Components (IMS 2006)Reuse Software Components (IMS 2006)
Reuse Software Components (IMS 2006)IT Industry
 
GIS for Quetta (ICAST 2006)
GIS for Quetta (ICAST 2006)GIS for Quetta (ICAST 2006)
GIS for Quetta (ICAST 2006)IT Industry
 
Web Layout Generation (IC-SCCE 2006)
Web Layout Generation (IC-SCCE 2006)Web Layout Generation (IC-SCCE 2006)
Web Layout Generation (IC-SCCE 2006)IT Industry
 
PCA Clouds (ICET 2005)
PCA Clouds (ICET 2005)PCA Clouds (ICET 2005)
PCA Clouds (ICET 2005)IT Industry
 
Feature Based Image Classification by using Principal Component Analysis
Feature Based Image Classification by using Principal Component AnalysisFeature Based Image Classification by using Principal Component Analysis
Feature Based Image Classification by using Principal Component AnalysisIT Industry
 

More from IT Industry (13)

The News Today 24 (https://thenewstoday24.com/)
The News Today 24 (https://thenewstoday24.com/)The News Today 24 (https://thenewstoday24.com/)
The News Today 24 (https://thenewstoday24.com/)
 
Meaning Extraction - IJCTE 2(1)
Meaning Extraction - IJCTE 2(1)Meaning Extraction - IJCTE 2(1)
Meaning Extraction - IJCTE 2(1)
 
Virtual Telemedicine (IJITWE 5(1))
Virtual Telemedicine (IJITWE 5(1))Virtual Telemedicine (IJITWE 5(1))
Virtual Telemedicine (IJITWE 5(1))
 
NL to OCL Transformation (EDOC 2010)
NL to OCL Transformation (EDOC 2010)NL to OCL Transformation (EDOC 2010)
NL to OCL Transformation (EDOC 2010)
 
BPM & SOA for Small Business Enterprises (ICIME 2009)
BPM & SOA for Small Business Enterprises (ICIME 2009)BPM & SOA for Small Business Enterprises (ICIME 2009)
BPM & SOA for Small Business Enterprises (ICIME 2009)
 
Web Layout Mining - JECS 29(2)
Web Layout Mining - JECS 29(2)Web Layout Mining - JECS 29(2)
Web Layout Mining - JECS 29(2)
 
Web User Forms (ICOMMS 2006)
Web User Forms (ICOMMS 2006)Web User Forms (ICOMMS 2006)
Web User Forms (ICOMMS 2006)
 
Image Classification (icast 2006)
Image Classification  (icast 2006)Image Classification  (icast 2006)
Image Classification (icast 2006)
 
Reuse Software Components (IMS 2006)
Reuse Software Components (IMS 2006)Reuse Software Components (IMS 2006)
Reuse Software Components (IMS 2006)
 
GIS for Quetta (ICAST 2006)
GIS for Quetta (ICAST 2006)GIS for Quetta (ICAST 2006)
GIS for Quetta (ICAST 2006)
 
Web Layout Generation (IC-SCCE 2006)
Web Layout Generation (IC-SCCE 2006)Web Layout Generation (IC-SCCE 2006)
Web Layout Generation (IC-SCCE 2006)
 
PCA Clouds (ICET 2005)
PCA Clouds (ICET 2005)PCA Clouds (ICET 2005)
PCA Clouds (ICET 2005)
 
Feature Based Image Classification by using Principal Component Analysis
Feature Based Image Classification by using Principal Component AnalysisFeature Based Image Classification by using Principal Component Analysis
Feature Based Image Classification by using Principal Component Analysis
 

Domain Specific Terminology Extraction (ICICT 2006)

  • 1. AUTOMATIC DOMAIN SPECIFIC TERMINOLOGY EXTRACTION USING A DECISION SUPPORT SYSTEM Imran Sarwar Bajwa1 , Imran Siddique2 and M. Abbas Choudhary3 1 Department of Computer Science and Information Technology The Islamia University of Bahawalpur, Pakistan Phone: +92 (62) 9255466 E-mail: imransbajwa@yahoo.com 2 Facuty of Computer an EmergingSciences 3 Balochistan University of Information Technology and Management Sciences Quetta, Pakistan. Phone: +92 (81) 9202463 Fax: +92 (81) 9201064 E-mail: imran@buitms.edu.pk E-mail:abbas@buitms.edu.pk Abstract: Speech languages or Natural languages contents are major tools of communication. This research paper presents a natural language processing based automated system for understanding speech language text. A new rule based model is presented for analyzing the natural languages and extracting the relative meanings from the given text. User writes the natural language scenario in simple English in a few paragraphs and the designed system has an obvious capability of analyzing the given script by the user. After composite analysis and extraction of associated information, the designed system gives particular meanings to an assortment of speech language text on the basis of its context. The designed system uses standard speech language rules that are clearly defined for all speech languages as
  • 2. English, Urdu, Chinese, Arabic, French, etc. The designed system provides a quick and reliable way to comprehend speech language context and generates respective meanings. The application with such abilities can be more intelligent and pertinent specifically for the user to save the time. Keywords: Terminology Extraction, Automatic text understanding, Speech language processing, Information extraction 1. INTRODUCTION In daily life, natural languages or speech languages are main source of commutation. Particular and typical set of words and vocabulary collections are used for each field of life and they are quiet exclusive to their use. In computer science, Natural Language Processing (NLP) is the field used for the analysis of speech language text [4]. NLP introduces many new concepts and new terminologies to describe its models, techniques and processes. Some of these terms come directly from linguistics and the science of perception, while others were invented to describe discoveries that did not fit into any previous category [3]. Natural Language understanding is a hefty field to grasp. Understanding and comprehending a speech language means to know that what concepts a word or a particular phrase stands under a particular perspective. To give meanings to a particular sentence a system should know how to link the concepts together in a meaningful way. It's satirical that the speech languages are easiest for human beings in terms of learning, understanding and using, on the other hand hardest for a computer to understand. In modern era, computer machines have attained the ability of solving complex mathematical and statistical problems with speed and grace but still they are inefficient to comprehend the basics of spoken and written languages. The designed automated system can be useful in various business and technical software by only acquiring the requirements from the user. The designed system will extract the required information from the given text and provide to the computer for further automated processing. Applications can be automated generation of UML diagrams, query processing, web mining, web template designing, user interface designing, etc. As far as this software is concerns the time, it takes to
  • 3. explore all the facilities and services, should is far less than a minute and this information is adequately useful for the engineers and computer users. 2. RELATED WORK Natural languages have been an area of interest from last one century. In the lat nineteen sixties and seventies, so many researchers as Noam Chomsky (1965) [9], Maron, M. E. and Kuhns, J. L (1960) [10], Chow, C., & Liu, C (1968) [11] contributed in the area of information retrieval from natural languages. They contributed for analysis and understanding of the natural languages, but still there was lot of effort required for better understanding and analysis. Some authors concentrated in this area in eighties and nineties as Losee, R. M (1998) [6], Salton, G., & McGill, M (1995) [8], Maron, M. E. & Kuhns, J. (1997) [9]. These authors worked for lexical ambiguity [5]and information retrieval [7], probabilistic indexing [9], data bases handling [3] and so many other related areas. In this research paper, a newly designed rule based framework has been proposed that is able to read the English language text and extract its meanings after analyzing and extracting related information. The conducted research provides a robust solution to the addressed problem. The functionality of the conducted research was domain specific but it can be enhanced easily in the future according to the requirements. Current designed system incorporate the capability of mapping user requirements after reading the given requirements in plain text and drawing the set of speech language contents. 3. SPEECH LANGUAGE ANALYSIS The understanding and multi-aspect processing of the natural languages that are also termed as "speech languages", is actually one of the arguments of greater interest in the field artificial intelligence field [6]. The natural languages are irregular and asymmetrical. Traditionally, natural languages are based on un-formal grammars. There are the geographical, psychological and sociological factors which influence the behaviours of natural languages [17]. There are undefined set of words and they also change and vary area to area and time to time.Due to these variations and inconsistencies, the natural languages have different flavours as English
  • 4. language has more than half dozen renowned flavours all over the world. These flavours have different accents, set of vocabularies and phonological aspects. These ominous and menacing discrepancies and inconsistencies in natural languages make it a difficult task to process them as compared to the formal languages [13]. In the process of analyzing and understanding the natural languages, various problems are usually faced by the researchers. The problems connected to the greater complexity of the natural language are verb’s conjugation, verbal inflexion & paradigms, active & passive vice, lexical amplitude, problem of ambiguity, etc. All these problems are significant to handle but problem of ambiguity is difficult to handle. Ambiguity could be easily solved at the syntax and semantic level by using a sound and robust rule-based system. During the interpretation process of the English language text, the text is distinguished in four classes to handle the ambiguity as: lexical analysis, syntactic analysis, semantic analysis and pragmatic analysis. Such distinction is not exhaustive but it is useful to focus on the problem and for practical purposes [7]. 3.1. Lexical Analysis: The problem of words written or pronounced not correctly is omitted in this problem scenario. These kinds of errors are simply solvable through the comparison with the expressions contained in a dictionary. Lexical ambiguity is created when a same word assumes various meanings [3]. In this case that ambiguity is generated from the fact that which meanings will be incorporated in which scenario. As an example we consider the "cold" adjective in the following sentences as "That room is cold." and "That person is cold." It turns out obvious that the same "cold" adjective assumes, in the two phrases different meanings. In the first sentence it indicate a temperature, in the second one a particular character of a person. 3.2. Syntactic Analysis: Syntax analysis is performed on world level to recognize the word category. The syntactic analysis of the programs would have to be in a position to isolate subject, verbs, objects and various complements. It is little complex procedure. For example, "Imran eats the apple." In this
  • 5. example, the actual meanings are that “Mario eats apple” but ambiguity can be asserted if this sentence is conceived as “the apple eats Mario”. 3.3. Semantic Analysis: To analyze a phrase from the semantic point of view means to give it a meaning. This should let you understand we arrived to a crucial point. Semantic ambiguities are most common due to the fact that generally a computer is not in a position to distinguish the logical situations as "The bus hit the pole while it was moving." All of us would surely interpret the phrase like "The car, while moving, hit the pole.", while nobody would be dreamed to attribute to the sentence the meant "the car hit the pole while the pole was moving ". 3.4. Pragmatic Analysis: Pragmatic ambiguities born when the communication happens between two persons who do not share the same context. As following example as "I will arrive to the airport at 8 o'clock". In this example, if the subject person belongs to a different continent, the meanings can be totally changed. 4. DESIGNED SYSTEM ARCHITECTURE The designed speech language contents system has ability to draw speech language contents after reading the text scenario provided by the user. This system draws in five modules: Text input acquisition, text understanding, knowledge extraction, generation of speech language contents and finally multi-lingual code generation as shown in following fig. 4.1. Text input acquisition This module helps to acquire input text scenario. User provides the business scenario in from of paragraphs of the text. This module reads the input text in the form characters and generates the words by concatenating the input characters. This module is the implementation of the lexical phase. Lexicons and tokens are generated in this module.
  • 6. 4.2. Contents Interpretation This module reads the input from module 1 in the form of words. These words are categorized into various classes as verbs, helping verbs, nouns, pronouns, adjectives, prepositions, conjunctions, etc. Text Required Text Terms Document Terms Text Input Acquisition Understanding Text Contents Interpretation Analyzing Terms Knowledge Retrieval Matching Terms Domain Knowledge Terminology Extraction Required Terms Output Figure 1: Architecture of the designed Decision Support System for Domain Specific Terminology Extraction 4.3. Knowledge Retrieval This module, extracts different objects and classes and their respective attributes on the basses of the input provided by the preceding module. Nouns are symbolized as classes and objects and their associated attributes are termed as attributes.
  • 7. 4.4. Terminology Extraction This module finally uses speech language contents symbols to constitute the various speech language contents by combining available symbols according to the information extracted of the previous module. 5. Used Methodology Conventional natural language processing based systems user rule based systems. Agents are another way to address this problem [8]. In the research, a rule-based algorithm has been used which has robust ability to read, understand and extract the desired information. First of all basic elements of the language grammar are extracted as verbs, nouns, adjectives, etc then on the basis of this extracted information further processing is performed. In linguistic terms, verbs often specify actions, and noun phrases the objects that participate in the action [16]. Each noun phrase's then role specifies how the object participates in the action. As in the following example: "Ahmed hit a ball with a racket." A procedure that understands such a sentence must discover that Role is the agent because he performs the action of hitting, that the ball as the thematic object because it is the object hit, and that the racket is an instrument because it is the tool with which hitting is done. Thus, sentence analysis requires, in part, the answers to these actions: The number of thematic roles embraced by various theories varies probably. Some people use about half- dozen thematic roles [9]. Others use more times as many. The exact number does not matter much, as long as they will great enough to expose natural constraints on how verbs and thematic-instances form sentences. Agent: The agent causes the action to occur as in "Ahmed hit the ball," Ahmed is agent who performs the task. But in this example a passive sentence, the agent also may appear in a prepositional "The ball was hit by Ahmed.'' Co-agent: The word with may introduce a join phrase that serves a partner in the principal agent. They two carry out the action together "Ahmed played tennis with Ali."
  • 8. Beneficiary: The beneficiary is the person for whom an action has bee performed: "Ahmed bought the balls for Ali." In this sentence Ali is beneficiary. Thematic object: The thematic object is the object the sentence is really all about— typically the object, undergoing a change. Often the thematic object is the same as the syntactic direct object, as "Ahmed hit the ball." Here the ball is thematic object. Conveyance: The conveyance is something in which or on which travels: “Aslam always goes by train.” Trajectory: Motion from source to destination takes place over at trajectory. ID contrast to the other role possibilities, several prepositions can serve to introduce trajectory noun phrases: "Ahmed and Aslam went in through the front door." Location: The location is where an action occurs. As in the trajectory role, several prepositions are possible, which conveys meanings in addition to serve as a signal that a location noun phrase is "Ahmed and Ali studied in the library, at a desk, by the wall, a picture, near the door." Time: Time specifies when an action occurs. Prepositions such at, before, and after introduce noun phrases serving as time role fill "Ahmed and Ali left before noon." Duration: Duration specifies how long an action takes. Preposition such as for indicate duration. "Ali and Zia jogged for an hour.” 6. CONCLUSION The designed system for Automatic Domain Specific Terminology Extraction using a Decision Support System is a robust framework for inferring the appropriate meanings from a given text. The accomplished research is related to the understanding of the human languages. Human being needs specific linguistic knowledge generating and understanding speech language contents. It is difficult for computers to perform this task. The speech language context understanding using a rule based framework has ability to read user provided text, extract related information and ultimately give meanings to the extracted contents. The designed system
  • 9. very effective and have high accuracy up to 90 %. The designed system depicts the meanings of the given sentence or paragraph efficiently. An elegant graphical user interface has also been provided to the user for entering the Input scenario in a proper way and generating speech language contents. 7. FUTURE WORK The designed system analyzes and extracts the contents of a speech language as English. Current designed system only works for the active- vice sentences. Passive-vice sentences are still to work for to make the system more efficient and effective for various business applications. There is also periphery of improvements in the algorithms for the better extraction of language parts and understanding the meanings of the speech language context. Current accuracy of generating meanings from a text is about 85% to 90%. It can be enhanced up to 95% or even more by improving the algorithms and inducing the ability of learning in the system. 8. REFERENCES [1] - Andrake M.A. and Bork P. “Automated Extraction of Information in Molecular Biology” FEBS Letters, pp. 12-17, 2000. [2] -Blaschke C, Andrade M. A, Ouzounis C. and Valencia,A. “Automatic extraction of biological information from scientific text: protein–protein interactions”. Ismb, pp. 60–67, 1999. [3] - C. A. Thompson, R. J. Mooney and L. R. Tang, “Learning to parse natural language database queries into logical form”, in: Workshop on Automata Induction, Grammatical Inference and Language Acquisition, 1997. [4] - Pustejovsky J, Castaño J., Zhang J, Kotecki M, Cochran B. “Robust relational parsing over biomedical literature: Extracting inhibit relations”. Pac. Symp. Biocomput, 362-73, 2002. [5] - Krovetz, R., & Croft, W. B. “Lexical ambiguity and information retrieval”, ACM Transactions on Information Systems, 10, pp. 115–141, 1992. [6] - Losee, R. M. “Parameter estimation for probabilistic document retrieval models”. Journal of the American Society for Information Science, 39(1), pp. 8– 16, 1988.
  • 10. [7] - Partee, B. H., Meulen, A. t., &Wall, R. E. “Mathematical Methods in Linguistics”. Kluwer, Dordrecht, The Netherlands. 1999. [8] - Salton, G., & McGill, M. “Introduction to Modern Information Retrieval” McGraw-Hill, New York., 1995 [9] - Maron, M. E. & Kuhns, J. L. “On relevance, probabilistic indexing, and information retrieval” Journal of the ACM, 7, 216–244, 1997. [10] - Chomsky, N. “Aspects of the Theory of Syntax. MIT Press, Cambridge, Mass, 1965. [11] - Chow, C., & Liu, C. “Approximating discrete probability distributions with dependence trees”. IEEE Transactions on Information Theory, IT-14(3), 462– 467, 1968. [12] - C. A. Thompson, R. J. Mooney and L. R. Tang, Learning to parse natural language database queries into logical form, Workshop on Automata Induction, Grammatical Inference and Language Acquisition (1997). [13] - Taffet, Mary D. (2001). GENTECH 2001 Scholarship Proposal: Automatic Tagging of Genealogical Data to Enhance Web-based Retrieval. Available at: <http://web.syr.edu/~mdtaffet/GENTECH_Scholarship_Proposal.htm>. [14] - Nerbonne, John (2003): "Natural language processing in computer-assisted language learning". In: Mitkov, R. (ed.): The Oxford Handbook of Computational Linguistics. Oxford: 670-698. [15] - Menzel, Wolfgang/Schröder, Ingo (1998): "Error diagnosis for language learning systems". Proceedings of NLP+IA'98 1: 526-530. [16] - Etchegoyhen, Thierry/ Wehrle, Thomas (1998): "Overview of GBGen: A large-scale, domain independent syntactic generator". In: Hovy, Eduard (ed.): Proceedings of the 9th International Workshop on Natural Language Generation. Niagara-on-the-Lake: 288-291. [17] - Granger, Sylviane (2003): "Error-tagged Learner Corpora and CALL: A Promising Synergy". CALICO Journal 20/3: 465-480. [18] - Menzel, Wolfgang/Schröder, Ingo (1998): "Error diagnosis for language learning systems". Proceedings of NLP+IA'98 1: 526-530. Anne Vandeventer Faltin: Natural Language Tools for CALL 153 [19] - Nerbonne, John (2003): "Natural language processing in computer-assisted language learning". In: Mitkov, R. (ed.): The Oxford Handbook of Computational Linguistics. Oxford: 670-698.