An Intuitive Natural Language Understanding System
Vanitadevi Patil

Sapthagiri College of Engineering, Department of Computer Science and Engineering, Bangalore, INDIA

Presentation Transcript

  • An Intuitive Natural Language Understanding System Vanitadevi Patil
  • Contents
    • Introduction
    • Methodology
    • Results
    • Conclusion
  • What is NLP?
      • The process of computer analysis of input provided in a human language (natural language), and conversion of this input into a useful form of representation.
  • Components of NLP
    • Natural Language Understanding
    • 1. Mapping the given input in the natural language into a useful representation.
    • 2. Different levels of analysis required:
      • morphological analysis, syntactic analysis, semantic analysis, discourse analysis, …
    • Natural Language Generation
    • 1. Producing output in the natural language from some internal representation.
    • 2. Different levels of synthesis required:
        • deep planning (what to say),
        • syntactic generation
  • Introduction
    • The research goal
      • To build a natural language understanding system that captures the meaning of its input
    • Motivation
    • - Understanding human language is one of the key steps in understanding human intelligence
  • Knowledge of Language
    • Morphology - how words are constructed from more basic meaning units called morphemes. A morpheme is the primitive unit of meaning in a language.
    • Syntax - how words are combined to produce correct sentences. It determines what structural role each word plays in the sentence and which phrases are subparts of which other phrases.
    • Semantics - what words mean and how these meanings combine in sentences to form sentence meaning. The study of context-independent meaning.
    • World Knowledge - the background knowledge used to actually understand the meaning of utterances.
  • Methodology
    • Implementation of six modules to analyze and process English sentences.
    • - finding the right set of words that carries the meaning of the whole sentence
    • - extracting the right command to be executed by the system
    • Interaction with the shell is in natural language English.
  • Modules in building up NLU
    • The Input Preprocessor
    • Morphological Analysis
    • Synonym Matching
    • Syntax Analysis
    • Semantic Analysis
    • Knowledge Base
  • The Input Preprocessor
    • Removal of special characters
    • - question marks, apostrophes and full stops are removed, except the dot in a filename that has an extension
    • For example:
    • Input - Can you show me the file Test.txt ?
    • Output - Can you show me the file Test.txt
  • The Input Preprocessor
    • Locating the constants
    • - file or directory names compared against the lexical productions of the grammar
    • - replaced with keyword CONST
    • For example:
    • Input - Can you show me the file Test.txt
    • Output - Can you show me the file CONST
    • Unrecognized Constant list = { Test.txt }
  • The Input Preprocessor
    • Changing the case
    • - All those words that are not tagged as Constants are converted to lower case.
    • For example:
    • Input - Can you show me the file Test.txt
    • Output - can you show me the file CONST
    • Unrecognized Constant list: { Test.txt }
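
The three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `LEXICON` is a hypothetical stand-in for the words covered by the grammar's lexical productions, and the function names are invented for this example. Any word not found in the lexicon is treated as a constant.

```python
import re

# Hypothetical lexicon standing in for the grammar's lexical productions.
LEXICON = {"can", "you", "show", "me", "the", "file"}


def strip_special(sentence):
    """Drop question marks, apostrophes and full stops, keeping a dot inside a token."""
    sentence = re.sub(r"[?']", "", sentence)
    # Remove a dot only when no word character follows it (so Test.txt survives).
    sentence = re.sub(r"\.(?!\w)", "", sentence)
    return " ".join(sentence.split())


def locate_constants(sentence):
    """Replace every word missing from the lexicon with CONST and remember it."""
    words, unrecognized = [], []
    for word in sentence.split():
        if word.lower() in LEXICON:
            words.append(word)
        else:
            words.append("CONST")
            unrecognized.append(word)
    return " ".join(words), unrecognized


def lower_non_constants(sentence):
    """Lower-case every word except the CONST placeholder."""
    return " ".join(w if w == "CONST" else w.lower() for w in sentence.split())


def preprocess(sentence):
    tagged, unrecognized = locate_constants(strip_special(sentence))
    return lower_non_constants(tagged), unrecognized


sentence, constants = preprocess("Can you show me the file Test.txt ?")
print(sentence)   # can you show me the file CONST
print(constants)  # ['Test.txt']
```

Keeping the original spelling of each constant in a separate list is what lets later stages hand `Test.txt` to the command unchanged.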
  • Morphological Analysis
    • The stemmer checks each individual word of the input sentence
    • Each word is replaced with its basic stem by consulting a Morph Table, a collection of the various suffixed forms of a word mapped to its basic stem.
    • Words marked as CONST are also checked against the Morph Table, using the Unrecognized Constant list.
    • If a word in the Unrecognized Constant list has a basic stem form, the stem replaces the CONST in the sentence and the word is removed from the Unrecognized Constant list.
  • Morphological Analysis
    • For example
    • At Constant-Locator function
    • Input : i want the file Abc.txt to be displayed
    • Output : i want the file CONST to be CONST
    • Unrecognized Constant list : { Abc.txt, displayed }
    • At Morphological Stemmer
    • Output : i want the file CONST to be display
    • Unrecognized Constant list : { Abc.txt }
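
A minimal sketch of the stemming step, under two assumptions that the slides do not state: the Unrecognized Constant list holds one entry per CONST in sentence order, and the Morph Table is a plain dictionary. The table contents and the name `stem_sentence` are hypothetical.

```python
# Hypothetical Morph Table: suffixed forms mapped to their basic stem.
MORPH_TABLE = {
    "displayed": "display",
    "displaying": "display",
    "displays": "display",
}


def stem_sentence(sentence, unrecognized):
    """Stem ordinary words and restore any CONST whose original word has a stem."""
    pending = list(unrecognized)  # originals behind each CONST, in sentence order
    remaining, out = [], []
    for word in sentence.split():
        if word == "CONST" and pending:
            original = pending.pop(0)
            stem = MORPH_TABLE.get(original.lower())
            if stem is None:
                # Still a genuine constant such as a file name.
                out.append("CONST")
                remaining.append(original)
            else:
                # The word was only an inflected form, so it stops being a constant.
                out.append(stem)
        else:
            out.append(MORPH_TABLE.get(word, word))
    return " ".join(out), remaining


print(stem_sentence("i want the file CONST to be CONST", ["Abc.txt", "displayed"]))
# ('i want the file CONST to be display', ['Abc.txt'])
```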
  • Synonym Matching
    • Words whose stems mean the same thing to the system are replaced with a single synonym before the sentence is sent to the parser, so that the knowledge base has to cover fewer words.
    • For example, the words show, display, view, list and see all mean the same as show in the context of the system.
    • Synonym replacement uses a wordlist that maps each word to its appropriate synonym, which is then recognized in the knowledge base.
  • Synonym Matching
    • For example:
    • Input : i want the file CONST to be display
    • Output : i want the file CONST to be show
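
Synonym replacement amounts to a dictionary lookup per word. The sketch below uses the five words named on the earlier slide; the dictionary layout and the function name are assumptions made for illustration.

```python
# Wordlist mapping each word to the synonym the knowledge base recognizes.
SYNONYMS = {
    "display": "show",
    "view": "show",
    "list": "show",
    "see": "show",
}


def replace_synonyms(sentence):
    """Swap each word for its canonical synonym, leaving other words untouched."""
    return " ".join(SYNONYMS.get(word, word) for word in sentence.split())


print(replace_synonyms("i want the file CONST to be display"))
# i want the file CONST to be show
```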
  • Syntax Analysis
    • The representation of the sentence in some structured form helps in precisely locating the desired information in the sentence
    • A context-free grammar (CFG), a classical method for modeling constituent structure in English, is designed to cover the major constructions of natural language for our domain-specific application
  • Context-free Grammars
    • The productions of a CFG fall into two groups
    • Grammatical productions
    • - The labels are all nonterminal symbols, the grammatical categories used to label constituents
    • Lexical productions
    • - Terminal symbols are the words whose syntactic behavior is defined by the grammar.
  • A simple CFG
    • Grammatical Productions
      • imp → vp
      • vp → op pp
      • pp → np det ap
      • ap → predicate
      • np → pron
    • Lexical Productions
      • det → the
      • predicate → date
      • op → show
      • pron → me
    • A sample grammar that deals with imperative sentences
    • imp is the special start symbol
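
To see the sample grammar at work, the sketch below parses "show me the date" top-down, caching every (symbol, position) result so that no sub-parse is computed twice. It is a simplified stand-in for a chart parser, not the parser used by the system. It assumes the flattened production table reads as imp → vp, vp → op pp, pp → np det ap, ap → predicate, np → pron, with the four lexical entries shown. It also assumes a grammar without left recursion, which would make this approach loop.

```python
from functools import lru_cache

# Grammatical productions: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "imp": [("vp",)],
    "vp": [("op", "pp")],
    "pp": [("np", "det", "ap")],
    "ap": [("predicate",)],
    "np": [("pron",)],
}
# Lexical productions: word class -> words.
LEXICON = {"det": {"the"}, "predicate": {"date"}, "op": {"show"}, "pron": {"me"}}


def parse(words, start="imp"):
    """Return a parse tree as nested tuples, or None if the sentence is rejected."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def spans(symbol, i):
        """Every (end, tree) such that symbol derives words[i:end]."""
        if symbol in LEXICON:
            if i < len(words) and words[i] in LEXICON[symbol]:
                return ((i + 1, (symbol, words[i])),)
            return ()
        results = []
        for rhs in GRAMMAR.get(symbol, []):
            partial = [(i, ())]
            for child in rhs:
                partial = [
                    (end, kids + (tree,))
                    for pos, kids in partial
                    for end, tree in spans(child, pos)
                ]
            results.extend((end, (symbol,) + kids) for end, kids in partial)
        return tuple(results)

    for end, tree in spans(start, 0):
        if end == len(words):
            return tree
    return None


print(parse("show me the date".split()))
```

The returned tree tags each word with its word class, which is the information the later stages read off the parse.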
  • Different sentence structures
    • CFG is designed for each of the four types of sentence structures
    • - declarative structure
    • - imperative structure
    • - yes-no question structure
    • - wh-question structure
  • Declarative and Imperative sentences
    • Declarative sentences have a subject noun phrase followed by a verb phrase.
    • For example: I want to know today’s date
    • Imperative sentences often begin with a verb phrase, and have no subject.
    • For example: Show me today’s date
  • Yes-no-question and wh-question sentences
    • Yes-no-question sentences begin with an auxiliary verb, followed by a subject noun phrase, followed by a verb phrase. They may be considered a command or a suggestion.
    • For example: Can you show me today’s date?
    • Wh-question sentences are generally questions that begin with a wh-word, like what, which and where.
    • For example: What is today’s date?
  • Chart parser
    • Employs a top-down approach using a dynamic programming technique.
    • The selected grammar is taken and the sentence is parsed using its lexical and grammatical productions
    • The appropriate grammar is chosen by looking at the first word of the sentence
  • Chart parser
    • For example
    • - sentences starting with a wh-determiner like what or which are mapped to the whq-grammar
    • - sentences that start with words like can or may are mapped to the yes-or-no-grammar
    • If the first word of the sentence is a verb (an operation), the imperative-grammar is chosen.
    • If it is a pronoun, particularly I, the declarative-grammar is chosen.
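
The first-word rule can be sketched as a simple lookup. The word sets below contain only the examples given on the slides, so they are illustrative rather than complete, and the function name is hypothetical.

```python
# Word sets limited to the examples named on the slides.
WH_WORDS = {"what", "which", "where"}
AUXILIARIES = {"can", "may"}
OPERATIONS = {"show"}
PRONOUNS = {"i"}


def select_grammar(sentence):
    """Pick a grammar name from the first word, or None if no rule applies."""
    words = sentence.split()
    if not words:
        return None
    first = words[0].lower()
    if first in WH_WORDS:
        return "whq-grammar"
    if first in AUXILIARIES:
        return "yes-or-no-grammar"
    if first in OPERATIONS:
        return "imperative-grammar"
    if first in PRONOUNS:
        return "declarative-grammar"
    return None


for text in ("What is todays date", "Can you show me todays date",
             "Show me todays date", "I want to know todays date"):
    print(text, "->", select_grammar(text))
```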
  • Chart parser
    • The parse tree, the output of the parser, is obtained
    • It is retrieved as a list in which individual words are tagged according to their respective word class
  • Chart parser
    • Predicates, Constants, Operations, along with the other English word classes like Prepositions, Conjunctions, Pronouns, Determiners, Auxiliary Verbs, etc are extracted.
    • Among the set of tagged words, the word classes Predicates, Constants, Operations and Prepositions are maintained in global lists.
  • Word classes
    • Words in the predicates category, which belong to the noun word class, are mainly used in the knowledge base to arrive at the intended UNIX command.
    • Constants indicate the identifiers to be used with the UNIX commands, such as file or directory names. They also include words that refer to identifiers, such as this directory, present directory, all files.
    • Constants are also associated with the appropriate UNIX command options.
    • Operations help in determining the job that is to be performed.
  • Word classes
    • An example of a declarative structure sentence.
    • Input Sentence : I want to see the contents of file Test.txt
    • Preprocessed Output: i want to show the contents of file CONST
    • Predicate List: [ ‘contents’, ‘file’ ]
    • Operator List: [ ‘want’, ‘show’ ]
    • Prepositions List: [ ‘to’, ‘of’ ]
    • Constant List: [ ‘Test.txt’ ]
  • Parse tree as a list
    • (declstat:
    • (decl:
    • (np: (pron: < i >))
    • (vp:
    • (op: < want >)
    • (pp:
    • (prep: < to >)
    • (vp: (op: < show >) (pp: ) (np: ) (aux: ) (pp: )))
    • (np: (det: < the >) (ap: (predicate: < contents >)))
    • (aux: )
    • (pp:
    • (prep: < of >)
    • (np: (det: ) (ap: (predicate: < file >) (constant: < CONST >)))))
    • (pp: )))
  • Semantic analysis
    • Predicates, constants and operations extracted from the sentence are mainly responsible for finding out the suitable command at this stage.
    • Once intended semantics are known, they are mapped to the right command.
  • Knowledge Base
    • A consistent knowledge base is maintained once conditioned input is obtained from the synonym replacement module.
    • The chief predicate that each predicate refers to is found and used to probe further for the expected command.
  • Knowledge Base
    • For instance, each of the predicates day, date, month, year and time indicates that the date command is to be executed. Hence the chief predicate is date.
    • Then, all chief predicates are used to find what category of command is intended.
  • Knowledge Base
    • Based on the various command categories, six primary tables are maintained, namely directorykb, filekb, datekb, userkb, hostkb and sizekb, which store information about the different commands and their options.
    • The chosen chief predicate selects the one table, among these, for the command category that the user's input sentence maps to.
    • Thus these tables provide the basic command that must be executed in response to the user input.
  • Priority of chief predicates
    • Predicates are prioritized so that directory related predicates are given the highest priority while predicates referring to size are given the lowest priority.
    • This is done to find relationship between the various predicates present in the predicate list.
  • Priority of chief predicates
    • Decreasing order of priority of chief predicates (highest first)
    • DIRECTORY
    • FILE
    • DATE
    • HOST
    • USER
    • SIZE
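
A minimal sketch of chief-predicate selection, taking DIRECTORY as the highest priority and SIZE as the lowest, as stated on the preceding slide. Only the date mappings come from the slides. The rule that each chief predicate maps to itself and the rule that a table is named by appending `kb` to the chief predicate are assumptions made for illustration.

```python
# Chief predicates listed from highest to lowest priority.
PRIORITY = ["directory", "file", "date", "host", "user", "size"]

# Predicate -> chief predicate. The date entries are from the slides;
# mapping each chief predicate to itself is an assumption.
CHIEF_OF = {name: name for name in PRIORITY}
CHIEF_OF.update({"day": "date", "month": "date", "year": "date", "time": "date"})


def chief_predicate(predicates):
    """Return the highest-priority chief predicate present, or None."""
    chiefs = {CHIEF_OF[p] for p in predicates if p in CHIEF_OF}
    for name in PRIORITY:
        if name in chiefs:
            return name
    return None


def select_table(predicates):
    """Name the knowledge-base table for the sentence, e.g. filekb."""
    chief = chief_predicate(predicates)
    return None if chief is None else chief + "kb"


print(select_table(["size", "file"]))  # filekb
print(select_table(["time"]))          # datekb
```

With both size and file present, file wins because it ranks higher, which matches the worked example where the chief predicate for "the size of the file Test.txt" is file.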
  • Predicates, Operations and Constants
    • The highest-priority chief predicate determines the category of command that the sentence refers to.
    • The occurrence of specific predicates in the relevant category is enough to determine the command to be selected.
    • If the predicates alone are not sufficient, the Operations can be used to decide the right command.
  • Predicates, Operations and Constants
    • Having selected the command, the Constants serve the purpose of providing arguments to the command.
    • Also, in several cases, they give additional specification for the command, and help in choosing the appropriate options to supplement the command.
  • Example
    • Input: Show me the size of the file Test.txt in bytes
    • Output:
    • Chief Predicate: file
    • Constants: Test.txt, bytes
    • Command chosen using the chief predicate: du
    • Option chosen using the constant: -b
    • Whole command: du -b Test.txt
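
The example above can be reproduced with a small lookup. The layout of `FILEKB` is hypothetical, since the slides do not show how filekb is stored. Only the `du` command and its `-b` option come from the slide. The command is returned as an argument list and is not executed here.

```python
# Hypothetical shape for one filekb entry: predicate -> command and options.
FILEKB = {
    "size": {"command": "du", "options": {"bytes": "-b"}},
}


def build_command(predicates, constants, table=FILEKB):
    """Pick a command from the predicates, then split constants into options and arguments."""
    for predicate in predicates:
        entry = table.get(predicate)
        if entry is None:
            continue
        options = [entry["options"][c] for c in constants if c in entry["options"]]
        arguments = [c for c in constants if c not in entry["options"]]
        return [entry["command"], *options, *arguments]
    return None


argv = build_command(["size", "file"], ["Test.txt", "bytes"])
print(" ".join(argv))  # du -b Test.txt
```

Returning a list rather than a single string keeps a file name that contains spaces as one argument if the command is later passed to something like `subprocess.run`.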
  • Results
    • A black-box approach through observation is used for evaluating the response quality of the system.
    • It requires a set of sentences that can sufficiently examine the response generation strength of the system under evaluation.
    • Around 50 sentences of varying structure were given to the system by a layperson who was not familiar with the machine commands.
    • These sentences were used to probe the system and the actual response was gathered for the analysis.
  • The response analysis of four different types of structures
    • Sentence Type | Number of Sentences | Number of Correct Responses | Number of Wrong Responses
    • Declarative | 14 | 14 | 0
    • Imperative | 16 | 14 | 2
    • Yes-No Question | 8 | 7 | 1
    • Wh-question | 12 | 12 | 0
  • Results
    • Precision, the most enduring metric of performance, is applied to this natural language understanding system
    • It measures the extent to which the system produced only the appropriate output.
    • The precision of the system was found to be 94% (47 correct responses out of 50).
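
The reported figure can be checked from the response analysis table, treating precision as correct responses divided by all responses, which is the reading that yields 94%. The dictionary layout and function name are illustrative.

```python
# Sentence type -> (number of sentences, number of correct responses),
# taken from the response analysis table.
RESULTS = {
    "Declarative": (14, 14),
    "Imperative": (16, 14),
    "Yes-No Question": (8, 7),
    "Wh-question": (12, 12),
}


def precision(results):
    """Fraction of all test sentences that received a correct response."""
    total = sum(sentences for sentences, _ in results.values())
    correct = sum(right for _, right in results.values())
    return correct / total


print(f"{precision(RESULTS):.0%}")  # 94%
```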
  • Conclusion
    • NLP techniques are used to build a natural language understanding system that processes natural language input and relates linguistic forms to meaning, extracting the command needed to carry out the required action.
    • Implemented using different modules corresponding to different levels of knowledge of language
    • Tested for varying structures of sentences.
    • The performance of the system was evaluated; it is capable of generating answers in response to different forms of sentence structures
    • Thank you