A Vietnamese Text-based Conversational Agent
Nguyen Quoc Dai
Faculty of Information Technology
University of Engineering and Technology
Vietnam National University, Hanoi
Supervised by
Dr. Pham Bao Son
A thesis submitted in fulfillment of the requirements
for the degree of
Master of Science in Computer Science
November 2011
ORIGINALITY STATEMENT
‘I hereby declare that this submission is my own work and to the best of my knowledge
it contains no materials previously published or written by another person, or substantial
proportions of material which have been accepted for the award of any other degree
or diploma at University of Engineering and Technology (UET/Coltech) or any other
educational institution, except where due acknowledgement is made in the thesis. Any
contribution made to the research by others, with whom I have worked at UET/Coltech
or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual
content of this thesis is the product of my own work, except to the extent that assistance
from others in the project’s design and conception or in style, presentation and linguistic
expression is acknowledged.’
Hanoi, November 23rd, 2011
Signed ........................................................................
ABSTRACT
The first step that a question answering system must perform is to transform
an input question into an intermediate representation. Most published works so far
use rule-based approaches to realize this transformation in question answering
systems. Nevertheless, in existing rule-based approaches, manually creating the rules is
error-prone and expensive in time and effort. In this thesis, we introduce
a rule-based approach that offers an intuitive way to create compact rules for
extracting an intermediate representation of input questions. Experimental results are
promising: our system achieves reasonable performance, and the results demonstrate
that it is straightforward to adapt to new domains and languages.
More importantly, this thesis introduces an architecture for a Vietnamese text-based
conversational agent on a specific knowledge domain, integrated into a question
answering system. When the question answering system fails to provide an answer to
the user’s input, our conversational agent can step in and interact with the user to
provide one. Experimental results are promising: our Vietnamese text-based
conversational agent received positive feedback in a study conducted in the university
academic regulation domain.
Publications:
• Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. A Vietnamese Text-based
Conversational Agent. In Proc. of the 25th International Conference on Industrial, Engineering & Other
Applications of Applied Intelligent Systems (IEA/AIE 2012), Springer-Verlag LNAI, pp. 699-708.
• Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. A Semantic Approach for
Question Analysis. In Proc. of the 25th International Conference on Industrial, Engineering & Other
Applications of Applied Intelligent Systems (IEA/AIE 2012), Springer-Verlag LNAI, pp. 156-165.
• Dat Quoc Nguyen, Dai Quoc Nguyen and Son Bao Pham. Systematic Knowledge Acquisition
for Question Analysis. In Proc. of the 8th International Conference on Recent Advances in Natural
Language Processing (RANLP 2011), ACL Anthology, pp. 406-412.
• Dai Quoc Nguyen, Dat Quoc Nguyen, Khoi Trong Ma and Son Bao Pham. Automatic
Ontology Construction from Vietnamese text. In Proc. of the 7th International Conference on
Natural Language Processing and Knowledge Engineering (NLPKE’11), IEEE, pp. 485-488.
• Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham and Dang Duc Pham. Ripple Down
Rules for Part-Of-Speech Tagging. In Proc. of the 12th International Conference on Intelligent Text
Processing and Computational Linguistics (CICLING 2011), Springer-Verlag LNCS, part I, pp.
190-201.
• Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. A Vietnamese question answering
system. In Proc. of the 2009 International Conference on Knowledge and Systems
Engineering (KSE 2009), IEEE CS, pp. 26-32.
ACKNOWLEDGEMENTS
First and foremost, I would like to express my deepest gratitude to my supervisor,
Dr. Pham Bao Son, for his patient guidance and continuous support throughout the
years. He always appears when I need help, and responds to queries so helpfully and
promptly.
I would like to give my honest appreciation to my younger brother, Nguyen Quoc
Dat, for his great support.
I would like to specially thank Prof. Bui The Duy and my colleagues for their help
through my time at Human Machine Interaction Laboratory, UET/Coltech.
I sincerely acknowledge the Vietnam National University, Hanoi, the Toshiba Foundation
Scholarship, and especially Dr. Pham Bao Son for financially supporting my
master’s studies.
Finally, this thesis would not have been possible without the support and love of
my mother and my father. Thank you!
To my family ♥
Table of Contents
1 Introduction
1.1 A Semantic Approach for Question Analysis
1.2 A Vietnamese Text-based Conversational Agent
1.3 Thesis Organisation
2 Literature review
2.1 Text-based conversational agents
2.1.1 Using keywords for pattern matching
2.1.2 Using the sentence similarity measure for pattern matching
2.2 FrameScript Scripting Language
2.3 Question answering systems
3 Our Question Answering System Architecture
3.1 Vietnamese Question Answering System
3.1.1 Natural language question analysis component
3.1.1.1 Intermediate representation of an input question
3.1.1.2 Question analysis
3.1.2 Answer retrieval component
3.2 Using FrameScript for question analysis
3.2.1 Preprocessing module
3.2.2 Syntactic analysis module
3.2.3 Semantic analysis module
4 Text-based Conversational Agent for Vietnamese
4.1 Overview of architecture
4.2 Determining separate contexts
4.3 Identifying hierarchical contexts
5 Evaluation and Discussion
5.1 Experimental results for Vietnamese text-based conversational agent
5.2 Question Analysis for English
5.3 Discussion
6 Conclusion
A Scripting patterns for English question analysis
B Definitions of question-class types
C Definitions of question-structures
List of Figures
2.1 O’Shea et al.’s conversational agent framework.
2.2 AquaLog’s architecture.
3.1 Architecture of our question answering system.
3.2 Architecture of the natural language question analysis component using FrameScript.
4.1 Architecture of our Vietnamese text-based conversational agent.
List of Tables
4.1 Script examples of “subjects”
4.2 Transformations between contexts
4.3 Order of transformation rules
4.4 Ordered transformation between contexts
5.1 List of transformations among contexts
5.2 Unsatisfying analysis
5.3 The satisfied degree of students
5.4 Number of rules corresponding with each question-structure type
5.5 Number of rules with conditional responses
5.6 Number of questions corresponding with each question-structure type
5.7 Error results
List of Abbreviations
CA Conversational Agent
QA Question Answering
IR Information Retrieval
IE Information Extraction
GATE General Architecture for Text Engineering
JAPE Java Annotation Patterns Engine
NLIDB Natural Language Interface to Database
POS Part-of-Speech
NLP Natural Language Processing
GUI Graphical User Interface
Chapter 1
Introduction
1.1 A Semantic Approach for Question Analysis
The goal of question answering systems is to give answers to the user’s questions
instead of ranked lists of related documents as used by most current search engines
(Hirschman and Gaizauskas, 2001). The natural language question analysis component
is the first component in any question answering system. This component creates
an intermediate representation of the input question, which is expressed in natural
language, to be utilized in the rest of the system.
To the best of our knowledge, most published works so far use rule-based approaches
for the task of translating a natural language question into an explicit intermediate
representation in question answering systems. Some question
answering systems, such as (Lopez et al., 2007; Phan and Nguyen, 2010), manually
define a list of sequence rule structures to analyze questions. However, in these
rule-based approaches, manually creating the rules is error-prone and expensive in
time and effort.
In this thesis, we present an approach that returns an intermediate representation
of a question via the FrameScript scripting language (McGill et al., 2003). Natural
language questions are transformed into intermediate representation elements, which
include the construction type of the question, the question class, the keywords in the
question and the semantic constraints between them. FrameScript allows users to
intuitively write rules that directly extract the output tuple.
1.2 A Vietnamese Text-based Conversational Agent
A text-based conversational agent is a program that allows conversational
interaction between human and machine using natural language through text. The
text-based conversational agent uses scripts organized into contexts comprising
hierarchically constructed rules. The rules consist of patterns and associated responses:
the input is matched against the patterns and the corresponding responses are
sent to the user as output.
We focus on the analysis of input text in building a conversational agent.
Recently, the analysis of the user’s statements has been developed following two
main approaches to pattern matching: using keywords (ELIZA (Weizenbaum, 1983), ALICE (Wallace,
2001), ProBot (Sammut, 2001)) and using similarity measures (O’Shea et al., 2010;
Graesser et al., 2004; Traum, 2006). The keyword-based approaches
usually utilize a scripting language to match the input statements, while
the other approaches measure the similarity between the statements and patterns
from the agent’s scripts.
In this thesis, we introduce a Vietnamese text-based conversational agent
architecture on a specific knowledge domain. Our system aims to direct the user’s
statement into an appropriate context. The contexts are structured in a hierarchy of
scripts consisting of rules in the FrameScript language (McGill et al., 2003). In addition,
our text-based conversational agent was constructed to be integrated into a Vietnamese
question answering system. Our conversational agent not only provides information
related to the user’s statement but also provides the knowledge needed to support our
question answering system when it is unable to find an answer.
The knowledge domain we used to build our text-based conversational agent is
the academic regulation at Vietnam National University, Hanoi (VNU). The
academic regulation book helps students to learn about the course programs, the
examination regulations, the disciplinary rules at VNU, and so on. However, most
students do not like reading the academic regulation book. Therefore, our contribution
creates an interaction channel that offers the necessary information to students. Once
students state what they are interested in within the academic regulation, our text-based
conversational agent responds to these statements by providing the related information in
detail. Furthermore, our conversational agent also interacts with students by asking
whether they want to know other information.
1.3 Thesis Organisation
This dissertation consists of 6 chapters. In chapter 2, we review the literature.
In chapter 3, we describe our Vietnamese question answering system architecture, in which
we present a method for converting a natural language question into an intermediate
representation. We propose our Vietnamese text-based conversational
agent architecture in chapter 4. We describe our experiments and discussions in
chapter 5, and present our conclusion in chapter 6.
Chapter 2
Literature review
In this chapter, we review related works using text-based approaches for
conversational agents (CA). Section 2.1 describes approaches that construct rules to match
the user’s natural language utterances using keywords (section 2.1.1)
and using a sentence similarity measure (section 2.1.2). In addition, section 2.2
covers the background on the FrameScript scripting language that
we have been working with, while section 2.3 reviews
domain-specific question answering systems.
2.1 Text-based conversational agents
2.1.1 Using keywords for pattern matching
ELIZA (Weizenbaum, 1983) was one of the earliest text-based conversational agents.
It is based on simple pattern matching that identifies keywords in the
user’s statement. ELIZA then selects an appropriate transformation rule for the
statement and generates an output response. The procedure by which ELIZA responds to a user
input consists of five steps.
• Identify the important keywords appearing in the user’s statement.
• Define some minimal context within which the selected keyword occurs.
• Determine an appropriate transformation rule.
• Generate responses when the input text contains no keywords.
• Provide a facility for editing scripts at the script-writing level.
Transformation rules decompose a data string according to
certain criteria and reassemble the decomposed string according to certain assembly
specifications. The input is therefore analyzed with the decomposition rules
triggered by keywords, and responses are generated with the reassembly rules
associated with the selected decomposition rules. For example, given the input
sentence:
“It seems that you like me”
this sentence is decomposed into the four parts:
(1) It seems that (2) you (3) like (4) me
by using the decomposition rule:
(0 YOU 1 ME)
The associated response might then be:
“What makes you think I like you”
by using the reassembly rule:
(WHAT MAKES YOU THINK I 3 YOU)
An integer 0 in a decomposition rule matches any number of words, and a non-zero integer
“n” appearing in a decomposition rule indicates that exactly “n” words will be
matched, while the integer 3 in the above reassembly rule shows that the third part
of the decomposed sentence is inserted in its place in the reply. The input sentence is
scanned from left to right and each word is looked up in a dictionary of keywords; once a
keyword is found, only the decomposition rules associated with that keyword need to be tried.
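The decomposition and reassembly steps above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not Weizenbaum's implementation: the function names `decompose` and `reassemble` and the list encoding of rules are hypothetical, and keyword lookup and pronoun transformation are omitted.

```python
def decompose(rule, words):
    """Split `words` according to a decomposition rule.

    `rule` is a list whose items are either integers (0 = any number of
    words, n > 0 = exactly n words) or upper-case keywords.
    Returns the list of matched parts, or None if the rule does not match.
    """
    if not rule:
        return [] if not words else None
    head, rest = rule[0], rule[1:]
    if isinstance(head, int):
        lengths = range(len(words) + 1) if head == 0 else [head]
        for n in lengths:
            if n > len(words):
                continue
            tail = decompose(rest, words[n:])
            if tail is not None:
                return [" ".join(words[:n])] + tail
        return None
    if words and words[0].upper() == head:
        tail = decompose(rest, words[1:])
        if tail is not None:
            return [words[0]] + tail
    return None


def reassemble(rule, parts):
    """Build a reply: integers in `rule` are 1-based indexes into `parts`."""
    return " ".join(parts[t - 1] if isinstance(t, int) else t for t in rule)


parts = decompose([0, "YOU", 1, "ME"], "It seems that you like me".split())
reply = reassemble(["WHAT", "MAKES", "YOU", "THINK", "I", 3, "YOU"], parts)
print(parts)  # ['It seems that', 'you', 'like', 'me']
print(reply)  # WHAT MAKES YOU THINK I like YOU
```

The recursive matcher tries the shortest span first for a 0 entry, which is one possible choice; the source does not specify how ambiguity is resolved.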
An ELIZA script consists mainly of a set of list structures of the following form:
(K ((D1) (R1, 1) (R1, 2) ... (R1, m1))
((D2) (R2, 1) (R2, 2) ... (R2, m2))
.
.
.
((Dn) (Rn, 1) (Rn, 2) ... (Rn, mn)))
where K is the keyword, Di is the ith decomposition rule associated with K, and Ri, j
is the jth reassembly rule associated with the ith decomposition rule. Any number
of decomposition rules may be associated with a given keyword, and any number of
reassembly rules with any specific decomposition rule, because the script format imposes
no predetermined limitations.
ALICE (Wallace, 2001) is a text-based conversational agent (a chat robot) that uses
an XML language called Artificial Intelligence Markup Language (AIML).
AIML files consist of category tags representing rules; each category tag contains a
pattern tag and a template tag. All categories are stored in a tree. The system
searches the tree for the pattern that matches a user input using depth-first search,
and produces the associated template as a response. For example, consider the category
below:
<category>
<pattern>DO YOU KNOW WHO * IS?</pattern>
<template><srai>WHO IS <star/></srai></template>
</category>
AIML uses the * wild-card character in patterns to match one or more
words. When an input matches this pattern, the portion bound to the
* wild-card may be placed into the response with the <star/> markup. The above
category reduces any input of the form “Do you know who X is?” to “Who is X”.
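The reduction performed by this category can be imitated with a short Python sketch. It is an illustration under stated assumptions, not an AIML interpreter: categories are scanned linearly rather than stored in a tree, the second category and its reply text are hypothetical, and only a single * wild-card per pattern is supported.

```python
import re

# (pattern) -> (kind, template); "srai" means the filled template is fed
# back in as a new input, "text" means it is returned as the reply.
CATEGORIES = {
    "DO YOU KNOW WHO * IS": ("srai", "WHO IS <star/>"),
    "WHO IS *": ("text", "I do not know who <star/> is."),  # hypothetical category
}


def normalize(text):
    """Upper-case the input and drop punctuation."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).upper().split())


def pattern_to_regex(pattern):
    """Turn an AIML-like pattern into a regex; * binds one or more words."""
    parts = [r"(\S+(?: \S+)*)" if tok == "*" else re.escape(tok)
             for tok in pattern.split()]
    return re.compile("^" + " ".join(parts) + "$")


def respond(text):
    sentence = normalize(text)
    for pattern, (kind, template) in CATEGORIES.items():
        match = pattern_to_regex(pattern).match(sentence)
        if match:
            filled = template.replace("<star/>", match.group(1))
            return respond(filled) if kind == "srai" else filled
    return None


print(respond("Do you know who Alan Turing is?"))
# I do not know who ALAN TURING is.
```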
AIML allows two types of optional context, called “that” and “topic”. The that
tag appears inside a category and matches the robot’s previous utterance, while the
topic tag appears outside the category and groups a set of categories together;
the topic may be set inside any template. Consider a sample topic:
<topic name="MOVIES">
<category>
<pattern>YES</pattern>
<that>DO YOU LIKE ROMANTIC MOVIES</that>
<template>What is your favourite romantic movie?</template>
</category>
<category>
<pattern>YES</pattern>
<that>DO YOU LIKE ACTION MOVIES</that>
<template>What is your favourite action movie?</template>
</category>
</topic>
When the client says yes, the program must look up the robot’s previous utterance.
If the robot asked “Do you like romantic movies?”, the response is
“What is your favourite romantic movie?”.
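A minimal Python sketch of this lookup is given below. It only illustrates the idea of conditioning a category on the previous utterance; the tuple encoding and the function names are our own assumptions, and topic handling is left out.

```python
# Each entry is (pattern, that, template), mirroring the two categories above.
CATEGORIES = [
    ("YES", "DO YOU LIKE ROMANTIC MOVIES", "What is your favourite romantic movie?"),
    ("YES", "DO YOU LIKE ACTION MOVIES", "What is your favourite action movie?"),
]


def normalize(text):
    """Upper-case the text and drop punctuation."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(kept.upper().split())


def respond(user_input, previous_bot_utterance):
    """Pick the template whose pattern and 'that' context both match."""
    said = normalize(user_input)
    that = normalize(previous_bot_utterance)
    for pattern, expected_that, template in CATEGORIES:
        if said == pattern and that == expected_that:
            return template
    return None


print(respond("yes", "Do you like romantic movies?"))
# What is your favourite romantic movie?
```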
AIML is clever, simple and easy to implement, and a good starting point for
beginners writing simple bots. However, it is difficult to write and debug more
discriminating patterns, and it is very hard to know all the transformations available
because AIML depends on self-modifying the input.
Sammut (Sammut, 2001) presented a text-based CA called ProBot that is able
to extract data from users. ProBot’s scripts are typically organized into hierarchical
contexts, each consisting of a number of organized rules, to handle unexpected inputs.
Building on ProBot’s scripts (Sammut, 2001), McGill et al. (McGill et al., 2003)
built the rule system of the FrameScript scripting language (see section 2.2).
FrameScript (McGill et al., 2003) provides for the rapid prototyping of conversational
interfaces and simplifies the writing of scripts.
2.1.2 Using the sentence similarity measure for pattern matching
O’Shea et al. (O’Shea et al., 2008, 2010) proposed a text-based conversational agent
framework (shown in figure 2.1) using semantic analysis. All patterns in the scripts are
natural language sentences. Pattern matching uses a sentence similarity
measure (Li et al., 2006) to calculate the similarity between the sentences from the scripts
and the user input. The highest ranked sentence is selected and its associated response
is sent as output.
Figure 2.1: O’Shea et al.’s conversational agent framework.
Scripts used in the framework consist of contexts, each relating to a specific topic of
conversation. Each context contains one or more rules, and each rule uses “s” to represent
a natural language sentence and “r” to represent a response statement. For example,
consider the following rule:
<Rule_01>
s: I’m a student
r: Which university do you study?
With a user’s statement:
“I am a master student” or
“I am a phd student”
The input and the natural language sentences from the scripts are sent to the
sentence similarity measure, which calculates
a firing strength for each sentence pair in order to rank the sentences. In the above example,
the highest ranked sentence is “I’m a student” and its associated response
sent to the user is “Which university do you study?”.
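The ranking step can be sketched in Python as follows. Note the assumptions: the similarity function here is a plain Jaccard word overlap used as a stand-in, not the semantic measure of Li et al. (2006), and the second rule is a hypothetical distractor added only so that there is something to rank.

```python
RULES = [
    ("I'm a student", "Which university do you study?"),
    ("I work at a company", "What do you do there?"),  # hypothetical second rule
]


def tokens(text):
    """Lower-case, expand the contraction used in the example, split on non-letters."""
    text = text.lower().replace("i'm", "i am")
    return set("".join(ch if ch.isalnum() else " " for ch in text).split())


def similarity(a, b):
    """Jaccard overlap of word sets; a stand-in for a semantic similarity measure."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def respond(user_input):
    """Return (best sentence, its response, firing strength)."""
    scored = [(similarity(user_input, s), s, r) for s, r in RULES]
    score, sentence, response = max(scored, key=lambda item: item[0])
    return sentence, response, score


print(respond("I am a master student"))
# ("I'm a student", 'Which university do you study?', 0.8)
```

A real system would also apply a minimum firing strength below which no rule fires; the source does not state such a threshold, so none is assumed here.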
The advantage of using a sentence similarity measure for pattern matching is
that rule structures are simplified and reduced in size and complexity. On the other hand,
this approach cannot extract information from the input and insert it into the response,
as the keyword-based approaches presented in section 2.1.1 can.
Graesser et al. (Graesser et al., 2004) presented a conversational agent called
AUTOTUTOR that matches input statements using Latent Semantic Analysis.
Traum (Traum, 2006) adapted the effective question answering characters of (Leuski
et al., 2006) to build a conversational agent that also employs Latent Semantic Analysis
for pattern matching.
2.2 FrameScript Scripting Language
FrameScript (McGill et al., 2003) is a language for creating multi-modal user
interfaces. It builds on Sammut’s ProBot (Sammut, 2001) to enable rule-based
programming, frame representations and simple function evaluation. The FrameScript
scripting language also provides a set of tools for representing knowledge and
interacting with users and external devices.
Each script in FrameScript (McGill et al., 2003) includes a list of rules matched
against user input and used to give the appropriate response. Rules are grouped into
particular contexts of the form: context_name :: rule_set. The scripting rules in the
FrameScript language consist of patterns and responses with the form:
pattern ==> response.
Pattern expressions may contain two wild-card characters, * and ~. * matches
0 or more words, and ~ within a word indicates that 0 or more characters may
be matched. Pattern expressions also allow alternatives, written in
the form:
{ alternative 1 | alternative 2 | ... }.
Moreover, patterns use non-terminals to reuse other pattern expressions by writing
the name of the non-terminal surrounded by ‘<’ and ‘>’. Non-terminals are often
declared as a list of alternatives followed by ;;.
For example:
Number:: ==>
{1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
| 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25};;
Response expressions come in two types: sequences and alternatives.
A sequence of responses is surrounded by brackets:
[response 1 | response 2 | ... | another response],
where each response is given in turn every time the pattern is matched, and the
sequence repeats after the last response is output. Alternatives are
surrounded by braces:
{response 1 | response 2 | ... | another response},
in which any response may be chosen randomly for user output.
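The difference between the two response types can be sketched in Python. The class names `Sequence` and `Alternatives` are hypothetical and only model the selection behaviour described above, not FrameScript's actual implementation.

```python
import itertools
import random


class Sequence:
    """[r1 | r2 | ...]: responses are given in turn and wrap around."""

    def __init__(self, responses):
        self._cycle = itertools.cycle(list(responses))

    def next_response(self):
        return next(self._cycle)


class Alternatives:
    """{r1 | r2 | ...}: any response may be chosen at random."""

    def __init__(self, responses, rng=None):
        self._responses = list(responses)
        self._rng = rng or random.Random()

    def next_response(self):
        return self._rng.choice(self._responses)


greeting = Sequence(["Hello.", "Hello again.", "We keep meeting."])
print([greeting.next_response() for _ in range(4)])
# ['Hello.', 'Hello again.', 'We keep meeting.', 'Hello.']
```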
In addition, responses use ‘#’ to perform an action such as changing
the current context. For example, #goto(a_script) transfers a conversation or
interaction from one context to another. Similarly, ‘^’ is used to perform actions,
except that the value of the evaluated expression is inserted into the response rather
than thrown away. A response expression may also depend on a condition
holding true, in the form below:
* ==>
[ ^(condition) -> response if true
| ^(! condition) -> response if false ]
Furthermore, some pattern elements create a numbered match component when
a pattern matches. These components are segments of the input that can be referred
to in a response using ‘^’. Pattern elements that create match components are
wild-cards (* and ~), alternatives and non-terminals. When ‘^’ is followed by an
integer, the match component with that number is placed
in the output response. Consider the following example:
{My name is | I’m} * ==>
[ Hello ^2. How old are you? ]
I am <Number> years old ==>
[ ^(^1 <= 20) -> Are you a student?
| How do you do? ]
The following dialogue transcript illustrates the above example:
User: My name is X
CA: Hello X. How old are you?
User: I am 19 years old
CA: Are you a student?
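The behaviour of these two rules can be approximated in Python with regular expressions. This is a rough emulation under our own assumptions rather than FrameScript semantics: capture groups play the role of numbered match components, `\d+` stands in for the <Number> non-terminal (and therefore accepts more values than the declared list), and the conditional response is read as a simple if/else.

```python
import re

RULES = [
    # {My name is | I'm} * ==> [ Hello ^2. How old are you? ]
    # group 1 = the alternative, group 2 = the wild-card
    (re.compile(r"^(My name is|I'm) (.+)$", re.IGNORECASE),
     lambda m: "Hello %s. How old are you?" % m.group(2)),
    # I am <Number> years old ==> [ ^(^1 <= 20) -> Are you a student? | How do you do? ]
    (re.compile(r"^I am (\d+) years old$", re.IGNORECASE),
     lambda m: "Are you a student?" if int(m.group(1)) <= 20 else "How do you do?"),
]


def respond(text):
    """Try the rules from top to bottom and return the first response."""
    for pattern, action in RULES:
        match = pattern.match(text.strip())
        if match:
            return action(match)
    return None


print(respond("My name is X"))       # Hello X. How old are you?
print(respond("I am 19 years old"))  # Are you a student?
```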
An input received from the user is given to a domain in order to ensure that the input
is matched against the correct scripts. A script can be registered as a topic in a domain
so that it can become the current script and process the input. When a script is registered as
a topic, the domain uses the script’s trigger to determine whether or not an input
activates that topic. If a topic does not have a trigger, any input will activate it.
When a topic’s trigger matches the input, the topic becomes the current context and the
current topic.
Example ::
domain example
trigger{* {Hi | hi | Hello | hello} *}
* {Hi | hi | Hello | hello} * ==> [Hi there!]
When writing complex scripts in which several scripts have similar behaviours, FrameScript
makes it possible to use inheritance so that rules can be shared between scripts. Moreover,
FrameScript allows failsafes to be defined for scripts. A failsafe is another script whose
rules are used if an input fails to match any of the rules of a script.
The order in which a domain searches for rules to match the input
against is:
1. triggers of the topics
2. the current context
3. the failsafe of the current context
4. the current topic
5. the failsafe of the current topic
6. the failsafe for the domain
When an input is compared to the rules of a script, the input is first compared to
the rules specifically defined by the script. If none of these rules match, the input is
matched against the rules of the script’s parents. The rules of the scripts are tried
in top to bottom order.
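The lookup order and the fallback to parent scripts can be sketched in Python. Everything here is an assumption made for illustration: the `Script` and `Domain` classes are hypothetical, keyword containment stands in for FrameScript pattern matching, a matching trigger is taken to switch the current context before rules are tried, and the response texts are placeholders.

```python
class Script:
    def __init__(self, name, rules, parent=None, failsafe=None, trigger=None):
        self.name = name
        self.rules = rules        # list of (keyword, response), tried top to bottom
        self.parent = parent      # rules inherited when own rules do not match
        self.failsafe = failsafe  # script consulted when nothing here matches
        self.trigger = trigger    # keyword that activates this script as a topic

    def match(self, text):
        """Try this script's own rules first, then those of its parents."""
        script = self
        while script is not None:
            for keyword, response in script.rules:
                if keyword in text.lower():
                    return response
            script = script.parent
        return None


class Domain:
    def __init__(self, topics, failsafe, current):
        self.topics = topics
        self.failsafe = failsafe
        self.context = current
        self.topic = current

    def respond(self, text):
        # 1. triggers of the topics
        for topic in self.topics:
            if topic.trigger and topic.trigger in text.lower():
                self.context = self.topic = topic
                break
        # 2-6. current context, its failsafe, current topic, its failsafe, domain failsafe
        candidates = [self.context, self.context.failsafe,
                      self.topic, self.topic.failsafe, self.failsafe]
        for script in candidates:
            if script is not None:
                answer = script.match(text)
                if answer is not None:
                    return answer
        return None


base = Script("base", [("bye", "Goodbye!")])
fallback = Script("fallback", [("", "Sorry, I did not understand.")])
greeting = Script("greeting", [("hello", "Hi there!")], parent=base, trigger="hello")
exams = Script("exams", [("exam", "(placeholder answer about exams)")],
               parent=base, trigger="exam")
domain = Domain([greeting, exams], fallback, greeting)

print(domain.respond("Hello"))              # Hi there!
print(domain.respond("When is the exam?"))  # (placeholder answer about exams)
print(domain.respond("bye"))                # Goodbye!
print(domain.respond("what?"))              # Sorry, I did not understand.
```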
2.3 Question answering systems
Question answering systems range from closed-domain systems (aiming to
answer questions in a specific domain) to open-domain systems (aiming to answer
any question asked). In our experience, open-domain systems focus on retrieving
and ranking related documents corresponding to the input, while closed-domain
systems focus on analyzing natural language questions to extract reliable terms.
Additionally, the natural language question analysis component is the first
component in any question answering system. This component creates an intermediate
representation of the input question, which is expressed in natural language, to be
utilized in the rest of the system. The basis of the question parser is question
classification, which can be defined as the task of mapping a given question to one of
k classes based on the possible types of the answers (Li and Roth, 2002b).
Subsequently, natural language question analysis techniques are used to identify keywords
and semantic relations in input questions.
Therefore, our review of related work examines question answering systems
according to their question analysis approaches, focusing on domain-specific systems.
Pattern-matching based systems
Closed-domain question answering systems are usually linked to relational databases
and are called natural language interfaces to databases. A natural language interface to
a database (NLIDB) is a system that allows users to access information stored in
a database by typing questions using natural language expressions (Androutsopoulos
et al., 1995).
Early NLIDB systems used pattern-matching techniques to process the user’s
questions and generate corresponding answers. (Sneiders, 2002) presented an NLIDB
system using question patterns that cover the conceptual model of the database. The
input is converted into an SQL query using defined templates that contain entity
slots, i.e. free space for data instances representing the primary concepts of the
question. Some open-domain systems presented in (Wu et al., 2003; Saxena et al.,
2007) also used pattern-matching techniques to respond to the user’s requests.
The main advantage of the pattern-matching approach is its simplicity, and such
systems can perform well in certain applications. However, the shallowness of the
approach often leads to bad results.
Semantic-based systems
Later NLIDBs respond to the user’s question by using a semantic grammar to parse the
input into a syntax tree and mapping the tree to a database query. In semantic-based
systems, the grammar’s categories (i.e. the non-leaf nodes appearing in the parse
tree) do not have to correspond to syntactic concepts (Androutsopoulos et al., 1995).
Semantic constraints are usually enforced through the choice of semantic grammar categories,
and the grammar’s categories can also be chosen to ease the mapping from the
syntax tree to database objects.
Nguyen and Le (Nguyen and Le, 2008) introduced an NLIDB question answering
system for Vietnamese employing semantic grammars. Their system includes two
main modules: QTRAN and TGEN. QTRAN (Query Translator) maps a natural
language question to an SQL query, while TGEN (Text Generator) generates answers
based on the query result tables. QTRAN uses limited context-free grammars to
parse the user’s question into a syntax tree via the CYK algorithm. The syntax tree is
then converted into an SQL query by using a mapping dictionary to determine
names of attributes in Vietnamese, names of attributes in the database and names
of individuals stored in these attributes.
The PRECISE system (Popescu et al., 2003) maps the natural language question
to a unique semantic interpretation by analyzing some lexicons and semantic
constraints. (Stratica et al., 2003) described a template-based system that translates
an English question into an SQL query by matching the syntactic parse of the question to
a set of fixed semantic templates. Other systems based on semantic grammar
rules include Planes (Waltz, 1978) and Eufid (Templeton and Burger, 1983). Semantic
grammar-based approaches were considered an engineering methodology that
allows semantic knowledge to be easily included in the system.
Annotation-based systems
Recently, some question answering systems that use semantic annotations have
achieved good results in natural language question analysis. A well-known
annotation-based framework is GATE (General Architecture for Text Engineering)
(Cunningham et al., 2002), which has been used in many question answering systems, such as
the ontology-based AquaLog (Lopez et al., 2007) and QuestIO (Damljanovic et al.,
2008) systems and Galea’s open-domain system (Galea, 2003), especially for the
natural language question analysis component.
AquaLog (Lopez et al., 2007), shown in figure 2.2, is an ontology-based question
answering system for English and is the basis for the development of our system. A
natural language question is mapped to an intermediate triple-based representation,
called a Query-Triple, by the Linguistic Component
using Java Annotation Patterns Engine (JAPE) grammars in GATE (Cunningham
et al., 2002). The Relation Similarity Service takes a Query-Triple and processes
it into a query expressed in terms of the input ontology, called an Onto-Triple.
AquaLog then uses the Onto-Triple to return an answer to the user.
Figure 2.2: AquaLog’s architecture.
In our previous work, we reported an approach that converts Vietnamese natural
language questions into intermediate representation elements in the form of query-tuples
(Question-structure, Question-class, Term1, Relation, Term2, Term3) based on semantic
annotations via JAPE grammars (Nguyen et al., 2009). The selected query-tuple type is
more complex, aiming to cover a wider variety of question types in different languages.
In addition, we proposed a language-independent approach to acquiring JAPE rules
in a systematic manner, which avoids unintended interaction among rules (Nguyen
et al., 2011). (Phan and Nguyen, 2010) presented an approach that syntactically and
semantically maps Vietnamese questions into triple-like structures of Subject, Verb and Object,
also using JAPE grammars.
The START question answering system (Katz, 1997; Katz et al., 2006) also uses
natural language annotations (Katz, 1997), but without GATE. The lexical
database WordNet (Fellbaum, 1998) is an important resource for natural language
applications. Since WordNet appeared, almost all question answering systems have
used it to provide information for analyzing questions.
Chapter 3
Our Question Answering System Architecture
In this chapter, we give an overview of our first ontology-based question answering
system for Vietnamese (section 3.1). The system contains a front-end component that
performs syntactic and semantic analysis on natural language questions. The back-end
component is responsible for making sense of the user’s query with respect to a target
ontology, using various concept-matching techniques between a natural language
phrase and elements in the ontology. The front-end and back-end communicate through
an intermediate representation of the question, which captures the semantic structure
of the user’s question.
We then describe a rule-based approach that directly extracts the elements of the
intermediate representation of a question using the FrameScript scripting language
(McGill et al., 2003) (section 3.2).
3.1 Vietnamese Question Answering System
The architecture of our question answering system is shown in figure 3.1. It includes
two components: natural language question analysis and answer retrieval.
The question analysis component takes the user’s question as input and returns
a query-tuple representing the question in a compact form. The role of this
intermediate representation is to provide structured information about the input
question for later processing, such as retrieving answers.
The answer retrieval component includes two main modules: Ontology mapping
and Answer extraction. It takes an intermediate representation produced by the
question analysis component and an ontology as its input to generate semantic
answers.
Figure 3.1: Architecture of our question answering system.
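The data flow between the two components can be illustrated with a toy sketch. Everything in it is an assumption made for illustration only: the function names, the synonym table, the fact table and the example question are hypothetical, and the canned lookup merely stands in for the rule-based question analysis described in this chapter.

```python
from typing import Dict, List, Optional, Tuple

# Simplified stand-in for the intermediate representation: (term1, relation, term2).
Representation = Tuple[str, str, str]

# Hypothetical data; a real system would consult an actual ontology.
SYNONYMS: Dict[str, str] = {"students": "student", "studies": "study"}
FACTS: Dict[Representation, List[str]] = {("student", "study", "K50"): ["An", "Binh"]}


def analyse_question(question: str) -> Optional[Representation]:
    """Question analysis stand-in: a canned lookup instead of real rules."""
    canned = {"which students study in class k50?": ("students", "study", "K50")}
    return canned.get(question.strip().lower())


def ontology_mapping(rep: Representation) -> Representation:
    """Map surface terms to ontology element names via the synonym table."""
    term1, relation, term2 = rep
    return (SYNONYMS.get(term1, term1), SYNONYMS.get(relation, relation), term2)


def answer_extraction(mapped: Representation) -> List[str]:
    """Look up the instances that satisfy the mapped representation."""
    return FACTS.get(mapped, [])


def answer(question: str) -> List[str]:
    """Run question analysis, then the two answer retrieval modules."""
    rep = analyse_question(question)
    if rep is None:
        return []
    return answer_extraction(ontology_mapping(rep))


print(answer("Which students study in class K50?"))
```

Keeping the intermediate representation as the only value passed between the functions mirrors the separation between question analysis and answer retrieval: either side can be replaced without touching the other.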
3.1.1 Natural language question analysis component
3.1.1.1 Intermediate representation of an input question
The intermediate representation used in our approach aims to cover a wide variety
of question types. It consists of a question-structure and one or more query-tuples in
the following format:
( question-structure, question-class, Term1, Relation, Term2, Term3 )
where Term1 represents a concept (object class); Term2 and Term3, if present,
represent entities (objects); and Relation (property) is a semantic constraint between
the terms in the question. This representation is meant to capture the semantics of the
question.
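As a minimal sketch, the query-tuple format above could be held in a small record type. The class name, the field names and the example values ("Normal", "List", "student", "study", "K50") are hypothetical and chosen only for illustration; they are not taken from the system itself.

```python
from dataclasses import astuple, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QueryTuple:
    """One query-tuple of the intermediate representation (illustrative only)."""

    question_structure: str
    question_class: str
    term1: Optional[str] = None     # concept (object class)
    relation: Optional[str] = None  # semantic constraint (property) between terms
    term2: Optional[str] = None     # entity (object), if present
    term3: Optional[str] = None     # entity (object), if present

    def terms(self) -> List[str]:
        """Return the terms that are filled in, in tuple order."""
        return [t for t in (self.term1, self.term2, self.term3) if t is not None]


# Hypothetical values for a question such as "Which students study in class K50?"
example = QueryTuple(
    question_structure="Normal",
    question_class="List",
    term1="student",
    relation="study",
    term2="K50",
)
print(astuple(example))
print(example.terms())
```

Optional fields reflect that Term2 and Term3 may be absent, and a frozen record keeps each tuple unchanged once question analysis has produced it.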
Simple questions corresponding to basic constructions only have one query-tuple
A Vietnamese Text-based Conversational Agent.pdf

  • 1. A Vietnamese Text-based Conversational Agent Nguyen Quoc Dai Faculty of Information Technology University of Engineering and Technology Vietnam National University, Hanoi Supervised by Dr. Pham Bao Son A thesis submitted in fulfillment of the requirements for the degree of Master of Science in Computer Science November 2011
  • 2.
  • 3. ORIGINALITY STATEMENT ‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substan- tial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.’ Hanoi, November 23rd , 2011 Signed ........................................................................ i
  • 4. ABSTRACT The first step that a question answering system must perform is to transform an input question into an intermediate representation. Most published works so far use rule-based approaches to realize this transformation in question answering sys- tems. Nevertheless, in existing rule-based approaches, manually creating the rules is error-prone and expensive in time and effort. In this thesis, we focus on introduc- ing a rule-based approach that offers an intuitive way to create compact rules for extracting intermediate representation of input questions. Experimental results are promising where our system achieves reasonable performance and demonstrate that it is straightforward to adapt to new domains and languages. More importantly, this thesis introduces a Vietnamese text-based conversational agent architecture on specific knowledge domain which is integrated in a question answer- ing system. When the question answering system fails to provide answers to user input, our conversational agent can step in to interact with users to provide answers to users. Experimental results are promising where our Vietnamese text-based con- versational agent achieves positive feedback in a study conducted in the university academic regulation domain. Publications: ? Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. A Vietnamese Text-based Conver- sational Agent. In Proc. of The 25th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2012), Springer-Verlag LNAI, pp. 699-708. ? Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. A Semantic Approach for Ques- tion Analysis. In Proc. of The 25th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2012), Springer-Verlag LNAI, pp. 156-165. ? Dat Quoc Nguyen, Dai Quoc Nguyen and Son Bao Pham. Systematic Knowledge Acquisition for Question Analysis. In Proc. 
of the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), ACL Anthology, pp. 406-412. ii
  • 5. iii ? Dai Quoc Nguyen, Dat Quoc Nguyen, Khoi Trong Ma and Son Bao Pham. Automatic On- tology Construction from Vietnamese text. In Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE’11), IEEE, pp. 485-488. ? Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham and Dang Duc Pham. Ripple Down Rules for Part-Of-Speech Tagging. In Proc. of 12th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2011), Springer-Verlag LNCS, part I, pp. 190-201. ? Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. A Vietnamese question answering system. In Proceedings of the 2009 International Conference on Knowledge and Systems Engineer- ing (KSE 2009) , IEEE CS, pp. 26–32.
  • 6. ACKNOWLEDGEMENTS
First and foremost, I would like to express my deepest gratitude to my supervisor, Dr. Pham Bao Son, for his patient guidance and continuous support throughout the years. He is always there when I need help, and responds to queries helpfully and promptly.
I would like to give my sincere appreciation to my younger brother, Nguyen Quoc Dat, for his great support.
I would especially like to thank Prof. Bui The Duy and my colleagues for their help during my time at the Human Machine Interaction Laboratory, UET/Coltech.
I sincerely acknowledge the Vietnam National University, Hanoi, the Toshiba Foundation Scholarship, and especially Dr. Pham Bao Son for financially supporting my master’s studies.
Finally, this thesis would not have been possible without the support and love of my mother and my father. Thank you!
  • 7. To my family ♥
  • 8. Table of Contents
1 Introduction
  1.1 A Semantic Approach for Question Analysis
  1.2 A Vietnamese Text-based Conversational Agent
  1.3 Thesis Organisation
2 Literature review
  2.1 Text-based conversational agents
    2.1.1 Using keywords for pattern matching
    2.1.2 Using the sentence similarity measure for pattern matching
  2.2 FrameScript Scripting Language
  2.3 Question answering systems
3 Our Question Answering System Architecture
  3.1 Vietnamese Question Answering System
    3.1.1 Natural language question analysis component
      3.1.1.1 Intermediate representation of an input question
      3.1.1.2 Question analysis
    3.1.2 Answer retrieval component
  3.2 Using FrameScript for question analysis
    3.2.1 Preprocessing module
    3.2.2 Syntactic analysis module
    3.2.3 Semantic analysis module
4 Text-based Conversational Agent for Vietnamese
  4.1 Overview of architecture
  4.2 Determining separate contexts
  4.3 Identifying hierarchical contexts
  • 9.
5 Evaluation and Discussion
  5.1 Experimental results for Vietnamese text-based conversational agent
  5.2 Question Analysis for English
  5.3 Discussion
6 Conclusion
A Scripting patterns for English question analysis
B Definitions of question-class types
C Definitions of question-structures
  • 10. List of Figures
2.1 O’Shea et al.’s conversational agent framework.
2.2 AquaLog’s architecture.
3.1 Architecture of our question answering system.
3.2 Architecture of the natural language question analysis component using FrameScript.
4.1 Architecture of our Vietnamese text-based conversational agent.
  • 11. List of Tables
4.1 Script examples of “subjects”
4.2 Transformations between contexts
4.3 Order of transformation rules
4.4 Ordered transformation between contexts
5.1 List of transformations among contexts
5.2 Unsatisfying analysis
5.3 The satisfied degree of students
5.4 Number of rules corresponding with each question-structure type
5.5 Number of rules with conditional responses
5.6 Number of questions corresponding with each question-structure type
5.7 Error results
  • 12. List of Abbreviations
CA: Conversational Agent
QA: Question Answering
IR: Information Retrieval
IE: Information Extraction
GATE: General Architecture for Text Engineering
JAPE: Java Annotation Patterns Engine
NLIDB: Natural Language Interface to Database
POS: Part-of-Speech
NLP: Natural Language Processing
GUI: Graphical User Interface
  • 13. Chapter 1 Introduction
1.1 A Semantic Approach for Question Analysis
The goal of question answering systems is to give answers to the user’s questions instead of the ranked lists of related documents returned by most current search engines (Hirschman and Gaizauskas, 2001). The natural language question analysis component is the first component in any question answering system. This component creates an intermediate representation of the input question, which is expressed in natural language, to be utilized in the rest of the system.
To the best of our knowledge, most published works so far use rule-based approaches for the task of translating a natural language question into an explicit intermediate representation. Some question answering systems, such as (Lopez et al., 2007; Phan and Nguyen, 2010), manually define a list of sequence rule structures to analyze questions. However, in these rule-based approaches, manually creating the rules is error-prone and expensive in time and effort.
In this thesis, we present an approach that returns an intermediate representation of a question via the FrameScript scripting language (McGill et al., 2003). Natural language questions are transformed into intermediate representation elements, which include the construction type of the question, the question class, the keywords in the question and the semantic constraints between them. FrameScript allows users to intuitively write rules that directly extract the output tuple.
  • 14. 1.2 A Vietnamese Text-based Conversational Agent
A text-based conversational agent is a program that allows conversational interaction between human and machine using natural language through text. The text-based conversational agent uses scripts organized into contexts comprising hierarchically constructed rules. The rules consist of patterns and associated responses: the input is matched against the patterns, and the corresponding responses are sent to the user as output.
We focus on the analysis of input text in building a conversational agent. The analysis of users’ statements has been developed following two main approaches for pattern matching: using keywords (ELIZA (Weizenbaum, 1983), ALICE (Wallace, 2001), ProBot (Sammut, 2001)) and using similarity measures (O’Shea et al., 2010; Graesser et al., 2004; Traum, 2006). The keyword-based approaches usually utilize a scripting language to match the input statements, while the other approaches measure the similarity between the statements and the patterns from the agent’s scripts.
In this thesis, we introduce a Vietnamese text-based conversational agent architecture for a specific knowledge domain. Our system aims to direct the user’s statement into an appropriate context. The contexts are structured in a hierarchy of scripts consisting of rules written in the FrameScript language (McGill et al., 2003). In addition, our text-based conversational agent was constructed to be integrated into a Vietnamese question answering system. Our conversational agent not only provides information related to the user’s statement but also provides the knowledge needed to support our question answering system when it is unable to find an answer.
The knowledge domain we used to build our text-based conversational agent is the academic regulation at Vietnam National University, Hanoi (VNU).
The academic regulation book helps students to know the course programs, the examination regulations, the discipline rules at VNU, and so on. However, most students prefer not to read the academic regulation book. Therefore, our contribution creates an interaction channel that offers the necessary information to students. Once students state what they are interested in within the academic regulation, our text-based conversational agent responds to these statements by providing the related information in detail. Furthermore, our conversational agent also interacts with students by asking whether they want to know other information.
  • 15. 1.3 Thesis Organisation
This dissertation consists of 6 chapters. In chapter 2, we review the related literature. In chapter 3, we describe our Vietnamese question answering system architecture, in which we present a method for converting a natural language question into an intermediate representation. We propose our Vietnamese text-based conversational agent architecture in chapter 4. We describe our experiments and discussion in chapter 5, and present the conclusion in chapter 6.
  • 16. Chapter 2 Literature review
In this chapter, we review related work on text-based approaches for conversational agents (CAs). Section 2.1 describes approaches that construct rules to match the user’s natural language utterances, either using keywords (section 2.1.1) or using a sentence similarity measure (section 2.1.2). In addition, section 2.2 covers the background on the FrameScript scripting language that we work with, while section 2.3 reviews domain-specific question answering systems.
2.1 Text-based conversational agents
2.1.1 Using keywords for pattern matching
ELIZA (Weizenbaum, 1983) was one of the earliest text-based conversational agents. It is based on simple pattern matching that identifies keywords in the user’s statement. ELIZA then maps the user’s statement to an appropriate transformation rule and generates an output response. The procedure by which ELIZA responds to a user input consists of five steps:
• Identify the important keywords appearing in the user’s statement.
• Define some minimal context within which the selected keyword occurs.
• Determine an appropriate transformation rule.
• Generate a response when the input text contains no keywords.
  • 17. • Provide an editing facility for scripts at the script-writing level.
Transformation rules serve to decompose a data string according to certain criteria and to reassemble the decomposed string according to certain assembly specifications. The input is therefore analyzed with the decomposition rules triggered by keywords, and responses are generated with the reassembly rules associated with the selected decomposition rules. For example, given the input sentence:
    “It seems that you like me”
the sentence is decomposed into four parts:
    (1) It seems that (2) you (3) like (4) me
by the decomposition rule:
    (0 YOU 1 ME)
The associated response might then be:
    “What makes you think I like you”
produced by the reassembly rule:
    (WHAT MAKES YOU THINK I 3 YOU)
An integer 0 in a decomposition rule matches any number of words, and a non-zero integer “n” indicates that exactly “n” words are matched, while the integer 3 in the reassembly rule above indicates that the third part of the decomposed sentence is inserted in its place in the reply. If each word of the input sentence is looked up in a dictionary of keywords while scanning from left to right, then only the decomposition rules containing a keyword that was found need to be tried. An ELIZA script consists mainly of a set of list structures of the following form:
    (K ((D1) (R1, 1) (R1, 2) ... (R1, m1))
       ((D2) (R2, 1) (R2, 2) ... (R2, m2))
       ...
       ((Dn) (Rn, 1) (Rn, 2) ... (Rn, mn)))
where K is the keyword, Di is the ith decomposition rule associated with K, and Ri, j is the jth reassembly rule associated with the ith decomposition rule. Any number of decomposition rules may be associated with a given keyword, and any number of reassembly rules with any specific decomposition rule, since there are no predetermined ordering limitations.
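The decomposition and reassembly steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not ELIZA's original implementation: the function names `decompose` and `reassemble` and the list encoding of the rules are assumptions made here for the example.

```python
def decompose(rule, words):
    """Split `words` according to an ELIZA-style decomposition rule.

    `rule` is a list such as [0, "YOU", 1, "ME"] (hypothetical encoding):
    0 matches any number of words, n > 0 matches exactly n words,
    and a string matches that keyword. Returns one word list per rule
    element, or None if the rule does not match.
    """
    if not rule:
        return [] if not words else None
    head, rest = rule[0], rule[1:]
    if isinstance(head, str):  # literal keyword
        if words and words[0].upper() == head:
            tail = decompose(rest, words[1:])
            return None if tail is None else [[words[0]]] + tail
        return None
    if head > 0:  # exactly `head` words
        if len(words) < head:
            return None
        tail = decompose(rest, words[head:])
        return None if tail is None else [words[:head]] + tail
    for k in range(len(words) + 1):  # 0: try every possible length
        tail = decompose(rest, words[k:])
        if tail is not None:
            return [words[:k]] + tail
    return None


def reassemble(rule, parts):
    """Build a reply: integers in `rule` are replaced by the numbered part."""
    out = []
    for element in rule:
        if isinstance(element, int):
            out.extend(parts[element - 1])
        else:
            out.append(element)
    return " ".join(out)


parts = decompose([0, "YOU", 1, "ME"], "It seems that you like me".split())
reply = reassemble(["WHAT", "MAKES", "YOU", "THINK", "I", 3, "YOU"], parts)
```

Here `parts` holds the four segments of the example sentence, and `reply` inserts the third segment ("like") into the reassembly template.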
  • 18. ALICE (Wallace, 2001) is a text-based conversational agent (a chat robot) that utilizes an XML language called Artificial Intelligence Markup Language (AIML). AIML files consist of category tags representing rules; each category tag contains a pair of pattern and template tags. All categories are stored in a tree. The system searches for the pattern that matches a user input using depth-first search in the tree, and produces the associated template as a response. For example, consider the category below:
    <category>
      <pattern>DO YOU KNOW WHO * IS?</pattern>
      <template><srai>WHO IS <star/></srai></template>
    </category>
AIML uses the * wild-card character in patterns to match any non-zero number of words. When an input matches this pattern, the portion bound to the * wild-card may be placed into the response with the <star/> markup. The category above reduces any input of the form “Do you know who X is?” to “Who is X”.
AIML allows two types of optional context called “that” and “topic”. The that tag, appearing inside a category, matches the robot’s previous utterance, while the topic tag, occurring outside the category, groups a set of categories together; the topic may be set inside any template. Consider a sample topic such as:
    <topic name="MOVIES">
      <category>
        <pattern>YES</pattern>
        <that>DO YOU LIKE ROMANTIC MOVIES</that>
        <template>What is your favourite romantic movie?</template>
      </category>
      <category>
        <pattern>YES</pattern>
        <that>DO YOU LIKE ACTION MOVIES</that>
        <template>What is your favourite action movie?</template>
      </category>
    </topic>
When the client says yes, the program must look up the robot’s previous utterance. If the robot asked “Do you like romantic movies?”, the response sent in reply is “What is your favourite romantic movie?”.
AIML is clever and simple, easy to implement, and a good start for beginners writing simple bots. However, it is difficult to write and debug more
  • 19. discriminating patterns, and it is very hard to know all the available transformations because AIML depends on self-modifying the input.
Sammut (2001) presented a text-based CA called ProBot that is able to extract data from users. ProBot’s scripts are typically organized into hierarchical contexts consisting of a number of organized rules to handle unexpected inputs. Building on ProBot’s scripts (Sammut, 2001), McGill et al. (2003) built the rule system of the FrameScript scripting language (see section 2.2). FrameScript (McGill et al., 2003) provides for the rapid prototyping of conversational interfaces and simplifies the writing of scripts.
2.1.2 Using the sentence similarity measure for pattern matching
O’Shea et al. (2008, 2010) proposed a text-based conversational agent framework (shown in figure 2.1) using semantic analysis. All patterns in the scripts are natural language sentences. Pattern matching uses a sentence similarity measure (Li et al., 2006) to calculate the similarity between the sentences from the scripts and the user input. The highest ranked sentence is selected and its associated response is sent as output.
Figure 2.1: O’Shea et al.’s conversational agent framework.
The scripts used in the framework consist of contexts, each relating to a specific topic of conversation. Each context contains one or more rules, and each rule uses “s” to represent
  • 20. a natural language sentence and “r” to represent a response statement. For example, consider the following rule:
    <Rule_01>
    s: I’m a student
    r: Which university do you study?
together with a user’s statement:
    “I am a master student” or “I am a phd student”
The input and the natural language sentences from the scripts are passed to the sentence similarity measure, which calculates a firing strength for each sentence pair in order to rank the sentences. In the above example, the highest ranked sentence is “I’m a student” and its associated response sent to the user is “Which university do you study?”.
The advantage of using a sentence similarity measure for pattern matching is that rule structures are simplified and reduced in size and complexity. On the other hand, this approach cannot retrieve information from an input and insert it into the response, as the keyword-based approaches presented in section 2.1.1 can.
Graesser et al. (2004) presented a conversational agent called AUTOTUTOR that matches input statements using Latent Semantic Analysis. Traum (2006) adapted the effective question answering characters of Leuski et al. (2006) to build a conversational agent that also employs Latent Semantic Analysis for pattern matching.
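To make the ranking step of section 2.1.2 concrete, the sketch below selects the rule whose sentence is most similar to the user input. It deliberately swaps in a plain Jaccard word-overlap score as a stand-in: it is not the semantic similarity measure of Li et al. (2006), and the second rule, the contraction table and all function names are hypothetical.

```python
import re

# Hypothetical, minimal contraction table so that "I'm" and "I am" compare equal.
CONTRACTIONS = {"i'm": "i am"}


def tokens(sentence):
    """Lower-case a sentence, expand known contractions, return its word set."""
    text = sentence.lower().replace("\u2019", "'")
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return set(re.findall(r"[a-z']+", text))


def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; a simple stand-in for the firing strength."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


RULES = [
    {"s": "I'm a student", "r": "Which university do you study?"},
    # Second rule is invented here only to give the ranking something to reject.
    {"s": "I like football", "r": "Which team do you support?"},
]


def respond(user_input, rules=RULES):
    """Return the response of the rule whose sentence ranks highest."""
    best = max(rules, key=lambda rule: jaccard(user_input, rule["s"]))
    return best["r"]
```

With this stand-in, both “I am a master student” and “I am a phd student” rank the first rule highest, mirroring the example in the text; a real system would replace `jaccard` with a semantic measure.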
  • 21. 2.2 FrameScript Scripting Language
FrameScript (McGill et al., 2003) is a language for creating multi-modal user interfaces. It builds on Sammut’s ProBot (Sammut, 2001) to enable rule-based programming, frame representations and simple function evaluation. The FrameScript scripting language also provides a set of tools for representing knowledge and interacting with users and external devices.
Each script in FrameScript (McGill et al., 2003) includes a list of rules that are matched against user input and used to give the appropriate response. Rules are grouped into particular contexts of the form:
    context_name :: rule_set
The scripting rules in the FrameScript language consist of patterns and responses in the form:
    pattern ==> response
Pattern expressions may contain two wild-card characters, * and ~. A * matches 0 or more words, and a ~ within a word indicates that 0 or more characters may be matched. Pattern expressions also allow alternatives, written in the form:
    { alternative 1 | alternative 2 | ... }
Moreover, patterns can reuse other pattern expressions through non-terminals, by writing the name of the non-terminal surrounded by ‘<’ and ‘>’. Non-terminals are often declared as a list of alternatives followed by ;;. For example:
    Number:: ==> {1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25};;
Response expressions come in two types: sequences and alternatives. A sequence of responses is surrounded by brackets:
    [response 1 | response 2 | ... | another response]
where each response is given in turn every time the pattern is matched, and the sequence repeats after the last response has been output. Alternatives are surrounded by braces:
    {response 1 | response 2 | ... | another response}
in which any response may be chosen randomly as output to the user.
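The difference between the two response types can be illustrated with a small Python sketch. It only models the selection behaviour described above (cycling versus random choice); the class names are hypothetical and do not come from FrameScript itself.

```python
import itertools
import random


class SequenceResponse:
    """Models [r1 | r2 | ...]: responses are given in turn, then repeat."""

    def __init__(self, responses):
        self._cycle = itertools.cycle(list(responses))

    def next(self):
        return next(self._cycle)


class AlternativeResponse:
    """Models {r1 | r2 | ...}: any response may be chosen at random."""

    def __init__(self, responses, rng=None):
        self._responses = list(responses)
        self._rng = rng or random.Random()

    def next(self):
        return self._rng.choice(self._responses)


sequence = SequenceResponse(["response 1", "response 2", "another response"])
first_four = [sequence.next() for _ in range(4)]

alternatives = AlternativeResponse(["response 1", "response 2"])
picked = alternatives.next()
```

After the last entry of a sequence has been used, the fourth call wraps around to the first response again, whereas `picked` can be either alternative on any call.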
In addition, responses use ‘#’ to perform an action such as changing the current context. For example, #goto(a_script) moves a conversation or interaction from one context to another.
  • 22. Similarly, ‘^’ is used to perform actions, except that when the following expression is evaluated, its value is inserted into the response rather than thrown away. A response expression may also depend on a condition holding true, in the form below:
    * ==> [ ^(condition) -> response if true | ^(! condition) -> response if false ]
Furthermore, some pattern elements create a numbered match component when a pattern matches. These components are segments of the input that can be referred to in a response using ‘^’. The pattern elements that identify match components are wild-cards (* and ~), alternatives and non-terminals. When ‘^’ is followed by an integer, the numbered match component associated with that integer is placed in the output response. Consider the following example:
    {My name is | I’m} * ==> [ Hello ^2. How old are you? ]
    I am <Number> years old ==> [ ^(^1 <= 20) -> Are you a student? | How do you do? ]
The following dialogue transcript illustrates the example:
    User: My name is X
    CA: Hello X. How old are you?
    User: I am 19 years old
    CA: Are you a student?
An input received from the user is given to a domain in order to ensure that the input is matched against the correct scripts. A script can be registered as a topic in a domain so that it can become the current script and process the input. When a script is registered as a topic, the domain uses the script’s trigger to determine whether or not an input activates that topic. If a topic does not have a trigger, any input will activate it. When a topic’s trigger matches the input, it becomes the current context and the current topic.
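The numbered match components and the conditional response from the name/age example earlier in this section can be approximated with regular expressions. The sketch below is an assumption-laden illustration rather than FrameScript's actual matcher: it writes the component marker as an ASCII caret (^), ignores the within-word wild-card, expresses the condition as a Python function, and replaces the Number non-terminal with the regex `\d+`.

```python
import re

# One token of a simplified pattern: {a | b}, <NonTerminal>, *, or literal text.
TOKEN = re.compile(r"\{([^{}]*)\}|<(\w+)>|\*|[^{}<>*]+")


def pattern_to_regex(pattern, nonterminals=None):
    """Translate a simplified FrameScript-like pattern into a compiled regex.

    Every wild-card, alternative list and non-terminal becomes one capture
    group, so group n plays the role of match component n.
    """
    nonterminals = nonterminals or {}
    out = []
    for m in TOKEN.finditer(pattern.strip()):
        text = m.group(0)
        if text == "*":
            out.append(r"(.*?)")
        elif text[0] == "{":
            options = [re.escape(o.strip()) for o in m.group(1).split("|")]
            out.append("(" + "|".join(options) + ")")
        elif text[0] == "<":
            out.append("(" + nonterminals[m.group(2)] + ")")
        else:
            out.append(re.escape(text))
    return re.compile("^" + "".join(out) + "$")


def fill(response, match):
    """Replace each ^n in the response with match component n."""
    return re.sub(r"\^(\d+)", lambda m: match.group(int(m.group(1))), response)


def age_response(match):
    # Stands in for: [ ^(^1 <= 20) -> Are you a student? | How do you do? ]
    return "Are you a student?" if int(match.group(1)) <= 20 else "How do you do?"


RULES = [
    (pattern_to_regex("{My name is | I'm} *"),
     lambda m: fill("Hello ^2. How old are you?", m)),
    (pattern_to_regex("I am <Number> years old", {"Number": r"\d+"}),
     age_response),
]


def reply(user_input):
    """Try the rules top to bottom and return the first response, or None."""
    for regex, responder in RULES:
        match = regex.match(user_input.strip())
        if match:
            return responder(match)
    return None
```

Running `reply` on the two user turns of the transcript reproduces the agent's two answers; in the first rule the alternative list is component 1 and the wild-card is component 2, which is why the response refers to ^2.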
  • 23.
    Example ::
        domain example
        trigger{* {Hi | hi | Hello | hello} *}
        * {Hi | hi | Hello | hello} * ==> [Hi there!]
When writing complex scripts in which several scripts have similar behaviours, FrameScript makes it possible to use inheritance so that rules can be shared between scripts. Moreover, FrameScript allows failsafes to be defined for scripts. A failsafe is another script whose rules are used if an input fails to match any of the rules of a script. The order in which a domain tries to find the rules that the input should be matched against is:
1. triggers of the topics
2. the current context
3. the failsafe of the current context
4. the current topic
5. the failsafe of the current topic
6. the failsafe for the domain
When an input is compared to the rules of a script, the input is first compared to the rules specifically defined by the script. If none of these rules match, the input is matched against the rules of the script’s parents. The rules of a script are tried in top-to-bottom order.
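The lookup order listed above, together with inheritance from parent scripts, can be sketched as follows. This is a simplified model under stated assumptions, not FrameScript's implementation: rules are reduced to (predicate, response) pairs, the `Script` and `Domain` classes and the `has` helper are hypothetical, and the sample scripts are invented for illustration.

```python
class Script:
    """A script with its own rules, an optional parent, failsafe and trigger."""

    def __init__(self, name, rules, parent=None, failsafe=None, trigger=None):
        self.name = name
        self.rules = rules        # (predicate, response) pairs, tried top to bottom
        self.parent = parent      # rules inherited if the script's own rules fail
        self.failsafe = failsafe  # another Script used when nothing matches
        self.trigger = trigger    # predicate deciding whether the topic activates

    def match(self, text):
        script = self
        while script is not None:  # own rules first, then the parents' rules
            for predicate, response in script.rules:
                if predicate(text):
                    return response
            script = script.parent
        return None


class Domain:
    def __init__(self, topics, failsafe=None):
        self.topics = topics
        self.failsafe = failsafe
        self.context = None
        self.topic = None

    def respond(self, text):
        # 1. Triggers of the topics: a topic without a trigger is always activated.
        for topic in self.topics:
            if topic.trigger is None or topic.trigger(text):
                self.context = self.topic = topic
                break
        # 2-5. Current context, its failsafe, current topic, its failsafe.
        order = []
        for script in (self.context, self.topic):
            if script is not None:
                order.extend([script, script.failsafe])
        order.append(self.failsafe)  # 6. The failsafe for the domain.
        for script in order:
            if script is not None:
                response = script.match(text)
                if response is not None:
                    return response
        return None


def has(*words):
    """Hypothetical predicate: true if the input contains any of the words."""
    return lambda text: any(w in text.lower().split() for w in words)


fallback = Script("fallback", [(lambda text: True, "Sorry, I did not understand.")])
base = Script("base", [(has("bye"), "Goodbye!")])
greeting = Script("greeting", [(has("hi", "hello"), "Hi there!")],
                  parent=base, trigger=has("hi", "hello"))
domain = Domain([greeting], failsafe=fallback)
answers = [domain.respond(t) for t in ("hello", "bye now", "what?")]
```

In this run the first input activates the greeting topic, the second is answered by a rule inherited from the parent script, and the third falls through to the domain failsafe.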
  • 24. 2.3 Question answering systems
Question answering systems range from closed-domain systems (aiming to answer questions in a specific domain) to open-domain systems (aiming to answer any question asked). In our experience, open-domain systems focus on retrieving and ranking documents related to the input, while closed-domain systems focus on analyzing natural language questions to extract reliable terms.
Additionally, the natural language question analysis component is the first component in any question answering system. This component creates an intermediate representation of the input question, which is expressed in natural language, to be utilized in the rest of the system. The basis of the question parser is question classification, which can be defined as the task of mapping a given question to one of k classes based on the possible types of the answers (Li and Roth, 2002b). Subsequently, natural language question analysis techniques are used to identify keywords and semantic relations in input questions. Therefore, our review of related work examines question answering systems with respect to their question analysis approaches, focusing on domain-specific systems.
Pattern-matching based systems
Closed-domain question answering systems are usually linked to relational databases and called natural language interfaces to databases. A natural language interface to a database (NLIDB) is a system that allows users to access information stored in a database by typing questions using natural language expressions (Androutsopoulos et al., 1995).
Early NLIDB systems used pattern-matching techniques to process users’ questions and generate corresponding answers. Sneiders (2002) presented an NLIDB system that uses question patterns covering the conceptual model of the database.
The input is converted into an SQL query by using predefined templates that contain entity slots, i.e. free space for data instances representing the primary concepts of the question. Some open-domain systems presented in (Wu et al., 2003; Saxena et al., 2007) also used pattern-matching techniques to respond to users’ requests.
The main advantage of the pattern-matching approach is its simplicity, and such a system can perform well in certain applications. However, its shallowness often leads to bad results.
  • 25. Semantic-based systems
Later NLIDBs respond to users’ questions by using a semantic grammar to parse the input into a syntax tree and mapping the tree to a database query. In semantic-based systems, the grammar’s categories (i.e. the non-leaf nodes appearing in the parse tree) do not necessarily correspond to syntactic concepts (Androutsopoulos et al., 1995). Semantic constraints are usually enforced through the choice of semantic grammar categories, and the categories can also be chosen to ease the mapping from the syntax tree to database objects.
Nguyen and Le (2008) introduced an NLIDB question answering system for Vietnamese that employs semantic grammars. Their system includes two main modules: QTRAN and TGEN. QTRAN (Query Translator) maps a natural language question to an SQL query, while TGEN (Text Generator) generates answers based on the query result tables. QTRAN uses limited context-free grammars to parse the user’s question into a syntax tree via the CYK algorithm. The syntax tree is then converted into an SQL query by using a mapping dictionary to determine the names of attributes in Vietnamese, the names of attributes in the database and the names of individuals stored in these attributes.
The PRECISE system (Popescu et al., 2003) maps a natural language question to a unique semantic interpretation by analyzing lexicons and semantic constraints. Stratica et al. (2003) described a template-based system that translates an English question into an SQL query by matching the syntactic parse of the question to a set of fixed semantic templates. Other systems based on semantic grammar rules include Planes (Waltz, 1978) and Eufid (Templeton and Burger, 1983). Semantic grammar-based approaches were considered an engineering methodology that allows semantic knowledge to be easily included in the system.
Annotation-based systems
Recently, some question answering systems that use semantic annotations have achieved good results in natural language question analysis. A well-known annotation-based framework is GATE (General Architecture for Text Engineering) (Cunningham et al., 2002), which has been used in many question answering systems, such as the ontology-based AquaLog (Lopez et al., 2007) and QuestIO (Damljanovic et al., 2008) systems and Galea’s open-domain system (Galea, 2003), especially for the natural language question analysis component.
  • 26. AquaLog (Lopez et al., 2007), shown in figure 2.2, is an ontology-based question answering system for English and is the basis for the development of our system. A natural language question is mapped to an intermediate triple-based representation, called a Query-Triple, by the Linguistic Component using Java Annotation Patterns Engine (JAPE) grammars in GATE (Cunningham et al., 2002). The Relation Similarity Service takes a Query-Triple and processes it to produce queries expressed with respect to the input ontology, called Onto-Triples. AquaLog then uses the Onto-Triples to return an answer to the user.
Figure 2.2: AquaLog’s architecture.
In earlier work, we reported an approach that converts Vietnamese natural language questions into intermediate representation elements in query-tuples (Question-structure, Question-class, Term1, Relation, Term2, Term3) based on semantic annotations via JAPE grammars (Nguyen et al., 2009). The selected query-tuple type is more complex, aiming to cover a wider variety of question types in different languages. In addition, we proposed a language-independent approach to acquire JAPE rules in a systematic manner that avoids unintended interaction among rules (Nguyen et al., 2011). Phan and Nguyen (2010) presented an approach that syntactically and semantically maps Vietnamese questions into triple-like structures of Subject, Verb and Object, also utilizing JAPE grammars.
The START question answering system (Katz, 1997; Katz et al., 2006) also used natural language annotations (Katz, 1997), without utilizing GATE.
The lexical database WordNet (Fellbaum, 1998) is an important resource for natural language applications. After the appearance of WordNet, almost all question answering systems have used it to provide information for analyzing questions.
  • 27. Chapter 3 Our Question Answering System Architecture
In this chapter, we give an overview of our first ontology-based question answering system for Vietnamese (section 3.1). Our system contains a front-end component that performs syntactic and semantic analysis on natural language questions. The back-end component is responsible for making sense of the user’s query with respect to a target ontology, using various concept-matching techniques between a natural language phrase and elements in the ontology. The front-end and back-end communicate through an intermediate representation of the question, which captures the semantic structure of the user’s question.
Furthermore, we focus on describing a rule-based approach that directly extracts the intermediate representation elements of a question via the FrameScript scripting language (McGill et al., 2003) (section 3.2).
3.1 Vietnamese Question Answering System
The architecture of our question answering system is shown in figure 3.1. It includes two components: the natural language question analysis component and the answer retrieval component.
The question analysis component takes the user’s question as input and returns a query-tuple representing the question in a compact form. The role of this intermediate representation is to provide structured information about the input question for later processing, such as retrieving answers.
The answer retrieval component includes two main modules: Ontology mapping
  • 28. and Answer extraction. It takes an intermediate representation produced by the question analysis component and an ontology as its input to generate semantic answers.
Figure 3.1: Architecture of our question answering system.
3.1.1 Natural language question analysis component
3.1.1.1 Intermediate representation of an input question
The intermediate representation used in our approach aims to cover a wider variety of question types. It consists of a question-structure and one or more query-tuples in the following format:
    ( question-structure, question-class, Term1, Relation, Term2, Term3 )
where Term1 represents a concept (object class); Term2 and Term3, if they exist, represent entities (objects); and Relation (property) is a semantic constraint between the terms in the question.
This representation is meant to capture the semantics of the question. Simple questions corresponding to basic constructions only have one query-tuple