The document describes an intelligent natural language question answering system called ENLIGHT. It discusses what question answering is and how it relates to information retrieval and information extraction. It then covers the general approach taken by question answering systems, including question analysis, document retrieval and processing, answer extraction, and answer construction. It also discusses techniques used by ENLIGHT, such as handling semantic symmetry and ambiguous modification, and incorporating learning. ENLIGHT is shown to have better precision and faster response time compared to other systems.
Intelligent Natural Language QA System Overview
1. Intelligent Natural Language QA System
MANISH JOSHI
RAJENDRA AKERKAR
2. Open Domain Question Answering
What is Question Answering?
How is QA related to IR, IE?
Some issues related to QA
Question taxonomies
General approach to QA
3. Question Answering Systems
These systems try to provide exact information as an answer in response to a natural language query raised by the user.
Motivation: given a question, the system should provide an answer instead of requiring the user to search for it in a set of documents.
Example:
Q: What year was Mozart born?
A: Mozart was born in 1756.
4. Information Retrieval
Document is the unit of information
Answers questions indirectly
One has to search within the document
Results: (ranked) list based on estimated relevance
Effective approaches are predominantly statistical (“bag of words”)
QA = (very short) passage retrieval with natural language questions (not queries)
5. Information Extraction
Task
Identify messages that fall under a number of specific topics
Extract information according to pre-defined templates
Place the information into frame-like database records
Limitations
Templates are hand-crafted by human experts
Templates are domain dependent and not easily portable
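To make the template idea concrete, here is a small hedged sketch: a hand-crafted pattern fills a frame-like record for one narrow topic (person birth years). The pattern and field names are invented for illustration; porting it to another domain would require writing a new pattern by hand, which is exactly the limitation noted above.

import re

# Hand-crafted template for one specific topic: person birth years.
BORN_TEMPLATE = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was born in (?P<year>\d{4})")

def extract_birth_records(text):
    # Fill a frame-like database record for every match of the template
    return [m.groupdict() for m in BORN_TEMPLATE.finditer(text)]

print(extract_birth_records("Mozart was born in 1756."))
# [{'person': 'Mozart', 'year': '1756'}]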
6. Issues
Applications
Source of the answers
Structured data — natural language queries on databases
A fixed collection or book — encyclopedia
Web data
Domain-independent vs. domain-specific
Users
Casual users vs. Regular users — Profile, History, etc.
May be maintained for regular users
7. Question Taxonomy
Factual questions: answer is often found in a text snippet
from one or more documents
Questions that may have yes/no answers
wh questions (who, where, when, etc.)
what, which questions are hard
Questions may be phrased as requests or commands
Questions requiring simple reasoning: Some world knowledge and elementary reasoning may be required to relate the question with the answer. why, how questions
e.g. How did Socrates die? (By) drinking poisoned wine.
8. Question Taxonomy
Context questions: Questions have to be answered in the
context of previous interactions with the user
Who assassinated Indira Gandhi?
When did this happen?
List questions: Fusion of partial answers scattered over
several documents is necessary
Ex. - List 3 major rice producing nations.
How do I assemble a bicycle?
10. General Approach
Question analysis: Find type of object that answers question:
"when" -time, date "who" -person, organization, etc.
Document collection preprocessing: Prepare documents
for real-time query processing
Document retrieval (IR): Using the (augmented) question, retrieve a set of possibly relevant documents/passages using IR
11. General Approach
Document processing (IE): Search documents for entities
of the desired type and in appropriate relations using NLP
Answer extraction and ranking: Extract and rank
candidate answers from the documents
Answer construction: Provide (links to) context, evidence, etc.
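Taken together, slides 10 and 11 describe a pipeline. The toy sketch below only shows how the stages could be chained; each stand-in function is deliberately trivial and the names are illustrative, not part of ENLIGHT.

# Toy stand-ins for each stage; a real system replaces these with IR and NLP components.
def analyse_question(q):
    words = q.lower().rstrip("?").split()
    return [w for w in words if w not in {"what", "who", "when", "where", "was", "is"}]

def retrieve_passages(keywords, corpus):
    return [s for s in corpus if set(s.lower().split()) & set(keywords)]

def extract_and_rank(keywords, passages):
    return sorted(passages,
                  key=lambda s: len(set(s.lower().split()) & set(keywords)),
                  reverse=True)

def answer_question(question, corpus):
    keywords = analyse_question(question)           # question analysis
    passages = retrieve_passages(keywords, corpus)   # document retrieval (IR)
    ranked = extract_and_rank(keywords, passages)    # answer extraction and ranking
    return ranked[0] if ranked else None             # answer construction omitted

print(answer_question("What year was Mozart born?", ["Mozart was born in 1756."]))
# Mozart was born in 1756.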
12. Question Analysis
Identify semantic type of the entity sought by the question
when, where, who — easy to handle
which, what — ambiguous
e.g. What was the Beatles’ first hit single?
Determine additional constraints on the answer entity
key words that will be used to locate candidate
answer-bearing sentences
relations (syntactic/semantic) that should hold between
a candidate answer entity and other entities mentioned
in the question
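A minimal sketch of the first step, mapping the question word to an expected answer type; it also illustrates why what/which questions are harder, since the type must come from the head noun rather than the question word. The rules, lexicon entries, and type names below are illustrative assumptions only.

WH_TYPES = {"when": "DATE", "who": "PERSON", "where": "LOCATION", "how many": "NUMBER"}

# Tiny lexicon for the head noun of what/which questions (illustrative only)
NOUN_TYPES = {"year": "DATE", "single": "MUSIC_WORK", "volcano": "LOCATION", "city": "LOCATION"}

def expected_answer_type(question):
    q = question.lower().rstrip("?")
    for wh, answer_type in WH_TYPES.items():
        if q.startswith(wh):
            return answer_type          # when/who/where map directly to a type
    if q.startswith(("what", "which")):
        # Ambiguous question words: fall back to the first known head noun
        for word in q.split():
            if word in NOUN_TYPES:
                return NOUN_TYPES[word]
    return "UNKNOWN"

print(expected_answer_type("What year was Mozart born?"))       # DATE
print(expected_answer_type("Who assassinated Indira Gandhi?"))  # PERSON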
13. Document Processing
Preprocessing: Detailed analysis of all texts in the corpus
may be done a priori
one group annotates terms with one of 50 semantic
tags which are indexed along with terms
Retrieval: An initial set of candidate answer-bearing documents is selected from a large collection
Boolean retrieval methods may be used profitably
Passage retrieval may be more appropriate
14. Document Processing
Analysis:
Part of speech tagging
Named entity identification: recognizes multi-word
strings as names of companies/persons, locations/addresses,
quantities, etc.
Shallow/deep syntactic analysis: Obtains information
about syntactic relations, semantic roles
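As a rough illustration of this analysis step, the snippet below runs part-of-speech tagging and named entity identification with NLTK. NLTK is used here only as a stand-in; the slides do not state which analysers ENLIGHT itself relies on for this stage (QTAG is mentioned later for tagging).

import nltk  # requires the 'punkt', 'averaged_perceptron_tagger',
             # 'maxent_ne_chunker' and 'words' data packages

sentence = "Olympus Mons is the largest volcano in the Solar System."

tokens = nltk.word_tokenize(sentence)   # tokenization
tagged = nltk.pos_tag(tokens)           # part-of-speech tagging
tree = nltk.ne_chunk(tagged)            # named entity identification

# Print every string recognised as a named entity together with its label
for node in tree:
    if isinstance(node, nltk.Tree):
        print(node.label(), " ".join(word for word, tag in node.leaves()))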
15. History
MURAX (Kupiec, 1993)
was designed to answer questions from the Trivial Pursuit
general-knowledge board game – drawing answers from
Grolier’s on-line encyclopaedia (1990).
Text Retrieval Conference (TREC). TREC was started in 1992
with the aim of supporting information retrieval research by
providing the infrastructure necessary for large-scale
evaluation of text retrieval methodologies.
The QA track was first included as part of TREC in 1999 with
seventeen research groups entering one or more systems.
16. Techniques for performing open-domain question
answering
Manual and automatically constructed question analysers,
Document retrieval specifically for question answering,
Semantic type answer extraction,
Answer extraction via automatically acquired surface matching text patterns,
principled target processing combined with document retrieval for
definition questions,
and various approaches to sentence simplification which aid in the
generation of concise definitions.
17. Answer Extraction
Look for strings whose semantic type matches that of the
expected answer - matching may include subsumption
(incorporating something under a more general category)
Check additional constraints
Select a window around matching candidate and
calculate word overlap between window and query;
OR
Check how many distinct question keywords are found
in a matching sentence, their order of occurrence, etc.
Check syntactic/semantic role of matching candidate
Semantic Symmetry
Ambiguous Modification
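A minimal sketch of the window-overlap idea described above: take a window of words around the matching candidate and count how many distinct question keywords fall inside it. The window size and scoring are illustrative choices, not ENLIGHT's actual parameters.

def window_overlap_score(tokens, candidate_index, question_keywords, window=5):
    """Count distinct question keywords inside a window around the candidate answer.

    tokens            : the answer-bearing sentence as a list of words
    candidate_index   : position of the matching candidate string in tokens
    question_keywords : set of (stemmed) keywords from the question
    """
    lo = max(0, candidate_index - window)
    hi = min(len(tokens), candidate_index + window + 1)
    window_words = set(tokens[lo:hi])
    return len(window_words & set(question_keywords))

sent = "wolfgang amadeus mozart was born in salzburg in 1756".split()
print(window_overlap_score(sent, sent.index("1756"), {"mozart", "born", "year"}))  # 1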
18. Semantic Symmetry
Question – Who killed militants?
Militants killed five innocents in Doda District.
After 6 hour long encounter army soldiers killed 3
Militants.
For this question we need sentences in which the word ‘militants’ is the object of ‘killed’, but keyword matching also returns the first sentence, where ‘militants’ acts as the subject.
Semantic symmetry is a linguistic phenomenon that occurs when an entity acts as the subject in some sentences and as the object in others.
19. Example
The following example illustrates the phenomenon of semantic symmetry and the problems it causes.
Question : Who visited President of India?
Candidate Answer 1: George Bush visited President of India
Candidate Answer 2: President of India visited flood affected area of
Mumbai.
The two candidate sentences are similar at the word level, but they have very different meanings.
20. Some more examples showing semantic symmetry
(1) The birds ate the snake. / The snake ate the bird.
    (What does the snake eat?)
(2) Communists in India are supporting the UPA government. / Small parties are supporting Communists in Kerala.
    (Whom are the Communists supporting?)
21. Ambiguous Modification
Ambiguous modification is a linguistic phenomenon that occurs when an adjective in a sentence may modify more than one noun.
Question : What is the largest volcano in the Solar System?
Candidate Answer 1: In the Solar System, the largest planet
Jupiter has several volcanoes. ---- Wrong
Candidate Answer 2: Olympus Mons, the largest volcano in
the solar system. --- Correct
In the first sentence ‘largest’ modifies the word ‘planet’, whereas in the second sentence ‘largest’ modifies the word ‘volcano’.
22. Approaches to tackle the problem
Boris Katz and James Lin of MIT developed a system
SAPERE that handles problems occurring due to semantic
symmetry and ambiguous modification.
These problems occur at the semantic level.
To deal with problems occurring at the semantic level, detailed information at the syntactic level is gathered in all these approaches.
The system developed by Katz and Lin gives results after utilizing syntactic relations. These typical S-V-O ternary relations are obtained by processing the information gathered by the Minipar functional dependency parser.
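As a rough sketch of why ternary relations resolve semantic symmetry, the example below represents each sentence as a (subject, verb, object) triple and compares it with the relation sought by the question. The triples are written by hand here purely for illustration, whereas SAPERE derives them from Minipar output.

from collections import namedtuple

Relation = namedtuple("Relation", ["subject", "verb", "object"])

# Hand-written triples for the two candidate answers from slide 19
question_relation = Relation(subject=None, verb="visit", object="President of India")
candidates = {
    "George Bush visited President of India":
        Relation("George Bush", "visit", "President of India"),
    "President of India visited flood affected area of Mumbai":
        Relation("President of India", "visit", "flood affected area of Mumbai"),
}

for sentence, rel in candidates.items():
    # Accept only candidates whose verb and object match the question's relation
    ok = rel.verb == question_relation.verb and rel.object == question_relation.object
    print("ACCEPT" if ok else "REJECT", "-", sentence)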
23. Our Approach
To deal with problems at the semantic level, most of the available approaches need to obtain and work on information gathered at the syntactic level.
We have proposed a new approach to deal with the
problems caused by Linguistic phenomena of Semantic
Symmetry and Ambiguous Modification.
The algorithms based on our approach remove wrong sentences from the answer with the help of information obtained at the lexical level (lexical analysis).
24. Algorithm for Handling Semantic Symmetry
Rule 1 -
If (the sequence of keywords in the question and the candidate answer matches) then
    If (the POS of the verb keyword is the same) then
        Candidate answer is correct
Rule 2 -
If (the sequence of keywords in the question and the candidate answer does not match) then
    If (the POS of the verb keyword is not the same) then
        Candidate answer is correct
Otherwise -
    Candidate answer is wrong
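A minimal sketch of how Rules 1 and 2 might be applied, assuming the question keywords and the POS tag of the verb keyword are already available from earlier processing. The function and argument names are illustrative and are not taken from ENLIGHT's code.

def keyword_order(tokens, keywords):
    # The question keywords in the order they occur in this token sequence
    return [t for t in tokens if t in keywords]

def accept_by_semantic_symmetry(question_tokens, answer_tokens, keywords,
                                question_verb_pos, answer_verb_pos):
    same_order = keyword_order(question_tokens, keywords) == \
                 keyword_order(answer_tokens, keywords)
    same_verb_pos = question_verb_pos == answer_verb_pos

    if same_order and same_verb_pos:          # Rule 1
        return True
    if not same_order and not same_verb_pos:  # Rule 2
        return True
    return False                              # otherwise: wrong answer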
25. Algorithm for Handling Ambiguous Modification
We identify the adjective as Adj, the scope-defining noun as SN, and the identifier noun as IN.
Rules –
If the sentence contains the keywords in the following order –
    Adj α SN        (where α indicates a string of zero or more keywords)
Then
    Rule 1-a: If α is IN    == Correct Answer, or
    Rule 1-b: If α is blank == Correct Answer
Else
    Rule 2: If α is anything else == Wrong Answer
26. Algorithm for Handling Ambiguous Modification
(Cont.)
If the sentence contains the keywords in the following order –
    SN α Adj β IN        (where α and β indicate strings of zero or more keywords)
Then
    Rule 3: If β is blank == Correct Answer (the value of α does not matter)
Else
    Rule 4: If β is anything else == Wrong Answer
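A minimal sketch of Rules 1-4, assuming the adjective (Adj), scope-defining noun (SN) and identifier noun (IN) have already been identified and the candidate sentence is reduced to its keywords. Names and the hand-reduced keyword lists are illustrative assumptions.

def accept_by_ambiguous_modification(keywords, adj, sn, identifier_noun):
    """Return True if the candidate keyword sequence is accepted, False otherwise."""
    if adj not in keywords or sn not in keywords:
        return False
    a, s = keywords.index(adj), keywords.index(sn)

    if a < s:
        # Pattern "Adj alpha SN": accept if alpha is IN (Rule 1-a) or empty (Rule 1-b),
        # reject anything else (Rule 2)
        alpha = keywords[a + 1:s]
        return alpha == [] or alpha == [identifier_noun]

    # Pattern "SN alpha Adj beta IN": accept only if beta is empty (Rule 3),
    # reject otherwise (Rule 4)
    if identifier_noun in keywords[a + 1:]:
        beta = keywords[a + 1:keywords.index(identifier_noun, a + 1)]
        return beta == []
    return False

# The volcano example from slide 21, keyword lists reduced by hand for illustration
print(accept_by_ambiguous_modification(
    ["Olympus Mons", "largest", "volcano", "Solar System"],
    adj="largest", sn="volcano", identifier_noun="Olympus Mons"))   # True
print(accept_by_ambiguous_modification(
    ["Solar System", "largest", "planet", "Jupiter", "volcano"],    # 'volcanoes' stemmed
    adj="largest", sn="volcano", identifier_noun="Olympus Mons"))   # False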
27. Working System - ENLIGHT
We have developed a system that answers questions using a ‘keyword-based matching’ paradigm.
We have incorporated the newly formulated algorithms into the system and obtained good results.
29. Preprocessing
This module prepares the platform for the intelligent and effective interface.
It transforms raw-format data into a well-organized corpus with the help of the following activities:
Keyword Extraction
Sentence Segmentation
Handling of Abbreviations and Punctuation Marks
Tokenization
Stemming
Identifying Group of Words with Specific Meaning
Shallow Parsing
Reference Resolution
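As a rough sketch of a few of the activities listed above (sentence segmentation, tokenization, stop-word removal, stemming), the snippet below uses NLTK purely for illustration; the slides do not name the tools ENLIGHT uses for this stage.

from nltk.tokenize import sent_tokenize, word_tokenize  # needs the 'punkt' data
from nltk.corpus import stopwords                        # needs the 'stopwords' data
from nltk.stem import PorterStemmer

def build_corpus(raw_text):
    stemmer = PorterStemmer()
    stop = set(stopwords.words("english"))
    corpus = []
    for sentence in sent_tokenize(raw_text):              # sentence segmentation
        tokens = word_tokenize(sentence.lower())           # tokenization
        keywords = [stemmer.stem(t) for t in tokens
                    if t.isalnum() and t not in stop]      # stop-word removal + stemming
        corpus.append({"sentence": sentence, "keywords": keywords})
    return corpus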
30. Question Analysis
Question Tokenization
Question Classification
Corpus Management
Various database tables are created to manage the vast data
InfoKeywords
QuestionKeyword
QuestionAnswer
CorpusSentences
Abbreviations
Apostrophes
StopWords
Answer Retrieval
Answer Searching
Answer Generation
31. Answer Rescoring
Handling problems caused due to linguistic phenomena
using shallow parsing based algorithms
Semantic Symmetry
Ambiguous Modification
Intelligence Incorporation
Learning
Rote Learning
Feedback
Can Improve
Satisfactory
Wrong Answer
Loose criterion
Automated Classification
32. Results
Preciseness
Response Time
Adaptability
33. Preciseness
                                              ENLIGHT    Basic Keyword Matching
Average number of sentences returned
as answer                                        3              34.6
Average number of correct sentences              2.63            6
Average precision                               84 %            32 %
34. Response Time (ENLIGHT Vs Sapere)
Type of Data and No. of Words          Time Required by QTAG    Time Required by Minipar
                                       (Used in ENLIGHT)        (Used in Sapere)
News extract, Times of India,
202 words                              1.71 s                   2.88 s
Reply, START QA System, 251 words      1.89 s                   3.11 s
Google Search Engine Result            1.55 s                   2.86 s
Yahoo Search Engine Results            1.67 s                   3.13 s
AVERAGE                                1.705 s                  2.995 s
35. Adaptability
Handling Additional Keywords
A question like ‘Who killed the Prime Minister?’ can also be handled by the ENLIGHT system.
Use of synonyms
If the question and answer contain synonyms, the ENLIGHT system can associate the two words using the learning phase.
36. References
L. Hirschman, R. Gaizauskas, Natural language question answering: the view from here, Natural Language Engineering, 7(4), December 2001.
Manish Joshi, Rajendra Akerkar, The ENLIGHT System, Intelligent Natural Language System, Journal of Digital Information Management, June 2007.