2. What is Instant Question Answering?
● The user asks a question in text form, and the InstantQA system automatically retrieves or formulates an answer and presents it back to the user instantly.
3. Why Instant Question Answering?
● In spite of the continuous progress of search engines, many of users’ needs still remain unanswered.
● While Community Question Answering (e.g. the AnA platform) can feature factoid questions, its primary goal is to satisfy needs such as opinion seeking, recommendation, open-ended questions and problem solving.
● In community question answering, the user has to wait for the answers he or she seeks, even if the question is very simple and asks for a mere fact.
● Better user experience: why browse through search result listings or related questions when the information can be served upfront?
4. Why Instant Question Answering?
● CASE: SHIKSHA.COM
● Top domains being searched, based on both query logs and data availability with listings: fees, duration, seats, application date, application url, affiliation, approval, entrance exams, placement companies and job salaries.
● High number of fact-type questions, which can be targeted; we are not targeting opinion-based or open-ended questions.
● 23% of questions in a random sample of 1.15 lakh (115,000) belong to these 10 domains.
5. Is it something similar to AnA platform?
● Our organization has a discussion forum called the AnA (Ask and Answer) platform.
● InstantQA has no relation whatsoever to, and no direct use case with, the current AnA forum contents, as of now.
6. What kind of questions do we target?
● What is the price of X?
● When is the last date of Y?
● How much is the fee for W?
● What is the fee for W?
● Which companies hire from campus Q?
● How is the placement at Z?
● Is Z college in Delhi? (transform to where)
● What is the meaning of life, the universe and everything?
● I do not feel like studying, what to do?
● Will I get admission in Z?
● How to improve my career?
● Should I invest in Noida?
● I have purchased X project, should I sell it now or hold?
● Is it beneficial to buy a 2 BHK for 30 lacs?
7. What kind of questions do we target?
FACTOIDS
● What is the price of X?
● When is the last date of Y?
● How much is the fee for W?
● What is the fee for W?
● Which companies hire from campus Q?
● How is the placement at Z?
● Is Z college in Delhi? (transform to where)
Open ended. Not definite.
● What is the meaning of life, the universe and everything?
● I do not feel like studying, what to do?
● Will I get admission in Z?
● How to improve my career?
● Should I invest in Noida?
● I have purchased X project, should I sell it now or hold?
● Is it beneficial to buy a 2 BHK for 30 lacs?
9. What is the very basic approach to instant question answering?
● General architecture
– Question, e.g. What is Calvados?
– Question Classification and Analysis: /Q is /A, where /Q = “(Calvados)”
– Information Retrieval: Query = “Calvados is”; Text retrieval = “…Calvados is often used in cooking… Calvados is a dry apple brandy made in…”
– Answer Extraction: /A is: a dry apple brandy
– Answer: /Q is /A: “Calvados” is “a dry apple brandy”
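The architecture above can be sketched end to end in a few lines. This is only an illustrative toy, not the system's implementation: the two-sentence document list, the function names and the "a/an/the" heuristic used to pick the defining sentence are all assumptions made for the example.

```python
import re

# Toy collection, adapted from the retrieved text on the slide (assumption:
# the truncated sentence is shortened rather than completed).
DOCUMENTS = [
    "Calvados is often used in cooking.",
    "Calvados is a dry apple brandy.",
]


def analyse_question(question):
    """Question classification: map 'What is X?' to the pattern '/Q is /A' and return /Q."""
    match = re.match(r"what is (.+?)\?*$", question.strip(), flags=re.IGNORECASE)
    return match.group(1) if match else None


def retrieve(topic, documents):
    """Information retrieval: keep sentences that contain the query '<topic> is'."""
    query = f"{topic} is".lower()
    return [doc for doc in documents if query in doc.lower()]


def extract_answer(topic, sentences):
    """Answer extraction: take /A from the first sentence shaped like '/Q is a|an|the ...'."""
    pattern = re.compile(rf"{re.escape(topic)} is ((?:a|an|the) [^.]+)", re.IGNORECASE)
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            return match.group(1).strip()
    return None


def answer(question, documents=DOCUMENTS):
    """Run the three stages and format the result as '/Q is /A'."""
    topic = analyse_question(question)
    if topic is None:
        return None
    found = extract_answer(topic, retrieve(topic, documents))
    return f'"{topic}" is "{found}"' if found else None


print(answer("What is Calvados?"))
```

Questions that do not fit the single "What is X?" pattern return None here, which is exactly where the challenges listed on the following slides begin.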
10. If it is so simple, why haven't you done it already?
11. There are challenges in QA!
● Quality of text data.
● Language variability (paraphrase).
● Knowledge base domain: the answer has to be supported by the collection, not by the current state of the world.
● How to locate the information given the question keywords.
● It is unlikely that a system will have all necessary resources pre-computed.
● The task requires some deduction or extra linguistic knowledge.
● How does a reasoning system find relevant pieces of information?
12. Do we have any prior research to tackle these challenges?
13. QA research
● Well established over two decades
● TREC (Text REtrieval Conference)
– funded by NIST/DARPA since 1992
– QA track 1999 – 2007, directed at ‘Factoids’
● CLEF (Cross Language Evaluation Forum)
– 2001 – current
– Information Retrieval, language resources
● NTCIR (NII Test Collection for IR Systems)
– 1997 – current
– IR, question answering, summarization, extraction
Our Literature Survey can be accessed at :
http://svn.infoedge.com:8080/Common_Engineering_Projects_Trac/wiki/instant_question_answering#LiteratureSurvey
17. Knowledge base generation: Example
PHASE 1
● Fact sentence: The fees for Btech course in IIT D is 24000 INR.
● Tagged sentence: The <<fees>> for <<Btech>> course in <<IIT D>> is <<24000 INR>>.
● Extracted tags: Fees, Btech, IIT D, 24000
● Generated questions:
– What is the fees of Btech course at IIT Delhi?
– How much is the fees for Btech course from IIT Delhi?
– How many INR is the fees of btech from iit delhi?
– What ….........
● Index: Btech, iit d, fees, 24000, INR
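As a rough illustration of the tagging step in this example, the snippet below pulls the `<<...>>` spans out of a tagged sentence to form an index record. The function names and the lowercasing of index terms are assumptions made for the sketch; the slide does not say how the real index record is built.

```python
import re

# Matches one <<...>> annotation and captures the text inside it.
TAG = re.compile(r"<<(.+?)>>")


def index_record(tagged_fact):
    """Return the tagged spans, lowercased, in order of appearance."""
    return [span.strip().lower() for span in TAG.findall(tagged_fact)]


def plain_text(tagged_fact):
    """Strip the << >> markers to recover the original sentence."""
    return TAG.sub(r"\1", tagged_fact)


tagged = "The <<fees>> for <<Btech>> course in <<IIT D>> is <<24000 INR>>."
print(index_record(tagged))
print(plain_text(tagged))
```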
19. Answer Retrieval: Example
● Knowledge base: already indexed; trained once at startup.
● Question: How much will I pay for btech from IIT D?
● Tagged question: How much will I <<pay for>> <<btech>> from <<IIT D>>?
● Question analysis:
– Focus: How much
– Object: Pay
– Class: quantity to pay, fees
● Rank and prune the best answer based on collective match.
● Consistency checks
● Answers:
– You should pay 24000 INR for Btech from IIT D.
– The fees for Btech from IIT D is 24000 INR.
– 24000 INR should be paid for Btech from IIT D.
20. So many boxes!!
Let us check out the major components in brief.
21. A.1. Fact phrase generator from structured listings
● Structured listing to factoid text.
● No need to rely only on user generated sentences.
● Use basic language model techniques to create sentences from templates.
<doc>
…..
<college_name>iit</college_name>
<college_id>13213</college_id>
<fee>54000 inr annual</fee>
<location>delhi</location>
…....
</doc>
Language model output:
Fee of iit delhi is 54000 inr annual.
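A minimal sketch of turning the listing above into a fact sentence follows. It uses plain string templates as a stand-in for the "language model" box, which is an assumption; the `TEMPLATES` table and the function name are hypothetical, and the elided fields of the listing are simply left out.

```python
import xml.etree.ElementTree as ET

# The listing from the slide, without its elided fields.
LISTING = """<doc>
  <college_name>iit</college_name>
  <college_id>13213</college_id>
  <fee>54000 inr annual</fee>
  <location>delhi</location>
</doc>"""

# Hypothetical templates: one sentence pattern per listing attribute.
TEMPLATES = {"fee": "Fee of {college_name} {location} is {fee}."}


def listing_to_facts(xml_text, templates=TEMPLATES):
    """Fill each template whose attribute is present in the listing."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    facts = []
    for attribute, template in templates.items():
        if not fields.get(attribute):
            continue  # attribute missing or empty in this listing
        try:
            facts.append(template.format(**fields))
        except KeyError:
            continue  # template needs a field this listing does not have
    return facts


print(listing_to_facts(LISTING))
```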
22. A.2. Template Generator
● Start with identifying:
– Answer Type
– Entities in focus
– Part of Speech tags
● With these tags and language grammar rules, a factoid/sentence can be converted into all possible question forms. (Question Generation, QG task)
Example fact: Fee of iit delhi is 54000 inr annually.
● Answer type: quantity
● Focus: fee
● Entity: iit + delhi
● POS tags etc.
Templates:
– Fee of <II> <LL> is <$$>.
– Fees of <II> <LL> is <$$>.
– Cost of <II> <LL> is <$$>.
Generated questions:
● What is the fee of iit delhi annually?
● What is the fee of iit delhi?
● How much is the fee of iit delhi?
● Is fee of iit delhi 54000 inr?
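To make the question generation step concrete, here is a small sketch that parses a fact of the form "Fee of ... is ..." and fills a few question templates. The regular expression and the `QUESTION_TEMPLATES` table are assumptions for illustration; a real generator would rely on the POS tags and grammar rules mentioned above rather than on one fixed pattern.

```python
import re

# Recognises facts shaped like the slide's templates: "<focus> of <entity> is <value>."
FACT_PATTERN = re.compile(
    r"^(?P<focus>fee|fees|cost) of (?P<entity>.+?) is (?P<value>.+?)\.?$",
    re.IGNORECASE,
)

# Hypothetical question templates, keyed by answer type.
QUESTION_TEMPLATES = {
    "quantity": [
        "What is the {focus} of {entity}?",
        "How much is the {focus} of {entity}?",
        "Is {focus} of {entity} {value}?",
    ]
}


def generate_questions(fact, answer_type="quantity"):
    """Return every question form the templates allow for this fact."""
    match = FACT_PATTERN.match(fact.strip())
    if not match:
        return []
    slots = {name: text.lower() for name, text in match.groupdict().items()}
    return [t.format(**slots) for t in QUESTION_TEMPLATES.get(answer_type, [])]


for question in generate_questions("Fee of iit delhi is 54000 inr annually."):
    print(question)
```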
23. B.1. Text Preprocessing
● Short-forms
– i’m, im, i m → i am
– can’t, cant, can t → can not
● Spelling correction
● Repeated punctuation (!!!, ???, …)
● Smilies
● Salutations (Hi all, Hiya, etc.)
● Names, signature, course codes
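Some of these clean-up steps can be sketched with regular expressions. The rule lists below are illustrative assumptions covering only short-forms, repeated punctuation, smilies and salutations; spelling correction and the removal of names, signatures and course codes need more than pattern matching and are not attempted.

```python
import re

# Short-form expansions taken from the slide.
SHORT_FORMS = [
    (re.compile(r"\b(?:i['’]m|im|i m)\b", re.IGNORECASE), "i am"),
    (re.compile(r"\b(?:can['’]t|cant|can t)\b", re.IGNORECASE), "can not"),
]
# Assumed patterns for the remaining steps.
SALUTATION = re.compile(r"^\s*(?:hi all|hiya|hi|hello)\b[\s,!.]*", re.IGNORECASE)
SMILEY = re.compile(r"[:;=]-?[)(DPp]")
REPEATED_PUNCT = re.compile(r"([!?.])\1+")


def preprocess(text):
    """Apply the clean-up rules in a fixed order and normalise whitespace."""
    text = SALUTATION.sub("", text)
    text = SMILEY.sub("", text)
    for pattern, expansion in SHORT_FORMS:
        text = pattern.sub(expansion, text)
    text = REPEATED_PUNCT.sub(r"\1", text)  # "!!!" becomes "!"
    return " ".join(text.split())


print(preprocess("Hi all, im confused!!! cant find the fee :("))
```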
24. B.2. Entity and POS Tagger
● QER (Query Entity Recognition)
– Names, locations etc.
● Part of Speech Tagger using word sequence patterns
– Sequence (noun, verbs, auxiliaries, modifiers)
● Phrase Chunker
● Dependency parsing: validate tag relationships
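The simplest form of query entity recognition is a dictionary lookup, sketched below. The `GAZETTEER` entries and type labels are hypothetical, and only single-token entities are handled; multi-word names, POS tagging, chunking and dependency parsing would need a trained tagger or parser.

```python
import re

# Hypothetical gazetteer mapping known terms to entity types.
GAZETTEER = {
    "iit": "institute",
    "delhi": "location",
    "btech": "course",
    "fee": "attribute",
    "fees": "attribute",
}


def tag_entities(question, gazetteer=GAZETTEER):
    """Return (token, entity type) pairs for every token found in the gazetteer."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return [(token, gazetteer[token]) for token in tokens if token in gazetteer]


print(tag_entities("What is the fees of Btech course at IIT Delhi?"))
```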
25. B.3. Question Analysis
● Create features to be used during answer extraction
● Identify keywords to be matched in document sentences
● Identify the answer type to match answer candidates. We can create an inventory of questions and expected answer types, and so we can train a classifier
– Quantity?
– Dates?
– Definition?
● Select a list of useful patterns from a pattern repository
● Identify question relations which may be used for sentence analysis, etc.
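The answer-type and keyword steps might look like the sketch below. Hand-written cue patterns stand in for the trained classifier mentioned above, purely to keep the example self-contained; the cue lists and the stopword set are assumptions.

```python
import re

# Assumed cue patterns, checked in order; the first match decides the answer type.
ANSWER_TYPE_CUES = [
    ("quantity", re.compile(r"\b(?:how much|how many|fees?|price|cost|salary)\b")),
    ("date", re.compile(r"\b(?:when|last date|deadline)\b")),
    ("definition", re.compile(r"^what (?:is|are)\b")),
]

# Assumed stopword list; everything else is kept as a keyword.
STOPWORDS = {
    "what", "is", "are", "the", "of", "for", "a", "an", "in", "at",
    "from", "how", "much", "many", "when", "i", "will", "to",
}


def analyse(question):
    """Return the expected answer type and the keywords to match in sentences."""
    text = question.lower().strip()
    answer_type = next(
        (label for label, cue in ANSWER_TYPE_CUES if cue.search(text)), "unknown"
    )
    keywords = [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOPWORDS]
    return {"answer_type": answer_type, "keywords": keywords}


print(analyse("How much is the fee for W?"))
print(analyse("When is the last date of Y?"))
```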
26. B.4. Query Formulation
● The question needs to be transformed into a query for the document retrieval system.
● Each IR system has its own query language, so we need to perform this mapping.
● Identify useful keywords; use the type of answer sought, entities to boost, etc.
● Query creation: ordered terms, combined terms, weighted terms.
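As one possible mapping, the sketch below builds a query string in Lucene-style syntax (quoted phrases for ordered terms, `^` for weights, `+field:value` for a required clause). The choice of Lucene syntax, the `answer_type` field name and the default boost are assumptions; another retrieval system would need a different mapping.

```python
def build_query(keywords, entities, answer_type=None, entity_boost=2.0):
    """Combine boosted entity phrases, plain keywords and an answer-type filter."""
    clauses = []
    for phrase in entities:
        # Ordered, weighted terms: an exact phrase with a boost.
        clauses.append(f'"{phrase}"^{entity_boost:g}')
    # Combined terms: plain keywords are added without a boost.
    clauses.extend(keywords)
    query = " ".join(clauses)
    if answer_type:
        # Hypothetical index field holding the expected answer type.
        query += f" +answer_type:{answer_type}"
    return query


print(build_query(["fee"], ["iit delhi", "btech"], "quantity"))
```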
27. B.5. Answer Candidate Searcher
● Index the <question, qtypes, entities, answer template> tuples in a training corpus.
● Retrieve a set of n <question, qtypes, entities, answer template> tuples given a new question.
● Based on the scores of the answers returned, decide the best answer to the new question.
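A toy version of this index, retrieve and decide loop is sketched below. The scoring function (entity overlap plus one point for a matching question type), the `min_score` threshold and the second record's placeholder answer are assumptions for illustration; a production system would query a real search index instead of scanning a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    question: str
    qtype: str
    entities: frozenset
    answer_template: str


# Tiny in-memory "index"; the first record reuses the slide's example.
KNOWLEDGE_BASE = [
    Record(
        "What is the fees of Btech course at IIT Delhi?",
        "quantity",
        frozenset({"fees", "btech", "iit d"}),
        "The fees for Btech from IIT D is 24000 INR.",
    ),
    Record(
        "When is the last date of Y?",
        "date",
        frozenset({"last date", "y"}),
        "The last date of Y is <date>.",  # placeholder, no real data
    ),
]


def score(record, qtype, entities):
    """Entity overlap, plus one point when the question type also matches."""
    return len(record.entities & entities) + (1 if record.qtype == qtype else 0)


def best_answer(records, qtype, entities, n=3, min_score=2):
    """Retrieve the top-n records and return the best answer, or None if too weak."""
    entities = frozenset(entities)
    ranked = sorted(records, key=lambda r: score(r, qtype, entities), reverse=True)[:n]
    if not ranked or score(ranked[0], qtype, entities) < min_score:
        return None
    return ranked[0].answer_template


print(best_answer(KNOWLEDGE_BASE, "quantity", {"btech", "iit d", "fees"}))
```

Returning None below the threshold is a deliberate choice in this sketch: for an instant answer, showing nothing is safer than showing a weakly matched fact.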
29. Where do we need Natural Language Processing?
● Tokenisation (words, numbers, punctuation, whitespace)
● Sentence detection
● Part of speech tagging (verbs, nouns, pronouns, etc.)
● Query entity recognition
● Chunking/Parsing (noun/verb phrases and relationships)
● Statistical modelling tools
● Dictionaries, word-lists, WordNet, VerbNet
● Template generation using grammar rules
30. So you are telling me there are ready-made NLP tools?
31. NLP tools problems
● Training data issues
– Training domains are completely different.
– Local English usage: slang, spelling, localisation
● Sentence detection failures:
– Bad style (capitalisation, punctuation)
– Ellipsis (i tried... it failed... error message...)
● Tokenisation failures:
– Multiple punctuation ???, !!! (student emphasis)
– Abbreviations (im, m.b.a, cant, doesnt, etc.)
● POS errors
– Spelling, grammar
● We need to experiment, modify code and train on our domain data!
32. What are the use cases of instant QA? How does it fit in our system?
33. Interaction
● If users do not write good English, minimize the amount they have to write. We can focus on capturing user intent with the least amount of typed text.
✔ Auto complete
✔ Guidance
✔ Spell check
✔ Auto correct
✔ Manual feedback on conflicts
✔ Make them write good queries
● This not only helps the user experience but also increases the accuracy of language-based statistical systems.
35. Shiksha: Integration with main search auto-suggestor
We will already be generating good quality questions, which could be integrated here.
36. 99acres
● Use cases similar to Shiksha.
● The real estate domain has more open-ended opinion questions and far fewer factoid questions.
● If a single text box search is introduced in future:
– SRP can cater not only listings but also question answers
– Instant QA would be really helpful for the user experience.
37. And many other use cases …...
Plus, some components of this system will be used separately to improve other existing systems.