SlideShare a Scribd company logo
1 of 4
Download to read offline
ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 5, Issue 12, December 2016
All Rights Reserved © 2016 IJARCET 2703

Abstract—Question Answering systems (QASs) do the task of
retrieving text portions from a collection of documents that
contain the answer to the user’s questions. These QASs use a
variety of linguistic tools that be able to deal with small
fragments of text. Therefore, to retrieve the documents which
contains the answer from a large document collections, QASs
employ Information Retrieval (IR) techniques to minimize the
number of documents collections to a treatable amount of
relevant text. In this paper, we propose a model for passage
retrieval model that do this task with a better performance for
the purpose of Arabic QASs. We first segment each the top five
ranked documents returned by the IR module into passages.
Then, we compute the similarity score between the user’s
question terms and each passage. The top five passages (with
high similarity score) are retrieved are retrieved. Finally,
Answer Extraction techniques are applied to extract the final
answer. Our method achieved an average for precision of
87.25%, Recall of 86.2% and F1-measure of 87%.
Index Terms—Answer Extraction ,Information Retrieval,
Passage Retrieval, Question Answering Systems.
I. INTRODUCTION
For the purpose of satisfying user requirements, a user
submit a query to the Information Retrieval (IR) systems, and
the IR system has to return a set of documents arranged by
their connectivity to the user’s query. Several techniques are
used to perform the process of document extraction, but most
of majority of them depends on pattern matching technique
that is based on the number of times that a user’s query terms
occur in each document, in addition to significance or
differentiation value of these terms in the document
collection. On the other hand, Question Answering systems
(QASs) try to improve the output produces by IR systems by
only retrieving small pieces of text that are likely to contain
the information that satisfy the question/query. In fact, QASs
consists of IR and Natural Language Processing(NLP)
techniques to carry out their task. This combination of IR &
NLP allows the minimization of the answer to the lowest
level that allows accurate answer detection and extraction.
However, knowing the fact that NLP techniques are
computationally expensive, QASs need to minimize the
amount of text in times when these techniques need to be
applied. Hence, they normally applied on the IR systems
output [1] that return the most relevant documents to the
user’s query considering that they will contain the required.
Most used IR systems are basically depending on three
Waheeb Ahmed, Department of Information Technology, Kannur
University., Kannur, Kerala, India, 7994702655,
Dr. Babu Anto P, Department of Information Technology, Kannur
University, Kannur, Kerala, India, 9447405362.
models: the cosine model, the pivoted cosine model, and the
probabilistic model [2] [3] [4]. In addition, IR systems
usually apply query expansion techniques that increase their
precision. These techniques may be based on thesaurus [5] or
on the combination of the most occurring terms in the top N
relevant documents [6]. In the literature, there several
Passage Retrieval (PR) systems that have also been proposed
for English language[7][8]. PR systems works on portions of
text so that they can determine the relevance of a document to
a query, besides detecting document fragments that are likely
to contain the required answer(rather than the full document).
Although PR systems use IR-based methods to do their work,
they are considered to be more effective than IR systems for
QA tasks. In this paper, we are proposing a model for PR
system for Arabic QA. The following section briefly discuss
the backgrounds in IR, PR and QA. Section 3 shows the
architecture of IR-n. Section 4 presents the evaluation
accomplished and finally, section 5 details conclusions and
work in progress. Open domain QASs are defined as software
capable of generating the answer to user questions directly
from open domain documents. In other words, systems that
can extract small fragments from texts, in a way it is possible
to infer the answer to a specific question from whose content
of these fragments. As a benefit, these systems attempts to
reduce the amount of time to find a concrete information.
QASs differs from the usual search engines like Google and
Yahoo. Such search engines returns a list of links in response
to a user query and it is the sole responsibility of the user to
go through these links and find the expected answer. In
contrary to these traditional search engines , QA systems tries
to find a precise answer to the user query. For a QA system to
successfully satisfy a user’s request, it is required by the QA
system to understand questions completely. From a linguistic
point of view, “understanding” means to perform several and
typical steps on processing of natural language : lexical,
syntactic and semantic. This processing takes much more
time than the statistical analysis that is performed in IR. In
addition, as QA systems have to deal with much text as done
for IR tasks, and the user requires the answer in a short period
of time, it is obligatory that first, an IR system process the
query and then, the QA system process deals with its output.
In this procedure, the time needed for analysis is highly
decreased[9] .
Answer Extraction and Passage Retrieval for
Question Answering Systems
Waheeb Ahmed, Dr. Babu Anto P
ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 5, Issue 12, December 2016
www.ijarcet.org 2704
Question Analysis
Document
Retrieval /IR
Answer Extraction
Natural
Language
Question
Passage Retrieval
Document
Segmentation
Passage Scoring
& Selection
II. METHODOLOGY
Our QA system consists of the following modules:
• Question Analysis.
• Document Retrieval.
• Passage Retrieval/Selection.
• Answer Extraction.
A. Question analysis
Question analysis module receive natural language
questions and converts them into queries for the document
retriever. This process includes mapping the user’s question
into query by employing sophisticated question analysis
techniques to generate complex and structured queries. As a
part of its function, this module also identifies the expected
answer type of a question. For example, the expected answer
type of
“‫اﻷﺧﻼﻗﯿﺎت؟‬ ‫ھﻲ‬ ‫“(”ﻣﺎ‬What is ethology?”) is
DESCRIPTION:defintion. That is, the question is requiring a
descriptive answer of type “definition”. This information
assists in guiding the answer extraction module to apply the
proper technique for extracting the answer. Another
example,"‫أﻋﻼم؟‬ ‫ﺑﺴﺘﺔ‬ ‫ﻋﺎﺷﺖ‬ ‫اﻟﺘﻲ‬ ‫اﻷﻣﺮﯾﻜﯿﺔ‬ ‫اﻟﻮﻻﯾﺔ‬ ‫ھﻲ‬ ‫("ﻣﺎ‬ “What U.S.
state lived under six flags ?”). The expected answer type for
this question is “LOCATION:state”. This indicates that the
question is looking for a location, specifically a “state” name.
B. Document retrieval
This module returns a list of ranked relevant documents
from the collection of documents /corpus. This process
reduces the corpus to a manageable set of documents for
additional processing. We used Vector Space Model(VSM)
[2]for document retrieval for its simplicity and efficiency.
C. Passage retrieval
This module analyze the retrieved set of documents and
returns a list of ranked passages scored with respect to query
terms. This module extracts the passages based on its
similarity with the user’s question. This module works
basically by document segmentation. For segmentation of
documents , our method first identifies sentences in the
document using punctuation marks as separators, and then it
identifies paragraphs by means of separators made by empty
lines. The passages are detected by merging paragraph
boundaries into segments until they satisfy the required
length ,e.g..500 characters). The separate passages do not
have any common content with each other, and the sliding
passages change in size with the difference of one paragraph
boundary, i.e., we start creating a new passage starting from
the initial beginning of each paragraph of the document. If the
boundaries of these paragraph are not detected, then these
adapting(sliding) passages are getting partially overlapped
with each other.
Fig. 1 QA system architecture with Passage Retrieval
The technique for passage retrieval works as follows:
1) Question terms are arranged based on the number of
documents they occur in. Terms that appear in fewer
documents are processed firstly.
2) The documents which contain any term of the
question are selected.
3) The similarity for each passage p (available in the
retrieved/selected documents) with the question q is
calculated using the following formulas:
Tf-Idf= tf x idf (1)
tf(t,d) = 1 + log ft,d (2)
idf=log(1+N/nt) (3)
Where,
Term frequency(tf)=the no. of times a term appears
in a document
inverse-document frequency(idf)=the no. of
documents in which the term appears Only the most
relevant passage of each document is selected for
retrieval. That is, the most similar passage to the
question terms is returned.
4) The passages that are selected are sorted by their
similarity measure. The passage with the highest
ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 5, Issue 12, December 2016
All Rights Reserved © 2016 IJARCET 2705
similarity score is put at the top of the list in
descending order.
5) Passages are connected with the document they
belongs to and they are sorted in a ranked list.
6) We used the similarity measure (the cosine measure)
presented in [2]. With one difference is that the size
of each passage (the count of terms) is not used to
normalize the results. In this proposed method we
are using N as the number of documents in the
collection, instead of the number of passages based
on the methodology proposed in [9].
D. Answer extraction
Answer Extraction module process the top ranked passages
for extracting the final answer to the user’s question.
Typically, named-entity recognition technique is used to find
candidate answers that match the question’s named entities.
For questions that are looking for dates a pattern matching
technique is used. Questions that requires descriptive
answers like how & why cosine similarity is used to find the
answers.
III. RESULTS AND EVALUATION
Our method is tested using 200 questions and the corpus
consisted of 500 documents extracted from Arabic Wikipedia.
The measures used for calculating the performance of our
passage retrieval technique are the same which used for
evaluating TREC-10 [10] project:
Precision is a measure of the capability of the system to return
only relevant passages.
Precision=
Recall=
F1-measure=2x
Table 1 The precision, recall and F1 measure after using
the passage retrieval technique.
Question set Precision(%) Recall(%) F1-measure
First set
(50 questions)
84 79 81.4
Second set
(50 questions)
86 92 88.8
Third set
(50 questions)
91 85 87.9
Fourth set
(50 questions)
88 89 88.5
AVERAGE 87.25 86.2 87
Table 1 shows the different sets of questions along with
their scored values by the QA system for precision, recall,
and F1-measure. At bottom of the table is the average value
for each one of them which is considered the average final
performance of our QA system which employs both the
passage retrieval method and answer extraction techniques.
Fig. 2 Performance distribution of our QA system
employing passage retrieval technique.
Fig. 3. Distribution of the average value for precision,
recall and F1-measure.
Fig. 2 shows the performance of our QA systems after
submitting 200 questions in four separate runs. The result
is generally promising comparing to existing QA systems
in different languages like English.
Fig. 3. Shows the average values of precision, recall and
f1-measure which is very promising result and can be used
in the enhancement of Arabic question Answering
systems.
IV. CONCLUSION
The obtained results in this paper shows that it is
possible to highly improve answer passage retrieval and
ranking by incorporating several factors including
document segmentation into passages, measuring
similarity of the passages with the user question and
ranking the candidate answers. Our passage retrieval
technique can be used for developing efficient Arabic QA
ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 5, Issue 12, December 2016
www.ijarcet.org 2706
systems.
REFERENCES
[1] A. Lally, J. M. Prager, M. C. McCord, B. K. Boguraev, S.
Patwardhan, J. Fan P. Fodor and J. Chu-Carroll. ,” Question
analysis: How Watson reads a clue”, IBM Journal of Research &
Development, Vol. 56 ,No. 2, July 2012.
[2] G Salton. “Automatic Text Processing: The Transformation,
Analysis, and Retrieval of Information by Computer” ,Addison
Wesley Publishing, New York. 1989.
[3] A. Singhal, C. Buckley and M. Mitra, “Pivoted document length
normalization.”, Proceedings of the 19th
annual international
ACM- 1996.
[4] G. Venner, S. Walker and S. Okapi ,”Best match System”.
Microcomputer networking in libraries II. Vine, 48, pp. 22-26.,
[5] 1983.
[6] W. Mohammed, A. Basim and A. Adnan, “The effect of using a
thesaurus in Arabic information retrieval system”, International
Journal of Computer Science Issues(IJCSI), Vol. 9, No. 1, pp.
431-435., November 2012.
[7] R. Cummins, & C. O’Riordan, “An axiomatic comparison of
learned term-weighting schemes in information retrieval:
clarifications and extensions”. Artificial Intelligence. Rev., Vol.
28, No. 1, pp. 51–68, 2007.
[8] A. Mahboob and V. Suzan, “Passage Retrieval for Question
Answering using Sliding Windows”, In Proceedings of the 2nd
workshop on Information Retrieval for Question
Answering(Coling 2008), pp. 26-33, August, 2008.
[9] Hong Sun et. al. ,“ Answer Extraction from Passage Graph for
Question Answering”, In Proceedings of the Twenty-Third
International Joint Conference on Artificial Intelligence, pp.
2169-2175, August 2013.
[10] W. Bruce Croft, “Answer models for question answering
passage retrieval”, Proceedings of the 27th annual international
ACM SIGIR conference on Research and development in
information retrieval, pp. 516-517, July 2004.
[11] E. Voorhees,” Overview of the TREC 2001 question answering
track,” Proceedings of the 10th Text Retrieval Conference, pp.
42–52, 2001.

More Related Content

What's hot

Open domain question answering system using semantic role labeling
Open domain question answering system using semantic role labelingOpen domain question answering system using semantic role labeling
Open domain question answering system using semantic role labelingeSAT Publishing House
 
A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationNinad Samel
 
FINDING OUT NOISY PATTERNS FOR RELATION EXTRACTION OF BANGLA SENTENCES
FINDING OUT NOISY PATTERNS FOR RELATION EXTRACTION OF BANGLA SENTENCESFINDING OUT NOISY PATTERNS FOR RELATION EXTRACTION OF BANGLA SENTENCES
FINDING OUT NOISY PATTERNS FOR RELATION EXTRACTION OF BANGLA SENTENCESkevig
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural networkINFOGAIN PUBLICATION
 
A Review on Text Mining in Data Mining
A Review on Text Mining in Data MiningA Review on Text Mining in Data Mining
A Review on Text Mining in Data Miningijsc
 
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...idescitation
 
Using Cisco Network Components to Improve NIDPS Performance
Using Cisco Network Components to Improve NIDPS Performance Using Cisco Network Components to Improve NIDPS Performance
Using Cisco Network Components to Improve NIDPS Performance csandit
 
A report of the work done in this project is available here
A report of the work done in this project is available hereA report of the work done in this project is available here
A report of the work done in this project is available herebutest
 
IRJET- Survey for Amazon Fine Food Reviews
IRJET- Survey for Amazon Fine Food ReviewsIRJET- Survey for Amazon Fine Food Reviews
IRJET- Survey for Amazon Fine Food ReviewsIRJET Journal
 
Automatic document clustering
Automatic document clusteringAutomatic document clustering
Automatic document clusteringIAEME Publication
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
Project Presentation
Project PresentationProject Presentation
Project Presentationbutest
 
Paper id 25201435
Paper id 25201435Paper id 25201435
Paper id 25201435IJRAT
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptbutest
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
 
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...home
 
Novel Database-Centric Framework for Incremental Information Extraction
Novel Database-Centric Framework for Incremental Information ExtractionNovel Database-Centric Framework for Incremental Information Extraction
Novel Database-Centric Framework for Incremental Information Extractionijsrd.com
 
G04124041046
G04124041046G04124041046
G04124041046IOSR-JEN
 

What's hot (20)

Open domain question answering system using semantic role labeling
Open domain question answering system using semantic role labelingOpen domain question answering system using semantic role labeling
Open domain question answering system using semantic role labeling
 
A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorization
 
FINDING OUT NOISY PATTERNS FOR RELATION EXTRACTION OF BANGLA SENTENCES
FINDING OUT NOISY PATTERNS FOR RELATION EXTRACTION OF BANGLA SENTENCESFINDING OUT NOISY PATTERNS FOR RELATION EXTRACTION OF BANGLA SENTENCES
FINDING OUT NOISY PATTERNS FOR RELATION EXTRACTION OF BANGLA SENTENCES
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network
 
A Review on Text Mining in Data Mining
A Review on Text Mining in Data MiningA Review on Text Mining in Data Mining
A Review on Text Mining in Data Mining
 
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...
 
Using Cisco Network Components to Improve NIDPS Performance
Using Cisco Network Components to Improve NIDPS Performance Using Cisco Network Components to Improve NIDPS Performance
Using Cisco Network Components to Improve NIDPS Performance
 
A Review on Novel Scoring System for Identify Accurate Answers for Factoid Qu...
A Review on Novel Scoring System for Identify Accurate Answers for Factoid Qu...A Review on Novel Scoring System for Identify Accurate Answers for Factoid Qu...
A Review on Novel Scoring System for Identify Accurate Answers for Factoid Qu...
 
[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma
 
A report of the work done in this project is available here
A report of the work done in this project is available hereA report of the work done in this project is available here
A report of the work done in this project is available here
 
IRJET- Survey for Amazon Fine Food Reviews
IRJET- Survey for Amazon Fine Food ReviewsIRJET- Survey for Amazon Fine Food Reviews
IRJET- Survey for Amazon Fine Food Reviews
 
Automatic document clustering
Automatic document clusteringAutomatic document clustering
Automatic document clustering
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
Project Presentation
Project PresentationProject Presentation
Project Presentation
 
Paper id 25201435
Paper id 25201435Paper id 25201435
Paper id 25201435
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector Quantization
 
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...
 
Novel Database-Centric Framework for Incremental Information Extraction
Novel Database-Centric Framework for Incremental Information ExtractionNovel Database-Centric Framework for Incremental Information Extraction
Novel Database-Centric Framework for Incremental Information Extraction
 
G04124041046
G04124041046G04124041046
G04124041046
 

Viewers also liked

Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Petra Galuscakova
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
 
Asteroïde
AsteroïdeAsteroïde
Asteroïder0308651
 
Airport uses around the country
Airport uses around the countryAirport uses around the country
Airport uses around the countrymblamey
 
Some bush foods, medicines & crafts
Some bush foods, medicines & craftsSome bush foods, medicines & crafts
Some bush foods, medicines & craftsGuurrbi Cooktown
 
Ramses carrillo
Ramses carrilloRamses carrillo
Ramses carrilloramchute
 
Amcult term presentation
Amcult term presentationAmcult term presentation
Amcult term presentationAlan Yu
 
Project bio
Project bioProject bio
Project biogmata1
 
Giao trinh php 2009 vo duy tuan - final
Giao trinh php 2009   vo duy tuan - finalGiao trinh php 2009   vo duy tuan - final
Giao trinh php 2009 vo duy tuan - finalhieusy
 
The rhetoricalsituationpeterson001w
The rhetoricalsituationpeterson001wThe rhetoricalsituationpeterson001w
The rhetoricalsituationpeterson001wnathanaelgrace
 
Politica exterior
Politica exteriorPolitica exterior
Politica exteriorjavier Soto
 
懂我,不難:內向心理學(一)
懂我,不難:內向心理學(一)懂我,不難:內向心理學(一)
懂我,不難:內向心理學(一)Wan Jen Huang
 

Viewers also liked (20)

Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Asteroïde
AsteroïdeAsteroïde
Asteroïde
 
Rubric
RubricRubric
Rubric
 
Finding Article Spring 2012
Finding Article Spring 2012Finding Article Spring 2012
Finding Article Spring 2012
 
Airport uses around the country
Airport uses around the countryAirport uses around the country
Airport uses around the country
 
Some bush foods, medicines & crafts
Some bush foods, medicines & craftsSome bush foods, medicines & crafts
Some bush foods, medicines & crafts
 
autism
autismautism
autism
 
Ramses carrillo
Ramses carrilloRamses carrillo
Ramses carrillo
 
Ondeo IS Bedrijfspresentatie
Ondeo IS BedrijfspresentatieOndeo IS Bedrijfspresentatie
Ondeo IS Bedrijfspresentatie
 
Amcult term presentation
Amcult term presentationAmcult term presentation
Amcult term presentation
 
Escepticismo (2) (1)
Escepticismo (2) (1)Escepticismo (2) (1)
Escepticismo (2) (1)
 
Vietnam bus1
Vietnam bus1Vietnam bus1
Vietnam bus1
 
31. cost of bc care
31. cost of bc care31. cost of bc care
31. cost of bc care
 
Project bio
Project bioProject bio
Project bio
 
Workshop Production Live
Workshop Production Live Workshop Production Live
Workshop Production Live
 
Giao trinh php 2009 vo duy tuan - final
Giao trinh php 2009   vo duy tuan - finalGiao trinh php 2009   vo duy tuan - final
Giao trinh php 2009 vo duy tuan - final
 
The rhetoricalsituationpeterson001w
The rhetoricalsituationpeterson001wThe rhetoricalsituationpeterson001w
The rhetoricalsituationpeterson001w
 
Politica exterior
Politica exteriorPolitica exterior
Politica exterior
 
懂我,不難:內向心理學(一)
懂我,不難:內向心理學(一)懂我,不難:內向心理學(一)
懂我,不難:內向心理學(一)
 

Similar to Answer extraction and passage retrieval for

Application of hidden markov model in question answering systems
Application of hidden markov model in question answering systemsApplication of hidden markov model in question answering systems
Application of hidden markov model in question answering systemsijcsa
 
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMCANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMIRJET Journal
 
Arabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodArabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodijcsit
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrievalidescitation
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...IRJET Journal
 
An evaluation and overview of indices
An evaluation and overview of indicesAn evaluation and overview of indices
An evaluation and overview of indicesIJCSEA Journal
 
Testing Different Log Bases for Vector Model Weighting Technique
Testing Different Log Bases for Vector Model Weighting TechniqueTesting Different Log Bases for Vector Model Weighting Technique
Testing Different Log Bases for Vector Model Weighting Techniquekevig
 
Testing Different Log Bases for Vector Model Weighting Technique
Testing Different Log Bases for Vector Model Weighting TechniqueTesting Different Log Bases for Vector Model Weighting Technique
Testing Different Log Bases for Vector Model Weighting Techniquekevig
 
SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISON
SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISONSIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISON
SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISONIJCSEA Journal
 
ON THE RELEVANCE OF QUERY EXPANSION USING PARALLEL CORPORA AND WORD EMBEDDING...
ON THE RELEVANCE OF QUERY EXPANSION USING PARALLEL CORPORA AND WORD EMBEDDING...ON THE RELEVANCE OF QUERY EXPANSION USING PARALLEL CORPORA AND WORD EMBEDDING...
ON THE RELEVANCE OF QUERY EXPANSION USING PARALLEL CORPORA AND WORD EMBEDDING...ijnlc
 
IRJET - BOT Virtual Guide
IRJET -  	  BOT Virtual GuideIRJET -  	  BOT Virtual Guide
IRJET - BOT Virtual GuideIRJET Journal
 
Advantages of Query Biased Summaries in Information Retrieval
Advantages of Query Biased Summaries in Information RetrievalAdvantages of Query Biased Summaries in Information Retrieval
Advantages of Query Biased Summaries in Information RetrievalOnur Yılmaz
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageIJERA Editor
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code executionAlexander Decker
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code executionAlexander Decker
 
Using data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text miningUsing data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text miningeSAT Journals
 

Similar to Answer extraction and passage retrieval for (20)

Application of hidden markov model in question answering systems
Application of hidden markov model in question answering systemsApplication of hidden markov model in question answering systems
Application of hidden markov model in question answering systems
 
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMCANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
 
Arabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodArabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation method
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrieval
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...
 
An evaluation and overview of indices
An evaluation and overview of indicesAn evaluation and overview of indices
An evaluation and overview of indices
 
Testing Different Log Bases for Vector Model Weighting Technique
Testing Different Log Bases for Vector Model Weighting TechniqueTesting Different Log Bases for Vector Model Weighting Technique
Testing Different Log Bases for Vector Model Weighting Technique
 
Testing Different Log Bases for Vector Model Weighting Technique
Testing Different Log Bases for Vector Model Weighting TechniqueTesting Different Log Bases for Vector Model Weighting Technique
Testing Different Log Bases for Vector Model Weighting Technique
 
SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISON
SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISONSIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISON
SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISON
 
Ju3517011704
Ju3517011704Ju3517011704
Ju3517011704
 
ON THE RELEVANCE OF QUERY EXPANSION USING PARALLEL CORPORA AND WORD EMBEDDING...
ON THE RELEVANCE OF QUERY EXPANSION USING PARALLEL CORPORA AND WORD EMBEDDING...ON THE RELEVANCE OF QUERY EXPANSION USING PARALLEL CORPORA AND WORD EMBEDDING...
ON THE RELEVANCE OF QUERY EXPANSION USING PARALLEL CORPORA AND WORD EMBEDDING...
 
IRJET - BOT Virtual Guide
IRJET -  	  BOT Virtual GuideIRJET -  	  BOT Virtual Guide
IRJET - BOT Virtual Guide
 
Advantages of Query Biased Summaries in Information Retrieval
Advantages of Query Biased Summaries in Information RetrievalAdvantages of Query Biased Summaries in Information Retrieval
Advantages of Query Biased Summaries in Information Retrieval
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu Language
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
 
[IJET-V2I1P1] Authors:Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
[IJET-V2I1P1] Authors:Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar[IJET-V2I1P1] Authors:Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
[IJET-V2I1P1] Authors:Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
 
Using data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text miningUsing data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text mining
 
[IJET V2I3P13] Authors: Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
[IJET V2I3P13] Authors: Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar[IJET V2I3P13] Authors: Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
[IJET V2I3P13] Authors: Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
 
Novel Scoring System for Identify Accurate Answers for Factoid Questions
Novel Scoring System for Identify Accurate Answers for Factoid QuestionsNovel Scoring System for Identify Accurate Answers for Factoid Questions
Novel Scoring System for Identify Accurate Answers for Factoid Questions
 

Recently uploaded

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Recently uploaded (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Answer extraction and passage retrieval for

  • 1. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) Volume 5, Issue 12, December 2016 All Rights Reserved © 2016 IJARCET 2703  Abstract—Question Answering systems (QASs) do the task of retrieving text portions from a collection of documents that contain the answer to the user’s questions. These QASs use a variety of linguistic tools that be able to deal with small fragments of text. Therefore, to retrieve the documents which contains the answer from a large document collections, QASs employ Information Retrieval (IR) techniques to minimize the number of documents collections to a treatable amount of relevant text. In this paper, we propose a model for passage retrieval model that do this task with a better performance for the purpose of Arabic QASs. We first segment each the top five ranked documents returned by the IR module into passages. Then, we compute the similarity score between the user’s question terms and each passage. The top five passages (with high similarity score) are retrieved are retrieved. Finally, Answer Extraction techniques are applied to extract the final answer. Our method achieved an average for precision of 87.25%, Recall of 86.2% and F1-measure of 87%. Index Terms—Answer Extraction ,Information Retrieval, Passage Retrieval, Question Answering Systems. I. INTRODUCTION For the purpose of satisfying user requirements, a user submit a query to the Information Retrieval (IR) systems, and the IR system has to return a set of documents arranged by their connectivity to the user’s query. Several techniques are used to perform the process of document extraction, but most of majority of them depends on pattern matching technique that is based on the number of times that a user’s query terms occur in each document, in addition to significance or differentiation value of these terms in the document collection. On the other hand, Question Answering systems (QASs) try to improve the output produces by IR systems by only retrieving small pieces of text that are likely to contain the information that satisfy the question/query. In fact, QASs consists of IR and Natural Language Processing(NLP) techniques to carry out their task. This combination of IR & NLP allows the minimization of the answer to the lowest level that allows accurate answer detection and extraction. However, knowing the fact that NLP techniques are computationally expensive, QASs need to minimize the amount of text in times when these techniques need to be applied. Hence, they normally applied on the IR systems output [1] that return the most relevant documents to the user’s query considering that they will contain the required. Most used IR systems are basically depending on three Waheeb Ahmed, Department of Information Technology, Kannur University., Kannur, Kerala, India, 7994702655, Dr. Babu Anto P, Department of Information Technology, Kannur University, Kannur, Kerala, India, 9447405362. models: the cosine model, the pivoted cosine model, and the probabilistic model [2] [3] [4]. In addition, IR systems usually apply query expansion techniques that increase their precision. These techniques may be based on thesaurus [5] or on the combination of the most occurring terms in the top N relevant documents [6]. In the literature, there several Passage Retrieval (PR) systems that have also been proposed for English language[7][8]. PR systems works on portions of text so that they can determine the relevance of a document to a query, besides detecting document fragments that are likely to contain the required answer(rather than the full document). Although PR systems use IR-based methods to do their work, they are considered to be more effective than IR systems for QA tasks. In this paper, we are proposing a model for PR system for Arabic QA. The following section briefly discuss the backgrounds in IR, PR and QA. Section 3 shows the architecture of IR-n. Section 4 presents the evaluation accomplished and finally, section 5 details conclusions and work in progress. Open domain QASs are defined as software capable of generating the answer to user questions directly from open domain documents. In other words, systems that can extract small fragments from texts, in a way it is possible to infer the answer to a specific question from whose content of these fragments. As a benefit, these systems attempts to reduce the amount of time to find a concrete information. QASs differs from the usual search engines like Google and Yahoo. Such search engines returns a list of links in response to a user query and it is the sole responsibility of the user to go through these links and find the expected answer. In contrary to these traditional search engines , QA systems tries to find a precise answer to the user query. For a QA system to successfully satisfy a user’s request, it is required by the QA system to understand questions completely. From a linguistic point of view, “understanding” means to perform several and typical steps on processing of natural language : lexical, syntactic and semantic. This processing takes much more time than the statistical analysis that is performed in IR. In addition, as QA systems have to deal with much text as done for IR tasks, and the user requires the answer in a short period of time, it is obligatory that first, an IR system process the query and then, the QA system process deals with its output. In this procedure, the time needed for analysis is highly decreased[9] . Answer Extraction and Passage Retrieval for Question Answering Systems Waheeb Ahmed, Dr. Babu Anto P
  • 2. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) Volume 5, Issue 12, December 2016 www.ijarcet.org 2704 Question Analysis Document Retrieval /IR Answer Extraction Natural Language Question Passage Retrieval Document Segmentation Passage Scoring & Selection II. METHODOLOGY Our QA system consists of the following modules: • Question Analysis. • Document Retrieval. • Passage Retrieval/Selection. • Answer Extraction. A. Question analysis Question analysis module receive natural language questions and converts them into queries for the document retriever. This process includes mapping the user’s question into query by employing sophisticated question analysis techniques to generate complex and structured queries. As a part of its function, this module also identifies the expected answer type of a question. For example, the expected answer type of “‫اﻷﺧﻼﻗﯿﺎت؟‬ ‫ھﻲ‬ ‫“(”ﻣﺎ‬What is ethology?”) is DESCRIPTION:defintion. That is, the question is requiring a descriptive answer of type “definition”. This information assists in guiding the answer extraction module to apply the proper technique for extracting the answer. Another example,"‫أﻋﻼم؟‬ ‫ﺑﺴﺘﺔ‬ ‫ﻋﺎﺷﺖ‬ ‫اﻟﺘﻲ‬ ‫اﻷﻣﺮﯾﻜﯿﺔ‬ ‫اﻟﻮﻻﯾﺔ‬ ‫ھﻲ‬ ‫("ﻣﺎ‬ “What U.S. state lived under six flags ?”). The expected answer type for this question is “LOCATION:state”. This indicates that the question is looking for a location, specifically a “state” name. B. Document retrieval This module returns a list of ranked relevant documents from the collection of documents /corpus. This process reduces the corpus to a manageable set of documents for additional processing. We used Vector Space Model(VSM) [2]for document retrieval for its simplicity and efficiency. C. Passage retrieval This module analyze the retrieved set of documents and returns a list of ranked passages scored with respect to query terms. This module extracts the passages based on its similarity with the user’s question. This module works basically by document segmentation. For segmentation of documents , our method first identifies sentences in the document using punctuation marks as separators, and then it identifies paragraphs by means of separators made by empty lines. The passages are detected by merging paragraph boundaries into segments until they satisfy the required length ,e.g..500 characters). The separate passages do not have any common content with each other, and the sliding passages change in size with the difference of one paragraph boundary, i.e., we start creating a new passage starting from the initial beginning of each paragraph of the document. If the boundaries of these paragraph are not detected, then these adapting(sliding) passages are getting partially overlapped with each other. Fig. 1 QA system architecture with Passage Retrieval The technique for passage retrieval works as follows: 1) Question terms are arranged based on the number of documents they occur in. Terms that appear in fewer documents are processed firstly. 2) The documents which contain any term of the question are selected. 3) The similarity for each passage p (available in the retrieved/selected documents) with the question q is calculated using the following formulas: Tf-Idf= tf x idf (1) tf(t,d) = 1 + log ft,d (2) idf=log(1+N/nt) (3) Where, Term frequency(tf)=the no. of times a term appears in a document inverse-document frequency(idf)=the no. of documents in which the term appears Only the most relevant passage of each document is selected for retrieval. That is, the most similar passage to the question terms is returned. 4) The passages that are selected are sorted by their similarity measure. The passage with the highest
  • 3. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) Volume 5, Issue 12, December 2016 All Rights Reserved © 2016 IJARCET 2705 similarity score is put at the top of the list in descending order. 5) Passages are connected with the document they belongs to and they are sorted in a ranked list. 6) We used the similarity measure (the cosine measure) presented in [2]. With one difference is that the size of each passage (the count of terms) is not used to normalize the results. In this proposed method we are using N as the number of documents in the collection, instead of the number of passages based on the methodology proposed in [9]. D. Answer extraction Answer Extraction module process the top ranked passages for extracting the final answer to the user’s question. Typically, named-entity recognition technique is used to find candidate answers that match the question’s named entities. For questions that are looking for dates a pattern matching technique is used. Questions that requires descriptive answers like how & why cosine similarity is used to find the answers. III. RESULTS AND EVALUATION Our method is tested using 200 questions and the corpus consisted of 500 documents extracted from Arabic Wikipedia. The measures used for calculating the performance of our passage retrieval technique are the same which used for evaluating TREC-10 [10] project: Precision is a measure of the capability of the system to return only relevant passages. Precision= Recall= F1-measure=2x Table 1 The precision, recall and F1 measure after using the passage retrieval technique. Question set Precision(%) Recall(%) F1-measure First set (50 questions) 84 79 81.4 Second set (50 questions) 86 92 88.8 Third set (50 questions) 91 85 87.9 Fourth set (50 questions) 88 89 88.5 AVERAGE 87.25 86.2 87 Table 1 shows the different sets of questions along with their scored values by the QA system for precision, recall, and F1-measure. At bottom of the table is the average value for each one of them which is considered the average final performance of our QA system which employs both the passage retrieval method and answer extraction techniques. Fig. 2 Performance distribution of our QA system employing passage retrieval technique. Fig. 3. Distribution of the average value for precision, recall and F1-measure. Fig. 2 shows the performance of our QA systems after submitting 200 questions in four separate runs. The result is generally promising comparing to existing QA systems in different languages like English. Fig. 3. Shows the average values of precision, recall and f1-measure which is very promising result and can be used in the enhancement of Arabic question Answering systems. IV. CONCLUSION The obtained results in this paper shows that it is possible to highly improve answer passage retrieval and ranking by incorporating several factors including document segmentation into passages, measuring similarity of the passages with the user question and ranking the candidate answers. Our passage retrieval technique can be used for developing efficient Arabic QA
  • 4. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) Volume 5, Issue 12, December 2016 www.ijarcet.org 2706 systems. REFERENCES [1] A. Lally, J. M. Prager, M. C. McCord, B. K. Boguraev, S. Patwardhan, J. Fan P. Fodor and J. Chu-Carroll. ,” Question analysis: How Watson reads a clue”, IBM Journal of Research & Development, Vol. 56 ,No. 2, July 2012. [2] G Salton. “Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer” ,Addison Wesley Publishing, New York. 1989. [3] A. Singhal, C. Buckley and M. Mitra, “Pivoted document length normalization.”, Proceedings of the 19th annual international ACM- 1996. [4] G. Venner, S. Walker and S. Okapi ,”Best match System”. Microcomputer networking in libraries II. Vine, 48, pp. 22-26., [5] 1983. [6] W. Mohammed, A. Basim and A. Adnan, “The effect of using a thesaurus in Arabic information retrieval system”, International Journal of Computer Science Issues(IJCSI), Vol. 9, No. 1, pp. 431-435., November 2012. [7] R. Cummins, & C. O’Riordan, “An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions”. Artificial Intelligence. Rev., Vol. 28, No. 1, pp. 51–68, 2007. [8] A. Mahboob and V. Suzan, “Passage Retrieval for Question Answering using Sliding Windows”, In Proceedings of the 2nd workshop on Information Retrieval for Question Answering(Coling 2008), pp. 26-33, August, 2008. [9] Hong Sun et. al. ,“ Answer Extraction from Passage Graph for Question Answering”, In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 2169-2175, August 2013. [10] W. Bruce Croft, “Answer models for question answering passage retrieval”, Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 516-517, July 2004. [11] E. Voorhees,” Overview of the TREC 2001 question answering track,” Proceedings of the 10th Text Retrieval Conference, pp. 42–52, 2001.