Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, and Future Trends

A brief survey presentation on Arabic Question Answering (AQA), covering the Natural Language Processing and Information Retrieval approaches to Question Analysis, Passage Retrieval, and Answer Extraction. It also lists the NLP tools used in AQA and discusses the challenges and future trends in this area.
To cite this paper, you can download it here:
http://www.acit2k.org/ACIT/2012Proceedings/13106.pdf


Transcript

  • 1. A Survey of Arabic Question Answering: Challenges, Tasks, Approaches, Tools, and Future Trends
      Ahmed Magdy & Dr. Mohamed Shaheen
      ACIT 2012
  • 2. Outline
      ● Motivation
      ● Question Answering Tasks: Question Analysis, Passage Retrieval, and Answer Extraction
      ● Arabic Language Challenges
      ● Approaches: Stemming, Named Entity Recognition, Language Resources
      ● Tools
      ● Future Trends and Open Issues
  • 3. Motivation
      ● Arabic is the 6th most important language
      ● More than 300 million speakers
      ● Increasing amounts of Arabic content on the Internet
      ● Increasing demand for information
      ● There is no survey that covers Arabic Question Answering
  • 4. Question Answering Tasks
      ● Question Analysis
      ● Passage Retrieval
      ● Answer Extraction
  • 5. Question Analysis
      ● Tokenization and normalization
      ● Stop-word removal
      ● Named Entity Recognition (gazetteer, maxent model)
      ● Stemming of all words except Named Entities
      ● Question focus determination by extracting the main NE
      ● Keyword extraction and expansion
      ● Answer type extraction from question words (Name, Place, Date, Quantity)
      ● Query generation: combining the keywords into a Boolean formula
      ● Experiments with cross-language Arabic/English QA
          – Not promising because of translation ambiguity
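
A minimal Python sketch of some of the question-analysis steps on slide 5 (normalization, tokenization, stop-word removal, answer-type detection from the question word, and Boolean query generation). The stop-word list, the question-word table, the normalization rules, and the function names are illustrative assumptions, not resources taken from the surveyed systems; NER, stemming, and keyword expansion are left out.

```python
import re

# Diacritics (fathatan..sukun) plus tatweel, and runs of Arabic letters.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
ARABIC_WORD = re.compile("[\u0621-\u063A\u0641-\u064A]+")


def normalize(text):
    """Strip diacritics and unify letter variants (one common convention, assumed here)."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[\u0623\u0625\u0622]", "\u0627", text)  # hamza/madda alef forms -> bare alef
    return text.replace("\u0649", "\u064A")  # alef maqsura -> yeh


# Assumption: tiny illustrative resources; real systems use much larger lists.
# Both are normalized so that they match normalized question tokens.
STOP_WORDS = {normalize(w) for w in ["في", "على", "هو", "هي", "عن"]}
ANSWER_TYPES = {
    normalize(k): v
    for k, v in {"من": "Name", "أين": "Place", "متى": "Date", "كم": "Quantity"}.items()
}


def analyze_question(question):
    """Return the expected answer type, the keywords, and a Boolean AND query."""
    tokens = ARABIC_WORD.findall(normalize(question))
    if not tokens:
        return {"answer_type": None, "keywords": [], "query": ""}
    # Only the first token is treated as the question word in this sketch.
    answer_type = ANSWER_TYPES.get(tokens[0])
    rest = tokens[1:] if answer_type else tokens
    keywords = [t for t in rest if t not in STOP_WORDS]
    return {"answer_type": answer_type, "keywords": keywords, "query": " AND ".join(keywords)}


result = analyze_question("متى ولد نجيب محفوظ؟")
print(result["answer_type"], result["query"])
```

Looking only at the first token keeps the sketch simple; it would misread questions that start with a preposition or that use an ambiguous question word.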
  • 6. Passage Retrieval
      ● Systems used:
          – Systems based on Salton’s vector space model
          – JIRS passage retrieval system
      ● Ranking retrieved passages according to:
          – Answer and question word counts
          – Answer and question word association
          – Query word weights
          – Cosine similarity between document words and question words
          – Distance Density N-gram Model
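
As an illustration of the cosine-similarity criterion on slide 6, the sketch below ranks tokenized passages against a tokenized question using raw term-frequency vectors. The function names and the use of unweighted counts are assumptions made for brevity; the surveyed systems may weight terms differently (for example with tf-idf).

```python
import math
from collections import Counter


def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bags of words (raw term frequencies)."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_passages(question_tokens, passages):
    """Return passage indices ordered from most to least similar to the question."""
    scored = [(cosine(question_tokens, p), i) for i, p in enumerate(passages)]
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


question = ["capital", "egypt"]
passages = [
    ["cairo", "is", "the", "capital", "of", "egypt"],
    ["egypt", "exports", "cotton"],
    ["paris", "france"],
]
print(rank_passages(question, passages))
```

English tokens are used here only to keep the example easy to read; the same code works on normalized, stemmed Arabic tokens.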
  • 7. Answer Extraction
      ● Ranking candidate answers according to:
          – Manual lexical patterns
          – Answer snippet position
          – Question word frequencies in the answer
          – Matching using N-grams
          – Selecting answers with NEs of the expected answer type
          – Semantic similarity between the question’s focus and the answer
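
Two of the answer-extraction criteria on slide 7 (keeping only candidates whose named-entity type matches the expected answer type, then ranking by question-word frequency in the snippet) can be sketched as follows. The candidate structure with `text`, `ne_type`, and `snippet_tokens` fields is hypothetical and chosen only for this example.

```python
def rank_answers(candidates, expected_type, question_keywords):
    """Filter candidates by NE type, then sort by question-keyword hits in the snippet."""
    keywords = set(question_keywords)
    typed = [c for c in candidates if c["ne_type"] == expected_type]

    def keyword_hits(candidate):
        return sum(1 for t in candidate["snippet_tokens"] if t in keywords)

    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(typed, key=keyword_hits, reverse=True)


candidates = [
    {"text": "1988", "ne_type": "Date", "snippet_tokens": ["nobel", "prize", "1988"]},
    {"text": "Cairo", "ne_type": "Place", "snippet_tokens": ["mahfouz", "born", "cairo"]},
    {"text": "1911", "ne_type": "Date", "snippet_tokens": ["mahfouz", "born", "1911", "cairo"]},
]
best = rank_answers(candidates, "Date", ["mahfouz", "born"])
print([c["text"] for c in best])
```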
  • 8. Challenges
      ● Arabic morphology is highly inflectional
          – Many affixes (articles, prepositions, pronouns, etc.)
      ● Arabic morphology is highly derivational
          – 10,000 roots and 120 patterns for derivation
      ● No capital letters in Named Entities
          – Unlike Latin-based languages
      ● Scarcity of Arabic language resources
          – Corpora, lexicons, and machine-readable dictionaries
  • 9. Approaches
      ● Stemming
          – Removing prefixes, suffixes, and infixes from words
          – Matching the root against patterns
          – Language-dependent rules
          – Identifying the most frequently used affixes statistically
      ● Named Entity Recognition
          – Maxent model or CRF
          – ANERcorp and ANERgazet
      ● Language Resources
          – Arabic WordNet
          – Penn Arabic Treebank
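
The affix-removal idea behind stemming on slide 9 can be illustrated with a small light stemmer. This is a sketch under stated assumptions: the prefix and suffix lists and the minimum stem length are illustrative choices, and the code does not handle infixes or root/pattern matching, which root-based stemmers such as Khoja's address with a root dictionary.

```python
# Assumption: short illustrative affix lists, tried longest first.
PREFIXES = sorted(["ال", "وال", "بال", "كال", "فال", "لل", "و"], key=len, reverse=True)
SUFFIXES = sorted(["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"], key=len, reverse=True)
MIN_STEM = 3  # assumed lower bound on what must remain after stripping


def light_stem(word):
    """Strip at most one prefix and one suffix, keeping at least MIN_STEM letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[:-len(suffix)]
            break
    return word


for w in ["المكتبات", "والكتاب", "في"]:
    print(w, "->", light_stem(w))
```

In a QA pipeline the stemmer would be applied to every token except recognized Named Entities, as slide 5 notes.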
  • 10. Tools
      ● NooJ for Arabic NLP
          – C# .NET freeware linguistic engineering development environment
          – Supports regular expressions and context-free grammars
          – Has Arabic language resources (sample text and dictionary)
      ● Amine Platform
          – Java platform for intelligent systems and multi-agent systems
          – Used for semantic analysis of questions and answers
          – Uses conceptual graphs, knowledge bases, and ontologies
      ● JIRS, a Java passage retrieval system
          – Search based on question n-grams
          – Based on the vector space model
          – Simple N-gram Model (SNM)
          – Term-weight N-gram Model (TNM)
          – Distance N-gram Model
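
To give a feel for searching "based on question n-grams", the sketch below scores a passage by the share of question n-grams it contains, giving longer n-grams more weight. This is a simplified stand-in written for illustration only; it is not the actual scoring formula of JIRS or of its SNM, TNM, or distance models, and the function names and length-proportional weighting are assumptions.

```python
def ngrams(tokens, n):
    """Set of all contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_score(question_tokens, passage_tokens, max_n=3):
    """Weighted fraction of question n-grams (n = 1..max_n) found in the passage."""
    found = total = 0
    for n in range(1, max_n + 1):
        q_grams = ngrams(question_tokens, n)
        found += n * len(q_grams & ngrams(passage_tokens, n))  # weight by n-gram length
        total += n * len(q_grams)
    return found / total if total else 0.0


question = ["who", "wrote", "the", "trilogy"]
in_order = ["mahfouz", "wrote", "the", "trilogy"]
shuffled = ["the", "trilogy", "mahfouz", "wrote"]
print(ngram_score(question, in_order), ngram_score(question, shuffled))
```

Both passages share the same three words with the question, but the one that preserves the question's word order scores higher, which is the intuition behind preferring n-gram matches over plain keyword counts.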
  • 11. Tools [continued]
      ● Arabic Stemmers
          – Khoja Arabic stemmer (with a root dictionary)
          – AraMorph (uses transliteration to English letters)
          – Information Science Research Institute’s (ISRI) stemmer (without a root dictionary)
      ● GATE (General Architecture for Text Engineering)
          – Java-based platform comprising a tokenizer, a gazetteer, a sentence splitter, a part-of-speech tagger, a named entity transducer, and a coreference tagger
          – Plugins for machine learning with Weka, RASP, MAXENT, SVM Light
          – Managing ontologies like WordNet
  • 12. Tools [continued]
      ● OpenNLP
          – NLP tasks like tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, maximum entropy and perceptron-based machine learning, and coreference resolution
      ● Stanford NLP
          – Java framework with many NLP modules:
          – Dependency parsers and a lexicalized PCFG parser
          – Part-of-speech (POS) tagger
          – CRF-based Named Entity Recognizer
          – CRF-based Word Segmenter
          – Maxent Text Classifier
          – TokensRegex: regular expressions over tokens
  • 13. Future Trends and Open Issues
      ● More research on Arabic restricted-domain QA
          – Makes semantic tasks like word sense disambiguation easier
          – Domain rules affect how the question is posed and how the answer is formulated
          – A restricted domain should be circumscribed, practical, and complex
          – E.g. agriculture, architectural engineering, or any field of science
          – But not news and current events, as they have no constraints
      ● Use of deep application-dependent approaches
          – Use application-dependent constraints and rules to guide question analysis and answer extraction and validation
          – Depending on the available resources
  • 14. Future Trends and Open Issues [continued]
      ● Intensive usage of semantics
          – Arabic QA has focused on morpho-syntactic approaches
          – Very few systems have used the Arabic WordNet
          – Still a lot to be done in word sense disambiguation, coreference resolution, and ontology-based reasoning
      ● Use of theorem proving and deep reasoning
      ● Use of logic-based and inference-based approaches
  • 15. Summary
      ● Motivation
      ● Question Answering Tasks: Question Analysis, Passage Retrieval, and Answer Extraction
      ● Arabic Language Challenges
      ● Approaches: Stemming, Named Entity Recognition, Language Resources
      ● Tools
      ● Future Trends and Open Issues
  • 16. Thank You
      You can view the full paper in the ACIT 2012 proceedings.
