A brief survey presentation on Arabic Question Answering (AQA) covering the Natural Language Processing and Information Retrieval approaches to Question Analysis, Passage Retrieval, and Answer Extraction. It also lists the NLP tools used in AQA and discusses the challenges and future trends in this area.
To cite this paper, you can download it here:
http://www.acit2k.org/ACIT/2012Proceedings/13106.pdf
Arabic is the 6th most wide-spread natural language in the world with more than 350 million native speakers. Arabic question answering systems are gaining great significance due to the increasing amounts of Arabic unstructured content on the Internet and the increasing demand for information that regular information retrieval techniques do not satisfy. Question answering systems generally, and Arabic systems are no exception, hit an upper bound of performance due to the propagation of error in their pipeline. This increases the significance of answer selection and validation systems as they enhance the certainty and accuracy of question answering systems. Very few works tackled the Arabic answer selection and validation problem, and they used the same question answering pipeline without any changes to satisfy the requirements of answer selection and validation. That is why they did not perform adequately well in this task. In this dissertation, a new approach to Arabic answer selection and validation is presented through “ALQASIM”, which is a QA4MRE (Question Answering for Machine Reading Evaluation) system. ALQASIM analyzes the reading test documents instead of the questions, utilizes sentence splitting, root expansion, and semantic expansion using an ontology built from the CLEF 2012 background collections. Our experiments have been conducted on the test-set provided by CLEF 2012 through the task of QA4MRE. This approach led to a promising performance of 0.36 Accuracy and 0.42 C@1, which is double the performance of the best performing Arabic QA4MRE system.
Publications:
http://scholar.google.com/citations?user=XGJiEioAAAAJ&hl=en
https://aast.academia.edu/AhmedMagdy
Presentation of Domain Specific Question Answering System Using N-gram Approach, by Tasnim Ara Islam
Designed an application for a domain-specific question answering system. Built a solution for finding answers to factoid questions using an N-gram mining approach. Calculated the percentage of related answers for a specific question. Built the application on the Java platform.
[KDD 2018 tutorial] End to-end goal-oriented question answering systems, by Qi He
End to-end goal-oriented question answering systems
version 2.0: An updated version with references of the old version (https://www.slideshare.net/QiHe2/kdd-2018-tutorial-end-toend-goaloriented-question-answering-systems).
08/22/2018: The old version was deleted to reduce confusion.
Question Answering System using machine learning approach, by Garima Nanda
A compact presentation on how a machine learning approach, using classification techniques, can support effective and efficient interaction.
Practical Machine Learning - Part 1 contains:
- Basic notions of ML (what tasks are there, what is a model, how to measure performance)
- A couple of examples of problems and solutions (taken from previous work)
- A brief presentation of open-source software used for ML (R, scikit-learn, Weka)
Michael Manukyan and Hrayr Harutyunyan gave a talk on sentence representations in the context of deep learning at the Armenian NLP Meetup. They also reviewed a recent paper on machine comprehension (Wang, Jiang, 2016).
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing, by Lifeng (Aaron) Han
Invited Presentation in NLP lab of Soochow University, about my NLP journey and ADAPT Centre. NLP part covers Machine Translation Evaluation, Quality Estimation, Multiword Expression Identification, Named Entity Recognition, Word Segmentation, Treebanks, Parsing.
Apply chinese radicals into neural machine translation: deeper than character... by Lifeng (Aaron) Han
LPRC 2018: Limerick Postgraduate Research Conference
Lifeng Han and Shaohui Kuang. 2018. Apply Chinese radicals into neural machine translation: Deeper than character level. ArXiv pre-print https://arxiv.org/abs/1805.01565v1
Vectors in Search - Towards More Semantic Matching, by Simon Hughes
With the advent of deep learning and algorithms like word2vec and doc2vec, vector-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, such as learning sparse representations of vectors, clustering, and learning binary vectors. Finally, I will discuss some of the pitfalls of vector-based search, and how to get the best of both worlds by combining vector-based scoring with traditional relevancy metrics such as BM25.
Chinese Character Decomposition for Neural MT with Multi-Word Expressions, by Lifeng (Aaron) Han
ADAPT seminar series. June 2021
Research papers at NoDaLiDa 2021 (the 23rd Nordic Conference on Computational Linguistics) and COLING 2020 (MWE-LEX workshop).
Bonus takeaway: AlphaMWE multilingual corpus with MWEs
I will try to explain what QA is, how we can get answers to questions posed in natural language, and how successful we have been in that domain.
I have gained all of my knowledge from three proposed papers and what I read around them.
Meta-evaluation of machine translation evaluation methods, by Lifeng (Aaron) Han
Cite: Lifeng Han. 2021. Meta-evaluation of machine translation evaluation methods. In Metrics2021 Tutorial Track/type: Workshop on Informetric and Scientometric Research (SIG-MET), ASIS&T. October 23–24.
CUHK intern PPT. Machine Translation Evaluation: Methods and Tools, by Lifeng (Aaron) Han
Abstract of Aaron Han’s Presentation
The main topic of this presentation is the evaluation of machine translation (MT). With the rapid development of MT, evaluation has become increasingly important for telling whether systems are making progress. Traditional human judgments are very time-consuming and expensive. On the other hand, existing automatic MT evaluation metrics have some weaknesses:
– they perform well on certain language pairs but poorly on others, which we call the language-bias problem;
– they consider either no linguistic information (which leads to low correlation with human judgments) or too many linguistic features (which makes them hard to replicate), which we call the extremism problem;
– they rely on an incomplete set of factors (e.g. precision only).
To address these problems, he has developed several automatic evaluation metrics that:
– use tunable parameters to address the language-bias problem;
– use concise linguistic features to address the linguistic extremism problem;
– use augmented factors.
Experiments on ACL-WMT corpora show that the proposed metrics yield higher correlation with human judgments. The proposed metrics have been published at top international conferences, e.g. COLING and MT SUMMIT. Evaluation is closely related to similarity measurement, so this work can be extended to other areas such as information retrieval, question answering, and search.
A brief introduction to some of his other research will also be given, such as Chinese named entity recognition, word segmentation, and multilingual treebanks, which have been published in the Springer LNCS and LNAI series. Suggestions and comments are much appreciated, and opportunities for further cooperation are welcome.
Natural language processing for requirements engineering: ICSE 2021 Technical... by alessio_ferrari
These are the slides for the technical briefing given at ICSE 2021 by Alessio Ferrari, Liping Zhao, and Waad Alhoshan.
It covers RE tasks to which NLP is applied, an overview of a recent systematic mapping study on the topic, and a hands-on tutorial on using transfer learning for requirements classification.
Please find the links to the colab notebooks here:
https://colab.research.google.com/drive/158H-lEJE1pc-xHc1ISBAKGDHMt_eg4Gn?usp=sharing
https://colab.research.google.com/drive/1B_5ow3rvS0Qz1y-KyJtlMNnmgmx9w3kJ?usp=sharing
https://colab.research.google.com/drive/1Xrm0gNaa41YwlM5g2CRYYXcRvpbDnTRT?usp=sharing
Can Deep Learning solve the Sentiment Analysis Problem, by Mark Cieliebak
Sentiment analysis appears to be one of the easier tasks in the realm of text analytics: given a text like a tweet or product review, decide whether it contains positive or negative opinion. This task is almost trivial for humans, but it turns out to be a true challenge for automated systems. In fact, state-of-the-art sentiment analysis tools are wrong on approx. 4 out of 10 documents.
Current sentiment analysis tools are rule-based, feature-based, or combinations of both. However, recent research uses deep learning on very large sets of documents.
In this talk, we will explain the intrinsic difficulties of automated sentiment analysis; present existing solution approaches and their performance; describe an architecture for a deep learning system; and explore whether deep learning can improve sentiment analysis accuracy.
Open domain Question Answering System - Research project in NLP, by GVS Chaitanya
Using a computer to answer questions has been a human dream since the beginning of the digital era. A first step towards such an ambitious goal is to deal with natural language so that the computer can understand what its user asks. The discipline that studies the connection between natural language and the representation of its meaning via computational models is computational linguistics. Within this discipline, Question Answering can be defined as the task that, given a question formulated in natural language, aims at finding one or more concise answers. Improvements in technology and the explosive demand for better information access have reignited interest in Q&A systems. The wealth of information on the web makes it an interactive resource for seeking quick answers to factual questions such as “Who is the first American to land in space?” or “What is the second tallest mountain in the world?”, yet today’s most advanced web search systems (Bing, Google, Yahoo) make it surprisingly tedious to locate the answers. Q&A systems aim to develop techniques that go beyond the retrieval of relevant documents in order to return exact answers to natural language factoid questions.
Natural Language Processing: L01 introduction, by ananth
This presentation introduces the course Natural Language Processing (NLP) by enumerating a number of applications, course positioning, challenges presented by Natural Language text and emerging approaches to topics like word representation.
This presentation talks about Natural Language Processing using Java. At Museaic, a music intelligence platform, we spent time figuring out how to extract central themes from song lyrics. In this talk, I will cover some of the tasks involved in natural language processing such as named entity recognition, word sense disambiguation and concept/theme extraction. I will also cover libraries available in java such as stanford-nlp, dbpedia-spotlight and graph approaches using WordNet and semantic databases. This talk would help people understand text processing beyond simple keyword approaches and provide them with some of the best techniques/libraries for it in the Java world.
Keyphrase Extraction And Source Code Similarity Detection - A Survey, by Nakul Sharma
This is the presentation given at chsn2020. For the full article, please visit https://iopscience.iop.org/article/10.1088/1757-899X/1074/1/012027 or https://doi.org/10.1088/1757-899X/1074/1/012027
The best known natural language processing tool is GPT-3, from OpenAI, which uses AI and statistics to predict the next word in a sentence based on the preceding words. NLP practitioners call tools like this “language models,” and they can be used for simple analytics tasks, such as classifying documents and analyzing the sentiment in blocks of text, as well as more advanced tasks, such as answering questions and summarizing reports. Language models are already reshaping traditional text analytics, but GPT-3 was an especially pivotal language model because, at 10x larger than any previous model upon release, it was the first large language model, which enabled it to perform even more advanced tasks like programming and solving high school–level math problems. The latest version, called InstructGPT, has been fine-tuned by humans to generate responses that are much better aligned with human values and user intentions, and Google’s latest model shows further impressive breakthroughs on language and reasoning.
For businesses, the three areas where GPT-3 has appeared most promising are writing, coding, and discipline-specific reasoning. OpenAI, the Microsoft-funded creator of GPT-3, has developed a GPT-3-based language model intended to act as an assistant for programmers by generating code from natural language input. This tool, Codex, is already powering products like Copilot for Microsoft’s subsidiary GitHub and is capable of creating a basic video game simply by typing instructions. This transformative capability was already expected to change the nature of how programmers do their jobs, but models continue to improve — the latest from Google’s DeepMind AI lab, for example, demonstrates the critical thinking and logic skills necessary to outperform most humans in programming competitions.
Models like GPT-3 are considered to be foundation models — an emerging AI research area — which also work for other types of data such as images and video. Foundation models can even be trained on multiple forms of data at the same time, like OpenAI’s DALL·E 2, which is trained on language and images to generate high-resolution renderings of imaginary scenes or objects simply from text prompts. Due to their potential to transform the nature of cognitive work, economists expect that foundation models may affect every part of the economy and could lead to increases in economic growth similar to the industrial revolution.
This presentation on Opinion Mining is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.
Reliability is concerned with decreasing faults and their impact. The earlier the faults are detected the better. That's why this presentation talks about automated techniques using machine learning to detect faults as early as possible.
This presentation is about GATE, a Natural Language Processing platform that supports many languages. It also covers Mimir, an indexing server for GATE that lets users search a corpus of documents.
This presentation is a briefing of a paper about Networks and Natural Language Processing. It describes many graph based methods and algorithms that help in syntactic parsing, lexical semantics and other applications.
Object Role Modeling (ORM) is a powerful method for designing and querying database models at the conceptual level, where the application is described in non-technical terms.
1. A Survey of Arabic Question Answering
Challenges, Tasks, Approaches, Tools, and Future Trends
Ahmed Magdy & Dr. Mohamed Shaheen
ACIT 2012
2. Outline
● Motivation
● Question Answering Tasks
- Question Analysis, Passage Retrieval, and Answer Extraction
● Arabic Language Challenges
● Approaches
- Stemming, Named Entity Recognition, Language Resources
● Tools
● Future Trends And Open Issues
3. Motivation
● Arabic is the 6th most important language
● More than 300 million speakers
● Increasing amounts of Arabic content on the Internet
● Increasing demand for information
● There is no survey that covers Arabic Question Answering
5. Question Analysis
● Tokenization & Normalization
● Remove stop words
● Named Entity Recognition (gazetteer, maxent model)
● Stemming all words except Named Entities
● Question Focus determination by extracting the main NE
● Keywords Extraction & Expansion
● Answer type extraction by question words (Name, Place, Date, Quantity)
● Query generation of keywords into a Boolean formula
● Experiments with cross-language Arabic/English QA were not promising because of translation ambiguity
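The question analysis steps above can be illustrated with a minimal Python sketch. It covers only normalization, tokenization, stop-word removal, answer-type detection from the question word, and Boolean query generation; named entity recognition and stemming are left out. The stop-word list, the question-word table, and the function names are small hypothetical examples, not the resources used by any surveyed system.

```python
import re

# Arabic short vowels, shadda, sukun (U+064B..U+0652) and tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

def normalize(text):
    """Strip diacritics and unify common letter variants."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[أإآ]", "ا", text)   # alef variants -> bare alef
    return text.replace("ى", "ي")       # alef maqsura -> ya

# Hypothetical, deliberately tiny tables (stored in normalized form).
QUESTION_WORDS = {
    normalize(word): answer_type
    for word, answer_type in {
        "من": "Name",      # who
        "أين": "Place",    # where
        "متى": "Date",     # when
        "كم": "Quantity",  # how many
    }.items()
}
STOP_WORDS = {normalize(w) for w in ["في", "على", "إلى", "عن", "هو", "هي"]}

def analyze_question(question):
    """Return the expected answer type, the keywords, and a Boolean query."""
    tokens = re.findall(r"\w+", normalize(question))
    # Answer type comes from the first question word found, if any.
    answer_type = next(
        (QUESTION_WORDS[t] for t in tokens if t in QUESTION_WORDS), None
    )
    keywords = [
        t for t in tokens if t not in QUESTION_WORDS and t not in STOP_WORDS
    ]
    return {
        "answer_type": answer_type,
        "keywords": keywords,
        "query": " AND ".join(keywords),
    }

# "Where is the city of Cairo located?"
result = analyze_question("أين تقع مدينة القاهرة؟")
print(result["answer_type"], result["query"])
```

A real system would also expand the keywords (for example with synonyms) before building the query; that step is omitted here.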
6. Passage Retrieval
● Systems used:
– Salton’s vector space model based systems
– JIRS passage retrieval system
● Ranking retrieved passages according to:
– Answer and Question words Count
– Answer and Question words Association
– Query words weight
– Cosine similarity between document words and question words
– Distance Density N-gram Model
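One of the ranking criteria above, cosine similarity between passage words and question words, can be sketched as follows. This is a plain bag-of-words version with raw term counts, assumed here for illustration; it is not the weighting scheme of JIRS or of any particular system, and the function names are hypothetical. English tokens are used only for readability.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_passages(question_tokens, passages):
    """Return (score, passage) pairs, best match first."""
    scored = [
        (cosine_similarity(question_tokens, passage.split()), passage)
        for passage in passages
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

ranked = rank_passages(
    ["capital", "egypt"],
    ["the nile is a long river", "cairo is the capital of egypt"],
)
for score, passage in ranked:
    print(f"{score:.3f}  {passage}")
```

In practice the tokens would be normalized and stemmed first, so that inflected forms of the same word count as a match.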
7. Answer Extraction
● Ranking candidate answers according to:
– Manual lexical patterns
– Answer Snippet position
– Question Word frequencies in Answer
– Matching using N-grams
– Select answers with NEs of the same expected answer type
– Semantic similarity between the question’s focus and the answer
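Two of the criteria above, filtering by expected answer type and matching with N-grams, can be combined in a short sketch. The Candidate class, the scoring weights (an n-gram of length n counts n points), and the example data are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str      # the candidate answer string
    ne_type: str   # named entity type assigned by an NER step
    snippet: str   # passage text the candidate was found in

def ngrams(tokens, n):
    """Set of all contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_score(candidate, question_tokens, max_n=2):
    """Count shared n-grams between question and snippet, longer ones weigh more."""
    snippet_tokens = candidate.snippet.split()
    return sum(
        n * len(ngrams(question_tokens, n) & ngrams(snippet_tokens, n))
        for n in range(1, max_n + 1)
    )

def rank_answers(candidates, question_tokens, expected_type):
    """Keep candidates of the expected type, best n-gram overlap first."""
    typed = [c for c in candidates if c.ne_type == expected_type]
    return sorted(
        typed, key=lambda c: overlap_score(c, question_tokens), reverse=True
    )

candidates = [
    Candidate("Alexandria", "Place", "alexandria is a city in egypt"),
    Candidate("1952", "Date", "the capital of egypt in 1952"),
    Candidate("Cairo", "Place", "cairo is the capital of egypt"),
]
best = rank_answers(candidates, ["capital", "of", "egypt"], "Place")
print([c.text for c in best])
```

The Date candidate is dropped by the type filter even though its snippet overlaps well with the question, which is the point of checking the expected answer type first.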
8. Challenges
● Arabic Morphology is highly inflectional
– Many affixes (articles, prepositions, pronouns etc.)
● Arabic Morphology is highly derivational
– 10,000 roots and 120 patterns for derivation
● No Capital Letters in Named Entities
– Unlike Latin based languages
● Scarcity of Arabic Language Resources
– corpora, lexicons, and machine-readable dictionaries
9. Approaches
● Stemming
– Removing prefixes, suffixes and infixes from words
– Matching the root against patterns
– Language-dependent rules
– Determining the most frequently used affixes statistically
● Named Entity Recognition
– Maxent model or CRF
– ANERcorp and ANERgazet
● Language Resources
– Arabic WordNet
– Penn Arabic Treebank
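The affix-removal side of stemming can be sketched in a few lines of Python. This is a light stemmer only: it strips at most one prefix and one suffix and does not handle infixes or match roots against patterns. The affix lists and the minimum stem length are hypothetical choices for illustration, not those of the Khoja or ISRI stemmers.

```python
# Hypothetical affix lists; longer prefixes are listed first so they win.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]

def light_stem(word, min_len=3):
    """Strip one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word

# "the libraries" and "and the book"
for word in ["المكتبات", "والكتاب"]:
    print(word, "->", light_stem(word))
```

The minimum-length check is what keeps short words from being stripped down to nothing; a root-based stemmer would go further and map the result onto a root using the derivation patterns.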
10. Tools
● NOOJ for Arabic NLP
– C# .NET Freeware linguistic engineering development environment
– Supports Regular Expressions and Context Free Grammars
– Has Arabic Language resources (Sample Text and Dictionary)
● Amine Platform
– Java platform for intelligent systems and multi-agents
– Used for semantic analysis of questions and answers
– Uses Conceptual Graphs, Knowledge bases, and Ontologies
● JIRS, a Java passage retrieval system
– Search based on question n-grams
– Based on the vector space model
– Simple N-gram Model (SNM)
– Term-weight N-gram Model (TNM)
– Distance N-gram Model
11. Tools [continued]
● Arabic Stemmers
– Khoja Arabic stemmer (with a root dictionary)
– AraMorph (uses transliteration to English letters)
– Information Science Research Institute’s (ISRI) stemmer (without a root dictionary)
● GATE (General Architecture for Text Engineering)
– Java-based platform comprising a tokenizer, a gazetteer, a sentence splitter, a part-of-speech tagger, a named entity transducer, and a coreference tagger
– Plugins for machine learning with Weka, RASP, MAXENT, SVM Light
– Managing ontologies like WordNet
12. Tools [continued]
● OpenNLP
– NLP tasks like tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, maximum entropy, perceptron based machine learning, and coreference resolution
● Stanford NLP
– Java Framework with many NLP modules for:
– Dependency parsers, and a lexicalized PCFG parser
– Part-of-speech (POS) tagger
– CRF-based Named Entity Recognizer
– CRF-based Word Segmenter
– Maxent Text Classifier
– Tokens Regex: regular expressions over tokens
13. Future Trends and Open Issues
● More research on Arabic restricted domain QA
– Makes semantic tasks like word sense disambiguation easier
– Domain rules affect how the question is posed and how the answer is formulated
– A restricted domain should be circumscribed, practical, and complex
– E.g. agriculture, architectural engineering, or any field of science
– But not news and current events, as they have no constraints
● Use of deep application-dependent approaches
– Use application-dependent constraints and rules to guide question analysis and answer extraction and validation
– Depending on the available resources
14. Future Trends and Open Issues [continued]
● Intensive usage of semantics
– Arabic QA has focused on morpho-syntactic approaches
– Very few systems used Arabic WordNet
– Still a lot to be done in word sense disambiguation, coreference resolution, and ontology-based reasoning
● Use of theorem proving & deep reasoning
● Use of logic-based and inference-based approaches
15. Summary
● Motivation
● Question Answering Tasks
- Question Analysis, Passage Retrieval, and Answer Extraction
● Arabic Language Challenges
● Approaches
- Stemming, Named Entity Recognition, Language Resources
● Tools
● Future Trends And Open Issues