Explainability for Natural Language Processing | Yunyao Li
Final deck for our popular tutorial on "Explainability for Natural Language Processing" at KDD'2021. See links below for downloadable version (with higher resolution) and recording of the live tutorial.
Title: Explainability for Natural Language Processing
Presenters: Marina Danilevsky, Shipi Dhanorkar, Yunyao Li, Lucian Popa, Kun Qian, and Anbang Xu
Website: http://xainlp.github.io/
Recording: https://www.youtube.com/watch?v=PvKOSYGclPk&t=2s
Downloadable version with higher resolution: https://drive.google.com/file/d/1_gt_cS9nP9rcZOn4dcmxc2CErxrHW9CU/view?usp=sharing
@article{kdd2021xaitutorial,
title={Explainability for Natural Language Processing},
author={Marina Danilevsky and Shipi Dhanorkar and Yunyao Li and Lucian Popa and Kun Qian and Anbang Xu},
journal={KDD},
year={2021}
}
Abstract:
This lecture-style tutorial, which mixes in an interactive literature browsing component, is intended for the many researchers and practitioners working with text data and on applications of natural language processing (NLP) in data science and knowledge discovery. The focus of the tutorial is on the issues of transparency and interpretability as they relate to building models for text and their applications to knowledge discovery. As black-box models have gained popularity for a broad range of tasks in recent years, both the research and industry communities have begun developing new techniques to render them more transparent and interpretable. Reporting from an interdisciplinary team of social science, human-computer interaction (HCI), and NLP/knowledge management researchers, our tutorial has two components: an introduction to explainable AI (XAI) in the NLP domain and a review of the state-of-the-art research; and findings from a qualitative interview study of individuals working on real-world NLP projects as they are applied to various knowledge extraction and discovery at a large, multinational technology and consulting corporation. The first component will introduce core concepts related to explainability in NLP. Then, we will discuss explainability for NLP tasks and report on a systematic literature review of the state-of-the-art literature in AI, NLP and HCI conferences. The second component reports on our qualitative interview study, which identifies practical challenges and concerns that arise in real-world development projects that require the modeling and understanding of text data.
Compiler Design lab manual for Computer Engineering .pdf | kalpana Manudhane
Compiler design can be divided into two parts: analysis and synthesis. The analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them. The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table. If we examine the compilation process in more detail, we see that it operates as a sequence of phases (lexical analysis, syntax and semantic analysis, intermediate code generation, code optimization, and code generation), each of which transforms one representation of the source program into another.
The practicals of the Compiler Design Lab for Computer Science & Engineering are divided into two parts: (i) implementation using a programming language and (ii) implementation using tools. In the first part, the practicals can be performed in any programming language, such as C or Python. C is the preferred choice: it is a middle-level language that combines low-level hardware control with high-level programming capabilities, and it is a procedural language centered on functions and pointers.
To implement the phases of a compiler, various tools are available on Linux systems. Lex is a lexical analyzer generator, whereas YACC is a parser generator. Flex and Bison are the more recent versions of Lex and YACC, respectively. Windows versions of these tools are now available as well: Flex Windows (Lex and Yacc) contains the GnuWin32 ports of Flex and Bison, which act as the Lex and Yacc compilers, respectively, and are used to generate lexical analyzers and parsers.
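As a rough illustration of the first phase, lexical analysis, written directly in a programming language rather than generated by Lex or Flex, here is a minimal Python sketch. The token classes in `TOKEN_SPEC` and the `tokenize` function are hypothetical choices for a toy expression language; they are not taken from the lab manual.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token classes for a toy expression language (an assumption,
# not the lab manual's grammar). Order matters only when patterns overlap.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
]

# One master pattern with a named group per token class, similar in spirit
# to the rule list a Lex specification would contain.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from left to right, skipping whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())


if __name__ == "__main__":
    for token in tokenize("x = 3 + y1"):
        print(token)
```

A generated scanner from Lex or Flex does the same job from a declarative rule file; the hand-written version only makes the token-by-token loop visible.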
Can I Trust My Simulation Model? Measuring the Quality of Business Process Si... | Marlon Dumas
Business Process Simulation (BPS) is an approach to analyze the performance of business processes under different scenarios. For example, BPS allows us to estimate what would be the cycle time of a process if one or more resources became unavailable. The starting point of BPS is a process model annotated with simulation parameters (a BPS model). BPS models may be manually designed, based on information collected from stakeholders and empirical observations, or automatically discovered from execution data. Regardless of its origin, a key question when using a BPS model is how to assess its quality. In this paper, we propose a collection of measures to evaluate the quality of a BPS model w.r.t. its ability to replicate the observed behavior of the process. We advocate an approach whereby different measures tackle different process perspectives. We evaluate the ability of the proposed measures to discern the impact of modifications to a BPS model, and their ability to uncover the relative strengths and weaknesses of two approaches for automated discovery of BPS models. The evaluation shows that the measures not only capture how close a BPS model is to the observed behavior, but they also help us to identify sources of discrepancies.
Presentation delivered by David Chapela-Campa at the BPM'2023 conference, Utrecht, September 2023.
Presented as a Tutorial at the 2023 Knowledge Graph Conference, this deck explores different ways that information can be transformed across knowledge portals, from basic RDF structures to the use of SPARQL UPDATE based Workflows. It then explores how ChatGPT can be used to expand upon this transformation capability, and why knowledge portals should be considered transformation engines for graphs.
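This is a summary of the State of MLOps 2021 survey published by Valohai in 2021. It summarizes the answers of 100 respondents about MLOps within their organizations: their roles and team sizes, the areas they focus on, and the areas where they currently rely on tooling.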
Use Case Patterns for LLM Applications (1).pdf | M Waleed Kadous
What are the "use case patterns" for deploying LLMs into production? Understanding these will allow you to spot "LLM-shaped" problems in your own industry.
apidays LIVE India 2022 - Building the API Banking capability | apidays
apidays LIVE India 2022: Accelerating India’s digitisation with APIs
May 11 & 12, 2022
Building the API Banking capability
Abhijit Dey, Deputy VP, Product Head API Banking at Axis Bank
------------
Check out our conferences at https://www.apidays.global/
Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8
Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io
Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/
Deep dive into the API industry with our reports:
https://www.apidays.global/industry-reports/
Subscribe to our global newsletter:
https://apidays.typeform.com/to/i1MPEW
Torry Harris API and Application Integration Governance Framework | ShubaS4
An API and application integration governance framework should facilitate good governance. It must allow the initiative to evolve, and iteratively present best practices based on results achieved. The Torry Harris API and Application Integration Governance Framework enables cohesive integration across the enterprise such that all elements are connected, rationalized, and organized to provide consistent guidance and incentives that executives and business unit leaders require.
For more information, visit - https://www.torryharris.com/services/api-and-application-integration-governance
This was a web-based project developed using HTML (Twitter Bootstrap framework) for the front end, JSP as the server-side language, and Microsoft SQL Server as the back end.
Automatic Generation of Multiple Choice Questions using Surface-based Semanti... | CSCJournals
Multiple Choice Questions (MCQs) are a popular large-scale assessment tool. MCQs make it much easier for test-takers to take tests and for examiners to interpret their results; however, they are very expensive to compile manually, and they often need to be produced on a large scale and within short iterative cycles. We examine the problem of automated MCQ generation with the help of unsupervised Relation Extraction, a technique used in a number of related Natural Language Processing problems. Unsupervised Relation Extraction aims to identify the most important named entities and terminology in a document and then recognize semantic relations between them, without any prior knowledge as to the semantic types of the relations or their specific linguistic realization. We investigated a number of relation extraction patterns and tested a number of assumptions about the linguistic expression of semantic relations between named entities. Our findings indicate that an optimized configuration of our MCQ generation system is capable of achieving high precision rates, which are much more important than recall in the automatic generation of MCQs. Its enhancement with linguistic knowledge further helps to produce significantly better patterns. We furthermore carried out a user-centric evaluation of the system, in which subject domain experts from the biomedical domain evaluated automatically generated MCQ items in terms of readability, usefulness of semantic relations, relevance, acceptability of questions and distractors, and overall MCQ usability. The results of this evaluation make it possible for us to draw conclusions about the utility of the approach in practical e-Learning applications.
Factoid based natural language question generation system | Animesh Shaw
The proposed system addresses the generation of factoid or wh-questions from sentences in a corpus consisting of factual, descriptive and unbiased details. We discuss our heuristic algorithm for sentence simplification (pre-processing); the knowledge base extracted in this step is stored in a structured format that assists further processing. We then discuss the sentence semantic relations that enable us to construct questions following certain recognizable patterns among different sentence entities, followed by an evaluation of the generated questions.
What a chatbot is, types of chatbots, chat platforms, the Facebook Messenger platform, a mind map of the components to think about when building a chatbot, how to build a chatbot on the Messenger platform, and different tools and techniques for building a chatbot agent.
Information about Bots for Messenger Challenge, Rules and Timeline.
OpTA is a partnership between op2B, TGPowerhouse and Andersen Tax. Our solution optimizes the contribution margin and cash generation by handling, in an integrated and simultaneous way, ALL the variables that affect the company's "Logistics" (Value Chain) and Tax costs.
Customer Centered Product Development – MVPs, Functional Prototypes and Agile... | Paul Heirendt
2017 ProductCamp St Louis
Session ID#:36 Customer Centered Product Development – MVPs, functional prototypes and Agile development
Description: Too many companies develop and launch products only to find less than favorable market response.
Involving customers and customer feedback can dramatically improve your results but it is harder than it sounds.
We will explore some best practices around customer centered product development
Session Format: Ask the Expert
Category: Product Management/Development
Submitted by: Paul Heirendt. A former Fortune 120 executive with decades of experience in Product Development, M&A, Sales & Marketing and Startup success, Paul is on Twitter as @pheirendt.
Compromisos de gestión escolar y plan anual de trabajo 2017 v191216 (1) | Gira ......
The School Management Commitments (Compromisos de Gestión Escolar) consolidate practices of educational institutions that seek to create the conditions for student learning.
What changes with the EU Data Protection Regulation for Gambling Companies | Giulio Coraggio
The General Data Protection Regulation is a massive change for both gaming and gambling operators and suppliers, also introducing sanctions up to 4% of the global turnover of the breaching entity for privacy breaches.
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA | ijistjournal
Ontologies have been applied to many applications in recent years, especially in the Semantic Web, Information Retrieval, Information Extraction, and Question Answering. The purpose of a domain-specific ontology is to get rid of conceptual and terminological confusion. It accomplishes this by specifying a set of generic concepts that characterize the domain, as well as their definitions and interrelationships. This paper describes some algorithms for identifying semantic relations and constructing an Information Technology Ontology, while extracting the concepts and objects from different sources. The Ontology is constructed based on three main resources: ACM, Wikipedia and unstructured files from the ACM Digital Library. Our algorithms combine Natural Language Processing and Machine Learning. We use Natural Language Processing tools, such as OpenNLP and the Stanford Lexical Dependency Parser, to explore sentences. We then extract these sentences based on English patterns in order to build a training set. We use a random sample among 245 categories of ACM to evaluate our results. Results generated show that our system yields superior performance.
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo... | dannyijwest
Considerable research in the field of ontology matching has been performed where information sharing and reuse become necessary in ontology development. Measurement of lexical similarity in ontology matching is performed using synsets, as defined in WordNet. In this paper, we define a Super Word Set, which is an aggregate set that includes the hypernym, hyponym, holonym, and meronym sets in WordNet. The Super Word Set Similarity is calculated as the rate at which the words of the concept name and the synset's words are included in the Super Word Set. To measure the Super Word Set Similarity, we first extracted Matched Concepts (MC), Matched Properties (MP) and Property Unmatched Concepts (PUC) from the result of ontology matching. We compared these against two ontology matching tools, COMA++ and LOM. The Super Word Set Similarity shows an average improvement of 12% over COMA++ and 19% over LOM.
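The abstract does not give the exact formula, so the following Python sketch shows only one possible reading of the Super Word Set idea: take the union of a concept's hypernym, hyponym, holonym and meronym words, then score another concept by the fraction of its words that fall inside that union. The relation table `TOY_RELATIONS` and the function names are hypothetical stand-ins; a real system would query WordNet instead of a hand-written dictionary.

```python
from typing import Dict, Iterable, List, Set

# Hand-written stand-in for WordNet lookups (assumption for illustration only).
TOY_RELATIONS: Dict[str, Dict[str, List[str]]] = {
    "car": {
        "hypernym": ["vehicle", "motor_vehicle"],
        "hyponym": ["taxi", "coupe"],
        "holonym": [],
        "meronym": ["wheel", "engine"],
    },
}

RELATION_KINDS = ("hypernym", "hyponym", "holonym", "meronym")


def super_word_set(concept: str, relations: Dict[str, Dict[str, List[str]]]) -> Set[str]:
    """Aggregate the four relation sets of a concept into one set."""
    related = relations.get(concept, {})
    aggregated: Set[str] = set()
    for kind in RELATION_KINDS:
        aggregated |= set(related.get(kind, ()))
    return aggregated


def sws_similarity(words: Iterable[str], sws: Set[str]) -> float:
    """Fraction of the given words that are included in the Super Word Set.

    This inclusion rate is an assumed interpretation of the abstract,
    not the paper's published definition.
    """
    unique_words = set(words)
    if not unique_words:
        return 0.0
    return len(unique_words & sws) / len(unique_words)


if __name__ == "__main__":
    sws = super_word_set("car", TOY_RELATIONS)
    # Words of a second concept's name plus its synset words (made up here).
    print(sws_similarity(["vehicle", "engine", "boat", "wheel"], sws))
```

In this toy case three of the four candidate words appear in the aggregated set, so the score is 0.75.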
Architecture of an ontology based domain-specific natural language question a... | IJwest
A question answering (QA) system aims at retrieving precise information from a large collection of documents in response to a query. This paper describes the architecture of a Natural Language Question Answering (NLQA) system for a specific domain based on ontological information, a step towards semantic web question answering. The proposed architecture defines four basic modules suitable for enhancing current QA capabilities with the ability to process complex questions. The first module is question processing, which analyses and classifies the question and also reformulates the user query. The second module handles the retrieval of the relevant documents. The next module processes the retrieved documents, and the last module performs the extraction and generation of a response. Natural language processing techniques are used for processing the question and documents and also for answer extraction. Ontology and domain knowledge are used for reformulating queries and identifying the relations. The aim of the system is to generate a short and specific answer to a question asked in natural language in a specific domain. We have achieved 94% accuracy of natural language question answering in our implementation.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
The papers for publication in The International Journal of Engineering & Science are selected through rigorous peer reviews to ensure originality, timeliness, relevance, and readability.
Theoretical work submitted to the Journal should be original in its motivation or modeling structure. Empirical analysis should be based on a theoretical framework and should be capable of replication. It is expected that all materials required for replication (including computer programs and data sets) should be available upon request to the authors.
Generation of Question and Answer from Unstructured Document using Gaussian M... | IJACEE IJACEE
Question Answering (QA) systems are one of the ever-growing applications in Natural Language Processing. The purpose of an Automatic Question and Answer Generation system is to generate all possible questions and their relevant answers from a given unstructured document. Complex sentences are simplified to make question generation easier. The accuracy of the generated questions is measured by identifying the subtopics from the text using the Gaussian Mixture Neural Topic Model (GMNTM). The similarity between the generated questions and the text is calculated using the Extended String Subsequence Kernel (ESSK). The syntactic correctness of the questions is measured by a Syntactic Tree Kernel, which computes the similarity scores between each sentence in the given context and the generated questions. Based on the similarity score, questions are ranked. The answers for the generated questions are extracted using a Pattern Matching Approach. This system is expected to produce better accuracy when compared with a system using Latent Dirichlet Allocation (LDA) for subtopic identification.
QUESTION ANSWERING SYSTEM USING ONTOLOGY IN MARATHI LANGUAGE | ijaia
Humans are always on a quest to extract information related to some topic or entity. A question answering system helps the user find the precise answer to a question articulated in natural language. It provides an explicit, concise and accurate answer to user questions rather than a set of relevant documents or web pages, as most information retrieval systems do. The paper proposes a question answering system for the Marathi language that uses an ontology as a formal representation of the knowledge base for extracting answers. The ontology is used to express domain-specific knowledge about semantic relations and restrictions in the given domains. The ontologies are developed with the help of domain experts, and the query is analyzed both syntactically and semantically. The results obtained here are accurate enough to satisfy the query raised by the user. The level of accuracy is enhanced since the query is analyzed semantically.
Question and Answer Systems (QAS) are among the many challenges for natural language understanding and interfaces. In this paper we have developed a new mathematical scoring model that works on five types of questions. The question text failures are first extracted and a score is found based on its structure with respect to its template structure; an answer score is then calculated against the question as well as the paragraph. A named entity recognizer and a part-of-speech tagger are applied to each of these words to encode the necessary information. The text is then processed to finally arrive at the index of the most probable answer with respect to the question. An entropy algorithm is used to find the exact answer.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
In this research work we have developed a new mathematical scoring model that works on five types of questions. The question text failures are first extracted and a score is found based on its structure with respect to its template structure; an answer score is then calculated against the question as well as the paragraph. The text is then processed to finally arrive at the index of the most probable answer with respect to the question.
Semantic similarity, and semantic relatedness measures in particular, are very important in the current scenario due to the huge demand for natural language processing based applications such as chatbots, and for information retrieval systems such as knowledge-base-driven FAQ systems. Current approaches generally use similarity measures which do not use the context-sensitive relationships between words. This leads to erroneous similarity predictions and is not of much use in real-life applications. This work proposes a novel approach that gives an accurate relatedness measure of any two words in a sentence by taking their context into consideration. This context correction results in a more accurate similarity prediction, which in turn results in higher accuracy of information retrieval systems.
Extraction of Data Using Comparable Entity Mining | iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Analysis of lexico syntactic patterns for antonym pair extraction from a turk... | csandit
Extraction of semantic relations from various sources such as corpora, web pages, dictionary definitions etc. is one of the most important issues in the study of Natural Language Processing (NLP). Various methods have been used to extract semantic relations from various sources. The pattern-based approach is one of the most popular methods among them. In this study, we propose a model to extract antonym pairs from a Turkish corpus automatically. Using a set of seeds, we automatically extract lexico-syntactic patterns (LSPs) for the antonym relation from the corpus. A reliability score is calculated for each pattern. The most reliable patterns are used to generate new antonym pairs. The study is conducted only on adjective-adjective and noun-noun pairs. Noun and adjective target words are used to measure the success of the method, and candidate antonyms are generated using reliable patterns. For each antonym pair consisting of a candidate antonym and a target word, an antonym score is calculated. Pairs that reach a certain score are accepted as antonym pairs. The proposed method shows good performance with 77.2% average accuracy.
Automatic multiple choice question generation system for
1. Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.5, No.8, 2014
Automatic Multiple Choice Question Generation System for
Semantic Attributes Using String Similarity Measures
Ibrahim Eldesoky Fattoh
Teacher Assistant at faculty of Information Technology, MUST University, Cairo, Egypt.
Abstract
This research introduces an automatic multiple choice question generation system to evaluate the understanding
of the semantic role labels and named entities in a text. The system selects the informative sentence and
the keyword to be asked about based on the semantic labels and named entities that exist in the sentence; the distractors
are chosen based on a similarity measure between sentences in the data set. The system is tested using a set of
sentences extracted from the TREC 2007 dataset for question answering. From the experimental results, it can be
concluded that the semantic role labeling and named entity recognition approaches can be used as a good
keyword selection mechanism. The second conclusion is that string similarity measures proved to be a very
good approach for generating the distractors of an automatic multiple choice question. Also,
combining the similarity measures of different algorithms can lead to good distractors.
1. Introduction
Developing Automatic Question Generation (AQG) systems has become an important research issue
because it requires insights from a variety of disciplines, including Artificial Intelligence (AI), Natural
Language Understanding (NLU), and Natural Language Generation (NLG). There are two types of question
formats. The first is the multiple choice question, which asks about a word in a given sentence; the word may be
an adjective, adverb, vocabulary item, etc. The second is the entity question, or Text-to-Text QG, which asks about a
word or phrase corresponding to a particular entity in a given sentence. This research covers the first
format. The traditional multiple-choice question is made up of three components: the sentence
with a gap is the question sentence, the correct choice (the removed word) is the key, and the other
alternative choices are the distractors [1].
________ is the current president of Egypt
(a) H.Mubarak (b) A.ELsisi (c) M.Morsi (d) A.Mansour
The above sentence is an example of a multiple choice question: the underlined gap represents the word or phrase
that is the correct answer, and the four choices comprise the true answer and three distractors. This research
introduces a model for a multiple choice question generator that asks about labels extracted from the given
sentence using a Semantic Role Labeler (SRL) and entities extracted using a Named Entity Recognizer (NER). The
distractors generated for the sentence are chosen based on the string similarity between the question sentence and
all other sentences in the data set. The rest of the paper is organized as follows: section 2 discusses related
work on Automatic Multiple Choice Questions (AMCQ), section 3 briefly introduces SRL and NER, section
4 presents the different string similarity approaches, section 5 introduces the proposed model, section 6
shows the experimental results and evaluation, and finally section 7 gives a conclusion and future work
with some remarks.
2. Related work
In this section, previous automatic multiple choice question generation systems for the first question
format mentioned in section 1 are reviewed.
The authors of [2] proposed an approach to AQG for vocabulary assessment; they generated six types of questions:
definition, synonym, antonym, hypernym, hyponym, and cloze questions. They retrieve the data from WordNet
after choosing the correct sense of the word. For distractor choice, the question generation system chooses
distractors of the same part of speech and similar frequency to the correct answer. Four of the six computer-generated
question types were assessed: the definition, synonym, antonym, and cloze questions. The percentage
of questions generated for the four types was above 60% for a list of 156 words.
The authors of [3] introduced a prototype of an automatic quiz generation system for English text to test learner
comprehension of text content and English skills. They used a semantic network (SemNet) to represent the relationship
between a vocabulary item and its context. They proposed two generators for two types of questions. The first
generator is for sense comprehension of adjectives: it extracts adjectives from the SemNet of a
given text as questionnaire vocabulary and forms multiple-choice cloze questions. The right answer is
a synonym or similar adjective, taken from WordNet, for the applied sense of the questionnaire adjective.
The second generator is for anaphor comprehension, in which a learner must integrate these subnets by
connecting each anaphor with its antecedents. The generator identifies the antecedent of an anaphor and forms a
multiple-choice cloze question by scooping the anaphor out of its sentence. The options comprise its antecedent
and the distractors.
The same authors proposed another system [4] that generates multiple choice questions for evaluating the
understanding of adjectives in a text. Based on the sense association among adjectives, an adjective being
examined can usually be substituted by some other adjectives. The system was able to generate three types of
questions: questions for collocations, questions for antonyms, and questions for synonyms. For a given sentence,
the system extracts the adjective-noun pairs that exist; then, for each adjective-noun pair that is a collocation,
it generates a question. If the original sentence has words with negative meanings, it generates a question
for antonyms. It also generates questions for synonyms or similar words. The candidate substitutes are
gathered from WordNet and filtered by web corpus searching. To evaluate the generated questions, they
chose the Far East senior high school English textbook, Book One, which contains 12 articles, as the experimental
material. Experimental results showed that the proposed answer determination approaches and question
filtering strategies are effective in terms of precision.
Another automatic question generation system, which generates gap-fill questions, is provided by [5]. Syntactic
and lexical features are used in the process of choosing the informative sentence, determining the key, and
finding the distractors. The authors introduced several features as a basis for sentence selection, such as its position,
common tokens, and whether it contains an abbreviation. In key selection, part of speech (POS) tagging is used to
generate a list of keys; the best key is then selected from this list depending on three parameters: the number
of occurrences of the key in the document, whether it is a word in the title, and the height of the key in the syntactic
tree. Distractor selection depends on features such as the Dice coefficient score between the gap-fill sentence and
the sentence containing the distractor. The system was tested using two chapters of a biology book
and was evaluated manually by two biology students. The sentence selection module obtained an inter-evaluator
agreement of 0.7, the key selection module obtained 0.75, and 0.60 of the generated gap-fill questions
were useful, that is, had at least one good distractor.
From this literature review, it can be noted that building an automatic multiple choice question generation system
involves three steps: the first is choosing the informative sentence, the second is choosing the key word or
phrase to be the right answer among the choices, and the last is finding the distractors for that key word. In
this research, informative sentence selection depends on whether the sentence contains any named entities or
semantic labels. The keys are chosen based on the output of the semantic role labeler and the named entity
recognizer, and distractor selection is based on string-based similarity measures, as
explained in section 4.
3. Semantic Role Labeling (SRL) and Named Entity Recognition (NER)
Semantic role labeling describes WHO did WHAT to WHOM, WHEN, WHERE, WHY, HOW etc. for a given
situation, and contributes to the construction of meaning [6]; for this reason the natural language processing
community has recently experienced a growth of interest in SRL. SRL has been used in many different
applications such as automatic text summarization [6] and automatic question answering [7]. Given a sentence, a
semantic role labeler tries to identify the predicates (relations and actions) and the semantic entities associated
with each of those predicates. The set of semantic roles used in PropBank [8] includes both predicate-specific
roles, whose precise meaning is determined by their predicate, and general-purpose adjunct-like modifier roles,
whose meaning is consistent across all predicates. The predicate-specific roles are Arg0, Arg1, ..., Arg5 and
ArgA. A complete list of the modifier roles proposed in PropBank is shown in table 1. Given a sentence
like
Anders Celsius born in Uppsala in Sweden (1)
the SRL parse would be as seen in (2).
[Anders Celsius /A0] [born /v:] [in Uppsala /AM-Loc] [in Sweden/ AM-Loc] (2)
The relation identified in (2) is the verb (born). The predicate-specific role is (Anders Celsius), identified as A0
(Arg0), the subject of the verb. Both (in Uppsala) and (in Sweden) are identified
semantically as AM-LOC (location), which is a general-purpose adjunct.
Table 1: PropBank modifier argument roles
Role Meaning
ArgM-LOC Location
ArgM-EXT Extent
ArgM-DIS Discourse connectives
ArgM-ADV Adverbial
ArgM-NEG Negation marker
ArgM-MOD Modal verb
ArgM-CAU Cause
ArgM-TMP Temporal
ArgM-PNC Purpose
ArgM-MNR Manner
ArgM-DIR Direction
ArgM-PRD Secondary predication
Another set of semantic attributes, such as persons, organizations, locations, etc., can be recognized using named
entity recognition systems. Named entity recognition is an essential task in many natural language processing
applications; it has received much attention in the research community, and considerable progress has
been achieved in many domains, such as newswire and biomedical text [9]. For the sentence in (1), the
output of NER would be like (3).
[Person Anders Celsius] born [Loc Uppsala] in [Loc Sweden]. (3)
The entity (Anders Celsius) is identified as a person, and both entities (Uppsala) and (Sweden) are identified as locations.
All these entities could be used as targets by replacing them with gaps, one at a time. The attributes extracted
from both NER and SRL, which act as the keywords searched for in the sentence to be asked about, are shown in
table 2.
Table 2: Keyword types (labels and entities) selected from the question sentence
Keyword Types Source
<AM-CAU> SRL
<Person> NER
<AM-LOC> SRL
<Location> NER
<AM-TMP> SRL
<Date> NER
<Time> NER
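The gap substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(text, label)` pairs are written by hand to mirror outputs (2) and (3), whereas in the paper they come from the SENNA tool, and the function name `make_cloze_questions` is hypothetical.

```python
GAP = "________"

def make_cloze_questions(sentence, keywords):
    """Build one gap-fill question per keyword found in the sentence.

    keywords: list of (text, label) pairs, as an SRL/NER tool might return
    (hand-written here; the tool call itself is not shown).
    """
    questions = []
    for text, label in keywords:
        if text in sentence:
            questions.append({
                "question": sentence.replace(text, GAP, 1),  # gap out first occurrence
                "key": text,     # the correct answer
                "type": label,   # keyword type, as in table 2
            })
    return questions

sentence = "Anders Celsius born in Uppsala in Sweden"
keywords = [("Anders Celsius", "Person"),
            ("Uppsala", "Location"),
            ("Sweden", "Location")]
questions = make_cloze_questions(sentence, keywords)
for q in questions:
    print(q["question"], "| key:", q["key"], "| type:", q["type"])
```

Each keyword yields its own question, so the single sentence (1) produces three candidate question sentences, one per entity.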
4. Text Similarity Approaches
Text similarity measures play an important role in NLP applications such as text classification, information
retrieval, document clustering, short answer scoring, machine translation, text summarization and others.
Finding the similarity between words is a fundamental step in finding the similarity between sentences and
documents [10]. Words can be lexically similar and semantically similar. Words are lexically similar if
they share a similar sequence of characters, and semantically similar in cases such as when
they mean the same thing, are opposites of each other, are used in the same context, or one is a type of the other. In
this research, a set of string-based similarity algorithms is applied to measure the similarity between the
question sentence and the remaining sentences in the knowledge base, as a new methodology for
choosing the distractors for the keyword asked about in a multiple choice question. A string metric is a metric that
measures the similarity or distance between two strings. String similarity algorithms are divided into two
categories: character-based similarity algorithms and term-based similarity
algorithms. In this research, three character-based algorithms and five term-based
algorithms were applied to measure the similarity between two sentences. The character-based algorithms used are
Smith-Waterman [11], Damerau-Levenshtein [12, 13], and Jaro [14, 15]. The five term-based algorithms applied
are N-gram, Cosine similarity, Dice's coefficient [16], Jaccard similarity [17], and Block distance [18]. These
algorithms are explained and implemented in the SimMetrics package [19]. Figure 1 illustrates the string-based
algorithms applied in this research. A survey of these algorithms and text similarity approaches can be found in [10].
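To make the term-based category concrete, the following is a minimal Python sketch of three of the measures named above (Jaccard, Dice and cosine), using their textbook definitions. It is not the SimMetrics implementation: the lowercase whitespace tokenizer and the two example sentences are assumptions made for illustration, and the package may tokenize and normalize differently.

```python
import math
from collections import Counter

def tokens(sentence):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return sentence.lower().split()

def jaccard(a, b):
    """Shared distinct tokens divided by all distinct tokens."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def dice(a, b):
    """Twice the shared distinct tokens divided by the sum of both set sizes."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def cosine(a, b):
    """Cosine of the angle between the two term-frequency vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

# Two illustrative sentences (not taken from the TREC data).
s1 = "Vienna is the capital of Austria"
s2 = "Helsinki is the capital of Finland"
print(jaccard(s1, s2), dice(s1, s2), cosine(s1, s2))
```

The two sentences share four of eight distinct tokens, so Jaccard gives 0.5 while Dice and cosine both give about 0.67; the measures rank sentence pairs similarly but on different scales.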
Figure 1: applied algorithms in this research
5. Proposed model
The automatic multiple choice question system proposed in this research asks about the semantic roles and
named entities that exist in a sentence, such as the attributes specified in table 2. First, a knowledge base is
prepared by extracting the sentences from the dataset, then parsing them semantically using a semantic role
labeling tool and a named entity recognizer to discover the attributes that exist in each sentence. The SENNA
tool is used for both purposes [20]. Each sentence that has any semantic attribute is recorded in the knowledge
base and linked with its attribute. To generate a question, the question sentence is chosen from the
knowledge base, and the keyword asked about, which is the labeled word or entity identified by the SENNA
tool, is substituted with a gap. The distractors for the keyword are taken from the keywords
of the remaining sentences in the knowledge base. To find distractors, a string similarity measure is applied between the
question sentence and all other sentences in the knowledge base. Then three keywords are
retrieved; these keywords belong to the sentences that obtained the highest similarity values. The three retrieved
keywords are taken as the distractors for the question sentence. Algorithm 1 and figure 2 show the
basic steps followed in the proposed model.
Figure 2: flow diagram of automatic multiple choice question generation system
Algorithm 1
Build the knowledge base by extracting the sentences that have semantic attributes from the
dataset.
Select a question sentence and identify the semantic type of the keyword by parsing it semantically.
For each question sentence:
    Measure the similarity between the question sentence and all other sentences in the knowledge base.
    Sort the obtained similarity values.
    Retrieve the three sentences that have the highest similarity values.
    Return the three keywords of those sentences as distractors and identify their types.
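The distractor selection loop of the algorithm can be sketched as follows. This is a minimal illustration, not the authors' code: the knowledge base entries are hand-made toy sentences, Jaccard token overlap stands in for whichever of the eight similarity measures is being evaluated, and the names `pick_distractors` and `knowledge_base` are hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (stand-in for any of the eight measures)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 1.0

def pick_distractors(question, knowledge_base, similarity, k=3):
    """Return the keywords (with types) of the k sentences most similar to the question."""
    candidates = [e for e in knowledge_base
                  if e["sentence"] != question["sentence"]]
    # sorted() is stable, so tied sentences keep their knowledge-base order.
    ranked = sorted(candidates,
                    key=lambda e: similarity(question["sentence"], e["sentence"]),
                    reverse=True)
    return [(e["key"], e["type"]) for e in ranked[:k]]

# Toy knowledge base: each sentence is stored with its keyword and keyword type.
knowledge_base = [
    {"sentence": "Vienna is the capital of Austria",
     "key": "Vienna", "type": "Location"},
    {"sentence": "Helsinki is the capital of Finland",
     "key": "Helsinki", "type": "Location"},
    {"sentence": "Paris is the capital of France",
     "key": "Paris", "type": "Location"},
    {"sentence": "Abraham Lincoln was the sixteenth President of the United States",
     "key": "Abraham Lincoln", "type": "Person"},
    {"sentence": "Sadat visited Israel in 1977",
     "key": "1977", "type": "Date"},
]

question = knowledge_base[0]
distractors = pick_distractors(question, knowledge_base, jaccard)
print(distractors)
```

Because only sentence similarity is used, nothing forces a distractor to share the keyword's type; here the third distractor is a person although the key is a location, which is exactly the situation the difficulty classes in section 6 are meant to measure.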
6. Experimental Results and Discussion:
In this section, the experimental results are explained. The dataset used is the TREC 2007 dataset for
question answering [21]. A set of files covering different subject domains is parsed, and 109 sentences are extracted to
be used in testing the proposed model. The semantic attributes of these sentences are of the types listed in table 2.
The 109 chosen sentences are those for which the SENNA tool yielded good results in retrieving
their semantic attributes. Some sentences were rejected because of their output from the SENNA tool. The
evaluation of both sentence selection and keyword identification depends on the output of the tool used to
identify the semantic attributes of a sentence. In this research, out of nearly 145 parsed sentences, 109 were
considered good according to the keywords extracted from them. Distractor evaluation is the
main contribution of this research, so eight string similarity algorithms are applied in an attempt to
generate good distractors. We evaluate question difficulty according to the
distractors generated. The question difficulty levels considered in this research are very difficult, difficult,
intermediate, and easy. These levels are defined according to the types of the generated distractor words. Each
question has a true answer, which is the keyword that exists in the question sentence, and three distractors, which are
generated from the remaining sentences in the knowledge base. To evaluate the use of the algorithms in
generating the distractors, we propose four classes of question difficulty: a question is very
difficult if all the generated distractors have the same type as the keyword, difficult if
two of the generated distractors have the same type as the keyword, intermediate if only one
of the generated distractors has the same type as the keyword, and easy
if all generated distractors are of types different from the keyword's type. For illustration,
consider the question sentences in table 3.
Table 3: example of question with different difficulty levels
Difficulty level | Question Sentence | Key word | Choices
Very Difficult | ________ was the sixteenth President of the United States | Abraham Lincoln | (A) Abraham Lincoln (B) Barack Obama (C) Calvin Coolidge (D) Anders Celsius
Difficult | ________ is the sixth largest country in Europe in terms of area | Finland | (A) Abraham Lincoln (B) Finland (C) Russia (D) Switzerland
Intermediate | In ________ Sadat made a historic visit to Israel, which led to the 1979 peace treaty in exchange for the complete Israeli withdrawal from Sinai | 1977 | (A) Abraham Lincoln (B) 1973 (C) 1977 (D) Finland
Easy | ________ is the capital of the Republic of Austria and one of the nine states of Austria. | Vienna | (A) Abraham Lincoln (B) June 18 1953 (C) Vienna (D) 1977
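The four difficulty classes reduce to counting how many distractors share the keyword's type. A minimal sketch, assuming each distractor arrives with a type label; the labels "Person", "Location" and "Date" assigned to the table 3 choices below are our reading of those examples, not values stated in the paper.

```python
LEVELS = {3: "very difficult", 2: "difficult", 1: "intermediate", 0: "easy"}

def difficulty_level(key_type, distractor_types):
    """Classify a question by how many of its three distractors match the key's type."""
    if len(distractor_types) != 3:
        raise ValueError("expected exactly three distractors")
    same = sum(1 for t in distractor_types if t == key_type)
    return LEVELS[same]

# The four rows of table 3, with assumed type labels for each distractor.
print(difficulty_level("Person", ["Person", "Person", "Person"]))      # Lincoln row
print(difficulty_level("Location", ["Person", "Location", "Location"]))  # Finland row
print(difficulty_level("Date", ["Person", "Date", "Location"]))         # 1977 row
print(difficulty_level("Location", ["Person", "Date", "Date"]))         # Vienna row
```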
Following the classification illustrated in table 3, the eight string similarity algorithms are evaluated and their results are
shown in table 4. Each row of the table gives the number of questions yielded in one class. For example, the 45
that appears in the first row for the N-gram algorithm means that the system yielded 45 questions having three
distractors of the same type as the keyword asked about.
Table 4: number of sentences obtained in each class of the 8 algorithms
Class | N-gram | Smith-Waterman | Levenshtein | Jaro | Cosine | Dice coefficient | Block Distance | Jaccard
No of very difficult questions | 45 | 42 | 35 | 21 | 41 | 40 | 42 | 42
No of difficult questions | 36 | 27 | 36 | 33 | 31 | 30 | 29 | 30
No of intermediate questions | 21 | 24 | 26 | 34 | 19 | 21 | 20 | 19
No of easy questions | 7 | 16 | 12 | 21 | 18 | 18 | 18 | 18
From table 4, it is clear that the N-gram algorithm achieves the highest level of difficulty: it yielded 81 questions in
the top difficulty levels (very difficult and difficult), and only 28 questions in the intermediate and easy levels.
The Jaro algorithm achieved the highest level of simplicity among the 8 algorithms: it yielded 55 questions in the
intermediate and easy levels, and 54 in the difficult levels. Another measurement is introduced to measure the
difficulty level of the generated questions for each algorithm, given by the following equation:
Difficulty level of questions = ((3 * No.of three + 2 * No.of two + 1 * No.of one) / 109) / 3
where No.of three is the number of questions that have 3 distractors whose type is the same as the key word type,
and likewise for No.of two and No.of one. The overall value is divided by 3 at the end of the equation to
normalize the obtained values into a percentage. The value of the difficulty level of questions increases
as the number of difficult questions increases. Table 5 shows the difficulty level of the questions
generated by each algorithm.
Table 5: difficulty level of the generated questions for the 8 algorithms
Measure | N-gram | Smith-Waterman | Levenshtein | Jaro | Cosine | Dice coefficient | Block Distance | Jaccard
Difficulty level of questions | 69.7% | 62.4% | 62.1% | 48.6% | 62.4% | 61.5% | 62.4% | 62.7%
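As a worked check, the difficulty-level equation can be evaluated directly from the class counts in table 4. The sketch below is only an illustration of that arithmetic; the function name `difficulty_score` is ours, and the counts passed in are the N-gram and Jaccard columns of table 4.

```python
def difficulty_score(n_three, n_two, n_one, total=109):
    """((3*n_three + 2*n_two + 1*n_one) / total) / 3, as a fraction between 0 and 1."""
    return (3 * n_three + 2 * n_two + 1 * n_one) / total / 3

# Counts of very difficult, difficult and intermediate questions from table 4.
ngram = difficulty_score(45, 36, 21)
jaccard = difficulty_score(42, 30, 19)
print(f"N-gram:  {ngram:.1%}")    # 228 / 327
print(f"Jaccard: {jaccard:.1%}")  # 205 / 327
```

Easy questions contribute zero to the numerator, so the score reaches 100% only when every question has three same-type distractors.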
The results in table 5 show that the N-gram algorithm obtained the highest value and the Jaro algorithm the
lowest, which supports our earlier conclusion about both algorithms. We consider a useful multiple choice
question to be one that has at least one good distractor, and a good distractor to be one that
has the same type as the keyword. Table 6 shows the percentage of good questions generated by each
algorithm, that is, questions that have at least one good distractor.
Table 6: percentage of good questions for the 8 algorithms
Measure | N-gram | Smith-Waterman | Levenshtein | Jaro | Cosine | Dice coefficient | Block Distance | Jaccard
Percentage of good questions | 93.6% | 85.3% | 89% | 80.7% | 83.5% | 83.5% | 83.5% | 83.5%
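Since a good question is one with at least one same-type distractor, the percentage of good questions is simply the share of questions that are not easy. A minimal sketch of that calculation, using the easy-question counts from table 4; the function name is ours.

```python
def good_question_percentage(n_easy, total=109):
    """Share of questions with at least one good distractor, i.e. all non-easy ones."""
    return 100 * (total - n_easy) / total

# Easy-question counts from table 4.
ngram_good = good_question_percentage(7)    # N-gram
jaro_good = good_question_percentage(21)    # Jaro
cosine_good = good_question_percentage(18)  # Cosine, Dice, Block distance, Jaccard
print(round(ngram_good, 1), round(jaro_good, 1), round(cosine_good, 1))
```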
It can be seen from table 6 that the N-gram algorithm obtained the highest percentage of good questions because
it has the fewest easy questions. Also, the percentage for all term-based algorithms except N-gram
is equal to 83.5%, because all of these algorithms produced the same number of easy
questions, as shown in table 4. A further evaluation combines the best results obtained from the
character-based algorithms (the Smith-Waterman results) with the best results obtained from the term-based
algorithms (the N-gram results) to enhance the results obtained. The results of the N-gram and Jaccard
algorithms are also combined, because these two algorithms
obtained the highest difficulty level of questions among all 8 algorithms, as shown in table 5. Table 7 shows the results
yielded by combining the results of two different algorithms.
Table 7: results of combining results of 2 different algorithms
Measure | N-gram+Smith | N-gram+Jaccard
No of very difficult questions | 42 | 46
No of difficult questions | 30 | 32
No of intermediate questions | 26 | 23
No of easy questions | 11 | 8
Difficulty level of questions | 64.8% | 68.8%
Percentage of good questions | 89.9% | 92.7%
From table 7, it is clear that combining N-gram with Smith-Waterman improves on the Smith-Waterman results
alone in both the difficulty level of questions and the percentage of good questions. Combining the N-gram
results with the Jaccard results likewise yielded an increase in both values compared to the Jaccard results alone. We can
also notice that the N-gram results alone remain the best even after combination.
7. Conclusion and future work
This research introduced automatic generation of multiple choice questions based on the semantic attributes in
the question sentence. The semantic attributes are extracted using both a semantic role labeling tool and a named
entity recognition tool. The distractor generation process is based on string similarity measures
between the question sentence and all other sentences in a knowledge base of all sentences in the system.
Eight string-based similarity algorithms are applied to all sentences, and the results obtained are analyzed and
presented with a classification that identifies the question difficulty level. All algorithms showed
promising results in generating distractors, especially the N-gram algorithm, which produced the
most difficult questions. Also, combining the results of more than one algorithm is
tried, and the output of this process enhances the difficulty level of some algorithms. In the future we could try
semantic similarity measures such as corpus-based and knowledge-based similarity algorithms. Also, a
prior classification of the sentences in the knowledge base according to the key word types could be introduced to
increase the level of difficulty of the generated questions.
References
[1] Chen, C. Y., Liou, H. C., & Chang, J. S. (2006, July). FAST: an automatic generation system for grammar
tests. In Proceedings of the COLING/ACL on Interactive presentation sessions (pp. 1-4). Association for
Computational Linguistics.
[2] Brown, J., Frishkoff, G., and Eskenazi, M. (2005). Automatic Question Generation for Vocabulary
Assessment. Proceedings of HLT/EMNLP, 819–826. Vancouver, Canada.
[3] Sung, L., Lin, Y., and Chern, M. (2007). An Automatic Quiz Generation System for English Text. Seventh
IEEE International Conference on Advanced Learning Technologies.
[4] Lin, Y., Sung, L., and Chern, M. (2007). An Automatic Multiple-Choice Question Generation Scheme for
English Adjective Understanding. Workshop on Modeling, Management and Generation of
Problems/Questions in eLearning, The 15th International Conference on Computers in Education (ICCE
2007), pages 137-142, Hiroshima, Japan.
[5] Agarwal, M. and Mannem, P. (2011). Automatic Gap-Fill Question Generation from Text Books. In
Proceedings of the 6th Workshop on Innovative Use of NLP for Building Educational Applications.
Portland, OR, USA. Pages 56-64.
[6] Trandabăț, D. (2007). Using semantic roles to improve summaries. In The 13th European Workshop on
Natural Language Generation (p. 164).
[7] Pizzato L., and Molla D. (2008). Indexing on Semantic Roles for Question Answering. Coling 2008:
Proceedings of the 2nd workshop on Information Retrieval for Question Answering (IR4QA). pages 74–81.
Manchester, UK.
[8] Palmer M., Gildea D., and Kingsbury P. (2005). The proposition bank: An annotated corpus of semantic
roles. Computational Linguistics, 31(1):71–106.
[9] Tkachenko M., and Simanovsky A. (2012). Named Entity Recognition: Exploring Features. Proceedings of
KONVENS 2012 (Main track: oral presentations), Vienna.
[10] Gomaa W. H. and Fahmy A. A. (2013). A survey of text similarity approaches. International Journal of
Computer Applications, 68(13), 13-18.
[11] Smith, T. F. & Waterman, M. S. (1981). Identification of Common Molecular Subsequences, Journal of
Molecular Biology 147: 195–197.
[12] Hall, P. A. V. & Dowling, G. R. (1980) Approximate string matching, Comput. Surveys, 12:381-402.
[13] Peterson, J. L. (1980). Computer programs for detecting and correcting spelling errors, Comm. Assoc.
Comput. Mach., 23:676-687.
[14] Jaro, M. A. (1989). Advances in record linkage methodology as applied to the 1985 census of
Tampa, Florida, Journal of the American Statistical Association, vol. 84, 406, pp 414-420.
[15] Jaro, M. A. (1995). Probabilistic linkage of large public health data files, Statistics in Medicine 14 (5-7),
491-8.
[16] Dice, L. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3).
[17] Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura.
Bulletin de la Société Vaudoise des Sciences Naturelles 37, 547-579.
[18] Krause, E. F. (1987). Taxicab Geometry, Dover. ISBN 0-486-25202-7.
[19] Chapman, S. (2009). SimMetrics: a Java & C# .NET library of similarity metrics.
[20] Collobert R., Weston J., Bottou L., Karlen M., Kavukcuoglu K., and Kuksa P. (2011). Natural language
processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537.
[21] http://trec.nist.gov/tracks.html