The document discusses using support vector machines (SVM) and various lexical, semantic, and syntactic features for question classification. It aims to develop a state-of-the-art machine learning based question classifier. Various features are discussed, including lexical features like n-grams and stemming, syntactic features like question headwords, and semantic features derived from named entity recognition, WordNet senses, and semantic word lists. SVM is used as the classifier to take advantage of its good performance for text classification tasks. The results show that combining these feature types can achieve accurate question classification.
In this research work we have developed a new mathematical scoring model that works on five types of questions. The question text features are first extracted and a score is computed from the question's structure relative to its template structure; an answer score is then calculated against the question as well as the paragraph text, to finally arrive at the index of the most probable answer to the question.
Question Classification using Semantic, Syntactic and Lexical features (IJwest)
This document summarizes research on improving question classification accuracy through the use of machine learning and a combination of semantic, syntactic, and lexical features. The researchers tested various classifiers like Naive Bayes, k-Nearest Neighbors, and Support Vector Machines on the UIUC question classification dataset. Their best results were achieved using a Support Vector Machine classifier trained on features including question headwords, hypernyms from WordNet, part-of-speech tags, and word shapes, achieving 96.2% accuracy for coarse-grained and 91.1% for fine-grained classification. This outperformed previous state-of-the-art results, demonstrating that combining semantic and syntactic features with lexical features improves automated question classification.
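As a rough illustration of the lexical side of such a feature set, the sketch below extracts unigrams, bigrams, and word shapes from a question. The shape encoding (uppercase to `X`, lowercase to `x`, digit to `d`, with repeated classes collapsed) and the function names are assumptions made for this example; the summary does not say how the researchers defined word shapes, and the headword, hypernym, and part-of-speech features are not covered here.

```python
import re
from collections import Counter


def word_shape(token):
    """Map a token to a coarse shape: uppercase -> X, lowercase -> x, digit -> d.

    Runs of the same character class are collapsed, so "Washington" -> "Xx".
    (This particular encoding is an assumption for illustration.)
    """
    shape = []
    for ch in token:
        if ch.isupper():
            code = "X"
        elif ch.islower():
            code = "x"
        elif ch.isdigit():
            code = "d"
        else:
            code = ch  # keep punctuation as-is
        if not shape or shape[-1] != code:
            shape.append(code)
    return "".join(shape)


def lexical_features(question):
    """Return a bag of unigram, bigram, and word-shape features."""
    tokens = re.findall(r"\w+|[^\w\s]", question)
    feats = Counter()
    for tok in tokens:
        feats["uni=" + tok.lower()] += 1
        feats["shape=" + word_shape(tok)] += 1
    for left, right in zip(tokens, tokens[1:]):
        feats["bi=" + left.lower() + "_" + right.lower()] += 1
    return feats


if __name__ == "__main__":
    for name, count in sorted(lexical_features("Who founded IBM in 1911?").items()):
        print(name, count)
```

A sparse feature dictionary like this one could then be vectorized and passed to any linear classifier, an SVM included.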
Resolving the semantics of Vietnamese questions in VNewsQA/ICT system (ijaia)
Recently we have built a VNewsQA/ICT system which can read the titles of Vietnamese news in the domain
of information and communication technology, then process and use them to answer the Vietnamese
questions of users. The architecture of VNewsQA/ICT system has two main components: 1) the first
component treats the simple Vietnamese sentences as its natural language textual data which is used to
answer the user’s questions; 2) the second component resolves the semantics of Vietnamese questions
which query the system. This paper introduces a semantic representation model and a processing model to
resolve the Vietnamese questions in the VNewsQA/ICT system. These semantic representation and processing
models are able to resolve the semantics of eight Vietnamese question classes which are used in our
system.
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION (kevig)
The aim of Named Entity Recognition (NER) is to identify references to named entities in unstructured documents and to classify them into pre-defined semantic categories. NER often benefits from added background knowledge in the form of gazetteers. However, using such a collection does not deal with name variants and cannot resolve the ambiguities involved in identifying entities in context and associating them with predefined categories. We present a semi-supervised NER approach that starts by identifying named entities with a small set of training data. Using the identified named entities, the word and context features are used to define a pattern. The pattern of each named entity category is used as a seed pattern to identify the named entities in the test set. Pattern scoring and the tuple value score enable the generation of new patterns to identify the named entity categories. We have evaluated the proposed system for English with tagged (IEER) and untagged (CoNLL 2003) named entity corpora and for Tamil with documents from the FIRE corpus, obtaining an average F-measure of 75% for both languages.
QUESTION ANSWERING SYSTEM USING ONTOLOGY IN MARATHI LANGUAGE (ijaia)
This document discusses a proposed question answering system for the Marathi language that uses ontology as a knowledge base. The system aims to provide accurate answers to user questions in Marathi by analyzing queries semantically using ontologies. Ontologies are developed with help from domain experts and represent domain knowledge through semantic relations. The system first analyzes user questions syntactically and semantically. It then extracts candidate answers from the ontology and generates a precise answer in Marathi language to satisfy the original user query. The use of ontology for semantic analysis is meant to enhance the accuracy of answers provided by the question answering system.
Factoid based natural language question generation system (Animesh Shaw)
The proposed system addresses the generation of factoid or wh-questions from sentences in a corpus consisting of factual, descriptive and unbiased details. We discuss our heuristic algorithm for sentence simplification or pre-processing; the knowledge base extracted in this step is stored in a structured format which assists further processing. We further discuss the sentence semantic relations which enable us to construct questions following certain recognizable patterns among different sentence entities, followed by an evaluation of the generated questions.
OOAD - UML - Class and Object Diagrams - Lab (Victer Paul)
The document discusses class diagrams and object diagrams. It explains that a class diagram shows the structure of a system by displaying classes, interfaces, and their relationships, while an object diagram shows specific instances of classes at a point in time. The document provides steps for constructing class diagrams, such as identifying classes and relationships. It also discusses how object diagrams are created based on class diagrams by instantiating classes and depicting their relationships.
AUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHS (csandit)
The paper proposes a method to assess short answers written by students using a friendship matrix,
the matrix representation of a friendship graph. A short answer is a type of answer which is based on
facts. These answers are quite different from long answers and Multiple Choice Question
(MCQ) type answers. A friendship graph is a graph that satisfies the friendship condition,
i.e. any two nodes have exactly one common neighbor. The friendship matrix is the matrix form of the
friendship graph. The student answer is stored in one friendship matrix and the teacher answer is
stored in another, and the two matrices are compared. Based on the number
of errors found in the student answer, an error mark is calculated and
subtracted from full marks to get the student's grade.
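The friendship condition and the matrix comparison can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not say how answers are encoded as graphs or how errors are weighted, so the `adjacency` helper, the one-mark-per-mismatched-edge penalty, and all function names are hypothetical.

```python
from itertools import combinations


def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


def is_friendship_graph(adj):
    """True if every pair of distinct nodes has exactly one common neighbor."""
    n = len(adj)
    for u, v in combinations(range(n), 2):
        common = sum(1 for w in range(n) if adj[u][w] and adj[v][w])
        if common != 1:
            return False
    return True


def count_mismatches(student, teacher):
    """Count edge slots (upper triangle) where the two matrices differ."""
    n = len(teacher)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if student[i][j] != teacher[i][j]
    )


def grade(student, teacher, full_marks, penalty_per_error=1.0):
    """Subtract an error mark from full marks (penalty scheme is assumed)."""
    errors = count_mismatches(student, teacher)
    return max(0.0, full_marks - penalty_per_error * errors)


if __name__ == "__main__":
    # Two triangles sharing node 0: the smallest non-trivial friendship graph.
    teacher = adjacency(5, [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4)])
    student = adjacency(5, [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4)])
    print(is_friendship_graph(teacher), grade(student, teacher, full_marks=5))
```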
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE (Journal For Research)
Natural Language Processing (NLP) techniques are among the most widely used techniques in the field of computer applications, and the field has become vast and advanced. Language is the means of communication or interaction among humans, and in the present scenario, when everything depends on machines or is computerized, communication between computers and humans has become a necessity. To fulfill this necessity, NLP has emerged as the means of interaction which narrows the gap between machines (computers) and humans. It evolved from the study of linguistics, which was passed through the Turing test to check the similarity between data, but it was limited to small sets of data. Later on, various algorithms were developed along with the concept of AI (Artificial Intelligence) for the successful execution of NLP. In this paper, the main emphasis is on the different NLP techniques which have been developed so far, their applications, and the comparison of those techniques on different parameters.
This document proposes a generalized definition language for implementing an object-based fuzzy class model. It begins by reviewing related work on defining fuzzy classes and identifying limitations in existing approaches. It then summarizes the authors' previous work developing a generalized fuzzy class structure and model. The document introduces several new data types for representing different types of fuzzy attributes. Finally, it proposes a formal definition language for the fuzzy class model that utilizes the new data types to define fuzzy class structure and accurately represent fuzzy data types and attribute values. The language is intended to serve as a data definition language for object-based fuzzy database systems.
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds (idescitation)
This paper presents a supervised machine learning
approach that uses a decision tree learning algorithm for
recognition of Bengali noun-noun compounds as multiword
expression (MWE) from a Bengali corpus. Our proposed
approach to MWE recognition has two steps: (1) extraction of
candidate multi-word expressions using chunk information
and various heuristic rules and (2) training the machine
learning algorithm to recognize a candidate multi-word
expression as Multi-word expression or not. A variety of
association measures have been used as features for
identifying MWEs. The proposed system is tested on a Bengali
corpus for identifying noun-noun compound MWEs from the
corpus.
The document discusses two neural network models for reading comprehension tasks: the Attentive Reader model proposed by Hermann et al. in 2015 and the Stanford Reader model proposed by Chen et al. in 2016. The author implemented a two-layer attention model inspired by these previous models that achieves a 1.5% higher accuracy on reading comprehension tasks compared to the Stanford Reader.
OOAD - UML - Sequence and Communication Diagrams - Lab (Victer Paul)
The document discusses interaction diagrams, specifically sequence diagrams and communication diagrams. It explains that interaction diagrams show interactions between objects by depicting the messages exchanged. A sequence diagram emphasizes the time ordering of messages, showing objects arranged from left to right and messages ordered from top to bottom. A communication diagram emphasizes the structural organization of objects, showing them as vertices connected by links along which messages pass. Both diagram types are semantically equivalent but visualize information differently based on their focus. Examples of sequence and communication diagrams are provided for processes like patient admission to a hospital.
Assessment of Programming Language Reliability Utilizing Soft-Computing (ijcsa)
The document discusses assessing programming language reliability using soft computing techniques like fuzzy logic and genetic algorithms. It proposes using these methods to model programming language reliability based on linguistic variables like "Reliable", "Moderately Reliable", and "Not Reliable". The key factors examined for determining a programming language's reliability include syntax consistency, semantic consistency, error handling, modularity, and documentation. A soft computing system is simulated to evaluate programming languages based on these reliability criteria.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION (ijistjournal)
User-generated content on the web grows rapidly in this emergent information age. Evolutionary changes in technology make use of such information to capture only the user’s essence, and finally the useful information is exposed to information seekers. Most of the existing research on text information processing focuses on the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for the text data available in each forum. This approach analyses the forum text data and computes a value for each word of text. The proposed approach combines K-means clustering and a Support Vector Machine with PSO (SVM-PSO) classification algorithm to group the forums into two clusters, hotspot forums and non-hotspot forums, within the current time span. The accuracy of the proposed system is compared with other classification algorithms such as Naïve Bayes, Decision tree and SVM. The experiment shows that K-means and SVM-PSO together achieve highly consistent results.
A Survey on Sentiment Categorization of Movie Reviews (Editor IJMTER)
Sentiment categorization is the process of mining user-generated text content to determine
the sentiment of users towards a particular thing. It is the approach of detecting the sentiment of
the author with regard to some topic. It is also known as sentiment detection, sentiment analysis and opinion
mining. It is very useful for movie production companies that are interested in knowing how users feel
about their movies. For example, the word “excellent” indicates that the review expresses a positive emotion about
a particular movie. The same applies to movies, songs, cars, holiday destinations, political parties, social
network sites, web blogs, discussion forums and so on. Sentiment categorization can be carried out
using three approaches. First, a supervised machine learning based text classifier using Naïve Bayes,
Maximum Entropy, SVM, kNN or a hidden Markov model. Second, an unsupervised Semantic
Orientation scheme that extracts relevant N-grams of the text and then labels them. Third, a publicly available
SentiWordNet-based library.
Implementation of Semantic Analysis Using Domain Ontology (IOSR Journals)
The document describes a semantic analysis system that analyzes feedback from an organization using domain ontology. The system first collects feedback data from students in an unstructured format. It then preprocesses the feedback using part-of-speech tagging to extract meaningful information. The system architecture includes preprocessing the feedback, matching entities in the feedback to an organization ontology using Jaccard similarity, and generating a summarized analysis of the feedback based on the ontology entities. The goal is to group related words and phrases expressed by students under the same entity to produce a meaningful summary for the organization.
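The matching step can be sketched with a set-based Jaccard similarity, |A ∩ B| / |A ∪ B|. The ontology contents, the `threshold` value, and the `best_entity` helper below are hypothetical and only illustrate the idea; the summary does not give the actual ontology or cut-off used by the system.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_entity(phrase_tokens, ontology, threshold=0.2):
    """Return (entity, score) for the ontology entity closest to the phrase.

    `ontology` maps an entity name to the set of terms describing it.
    Returns (None, score) when no entity reaches the (assumed) threshold.
    """
    best_name, best_score = None, 0.0
    for name, terms in ontology.items():
        score = jaccard(phrase_tokens, terms)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


if __name__ == "__main__":
    # Toy ontology, invented for the example.
    ontology = {
        "Library": {"library", "books", "reading"},
        "Canteen": {"canteen", "food", "lunch"},
    }
    print(best_entity(["the", "library", "books"], ontology))
```

Grouping each feedback phrase under its best-matching entity gives the per-entity buckets from which a summary could be produced.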
QUESTION ANALYSIS FOR ARABIC QUESTION ANSWERING SYSTEMS (ijnlc)
The first step of processing a question in Question Answering (QA) systems is to carry out a detailed analysis of the question to determine what it is asking for and how best to approach answering it. Our question analysis uses several techniques to analyze any question given in natural language: a Stanford POS tagger and parser for Arabic, a named entity recognizer, a tokenizer, stop-word removal, question expansion, question classification and question focus extraction components. We employ numerous detection rules and a classifier trained on features from this analysis to detect important elements of the question, including: 1) the portion of the question that refers to the answer (the focus); 2) the terms in the question that identify what type of entity is being asked for (the lexical answer types); 3) question expansion; and 4) the classification of the question into one or more of several different types. We describe how these elements are identified and evaluate the effect of accurate detection on our question-answering system using the Mean Reciprocal Rank (MRR) accuracy measure.
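For reference, Mean Reciprocal Rank averages, over all questions, the reciprocal of the rank at which the first correct answer appears (0 if none is returned). A minimal sketch, with function names and toy data chosen for illustration:

```python
def reciprocal_rank(ranked_answers, correct):
    """1/rank of the first correct answer in the ranked list, or 0.0 if absent."""
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer in correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_answers, correct_set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, c) for r, c in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["a", "b", "c"], {"a"}),  # correct at rank 1 -> 1.0
        (["x", "y", "z"], {"y"}),  # correct at rank 2 -> 0.5
        (["p", "q"], {"r"}),       # never correct     -> 0.0
    ]
    print(mean_reciprocal_rank(runs))
```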
Sentiment analysis using naive bayes classifier (Dev Sahu)
This presentation contains a short description of the Naive Bayes classifier algorithm, a machine learning approach for sentiment detection and text classification.
This document describes a system for extracting named entities and their relationships from unstructured text data using n-gram features. It uses a hidden Markov model to extract and classify entities into types like person, location, organization. It then uses a conditional random field with kernel approach to detect relationships between the extracted entities. The system takes unstructured text as input, performs preprocessing like tokenization and stop word removal, extracts n-gram, part-of-speech and lexicon features which are then combined and used to train the HMM model to classify entities and CRF model to detect relationships between entities.
Suitability of naïve bayesian methods for paragraph level text classification... (ijaia)
This document discusses using Naive Bayesian methods for paragraph-level text classification in the Kannada language. It evaluates the performance of the Naive Bayesian and Naive Bayesian Multinomial models on a corpus of 1791 paragraphs from four categories (Commerce, Social Sciences, Natural Sciences, Aesthetics). Dimensionality reduction techniques like removing stop words and words with low term frequency are applied before classification. The results show that the Naive Bayesian Multinomial model outperforms the simple Naive Bayesian approach for paragraph classification in Kannada.
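To make the multinomial variant concrete, here is a small pure-Python sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is not the implementation evaluated in the document: the class name, the smoothing constant, the choice to skip out-of-vocabulary words, and the English toy paragraphs are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            logp = math.log(n_docs / total_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    logp += math.log((counts[w] + self.alpha) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


if __name__ == "__main__":
    docs = [
        ["market", "price", "trade"],
        ["bank", "trade", "profit"],
        ["cell", "energy", "atom"],
        ["atom", "physics", "energy"],
    ]
    labels = ["Commerce", "Commerce", "Natural Sciences", "Natural Sciences"]
    model = MultinomialNB().fit(docs, labels)
    print(model.predict(["trade", "price"]))
```

Stop-word removal and low-frequency filtering, as described above, would be applied to the token lists before calling `fit`.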
Domain Specific Named Entity Recognition Using Supervised Approach (Waqas Tariq)
This paper introduces a Named Entity Recognition approach for textual corpora. Supervised statistical methods are used to develop our system, which can be used to categorize NEs belonging to the particular domain for which it is trained. As Named Entities appear in text surrounded by contexts (words to the left or right of the NE), we focus on extracting NE contexts from text and then performing statistical computation on them. We use n-gram modeling to extract contexts from text. Our methodology first extracts the left and right tri-grams surrounding NE instances in the training corpus and calculates their probabilities. All the extracted tri-grams are then stored in a file along with their calculated probabilities. During testing, the system detects unrecognized NEs in the testing corpus and categorizes them using the tri-gram probabilities calculated during training. The proposed system consists of two modules, namely Knowledge Acquisition and NE Recognition. The Knowledge Acquisition module extracts the tri-grams surrounding NEs in the training corpus, and the NE Recognition module performs the categorization of Named Entities in the testing corpus.
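The two modules can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: the padding token, the relative-frequency estimate, the rule of summing left and right context probabilities, and the toy sentences are all assumptions, and the candidate span is supplied by hand rather than detected.

```python
from collections import Counter, defaultdict

PAD = "<pad>"  # hypothetical padding token for sentence boundaries


def contexts(tokens, start, end, n=3):
    """Return the n tokens left and right of the span tokens[start:end]."""
    padded = [PAD] * n + list(tokens) + [PAD] * n
    left = tuple(padded[start:start + n])
    right = tuple(padded[end + n:end + 2 * n])
    return left, right


def train(examples):
    """Knowledge acquisition: examples are (tokens, start, end, category)."""
    counts = defaultdict(Counter)
    for tokens, start, end, category in examples:
        left, right = contexts(tokens, start, end)
        counts[category][("L", left)] += 1
        counts[category][("R", right)] += 1
    probs = {}
    for category, ctr in counts.items():
        total = sum(ctr.values())
        # Relative frequency of each context tri-gram within its category.
        probs[category] = {ctx: c / total for ctx, c in ctr.items()}
    return probs


def classify(probs, tokens, start, end):
    """NE recognition: pick the category whose contexts best match the span."""
    left, right = contexts(tokens, start, end)
    best, best_score = None, 0.0
    for category, table in probs.items():
        score = table.get(("L", left), 0.0) + table.get(("R", right), 0.0)
        if score > best_score:
            best, best_score = category, score
    return best  # None when no trained context matches


if __name__ == "__main__":
    training = [
        ("the patient was given aspirin twice a day".split(), 4, 5, "DRUG"),
        ("he flew to Paris last week".split(), 3, 4, "LOCATION"),
    ]
    model = train(training)
    test = "the patient was given ibuprofen twice a day".split()
    print(classify(model, test, 4, 5))
```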
Question and Answer Systems (QAS) are one of the many challenges for natural language understanding and interfaces. In this paper we have developed a new mathematical scoring model that works on five types of questions. The question text features are first extracted and a score is computed from the question's structure relative to its template structure; an answer score is then calculated against the question as well as the paragraph. A named entity recognizer and a part-of-speech tagger are applied to each of these words to encode the necessary information. The text is then used to finally arrive at the index of the most probable answer to the question. An entropy algorithm is used to find the exact answer.
Modeling Text Independent Speaker Identification with Vector Quantization (TELKOMNIKA JOURNAL)
Speaker identification is one of the most important technologies nowadays. Many fields such as
bioinformatics and security use speaker identification, and almost all electronic devices use
this technology too. Based on the text spoken, speaker identification is divided into text-dependent and
text-independent. In many fields, text-independent identification is mostly used because the number of texts is
unlimited, so text-independent identification is generally more challenging than text-dependent. In this research,
text-independent speaker identification with Indonesian speaker data was modelled with Vector Quantization (VQ).
VQ with K-Means initialization was used: K-Means clustering was used to initialize the means, and
Hierarchical Agglomerative Clustering was used to identify the K value for VQ. The best VQ accuracy was
59.67% when k was 5. According to the result, the Indonesian language could be modelled by VQ. This
research can be developed further using an optimization method for VQ parameters such as a Genetic Algorithm or
Particle Swarm Optimization.
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS (cscpconf)
In education, the use of electronic (E) examination systems is not a novel idea, as E-examination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. This system assesses answers to subjective questions by finding a matching ratio for the keywords in instructor and student answers. The matching ratio is achieved based on semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and case study were used in the research design to validate the proposed system. The examination assessment system will help instructors to save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessments.
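The four modules can be sketched as a tiny pipeline. Everything concrete here is an assumption for illustration: the stop-word list, the hand-written synonym dictionary standing in for semantic keyword expansion, and the linear mapping from matching ratio to marks are not taken from the research itself.

```python
import re

# Hypothetical stop-word list for the example.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "into", "and", "it", "by"}


def preprocess(text):
    """Module 1: lowercase, tokenize, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def expand(keywords, synonyms):
    """Module 2: map each instructor keyword to its accepted surface forms."""
    return {k: {k} | set(synonyms.get(k, ())) for k in keywords}


def matching_ratio(instructor_answer, student_answer, synonyms=None):
    """Module 3: fraction of instructor keywords found in the student answer."""
    keywords = set(preprocess(instructor_answer))
    if not keywords:
        return 0.0
    accepted = expand(keywords, synonyms or {})
    student = set(preprocess(student_answer))
    matched = sum(1 for forms in accepted.values() if forms & student)
    return matched / len(keywords)


def grade(ratio, full_marks):
    """Module 4: scale the ratio to marks (linear scaling is an assumption)."""
    return round(ratio * full_marks, 2)


if __name__ == "__main__":
    instructor = "Photosynthesis converts light energy into chemical energy"
    student = "Photosynthesis transforms light into chemical energy"
    plain = matching_ratio(instructor, student)
    expanded = matching_ratio(instructor, student, {"converts": ["transforms"]})
    print(grade(plain, 10), grade(expanded, 10))
```

The example shows why the expansion step matters: without it, a student who paraphrases a keyword loses marks even though the answer is equivalent.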
This document discusses a computational fluid dynamics (CFD) analysis of a shell and tube heat exchanger with different baffle inclinations. The study aims to determine the optimal baffle inclination angle and mass flow rate. It analyzes heat transfer characteristics for baffle inclinations of 0, 10 and 20 degrees. The results indicate that a helical baffle configuration forces fluid rotation, increasing heat transfer rates and coefficients more than a segmental baffle design. Overall, the CFD simulation allows determination of outlet temperatures, pressure drops, and optimal design parameters for improved heat exchanger performance.
This document summarizes a research paper that proposes a hybrid evolutionary clustering approach for optimized routing in mobile ad hoc networks. It uses particle swarm optimization (PSO) and ant colony optimization (ACO) to perform spatial clustering of nodes. Greedy routing is then used to find routes, and when dead ends are encountered, genetic algorithms are applied to find alternative routes. The approach aims to improve greedy routing performance and recovery from dead ends by avoiding the use of floating nodes. Simulation results showed improved greedy routing and fewer concave nodes compared to other methods.
This document describes a hardware and software codesign for a bus monitoring system using ZigBee and GPRS technologies. The system aims to improve bus operation efficiency and punctuality. It consists of wireless identification devices on each bus, a station monitor at each stop, and a monitoring center. Each device has a unique ID. The station monitor detects bus arrivals and departures via signal strength changes from the bus devices. It sends this data via GPRS to the monitoring center, allowing real-time tracking of bus locations. The low cost of ZigBee devices allows monitoring of all buses and stations. The system design successfully leverages the advantages of both ZigBee and GPRS networks.
AUTOMATED SHORT ANSWER GRADER USING FRIENDSHIP GRAPHScsandit
The paper proposes a method to assess short answer written by student using friendship matrix,
representation of friendship graph. The Short Answer is a type of answer which is based on
facts. These answers are quite different from long answers and Multiple Choice Question
(MCQ) type answers. The friendship graph is a graph which is based on friendship condition
i.e. the nodes have only one common neighbor. Friendship matrix is the matrix form of the
friendship graph. The student answer is stored in a friendship matrix and the teacher answer is
stored in another friendship matrix and both the matrices are compared. Based on the number
of errors encountered from student answer an error marks is calculated and that number is
subtracted from full marks to get student grade.
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
Natural Language Processing (NLP) techniques are one of the most used techniques in the field of computer applications. It has become one of the vast and advanced techniques. Language is the means of communication or interaction among humans and in present scenario when everything is dependent on machine or everything is computerized, communication between computer and human has become a necessity. To fulfill this necessity NLP has been emerged as the means of interaction which narrows the gap between machines (computers) and humans. It was evolved from the study of linguistics which was passed through the Turing test to check the similarity between data but it was limited to small set of data. Later on various algorithms were developed along with the concept of AI (Artificial Intelligence) for the successful execution of NLP. In this paper, the main emphasis is on the different techniques of NLP which have been developed till now, their applications and the comparison of all those techniques on different parameters.
This document proposes a generalized definition language for implementing an object-based fuzzy class model. It begins by reviewing related work on defining fuzzy classes and identifying limitations in existing approaches. It then summarizes the authors' previous work developing a generalized fuzzy class structure and model. The document introduces several new data types for representing different types of fuzzy attributes. Finally, it proposes a formal definition language for the fuzzy class model that utilizes the new data types to define fuzzy class structure and accurately represent fuzzy data types and attribute values. The language is intended to serve as a data definition language for object-based fuzzy database systems.
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compoundsidescitation
This paper presents a supervised machine learning approach that uses a decision tree learning algorithm to recognize Bengali noun-noun compounds as multi-word expressions (MWEs) in a Bengali corpus. Our proposed approach to MWE recognition has two steps: (1) extraction of candidate multi-word expressions using chunk information and various heuristic rules and (2) training the machine learning algorithm to decide whether a candidate is a multi-word expression or not. A variety of association measures have been used as features for identifying MWEs. The proposed system is tested on a Bengali corpus for identifying noun-noun compound MWEs.
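The abstract above does not say which association measures were used as features. As a minimal sketch, assuming pointwise mutual information (PMI) as one commonly used measure, a score for each adjacent word pair can be computed from raw counts. The function name `pmi_scores` and the toy tokens are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return the PMI (in bits) of every adjacent word pair in a token list.

    Assumption: PMI stands in for the unspecified association measures.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


if __name__ == "__main__":
    toy = ["a", "b", "a", "b"]
    for pair, score in sorted(pmi_scores(toy).items()):
        print(pair, round(score, 3))
```

In a pipeline like the one described, such a score would be one column in the feature vector passed to the decision tree, alongside whatever other measures the authors chose.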
The document discusses two neural network models for reading comprehension tasks: the Attentive Reader model proposed by Hermann et al. in 2015 and the Stanford Reader model proposed by Chen et al. in 2016. The author implemented a two-layer attention model inspired by these previous models that achieves 1.5% higher accuracy on reading comprehension tasks than the Stanford Reader.
OOAD - UML - Sequence and Communication Diagrams - LabVicter Paul
The document discusses interaction diagrams, specifically sequence diagrams and communication diagrams. It explains that interaction diagrams show interactions between objects by depicting the messages exchanged. A sequence diagram emphasizes the time ordering of messages, showing objects arranged from left to right and messages ordered from top to bottom. A communication diagram emphasizes the structural organization of objects, showing them as vertices connected by links along which messages pass. Both diagram types are semantically equivalent but visualize information differently based on their focus. Examples of sequence and communication diagrams are provided for processes like patient admission to a hospital.
Assessment of Programming Language Reliability Utilizing Soft-Computingijcsa
The document discusses assessing programming language reliability using soft computing techniques like fuzzy logic and genetic algorithms. It proposes using these methods to model programming language reliability based on linguistic variables like "Reliable", "Moderately Reliable", and "Not Reliable". The key factors examined for determining a programming language's reliability include syntax consistency, semantic consistency, error handling, modularity, and documentation. A soft computing system is simulated to evaluate programming languages based on these reliability criteria.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONijistjournal
User-generated content on the web is growing rapidly in this emergent information age. Evolving technology makes use of such information to capture only the user’s essence, and the useful information is finally exposed to information seekers. Most existing research on text information processing focuses on the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for the text data available in each forum. This approach analyses the forum text data and computes a value for each word of text. The proposed approach combines K-means clustering with a Support Vector Machine with PSO (SVM-PSO) classification algorithm that can be used to group the forums into two clusters, hotspot forums and non-hotspot forums, within the current time span. The accuracy of the proposed system is compared with other classification algorithms such as Naïve Bayes, decision tree, and SVM. The experiment shows that K-means and SVM-PSO together achieve highly consistent results.
A Survey on Sentiment Categorization of Movie ReviewsEditor IJMTER
Sentiment categorization is the process of mining user-generated text content and determining the sentiment of the users towards a particular thing. It is the approach of detecting the sentiment of the author in regard to some topic. It is also known as sentiment detection, sentiment analysis, and opinion mining. It is very useful for movie production companies that are interested in knowing how users feel about their movies. For example, the word “excellent” indicates that a review expresses positive emotion about a particular movie. The same applies to movies, songs, cars, holiday destinations, political parties, social network sites, web blogs, discussion forums, and so on. Sentiment categorization can be carried out using three approaches. First, a supervised machine learning text classifier based on Naïve Bayes, Maximum Entropy, SVM, kNN, or a hidden Markov model. Second, an unsupervised semantic orientation scheme that extracts relevant N-grams from the text and then labels them. Third, a publicly available library based on SentiWordNet.
Implementation of Semantic Analysis Using Domain OntologyIOSR Journals
The document describes a semantic analysis system that analyzes feedback from an organization using domain ontology. The system first collects feedback data from students in an unstructured format. It then preprocesses the feedback using part-of-speech tagging to extract meaningful information. The system architecture includes preprocessing the feedback, matching entities in the feedback to an organization ontology using Jaccard similarity, and generating a summarized analysis of the feedback based on the ontology entities. The goal is to group related words and phrases expressed by students under the same entity to produce a meaningful summary for the organization.
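The entity-matching step summarized above relies on Jaccard similarity. As a rough sketch of how that matching could look, assuming each ontology entity is represented by a small set of terms (the summary does not describe the representation), the code below scores a tokenized feedback comment against every entity and keeps the best match. The names `jaccard` and `best_entity` and the toy ontology are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token collections."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(set_a | set_b)


def best_entity(feedback_tokens, ontology):
    """Return (entity, score) for the ontology entity most similar to the feedback.

    `ontology` maps an entity name to a list of terms (an assumed format).
    """
    scored = ((entity, jaccard(feedback_tokens, terms))
              for entity, terms in ontology.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    ontology = {
        "library": ["library", "books", "reading"],
        "canteen": ["food", "canteen"],
    }
    print(best_entity(["library", "books"], ontology))
```

Grouping feedback under the best-scoring entity is what would let related words and phrases end up in the same part of the summary.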
QUESTION ANALYSIS FOR ARABIC QUESTION ANSWERING SYSTEMS ijnlc
The first step of processing a question in Question Answering (QA) systems is to carry out a detailed analysis of the question to determine what it is asking for and how best to approach answering it. Our question analysis uses several techniques to analyze any question given in natural language: a Stanford POS tagger and parser for the Arabic language, a named entity recognizer, a tokenizer, stop-word removal, question expansion, question classification, and question focus extraction components. We employ numerous detection rules and a classifier trained on features from this analysis to detect important elements of the question, including: 1) the portion of the question that refers to the answer (the focus); 2) the terms in the question that identify what type of entity is being asked for (the lexical answer types); 3) question expansion; and 4) a process of classifying the question into one or more of several different types. We describe how these elements are identified and evaluate the effect of accurate detection on our question-answering system using the Mean Reciprocal Rank (MRR) accuracy measure.
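The Mean Reciprocal Rank measure named in this abstract averages, over all questions, the reciprocal of the rank at which the first correct answer appears. A short sketch follows; the function name and the convention that an unanswered question contributes zero are assumptions for illustration, not details from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Compute MRR from the 1-based rank of the first correct answer per question.

    A rank of None means no correct answer was returned for that question;
    it contributes 0 to the sum (an assumed but common convention).
    """
    if not ranks:
        return 0.0
    return sum(1.0 / rank for rank in ranks if rank) / len(ranks)


if __name__ == "__main__":
    # Four questions: correct answer ranked 1st, 2nd, never found, and 4th.
    print(mean_reciprocal_rank([1, 2, None, 4]))
```

For the four questions in the usage example the sum of reciprocal ranks is 1 + 0.5 + 0 + 0.25, so the MRR is 0.4375.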
Sentiment analysis using naive bayes classifier Dev Sahu
This ppt contains a small description of naive bayes classifier algorithm. It is a machine learning approach for detection of sentiment and text classification.
This document describes a system for extracting named entities and their relationships from unstructured text data using n-gram features. It uses a hidden Markov model to extract and classify entities into types like person, location, organization. It then uses a conditional random field with kernel approach to detect relationships between the extracted entities. The system takes unstructured text as input, performs preprocessing like tokenization and stop word removal, extracts n-gram, part-of-speech and lexicon features which are then combined and used to train the HMM model to classify entities and CRF model to detect relationships between entities.
Suitability of naïve bayesian methods for paragraph level text classification...ijaia
This document discusses using Naive Bayesian methods for paragraph-level text classification in the Kannada language. It evaluates the performance of the Naive Bayesian and Naive Bayesian Multinomial models on a corpus of 1791 paragraphs from four categories (Commerce, Social Sciences, Natural Sciences, Aesthetics). Dimensionality reduction techniques like removing stop words and words with low term frequency are applied before classification. The results show that the Naive Bayesian Multinomial model outperforms the simple Naive Bayesian approach for paragraph classification in Kannada.
Domain Specific Named Entity Recognition Using Supervised ApproachWaqas Tariq
This paper introduces a Named Entity Recognition approach for textual corpora. Supervised statistical methods are used to develop our system, which can be used to categorize NEs belonging to the particular domain for which it is trained. As Named Entities appear in text surrounded by contexts (words to the left or right of the NE), we focus on extracting NE contexts from text and then performing statistical computation on them. We use n-gram modeling to extract contexts from text. Our methodology first extracts the left and right tri-grams surrounding NE instances in the training corpus and calculates their probabilities. All the extracted tri-grams, along with their calculated probabilities, are then stored in a file. During testing, the system detects unrecognized NEs in the testing corpus and categorizes them using the tri-gram probabilities calculated during training. The proposed system consists of two modules, namely Knowledge Acquisition and NE Recognition. The Knowledge Acquisition module extracts the tri-grams surrounding NEs in the training corpus, and the NE Recognition module performs the categorization of Named Entities in the testing corpus.
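The two modules described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: it treats each NE as a single token, and it defines a tri-gram's probability as the relative frequency of each NE label among the training instances that share that context, since the abstract does not give the formula. All function names and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def context_trigrams(tokens, ne_index):
    """Return the (left, right) word tri-grams around the token at ne_index."""
    left = tuple(tokens[max(0, ne_index - 3):ne_index])
    right = tuple(tokens[ne_index + 1:ne_index + 4])
    return left, right


def train(examples):
    """Knowledge acquisition: estimate P(label | context tri-gram).

    `examples` is an iterable of (tokens, ne_index, label) triples.
    """
    counts = defaultdict(Counter)
    for tokens, ne_index, label in examples:
        for trigram in context_trigrams(tokens, ne_index):
            if len(trigram) == 3:  # skip truncated contexts at sentence edges
                counts[trigram][label] += 1
    return {
        trigram: {label: c / sum(labels.values()) for label, c in labels.items()}
        for trigram, labels in counts.items()
    }


def classify(tokens, ne_index, model):
    """NE recognition: sum label probabilities over the known contexts."""
    scores = Counter()
    for trigram in context_trigrams(tokens, ne_index):
        for label, prob in model.get(trigram, {}).items():
            scores[label] += prob
    return scores.most_common(1)[0][0] if scores else None


if __name__ == "__main__":
    training = [("she moved to Paris in early May".split(), 3, "LOC")]
    model = train(training)
    print(classify("she moved to Berlin for work today".split(), 3, model))
```

The stored model here is an in-memory dictionary; the paper writes the tri-grams and probabilities to a file, which would only change how `model` is saved and loaded.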
Question and Answer Systems (QAS) are among the many challenges for natural language understanding and interfaces. In this paper we have developed a new mathematical scoring model that works on five types of questions. The question text features are first extracted and a score is computed based on the question's structure with respect to its template structure; an answer score is then calculated against the question as well as the paragraph. A named entity recognizer and a part-of-speech tagger are applied to each of these words to encode the necessary information. The text is then processed to arrive at the index of the most probable answer to the question. An entropy algorithm is used to find the exact answer.
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
Speaker identification is one of the most important technologies nowadays. Many fields, such as bioinformatics and security, use speaker identification, as do almost all electronic devices. Based on the number of texts, speaker identification is divided into text dependent and text independent. In many fields, text-independent identification is used most because the number of texts is unlimited, so it is generally more challenging than text-dependent identification. In this research, text-independent speaker identification with Indonesian speaker data was modelled with Vector Quantization (VQ) using K-Means initialization. K-Means clustering was used to initialize the means, and Hierarchical Agglomerative Clustering was used to identify the K value for VQ. The best VQ accuracy was 59.67% when k was 5. According to the result, the Indonesian language could be modelled by VQ. This research can be extended using optimization methods for VQ parameters, such as Genetic Algorithm or Particle Swarm Optimization.
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
In education, the use of electronic (E) examination systems is not a novel idea, as E-examination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. This system assesses answers to subjective questions by finding a matching ratio for the keywords in instructor and student answers. The matching ratio is achieved based on semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and case study were used in the research design to validate the proposed system. The examination assessment system will help instructors to save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessments.
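The keyword expansion, matching, and grading modules can be illustrated with a deliberately simple sketch. It assumes a hand-written synonym table stands in for keyword expansion and that the matching ratio is the fraction of instructor keywords found in the student answer; the paper's actual semantic and document similarity measures are not described in the abstract. The names `matching_ratio` and `grade` and the sample data are hypothetical.

```python
def matching_ratio(instructor_keywords, student_answer, synonyms=None):
    """Fraction of instructor keywords (or their synonyms) present in the answer.

    Preprocessing here is only lowercasing and whitespace splitting, so
    punctuation attached to a word will prevent a match.
    """
    synonyms = synonyms or {}
    if not instructor_keywords:
        return 0.0
    student_tokens = set(student_answer.lower().split())
    matched = 0
    for keyword in instructor_keywords:
        # Keyword expansion: the keyword itself plus any listed synonyms.
        variants = {keyword.lower()} | {s.lower() for s in synonyms.get(keyword, [])}
        if variants & student_tokens:
            matched += 1
    return matched / len(instructor_keywords)


def grade(ratio, max_marks):
    """Scale a matching ratio to a mark out of max_marks."""
    return round(ratio * max_marks, 2)


if __name__ == "__main__":
    keywords = ["stack", "push", "pop"]
    answer = "A stack supports insert and pop operations"
    ratio = matching_ratio(keywords, answer, {"push": ["insert"]})
    print(ratio, grade(ratio, 10))
```

Without the synonym table the same answer matches only two of the three keywords, which is the kind of gap keyword expansion is meant to close.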
This document discusses a computational fluid dynamics (CFD) analysis of a shell and tube heat exchanger with different baffle inclinations. The study aims to determine the optimal baffle inclination angle and mass flow rate. It analyzes heat transfer characteristics for baffle inclinations of 0, 10 and 20 degrees. The results indicate that a helical baffle configuration forces fluid rotation, increasing heat transfer rates and coefficients more than a segmental baffle design. Overall, the CFD simulation allows determination of outlet temperatures, pressure drops, and optimal design parameters for improved heat exchanger performance.
This document summarizes a research paper that proposes a hybrid evolutionary clustering approach for optimized routing in mobile ad hoc networks. It uses particle swarm optimization (PSO) and ant colony optimization (ACO) to perform spatial clustering of nodes. Greedy routing is then used to find routes, and when dead ends are encountered, genetic algorithms are applied to find alternative routes. The approach aims to improve greedy routing performance and recovery from dead ends by avoiding the use of floating nodes. Simulation results showed improved greedy routing and fewer concave nodes compared to other methods.
This document describes a hardware and software codesign for a bus monitoring system using ZigBee and GPRS technologies. The system aims to improve bus operation efficiency and punctuality. It consists of wireless identification devices on each bus, a station monitor at each stop, and a monitoring center. Each device has a unique ID. The station monitor detects bus arrivals and departures via signal strength changes from the bus devices. It sends this data via GPRS to the monitoring center, allowing real-time tracking of bus locations. The low cost of ZigBee devices allows monitoring of all buses and stations. The system design successfully leverages the advantages of both ZigBee and GPRS networks.
1. ICT (information and communication technologies) have transformed education by facilitating access to information, communication, and new modes of learning. The internet and mobile technologies have expanded opportunities for online, blended, and mobile learning.
2. Students now need digital literacy skills to effectively search, evaluate, and utilize the vast amount of online information. Key 21st century skills identified include digital literacy, inventive thinking, effective communication, and high productivity.
3. ICT have impacted teaching by shifting teachers' roles from knowledge transmitters to learning facilitators. They have also changed student roles from passive recipients to active participants in collaborative learning. Technologies like interactive whiteboards enhance interactive learning.
This document describes the design and implementation of a printed monopole antenna for use in the 2.4-2.4835 GHz Industrial Scientific and Medical (ISM) band. It begins with an introduction to wireless communications in the ISM band and the challenges of developing small, low-cost integrated antennas for these applications. It then outlines the methodology for antenna design which includes calculating dimensions, simulation, observation, and hardware implementation. The design specification and steps taken to design the rectangular microstrip patch antenna are provided, including choosing parameters like resonant frequency, dielectric material, and substrate height. Simulation results for parameters like radiation pattern and bandwidth are analyzed. The document concludes that printed monopole antennas are well-suited for ISM band applications
This document summarizes a study on using a fuzzy total margin based support vector machine (FTM-SVM) approach to handle class imbalance in machine learning classification problems. It discusses how traditional SVM classifiers can overfit to the majority class in imbalanced data sets. The proposed FTM-SVM method aims to address this issue by incorporating a total margin algorithm, different cost functions, and fuzzy membership functions to reduce the effect of outliers and noise on the minority class. The paper evaluates the FTM-SVM approach on artificial and imbalanced data sets, finding it achieves higher performance measures than some existing class imbalance learning methods.
This document describes the simulation of a requester device using VHDL to enable Ethernet communication. The requester device is designed to transmit and receive data through GPIO ports to allow connection to external devices. It consists of GPIO to FIFO and FIFO to GPIO blocks to transfer data between the ports and FIFO memory. The device is simulated using ModelSim software. The simulation demonstrates the forwarding of data from GPIO input to output through the FIFO blocks, showing it can function as a mediator for data transfer required for Ethernet communication platforms.
This document summarizes research on utilizing fly ash to treat domestic wastewater. Fly ash was collected and characterized, then used as a filter media in two containers with thicknesses of 5 cm and 10 cm. Domestic wastewater was treated by passing it through the fly ash filters. Testing showed the 10 cm thick fly ash filter reduced biochemical oxygen demand by 71.48%, chemical oxygen demand by 66.59%, and total solids by 69.02% compared to untreated wastewater. The research concludes that fly ash is effective at removing various impurities from domestic wastewater and is a low-cost option for small-scale wastewater treatment.
This document describes the design and implementation of a printed rectangular monopole antenna for wireless networks. It aims to create a broadband antenna for frequencies like Bluetooth, Wi-Fi, and WiMAX between 2.4-2.4835 GHz. The antenna is printed on a PCB with a rectangular patch and ground plane. It is fed using a microstrip line. The design achieves a bandwidth of 4.1-4.26 GHz through optimization of parameters like patch size and feed length. Both software simulation and hardware implementation are conducted, with the hardware results showing slightly reduced bandwidth compared to simulation. The antenna demonstrates good performance for broadband wireless applications.
This document summarizes a research paper that designed and implemented sphere decoding (SD) for multiple-input multiple-output (MIMO) systems using an FPGA. It used Newton's iterative method to calculate the matrix inverse as part of the SD algorithm, which reduces complexity compared to direct matrix inversion. The authors implemented SD for a 2x2 MIMO system with 4-QAM modulation. Simulation results showed that Newton's method converged after 7 iterations, and SD successfully calculated the minimum Euclidean distance vector.
This document discusses India's smart cities initiative and the role of public-private partnerships. It notes that India's urban population is growing rapidly and current infrastructure cannot support this growth. The government plans to build 100 smart cities to address issues like pollution, congestion, and resource scarcity. Public-private partnerships are seen as key to providing the large investments needed, estimated at over $10 billion per city. PPPs can help develop smart infrastructure, healthcare, mobility, technology and energy systems. The document analyzes how PPPs can ensure quality infrastructure and services to enable smart city development in India.
This document summarizes a proposed architecture for remote patient monitoring using wireless sensor networks. The architecture allows virtual groups to be formed between patients, nurses, and doctors to enable remote analysis of patient data collected by wireless body area networks (WBANs). The patient data is transmitted through an underlying environmental sensor network to members of the virtual group. The proposed architecture addresses challenges of power supply for body sensor networks and quality of service guarantees.
The document compares the design of circular and square water tanks using the working stress method and limit state method. It was found that:
1) The limit state method requires less steel than the working stress method for both circular and square tank designs.
2) A circular tank design is more economical than a square tank design due to requiring less steel.
3) The limit state method results in a more rational and economical design compared to the traditional working stress method.
This document summarizes research on evaluating WiMAX network performance using vertical handoff. It describes the setup used, which includes 8 base stations to test handoff as a mobile station moves between cells. Graphs show the mobile station's throughput drops slightly during handoff, with maximum delay of 0.025 seconds. Vertical handoff between WiMAX and WLAN networks is also tested, with the document observing a smooth handoff between the networks as the mobile nodes move between their coverage areas.
This document analyzes the performance of a diesel engine fueled with blends of biodiesel derived from Cashew Nut Shell Liquid (CNSL) and ethanol. Experiments were conducted with diesel and blends containing 10%, 15%, 20% CNSL, as well as blends with 5% and 10% ethanol added to the 15% CNSL blend. Performance parameters like brake thermal efficiency, fuel consumption, emissions were measured and compared across fuel blends and to diesel. Results showed the 15% CNSL blend performed better than other blends, while adding ethanol reduced performance due to its lower energy content. This research evaluates CNSL biodiesel and its blends as potential alternatives to conventional diesel
This document summarizes an innovative routing algorithm called AntHocNet for mobile ad hoc networks. AntHocNet combines aspects of ant colony optimization and information bootstrapping to address the challenges of routing in dynamic mobile networks. Key elements of AntHocNet include the use of both reactive and proactive routing components, combining ant-based path sampling with a lightweight bootstrapping process to update routing information, and using a composite pheromone metric to guide path selection. The document evaluates the performance impacts of these different design components through simulation studies.
This document describes how to hack into a target machine using social engineering and SSH. It involves using Nmap to scan the target machine and find open ports, then using Hydra to brute force common username and password combinations to gain SSH access. Once logged in via SSH, the hacker can explore the system but does not have root privileges. The document provides steps to gain root access including viewing the /etc/passwd file to find the root username and attempt to su to gain root privileges on the target machine.
This document proposes a model for effectively gathering requirements from multiple sites within an organization. Requirements gathering is an important but challenging part of software development, made more difficult when requirements must be collected from different organizational units/sites. The proposed model is an iterative process that involves collecting requirements from each site for a given module, checking for contradictions or ambiguities, validating the practicality of requirements, and reconciling any issues found before moving to the next module. This process continues until all requirements from all sites have been gathered in a consistent, unambiguous manner.
This document summarizes a study on the foreign exchange exposure of Indian corporate firms from 2009 to 2013. The study estimated foreign exchange exposure using ordinary least squares regression with various trade-weighted exchange rate indices. Key findings include:
1) Foreign exchange exposure was estimated for a sample of 27 non-financial Indian firms using different exchange rate indices.
2) Exposure was measured as the sensitivity of stock returns to changes in exchange rates based on the model developed by Adler and Dumas.
3) Preliminary results found Indian firms may benefit from appreciation of the home currency and lose from depreciation, but overall exposure was weakly significant.
This study examined the tensile behavior of ferrocement composite panels with varying numbers of wire mesh layers and inclusion of steel fibers. 36 panels were cast and tested under direct tension. Panels were divided into groups based on number of mesh layers (1 to 6 layers) and use of steel fibers. Testing found that ultimate load, elongation and tensile strength increased with additional mesh layers due to higher reinforcement volume fraction. Panels with steel fibers exhibited 10-17% higher strength than non-fiber panels. Failure occurred through cracking perpendicular to the load direction. The study concluded that ferrocement properties directly correlate to the number of reinforcing mesh layers.
This document summarizes a project on sentiment analysis of tweets using lexicon-based approaches. It discusses tokenization, stop word removal, stemming, lemmatization, and lexicon-based sentiment analysis. Naive Bayes algorithms are also covered, explaining how they work and their applications, which include real-time prediction, text classification, and recommendation systems. Tools used for the analysis include Anaconda and Spyder.
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...IRJET Journal
The document proposes using statistical machine translation via non-negative matrix factorization to address word ambiguity and mismatch problems in question retrieval for community question answering systems. It translates questions into other languages using Google Translate to leverage contextual information, representing the original and translated questions together in a matrix. Experimental results on a real CQA dataset show this approach improves over methods relying only on surface text matching.
Generation of Question and Answer from Unstructured Document using Gaussian M...IJACEE IJACEE
The document describes a system that automatically generates questions and answers from an unstructured document. It involves several steps: (1) simplifying complex sentences, (2) generating initial questions using named entities and semantic role labeling, (3) identifying subtopics using LDA and GMNTM models, (4) measuring syntactic correctness of questions, and (5) extracting answers using pattern matching. The system is expected to produce more accurate results compared to using only LDA for subtopic identification, as GMNTM also considers word order and semantics. Key techniques include semantic role labeling, Extended String Subsequence Kernel for similarity measurement, and syntactic tree kernel for question ranking.
Open domain question answering system using semantic role labelingeSAT Publishing House
1. The document describes a proposed open domain question answering system that uses semantic role labeling to extract answers from documents retrieved from the web.
2. The system consists of three modules: question processing, document retrieval, and answer extraction. Semantic role labeling is used in the answer extraction module to identify answers based on the question type.
3. An evaluation of the proposed system showed it achieved higher accuracy compared to a baseline system using only pattern matching for answer extraction.
Sentiment classification is an active and interesting area of research because of its application in various fields, such as collecting reviews from people about products and about social and political events through the web. Currently, sentiment analysis concentrates on subjective statements, or on subjectivity, and overlooks objective statements that carry sentiment(s). During sentiment classification, more challenging problems arise from the ambiguous senses of words, negation words, and intensifiers. Because of its importance, the correct sense of the target word is extracted and determined using the similarity that arises in WordNet glosses. This paper presents a survey covering the techniques and methods in sentiment analysis and the challenges that appear in the field.
NLP Techniques for Question Answering.docxKevinSims18
Natural Language Processing (NLP) is a field of study that deals with the interaction between computers and human language. One of the most important applications of NLP is question answering, which involves the automatic answering of questions written in natural language. In this blog post, we will explore some of the NLP techniques used for question answering.
Architecture of an ontology based domain-specific natural language question a...IJwest
The document summarizes the architecture of an ontology-based domain-specific natural language question answering system. The proposed architecture defines four main modules: 1) question processing which analyzes and classifies questions and reformulates queries, 2) document retrieval which retrieves relevant documents, 3) document processing which processes retrieved documents, and 4) answer extraction which extracts and generates responses. Natural language processing techniques and ontologies are used to analyze questions and documents and extract relationships and answers. The system aims to generate concise, specific answers to natural language questions in a given domain and achieved 94% accuracy in testing.
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...ijcsa
Active learning is a supervised learning method that is based on the idea that a machine learning algorithm can achieve greater accuracy with fewer labelled training images if it is allowed to choose the images from which it learns. Facial age classification is a technique to classify face images into one of several predefined age groups. The proposed study applies an active learning approach to facial age classification which allows a classifier to select the data from which it learns. The classifier is initially trained using a small pool of labeled training images. This is achieved by using bilateral two-dimensional linear discriminant analysis. Then the most informative unlabeled image is selected from the unlabeled pool using the furthest nearest neighbor criterion, labeled by the user, and added to the appropriate class in the training set. The incremental learning is performed using an incremental version of bilateral two-dimensional linear discriminant analysis. This active learning paradigm is proposed to be applied to the k nearest neighbor classifier and the support vector machine classifier, and the performance of these two classifiers is to be compared.
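The abstract does not define the furthest nearest neighbour criterion. One reading of the name, taken here as an assumption, is that each unlabeled sample is scored by the distance to its nearest labeled sample, and the sample with the largest such distance is queried next, since it is the one least covered by the current training set. The sketch below uses plain Euclidean distance on toy 2-D points; in the study the distances would presumably be computed in the discriminant feature space.

```python
import math


def furthest_nearest_neighbour(unlabeled, labeled):
    """Return the index of the unlabeled point whose nearest labeled point is furthest.

    Assumption: this is one interpretation of the criterion's name,
    not a definition taken from the paper.
    """
    def nearest_distance(point):
        return min(math.dist(point, other) for other in labeled)

    return max(range(len(unlabeled)),
               key=lambda i: nearest_distance(unlabeled[i]))


if __name__ == "__main__":
    labeled = [(0.0, 0.0), (1.0, 0.0)]
    unlabeled = [(0.5, 0.0), (5.0, 5.0), (1.0, 1.0)]
    # The point at index 1 lies far from every labeled sample,
    # so it is the one selected for labeling by the user.
    print(furthest_nearest_neighbour(unlabeled, labeled))
```

After the user labels the selected sample, it would be moved from the unlabeled pool to the training set and the selection repeated.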
UNIT V TEXT AND OPINION MINING
Text Mining in Social Networks -Opinion extraction – Sentiment classification and clustering -
Temporal sentiment analysis - Irony detection in opinion mining - Wish analysis – Product review mining – Review Classification – Tracking sentiments towards topics over time
This was part of my inaugural lecture of Summer Internship on Machine Learning at NMAM Institute of Technology, Nitte on 7th June, 2018. A lot more than what was on this presentation was discussed. We spoke on the ethics of choices we make as developers, socio-cultural impact of AI and ML and the political repercussions of deploying ML and AI.
introduction to machine learning and nlpMahmoud Farag
The document discusses natural language processing (NLP) and machine learning. It defines NLP as a branch of artificial intelligence that develops systems allowing computers to understand and generate human language. NLP encompasses tasks like machine translation, speech recognition, named entity recognition, text classification, summarization and question answering. The document also discusses the complexities of human language and different levels of linguistic analysis used in NLP, including syntactic, semantic, discourse, pragmatic and morphological analysis.
Natural Language Processing Through Different Classes of Machine Learningcsandit
This document summarizes several papers on using different classes of machine learning for natural language processing tasks. It discusses supervised learning approaches for semantic orientation analysis and sentiment analysis. It also covers unsupervised learning approaches like Turney's work using semantic association to determine semantic orientation. Finally, it discusses semi-supervised learning and its ability to use both labeled and unlabeled data to help with NLP tasks on large, unprocessed datasets from the growing internet.
Application of hidden markov model in question answering systemsijcsa
By the increase of the volume of the saved information on web, Question Answering (QA) systems have been very important for Information Retrieval (IR). QA systems are a specialized form of information retrieval. Given a collection of documents, a Question Answering system attempts to retrieve correct answers to questions posed in natural language. Web QA system is a sample of QA systems that in this system answers retrieval from web environment doing. In contrast to the databases, the saved information on web does not follow a distinct structure and are not generally defined. Web QA systems is the task of automatically answering a question posed in Natural Language Processing (NLP). NLP techniques are used in applications that make queries to databases, extract information from text, retrieve relevant documents from a collection, translate from one language to another, generate text responses, or recognize spoken words converting them into text. To find the needed information on a mass of the non-structured information we have to use techniques in which the accuracy and retrieval factors are implemented well. In this paper in order to well IR in web environment, The QA system in designed and also implemented based on the Hidden Markov Model (HMM)
Semantic based automatic question generation using artificial immune systemAlexander Decker
The document describes a system that uses artificial immune systems and natural language processing techniques like semantic role labeling and named entity recognition to automatically generate questions from text. It introduces a model that applies these techniques to extract semantic patterns from sentences, trains a classifier using artificial immune systems to classify question types, and then generates questions by matching patterns. The system was tested on sentences from various sources and showed promising results, correctly determining question types 95% of the time and generating matching questions 87% of the time.
Supervised Sentiment Classification using DTDP algorithmIJSRD
Sentiment analysis is the process widely used in all fields and it uses the statistical machine learning approach for text modeling. The primarily used approach is Bag-of-words (BOW). Though, this technique has some limitations in polarity shift problem. Thus, here we propose a new method called Dual sentiment analysis (DSA) which resolves the polarity shift problem. Proposed method involves two approaches such as dual training and dual prediction (DPDT). First, we propose a data expansion technique by creating a reversed review for training data. Second, dual training and dual prediction algorithm is developed for doing analysis on sentiment data. The dual training algorithm is used for learning a sentiment classifier and the dual prediction algorithm is developed for classifying the review by considering two sides of one review.
Paper id 28201441
1. International Journal of Research in Advent Technology, Vol.2, No.8, August 2014
E-ISSN: 2321-9637
Question Classification: Using Support Vector Machine
and Lexical, Semantic and Syntactic Features
Kiran Yadav, Megha Mishra
M.E. Scholar, SSCET Bhilai; Prof., SSCET Bhilai
Yadavkiran64@gmail.com
Abstract- Question classification plays an important role in a
question answering system, and the quality of its results determines
the quality of the whole system. In this paper we present a question
classification algorithm based on SVM and question features: a Support
Vector Machine model is used to train a classifier on the coarse
categories, and the features are used to decide the category. SVM has
been used for question classification before with good results, so we
use SVM as the classifier. The experimental results show that the
feature extraction performs well with SVM and that our approach
reaches good classification accuracy.
Index Terms-
Question answering, text classification, machine learning, support vector machine.
1. INTRODUCTION
In this work, we use a machine learning approach to
question classification. We treat the task of question
classification as a supervised learning problem. In
order to train the learning model, we designed a rich
set of features that are predictive of question
categories.
In this work, question classification serves to
constrain the answer types, which allows further
processing to precisely locate and verify the answer.
For example, given the question "Which city has the
largest population?", we do not want to test every
phrase in a document to see whether it provides an
answer.
However, there are characteristics of question
classification that set it apart from the common text
classification task. On one hand, questions are
relatively short and contain less word-based
information than an entire text. On the other hand,
short questions are amenable to more accurate and
deeper-level analysis. In this way, this work on
question classification can also be seen as a case
study in applying semantic information to text
classification. As with syntactic information such as
part-of-speech tags, a clear notion of how to use
lexical semantic information is to replace or augment
each word by its semantic class in the given context,
then generate a feature-based representation and learn
a mapping from this representation to the desired
property. This general scheme leaves several issues
open that make the analogy to syntactic categories
nontrivial.
First, it is not clear which semantic categories
are appropriate and how to obtain them. Second, it is
not clear how to handle the more difficult problem of
semantic ambiguity when deciding the representation of
a sentence. We merge the three feature types (lexical,
syntactic and semantic) to increase the accuracy of
question classification. Question classification plays
an important role in question answering, and features
are the key to obtaining an accurate question
classifier.
Question answering systems deal with this problem
differently, by providing natural language interfaces
in which users can express their information need in
the form of a natural language question and retrieve
the exact answer to that question, rather than a set
of documents, from a (typically large) collection of
documents such as the WWW.
The development period of Q/A systems for different
fields is too long and their reuse rate is low. We
therefore developed a state-of-the-art machine learning
based question classifier that uses a rich set of
lexical, syntactic and semantic features.
2. QUESTION CLASSIFICATION
Question classification assigns a given question to a
category that helps to produce its answer. It is mainly
used in question answering systems. It works
category-wise: for example, once the type of a question
is known, the answer can be searched for within that
category, which gives a faster result. When we search
for something in a search engine such as Google, it
returns everything related to the words in the query.
A question classifier instead works category-wise, so
that only the answer to the question is presented.
Table 1. The coarse question categories
Coarse
ABBR
DESC
ENTY
HUM
LOC
NUM
To simplify the following experiments, we assume
that one question resides in only one category. That is
to say, an ambiguous question is labeled with its most
probable category.
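The single-label assumption can be sketched as picking the highest-scoring coarse category. This is a minimal illustration only: the function name `most_probable_category` and the example scores are hypothetical, and the paper does not say how the scores are produced.

```python
# Coarse categories from Table 1.
COARSE = ["ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM"]


def most_probable_category(scores):
    """Return the single category with the highest score.

    `scores` maps a coarse category name to a score or probability
    produced by some classifier (hypothetical values are used below).
    """
    unknown = set(scores) - set(COARSE)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return max(scores, key=scores.get)


# Hypothetical scores for an ambiguous question.
example_scores = {"ENTY": 0.55, "DESC": 0.40, "HUM": 0.05}
label = most_probable_category(example_scores)
print(label)
```

Even though the question above could plausibly belong to two categories, only one label is kept for training and evaluation.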
2.1 Question types
What is the fastest fish in the world?
What’s the colored part of the eye called?
What color is Mr. Spock’s blood?
Name a novel written by John Steinbeck.
What currency is used in Australia?
What is the fear of cockroaches called?
What are the historical trials following World War II called?
What is the world’s best selling cookie?
What instrument is Ray Charles best known for playing?
What language is mostly spoken in Brazil?
What letter adorns the flag of Rwanda?
What’s the highest hand in straight poker?
What is the state tree of Nebraska?
What is the best brand for a laptop computer?
What religion has the most members?
What game is Garry Kasparov really good at?
3. RELATED WORK
Hand-made rule-based approaches focus on extracting
names using a large set of human-made rules. Generally,
such systems consist of a set of patterns using
grammatical (e.g. part of speech), syntactic (e.g. word
precedence) and orthographic (e.g. capitalization)
features in combination with dictionaries. An example
for this type of system is: "President Rao said bankers
talks will make discussions on private, U.S. forces to
leave Iraq". In this example a proper noun follows a
person's title (President), so that noun is a person's
name, and a proper noun that starts with a capital
letter (Iraq) after the verb (to leave) is a location's
name.
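The two rules described in this example can be sketched with regular expressions. This is only an illustration under stated assumptions: the title list, the pattern names and the shortened test sentence are hypothetical and are not taken from any of the systems cited here.

```python
import re

# Hypothetical rule set, for illustration only.
TITLES = r"(?:President|Mr\.|Mrs\.|Dr\.|Prof\.)"
# Rule 1: a capitalized word right after a person's title is a person name.
PERSON_RULE = re.compile(TITLES + r"\s+([A-Z][a-z]+)")
# Rule 2: a capitalized word right after "to leave" is a location name.
LOCATION_RULE = re.compile(r"\bto leave\s+([A-Z][a-z]+)")


def extract_entities(sentence):
    """Apply both rules and return a list of (text, label) pairs."""
    entities = [(m.group(1), "PERSON") for m in PERSON_RULE.finditer(sentence)]
    entities += [(m.group(1), "LOCATION") for m in LOCATION_RULE.finditer(sentence)]
    return entities


sentence = "President Rao said U.S. forces are to leave Iraq"
print(extract_entities(sentence))
```

Each new title or verb needs another hand-written pattern, which is one way to see why such rule sets are costly to maintain and hard to port to a new domain.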
In this family of approaches, Appelt et al. propose a
name identification system called FASTUS, based on
carefully handcrafted regular expressions. They divide
the task into three steps: recognizing phrases,
recognizing patterns and merging incidents. These
approaches rely on manually coded rules and manually
compiled corpora. These kinds of models give better
results for restricted domains and are capable of
detecting complex entities that learning models have
difficulty with. However, rule-based NE systems lack
portability and robustness, and the cost of maintaining
the rules is high even when the data changes only
slightly. These types of approaches are often domain
and language specific and do not necessarily adapt well
to new domains and languages.
In machine learning-based NER systems,
the purpose of the named entity recognition approach is
to convert the identification problem into a
classification problem and to employ a statistical
classification model to solve it. In this type of
approach, the systems look for patterns and
relationships in text to build a model using
statistical models and machine learning algorithms.
Based on this model, the systems identify and classify
nouns into particular classes such as persons,
locations, times, etc. Two types of machine learning
model are used for NER: supervised and unsupervised.
Supervised learning involves using a program that can
learn to classify a given set of labeled examples that
are made up of the same number of features.
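The requirement that every labeled example has the same number of features can be sketched with a simple word-count representation. The helper names and the two labeled examples below are hypothetical; the sketch only shows the data layout, not a particular learning algorithm.

```python
def build_vocabulary(texts):
    """Collect the sorted set of lowercase words seen in the training texts."""
    return sorted({word for text in texts for word in text.lower().split()})


def to_vector(text, vocabulary):
    """Map a text to a word-count vector with one entry per vocabulary word."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


# Hypothetical labeled examples (text, label).
training = [
    ("who wrote Hamlet", "HUM"),
    ("where is Paris", "LOC"),
]
vocabulary = build_vocabulary(text for text, _ in training)
vectors = [(to_vector(text, vocabulary), label) for text, label in training]

for vector, label in vectors:
    print(label, vector)
```

Both vectors have one entry per vocabulary word, so a supervised learner can treat them as points in the same feature space.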
Each example is thus represented with
respect to the different feature spaces. The learning
process is called supervised because the people who
marked up the training examples are teaching the
program the right distinctions. The supervised
learning approach requires labeled training data to
construct a statistical model, and because of the data
sparseness problem it cannot achieve good performance
without a large amount of training data.
In recent years several statistical methods based on
supervised learning have been proposed. Bikel et al.
propose a learning name-finder based on a hidden
Markov model [8] called Nymble, while Borthwick
et al. investigate exploiting diverse knowledge
sources via maximum entropy in named entity
recognition [9,10]. A system for tagging unknown proper
names with a decision tree model was proposed by
Bechet et al. [5], while Wu et al. presented a named
entity recognition system based on support vector
machines [2]. Unsupervised learning is another type of
machine learning model, in which the model learns
without any feedback. In unsupervised learning, the
goal of the program is to build representations from
data. These representations can then be used for data
compression, classification, decision making, and
other purposes.
Unsupervised learning is not a very popular approach
for NER, and the systems that do use it are usually not
completely unsupervised. Among these approaches,
Collins et al. discuss an unsupervised model for named
entity classification that uses unlabeled examples of
data [7], and Kim et al. propose unsupervised named
entity classification models and their ensembles, which
use a small-scale named entity dictionary and an
unlabeled corpus for classifying named entities [4].
Unlike the rule-based method, these types of approaches
can be easily ported to different domains or languages.
In hybrid NER systems, the approach is to combine
rule-based and machine learning-based methods, and to
make new methods using the strongest points of each
method.
4. QUESTION FEATURES
One of the main challenges in developing a
supervised classifier for a particular domain is to
identify and design a rich set of features – a process
which is generally referred to as feature engineering.
In the subsections that follow, we present the different
types of features that were used in the question
classifier, and how they are extracted from a given
question.
4.1 Lexical features
Lexical features refer to word-related features that are
extracted directly from the question. In this work, we
use word-level n-grams as lexical features. We also
include in this section the techniques of stemming and
stop word removal, which can be used to reduce the
dimensionality of the feature set.
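As an illustration of word-level n-grams, the following is a minimal sketch in Python. The function names word_ngrams and ngram_features, the whitespace tokenization, and the lowercasing are assumptions made for this example and are not taken from the classifier described here.

```python
def word_ngrams(question, n):
    """Return the word-level n-grams of a question as a list of tuples."""
    # Lowercase, drop a trailing question mark, and split on whitespace.
    tokens = question.lower().rstrip("?").split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(question, max_n=2):
    """Map every n-gram (n = 1..max_n) of the question to its count."""
    features = {}
    for n in range(1, max_n + 1):
        for gram in word_ngrams(question, n):
            key = " ".join(gram)
            features[key] = features.get(key, 0) + 1
    return features


# Unigrams and bigrams of the example question used in Section 4.1.1.
print(ngram_features("Which countries are bordered by France?"))
```

Each distinct n-gram becomes one dimension of the feature space, which is why the number of features grows quickly with n and why the reduction techniques described next can be useful.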
4.1.1 Stemming and Stop word removal
Stemming is a technique that reduces words to their
grammatical roots or stems, by removing their affixes.
For instance, after applying stemming, the words
inventing and invented both become invent. We
exploit this technique in our question classifier in the
following manner. First, we represent the question
using the bag-of-words model as previously
described. Second, we apply Porter’s stemming
algorithm (Porter, 1980) to transform each word into
its stem. The following two examples depict a
question before and after stemming is applied,
respectively.
(1) Which countries are bordered by France?
(2) Which country are border by Franc?
Another related technique is to remove stop words,
which are frequently occurring words with no
semantic value, such as the articles the and an. Both
of these techniques are mainly used to reduce the
feature space of the classifier, i.e., the total number
of features that need to be considered. Stemming
achieves this by collapsing several different forms of
the same word into one distinct term; stop word
removal achieves it by eliminating words that are
likely to be present in most questions and that do not
provide useful information for the classifier.
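The effect of both techniques on the feature space can be sketched as follows. This is only an illustration: the stop word list is a tiny hypothetical one, and crude_stem is a deliberately simple suffix stripper standing in for Porter's algorithm, which applies a much larger set of rules.

```python
# Hypothetical, deliberately tiny stop word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "by", "is", "are"}

# Crude suffix rules; this is NOT Porter's algorithm, only a stand-in.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def reduce_features(tokens):
    """Drop stop words, then stem what remains; returns the distinct terms."""
    kept = [t.lower() for t in tokens if t.lower() not in STOP_WORDS]
    return sorted({crude_stem(t) for t in kept})


# Three word forms and two stop words collapse into a single feature.
print(reduce_features(["Inventing", "the", "invented", "an", "invents"]))
```

Five input tokens are reduced to one term here, which is the kind of collapse that shrinks the feature space.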
4.2 Syntactic Features
In addition to the information that is readily available
in the input instance, it is common in natural language
processing tasks to augment sentence representation
with syntactic categories, under the assumption that
the sought-after property, for which we seek the
classifier, depends on the syntactic role of a word in
the sentence rather than on the specific word.
4.2.1 Question headword
The question headword is the word in a given
question that represents the information that is being
sought after. In the following examples, the headword
is given in brackets after each question:
(1) What is Australia’s national flower? [flower]
(2) Name an American made motorcycle. [motorcycle]
(3) Which country are Godiva chocolates from? [country]
(4) What is the name of the highest mountain in
Africa? [mountain]
In Example 1, the headword flower provides the
classifier with an important clue to correctly classify
the question as ENTITY:PLANT. By the same token,
motorcycle in Example 2 provides hints that help
classify the question as ENTITY:VEHICLE. Indeed, in
each of the aforementioned examples the headword
serves as an important feature for revealing the
question’s category, which is why we dedicate great
effort to its accurate extraction.
Our baseline classifier makes use of standard POS
information and phrase information extracted by a
shallow parser. Specifically, we use chunks
(non-overlapping phrases) and head chunks. The
following example illustrates the information
available when generating the syntax-augmented
feature-based representation.
Question: Who was the first woman killed in the
Vietnam War?
Chunking: [NP Who] [VP was] [NP the first woman]
[VP killed] [PP in] [NP the Vietnam War] ?
The head chunks denote the first noun or verb chunk
after the question word in a question. For example, in
the above question, the first noun chunk after the
question word who is ‘the first woman’. The features
are represented as abstract tags in each example.
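Given chunker output in the bracketed form shown above, head chunk extraction can be sketched as below. This is a minimal sketch that assumes the question word is always the first chunk; the helper names parse_chunks and head_chunk are hypothetical, and the shallow parser itself is not reproduced.

```python
import re

# Matches one bracketed chunk such as "[NP the first woman]".
CHUNK_PATTERN = re.compile(r"\[(\w+) ([^\]]+)\]")


def parse_chunks(chunked):
    """Turn a bracketed chunk string into a list of (tag, text) pairs."""
    return CHUNK_PATTERN.findall(chunked)


def head_chunk(chunks, tag):
    """Return the first chunk with the given tag after the question word.

    Assumption: the question word is the first chunk in the list.
    """
    for chunk_tag, text in chunks[1:]:
        if chunk_tag == tag:
            return text
    return None


chunks = parse_chunks(
    "[NP Who] [VP was] [NP the first woman] [VP killed] [PP in] [NP the Vietnam War] ?"
)
print(head_chunk(chunks, "NP"))  # first noun chunk after the question word
print(head_chunk(chunks, "VP"))  # first verb chunk after the question word
```

Questions that do not begin with the question word (for example, Name an American made motorcycle.) would need a different rule for locating the starting position.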
4.3 Semantic Features
Similar logic can be applied to semantic categories. In
many cases, the property seems not to depend on the
specific word used in the sentence – that could be
replaced without affecting this property – but rather
on its ‘meaning’. For example, given the question:
What Cuban dictator did Fidel Castro force out of
power in 1958?, we would like to determine that its
answer should be the name of a person. Knowing that
dictator refers to a person is essential to correct
classification.
This work systematically
studies four semantic information sources and their
contribution to classification: (1) automatically
acquired named entity categories -NE, (2) word
senses in WordNet -SemWN, (3) manually
constructed word lists related to specific categories of
interest -SemCSR, and (4) automatically generated
semantically similar word lists (Zhang & Lee, 2003)
-SemSWL.
For the four external semantic information sources,
we define semantic categories of words and
incorporate the information into question
classification in the same way: if a word w occurs in
a question, the question representation is augmented
with the semantic category (or categories) of the
word. For example, in the question: What is the state
flower of California?, given that plant (for example)
is the only semantic class of flower, the feature
extractor adds plant, an abstract label, to the
question representation.
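The augmentation step can be sketched as a dictionary lookup. The word-to-category map below is a hypothetical stand-in for any of the four sources, and the SEM: prefix is an assumption introduced here so that an abstract label cannot be confused with an ordinary word of the same spelling.

```python
# Hypothetical word-to-category map standing in for any of the four sources.
SEMANTIC_CATEGORIES = {
    "flower": ["plant"],
    "dictator": ["person"],
}


def augment(question, categories=SEMANTIC_CATEGORIES):
    """Return the question's words followed by their semantic category labels."""
    words = [w.strip("?,.").lower() for w in question.split()]
    words = [w for w in words if w]
    features = list(words)
    for word in words:
        # Every category of the word is added as an abstract label.
        features.extend("SEM:" + c for c in categories.get(word, []))
    return features


print(augment("What is the state flower of California?"))
```

A word with several categories contributes several labels, which is how ambiguity in the semantic source carries over into the feature set.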
4.3.1 Named Entities
A named entity (NE) recognizer assigns a semantic
category to some of the noun phrases in the question.
The scope of the categories used here is broader than
that of a common named entity recognizer. With
additional categories that could help question
answering, such as profession, event, holiday, plant,
sport, medical etc., we redefine our task in the
direction of semantic categorization. The named
entity recognizer was built on the shallow parser
described in (Voorhees, 2004), and was trained to
categorize noun phrases into one of 34 different
semantic categories of varying specificity. Its overall
accuracy (Fβ=1) is above 90%.
For the question Who was the first woman killed in
the Vietnam War?, the named entity tagger will
return:
NE: Who was the [Num first] woman killed in the
[Event Vietnam War] ?
As described above, the identified named entities are
added to the question representation.
4.3.2 WordNet Senses
In WordNet (C. Peters, 2005), words are organized
according to their ‘senses’ (meanings). Words of the
same sense can, in principle, be exchanged in some
contexts. The senses are organized in a hierarchy of
hypernyms and hyponyms. Word senses provide
another effective way to describe the semantic
category of a word. For example, in WordNet 1.7, the
word water belongs to 5 senses. The first two senses
are:
Sense 1: binary compound that occurs at room
temperature as a colorless odorless liquid;
Sense 2: body of water.
Sense 1 contains the words {H2O, water}, while
Sense 2 contains {water, body of water}. Sense 1 has
a hypernym (Sense 3: binary compound), and one
hyponym of Sense 2 is (Sense 4: tap water). For each
word in a question, all of its sense IDs and direct
hypernym and hyponym IDs are extracted as features.
This approach possibly introduces significant noise to
classification, since only a small proportion of senses
are really related.
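The extraction of sense, hypernym, and hyponym IDs can be sketched with a toy in-memory inventory that mirrors the water example. The data structure, the numeric IDs, and the feature prefixes are illustrative assumptions; they are not real WordNet identifiers and do not use a WordNet API.

```python
# Toy sense inventory mirroring the water example; the IDs are illustrative
# placeholders, not real WordNet identifiers.
SENSES = {
    1: {"words": {"H2O", "water"}, "hypernyms": [3], "hyponyms": []},
    2: {"words": {"water", "body of water"}, "hypernyms": [], "hyponyms": [4]},
    3: {"words": {"binary compound"}, "hypernyms": [], "hyponyms": [1]},
    4: {"words": {"tap water"}, "hypernyms": [2], "hyponyms": []},
}


def sense_features(word, senses=SENSES):
    """Collect sense IDs plus direct hypernym and hyponym IDs for a word."""
    features = set()
    for sense_id, sense in senses.items():
        if word in sense["words"]:
            features.add(f"sense:{sense_id}")
            features.update(f"hyper:{h}" for h in sense["hypernyms"])
            features.update(f"hypo:{h}" for h in sense["hyponyms"])
    return sorted(features)


print(sense_features("water"))
```

Because no sense disambiguation is performed, water receives features from both of its senses in this toy inventory, which illustrates the source of the noise mentioned above.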
5. SUPPORT VECTOR MACHINE
Machine learning tasks can take several forms.
In supervised learning, the computer is presented with
example inputs and their desired outputs, given by a
"teacher", and the goal is to learn a general rule
that maps inputs to outputs. Spam filtering is an
example of supervised learning, in particular
classification, where the learning algorithm is
presented with email (or other) messages labeled
beforehand as "spam" or "not spam", to produce a
computer program that labels unseen messages as
either spam or not.
In unsupervised learning, no labels are given to the
learning algorithm, leaving it on its own to find
groups of similar inputs (clustering), density
estimates, or projections of high-dimensional data that
can be visualised effectively. Unsupervised learning
can be a goal in itself (discovering hidden patterns in
data) or a means towards an end. Topic modeling is an
example of unsupervised learning, where a program is
given a list of human language documents and is
tasked to find out which documents cover similar
topics.
Supervised learning is the machine learning task of
inferring a function from labeled training data. The
training data consist of a set of training examples. In
supervised learning, each example is a pair consisting
of an input object (typically a vector) and a desired
output value (also called the supervisory signal). A
supervised learning algorithm analyzes the training
data and produces an inferred function, which can be
used for mapping new examples. An optimal scenario
will allow for the algorithm to correctly determine the
class labels for unseen instances. This requires the
learning algorithm to generalize from the training data
to unseen situations in a "reasonable" way (see
inductive bias).
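To make the idea of inferring a function from labeled examples concrete, the following is a minimal sketch of a linear support vector machine trained by subgradient descent on the regularized hinge loss. This is one common formulation and is not necessarily the solver used in this work; the function names, the hyperparameter values, and the two-feature toy data are all assumptions made for illustration.

```python
def train_linear_svm(examples, labels, epochs=100, learning_rate=0.1, reg=0.01):
    """Train a linear SVM by subgradient descent on the regularized hinge loss.

    examples: list of equal-length feature vectors (lists of floats)
    labels:   list of +1 / -1 class labels
    """
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # The L2 regularizer always shrinks the weights slightly.
            weights = [w - learning_rate * reg * w for w in weights]
            if margin < 1:
                # Inside the margin or misclassified: follow the hinge subgradient.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical features: [contains "who", contains "where"];
# +1 stands for a HUMAN question, -1 for a LOCATION question.
X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y = [1, 1, -1, -1]
weights, bias = train_linear_svm(X, y)
print([predict(weights, bias, x) for x in X])
```

The pairs (X[i], y[i]) play the role of the input objects and supervisory signals described above, and predict is the inferred function applied to new examples. A multi-class question classifier would combine several such binary classifiers, for instance one per category.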
In machine learning, the problem of unsupervised
learning is that of trying to find hidden structure in
unlabeled data. Since the examples given to the
learner are unlabeled, there is no error or reward
signal to evaluate a potential solution. This
distinguishes unsupervised learning from supervised
learning and reinforcement learning.
Unsupervised learning is closely related to the
problem of density estimation in statistics. However,
unsupervised learning also encompasses many other
techniques that seek to summarize and explain key
features of the
data. Many methods employed in unsupervised
learning
6. CONCLUSION
In this paper we presented a detailed overview of
learning-based question classification approaches.
Question classification is a hard problem: the machine
needs to understand the question and classify it into
the right category, which is done through a series of
complicated steps. In this paper we reviewed different
learning methods and feature extraction techniques for
question classification. Deciding on the best model
and the optimal set of features is not a simple problem.
Enhancing the feature space with syntactic and
semantic features can usually improve the
classification accuracy.
7. FUTURE WORK
In the question classification task, we have considered
a machine learning-based classifier using solely
superficial features. Future work is to increase the
accuracy of the question answering system by
combining the three feature types using the SVM
(support vector machine) method.
8. RESULT
The approach increases the accuracy of answer
detection. It gives an accuracy of 95.2%.
Acknowledgements
I thank Prof. Megha Mishra for several valuable
suggestions and the entire SSCET team for help with
various components, feature suggestions and
guidance.
REFERENCES
[1] Zhang, D., & Lee, W. S. (2003). Question
classification using support vector machines. In
Proceedings of the 26th annual international ACM
SIGIR conference on research and development in
information retrieval (pp. 26–32).
[2] Voorhees, E. M. (2004). Overview of the TREC
2004 question answering track. In E. M.
Voorhees & L. P. Buckland (Eds.), TREC (Vol.
Special Publication 500-261). National Institute
of Standards and Technology (NIST).
[3] Wang, Y.-C., Wu, J.-C., Liang, T., & Chang, J. S.
(2005). Web-based unsupervised learning for
query formulation in question answering. In
IJCNLP (pp. 519-529).
[4] Vallin, A., Magnini, B., Giampiccolo, D.,
Aunimo, L., & Ayache, C. (2006). Overview of the
CLEF 2005 multilingual question answering track.
In C. Peters (Ed.), Accessing multilingual
information repositories. Berlin, Heidelberg:
Springer-Verlag.
[5] Turmo, J., Ageno, A., & Català, N. (2006).
Adaptive information extraction. ACM Comput.
Surv., 38(2), 4.
[6] Petrov, S., & Klein, D. (2007, April). Improved
inference for unlexicalized parsing. In Human
language technologies 2007: The conference of
the North American chapter of the Association for
Computational Linguistics.
[7] Pan, Y., Tang, Y., Lin, L., & Luo, Y. (2008).
Question classification with semantic tree kernel.
In Proceedings of the 31st annual international
ACM SIGIR conference on research and
development in information retrieval (pp. 837–
838). New York, NY, USA: ACM.
[8, 9] Quarteroni, S., & Manandhar, S. (2009).
Designing an interactive open-domain question
answering system. Journal of Natural Language
Engineering, 15(1) (forthcoming).
[10] Chanlekha and Collier (2010). Journal of
Biomedical Semantics, 1:3.
http://www.jbiomedsem.com/content/1/1/3
[11] Mertsalov, K. (2009, January). Document
classification with support vector machines.
Rational Retention, LLC.
kmertsalov@rationalretention.com
[12] Moschitti, A., & Quarteroni, S. (2011). Linguistic
kernels for answer re-ranking in question
answering systems. Information Processing and
Management. Journal homepage:
www.elsevier.com/locate/infoproman