Feature selection or extraction is the most important task in Opinion Mining and Sentiment Analysis
(OSMA) for calculating the polarity score. These scores are used to determine the positive, negative, or
neutral polarity of products, user reviews, user comments, and similar content in social media, in support
of decision making and business intelligence for individuals and organizations. In this paper, we present
an experimental study of the different feature extraction and selection techniques available for the
opinion mining task. The study is carried out in four stages. First, data is collected
from readily available sources. Second, pre-processing techniques are applied
automatically, using tools to extract the terms and their POS (parts-of-speech) tags. Third, different feature selection
or extraction techniques are applied to the content. Finally, an empirical study is carried out to
analyze the sentiment polarity with different features.
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS) - IJCSEA Journal
Feature selection is an effective method used in text categorization for sorting a set of documents into a certain number of predefined categories. It is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant terms from the corpus. A genome contains the total genetic information in the chromosomes of an organism, including its genes and DNA sequences. In this paper, a hierarchical clustering technique is used to categorize the features from the genome documents. A framework is proposed for genomic feature set selection. Filter-based feature selection methods such as the χ² statistic and the CHIR statistic are used to select the feature set. The selected feature set is verified using the F-measure and validated for biological relevance using the BLAST tool.
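As an illustration of the filter approach mentioned in this abstract, the following is a minimal sketch of chi-square term scoring, assuming the standard 2x2 term/category contingency table. The function names, the toy documents, and the labels are hypothetical and are not taken from the paper; the CHIR variant is not shown.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a term/category 2x2 contingency table.

    n11: documents in the category that contain the term
    n10: documents outside the category that contain the term
    n01: documents in the category that lack the term
    n00: documents outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return numerator / denominator if denominator else 0.0


def rank_terms(docs, labels, category):
    """Score every term against one category and sort by descending score.

    docs is a list of token sets (an assumed, simplified document model).
    """
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
        n10 = sum(1 for d, y in zip(docs, labels) if term in d and y != category)
        n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == category)
        n00 = sum(1 for d, y in zip(docs, labels) if term not in d and y != category)
        scores[term] = chi_square(n11, n10, n01, n00)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus: two "bio" documents and two "fin" documents.
docs = [{"gene", "dna"}, {"gene", "protein"}, {"market", "dna"}, {"market", "price"}]
labels = ["bio", "bio", "fin", "fin"]
ranked = rank_terms(docs, labels, "bio")
```

A filter method would then keep only the top-scoring terms; the cut-off is a design choice that this sketch leaves open.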
Unsupervised Feature Selection Based on the Distribution of Features Attribut... - Waqas Tariq
Since dealing with high dimensional data is computationally complex and sometimes even intractable, several feature reduction methods have recently been developed to reduce the dimensionality of the data and simplify analysis in applications such as text categorization, signal processing, image retrieval, and gene expression. Among feature reduction techniques, feature selection is one of the most popular because it preserves the original features. However, most current feature selection methods do not perform well on imbalanced data sets, which are pervasive in real world applications. In this paper, we propose a new unsupervised feature selection method for imbalanced data sets, which removes redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on several imbalanced data sets derived from the UCI repository illustrate the effectiveness of the proposed method in comparison with the other methods in terms of both accuracy and the number of selected features.
Evaluating query-independent object features for relevancy prediction - NTNU
This paper presents a series of experiments investigating the effectiveness of query-independent features extracted from retrieved objects to predict relevancy. Features were grouped into a set of conceptual categories, and individually evaluated based on click-through data collected in a laboratory-setting user study. The results showed that while textual and visual features were useful for relevancy prediction in a topic-independent condition, a range of features can be effective when topic knowledge was available. We also re-visited the original study from the perspective of significant features identified by our experiments.
A Novel approach for Document Clustering using Concept Extraction - AM Publications
In this paper we present a novel approach to extract the concept from a document and to cluster a set of documents according to the concept extracted from each of them. We transform the corpus into a vector space using term frequency–inverse document frequency, then calculate the cosine distance between each pair of documents, and finally cluster them using the K-means algorithm. We also use multidimensional scaling to reduce the dimensionality of the corpus. The result is a grouping of the documents that are most similar to each other with respect to their content and genre.
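The vector-space step described in this abstract can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization and the plain `tf * log(N / df)` weighting, which may differ from the weighting the authors used; the function names and sample sentences are hypothetical, and the K-means and multidimensional scaling steps are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn raw strings into sparse TF-IDF vectors (dicts of term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Distance form that a clustering step would consume."""
    return 1.0 - cosine_similarity(u, v)


# Hypothetical three-document corpus.
vecs = tfidf_vectors(["cats chase mice", "cats chase birds", "stocks rise today"])
```

In this toy corpus the first two documents share terms and are therefore closer to each other than either is to the third, which is the property a clustering algorithm relies on.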
IRJET- Survey of Feature Selection based on Ant Colony - IRJET Journal
This document summarizes research on feature selection methods based on ant colony optimization algorithms. It first divides common feature selection approaches into filter, wrapper, and hybrid methods. It then discusses how ant colony optimization algorithms are well-suited for feature selection problems due to their ability to handle multiple objectives. The document reviews related work applying ant colony optimization to feature selection with neural networks and support vector machines. It concludes that ant colony optimization shows promise for feature selection but requires further testing on real-world datasets.
This document discusses machine learning algorithms and their applications. It begins with an abstract discussing supervised, unsupervised, and reinforcement learning techniques. It then discusses machine learning in more detail, explaining that machine learning algorithms represent data instances with a set of features and classify instances based on their labels. The main focus is on supervised and unsupervised learning techniques and their performance parameters. It provides an overview of support vector machines, neural networks, and other machine learning algorithms. In summary, the document provides a survey of different machine learning techniques, how they work, and their applications.
QUESTION ANALYSIS FOR ARABIC QUESTION ANSWERING SYSTEMS - ijnlc
The first step of processing a question in Question Answering (QA) systems is to carry out a detailed analysis of the question in order to determine what it is asking for and how best to approach answering it. Our question analysis uses several techniques to analyze any question given in natural language: a Stanford POS tagger and parser for the Arabic language, a named entity recognizer, a tokenizer, stop-word removal, question expansion, question classification, and question focus extraction components. We employ numerous detection rules and a classifier trained on features from this analysis to detect important elements of the question, including: 1) the portion of the question that refers to the answer (the focus); 2) the terms in the question that identify what type of entity is being asked for (the lexical answer types); 3) question expansion; and 4) a process of classifying the question into one or more of several different types. We describe how these elements are identified and evaluate the effect of accurate detection on our question-answering system using the Mean Reciprocal Rank (MRR) accuracy measure.
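For readers unfamiliar with the measure, Mean Reciprocal Rank averages, over all questions, the reciprocal of the rank at which the first correct answer appears. The sketch below is a minimal illustration assuming exact string matching against a single gold answer per question; the function name and the sample data are hypothetical.

```python
def mean_reciprocal_rank(ranked_answers, gold_answers):
    """Average of 1/rank of the first correct answer for each question.

    A question whose gold answer never appears contributes 0.
    """
    if not gold_answers:
        return 0.0
    total = 0.0
    for candidates, gold in zip(ranked_answers, gold_answers):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == gold:
                total += 1.0 / rank
                break
    return total / len(gold_answers)


# Hypothetical system output for three questions.
ranked = [["a", "b"], ["x", "y", "z"], ["p", "q"]]
gold = ["a", "z", "missing"]
mrr = mean_reciprocal_rank(ranked, gold)  # (1 + 1/3 + 0) / 3
```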
An Efficient Approach for Keyword Selection; Improving Accessibility of Web ... - dannyijwest
General search engines often provide low-precision results even for detailed queries. There is therefore a vital need
to elicit useful information, such as keywords, so that search engines can provide acceptable results for users' search
queries. Although many methods have been proposed for extracting keywords automatically, all
attempt to achieve better recall, precision, and other criteria that describe how well the method does the job
of an author. This paper presents a new automatic keyword extraction method which improves the accessibility
of web content by search engines. The proposed method defines coefficients that determine the efficiency of
features and tries to optimize them using a genetic algorithm. Furthermore, it evaluates candidate
keywords with a function that uses the results of search engines. In comparison with other methods,
experiments demonstrate that the proposed method achieves a higher score from search
engines without a noticeable loss of recall or precision.
PROPOSAL OF A TWO WAY SORTING ALGORITHM AND PERFORMANCE COMPARISON WITH EXIST... - IJCSEA Journal
An algorithm is any well-defined procedure or set of instructions that takes some values as input, processes them, and gives some values as output. Sorting involves rearranging information into either ascending or descending order. Sorting is considered a fundamental operation in computer science, as it is used as an intermediate step in many operations. A new sorting algorithm, the End-to-End Bi-directional Sorting (EEBS) algorithm, is proposed to address the shortcomings of the current popular sorting algorithms. The goal of this research is to perform an extensive empirical analysis of the newly developed algorithm and present its functionality. The results of the analysis showed that EEBS is much more efficient than other algorithms with O(n²) complexity, such as bubble, selection, and insertion sort.
An unsupervised feature selection algorithm with feature ranking for maximizi... - Asir Singh
Prediction plays a vital role in decision making. Correct prediction leads to the right decisions, saving lives, energy,
effort, money, and time. The right decision prevents physical and material losses, and it is practiced in all fields, including medical,
finance, environmental studies, engineering and emerging technologies. Prediction is carried out by a model called classifier. The
predictive accuracy of the classifier highly depends on the training datasets utilized for training the classifier. The irrelevant and
redundant features of the training dataset reduce the accuracy of the classifier. Hence, the irrelevant and redundant features must be
removed from the training dataset through the process known as feature selection. This paper proposes a feature selection algorithm
namely unsupervised learning with ranking based feature selection (FSULR). It removes redundant features by clustering and eliminates
irrelevant features by statistical measures to select the most significant features from the training dataset. The performance of this
proposed algorithm is compared with the other seven feature selection algorithms by well known classifiers namely naive Bayes (NB),
instance based (IB1) and tree based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for
classifiers.
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE... - ijcsa
The study places particular emphasis on the so-called data mining algorithms, but focuses the bulk of its attention on the C4.5 algorithm. Each educational institution, in general, aims to provide a high quality of education. This depends upon predicting which students will have poor results before they enter the final examination. Data mining techniques offer many tasks that can be used to investigate students' performance. The main objective of this paper is to build a classification model that can be used to improve the students' academic records in the Faculty of Mathematical Science and Statistics. This model has been built using the C4.5 algorithm, as it is a well-known, commonly used data mining technique. The importance of this study is that predicting student performance is useful in many different settings. Data from previous students' academic records in the faculty have been used to illustrate the considered algorithm in order to build our classification model.
Predicting students' performance using ID3 and C4.5 classification algorithms - IJDKP
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
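The two algorithms named in this abstract choose splitting attributes by information gain (ID3) and gain ratio (C4.5). The sketch below illustrates those two criteria only; it is not the authors' system, it assumes categorical attributes, and the attribute values and pass/fail labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on one attribute (ID3 criterion)."""
    n = len(labels)
    remainder = 0.0
    for value in set(values):
        subset = [y for x, y in zip(values, labels) if x == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the split's own entropy (C4.5 criterion)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info else 0.0


# Hypothetical records: entrance-exam band and gender against first-year result.
exam_band = ["high", "high", "low", "low"]
gender = ["f", "m", "f", "m"]
result = ["pass", "pass", "fail", "fail"]
```

On this toy data the exam band separates the classes perfectly while gender carries no information, so a tree builder using either criterion would split on the exam band first.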
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni... - theijes
Feature selection is considered a problem of global combinatorial optimization in machine learning; it reduces the number of features and removes irrelevant, noisy, and redundant data. However, identifying useful features among hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data is even more challenging owing to the high dimensionality of the features, the multiclass categories involved, and the usually small sample size. Different feature selection techniques can be applied to improve prediction accuracy and to avoid the incomprehensibility caused by a large number of features. This survey classifies and analyzes different approaches, aiming not only to provide a comprehensive presentation but also to discuss challenges and various performance parameters. The techniques are generally classified into three categories: filter, wrapper, and hybrid.
Automatic Feature Subset Selection using Genetic Algorithm for Clustering - idescitation
Feature subset selection is a process of selecting a
subset of minimal, relevant features and is a preprocessing
technique for a wide variety of applications. High dimensional
data clustering is a challenging task in data mining. Reduced
set of features helps to make the patterns easier to understand.
Reduced set of features are more significant if they are
application specific. Almost all existing feature subset
selection algorithms are not automatic and are not application
specific. This paper made an attempt to find the feature subset
for optimal clusters while clustering. The proposed Automatic
Feature Subset Selection using Genetic Algorithm (AFSGA)
identifies the required features automatically and reduces
the computational cost in determining good clusters. The
performance of AFSGA is tested using public and synthetic
datasets with varying dimensionality. Experimental results
have shown the improved efficacy of the algorithm with optimal
clusters and computational cost.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE - IJDKP
The document discusses a proposed students' performance prediction system using multi-agent data mining techniques. It aims to predict student performance with high accuracy and help low-performing students. The system uses ensemble classifiers like Adaboost.M1 and LogitBoost and compares their prediction accuracy to the single classifier C4.5 decision tree. Experimental results showed SAMME boosting improved prediction accuracy over C4.5 and LogitBoost.
IRJET- Implementation of Automatic Question Paper Generator System - IRJET Journal
This document describes a proposed system for automatically generating question papers from input documents. The system performs several steps: it converts PDF/document files to text, conducts preprocessing like removing stop words, uses natural language processing and TF-IDF for key phrase extraction, checks phrases against Wikipedia for domain knowledge, generates triplets for questions, and checks the quality of generated questions using linguistic rules. The system aims to make the question paper generation process faster, more randomized and secure compared to traditional manual methods.
An Automatic Question Paper Generation: Using Bloom's Taxonomy - IRJET Journal
This document presents a system for automatically generating exam questions and categorizing them according to Bloom's Taxonomy. It uses natural language processing techniques like part-of-speech tagging to analyze questions and identify keywords and verbs. Rules are developed to match question patterns and keywords to the appropriate Bloom's Taxonomy category. A randomization algorithm is also introduced to randomly select questions from the database and avoid repetitions. The system aims to help educators automatically analyze exam questions and ensure a balance of cognitive levels according to Bloom's Taxonomy. Preliminary results found the rules could successfully categorize questions in the test set. The proposed system has applications for educational institutions, universities, and government exams.
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION - kevig
The aim of Named Entity Recognition (NER) is to identify references to named entities in unstructured documents and to classify them into pre-defined semantic categories. NER often benefits from added background knowledge in the form of gazetteers. However, using such a collection does not deal with name variants and cannot resolve the ambiguities involved in identifying entities in context and associating them with predefined categories. We present a semi-supervised NER approach that starts by identifying named entities with a small set of training data. Using the identified named entities, the word and context features are used to define a pattern. The pattern of each named entity category is used as a seed pattern to identify the named entities in the test set. Pattern scoring and tuple value scoring enable the generation of new patterns to identify the named entity categories. We have evaluated the proposed system for English with tagged (IEER) and untagged (CoNLL 2003) named entity corpora, and for Tamil with documents from the FIRE corpus, and obtained an average F-measure of 75% for both languages.
A syntactic analysis model for Vietnamese questions in V-DLG~TABL system - ijnlc
This paper introduces a syntactic analysis model that we propose to parse and process the Vietnamese questions about tablets in V-DLG~TABL system, which is a Vietnamese Question – Answering system working based on automatic dialog mechanism. The V-DLG~TABL system is built to support clients using Vietnamese questions for searching tablets based on interaction between the clients and the system. We apply the “Phrase Structure Grammar” of Noam Chomsky to develop a syntactic analysis model that is specific and suitable for the V-DLG~TABL system. This syntactic analysis model is used to implement the “V-DLG~TABL Syntactic Parsing and Processing” component of the system.
11.software modules clustering an effective approach for reusability - Alexander Decker
This document summarizes previous work on using clustering techniques for software module classification and reusability. It discusses hierarchical clustering and non-hierarchical clustering methods. Previous studies have used these techniques for software component classification, identifying reusable software modules, course clustering based on industry needs, mobile phone clustering based on attributes, and customer clustering based on electricity load. The document provides background on clustering analysis and its uses in various domains including software testing, pattern recognition, and software restructuring.
An Ensemble of Filters and Wrappers for Microarray Data Classification - mlaij
The development of microarray technology has supplied a large volume of data to many fields. Gene microarray analysis and classification have proven to be an effective approach to the diagnosis of diseases and cancers. Because the data obtained from microarray technology is very noisy and has thousands of features, feature selection plays an important role in removing irrelevant and redundant features and in reducing computational complexity. There are two important approaches for gene selection in microarray data analysis: filters and wrappers. To select a concise subset of informative genes, we introduce a hybrid feature selection method that combines the two approaches. Candidate features are first selected from the original set via several effective filters. The candidate feature set is then refined by more accurate wrappers. Thus, we can take advantage of both filters and wrappers. Experimental results based on 11 microarray datasets show that our mechanism can be effective with a smaller feature set. Moreover, these feature subsets can be obtained in a reasonable time.
A Survey on the Classification Techniques In Educational Data Mining - Editor IJCATR
Due to increasing interest in data mining and educational systems, educational data mining is an emerging topic for the research
community. Educational data mining means extracting hidden knowledge from large repositories of data with the use of techniques
and tools. Educational data mining develops new methods to discover knowledge from educational databases, which is used for decision
making in educational systems. Various data mining techniques, such as classification and clustering, can be applied to bring out hidden
knowledge from educational data.
In this paper, we focus on educational data mining and classification techniques. In this study we analyze attributes for the
prediction of students' behavior and academic performance using the WEKA open source data mining tool and various classification
methods such as decision trees, the C4.5 algorithm, and the ID3 algorithm.
DEVNAGARI NUMERALS CLASSIFICATION AND RECOGNITION USING AN INTEGRATED APPROACH - ijfcstjournal
Character recognition has always been a challenging field for the researchers. There has been an astounding progress in the development of the systems for character recognition. OCR performs the recognition of the text in the scanned document image and converts it into editable form. The OCR process can have several stages like preprocessing, segmentation, recognition and post processing. The recognition generally, consists of feature extraction and classification. The choice of features and classification scheme affects the performance of OCR largely. In this paper, a classification scheme is proposed for the Devnagari numerals, which forms the basis for recognition. This approach integrates the structural features and water reservoir analogy based feature to classify the Devnagari numeral. In order to classify a single numeral, at most four checks are required. This increases the efficiency of the proposed scheme.
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA - IJCI JOURNAL
This paper proposes a new method that aims to reduce the size of a high dimensional dataset by
identifying and removing irrelevant and redundant features. Dataset reduction is important in
machine learning and data mining. A measure of dependence is used to evaluate the relationship
between a feature and the target concept, and/or between features, for irrelevant and redundant feature removal.
The proposed work initially removes all the irrelevant features, and then a minimum spanning tree of
relevant features is constructed using Prim’s algorithm. Splitting the minimum spanning tree based on the
dependency between features leads to the generation of a forest. A representative feature from each tree of the
forest is taken to form the final feature subset.
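The pipeline outlined in this abstract can be sketched roughly as below. The abstract does not give the edge weights, the splitting rule, or the choice of representative, so the following are assumptions made only for illustration: edges are weighted by `1 - dependence` so that the tree keeps the strongest dependencies, an edge is cut when the dependence between its two features is lower than the relevance of both, and the most relevant feature represents each tree. All names and numbers are hypothetical.

```python
import heapq


def prim_mst(weights):
    """Prim's algorithm on a complete graph given as a symmetric weight matrix.

    Returns the tree as a list of (i, j, weight) edges.
    """
    n = len(weights)
    if n == 0:
        return []
    visited = {0}
    heap = [(weights[0][j], 0, j) for j in range(1, n)]
    heapq.heapify(heap)
    edges = []
    while heap and len(visited) < n:
        w, i, j = heapq.heappop(heap)
        if j in visited:
            continue
        visited.add(j)
        edges.append((i, j, w))
        for k in range(n):
            if k not in visited:
                heapq.heappush(heap, (weights[j][k], j, k))
    return edges


def select_features(relevance, dependence, threshold):
    """Return the indices of one representative feature per tree of the forest."""
    # Step 1: drop features whose relevance to the target is too low.
    kept = [f for f, r in enumerate(relevance) if r > threshold]
    # Step 2: spanning tree over the remaining features (assumed weighting).
    weights = [[1.0 - dependence[a][b] for b in kept] for a in kept]
    edges = prim_mst(weights)
    # Step 3: keep only edges between strongly dependent features (assumed rule);
    # the surviving edges define the trees of the forest (union-find).
    parent = list(range(len(kept)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _ in edges:
        dep = dependence[kept[i]][kept[j]]
        if dep >= min(relevance[kept[i]], relevance[kept[j]]):
            parent[find(i)] = find(j)
    # Step 4: the most relevant feature of each tree is its representative.
    best = {}
    for idx, f in enumerate(kept):
        root = find(idx)
        if root not in best or relevance[f] > relevance[best[root]]:
            best[root] = f
    return sorted(best.values())


# Hypothetical scores: feature 2 is irrelevant, features 0 and 1 are redundant.
relevance = [0.9, 0.8, 0.05, 0.7]
dependence = [
    [1.0, 0.95, 0.1, 0.1],
    [0.95, 1.0, 0.1, 0.2],
    [0.1, 0.1, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
selected = select_features(relevance, dependence, threshold=0.1)
```

With these toy scores the sketch keeps feature 0 (the more relevant of the redundant pair) and feature 3, and discards the irrelevant feature 2.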
SENSITIVITY ANALYSIS OF INFORMATION RETRIEVAL METRICS - ijcsit
Average Precision, Recall, and Precision are the main metrics of Information Retrieval (IR) system performance. Using mathematical and empirical analysis, in this paper we show the properties of those metrics. Mathematically, it is demonstrated that all of these parameters are very sensitive to relevance judgments, which are usually not very reliable. We show that shifting a relevant document downwards within the ranked list decreases Average Precision. The variation in the Average Precision value is most pronounced in positions 1 to 10, while from the 10th position on, this variation is negligible. In addition, we try to estimate the regularity of the changes in Average Precision when an arbitrary number of relevance judgments within the existing ranked list are switched from non-relevant to relevant. Empirically, it is shown that 6 relevant documents at the end of a 20-document list have approximately the same Average Precision value as a single relevant document at the beginning of the list, while Recall and Precision values increase linearly, regardless of the document position in the list. We also show that when a Serbian-to-English human translation of a query is followed by English-to-Serbian machine translation, the relevance judgments change significantly, and therefore all the parameters for measuring IR system performance are also subject to change.
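The sensitivity of Average Precision to the position of a relevant document can be illustrated with a short sketch. It assumes the common definition that averages precision over the relevant documents found in the list; the paper may normalize differently, so the sketch is not meant to reproduce its empirical figures. The function name and the sample lists are hypothetical.

```python
def average_precision(relevance):
    """Average Precision of one ranked list of 0/1 relevance judgments.

    Precision is sampled at each relevant document and averaged over the
    relevant documents present in the list (an assumed normalization).
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def single_relevant_at(position, length=20):
    """Build a list of the given length with one relevant document."""
    return [1 if rank == position else 0 for rank in range(1, length + 1)]


# Average Precision as a single relevant document moves down a 20-item list.
ap_by_position = [average_precision(single_relevant_at(p)) for p in range(1, 21)]
```

Under this definition a single relevant document at rank r yields a value of 1/r, so moving it from rank 1 to rank 2 halves the score, while moving it from rank 10 to rank 11 changes the score by less than 0.01.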
Text Classification, Sentiment Analysis, and Opinion MiningFabrizio Sebastiani
This document discusses text classification and provides an overview of the key concepts. It defines text classification as predicting which predefined category a text belongs to. Popular applications include filtering emails and news articles. The document outlines supervised learning as the main approach, where a classifier is trained on manually classified examples to learn how to categorize new texts. It also covers representing texts as vectors for classification, including feature extraction, selection, and weighting. Common supervised learning algorithms mentioned are support vector machines, boosted decision stumps, random forests and naive Bayesian methods.
This document provides tips and sample answers for common interview questions for a nanny position. It discusses how to answer questions about yourself, your strengths, reasons for leaving previous jobs, weaknesses, knowledge of the organization, and how you have improved your skills. For each question, it offers steps to formulate an effective answer, including connecting your experience to the employer's needs and providing evidence to support your strengths. Sample answers are provided for questions about career goals, knowledge of the organization, and professional development activities.
This document provides tips and sample answers for common marketing director interview questions. It discusses how to answer questions about yourself, your strengths, career goals, reasons for leaving previous jobs, weaknesses, knowledge of the organization, and ways you've improved your marketing skills. For each question, it offers steps and strategies for crafting effective responses, including giving brief introductions, summarizing relevant experience, and relating your abilities to the job requirements. Sample answers are provided for questions about strengths, weaknesses, reasons for leaving a job, and knowledge of the organization.
Teen drug abuse is a growing problem, with more teens being exposed to drugs like MDMA (molly), ecstasy, LSD, cocaine, and marijuana. Peers, music, parties, and raves influence teens to take drugs. The rave scene in particular has evolved to involve widespread drug use, especially MDMA. A tragic example was a 15-year-old who died of a drug overdose at an EDC rave in 2010, leading to bans on raves and gloving. While gloving is now a dance form, it remains banned in many places due to its perceived link to drug promotion.
PROPOSAL OF A TWO WAY SORTING ALGORITHM AND PERFORMANCE COMPARISON WITH EXIST...IJCSEA Journal
An algorithm is any well-defined procedure or set of instructions that takes some values as input, processes them, and gives some values as output. Sorting involves rearranging information into either ascending or descending order. Sorting is considered a fundamental operation in computer science, as it is used as an intermediate step in many operations. A new sorting algorithm, namely ‘An End-to-End Bi-directional Sorting (EEBS) Algorithm’, is proposed to address the shortcomings of the current popular sorting algorithms. The goal of this research is to perform an extensive empirical analysis of the newly developed algorithm and present its functionality. The results of the analysis showed that EEBS is much more efficient than the other algorithms having O(n^2) complexity, such as bubble, selection, and insertion sort.
An unsupervised feature selection algorithm with feature ranking for maximizi...Asir Singh
Prediction plays a vital role in decision making. Correct prediction leads to right decision making to save the life, energy,
efforts, money and time. The right decision prevents physical and material losses and it is practiced in all the fields including medical,
finance, environmental studies, engineering and emerging technologies. Prediction is carried out by a model called classifier. The
predictive accuracy of the classifier highly depends on the training datasets utilized for training the classifier. The irrelevant and
redundant features of the training dataset reduce the accuracy of the classifier. Hence, the irrelevant and redundant features must be
removed from the training dataset through the process known as feature selection. This paper proposes a feature selection algorithm
namely unsupervised learning with ranking based feature selection (FSULR). It removes redundant features by clustering and eliminates
irrelevant features by statistical measures to select the most significant features from the training dataset. The performance of this
proposed algorithm is compared with the other seven feature selection algorithms by well known classifiers namely naive Bayes (NB),
instance based (IB1) and tree based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for
classifiers.
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...ijcsa
The study places a particular emphasis on the so-called data mining algorithms, but focuses the bulk of
attention on the C4.5 algorithm. Each educational institution, in general, aims to provide a high quality of
education. This depends upon identifying the students with poor results before they enter the final
examination. Data mining techniques offer many tasks that could be used to investigate the students'
performance. The main objective of this paper is to build a classification model that can be used to improve
the students' academic records in the Faculty of Mathematical Science and Statistics. This model has been
built using the C4.5 algorithm, as it is a well-known, commonly used data mining technique. The
importance of this study is that predicting student performance is useful in many different settings. Data
from the previous students' academic records in the faculty have been used to illustrate the considered
algorithm in order to build our classification model.
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...theijes
Feature selection is considered a problem of global combinatorial optimization in machine learning; it reduces the number of features and removes irrelevant, noisy, and redundant data. However, identifying useful features from hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data becomes even more challenging owing to the high dimensionality of features, the multiclass categories involved, and the usually small sample size. In order to improve the prediction accuracy and to avoid incomprehensibility due to the number of features, different feature selection techniques can be implemented. This survey classifies and analyzes different approaches, aiming not only to provide a comprehensive presentation but also to discuss challenges and various performance parameters. The techniques are generally classified into three types: filter, wrapper, and hybrid.
Automatic Feature Subset Selection using Genetic Algorithm for Clusteringidescitation
Feature subset selection is a process of selecting a
subset of minimal, relevant features and is a pre processing
technique for a wide variety of applications. High dimensional
data clustering is a challenging task in data mining. Reduced
set of features helps to make the patterns easier to understand.
Reduced set of features are more significant if they are
application specific. Almost all existing feature subset
selection algorithms are not automatic and are not application
specific. This paper made an attempt to find the feature subset
for optimal clusters while clustering. The proposed Automatic
Feature Subset Selection using Genetic Algorithm (AFSGA)
identifies the required features automatically and reduces
the computational cost in determining good clusters. The
performance of AFSGA is tested using public and synthetic
datasets with varying dimensionality. Experimental results
have shown the improved efficacy of the algorithm with optimal
clusters and computational cost.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUEIJDKP
The document discusses a proposed students' performance prediction system using multi-agent data mining techniques. It aims to predict student performance with high accuracy and help low-performing students. The system uses ensemble classifiers like Adaboost.M1 and LogitBoost and compares their prediction accuracy to the single classifier C4.5 decision tree. Experimental results showed SAMME boosting improved prediction accuracy over C4.5 and LogitBoost.
IRJET- Implementation of Automatic Question Paper Generator SystemIRJET Journal
This document describes a proposed system for automatically generating question papers from input documents. The system performs several steps: it converts PDF/document files to text, conducts preprocessing like removing stop words, uses natural language processing and TF-IDF for key phrase extraction, checks phrases against Wikipedia for domain knowledge, generates triplets for questions, and checks the quality of generated questions using linguistic rules. The system aims to make the question paper generation process faster, more randomized and secure compared to traditional manual methods.
An Automatic Question Paper Generation : Using Bloom's TaxonomyIRJET Journal
This document presents a system for automatically generating exam questions and categorizing them according to Bloom's Taxonomy. It uses natural language processing techniques like part-of-speech tagging to analyze questions and identify keywords and verbs. Rules are developed to match question patterns and keywords to the appropriate Bloom's Taxonomy category. A randomization algorithm is also introduced to randomly select questions from the database and avoid repetitions. The system aims to help educators automatically analyze exam questions and ensure a balance of cognitive levels according to Bloom's Taxonomy. Preliminary results found the rules could successfully categorize questions in the test set. The proposed system has applications for educational institutions, universities, and government exams.
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITIONkevig
The aim of Named Entity Recognition (NER) is to identify references to named entities in unstructured documents and to classify them into pre-defined semantic categories. NER often benefits from added background knowledge in the form of gazetteers. However, using such a collection does not deal with name variants and cannot resolve the ambiguities involved in identifying the entities in context and associating them with predefined categories. We present a semi-supervised NER approach that starts by identifying named entities with a small set of training data. Using the identified named entities, the word and the context features are used to define the pattern. This pattern of each named entity category is used as a seed pattern to identify the named entities in the test set. Pattern scoring and the tuple value score enable the generation of new patterns to identify the named entity categories. We have evaluated the proposed system for the English language with the tagged (IEER) and untagged (CoNLL 2003) named entity corpora, and for the Tamil language with documents from the FIRE corpus, and obtained an average F-measure of 75% for both languages.
A syntactic analysis model for vietnamese questions in v dlg~tabl systemijnlc
This paper introduces a syntactic analysis model that we propose to parse and process the Vietnamese questions about tablets in V-DLG~TABL system, which is a Vietnamese Question – Answering system working based on automatic dialog mechanism. The V-DLG~TABL system is built to support clients using Vietnamese questions for searching tablets based on interaction between the clients and the system. We apply the “Phrase Structure Grammar” of Noam Chomsky to develop a syntactic analysis model that is specific and suitable for the V-DLG~TABL system. This syntactic analysis model is used to implement the “V-DLG~TABL Syntactic Parsing and Processing” component of the system.
11.software modules clustering an effective approach for reusabilityAlexander Decker
This document summarizes previous work on using clustering techniques for software module classification and reusability. It discusses hierarchical clustering and non-hierarchical clustering methods. Previous studies have used these techniques for software component classification, identifying reusable software modules, course clustering based on industry needs, mobile phone clustering based on attributes, and customer clustering based on electricity load. The document provides background on clustering analysis and its uses in various domains including software testing, pattern recognition, and software restructuring.
An Ensemble of Filters and Wrappers for Microarray Data Classification mlaij
The development of microarray technology has supplied a large volume of data to many fields. Gene
microarray analysis and classification have proved to be an effective way to diagnose diseases and
cancers. Inasmuch as the data obtained from microarray technology is very noisy and also has thousands
of features, feature selection plays an important role in removing irrelevant and redundant features and
also in reducing computational complexity. There are two important approaches for gene selection in
microarray data analysis: the filters and the wrappers. To select a concise subset of informative genes, we
introduce a hybrid feature selection method which combines the two approaches. Candidate features are
first selected from the original set via several effective filters. The candidate feature set is further
refined by more accurate wrappers. Thus, we can take advantage of both the filters and the wrappers.
Experimental results based on 11 microarray datasets show that our mechanism can be effective with a
smaller feature set. Moreover, these feature subsets can be obtained in a reasonable time.
A Survey on the Classification Techniques In Educational Data MiningEditor IJCATR
Due to increasing interest in data mining and educational systems, educational data mining is an emerging topic for the research
community. Educational data mining means extracting hidden knowledge from large repositories of data with the use of techniques
and tools. Educational data mining develops new methods to discover knowledge from educational databases, which is used for decision
making in educational systems. Various data mining techniques, such as classification and clustering, can be applied to bring out hidden
knowledge from the educational data.
In this paper, we focus on educational data mining and classification techniques. In this study we analyze attributes for the
prediction of students' behavior and academic performance by using the WEKA open source data mining tool and various classification
methods such as decision trees, the C4.5 algorithm, the ID3 algorithm, etc.
DEVNAGARI NUMERALS CLASSIFICATION AND RECOGNITION USING AN INTEGRATED APPROACHijfcstjournal
Character recognition has always been a challenging field for the researchers. There has been an astounding progress in the development of the systems for character recognition. OCR performs the recognition of the text in the scanned document image and converts it into editable form. The OCR process can have several stages like preprocessing, segmentation, recognition and post processing. The recognition generally, consists of feature extraction and classification. The choice of features and classification scheme affects the performance of OCR largely. In this paper, a classification scheme is proposed for the Devnagari numerals, which forms the basis for recognition. This approach integrates the structural features and water reservoir analogy based feature to classify the Devnagari numeral. In order to classify a single numeral, at most four checks are required. This increases the efficiency of the proposed scheme.
Change management and version control of Scientific Applicationsijcsit
The development process of scientific applications is largely dependent on scientific progress and the
experimental research results. Thus, dealing with frequent changes is one of the main problems faced by
the developers of scientific software. Taking into account the results of the survey conducted among
scientists in the HP-SEE project, the implementation of change management and version control software
processes is inevitable. In this paper, we propose software engineering principles that should be included
in the development process to improve the version control and change management. Moreover, we give
some specific recommendations for their implementation, thereby making a slight modification of already
generally accepted templates and methods. The development steps practiced by scientists should not be
replaced completely, but they need to be supplemented with appropriate practices, documents and formal
methods. We also emphasize the reasons for the inclusion of these two processes and the consequences that
may arise as a result of their non-application.
This document provides tips and sample answers for common interview questions for an HR specialist position. It discusses how to answer questions about yourself, your strengths, career goals, reasons for leaving previous jobs, weaknesses, knowledge of the organization, and ways you've improved your HR knowledge. For each question, it offers steps and examples to effectively communicate your qualifications and experience in a positive light.
ANALYSIS OF ELEMENTARY CELLULAR AUTOMATA BOUNDARY CONDITIONSijcsit
We present the findings of an analysis of elementary cellular automata (ECA) boundary conditions. Fixed and variable boundaries are attempted. The outputs of linear feedback shift registers (LFSRs), acting as continuous inputs to the two boundaries of a one-dimensional (1-D) ECA, are analyzed and compared. The results show superior randomness features, and the output string has passed the Diehard statistical battery of tests. The design has strong correlation immunity and is inherently amenable to VLSI implementation. Therefore it can be considered a good and viable candidate for parallel pseudo-random number generation.
MULTI-PARAMETER BASED PERFORMANCE EVALUATION OF CLASSIFICATION ALGORITHMSijcsit
Diabetes is amongst the most common diseases in India. It affects the patient's health and also leads to
other chronic diseases. Prediction of diabetes plays a significant role in saving lives and cost. Predicting
diabetes in the human body is a challenging task because it depends on several factors. A few studies have
reported the performance of classification algorithms in terms of accuracy. The results in these studies are
difficult for medical practitioners to understand and also lack visual aids, as they are presented in pure
text format. This survey uses the ROC and PRC graphical measures to improve understanding of the results. A
detailed parameter-wise comparison is also presented, which is lacking in other reported surveys. Execution
time, Accuracy, TP Rate, FP Rate, Precision, Recall, and F-Measure are used for comparative analysis, and a
Confusion Matrix is prepared for quick review of each algorithm. The ten-fold cross-validation method is
used for estimation of the prediction model. Different sets of classification algorithms are analyzed on the
diabetes dataset acquired from the UCI repository.
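Most of the comparison parameters listed above follow directly from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the survey.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts; empty denominators yield 0.0."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # also called the TP rate
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for one classifier on one test fold.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

In a ten-fold cross-validation setting, such counts would typically be accumulated over the ten held-out folds before the metrics are computed.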
Performance Variations in Profiling Mysql Server on the Xen Platform: Is It X...ijcsit
Reliability of a performance model is quintessential to the robust resource management of applications on
the cloud platform. Existing studies show that the contention for shared I/O induces temporal performance
variations in a guest VM and heterogeneity in the underlying hardware leads to relative performance
difference between guest VMs of the same abstract type. In this work, we demonstrate that a guest VM
exhibits significant performance variations across repeated runs in spite of contention free hosting of a
single guest VM on a physical machine. Also, notable performance difference between guest VMs created
equal on physical machines of a homogeneous cluster is noticed. Systematic examination of the components
involved in the request processing identifies disk I/O as the source of variations. Further investigation
establishes that the root cause of the variations is linked with how MySQL manages the storage of tables
and indexes on the guest VM's disk file system. The observed variations in performance raise the challenge
of creating a consistent and repeatable profile. To this end, we present and evaluate a black box approach
based on database population from a snapshot to reduce the perceived performance variations. The
experimental results show that the profile created for a database populated using a snapshot can be used
for performance modeling up to 80% CPU utilization. We validate our findings on the Amazon EC2 cloud
platform.
This document defines various accounting and business terms. It provides concise definitions for terms like a fortiori analysis, knowledge, leadership, transformational leadership, global leadership, goals, value, mindset, culture, communication, absorption costing, and many other accounting and business concepts. The definitions aim to concisely explain the key elements and meanings of each term.
APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS...mathsjournal
For a one-dimensional homogeneous, isotropic aquifer without accretion, the governing Boussinesq
equation under the Dupuit assumptions is a nonlinear partial differential equation. In the present paper, an
approximate analytical solution of the nonlinear Boussinesq equation is obtained using the Homotopy
perturbation transform method (HPTM). The solution is compared with the exact solution. The
comparison shows that the HPTM is efficient, accurate, and reliable. The effect of two important aquifer
parameters, namely specific yield and hydraulic conductivity, on the height of the water table is studied.
The results agree well with the physical phenomena.
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSISmlaij
The document describes a proposed model for sentiment analysis of movie reviews using natural language processing and machine learning approaches. The model first applies various data pre-processing techniques to the dataset, including tokenization, pruning, filtering tokens, and stemming. It then investigates the performance of classifiers like Naive Bayes and SVM combined with different feature selection schemes, including term occurrence, binary term occurrence, term frequency and TF-IDF. Experiments are run using n-grams up to 4-grams to determine the best approach for sentiment analysis.
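The four term-weighting schemes named above can be illustrated on a toy corpus. This is a minimal sketch that assumes whitespace tokenization, length-normalized term frequency, and an unsmoothed logarithmic IDF; the summarized paper's exact formulas and tooling are not stated here, and the function name `term_weights` and the sample reviews are hypothetical.

```python
import math
from collections import Counter


def term_weights(docs):
    """Return, per document, a dict: term -> weights under four schemes."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            row[term] = {
                "occurrence": count,          # raw count in the document
                "binary": 1,                  # present or not
                "tf": tf,                     # count normalized by document length
                "tfidf": tf * math.log(n_docs / doc_freq[term]),
            }
        rows.append(row)
    return rows


reviews = ["good good movie", "bad movie"]
weights = term_weights(reviews)
```

A term that appears in every document, such as "movie" here, receives a TF-IDF weight of zero under this unsmoothed IDF, which is why TF-IDF tends to favor discriminative terms over common ones.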
The sarcasm detection with the method of logistic regressionEditorIJAERD
The document discusses sarcasm detection using logistic regression. It compares the performance of logistic regression and SVM classification for sarcasm detection. Logistic regression achieved higher accuracy of 93.5% for sarcasm detection, with lower execution time compared to SVM classification. The proposed approach uses data preprocessing, feature extraction using N-grams, and trains a logistic regression classifier on a manually labeled dataset to classify text as sarcastic or non-sarcastic. Accuracy and execution time analysis shows logistic regression performs better than SVM for this task.
This document summarizes recent approaches in opinion mining and sentiment analysis (OMSA) in online social networks. It discusses 15 different frameworks and methods that have been proposed, including the Twitter Opinion Mining framework, an election prediction model using user influence factors, and a fuzzy deep belief network approach. The document analyzes the key steps and characteristics of each approach, such as data collection, preprocessing, classification techniques, and performance results. Overall, the paper reviews the state-of-the-art in OMSA research and highlights areas for future improvements.
Using Hybrid Approach Analyzing Sentence Pattern by POS Sequence over TwitterIRJET Journal
This document presents a study that uses part-of-speech (POS) sequence analysis to determine sentence patterns in tweets for sentiment analysis purposes. The study extracts 2-tag and 3-tag POS sequences from tweets and uses information gain to select the top sequences. Supervised classification with support vector machines is then performed using the POS sequences as features. The results show distinguishable sentence pattern groups for positive and negative tweets, and incorporating POS sequences can improve sentiment analysis accuracy compared to using lexicons alone.
The document summarizes research on aspect-based sentiment analysis. It discusses four main tasks in aspect-based sentiment analysis: aspect term extraction, aspect term polarity identification, aspect category detection, and aspect category polarity identification. It then reviews several approaches researchers have used for each task, including supervised methods like conditional random fields and support vector machines, as well as unsupervised methods. The document concludes by comparing results from different studies on restaurant and laptop review datasets.
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILEScscpconf
In this paper we have focused on an efficient feature selection method in classification of audio files.
The main objective is feature selection and extraction. We have selected a set of features for further
analysis, which represents the elements in feature vector. By extraction method we can compute a
numerical representation that can be used to characterize the audio using the existing toolbox. In this
study Gain Ratio (GR) is used as a feature selection measure. GR is used to select splitting attribute
which will separate the tuples into different classes. The pulse clarity is considered as a subjective
measure and it is used to calculate the gain of features of audio files. The splitting criterion is
employed in the application to identify the class or the music genre of a specific audio file from
testing database. Experimental results indicate that by using GR the application can produce a
satisfactory result for music genre classification. After dimensionality reduction best three features
have been selected out of various features of audio file and in this technique we will get more than
90% successful classification result.
A Survey On Sentiment Analysis Of Movie ReviewsShannon Green
This document provides a literature review on sentiment analysis of movie reviews. It discusses how sentiment analysis uses natural language processing, computational linguistics and text analytics to categorize the polarity of opinions in text as positive, negative or neutral. The document summarizes several research papers on sentiment analysis methods at the document, sentence and entity levels. Supervised machine learning classifiers like SVM generally perform better than unsupervised lexicon-based approaches. The document also discusses challenges in aspect-level sentiment analysis and analyzing sentiments in other domains like social media posts.
Analyzing Sentiment Of Movie Reviews In Bangla By Applying Machine Learning T...Andrew Parish
This document summarizes a research paper that analyzed sentiment of movie reviews written in Bangla using machine learning techniques. The researchers collected a dataset of over 4,000 Bangla movie reviews labeled as positive or negative. Using this dataset, they tested support vector machine and long short-term memory models, achieving 88.9% and 82.42% accuracy respectively. The paper also reviewed other prior work on Bangla sentiment analysis and compared different machine learning methods.
Sentiment Analysis and Classification of Tweets using Data MiningIRJET Journal
This document summarizes research on using data mining techniques to perform sentiment analysis on tweets. The researchers collected tweets from Twitter and preprocessed the text to make it usable for building sentiment classifiers. They used three classifiers - K-Nearest Neighbor, Naive Bayes, and Decision Tree - and compared the results to determine which provided the best accuracy. Rapid Miner tool was used to preprocess the text, build the classifiers, and analyze the results. The goal was to determine people's sentiments expressed in their tweets and correctly classify them.
SUPPORT VECTOR MACHINE CLASSIFIER FOR SENTIMENT ANALYSIS OF FEEDBACK MARKETPL...AM Publications
Sentiment analysis is an interdisciplinary field between natural language processing, artificial intelligence and text mining. The main key of the sentiment analysis is the polarity that is meant by the sentiment is positive or negative (Chen, 2012). In this study using the method of classification support vector machine with the amount of data consumer reviews amounted to 648 data. The data obtained from consumer reviews from the marketplace with products sold is hand phone. The results of this study get 3 aspects that indicate sentiment analysis on the marketplace aspects of service, delivery and products. The slang dictionary used for the normation process is 552 words slang. This study compares the characteristic analysis to obtain the best classification result, because classification accuracy is influenced by characteristic analysis process. The result of comparison value from characteristic analysis between n-gram and TF-IDF by using Support Vector Machine method found that Unigram has the highest accuracy value, with accuracy value 80,87%. The results of this study explain that in the case of analysis sentiment at the aspect level with the comparison of characteristics with the classification model of support vector machine found that the analysis model of unigram character and classification of support vector machine is the best model
Neural Network Based Context Sensitive Sentiment AnalysisEditor IJCATR
Social media communication is evolving more in these days. Social networking site is being rapidly increased in recent years, which provides platform to connect people all over the world and share their interests. The conversation and the posts available in social media are unstructured in nature. So sentiment analysis will be a challenging work in this platform. These analyses are mostly performed in machine learning techniques which are less accurate than neural network methodologies. This paper is based on sentiment classification using Competitive layer neural networks and classifies the polarity of a given text whether the expressed opinion in the text is positive or negative or neutral. It determines the overall topic of the given text. Context independent sentences and implicit meaning in the text are also considered in polarity classification.
This document summarizes a proposed framework for sentiment classification using fuzzy logic. The framework aims to detect both implicit and explicit sentiment expressions in text by incorporating multiple datasets and techniques. It involves preprocessing text data, classifying sentiment, applying fuzzy logic to reduce emotions, and using fuzzy c-means clustering to further group similar emotions. The goal is to more accurately extract sentiment from transcripts by identifying both implicit and explicit expressions as well as topics through this combined approach. Evaluation metrics like precision, recall and f-measure will be used to assess performance.
Mining of product reviews at aspect levelijfcstjournal
Today’s world is a world of Internet, almost all work can be done with the help of it, from simple mobile
phone recharge to biggest business deals can be done with the help of this technology. People spent their
most of the times on surfing on the Web; it becomes a new source of entertainment, education,
communication, shopping etc. Users not only use these websites but also give their feedback and
suggestions that will be useful for other users. In this way a large amount of reviews of users are collected
on the Web that needs to be explored, analyse and organized for better decision making. Opinion Mining or
Sentiment Analysis is a Natural Language Processing and Information Extraction task that identifies the
user’s views or opinions explained in the form of positive, negative or neutral comments and quotes
underlying the text. Aspect based opinion mining is one of the level of Opinion mining that determines the
aspect of the given reviews and classify the review for each feature. In this paper an aspect based opinion
mining system is proposed to classify the reviews as positive, negative and neutral for each feature.
Negation is also handled in the proposed system. Experimental results using reviews of products show the
effectiveness of the system.
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...IRJET Journal
This document summarizes research on sentiment polarity analysis of Twitter data from different events. It discusses how Twitter data can be used for opinion mining and sentiment analysis. Several papers that used techniques like naive Bayes classifier, support vector machines, and dual sentiment analysis on Twitter data are summarized. The document also provides an overview of the key steps involved in a Twitter sentiment analysis system, including data collection, preprocessing, feature extraction, training a classification model, and evaluating accuracy. The goal of analyzing sentiments on Twitter is to understand public opinions on different topics and events.
This document summarizes different techniques for sentiment analysis, including supervised and unsupervised methods. It discusses sentiment analysis at the document, sentence and entity/aspect level. Supervised techniques covered are Naive Bayes, Support Vector Machines, and Decision Trees. Unsupervised techniques discussed are semantic orientation and SentiWordNet-based approaches. The document provides advantages and disadvantages of each technique and compares their performance, finding that supervised methods like SVM generally have higher accuracy but require large labeled training datasets.
A Study On Sentiment Analysis Methods And ToolsJim Jimenez
The document summarizes sentiment analysis methods and tools. It discusses how sentiment analysis is used to analyze opinions expressed in text sources like blogs, reviews and social media to determine whether the sentiment is positive, negative or neutral. It describes the key steps in sentiment analysis as opinion retrieval from sources, opinion classification (identifying text as expressing positive or negative sentiment) and opinion summarization. It also outlines different techniques used for sentiment analysis including supervised machine learning algorithms and lexicon-based methods.
This document proposes a model to estimate overall sentiment score by applying rules of inference from discrete mathematics. It discusses sentiment analysis and related work using techniques like supervised/unsupervised learning. The problem is identifying sentiment components and restricting patterns for feature identification. Most approaches focus on nouns/adjectives but not verbs/adverbs. The model preprocesses product review datasets using NLTK for stemming, parsing and tokenizing. It builds a lexicon dictionary of positive and negative words. The Lexical Pattern Sentiment Analysis algorithm uses both lexicon and pattern mining - it selects sentence patterns, checks for positive/negative words in the lexicon, and calculates an overall sentiment score.
This document discusses opinion mining and sentiment analysis for business intelligence purposes. It provides an overview of related work on extracting opinions from text to classify sentiments. The paper surveys techniques like lexicon-based approaches and machine learning algorithms for sentiment classification. It also discusses how opinion mining can help business analysts extract relevant information from large amounts of unstructured data on the web to make informed decisions. Future work may involve applying techniques like neural networks and improving information retrieval from XML data sources.
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
The Python for beginners. This is an advance computer language.
International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.4, No.1, February 2015
DOI: 10.5121/ijscai.2015.4102
AN EXPERIMENTAL STUDY OF FEATURE
EXTRACTION TECHNIQUES IN OPINION
MINING
J. Ashok Kumar1, S. Abirami2
1Research Scholar, 2Assistant Professor
1,2Department of Information Science and Technology, Anna University, Chennai, India
ABSTRACT
Feature selection or extraction is the most important task in Opinion Mining and Sentiment Analysis
(OMSA) for calculating the polarity score. These scores are used to determine the positive, negative, and
neutral polarity of products, user reviews, user comments, etc., in social media for the purpose
of decision making and business intelligence for individuals or organizations. In this paper, we
perform an experimental study of the different feature extraction or selection techniques available for
the opinion mining task. The study is carried out in four stages. First, data are collected
from readily available sources. Second, pre-processing techniques are applied
automatically using tools to extract the terms and POS (Parts-of-Speech). Third, different feature selection
or extraction techniques are applied over the content. Finally, an empirical study is carried out to
analyze the sentiment polarity with different features.
KEYWORDS
Sentiment Analysis, Opinion Mining, Feature Extraction, Polarity Classification, and Sentiment Polarity
1. INTRODUCTION
OMSA plays a vital role in analyzing social media content: it is used to
classify the polarity of content about products, user reviews, user comments, etc., as positive,
negative, or neutral. Sentiments are usually studied at the document level,
sentence level, and entity and feature (or aspect) level. Feature selection or extraction is one of the
most important tasks in OMSA. An entity is a hierarchical representation of components and
subcomponents, and each component is associated with a set of attributes. A large number of
documents is processed for sentiment with different features such as n-grams, part-of-speech,
location-based features, lexicon-based features, syntactic features, structural or discourse features,
etc. [10].
In this paper, the experimental study is carried out with a freely available dataset for different
feature selection techniques. The work is organized as a framework comprising the data collection
process, pre-processing, feature selection, and an experimental study for performance evaluation.
Section 2 presents related work on OMSA. Section 3 describes the OMSA framework: the data
collection process, pre-processing techniques, and feature selection methods.
Section 4 reports the experimental study. Section 5 presents the conclusion with
future challenges and developments.
2. RELATED WORKS
Many feature selection or extraction techniques are available in OMSA, but only a few related works
are presented here. Jose M. Chenlo et al. [10] demonstrated a wide range of
features such as n-grams, part-of-speech, location-based features, lexicon-based features,
syntactic features, structural or discourse features, etc., for sentiment classification. In [9],
OMSA approaches with different frameworks and algorithms are reviewed, and their
results are compared and analyzed on readily available datasets. Farhan Hassan Khan et al. [4]
proposed a new TOM framework to predict the polarity of words in tweets as positive or negative
and to improve the accuracy of this classification by using noun and
adjective features. Malhar Anjaria et al. [13] introduced a model to predict election results by
applying the user influence factor (the re-tweets each party garners) and extracting opinions
using direct and indirect features on the basis of supervised algorithms such as a simple
probabilistic model, a uniform classification model, a maximum-margin hyperplane classifier, a
feed-forward network, and dimensionality reduction, using unigram, bigram, and unigram +
bigram features. Tapia-Rosero A et al. [18] employed a method to detect similarly shaped
membership functions in the group decision-making process by applying the get shape-string and
feature-string algorithms. Jun Ma et al. [11] presented a method to reduce the chance of making
inappropriate decisions in multi-criteria group decision making (MCGDM) at three levels:
the term set is divided into several semantically equal groups for a criterion,
an appropriate criterion is identified, and each individual criterion is used to observe the
similarity of two opinions.
Xiaolin Zheng et al. [20] presented an unsupervised dependency analysis-based approach to
extract appraisal expression patterns such as domain, aspect word, sentiment word, background
word, and review. Vinodhini G et al. [19] introduced two frameworks that combine
classifiers with principal component analysis (PCA) to reduce the dimension of the feature set. By
extracting features from opinions expressed by users, and providing the positive, negative, and
neutral values of nouns, adjectives, and verbs, Isidro Penalver-Martinez et al. [8] presented an
innovative method called ontology-based opinion mining, which improves feature-based opinion
mining by employing ontologies in the selection of features and provides a new vector
analysis-based method for sentiment analysis. Alvaro Ortigosa et al. [1] introduced a new method called
sentiment extraction and change detection for extracting sentiments from texts. Using random
independent, weighted versions, and random subspaces of the feature space respectively, Gang
Wang et al. [5] conducted a comparative assessment of the performance of three
ensemble methods (Bagging, Boosting, and Random Subspace) with five learners. Daekook Kang
et al. [3] presented a new framework combining the VIKOR ranking method and sentiment
analysis to measure customer satisfaction in mobile services, using a dictionary of
attributes and a dictionary of sentiment words expressed in verb phrases, adjective
phrases, and adverbial phrases. Sheng Huang et al. [17] proposed an automatic construction
strategy for a domain-specific sentiment lexicon based on constrained label propagation, using
sentiment terms extracted from nouns and noun phrases, adjectives, and adverbs and their phrases.
Arturo Montejo-Raez et al. [2] employed a method for sentiment classification using the weights
of a WordNet graph.
3. OMSA FRAMEWORK
The OMSA framework consists of the data collection process, pre-processing techniques, feature
selection or extraction, and evaluation. This process is shown in Fig. 1, and described below in
detail.
Figure 1. OMSA Framework
3.1 Data Collection Process and Pre-processing
Data collection is the first stage of the OMSA approach. In this stage, a freely available
dataset is obtained for the purpose of the experimental study. The collected dataset serves as
input to the pre-processing stage, after which the feature selection or extraction method is
applied to classify the polarity as positive, negative, or neutral. The large amount of data is
processed using the GATE tool [7] and the Semantria API [21], which process the data very
quickly. The process is then analyzed for the various feature extraction or selection methods
discussed in section 3.2. The processing time of the two tools is also compared.
3.2 Feature Extraction methods
In this stage, all the documents in the corpus are represented as a bag of words (BOW),
an easy and very efficient method in text classification [3]. On top of this BOW representation,
more sophisticated feature methods need to be applied to understand the documents in the
sentiment classification task. In this work, POS, entity, phrase, weighting scheme, and document
features are considered for the sentiment classification task.
3.2.1 Parts of Speech (POS)
In Parts of Speech (POS), the entire document content is represented as unigrams and N-grams,
which are divided into groups. Group 1 consists of single words, called unigrams, and
Group 2 consists of multiword terms, called N-grams. From these feature sets, the most relevant
features are considered for sentiment classification.
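The split into single-word and multiword features can be illustrated with a minimal Python sketch. It is not the GATE pipeline used in this study: the whitespace tokenizer, the sample sentence, and the helper name `ngrams` are all assumptions made for illustration.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical review text; naive lowercase + whitespace tokenization (an assumption).
review = "the room was clean and the staff was friendly"
tokens = review.lower().split()

unigrams = ngrams(tokens, 1)  # Group 1: single words
bigrams = ngrams(tokens, 2)   # Group 2: multiword terms (here n = 2)

print(unigrams[:3])
print(bigrams[:3])
```

A real pipeline would normally add POS tagging and filter the candidate terms before selecting the most relevant ones.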
3.2.2. Document Level
The document level features are considered to classify the textual reviews on a single topic as
positive, negative, or neutral. In general, the document features determine the overall sentiment
polarity.
3.2.3. Phrase Level
The phrase level features are used to determine whether an expression is positive, negative, or
neutral. In addition, entity features are used to extract names, locations, addresses, etc.
3.2.4. Weighting Scheme
The weighting scheme feature (tf-idf) for single words and multiword terms is considered for
sentiment classification in the document. The tf-idf value is calculated using the following
formula.
$$\mathrm{IDF}\ (\text{Inverse Document Frequency}) = \log_2\left(\frac{N}{df}\right)$$
$$\mathrm{Weight}(t, d) = \mathrm{tf}(t, d) \times \mathrm{IDF}(t)$$
where N is the total number of documents, df is the document frequency, tf is the
term frequency, t denotes a term, and d denotes a document. An
experimental study of the above methods has been carried out in this paper and is discussed in the
next section.
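As a worked illustration of the weighting formula, the following Python sketch computes Weight(t, d) = tf(t, d) × log2(N / df) over a toy corpus. The four tokenized documents and the function name `tf_idf` are hypothetical; the study itself extracts these weights with GATE plug-ins.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, with
    weight = tf(t, d) * log2(N / df(t)), where N = number of documents."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)    # raw term frequency within the document
        weights.append({t: tf[t] * math.log2(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical pre-tokenized corpus (an assumption for illustration).
docs = [
    ["good", "room", "good", "staff"],
    ["bad", "room"],
    ["good", "view"],
    ["noisy", "room"],
]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, "good" occurs twice and appears in 2 of the 4 documents, so its weight is 2 × log2(4/2) = 2. The term "room" appears in 3 of the 4 documents and therefore receives a much lower weight.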
4. EXPERIMENTAL STUDY
The experimental procedure is an extension of the OMSA approach [9]. In
that approach, the Polarity Classification Algorithm (PCA) and an evaluation procedure are applied to
verify the accuracy. The evaluation procedure was tested with four different datasets, namely Apple,
Google, Microsoft, and Twitter. The texts in each dataset focus only on the
company named. The datasets contain tweet sentiments (positive,
negative, neutral, and irrelevant) with counts of 1313, 1381, 1415, and 1404, respectively. On these
datasets the approach attained accuracies of 96.73%, 96.89%, 96.96%, and 96.93%, with an average
accuracy of 96.88%. The average precision, recall, and F-measure obtained are also compared in
[9].
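The reported average can be checked directly from the four per-dataset accuracies quoted above. The short Python sketch below uses only those figures; the variable names are illustrative.

```python
# Per-dataset accuracies (in percent) as quoted in the text.
accuracies = {"Apple": 96.73, "Google": 96.89, "Microsoft": 96.96, "Twitter": 96.93}

average = sum(accuracies.values()) / len(accuracies)
print(f"{average:.2f}")
```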
Using the above work as a model, the Semantria Trip Advisor dataset, which contains 200 review
documents, is used for sentiment polarity classification. These documents are
processed in the GATE tool [7], where unigrams, N-grams, and the weighting scheme (tf-idf) are
extracted using the plug-ins TermRaider and PMI Score. The extracted features are then
processed in the Semantria API to obtain the sentiment polarity [21]. A large amount of data is
processed in little time for polarity classification. In this experiment, only POS-based features,
entity, phrase, and document features, and weighting schemes are considered. The dataset is
classified as positive, negative, or neutral for each of these features. The polarity
scores are assigned as 1 for positive, -1 for negative, and 0 for neutral. The classification
performance is evaluated and analyzed using confusion matrices, precision, recall, F-measure,
and accuracy across the various features, and the results are tabulated in Table 1 and
Table 3.
Table 1. Types of features with count
Table 2. Confusion matrix
Classes X, Y, and Z represent positive, negative, and neutral, respectively, in the confusion
matrices shown in Table 2. The diagonal elements tpX, tpY, and tpZ are the correctly
classified data for each class, and the remaining elements are incorrectly classified data.
$$\mathrm{Precision}_X = \frac{tpX}{tpX + eYX + eZX}$$
where tpX is the number of true positive predictions for class X, and eYX and eZX are false
positives.
$$\mathrm{Recall}_X = \frac{tpX}{tpX + eXY + eXZ}$$
where tpX is the number of true positive predictions for class X, and eXY and eXZ are false
negatives.
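$$\text{F-measure} = \frac{2 \times (\mathrm{Precision} \times \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}}$$
$$\mathrm{Accuracy} = \frac{2 \times (\text{True Positive} + \text{True Negative} + \text{True Neutrals})}{\text{True Positive} + \text{False Positive} + \text{True Negative} + \text{False Negative} + \text{True Neutrals} + \text{False Neutrals}}$$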
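The per-class measures above can be computed from a three-class confusion matrix as in the following Python sketch. The matrix values are invented for illustration, and the layout (rows = actual class, columns = predicted class) is an assumption chosen so that eXY counts items of class X predicted as Y. Accuracy here is the plain fraction of correctly classified items, which is the usual definition and may differ from the printed expression by a constant factor.

```python
def per_class_metrics(cm, k):
    """Precision, recall, and F-measure for class index k.
    cm[actual][predicted] holds counts (assumed layout)."""
    n = len(cm)
    tp = cm[k][k]
    fp = sum(cm[r][k] for r in range(n) if r != k)  # other classes predicted as k
    fn = sum(cm[k][c] for c in range(n) if c != k)  # class k predicted as others
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def accuracy(cm):
    """Fraction of all items on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Hypothetical counts; class order X (positive), Y (negative), Z (neutral).
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
precision_x, recall_x, f_x = per_class_metrics(cm, 0)
print(precision_x, recall_x, f_x, accuracy(cm))
```

For class X in this toy matrix, tpX = 8, eYX + eZX = 2, and eXY + eXZ = 2, so precision and recall are both 8/10.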
Table 3. The results obtained by using the confusion matrices
The OMSA approach is evaluated on a single dataset with six different feature levels to
verify the predictive results. The polarity score is counted for all six features as shown in Table 1,
and the polarity scores are then evaluated against 26 trained polarity scores. In this approach,
three polarity scores (positive, negative, and neutral) are considered for classification; the
irrelevant score is not considered. The obtained accuracy
is shown in Table 3 and Fig.3 for the six different features (97.96% single word, 97.90%
multiword, 95.91% document level, 96.41% phrase level, 98.36% tf-idf single word, 98.26%
tf-idf multiword) on a single dataset. The precision, recall, and F-measure values vary across
the different trained polarity scores, while the accuracy level is taken to be the same for each
respective feature set.
Figure 2. The graphical representation of Precision, Recall, and F-measures (a - f)
5. CONCLUSION AND FUTURE DIRECTIONS
Feature selection or extraction is one of the most important tasks in Opinion Mining and
Sentiment Analysis (OMSA) for calculating the polarity score. In this paper, we implemented
the OMSA approach and analyzed the results on a single dataset for different feature
extraction or selection techniques, namely single word, multiword, document level, phrase
level, tf-idf single word, and tf-idf multiword. The results differ across these
features. Many challenges and future developments remain in the OMSA
approach. Challenges arising from the short length and irregular structure of the content include
named entity recognition, anaphora resolution, parsing, sarcasm, sparsity, abbreviations, poor
spelling, punctuation and grammar, and incomplete sentences. Further directions include
applications in strategic planning, suitability analysis, fuzzy control, and fuzzy time series to
find similarities [18]; missing values and unclear answers [11]; clause-level analysis and
aspect-based review summarization, sentiment classification, and personalized recommendation
systems [20]; corresponding guidance and interference [12]; ontology [6]; weight of the edges [12];
and constraint knowledge between sentiment terms and distinguishing aspect-specific polarities [17].
REFERENCES
[1] Alvaro Ortigosa, Jose M. Martin, Rosa M. Carro.: Sentiment analysis in Facebook and its application
to e-learning. Computers in Human Behavior. 31, 527-541 (2014)
[2] Arturo Montejo-Raez, Eugenio Martinez-Camara, M. Teresa Martin-Valdivia, L. Alfonso Urena-
Lopez.: Ranked WordNet graph for Sentiment Polarity Classification in Twitter. Computer Speech
and Language. 28, 93-107 (2014)
[3] Daekook Kang, Yongtae Park.: Review-based measurement of customer satisfaction in mobile
service: Sentiment analysis and VIKOR approach. Expert Systems with Applications. 41, 1041-1050
(2014)
[4] Farhan Hassan Khan, Saba Bashir, Usman Qamar.: TOM: Twitter opinion mining framework using
hybrid classification scheme. Decision Support Systems. 57, 245-257 (2014)
[5] Gang Wang, Jianshan Sun, Jian Ma, Kaiquan Xu, Jibao Gu.: Sentiment Classification: The
contribution of ensemble learning. Decision Support Systems. 57, 77-93 (2014)
[6] Gowsikhaa D, Abirami S, Baskaran R.: Construction of image ontology using low level features for
image retrieval. Proceedings of the International Conference on Computer Communication and
Informatics. 129-134 (2012)
[7] H. Cunningham, A. Hanbury, and S. Rüger. Scaling up high-value retrieval to medium-volume data.
In H. Cunningham, A. Hanbury, and S. Rüger, editors, Advances in Multidisciplinary Retrieval (the
1st Information Retrieval Facility Conference). LNCS volume number: 6107, Lecture Notes in
Computer Science, Vienna, Austria, May 2010. Sprin.
[8] Isidro Penalver-Martinez, Francisco Garcia-Sanchez, Rafael Valencia-Garcia, Miguel Angel
Rodriguez-Garcia, Valentin Moreno, Anabel Fraga, Jose Luis Sanchez-Cervantes.: Feature-based
opinion mining through ontologies. Expert Systems with Applications. 41, 5995-6008 (2014)
[9] J. Ashok Kumar, S. Abirami, S. Murugappan.: Performance analysis of the recent role of OMSA
approaches in Online Social Networks. SAI-2014, 21-32, (2014).
[10] Jose M. Chenlo, David E. Losada.: An empirical study of sentence features for subjectivity and
polarity classification. Information Sciences. 280, 275-288 (2014)
[11] Jun Ma, Jie Lu, Guangquan Zhang.: A three-level-similarity measuring method of participant
opinions in multiple-criteria group decision supports. Decision Support Systems. 59, 74-83 (2014)
[12] Kyoungok Kim, Jaewook Lee.: Sentiment visualization and classification via semi-supervised
nonlinear dimensionality reduction. Pattern Recognition. 47, 758-768 (2014)
[13] Malhar Anjaria & Ram Mohana Reddy Guddeti.: Influence factor based opinion mining of twitter
data using supervised learning. Sixth IEEE International conference on communication systems and
networks (COMSNETS). ISSN: 1409-5982, (2014)
[14] Ning Ma, Yijun Liu.: SuperedgeRank algorithm and its application in identifying opinion leader of
online public opinion supernetwork. Expert Systems with Applications. 41, 1357-1368 (2014)
[15] Rui Xia, Chengqing Zong, Shoushan Li.: Ensemble of feature sets and classification algorithms for
sentiment classification. Information Sciences. 181, 1138-1152 (2011)
[16] R.V. Vidhu Bhala, S. Abirami.: Trends in word sense disambiguation. Artificial Intelligence Review:
An International Science and Engineering Journal. DOI 10.1007/s10462-012-9331-5, Springer,
(2012)
[17] Sheng Huang, Zhendong Niu, Chongyang Shi.: Automatic construction of domain-specific sentiment
lexicon based on constrained label propagation. Knowledge-Based Systems. 56, 191-200 (2014)
[18] Tapia-Rosero A, A. Bronselaer, G. De Tre.: A method based on shape-similarity for detecting similar
opinions in group decision-making. Information Sciences. 258, 291-311 (2014)
[19] Vinodhini G, Chandrasekaran RM.: Measuring quality of hybrid opinion mining model for e-
commerce application. Measurement. 55, 101-109 (2014)
[20] Xiaolin Zheng, Zhen Lin, Xiaowei Wang, Kwei-Jay Lin, Meina Song.: Incorporating appraisal
expression patterns into topic modeling for aspect and sentiment word identification. Knowledge-
Based Systems. 61, 29-47 (2014)
[21] https://semantria.com/