This document summarizes a research paper on the importance of including a neutral category in sentiment analysis using fuzzy clustering. It discusses related work on sentiment analysis and clustering algorithms. It then presents a model that uses fuzzy c-means clustering to analyze tweets about the International Criminal Court in Kenya. The model represents sentiments in a 3D plot to capture varying levels of positivity, negativity, and neutrality, rather than just positive or negative. The goal is to gain more information from sentiments than a simple two-category classification.
Min-based qualitative possibilistic networks are one of the effective tools for a compact representation of decision problems under uncertainty. The exact approaches for computing decision based on possibilistic networks are limited by the size of the possibility distributions.
Generally, these approaches are based on possibilistic propagation algorithms. An important step in the computation of the decision is the transformation of the DAG into a secondary structure, known as the junction trees. This transformation is known to be costly and represents a difficult problem. We propose in this paper a new approximate approach for the computation
of decision under uncertainty within possibilistic networks. The computing of the optimal optimistic decision no longer goes through the junction tree construction step. Instead, it is performed by calculating the degree of normalization in the moral graph resulting from the merging of the possibilistic network codifying knowledge of the agent and that codifying its preferences.
A Term Paper for the Course of Theories and Approaches in Language Teaching(...DawitDibekulu
at the end of this presentation you will be able to:
Identify and know the concept of:
Theory and Hypothesis
Approach, Method and Techniques
Skill, Competence and Performance
Know the relation between them
Identify their difference
Know their benefit for ELT
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONijistjournal
The user generated content on the web grows rapidly in this emergent information age. The evolutionary changes in technology make use of such information to capture only the user’s essence and finally the useful information are exposed to information seekers. Most of the existing research on text information processing, focuses in the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for text data available in each forum. This approach analyses the forum text data and computes value for each word of text. The proposed approach combines K-means clustering and Support Vector Machine with PSO (SVM-PSO) classification algorithm that can be used to group the forums into two clusters forming hotspot forums and non-hotspot forums within the current time span. The proposed system accuracy is compared with the other classification algorithms such as Naïve Bayes, Decision tree and SVM. The experiment helps to identify that K-means and SVM-PSO together achieve highly consistent results.
Min-based qualitative possibilistic networks are one of the effective tools for a compact representation of decision problems under uncertainty. The exact approaches for computing decision based on possibilistic networks are limited by the size of the possibility distributions.
Generally, these approaches are based on possibilistic propagation algorithms. An important step in the computation of the decision is the transformation of the DAG into a secondary structure, known as the junction trees. This transformation is known to be costly and represents a difficult problem. We propose in this paper a new approximate approach for the computation
of decision under uncertainty within possibilistic networks. The computing of the optimal optimistic decision no longer goes through the junction tree construction step. Instead, it is performed by calculating the degree of normalization in the moral graph resulting from the merging of the possibilistic network codifying knowledge of the agent and that codifying its preferences.
A Term Paper for the Course of Theories and Approaches in Language Teaching(...DawitDibekulu
at the end of this presentation you will be able to:
Identify and know the concept of:
Theory and Hypothesis
Approach, Method and Techniques
Skill, Competence and Performance
Know the relation between them
Identify their difference
Know their benefit for ELT
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONijistjournal
The user generated content on the web grows rapidly in this emergent information age. The evolutionary changes in technology make use of such information to capture only the user’s essence and finally the useful information are exposed to information seekers. Most of the existing research on text information processing, focuses in the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for text data available in each forum. This approach analyses the forum text data and computes value for each word of text. The proposed approach combines K-means clustering and Support Vector Machine with PSO (SVM-PSO) classification algorithm that can be used to group the forums into two clusters forming hotspot forums and non-hotspot forums within the current time span. The proposed system accuracy is compared with the other classification algorithms such as Naïve Bayes, Decision tree and SVM. The experiment helps to identify that K-means and SVM-PSO together achieve highly consistent results.
A Survey on Sentiment Categorization of Movie ReviewsEditor IJMTER
Sentiment categorization is a process of mining user generated text content and determine
the sentiment of the users towards that particular thing. It is the approach of detecting the sentiment of
the author in regard to some topics. It also known as sentiment detection, sentiment analysis and opinion
mining. It is very useful for movie production companies that interested in knowing how users feel
about their movies. For example word “excellent” indicates that the review gives positive emotion about
particular movie. The same applies to movies, songs, cars, holiday destinations, Political parties, social
network sites, web blogs, discussion forum and so on. Sentiment categorization can be carried out by
using three approaches. First, Supervised machine learning based text classifier on Naïve Bayes,
Maximum Entropy, SVM, kNN classifier, hidden marcov model. Second, Unsupervised Semantic
Orientation scheme of extracting relevant N-grams of the text and then labelling. Third, SentiWordNet
based publicly available library.
Neural perceptual model to global local vision for the recognition of the log...ijaia
This paper gives the definition of Transparent Neural Network “TNN” for the simulation of the globallocal
vision and its application to the segmentation of administrative document image. We have developed
and have adapted a recognition method which models the contextual effects reported from studies in
experimental psychology. Then, we evaluated and tested the TNN and the multi-layer perceptron “MLP”,
which showed its effectiveness in the field of the recognition, in order to show that the TNN is clearer for
the user and more powerful on the level of the recognition. Indeed, the TNN is the only system which makes
it possible to recognize the document and its structure.
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
Speaker identification is one of the most important technologies nowadays. Many fields such as
bioinformatics and security are using speaker identification. Also, almost all electronic devices are using
this technology too. Based on number of text, speaker identification divided into text dependent and text
independent. On many fields, text independent is mostly used because number of text is unlimited. So, text
independent is generally more challenging than text dependent. In this research, speaker identification text
independent with Indonesian speaker data was modelled with Vector Quantization (VQ). In this research
VQ with K-Means initialization was used. K-Means clustering also was used to initialize mean and
Hierarchical Agglomerative Clustering was used to identify K value for VQ. The best VQ accuracy was
59.67% when k was 5. According to the result, Indonesian language could be modelled by VQ. This
research can be developed using optimization method for VQ parameters such as Genetic Algorithm or
Particle Swarm Optimization.
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. The research of clustering task is to be performed prior to its adaptation in the text environment. Conventional approaches typically emphasized on the quantitative information where the selected features are numbers. Efforts also have been put forward for achieving efficient clustering in the context of categorical information where the selected features can assume nominal values. This manuscript presents an in-depth analysis of challenges of clustering in the text environment. Further, this paper also details prominent models proposed for clustering along with the pros and cons of each model. In addition, it also focuses on various latest developments in the clustering task in the social network and associated environments.
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION cscpconf
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text
classification. In this paper, Fast Fuzzy Feature clustering for text classification is proposed. It
is based on the framework proposed by Jung-Yi Jiang, Ren-Jia Liou and Shie-Jue Lee in 2011.
The word in the feature vector of the document is grouped into the cluster in less iteration. The
numbers of iterations required to obtain cluster centers are reduced by transforming clusters
center dimension from n-dimension to 2-dimension. Principle Component Analysis with slit
change is used for dimension reduction. Experimental results show that, this method improve
the performance by significantly reducing the number of iterations required to obtain the cluster
center. The same is being verified with three benchmark datasets
REPRESENTATION OF UNCERTAIN DATA USING POSSIBILISTIC NETWORK MODELScscpconf
Uncertainty is a pervasive in real world environment due to vagueness, is associated with the
difficulty of making sharp distinctions and ambiguity, is associated with situations in which the
choices among several precise alternatives cannot be perfectly resolved. Analysis of large
collections of uncertain data is a primary task in the real world applications, because data is
incomplete, inaccurate and inefficient. Representation of uncertain data in various forms such
as Data Stream models, Linkage models, Graphical models and so on, which is the most simple,
natural way to process and produce the optimized results through Query processing. In this
paper, we propose the Uncertain Data model can be represented as Possibilistic data model
and vice versa for the process of uncertain data using various data models such as possibilistic
linkage model, Data streams, Possibilistic Graphs. This paper presents representation and
process of Possiblistic Linkage model through Possible Worlds with the use of product-based
operator.
EXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATIONijaia
In natural language processing, attention mechanism in neural networks are widely utilized. In this paper, the research team explore a new mechanism of extending output attention in recurrent neural networks for dialog systems. The new attention method was compared with the current method in generating dialog sentence using a real dataset. Our architecture exhibits several attractive properties such as better handle long sequences and, it could generate more reasonable replies in many cases.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
In order to solve the complex decision-making problems, there are many approaches and systems based on the fuzzy theory were proposed. In 1998, Smarandache introduced the concept of single-valued neutrosophic set as a complete development of fuzzy theory. In this paper, we research on the distance measure between single-valued neutrosophic sets based on the H-max measure of Ngan et al. [8]. The proposed measure is also a distance measure between picture fuzzy sets which was introduced by Cuong in 2013 [15]. Based on the proposed measure, an Adaptive Neuro Picture Fuzzy Inference System (ANPFIS) is built and applied to the decision making for the link states in interconnection networks. In experimental evaluation on the real datasets taken from the UPV (Universitat Politècnica de València) university, the performance of the proposed model is better than that of the related fuzzy methods.
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHijwscjournal
As the web is increasing exponentially, so it is very much difficult to provide relevant information to the information seekers. While searching some information on the web, users can easily fade out in rich hypertext. The existing techniques provide the results that are not up to the mark. This paper focuses on the technique that helps in offering more accurate results, especially in case of Homographs. Homograph is a word that shares the same written form but has different meanings. The technique that shows how senses of words can play an important role in offering accurate search results, is described in following sections. While adopting this technique user can receive only relevant pages on the top of the search result.
Novel text categorization by amalgamation of augmented k nearest neighbourhoo...ijcsity
Machine learning for text classification is the
underpinning
of document
cataloging
, news filtering,
document
steering
and
exemplif
ication
. In text mining realm, effective feature selection is significant to
make the learning task more accurate and competent. One of the
traditional
lazy
text classifier
k
-
Nearest
Neighborhood (
k
NN) has
a
major pitfall in calculating the similarity between
all
the
objects in training and
testing se
t
s,
there by leads to exaggeration of
both
computational complexity
of the algorithm
and
massive
consumption
of
main memory
. To diminish these shortcomings
in
viewpoint
of a
data
-
mining
practitioner
a
n
amalgamati
ve technique is proposed in this paper using
a novel restructured version of
k
NN called
Augmented
k
NN
(AkNN)
and
k
-
Medoids
(kMdd)
clustering.
The proposed work
comprises
preprocesses
on
the
initial training
set
by
imposing
attribute feature selection
for reduc
tion of high dimensionality, also it
detects and excludes the high
-
fliers
samples
in t
he
initial
training set
and
re
structure
s
a
constricted
training
set
.
The kMdd clustering algorithm generates the cluster centers (as interior objects) for each category
and
restructures
the constricted training set
with centroids
. This technique
is
amalgamated with
AkNN
classifier
that
was prearranged with
text mining similarity measure
s.
Eventually, s
ignifican
tweights
and ranks were
assigned to each object in the new
training set based upon the
ir
accessory towards the
object in testing set
.
Experiments
conducted
on Reuters
-
21578 a
UCI benchmark
text mining
data
set
, and
comparisons with
traditional
k
NN
classifier designates
the
referred
method
yield
spreeminentrecital
in b
oth clustering and
classification
MODELLING OF INTELLIGENT AGENTS USING A–PROLOGijaia
Nowadays, research in artificial intelligence has widely grown in areas such as knowledge representation,
goal-directed behaviour and knowledge reusability, all of them directly relevant to improving intelligent
agents in computer games. In a particular way, we focus on the development of a novel algorithm that
allow an agent to combine fundamental theories such as reasoning, learning and simulation. This
algorithm combines a system of logical rules and a simulation mechanism based on learning that makes
our agent has an infallible mechanism for decision-making into the game “connect four”. The logic system
is developed in a modelling language known as Answer Set Programming or A–Prolog. This paradigm is
the integration of two well-known languages, namely Prolog and ASP.
The project re-implements the architecture of the paper Reasoning with Neural Tensor Networks for Knowledge Base Completion in Torch framework, achieving similar accuracy results with an elegant implementation in a modern language.
Below are some links for further details:
https://github.com/agarwal-shubham/Reasoning-Over-Knowledge-Base
http://darsh510.github.io/IREPROJ/
A scalable, lexicon based technique for sentiment analysisijfcstjournal
Rapid increase in the volume of sentiment rich social media on the web has resulted in an increased
interest among researchers regarding Sentimental Analysis and opinion mining. However, with so much
social media available on the web, sentiment analysis is now considered as a big data task. Hence the
conventional sentiment analysis approaches fails to efficiently handle the vast amount of sentiment data
available now a days. The main focus of the research was to find such a technique that can efficiently
perform sentiment analysis on big data sets. A technique that can categorize the text as positive, negative
and neutral in a fast and accurate manner. In the research, sentiment analysis was performed on a large
data set of tweets using Hadoop and the performance of the technique was measured in form of speed and
accuracy. The experimental results shows that the technique exhibits very good efficiency in handling big
sentiment data sets.
The main concept of neutrosophy is that any idea has not only a certain degree of truth but also a degree of falsity and indeterminacy in its own right. Although there are many applications of neutrosophy in different disciplines, the incorporation of its logic in education and psychology is rather scarce compared to other fields. In this study, the Satisfaction with Life Scale was converted into the neutrosophic form and the results were compared in terms of confirmatory analysis by convolutional neural networks. To sum up, two different formulas are proposed at the end of the study to determine the validity of any scale in terms of neutrosophy. While the Lawshe methodology concentrates on the dominating opinions of experts limited by a one-dimensional data space analysis, it should be advocated that the options can be placed in three-dimensional data space in the neutrosophic analysis . The effect may be negligible for a small number of items and participants, but it may create enormous changes for a large number of items and participants. Secondly, the degree of freedom of Lawshe technique is only 1 in 3D space, whereas the degree of freedom of neutrosophical scale is 3, so researchers have to employ three separate parameters of 3D space in neutrosophical scale while a resarcher is restricted in a 1D space in Lawshe technique in 3D space. The third distinction relates to the analysis of statistics. The Lawhe technical approach focuses on the experts' ratio of choices, whereas the importance and correlation level of each item for the analysis in neutrosophical logic are analysed. The fourth relates to the opinion of experts. The Lawshe technique is focused on expert opinions, yet in many ways the word expert is not defined. In a neutrosophical scale, however, researchers primarily address actual participants in order to understand whether the item is comprehended or opposed to or is imprecise. In this research, an alternative technique is presented to construct a valid scale in which the scale first is transformed into a neutrosophical one before being compared using neural networks. It may be concluded that each measuring scale is used for the desired aim to evaluate how suitable and representative the measurements obtained are so that its content validity can be evaluated.
PSO OPTIMIZED INTERVAL TYPE-2 FUZZY DESIGN FOR ELECTIONS RESULTS PREDICTIONijfls
Interval type-2 fuzzy logic systems (IT2FLSs), have recently shown great potential in various applications with dynamic uncertainties. It is believed that additional degree of uncertainty provided by IT2FL allows for better representation of the uncertainty and vagueness present in prediction models. However, determining the parameters of the membership functions of IT2FL is important for providing optimum
performance of the system. Particle Swarm Optimization (PSO) has attracted the interest of researchers due to their simplicity, effectiveness and efficiency in solving real-world optimization problems. In this paper, a novel optimal IT2FLS is designed, applied for predicting winning chances in elections. PSO is
used as an optimized algorithm to tune the parameter of the primary membership function of the IT2FL to improve the performance and increase the accuracy of the IT2F set. Simulation results show the superiority of the PSO-IT2FL to the similar non-optimal IT2FL system with an increase in the prediction.
A Survey on Sentiment Categorization of Movie ReviewsEditor IJMTER
Sentiment categorization is a process of mining user generated text content and determine
the sentiment of the users towards that particular thing. It is the approach of detecting the sentiment of
the author in regard to some topics. It also known as sentiment detection, sentiment analysis and opinion
mining. It is very useful for movie production companies that interested in knowing how users feel
about their movies. For example word “excellent” indicates that the review gives positive emotion about
particular movie. The same applies to movies, songs, cars, holiday destinations, Political parties, social
network sites, web blogs, discussion forum and so on. Sentiment categorization can be carried out by
using three approaches. First, Supervised machine learning based text classifier on Naïve Bayes,
Maximum Entropy, SVM, kNN classifier, hidden marcov model. Second, Unsupervised Semantic
Orientation scheme of extracting relevant N-grams of the text and then labelling. Third, SentiWordNet
based publicly available library.
Neural perceptual model to global local vision for the recognition of the log...ijaia
This paper gives the definition of Transparent Neural Network “TNN” for the simulation of the globallocal
vision and its application to the segmentation of administrative document image. We have developed
and have adapted a recognition method which models the contextual effects reported from studies in
experimental psychology. Then, we evaluated and tested the TNN and the multi-layer perceptron “MLP”,
which showed its effectiveness in the field of the recognition, in order to show that the TNN is clearer for
the user and more powerful on the level of the recognition. Indeed, the TNN is the only system which makes
it possible to recognize the document and its structure.
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
Speaker identification is one of the most important technologies nowadays. Many fields such as
bioinformatics and security are using speaker identification. Also, almost all electronic devices are using
this technology too. Based on number of text, speaker identification divided into text dependent and text
independent. On many fields, text independent is mostly used because number of text is unlimited. So, text
independent is generally more challenging than text dependent. In this research, speaker identification text
independent with Indonesian speaker data was modelled with Vector Quantization (VQ). In this research
VQ with K-Means initialization was used. K-Means clustering also was used to initialize mean and
Hierarchical Agglomerative Clustering was used to identify K value for VQ. The best VQ accuracy was
59.67% when k was 5. According to the result, Indonesian language could be modelled by VQ. This
research can be developed using optimization method for VQ parameters such as Genetic Algorithm or
Particle Swarm Optimization.
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. The research of clustering task is to be performed prior to its adaptation in the text environment. Conventional approaches typically emphasized on the quantitative information where the selected features are numbers. Efforts also have been put forward for achieving efficient clustering in the context of categorical information where the selected features can assume nominal values. This manuscript presents an in-depth analysis of challenges of clustering in the text environment. Further, this paper also details prominent models proposed for clustering along with the pros and cons of each model. In addition, it also focuses on various latest developments in the clustering task in the social network and associated environments.
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION cscpconf
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text
classification. In this paper, Fast Fuzzy Feature clustering for text classification is proposed. It
is based on the framework proposed by Jung-Yi Jiang, Ren-Jia Liou and Shie-Jue Lee in 2011.
The word in the feature vector of the document is grouped into the cluster in less iteration. The
numbers of iterations required to obtain cluster centers are reduced by transforming clusters
center dimension from n-dimension to 2-dimension. Principle Component Analysis with slit
change is used for dimension reduction. Experimental results show that, this method improve
the performance by significantly reducing the number of iterations required to obtain the cluster
center. The same is being verified with three benchmark datasets
REPRESENTATION OF UNCERTAIN DATA USING POSSIBILISTIC NETWORK MODELScscpconf
Uncertainty is a pervasive in real world environment due to vagueness, is associated with the
difficulty of making sharp distinctions and ambiguity, is associated with situations in which the
choices among several precise alternatives cannot be perfectly resolved. Analysis of large
collections of uncertain data is a primary task in the real world applications, because data is
incomplete, inaccurate and inefficient. Representation of uncertain data in various forms such
as Data Stream models, Linkage models, Graphical models and so on, which is the most simple,
natural way to process and produce the optimized results through Query processing. In this
paper, we propose the Uncertain Data model can be represented as Possibilistic data model
and vice versa for the process of uncertain data using various data models such as possibilistic
linkage model, Data streams, Possibilistic Graphs. This paper presents representation and
process of Possiblistic Linkage model through Possible Worlds with the use of product-based
operator.
EXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATIONijaia
In natural language processing, attention mechanism in neural networks are widely utilized. In this paper, the research team explore a new mechanism of extending output attention in recurrent neural networks for dialog systems. The new attention method was compared with the current method in generating dialog sentence using a real dataset. Our architecture exhibits several attractive properties such as better handle long sequences and, it could generate more reasonable replies in many cases.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
In order to solve the complex decision-making problems, there are many approaches and systems based on the fuzzy theory were proposed. In 1998, Smarandache introduced the concept of single-valued neutrosophic set as a complete development of fuzzy theory. In this paper, we research on the distance measure between single-valued neutrosophic sets based on the H-max measure of Ngan et al. [8]. The proposed measure is also a distance measure between picture fuzzy sets which was introduced by Cuong in 2013 [15]. Based on the proposed measure, an Adaptive Neuro Picture Fuzzy Inference System (ANPFIS) is built and applied to the decision making for the link states in interconnection networks. In experimental evaluation on the real datasets taken from the UPV (Universitat Politècnica de València) university, the performance of the proposed model is better than that of the related fuzzy methods.
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHijwscjournal
As the web is increasing exponentially, so it is very much difficult to provide relevant information to the information seekers. While searching some information on the web, users can easily fade out in rich hypertext. The existing techniques provide the results that are not up to the mark. This paper focuses on the technique that helps in offering more accurate results, especially in case of Homographs. Homograph is a word that shares the same written form but has different meanings. The technique that shows how senses of words can play an important role in offering accurate search results, is described in following sections. While adopting this technique user can receive only relevant pages on the top of the search result.
Novel text categorization by amalgamation of augmented k nearest neighbourhoo...ijcsity
Machine learning for text classification is the
underpinning
of document
cataloging
, news filtering,
document
steering
and
exemplif
ication
. In text mining realm, effective feature selection is significant to
make the learning task more accurate and competent. One of the
traditional
lazy
text classifier
k
-
Nearest
Neighborhood (
k
NN) has
a
major pitfall in calculating the similarity between
all
the
objects in training and
testing se
t
s,
there by leads to exaggeration of
both
computational complexity
of the algorithm
and
massive
consumption
of
main memory
. To diminish these shortcomings
in
viewpoint
of a
data
-
mining
practitioner
a
n
amalgamati
ve technique is proposed in this paper using
a novel restructured version of
k
NN called
Augmented
k
NN
(AkNN)
and
k
-
Medoids
(kMdd)
clustering.
The proposed work
comprises
preprocesses
on
the
initial training
set
by
imposing
attribute feature selection
for reduc
tion of high dimensionality, also it
detects and excludes the high
-
fliers
samples
in t
he
initial
training set
and
re
structure
s
a
constricted
training
set
.
The kMdd clustering algorithm generates the cluster centers (as interior objects) for each category
and
restructures
the constricted training set
with centroids
. This technique
is
amalgamated with
AkNN
classifier
that
was prearranged with
text mining similarity measure
s.
Eventually, s
ignifican
tweights
and ranks were
assigned to each object in the new
training set based upon the
ir
accessory towards the
object in testing set
.
Experiments
conducted
on Reuters
-
21578 a
UCI benchmark
text mining
data
set
, and
comparisons with
traditional
k
NN
classifier designates
the
referred
method
yield
spreeminentrecital
in b
oth clustering and
classification
MODELLING OF INTELLIGENT AGENTS USING A–PROLOGijaia
Nowadays, research in artificial intelligence has widely grown in areas such as knowledge representation,
goal-directed behaviour and knowledge reusability, all of them directly relevant to improving intelligent
agents in computer games. In a particular way, we focus on the development of a novel algorithm that
allow an agent to combine fundamental theories such as reasoning, learning and simulation. This
algorithm combines a system of logical rules and a simulation mechanism based on learning that makes
our agent has an infallible mechanism for decision-making into the game “connect four”. The logic system
is developed in a modelling language known as Answer Set Programming or A–Prolog. This paradigm is
the integration of two well-known languages, namely Prolog and ASP.
The project re-implements the architecture of the paper Reasoning with Neural Tensor Networks for Knowledge Base Completion in Torch framework, achieving similar accuracy results with an elegant implementation in a modern language.
Below are some links for further details:
https://github.com/agarwal-shubham/Reasoning-Over-Knowledge-Base
http://darsh510.github.io/IREPROJ/
A scalable, lexicon based technique for sentiment analysisijfcstjournal
Rapid increase in the volume of sentiment rich social media on the web has resulted in an increased
interest among researchers regarding Sentimental Analysis and opinion mining. However, with so much
social media available on the web, sentiment analysis is now considered as a big data task. Hence the
conventional sentiment analysis approaches fails to efficiently handle the vast amount of sentiment data
available now a days. The main focus of the research was to find such a technique that can efficiently
perform sentiment analysis on big data sets. A technique that can categorize the text as positive, negative
and neutral in a fast and accurate manner. In the research, sentiment analysis was performed on a large
data set of tweets using Hadoop and the performance of the technique was measured in form of speed and
accuracy. The experimental results shows that the technique exhibits very good efficiency in handling big
sentiment data sets.
The main concept of neutrosophy is that any idea has not only a certain degree of truth but also a degree of falsity and indeterminacy in its own right. Although there are many applications of neutrosophy in different disciplines, the incorporation of its logic in education and psychology is rather scarce compared to other fields. In this study, the Satisfaction with Life Scale was converted into the neutrosophic form and the results were compared in terms of confirmatory analysis by convolutional neural networks. To sum up, two different formulas are proposed at the end of the study to determine the validity of any scale in terms of neutrosophy. While the Lawshe methodology concentrates on the dominating opinions of experts limited by a one-dimensional data space analysis, it should be advocated that the options can be placed in three-dimensional data space in the neutrosophic analysis . The effect may be negligible for a small number of items and participants, but it may create enormous changes for a large number of items and participants. Secondly, the degree of freedom of Lawshe technique is only 1 in 3D space, whereas the degree of freedom of neutrosophical scale is 3, so researchers have to employ three separate parameters of 3D space in neutrosophical scale while a resarcher is restricted in a 1D space in Lawshe technique in 3D space. The third distinction relates to the analysis of statistics. The Lawhe technical approach focuses on the experts' ratio of choices, whereas the importance and correlation level of each item for the analysis in neutrosophical logic are analysed. The fourth relates to the opinion of experts. The Lawshe technique is focused on expert opinions, yet in many ways the word expert is not defined. In a neutrosophical scale, however, researchers primarily address actual participants in order to understand whether the item is comprehended or opposed to or is imprecise. In this research, an alternative technique is presented to construct a valid scale in which the scale first is transformed into a neutrosophical one before being compared using neural networks. It may be concluded that each measuring scale is used for the desired aim to evaluate how suitable and representative the measurements obtained are so that its content validity can be evaluated.
PSO OPTIMIZED INTERVAL TYPE-2 FUZZY DESIGN FOR ELECTIONS RESULTS PREDICTIONijfls
Interval type-2 fuzzy logic systems (IT2FLSs), have recently shown great potential in various applications with dynamic uncertainties. It is believed that additional degree of uncertainty provided by IT2FL allows for better representation of the uncertainty and vagueness present in prediction models. However, determining the parameters of the membership functions of IT2FL is important for providing optimum
performance of the system. Particle Swarm Optimization (PSO) has attracted the interest of researchers due to their simplicity, effectiveness and efficiency in solving real-world optimization problems. In this paper, a novel optimal IT2FLS is designed, applied for predicting winning chances in elections. PSO is
used as an optimized algorithm to tune the parameter of the primary membership function of the IT2FL to improve the performance and increase the accuracy of the IT2F set. Simulation results show the superiority of the PSO-IT2FL to the similar non-optimal IT2FL system with an increase in the prediction.
PSO OPTIMIZED INTERVAL TYPE-2 FUZZY DESIGN FOR ELECTIONS RESULTS PREDICTIONijfls
Interval type-2 fuzzy logic systems (IT2FLSs), have recently shown great potential in various applications
with dynamic uncertainties. It is believed that additional degree of uncertainty provided by IT2FL allows
for better representation of the uncertainty and vagueness present in prediction models. However,
determining the parameters of the membership functions of IT2FL is important for providing optimum
performance of the system. Particle Swarm Optimization (PSO) has attracted the interest of researchers
due to their simplicity, effectiveness and efficiency in solving real-world optimization problems. In this
paper, a novel optimal IT2FLS is designed, applied for predicting winning chances in elections. PSO is
used as an optimized algorithm to tune the parameter of the primary membership function of the IT2FL to
improve the performance and increase the accuracy of the IT2F set. Simulation results show the superiority
of the PSO-IT2FL to the similar non-optimal IT2FL system with an increase in the prediction.
Sentiment classification aims to detect information such as opinions, explicit , implicit feelings expressed
in text. The most existing approaches are able to detect either explicit expressions or implicit expressions of
sentiments in the text separately. In this proposed framework it will detect both Implicit and Explicit
expressions available in the meeting transcripts. It will classify the Positive, Negative, Neutral words and
also identify the topic of the particular meeting transcripts by using fuzzy logic. This paper aims to add
some additional features for improving the classification method. The quality of the sentiment classification
is improved using proposed fuzzy logic framework .In this fuzzy logic it includes the features like Fuzzy
rules and Fuzzy C-means algorithm.The quality of the output is evaluated using the parameters such as
precision, recall, f-measure. Here Fuzzy C-means Clustering technique measured in terms of Purity and
Entropy. The data set was validated using 10-fold cross validation method and observed 95% confidence
interval between the accuracy values .Finally, the proposed fuzzy logic method produced more than 85 %
accurate results and error rate is very less compared to existing sentiment classification techniques.
Current trends of opinion mining and sentiment analysis in social networkseSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS...mathsjournal
For one dimensional homogeneous, isotropic aquifer, without accretion the governing Boussinesq
equation under Dupuit assumptions is a nonlinear partial differential equation. In the present paper
approximate analytical solution of nonlinear Boussinesq equation is obtained using Homotopy
perturbation transform method(HPTM). The solution is compared with the exact solution. The
comparison shows that the HPTM is efficient, accurate and reliable. The analysis of two important aquifer
parameters namely viz. specific yield and hydraulic conductivity is studied to see the effects on the height
of water table. The results resemble well with the physical phenomena.
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSISmlaij
Sentiment analysis and Opinion mining has emerged as a popular and efficient technique for information retrieval and web data analysis. The exponential growth of the user generated content has opened new horizons for research in the field of sentiment analysis. This paper proposes a model for sentiment analysis of movie reviews using a combination of natural language processing and machine learning approaches. Firstly, different data pre-processing schemes are applied on the dataset. Secondly, the behaviour of twoclassifiers, Naive Bayes and SVM, is investigated in combination with different feature selection schemes to
obtain the results for sentiment analysis. Thirdly, the proposed model for sentiment analysis is extended to
obtain the results for higher order n-grams.
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
Theimmense volumes of data are populated into repositories from various applications. In order to find out desired information and knowledge from large datasets, the data mining techniques are very much helpful. Classification is one of the knowledge discovery techniques. In Classification, Decision trees are very popular in research community due to simplicity and easy comprehensibility. This paper presentsan updated review of recent developments in the field of decision trees.
Graph-based Analysis and Opinion Mining in Social NetworkKhan Mostafa
This is the final report for Networks & Data Mining Techniques project focusing on mining social network to estimate public opinion about entities and associated keywords. This project mines Twitter for recent feeds and analyzes them to estimate sentiment score, discussed entity and describing keywords in each tweet. This data is then exploited to elicit overall sentiment associated with each entity. Entities and keywords extracted is also used to form an entity-keyword bigraph. This graph is further used to detect entity communities and keywords found within those communities. Presented implementation works in linear time.
An overview on data mining designed for imbalanced datasetseSAT Journals
Abstract The imbalanced datasets with the classifying categories are not around equally characterized. A problem in imbalanced dataset occurs in categorization, where the amount of illustration of single class will be greatly lesser than the illustrations of the previous classes. Current existence brought improved awareness during implementation of machine learning methods to complex real world exertion, which is considered by several through imbalanced data. In machine learning the imbalanced datasets has become a critical problem and also usually found in many implementation such as detection of fraudulent calls, bio-medical, engineering, remote-sensing, computer society and manufacturing industries. In order to overcome the problems several approaches have been proposed. In this paper a study on Imbalanced dataset problem and examine various sampling method utilized in favour of evaluation of the datasets, moreover the interpretation methods are further suitable for imbalanced datasets mining. Keywords: Imbalance Problems, Imbalanced datasets, sampling strategies, Machine Learning.
Depression Detection in Tweets using Logistic Regression Modelijtsrd
In the growing world of modernization, mental health issues like depression, anxiety and stress are very normal among people and social media like Facebook, Instagram and Twitter have boosted the growth of such mental health. Everything has its legitimacy and negative mark. During this pandemic, people are more likely to suffer from mental health issues, they are available 24 7 and are cut off from the real world. Past examinations have shown that individuals who invest more energy via online media are bound to be depressed. In this project, we find out people who are depressed based on their tweets, followers, following and many other factors. For this, I have trained and tested our text classifier, which will distinguish between the user who is depressed or not depressed. Rahul Kumar Sharma | Vijayakumar A "Depression Detection in Tweets using Logistic Regression Model" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-4 , June 2021, URL: https://www.ijtsrd.compapers/ijtsrd41284.pdf Paper URL: https://www.ijtsrd.comcomputer-science/data-miining/41284/depression-detection-in-tweets-using-logistic-regression-model/rahul-kumar-sharma
Presentation Understanding Research MethodologyIn conducting s.docxChantellPantoja184
Presentation
Understanding Research Methodology
In conducting social science research, the social scientist seeks to understand, and in turn explain, the world in which he or she lives. Rather than simply rely on what they observe and apply assumptions, beliefs, or general guesses to explain observations, social scientists approach this endeavor for an increased understanding using a systematic scientific method. Social scientists in the fields of homeland security, emergency management, and many others take this approach because it is their ultimate intention to go beyond their own personal understanding of why things happen. They want to inform others of these explanations and contribute to a greater body of knowledge. The purpose of developing, testing, and refining explanations for what is observed is to ultimately predict future behaviors or prescribe potential remedies for negative conduct in the form of policies.
Research methodology is comprised of the approaches, designs, plans, methods, and tools or instruments scientists will use to conduct their exploration. Remember that social science includes studying phenomena and activities related to emergency management, criminal justice, and homeland security. Consider an example to help understand this need for a systematic approach to studying your surroundings to devise a strategy or policy.
In this example, a planner known as Officer Lightly works in a local law enforcement department and is directed to develop a community policing plan with the intent to solicit and incorporate the assistance of citizens in reducing the annual number of property crimes each year. The former planner, Officer Grimly, had planned to develop a program based on his own beliefs about what would work. Officer Grimly simply briefed and published the plan to his department's leadership and then moved on to his next assignment. However, Officer Lightly is familiar with the scientific process and understands its value for tackling social science projects. Officer Lightly determines there is a wide assortment of objectives he might pursue, but he knows he needs to first start with a specific research question and then develop and test a hypothesis. Depending on the findings from his test of the hypothesis, he may proceed in his original direction or decide to take a different course.
Officer Lightly decides to craft two research questions and at least one hypothesis for each. He has formulated the following:
· Research Question 1 (R1): Where in the community do property crimes occur in the largest concentrations?
· Hypothesis 1 for R1: If an area in the community is low income, property crimes are higher.
· Research Question 2 (R2): What are citizens in areas of high crime currently doing in response to, or to protect against, property crimes?
· Hypothesis 1 for R2: If citizens act purposively to prevent property crime, they will not be victims of property crime.
Measuring Phenomena
In examining Officer Light.
Analysis of the Content course of Standard VI to VIII (Tamil, English, Mathematics, Science and Social science) Text Books prescribed by Government of Tamil Nadu, and content course of standard IX - X ( for UG) , XI – XII (for PG) Computer Science Text Books Prescribed by Government of Tamil Nadu.
Similar to Importance of the neutral category in fuzzy clustering of sentiments (20)
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Importance of the neutral category in fuzzy clustering of sentiments
1. International Journal of Fuzzy Logic Systems (IJFLS) Vol.4, No.2, April 2014
DOI : 10.5121/ijfls.2014.4201 1
IMPORTANCE OF THE NEUTRAL CATEGORY IN FUZZY
CLUSTERING OF SENTIMENTS
Lawrence Nderu1
, Nicolas Jouandeau2
and Herman Akdag2
1
Jomo Kenyatta University of Agriculture and Technology, Kenya
2
University of Paris 8-LIASD, France,
ABSTRACT
Social media is said to have an impact on the public discourse and communication in the society. It is in-
creasingly being used in the political context. Social networks sites such as Facebook, Twitter and other
microblogging services provide an opportunity for public to give opinions about some issues of interest.
Twitter is an ideal platform for users to spread not only information in general but also political opinions,
whereas Facebook provides the capability for direct dialogs. A lot of studies have shown that a need exists
for stakeholders to collect, monitor, analyze, summarize and visualize these social media views. Some au-
thors have tended to categorize these comments as either positive or negative ignoring the neutral cate-
gory. In this paper, we demonstrate the importance of the neutral category in the clustering of sentiments
from the social media. We then demonstrate the use of fuzzy clustering for this kind of task.
KEYWORDS
Sentiments Analysis, Fuzzy Clustering, Social Media, Neutral Category, Unsupervised Classification
1. Introduction
Clustering is a type of unsupervised learning that automatically forms clusters of similar things
[1]. During cluster analysis the aim is to put similar things in one cluster and dissimilar things in a
different cluster [2]. Several types of clustering have been developed and a lot of literature exists
on clustering methods. In fuzzy clustering, data elements can belong to more than one cluster and
associated with each of the data points are membership grades which indicate the degree to which
a data point belongs to the different clusters [3].
Sentiments found within comments, feedback or critiques provide useful indicators for many
different purposes. Sentiment analysis provides policy makers and politicians with an opportunity
to estimate the public sentiments with respect to policies, public services or political issues [4].
With the massive growth of the social media a lot of outlets for publishing one’s opinion (s) exist.
A number of authors have considered sentiments analysis as a two class question, i.e. positive or
negative [5]. We assert the importance of treating neutrals as a fully qualified category.
The rest of the paper is organized as follows: first a discussion of related work on public senti-
ment analysis or opinion mining is presented followed by a discussion of clustering algorithms.
We then present our model in this study and then conclusion and future work.
2. International Journal of Fuzzy Logic Systems (IJFLS) Vol.4, No.2, April 2014
2
2. Related Work
2.1 Sentiments Analysis
Sentiment analysis has attracted a lot of interest in the recent past. An increasing number of scien-
tists are tackling the task of automated sentiments detection and classification. This has resulted
in a computing domain called effective computing [4]. More and more users are posting about
products and services they use, or expressing their political and religious views on issues. Such
data can be efficiently used for marketing, making political decisions or social studies.
The computational treatment of opinion, sentiment and subjectivity has recently attracted a great
deal of attention [6],[7]. This is in part due to the potential of application areas [8][7]. In senti-
ment analysis neutrality is handled in various ways, this depends on the technique that is being
used [9]. A number of papers tend to ignore the neutral category under the assumption that neu-
tral texts lie near the boundary of the binary classifier. It is also assumed that not much can be
learnt from neutral texts comparing to the ones with clear positive or negative sentiment [9],[4].
Sentiment about a subject is the orientation (or polarity of the opinion on the subject that deviates
from the neutral state [10]. The reasoning behind having the neutral sentiment as a category on its
own is based on the fact that some opinions are facts. Thus, the statement “the song was well
done” has a positive polarity. The statement “the song fails to impress” has a negative polarity,
but the statement “the song is in the higher note” has a neutral polarity and indeed provides fur-
ther information about the song. It is therefore not uncommon to have statements that are neutral.
2.2 Clustering
Clustering is a technique used in unsupervised learning. The aim is to cluster similar data points
in one cluster and dissimilar points in a different group. Most clustering algorithms do not rely on
assumptions common to conventional statistical methods, such as the underlying statistical distri-
bution of data. This means that clustering algorithms are useful in situations where little prior
knowledge about the distribution of the data is known [2]. Since clusters can formally be seen as
subsets of data set, one possible classification of clustering methods can be according to whether
the subsets are fuzzy or hard.
Hard clustering methods are based on classical set theory and require that an object either does or
does not belong to a cluster. Hard clustering involves partitioning the data into specified number
of mutually exclusive subsets. Fuzzy clustering methods, however allow the objects to belong to
several clusters simultaneously, with different degrees of membership. In many situations, fuzzy
clustering is more natural than hard clustering. Objects on the boundaries between several classes
are not forced to fully belong to one of the classes, but rather are assigned membership degrees 0
and 1 indicating their partial membership [2].
The potential of clustering algorithms to reveal the underlying structure in data can be exploited
in a wide variety of applications [11]. Clustering techniques can be applied to data that is quanti-
tative (numerical), qualitative (categorical) or a mixture of both. The data is typically observa-
tions of some physical process. Each observation consists of n measured variables, grouped into
an n-dimensional column vector Zk = [z1k, z2k, …,znk]T , Zk n . A set of N observation is
denoted by Z = {Zk | k=1, 2, ..., N } and is represented by an n X N matrix. Various definitions of
a cluster can be formulated, depending on the objective of clustering. The discussion on hard
clustering is covered by [12][11], below we present a discussion of fuzzy clustering.
3. International Journal of Fuzzy Logic Systems (IJFLS) Vol.4, No.2, April 2014
3
2.2.1 Fuzzy Clustering
The generalization of the hard partition to the fuzzy case follows directly by allowing µik to at-
tain real values in [0, 1]. Conditions for a fuzzy partition matrix are given by[8].
µik∊ [0, 1], 1 ≤ i ≤ c, 1 ≤ k ≤ N
∑ µ݅݇
ୀଵ = 1, 1 ≤ k ≤ N
0 <∑ ߤ݅݇ே
ୀଵ < ܰ, 1 ≤ ݅ ≤ ܿ
The ith row of the fuzzy partition matrix U contains values of the ith membership function of the
fuzzy subset Ai of Z. The total membership of each zk in Z equals one. Considering the range of
values from [0, 1]. A number of fuzzy clustering algorithms exist; below we discuss the fuzzy C-
means clustering.
2.2.1.1 Fuzzy C-Means Clustering
Fuzzy C-means (also called C methods) is an example of fuzzy clustering which allows one point
to belong to one or more clusters [13]. Fuzzy C-means was developed by Dunn and improved by
Bezdek and is frequently used in many areas including pattern recognition [14].
2.3 Neutral class
Neutrality in sentiment analysis is handled differently, by different authors [15]. This mostly de-
pends on the technique that is being used. In lexicon- based techniques then eutrality score of the
words is taken into account in order to either detect neutral opinions or filter them out and enable
algorithms to focus on words with positive and negative sentiment [16][19]. The inclusion of the
neutral class in our study is based on the fact that in sentiment analysis we believe that not every-
thing is positive or negative.
2.3.1. From Computing with Words to Computing with Numbers
Computing in its usual sense is centered on manipulation of numbers and symbols. In contrast,
computing with words is a methodology in which the objects of computation are words and
propositions drawn from a natural language [17]. Computing with words is inspired by the re-
markable human capability to perform a wide variety of physical and mental tasks without any
measurements and any computations [18]. The basic difference between perception and meas-
urements is that measurements are crisp (hard), whereas perception is fuzzy [12]. Many aspects of
different activities in the real world cannot be assessed in a quantitative form, but in a qualitative
one [19]. Fuzzy linguistic modelling has been widely used and has provided very good results, it
deals with qualitative aspects that are presented in qualitative terms by means of linguistic vari-
ables [20].
When sentiments are expressed in general as either positive or negative some important informa-
tion is lost. Since sentiments are expressed using the natural language, we consider the interpreta-
tion of those sentiments using a set of seven terms (more could be used) and their semantics. Fig.
1, demonstrates the computing with words interpretation of numerical values [21]. For sentiments
to be beneficial, it is important that we extract a lot of information from them. For example, con-
sider the answer to the question “how are people responding to this ad campaign”. An answer that
says negative could lead us into changing the whole ad campaign and yet maybe if we knew it is
4. International Journal of Fuzzy Logic Systems (IJFLS) Vol.4, No.2, April 2014
4
at a value of 0.67 (as shown in Figure 1), we could just carry out some improvements and the
sentiments could change.
Classifying sentiments as either positive or negative, is very general and some information could
be lost, adding the category neutral improves, but still does not capture all the information in the
sentiments. Below we demonstrate fuzzy clustering of sentiments.
Fig.1. How positive is a sentiment
2.4 The Proposed Model
A total of 140 negative tweets on the International Criminal Court (ICC), in Kenya were col-
lected. The aim is to propose a model that provides more information than just telling us that the
sentiments were negative. The choice of the tweets was done by a human being. The Cohen’s
Kappa which is used to check for inter-rater reliability [22] was used. Generally, a Kappa value >
0.70 was considered to be a satisfactory agreement. Table 1, shows the results of the rating by 2
experts. A value of 0 shows a sentiment that could be considered neutral, while 1 is a sentiment
that is highly negative. The two rates agreed on 87 tweets, and this are the ones that were tabu-
lated.
Table 1.The results of the Analyzed Tweets
Numerical
values
0 0.17 0.33 0.5 0.67 0.83 1
Count 0 5 3 8 18 24 29
A sentiment could have some parts that are positive, negative or even neutral. Our aim in this
mode of sentiment analysis is to gain all the necessary information from the sentiment. In a 3
dimension plot this could be as shown in figure 2, where horizontal axis= positivity of the senti-
ment, width = negativity of the sentiment and vertical shows the neutrality or the sentiment.
5. International Journal of Fuzzy Logic Systems (IJFLS) Vol.4, No.2, April 2014
5
Fig.2. A 3-d plot of the sentiments values
3. Conclusion
Information found in the text data is either: Objective information (facts) or Subjective informa-
tion (opinions). In this paper we have considered the problem as a three-class classification prob-
lem (positive, neutral or negative), with varying levels of positivity, neutrality or negativity rather
than a two-class problem (positive or negative). The fuzzy clustering which consistsof optimal
grouping of given data points together but in such a way that each data point can be shared by one
or more cluster by belonging partially to another has been used. The idea that our model is pursu-
ing here is that in some cases the sentiment holder could be persuaded to change his/her view if
for example it is not “a very strong negative” sentiment.
In our future work, we want to implement a sentiment analysis system that is maintained auto-
matically with the exact value of how negative, positive or neutral a sentiment is. Such a system
could help opinion leaders for example to know the kind of individual who could be persuaded to
change their sentiments over an issue.
References
[1] A. Jain, M. Murty, and P. Flynn, “Data clustering: a review,” ACM Comput. Surv., 1999.
[2] W. Wang and Y. Zhang, “On fuzzy cluster validity indices,” Fuzzy Sets Syst., vol. 158, no. 19, pp.
2095–2117, Oct. 2007.
[3] S. Krinidis and V. Chatzis, “A robust fuzzy local information C-Means clustering algorithm.,” IEEE
Trans. Image Process., vol. 19, no. 5, pp. 1328–37, May 2010.
[4] F. Dzogang and M. Lesot, “Expressions of graduality for sentiments analysis—A survey,” Fuzzy
Syst. (FUZZ …, 2010.
[5] R. Prabowo and M. Thelwall, “Sentiment analysis: A combined approach,” J. Informetr., vol. 3, no. 2,
pp. 143–157, Apr. 2009.
[6] X. Ding, S. M. Street, B. Liu, and P. S. Yu, “A Holistic Lexicon-Based Approach to Opinion Min-
ing,” 2008.
[7] F. Breu, S. Guggenbichler, and J. Wollmann, “No Title,” Vasa, 2008.
[8] A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig, “Syntactic clustering of the Web,”
Comput. Networks ISDN Syst., vol. 29, no. 8–13, pp. 1157–1166, Sep. 1997.
[9] V. Vryniotis, “The importance of Neutral Class in Sentiment Analysis | DatumBox,” 2013. [Online].
Available: http://blog.datumbox.com/the-importance-of-neutral-class-in-sentiment-analysis/. [Ac-
cessed: 19-Feb-2014].
6. International Journal of Fuzzy Logic Systems (IJFLS) Vol.4, No.2, April 2014
6
[10] J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack, “Sentiment analyzer: extracting sentiments about a
given topic using natural language processing techniques,” Third IEEE Int. Conf. Data Min., pp. 427–
434, 2003.
[11] Y. Sharon, J. Wright, and Y. Ma, “Minimum sum of distances estimator: robustness and stability,”
Am. Control Conf. 2009. …, pp. 524–530, 2009.
[12] R. Zass, “A Unifying Approach to Hard and Probabilistic Clustering 2 . Clustering and Complete
Positivity,” no. 2, 2007.
[13] S. Paper, “A FUZZY HIERARCHICAL CLUSTERING METHOD FOR CLUSTERING
DOCUMENTS BASED ON DYNAMIC CLUSTER CENTERS,” vol. 30, no. 1, pp. 169–172, 2007.
[14] R. L. Cannon, J. V Dave, and J. C. Bezdek, “Efficient Implementation of the Fuzzy c-Means Cluster-
ing Algorithms.,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, no. 2, pp. 248–55, Feb. 1986.
[15] M. Koppel and J. Schler, “The Importance of Neutral Examples for Learning Sentiment,” 2006.
[16] T. Wilson, P. Hoffmann, S. Somasundaran, J. Kessler, J. Wiebe, Y. Choi, C. Cardie, E. Riloff, and S.
Patwardhan, “OpinionFinder : A system for subjectivity analysis,” no. October, pp. 34–35, 2005.
[17] L. a. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning—I,”
Inf. Sci. (Ny)., vol. 8, no. 3, pp. 199–249, Jan. 1975.
[18] G. Bordogna and G. Pasi, “A Fuzzy Linguistic Approach Generalizing Boolean Information Retriev-
al : A Model and Its Evaluation,” JASIS, vol. 44, no. 2, pp. 70–82, 1993.
[19] M. Decision-making, F. Herrera, and L. Martínez, “A Model Based on Linguistic 2-Tuples for Deal-
ing with Multigranular Hierarchical Linguistic Contexts,” vol. 31, no. 2, pp. 227–234, 2001.
[20] L. Nderu, N. Jouandeau, and H. Akdag, “Towards Universal Rating of Online Multimedia Content,”
2013.
[21] F. Herrera, “A 2-tuple fuzzy linguistic representation model for computing with words - Fuzzy Sys-
tems, IEEE Transactions on,” vol. 8, no. 6, pp. 746–752, 2000.
[22] I. R. Application, I. Kappa, F. Yellow-bellied, F. Red-bellied, and R. Cooters, “Cohen ’ s Kappa,” pp.
1–3.