This document proposes a framework for characterizing emotional interactions in social networks to distinguish friends from acquaintances. It collects posts and comments from social networks, develops lexicons to analyze informal language, generates features to assess text subjectivity, trains a model to classify text subjectivity, and uses this to train an SVM model that predicts relationships with 87% accuracy.
A SURVEY OF S ENTIMENT CLASSIFICATION TECHNIQUES USED FOR I NDIAN REGIONA...ijcsa
Sentiment Analysis is a natural language processing
task that extracts sentiment from various text for
ms
and classifies them according to positive, negative
or neutral polarity. It analyzes emotions, feeling
s, and
the attitude of a speaker or a writer towards a con
text. This paper gives comparative study of various
sentiment classification techniques and also discus
ses in detail two main categories of sentiment
classification techniques these are machine based a
nd lexicon based. The paper also presents challenge
s
associated with sentiment analysis along with lexic
al resources available.
A SURVEY OF S ENTIMENT CLASSIFICATION TECHNIQUES USED FOR I NDIAN REGIONA...ijcsa
Sentiment Analysis is a natural language processing
task that extracts sentiment from various text for
ms
and classifies them according to positive, negative
or neutral polarity. It analyzes emotions, feeling
s, and
the attitude of a speaker or a writer towards a con
text. This paper gives comparative study of various
sentiment classification techniques and also discus
ses in detail two main categories of sentiment
classification techniques these are machine based a
nd lexicon based. The paper also presents challenge
s
associated with sentiment analysis along with lexic
al resources available.
Evaluation of Support Vector Machine and Decision Tree for Emotion Recognitio...journalBEEI
In this paper, the performance of Support Vector Machine (SVM) and Decision Tree (DT) in classifying emotions from Malay folklores is presented. This work is the continuation of our storytelling speech synthesis work to add emotions for a more natural storytelling. A total of 100 documents from children short stories are collected and used as the datasets of the text-based emotion recognition experiment. Term Frequency-Inverse Document Frequency (TF-IDF) is extracted from the text documents and classified using SVM and DT. Four types of common emotions, which are happy, angry, fearful and sad are classified using the two classifiers. Results showed that DT outperformed SVM by more than 22.2% accuracy rate. However, the overall emotion recognition is only at moderate rate suggesting an improvement is needed in future work. The accuracy of the emotion recognition should be improved in future studies by using semantic feature extractors or by incorporating deep learning for classification.
Text to Emotion Extraction Using Supervised Machine Learning TechniquesTELKOMNIKA JOURNAL
Proliferation of internet and social media has greatly increased the popularity of text
communication. People convey their sentiment and emotion through text which promotes lively
communication. Consequently, a tremendous amount of emotional text is generated on different social
media and blogs in every moment. This has raised the necessity of automated tool for emotion mining from
text. There are various rule based approaches of emotion extraction form text based on emotion intensity
lexicon. However, creating emotion intensity lexicon is a time consuming and tedious process. Moreover,
there is no hard and fast rule for assigning emotion intensity to words. To solve these difficulties, we
propose a machine learning based approach of emotion extraction from text which relies on annotated
example rather emotion intensity lexicon. We investigated Multinomial Naïve Bayesian (MNB) Classifier,
Artificial Neural Network (ANN) and Support Vector Machine (SVM) for mining emotion from text. In our
setup, SVM outperformed other classifiers with promising accuracy.
Analysis of anaphora resolution system forijitjournal
Anaphora resolution is complex problem in linguistics and has attracted the attention of many researchers.
It is the problem of identifying referents in the discourse. Anaphora Resolution plays an important role in
Natural language processing task. This paper completely emphasis on pronominal anaphora resolution for
English Language in which pronouns refers to the intended noun in discourse. In this paper two
computational models are proposed for anaphora resolution. Resolution of anaphora is based on various
factors among which these models use Recency factor and Animistic Knowledge. Recency factor is
implemented by using Lappin Leass approach in first model and using Centering approach in second
model. Information about animacy is obtained by Gazetteer method. The identification of animistic
elements is employed to improve the accuracy of the system. This paper demonstrates experiment
conducted by both the models on different data sets from different domains. A comparative result of both
the model is summarized and conclusion is drawn for the best suitable model.
The peer-reviewed International Journal of Engineering Inventions (IJEI) is started with a mission to encourage contribution to research in Science and Technology. Encourage and motivate researchers in challenging areas of Sciences and Technology.
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...TELKOMNIKA JOURNAL
Sentiment analysis in short informal texts like product reviews is more challenging. Short texts are
sparse, noisy, and lack of context information. Traditional text classification methods may not be suitable
for analyzing sentiment of short texts given all those difficulties. A common approach to overcome these
problems is to enrich the original texts with additional semantics to make it appear like a large document of
text. Then, traditional classification methods can be applied to it. In this study, we developed an automatic
sentiment analysis system of short informal Indonesian texts using Naïve Bayes and Synonym Based
Feature Expansion. The system consists of three main stages, preprocessing and normalization, features
expansion and classification. After preprocessing and normalization, we utilize Kateglo to find some
synonyms of every words in original texts and append them. Finally, the text is classified using Naïve
Bayes. The experiment shows that the proposed method can improve the performance of sentiment
analysis of short informal Indonesian product reviews. The best sentiment classification performance using
proposed feature expansion is obtained by accuracy of 98%.The experiment also show that feature
expansion will give higher improvement in small number of training data than in the large number of them.
A scalable, lexicon based technique for sentiment analysisijfcstjournal
Rapid increase in the volume of sentiment rich social media on the web has resulted in an increased
interest among researchers regarding Sentimental Analysis and opinion mining. However, with so much
social media available on the web, sentiment analysis is now considered as a big data task. Hence the
conventional sentiment analysis approaches fails to efficiently handle the vast amount of sentiment data
available now a days. The main focus of the research was to find such a technique that can efficiently
perform sentiment analysis on big data sets. A technique that can categorize the text as positive, negative
and neutral in a fast and accurate manner. In the research, sentiment analysis was performed on a large
data set of tweets using Hadoop and the performance of the technique was measured in form of speed and
accuracy. The experimental results shows that the technique exhibits very good efficiency in handling big
sentiment data sets.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Introduction to Recurrent Neural Network with Application to Sentiment Analys...Artifacia
This is the presentation from our first AI Meet held on Nov 19, 2016.
You can join Artifacia AI Meet Bangalore Group: https://www.meetup.com/Artifacia-AI-Meet/
Sentiment analysis or opinion mining is a process of categorizing and identifying the sentiment expressed in a particular text. The need of automatic sentiment retrieval of
the text is quite high as a number of reviews obtained from the Internet sources like Twitter are huge in number. These reviews or opinions on popular products or events help in determining the public opinion towards the issue. An averaged histogram model is proposed in the process that deals with text classification in continuous variable approach. After data cleaning and feature extraction from the reviews, average histograms are constructed for every class, containing a generalized feature representation in that particular class, namely positive and negative. Histograms of every test elements are then classified using k-NN, Bayesian Classifier and LSTM network. This work is then implemented in Android integrated with Twitter. The user will have to provide the topic for analysis. The Application will show the result as the percentage of positive review tweets in favor of the topic using Bayesian Classifier.
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...mathsjournal
Speech emotion recognition enables a computer system to records sounds and realizes the emotion of the
speaker. we are still far from having a natural interaction between the human and machine because
machines cannot distinguishes the emotion of the speaker. For this reason it has been established a new
investigation field, namely “the speech emotion recognition systems”. The accuracy of these systems
depend on the various factors such as the type and the number of the emotion states and also the classifier
type. In this paper, the classification methods of C5.0, Support Vector Machine (SVM), and the
combination of C5.0 and SVM (SVM-C5.0) are verified, and their efficiencies in speech emotion
recognition are compared. The utilized features in this research include energy, Zero Crossing Rate (ZCR),
pitch, and Mel-scale Frequency Cepstral Coefficients (MFCC). The results of paper demonstrate that the
effectiveness proposed SVM-C5.0 classification method is more efficient in recognizing the emotion of the
between -5.5 % and 8.9 % depending on the number of emotion states than SVM, C5.0.
Emotion detection from text using data mining and text miningSakthi Dasans
Emotion detection from text using data mining and text mining
Based on research paper published by Faculty of Engineering, The University of Tokushima at IEEE 2007 we build an intelligent system under the title Emotelligence on Text to recognize human emotion from textual contents.
i.e. if you give an input string , our system would possibly able to say the emotion behind that textual content.
Evaluation of Support Vector Machine and Decision Tree for Emotion Recognitio...journalBEEI
In this paper, the performance of Support Vector Machine (SVM) and Decision Tree (DT) in classifying emotions from Malay folklores is presented. This work is the continuation of our storytelling speech synthesis work to add emotions for a more natural storytelling. A total of 100 documents from children short stories are collected and used as the datasets of the text-based emotion recognition experiment. Term Frequency-Inverse Document Frequency (TF-IDF) is extracted from the text documents and classified using SVM and DT. Four types of common emotions, which are happy, angry, fearful and sad are classified using the two classifiers. Results showed that DT outperformed SVM by more than 22.2% accuracy rate. However, the overall emotion recognition is only at moderate rate suggesting an improvement is needed in future work. The accuracy of the emotion recognition should be improved in future studies by using semantic feature extractors or by incorporating deep learning for classification.
Text to Emotion Extraction Using Supervised Machine Learning TechniquesTELKOMNIKA JOURNAL
Proliferation of internet and social media has greatly increased the popularity of text
communication. People convey their sentiment and emotion through text which promotes lively
communication. Consequently, a tremendous amount of emotional text is generated on different social
media and blogs in every moment. This has raised the necessity of automated tool for emotion mining from
text. There are various rule based approaches of emotion extraction form text based on emotion intensity
lexicon. However, creating emotion intensity lexicon is a time consuming and tedious process. Moreover,
there is no hard and fast rule for assigning emotion intensity to words. To solve these difficulties, we
propose a machine learning based approach of emotion extraction from text which relies on annotated
example rather emotion intensity lexicon. We investigated Multinomial Naïve Bayesian (MNB) Classifier,
Artificial Neural Network (ANN) and Support Vector Machine (SVM) for mining emotion from text. In our
setup, SVM outperformed other classifiers with promising accuracy.
Analysis of anaphora resolution system forijitjournal
Anaphora resolution is complex problem in linguistics and has attracted the attention of many researchers.
It is the problem of identifying referents in the discourse. Anaphora Resolution plays an important role in
Natural language processing task. This paper completely emphasis on pronominal anaphora resolution for
English Language in which pronouns refers to the intended noun in discourse. In this paper two
computational models are proposed for anaphora resolution. Resolution of anaphora is based on various
factors among which these models use Recency factor and Animistic Knowledge. Recency factor is
implemented by using Lappin Leass approach in first model and using Centering approach in second
model. Information about animacy is obtained by Gazetteer method. The identification of animistic
elements is employed to improve the accuracy of the system. This paper demonstrates experiment
conducted by both the models on different data sets from different domains. A comparative result of both
the model is summarized and conclusion is drawn for the best suitable model.
The peer-reviewed International Journal of Engineering Inventions (IJEI) is started with a mission to encourage contribution to research in Science and Technology. Encourage and motivate researchers in challenging areas of Sciences and Technology.
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...TELKOMNIKA JOURNAL
Sentiment analysis in short informal texts like product reviews is more challenging. Short texts are
sparse, noisy, and lack of context information. Traditional text classification methods may not be suitable
for analyzing sentiment of short texts given all those difficulties. A common approach to overcome these
problems is to enrich the original texts with additional semantics to make it appear like a large document of
text. Then, traditional classification methods can be applied to it. In this study, we developed an automatic
sentiment analysis system of short informal Indonesian texts using Naïve Bayes and Synonym Based
Feature Expansion. The system consists of three main stages, preprocessing and normalization, features
expansion and classification. After preprocessing and normalization, we utilize Kateglo to find some
synonyms of every words in original texts and append them. Finally, the text is classified using Naïve
Bayes. The experiment shows that the proposed method can improve the performance of sentiment
analysis of short informal Indonesian product reviews. The best sentiment classification performance using
proposed feature expansion is obtained by accuracy of 98%.The experiment also show that feature
expansion will give higher improvement in small number of training data than in the large number of them.
A scalable, lexicon based technique for sentiment analysisijfcstjournal
Rapid increase in the volume of sentiment rich social media on the web has resulted in an increased
interest among researchers regarding Sentimental Analysis and opinion mining. However, with so much
social media available on the web, sentiment analysis is now considered as a big data task. Hence the
conventional sentiment analysis approaches fails to efficiently handle the vast amount of sentiment data
available now a days. The main focus of the research was to find such a technique that can efficiently
perform sentiment analysis on big data sets. A technique that can categorize the text as positive, negative
and neutral in a fast and accurate manner. In the research, sentiment analysis was performed on a large
data set of tweets using Hadoop and the performance of the technique was measured in form of speed and
accuracy. The experimental results shows that the technique exhibits very good efficiency in handling big
sentiment data sets.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Introduction to Recurrent Neural Network with Application to Sentiment Analys...Artifacia
This is the presentation from our first AI Meet held on Nov 19, 2016.
You can join Artifacia AI Meet Bangalore Group: https://www.meetup.com/Artifacia-AI-Meet/
Sentiment analysis or opinion mining is a process of categorizing and identifying the sentiment expressed in a particular text. The need of automatic sentiment retrieval of
the text is quite high as a number of reviews obtained from the Internet sources like Twitter are huge in number. These reviews or opinions on popular products or events help in determining the public opinion towards the issue. An averaged histogram model is proposed in the process that deals with text classification in continuous variable approach. After data cleaning and feature extraction from the reviews, average histograms are constructed for every class, containing a generalized feature representation in that particular class, namely positive and negative. Histograms of every test elements are then classified using k-NN, Bayesian Classifier and LSTM network. This work is then implemented in Android integrated with Twitter. The user will have to provide the topic for analysis. The Application will show the result as the percentage of positive review tweets in favor of the topic using Bayesian Classifier.
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...mathsjournal
Speech emotion recognition enables a computer system to records sounds and realizes the emotion of the
speaker. we are still far from having a natural interaction between the human and machine because
machines cannot distinguishes the emotion of the speaker. For this reason it has been established a new
investigation field, namely “the speech emotion recognition systems”. The accuracy of these systems
depend on the various factors such as the type and the number of the emotion states and also the classifier
type. In this paper, the classification methods of C5.0, Support Vector Machine (SVM), and the
combination of C5.0 and SVM (SVM-C5.0) are verified, and their efficiencies in speech emotion
recognition are compared. The utilized features in this research include energy, Zero Crossing Rate (ZCR),
pitch, and Mel-scale Frequency Cepstral Coefficients (MFCC). The results of paper demonstrate that the
effectiveness proposed SVM-C5.0 classification method is more efficient in recognizing the emotion of the
between -5.5 % and 8.9 % depending on the number of emotion states than SVM, C5.0.
Emotion detection from text using data mining and text miningSakthi Dasans
Emotion detection from text using data mining and text mining
Based on research paper published by Faculty of Engineering, The University of Tokushima at IEEE 2007 we build an intelligent system under the title Emotelligence on Text to recognize human emotion from textual contents.
i.e. if you give an input string , our system would possibly able to say the emotion behind that textual content.
HaiXiu: Emotion Recognition from MovementsDebmalya Sinha
This gives an overview of HaiXiu, a software that detects arousal levels from body movements and how it connects to the other means of emotion detection techniques. The presentation was given at Microsoft Research Asia, Beijing.
Mind Control to Major Tom: Is It Time to Put Your EEG Headset On? Steve Poole
JavaOne 2016 talk
Using your mind to interact with computers is a long-standing desire. Advances in technology have made it more practical, but is it ready for prime time? This session presents practical examples and a walkthough of how to build a Java-based end-to-end system to drive a remote-controlled droid with nothing but the power of thought. Combining off-the-shelf EEG headsets with cloud technology and IoT, the presenters showcase what capabilities exist today. Beyond mind control (if there is such a concept), the session shows other ways to communicate with your computer besides the keyboard. It will help you understand the art of the possible and decide if it's time to leave the capsule to communicate with your computer.
MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song...Sebastian Raschka
In this talk, I present a machine learning approach to build a music recommendation system that can identify happy songs in a music library based on song lyrics. Also, this presentation covers a general introduction to predictive modeling, naive Bayes classification, and text classification.
DEEP LEARNING SENTIMENT ANALYSIS OF AMAZON.COM REVIEWS AND RATINGSijscai
Our study employs sentiment analysis to evaluate the compatibility of Amazon.com reviews with their
corresponding ratings. Sentiment analysis is the task of identifying and classifying the sentiment expressed
in a piece of text as being positive or negative. On e-commerce websites such as Amazon.com, consumers
can submit their reviews along with a specific polarity rating. In some instances, there is a mismatch between
the review and the rating. To identify the reviews with mismatched ratings we performed sentiment analysis
using deep learning on Amazon.com product review data. Product reviews were converted to vectors using
paragraph vector, which then was used to train a recurrent neural network with gated recurrent unit. Our
model incorporated both semantic relationship of review text and product information. We also developed a
web service application that predicts the rating score for a submitted review using the trained model and if
there is a mismatch between predicted rating score and submitted rating score, it provides feedback to the
reviewer.
Neural Network Based Context Sensitive Sentiment AnalysisEditor IJCATR
Social media communication is evolving more in these days. Social networking site is being rapidly increased in recent years, which provides platform to connect people all over the world and share their interests. The conversation and the posts available in social media are unstructured in nature. So sentiment analysis will be a challenging work in this platform. These analyses are mostly performed in machine learning techniques which are less accurate than neural network methodologies. This paper is based on sentiment classification using Competitive layer neural networks and classifies the polarity of a given text whether the expressed opinion in the text is positive or negative or neutral. It determines the overall topic of the given text. Context independent sentences and implicit meaning in the text are also considered in polarity classification.
Generating domain specific sentiment lexicons using the Web Directory acijjournal
In this paper we aim at proposing a method to automatically build a sentiment lexicon which is domain based. There has been a demand for the construction of generated and labeled sentiment lexicon. For data on the social web (E.g., tweets), methods which make use of the synonymy relation don't work well, as we completely ignore the significance of terms belonging to specific domains. Here we propose to
generate a sentiment lexicon for any domain specified, using a twofold method. First we build sentiment scores using the micro-blogging data, and then we use these scores on the ontological structure provided by Open Directory Project [1], to build a custom sentiment lexicon for analyzing domain specific microblogging data.
Textual Document Categorization using Bigram Maximum Likelihood and KNNRounak Dhaneriya
In recent year’s text mining has evolved as a vast field of research in machine learning and artificial intelligence. Text Mining is a difficult task to conduct with an unstructured data format. This research work focuses on the classification of textual data of three different literature books. The collection of data is extracted from books entitled: Oliver Twist, Don Quixote, Pride and Prejudge. We used two different algorithms: KNN and Bigram based Maximum Likelihood for the mentioned purpose and the evaluation of accuracy is done using the confusion matrix. The results suggest that the text mining using bigram based maximum likelihood logic performs well.
Graph-based Analysis and Opinion Mining in Social NetworkKhan Mostafa
This is the final report for Networks & Data Mining Techniques project focusing on mining social network to estimate public opinion about entities and associated keywords. This project mines Twitter for recent feeds and analyzes them to estimate sentiment score, discussed entity and describing keywords in each tweet. This data is then exploited to elicit overall sentiment associated with each entity. Entities and keywords extracted is also used to form an entity-keyword bigraph. This graph is further used to detect entity communities and keywords found within those communities. Presented implementation works in linear time.
SENTIMENT ANALYSIS FOR MOVIES REVIEWS DATASET USING DEEP LEARNING MODELSIJDKP
Due to the enormous amount of data and opinions being produced, shared and transferred everyday across the internet and other media, Sentiment analysis has become vital for developing opinion mining systems. This paper introduces a developed classification sentiment analysis using deep learning networks and introduces comparative results of different deep learning networks. Multilayer Perceptron (MLP) was developed as a baseline for other networks results. Long short-term memory (LSTM) recurrent neural network, Convolutional Neural Network (CNN) in addition to a hybrid model of LSTM and CNN were developed and applied on IMDB dataset consists of 50K movies reviews files. Dataset was divided to 50% positive reviews and 50% negative reviews. The data was initially pre-processed using Word2Vec and word embedding was applied accordingly. The results have shown that, the hybrid CNN_LSTM model have outperformed the MLP and singular CNN and LSTM networks. CNN_LSTM have reported the accuracy of 89.2% while CNN has given accuracy of 87.7%, while MLP and LSTM have reported accuracy of 86.74% and 86.64 respectively. Moreover, the results have elaborated that the proposed deep learning models have also outperformed SVM, Naïve Bayes and RNTN that were published in other works using English datasets.
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONIJDKP
This article will introduce some approaches for improving text categorization models by integrating
previously imported ontologies. From the Reuters Corpus Volume I (RCV1) dataset, some categories very
similar in content and related to telecommunications, Internet and computer areas were selected for models
experiments. Several domain ontologies, covering these areas were built and integrated to categorization
models for their improvements.
Optimizer algorithms and convolutional neural networks for text classificationIAESIJAI
Lately, deep learning has improved the algorithms and the architectures of several natural language processing (NLP) tasks. In spite of that, the performance of any deep learning model is widely impacted by the used optimizer algorithm; which allows updating the model parameters, finding the optimal weights, and minimizing the value of the loss function. Thus, this paper proposes a new convolutional neural network (CNN) architecture for text classification (TC) and sentiment analysis and uses it with various optimizer algorithms in the literature. Actually, in NLP, and particularly for sentiment classification concerns, the need for more empirical experiments increases the probability of selecting the pertinent optimizer. Hence, we have evaluated various optimizers on three types of text review datasets: small, medium, and large. Thereby, we examined the optimizers regarding the data amount and we have implemented our CNN model on three different sentiment analysis datasets so as to binary label text reviews. The experimental results illustrate that the adaptive optimization algorithms Adam and root mean square propagation (RMSprop) have surpassed the other optimizers. Moreover, our best CNN model which employed the RMSprop optimizer has achieved 90.48% accuracy and surpassed the state-of-the-art CNN models for binary sentiment classification problems.
Social media recommendation based on people and tags (final)
A framework for emotion mining from text in online social networks(final)
1. A Framework for Emotion Mining from
Text in Online Social Networks
Published in:
Data Mining Workshops (ICDMW), 2010 IEEE
International Conference
Mohamed Yassine Hazem Hajj
Department of Electrical and Computer Engineering
American University of Beirut
Beirut, Lebanon
E-mail: {mmy18, hh63}@aub.edu.lb
Advisors: Yin-Fu Huang
Student: Chen-Ting Huang
2. Abstract & Introduction
A new framework is proposed for characterizing emotional interactions in
social networks, and then using these characteristics to distinguish friends
from acquaintances.
The framework includes a model for data collection, database schemas,
data processing and data mining steps.
The paper presents a new perspective for studying friendship relations and
emotions’ expression in online social networks
3. Abstract & Introduction
Text mining techniques are performed on comments retrieved from a
social network.
The purpose is not to identify specific emotions but rather to tell if the text
contains emotions or not.
In other words, if the text is subjective reflecting the writer’s affect and
emotional state or if it is factual and objective.
4. Related Work
SentiWordNet: a publicly available lexical resource for opinion mining
NTUSD
Face Book
9. Lexicons Development
This step deals with the informal language of social network
a. Social acronyms
a. Emotions
10. Lexicons Development
This step deals with the informal language of social network
c. Foreign Languages Lexicons:
It is also necessary to take into consideration the use of foreign
languages.
12. Feature generation
Computes new features from available raw data collected in step 1 to
assess subjectivity of text.
It uses word-matches to existing affective lexicons (by SetiWordNet)
and employs new lexicons developed in step 2
Comments collected from step 1 along with the features computed in this
step will be stored in the Sentiment Mining Database
13. Feature generation
In order to assess the subjectivity of the text, several features need to be
computed.
a. Affective lexicon based on SentiWordNet
b. Based on intentional misspelling errors and grammatical
markers (such as punctuation and capitalized letters)
c. Based on social acronyms, interjections and emoticons.
15. Data Preprocessing
Feature selection was performed based on a correlation measure to
remove redundant attributes.
16. Data Preprocessing
The goal was to deduce whether the text is subjective or not.
Example: knowing whether the writer uses few or many punctuation
marks would definitely denote some sort of subjectivity.
Discretization was done using clustering. The k-means algorithm was
performed on each attribute with k=3 or k=4.
The values in each attribute were normalized using Min-Max normalization
which mapped them to the range [0;1].
17. Training Model for Text Subjectivity
Creating a Training
model for text subjectivity
18. Training Model for Text Subjectivity
The training was done in step 5 using the k-means clustering algorithm
with k =3.
a. Objective or factual texts,
b. Moderately subjective texts
c. Subjective texts
The output of the model was the centroids of the three clusters and is
used in step 6 for classifying the comments with respect to subjectivity.
20. Text subjectivity classification
This step uses the centroids generated in the previous steps.
Employs the k-nearest neighbor algorithm with k=1 to classify all
comments into one of three subjectivity levels.
The manual annotation was then compared to the predicted result by the
model.
26. Friendship Classification
We collected the comments shared for 1213 pairs of users.
We used the output of step 6 which is text subjectivity classification in
order to get the subjectivity measure of all comments shared between
the pair.
SVM algorithm was used to predict whether the pair consists of close
friends or just acquaintances.
Every user reported a list of his/her close friends. Using 10-fold cross-
validation, the SVM model reported an accuracy of 87%.
27. Conclusion
It presents a new perspective for studying friendship relations and
emotions’ expression in online social networks
We developed new lexicons that cover common expressions used by
online users.
The model predicted the right class with 88% accuracy. When the
resulting model was used to predict relationship strength between two
users, the prediction reported an accuracy of 87%.
Editor's Notes
藉由
在這篇 paper 中主要的目標便是從 social network 的文字中提取 emotional content