This document describes a sentiment analysis and classification algorithm that uses an independent term matching scheme sensitive to word count patterns. The algorithm calculates a sentiment score for each comment from a Gaussian mixture of the sentiment ratings of the words matched in the comment, and it accounts for the impact of negations. Sample comments are then classified and rated for sentiment to evaluate the algorithm.
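The scoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, its ratings, the negation list, the one-word negation window and the classification threshold are all hypothetical, and a plain mean of the matched ratings stands in for the mixture-based combination named in the description.

```python
import re

# Hypothetical lexicon: word -> sentiment rating in [-1, 1] (illustrative values only).
LEXICON = {
    "good": 0.7, "great": 0.9, "helpful": 0.6,
    "slow": -0.5, "bad": -0.7, "terrible": -0.9,
}
# Hypothetical negation words; the source does not list the ones it uses.
NEGATIONS = {"not", "no", "never"}


def score_comment(comment, lexicon=LEXICON, negations=NEGATIONS):
    """Return (score, matches) for one comment.

    Each word is matched against the lexicon independently of the others.
    A negation word immediately before a matched word flips the sign of
    its rating. The score is the plain mean of the matched ratings, an
    assumed stand-in for the mixture-based combination in the text.
    """
    words = re.findall(r"[a-z']+", comment.lower())
    matches = []
    for i, word in enumerate(words):
        if word in lexicon:
            rating = lexicon[word]
            if i > 0 and words[i - 1] in negations:
                rating = -rating
            matches.append((word, rating))
    if not matches:
        return 0.0, matches
    score = sum(rating for _, rating in matches) / len(matches)
    return score, matches


def classify(score, threshold=0.1):
    """Map a score to a label; the threshold is an assumed value."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

For example, `classify(score_comment("The service was not good")[0])` applies the negation flip to "good" before the comment is labeled.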
Altus Dynamics 2016 - Is Your Dashboard a Picasso? (Sparkrock)
Presentation by Janice Taylor on February 5th, 2016.
Having too many KPIs and/or a cluttered dashboard is worse than none at all. Join Janice Taylor from Jet Reports to review effective KPIs and dashboards that cause action, not confusion.
Carlos Alvarez coaching, NLP emotions, February 2015 (IAPEM)
An edited work on emotions and emotional skills, drawing on the techniques and strategies of Neuro-Linguistic Programming (NLP) in coaching as a means of personal development in a variety of relational contexts.
The core of our work is to extract aspects from user feedback and to associate sentiment and opinion terms with them. Our dataset is a set of feedback documents for various departments of a hospital, in XML format, with comments represented in tags. It contains about 65,000 responses to a survey taken in the hospital. Every response or comment is treated as a sentence or a set of sentences. We perform sentence-level aspect and sentiment extraction, mining the user feedback data to gather aspects from it. We then extract the sentiment mentions, evaluate them contextually for sentiment, and associate them with the corresponding aspects. To start, we clean up the user feedback data, then perform aspect extraction with the help of POS tagging and sentiment polarity calculation with the help of SentiWordNet filters. The resulting sentiments are further classified according to a set of linguistic rules, and the scores are normalized to counter any noise that might be present. We emphasize a rule-based approach, where the rules are linguistic rules that correspond to the positioning of words of various parts of speech in a sentence.
Sentence level sentiment polarity calculation for customer reviews by conside... (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
OPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNET (ijfcstjournal)
Sentiment analysis of reviews is usually carried out with manually built or automatically generated lexicon resources, against which terms are matched to compute the term counts for positive and negative polarity. SentiWordNet differs from other lexicon resources in that it gives scores (weights) for the positive and negative polarity of each word. Each polarity of a word (positive, negative or neutral) has a score between 0 and 1 that indicates the strength or weight of the word with that sentiment orientation. In this paper, we show how SentiWordNet can be used to enhance classification performance at both the sentence and the document level.
An Approach for Big Data to Evolve the Auspicious Information from Cross-Domains (IJECEIAES)
Sentiment analysis is a leading technology for extracting relevant information from a data domain. In this paper a cross-domain sentiment classification approach, Cross_BOMEST, is proposed. The approach extracts -ve words using the existing BOMEST technique. With the help of MS Word Interop, Cross_BOMEST determines the -ve words and replaces all their synonyms to escalate the polarity, blends two different domains, and detects all the self-sufficient words. The algorithm is run on Amazon datasets, where two different domains are used for training in order to analyze the sentiment of reviews from the remaining domain. The approach gives promising results in cross-domain analysis, with an accuracy of 92%. Cross_BOMEST improves the precision and recall of BOMEST by 16% and 7% respectively.
An Improved sentiment classification for objective word. (IJSRD)
Sentiment classification is an active and interesting area of research because of its application in various fields. Customer sentiments play a very important role in daily life. Currently, sentiment classification focuses on subjective statements and ignores objective statements, which also carry sentiment. Classification is also made difficult by the ambiguous sense (meaning) of words and by negation words. In the word sense disambiguation method, semantic scores are calculated from SentiWordNet for the terms of WordNet glosses. The correct sense of a word is extracted by determining similarity among WordNet gloss terms. SentiWordNet returns the first sense of a word, which is the sense in general use. This work aims to improve sentiment classification by modifying the sentiment values returned by SentiWordNet, and compares the classification accuracy of a support vector machine and naive Bayes.
Sentiment classification is an active and interesting area of research because of its application in various fields, such as collecting reviews from people about products and about social and political events through the web. Currently, sentiment analysis concentrates on subjective statements, or on subjectivity, and overlooks objective statements that carry sentiment. More challenging problems arise during sentiment classification because of the ambiguous sense of words, negation words and intensifiers. Because of its importance, the correct sense of the target word is extracted and determined from the similarity that arises in WordNet glosses. This paper presents a survey covering the techniques and methods of sentiment analysis and the challenges that appear in the field.
With the rapid growth of internet and web usage, it has become essential to have a powerful tool capable of analyzing and ranking the reviews and opinions available on the web. In this paper we propose a new and effective approach that uses a sentiment analysis procedure based on ontological adjustment and arrangement. The study also aims to understand POS tag order so that any review or opinion can be observed in detail; this also helps in identifying all the positive and negative sentiments present and in suggesting a proper sentence inclination. For this we used reviews of Nokia available on the internet, and the Stanford parser for POS tagging.
Strengths-based nursing (SBN) is an approach to care in which eigh.docx (cpatriciarpatricia)
Strengths-based nursing (SBN) is an approach to care in which eight core values guide nursing action, thereby promoting empowerment, hope and self-efficacy. In caring for patients and families, the nurse focuses on their inner and outer strengths, that is, on what patients and families do that best helps them deal with problems and minimize deficits. SBN creates environments and experiences that better enable patients and their families to take control over their lives and health care decisions.
SBN respects a person's self-knowledge and values choice and self-determination, even though there are always limits to the choices available, and a person's ability to act in her or his own interest is affected by circumstances, knowledge, and predisposition. It is as important to consider patients' deficits as it is to consider their strengths; both are essential aspects of the whole person. The current health care system is changing into a new system that focuses more on community-based and primary care, with hospitals forming the pillar of the health care system although they are not the primary service (Lind and Smith, 2008). This change has brought about strengths-based nursing care, which aims to develop an individual's strengths to encourage and help in healing. From the perspective of SBN, the nurse's role is to help patients achieve their goals in the healthiest possible way.
SBN sees the nurse's role not as deciding for others but rather as listening attentively and deeply in order to clarify, elaborate, explain, provide information, make suggestions, connect people with resources, and advocate for patients and their families so they may hear their own voices and make their voices heard. Strengths-Based Care (SBC) requires that the nurse use a process to uncover the person's concerns, get to know the patient and members of the family as individuals, and discover their strengths in order to plan and carry out nursing care.
Nurses require strong nursing leadership to enable them to practice strengths-based nursing care. Strengths-based nursing care has the potential to become a game changer in nursing and to revolutionize healthcare. In this approach the focus is redirected from shortages and crises to the use of strengths and resources to deal with problems and overcome shortcomings (Gottlieb, 2012). The medical model need not be a deficit model. The two are not mutually exclusive. Physicians can diagnose and treat problems and also have a strengths perspective and practice whole-person care.
HOLMES INSTITUTE
FACULTY OF HIGHER EDUCATION
HS1031 Introduction to Programming - Assignment I
Assessment Details and Submission Guidelines
Trimester: T1 2019
Unit Code: HS1031
Unit Title: Introduction to Programming
Assessment Type: Individual Assignment
Assessment Title: Assignment I
Purpose of the assessment (with ULO Mapping): Assess student's ability to develop algorithmic solutions to programming problems.
Due to the fast growth of the World Wide Web, online communication has increased. In recent times the focus of communication has shifted to social networking. To enhance text-based methods of communication such as tweets, blogs and chats, it is necessary to examine the emotion of the user by studying the input text. Online reviews are posted by customers for the products and services on offer at a website portal. This has given impetus to substantial growth in online purchasing, making opinion analysis a vital factor for business development. Sentiment analysis is used to analyze such text and reviews. Sentiment analysis is a subdomain of Natural Language Processing that captures the writer's feelings about products, as expressed on the internet through comments or posts. It is used to find the opinion or response of the user, which may be positive, negative or neutral. In this paper a review of sentiment analysis is presented, and the challenges and issues involved in the process are discussed. Approaches to sentiment analysis using dictionaries such as SenticNet, SentiFul, SentiWordNet, and WordNet are studied. Dictionary-based approaches are efficient over a domain of study. Although a generalized dictionary like WordNet may be used, the accuracy of the classifier is affected by issues such as negation, synonyms and sarcasm.
Sentimental analysis of audio based customer reviews without textual conversion (IJECEIAES)
The current procedures followed in customer relationship management (CRM) systems are based on reviews, mails, and other textual data gathered as feedback from customers. Sentiment analysis algorithms are deployed to obtain polarity results, which can be used to improve customer services. With evolving technologies, however, reviews and feedback are increasingly dominated by audio data. In the literature, the audio content is translated to text and sentiment is analyzed using natural language processing techniques. However, these approaches can be time consuming. The proposed work focuses on analyzing sentiment on the audio data itself, without any textual conversion. The basic sentiment polarities are usually termed positive, negative, and neutral, but the focus here is to use basic emotions as the basis for deciding the polarity. The proposed model uses a deep neural network and features such as Mel frequency cepstral coefficients (MFCC), Chroma and Mel Spectrogram on audio-based reviews.
Similar to Sentiment Analysis for IET ATC 2016 (20)
Improving predictability and performance by relating the number of events and... (Asoka Korale)
Many processes require an estimate of the time over which to observe a certain number of events. The applications include queuing models and buffer management in electronics and telecommunications and the characterization of trading patterns in market surveillance. It is common practice in these applications to take a deterministic approach, modeling the events over intervals of time of a particular duration or considering the event inter-arrival times in order to estimate an average rate and a measure of its dispersion.
In this paper however we establish a probabilistic relationship between the number of events and the time over which to observe them. The total time over which to observe a certain number of events is equivalent to the sum of their event inter-arrival times, making the number of events and the number of inter-arrival times in the sum also equivalent. By this sum of random variables, we establish a stochastic relationship between the number of events and the total time interval over which to observe them, allowing greater flexibility in characterizing the relationships between the underlying distributions. We also use this relationship to estimate the uncertainty in the time interval taken to observe a certain number of events and relate it to an uncertainty in the average number of events observed in that interval.
The event inter-arrival times are thus modeled as a sequence of random variables drawn from a single distribution. These random variables could be drawn from a distribution estimated from historical data governing the particular arrival process or from a particular distribution used to model it. The subject of this paper is then to use this idea to model the behavior of a queue and server system where each state and the state transition probabilities are also stochastic. Clearer insights into the performance of such systems are also envisaged with this type of analysis.
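The relationship between the number of events and the observation time can be written out as a short worked equation. This is a sketch under stated assumptions rather than the paper's own derivation: the symbols $X_i$, $T_n$, $N(t)$, $\mu$ and $\sigma^2$ are introduced here for illustration, and the inter-arrival times are assumed independent as well as identically distributed.

```latex
% X_i : i-th inter-arrival time, with mean \mu and variance \sigma^2 (assumed i.i.d.)
% T_n : total time needed to observe n events
% N(t): number of events observed up to time t
\begin{align}
  T_n &= \sum_{i=1}^{n} X_i, \\
  \mathbb{E}[T_n] &= n\mu, \qquad \operatorname{Var}(T_n) = n\sigma^2, \\
  \{\, N(t) \ge n \,\} &\iff \{\, T_n \le t \,\}.
\end{align}
```

The last line is what ties the two quantities together: the uncertainty in $T_n$, which has standard deviation $\sigma\sqrt{n}$ under these assumptions, translates into an uncertainty in the number of events seen in a fixed interval.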
Novel price models in the capital market (Asoka Korale)
In one contribution we estimate the random process governing the movement in the price and simulate a series of realizations drawn from the same underlying process to gain insights into unusual behavior that could manifest. In this modeling we estimate the distribution of the sequence of random variables governing the prices and generate a series of realizations, or paths, with the same underlying distribution. We also develop an equation to predict the expected deviation in price between two points in a sequence of prices, and a measure of the uncertainty in this deviation.
In another contribution we model prices as a linear regression on consecutive prices to estimate their movement and arrive at an estimate for the distribution of the error of such a prediction. By these techniques we estimate the trend and maximum deviation in price that could be expected over a sequence of prices in order to optimize the alert thresholds.
Through these analyses we also observe that the variance in the price depends on the number of samples in the sequence of prices over which the measurement is made, owing to the behavior of the random process governing its movement. We propose that the variance be estimated over a fixed number of consecutive prices, ensuring a more stable and consistent estimate.
Modeling prices for capital market surveillance (Asoka Korale)
We estimate the random process governing the movement in the price and simulate a series of realizations drawn from the same underlying process to gain insights into unusual behavior that could manifest. In this modeling we estimate the distribution of the sequence of random variables governing the prices and generate a series of realizations, or paths, with the same underlying distribution. We also develop an equation to predict the expected deviation in price between two points in a sequence of prices, and a measure of the uncertainty in this deviation.
In another contribution we model prices as a linear regression on consecutive prices to estimate their movement and arrive at an estimate for the distribution of the error of such a prediction. By these techniques we estimate the trend and maximum deviation in price that could be expected over a sequence of prices in order to optimize the alert thresholds.
Through these analyses we also observe that the variance in the price depends on the number of samples in the sequence of prices over which the measurement is made, owing to the behavior of the random process governing its movement. We propose that the variance be estimated over a fixed number of consecutive prices, ensuring a more stable and consistent estimate.
Entity profiling and collusion detection (Asoka Korale)
In this paper we present a novel trader profiling and collusion detection algorithm that models trading characteristics and detects collusive trading behavior. Traders place their orders in response to market conditions and the demand and supply for the security as observed in the order book. In the absence of information asymmetry, we would expect to see groups of traders follow similar trading strategies in search of profit, or fulfil other roles such as the provision of liquidity.
We employ two novel approaches to detecting potential collusive behaviour. In the first, the cumulative effect of trading between each pair of traders and their overall standing in the market in terms of the total number of trades and the total volume traded is observed. In the second, we create overlapping groups of traders by "fuzzy clustering" a set of features that characterize their trading behaviour and identify collusive behaviour through a process of cluster profiling and outlier detection.
Entity Profiling and Collusion Detection (Asoka Korale)
We employ two novel approaches to detecting potential collusive behavior. In the first, the cumulative effect of trading between each pair of traders and their overall standing in the market in terms of the total number of trades and the total volume traded is observed. In the second, we create overlapping groups of traders by "fuzzy clustering" a set of features that characterize their trading behavior and identify collusive behavior through a process of cluster profiling and outlier detection.
Markov Decision Processes in Market Surveillance (Asoka Korale)
In this paper we present an algorithm based on AI and machine learning techniques that estimates the average trading behavior of a trader by modeling the transactions performed in response to the observed state of the market and the expected profits and losses made with respect to each transaction. Through this modeling we can compare the behaviors of different traders in addition to capturing the actions of individual traders in response to market conditions. Through this we aim to infer activities that give certain participants an unfair advantage over others, allowing us to learn newer ways of market manipulation.
A framework for dynamic pricing electricity consumption patterns via time ser... (Asoka Korale)
Clustering individual household electricity consumption patterns enables a utility to design pricing plans catered to groups of households in a particular locality to more accurately reflect the cost of supply at a particular time of day.
In this paper we model each time series as an Autoregressive Moving Average (ARMA) process with an optimal model order determined by the Akaike Information Criterion when the parameters estimated by the Hannan-Rissanen algorithm converge. The estimated model has the representation of a transfer function with a frequency response defined by the ARMA parameters. We use the frequency response as the means to further refine the within cluster profiling and classification of the objects.
Through our modeling we are also able to identify instances where the consumption behavior exhibits patterns that are uncharacteristic of, or not in line with, the consumption profiles of the other households in a particular locality, providing insights into potential faults, fraud or illegal activity.
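The ARMA representation referred to above can be written out in its standard textbook form. The orders $p$ and $q$, the coefficients $\phi_i$ and $\theta_j$, the noise term $\varepsilon_t$ and the parameter count $k$ are notation assumed here, not taken from the paper.

```latex
% ARMA(p, q) model of a consumption series X_t driven by white noise \varepsilon_t
\begin{align}
  X_t &= \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}, \\
  % Transfer function from \varepsilon_t to X_t; the frequency response is H(e^{i\omega})
  H(z) &= \frac{1 + \sum_{j=1}^{q} \theta_j z^{-j}}{1 - \sum_{i=1}^{p} \phi_i z^{-i}}, \\
  % Akaike Information Criterion: k estimated parameters, \hat{L} the maximized likelihood
  \mathrm{AIC} &= 2k - 2\ln\hat{L}.
\end{align}
```

Under this notation, the model order $(p, q)$ with the smallest AIC is preferred, and $|H(e^{i\omega})|$ evaluated over $\omega \in [0, \pi]$ gives a fixed-length profile that can be compared across households.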
The first customer lifetime value model is based on the discounted cash flows arising from the average annual revenues contributed by each customer (model A).
The second model is also based on the discounted cash flows arising from the average annual revenues contributed by a subscriber, but a constant annual growth rate is assumed to govern the rise in revenues (model B).
The improvements to be considered include estimating the future cash flows and growth rates through regression analysis, and accounting for the other revenue streams that the subscriber contributes, such as DTV memberships and the value of the subscriber's social network.
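A minimal sketch of the two models follows, assuming a finite horizon in years, cash flows received at the end of each year, and growth in model B starting from the second year. The function and parameter names are hypothetical, and the note itself may use different conventions.

```python
def clv_model_a(annual_revenue, discount_rate, years):
    """Model A: discounted sum of a constant average annual revenue.

    Assumes each year's revenue arrives at the end of that year.
    """
    return sum(
        annual_revenue / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )


def clv_model_b(annual_revenue, discount_rate, growth_rate, years):
    """Model B: as model A, with revenue growing at a constant annual rate.

    Assumes the first year's revenue is `annual_revenue` and each later
    year's revenue is the previous year's times (1 + growth_rate).
    """
    return sum(
        annual_revenue * (1 + growth_rate) ** (t - 1) / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )
```

With a growth rate of zero, model B reduces to model A, which is a quick consistency check on any implementation of the pair.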
Forecasting models for Customer Lifetime Value (Asoka Korale)
The note presents some commonly used models in telecommunications demand forecasting. The models are presented for use in forecasting CLV with appropriately prepared revenue data.
Strategies for enhancing utilization levels in underutilized cells through targeted advertising, using means such as SMS or cell broadcast (SABP), require estimating the load and usage levels in those UMTS cells. This note lists the considerations involved in making this estimate and briefly describes the kinds of algorithms employed to maximize utilization levels by maximizing the efficient use of radio and network resources.
Cell load KPIs in support of event triggered Cellular Yield Maximization (Asoka Korale)
A scheme for enhancing cellular yield in 3G systems. Cell utilization can be enhanced by observing cell load on a near real-time basis and making offers to subscribers in underutilized cells at times when the cell can accommodate a higher level of traffic.
This note presents the potential avenues available for location monitoring and position estimation and through that the possibilities for vehicular traffic estimation in a Sri Lankan context.
The network and user equipment allow for location monitoring and position estimation through several methods made available through the 3GPP standards.
Mixed Numeric and Categorical Attribute Clustering AlgorithmAsoka Korale
Â
A Matlab implementation of a Mixed Numeric and Categorical attribute clustering algorithm for digital marketing segmentations. Validated via distribution analysis and segment profile. Algorithm performance characterized through convergence of intermediate variables and parameters.
Introduction to Bit Coin Model describing the key underlying technological features, operational details, uses and applications. Implications for Mobile Operators.
Estimating Gaussian Mixture Densities via an implemetation of the Expectaatio...Asoka Korale
Â
Estimating Gaussian Mixture Densities via a Matlab implementation of the Expectation Maximization Algorithm. Decomposing an arbitrary distribution in to component Normal Distributions to facilitate clustering, state modeling & profiling.
Mapping Mobile Average Revenue per User to Personal Income level via Househol...Asoka Korale
Â
A method to estimate personal Income levels from Mobile Average Revenue per User determined through Household Income and Expenditure Surveys conducted by the Census Department. Districtwise blended ARPU adjusted to match with Household / Personal Mobile expenditures on a district level.
1. 1
A SENTIMENT ANALYSIS AND CLASSIFICATION ALGORITHM
UTILIZING AN INDEPENDENT TERM MATCHING SCHEME
SENSITIVE TO WORD COUNT PATTERNS
Authors:
Asoka Korale, Ph.D., C.Eng., MIET
Chanuka Perera, Dip., ABE(UK)
Eranda Adikari, B.Sc., C.Eng., MIESL
Nadeesha Ekanayake, B.Sc.,
2. 2
Business Drivers of "Sentiment Analysis" & Classification
Devise a Customer focused Corporate Strategy
Help Determine Areas of Future Investments
Analysis of Customer Feedback for Decision making
Insights on Corporate Image, Service Level and Performance
Business Process Improvement ...
3. 3
Objective of the Modeling
Prioritize Comments by Sentiment (Severity of Feedback)
Classify Comments to Pre Defined Categories
Rate Sentiment contained in Feedback
Analyze Feedback Comments, Prioritize and Classify for Timely Action
Direct each Class to Appropriate Authority in Priority Order for Timely action
4. 4
"Sentiment": a Definition
Concise "Comments" give insight into the "Emotional" content of a message
Emotional Dimensions of Words
Valence (Happiness), Activation (Arousal), Dominance
An Opinion, View held or Expressed
Only "Select" words convey "Emotion"
Dictionaries of rated Words across each Emotional Dimension
Account separately for "Negations"
Words rated for "Sentiment" by Human agents via large Surveys
Introduce Local Language Support
5. 5
Feedback Comment Classification Process
Supervised Methods employ "Training Sequences"
Technique uses word Combinations, Patterns, Frequencies
Grouping comments on a "Theme" or Criteria into "Classes"
Requires Pre Classified Comments
Suitable for classifying large texts
6. 6
Sentiment Analysis via Independent Term Matching
Assumptions -
Each term in a comment is independent of the others
Valence, Activation and Dominance components of each word are drawn from a Normal Distribution with specified Mean and Standard Deviation
Combined overall sentiment rating of matched words occurs at the maximum of the sum of the individual Normal Densities
Overall Sentiment in a comment is represented by the combined effect of the sentiment of individual words in the comment
Suitable for small text data: Twitter, FB & Customer comments
Ref: http://www.csc.ncsu.edu/faculty/healey/tweet_viz/
7. 7
Algorithm: Sentiment Score for each Comment
I. Comments in Series: Each Analyzed Separately
II. Select a Comment, Convert words to Lower case and Remove Punctuation
III. Find match in Dictionary for each word in selected comment and get corresponding mean and standard deviation
IV. Extract Mean and Standard Deviation of "Valence" and "Activation" attributes of each matched word from Dictionary
V. Compute a Normal Density Function with Mean and Standard Deviation corresponding to each Attribute of each matched word by scaling a Standard Normal Random Variable
VI. Compute the sum of the Density functions corresponding to each attribute of all matched words in the comment
VII. Determine Maximum point "max-GMM" of the sum of the Density functions to arrive at an average score for the effect of that attribute across all words in the comment
\[ \mu = (\mu_1, \mu_2, \ldots, \mu_N)^T, \qquad \sigma = (\sigma_1, \sigma_2, \ldots, \sigma_N)^T \]
Comment Words        Valence Rating       Activation Rating
(Dictionary Value)   Mean    Std Dev      Mean    Std Dev
'service'            6.83    1.54         2.95    2.09
'good'               7.89    1.24         3.66    2.72
'late'               3.32    1.17         5.57    2.56
Simple Average       6.01    1.32         4.06    2.46
max-GMM              7.5                  3.7
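Steps I to VII above can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab implementation: the `DICTIONARY` structure, the function names, the 1 to 9 rating range and the grid search used to locate the maximum are all assumptions made for the example. Only the three dictionary entries are taken from the table above.

```python
import math
import string

# Hypothetical dictionary layout: word -> (valence mean, valence std,
# activation mean, activation std). Values are those in the table above.
DICTIONARY = {
    "service": (6.83, 1.54, 2.95, 2.09),
    "good": (7.89, 1.24, 3.66, 2.72),
    "late": (3.32, 1.17, 5.57, 2.56),
}


def normal_pdf(x, mean, std):
    """Density of a Normal(mean, std) distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def match_words(comment):
    """Steps II-IV: lower-case, strip punctuation, look up matched words."""
    cleaned = comment.lower().translate(str.maketrans("", "", string.punctuation))
    return [DICTIONARY[w] for w in cleaned.split() if w in DICTIONARY]


def max_gmm(params, lo=1.0, hi=9.0, step=0.01):
    """Steps V-VII: grid point where the sum of Normal densities is largest.

    The 1-9 search range is an assumed rating scale, not given in the slides.
    """
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    return max(grid, key=lambda x: sum(normal_pdf(x, m, s) for m, s in params))


def score_comment(comment):
    """Return (valence, activation) max-GMM scores, or None if nothing matches."""
    matched = match_words(comment)
    if not matched:
        return None
    valence = max_gmm([(vm, vs) for vm, vs, _, _ in matched])
    activation = max_gmm([(am, asd) for _, _, am, asd in matched])
    return valence, activation


valence, activation = score_comment("The service was good, but was late.")
print(valence, activation)
```

For this comment the two scores land close to the max-GMM values quoted in the table (about 7.5 for Valence and about 3.7 for Activation). A grid search is used because the sum of densities can have several local maxima.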
8. 8
Gaussian Mixtures in Rating "Total Sentiment"
Consider the cumulative effect of all matched sentiment bearing words via the sum of the individual probability densities.
\[ f(x; \theta) = \sum_{k=1}^{N} p_k \, g(x; m_k, \sigma_k) \]
where
\[ p_k = \frac{1}{N} \]
and
\[ g(x; m_k, \sigma_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \, e^{-\frac{1}{2}\left(\frac{x - m_k}{\sigma_k}\right)^2} \]
x represents the sentiment score, N the number of matched words in a comment, and m_k, \sigma_k the mean and standard deviation of the Normal Distribution of the ratings of each matched word.
The overall sentiment x_comment of a comment in a particular dimension is then determined as
\[ x_{comment} = \arg\max_{x} f(x; \theta) \]
which is the point at which the probability of the mixture of distributions is a maximum, and so is the most likely value for the overall sentiment of a comment composed of several words.
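The maximizing point generally has no closed form. As a worked step that is not on the slide, setting the derivative of f(x; θ) to zero gives a fixed-point condition in which the overall score is a weighted average of the word means. The weights w_k are notation introduced here for illustration.

```latex
\frac{d}{dx} f(x;\theta)
  = -\sum_{k=1}^{N} p_k \, g(x; m_k, \sigma_k) \, \frac{x - m_k}{\sigma_k^{2}} = 0
\quad \Longrightarrow \quad
x = \frac{\sum_{k=1}^{N} w_k(x) \, m_k}{\sum_{k=1}^{N} w_k(x)},
\qquad
w_k(x) = \frac{g(x; m_k, \sigma_k)}{\sigma_k^{2}}
```

Because p_k = 1/N is the same for every word, it cancels. With a single matched word (N = 1) the condition reduces to x = m_1. When the word means are far apart the mixture can have several stationary points, so the values of f at the candidates still have to be compared to find the overall maximum.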
9. 9
Overall Valence (Happiness) and Activation (Arousal) of a comment
Comment Words        Valence Rating       Activation Rating
(Dictionary Value)   Mean    Std Dev      Mean    Std Dev
'service'            6.83    1.54         2.95    2.09
'good'               7.89    1.24         3.66    2.72
'late'               3.32    1.17         5.57    2.56
Simple Average       6.01    1.32         4.06    2.46
max-GMM              7.5                  3.7
Figure 1: Gaussian Mixtures of matched words in the Valence Dimension
Figure 2: Gaussian Mixtures of matched words in the Activation Dimension
10. 10
IMPACT OF "NEGATIONS" ON TOTAL RATING
Comment: "the service was not good and late"
Comment Words        Valence Rating       Activation Rating
(Dictionary Value)   Mean    Std Dev      Mean    Std Dev
'service'            6.83    1.54         2.95    2.09
Not 'good'           6.65    1.24         6.38    2.72
'late'               3.32    1.17         5.57    2.56
Simple Average       5.6     1.32         4.97    2.46
max-GMM              6.7                  4.5
Comment: "the service was good but was late"
Comment Words        Valence Rating       Activation Rating
(Dictionary Value)   Mean    Std Dev      Mean    Std Dev
'service'            6.83    1.54         2.95    2.09
'good'               7.89    1.24         3.66    2.72
'late'               3.32    1.17         5.57    2.56
Simple Average       6.01    1.32         4.06    2.46
max-GMM              7.5                  3.7
• Account for Negations by adjusting the sentiment score of the word immediately following the negation in a direction opposite in polarity to its matched dictionary sentiment value.
• The magnitude of the adjustment made corresponds to the standard deviation of the particular rating value being adjusted.
• The magnitude of the adjustment can also be user definable.
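The negation rule can be sketched as below. This is an illustrative reading of the rule, under stated assumptions: the `NEGATIONS` word list is hypothetical, and polarity is judged against an assumed neutral midpoint of 5 on a 1 to 9 scale, which the slides do not state. With those assumptions the sketch reproduces the adjusted 'good' values in the table (7.89 to 6.65 for Valence, 3.66 to 6.38 for Activation).

```python
NEGATIONS = {"not", "no", "never"}  # hypothetical negation list
NEUTRAL = 5.0                       # assumed midpoint of a 1-9 rating scale


def negate(mean, std, scale=1.0):
    """Shift a rating by scale * std towards the opposite polarity.

    `scale` plays the role of the user-definable adjustment magnitude.
    """
    shift = scale * std
    return mean - shift if mean >= NEUTRAL else mean + shift


def apply_negations(words, dictionary):
    """Return (word, mean, std) for matched words.

    Only the word immediately following a negation is adjusted.
    """
    rated = []
    for i, word in enumerate(words):
        if word not in dictionary:
            continue
        mean, std = dictionary[word]
        if i > 0 and words[i - 1] in NEGATIONS:
            mean = negate(mean, std)
        rated.append((word, round(mean, 2), std))
    return rated


# Valence entries from the table above: word -> (mean, std)
valence_dict = {"service": (6.83, 1.54), "good": (7.89, 1.24), "late": (3.32, 1.17)}
words = "the service was not good and late".split()
rated = apply_negations(words, valence_dict)
print(rated)
```

The adjusted means and their unchanged standard deviations would then feed the mixture calculation in the same way as any other matched word.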
11. 11
Variance in Max GMM and Simple Average Measure
• It is seen that 90% of the time the difference between the two measures is within +/- 0.5 in the case of the Valence Attribute.
• The CDF of the difference in the Activation attribute is tightly centered on the origin, indicating hardly any variance.
• This is also an indication that most comments convey sentiments of a single polarity and only a few comments (less than 10%) have words with conflicting emotional content.
Figure 1: Variance between GMM and Simple Average measures for estimating overall comment sentiment
A measure of the degree of disparate emotions in the comments
12. 12
Sample Comments for Rating and Classification
1.HOTLINE ISSUES - DELAY IN ANSWERING - CX SERVICE ASSISTANCE
Today morning CX has called to the 444 HL for Movie Ticket & he has waited
for more than 10 mins in the line, regarding this now CX was very
disappointed on our service. So pls be kind enough to chk on ths & give the
call back to the CX ASAP. * Note: - Regarding this issue CX need the call
back from one of our manager & CX has requested not to charge a single
rupee from his no for this issue.
2.Yes,man magea prshnaya kiyapu gaman eyaa magea prshnea wisaduwaa
he's a good
3.Yes kad pin nambar signal
4.Wenath ayathana wala mema pahasukam nomati nisa
5.very good service
6.uparimaya
7.Uparima
8.think so
9.thanks
10.Super
11.Solved
12.She resolved my problem.
13.Service nallam
14.Sambanda weemata boho welawak giya nisa
15.recharge
16.Prashnayata pilithura hodin pahadili kara dima
17. Payak athulatha gataluwa nirakaranaya karanwa kiuwa. Thawamath
gataluwa nirakaranaya kara natha.
18.oba ayathanaya sewawan sadaha ihala mudalak ayakarana nisa
19.no mms setting laba dunnada save kala nohaka
20.nam apahu e tika ewanna
21.Mata awashshaya u pilithurau pahadili lesa laba ganemata hakiuna.
22.mage parshnata pilithuru dunna.
23.lotari SMS stop
24.Its professional
25.ing tone sewawa ain kirima
26.I submitted Xtv reg form on 27th oct at yr crescat arcade. They told to call
me on 28th wed to give the AC No
27.Hot line eka answer karapu girlge voice eka and care eka good
28.Hi kohomada? Mama mea dawas wala plan karagena yanawa mage next
music video eka karanna. Song eka "Mata Rawana" :-)
29.harima pehediliwa mage getaluwa nirakaranaya kala thanks
30.Good service but shortcomings due to some arrogant customer care
officers
31.good men
32Good
33.getaluwa hadunagenimata noheki wiya..
34.First of all its great to be treated as a privilege customer. Reason is simple.
I'm using X mobile connection and XTV, because dialog has the better
35.durakathanayata pilithuru denda epai eke hoda naraka kiyanna.
36.Cx need to add the CHU CHU TV which is a kids channel to the channel
list.Since this channel is available on another TV connection.Cx need this
channel to activate for XTV aswell.Please check on this and do the needfull.
Thank you
37.Customer service personal have to be trained better cause they can't think
out of the box.
38.bashawa wenaskaranna
13. 13
Sentiment Aggregates on Sample Comments
Fig 1: Heat Map of Sentiment rated sample comments
Fig 2: Sentiment Dimensions of sample comments
14. 14
A Novel Association Rule Mining Algorithm
Deriving likely word combinations (Keyword Selection)
• Initialize (at level L1) by determining the set of all Items {I} that meet the minimum support criteria
  • Determine support for all pairs of items {Ii,Ij} (i ≠ j) in {I}
  • Determine rules for all pairs of items of the form Ii->Ij
• At each subsequent level (Lp), p > 1
  • Determine item combinations that meet minimum support criteria
  • Items at subsequent stages selected from rules of previous stage that met min support criteria
  • Antecedent at subsequent level (Lp+1) is formed by merging the antecedent and consequent terms of the rules that meet the minimum support criteria at level Lp
• Stop when combined terms no longer meet min support criteria
• Selection Measures
\[ \mathrm{Support}(A \rightarrow B) = N(A \cup B) / N \]
\[ \mathrm{Confidence}(A \rightarrow B) = \mathrm{Support}(A \cup B) / \mathrm{Support}(A) = P(E_A \,\&\, E_B) / P(E_A) = P(E_B \mid E_A) \]
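The selection measures and the first level (L1) of the rule search can be illustrated with a short sketch. It assumes each comment is represented as a set of words; the function names and the toy comments are hypothetical, and the merging of antecedent and consequent at higher levels is not implemented here.

```python
def support(transactions, items):
    """Fraction of comments (word sets) that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, over support of antecedent."""
    s_a = support(transactions, antecedent)
    if s_a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / s_a


def pair_rules(transactions, min_support):
    """Level L1: frequent single items, then rules Ii -> Ij for frequent pairs."""
    items = sorted({w for t in transactions for w in t})
    frequent = [w for w in items if support(transactions, [w]) >= min_support]
    rules = []
    for a in frequent:
        for b in frequent:
            if a != b and support(transactions, [a, b]) >= min_support:
                rules.append((a, b, confidence(transactions, [a], [b])))
    return rules


# Hypothetical toy comments, already reduced to word sets
comments = [
    {"hotline", "delay", "call"},
    {"hotline", "delay"},
    {"channel", "activate"},
    {"hotline", "call"},
]
rules = pair_rules(comments, min_support=0.5)
for a, b, conf in rules:
    print(f"{a} -> {b}: confidence {conf:.2f}")
```

In this toy data 'call' and 'delay' each always appear with 'hotline', so the rules pointing to 'hotline' have the highest confidence, which is the kind of word relationship the slide proposes to use for keyword selection.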
15. 15
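Simplifying Assumptions of the Naïve Bayes Technique
Probability of a sequence of words {Xi} in a comment given class Cj:
\[ P(X_1, \ldots, X_N \mid C_j) = P(X_1, X_2, \ldots, X_N, C_j) / P(C_j) \]
\[ = P(X_1 \mid X_2, \ldots, X_N, C_j) \, P(X_2, X_3, \ldots, X_N, C_j) / P(C_j) \]
\[ = P(X_1 \mid X_2, \ldots, X_N, C_j) \cdots P(X_N \mid C_j) \, P(C_j) / P(C_j) \]
Under the assumption of conditional independence of word Xi given class Cj:
\[ P(X_i \mid X_{i+1}, \ldots, X_N, C_j) = P(X_i \mid C_j) \]
\[ P(X_1, X_2, \ldots, X_N \mid C_j) = P(X_1 \mid C_j) \, P(X_2 \mid C_j) \cdots P(X_N \mid C_j) \]
Probability of class C given a set of words X = {X1, X2, ..., XN}:
\[ \hat{C} = \arg\max_{C_j} P(C_j \mid X) = \arg\max_{C_j} \{ P(X \mid C_j) \, P(C_j) \} \]
\[ = \arg\max_{C_j} \{ P(X_1 \mid C_j) \, P(X_2 \mid C_j) \cdots P(X_N \mid C_j) \, P(C_j) \} \]
The factor P(X) is omitted from the maximization because it is the same for every class.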
16. 16
Classification via Naïve Bayes
Assumptions -
The words {Xi} in a comment are independent of each other, and of their order, given the class {Cj}
A class is determined solely by the specific words in a comment and their frequency of occurrence in that comment
Conditional Independence of the words in a comment given the class of the comment
a "bag of words model"
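A bag-of-words Naïve Bayes classifier along these lines can be sketched as follows. The sketch makes several assumptions not stated in the slides: it works with log probabilities to avoid underflow, uses add-one (Laplace) smoothing so that a word unseen in a class does not zero out the product, and ignores words absent from the training vocabulary. The class labels and training comments are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (list_of_words, class_label). Returns counts for prediction."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for words, label in samples:
        word_counts[label].update(words)
    vocab = {w for words, _ in samples for w in words}
    return class_counts, word_counts, vocab


def predict(model, words):
    """Pick the class Cj maximizing log P(Cj) + sum_i log P(Xi | Cj)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        # Add-one smoothing: every vocabulary word gets one extra count
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical pre-classified training comments
training = [
    (["hotline", "delay", "call"], "hotline"),
    (["channel", "activate", "tv"], "tv"),
    (["call", "waiting", "hotline"], "hotline"),
    (["tv", "channel", "list"], "tv"),
]
model = train(training)
print(predict(model, ["delay", "hotline"]))
print(predict(model, ["channel", "tv"]))
```

Word order plays no part in the score, which is the bag-of-words assumption stated above: only which words occur, and how often, affects the chosen class.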
17. 17
Performance of the Classification Algorithm
Accuracy greater than 75% on predicted classes
Accuracy greater than 90% on training samples
Performance will further increase with preprocessing and filtering
Single word comments don't convey meaningful category information
Use misclassified comments to "Retrain" algorithm
Key Words for classification via Association Rules
18. 18
Algorithm Implementation & Results
• Algorithm designed and built from first principles using the Matlab programming language
• Local Language Support by updating Dictionary with Sinhala and Tamil words conveying emotion
• 59,000 comments analyzed and Rated for Sentiment and Classified / Binned into six categories
• Improved Classification by word relationships (key words) derived from Association Rule Mining
• 3000 Training comments used with six classes for Training Model
• Fast implementation processing all comments in a few hours
• A Word vs. Frequency Analysis used to determine which new words to add to the Dictionary
• The Sentiment rating is a means to "prioritize" the handling of the sorted and binned comments
• Performance improvement by "re-classifying" misclassified comments and reusing them in Training
19. 19
Conclusion
• Pre Processing: improved performance by retaining only relevant words and word combinations for the classification and the business purpose of the analysis
• Spelling mistakes will cause problems as words will not match those in the dictionary
• Update Dictionary with new words and misspelled words
• Introduce limits on the minimum number of words that should be matched for a comment to be analyzed, for increased reliability
• Independent Term Matching doesn't necessarily capture the "meaning" of a comment
• Short comments can be analyzed to assess overall sentiment
• Rate the emotional content in a comment
• Algorithm can provide other segmentations by matching words specific to the purpose of routing
• Naïve Bayes gave good classification accuracy
• The severity of sentiment in the classified comment is used to prioritize comment handling
• Simple averaging of the attribute values to arrive at the combined effect of all matched words in a comment can also be considered and may give results that are not that far off from the assumption of Normality