The document discusses mining user opinions from hotel reviews through sentiment analysis and data mining techniques. It describes how sentiment analysis can identify the aspects of hotels that customers like or dislike in order to improve sales and margins, and it illustrates some limitations of machines in sentiment analysis with examples. The document then outlines the data mining process used, including data cleaning and preprocessing with part-of-speech tagging and sentiment lexicon tagging. It finds gaps in sentiment lexicon coverage and proposes rule-based and relation-based mining as solutions. Validation results show 84% precision and 78% recall for the sentiment analysis techniques.
5. Users’ Opinions on Hotels
Identify Potential Hotels
Predict what ASPECTS customers like
Sales and Margin
Sentiment Analysis
6. Some Limitations of machines
Unable to read like a human
Cannot detect sarcasm
Sentiments are expressed differently across topics and domains
Polarity analysis
Facts vs. Opinions
7. Some machine limitation examples
“The service is as good as none”. Negation is not obvious to a machine.
“Swimming pool is big enough to swim with comfort”, “There is a big crowd at the counter complaining”. Polarity might change with context.
“The room is warmer than the lobby”. Comparisons are hard to classify.
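These failure modes can be reproduced with a naive word-counting scorer. The following is a minimal sketch, assuming a tiny hand-made lexicon; `TOY_LEXICON` and `naive_polarity` are hypothetical names for illustration and do not come from the slides.

```python
import re

# Hypothetical toy lexicon for illustration only; not the lexicon used in the slides.
TOY_LEXICON = {"good": 1, "comfort": 1, "big": 1, "complaining": -1}


def naive_polarity(sentence: str) -> int:
    """Sum the lexicon scores of the words in a sentence, ignoring context."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(TOY_LEXICON.get(word, 0) for word in words)


examples = [
    "The service is as good as none",
    "Swimming pool is big enough to swim with comfort",
    "There is a big crowd at the counter complaining",
    "The room is warmer than the lobby",
]
scores = {text: naive_polarity(text) for text in examples}
for text, score in scores.items():
    print(f"{score:+d}  {text}")
```

Under these assumptions the first sentence scores positive although a human reads it as negative, "big" is counted as positive for both the pool and the crowd, and the comparison receives no score at all.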
11. Cleaning The “Dirty” Reviews
Frequent problem : Data inconsistencies
Duplicate data
Spelling errors should not simply be trimmed from the data
Foreign accents and characters
Singular / Plural conversion
Punctuation removal / replacement
Noise and incomplete data
Naming conventions misused: same name but different meaning
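Several of the cleaning steps listed above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the pipeline used in the slides: the function names are hypothetical, the plural rule is deliberately crude, and spelling correction and naming-convention checks are left out.

```python
import re
import unicodedata


def clean_review(text: str) -> str:
    """Normalize one review: strip accents, lowercase, replace punctuation, collapse spaces."""
    # Decompose accented characters and drop the combining marks (e.g. "café" -> "cafe").
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces instead of deleting it, so words stay separated.
    no_punct = re.sub(r"[^\w\s]", " ", ascii_text.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def singularize(word: str) -> str:
    """Very rough plural-to-singular rule; a real system would need a proper lemmatizer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def clean_corpus(reviews: list[str]) -> list[str]:
    """Clean every review and drop exact duplicates while keeping first-seen order."""
    seen = set()
    cleaned = []
    for review in reviews:
        normalized = " ".join(singularize(w) for w in clean_review(review).split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


reviews = ["The rooms were GREAT!!", "the room were great", "The café was cosy..."]
print(clean_corpus(reviews))
```

Punctuation is replaced by spaces rather than deleted so that inputs such as "clean,quiet" do not merge into a single token.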
13. Data Preprocessing
Polarity tagging using sentiment lexicon
Example word: BEST
Occurrence: HIGH
Sentiment lexicon tag: +VE
Part-of-speech tag: ADJ
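The per-word record described on this slide (occurrence, lexicon polarity, part-of-speech tag) could be built as follows. This is a sketch only: `SENTIMENT_LEXICON` and `POS_LOOKUP` are hypothetical toy tables standing in for a full sentiment lexicon and a trained tagger such as the Brill tagger mentioned in the findings.

```python
from collections import Counter

# Hypothetical toy resources; a real pipeline would load a full sentiment lexicon
# and use a trained part-of-speech tagger instead of a lookup table.
SENTIMENT_LEXICON = {"best": "+VE", "clean": "+VE", "dirty": "-VE", "noisy": "-VE"}
POS_LOOKUP = {"best": "ADJ", "clean": "ADJ", "dirty": "ADJ", "noisy": "ADJ",
              "hotel": "NOUN", "room": "NOUN"}


def tag_words(tokens):
    """Return one record per distinct word: occurrence count, polarity tag, POS tag."""
    counts = Counter(tokens)
    return {
        word: {
            "occurrence": count,
            "polarity": SENTIMENT_LEXICON.get(word, "UNKNOWN"),
            "pos": POS_LOOKUP.get(word, "UNKNOWN"),
        }
        for word, count in counts.items()
    }


tokens = "best hotel best room clean room noisy".split()
tags = tag_words(tokens)
print(tags["best"])
```

Words missing from the lexicon are tagged `UNKNOWN` here rather than neutral, so that gaps in lexicon coverage stay visible.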
14. Findings
Part-of-speech (POS) tagging using Brill Tagger: NO PROBLEM
- 95% accuracy when POS-tagging words after data cleaning
15. Findings
Polarity tagging using sentiment lexicon: BIG PROBLEM
- 40% of sentiment words are not found in the sentiment lexicon
- 10% of sentiment words with a positive or negative polarity are found in the neutral section of the sentiment lexicon
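Coverage figures of this kind can be measured by comparing human-annotated sentiment words against the lexicon. The sketch below assumes a lexicon that maps each word to "positive", "negative" or "neutral"; the data and the function name `lexicon_coverage` are hypothetical and are not the data behind the slide's percentages.

```python
def lexicon_coverage(sentiment_words, lexicon):
    """Return the share of words missing from the lexicon and the share listed as neutral."""
    total = len(sentiment_words)
    if total == 0:
        return {"missing": 0.0, "neutral": 0.0}
    missing = sum(1 for w in sentiment_words if w not in lexicon)
    neutral = sum(1 for w in sentiment_words if lexicon.get(w) == "neutral")
    return {"missing": missing / total, "neutral": neutral / total}


# Hypothetical data: words a human annotator marked as carrying sentiment,
# and a toy lexicon mapping word -> "positive" / "negative" / "neutral".
annotated = ["best", "spacious", "cramped", "warm", "dirty"]
toy_lexicon = {"best": "positive", "dirty": "negative", "warm": "neutral"}

report = lexicon_coverage(annotated, toy_lexicon)
print(report)
```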
16. Problems
Sentiment lexicon not comprehensive
Domain Independent Sentiment Words
Domain Dependent Sentiment Words
20. Analysis - Bayesian
To determine the polarity of sentiments
P(X | Y) = P(X) P(Y | X) / P(Y)
Probability that a sentiment is positive or negative, given the contents of the sentence:
P(sentiment | sentence) = P(sentiment) P(sentence | sentiment) / P(sentence)
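One common way to apply this rule is a naive Bayes classifier. The sketch below is an assumption, not the slides' implementation: it treats the words of a sentence as independent given the sentiment, uses add-one smoothing, and drops P(sentence) because it is the same for every candidate sentiment. The training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labelled_sentences):
    """Count label frequencies and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_sentences:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(text, label_counts, word_counts, vocabulary):
    """Pick the label maximizing log P(sentiment) + sum of log P(word | sentiment)."""
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # prior P(sentiment)
        label_total = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the product.
            likelihood = (word_counts[label][word] + 1) / (label_total + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, invented for this example.
training = [
    ("the room is clean and spacious", "positive"),
    ("friendly staff and great breakfast", "positive"),
    ("the room is dirty and noisy", "negative"),
    ("rude staff and cold breakfast", "negative"),
]
model = train(training)
print(classify("clean room and friendly staff", *model))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.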
21. Validation
• Precision = N (agree & found) / N (found)
• High precision means most of the found sentiment words are correctly labeled by the system
• Recall = N (agree & found) / N (agree)
• High recall means most of the correct sentiment words are found by the system
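The two ratios can be computed directly from sets. This is a minimal sketch with hypothetical toy sets, not the validation data from the slides: `found` holds the words the system extracted and `agreed` holds the words the human annotators agree on.

```python
def precision_recall(found, agreed):
    """Compute precision and recall from two sets of sentiment words.

    found  -- words the system labelled as sentiment words
    agreed -- words the human annotators agree are sentiment words
    """
    hits = len(found & agreed)  # N(agree & found)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(agreed) if agreed else 0.0
    return precision, recall


# Hypothetical toy sets, not the validation data from the slides.
found = {"clean", "spacious", "noisy", "lobby"}
agreed = {"clean", "spacious", "noisy", "rude", "cramped"}

precision, recall = precision_recall(found, agreed)
print(f"precision={precision:.2f} recall={recall:.2f}")
```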
23. Validation Results
Out of the 350 aspect-unlabelled sentiment word pairs, 294 are found by the methods. Thus, the precision is about 84%.
Recall: 276 words are correctly labelled by the system, which is about 78%.
Process of exploration and analysis
By automatic / semi-automatic means
With little or no human interaction
To discover meaningful patterns and rules
Exponential growth of users’ opinions
Limitations of human analysis
Accuracy of human analysis
Machines can be trained to take over human analysis with advanced computer technology, and it is done at LOW COST
Increase in social media and web users
Increase in valuable opinion-oriented data on hotels due to web expansion
Identify potential hotels to stay at by looking at the aspects
Identify best prospects (ASPECTS), and retain customers
Predict what ASPECTS customers like and promote accordingly
Learn parameters influencing trends in sales and margins
Identification of opinions for customers