Usage Rights

© All Rights Reserved

  • Data mining is the process of exploring and analysing data by automatic or semi-automatic means, with little or no human interaction, to discover meaningful patterns and rules. User opinions are growing exponentially, while human analysis is limited in both capacity and accuracy. With advanced computing technology, machines can be trained to take over this analysis at LOW COST.
  • The growth of social media and web users has increased the amount of valuable opinion-oriented data about hotels. This data can be used to: identify potential hotels to stay at by looking at their aspects; identify the best prospects (ASPECTS) and retain customers; predict which ASPECTS customers like and promote them accordingly; learn the parameters that influence trends in sales and margins; and identify customers' opinions.

Fypca5 Presentation Transcript

  • Mining Users’ Opinions on Hotels. Tey Jun Hong (U095074X)
  • Contents: Background; Formulating the problem; Data mining process; Techniques; Analysis
  • What is data mining? The extraction of patterns by automatic means, with little human interaction.
  • The Web: http://www
  • Users’ opinions on hotels: identify potential hotels; predict which ASPECTS customers like; sales and margins; sentiment analysis
  • Some limitations of machines: they cannot read like a human; they cannot detect sarcasm; sentiment is expressed differently across topics and domains; polarity analysis; distinguishing facts from opinions
  • Some examples of machine limitations: “The service is as good as none”: the negation is not obvious to a machine. “Swimming pool is big enough to swim with comfort” vs. “There is a big crowd at the counter complaining”: polarity can change with context (“big” is positive in the first sentence and negative in the second). “The room is warmer than the lobby”: comparisons are hard to classify.
  • Project
  • Sentiment Analysis Prediction of sentence polarity Classification of polarity for sentiment lexicon Detection of relations
  • Data Mining Process
  • Cleaning the “dirty” reviews. Frequent problems: data inconsistencies; duplicate data; spelling errors; whitespace to trim from the data; foreign accents and characters; singular/plural conversion; punctuation removal or replacement; noisy and incomplete data; misused naming conventions (the same name with different meanings)
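Several of the cleaning steps on this slide can be sketched in Python. This is a minimal illustration, assuming reviews arrive as plain strings; the function names are hypothetical, and the plural-to-singular rule is a deliberately crude heuristic (a real pipeline would use a lemmatizer and a spell checker, neither of which is shown).

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    # Decompose accented characters ("é" -> "e" + combining accent), then drop the accents.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_singular(word: str) -> str:
    # Rough plural -> singular heuristic (hypothetical); a lemmatizer would be used in practice.
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is", "ys")):
        return word[:-1]
    return word


def clean_review(text: str) -> str:
    text = strip_accents(text).lower()
    text = re.sub(r"['’]s\b", "", text)   # drop possessive 's
    text = re.sub(r"[^\w\s]", " ", text)   # replace remaining punctuation with spaces
    # split() also trims leading, trailing and repeated whitespace
    return " ".join(naive_singular(w) for w in text.split())


def clean_reviews(reviews):
    # Clean every review, then drop empty results and duplicates (keeping the first copy).
    seen, cleaned = set(), []
    for review in reviews:
        c = clean_review(review)
        if c and c not in seen:
            seen.add(c)
            cleaned.append(c)
    return cleaned
```

For example, `clean_review("  The Café's rooms were GREAT!! ")` returns `"the cafe room were great"`, and `clean_reviews(["Great pool!", "great   pool"])` keeps a single `"great pool"`.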
  • Data Preprocessing Part Of Speech Tags
  • Data Preprocessing Polarity tagging using sentiment lexicon Occurrence HIGH Sentiment Lexicon Tag The Word +VE BEST Part of Speech Tag ADJ
  • Findings Part of Speech Tagging (POS) using Brill Tagger - NO PROBLEM -95% accuracy of POS tagging words after data cleaning
  • Findings Polarity tagging using sentiment lexicon – BIG PROBLEM -40% sentiment words not found in sentiment lexicon -10% sentiment words with a positive or negative polarity found are in the neutral section of sentiment lexicon
  • Problems Sentiment lexicon not comprehensive Domain Independent Sentiment Words Domain Dependent Sentiment Words
  • Solutions Rule Based Mining Relation Based Mining
  • Rule-Based Mining
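The slides do not list the mining rules, so the following is only an illustrative assumption, not the author's method. One well-known heuristic, the conjunction rule described by Hatzivassiloglou and McKeown (1997), says that adjectives joined by "and" usually share polarity and adjectives joined by "but" usually have opposite polarity. The sketch uses it to expand a seed lexicon until no new words are labelled; the function name and tag names are hypothetical.

```python
FLIP = {"+VE": "-VE", "-VE": "+VE"}


def expand_by_conjunction(sentences, seed):
    """sentences: lists of (word, pos) pairs; seed: word -> "+VE"/"-VE".
    Returns a new dict with polarities inferred for previously unknown adjectives."""
    polarity = dict(seed)
    changed = True
    while changed:  # repeat so that labels can propagate along chains of conjunctions
        changed = False
        for sent in sentences:
            for (w1, p1), (conj, _), (w2, p2) in zip(sent, sent[1:], sent[2:]):
                conj = conj.lower()
                if p1 != "ADJ" or p2 != "ADJ" or conj not in ("and", "but"):
                    continue
                a, b = w1.lower(), w2.lower()
                for known, unknown in ((a, b), (b, a)):
                    if polarity.get(known) in FLIP and unknown not in polarity:
                        # "and" keeps the polarity, "but" flips it
                        polarity[unknown] = polarity[known] if conj == "and" else FLIP[polarity[known]]
                        changed = True
    return polarity
```

Given "clean and spacious" and "friendly but slow" with a seed of `{"clean": "+VE", "friendly": "+VE"}`, the rule labels "spacious" as +VE and "slow" as -VE. The "but" rule is only a tendency, so mined words would still need validation.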
  • Relation-Based Mining
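Relation-based mining is not described on the slides. As a hypothetical stand-in, the sketch below pairs each adjective with the closest noun in the same sentence, producing the kind of aspect and sentiment-word pairs that the validation slides count. Word proximity is only a crude proxy for a grammatical relation; a dependency parser would give more reliable pairs.

```python
def aspect_sentiment_pairs(sentence):
    """sentence: list of (word, pos) pairs.
    Pairs each ADJ with the closest NOUN (ties go to the earlier noun)."""
    nouns = [i for i, (_, pos) in enumerate(sentence) if pos == "NOUN"]
    pairs = []
    for i, (word, pos) in enumerate(sentence):
        if pos != "ADJ" or not nouns:
            continue
        j = min(nouns, key=lambda n: (abs(n - i), n))
        pairs.append((sentence[j][0].lower(), word.lower()))  # (aspect, sentiment word)
    return pairs
```

For "The pool is big and the staff are friendly" (tagged), this yields `[("pool", "big"), ("staff", "friendly")]`.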
  • Analysis (Bayesian): to determine the polarity of sentiments, apply Bayes' theorem, P(X | Y) = P(X) P(Y | X) / P(Y). The probability that a sentiment is positive or negative, given its content, is P(sentiment | sentence) = P(sentiment) P(sentence | sentiment) / P(sentence).
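A minimal naive Bayes classifier illustrates this formula. It adds an assumption the slide does not state: words in a sentence are treated as conditionally independent given the sentiment, so P(sentence | sentiment) becomes a product of per-word probabilities. Because P(sentence) is the same for every sentiment, it can be dropped when choosing the most probable label. Add-one smoothing, log probabilities, and the function names are standard but hypothetical choices, not details from the project.

```python
import math
from collections import Counter, defaultdict


def train(labelled):
    """labelled: list of (tokens, label). Returns class counts, per-class word counts, vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model):
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_docs)  # log P(sentiment), estimated from class frequencies
        total = sum(word_counts[label].values())
        for t in tokens:
            # log P(word | sentiment) with add-one smoothing so unseen words do not zero the product
            score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        # P(sentence) is identical for every label, so it is omitted from the comparison
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Trained on four tiny sentences such as "great clean room" (pos) and "dirty room" (neg), the model labels `["clean", "friendly"]` as pos and `["dirty", "rude"]` as neg.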
  • Validation: Precision = N(agree & found) / N(found). High precision means most of the sentiment words found by the system are correct. Recall = N(agree & found) / N(agree). High recall means most of the correct (agreed) sentiment words are found by the system.
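Both measures follow directly from the formulas on this slide if the system's output and the annotators' agreed answers are treated as sets. The function name and the sample aspect and sentiment-word pairs are hypothetical.

```python
def precision_recall(found, agreed):
    """found: items returned by the system; agreed: items the annotators agree are correct."""
    found_set, agreed_set = set(found), set(agreed)
    hits = len(found_set & agreed_set)  # N(agree & found)
    precision = hits / len(found_set) if found_set else 0.0  # divided by N(found)
    recall = hits / len(agreed_set) if agreed_set else 0.0   # divided by N(agree)
    return precision, recall


found = {("room", "clean"), ("pool", "big"), ("staff", "slow")}
agreed = {("room", "clean"), ("pool", "big"), ("staff", "friendly"), ("bed", "comfortable")}
```

Here two of the three found pairs are correct and two of the four agreed pairs are found, so precision is 2/3 and recall is 0.5.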
  • Validation Results
  • Validation Results It is found that out of the 350 aspect-unlabelled sentiment word pairs, 294 are founded by the methods. Thus, the precision is about 84%. The recall : 276 words are corrected labelled by the system, which is about 78%
  • Application Reviews Rating Aspect Rating Summary of reviews