Fypca5

  • Process of exploration and analysis, by automatic or semi-automatic means, with little or no human interaction, to discover meaningful patterns and rules. Exponential growth of users' opinions. Limitations of human analysis. Accuracy of human analysis. Machines can be trained to take over human analysis with advanced computer technology, and this is done at LOW COST.
  • Increase in social media and web users. Increase in valuable opinion-oriented data on hotels due to web expansion. Identify potential hotels to stay at by looking at the aspects. Identify the best prospects (ASPECTS) and retain customers. Predict what ASPECTS customers like and promote accordingly. Learn the parameters influencing trends in sales and margins. Identification of opinions for customers.

    1. 1. Mining User’s Opinions in Hotel TEY JUN HONG U095074X
    2. 2. Content: Background; Formulating the problem; Data Mining Process; Techniques; Analysis
    3. 3. What is Data Mining? Extraction of patterns; automatic means; little human interaction
    4. 4. The Web http://www
    5. 5. User’s Opinions in Hotel: Identify potential hotels; predict what ASPECTS customers like; sales and margin; sentiment analysis
    6. 6. Some Limitations of machines: Unable to read like a human; cannot detect sarcasm; expression of sentiments in different topics and domains; polarity analysis; facts vs. opinion
    7. 7. Some machine limitation examples: “The service is as good as none”: negation not obvious to machine. “Swimming pool is big enough to swim with comfort”, “There is a big crowd at the counter complaining”: polarity might change with context. “The room is warmer than the lobby”: comparisons are hard to classify.
    8. 8. Project
    9. 9. Sentiment Analysis: Prediction of sentence polarity; classification of polarity for sentiment lexicon; detection of relations
    10. 10. Data Mining Process
    11. 11. Cleaning The “Dirty” Reviews. Frequent problem: data inconsistencies. Duplicate data; spelling errors != trim from data; foreign accents and characters; singular / plural conversion; punctuation removal / replacement; noise and incomplete data; naming conventions misused (same name but different meaning)
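A few of the cleaning steps on slide 11 (trimming, accent stripping, punctuation replacement, duplicate removal) might be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function names `clean_review` and `clean_reviews` are hypothetical, and spelling correction and singular/plural conversion are left out.

```python
import re
import unicodedata

def clean_review(text: str) -> str:
    """Normalize one raw review string (hypothetical helper)."""
    # Strip accents: decompose characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then replace punctuation with spaces.
    no_punct = re.sub(r"[^\w\s]", " ", plain.lower())
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", no_punct).strip()

def clean_reviews(reviews):
    """Clean every review, dropping empty and duplicate entries (order kept)."""
    seen = set()
    cleaned = []
    for raw in reviews:
        text = clean_review(raw)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Duplicates are detected after normalization, so two reviews that differ only in case or punctuation count as the same review.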
    12. 12. Data Preprocessing: Part-of-Speech Tags
    13. 13. Data Preprocessing: Polarity tagging using sentiment lexicon. Example: the word BEST; occurrence HIGH; sentiment lexicon tag +VE; part-of-speech tag ADJ
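Polarity tagging against a sentiment lexicon, as on slide 13, can be sketched as a dictionary lookup. The tiny `LEXICON` below and the `UNKNOWN` tag are assumptions made for illustration; the deck does not say which lexicon was used or how missing words were marked.

```python
# Hypothetical miniature lexicon; a real one would hold thousands of entries.
LEXICON = {
    "best": "+VE",
    "good": "+VE",
    "clean": "+VE",
    "dirty": "-VE",
    "rude": "-VE",
}

def tag_polarity(tokens, lexicon=LEXICON):
    """Return (token, tag) pairs; words missing from the lexicon get 'UNKNOWN'."""
    return [(tok, lexicon.get(tok.lower(), "UNKNOWN")) for tok in tokens]

def coverage(tagged):
    """Fraction of tokens that the lexicon was able to tag."""
    if not tagged:
        return 0.0
    known = sum(1 for _, tag in tagged if tag != "UNKNOWN")
    return known / len(tagged)
```

Tracking the share of untagged words in this way is one simple way to measure how comprehensive a lexicon is for a given domain.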
    14. 14. Findings: Part of Speech (POS) tagging using Brill Tagger: NO PROBLEM. 95% accuracy of POS tagging of words after data cleaning
    15. 15. Findings: Polarity tagging using sentiment lexicon: BIG PROBLEM. 40% of sentiment words are not found in the sentiment lexicon. 10% of sentiment words with a positive or negative polarity are found in the neutral section of the sentiment lexicon
    16. 16. Problems: Sentiment lexicon not comprehensive; domain-independent sentiment words; domain-dependent sentiment words
    17. 17. Solutions: Rule Based Mining; Relation Based Mining
    18. 18. Rule Based Mining
    19. 19. Relation Based Mining
    20. 20. Analysis - Bayesian. To determine polarity of sentiments: P(X | Y) = P(X) P(Y | X) / P(Y). Probability that a sentiment is positive or negative, given its contents: P(sentiment | sentence) = P(sentiment) P(sentence | sentiment) / P(sentence)
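One common way to apply the Bayes rule on slide 20 is a naive Bayes classifier. The sketch below is an assumption-laden illustration, not the deck's implementation: it treats words as independent given the sentiment, uses add-one smoothing, and the class name `NaiveBayesSentiment` is hypothetical. P(sentence) is omitted because it is the same for every sentiment and does not change which one scores highest.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSentiment:
    """Toy naive Bayes sentence classifier (word-independence assumption)."""

    def __init__(self):
        self.class_counts = Counter()            # sentences seen per sentiment
        self.word_counts = defaultdict(Counter)  # word frequencies per sentiment
        self.vocab = set()

    def train(self, labelled_sentences):
        for sentence, label in labelled_sentences:
            self.class_counts[label] += 1
            for word in sentence.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(sentiment)
            score = math.log(count / total)
            n_words = sum(self.word_counts[label].values())
            for word in sentence.lower().split():
                # log P(word | sentiment), with add-one smoothing
                numerator = self.word_counts[label][word] + 1
                score += math.log(numerator / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.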
    21. 21. Validation: Precision = N (agree & found) / N (found). High precision means most of the sentiment words found by the system are correctly labelled. Recall = N (agree & found) / N (agree). High recall means most of the correct sentiment words are found by the system.
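The two ratios on slide 21 can be computed directly from sets. In this sketch, `found` is taken to be the items the system produced and `agreed` the items human annotators agree are correct; that reading of "agree" is an assumption, and the function name is hypothetical.

```python
def precision_recall(found, agreed):
    """Return (precision, recall) for system output against agreed labels."""
    found, agreed = set(found), set(agreed)
    hits = len(found & agreed)  # N(agree & found)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(agreed) if agreed else 0.0
    return precision, recall
```

For example, if the system finds four words and two of them are among three agreed words, precision is 2/4 and recall is 2/3.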
    22. 22. Validation Results
    23. 23. Validation Results: It is found that out of the 350 aspect-unlabelled sentiment word pairs, 294 are found by the methods. Thus, the precision is about 84%. The recall: 276 words are correctly labelled by the system, which is about 78%.
    24. 24. Application: Reviews rating; aspect rating; summary of reviews
