Fyp ca2

  1. 1. MINING USERS’ OPINIONS ON HOTELS
  2. 2. BRIEF RECAP ON CA1
  3. 3. Literature Review / Background: The Web is a huge database of opinions on hotels. Commercial possibilities / business intelligence. “What others think” is an important element in decision making. Opinion mining / sentiment analysis.
  4. 4. Far From a Solved Problem: It is impossible for a human to read every single opinion, but machines can be trained to do this. People often express more than one opinion at once. Use of sarcasm and negation. Sentiment is expressed differently across topics and domains, e.g. “big” is positive when the swimming pool is big enough to swim in, but negative when it describes a long queue.
  5. 5. How to train a machine to analyze sentiments: Natural Language Processing (NLP): transforms an opinion into a format the machine understands. Artificial Intelligence: the machine uses the information given by NLP, and a lot of math, to analyze sentiments; the aim is for the machine to tell facts from opinions the way a human reader would.
  6. 6. Problems for the Machine: Subjectivity and sentiment. Analyzing polarity. Opinion rating. Sentiment intensity. Different domain / topic contexts. Facts vs. opinions.
  7. 7. Examples of ambiguity to a machine: “The swimming pool is better than the tennis court”: comparisons are hard to classify. “This hotel is very boleh lah”: use of slang and cultural expressions. “This breakfast is as good as none”: the negativity is not obvious to a machine. “The weather is hot”: the statement has different polarity in different contexts.
  8. 8. WHAT IS DONE IN CA1
  9. 9. EXTRACTION – Preparing machine to analyze data
  10. 10. Review and aspect extraction process: Extract the relevant data from review websites. Apply word handling to refine the datasets. Use part-of-speech tagging to label the text and extract aspects, which are nouns. Determine the aspects / features that people are concerned about in these reviews by occurrence and context.
  11. 11. Part-of-Speech Tagging: Assigning a label to every word in the text so that the machine can work with it, for example by picking out the nouns as aspects.
  12. 12. Word Handling: Dictionary / spelling correction. Slang check. Foreign-language check. Singular / plural conversion. Duplicate check.
  13. 13. END OF CA1
  14. 14. CA2 : Data Processing
  15. 15. Classifying sentiments using some existing methods: Naïve Bayes: determines the polarity of sentiments. Maximum Entropy: uses probability distributions on the basis of partial knowledge. Support Vector Machine: analyzes patterns and classifies sentiments.
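  16. 16. Naïve Bayes Classifier: Determines the polarity of sentiments. Bayes' rule: P(X | Y) = P(X) P(Y | X) / P(Y). Applied to reviews: P(sentiment | sentence) = P(sentiment) P(sentence | sentiment) / P(sentence), where P(sentiment | sentence) is the probability that a sentence is positive or negative given its contents, and P(sentence | sentiment) is built from the probability of each word occurring given a positive or negative sentiment. Assumption: there is no link between words, i.e. each word is treated as independent of the others given the sentiment.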
  17. 17. Problems with Naïve Bayes: The polarity assigned to a word does not change with the domain. Words within a sentiment are treated as having no relationship with each other. Words not found in the lexicon might be missed, making the polarity inaccurate. There is no opinion rating to determine which sentiment is more polar.
  18. 18. Solutions to the problems with Naïve Bayes: Establish domain-sentiment relations. Establish domain-aspect relations. Establish aspect-sentiment relations. Estimate polarity for unseeded sentiments. Estimate the strength of polarity of sentiments.
  19. 19. Establishing relations: Establish domains by categorizing the aspects found into domains such as food, location and security. Find occurrences of aspects / sentiments within sentences for a particular domain. Find the polarity of sentences, aspects and sentiments, and establish relations between them. (Diagram labels: Domain, Sentiments, Aspects.)
  20. 20. Finding polarity for unseeded sentiments: After establishing relations, we have a graph of nodes (sentiments / aspects). Some nodes have no polarity after Naïve Bayes, but the nodes connected to them might. Determine the probability that a node is positive or negative given its surrounding nodes.
  21. 21. Estimating the strength of polarity: Determine the strength of the polarity of an unseeded node from the amount of traversal needed to reach it from the surrounding nodes that have polarity. Find the shortest path to each unseeded node; together these paths form a spanning tree. The cost of the path determines the strength of polarity.
  22. 22. Implementation Using Dijkstra Algorithm to find the spanning tree
  23. 23. Implementation Find the cost to get from surrounding nodes to an unseed node
  24. 24. END OF CA2
  25. 25. What is going to happen in CA3?
  26. 26. Prototyping: Refine parameters to come up with a prototype, mainly to solve the following problems: analyzing polarity; opinion rating; sentiment intensity; different domain / topic contexts. Manually analyze reviews myself, check the prototype's effectiveness against them, and seek to improve accuracy.
  27. 27. Prototype testing: Enlarge the dataset with reviews from various hotel review sites. Merge results to find correlations between sentiment expressions on different sites. Test on a different domain, such as food, to get domain-dependent results.
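
Sketch for slides 10 and 11 (aspect extraction): the slides describe tagging every word with its part of speech and keeping the nouns that occur often as aspects. A minimal sketch of that idea follows. It assumes the reviews have already been tagged; the sample sentences, the Penn Treebank style tags, the function name `extract_aspects` and the `min_count` threshold are all illustrative and do not come from the slides.

```python
from collections import Counter

# Hypothetical reviews that are already POS-tagged as (word, tag) pairs.
# Tags follow the Penn Treebank convention: NN/NNS are nouns, JJ is an adjective.
TAGGED_REVIEWS = [
    [("the", "DT"), ("pool", "NN"), ("was", "VBD"), ("clean", "JJ")],
    [("great", "JJ"), ("breakfast", "NN"), ("and", "CC"),
     ("friendly", "JJ"), ("staff", "NN")],
    [("the", "DT"), ("pool", "NN"), ("is", "VBZ"), ("small", "JJ")],
    [("breakfast", "NN"), ("was", "VBD"), ("cold", "JJ")],
    [("rooms", "NNS"), ("near", "IN"), ("the", "DT"), ("pool", "NN")],
]


def extract_aspects(tagged_sentences, min_count=2):
    """Return (noun, count) pairs for nouns seen at least min_count times."""
    counts = Counter(
        word.lower()
        for sentence in tagged_sentences
        for word, tag in sentence
        if tag.startswith("NN")  # keep nouns only
    )
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


aspects = extract_aspects(TAGGED_REVIEWS)
print(aspects)
```

Here frequency alone decides what counts as an aspect; the slides also mention context, which this sketch does not model.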
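
Sketch for slide 16 (Naïve Bayes): the slide gives P(sentiment | sentence) = P(sentiment) P(sentence | sentiment) / P(sentence) and assumes the words are independent. A minimal sketch under that assumption follows. The four training sentences and the names `train` and `classify` are made up for illustration. Add-one (Laplace) smoothing is an addition not mentioned in the slides, used so that an unseen word does not force a probability of zero. P(sentence) is left out because it is the same for both classes and does not change which one wins.

```python
import math
from collections import Counter

# Hypothetical labelled training sentences (not from the slides).
TRAINING = [
    ("clean room friendly staff", "pos"),
    ("great pool great breakfast", "pos"),
    ("dirty room rude staff", "neg"),
    ("long queue cold breakfast", "neg"),
]


def train(examples):
    """Count sentences per class and words per class."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        words = text.split()
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return (best_label, log_scores) for a whitespace-tokenised sentence."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        # log P(sentiment)
        log_p = math.log(n / total)
        # Denominator for add-one smoothing (an assumption of this sketch).
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            # log P(word | sentiment), words treated as independent
            log_p += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = log_p
    return max(scores, key=scores.get), scores


model = train(TRAINING)
print(classify("friendly staff great pool", model)[0])
print(classify("rude staff long queue", model)[0])
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer reviews.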
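
Sketch for slides 20 to 23 (unseeded polarity and its strength): the slides describe a graph of sentiment / aspect nodes, some with polarity from Naïve Bayes (seeded) and some without, and use Dijkstra's algorithm to find the cost of reaching an unseeded node. A minimal sketch follows. The graph, its edge weights and the seed polarities are invented for illustration. The scoring rule, in which each seed contributes its sign divided by (1 + path cost), is an assumption of this sketch; the slides do not say how cost is turned into strength.

```python
import heapq

# Hypothetical undirected graph: node -> {neighbour: edge cost}.
GRAPH = {
    "clean": {"room": 1.0},
    "room": {"clean": 1.0, "cosy": 1.0, "noisy": 2.0},
    "cosy": {"room": 1.0},
    "noisy": {"room": 2.0, "street": 1.0},
    "street": {"noisy": 1.0},
}

# Seeded nodes: polarity already known (+1 positive, -1 negative).
SEEDS = {"clean": +1, "noisy": -1}


def dijkstra(graph, source):
    """Return the cheapest path cost from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist


def estimate_polarity(graph, seeds):
    """Score each unseeded node; the sign is polarity, the size is strength."""
    scores = {node: 0.0 for node in graph if node not in seeds}
    for seed, sign in seeds.items():
        for node, cost in dijkstra(graph, seed).items():
            if node in scores:
                # Assumed rule: a seed's influence decays with path cost.
                scores[node] += sign / (1.0 + cost)
    return scores


scores = estimate_polarity(GRAPH, SEEDS)
for node, score in sorted(scores.items()):
    print(node, round(score, 3))
```

With this rule a node close to a positive seed and far from a negative one gets a positive score, and nodes further from every seed get scores nearer zero, which matches the slides' idea that more traversal means weaker polarity.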
