Fyp ca2

Transcript

  • 1. MINING USER’S OPINIONS ON HOTELS
  • 2. BRIEF RECAP ON CA1
  • 3. Literature Review / Background
      • The Web is a huge database of opinions on hotels
      • Commercial possibilities / business intelligence
      • “What others think” is an important element in decision making
      • Opinion mining / sentiment analysis
  • 4. Far From a Solved Problem
      • It is impossible for a human to read every single opinion
          • Machines can be trained to do this
      • People often express more than one opinion at a time
      • Use of sarcasm and negation
      • Sentiments are expressed differently across topics and domains
          • e.g. “big”: positive when the swimming pool is big enough to swim in, negative when the queue is long
  • 5. How to train a machine to analyze sentiments
      • Natural Language Processing (NLP)
          • Transforms opinions into a format the machine understands
      • Artificial Intelligence
          • Machines use the information produced by NLP, together with a lot of math, to analyze sentiments
          • The machine should tell facts from opinions the way a human reader would
  • 6. Problems of Machine Subjectivity and Sentiment
      • Analyzing polarity
      • Opinion rating
      • Sentiment intensity
      • Different domains / topic context
      • Facts vs. opinions
  • 7. Ambiguity to machine: examples
      • “The swimming pool is better than the tennis court”
          • Comparisons are hard to classify
      • “This hotel is very boleh lah”
          • Use of slang and cultural expressions
      • “This breakfast is as good as none”
          • The negativity is not obvious to a machine
      • “The weather is hot”
          • The same statement has a different polarity in different contexts
  • 8. WHAT IS DONE IN CA1
  • 9. EXTRACTION – Preparing machine to analyze data
  • 10. Review and aspects extraction process
      • Extract important datasets from review websites
      • Apply word handling to refine the datasets
      • Use part-of-speech tagging to label the text and extract aspects, which are nouns
      • Determine the aspects / features that people are concerned about in these reviews by occurrence and context
  • 11. Part of Speech Tagging
      • Assigning a label to every word in the text so that the machine can work with it, e.g. pick out the nouns as candidate aspects
  • 12. Word Handling
      • Dictionary / spelling correction
      • Slang check
      • Foreign language check
      • Singular / plural conversion
      • Duplicate check
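The extraction steps on slides 10 to 12 could be sketched roughly as follows. This is only an illustration: the slang table, the noun list standing in for a real part-of-speech tagger, and the function names are all hypothetical and do not come from the slides. Spelling correction and the foreign language check are left out.

```python
import re
from collections import Counter

# Hypothetical toy lexicons. A real system would use a trained
# part-of-speech tagger and proper dictionaries instead.
SLANG = {"gr8": "great", "nite": "night"}
NOUNS = {"pool", "room", "breakfast", "staff", "queue"}


def normalize(review):
    """Lower-case, tokenize, expand slang, and convert simple plurals."""
    tokens = re.findall(r"[a-z0-9']+", review.lower())
    tokens = [SLANG.get(t, t) for t in tokens]           # slang check
    result = []
    for t in tokens:
        # Naive singular / plural conversion: drop a trailing "s"
        # only when the stem is a known noun.
        if t.endswith("s") and t[:-1] in NOUNS:
            t = t[:-1]
        result.append(t)
    return result


def extract_aspects(reviews):
    """Count noun occurrences (candidate aspects) across unique reviews."""
    unique_reviews = list(dict.fromkeys(reviews))         # duplicate check
    counts = Counter()
    for review in unique_reviews:
        counts.update(t for t in normalize(review) if t in NOUNS)
    return counts


# Made-up reviews, for illustration only.
reviews = [
    "The pool was gr8 and the rooms were clean",
    "The pool was gr8 and the rooms were clean",   # exact duplicate
    "Breakfast queue too long, pool ok",
]
aspect_counts = extract_aspects(reviews)
```

Ranking `aspect_counts` by frequency gives a first approximation of the aspects people mention most. Using context as well, as slide 10 suggests, would need more than simple counting.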
  • 13. END OF CA1
  • 14. CA2 : Data Processing
  • 15. Classifying Sentiments using some existing methods
      • Naïve Bayes
          • Determines the polarity of sentiments
      • Maximum Entropy
          • Uses probability distributions on the basis of partial knowledge
      • Support Vector Machine
          • Analyzes patterns and classifies sentiments
  • 16. Naïve Bayes Classifier
      • Determines the polarity of sentiments
      • P(X | Y) = P(X)P(Y | X) / P(Y)
      • Probability that a sentiment is positive or negative, given its contents
      • Probability of a word occurring, given a positive or negative sentiment
      • Assumption: there is no link between words
      • P(sentiment | sentence) = P(sentiment)P(sentence | sentiment) / P(sentence)
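A minimal sketch of the classifier on slide 16, assuming whitespace tokenization and add-one (Laplace) smoothing. Neither is stated on the slides, and the training sentences are made up for illustration. P(sentence) is dropped because it is the same for every label and does not affect which label wins.

```python
import math
from collections import Counter, defaultdict


def train(labelled_sentences):
    """Collect label counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in labelled_sentences:
        words = sentence.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def classify(sentence, model):
    """Return (best_label, log_scores) using P(label) * product of P(word | label)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)                       # log P(sentiment)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in sentence.lower().split():
            if word not in vocab:
                continue                                  # unseen words are skipped
            # Words are treated as independent: the "naive" assumption.
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy training data.
TRAINING = [
    ("the room is clean and great", "positive"),
    ("great breakfast and friendly staff", "positive"),
    ("the room is dirty", "negative"),
    ("terrible breakfast and rude staff", "negative"),
]
model = train(TRAINING)
label, scores = classify("great staff", model)
```

Log probabilities are summed instead of multiplying raw probabilities so that long sentences do not underflow to zero.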
  • 17. Problems with Naïve Bayes
      • Polarity does not change with domain
      • Words within a sentiment are treated as having no relationship with each other
      • Words not found in the lexicon may be missed, making the polarity inaccurate
      • No opinion rating to determine which sentiment is more polar
  • 18. Solutions to the Naïve Bayes problems
      • Establish domain-sentiment relations
      • Establish domain-aspect relations
      • Establish aspect-sentiment relations
      • Estimate polarity for unseeded sentiments
      • Estimate the strength of polarity of sentiments
  • 19. Establishing relations
      • Establish domains by categorizing the aspects found into domains such as food, location and security
      • Find occurrences of aspects / sentiments within sentences for a particular domain
      • Find the polarity of sentences, aspects and sentiments, and establish relations between them
      • [Diagram: Domain, Sentiments, Aspects]
  • 20. Finding polarity for unseeded sentiments
      • After establishing relations, we have a graph of nodes (sentiments / aspects)
      • Some nodes have no polarity after Naïve Bayes, but their connected nodes might
      • Determine the probability that a node is positive or negative given its surrounding nodes
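One simple way to read slide 20 is as a weighted vote among the neighbours that already have a polarity. The sketch below assumes edge weights are co-occurrence counts and that seeded polarity is stored as +1 or -1. Both are assumptions for illustration, as are the node names, and the slides may intend a different estimate.

```python
def neighbour_polarity(node, graph, polarity):
    """Estimate P(node is positive) from its seeded neighbours.

    graph:    dict mapping node -> {neighbour: weight}
    polarity: dict mapping seeded node -> +1 or -1
    Returns None when no neighbour has a known polarity.
    """
    neighbours = graph.get(node, {})
    positive = sum(w for n, w in neighbours.items() if polarity.get(n) == 1)
    negative = sum(w for n, w in neighbours.items() if polarity.get(n) == -1)
    if positive + negative == 0:
        return None
    return positive / (positive + negative)


# Hypothetical graph: "spacious" has no polarity yet, two neighbours do.
graph = {"spacious": {"clean": 3, "pool": 2, "noisy": 1}}
polarity = {"clean": 1, "noisy": -1}
p_positive = neighbour_polarity("spacious", graph, polarity)
```

Here "pool" is an aspect without polarity, so it does not take part in the vote.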
  • 21. Estimating the strength of polarity
      • Determine the strength of polarity of an unseeded node from the amount of traversal needed to reach it from surrounding nodes that have polarity
      • Find the shortest path to each unseeded node, which results in a spanning tree
      • This determines the strength of polarity
  • 22. Implementation
      • Use Dijkstra’s algorithm to find the spanning tree
  • 23. Implementation
      • Find the cost to get from the surrounding nodes to an unseeded node
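Slides 21 to 23 can be sketched as a Dijkstra search started from all seeded nodes at once. The function names, the example graph, and especially the strength formula sign / (1 + cost) are assumptions for illustration. The slides only say that strength depends on the traversal cost, not how.

```python
import heapq
import itertools


def shortest_path_tree(graph, seeds):
    """Multi-source Dijkstra from every seeded node.

    graph: dict mapping node -> {neighbour: non-negative cost}
    seeds: dict mapping seeded node -> +1 or -1
    Returns dict node -> (cost, sign of nearest seed, parent in the tree).
    """
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(0.0, next(tie), node, sign, None) for node, sign in seeds.items()]
    heapq.heapify(heap)
    best = {}
    while heap:
        cost, _, node, sign, parent = heapq.heappop(heap)
        if node in best:
            continue                      # already reached by a cheaper path
        best[node] = (cost, sign, parent)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in best:
                heapq.heappush(heap, (cost + weight, next(tie), neighbour, sign, node))
    return best


def polarity_strength(best):
    """Assumed decay: strength shrinks as the cost from the nearest seed grows."""
    return {node: sign / (1.0 + cost) for node, (cost, sign, _) in best.items()}


# Hypothetical undirected graph, stored symmetrically.
graph = {
    "clean": {"spacious": 1, "room": 2},
    "noisy": {"room": 1, "street": 1},
    "spacious": {"clean": 1, "room": 1},
    "room": {"clean": 2, "noisy": 1, "spacious": 1},
    "street": {"noisy": 1},
}
seeds = {"clean": 1, "noisy": -1}
tree = shortest_path_tree(graph, seeds)
strengths = polarity_strength(tree)
```

The parent pointers in `tree` form the shortest-path tree: each unseeded node hangs off the seed it is cheapest to reach from, and inherits that seed's sign with a weaker strength.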
  • 24. END OF CA2
  • 25. What is going to happen in CA3?
  • 26. Prototyping
      • Refine parameters to come up with a prototype, mainly to solve the following problems:
          • Analyzing polarity
          • Opinion rating
          • Sentiment intensity
          • Different domains / topic context
      • Manually analyze reviews, check the prototype against them for effectiveness, and seek to improve accuracy
  • 27. Prototype testing
      • Enlarge the dataset with reviews from various hotel review sites
      • Merge results to find correlations between sentiment expressions on different sites
      • Test on a different domain, such as food, to get domain-dependent results
