Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Opinion Mining on Hotel Reviews


Published on

It is based on an interesting research topic known as opinion mining. Our objective was to mine reviews from an online guide service TripAdvisor and establish a ranking system.

Published in: Data & Analytics
  • If we are speaking about saving time and money this site ⇒ ⇐ is going to be the best option!! I personally used lots of times and remain highly satisfied.
    Are you sure you want to  Yes  No
    Your message goes here
  • Check the source ⇒ ⇐ This site is really helped me out gave me relief from headaches. Good luck!
    Are you sure you want to  Yes  No
    Your message goes here
  • Ich kann eine Website empfehlen. Er hat mir wirklich geholfen. ⇒ ⇐ Zufrieden und beeindruckt.
    Are you sure you want to  Yes  No
    Your message goes here
  • My brother found Custom Writing Service and ordered a couple of works. Their customer service is outstanding, never left a query unanswered.
    Are you sure you want to  Yes  No
    Your message goes here
  • Got a new Iphone 6 in just 7 days completing surveys and offers! Now I'm just a few days away from completing and receiving my samsung tablet! Highly recommended! Definitely the best survey site out there! 
    Are you sure you want to  Yes  No
    Your message goes here

Opinion Mining on Hotel Reviews

  1. 1. Opinion Mining on Hotel Reviews CSE-4250: PROJECT AND THESIS-I Presented By Md. Rafeedul Bar Chowdhury ID: Zahidul Haque ID: Mahmud Hossain ID: Soumik Das Bibon ID:
  2. 2. Course Teacher MIR TAFSEER NAYEEM Lecturer, Dept. of Computer Science & Engineering Ahsanullah University Of Science & Technology 18-Jun-15 2
  3. 3. What is Opinion? Opinions are thoughts influencing decision making. • Which schools should I apply to? • Which professor to work for? • Whom should I vote for? • Which hotel should I book? 18-Jun-15 3
  4. 4. Whom shall I ask for opinions? Pre Web  Friends and relatives  Acquaintances Post Web  Blogs (google blogs, livejournal)  E-commerce sites (amazon, ebay)  Review sites (CNET, PC Magazine)  Discussion forums (, 18-Jun-15 4
  5. 5. Have enough opinions! Now that I have got enough opinions, I can take decisions… Is it really enough? 18-Jun-15 5
  6. 6. …Having opinions is not enough  Searching for reviews is quite difficult  Searching for opinions is not as convenient as general web search  Huge amounts of information available on one topic  Difficult and quite impossible to analyze each and every review separately  Expression of reviews are different in many ways “overall, this hotel is my first choice at Cox’s Bazar…” “the facilities are good but the service isn’t so…” “best in Cox’s Bazar by quality and value” “…disappointing” 18-Jun-15 6
  7. 7. What is Opinion Mining? Computational study of opinions, sentiments and emotions expressed in text.  Detects the contextual polarity of text (positive or, neutral or, negative)  Derives the opinion, or the attitude of an opinion holder. 18-Jun-15 7
  8. 8. Mining opinions… Its about finding out what people think… 18-Jun-15 8
  9. 9. Topic of our research Mining traveler experiences/reviews expressed on services provided by hotels in Bangladesh 18-Jun-15 9
  10. 10. Our research domain  We are mining reviews from the travelling guide, “TripAdvisor”  We are starting our work from the district that attracts tourists the most which is Cox’s Bazar  Primarily we are mining reviews of travelers visiting various hotels in this tourist gathering area 18-Jun-15 10
  11. 11. TripAdvisor 2 3 1 18-Jun-15 11
  12. 12. TripAdvisor(Cont.) 1) Searching panel for hotels, situated on different locations/areas e.g. Search Cox’s Bazar, Bangladesh, Asia for Long Beach Hotel 2) Panel for booking hotels based on prices 3) Searching Panel for a sorted list of hotels based on particular preferences e.g. types of hotel, most rated hotels etc. 18-Jun-15 12
  13. 13. TripAdvisor(Cont.) 1 2 18-Jun-15 13
  14. 14. TripAdvisor(Cont.) 1) List of hotels sorted by traveler ratings 2) Panel where travelers checks vacancy and books hotels e.g. Check-In Date/Check-Out Date of travelers 18-Jun-15 14
  15. 15. TripAdvisor(Cont.) 1 2 3 18-Jun-15 15
  16. 16. TripAdvisor(Cont.) 1) Hotel rating in stars(out of 5) based on type of reviews (positive or, negative) given by travelers e.g. Rating: 4.5 out of 5 stars; 95 Reviews 2) Position of the hotel among other hotels in an area depending on positive feedbacks from the travelers e.g. No.1 of 24 hotels in Cox’s Bazar 3) Special services provided by the hotel e.g. Free parking, Free breakfast etc. 18-Jun-15 16
  17. 17. Reviewers of TripAdvisor 1 3 52 4 18-Jun-15 17
  18. 18. Reviewers of TripAdvisor(Cont.) 1) Name and residence of the reviewing traveler e.g. MUHassan, Dhaka City, Bangladesh 2) Information about the profile of the reviewer on “TripAdvisor” site e.g. Top Contributor, 50 Reviews, 19 helpful votes etc. 3) Rating provided by the reviewer and the time of the review e.g. 5 out of 5 stars, Reviwed 12,June,2015 etc. 4) The review of the traveler in a large opinioned paragraph 5) Voting panel for other users to vote a review if it is helpful or not! 18-Jun-15 18
  19. 19. Reviewers of TripAdvisor(Cont.) 1 2 18-Jun-15 19
  20. 20. Reviewers of TripAdvisor(Cont.) 1) All the reviews provided by a certain reviewer on the site 2) Feedback given by a hotel manager upon reviewing of a traveler 18-Jun-15 20
  21. 21. Problem domain of our research  Integrity of the reviews can not be ensured  Due to competition among the hotels high possibility of false reviews  Amount of reviews highly varies which sometimes results in inaccurate ratings of hotels 18-Jun-15 21
  22. 22. Solution to these problems…  Only the reviews given by verified travelers will be prioritized first e.g. Top Contributor, Contributor etc.  For the rating system to work there has to be at least 10 or more reviews 18-Jun-15 22
  23. 23. Goal of our research  Rating the hotels according to reviews based on potential opinion holders  Finding out the hotels which impacts the most upon the traveling tax  Mining each individual reviews then summarizing them according to polarity to get accurate opinion about a hotel 18-Jun-15 23
  24. 24. Works we have done so far…  Primarily we have parsed data from popular hotels in Cox’s Bazar  We have mined and stored data of 15 hotels initially  This data will work as our data set to further our research which contains reviews and reviewers information  We have collected another data set from National Board of Revenue(NBR) about the revenue on traveling tax 18-Jun-15 24
  25. 25. Tools for Parsing data  The language we used for parsing data, -python(2.7.9)  We used a library for extracting data from HTML and XML files, -beautifulsoup (vers. 4) 18-Jun-15 25
  26. 26. Flowchart for parsing data 18-Jun-15 26
  27. 27. Code for parsing data 18-Jun-15 27
  28. 28. Code for parsing data(Cont.) 1) ‘head’ : this variable stores headline of the reviews 2) ‘date’ : time when the review was given 3) ‘member_info’ : name and residence of the reviewer 4) ‘member_badge’ : Information about reviewers profile e.g. Top Contributor, 50 Reviews etc. 5) ‘mngr_com’ : Feedbacks on reviews from hotel managers 6) ‘review’ : The whole review will be stored in this variable 1 2 3 4 5 6 18-Jun-15 28
  29. 29. Data set created from the parsed data 1 2 3 5 4 18-Jun-15 29
  30. 30. Data set on traveling tax  This is our data set which will be used to calculate the impact of hotels on traveling tax  It contains data on collected revenue from the tourism sector till fiscal year ’13 - ’14 to ’14 - ’15 18-Jun-15 30
  31. 31. Data set on traveling tax(Cont.) 18-Jun-15 31
  32. 32. Why Prioritize Opinions?  Opinion prioritizing is a must in order to distinguish mature and professional reviewers opinion from an amateur reviewers opinion.  A professionals point of view and interest will definitely make a difference in case of ranking a hotel.  There will always be some integrity/accuracy issues when it comes to opinions as someone stating a feature as satisfactory while other will state otherwise ( Whose opinion to choose? )
  33. 33. Prioritizing Opinions(Cont.)  In TripAdvisor, there is already a priority rating system available for the reviewers! 1. Reviewer: (3 – 5) reviews 2. Senior Reviewer: (6 – 10) reviews 3. Contributor: (11 – 20) reviews 4. Senior Contributor: (21 – 49) reviews 5. Top Contributor: (50+) reviews
  34. 34. Prioritizing Opinions(Cont.)  For the prioritizing purpose, we gave a potential opinion holders value for each class of reviewers. 1. Reviewer: 2 2. Senior Reviewer: 4 3. Contributor: 5 4. Senior Contributor: 7 5. Top Contributor: 10  This value indicates which reviewer gets more priority than others.
  35. 35. Opinion Subjectivity  Opinion Subjectivity is categorizing reviews as positive, negative or objective. Methods we are using for calculating opinion subjectivity: -SentiWordNet -Semi Supervised Learning Approach -Naive Bayes Method
  36. 36. SentiWordNet  It is a lexical resource that is open publically for opinion mining  It scores every word in the database as positive(+) negative(-) or objective/neutral values.
  37. 37. Semi Supervised Learning Approach  In the beginning synsets is labeled manually. i.e SentiWordNet in our case  Labeling will be expanded by machine learning approach. i.e we are using Naive Bayes Method  This process is efficient and helps avoiding labeling errors.
  38. 38. Naive Bayes Method  This method is used to predict the values of words that are not in the database or synsets.  The main formula is: Where, P(c) = Probability of Unlabeled data(c) P(d) = Probability of Labeled data(d)
  39. 39. Final Calculation  The formula we will use for ranking hotel is: Z = 𝑖=1 𝑛 (Potential Opinion Holders Value × Opinions Subjectivity) Where, Z is the deciding value for an individual hotel, which will place it in the appropriate position in the ranks.
  40. 40. For instance:  Let’s assume: Hotel 1: Long Beach Hotel Hotel 2: Resort Labiba Bilash Potential Opinion Holders value: 7 Potential Opinion Holders Value: 2 Opinion Subjectivity: +15.7 Opinion Subjectivity: +12.3 For, Hotel 1: Z = 7 * (+15.7) For, Hotel 2: Z = 2 * (+12.3) = 109.9 = 24.6 So, Hotel 1 will be ranked above Hotel 2. (assuming there is only one review in each hotels)
  41. 41. References  [1] B. Liu, “Sentiment Analysis and Subjectivity.” A Chapter in Handbook of Natural Language Processing, 2nd Edition, 2010.  [2] Kavita Ganesan , Hyun Duk kim, “Opinion Mining-A Short Tutorial”.  [3] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, “Thumbs up? sentiment classification using machine learning techniques” 2002.  [4]  [5] 18-Jun-15 41
  42. 42. 18-Jun-15 42