Analyzing User Reviews in Tourism with Topic Models
1. ENTER 2015 Research Track Slide Number 1
Analyzing User Reviews in Tourism
with Topic Models
Marco Rossetti, Fabio Stella, Longbing Cao and
Markus Zanker*
Alpen-Adria-Universität Klagenfurt, Austria
mzanker@acm.org
http://www.aau.at
* The presenter acknowledges the financial support of the European Union (EU), the
European Regional Development Fund (ERDF), the Austrian Federal Government and
the State of Carinthia in the Interreg IV Italy-Austria programme (project
acronym O-STAR).
2. ENTER 2015 Research Track Slide Number 2
Agenda
• Motivation
• Topic Models
• Application scenarios
• Results
• Conclusions
3. ENTER 2015 Research Track Slide Number 3
Motivation
• Ever-growing volumes of data
– ~200 million reviews on Tripadvisor
– Valuable opinion source
• Need for automated processing of data harvested from the Web.
• Two principal (research) directions
– Machine Learning (ML): fitting general-purpose statistical models to data
– Semantic Web: moving from the traditional "unstructured" Web to a web of
data (data annotated with semantic descriptors, plus efficient reasoning
mechanisms)
• Topic models belong to the ML direction, but they promise to detect semantic
ties between words
4. ENTER 2015 Research Track Slide Number 4
Topic Model 1/3
• Method to organize, search and summarize electronic
documents
• „..algorithms for discovering the themes that pervade a large
and otherwise unstructured collection of documents.“ [Blei, CACM, 2012]
• Unsupervised learning strategy that builds on the basic idea:
– Big corpus of documents such as reviews
– Uncover hidden topical patterns
– Annotate documents according to those topics
5. ENTER 2015 Research Track Slide Number 5
Topic Model 2/3
• Topic: coherent and meaningful bag of words
• Words: can be related to several topics
(homonyms)
• Documents: can be about several topics
• Example: documents can be about cats and dogs:
– Kitten, cat, meow..
– Dog, bone,…
6. ENTER 2015 Research Track Slide Number 6
Topic Model 3/3
• Intuition: topics are probability distributions over
words, and these discrete distributions generate the
observations (words in documents).
• Computational task: compute the topic structure
given the observations (the posterior).
– Approximation of ..
– .. the distribution over words for each topic
– .. the topic proportions for each document
– .. the topic assignment for each occurrence of a word in a
document
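To make the last quantity concrete, here is a minimal sketch of the topic assignment for a single word occurrence, assuming the topic-word distributions and the document's topic proportions are already known. In practice both are unknown and must be approximated jointly (for example by sampling or variational methods); the function name and the toy numbers below are hypothetical.

```python
def topic_posterior(word, doc_topic_props, topic_word_probs):
    """Return p(topic | word, document) for one word occurrence.

    Assumes fixed, known distributions; this is only the final
    conditioning step, not full posterior inference.
    """
    # Unnormalised score per topic: p(topic | document) * p(word | topic)
    scores = {
        topic: doc_topic_props[topic] * topic_word_probs[topic].get(word, 0.0)
        for topic in doc_topic_props
    }
    total = sum(scores.values())
    if total == 0.0:
        # Word unseen under every topic: fall back to the document proportions
        return dict(doc_topic_props)
    return {topic: score / total for topic, score in scores.items()}


# Hypothetical toy distributions (not taken from the slides)
topic_word_probs = {
    "food": {"breakfast": 0.5, "bar": 0.3, "room": 0.2},
    "rooms": {"shower": 0.4, "mattress": 0.2, "room": 0.4},
}
doc_topic_props = {"food": 0.5, "rooms": 0.5}

posterior = topic_posterior("room", doc_topic_props, topic_word_probs)
```

With these toy numbers the word "room" is attributed mostly to the "rooms" topic, but still partly to "food", which illustrates how one word can relate to several topics.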
7. ENTER 2015 Research Track Slide Number 7
Example
Topic “Location”    Topic “Food”    Topic “Rooms”
walking_distance    breakfast       shower
station             service         bathroom
city_centre         restaurant      mattress
metro               bar             room
close               food            tv
“The hotel was right in the centre of the city, at walking
distance from the city centre! Huge breakfast with nice food!”
“I stayed in this hotel with my friends, the room
was cheap, but the shower was broken and the
mattress was very hard!”
“The room was nice, with a flat tv, but the breakfast was so
poor! I didn’t have enough food.”
[Figure labels: Room, Food, Location]
8. ENTER 2015 Research Track Slide Number 8
Goal and Contributions
1. Explore opportunities for application of
the Topic Model* method in the Tourism
domain.
2. Provide empirical evidence for their utility.
* Note that it is a family of many different methods.
9. ENTER 2015 Research Track Slide Number 9
Scenario 1: Item
recommendation
• Users write reviews about topics that they care
about (preference)
• Textual reviews associated with an overall rating
explain which aspects of the item were assessed in
particular
“The hotel was right in the centre of the city, at
walking distance from the city centre! Huge
breakfast with nice food!”
10. ENTER 2015 Research Track Slide Number 10
Topic-Criteria model 1/3
• User profiles (UP) created from the topic distributions
in the user's own reviews
\[ UP(i,t) = \frac{\sum_{d_{ij} \in D_i} p(t \mid d_{ij})}{|D_i|} \]
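As a rough illustration, the user profile can be computed as the average topic distribution over a user's reviews. This is a minimal sketch that assumes each review has already been turned into a topic distribution by a fitted topic model; the function name and the toy values are hypothetical.

```python
def user_profile(review_topic_dists):
    """Average the per-review topic distributions of one user.

    review_topic_dists: list of dicts mapping topic -> p(topic | review).
    A topic missing from a review counts as probability 0.
    """
    n_reviews = len(review_topic_dists)
    if n_reviews == 0:
        return {}
    topics = set().union(*review_topic_dists)
    return {
        topic: sum(dist.get(topic, 0.0) for dist in review_topic_dists) / n_reviews
        for topic in sorted(topics)
    }


# Hypothetical topic distributions for two reviews by the same user
reviews_by_user = [
    {"location": 0.6, "food": 0.4},
    {"food": 0.2, "rooms": 0.8},
]
profile = user_profile(reviews_by_user)
```

The resulting profile is itself a distribution over topics, so it can be read directly as "what this user tends to write about".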
11. ENTER 2015 Research Track Slide Number 11
Topic-Criteria model 2/3
• Item profiles created from reviews and ratings
\[ IP(j,t) = \frac{\sum_{d_{ij} \in D_j} p(t \mid d_{ij}) \cdot r_{ij}}{\sum_{d_{ij} \in D_j} p(t \mid d_{ij})} \]
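The item profile can be sketched as a topic-weighted average of the ratings an item received: a review contributes to a topic's score in proportion to how much it talks about that topic. The code below is an illustrative sketch under that reading; the function name and the toy data are hypothetical.

```python
def item_profile(reviews):
    """Topic-weighted average rating per topic for one item.

    reviews: list of (topic_dist, rating) pairs, where topic_dist maps
    topic -> p(topic | review) and rating is the overall rating.
    """
    numerator = {}
    denominator = {}
    for topic_dist, rating in reviews:
        for topic, prob in topic_dist.items():
            numerator[topic] = numerator.get(topic, 0.0) + prob * rating
            denominator[topic] = denominator.get(topic, 0.0) + prob
    # Topics that never received any probability mass are left out
    return {
        topic: numerator[topic] / denominator[topic]
        for topic in numerator
        if denominator[topic] > 0.0
    }


# Hypothetical reviews of one hotel: (topic distribution, overall rating)
reviews_of_item = [
    ({"food": 0.5, "rooms": 0.5}, 4.0),
    ({"food": 1.0}, 2.0),
]
item_scores = item_profile(reviews_of_item)
```

Here the "rooms" score equals the rating of the only review that mentions rooms, while the "food" score is pulled down by the second, food-only review.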
12. ENTER 2015 Research Track Slide Number 12
Topic-Criteria model 3/3
• Prediction based on the sum of products for all
topics
– Weight parameter fitted to data
– Assumption that not all topics are equally influential
\[ \hat{r}_{ij} = \sum_{t=1}^{T} UP(i,t) \cdot IP(j,t) \cdot w_t \]
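The prediction step can be sketched as a weighted sum over topics of the user's interest in a topic times the item's score on that topic. The slides do not show how the weights are fitted, so the sketch simply takes them as given; all names and numbers are hypothetical.

```python
def predict_rating(user_prof, item_prof, weights):
    """Weighted sum over topics of user interest times item score.

    user_prof: topic -> share of the user's review text on that topic
    item_prof: topic -> topic-weighted average rating of the item
    weights:   topic -> weight (assumed to be fitted elsewhere)
    """
    return sum(
        user_prof.get(topic, 0.0) * item_prof.get(topic, 0.0) * weight
        for topic, weight in weights.items()
    )


# Hypothetical profiles; unit weights treat all topics as equally influential
user_prof = {"food": 0.75, "rooms": 0.25}
item_prof = {"food": 4.0, "rooms": 2.0}
unit_weights = {"food": 1.0, "rooms": 1.0}

predicted = predict_rating(user_prof, item_prof, unit_weights)
```

With unit weights the prediction is the item's topic scores averaged according to what the user cares about; weights other than 1 shift influence towards or away from individual topics.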
13. ENTER 2015 Research Track Slide Number 13
Results for Scenario 1
Method   YELP-5-5  YELP-10-10  TA-3-3  TA-5-5
KNN-IB   1.0709    1.0249      1.0531  0.9601
KNN-UB   1.1088    1.0424      1.0715  0.9447
PMF      1.0956    1.0389      1.0373  0.9946
TC       1.0706    1.0247      1.0625  0.9719
TC-W     1.0599    0.9955      1.0916  0.9776
• Evaluation on datasets from YELP (restaurants) and
Tripadvisor (hotels) with different levels of sparsity
• Accuracy (RMSE, lower is better) of the Topic-Criteria model is
comparable to Nearest-Neighbor and Matrix Factorization
approaches, BUT it yields richer user profiles and can
explain which topics were considered in real user
interaction!
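For reference, RMSE is the square root of the mean squared difference between actual and predicted ratings. The sketch below shows the standard definition; the function name and the sample ratings are hypothetical and unrelated to the datasets in the table.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings on a 1-5 scale
error = rmse([5.0, 3.0, 4.0], [4.0, 2.0, 4.0])
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones.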
14. ENTER 2015 Research Track Slide Number 14
Scenario 2: Analytics
• Anecdotal evidence on what topics might explain a good
or bad rating for a service provider or a destination.
• BUT: risk of fallacies due to e.g. cherry-picking.
Cleanliness in reviews on Orlando hotels:
dirty mold bugs smelled smell filthy carpet musty stained disgusting
bed_bugs black mildew moldy stains bites dust musty_smell refund
Business in reviews on New York hotels:
internet free free_internet access wireless internet_access
wireless_internet business_center computers free_wireless business
boarding gym center print free_internet_access printer bottled passes
15. ENTER 2015 Research Track Slide Number 15
Scenario 3: Automated
Interpretation of reviews
• Automatically derive different properties from a review
such as:
– Rating value: extract topics from the written text and match
them with the item profile; if a user writes about the strengths of the
hotel, a high score is predicted
– Identify reviews where the associated rating value is / is not
coherent with the predicted rating, to identify fake reviews or
rank more plausible reviews higher
– Identify reviews with more breadth / broader scope (see Daniel
Leung's thesis)
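The coherence check in the second point could be sketched as a simple comparison between the stated rating and a rating predicted from the review text. This is only an illustration: the threshold, the function name and the sample data are hypothetical, and the slides do not specify how coherence is actually measured.

```python
def flag_incoherent(reviews, threshold=1.5):
    """Return ids of reviews whose stated rating deviates from the
    predicted rating by more than `threshold` (a hypothetical cut-off).

    reviews: list of (review_id, stated_rating, predicted_rating).
    """
    return [
        review_id
        for review_id, stated, predicted in reviews
        if abs(stated - predicted) > threshold
    ]


# Hypothetical data: (id, rating given by the user, rating predicted from text)
sample_reviews = [
    ("r1", 5.0, 4.6),
    ("r2", 5.0, 2.1),
    ("r3", 2.0, 2.4),
]
suspicious = flag_incoherent(sample_reviews)
```

Flagged reviews are candidates for closer inspection or lower ranking; a large gap alone does not show that a review is fake.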
16. ENTER 2015 Research Track Slide Number 16
Conclusions
• Several application scenarios for the Topic Model method in
the tourism domain identified
• Empirical evidence that the proposed Topic-Criteria model
achieves results comparable to or better than baseline
recommendation methods
• Future work:
– Different extensions of Topic Model methods employing supervised
learning
– Contrasting derived topic distributions with real user assessments
17. ENTER 2015 Research Track Slide Number 17
Thank you for your attention!
Questions?
Markus Zanker
Intelligent Systems and Business Informatics
Alpen-Adria-Universität Klagenfurt, Austria
M: mzanker@acm.org
P: +43 463 2700 3753
Skype: markuszanker
W: http://www.isbi.at/mzanker
Visit: http://www.recommenderbook.net
18. ENTER 2015 Research Track Slide Number 18
Project O-STAR
• Development of an innovative online system for
recommending individual tours and trails in alpine regions
– Research partners:
• EURAC research, Bolzano, Italy
• Free University Bolzano-Bozen, Italy
• Autonomous Province of Bolzano – South Tyrol (Dept. for spatial and
statistical informatics)
• Alpen-Adria-Universität Klagenfurt
– Application partners:
• Tourism regions in Carinthia and South Tyrol
– Runtime: 2012-2014
– Programme:
• Interreg IV Italy-Austria