ENTER 2015 Research Track Slide Number 1
Analyzing User Reviews in Tourism
with Topic Models
Marco Rossetti, Fabio Stella, Longbing Cao and
Markus Zanker*
Alpen-Adria-Universität Klagenfurt, Austria
mzanker@acm.org
http://www.aau.at
* The presenter acknowledges the financial support of the European Union (EU), the
European Regional Development Fund (ERDF), the Austrian Federal Government and
the State of Carinthia in the Interreg IV Italien-Österreich programme (project
acronym O-STAR).
Agenda
• Motivation
• Topic Models
• Application scenarios
• Results
• Conclusions
Motivation
• Ever-growing, vast amounts of data
– ~200 million reviews on TripAdvisor
– Valuable source of opinions
• Need for automated processing of data harvested from the Web
• Two principal (research) directions
– Machine Learning (ML): fitting general-purpose statistical models to data
– Semantic Web: aims to move from the traditional "unstructured" Web to a web of
data (annotating data with semantic descriptors and providing efficient reasoning
mechanisms)
• Topic models belong to the ML direction, but promise to detect semantic
ties between words
Topic Model 1/3
• Method to organize, search and summarize electronic
documents
• "... algorithms for discovering the themes that pervade a large
and otherwise unstructured collection of documents." [Blei, CACM, 2012]
• Unsupervised learning strategy that builds on the basic idea:
– Big corpus of documents such as reviews
– Uncover hidden topical patterns
– Annotate documents according to those topics
Topic Model 2/3
• Topic: coherent and meaningful bag of words
• Words: can be related to several topics (homonyms)
• Documents: can be about several topics
• Example: documents can be about cats and dogs:
– kitten, cat, meow, ...
– dog, bone, ...
Topic Model 3/3
• Intuition: topics are probability distributions over
words, and these discrete distributions generate the
observations (words in documents).
• Computational task: compute the topic structure
given the observations (the posterior).
– Approximation of ..
– .. the distribution over words for each topic
– .. the topic proportions for each document
– .. the topic assignment of each occurrence of a word in a
document
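The three quantities listed above can be made concrete with a small sketch. The slides do not name an inference algorithm, so the code below assumes a plain collapsed Gibbs sampler for LDA, one common way to approximate this posterior. The function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy documents are illustrative assumptions, not taken from the talk.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (hypothetical hyperparameter values)."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)

    # z[d][n]: topic assigned to the n-th word occurrence in document d.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [Counter(assignments) for assignments in z]  # counts per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]      # counts per (topic, word)
    topic_total = [0] * num_topics                           # word count per topic
    for doc, assignments in zip(docs, z):
        for word, k in zip(doc, assignments):
            topic_word[k][word] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, word in enumerate(doc):
                # Remove the current assignment from all counts.
                old = z[d][n]
                doc_topic[d][old] -= 1
                topic_word[old][word] -= 1
                topic_total[old] -= 1
                # Resample the topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                new = rng.choices(range(num_topics), weights=weights)[0]
                z[d][n] = new
                doc_topic[d][new] += 1
                topic_word[new][word] += 1
                topic_total[new] += 1

    # Distribution over words for each topic.
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in vocab}
        for k in range(num_topics)
    ]
    # Topic proportions for each document.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta, z


# Toy corpus built from the cat/dog example words.
docs = [
    ["kitten", "cat", "meow", "cat"],
    ["dog", "bone", "dog", "bone"],
    ["cat", "dog", "meow", "bone"],
]
phi, theta, z = fit_lda(docs, num_topics=2)
```

Here `phi` holds the word distribution of each topic, `theta` the topic proportions of each document, and `z` the topic assignment of every word occurrence. On a corpus this small the recovered topics depend on the random seed.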
Example
Topic "Location"    Topic "Food"    Topic "Rooms"
walking_distance    breakfast       shower
station             service         bathroom
city_centre         restaurant      mattress
metro               bar             room
close               food            tv
“The hotel was right in the centre of the city, at walking
distance from the city centre! Huge breakfast with nice food!”
“I stayed in this hotel with my friends, the room
was cheap, but the shower was broken and the
mattress was very hard!”
“The room was nice, with a flat tv, but the breakfast was so
poor! I didn’t have enough food.”
[Figure legend: Room, Food, Location]
Goal and Contributions
1. Explore opportunities for applying the Topic Model* method in the
tourism domain.
2. Provide empirical evidence for its utility.
* Note that "Topic Model" denotes a family of many different methods.
Scenario 1: Item recommendation
• Users write reviews about topics that they care
about (preferences)
• Textual reviews associated with an overall rating
explain which aspects of the item were assessed
in particular
“The hotel was right in the centre of the city, at
walking distance from the city centre! Huge
breakfast with nice food!”
Topic-Criteria model 1/3
• User profiles (UP) created from topic distributions
in own reviews
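$$UP(i, t) = \frac{\sum_{d_{ij} \in D_i} P(t \mid d_{ij})}{|D_i|}$$
(D_i: set of reviews written by user i; P(t | d_ij): proportion of topic t in the review d_ij of user i on item j)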
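Reading the user profile as the average of the topic distributions of a user's own reviews, the computation can be sketched as below. The function name `user_profile`, the topic order, and the numbers are hypothetical and chosen only for illustration.

```python
def user_profile(review_topic_dists):
    """Average the topic distributions P(t | d) over one user's reviews."""
    if not review_topic_dists:
        raise ValueError("a user profile needs at least one review")
    num_reviews = len(review_topic_dists)
    num_topics = len(review_topic_dists[0])
    return [
        sum(dist[t] for dist in review_topic_dists) / num_reviews
        for t in range(num_topics)
    ]


# Hypothetical topic order: [Location, Food, Rooms].
reviews_of_user = [
    [0.6, 0.4, 0.0],  # a review mostly about location and food
    [0.2, 0.2, 0.6],  # a review mostly about rooms
]
profile = user_profile(reviews_of_user)  # roughly [0.4, 0.3, 0.3]
```

Because every review's topic distribution sums to 1, the resulting profile also sums to 1 and can be read as the share of attention the user gives to each topic.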
Topic-Criteria model 2/3
• Item profiles created from reviews and ratings
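$$IP(j, t) = \frac{\sum_{d_{ij} \in D_j} P(t \mid d_{ij}) \cdot r_{ij}}{\sum_{d_{ij} \in D_j} P(t \mid d_{ij})}$$
(D_j: set of reviews written about item j; r_ij: rating that user i gave to item j)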
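Reading the item profile as a per-topic rating average, weighted by how strongly each review covers the topic, a minimal sketch could look as follows. The function name `item_profile` and the data are hypothetical. Returning `None` for a topic that no review touches is an assumption of this sketch, not something the slides specify.

```python
def item_profile(reviews):
    """Per-topic mean rating of one item, weighted by topic proportion.

    `reviews` is a list of (topic_distribution, rating) pairs.
    """
    num_topics = len(reviews[0][0])
    profile = []
    for t in range(num_topics):
        coverage = sum(dist[t] for dist, _ in reviews)
        weighted_ratings = sum(dist[t] * rating for dist, rating in reviews)
        # Assumption: a topic never mentioned for this item has no score.
        profile.append(weighted_ratings / coverage if coverage > 0 else None)
    return profile


# Hypothetical topic order: [Location, Food, Rooms].
reviews_of_item = [
    ([0.5, 0.5, 0.0], 5),  # praises location and food, rated 5
    ([0.0, 0.5, 0.5], 2),  # complains about food and rooms, rated 2
]
profile = item_profile(reviews_of_item)  # [5.0, 3.5, 2.0]
```

The "Food" score of 3.5 lies between the two ratings because both reviews discuss food to the same degree, while "Location" and "Rooms" each inherit the rating of the single review that mentions them.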
Topic-Criteria model 3/3
• Prediction based on the sum of products for all
topics
– Weight parameter fitted to data
– Assumption that not all topics are equally influential
$$\hat{r}_{ij} = \sum_{t=1}^{T} UP(i, t) \cdot IP(j, t) \cdot w_t$$
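The prediction, a weighted sum over topics of user-profile and item-profile values, can be sketched as below. The slides say only that the weights are fitted to data, not how, so the sketch takes them as given. The function name `predict_rating` and all numbers are hypothetical.

```python
def predict_rating(user_prof, item_prof, weights=None):
    """Sum over topics of user interest * item score * topic weight."""
    if weights is None:
        # Unweighted variant: every topic counts equally.
        weights = [1.0] * len(user_prof)
    if not (len(user_prof) == len(item_prof) == len(weights)):
        raise ValueError("profiles and weights must cover the same topics")
    return sum(u * s * w for u, s, w in zip(user_prof, item_prof, weights))


# Hypothetical topic order: [Location, Food, Rooms].
user_prof = [0.5, 0.25, 0.25]  # this user cares mostly about location
item_prof = [5.0, 3.5, 2.0]    # this hotel scores best on location

unweighted = predict_rating(user_prof, item_prof)  # 3.875
weighted = predict_rating(user_prof, item_prof, weights=[1.2, 1.0, 0.8])
```

With all weights equal to 1, the prediction is a convex combination of the item's per-topic scores, so it stays within the rating scale. Topic weights other than 1 shift the influence of individual topics, in line with the assumption that not all topics are equally influential.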
Results for Scenario 1
Method    YELP-5-5    YELP-10-10    TA-3-3    TA-5-5
KNN-IB    1.0709      1.0249        1.0531    0.9601
KNN-UB    1.1088      1.0424        1.0715    0.9447
PMF       1.0956      1.0389        1.0373    0.9946
TC        1.0706      1.0247        1.0625    0.9719
TC-W      1.0599      0.9955        1.0916    0.9776
• Evaluation on datasets from Yelp (restaurants) and
TripAdvisor (hotels) with different levels of sparsity
• Accuracy (RMSE) of the Topic-Criteria model is
comparable to nearest-neighbor and matrix factorization
approaches, BUT it yields richer user profiles and can
explain which topics were considered in real user
interaction!
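For reference, RMSE (root mean squared error, lower is better) compares predicted with actual ratings. The sketch below uses made-up ratings, not data from the evaluation.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical predictions vs. ratings on a 1-5 scale.
error = rmse([4.0, 3.0, 5.0], [5.0, 3.0, 3.0])  # sqrt(5/3), about 1.291
```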
Scenario 2: Analytics
• Anecdotal evidence on what topics might explain a good
or bad rating for a service provider or a destination.
• BUT: risk of fallacies due to e.g. cherry-picking.
Topic "Cleanliness" in reviews on Orlando hotels:
dirty, mold, bugs, smelled, smell, filthy, carpet, musty, stained, disgusting,
bed_bugs, black, mildew, moldy, stains, bites, dust, musty_smell, refund
Topic "Business" in reviews on New York hotels:
internet, free, free_internet, access, wireless, internet_access,
wireless_internet, business_center, computers, free_wireless, business,
boarding, gym, center, print, free_internet_access, printer, bottled, passes
Scenario 3: Automated interpretation of reviews
• Automatically derive different properties from a review,
such as:
– Rating value: extract topics from the written text and match
them with the item profile; if the user writes about strengths of the
hotel → high score
– Identify reviews where the associated rating value is / is not
coherent with the predicted rating, to detect fake reviews or
rank more plausible reviews higher
– Identify reviews with more breadth / broader scope (see Daniel
Leung's thesis)
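One possible reading of the first two properties: infer a rating from the topics a review discusses and the item's per-topic scores, then flag reviews whose stated rating deviates strongly from it. The slides do not give the exact procedure, so the functions `implied_rating` and `is_coherent`, the `tolerance` threshold, and the numbers are assumptions made for illustration.

```python
def implied_rating(review_topics, item_prof):
    """Rating suggested by a review's topics and the item's per-topic scores."""
    return sum(p * score for p, score in zip(review_topics, item_prof))


def is_coherent(stated_rating, review_topics, item_prof, tolerance=1.0):
    """True if the stated rating is within `tolerance` of the implied one."""
    gap = abs(stated_rating - implied_rating(review_topics, item_prof))
    return gap <= tolerance


# Hypothetical topic order: [Location, Food, Rooms].
item_prof = [5.0, 3.5, 2.0]      # the hotel's strengths: location, then food
review_topics = [0.5, 0.5, 0.0]  # the review talks about location and food

implied = implied_rating(review_topics, item_prof)  # 4.25
plausible = is_coherent(5, review_topics, item_prof)   # True
suspicious = is_coherent(1, review_topics, item_prof)  # False
```

A review that writes only about the hotel's strengths but carries a rating of 1 would be flagged here, which matches the idea of ranking more plausible reviews higher.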
Conclusions
• Several application scenarios for the Topic Model method in
the tourism domain identified
• Empirical evidence that proposed Topic-Criteria model
achieves comparable or better results than baseline
recommendation methods
• Future work:
– Different extensions of Topic Model methods employing supervised
learning
– Contrasting derived topic distributions with real user assessments
Thank you for your attention!
Questions?
Markus Zanker
Intelligent Systems and Business Informatics
Alpen-Adria-Universität Klagenfurt, Austria
M: mzanker@acm.org
P: +43 463 2700 3753
Skype: markuszanker
W: http://www.isbi.at/mzanker
Visit: http://www.recommenderbook.net
Project OSTAR
• Development of an innovative online system for
recommending individual tours and trails in alpine regions
– Research partners:
• EURAC research, Bolzano, Italy
• Free University Bolzano-Bozen, Italy
• Autonomous Province of Bolzano – South Tyrol (Dept. for spatial and
statistical informatics)
• Alpen-Adria-Universität Klagenfurt
– Application partners:
• Tourism regions in Carinthia and South Tyrol
– Runtime: 2012-2014
– Programme:
• Interreg IV Italy-Austria

 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 

Analyzing User Reviews in Tourism with Topic Models

  • 1. ENTER 2015 Research Track Slide Number 1 Analyzing User Reviews in Tourism with Topic Models Marco Rossetti, Fabio Stella, Longbin Cao and Markus Zanker* Alpen-Adria-Universität Klagenfurt, Austria mzanker@acm.org http://www.aau.at * The presenter acknowledges the financial support of the European Union (EU), the European Regional Development Fund (ERDF), the Austrian Federal Government and the State of Carinthia in the Interreg IV Italien-Österreich programme (project acronym O-STAR).
  • 2. Agenda • Motivation • Topic Models • Application scenarios • Results • Conclusions
  • 3. Motivation • Ever-growing, vast amounts of data – ~200 million reviews on Tripadvisor – Valuable source of opinions • Need for automated processing of data harvested from the Web • Two principal research directions – Machine Learning (ML): fitting general-purpose statistical models to data – Semantic Web: moving from the traditional "unstructured" Web to a web of data (annotating data with semantic descriptors and providing efficient reasoning mechanisms) • Topic models belong to the ML direction, but promise to detect semantic ties between words
  • 4. Topic Model 1/3 • Method to organize, search and summarize electronic documents • "... algorithms for discovering the themes that pervade a large and otherwise unstructured collection of documents." [Blei, CACM, 2012] • Unsupervised learning strategy that builds on a basic idea: – Start from a big corpus of documents such as reviews – Uncover hidden topical patterns – Annotate documents according to those topics
  • 5. Topic Model 2/3 • Topic: coherent and meaningful bag of words • Words: can be related to several topics (homonyms) • Documents: can be about several topics • Example: documents can be about cats and dogs: – kitten, cat, meow, ... – dog, bone, ...
  • 6. Topic Model 3/3 • Intuition: topics are probability distributions over words, and these discrete distributions generate the observations (words in documents). • Computation task: compute the topic structure given the observations (posterior). Approximation of: – the distribution over words for each topic – the topic proportions for each document – the topic assignment for each occurrence of a word in a document
  • 7. Example • Topic "Location": walking_distance, station, city_centre, metro, close • Topic "Food": breakfast, service, Restaurant, Bar, Food • Topic "Rooms": Shower, bathroom, mattress, room, tv • Example reviews: – "The hotel was right in the centre of the city, at walking distance from the city centre! Huge breakfast with nice food!" – "I stayed in this hotel with my friends, the room was cheap, but the shower was broken and the mattress was very hard!" – "The room was nice, with a flat tv, but the breakfast was so poor! I didn't have enough food." • Figure legend: Room, Food, Location
  • 8. Goal and Contributions 1. Explore opportunities for applying the Topic Model* method in the tourism domain. 2. Provide empirical evidence for its utility. * Note that it is a family of many different methods.
  • 9. Scenario 1: Item recommendation • Users write reviews about topics that they care about (preference) • Textual reviews associated with an overall rating explain which aspects of the item were particularly assessed "The hotel was right in the centre of the city, at walking distance from the city centre! Huge breakfast with nice food!"
  • 10. Topic-Criteria model 1/3 • User profiles (UP) created from topic distributions in the user's own reviews: $UP(i, t) = \frac{\sum_{d_{ij} \in D_i} p(t \mid d_{ij})}{|D_i|}$, where $D_i$ is the set of reviews written by user $i$ and $p(t \mid d_{ij})$ is the share of topic $t$ in the review $d_{ij}$ of user $i$ on item $j$
  • 11. Topic-Criteria model 2/3 • Item profiles (IP) created from reviews and ratings: $IP(j, t) = \frac{\sum_{d_{ij} \in D_j} p(t \mid d_{ij}) \cdot r_{ij}}{\sum_{d_{ij} \in D_j} p(t \mid d_{ij})}$, where $D_j$ is the set of reviews on item $j$ and $r_{ij}$ is the rating attached to review $d_{ij}$
  • 12. Topic-Criteria model 3/3 • Prediction based on the sum of products over all topics – Weight parameter fitted to data – Assumption that not all topics are equally influential: $\hat{r}_{ij} = \sum_{t=1}^{T} UP(i, t) \cdot IP(j, t) \cdot w_t$
  • 13. Results for Scenario 1
      RMSE      YELP-5-5  YELP-10-10  TA-3-3  TA-5-5
      KNN-IB    1.0709    1.0249      1.0531  0.9601
      KNN-UB    1.1088    1.0424      1.0715  0.9447
      PMF       1.0956    1.0389      1.0373  0.9946
      TC        1.0706    1.0247      1.0625  0.9719
      TC-W      1.0599    0.9955      1.0916  0.9776
    • Evaluation on datasets from YELP (restaurants) and Tripadvisor (hotels) with different levels of sparsity • Accuracy results (RMSE) of the Topic-Criteria model are comparable to nearest-neighbor and matrix factorization approaches, BUT the model yields richer user profiles and can explain which topics were considered in real user interaction!
  • 14. Scenario 2: Analytics • Anecdotal evidence on which topics might explain a good or bad rating for a service provider or a destination. • BUT: risk of fallacies, e.g. due to cherry-picking. • Topic "Cleanliness" in reviews on Orlando hotels: dirty, mold, bugs, smelled, smell, filthy, carpet, musty, stained, disgusting, bed_bugs, black, mildew, moldy, stains, bites, dust, musty_smell, refund • Topic "Business" in reviews on New York hotels: internet, free, free_internet, access, wireless, internet_access, wireless_internet, business_center, computers, free_wireless, business, boarding, gym, center, print, free_internet_access, printer, bottled, passes
  • 15. Scenario 3: Automated interpretation of reviews • Automatically derive different properties from a review, such as: – Rating value: extract topics from the written text and match them with the item profile; if a user writes about the strengths of the hotel, predict a high score – Identify reviews whose associated rating value is or is not coherent with the predicted rating, in order to detect fake reviews or rank more plausible reviews higher – Identify reviews with more breadth / broader scope (see Daniel Leung's thesis)
  • 16. Conclusions • Several application scenarios for the Topic Model method in the tourism domain identified • Empirical evidence that the proposed Topic-Criteria model achieves comparable or better results than baseline recommendation methods • Future work: – Different extensions of Topic Model methods employing supervised learning – Contrasting derived topic distributions with real user assessments
  • 17. Thank you for your attention! Questions? Markus Zanker, Intelligent Systems and Business Informatics, Alpen-Adria-Universität Klagenfurt, Austria M: mzanker@acm.org P: +43 463 2700 3753 Skype: markuszanker W: http://www.isbi.at/mzanker Visit: http://www.recommenderbook.net
  • 18. Project OSTAR • Development of an innovative online system for recommending individual tours and trails in alpine regions – Research partners: • EURAC research, Bolzano, Italy • Free University Bolzano-Bozen, Italy • Autonomous Province of Bolzano – South Tyrol (Dept. for spatial and statistical informatics) • Alpen-Adria-Universität Klagenfurt – Application partners: • Tourism regions in Carinthia and South Tyrol – Runtime: 2012-2014 – Programme: • Interreg IV Italy-Austria
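
The Topic-Criteria model of slides 10 to 12 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy reviews, the per-review topic distributions, and the names `REVIEWS`, `user_profiles`, `item_profiles`, `predict` and `rmse` are hypothetical, and the topic weights are fixed to 1 here, whereas the slides fit them to data.

```python
from collections import defaultdict
from math import sqrt

NUM_TOPICS = 3

# Hypothetical toy data: (user, item, rating, p(t | review) for each topic).
# In practice the topic distributions would come from a fitted topic model.
REVIEWS = [
    ("u1", "h1", 5, [0.7, 0.2, 0.1]),
    ("u1", "h2", 2, [0.1, 0.1, 0.8]),
    ("u2", "h1", 4, [0.5, 0.4, 0.1]),
    ("u2", "h2", 1, [0.2, 0.2, 0.6]),
]


def user_profiles(reviews):
    """UP(user, t): mean topic share over the reviews written by each user."""
    sums = defaultdict(lambda: [0.0] * NUM_TOPICS)
    counts = defaultdict(int)
    for user, _item, _rating, topics in reviews:
        counts[user] += 1
        for t, p in enumerate(topics):
            sums[user][t] += p
    return {u: [s / counts[u] for s in sums[u]] for u in sums}


def item_profiles(reviews):
    """IP(item, t): average rating of the item, weighted by the topic share."""
    num = defaultdict(lambda: [0.0] * NUM_TOPICS)
    den = defaultdict(lambda: [0.0] * NUM_TOPICS)
    for _user, item, rating, topics in reviews:
        for t, p in enumerate(topics):
            num[item][t] += p * rating
            den[item][t] += p
    return {
        j: [n / d if d > 0 else 0.0 for n, d in zip(num[j], den[j])]
        for j in num
    }


def predict(up, ip, user, item, weights):
    """Predicted rating: sum over topics of UP * IP * topic weight."""
    return sum(u * i * w for u, i, w in zip(up[user], ip[item], weights))


def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


if __name__ == "__main__":
    up = user_profiles(REVIEWS)
    ip = item_profiles(REVIEWS)
    uniform = [1.0] * NUM_TOPICS  # placeholder for the fitted weights
    preds = [(predict(up, ip, u, j, uniform), r) for u, j, r, _ in REVIEWS]
    print("RMSE on the toy data:", rmse(preds))
```

With uniform weights the prediction is a mix of the item's per-topic rating averages, weighted by how much the user tends to write about each topic. Evaluating on the training reviews, as the `__main__` block does, only demonstrates the mechanics; the RMSE values on slide 13 come from the authors' own evaluation on the YELP and Tripadvisor datasets.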
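
The computation task on slide 6 (approximating the word distribution per topic, the topic proportions per document, and the topic assignment per word occurrence) can be illustrated with a small sampler. The slides do not say which topic model or inference method was used, so the following is only a sketch that assumes Latent Dirichlet Allocation with collapsed Gibbs sampling; the function `lda_gibbs`, the toy corpus `DOCS`, and the hyperparameter values `alpha` and `beta` are hypothetical choices for illustration.

```python
import random

# Hypothetical toy corpus of tokenized reviews.
DOCS = [
    ["breakfast", "food", "restaurant", "food"],
    ["shower", "mattress", "room", "bathroom"],
    ["room", "tv", "breakfast", "food"],
]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: v for v, w in enumerate(vocab)}
    num_words = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]            # counts per document
    topic_word = [[0] * num_words for _ in range(num_topics)]  # counts per topic
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][n]
                # Remove the current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this occurrence.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + num_words * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Point estimates from the final counts.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + num_words * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return vocab, phi, theta, assignments
```

The three returned structures correspond to the three quantities listed on slide 6: `phi` holds one distribution over the vocabulary per topic, `theta` holds the topic proportions of each document, and `assignments` holds the topic of each word occurrence. The rows of `theta` are the per-review topic distributions that a profile-based model such as the one on slides 10 to 12 would consume.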