ENTER 2018 Research Track Slide Number 1
Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns
Lisa Glatzer, Julia Neidhardt and Hannes Werthner
E-Commerce TU Wien, Austria
lisa.glatzer@ec.tuwien.ac.at
http://www.ec.tuwien.ac.at
Background
• The Web has dramatically changed the tourism industry; travellers increasingly book accommodation for their holidays online
• Web platforms aim to recommend to their customers the hotels that best fit their preferences
• However, the tourism domain is very complex
• Therefore, novel, user-centric recommendation approaches have been introduced, e.g., the Seven-Factor Model
Seven-Factor Model
• Personality-based approach: the factors combine the Big Five personality traits & 17 tourist roles
• Each factor reflects a travel behavioural pattern:
Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist
[Neidhardt et al., 2014]
Focus of the Work
• Text-mining analysis of hotel descriptions provided by travel operators
• Classification of hotels with different machine learning approaches
• Assignment of hotels to travel behavioural patterns (i.e., the Seven Factors)
Research Questions
(1) How can textual hotel descriptions be used to identify concepts to enable a classification of hotel descriptions?
(2) Can the identified concepts be assigned to different predefined travel behavioural patterns and, in turn, be used to deliver recommendations?
State of the Art – Tourist Roles
• [Cohen, 1972] studied people's motives for travelling & established 4 different tourist roles
• [Gibson & Yiannakis, 2002] identified 17 tourist roles (15 in their previous work) & studied the relationship between age, gender, education and tourist preferences
• [Neidhardt et al., 2014/2015] present 7 different travel behavioural patterns: the “Seven Factors”
State of the Art – Text Mining to Extract Touristic Concepts
• [Lahlou et al., 2013] extract contextual attributes from hotel reviews on TripAdvisor for context-aware recommendations
• [Cosh, 2013] extracts key attributes of destinations from Wikipedia articles
• [Schmunk et al., 2014] extract product properties from online reviews posted on Booking.com and TripAdvisor
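Methodology (1/5)
• Hotel descriptions provided by GIATA
• Digital information on over 364,000 hotels from 67 different tourism providers
• Text example (German original, raw HTML with encoded characters):
“<strong>Verpflegungsarten:</strong>All Inclusive Ultra<br/> <br/>
<strong>Lage:</strong> <P>Umgeben von Pinienw&#228;ldern, direkt am
kilometerlangen, breiten Sandstrand von Belek gelegen. Den kleinen Ort Kadriye
erreichen Sie nach ca. 2,5km. Nach Belek sind es ca. 17km, nach Antalya mit
zahlreichen Einkaufs- und Unterhaltungsm&#246glichkeiten etwa 35km…”
English translation: “Board types: All Inclusive Ultra. Location: Surrounded by pine forests, directly on the wide sandy beach of Belek, which stretches for kilometres. The small village of Kadriye is approx. 2.5 km away. Belek is approx. 17 km away; Antalya, with numerous shopping and entertainment options, is about 35 km away…”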
Data Acquisition → Qualitative Analysis → Data Pre-Processing → Hotel Assignment → Evaluation
Methodology (2/5)
• Manual evaluation of 10 randomly selected hotels
• 20 descriptions per hotel on average
• Text length correlates with information content
• Different providers offer similar descriptions
• “Templates”: texts follow a predefined structure
• Observations substantiated by statistical analyses (lexical diversity vs. text length)
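The slides do not say which lexical diversity measure was used. As a minimal sketch, assuming the type-token ratio (TTR, distinct tokens divided by total tokens) as the measure, the "lexical diversity vs. text length" check can be expressed as a Pearson correlation between token count and TTR. The helper names (`tokenize`, `type_token_ratio`, `pearson`, `diversity_vs_length`) are hypothetical.

```python
import math
import re


def tokenize(text):
    # Letter runs only, lower-cased (umlauts such as "ä" are kept).
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(text):
    """Lexical diversity as distinct tokens / total tokens (TTR)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def pearson(xs, ys):
    """Pearson correlation; assumes neither series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def diversity_vs_length(texts):
    """Correlate text length (in tokens) with lexical diversity (TTR)."""
    lengths = [len(tokenize(t)) for t in texts]
    ttrs = [type_token_ratio(t) for t in texts]
    return pearson(lengths, ttrs)
```

Raw TTR is known to fall as texts get longer, and template-based descriptions that repeat the same phrases push it down further, so a negative correlation is the expected pattern for this kind of data.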
Methodology (3/5)
• Extraction of HTML content & handling of text encoding
• Natural Language Processing
• Tokenizing
• Stopword removal
• Stemming
• Pruning
• Word vector generation (TF-IDF)
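The pre-processing steps above can be sketched from scratch as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stopword list is a tiny illustrative sample, the suffix stripper is a toy stand-in for a real German stemmer (such as Snowball), and the TF-IDF variant (relative term frequency times log(n / df)) is one common choice among several. All names (`extract_text`, `stem`, `preprocess`, `tfidf_vectors`, `min_df`, `max_df_ratio`) are hypothetical.

```python
import html
import math
import re
from collections import Counter

# Illustrative German stopword list (an assumption, far from complete).
STOPWORDS = {"am", "an", "ca", "das", "den", "der", "die", "es", "etwa", "im",
             "in", "mit", "nach", "sie", "sind", "und", "von", "zu"}
# Toy suffix list standing in for a real German stemmer such as Snowball.
SUFFIXES = ("ungen", "ung", "en", "er", "es", "e", "n", "s")


def extract_text(raw):
    """Drop HTML tags and decode entities such as '&#228;' to 'ä'."""
    return html.unescape(re.sub(r"<[^>]+>", " ", raw))


def stem(token):
    # Strip the first matching suffix, keeping a stem of at least 4 letters.
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(raw):
    """Tokenizing, stopword removal and stemming for one description."""
    tokens = re.findall(r"[^\W\d_]+", extract_text(raw).lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


def tfidf_vectors(descriptions, min_df=1, max_df_ratio=1.0):
    """Return one sparse {term: weight} vector per description."""
    docs = [preprocess(d) for d in descriptions]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Pruning: drop terms that are too rare or occur in too many descriptions.
    vocab = {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}
    vectors = []
    for doc in docs:
        kept = [t for t in doc if t in vocab]
        counts = Counter(kept)
        # Relative term frequency times inverse document frequency log(n / df).
        vectors.append({t: c / len(kept) * math.log(n / df[t])
                        for t, c in counts.items()})
    return vectors
```

`html.unescape` also decodes numeric references that lack the closing semicolon, so the malformed `Unterhaltungsm&#246glichkeiten` from the GIATA example becomes `Unterhaltungsmöglichkeiten`. Pruning with `max_df_ratio` below 1.0 removes terms such as "pool" that appear in nearly every description and therefore carry little discriminating weight.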
Methodology (4/5)
• Mapping of hotel descriptions to the Seven Factors using three approaches:
1. Clustering
2. Classification
3. Dictionary-based approach
Methodology (5/5)
• Training, validation and evaluation with a labelled data set created by an Austrian travel operator
• Training & validation set: 371 hotels
• Test set: 180 hotels
1. Clustering (1/3)
• Unsupervised learning, fully automated
• Clustering method: K-Means
• Similarity measure: cosine similarity
• Data: training set with 371 hotels
• Number of clusters: 6 (based on various clustering evaluation coefficients)
Goal: Generation of disjoint clusters of hotel descriptions that reflect the Seven Factors
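K-Means driven by cosine similarity is usually implemented as "spherical" K-Means: vectors are scaled to unit length, points are assigned to the centroid with the highest dot product, and updated centroids are re-normalised. The sketch below assumes dense vectors (e.g. TF-IDF weights over a fixed vocabulary) and uses a deterministic farthest-first initialisation for reproducibility; the slides do not state how the authors initialised K-Means, and the function names are hypothetical.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u, v):
    # For unit-length vectors the dot product equals the cosine similarity.
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(vectors, k, max_iter=50):
    """K-Means with cosine similarity on dense vectors; returns (labels, centroids)."""
    points = [normalize(v) for v in vectors]
    # Farthest-first initialisation: start with the first point, then repeatedly
    # add the point least similar to all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(min(points, key=lambda p: max(cosine(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: mean of the members, projected back to unit length.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = normalize([sum(col) / len(members) for col in zip(*members)])
    return labels, centroids
```

On the hotel data the call would be `spherical_kmeans(vectors, 6)` for the six clusters on the slide; the choice of k itself comes from the cluster evaluation coefficients, which this sketch does not compute.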
1. Clustering (2/3)
[Figure: Distribution of Seven Factors]
Cluster   Hotels
0         135
1         43
2         38
3         48
4         59
5         48
Total     371
1. Clustering (3/3)
[Figure: Distribution by travel operator]
Cluster   Hotels
0         135
1         43
2         38
3         48
4         59
5         48
Total     371
2. Classification
• Supervised learning
• Classifiers: Naive Bayes, KNN, Decision Tree
• Validation: 10-fold cross-validation
• Data: training set with 371 hotels
Goal: Generation of seven models, one for each of the Seven Factors
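One model per factor amounts to seven binary problems ("factor f" vs. "other"). The sketch below shows only one of the three listed classifiers, a multinomial Naive Bayes with Laplace smoothing over token lists, together with k-fold cross-validated precision. It is an illustration under assumptions: folds are formed by taking every k-th document without shuffling or stratification, and `train_nb`, `predict_nb` and `cv_precision` are hypothetical names, not the authors' tooling.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing; docs are token lists."""
    vocab = {t for doc in docs for t in doc}
    log_prior, log_cond = {}, {}
    for c in set(labels):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for doc in class_docs for t in doc)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_cond[c] = {t: math.log((counts[t] + alpha) / denom) for t in vocab}
    return log_prior, log_cond


def predict_nb(model, doc):
    log_prior, log_cond = model

    # Tokens unseen during training carry no evidence and are skipped.
    def score(c):
        return log_prior[c] + sum(log_cond[c][t] for t in doc if t in log_cond[c])

    return max(log_prior, key=score)


def cv_precision(docs, labels, positive, k=10):
    """k-fold cross-validated precision for the class `positive`."""
    tp = fp = 0
    for fold in range(k):
        test = set(range(fold, len(docs), k))  # every k-th document, unshuffled
        train = [i for i in range(len(docs)) if i not in test]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        for i in test:
            if predict_nb(model, docs[i]) == positive:
                if labels[i] == positive:
                    tp += 1
                else:
                    fp += 1
    return tp / (tp + fp) if tp + fp else 0.0
```

Running `cv_precision` once per factor, with labels rewritten to that factor vs. "Other", yields one precision value per factor, the quantity plotted in the comparison charts.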
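3. Dictionary
• Experts identified the most important words for each of the Seven Factors
Goal: Classification using the attributes in the dictionaries
Sunlover: Strand (beach), Swimmingpool, Wlan (Wi-Fi), Liege (sun lounger), Meer (sea), Beach, inclusive, Internetzugang (internet access), Liegestuhl (deckchair), Meerblick (sea view), Pool, Sonnenschirm (parasol), Sonnenterasse (sun terrace), Wifi
Educational: Buffet, Club, Halbpension (half board), inclusive, Miniclub (kids' club), Musik (music)
Independent: Eigenregie (self-managed), gemütlich (cosy), individuell (individual), lokal (local), Zentrum (centre)
Cultural: Spa, Suite, superior, elegant, Carte (as in à la carte), Bademantel (bathrobe), Mietsafe (rental safe), modern, Wellnessbereich (wellness area), Whirlpool
Sportive: Aktivität (activity), Fitness, Fitnessraum (gym), Sport, Tennis, Tischtennis (table tennis), Tennisplatz (tennis court), Golfplatz (golf course)
Riskseeker: Club, Stadt (city), Unterhaltung (entertainment)
Escapist: Ruhig (quiet), gemütlich (cosy), Wellness, Park, Garten (garden), Spa, Wellnessbereich (wellness area)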
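The slides do not state the decision rule behind the dictionary approach. A minimal sketch, assuming exact token matching and the hypothetical rule "assign a factor when at least `min_hits` distinct dictionary terms occur in the description", could look like this; `DICTIONARIES` holds only an excerpt of the terms listed above, lower-cased.

```python
import re

# Excerpt of the expert dictionaries from the slide (lower-cased); not the full set.
DICTIONARIES = {
    "Sunlover": {"strand", "meer", "meerblick", "pool", "liegestuhl", "sonnenschirm"},
    "Sportive": {"fitness", "fitnessraum", "sport", "tennis", "tennisplatz", "golfplatz"},
    "Escapist": {"ruhig", "gemütlich", "wellness", "park", "garten", "spa"},
}


def dictionary_scores(text, dictionaries=DICTIONARIES):
    """Count how many distinct dictionary terms occur as tokens in the text."""
    tokens = set(re.findall(r"[^\W\d_]+", text.lower()))
    return {factor: len(terms & tokens) for factor, terms in dictionaries.items()}


def assign_factors(text, min_hits=2, dictionaries=DICTIONARIES):
    """Assign every factor with at least `min_hits` matches (assumed rule)."""
    scores = dictionary_scores(text, dictionaries)
    return sorted(f for f, s in scores.items() if s >= min_hits)
```

Exact token matching misses German compounds such as "Sandstrand" for "Strand"; matching on stems or substrings would raise recall at the cost of false hits (e.g. "Pool" inside "Whirlpool").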
Classification vs. Dictionary
Validation on the training set with 371 hotels
[Figure: Precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), dictionary vs. classification; y-axis 0% to 100%]
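The precision values compared here are presumably the standard per-factor definition (the slides do not spell it out). For a factor f, with TP_f the hotels assigned to f that the travel operator also labelled f, and FP_f the hotels assigned to f without that label:

```latex
\mathrm{Precision}_f = \frac{TP_f}{TP_f + FP_f}
% Hypothetical numbers for illustration: 40 hotels assigned to Sunlover, 30 of them labelled Sunlover
\mathrm{Precision}_{\text{Sunlover}} = \frac{30}{30 + 10} = 0.75 = 75\%
```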
Final Evaluation
Evaluation of the best approaches on the independent test set
[Figure: Precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), test set (180 hotels) vs. training set (371 hotels); y-axis 0% to 100%]
Approach used: dictionary for Sunlover; classification for all other factors
Conclusion
• Allocation of hotels to tourist profiles using textual data can be implemented successfully, depending on the targeted user group:
+ Sunlover, Escapist, Cultural, Sportive
- Educational, Independent, Riskseeker
• The majority of the designed models are capable of dealing with new hotel data
• Hotel descriptions can be a reasonable basis for recommendations in recommender systems
Thanks for your attention!
References
• Aggarwal, C. C. & Zhai, C. (2012). A Survey of Text Clustering Algorithms. In Mining Text Data, pages
77–128. Springer-Verlag New York.
• Bird, S., Klein, E. & Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media, Inc.
• Burke, R. & Ramezani, M. (2011). Matching Recommendation Technologies and Domains. In
Recommender Systems Handbook, pages 367–386. Springer US.
• Cohen, E. (1972). Toward a sociology of international tourism. Social Research, 39(1): 164–182.
• Cosh, K. (2013). Text mining Wikipedia to discover alternative destinations. The 2013 10th
International Joint Conference on Computer Science and Software Engineering (JCSSE), pages 43–48.
• Gibson, H. & Yiannakis, A. (2002). Tourist roles: Needs and the lifecourse. Annals of Tourism
Research, 29(2): 358–383.
• Goldberg, L. R. (1993). The Structure of Phenotypic Personality Traits. American Psychologist, 48(1): 26–34.
• Gupta, G. K. (2006). Introduction to Data Mining with Case Studies. Prentice-Hall of India Pvt.Ltd.
• Hippner, H. & Rentzmann, R. (2006). Text mining. Informatik-Spektrum, 29(4): 287–290.
• Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: a developmental
perspective. Working Papers of Department of Linguistics and Phonetics, Lund University, 53:61–79.
• Lahlou, F. Z., Mountassir, A., Benbrahim, H., & Kassou, I. (2013). A Text Classification Based Method
for Context Extraction from Online Reviews. 8th International Conference on Intelligent Systems:
Theories and Applications (SITA).
• Neidhardt, J., Schuster, R., Seyfang, L. & Werthner, H. (2014). Eliciting the users’ unknown
preferences. In Proceedings of the 8th ACM Conference on Recommender systems, 309-312. ACM.
• Neidhardt, J., Seyfang, L., Schuster, R. & Werthner, H. (2015). A picture-based approach to
recommender systems. Information Technology & Tourism, 15(1): 49-69.
• Neidhardt, J., & Werthner, H. (2017). Travellers and Their Joint Characteristics Within the Seven-
Factor Model. In Information and Communication Technologies in Tourism 2017 (pp. 503-515).
Springer.
• Ricci, F., Rokach, L. & Shapira, B. (2011). Introduction to Recommender Systems Handbook. In
Recommender Systems Handbook, pages 1-35. Springer US.
• Schmunk, S., Höpken, W., Fuchs, M. & Lexhagen, M. (2014). Sentiment Analysis: Extracting Decision-Relevant Knowledge from UGC. In Information and Communication Technologies in Tourism 2014 (pp. 253–265). Springer.
• Sharma, Y., Bhatt, T. & Magon, R. (2015). A multi criteria review-based hotel recommendation
system. 15th IEEE International Conference on Computer and Information Technology, pages 687–
691.
• Weiss, S. M., Indurkhya, N. & Zhang, T. (2010). Fundamentals of Predictive Text Mining. Springer-
Verlag London.
• Werthner, H. & Klein, S. (1999). Information Technology and Tourism – A Challenging Relationship.
Wien - New York: Springer-Verlag.
• Werthner, H., Alzua-Sorzabal, A., Cantoni, L., Dickinger, A., Gretzel, U., Jannach, D., Neidhardt, J.,
Pröll, B., Ricci, F., Scaglione, M., Stangl, B., Stock, O. & Zanker, M. (2015). Information Technology &
Tourism, 15(1).
• Xiang, Z., Du, Q., Ma, Y. & Fan, W. (2017). Assessing Reliability of Social Media Data: Lessons from Mining TripAdvisor Hotel Reviews. In Information and Communication Technologies in Tourism 2017 (pp. 625–637). Springer.
• Yiannakis, A. & Gibson, H. (1992). Roles tourists play. Annals of Tourism Research, 19(2): 287–303.

More Related Content

Similar to Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns

Investigation of the revenue management practices of accommodation establishm...
Investigation of the revenue management practices of accommodation establishm...Investigation of the revenue management practices of accommodation establishm...
Investigation of the revenue management practices of accommodation establishm...Stanislav Ivanov
 
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)Galit Shmueli
 
Efficiency of hotels web sites
Efficiency of hotels web sitesEfficiency of hotels web sites
Efficiency of hotels web sitesestrella_diaz
 
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdfHow to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdffarhanaaansari42
 
AI techniques for tourism-oriented applications
AI techniques for tourism-oriented applicationsAI techniques for tourism-oriented applications
AI techniques for tourism-oriented applicationsAmine Bendahmane
 
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKETSTOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKETIRJET Journal
 
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptx
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptxHow to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptx
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptxfarhanaaansari42
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational StatisticsSetia Pramana
 

Similar to Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns (20)

Concerns of integrated resort customers content analysis of reviews on TripAd...
Concerns of integrated resort customers content analysis of reviews on TripAd...Concerns of integrated resort customers content analysis of reviews on TripAd...
Concerns of integrated resort customers content analysis of reviews on TripAd...
 
Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...
Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...
Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...
 
Investigation of the revenue management practices of accommodation establishm...
Investigation of the revenue management practices of accommodation establishm...Investigation of the revenue management practices of accommodation establishm...
Investigation of the revenue management practices of accommodation establishm...
 
Forecasting London Museum Visitors Using Google Trends Data (Research Note)
Forecasting London Museum Visitors Using Google Trends Data (Research Note)Forecasting London Museum Visitors Using Google Trends Data (Research Note)
Forecasting London Museum Visitors Using Google Trends Data (Research Note)
 
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
 
Smart hotels of today and tomorrow
Smart hotels of today and tomorrowSmart hotels of today and tomorrow
Smart hotels of today and tomorrow
 
Efficiency of hotels web sites
Efficiency of hotels web sitesEfficiency of hotels web sites
Efficiency of hotels web sites
 
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdfHow to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
 
How effective are Asian hotels in communicating CSR efforts through the prope...
How effective are Asian hotels in communicating CSR efforts through the prope...How effective are Asian hotels in communicating CSR efforts through the prope...
How effective are Asian hotels in communicating CSR efforts through the prope...
 
Information gathering by ubiquitous services for CRM in tourism destinations:...
Information gathering by ubiquitous services for CRM in tourism destinations:...Information gathering by ubiquitous services for CRM in tourism destinations:...
Information gathering by ubiquitous services for CRM in tourism destinations:...
 
TR6124 Assignment 3.pptx
TR6124 Assignment 3.pptxTR6124 Assignment 3.pptx
TR6124 Assignment 3.pptx
 
Constructing a Data Warehouse Based Decision Support Platform for China Touri...
Constructing a Data Warehouse Based Decision Support Platform for China Touri...Constructing a Data Warehouse Based Decision Support Platform for China Touri...
Constructing a Data Warehouse Based Decision Support Platform for China Touri...
 
AI techniques for tourism-oriented applications
AI techniques for tourism-oriented applicationsAI techniques for tourism-oriented applications
AI techniques for tourism-oriented applications
 
What does hotel location mean for the online consumer? Text analytics using o...
What does hotel location mean for the online consumer? Text analytics using o...What does hotel location mean for the online consumer? Text analytics using o...
What does hotel location mean for the online consumer? Text analytics using o...
 
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKETSTOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
 
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptx
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptxHow to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptx
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pptx
 
Time-varying browsing behavior of hotel website users (Research Note)
Time-varying browsing behavior of hotel website users (Research Note)Time-varying browsing behavior of hotel website users (Research Note)
Time-varying browsing behavior of hotel website users (Research Note)
 
Automatic Hotel Photo Quality Assessment Based on Visual Features
Automatic Hotel Photo Quality Assessment Based on Visual FeaturesAutomatic Hotel Photo Quality Assessment Based on Visual Features
Automatic Hotel Photo Quality Assessment Based on Visual Features
 
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational Statistics
 

Recently uploaded

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns

  • 1. ENTER 2018 Research Track Slide Number 1 Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns Lisa Glatzer, Julia Neidhardt and Hannes Werthner E-Commerce TU Wien, Austria lisa.glatzer@ec.tuwien.ac.at http://www.ec.tuwien.ac.at
  • 2. ENTER 2018 Research Track Slide Number 2 Background • The Web has dramatically changed the tourism industry; travellers book the accommodations for their vacations increasingly online • Web platforms aim to recommend hotels to their customers that best fit their preferences • However, tourism domain is very complex • Therefore, novel, user-centric recommendation approaches have been introduced, e.g., seven- factor model
  • 3. ENTER 2018 Research Track Slide Number 3 Seven-Factor Model • Personality-based approach: factors combining Big Five personality traits & 17 tourist roles • Each factor reflects travel behavioural patterns Sunlover Educational Independent Cultural Sportive Riskseeker Escapist [Neidhardt et al., 2014]
  • 4. ENTER 2018 Research Track Slide Number 4 Focus of the Work • Analysis of hotel descriptions by travel operators using text mining • Classification of hotels with different machine learning approaches • Assignment of hotels to travel behavioural patterns (i.e., seven factors)
  • 5. ENTER 2018 Research Track Slide Number 5 Research Questions (1) How can textual hotel descriptions be used to identify concepts to enable a classification of hotel descriptions? (2) Can the identified concepts be assigned to different predefined travel behavioural patterns and, in turn, be used to deliver recommendations?
  • 6. ENTER 2018 Research Track Slide Number 6 State of the Art – Tourist Roles • [Cohen, 1972] studied motives for people to travel & established 4 different tourist roles • [Gibson & Yiannakis, 2002] identified 17 tourist roles (15 in their previous work) & studied relation of age, gender, education and tourist preferences • [Neidhardt et al., 2014/2015] present 7 different travel behavioural patterns – the “Seven Factors”
  • 7. ENTER 2018 Research Track Slide Number 7 State of the Art – Text Mining to Extract Touristic Concepts • [Lahlou et al., 2013] extract contextual attributes from hotel reviews on TripAdvisor for context-aware recommendations • [Cosh, 2013] extracts key attributes of destination from Wikipedia articles • [Schmunk et al., 2014] extract product properties from online reviews posted on Booking.com and TripAdvisor
  • 8. ENTER 2018 Research Track Slide Number 8 Methodology (1/5) • Hotel descriptions provided by GIATA • Digital information of over 364,000 hotels by 67 different tourist providers • Text example: “<strong>Verpflegungsarten:</strong>All Inclusive Ultra<br/> <br/> <strong>Lage:</strong> <P>Umgeben von Pinienw&#228;ldern, direkt am kilometerlangen, breiten Sandstrand von Belek gelegen. Den kleinen Ort Kadriye erreichen Sie nach ca. 2,5km. Nach Belek sind es ca. 17km, nach Antalya mit zahlreichen Einkaufs- und Unterhaltungsm&#246glichkeiten etwa 35km…” Data Acquisition Qualitative Analysis Data Pre- Processing Hotel Assignment Evaluation
  • 9. Methodology (2/5): Qualitative Analysis
    • Manual evaluation of 10 randomly selected hotels
    • 20 descriptions per hotel on average
    • Text length correlates with information gain
    • Different providers offer similar descriptions
    • “Templates”: predefined text structures
    • Observations substantiated by statistical analyses (lexical diversity vs. text length)
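The slide does not say which lexical-diversity measure was used. As an assumption for illustration, the sketch below uses two measures:

- the type-token ratio (distinct words divided by total words) as the diversity score;
- the Pearson correlation coefficient to relate that score to text length.

All function names are hypothetical.

```python
import math
import re


def tokens(text: str) -> list[str]:
    """Lower-cased \\w+ sequences (so digits count as tokens too)."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(text: str) -> float:
    """Lexical diversity: number of distinct word types / number of tokens."""
    toks = tokens(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equally long samples (needs non-zero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def length_vs_diversity(descriptions: list[str]) -> float:
    """Correlation between description length (in tokens) and type-token ratio."""
    lengths = [len(tokens(d)) for d in descriptions]
    ttrs = [type_token_ratio(d) for d in descriptions]
    return pearson(lengths, ttrs)
```

The raw type-token ratio tends to fall as texts get longer, because common words repeat. A negative correlation is therefore expected even for equally informative texts, which is why length-corrected diversity measures exist.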
  • 10. Methodology (3/5): Data Pre-Processing
    • Extraction of HTML content and handling of text encoding
    • Natural Language Processing:
      • Tokenizing
      • Stopword removal
      • Stemming
      • Pruning
    • Word vector generation (TF-IDF)
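A compact plain-Python sketch of the listed pre-processing steps followed by TF-IDF weighting. It rests on these assumptions:

- The tiny `STOPWORDS` set and the suffix-stripping `crude_stem` are hypothetical stand-ins. A real pipeline would use a full German stopword list and a proper stemmer, such as NLTK's Snowball stemmer for German.
- Pruning is interpreted as dropping terms by document frequency, using the illustrative thresholds `min_df` and `max_df_ratio`.
- TF-IDF multiplies raw term counts by log(N/df) and L2-normalises each vector.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny German stopword list (illustration only).
STOPWORDS = {"der", "die", "das", "und", "von", "am", "mit", "sie", "es", "sind", "nach", "den", "ca"}


def crude_stem(word: str) -> str:
    """Placeholder stemmer: strips one common German suffix if >= 4 letters remain."""
    for suffix in ("ern", "en", "er", "es", "e", "n", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    toks = re.findall(r"[^\W\d_]+", text.lower())  # tokenizing: runs of letters only
    toks = [t for t in toks if t not in STOPWORDS]  # stopword removal
    return [crude_stem(t) for t in toks]            # stemming


def tfidf(docs: list[list[str]], min_df: int = 2, max_df_ratio: float = 0.9) -> list[dict[str, float]]:
    """Sparse, L2-normalised TF-IDF vectors (term -> weight) for pre-processed docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Pruning: keep terms that are neither too rare nor too common.
    vocab = {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}
    vectors = []
    for doc in docs:
        tf = Counter(t for t in doc if t in vocab)
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors
```

For instance, `preprocess("Die Strände und das Meer")` yields `["stränd", "meer"]`. Unit-length vectors make the cosine similarity between two descriptions a plain dot product.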
  • 11. Methodology (4/5): Hotel Assignment
    • Mapping of hotel descriptions to the Seven Factors using three approaches:
      1. Clustering
      2. Classification
      3. Dictionary-based approach
  • 12. Methodology (5/5): Evaluation
    • Training, validation and evaluation with a labelled data set established by an Austrian travel operator
    • Training & validation set: 371 hotels
    • Test set: 180 hotels
  • 13. Approach 1: Clustering (1/3)
    • Unsupervised learning, fully automated
    • Clustering method: K-Means
    • Similarity measure: cosine similarity
    • Data: training set with 371 hotels
    • Number of clusters: 6 (based on various clustering evaluation coefficients)
    Goal: generation of disjoint clusters of hotel descriptions that reflect the Seven Factors
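The slide names K-Means with cosine similarity but not how the two are combined. One common choice, assumed here, is spherical k-means: vectors and centroids are kept at unit length, so the dot product equals the cosine similarity. The sketch works on dense lists of floats (for example, TF-IDF vectors as described on slide 10) and picks random initial centroids. `spherical_kmeans` and its parameters are hypothetical.

```python
import math
import random


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit-length vectors, i.e. their dot product."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors: list[list[float]], k: int, max_iter: int = 100, seed: int = 0):
    """K-Means variant that assigns by cosine similarity and keeps unit-length centroids."""
    rng = random.Random(seed)
    data = [normalize(v) for v in vectors]
    centroids = [list(c) for c in rng.sample(data, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(x, centroids[c])) for x in data]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: mean of the members, re-normalised to unit length.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = normalize([sum(col) / len(members) for col in zip(*members)])
    return labels, centroids
```

For the setting on the slide, one would call `spherical_kmeans(vectors, k=6)` on the 371 training vectors. Because initialisation is random, different seeds can give different clusterings. Practical implementations therefore usually run several restarts or use k-means++ seeding.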
  • 14. Approach 1: Clustering (2/3)
    [Chart: distribution of the Seven Factors across the clusters]
    Cluster | Hotels
    0       | 135
    1       | 43
    2       | 38
    3       | 48
    4       | 59
    5       | 48
    Total   | 371
  • 15. Approach 1: Clustering (3/3)
    [Chart: distribution of travel operators across the clusters]
    Cluster | Hotels
    0       | 135
    1       | 43
    2       | 38
    3       | 48
    4       | 59
    5       | 48
    Total   | 371
  • 16. Approach 2: Classification
    • Supervised learning
    • Classifiers: Naive Bayes, KNN, Decision Tree
    • Validation: 10-fold cross-validation
    • Data: training set with 371 hotels
    Goal: generation of seven models, one for each of the Seven Factors
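Of the three classifiers, only Naive Bayes is sketched here. The slide does not state the following details, so they are assumptions, and the function names are hypothetical:

- each of the seven models is a binary classifier (the hotel belongs to the factor or not) over pre-processed token lists;
- folds are formed by taking every k-th hotel;
- precision is pooled over all folds.

```python
import math
from collections import Counter


def train_nb(docs: list[list[str]], labels: list[bool], alpha: float = 1.0):
    """Multinomial Naive Bayes with Laplace smoothing for one factor (True = belongs)."""
    vocab = {t for d in docs for t in d}
    stats = {}
    for cls in (True, False):
        cls_docs = [d for d, y in zip(docs, labels) if y == cls]
        counts = Counter(t for d in cls_docs for t in d)
        prior = math.log(len(cls_docs) / len(docs)) if cls_docs else -math.inf
        stats[cls] = (prior, counts, sum(counts.values()))
    return vocab, stats, alpha


def predict_nb(model, doc: list[str]) -> bool:
    """True if the positive class has the higher log-posterior score."""
    vocab, stats, alpha = model

    def score(cls: bool) -> float:
        prior, counts, total = stats[cls]
        return prior + sum(
            math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
            for t in doc
            if t in vocab  # words never seen in training are ignored
        )

    return score(True) > score(False)


def cross_validated_precision(docs: list[list[str]], labels: list[bool], k: int = 10) -> float:
    """k-fold cross-validation; precision of the positive class pooled over all folds."""
    tp = fp = 0
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))  # every k-th hotel forms one fold
        train = [(d, y) for i, (d, y) in enumerate(zip(docs, labels)) if i not in test_idx]
        model = train_nb([d for d, _ in train], [y for _, y in train])
        for i in test_idx:
            if predict_nb(model, docs[i]):
                if labels[i]:
                    tp += 1
                else:
                    fp += 1
    return tp / (tp + fp) if tp + fp else 0.0
```

Running `cross_validated_precision` once per factor, each time with that factor's yes/no labels, gives a validation precision for each of the seven models. A production setup would usually shuffle or stratify the folds rather than slice by index.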
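  • 17. Approach 3: Dictionary
    • Experts identified the most important words for each of the Seven Factors
    Goal: classification based on the dictionary terms
    Dictionary terms (German, as used for matching; English glosses in parentheses):
    Sunlover: Strand (beach), Swimmingpool, Wlan (Wi-Fi), Liege (lounger), Meer (sea), Beach, inclusive, Internetzugang (internet access), Liegestuhl (deckchair), Meerblick (sea view), Pool, Sonnenschirm (parasol), Sonnenterasse (sun terrace), Wifi
    Educational: Buffet, Club, Halbpension (half board), inclusive, Miniclub (kids' club), Musik (music)
    Independent: Eigenregie (self-organised), gemütlich (cosy), individuell (individual), lokal (local), Zentrum (centre)
    Cultural: Spa, Suite, superior, elegant, Carte (à la carte), Bademantel (bathrobe), Mietsafe (rental safe), modern, Wellnessbereich (wellness area), Whirlpool
    Sportive: Aktivität (activity), Fitness, Fitnessraum (gym), Sport, Tennis, Tischtennis (table tennis), Tennisplatz (tennis court), Golfplatz (golf course)
    Riskseeker: Club, Stadt (city), Unterhaltung (entertainment)
    Escapist: Ruhig (quiet), gemütlich (cosy), Wellness, Park, Garten (garden), Spa, Wellnessbereich (wellness area)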
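The slide lists the expert dictionaries but not how matches become an assignment. The sketch below assumes two things:

- exact, case-insensitive token matching against an abridged copy of three of the dictionaries;
- a hypothetical rule that assigns every factor with at least `threshold` matches.

`DICTIONARIES`, `dictionary_scores` and `assign` are illustrative names.

```python
import re

# Abridged excerpt of the slide's dictionaries, lower-cased for matching.
DICTIONARIES = {
    "Sunlover": {"strand", "swimmingpool", "meer", "meerblick", "pool", "sonnenschirm"},
    "Sportive": {"fitness", "fitnessraum", "sport", "tennis", "tennisplatz", "golfplatz"},
    "Escapist": {"ruhig", "gemütlich", "wellness", "park", "garten", "spa"},
}


def dictionary_scores(text: str) -> dict[str, int]:
    """Count how many tokens of the text occur in each factor's dictionary."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return {factor: sum(t in terms for t in tokens) for factor, terms in DICTIONARIES.items()}


def assign(text: str, threshold: int = 2) -> list[str]:
    """Hypothetical rule: assign every factor whose score reaches the threshold."""
    scores = dictionary_scores(text)
    return [factor for factor, score in scores.items() if score >= threshold]
```

For example, `assign("Direkt am Strand mit Pool und Meerblick, ruhig gelegen.")` returns `["Sunlover"]`: three Sunlover terms match, but only one Escapist term does. With exact matching, a compound such as *Meerblick* counts only if it is listed itself. Stemming both the dictionary terms and the text with the same stemmer would loosen this.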
  • 18. Classification vs. Dictionary
    [Bar chart: precision of the dictionary approach vs. the classification approach for each of the Seven Factors (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), validated on the training set of 371 hotels]
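Precision for a factor is the share of hotels assigned to that factor that actually belong to it: Precision = TP / (TP + FP). The comparison on this slide amounts to choosing, per factor, the approach with the higher validation precision. The final evaluation reports the outcome as the dictionary for Sunlover and classification for the other factors. The sketch below shows that selection rule. The data layout and names are assumptions, and any counts passed to it in an example are invented, not the study's results.

```python
def precision(tp: int, fp: int) -> float:
    """Share of hotels assigned to a factor that truly belong to it (0.0 if none assigned)."""
    return tp / (tp + fp) if tp + fp else 0.0


def best_approach(results: dict[str, dict[str, tuple[int, int]]]) -> dict[str, str]:
    """Pick, per factor, the approach with the highest validation precision.

    results maps factor -> approach name -> (true positives, false positives).
    On a tie, the approach listed first wins.
    """
    return {
        factor: max(by_approach, key=lambda a: precision(*by_approach[a]))
        for factor, by_approach in results.items()
    }
```

The selected approach per factor is then evaluated once on the held-out test set. This keeps the test hotels out of the model choice.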
  • 19. Final Evaluation
    [Bar chart: evaluation of the best approaches on the independent test set; precision per factor on the test set (180 hotels) vs. the training set (371 hotels). Approach used: dictionary for Sunlover, classification for the other factors]
  • 20. Conclusion
    • Allocating hotels to tourist profiles based on textual data can be implemented successfully, depending on the targeted user group:
      + Sunlover, Escapist, Cultural, Sportive
      - Educational, Independent, Riskseeker
    • The majority of the designed models are capable of dealing with new hotel data
    • Recommendations based on hotel descriptions can be a reasonable approach for recommender systems
  • 21. Thanks for your attention!
  • 22. References (1/3)
    • Aggarwal, C. C. & Zhai, C. (2012). A Survey of Text Clustering Algorithms. In Mining Text Data, pages 77–128. Springer-Verlag New York.
    • Bird, S., Klein, E. & Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media, Inc.
    • Burke, R. & Ramezani, M. (2011). Matching Recommendation Technologies and Domains. In Recommender Systems Handbook, pages 367–386. Springer US.
    • Cohen, E. (1972). Toward a sociology of international tourism. Social Research, 39(1): 164–182.
    • Cosh, K. (2013). Text mining Wikipedia to discover alternative destinations. In The 2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE), pages 43–48.
    • Gibson, H. & Yiannakis, A. (2002). Tourist roles: Needs and the lifecourse. Annals of Tourism Research, 29(2): 358–383.
    • Goldberg, L. R. (1993). The Structure of Phenotypic Personality Traits. American Psychologist, 48(1): 26–34.
    • Gupta, G. K. (2006). Introduction to Data Mining with Case Studies. Prentice-Hall of India Pvt. Ltd.
    • Hippner, H. & Rentzmann, R. (2006). Text mining. Informatik-Spektrum, 29(4): 287–290.
    • Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: a developmental perspective. Working Papers of the Department of Linguistics and Phonetics, Lund University, 53: 61–79.
  • 23. References (2/3)
    • Lahlou, F. Z., Mountassir, A., Benbrahim, H. & Kassou, I. (2013). A Text Classification Based Method for Context Extraction from Online Reviews. In 8th International Conference on Intelligent Systems: Theories and Applications (SITA).
    • Neidhardt, J., Schuster, R., Seyfang, L. & Werthner, H. (2014). Eliciting the users’ unknown preferences. In Proceedings of the 8th ACM Conference on Recommender Systems, pages 309–312. ACM.
    • Neidhardt, J., Seyfang, L., Schuster, R. & Werthner, H. (2015). A picture-based approach to recommender systems. Information Technology & Tourism, 15(1): 49–69.
    • Neidhardt, J. & Werthner, H. (2017). Travellers and Their Joint Characteristics Within the Seven-Factor Model. In Information and Communication Technologies in Tourism 2017, pages 503–515. Springer.
    • Ricci, F., Rokach, L. & Shapira, B. (2011). Introduction to Recommender Systems Handbook. In Recommender Systems Handbook, pages 1–35. Springer US.
    • Schmunk, S., Höpken, W., Fuchs, M. & Lexhagen, M. (2014). Sentiment Analysis: Extracting Decision-Relevant Knowledge from UGC. Information and Communication Technologies in Tourism, 14(1): 253–265.
    • Sharma, Y., Bhatt, T. & Magon, R. (2015). A multi criteria review-based hotel recommendation system. In 15th IEEE International Conference on Computer and Information Technology, pages 687–691.
  • 24. References (3/3)
    • Weiss, S. M., Indurkhya, N. & Zhang, T. (2010). Fundamentals of Predictive Text Mining. Springer-Verlag London.
    • Werthner, H. & Klein, S. (1999). Information Technology and Tourism – A Challenging Relationship. Vienna/New York: Springer-Verlag.
    • Werthner, H., Alzua-Sorzabal, A., Cantoni, L., Dickinger, A., Gretzel, U., Jannach, D., Neidhardt, J., Pröll, B., Ricci, F., Scaglione, M., Stangl, B., Stock, O. & Zanker, M. (2015). Information Technology & Tourism, 15(1).
    • Xiang, Z., Du, Q., Ma, Y. & Fan, W. (2017). Assessing Reliability of Social Media Data: Lessons from Mining TripAdvisor Hotel Reviews. Information and Communication Technologies in Tourism, 17(1): 625–637.
    • Yiannakis, A. & Gibson, H. (1992). Roles tourists play. Annals of Tourism Research, 19(2): 287–303.