ENTER 2018 Research Track Slide Number 1
Automated Assignment of
Hotel Descriptions to
Travel Behavioural Patterns
Lisa Glatzer, Julia Neidhardt and Hannes Werthner
E-Commerce TU Wien, Austria
lisa.glatzer@ec.tuwien.ac.at
http://www.ec.tuwien.ac.at
Background
• The Web has dramatically changed the tourism
industry; travellers increasingly book accommodation
for their vacations online
• Web platforms aim to recommend to their customers
the hotels that best fit their preferences
• However, the tourism domain is very complex
• Therefore, novel, user-centric recommendation
approaches have been introduced, e.g., the
Seven-Factor Model
Seven-Factor Model
• Personality-based approach: factors combining
Big Five personality traits & 17 tourist roles
• Each factor reflects travel behavioural patterns
Sunlover
Educational
Independent
Cultural
Sportive
Riskseeker
Escapist
[Neidhardt et al., 2014]
Focus of the Work
• Text-mining analysis of hotel descriptions
provided by travel operators
• Classification of hotels with different
machine learning approaches
• Assignment of hotels to travel behavioural
patterns (i.e., the Seven Factors)
Research Questions
(1) How can textual hotel descriptions be
used to identify concepts to enable a
classification of hotel descriptions?
(2) Can the identified concepts be assigned
to different predefined travel
behavioural patterns and, in turn, be
used to deliver recommendations?
State of the Art – Tourist Roles
• [Cohen, 1972] studied motives for people to
travel & established 4 different tourist roles
• [Gibson & Yiannakis, 2002] identified 17
tourist roles (15 in their previous work) &
studied the relation between age, gender,
education and tourist preferences
• [Neidhardt et al., 2014/2015] present
7 different travel behavioural patterns –
the “Seven Factors”
State of the Art – Text Mining
to Extract Touristic Concepts
• [Lahlou et al., 2013] extract contextual
attributes from hotel reviews on TripAdvisor
for context-aware recommendations
• [Cosh, 2013] extracts key attributes of
destinations from Wikipedia articles
• [Schmunk et al., 2014] extract product
properties from online reviews posted on
Booking.com and TripAdvisor
Methodology (1/5)
• Hotel descriptions provided by GIATA
• Digital information on over 364,000 hotels
from 67 different tourism providers
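• Text example (German, raw HTML as delivered):
“<strong>Verpflegungsarten:</strong>All Inclusive Ultra<br/> <br/>
<strong>Lage:</strong> <P>Umgeben von Pinienw&#228;ldern, direkt am
kilometerlangen, breiten Sandstrand von Belek gelegen. Den kleinen Ort Kadriye
erreichen Sie nach ca. 2,5km. Nach Belek sind es ca. 17km, nach Antalya mit
zahlreichen Einkaufs- und Unterhaltungsm&#246glichkeiten etwa 35km…”
English translation: “Board types: All Inclusive Ultra. Location: Surrounded by
pine forests, directly on the kilometre-long, wide sandy beach of Belek. The small
village of Kadriye is approx. 2.5 km away. Belek is approx. 17 km away, and Antalya,
with numerous shopping and entertainment options, about 35 km…”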
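The raw example above mixes markup with numeric character references (`&#228;` is “ä”, and `&#246` without its semicolon is still “ö”), so the first pre-processing step has to recover plain text. A minimal sketch using only Python's standard library, assuming the descriptions arrive as HTML fragments; `extract_text` is a hypothetical helper, and treating every tag as a word boundary is a simplification:

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &#228; (ä) inside text nodes.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Treat every tag as a word boundary so "</strong>All" does not glue words together.
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(raw_html: str) -> str:
    """Hypothetical helper: strip tags, decode entities, collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flushes any buffered trailing text
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


sample = "<strong>Lage:</strong> <P>Umgeben von Pinienw&#228;ldern, direkt am Strand"
print(extract_text(sample))  # Lage: Umgeben von Pinienwäldern, direkt am Strand
```

Self-closing tags such as `<br/>` are routed through `handle_starttag` and `handle_endtag` by the parser's default `handle_startendtag`, so they also become spaces.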
Data Acquisition → Qualitative Analysis → Data Pre-Processing → Hotel Assignment → Evaluation
Methodology (2/5)
• Manual evaluation of 10 randomly selected hotels
• 20 descriptions per hotel on average
• Text length correlates with information gain
• Different providers offer similar descriptions
• “Templates”: predefined text structure
• Observations substantiated by statistical
analyses (lexical diversity vs. text length)
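One common lexical-diversity measure is the type-token ratio (TTR): distinct words divided by total words. TTR is known to fall as texts get longer, which is why it is usually read against text length. A minimal sketch, assuming TTR as the diversity measure and Pearson's r for its relation to length (the slides do not name the exact measures); the sample descriptions are invented for illustration:

```python
import math
import re


def tokens(text):
    return re.findall(r"\w+", text.lower())


def type_token_ratio(text):
    """Distinct tokens divided by total tokens (0.0 for an empty text)."""
    toks = tokens(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


descriptions = [
    "Pool und Strand",
    "Pool und Strand, Pool und Bar",
    "Pool und Strand, Pool und Bar, Pool und Garten und Park",
]
lengths = [len(tokens(d)) for d in descriptions]
ratios = [type_token_ratio(d) for d in descriptions]
print(pearson(lengths, ratios))  # negative: the longer texts repeat more words
```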
Methodology (3/5)
• Extraction of HTML content & text encoding
• Natural Language Processing
• Tokenizing
• Stopword removal
• Stemming
• Pruning
• Word vector generation (TF-IDF)
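The pre-processing chain listed above can be sketched end to end with the standard library. This is an illustration under assumptions, not the authors' tooling: the stopword list is a hypothetical fragment, the suffix stripper is a crude stand-in for a real German stemmer such as Snowball, and the pruning thresholds `min_df` and `max_df_ratio` are invented defaults:

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny German stopword list; a real pipeline would
# use a complete list (for example the German list shipped with NLTK).
STOPWORDS = {"am", "der", "die", "das", "den", "ein", "eine", "im",
             "mit", "nach", "sie", "sind", "und", "von", "zum"}

# Crude suffix stripping as a stand-in for a proper German stemmer.
SUFFIXES = ("ern", "en", "er", "es", "e", "n", "s")


def tokenize(text):
    """Lower-case the text and keep runs of (German) letters."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stopwords, then stem."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


def prune(docs, min_df=2, max_df_ratio=0.9):
    """Drop terms found in fewer than min_df documents or in more than max_df_ratio of them."""
    df = Counter(term for doc in docs for term in set(doc))
    keep = {t for t, c in df.items() if c >= min_df and c / len(docs) <= max_df_ratio}
    return [[t for t in doc if t in keep] for doc in docs]


def tfidf(docs):
    """One sparse {term: weight} vector per document, weight = tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


raw = ["Direkt am Strand mit Pool", "Pool und Strand, ruhig", "Ruhig im Park"]
docs = prune([preprocess(r) for r in raw])
print(docs)  # [['strand', 'pool'], ['pool', 'strand', 'ruhig'], ['ruhig']]
print(tfidf(docs))
```

Pruning before weighting keeps very rare terms (here "direkt" and "park") from inflating the vocabulary; terms present in almost every description carry little discriminative weight and are removed by `max_df_ratio`.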
Methodology (4/5)
• Mapping of hotel descriptions to the Seven
Factors using three approaches
1. Clustering
2. Classification
3. Dictionary based approach
Methodology (5/5)
• Training, validation and evaluation using a
labelled data set created by an Austrian
travel operator
• Training & validation set: 371 hotels
• Test set: 180 hotels
1. Clustering (1/3)
• Unsupervised learning, fully automated
• Clustering method: K-Means
• Similarity measure: Cosine similarity
• Data: Training set with 371 hotels
• Number of clusters: 6 (chosen based on several
cluster evaluation coefficients)
Goal: Generation of disjoint clusters of hotel
descriptions that reflect the Seven Factors
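K-Means with cosine similarity is commonly implemented as spherical k-means: vectors are L2-normalised, each description joins the centroid with the highest cosine, and each centroid becomes the renormalised sum of its members. A minimal sketch over sparse `{term: weight}` dicts such as TF-IDF vectors; the function names are hypothetical, and seeding with the first k documents is a simplification (standard implementations use random or k-means++ seeding):

```python
import math


def normalize(vec):
    """Scale a sparse vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, max_iter=20):
    """Return one cluster label per vector."""
    docs = [normalize(v) for v in vectors]
    centroids = [dict(d) for d in docs[:k]]  # simplistic deterministic seeding
    labels = None
    for _ in range(max_iter):
        # Assignment step: most similar centroid (ties go to the lower index).
        new_labels = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
        if new_labels == labels:
            break  # converged: no description changed cluster
        labels = new_labels
        # Update step: centroid = normalised sum of its members.
        for c in range(k):
            summed = {}
            for d, label in zip(docs, labels):
                if label == c:
                    for t, w in d.items():
                        summed[t] = summed.get(t, 0.0) + w
            if summed:  # keep the previous centroid if the cluster is empty
                centroids[c] = normalize(summed)
    return labels


vectors = [{"strand": 1.0, "pool": 1.0}, {"strand": 1.0, "pool": 0.8},
           {"spa": 1.0, "ruhig": 1.0}, {"spa": 0.9, "ruhig": 1.0}]
print(spherical_kmeans(vectors, k=2))  # [1, 1, 0, 0]
```

Because all vectors have unit length, maximising cosine similarity is equivalent to minimising Euclidean distance, which is why the usual k-means update still applies after renormalisation.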
1. Clustering (2/3)
Distribution of Seven Factors
Cluster Hotels
0 135
1 43
2 38
3 48
4 59
5 48
1. Clustering (3/3)
Distribution of travel operator
Cluster Hotels
0 135
1 43
2 38
3 48
4 59
5 48
2. Classification
• Supervised learning
• Classifiers: Naive Bayes, k-NN, Decision Tree
• Validation: 10-fold cross validation
• Data: Training set with 371 hotels
Goal: Generation of seven models, one for
each of the Seven Factors
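Of the three classifiers, multinomial Naive Bayes is the easiest to sketch without libraries. Below, one binary model decides whether a hotel belongs to a single factor (seven such models would cover the Seven Factors), and a k-fold loop estimates its precision. This is a minimal sketch under assumptions: token lists as input, Laplace smoothing, and round-robin fold assignment instead of stratified folds; all names are hypothetical, and k-NN or a decision tree would slot into the same validation loop:

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Binary multinomial Naive Bayes with Laplace smoothing.

    docs are token lists; labels are True (hotel belongs to the factor) or False.
    """
    vocab = {t for d in docs for t in d}
    model = {}
    for cls in (True, False):
        cls_docs = [d for d, y in zip(docs, labels) if y == cls]
        counts = Counter(t for d in cls_docs for t in d)
        total = sum(counts.values())
        model[cls] = {
            "log_prior": math.log((len(cls_docs) + alpha) / (len(docs) + 2 * alpha)),
            "log_lik": {t: math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
                        for t in vocab},
        }
    return model


def predict_nb(model, doc):
    """True if the positive class has the higher log posterior; unseen tokens are ignored."""
    def score(cls):
        m = model[cls]
        return m["log_prior"] + sum(m["log_lik"][t] for t in doc if t in m["log_lik"])
    return score(True) > score(False)


def cross_val_precision(docs, labels, folds=10):
    """k-fold cross-validation; precision of the positive class, or None if never predicted."""
    tp = fp = 0
    for k in range(folds):
        test_idx = set(range(k, len(docs), folds))  # round-robin fold assignment
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        model = train_nb(train_docs, train_labels)
        for i in test_idx:
            if predict_nb(model, docs[i]):
                if labels[i]:
                    tp += 1
                else:
                    fp += 1
    return tp / (tp + fp) if tp + fp else None


docs = [["strand", "pool"], ["meer", "strand"], ["pool", "meer"],
        ["spa", "ruhig"], ["park", "ruhig"], ["spa", "park"]]
labels = [True, True, True, False, False, False]  # toy labels for "Sunlover"
print(cross_val_precision(docs, labels, folds=3))  # 1.0
```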
3. Dictionary
• Identification of the most important words
for each of the Seven Factors by experts
Goal: Classification based on the dictionary terms
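Sunlover: Strand (beach), Swimmingpool, Wlan (Wi-Fi), Liege (lounger), Meer (sea), Beach, inclusive,
Internetzugang (internet access), Liegestuhl (deckchair), Meerblick (sea view), Pool,
Sonnenschirm (parasol), Sonnenterasse (sun terrace), Wifi
Educational: Buffet, Club, Halbpension (half board), inclusive, Miniclub (kids' club), Musik (music)
Independent: Eigenregie (self-organised), gemütlich (cosy), individuell (individual), lokal (local),
Zentrum (centre)
Cultural: Spa, Suite, superior, elegant, Carte (à la carte), Bademantel (bathrobe), Mietsafe (rental safe),
modern, Wellnessbereich (wellness area), Whirlpool
Sportive: Aktivität (activity), Fitness, Fitnessraum (gym), Sport, Tennis, Tischtennis (table tennis),
Tennisplatz (tennis court), Golfplatz (golf course)
Riskseeker: Club, Stadt (city), Unterhaltung (entertainment)
Escapist: Ruhig (quiet), gemütlich (cosy), Wellness, Park, Garten (garden), Spa, Wellnessbereich (wellness area)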
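A dictionary approach can be as simple as counting how many words of a description appear in each factor's term list. A minimal sketch, assuming token-level, case-insensitive matching and a hypothetical threshold `min_hits`; only three of the seven lists are copied (lower-cased) from the dictionary table, and the helper names are hypothetical:

```python
import re

# Three of the seven expert dictionaries, lower-cased.
DICTIONARIES = {
    "Sunlover": {"strand", "swimmingpool", "wlan", "liege", "meer", "beach", "inclusive",
                 "internetzugang", "liegestuhl", "meerblick", "pool", "sonnenschirm",
                 "sonnenterasse", "wifi"},
    "Sportive": {"aktivität", "fitness", "fitnessraum", "sport", "tennis", "tischtennis",
                 "tennisplatz", "golfplatz"},
    "Escapist": {"ruhig", "gemütlich", "wellness", "park", "garten", "spa", "wellnessbereich"},
}


def dictionary_scores(text):
    """Count, per factor, how many tokens of the text occur in that factor's dictionary."""
    tokens = re.findall(r"\w+", text.lower())
    return {factor: sum(1 for t in tokens if t in words)
            for factor, words in DICTIONARIES.items()}


def assign_factors(text, min_hits=2):
    """Factors whose dictionary matches at least min_hits tokens (hypothetical threshold)."""
    return sorted(f for f, hits in dictionary_scores(text).items() if hits >= min_hits)


description = "Ruhig gelegen, direkt am Strand mit Pool und Meerblick."
print(dictionary_scores(description))  # {'Sunlover': 3, 'Sportive': 0, 'Escapist': 1}
print(assign_factors(description))     # ['Sunlover']
```

Exact token matching misses inflected forms and German compounds (e.g. "Strände"); applying the same stemming to the dictionary and to the descriptions is one way to reduce such misses.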
Classification vs. Dictionary
Validation on the training set (371 hotels)
[Bar chart: precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), axis 0–100 %; series: Precision Dictionary, Precision Classification]
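Assuming the bars show the standard precision measure, the value for a factor f is the share of hotels assigned to f whose travel-operator label is also f, where TP_f counts hotels correctly assigned to f and FP_f counts hotels assigned to f although they carry a different label:

```latex
\mathrm{Precision}_f = \frac{TP_f}{TP_f + FP_f}
```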
Final Evaluation
Evaluation of the best approach per factor on an independent
test set
[Bar chart: precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist), axis 0–100 %; series: Precision Test Set (180 Hotels), Precision Training Set (371 Hotels)]
Approach used: Dictionary for Sunlover; Classification for all other factors
Conclusion
• Allocating hotels to tourist profiles using
textual data can be implemented successfully,
depending on the targeted user group
+ Sunlover, Escapist, Cultural, Sportive
- Educational, Independent, Riskseeker
• Most of the designed models can deal with
new hotel data
• Hotel descriptions can be a reasonable basis
for recommender systems
Thanks for your attention!
References (1/3)
• Aggarwal, C. C. & Zhai, C. (2012). A Survey of Text Clustering Algorithms. In Mining Text Data, pages
77–128. Springer-Verlag New York.
• Bird, S., Klein, E. & Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media, Inc.
• Burke, R. & Ramezani, M. (2011). Matching Recommendation Technologies and Domains. In
Recommender Systems Handbook, pages 367–386. Springer US.
• Cohen, E. (1972). Toward a sociology of international tourism. Social Research, 39(1): 164–182.
• Cosh, K. (2013). Text mining Wikipedia to discover alternative destinations. The 2013 10th
International Joint Conference on Computer Science and Software Engineering (JCSSE), pages 43–48.
• Gibson, H. & Yiannakis, A. (2002). Tourist roles: Needs and the lifecourse. Annals of Tourism
Research, 29(2): 358–383.
• Goldberg, L. R. (1993). The Structure of Phenotypic Personality Traits. American Psychologist, 48(1):
26–34.
• Gupta, G. K. (2006). Introduction to Data Mining with Case Studies. Prentice-Hall of India Pvt. Ltd.
• Hippner, H. & Rentzmann, R. (2006). Text mining. Informatik-Spektrum, 29(4): 287–290.
• Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: a developmental
perspective. Working Papers of Department of Linguistics and Phonetics, Lund University, 53: 61–79.
References (2/3)
• Lahlou, F. Z., Mountassir, A., Benbrahim, H., & Kassou, I. (2013). A Text Classification Based Method
for Context Extraction from Online Reviews. 8th International Conference on Intelligent Systems:
Theories and Applications (SITA).
• Neidhardt, J., Schuster, R., Seyfang, L. & Werthner, H. (2014). Eliciting the users’ unknown
preferences. In Proceedings of the 8th ACM Conference on Recommender Systems, pages 309–312. ACM.
• Neidhardt, J., Seyfang, L., Schuster, R. & Werthner, H. (2015). A picture-based approach to
recommender systems. Information Technology & Tourism, 15(1): 49–69.
• Neidhardt, J. & Werthner, H. (2017). Travellers and Their Joint Characteristics Within the
Seven-Factor Model. In Information and Communication Technologies in Tourism 2017, pages 503–515.
Springer.
• Ricci, F., Rokach, L. & Shapira, B. (2011). Introduction to Recommender Systems Handbook. In
Recommender Systems Handbook, pages 1–35. Springer US.
• Schmunk, S., Höpken, W., Fuchs, M. & Lexhagen, M. (2014). Sentiment Analysis: Extracting
Decision-Relevant Knowledge from UGC. Information and Communication Technologies in Tourism, 14(1):
253–265.
• Sharma, Y., Bhatt, T. & Magon, R. (2015). A multi criteria review-based hotel recommendation
system. 15th IEEE International Conference on Computer and Information Technology, pages 687–691.
References (3/3)
• Weiss, S. M., Indurkhya, N. & Zhang, T. (2010). Fundamentals of Predictive Text Mining.
Springer-Verlag London.
• Werthner, H. & Klein, S. (1999). Information Technology and Tourism – A Challenging Relationship.
Wien, New York: Springer-Verlag.
• Werthner, H., Alzua-Sorzabal, A., Cantoni, L., Dickinger, A., Gretzel, U., Jannach, D., Neidhardt, J.,
Pröll, B., Ricci, F., Scaglione, M., Stangl, B., Stock, O. & Zanker, M. (2015). Information Technology &
Tourism, 15(1).
• Xiang, Z., Du, Q., Ma, Y. & Fan, W. (2017). Assessing Reliability of Social Media Data: Lessons from
Mining TripAdvisor Hotel Reviews. Information and Communication Technologies in Tourism, 17(1):
625–637.
• Yiannakis, A. & Gibson, H. (1992). Roles tourists play. Annals of Tourism Research, 19(2): 287–303.

More Related Content

Similar to Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns

Concerns of integrated resort customers content analysis of reviews on TripAd...
Concerns of integrated resort customers content analysis of reviews on TripAd...Concerns of integrated resort customers content analysis of reviews on TripAd...
Concerns of integrated resort customers content analysis of reviews on TripAd...
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...
Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...
Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
Investigation of the revenue management practices of accommodation establishm...
Investigation of the revenue management practices of accommodation establishm...Investigation of the revenue management practices of accommodation establishm...
Investigation of the revenue management practices of accommodation establishm...
Stanislav Ivanov
 
Forecasting London Museum Visitors Using Google Trends Data (Research Note)
Forecasting London Museum Visitors Using Google Trends Data (Research Note)Forecasting London Museum Visitors Using Google Trends Data (Research Note)
Forecasting London Museum Visitors Using Google Trends Data (Research Note)
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
Galit Shmueli
 
Efficiency of hotels web sites
Efficiency of hotels web sitesEfficiency of hotels web sites
Efficiency of hotels web sitesestrella_diaz
 
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdfHow to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
farhanaaansari42
 
How effective are Asian hotels in communicating CSR efforts through the prope...
How effective are Asian hotels in communicating CSR efforts through the prope...How effective are Asian hotels in communicating CSR efforts through the prope...
How effective are Asian hotels in communicating CSR efforts through the prope...
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
Information gathering by ubiquitous services for CRM in tourism destinations:...
Information gathering by ubiquitous services for CRM in tourism destinations:...Information gathering by ubiquitous services for CRM in tourism destinations:...
Information gathering by ubiquitous services for CRM in tourism destinations:...
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
AI techniques for tourism-oriented applications
AI techniques for tourism-oriented applicationsAI techniques for tourism-oriented applications
AI techniques for tourism-oriented applications
Amine Bendahmane
 
What does hotel location mean for the online consumer? Text analytics using o...
What does hotel location mean for the online consumer? Text analytics using o...What does hotel location mean for the online consumer? Text analytics using o...
What does hotel location mean for the online consumer? Text analytics using o...
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKETSTOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
IRJET Journal
 
Time-varying browsing behavior of hotel website users (Research Note)
Time-varying browsing behavior of hotel website users (Research Note)Time-varying browsing behavior of hotel website users (Research Note)
Time-varying browsing behavior of hotel website users (Research Note)
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
Automatic Hotel Photo Quality Assessment Based on Visual Features
Automatic Hotel Photo Quality Assessment Based on Visual FeaturesAutomatic Hotel Photo Quality Assessment Based on Visual Features
Automatic Hotel Photo Quality Assessment Based on Visual Features
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational Statistics
Setia Pramana
 
Location Embeddings for Next Trip Recommendation
Location Embeddings for Next Trip RecommendationLocation Embeddings for Next Trip Recommendation
Location Embeddings for Next Trip Recommendation
Raphael Troncy
 

Similar to Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns (20)

Concerns of integrated resort customers content analysis of reviews on TripAd...
Concerns of integrated resort customers content analysis of reviews on TripAd...Concerns of integrated resort customers content analysis of reviews on TripAd...
Concerns of integrated resort customers content analysis of reviews on TripAd...
 
Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...
Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...
Exploring the Determinants of Strategic Revenue Management with Idiosyncratic...
 
Investigation of the revenue management practices of accommodation establishm...
Investigation of the revenue management practices of accommodation establishm...Investigation of the revenue management practices of accommodation establishm...
Investigation of the revenue management practices of accommodation establishm...
 
Forecasting London Museum Visitors Using Google Trends Data (Research Note)
Forecasting London Museum Visitors Using Google Trends Data (Research Note)Forecasting London Museum Visitors Using Google Trends Data (Research Note)
Forecasting London Museum Visitors Using Google Trends Data (Research Note)
 
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
 
Smart hotels of today and tomorrow
Smart hotels of today and tomorrowSmart hotels of today and tomorrow
Smart hotels of today and tomorrow
 
Efficiency of hotels web sites
Efficiency of hotels web sitesEfficiency of hotels web sites
Efficiency of hotels web sites
 
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdfHow to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
How to Scrape Hotel Reviews Data for Complete Hotel Review Analytics.pdf
 
How effective are Asian hotels in communicating CSR efforts through the prope...
How effective are Asian hotels in communicating CSR efforts through the prope...How effective are Asian hotels in communicating CSR efforts through the prope...
How effective are Asian hotels in communicating CSR efforts through the prope...
 
Information gathering by ubiquitous services for CRM in tourism destinations:...
Information gathering by ubiquitous services for CRM in tourism destinations:...Information gathering by ubiquitous services for CRM in tourism destinations:...
Information gathering by ubiquitous services for CRM in tourism destinations:...
 
TR6124 Assignment 3.pptx
TR6124 Assignment 3.pptxTR6124 Assignment 3.pptx
TR6124 Assignment 3.pptx
 
Constructing a Data Warehouse Based Decision Support Platform for China Touri...
Constructing a Data Warehouse Based Decision Support Platform for China Touri...Constructing a Data Warehouse Based Decision Support Platform for China Touri...
Constructing a Data Warehouse Based Decision Support Platform for China Touri...
 
AI techniques for tourism-oriented applications
AI techniques for tourism-oriented applicationsAI techniques for tourism-oriented applications
AI techniques for tourism-oriented applications
 
What does hotel location mean for the online consumer? Text analytics using o...
What does hotel location mean for the online consumer? Text analytics using o...What does hotel location mean for the online consumer? Text analytics using o...
What does hotel location mean for the online consumer? Text analytics using o...
 
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKETSTOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
STOCKSENTIX: A MACHINE LEARNING APPROACH TO STOCKMARKET
 
Time-varying browsing behavior of hotel website users (Research Note)
Time-varying browsing behavior of hotel website users (Research Note)Time-varying browsing behavior of hotel website users (Research Note)
Time-varying browsing behavior of hotel website users (Research Note)
 
Automatic Hotel Photo Quality Assessment Based on Visual Features
Automatic Hotel Photo Quality Assessment Based on Visual FeaturesAutomatic Hotel Photo Quality Assessment Based on Visual Features
Automatic Hotel Photo Quality Assessment Based on Visual Features
 
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
Content Analysis of Travel Reviews: Exploring the Needs of Tourists from Diff...
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational Statistics
 
Location Embeddings for Next Trip Recommendation
Location Embeddings for Next Trip RecommendationLocation Embeddings for Next Trip Recommendation
Location Embeddings for Next Trip Recommendation
 

Recently uploaded

GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 

Recently uploaded (20)

GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 

Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns

  • 1. ENTER 2018 Research Track Slide Number 1
    Automated Assignment of Hotel Descriptions to Travel Behavioural Patterns
    Lisa Glatzer, Julia Neidhardt and Hannes Werthner
    E-Commerce TU Wien, Austria
    lisa.glatzer@ec.tuwien.ac.at
    http://www.ec.tuwien.ac.at
  • 2. Background
    • The Web has dramatically changed the tourism industry; travellers increasingly book accommodation for their holidays online
    • Web platforms aim to recommend the hotels that best fit their customers' preferences
    • However, the tourism domain is very complex
    • Therefore, novel user-centric recommendation approaches have been introduced, e.g., the Seven-Factor Model
  • 3. Seven-Factor Model
    • Personality-based approach: the factors combine the Big Five personality traits and 17 tourist roles
    • Each factor reflects a travel behavioural pattern: Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist [Neidhardt et al., 2014]
  • 4. Focus of the Work
    • Analysis of travel operators' hotel descriptions using text mining
    • Classification of hotels with different machine learning approaches
    • Assignment of hotels to travel behavioural patterns (i.e., the Seven Factors)
  • 5. Research Questions
    (1) How can textual hotel descriptions be used to identify concepts that enable a classification of hotel descriptions?
    (2) Can the identified concepts be assigned to different predefined travel behavioural patterns and, in turn, be used to deliver recommendations?
  • 6. State of the Art – Tourist Roles
    • [Cohen, 1972] studied people's motives to travel and established 4 tourist roles
    • [Gibson & Yiannakis, 2002] identified 17 tourist roles (15 in their previous work) and studied how age, gender and education relate to tourist preferences
    • [Neidhardt et al., 2014/2015] present 7 travel behavioural patterns, the “Seven Factors”
  • 7. State of the Art – Text Mining to Extract Touristic Concepts
    • [Lahlou et al., 2013] extract contextual attributes from TripAdvisor hotel reviews for context-aware recommendations
    • [Cosh, 2013] extracts key attributes of destinations from Wikipedia articles
    • [Schmunk et al., 2014] extract product properties from online reviews posted on Booking.com and TripAdvisor
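  • 8. Methodology (1/5) – Data Acquisition
    • Hotel descriptions provided by GIATA
    • Digital information on over 364,000 hotels from 67 different tourism providers
    • Text example (raw German description including its HTML markup): “<strong>Verpflegungsarten:</strong>All Inclusive Ultra<br/> <br/> <strong>Lage:</strong> <P>Umgeben von Pinienw&#228;ldern, direkt am kilometerlangen, breiten Sandstrand von Belek gelegen. Den kleinen Ort Kadriye erreichen Sie nach ca. 2,5km. Nach Belek sind es ca. 17km, nach Antalya mit zahlreichen Einkaufs- und Unterhaltungsm&#246glichkeiten etwa 35km…”
    • English translation: “Board types: All Inclusive Ultra. Location: Surrounded by pine forests, situated directly on Belek's wide sandy beach, which stretches for kilometres. The small village of Kadriye is approx. 2.5 km away. Belek is approx. 17 km away, and Antalya, with numerous shopping and entertainment options, about 35 km…”
    • Process: Data Acquisition → Qualitative Analysis → Data Pre-Processing → Hotel Assignment → Evaluation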
  • 9. Methodology (2/5) – Qualitative Analysis
    • Manual evaluation of 10 randomly selected hotels
    • 20 descriptions per hotel on average
    • Text length correlates with information gain
    • Different providers offer similar descriptions
    • “Templates”: predefined text structure
    • Observations substantiated by statistical analyses (lexical diversity vs. text length)
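The slides do not name the lexical-diversity measure behind the statistical analysis. A minimal sketch, assuming the simple type-token ratio (distinct tokens divided by total tokens) and a Pearson correlation against text length; the three German descriptions are invented for illustration. Keep in mind that the type-token ratio tends to fall as texts get longer, so some correlation with length is built into the measure itself.

```python
import math
import re


def tokens(text: str) -> list[str]:
    """Lower-cased word tokens (Unicode-aware, so umlauts stay inside words)."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(text: str) -> float:
    """Lexical diversity as distinct tokens / total tokens."""
    toks = tokens(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two samples of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical descriptions of the same hotel from three providers.
descriptions = [
    "Pool, Strand und Meerblick.",
    "Direkt am Strand gelegen, mit Pool und Meerblick. Der Pool ist beheizt.",
    "Das Hotel liegt direkt am Strand. Das Hotel bietet einen Pool, einen Garten "
    "und Zimmer mit Meerblick; das Hotel ist ruhig.",
]
lengths = [len(tokens(d)) for d in descriptions]
diversity = [type_token_ratio(d) for d in descriptions]
print("tokens:", lengths)
print("type-token ratio:", [round(d, 2) for d in diversity])
print("correlation:", round(pearson(lengths, diversity), 2))
```

On real data, the same two lists would be filled with the lengths and diversity scores of all descriptions of the manually inspected hotels.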
  • 10. Methodology (3/5) – Data Pre-Processing
    • Extraction of HTML content & text encoding
    • Natural Language Processing: tokenizing, stopword removal, stemming, pruning
    • Word vector generation (TF-IDF)
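The pre-processing steps above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the authors' implementation: `STOPWORDS` is a tiny hypothetical German stopword list, `simple_stem` is a crude suffix stripper standing in for a real German stemmer (e.g. NLTK's Snowball `GermanStemmer`), and the pruning thresholds `min_df` and `max_df_ratio` are hypothetical parameters. HTML is removed with `html.parser`, which also decodes character references such as `&#228;` (ä); the TF-IDF weight used here is the common tf · log(N / df) variant.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML snippet.

    With convert_charrefs=True (the default) the parser decodes character
    references such as &#228; before handing text to handle_data.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def extract_text(raw_html: str) -> str:
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Join text nodes with spaces so words on either side of a tag don't merge.
    return " ".join(" ".join(parser.parts).split())


# Hypothetical, deliberately tiny German stopword list.
STOPWORDS = {"am", "an", "ca", "das", "der", "die", "den", "es", "etwa", "im",
             "in", "mit", "nach", "sie", "sind", "und", "von", "zu"}


def simple_stem(token: str) -> str:
    """Crude suffix stripping; a placeholder for a real German stemmer."""
    for suffix in ("ern", "en", "er", "es", "e", "n", "s"):
        if len(token) > len(suffix) + 3 and token.endswith(suffix):
            return token[: -len(suffix)]
    return token


def preprocess(raw_html: str) -> list[str]:
    """HTML extraction -> tokenizing -> stopword removal -> stemming."""
    text = extract_text(raw_html).lower()
    words = re.findall(r"[^\W\d_]+", text)  # letters only, Unicode-aware
    return [simple_stem(w) for w in words if w not in STOPWORDS]


def tfidf(docs: list[list[str]], min_df: int = 1, max_df_ratio: float = 1.0):
    """Prune terms by document frequency, then weight tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in vocab if t in tf})
    return vocab, vectors


raw = '<strong>Lage:</strong> <P>Umgeben von Pinienw&#228;ldern, direkt am Sandstrand</P>'
print(extract_text(raw))
print(preprocess(raw))
```

Terms that occur in every description get weight 0, and pruning with `max_df_ratio < 1` removes them from the vocabulary altogether, which is one way to suppress the provider "templates" noted in the qualitative analysis.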
  • 11. Methodology (4/5) – Hotel Assignment
    • Mapping of hotel descriptions to the Seven Factors using three approaches:
      1. Clustering
      2. Classification
      3. Dictionary-based approach
  • 12. Methodology (5/5) – Evaluation
    • Training, validation and evaluation with a labelled data set created by an Austrian travel operator
    • Training & validation set: 371 hotels
    • Test set: 180 hotels
  • 13. 1. Clustering (1/3)
    • Unsupervised learning, fully automated
    • Clustering method: K-Means
    • Similarity measure: cosine similarity
    • Data: training set with 371 hotels
    • Number of clusters: 6 (based on various clustering evaluation coefficients)
    Goal: Generation of disjoint clusters of hotel descriptions that reflect the Seven Factors
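A minimal sketch of K-Means driven by cosine similarity, written as the common "spherical" variant: vectors and centroids are scaled to unit length, each description is assigned to the centroid with the highest cosine similarity, and centroids are recomputed as renormalised means. Dense lists, the seeded random initialisation and the function name `kmeans_cosine` are assumptions for illustration; on the data described above, the input would be the 371 TF-IDF vectors with `k=6`.

```python
import math
import random


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product, which equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def kmeans_cosine(vectors: list[list[float]], k: int,
                  max_iter: int = 100, seed: int = 0) -> list[int]:
    """Spherical K-Means: assign by highest cosine similarity, recompute normalised means."""
    data = [normalize(v) for v in vectors]
    rng = random.Random(seed)
    centroids = [list(data[i]) for i in rng.sample(range(len(data)), k)]
    labels: list[int] = []
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in data]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Toy "term vectors": two beach-like and two city-like descriptions.
toy = [[3, 1, 0, 0], [2, 1, 0, 0], [0, 0, 2, 1], [0, 1, 3, 1]]
print(kmeans_cosine(toy, k=2))
```

Normalising first means that long and short descriptions with the same word proportions end up in the same cluster, which is the usual reason to prefer cosine similarity over Euclidean distance for text.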
  • 14. 1. Clustering (2/3)
    [Chart: distribution of the Seven Factors within each cluster]
    Cluster | 0   | 1  | 2  | 3  | 4  | 5
    Hotels  | 135 | 43 | 38 | 48 | 59 | 48
  • 15. 1. Clustering (3/3)
    [Chart: distribution of travel operators within each cluster]
    Cluster | 0   | 1  | 2  | 3  | 4  | 5
    Hotels  | 135 | 43 | 38 | 48 | 59 | 48
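The two cluster slides cross-tabulate cluster membership against a label (a factor, or the travel operator). A minimal sketch of that count, assuming each hotel carries a list of labels, since a hotel may belong to several factors (for operators the list has one element); the function names and the four example hotels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_sizes(clusters: list[int]) -> dict[int, int]:
    """Number of hotels per cluster (the table on the slides)."""
    return dict(sorted(Counter(clusters).items()))


def cluster_distribution(clusters: list[int],
                         labels_per_hotel: list[list[str]]) -> dict[int, dict[str, int]]:
    """How often each label occurs within each cluster.

    labels_per_hotel holds, for every hotel, the list of its labels: the
    factors assigned to it, or a one-element list with its travel operator.
    """
    table: dict[int, Counter] = defaultdict(Counter)
    for cluster, labels in zip(clusters, labels_per_hotel):
        table[cluster].update(labels)
    return {c: dict(counts) for c, counts in sorted(table.items())}


# Hypothetical example with four hotels.
clusters = [0, 0, 1, 0]
factors = [["Sunlover"], ["Sunlover", "Escapist"], ["Cultural"], ["Sportive"]]
print(cluster_sizes(clusters))
print(cluster_distribution(clusters, factors))

# Consistency check on the cluster sizes reported on the slides.
slide_sizes = {0: 135, 1: 43, 2: 38, 3: 48, 4: 59, 5: 48}
assert sum(slide_sizes.values()) == 371  # equals the size of the training set
```

A cluster that "reflects" a factor would show one label dominating its row; a row dominated by one travel operator instead hints that the clustering picked up provider templates.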
  • 16. 2. Classification
    • Supervised learning
    • Classifiers: Naive Bayes, KNN, Decision Tree
    • Validation: 10-fold cross-validation
    • Data: training set with 371 hotels
    Goal: Generation of seven models, each allocated to one of the Seven Factors
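Of the three classifiers named above, the sketch below shows only Naive Bayes (multinomial, with Laplace smoothing), trained as one binary yes/no model per factor and scored by precision under k-fold cross-validation. The per-factor binary setup, the class and function names, and the toy token lists are assumptions for illustration; the slides do not say which library or variant was used.

```python
import math
import random
from collections import Counter


class BinaryNaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing for one factor (yes/no)."""

    def fit(self, docs: list[list[str]], targets: list[bool]) -> "BinaryNaiveBayes":
        self.vocab = {t for d in docs for t in d}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for cls in (True, False):
            cls_docs = [d for d, y in zip(docs, targets) if y == cls]
            # Smoothed prior, so a fold without positive examples does not fail.
            self.log_prior[cls] = math.log((len(cls_docs) + 1) / (len(docs) + 2))
            self.counts[cls] = Counter(t for d in cls_docs for t in d)
            self.totals[cls] = sum(self.counts[cls].values())
        return self

    def _score(self, doc: list[str], cls: bool) -> float:
        v = len(self.vocab)
        return self.log_prior[cls] + sum(
            math.log((self.counts[cls][t] + 1) / (self.totals[cls] + v))
            for t in doc if t in self.vocab  # unseen words are ignored
        )

    def predict(self, doc: list[str]) -> bool:
        return self._score(doc, True) > self._score(doc, False)


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and cut it into k folds whose sizes differ by at most one."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validated_precision(docs, targets, k: int = 10) -> float:
    """Precision TP / (TP + FP) of one factor's model over k-fold cross-validation."""
    tp = fp = 0
    for fold in k_fold_indices(len(docs), k):
        held_out = set(fold)
        train = [i for i in range(len(docs)) if i not in held_out]
        model = BinaryNaiveBayes().fit([docs[i] for i in train], [targets[i] for i in train])
        for i in fold:
            if model.predict(docs[i]):
                if targets[i]:
                    tp += 1
                else:
                    fp += 1
    return tp / (tp + fp) if tp + fp else 0.0


# Hypothetical token lists: is the hotel a "Sunlover" hotel or not?
docs = [["strand", "pool", "meerblick"], ["museum", "stadt", "altstadt"]] * 10
targets = [True, False] * 10
print(cross_validated_precision(docs, targets))
```

Repeating this for each factor yields the seven models; KNN or a decision tree would slot in by replacing `BinaryNaiveBayes` with another object offering the same `fit`/`predict` methods.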
  • 17. 3. Dictionary
    • Identification of the most important words for each of the Seven Factors by experts
    Goal: Classification using the dictionary terms as attributes
    • Sunlover: Strand (beach), Swimmingpool, Wlan (Wi-Fi), Liege (lounger), Meer (sea), Beach, inclusive, Internetzugang (internet access), Liegestuhl (deckchair), Meerblick (sea view), Pool, Sonnenschirm (parasol), Sonnenterasse (sun terrace), Wifi
    • Educational: Buffet, Club, Halbpension (half board), inclusive, Miniclub, Musik (music)
    • Independent: Eigenregie (self-organised), gemütlich (cosy), individuell (individual), lokal (local), Zentrum (centre)
    • Cultural: Spa, Suite, superior, elegant, Carte, Bademantel (bathrobe), Mietsafe (rental safe), modern, Wellnessbereich (wellness area), Whirlpool
    • Sportive: Aktivität (activity), Fitness, Fitnessraum (fitness room), Sport, Tennis, Tischtennis (table tennis), Tennisplatz (tennis court), Golfplatz (golf course)
    • Riskseeker: Club, Stadt (city), Unterhaltung (entertainment)
    • Escapist: Ruhig (quiet), gemütlich (cosy), Wellness, Park, Garten (garden), Spa, Wellnessbereich (wellness area)
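The slide does not say how the dictionary terms are turned into a decision. One simple possibility, assumed here, is to count how many tokens of a description appear in a factor's dictionary and assign the factor once a threshold is reached; the threshold `min_hits=2`, the function name and the excerpted (lower-cased, unstemmed) dictionaries are hypothetical. Alternatively, the terms could serve as the only features of one of the classifiers above.

```python
# Excerpt of the expert dictionaries from the slide, lower-cased; the full
# lists would contain every term shown above.
DICTIONARIES = {
    "Sunlover": {"strand", "swimmingpool", "meer", "meerblick", "pool", "beach", "liegestuhl"},
    "Sportive": {"fitness", "fitnessraum", "sport", "tennis", "tennisplatz", "golfplatz"},
    "Escapist": {"ruhig", "gemütlich", "wellness", "park", "garten", "spa"},
}


def dictionary_factors(tokens: list[str],
                       dictionaries: dict[str, set[str]] = DICTIONARIES,
                       min_hits: int = 2) -> list[str]:
    """Return every factor whose dictionary terms occur at least min_hits times in tokens."""
    hits = {factor: sum(1 for t in tokens if t in terms)
            for factor, terms in dictionaries.items()}
    return sorted(factor for factor, count in hits.items() if count >= min_hits)


tokens = ["direkt", "am", "strand", "pool", "meerblick", "ruhig", "garten"]
print(dictionary_factors(tokens))
```

If the descriptions are stemmed during pre-processing, the dictionary terms have to pass through the same stemmer, otherwise inflected forms such as "Gärten" will not match.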
  • 18. Classification vs. Dictionary
    Validation on the training set with 371 hotels
    [Chart: precision per factor (Sunlover, Educational, Independent, Cultural, Sportive, Riskseeker, Escapist) for the dictionary approach and for classification; y-axis from 0% to 100%]
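The comparison reports precision per factor, i.e. TP / (TP + FP): of the hotels an approach assigns to a factor, the share that the labelled data set also assigns to it. A minimal sketch, assuming one set of factors per hotel for both predictions and labels; the function name and the three example hotels are hypothetical and are not data from the study.

```python
from typing import Optional

FACTORS = ["Sunlover", "Educational", "Independent", "Cultural",
           "Sportive", "Riskseeker", "Escapist"]


def precision_per_factor(predicted: list[set[str]], actual: list[set[str]],
                         factors: list[str] = FACTORS) -> dict[str, Optional[float]]:
    """Precision TP / (TP + FP) per factor; each list entry holds one hotel's factors."""
    result: dict[str, Optional[float]] = {}
    for f in factors:
        tp = sum(1 for p, a in zip(predicted, actual) if f in p and f in a)
        fp = sum(1 for p, a in zip(predicted, actual) if f in p and f not in a)
        result[f] = tp / (tp + fp) if tp + fp else None  # undefined without any prediction
    return result


# Hypothetical predictions for three hotels (not data from the study).
predicted = [{"Sunlover"}, {"Sunlover", "Escapist"}, set()]
actual = [{"Sunlover"}, {"Escapist"}, {"Sunlover"}]
print(precision_per_factor(predicted, actual))
```

Precision ignores hotels that belong to a factor but were not assigned to it (the third hotel above), so it says nothing about recall.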
  • 19. Final Evaluation
    Evaluation of the best approaches on an independent test set
    [Chart: precision per factor on the test set (180 hotels) and on the training set (371 hotels); dictionary approach used for Sunlover, classification for the other factors]
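The final evaluation uses the dictionary approach for Sunlover and classification for the other factors. A minimal sketch of how such a per-factor choice could be wired up: keep, for each factor, the approach with the higher validation precision and ask only that approach at prediction time. The precision values below are placeholders, not the results on the slides, and the two predictors are hypothetical stand-ins.

```python
from typing import Callable

Predictor = Callable[[str, list[str]], bool]


def choose_methods(validation_precision: dict[str, dict[str, float]]) -> dict[str, str]:
    """For every factor, keep the approach with the highest validation precision."""
    return {factor: max(scores, key=scores.get)
            for factor, scores in validation_precision.items()}


def hybrid_factors(tokens: list[str], chosen: dict[str, str],
                   predictors: dict[str, Predictor]) -> list[str]:
    """Ask, per factor, only the approach chosen for it whether the hotel fits."""
    return sorted(f for f, method in chosen.items() if predictors[method](f, tokens))


# Placeholder precision values, NOT the results reported on the slides.
validation = {
    "Sunlover": {"dictionary": 0.8, "classification": 0.7},
    "Sportive": {"dictionary": 0.5, "classification": 0.6},
}
chosen = choose_methods(validation)

# Hypothetical stand-ins for the two approaches.
predictors: dict[str, Predictor] = {
    "dictionary": lambda factor, toks: "strand" in toks,
    "classification": lambda factor, toks: "tennis" in toks,
}
print(chosen, hybrid_factors(["strand", "pool"], chosen, predictors))
```

Selecting on the training/validation data and then reporting on the untouched test set, as the slide does, keeps the choice of approach from inflating the final precision.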
  • 20. Conclusion
    • Allocating hotels to tourist profiles based on textual data can be implemented successfully, depending on the targeted user group:
      + good results: Sunlover, Escapist, Cultural, Sportive
      - weaker results: Educational, Independent, Riskseeker
    • The majority of the designed models are capable of handling new hotel data
    • Hotel descriptions can be a reasonable basis for recommendations in recommender systems
  • 21. Thanks for your attention!
  • 22. References (1/3)
    • Aggarwal, C. C. & Zhai, C. (2012). A Survey of Text Clustering Algorithms. In Mining Text Data (pp. 77–128). Springer-Verlag New York.
    • Bird, S., Klein, E. & Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media, Inc.
    • Burke, R. & Ramezani, M. (2011). Matching Recommendation Technologies and Domains. In Recommender Systems Handbook (pp. 367–386). Springer US.
    • Cohen, E. (1972). Toward a sociology of international tourism. Social Research, 39(1): 164–182.
    • Cosh, K. (2013). Text mining Wikipedia to discover alternative destinations. In The 2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE) (pp. 43–48).
    • Gibson, H. & Yiannakis, A. (2002). Tourist roles: Needs and the lifecourse. Annals of Tourism Research, 29(2): 358–383.
    • Goldberg, L. R. (1993). The Structure of Phenotypic Personality Traits. American Psychologist, 48(1): 26–34.
    • Gupta, G. K. (2006). Introduction to Data Mining with Case Studies. Prentice-Hall of India Pvt. Ltd.
    • Hippner, H. & Rentzmann, R. (2006). Text mining. Informatik-Spektrum, 29(4): 287–290.
    • Johansson, V. (2008). Lexical diversity and lexical density in speech and writing: a developmental perspective. Working Papers, Department of Linguistics and Phonetics, Lund University, 53: 61–79.
  • 23. References (2/3)
    • Lahlou, F. Z., Mountassir, A., Benbrahim, H. & Kassou, I. (2013). A Text Classification Based Method for Context Extraction from Online Reviews. In 8th International Conference on Intelligent Systems: Theories and Applications (SITA).
    • Neidhardt, J., Schuster, R., Seyfang, L. & Werthner, H. (2014). Eliciting the users’ unknown preferences. In Proceedings of the 8th ACM Conference on Recommender Systems (pp. 309–312). ACM.
    • Neidhardt, J., Seyfang, L., Schuster, R. & Werthner, H. (2015). A picture-based approach to recommender systems. Information Technology & Tourism, 15(1): 49–69.
    • Neidhardt, J. & Werthner, H. (2017). Travellers and Their Joint Characteristics Within the Seven-Factor Model. In Information and Communication Technologies in Tourism 2017 (pp. 503–515). Springer.
    • Ricci, F., Rokach, L. & Shapira, B. (2011). Introduction to Recommender Systems Handbook. In Recommender Systems Handbook (pp. 1–35). Springer US.
    • Schmunk, S., Höpken, W., Fuchs, M. & Lexhagen, M. (2014). Sentiment Analysis: Extracting Decision-Relevant Knowledge from UGC. Information and Communication Technologies in Tourism, 14(1): 253–265.
    • Sharma, Y., Bhatt, T. & Magon, R. (2015). A multi-criteria review-based hotel recommendation system. In 15th IEEE International Conference on Computer and Information Technology (pp. 687–691).
  • 24. References (3/3)
    • Weiss, S. M., Indurkhya, N. & Zhang, T. (2010). Fundamentals of Predictive Text Mining. Springer-Verlag London.
    • Werthner, H. & Klein, S. (1999). Information Technology and Tourism – A Challenging Relationship. Wien – New York: Springer-Verlag.
    • Werthner, H., Alzua-Sorzabal, A., Cantoni, L., Dickinger, A., Gretzel, U., Jannach, D., Neidhardt, J., Pröll, B., Ricci, F., Scaglione, M., Stangl, B., Stock, O. & Zanker, M. (2015). Information Technology & Tourism, 15(1).
    • Xiang, Z., Du, Q., Ma, Y. & Fan, W. (2017). Assessing Reliability of Social Media Data: Lessons from Mining TripAdvisor Hotel Reviews. Information and Communication Technologies in Tourism, 17(1): 625–637.
    • Yiannakis, A. & Gibson, H. (1992). Roles tourists play. Annals of Tourism Research, 19(2): 287–303.