ENTER 2018 Research Track Slide Number 1
Wolfram Höpken (a), Tobias Eberle (a), Matthias Fuchs (b),
and Maria Lexhagen (b)
(a) Business Informatics Group
University of Applied Sciences Ravensburg-Weingarten, Germany
{name.surname}@hs-weingarten.de
(b) European Tourism Research Institute (ETOUR)
Mid-Sweden University, Sweden
{name.surname}@miun.se
Search engine traffic as input for
predicting tourist arrivals
Content
• Introduction
• Related work
• Data collection and preparation
• Construction of web search indices with high predictive
power
• Model building and evaluation
• Results
• Conclusion and outlook
Motivation
• Demand prediction in tourism
– Due to the perishable nature of tourism products, accurate forecasts of
tourism demand are of utmost relevance (Frechtling, 2002; Fitzsimmons &
Fitzsimmons, 2002)
– Knowledge of long-term trends, imminent changes and short-term
intra-period fluctuations of demand is essential for tourism
management
-> The importance of accurate and reliable demand forecasts for tourism
businesses and policy makers can hardly be overestimated (Frechtling, 2002)
• Limitations of autoregressive approaches
– Lack of historical data, influence of unexpected events, variety of input
factors and complexity of the travel decision-making process (Song et al., 2010)
– Availability of travellers’ web search behaviour as additional input to
demand prediction
Objective
• Extend the autoregressive forecasting approach by including
travellers' web search behaviour
– Does the inclusion of time series data on web search behaviour
increase performance when forecasting tourist arrivals compared to
the purely autoregressive approach?
• Examine behavioural aspects of travellers related to
concrete search terms used in online search for trip planning
– Analyse temporal relationships between search terms and tourist
arrivals
– Identify patterns that reflect online planning behaviour of travellers
before visiting specific destinations
Search engine traffic for demand prediction
• Web search data to predict tourist arrivals
– Google web search data to improve tourism demand prediction
accuracy, compared to purely autoregressive models or exponential
smoothing time-series models (Önder & Gunter, 2016)
– Google web search data to increase forecasting performance using
autoregressive mixed-data sampling (AR-MIDAS) models
(Bangwayo-Skeete & Skeete, 2015)
– Google web search data and econometric indicators to improve
autoregressive prediction of tourist arrivals, comparing statistical and
data mining approaches (Höpken et al., 2017)
Specification of data set
• Tourist arrivals
– Monthly aggregated tourist arrivals (December 2005 - April 2012) for
the leading Swedish mountain destination Åre
– Specified separately for its major sending countries (Denmark, Finland,
Norway and the United Kingdom)
• Web search traffic
– Google Trends as appropriate data source for the above sending countries
– Represents relative search volume of popular search terms over time
and, thus, reflects people's interest in specific search terms across
different geographic regions and topical domains
Collection of web search data
• Google Trends crawling algorithm
– Using search engine-based keyword recommendations from Google's
Keyword Planner
– Iterative algorithm to identify relevant keywords
• Starting with the seed keyword "are" and iterating over related keywords
suggested by Google Keyword Planner
• Normalization of search terms
– Examining search terms for close similarity based on linguistic
variations, synonyms or misspellings
• Transforming search terms by text processing techniques (tokenization,
character substitution, stemming, stop-word removal, generation of word
vector)
• Eliminating semantically identical search terms (cosine similarity = 1)
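The normalization step above could be sketched as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the character substitutions and the suffix-stripping `stem` function are hypothetical stand-ins for whatever text processing tools were actually used.

```python
import math
import re
from collections import Counter

# Hypothetical stop words and character substitutions, for illustration only.
# Note that "are" must not be treated as a stop word, since it is the seed keyword.
STOP_WORDS = {"i", "in", "the", "to", "of"}
SUBSTITUTIONS = str.maketrans({"å": "a", "ä": "a", "ö": "o"})


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def word_vector(term):
    """Tokenize, substitute characters, drop stop words, stem, and count tokens."""
    tokens = re.findall(r"[a-z0-9]+", term.lower().translate(SUBSTITUTIONS))
    return Counter(stem(t) for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def deduplicate(terms):
    """Keep the first term of every group whose word vectors have cosine similarity 1."""
    kept, vectors = [], []
    for term in terms:
        vec = word_vector(term)
        # A small tolerance guards against floating-point rounding.
        if any(
            math.isclose(cosine_similarity(vec, v), 1.0, abs_tol=1e-9)
            for v in vectors
        ):
            continue
        kept.append(term)
        vectors.append(vec)
    return kept
```

With these assumptions, "åre skiing", "Are ski" and "ski are" all map to the same word vector, so only the first of them would be kept.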
Construction of aggregated search indices
• Identifying optimal time-lag between each search query and
tourist arrivals
– Calculating de-trended cross-correlation analysis (DCCA) coefficients
for time lags 0 to 6, capturing travellers’ short- and mid-term online
travel planning behaviour
– Selecting time lag with maximal DCCA coefficient and weighting search
query by DCCA coefficient
• Constructing compound search indices
– Filtering queries by Hurst exponent to ensure that the constructed search
indices follow the same auto-correlative patterns as their
corresponding tourist arrival series (Pan et al., 2017)
– Aggregating all weighted and time-lagged query series into a compound
search index
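A minimal sketch of the lag selection and aggregation steps above, assuming a DCCA coefficient computed on overlapping boxes with linear detrending and a weighted sum as the aggregation rule. The box size, the function names and the summation are assumptions made for illustration; the slides do not specify them, and the Hurst filter is left out here.

```python
import math


def profile(series):
    """Cumulative sum of deviations from the mean (the integrated series)."""
    mean = sum(series) / len(series)
    total, out = 0.0, []
    for value in series:
        total += value - mean
        out.append(total)
    return out


def detrend(window):
    """Residuals after removing a least-squares line from one box."""
    n = len(window)
    t_mean = (n - 1) / 2
    y_mean = sum(window) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(window))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(window)]


def dcca_coefficient(x, y, box=4):
    """Detrended covariance of x and y divided by the product of their detrended deviations."""
    px, py = profile(x), profile(y)
    fxy = fxx = fyy = 0.0
    for start in range(len(px) - box):  # overlapping boxes of box + 1 points
        rx = detrend(px[start:start + box + 1])
        ry = detrend(py[start:start + box + 1])
        fxy += sum(a * b for a, b in zip(rx, ry))
        fxx += sum(a * a for a in rx)
        fyy += sum(b * b for b in ry)
    denom = math.sqrt(fxx * fyy)
    return fxy / denom if denom else 0.0


def best_lag(query, arrivals, max_lag=6, box=4):
    """Return (lag, coefficient) for the lag in 0..max_lag with the largest coefficient."""
    scores = {}
    for lag in range(max_lag + 1):
        # Searches in month t - lag are paired with arrivals in month t.
        scores[lag] = dcca_coefficient(query[: len(query) - lag], arrivals[lag:], box)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


def compound_index(queries, arrivals, max_lag=6, box=4):
    """Weighted sum of the lagged query series, aligned with arrivals[max_lag:]."""
    fitted = [(q,) + best_lag(q, arrivals, max_lag, box) for q in queries]
    return [
        sum(weight * q[t - lag] for q, lag, weight in fitted)
        for t in range(max_lag, len(arrivals))
    ]
```

Each query series is expected to have the same length as the arrival series. A query that simply leads arrivals by two months would be assigned a lag of 2 and a weight close to 1.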
Evaluation of search indices
Index evaluation metrics for different sending countries
High structural similarity between search indices and tourist arrivals
(de-trended cross-correlation analysis appropriate for potentially
non-stationary time series)
Similar Hurst exponents of arrival and search index time series,
indicating the same auto-correlative patterns
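The slides do not say how the Hurst exponents were estimated. One common option is rescaled range (R/S) analysis, sketched below under that assumption; the window sizes and function names are hypothetical choices for illustration.

```python
import math


def rescaled_range(chunk):
    """Range of the cumulative mean-adjusted series divided by its standard deviation."""
    mean = sum(chunk) / len(chunk)
    cumulative, lowest, highest = 0.0, 0.0, 0.0
    for value in chunk:
        cumulative += value - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / len(chunk))
    return (highest - lowest) / std if std else None


def hurst_exponent(series, sizes=(8, 16, 32)):
    """Slope of log(mean R/S) against log(window size)."""
    points = []
    for n in sizes:
        chunks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        values = [rs for rs in map(rescaled_range, chunks) if rs is not None]
        if values:
            points.append((math.log(n), math.log(sum(values) / len(values))))
    x_mean = sum(x for x, _ in points) / len(points)
    y_mean = sum(y for _, y in points) / len(points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    return sxy / sxx
```

Under this estimator, a steadily trending series yields a value near 1 (persistent), while a strictly alternating series yields a value near 0 (anti-persistent). Comparing the estimates for a search index and its arrival series is one way to check for similar auto-correlative patterns.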
Prediction accuracy can be improved when autoregressive forecasting
models are extended by Google Trends data as additional predictor
Stationarity tests
Tests for stationarity and co-integration for arrival data and search indices
Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin
(KPSS) tests confirm stationarity for DK, FI and UK but not for NO
Johansen test shows co-integration relationships between search indices
and corresponding arrival series (-> no series transformations necessary)
Model building
• Building a prediction model
– Linear regression as statistical approach
– Autoregressive approach, using the past 25 months as input data
– Search index, constructed from Google Trends data, as additional input
data
– Backward selection to eliminate irrelevant input (kitchen sink problem)
• Evaluation
– Prediction performance evaluated by sliding window validation
(moving a training and consecutive test window along data set)
– Shapiro-Wilk test to check for normal distribution of residuals
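The model building and evaluation steps above can be sketched as follows: an ordinary least squares regression on lagged arrivals, optionally extended by the search index, scored by RMSE under sliding window validation. This is an illustration under assumptions, not the authors' setup: the function names are hypothetical, the search index is assumed to be already time-lagged and aligned with the arrival series, and backward selection and the Shapiro-Wilk test are omitted.

```python
import math


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Least-squares coefficients via the normal equations (intercept first)."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def make_rows(arrivals, n_lags=25, index=None):
    """One row per month: the previous n_lags arrivals, plus the search index if given."""
    rows, targets = [], []
    for t in range(n_lags, len(arrivals)):
        row = list(arrivals[t - n_lags:t])
        if index is not None:
            row.append(index[t])
        rows.append(row)
        targets.append(arrivals[t])
    return rows, targets


def sliding_window_rmse(rows, targets, train_size, test_size=1):
    """Move a training window and the following test window along the data; return RMSE."""
    errors = []
    for start in range(len(rows) - train_size - test_size + 1):
        stop = start + train_size
        coef = fit_ols(rows[start:stop], targets[start:stop])
        for i in range(stop, stop + test_size):
            errors.append(predict(coef, rows[i]) - targets[i])
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Calling `make_rows` once without and once with `index`, and passing both results to `sliding_window_rmse`, gives the two RMSE values to compare. Each training window needs more rows than the model has coefficients, otherwise the normal equations become singular.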
Comparison of forecasting performance
Comparison of prediction accuracy at different forecasting horizons
Adding Google Trends data reduces RMSE for all horizons and countries
Normally distributed residuals -> Google Trends model fits data well
Analysis of customers’ online search behaviour
Significant query lags for sending country Denmark
Three to two months before arrival -> search for activities in Sweden
One month before arrival -> more precise queries, searching specifically for Åre
Potential to
• Analyse customers' online search behaviour and decision-making process
• Identify most relevant keywords, used by tourists actually visiting the destination
(input to search engine optimization - SEO)
Conclusion and outlook
• Web search data as additional input to demand prediction
– Forecast model with Google Trends data as additional predictor
outperforms purely autoregressive approaches
• Analysis of customers’ online search behaviour
– Most significant search terms and time lags constitute valuable input
to analysing customers’ online search behaviour and decision making
process
• Open issues and future research activities
– Add further input data, e.g. customers' online interactions on social
media platforms like YouTube, Facebook, etc. or web navigation data
– Compare statistical approaches with data mining methods (e.g. deep
learning with neural networks)

More Related Content

Similar to Search engine traffic as input for predicting tourist arrivals

Performance Management to Program Evaluation: Creating a Complementary Connec...
Performance Management to Program Evaluation: Creating a Complementary Connec...Performance Management to Program Evaluation: Creating a Complementary Connec...
Performance Management to Program Evaluation: Creating a Complementary Connec...
nicholes21
 
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
Galit Shmueli
 
Using Contextual Information to Understand Searching and Browsing Behavior
Using Contextual Information to Understand Searching and Browsing BehaviorUsing Contextual Information to Understand Searching and Browsing Behavior
Using Contextual Information to Understand Searching and Browsing Behavior
Julia Kiseleva
 
Website Development in Tourism and Hospitality: The Case of China
Website Development in Tourism and Hospitality: The Case of ChinaWebsite Development in Tourism and Hospitality: The Case of China
Website Development in Tourism and Hospitality: The Case of China
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
Alternatives to Google
Alternatives to GoogleAlternatives to Google
Alternatives to Google
Dirk Lewandowski
 
Towards a M&E toolkit for Egypt's agricultural development projects
Towards a M&E toolkit for Egypt's agricultural development projectsTowards a M&E toolkit for Egypt's agricultural development projects
Towards a M&E toolkit for Egypt's agricultural development projects
International Food Policy Research Institute (IFPRI)
 
Bazley understanding online audiences vsg conf march 2016 for uploading
Bazley understanding online audiences vsg conf march 2016 for uploadingBazley understanding online audiences vsg conf march 2016 for uploading
Bazley understanding online audiences vsg conf march 2016 for uploading
Martin Bazley
 
Planning benchmarking webinar
Planning benchmarking webinarPlanning benchmarking webinar
Planning benchmarking webinar
Lora Cecere
 
Big Data Analytics and Knowledge Discovery through Location-Based Social Netw...
Big Data Analytics and Knowledge Discovery through Location-Based Social Netw...Big Data Analytics and Knowledge Discovery through Location-Based Social Netw...
Big Data Analytics and Knowledge Discovery through Location-Based Social Netw...
John Makridis
 
2014 SEO Ranking Factors and Rank Correlation Google
2014 SEO Ranking Factors and Rank Correlation Google2014 SEO Ranking Factors and Rank Correlation Google
2014 SEO Ranking Factors and Rank Correlation Google
Joseph Hsieh
 
Innovation-related organizational decision-making: the case of responsive web...
Innovation-related organizational decision-making: the case of responsive web...Innovation-related organizational decision-making: the case of responsive web...
Innovation-related organizational decision-making: the case of responsive web...
International Federation for Information Technologies in Travel and Tourism (IFITT)
 
Creating a Safer System Through State Pedestrian and Bicycle Safety Campaigns...
Creating a Safer System Through State Pedestrian and Bicycle Safety Campaigns...Creating a Safer System Through State Pedestrian and Bicycle Safety Campaigns...
Creating a Safer System Through State Pedestrian and Bicycle Safety Campaigns...
Project for Public Spaces & National Center for Biking and Walking
 
Google SEO Ranking Factors and Rank Correlations 2014
Google SEO Ranking Factors and Rank Correlations 2014Google SEO Ranking Factors and Rank Correlations 2014
Google SEO Ranking Factors and Rank Correlations 2014
Михаил Тукнов
 
Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017
Holistic Benchmarking of Big Linked Data
 
Clockwork Content Strategy Meetup Presentation
Clockwork Content Strategy Meetup PresentationClockwork Content Strategy Meetup Presentation
Clockwork Content Strategy Meetup Presentation
Clockwork
 
EdNET15_- Jenny Munn - condensed
EdNET15_- Jenny Munn - condensedEdNET15_- Jenny Munn - condensed
EdNET15_- Jenny Munn - condensed
JennyMunn.com
 
A. Nurra, From ICT survey data to experimental statistics; using IaD source f...
A. Nurra, From ICT survey data to experimental statistics; using IaD source f...A. Nurra, From ICT survey data to experimental statistics; using IaD source f...
A. Nurra, From ICT survey data to experimental statistics; using IaD source f...
Istituto nazionale di statistica
 
Clinical Trial Performance Metrics Conference Dec 2016
Clinical Trial Performance Metrics Conference Dec 2016Clinical Trial Performance Metrics Conference Dec 2016
Clinical Trial Performance Metrics Conference Dec 2016
Mike Fitzpatrick
 
JanetAllbee
JanetAllbeeJanetAllbee
JanetAllbee
Janet Allbee
 
Cut Through the Web Analytics Fog: Using GA Data Grabber to Act on Google Ana...
Cut Through the Web Analytics Fog: Using GA Data Grabber to Act on Google Ana...Cut Through the Web Analytics Fog: Using GA Data Grabber to Act on Google Ana...
Cut Through the Web Analytics Fog: Using GA Data Grabber to Act on Google Ana...
Brian Alpert
 

Similar to Search engine traffic as input for predicting tourist arrivals (20)

Performance Management to Program Evaluation: Creating a Complementary Connec...
Performance Management to Program Evaluation: Creating a Complementary Connec...Performance Management to Program Evaluation: Creating a Complementary Connec...
Performance Management to Program Evaluation: Creating a Complementary Connec...
 
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
 
Using Contextual Information to Understand Searching and Browsing Behavior
Using Contextual Information to Understand Searching and Browsing BehaviorUsing Contextual Information to Understand Searching and Browsing Behavior
Using Contextual Information to Understand Searching and Browsing Behavior
 
Website Development in Tourism and Hospitality: The Case of China
Website Development in Tourism and Hospitality: The Case of ChinaWebsite Development in Tourism and Hospitality: The Case of China
Website Development in Tourism and Hospitality: The Case of China
 
Alternatives to Google
Alternatives to GoogleAlternatives to Google
Alternatives to Google
 
Towards a M&E toolkit for Egypt's agricultural development projects
Towards a M&E toolkit for Egypt's agricultural development projectsTowards a M&E toolkit for Egypt's agricultural development projects
Towards a M&E toolkit for Egypt's agricultural development projects
 
Bazley understanding online audiences vsg conf march 2016 for uploading
Bazley understanding online audiences vsg conf march 2016 for uploadingBazley understanding online audiences vsg conf march 2016 for uploading
Bazley understanding online audiences vsg conf march 2016 for uploading
 
Planning benchmarking webinar
Planning benchmarking webinarPlanning benchmarking webinar
Planning benchmarking webinar
 
Big Data Analytics and Knowledge Discovery through Location-Based Social Netw...
Big Data Analytics and Knowledge Discovery through Location-Based Social Netw...Big Data Analytics and Knowledge Discovery through Location-Based Social Netw...
Big Data Analytics and Knowledge Discovery through Location-Based Social Netw...
 
2014 SEO Ranking Factors and Rank Correlation Google
2014 SEO Ranking Factors and Rank Correlation Google2014 SEO Ranking Factors and Rank Correlation Google
2014 SEO Ranking Factors and Rank Correlation Google
 
Innovation-related organizational decision-making: the case of responsive web...
Innovation-related organizational decision-making: the case of responsive web...Innovation-related organizational decision-making: the case of responsive web...
Innovation-related organizational decision-making: the case of responsive web...
 
Creating a Safer System Through State Pedestrian and Bicycle Safety Campaigns...
Creating a Safer System Through State Pedestrian and Bicycle Safety Campaigns...Creating a Safer System Through State Pedestrian and Bicycle Safety Campaigns...
Creating a Safer System Through State Pedestrian and Bicycle Safety Campaigns...
 
Google SEO Ranking Factors and Rank Correlations 2014
Google SEO Ranking Factors and Rank Correlations 2014Google SEO Ranking Factors and Rank Correlations 2014
Google SEO Ranking Factors and Rank Correlations 2014
 
Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017
 
Clockwork Content Strategy Meetup Presentation
Clockwork Content Strategy Meetup PresentationClockwork Content Strategy Meetup Presentation
Clockwork Content Strategy Meetup Presentation
 
EdNET15_- Jenny Munn - condensed
EdNET15_- Jenny Munn - condensedEdNET15_- Jenny Munn - condensed
EdNET15_- Jenny Munn - condensed
 
A. Nurra, From ICT survey data to experimental statistics; using IaD source f...
A. Nurra, From ICT survey data to experimental statistics; using IaD source f...A. Nurra, From ICT survey data to experimental statistics; using IaD source f...
A. Nurra, From ICT survey data to experimental statistics; using IaD source f...
 
Clinical Trial Performance Metrics Conference Dec 2016
Clinical Trial Performance Metrics Conference Dec 2016Clinical Trial Performance Metrics Conference Dec 2016
Clinical Trial Performance Metrics Conference Dec 2016
 
JanetAllbee
JanetAllbeeJanetAllbee
JanetAllbee
 
Cut Through the Web Analytics Fog: Using GA Data Grabber to Act on Google Ana...
Cut Through the Web Analytics Fog: Using GA Data Grabber to Act on Google Ana...Cut Through the Web Analytics Fog: Using GA Data Grabber to Act on Google Ana...
Cut Through the Web Analytics Fog: Using GA Data Grabber to Act on Google Ana...
 

Recently uploaded

Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
FilipTomaszewski5
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
DianaGray10
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 

Recently uploaded (20)

Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 

Search engine traffic as input for predicting tourist arrivals

  • 1. ENTER 2018 Research Track Slide Number 1 Wolfram Höpkena, Tobias Eberlea, Matthias Fuchsb, and Maria Lexhagenb a Business Informatics Group University of Applied Sciences Ravensburg-Weingarten, Germany {name.surname}@hs-weingarten.de b European Tourism Research Institute (ETOUR) Mid-Sweden University, Sweden {name.surname}@miun.se Search engine traffic as input for predicting tourist arrivals
  • 2. ENTER 2018 Research Track Slide Number 2 Content • Introduction • Related work • Data collection and preparation • Construction of web search indices with high predictive power • Model building and evaluation • Results • Conclusion and outlook
  • 3. ENTER 2018 Research Track Slide Number 3 Motivation • Demand prediction in tourism – Due to perishable nature of tourism products, accurate forecasts of tourism demand are of utmost relevance (Frechtling, 2002; Fitzsimmons & Fitzsimmons, 2002) – Knowledge on long-term trends, imminent changes and short-term intra-period fluctuations of demand are essential for tourism management Accuracy and reliability of demand forecasts can hardly be overestimated for tourism businesses and policy makers (Frechtling, 2002) • Limitations of autoregressive approaches – Lack of historical data, influence of unexpected events, variety of input factors and complexity of travel decision-making process (Song et al. 2010) – Availability of travellers’ web search behaviour as additional input to demand prediction
  • 4. ENTER 2018 Research Track Slide Number 4 Objective • Extend autoregressive forecasting approach by including travellers´ web search behaviour – Does the inclusion of time series data on web search behaviour increase performance when forecasting tourist arrivals compared to the purely autoregressive approach? • Examine behavioural aspects of travellers related to concrete search terms used in online search for trip planning – Analyse temporal relationships between search terms and tourist arrivals – Identify patterns that reflect online planning behaviour of travellers before visiting specific destinations
  • 6. ENTER 2018 Research Track Slide Number 6 Search engine traffic for demand prediction • Web search data to predict tourist arrivals – Google web search data to improve tourism demand prediction accuracy, compared to purely autoregressive models or exponential smoothing time-series models (Önder & Gunter, 2016) – Google web search data to increase forecasting performance using autoregressive mixed-data sampling (AR-MIDAS) models (Bangwayo-Skeete & Skeete, 2015) – Google web search data and econometric indicators to improve autoregressive prediction of tourist arrivals, comparing statistical and data mining approaches (Höpken et al., 2017)
  • 8. ENTER 2018 Research Track Slide Number 8 Specification of data set • Tourist arrivals – Monthly aggregated tourist arrivals (December 2005 - April 2012) for the leading Swedish mountain destination Åre – Specified separately for its major sending countries (Denmark, Finland, Norway and the United Kingdom) • Web search traffic – Google Trends as an appropriate data source for the above sending countries – Represents the relative search volume of popular search terms over time and, thus, reflects people’s interest in specific search terms across different geographic regions and topical domains
  • 9. ENTER 2018 Research Track Slide Number 9 Collection of web search data • Google Trends crawling algorithm – Using search engine-based keyword recommendations by Google’s Keyword Planner – Iterative algorithm to identify relevant keywords • Starting with seed keyword “are” and iterating over related keywords suggested by Google Keyword Planner • Normalization of search terms – Examining search terms for close similarity based on linguistic variations, synonyms or misspellings • Transforming search terms by text processing techniques (tokenization, character substitution, stemming, stop-word removal, generation of word vectors) • Eliminating semantically identical search terms (cosine similarity = 1)
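The normalization step on this slide could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list, the character substitutions and the crude suffix-stripping `stem` helper are all hypothetical stand-ins, since the slides do not list the concrete resources used.

```python
import math
import re
from collections import Counter

# Hypothetical resources: the slides do not say which stop words or
# character substitutions were actually applied.
STOP_WORDS = {"in", "i", "the", "to", "of"}
SUBSTITUTIONS = str.maketrans({"å": "a", "ä": "a", "ö": "o", "é": "e"})


def stem(token):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def word_vector(term):
    """Lowercase, substitute characters, tokenize, drop stop words, stem, count."""
    text = term.lower().translate(SUBSTITUTIONS)
    tokens = re.findall(r"[a-z0-9]+", text)
    return Counter(stem(t) for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity of two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(terms, tol=1e-9):
    """Keep the first of any group of terms whose word vectors coincide."""
    kept, vectors = [], []
    for term in terms:
        vec = word_vector(term)
        if any(cosine_similarity(vec, v) >= 1 - tol for v in vectors):
            continue  # cosine similarity of 1: treated as the same query
        kept.append(term)
        vectors.append(vec)
    return kept


queries = ["Åre skiing", "are ski", "skiing in Are", "hotels are", "hotel Are"]
unique_queries = deduplicate(queries)
```

The small tolerance `tol` only absorbs floating-point rounding; conceptually the filter is the "cosine similarity = 1" rule from the slide.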
  • 11. ENTER 2018 Research Track Slide Number 11 Construction of aggregated search indices • Identifying optimal time-lag between each search query and tourist arrivals – Calculating de-trended cross-correlation analysis (DCCA) coefficients for time lags 0 to 6, capturing travellers’ short- and mid-term online travel planning behaviour – Selecting the time lag with the maximal DCCA coefficient and weighting the search query by that coefficient • Constructing compound search indices – Filtering queries by Hurst exponent to ensure that the constructed search indices follow the same autocorrelation patterns as their corresponding tourist arrival series (Pan et al., 2017) – Aggregating all weighted and time-lagged query series into a compound search index
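As a rough illustration of the lag selection and weighting described on this slide, the sketch below computes a DCCA coefficient with overlapping boxes and linear detrending, picks the best lag between 0 and 6, and sums the weighted, lagged queries. The box size of 4, the helper names and the choice of a plain sum for aggregation are assumptions; the slides do not give these details, and the Hurst-exponent filter is left out here.

```python
import math


def profile(series):
    """Cumulative sum of deviations from the mean (the integrated series)."""
    mean = sum(series) / len(series)
    total, out = 0.0, []
    for value in series:
        total += value - mean
        out.append(total)
    return out


def detrend(window):
    """Residuals of a window after removing its least-squares line."""
    n = len(window)
    t_mean = (n - 1) / 2
    y_mean = sum(window) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(window))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(window)]


def dcca_coefficient(x, y, box=4):
    """DCCA coefficient: detrended covariance over the two detrended deviations."""
    if len(x) != len(y) or len(x) <= box:
        raise ValueError("series must have equal length greater than the box size")
    px, py = profile(x), profile(y)
    cov_xy = var_x = var_y = 0.0
    for start in range(len(px) - box):  # overlapping boxes of box + 1 points
        rx = detrend(px[start:start + box + 1])
        ry = detrend(py[start:start + box + 1])
        cov_xy += sum(a * b for a, b in zip(rx, ry))
        var_x += sum(a * a for a in rx)
        var_y += sum(b * b for b in ry)
    denom = math.sqrt(var_x * var_y)
    return cov_xy / denom if denom > 0 else 0.0


def best_lag(search, arrivals, max_lag=6, box=4):
    """Return (lag, coefficient) where search[t - lag] best matches arrivals[t]."""
    scores = {}
    for lag in range(max_lag + 1):
        lagged_search = search[: len(search) - lag]
        aligned_arrivals = arrivals[lag:]
        scores[lag] = dcca_coefficient(lagged_search, aligned_arrivals, box)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


def compound_index(queries, arrivals, max_lag=6, box=4):
    """Sum of lagged query series, each weighted by its best DCCA coefficient.

    Entry i of the result belongs to arrival month i + max_lag.
    Aggregation by plain summation is an assumption.
    """
    n = len(arrivals)
    index = [0.0] * (n - max_lag)
    for series in queries:
        lag, weight = best_lag(series, arrivals, max_lag, box)
        for i in range(n - max_lag):
            index[i] += weight * series[i + max_lag - lag]
    return index
```

Because the coefficient is a normalized inner product of detrended residuals, it stays within -1 and 1, which makes it usable directly as a query weight.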
  • 12. ENTER 2018 Research Track Slide Number 12 Evaluation of search indices Index evaluation metrics for different sending countries High structural similarity between search indices and tourist arrivals (de-trended cross-correlation analysis appropriate for potentially non-stationary time series)
  • 13. ENTER 2018 Research Track Slide Number 13 Evaluation of search indices Index evaluation metrics for different sending countries Similar Hurst exponents of arrival and search index time series, indicating the same auto-correlative patterns
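The slides do not state how the Hurst exponents were estimated. One common option is rescaled-range (R/S) analysis, sketched below under that assumption; the window sizes are likewise hypothetical and would need tuning for a monthly series of this length.

```python
import math


def rescaled_range(chunk):
    """R/S statistic of one chunk: range of cumulative deviations over std."""
    n = len(chunk)
    mean = sum(chunk) / n
    cumulative = lo = hi = 0.0
    for value in chunk:
        cumulative += value - mean
        lo = min(lo, cumulative)
        hi = max(hi, cumulative)
    std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / n)
    return (hi - lo) / std if std > 0 else None


def hurst_exponent(series, window_sizes=(8, 16, 32)):
    """Slope of log(mean R/S) against log(window size)."""
    points = []
    for n in window_sizes:
        values = []
        for start in range(0, len(series) - n + 1, n):  # non-overlapping chunks
            rs = rescaled_range(series[start:start + n])
            if rs is not None:
                values.append(rs)
        if values:
            points.append((math.log(n), math.log(sum(values) / len(values))))
    if len(points) < 2:
        raise ValueError("need at least two usable window sizes")
    x_mean = sum(p[0] for p in points) / len(points)
    y_mean = sum(p[1] for p in points) / len(points)
    sxx = sum((p[0] - x_mean) ** 2 for p in points)
    sxy = sum((p[0] - x_mean) * (p[1] - y_mean) for p in points)
    return sxy / sxx
```

Under this estimator, a value near 0.5 suggests no long-range dependence, values towards 1 suggest persistence, and values towards 0 suggest anti-persistence; comparing the exponent of a search index with that of its arrival series is then a simple numeric check.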
  • 14. ENTER 2018 Research Track Slide Number 14 Evaluation of search indices Index evaluation metrics for different sending countries Prediction accuracy can be improved when autoregressive forecasting models are extended by Google Trends data as an additional predictor
  • 16. ENTER 2018 Research Track Slide Number 16 Stationarity tests Tests for stationarity and co-integration for arrival data and search indices Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests confirm stationarity for DK, FI and UK but not for NO
  • 17. ENTER 2018 Research Track Slide Number 17 Stationarity tests Tests for stationarity and co-integration for arrival data and search indices Johansen test shows co-integration relationships between search indices and corresponding arrival series (-> no series transformations necessary)
  • 18. ENTER 2018 Research Track Slide Number 18 Model building • Building a prediction model – Linear regression as statistical approach – Autoregressive approach, using the past 25 months as input data – Search index, constructed from Google Trends data, as additional input data – Backward selection to eliminate irrelevant inputs (kitchen sink problem) • Evaluation – Prediction performance evaluated by sliding window validation (moving a training and consecutive test window along the data set) – Shapiro-Wilk test to check for normal distribution of residuals
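A minimal sketch of the evaluation idea on this slide: an ordinary least-squares model on lagged arrivals, optionally extended by a search index, scored by RMSE over sliding training and test windows. Several simplifications are assumptions rather than the authors' setup: forecasts are one step ahead only, backward selection and the Shapiro-Wilk check are omitted, the search index is taken to be already lagged and aligned with the arrival month, and the window sizes and the normal-equation solver are illustrative choices.

```python
import math


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Least-squares coefficients via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def make_rows(arrivals, n_lags, search_index=None):
    """Feature rows: intercept, the past n_lags arrivals, optional search index."""
    rows, targets = [], []
    for t in range(n_lags, len(arrivals)):
        row = [1.0] + [arrivals[t - k] for k in range(1, n_lags + 1)]
        if search_index is not None:
            row.append(search_index[t])  # assumed already lagged and aligned
        rows.append(row)
        targets.append(arrivals[t])
    return rows, targets


def sliding_window_rmse(rows, targets, train_size, test_size):
    """Refit on each training window, predict the test window that follows it."""
    squared_errors = []
    start = 0
    while start + train_size + test_size <= len(rows):
        split = start + train_size
        coef = fit_ols(rows[start:split], targets[start:split])
        test_rows = rows[split:split + test_size]
        test_targets = targets[split:split + test_size]
        for row, actual in zip(test_rows, test_targets):
            prediction = sum(c * v for c, v in zip(coef, row))
            squared_errors.append((actual - prediction) ** 2)
        start += test_size  # slide both windows forward
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Calling `make_rows` once without and once with `search_index`, and passing both feature sets through `sliding_window_rmse` with the same window sizes, gives the kind of side-by-side RMSE comparison the deck reports. With `n_lags=25` as on the slide, the training window must hold clearly more rows than there are coefficients, otherwise the normal equations become singular.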
  • 20. ENTER 2018 Research Track Slide Number 20 Comparison of forecasting performance Comparison of prediction accuracy at different forecasting horizons Adding Google Trends data reduces RMSE for all horizons and countries
  • 21. ENTER 2018 Research Track Slide Number 21 Comparison of forecasting performance Comparison of prediction accuracy at different forecasting horizons Normally distributed residuals -> Google Trends model fits data well
  • 22. ENTER 2018 Research Track Slide Number 22 Analysis of customers’ online search behaviour Significant query lags for sending country Denmark: 3 to 2 months before arrival -> search for activities in Sweden; one month before arrival -> more precise queries, searching specifically for Åre
  • 23. ENTER 2018 Research Track Slide Number 23 Analysis of customers’ online search behaviour Significant query lags for sending country Denmark Potential to • Analyse customers’ online search behaviour and decision-making process • Identify the most relevant keywords used by tourists actually visiting the destination (input to search engine optimization - SEO)
  • 25. ENTER 2018 Research Track Slide Number 25 Conclusion and outlook • Web search data as additional input to demand prediction – Forecast model with Google Trends data as additional predictor outperforms purely autoregressive approaches • Analysis of customers’ online search behaviour – Most significant search terms and time lags constitute valuable input to analysing customers’ online search behaviour and decision-making process • Open issues and future research activities – Add further input data, e.g. customers’ online interactions on social media platforms like YouTube, Facebook, etc. or web navigation data – Compare statistical approaches with data mining methods (e.g. deep learning with neural networks)
