ENTER 2018 Research Track Slide Number 1
Search engine traffic as input for predicting tourist arrivals
Wolfram Höpken (a), Tobias Eberle (a), Matthias Fuchs (b) and Maria Lexhagen (b)
(a) Business Informatics Group, University of Applied Sciences Ravensburg-Weingarten, Germany
{name.surname}@hs-weingarten.de
(b) European Tourism Research Institute (ETOUR), Mid-Sweden University, Sweden
{name.surname}@miun.se
ENTER 2018 Research Track Slide Number 2
Content
• Introduction
• Related work
• Data collection and preparation
• Construction of web search indices with high predictive
power
• Model building and evaluation
• Results
• Conclusion and outlook
ENTER 2018 Research Track Slide Number 3
Motivation
• Demand prediction in tourism
– Due to the perishable nature of tourism products, accurate forecasts of
tourism demand are of utmost relevance (Frechtling, 2002; Fitzsimmons &
Fitzsimmons, 2002)
– Knowledge of long-term trends, imminent changes and short-term
intra-period fluctuations of demand is essential for tourism
management
Accuracy and reliability of demand forecasts can hardly be
overestimated for tourism businesses and policy makers (Frechtling, 2002)
• Limitations of autoregressive approaches
– Lack of historical data, influence of unexpected events, variety of input
factors and complexity of the travel decision-making process (Song et al., 2010)
– Travellers' web search behaviour offers an additional input to
demand prediction
ENTER 2018 Research Track Slide Number 4
Objective
• Extend the autoregressive forecasting approach by including
travellers' web search behaviour
– Does the inclusion of time series data on web search behaviour
increase performance when forecasting tourist arrivals compared to
the purely autoregressive approach?
• Examine behavioural aspects of travellers related to
concrete search terms used in online search for trip planning
– Analyse temporal relationships between search terms and tourist
arrivals
– Identify patterns that reflect online planning behaviour of travellers
before visiting specific destinations
ENTER 2018 Research Track Slide Number 6
Search engine traffic for demand prediction
• Web search data to predict tourist arrivals
– Google web search data to improve tourism demand prediction
accuracy, compared to purely autoregressive models or exponential
smoothing time-series models (Önder & Gunter, 2016)
– Google web search data to increase forecasting performance using
autoregressive mixed-data sampling (AR-MIDAS) models
(Bangwayo-Skeete & Skeete, 2015)
– Google web search data and econometric indicators to improve
autoregressive prediction of tourist arrivals, comparing statistical and
data mining approaches (Höpken et al., 2017)
ENTER 2018 Research Track Slide Number 8
Specification of data set
• Tourist arrivals
– Monthly aggregated tourist arrivals (December 2005 - April 2012) for
the leading Swedish mountain destination Åre
– Specified separately for its major sending countries (Denmark, Finland,
Norway and the United Kingdom)
• Web search traffic
– Google Trends as an appropriate data source for the above sending countries
– Represents the relative search volume of popular search terms over time
and thus reflects people's interest in specific search terms across
different geographic regions and topical domains
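Google Trends reports search interest relative to the highest point for the selected region and period, which is scaled to 100. The sketch below illustrates that idea under a simplifying assumption: a query's share of all searches in each period is rescaled so that the peak period equals 100. The function name and the monthly counts are hypothetical, and Google's actual sampling and rounding are not reproduced.

```python
def relative_search_index(query_counts, total_counts):
    """Scale a query's search counts to a 0-100 index relative to its peak share.

    Simplifying assumption: divide by total searches per period, then rescale
    so the highest share becomes 100 (Google's sampling is not modelled).
    """
    if len(query_counts) != len(total_counts):
        raise ValueError("series must have equal length")
    shares = [q / t for q, t in zip(query_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * s / peak) for s in shares]


# Hypothetical monthly counts for one search term in one sending country
queries = [120, 300, 90, 450]
totals = [10_000, 12_000, 9_000, 15_000]
print(relative_search_index(queries, totals))
```

Because values are relative to each series' own peak, two search terms with very different absolute volumes can both reach 100, which is one reason the later index construction weights queries rather than adding raw values.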
ENTER 2018 Research Track Slide Number 9
Collection of web search data
• Google Trends crawling algorithm
– Using search engine-based keyword recommendations from Google's
Keyword Planner
– Iterative algorithm to identify relevant keywords
• Starting with the seed keyword "are" and iterating over related keywords
suggested by Google Keyword Planner
• Normalization of search terms
– Examining search terms for close similarity based on linguistic
variations, synonyms or misspellings
• Transforming search terms by text processing techniques (tokenization,
character substitution, stemming, stop-word removal, generation of word
vectors)
• Eliminating semantically identical search terms (cosine similarity = 1)
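The crawling and normalisation steps can be sketched as a breadth-first expansion from the seed keyword followed by vector-based de-duplication. Everything here is a stand-in: `SUGGESTIONS` is a hypothetical dictionary replacing calls to Google Keyword Planner, the stop-word list and character map are illustrative, and the suffix stripping is a crude placeholder for a proper stemmer. Terms whose bag-of-words vectors have cosine similarity 1 are treated as identical, matching the criterion on the slide.

```python
import math
import re
from collections import Counter, deque

# Hypothetical stand-in for Google Keyword Planner suggestions
SUGGESTIONS = {
    "are": ["are skiing", "åre ski", "are sweden"],
    "are skiing": ["skiing are", "åre ski pass"],
    "are sweden": ["åre sweden hotel"],
}
STOP_WORDS = {"in", "the", "and", "of"}  # illustrative only
CHAR_MAP = str.maketrans({"å": "a", "ä": "a", "ö": "o", "ø": "o", "æ": "ae"})


def crawl_keywords(seed, suggest, max_keywords=100):
    """Breadth-first expansion of a seed keyword via related-keyword suggestions."""
    seen, queue = {seed}, deque([seed])
    while queue:
        keyword = queue.popleft()
        for related in suggest(keyword):
            if related not in seen and len(seen) < max_keywords:
                seen.add(related)
                queue.append(related)
    return sorted(seen)


def term_vector(term):
    """Tokenise, substitute characters, drop stop words, strip suffixes, count terms."""
    tokens = re.findall(r"\w+", term.lower().translate(CHAR_MAP))
    # Crude suffix stripping as a placeholder for a real stemmer
    stems = [re.sub(r"(ing|s)$", "", tok) for tok in tokens if tok not in STOP_WORDS]
    return Counter(stems)


def cosine(a, b):
    """Cosine similarity of two term-count vectors (missing Counter keys count as 0)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def deduplicate(terms):
    """Keep the first of any group of terms whose vectors have cosine similarity 1."""
    kept = []
    for term in terms:
        vec = term_vector(term)
        # isclose guards against floating-point results such as 0.9999999999999998
        if not any(math.isclose(cosine(vec, other), 1.0) for _, other in kept):
            kept.append((term, vec))
    return [term for term, _ in kept]


keywords = crawl_keywords("are", lambda k: SUGGESTIONS.get(k, []))
unique_terms = deduplicate(keywords)
print(unique_terms)
```

In this toy run, "skiing are" and "åre ski" collapse onto "are skiing" because word order and the diacritic no longer matter after normalisation, while "åre ski pass" survives as a distinct query.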
ENTER 2018 Research Track Slide Number 11
Construction of aggregated search indices
• Identifying optimal time-lag between each search query and
tourist arrivals
– Calculating de-trended cross-correlation analysis (DCCA) coefficients
for time lags of 0 to 6 months, capturing travellers' short- and mid-term online
travel planning behaviour
– Selecting the time lag with the maximal DCCA coefficient and weighting the search
query by that DCCA coefficient
• Constructing compound search indices
– Filtering queries by Hurst exponent to ensure that the constructed search
indices follow the same auto-correlative patterns as their corresponding
tourist arrival series (Pan et al., 2017)
– Aggregating all weighted and time-lagged query series into a compound
search index
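The lag selection, weighting and aggregation steps above can be sketched in plain Python. This is a minimal illustration under assumptions the slide does not state: the DCCA coefficient uses overlapping boxes of a hypothetical default of four months, each query is paired at month t-lag with arrivals at month t, and the compound index is a plain sum of DCCA-weighted, lagged query series (whether the authors normalised the weights is not specified). Queries are assumed to have already passed the Hurst-exponent filter. The helper names are hypothetical.

```python
import itertools
import math
import random


def _profile(series):
    """Integrated series: cumulative sum of deviations from the mean."""
    mean = sum(series) / len(series)
    return list(itertools.accumulate(v - mean for v in series))


def _linear_residuals(segment):
    """Residuals of a least-squares straight line fitted to one box."""
    n = len(segment)
    t_mean = (n - 1) / 2
    y_mean = sum(segment) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment)) / sxx
    return [y - y_mean - slope * (t - t_mean) for t, y in enumerate(segment)]


def dcca_coefficient(x, y, box_size=4):
    """DCCA cross-correlation coefficient over overlapping boxes of box_size points."""
    if len(x) != len(y) or not 3 <= box_size <= len(x):
        raise ValueError("need equal-length series and 3 <= box_size <= length")
    px, py = _profile(x), _profile(y)
    f_xy = f_xx = f_yy = 0.0
    # Per-box normalisation constants cancel in the ratio, so plain sums suffice
    for start in range(len(x) - box_size + 1):
        rx = _linear_residuals(px[start:start + box_size])
        ry = _linear_residuals(py[start:start + box_size])
        f_xy += sum(a * b for a, b in zip(rx, ry))
        f_xx += sum(a * a for a in rx)
        f_yy += sum(b * b for b in ry)
    denom = math.sqrt(f_xx * f_yy)
    return f_xy / denom if denom else 0.0


def best_lag(query, arrivals, max_lag=6, box_size=4):
    """Return (lag, rho) maximising DCCA between query at t-lag and arrivals at t."""
    candidates = []
    for lag in range(max_lag + 1):
        rho = dcca_coefficient(query[:len(query) - lag], arrivals[lag:], box_size)
        candidates.append((rho, lag))
    rho, lag = max(candidates)
    return lag, rho


def compound_index(queries, arrivals, max_lag=6):
    """Hypothetical aggregation: sum of DCCA-weighted, individually lagged queries.

    Months before a query's lag get no contribution from it; in practice the
    first max_lag months would be trimmed.
    """
    index = [0.0] * len(arrivals)
    for query in queries:
        lag, rho = best_lag(query, arrivals, max_lag)
        for t in range(lag, len(arrivals)):
            index[t] += rho * query[t - lag]
    return index


rng = random.Random(0)
base = [math.sin(2 * math.pi * t / 12) + rng.gauss(0, 0.3) for t in range(62)]
arrivals = base[:60]  # synthetic monthly arrivals
query = base[2:]      # synthetic query leading arrivals by two months
print(best_lag(query, arrivals))
```

In the synthetic demo the query leads arrivals by exactly two months, so `best_lag` should recover lag 2 with a coefficient of (numerically) 1; real Google Trends series would give lower coefficients.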
ENTER 2018 Research Track Slide Number 12
Evaluation of search indices
Index evaluation metrics for different sending countries
High structural similarity between search indices and tourist arrivals
(de-trended cross-correlation analysis is appropriate for potentially
non-stationary time series)
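For reference, the DCCA cross-correlation coefficient is usually defined on the integrated (profile) series as below. This is the standard form from the literature, not a formula given on the slides, so the exact normalisation used by the authors may differ. Here $x$ is a search-query series, $y$ the arrival series, $N$ the series length, each box $i$ spans the $n+1$ consecutive months $i, \dots, i+n$, and $\tilde{R}_{k,i}$, $\tilde{R}'_{k,i}$ are the linear trends fitted inside box $i$:

```latex
\begin{aligned}
R_k &= \sum_{j=1}^{k} (x_j - \bar{x}), \qquad R'_k = \sum_{j=1}^{k} (y_j - \bar{y}) \\
f^2_{\mathrm{DCCA}}(n, i) &= \frac{1}{n-1} \sum_{k=i}^{i+n} \big(R_k - \tilde{R}_{k,i}\big)\big(R'_k - \tilde{R}'_{k,i}\big) \\
F^2_{\mathrm{DCCA}}(n) &= \frac{1}{N-n} \sum_{i=1}^{N-n} f^2_{\mathrm{DCCA}}(n, i) \\
\rho_{\mathrm{DCCA}}(n) &= \frac{F^2_{\mathrm{DCCA}}(n)}{F_{\mathrm{DFA},x}(n)\, F_{\mathrm{DFA},y}(n)}
\end{aligned}
```

$F_{\mathrm{DFA},x}(n)$ is the square root of the same expression with both profiles taken from $x$ (likewise for $y$), so by the Cauchy-Schwarz inequality $\rho_{\mathrm{DCCA}}$ lies in $[-1, 1]$. Detrending inside each box is what makes the measure usable for the potentially non-stationary series mentioned above.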
ENTER 2018 Research Track Slide Number 13
Evaluation of search indices
Index evaluation metrics for different sending countries
Similar Hurst exponents of arrival and search index time series,
indicating the same auto-correlative patterns
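Comparing Hurst exponents requires an estimator, which the slides do not name. The sketch below assumes detrended fluctuation analysis (DFA-1): its exponent α coincides with the Hurst exponent for stationary series (about 0.5 for white noise), while non-stationary series give α > 1. The scale set, the filter tolerance of 0.1 and all function names are hypothetical; with roughly 77 monthly observations, much smaller scales than in this 4096-point demo would be needed.

```python
import itertools
import math
import random


def _profile(series):
    """Cumulative sum of deviations from the mean."""
    mean = sum(series) / len(series)
    return list(itertools.accumulate(v - mean for v in series))


def _linear_residuals(segment):
    """Residuals of a least-squares line fitted to one box."""
    n = len(segment)
    t_mean = (n - 1) / 2
    y_mean = sum(segment) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment)) / sxx
    return [y - y_mean - slope * (t - t_mean) for t, y in enumerate(segment)]


def dfa_exponent(series, scales=(8, 16, 32, 64)):
    """DFA-1 scaling exponent, used here as the Hurst-exponent estimate."""
    profile = _profile(series)
    log_s, log_f = [], []
    for s in scales:
        n_boxes = len(profile) // s  # non-overlapping boxes; the remainder is ignored
        if n_boxes < 2:
            raise ValueError(f"series too short for scale {s}")
        sq_sum = sum(
            r * r
            for b in range(n_boxes)
            for r in _linear_residuals(profile[b * s:(b + 1) * s])
        )
        log_s.append(math.log(s))
        log_f.append(0.5 * math.log(sq_sum / (n_boxes * s)))  # log of RMS fluctuation F(s)
    # Exponent = least-squares slope of log F(s) against log s
    ms, mf = sum(log_s) / len(log_s), sum(log_f) / len(log_f)
    num = sum((a - ms) * (b - mf) for a, b in zip(log_s, log_f))
    den = sum((a - ms) ** 2 for a in log_s)
    return num / den


def passes_hurst_filter(query, arrivals, tolerance=0.1, scales=(8, 16, 32, 64)):
    """Keep a query only if its exponent is within a (hypothetical) tolerance of the arrivals'."""
    return abs(dfa_exponent(query, scales) - dfa_exponent(arrivals, scales)) <= tolerance


rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(4096)]
walk = list(itertools.accumulate(noise))
print(round(dfa_exponent(noise), 2), round(dfa_exponent(walk), 2))
```

White noise should come out near 0.5 and its cumulative sum near 1.5, so the filter would reject pairing one with the other.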
ENTER 2018 Research Track Slide Number 14
Evaluation of search indices
Index evaluation metrics for different sending countries
Prediction accuracy can be improved when autoregressive forecasting
models are extended by Google Trends data as additional predictor
ENTER 2018 Research Track Slide Number 16
Stationarity tests
Tests for stationarity and co-integration for arrival data and search indices
Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin
(KPSS) tests confirm stationarity for DK, FI and UK but not for NO
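The idea behind the unit-root check can be illustrated with a deliberately simplified Dickey-Fuller regression. Note the substitution: this sketch omits the lagged difference terms that make the test *augmented*, shows neither the KPSS nor the Johansen test, and compares against the commonly tabulated asymptotic 5% critical value of about -2.86 for a regression with a constant and no trend. In practice a statistics library (for example statsmodels' `adfuller` and `kpss`) would be used; the function names below are hypothetical.

```python
import math
import random

DF_CRITICAL_5PCT_CONST = -2.86  # asymptotic 5% critical value, constant, no trend


def dickey_fuller_t(series):
    """t-statistic of gamma in dy_t = alpha + gamma * y_{t-1} + e_t (no lagged differences)."""
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(dy)
    x_mean = sum(y_lag) / n
    d_mean = sum(dy) / n
    sxx = sum((x - x_mean) ** 2 for x in y_lag)
    gamma = sum((x - x_mean) * (d - d_mean) for x, d in zip(y_lag, dy)) / sxx
    alpha = d_mean - gamma * x_mean
    rss = sum((d - alpha - gamma * x) ** 2 for x, d in zip(y_lag, dy))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return gamma / se


def looks_stationary(series):
    """Reject the unit-root null if the statistic is below the critical value."""
    return dickey_fuller_t(series) < DF_CRITICAL_5PCT_CONST


rng = random.Random(42)
ar1 = [0.0]
for _ in range(499):
    ar1.append(0.5 * ar1[-1] + rng.gauss(0, 1))  # stationary AR(1)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + rng.gauss(0, 1))  # random walk (unit root)
print(f"AR(1): t = {dickey_fuller_t(ar1):.2f}, stationary: {looks_stationary(ar1)}")
print(f"random walk: t = {dickey_fuller_t(walk):.2f}, stationary: {looks_stationary(walk)}")
```

ADF and KPSS have opposite null hypotheses (unit root versus stationarity), which is why the slide reports both: agreement between them gives a more reliable verdict than either alone.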
ENTER 2018 Research Track Slide Number 17
Stationarity tests
Tests for stationarity and co-integration for arrival data and search indices
Johansen test shows co-integration relationships between search indices
and corresponding arrival series (-> no series transformations necessary)
ENTER 2018 Research Track Slide Number 18
Model building
• Building a prediction model
– Linear regression as statistical approach
– Autoregressive approach, using the past 25 months as input data
– Search index, constructed from Google Trends data, as additional input
data
– Backward selection to eliminate irrelevant input (kitchen sink problem)
• Evaluation
– Prediction performance evaluated by sliding window validation
(moving a training window and a consecutive test window along the data set)
– Shapiro-Wilk test to check residuals for normality
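A compact sketch of the model-building and evaluation loop: ordinary least squares on lagged arrivals, optionally extended by the search index, evaluated by sliding-window validation with RMSE. Assumptions: the demo uses only lags 1 and 12 (hypothetical; the slide uses the past 25 months), the index enters with a one-month lag, the 48-month training and 12-month test windows are hypothetical, the data are synthetic, and backward selection and the Shapiro-Wilk test are not shown. All function names are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def build_features(arrivals, index, lags=(1, 12), index_lag=None):
    """One row per month t: arrivals[t - l] for each lag, optionally index[t - index_lag]."""
    start = max(max(lags), index_lag or 0)
    rows, targets = [], []
    for t in range(start, len(arrivals)):
        row = [arrivals[t - l] for l in lags]
        if index_lag is not None:
            row.append(index[t - index_lag])
        rows.append(row)
        targets.append(arrivals[t])
    return rows, targets


def sliding_window_rmse(rows, targets, train_size=48, test_size=12):
    """Fit on a training window, forecast the next test window, then slide forward."""
    errors, start = [], 0
    while start + train_size + test_size <= len(rows):
        stop = start + train_size
        coef = fit_ols(rows[start:stop], targets[start:stop])
        errors += [targets[i] - predict(coef, rows[i]) for i in range(stop, stop + test_size)]
        start += test_size
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Synthetic data: seasonal arrivals partly driven by last month's search index
rng = random.Random(7)
months = 120
index = [rng.gauss(0, 1) for _ in range(months)]
arrivals = [100.0] + [
    100 + 10 * math.sin(2 * math.pi * t / 12) + 5 * index[t - 1] + rng.gauss(0, 1)
    for t in range(1, months)
]
ar_rows, ar_y = build_features(arrivals, index)
gt_rows, gt_y = build_features(arrivals, index, index_lag=1)
rmse_ar = sliding_window_rmse(ar_rows, ar_y)
rmse_gt = sliding_window_rmse(gt_rows, gt_y)
print(f"AR only: {rmse_ar:.2f}  AR + search index: {rmse_gt:.2f}")
```

Each forecast here is one step ahead and uses observed lagged values; longer horizons would need either direct h-step models or recursive forecasting.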
ENTER 2018 Research Track Slide Number 20
Comparison of forecasting performance
Comparison of prediction accuracy at different forecasting horizons
Adding Google Trends data reduces RMSE for all horizons and countries
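The comparison rests on the root mean squared error of the forecasts at each horizon. In one common notation (assumed here, not taken from the slide), with $\hat{y}_{t+h \mid t}$ the forecast for month $t+h$ made at month $t$ and $n$ the number of test forecasts:

```latex
\mathrm{RMSE}(h) = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \big(y_{t+h} - \hat{y}_{t+h \mid t}\big)^2}
```

Because errors are squared before averaging, RMSE penalises a few large misses, such as a missed seasonal peak, more heavily than many small ones.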
ENTER 2018 Research Track Slide Number 21
Comparison of forecasting performance
Comparison of prediction accuracy at different forecasting horizons
Normally distributed residuals -> Google Trends model fits data well
ENTER 2018 Research Track Slide Number 22
Analysis of customers’ online search behaviour
Significant query lags for sending country Denmark
Two to three months before arrival -> searches for activities in Sweden
One month before arrival -> more precise queries, searching specifically for Åre
ENTER 2018 Research Track Slide Number 23
Analysis of customers’ online search behaviour
Significant query lags for sending country Denmark
Potential to
• Analyse customers' online search behaviour and decision-making process
• Identify most relevant keywords, used by tourists actually visiting the destination
(input to search engine optimization - SEO)
ENTER 2018 Research Track Slide Number 25
Conclusion and outlook
• Web search data as additional input to demand prediction
– Forecast model with Google Trends data as additional predictor
outperforms purely autoregressive approaches
• Analysis of customers’ online search behaviour
– Most significant search terms and time lags constitute valuable input
to analysing customers' online search behaviour and decision-making
process
• Open issues and future research activities
– Add further input data, e.g. customers' online interactions on social
media platforms such as YouTube and Facebook, or web navigation data
– Compare statistical approaches with data mining methods (e.g. deep
learning with neural networks)