Mining the Web: How user-generated content can become a data source for tourism research
Peter A. Johnson and Dr. Renee Sieber, McGill University
TTRA Canada Annual Conference, Guelph, Ontario
Thursday, October 15, 2009
• What is user-generated content (UGC)?
• Examples of tourism-related UGC
• Tripadvisor study
• Challenges to UGC
• User-generated content is:
• content made publicly available over the Internet
• content that reflects creative effort
• content created outside of professional
routines and practices (OECD, 2007)
• A popular travel rating site
• Determine the range and nature of reviews
of Nova Scotia
• Start search queries using “nova scotia” and
“halifax nova scotia”
• Web scrape as many reviews as possible
• Specialized computer software (robot or
• Automated extraction of website data
• Simulates “clicks” to drill down through a
• Outputs thousands of records in hours
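The scraping steps above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the page markup (a `review` class on each review block, a `next` class on the paging link), the `example.test` URLs, and the names `ReviewParser`, `scrape`, and `PAGES` are all assumptions made for the example. Canned pages replace a live site so the sketch runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReviewParser(HTMLParser):
    """Collects review text and the "next page" link from one HTML page.

    Assumed (hypothetical) markup: reviews sit in <div class="review">,
    and paging uses <a class="next" href="...">.
    """

    def __init__(self):
        super().__init__()
        self.reviews = []      # text of every review found on the page
        self.next_href = None  # link to the following page, if any
        self._depth = 0        # > 0 while inside a review <div>
        self._chunks = []      # text fragments of the current review

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and (self._depth or "review" in classes):
            self._depth += 1   # track nesting so inner <div>s are kept
        elif tag == "a" and "next" in classes:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join("".join(self._chunks).split())
                self.reviews.append(text)
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def scrape(start_url, fetch, max_pages=100):
    """Follow "next" links from start_url (the simulated clicks).

    fetch is any callable that maps a URL to an HTML string.
    Returns one record (a dict) per review.
    """
    records, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = ReviewParser()
        parser.feed(fetch(url))
        parser.close()
        records.extend({"url": url, "text": t} for t in parser.reviews)
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return records


# Canned pages stand in for a live site (hypothetical URLs and content).
PAGES = {
    "http://example.test/ns?page=1":
        '<div class="review">Lovely <b>harbour</b> walk.</div>'
        '<a class="next" href="/ns?page=2">Next</a>',
    "http://example.test/ns?page=2":
        '<div class="review">Great seafood in Halifax.</div>',
}

records = scrape("http://example.test/ns?page=1", PAGES.__getitem__)
```

Against a real site, `fetch` could be replaced by a function that downloads the page, for example one built on `urllib.request.urlopen`, and the records written out with the `csv` module. The `max_pages` cap and the `seen` set keep the robot from looping indefinitely.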
Take-home points
• UGC is an emerging source of data for
• getting and using UGC
• how to use results at larger scales
• Girardin, F., Dal Fiore, F., Ratti, C., and Blat, J. (2008). Leveraging explicitly
disclosed location information to understand tourist dynamics: a case study.
Journal of Location Based Services 2(1), 41-56.
• Goodchild, M.F. (2007). Citizens as Sensors: The World of Volunteered
Geography. GeoJournal 69, 211-221.
• Gorman, S.P. (2007). Is academia missing the boat for the GeoWeb
revolution? A response to Harvey’s commentary. Environment and Planning
B: Planning and Design 34(6), 949-950.
• Haklay, M., Singleton, A., and Parker, C. (2008). Web Mapping 2.0: The
Neogeography of the GeoWeb. Geography Compass 2(6), 2011-2039.