Mining the Web
how user-generated content (UGC) can become a data
source for tourism research

Peter A. Johnson and Dr. Re...
This is a presentation that I gave at the 2009 Travel and Tourism Research Association of Canada annual meeting.

Published in: Education, Business, Technology

Mining the Web: How user-generated content can become a data source for tourism research

  1. Mining the Web: how user-generated content (UGC) can become a data source for tourism research. Peter A. Johnson and Dr. Renee Sieber, McGill University. TTRA Canada Annual Conference, Guelph, Ontario, Thursday, October 15, 2009
  2. Outline • What is user-generated content (UGC)? • Examples of tourism-related UGC • Tripadvisor study • Challenges with UGC
  3. What is UGC?
  4. • User-generated content is content that: • is made publicly available over the Internet • reflects creative effort • is created outside of professional routines and practices (OECD, 2007) http://www.oecd.org/dataoecd/57/14/38393115.pdf
  5. Tripadvisor study • A popular travel rating site • Goal: determine the range and nature of reviews of Nova Scotia • Start with the search queries “nova scotia” and “halifax nova scotia” • Web scrape as many reviews as possible
  6. Web Scraping • Specialized computer software (robot or spider) • Automated extraction of website data • Simulates “clicks” to drill down through a web page • Outputs thousands of records in hours
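The "automated extraction of website data" step on the Web Scraping slide can be illustrated with a minimal sketch. This is not the software used in the study, which the slides do not name. It uses only Python's standard-library `html.parser`, and the markup in `SAMPLE_PAGE` (the `review` and `text` class names and the `data-rating` attribute) is entirely hypothetical. Real review pages are structured differently, change over time, and may restrict automated access.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; not the structure of any real site.
SAMPLE_PAGE = """
<div class="review" data-rating="5"><p class="text">Lovely harbour view.</p></div>
<div class="review" data-rating="2"><p class="text">Room was small.</p></div>
"""

class ReviewParser(HTMLParser):
    """Collects one record (rating, text) per <div class="review"> block."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._current = None   # record being built, or None outside a review
        self._in_text = False  # True while inside <p class="text">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "review":
            self._current = {"rating": int(attrs["data-rating"]), "text": ""}
        elif tag == "p" and attrs.get("class") == "text" and self._current is not None:
            self._in_text = True

    def handle_data(self, data):
        if self._in_text:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "p" and self._in_text:
            self._in_text = False
        elif tag == "div" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.reviews.append(self._current)
            self._current = None

def extract_reviews(html):
    """Parse an HTML string and return a list of review records."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews

reviews = extract_reviews(SAMPLE_PAGE)
```

A full scraper would also fetch pages and follow links (the simulated "clicks"), which is omitted here so the sketch runs without network access.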
  7. 5730 total reviews (bar chart, y-axis: Reviews) • Attractions: 153 • Restaurants: 1513 • Accommodations: 4064
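As a quick check on the review counts, the three bar values can be summed and converted to shares. The sketch below is illustrative only. The pairing of each value with a category is an assumption based on the left-to-right order of the chart, and the names `REVIEW_COUNTS` and `category_shares` are introduced here, not taken from the presentation.

```python
# Bar values as they appear on the slide. Pairing them with categories is an
# assumption based on the left-to-right order of the bars and axis labels.
REVIEW_COUNTS = {
    "Attractions": 153,
    "Restaurants": 1513,
    "Accommodations": 4064,
}

def category_shares(counts):
    """Return each category's share of all reviews, as a percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

total_reviews = sum(REVIEW_COUNTS.values())
shares = category_shares(REVIEW_COUNTS)
```

The three values sum to 5730, matching the total stated on the slide. Under the assumed pairing, accommodation reviews make up roughly seven in ten of the scraped reviews.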
  8. Web Scraping Results
  9. Survey vs. UGC
                           Survey              UGC
       Sample Type         Controlled          Uncontrolled
       Question Type       Open/Close Ended    Generally Open-Ended
       Research Approach   Investigative       Exploratory
  10. 77 Reviewed Locations
  11. Accommodation Reviews
  12. Attraction Reviews (figure text: "activity")
  13. Restaurant Reviews (figure text: "attractions")
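  14. Total Destination Review Breakdown (pie chart) • Legend: Halifax, Annapolis Royal, Baddeck, Lunenburg, Dartmouth, Yarmouth, Digby, Other • Slice labels as extracted, not matched to destinations: 33%, 40%, 3%, 4%, 4%, 6%, 5%, 5%
  15. Accommodation Review Ratings (pie chart) • Legend: One Star, Two Stars, Three Stars, Four Stars, Five Stars • Slice labels as extracted, not matched to ratings: 7%, 8%, 10%, 53%, 22%
  16. Attraction Review Ratings (pie chart) • Legend: One Star, Two Stars, Three Stars, Four Stars, Five Stars • Slice labels as extracted, not matched to ratings: 7%, 5%, 9%, 56%, 23%
  17. Restaurant Review Ratings (pie chart) • Legend: One Star, Two Stars, Three Stars, Four Stars, Five Stars • Slice labels as extracted, not matched to ratings: 7%, 8%, 37%, 17%, 32%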
  18. Challenges with UGC • Quality varies widely • Vendetta/self-promotion • Legal grey area • Generalizability?
  19. The Future • Data gathering and analysis: • geolocate reviewers • content analysis of reviews • Secondary UGC: reviews of reviews • Instant feedback: iPhone effect
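The "content analysis of reviews" item is left open in the slides. One simple starting point, offered here only as an assumption about what such analysis might involve, is counting term frequencies across review texts. The sample reviews, the stopword list, and the function name `term_frequencies` below are all hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "and", "was", "of", "in", "to", "is", "it", "we"}

def term_frequencies(reviews):
    """Count how often each non-stopword term appears across all reviews."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

# Hypothetical review texts, not data from the study.
sample_reviews = [
    "The lobster was great and the view was great.",
    "Great staff, small room.",
]
freqs = term_frequencies(sample_reviews)
```

Calling `freqs.most_common(10)` lists the most frequent terms, which could then be compared across accommodations, attractions, and restaurants.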
  20. Tripadvisor iPhone Application
  21. Yelp iPhone Application
  22. Take home points • UGC is an emerging source of data for tourism research • Challenges: • getting and using UGC • how to use results at larger scales
  23. Thank You! Further Reading • Girardin, F., Dal Fiore, F., Ratti, C., and Blat, J. (2008). Leveraging explicitly disclosed location information to understand tourist dynamics: a case study. Journal of Location Based Services 2(1), 41-56. • Goodchild, M.F. (2007). Citizens as Sensors: The World of Volunteered Geography. GeoJournal 69, 211-221. • Gorman, S.P. (2007). Is academia missing the boat for the GeoWeb revolution? A response to Harvey’s commentary. Environment and Planning B: Planning and Design 34(6), 949-950. • Haklay, M., Singleton, A., and Parker, C. (2008). Web Mapping 2.0: The Neogeography of the GeoWeb. Geography Compass 2(6), 2011-2039. Contact: peter.johnson2@mail.mcgill.ca
