This document describes TokyoGo, a city attractions recommender system built on data collected from the Foursquare APIs and by web scraping. The system used machine learning techniques such as NMF topic modeling and DBSCAN clustering to analyze over 263,000 venue records and user tips and to generate topic tags and recommendations. Evaluation found that the model could distinguish the preferences of local and foreign visitors to some extent. Further improvements and applications of the system are discussed.
7. Introduction Data Coll ResultsAnalysis Discussion
Topic tag                  Top words
Theme tour                 Famous shrine, totoro, national park …
Game or outdoor            Video game, sunshine, bandit
History                    Garden, manju, oldtokyo
Culture                    Shinkansen, coast, market
Theme park and shopping    Indianajons, waterfall, waterpark
10. Next steps:
• Improve the model with user-user similarity
• Include seasonality and the full tips data
• Train a neural network model to tag the image content
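As a rough illustration of the user-user similarity step listed above, the sketch below computes cosine similarity between users from a small user-by-venue matrix. The user names, the matrix, and the helper `most_similar` are hypothetical and not taken from the project; cosine similarity is just one common choice and may not be the measure the project would adopt.

```python
import numpy as np

# Hypothetical user-by-venue matrix: 1 means the user left a tip at that venue.
users = ["u1", "u2", "u3"]
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

# Normalize each row to unit length (guarding against all-zero rows),
# so that the dot product of two rows equals their cosine similarity.
norms = np.linalg.norm(interactions, axis=1, keepdims=True)
normalized = interactions / np.where(norms == 0, 1.0, norms)
similarity = normalized @ normalized.T


def most_similar(user_index):
    """Return the index of the user most similar to user_index (excluding itself)."""
    scores = similarity[user_index].copy()
    scores[user_index] = -np.inf
    return int(scores.argmax())


print(users[most_similar(0)])
```

Venues visited by the most similar users, but not yet by the target user, could then be ranked as candidate recommendations.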
• The method was able to distinguish, to a certain extent, the
preferences of different groups (locals, visitors from other areas of Japan,
and foreign travelers).
• The recommender system produced by this project will include only the top
200 venues of each visitor source group (about 500 venues in total) as a
toy example that can be deployed on a small Amazon instance. The
framework can be extended when more data is available, and business
features and A/B testing evaluation can be added.
• The NMF analysis indicates that visitors to all the venues tend to mention
food, which suggests that food is an important element shared among
all city attractions. A restaurant recommender is not the topic of this
project, but I expect to see interesting patterns among the different
tourist sources in Tokyo.
A possible application is for tourists who want to explore the city
An example screenshot of a venue page
Why auto-tagging?
Take all the tags of a location and vectorize them to feed the NMF topic model, which finds similar topics
1) Cluster by location -> 2) cluster on other features -> 3) combine cluster labels
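A minimal sketch of this three-step idea, assuming scikit-learn's `DBSCAN` for both clustering passes. The coordinates, the two numeric features, and the `eps` and `min_samples` values are invented for illustration, and the slides do not say which algorithm was used for the second pass.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical venues: (latitude, longitude) in degrees.
coords_deg = np.array([
    [35.7148, 139.7967],
    [35.7150, 139.7970],
    [35.7152, 139.7965],
    [35.6595, 139.7005],
    [35.6597, 139.7008],
    [35.6593, 139.7002],
])
# Hypothetical non-spatial features for the same venues (e.g. topic weights).
features = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.9, 0.1],
    [0.1, 0.8],
    [0.2, 0.9],
])

# 1) Cluster by location. The haversine metric expects [lat, lon] in radians,
#    so eps is a distance in km divided by the Earth's radius.
EARTH_RADIUS_KM = 6371.0
geo_labels = DBSCAN(
    eps=0.5 / EARTH_RADIUS_KM,
    min_samples=2,
    metric="haversine",
    algorithm="ball_tree",
).fit_predict(np.radians(coords_deg))

# 2) Cluster on the other features, independently of location.
feat_labels = DBSCAN(eps=0.3, min_samples=1).fit_predict(features)

# 3) Combine both labels so that each venue gets one composite cluster label.
combined = [f"{g}_{f}" for g, f in zip(geo_labels, feat_labels)]
print(combined)
```

With a composite label, venues that are close together but differ in character end up in different groups, which is one way to read "combine cluster labels".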
Sample some data and give an overall accuracy
Change to a one-venue example with a few tagged photos and the final vector fed to the NMF
Think about how to present results that belong to different clusters