Geo-Spotting: Mining Online Location-based
Services for Optimal Retail Store Placement
Dmytro Karamshuk
King's College London

The problem of identifying the optimal location for a new retail store has been the focus of past research, especially in the field of land economy, due to its importance in the success of a business. Traditional approaches to the problem have factored in demographics, revenue and aggregated human flow statistics from nearby or remote areas. However, the acquisition of relevant data is usually expensive. With the growth of location-based social networks, fine-grained data describing user mobility and popularity of places has recently become attainable.

In this paper we study the predictive power of various machine learning features on the popularity of retail stores in the city through the use of a dataset collected from Foursquare in New York. The features we mine are based on two general signals: geographic, where features are formulated according to the types and density of nearby places, and user mobility, which includes transitions between venues or the incoming flow of mobile users from distant areas. Our evaluation suggests that the best performing features are common across the three different commercial chains considered in the analysis, although variations may exist too, as explained by heterogeneities in the way retail facilities attract users. We also show that performance improves significantly when combining multiple features in supervised learning algorithms, suggesting that the retail success of a business may depend on multiple factors.
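As a rough illustration of the geographic signal, the sketch below computes three of the area features named in the slides (density, neighbors entropy, competitiveness) for a disc of radius r around a candidate location. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the (lat, lon, type) venue layout, the use of haversine distance, the natural-log entropy, and reporting competitiveness as a plain fraction are all choices made here for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geographic_features(center, radius_m, venues, target_type):
    """Compute area features for the disc of radius_m around center.

    venues is a list of (lat, lon, venue_type) tuples (hypothetical layout).
    target_type is the type of the store being placed; venues of the same
    type inside the disc are counted as competitors.
    """
    lat, lon = center
    nearby_types = [
        vtype
        for (vlat, vlon, vtype) in venues
        if haversine_m(lat, lon, vlat, vlon) <= radius_m
    ]
    density = len(nearby_types)  # number of venues in the area
    if density == 0:
        return {"density": 0, "entropy": 0.0, "competitiveness": 0.0}

    counts = Counter(nearby_types)
    # Shannon entropy of the venue-type distribution (natural log is an assumption).
    entropy = -sum((c / density) * math.log(c / density) for c in counts.values())
    # Fraction of nearby venues that share the target type (sign convention is an assumption).
    competitiveness = counts[target_type] / density
    return {"density": density, "entropy": entropy, "competitiveness": competitiveness}
```

Calling `geographic_features((40.75, -73.99), 200.0, venues, "coffee")` on a list of venue tuples returns one feature dictionary per candidate location; ranking candidates by any single feature gives an unsupervised baseline of the kind the slides call "individual features".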

Published in: Technology, Education

Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement

  1. Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement. Dmytro Karamshuk, King's College London. Based on the paper: D. Karamshuk, A. Noulas, S. Scellato, V. Nicosia, C. Mascolo. Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, 2013.
  2. Optimal Retail Location Problem: Among L possible locations in the city, select the one where a new store would be most profitable/popular.
  3. Optimal Retail Location Problem: The problem is not new
     ● A. Athiyaman. Location decision making: the case of retail service development in a closed population. In Academy of Marketing Studies, volume 15, page 13, 2010.
     ● O. Berman and D. Krass. The generalized maximal covering location problem. Computers & Operations Research, 29(6):563–581, 2002.
     ● A. Kubis and M. Hartmann. Analysis of location of large-area shopping centres. A probabilistic gravity model for the Halle-Leipzig area. Jahrbuch für Regionalwissenschaft, 27(1):43–57, 2007.
     ● P. Jensen. Network-based predictions of retail store commercial categories and optimal locations. Phys. Rev. E, 74:035101, Sep 2006.
     Our approach: explore fine-grained and cheap data from location-based social networks (LBSN)
  4. Location-based social networks
     ● check in at places
     ● share with your friends
     ● receive bonuses for check-ins
     ● search for places
     ● leave comments for others
  5. Check-ins around the world: over 40M users, over 4.5B check-ins
  6. Collecting the Data: Dataset collected in New York
     ● 37K venues
     ● 47K users
     ● 621K check-ins
     ● May to November 2010
     Accounts for roughly 25% of the original data
  7. How popular is a venue? The distance between the two places is only a few hundred meters
  8. How popular is a venue? Figures: distribution of check-ins per place; geographic distribution of venues (marker size = number of check-ins)
     ● popularity can differ by several orders of magnitude from place to place
     ● it probably depends on the location and types of places
  9. Popularity and type of venue
     ● different types and chains of venues have different usage patterns
     ● we cannot compare check-ins across venues of different chains, but we can compare venues within an individual chain
     Figure: number of check-ins per place for individual chains of restaurants
  10. Co-location with other venues: How frequently do we observe a Starbucks close to a railway station? Does it influence the popularity of a restaurant? P. Jensen. Analyzing the localization of retail stores with complex systems tools. IDA '09, pages 10–20, Berlin, Heidelberg, 2009. Springer-Verlag.
  11. User mobility between places: How many users go to a Starbucks after a railway station?
     ● there is a correspondence between co-location and mobility patterns
     ● but there are also many discrepancies
  12. Optimal Retail Location Problem: Among L possible locations in the city, select the one where a new store would be most popular.
  13. Define the area: An area is defined as a disc of radius r around a point with geographical coordinates l. The area is described by a set of numeric features designed from check-ins at venues in the disc.
  14. Geographic features of an area
     ● density: number of venues in the area
     ● neighbors entropy: heterogeneity of venue types
     ● competitiveness: percentage of competing venues
  15. Geographic features of an area
     ● quality by Jensen: define inter-type attractiveness coefficients, then weight surrounding venues by their attractiveness
  16. Mobility features of an area
     ● area popularity: total number of check-ins in the area
     ● transition density: intensity of transitions inside the area
     ● incoming flows: intensity of transitions from outside areas
  17. Mobility features of an area
     ● transition quality: define transition coefficients for each type, then weight venues according to the product of the coefficient and the check-in volume
  18. Ranking problem: Use area features to rank all areas in a given set L according to their potential popularity. Compare with the ground truth: the ranking of places based on their actual popularity.
  19. Evaluation metrics: Compare the predicted and ground truth rankings.
     ● Top-K locations ranking: use NDCG@K
     ● Accuracy of the best prediction: Accuracy@X%, the probability of having the best predicted store in the top X% of the ground truth ranking
     We use a random cross-validation approach and report average values across all experiments.
  20. Performance of individual features (NDCG@10)
     ● some indicators are general across various chains while some are chain-specific
     ● the lack of competitors in the area plays a positive role, as does the existence of place attractors
     ● the performance of In.Flow is in accordance with the fact that McDonald's attracts more users from remote areas
  21. Considering fusion of factors: Explore the fusion of features in a supervised learning approach
     ● regression for ranking: conduct regression using Linear Regression, SVR or M5P and then rank according to the regressed values
     ● pair-wise ranking: learn on pair-wise comparisons using the neural network RankNet
     Use the same evaluation methodology as for individual features.
  22. Results of the supervised learning (NDCG@10; panels: individual features, supervised learning)
     ● supervised learning performs better than the best individual feature
     ● the combination of geographic and mobility features yields better results than the combination of geographic features alone
     ● regression-to-rank with SVR is the best performing technique
  23. The best location prediction (panels: supervised learning, individual features)
     ● supervised learning yields reliable and significantly improved results
     ● the best prediction lies in the top 20% of the ground truth ranking with probability over 80%
  24. Implications
     ● we show how fine-grained data from location-based social networks can be effectively explored in geographic retail analysis
     ● this can inspire further work in location-based advertising, developing indexes of urban areas, provision of location-based services, etc.
     ● in particular, we see a lot of potential in measuring user flows from check-ins for various applications
     ● we also faced some challenges when scaling this approach to other chains and cities
  25. Thank you for your attention! Dmytro Karamshuk, King's College London. Follow me on Twitter: @karamshuk
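
The two evaluation metrics from the slide on evaluation metrics (NDCG@K for the top-K ranking, and Accuracy@X% for the best prediction) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, and the relevance score used inside NDCG, (n - rank + 1) / n derived from the ground-truth rank, is an assumption made here; the paper may define relevance and gain differently.

```python
import math


def _order_desc(values):
    """Indices of values sorted from largest to smallest (stable for ties)."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


def ndcg_at_k(predicted_scores, true_popularity, k):
    """NDCG@k of the ranking induced by predicted_scores.

    Relevance of each location is derived from its ground-truth rank as
    (n - rank + 1) / n, which is an assumption for this sketch.
    """
    n = len(true_popularity)
    true_order = _order_desc(true_popularity)
    relevance = [0.0] * n
    for rank, idx in enumerate(true_order, start=1):
        relevance[idx] = (n - rank + 1) / n

    def dcg(order):
        # Discounted cumulative gain over the first k positions.
        return sum(
            relevance[idx] / math.log2(pos + 1)
            for pos, idx in enumerate(order[:k], start=1)
        )

    ideal = dcg(true_order)
    return dcg(_order_desc(predicted_scores)) / ideal if ideal > 0 else 0.0


def best_in_top_fraction(predicted_scores, true_popularity, fraction):
    """True if the top predicted location lies in the top `fraction`
    of the ground-truth ranking (one trial of Accuracy@X%)."""
    n = len(true_popularity)
    best_predicted = _order_desc(predicted_scores)[0]
    cutoff = max(1, math.ceil(fraction * n))
    return best_predicted in _order_desc(true_popularity)[:cutoff]
```

Averaging `ndcg_at_k` and the boolean from `best_in_top_fraction` over repeated random train/test splits would give figures comparable in spirit to the NDCG@10 and Accuracy@X% values reported in the slides.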