5. 2013
From the data point of view...
Check-in = User + Location + Time
4.5+ Billion check-ins, 6 Million check-ins/day
40 million users worldwide
60 million locations (venues)
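A minimal sketch of the "Check-in = User + Location + Time" triple as a record type. The class and field names are illustrative assumptions, not Foursquare's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CheckIn:
    """One check-in: who, where, and when (field names are hypothetical)."""
    user_id: str
    venue_id: str
    timestamp: datetime


# Example record with made-up identifiers.
checkin = CheckIn(
    user_id="user-1",
    venue_id="venue-42",
    timestamp=datetime(2013, 6, 1, 9, 30, tzinfo=timezone.utc),
)
```

Every example that follows in a real pipeline would be some aggregation over many such records, grouped by user, by venue, or by time.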
18. 2013
User Signals
●Based on the user’s own history
●Where have you been?
●What types of places do you go to?
●Are you familiar with this area? Are you a tourist?
28. 2013
The Data tell us...
• Human mobility patterns
– Spatial
– Temporal
• User context
– Where they are, where they’ve been, where they’re (likely to be) going
– What they like, what their friends like
• Venue patterns
– Who goes here, when, and what they like
Everything starts with a checkin. That’s what most people know Foursquare for. You click the big pushpin, select where you are, and check in.
Let’s look at a checkin from a data point of view.
Users are social. Foursquare is also a social network, so these users are connected to each other.
We call locations Venues. Venues have a lot of data attached to them: category, tips, hours, website, etc.
What does this data enable? A lot of data features like Explore, ads relevance, and checkin search. We don’t have time to go through all of them, so we will talk about a few today.
Explore is our Search/Recommendations product.
E.g., if you select Best Nearby, the equivalent of a query-less search, we return pure recommendations.
We can do pure search.
You can also issue broad queries like “coffee” and get recommendations for them. These straddle what people define as search and recommendations.
There are a bunch of signals which power Explore. We will look at a few of these today.
Location-specific: where you search from and where you search for, whether the area is densely or sparsely populated, etc.
One of the most important signals is Venue Rating. This is the one number we associate with every Venue which tells you how good a Venue is. It is a context-independent signal. By that I mean the venue rating you see will be the same no matter who you are, where you are, or what you are searching for. This is our PageRank.
Let’s talk about Venue Rating a bit more. What goes into it? One of the things we look at is the Like %: the percentage of people who click the Like “heart” on the Venue Page.
Most local review sites like Yelp & Urbanspoon use an average review score or a Like %.
But remember, we have the checkin data. We know how many people have been to this place, i.e. its Popularity.
Do you know what this place is with over 1.3m checkins?
We can be even more intelligent: we can calculate loyalty. Here is a comparison between Sightglass Coffee and the Starbucks a block away.
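To make loyalty concrete, here is a minimal sketch under one assumed definition: the fraction of a venue's visitors who checked in there at least twice. The talk does not give the actual formula, so the function name, the threshold, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def loyalty_by_venue(checkins):
    """checkins: iterable of (user_id, venue_id) pairs.

    Returns {venue_id: fraction of distinct visitors with 2+ check-ins}.
    This definition of loyalty is an assumption made for illustration.
    """
    visits = defaultdict(Counter)  # venue_id -> Counter of user_id -> visits
    for user_id, venue_id in checkins:
        visits[venue_id][user_id] += 1
    return {
        venue_id: sum(1 for n in per_user.values() if n >= 2) / len(per_user)
        for venue_id, per_user in visits.items()
    }


# Made-up check-ins for two hypothetical cafes.
checkins = [
    ("ann", "indie_cafe"), ("ann", "indie_cafe"), ("ann", "indie_cafe"),
    ("bob", "indie_cafe"), ("bob", "indie_cafe"),
    ("ann", "chain_cafe"), ("bob", "chain_cafe"),
    ("cat", "chain_cafe"), ("cat", "chain_cafe"),
]
loyalty = loyalty_by_venue(checkins)
```

In this toy data both visitors to `indie_cafe` come back, while only one of the three `chain_cafe` visitors does, so the first venue scores higher even though the second has more distinct visitors.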
We can identify people who are Coffee Experts based on the number of different coffee places they go to, the number of tips they write there, and the number of likes/dislikes they give. If Coffee Experts frequent your place, that definitely helps your rating.
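A rough sketch of the first of those expertise signals: count how many distinct venues of a category each user has checked in to. The threshold `min_venues`, the function name, and the data are assumptions; tips and likes/dislikes are left out to keep the example short.

```python
from collections import defaultdict


def category_experts(checkins, venue_category, category, min_venues=3):
    """Return users who visited at least `min_venues` distinct venues of `category`.

    checkins: iterable of (user_id, venue_id) pairs.
    venue_category: dict mapping venue_id -> category name.
    The threshold is a hypothetical cut-off, not a documented value.
    """
    seen = defaultdict(set)  # user_id -> distinct venues visited in the category
    for user_id, venue_id in checkins:
        if venue_category.get(venue_id) == category:
            seen[user_id].add(venue_id)
    return {user for user, venues in seen.items() if len(venues) >= min_venues}


venue_category = {"c1": "coffee", "c2": "coffee", "c3": "coffee", "p1": "pizza"}
checkins = [
    ("ann", "c1"), ("ann", "c2"), ("ann", "c3"),
    ("bob", "c1"), ("bob", "c1"), ("bob", "p1"),
]
experts = category_experts(checkins, venue_category, "coffee")
```

Counting distinct venues rather than raw check-ins means a user who visits one coffee shop every day does not qualify as an expert on coffee shops in general.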
We don’t stop there. We use lots of other factors, like Tip Sentiment and Listiness, that go into the final Venue Rating. The checkin data is a very rich source of information which lets us come up with a very smart and nuanced Venue Rating score.
The times at which a place is popular are very important for ranking. Consider these signals about 3 places in NY. Each place has a different signature based on when people come to the place. Weekends have quite a different pattern from weekdays.
This plot shows 3 different places: Gorilla Coffee, Gray’s Papaya (a hot-dog place), and Amorino (an ice cream shop).
See how Gorilla Coffee is busier in the morning, whereas Amorino is busy in the evening.
Gray’s Papaya clearly has a strong lunch crowd, but also a late night peak on the weekends.
Including these signals is crucial to making a relevant recommendation. Again it’s the power of the checkin data which enables us to use temporal signals.
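One simple way to build such a signature is to bucket a venue's check-in timestamps by hour of day, separately for weekdays and weekends, and normalise each profile. This is a sketch under that assumption; the function name and sample timestamps are made up.

```python
from datetime import datetime


def popularity_signature(timestamps):
    """Return {"weekday": [24 shares], "weekend": [24 shares]}.

    Each list holds the share of that day type's check-ins falling in each
    hour of the day (0-23). An empty day type yields all zeros.
    """
    counts = {"weekday": [0] * 24, "weekend": [0] * 24}
    for ts in timestamps:
        kind = "weekend" if ts.weekday() >= 5 else "weekday"  # 5 = Sat, 6 = Sun
        counts[kind][ts.hour] += 1
    signature = {}
    for kind, hours in counts.items():
        total = sum(hours)
        signature[kind] = [h / total if total else 0.0 for h in hours]
    return signature


# Hypothetical check-ins: 1 June 2013 was a Saturday, 3 June a Monday.
timestamps = [
    datetime(2013, 6, 3, 8, 15),
    datetime(2013, 6, 4, 8, 40),
    datetime(2013, 6, 4, 13, 5),
    datetime(2013, 6, 1, 23, 30),
]
signature = popularity_signature(timestamps)
```

A ranker could then compare the current hour against the signature, so a place that peaks in the morning is favoured for a 9 a.m. query and demoted late at night.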
Because of checkin data, we can give personalized recommendations.
We can use our social graph to make recommendations based on your social network. We know where your friends have been and use that to make great recommendations.
This graph is a little tough to read, but basically tells you how likely you are to check in to the same place that one of your friends did.
Over on the left are the “explorers” (the red bar) who go to new places almost 70% of the time. And on the far right are the “followers,” who go to places their friends have been more than 80% of the time!
The median user (50th percentile) repeats a friend’s checkin about 65% of the time. Thus this signal is huge in making a relevant recommendation.
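A statistic like this can be sketched as follows: walk through check-ins in time order and, for each user, count the share of their check-ins at venues that one of their friends had already visited. The exact definition behind the graph is not given in the talk, so this is one plausible reading; all names and data are hypothetical.

```python
def friend_repeat_rate(checkins, friends):
    """Share of each user's check-ins at venues a friend visited earlier.

    checkins: list of (timestamp, user_id, venue_id), in any order.
    friends: dict mapping user_id -> set of friend user_ids.
    """
    visited = {}  # user_id -> venues that user has visited so far
    hits, totals = {}, {}
    for _, user, venue in sorted(checkins):  # process in time order
        totals[user] = totals.get(user, 0) + 1
        if any(venue in visited.get(f, ()) for f in friends.get(user, ())):
            hits[user] = hits.get(user, 0) + 1
        visited.setdefault(user, set()).add(venue)
    return {user: hits.get(user, 0) / n for user, n in totals.items()}


friends = {"ann": {"bob"}, "bob": {"ann"}}
checkins = [
    (1, "ann", "a"), (2, "bob", "a"), (3, "bob", "b"),
    (4, "ann", "b"), (5, "ann", "c"),
]
rates = friend_repeat_rate(checkins, friends)
```

Sorting the per-user rates and reading off percentiles would give a curve with "explorers" at the low end and "followers" at the high end.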
These signals are combined to give you the best possible result set. But I want to quickly show you some of the other interesting features that the checkin data powers.
Here’s one feature: Venue Similarity. You can see it on the right-hand side of a Venue Page.
Venue Similarity is powered by what we call the Place Graph.
The Place Graph is the network of connections between all places in the world. Here, the nodes are places, and the edges are a variety of different signals indicating the strength of the connection between places. There are a variety of signals to consider.
Flow is how often people move directly between two places.
Co-visitation is how many people have visited both places in the past.
Categories are a natural structure that determines similarity between places.
There is also a lot of free text describing places, such as menus, tips, and shouts, which implicitly connects places that share the same text.
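As an illustration of one edge type, co-visitation can be sketched by counting, for every pair of venues, the number of users who have checked in at both. The function name and data are assumptions, and a real graph over tens of millions of venues would need pruning that this toy version ignores.

```python
from collections import Counter, defaultdict
from itertools import combinations


def covisitation_edges(checkins):
    """Count users who visited both venues, for every venue pair.

    checkins: iterable of (user_id, venue_id) pairs.
    Returns a Counter keyed by (venue_a, venue_b) with venue_a < venue_b.
    """
    venues_by_user = defaultdict(set)
    for user_id, venue_id in checkins:
        venues_by_user[user_id].add(venue_id)
    edges = Counter()
    for venues in venues_by_user.values():
        # Each user contributes one count to every pair of venues they visited.
        for pair in combinations(sorted(venues), 2):
            edges[pair] += 1
    return edges


checkins = [
    ("ann", "a"), ("ann", "b"), ("ann", "c"),
    ("bob", "a"), ("bob", "b"), ("bob", "a"),
    ("cat", "c"),
]
edges = covisitation_edges(checkins)
```

The resulting counts are edge weights: the heavier the edge between two venues, the stronger the case for listing one as similar to the other.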
The data tells us about tourist venues: which places tourists and out-of-towners visit. It tells you to visit the Kremlin when you are in Moscow, and Notre-Dame Cathedral when you are in Paris. From the checkin data, we know when people take trips to cities that are not their hometown. We can look at what places they visit when they are tourists.
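A minimal sketch of that idea: for each venue, compute the share of check-ins made by users whose home city differs from the venue's city. It assumes each user's home city is already known (the talk implies it is inferred from check-in history but does not say how); the names and data are hypothetical.

```python
from collections import Counter


def tourist_share(checkins, home_city, venue_city):
    """Share of each venue's check-ins that come from out-of-towners.

    checkins: iterable of (user_id, venue_id) pairs.
    home_city: dict user_id -> city (assumed to be given).
    venue_city: dict venue_id -> city.
    """
    total, tourist = Counter(), Counter()
    for user_id, venue_id in checkins:
        total[venue_id] += 1
        if home_city[user_id] != venue_city[venue_id]:
            tourist[venue_id] += 1
    return {venue: tourist[venue] / n for venue, n in total.items()}


home_city = {"ann": "Paris", "bob": "Moscow"}
venue_city = {"kremlin": "Moscow", "local_cafe": "Moscow"}
checkins = [
    ("ann", "kremlin"), ("bob", "kremlin"),
    ("bob", "local_cafe"), ("bob", "local_cafe"),
]
shares = tourist_share(checkins, home_city, venue_city)
```

Venues with a high share and enough total check-ins are candidates for "things to see when visiting this city".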
The data tells us where people go next after visiting a venue. We can look at consecutive checkins from users and generate high-quality where-to-go-next data. I use this a lot to figure out what to do next. It is also great for suggesting tourist itineraries.
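Where-to-go-next data can be sketched by counting transitions between consecutive check-ins of the same user. The three-hour cut-off for treating two check-ins as consecutive is an assumption, as are the function name and the data; timestamps are plain hour values to keep the example short.

```python
from collections import Counter, defaultdict


def next_venue_counts(checkins, max_gap_hours=3.0):
    """Count venue -> next venue transitions per user.

    checkins: list of (user_id, time_in_hours, venue_id), in any order.
    Two check-ins form a transition when they belong to the same user, are
    adjacent in time, differ in venue, and are at most `max_gap_hours` apart.
    """
    transitions = defaultdict(Counter)
    previous = {}  # user_id -> (time, venue) of that user's last check-in
    for user, t, venue in sorted(checkins):  # sorted by user, then time
        if user in previous:
            prev_t, prev_venue = previous[user]
            if venue != prev_venue and t - prev_t <= max_gap_hours:
                transitions[prev_venue][venue] += 1
        previous[user] = (t, venue)
    return transitions


checkins = [
    ("ann", 9.0, "cafe"), ("ann", 10.0, "museum"),
    ("bob", 9.5, "cafe"), ("bob", 10.5, "museum"),
    ("cat", 9.0, "cafe"), ("cat", 11.0, "park"), ("cat", 20.0, "bar"),
]
transitions = next_venue_counts(checkins)
```

Calling `transitions["cafe"].most_common(1)` gives the most frequent next stop after the cafe; chaining such lookups is one way to assemble an itinerary.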
Checkin data lets you define accurate Venue Shapes. For large venues like SFO or Candlestick Park, this is very important.
Can you guess what the venue on the left is? What about the one on the right?
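One simple way to turn a cloud of check-in coordinates into a shape is a convex hull. The talk does not say how Venue Shapes are actually computed, so this is a swapped-in technique for illustration: Andrew's monotone chain algorithm, treating (longitude, latitude) as planar coordinates, which is a reasonable approximation at the scale of a single venue.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order.

    points: iterable of (x, y) tuples, e.g. (longitude, latitude).
    Collinear and interior points are dropped from the result.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Made-up check-in positions: four corners and one interior point.
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```

A convex hull is sensitive to stray check-ins far from the venue, so a practical version would likely filter outliers first.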
The data lets us figure out venue closures automatically. We don’t have to wait for a human to report a closure. This is the checkin pattern for a closed venue in New York.
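A closure pattern like this can be flagged with a very simple rule: a venue that used to get steady check-ins and then gets none for a sustained period. The sketch below assumes daily check-in counts as input; the 30-day window and the minimum historical rate are hypothetical thresholds, not values from the talk.

```python
def looks_closed(daily_counts, recent_days=30, min_history_mean=1.0):
    """Flag a venue whose check-ins have stopped after steady traffic.

    daily_counts: list of check-ins per day, oldest first.
    Returns True when the historical mean is at least `min_history_mean`
    check-ins per day and the most recent `recent_days` days have none.
    """
    if len(daily_counts) <= recent_days:
        return False  # not enough history to compare against
    history = daily_counts[:-recent_days]
    recent = daily_counts[-recent_days:]
    history_mean = sum(history) / len(history)
    return history_mean >= min_history_mean and sum(recent) == 0


closed_venue = [5] * 90 + [0] * 30   # steady traffic, then silence
open_venue = [5] * 120               # steady traffic throughout
```

Here `looks_closed(closed_venue)` is true and `looks_closed(open_venue)` is false. A rule this blunt would also flag seasonal venues, so it is better treated as a trigger for review than as a final verdict.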
To summarize, the data tells us a lot about people. It tells us where they move in space and in time.
All of this data gives us a ton of insight into how people move around in the world. It’s a unique dataset and, as you have seen, we can build rich and interesting features on top of it. I believe we have so far just scratched the surface of what we can do with it.