1. Machine Learning and
Big Data at Foursquare
Blake Shaw, PhD
Data Scientist @ Foursquare
@metablake
2.
3. What is foursquare?
An app that helps you
explore your city and
connect with friends
A platform for location
based services and
4. What is foursquare?
People use foursquare to:
• check in to places
• discover new places
• share w/ friends
• get tips about places
• get deals
• earn points and badges
10. Learning with location data
• Check-ins are a rich source of data that
describe human behavior
• We apply machine learning algorithms to
the collective movement patterns of millions
of people to build exciting new services
11. Recommendation engine
• foursquare explore
provides realtime
recommendations using:
• location
• time of day
• check-in history
• friends' preferences
15. Open questions
• How to measure similarity between people
and places?
• How to determine influence in large
networks of people and places?
• What statistics can we use to describe
people’s behavior in the real world?
• How do we predict what information will be
16. Our data stack
• MongoDB
• Amazon S3, Elastic MapReduce
• Hadoop
• Hive
• Flume
• R and MATLAB
17. Join us!
foursquare is hiring!
85+ people and growing
foursquare.com/jobs
Blake Shaw
@metablake
blake@foursquare.com
Editor's Notes
At foursquare, we think there is a great opportunity to leverage massive amounts of location data to help people better understand and connect to places.
So, what is foursquare? It’s an app that helps you explore your city and connect with friends. It’s also a platform for people to build location-based services and to collect and share location data.
People on foursquare “check in” on their phones when they get to a place to find out more about it, share with friends that they are there, and so on.
Foursquare is in a unique place, sitting at the intersection of mobile, social, and geo.
Foursquare is generating a ton of data: every second, 35 people check in to a location. This data offers an unprecedented view into the behavior of millions of people worldwide as they move around cities.
Here we see the growth of the service over the last two years, since it started in mid-2009.
Foursquare now has data on over 25 million places all over the world.
Check-ins are a rich source of information describing human behavior. We apply machine learning algorithms to the collective movement patterns of millions of people to build exciting new services. We use a variety of ML algorithms: collaborative filtering, PageRank, clustering, classification, and regression.
For example, last year we launched foursquare explore, a recommendation engine that uses a variety of signals to recommend, in real time, places that a user might be interested in. Explore uses a variety of machine learning models to rank venues. We combine many signals, including the location of the user and the time of day, the person’s past check-in history, the places their friends check in, and the similarities between different venues.
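To make the idea of combining signals concrete, here is a minimal sketch that ranks candidate venues with a hand-weighted sum of four signals. It is not how explore actually works: the `VenueSignals` fields, the `WEIGHTS` values, the proximity formula, and the sample venues are all assumptions invented for illustration, and a real system would learn its model from data.

```python
from dataclasses import dataclass


@dataclass
class VenueSignals:
    """Hypothetical per-venue signals, each already computed for one user."""
    name: str
    distance_km: float       # distance from the user's current location
    time_of_day_fit: float   # 0..1, how well the venue suits the current hour
    history_affinity: float  # 0..1, match with the user's past check-ins
    friend_affinity: float   # 0..1, how often the user's friends check in here


# Illustrative weights only; these are assumptions, not learned values.
WEIGHTS = {"proximity": 0.3, "time": 0.2, "history": 0.3, "friends": 0.2}


def score(venue: VenueSignals) -> float:
    """Combine the signals into a single ranking score (higher is better)."""
    proximity = 1.0 / (1.0 + venue.distance_km)  # nearer venues score higher
    return (
        WEIGHTS["proximity"] * proximity
        + WEIGHTS["time"] * venue.time_of_day_fit
        + WEIGHTS["history"] * venue.history_affinity
        + WEIGHTS["friends"] * venue.friend_affinity
    )


def rank(venues):
    """Return venues sorted from most to least recommended."""
    return sorted(venues, key=score, reverse=True)


# Made-up candidates for a single request.
candidates = [
    VenueSignals("corner deli", 0.3, 0.5, 0.1, 0.0),
    VenueSignals("far museum", 5.0, 0.4, 0.9, 0.1),
    VenueSignals("noodle bar", 0.5, 0.9, 0.2, 0.8),
]
ranked_names = [v.name for v in rank(candidates)]
```

A linear combination is only the simplest way to merge signals; it is used here because each weight is easy to read and adjust.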
Consider these signals about places. Each place has a different signature based on who is coming to the place, when, and for how long. This plot shows three different places: Gorilla Coffee, Gray’s Papaya, and Amorino (a restaurant). See how Gorilla Coffee is busier in the morning, whereas Amorino is busy in the evening. Gray’s Papaya clearly has a strong lunch crowd, but also a late-night peak on the weekends. How can we use machine learning to learn from these signals which places are similar?
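One simple way to compare such time signatures, offered here only as a sketch, is to turn each venue's check-in hours into a normalized 24-bin histogram and compare histograms with cosine similarity. The venue names and check-in hours below are made up, and cosine similarity is an assumed choice of measure, not necessarily the one used at foursquare.

```python
import math
from collections import Counter


def hourly_signature(checkin_hours):
    """Return a 24-bin histogram of check-in hours, normalized to sum to 1."""
    total = len(checkin_hours)
    if total == 0:
        return [0.0] * 24
    counts = Counter(checkin_hours)
    return [counts.get(hour, 0) / total for hour in range(24)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up check-in hours (0-23) for three hypothetical venues.
coffee_shop = [7, 8, 8, 9, 9, 9, 10, 15]
espresso_bar = [7, 8, 9, 9, 10, 11]
dinner_spot = [18, 19, 19, 20, 20, 21, 21, 22]

sig_coffee = hourly_signature(coffee_shop)
sig_espresso = hourly_signature(espresso_bar)
sig_dinner = hourly_signature(dinner_spot)

morning_pair = cosine_similarity(sig_coffee, sig_espresso)
mixed_pair = cosine_similarity(sig_coffee, sig_dinner)
```

With this toy data the two morning venues score as more similar to each other than either does to the dinner venue, which shares no busy hours with them.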
We also have unique signals that describe people: which people are friends, who is checking in together, and so on. From check-ins we can build a large colocation network that can be used to better understand how people interact with each other in the real world. Here we see an example of graph embedding applied to the foursquare employee network. People are placed near each other in 2D if they often colocate at similar places.
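As a rough sketch of how a colocation network could be built from check-ins, the code below links two users whenever they check in at the same venue within a time window, and counts how often that happens. The `(user, venue, minute)` record layout, the 60-minute window, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def colocation_edges(checkins, window_minutes=60):
    """Build weighted colocation edges from (user, venue, minute) records.

    Two users share an edge each time they check in at the same venue
    within `window_minutes` of each other; the weight counts such events.
    """
    by_venue = defaultdict(list)
    for user, venue, minute in checkins:
        by_venue[venue].append((minute, user))

    weights = defaultdict(int)
    for visits in by_venue.values():
        for (t1, u1), (t2, u2) in combinations(visits, 2):
            if u1 != u2 and abs(t2 - t1) <= window_minutes:
                # Sort the pair so (a, b) and (b, a) map to the same edge.
                weights[tuple(sorted((u1, u2)))] += 1
    return dict(weights)


# Made-up check-ins: (user, venue, minutes since midnight).
checkins = [
    ("ana", "cafe", 540),
    ("ben", "cafe", 555),
    ("cy", "cafe", 560),
    ("ana", "park", 900),
    ("ben", "park", 910),
    ("cy", "park", 1200),
]
edges = colocation_edges(checkins)
```

The pairwise comparison per venue is quadratic in the number of visits, which is fine for a sketch but would need windowed or sorted scanning at the scale of millions of check-ins.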
Different parts of this map line up with the different places in the world where foursquare employees live. This plot was made by applying minimum volume embedding, a nonlinear graph-based dimensionality reduction algorithm, to the foursquare employee network. Each person on this map can be described by thousands of numbers showing how often they visit different places. The goal is to reduce the dimensionality of this space to 2D while preserving the strong pairwise relationships.
We are constantly considering the best ways to address many of these questions.
All of this is possible because of our world-class data stack. Amazon S3 and EC2 give us on-demand access to huge computational resources.
Thanks so much. Foursquare is hiring. If these projects seem interesting to you, please contact us at foursquare.com/jobs.
AFINN sentiment analysis word list
Friend graph for marriage equalitiocalypse: which friends checked in at this event