Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
From check-ins to recommendations 
Jon Hoffman @hoffrocket 
QCon NYC – June 11, 2014
Watch the video with slide 
synchronization on InfoQ.com! 
http://www.infoq.com/presentations 
/scale-foursquare 
InfoQ.co...
Presented at QCon New York 
www.qconnewyork.com 
Purpose of QCon 
- to empower software development by facilitating the sp...
About Foursquare
Scaling in two parts 
• Part one: data storage 
• Part two: application complexity
Part 1: Data Storage 
2009
Table splits 
DB.A 
DB Checkins 
Venues 
Checkins 
Users 
Friends 
Venues 
DB.B 
Users 
Friends
Replication 
Master 
RW 
Slave 
RO 
Slave 
RO
Outgrowing our hardware 
• Not enough RAM for indexes and 
working data set 
• 100 writes/second/disk
Sharding 
• Manage ourselves in application code on 
top of postgres? 
• Use something called Cassandra? 
• Use something ...
Besides Mongo 
• Memcache 
• Elastic search 
– nearby venue search 
– user search 
• Custom data services 
– Read only key...
HFile Service: Read only KV Store 
Hadoop HFile Servers 
MR HDFS 
hfile_0_a 
hfile_0_b 
hfile_1_b 
hfile_0 
hfile_1 
Appli...
Caching Services 
Mongo Oplog Tailer Kafka 
Kafka 
Consumers 
Redis 
Cache 
Servers 
Application 
Servers 
getUserVenueCou...
Part 2: application complexity 
2009
RPC Tracing
Throttles
Remember the goats?
Monolithic problems 
• Compiling all the code, all the time 
• Deploying all the code all the time 
• Hard to isolate caus...
SOA Infancy 
• Single codebase, Multiple builds 
Web 
API 
Offline
Finagle Era 
• Twitter’s scala based RPC library 
service 
Geocoder 
{ 
GeocodeResponse 
geocode( 
1: 
GeocodeRequest 
r 
...
Benefits 
• Independent compile targets 
• Fined grained control on releases and 
bug fixes 
• Functional isolation
Problems 
• Duplication in packaging and 
deployment efforts 
• Hard to trace execution problems 
• Hard to define/change ...
Builds and deploys 
• single service definition file 
• consistent build packaging 
• simple deployment of canary & fleet ...
Monitoring 
• healthcheck endpoint over http 
• consistent metric names 
• dashboard for every service
Distributed Tracing
Exception Aggregation
Application Discovery 
• Finagle Server Sets + ZK
Circuit Breaking 
• Fast failing RPC calls after some error 
rate threshold 
• Loosely based on Netflix’s hystrix
SOA Problem Recap 
• Duplication in packaging and deployment efforts 
– Build and deploy automation 
• Hard to trace execu...
Organization 
• Smaller teams owning front to back 
implementation of features 
• Desire to have quick deploy cycles on 
n...
Remote Endpoints 
Wouldn’t it be cool if a developer 
could expose a new API endpoint 
without redeploying our still 
mono...
Remote Endpoint Benefits 
• Very easy to experiment with new 
endpoints 
• Tight contract for service interaction 
– JSON ...
Future work: Part 3? 
• Further isolating services with 
independent storage layers? 
• Completely automated continuous 
d...
Thanks! 
• Want to build these things? 
https://foursquare.com/jobs 
• jon@foursquare.com
Watch the video with slide synchronization on 
InfoQ.com! 
http://www.infoq.com/presentations/scale-foursquare
Scaling Foursquare: From Check-ins to Recommendations
Scaling Foursquare: From Check-ins to Recommendations
Scaling Foursquare: From Check-ins to Recommendations
Scaling Foursquare: From Check-ins to Recommendations
Scaling Foursquare: From Check-ins to Recommendations
Scaling Foursquare: From Check-ins to Recommendations
Scaling Foursquare: From Check-ins to Recommendations
Scaling Foursquare: From Check-ins to Recommendations
Upcoming SlideShare
Loading in …5
×

Scaling Foursquare: From Check-ins to Recommendations

1,722 views

Published on

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1FQhXpx.

Jon Hoffman discusses the general architecture, storage systems and development practices created to handle the ever increasing volume and complexity at Foursquare. Filmed at qconnewyork.com.

Jon Hoffman has been a software engineer at Foursquare for over 4 years. He's led teams building features and backend infrastructure. Before Foursquare Jon spent a brief time working at a three person startup, built distributed systems at Goldman Sachs, worked on VoIP apps for the telephone company, created a medical record app for palm pilot, and graduated from Carnegie Mellon.

Published in: Technology
  • Be the first to comment

Scaling Foursquare: From Check-ins to Recommendations

  1. 1. From check-ins to recommendations Jon Hoffman @hoffrocket QCon NYC – June 11, 2014
  2. 2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /scale-foursquare InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  3. 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  4. 4. About Foursquare
  5. 5. Scaling in two parts • Part one: data storage • Part two: application complexity
  6. 6. Part 1: Data Storage 2009
  7. 7. Table splits DB.A DB Checkins Venues Checkins Users Friends Venues DB.B Users Friends
  8. 8. Replication Master RW Slave RO Slave RO
  9. 9. Outgrowing our hardware • Not enough RAM for indexes and working data set • 100 writes/second/disk
  10. 10. Sharding • Manage ourselves in application code on top of postgres? • Use something called Cassandra? • Use something called HBase? • Use something called Mongo?
  11. 11. Besides Mongo • Memcache • Elastic search – nearby venue search – user search • Custom data services – Read only key value server – in memory cache with business logic
  12. 12. HFile Service: Read only KV Store Hadoop HFile Servers MR HDFS hfile_0_a hfile_0_b hfile_1_b hfile_0 hfile_1 Application Servers Zookeeper: - data type to machine mapping - key hash to shard mapping hfile_1_a
  13. 13. Caching Services Mongo Oplog Tailer Kafka Kafka Consumers Redis Cache Servers Application Servers getUserVenueCounts( 1: list<i64> userIds 2: list<ObjectId> venues)
  14. 14. Part 2: application complexity 2009
  15. 15. RPC Tracing
  16. 16. Throttles
  17. 17. Remember the goats?
  18. 18. Monolithic problems • Compiling all the code, all the time • Deploying all the code all the time • Hard to isolate cause of performance regressions and resource leaks
  19. 19. SOA Infancy • Single codebase, Multiple builds Web API Offline
  20. 20. Finagle Era • Twitter’s scala based RPC library service Geocoder { GeocodeResponse geocode( 1: GeocodeRequest r ) }
  21. 21. Benefits • Independent compile targets • Fined grained control on releases and bug fixes • Functional isolation
  22. 22. Problems • Duplication in packaging and deployment efforts • Hard to trace execution problems • Hard to define/change where things live • Networks aren’t reliable
  23. 23. Builds and deploys • single service definition file • consistent build packaging • simple deployment of canary & fleet ./service_releaser –j service_name
  24. 24. Monitoring • healthcheck endpoint over http • consistent metric names • dashboard for every service
  25. 25. Distributed Tracing
  26. 26. Exception Aggregation
  27. 27. Application Discovery • Finagle Server Sets + ZK
  28. 28. Circuit Breaking • Fast failing RPC calls after some error rate threshold • Loosely based on Netflix’s hystrix
  29. 29. SOA Problem Recap • Duplication in packaging and deployment efforts – Build and deploy automation • Hard to trace execution problems – Monitoring consistency – Distributed Tracing – Error aggregation • Hard to define/change where things live – Application discovery with zookeeper • Networks aren’t reliable – Circuit breaking
  30. 30. Organization • Smaller teams owning front to back implementation of features • Desire to have quick deploy cycles on new API endpoints
  31. 31. Remote Endpoints Wouldn’t it be cool if a developer could expose a new API endpoint without redeploying our still monolithic API server?
  32. 32. Remote Endpoint Benefits • Very easy to experiment with new endpoints • Tight contract for service interaction – JSON responses – all http params passed along • Clear path to breaking off more chunks from API monolith
  33. 33. Future work: Part 3? • Further isolating services with independent storage layers? • Completely automated continuous deployment • Hybrid immutable/mutable data storage – mongo & hfile & cache service
  34. 34. Thanks! • Want to build these things? https://foursquare.com/jobs • jon@foursquare.com
  35. 35. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/scale-foursquare

×