Ammon Bartram
• Wrote the iOS Socialcam app
• Previously lead video engineer at Justin.tv
• Used to own a cow named Snork Maiden
Guillaume Luccisano
• Built the Socialcam backend
• Previously worked on Rails performance at Justin.tv
• French dude exploring the valley
We grew explosively... and survived. Here is how.
We’d just finished Y Combinator and were growing quickly.
Simple Rails stack: 15 servers in San Francisco, Postgres/Mongo DBs.
Everything was working
Then...
BOOM! True viral growth.
30k new users an hour and growing... Things start to break.
Not enough CPU!
Day 1
Add boxes in the colo? We can’t: little space in the colo, and servers take 2 weeks to arrive.
Add EC2 instances? How can servers on AWS talk to our database? SSH tunnels! Then a VPN!
Are video comments essential? No. OK, kill them!
Is non-US traffic essential? No. OK, kill it!
Yeah baby! Kill switches are born!
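A kill switch can be as small as a flag check in front of each expensive feature. A minimal sketch, assuming a Redis set as the flag store; the module and key names here are ours, not Socialcam's actual code:

require "redis"

# Hypothetical kill-switch helper: feature flags live in a Redis set so they
# can be flipped at runtime without a deploy.
module KillSwitch
  REDIS = Redis.new

  def self.killed?(feature)
    REDIS.sismember("killed_features", feature.to_s)
  end

  def self.kill!(feature)
    REDIS.sadd("killed_features", feature.to_s)
  end

  def self.revive!(feature)
    REDIS.srem("killed_features", feature.to_s)
  end
end

# In a controller, shed the feature instead of the whole site:
# return head(:service_unavailable) if KillSwitch.killed?(:video_comments)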
Bare-bones deployment... push code live to the production boxes (and no more test suite :p).
It’s 9am, time to sleep... Got back to work the same day at 11am... The site is up, but not healthy!
Day 2
Move things to workers!
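The deck doesn't name the queueing library; purely as an illustration, here is what moving slow work onto a Resque-style background worker looks like:

require "resque"

# Hypothetical job: anything slow (notifications, thumbnails, analytics) moves
# out of the web request and onto worker boxes that drain a queue.
class NotifyFollowersJob
  @queue = :notifications

  def self.perform(video_id)
    # look up the video and fan out notifications here
  end
end

# In the request path, enqueue instead of doing the work inline:
# Resque.enqueue(NotifyFollowersJob, video.id)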
Add more monitoring
Add instances. Amazon has resource limits. We hit them. Emergency late-night calls. Thanks, Tim!
DB is breaking: too many reads!
Buy more time!
We run our main DB in our own colo - a big SSD box.
We order faster drives and more memory (on amazon.com :p).
And take down the site to upgrade. It’s bad, but sometimes it’s OK!
Caching!
Purpose of caching: minimize backend requests per user request.
Common caching techniques:
• Static full-page caching
• Server-side includes
• Ajax loading
• Fragment caching
Couldn’t do static :/ The API and website were too dynamic.
We used a hybrid solution, using a combination of them all depending on what makes sense.
API list requests are cached with a templating system (à la server-side includes).
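Their templating system itself isn't shown in the deck; the rough idea, sketched here in our own code, is to cache each item's rendered JSON once and only stitch the list together per request:

# Sketch: render (and invalidate) each video's JSON once, assemble lists on
# the fly - the server-side-include idea applied to an API response.
def render_video_list(video_ids)
  fragments = video_ids.map do |id|
    Rails.cache.fetch("video_json/#{id}") do
      Video.find(id).to_json
    end
  end
  "[" + fragments.join(",") + "]"
end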
We make heavy use of fragment caching on the website.
Video lists are loaded through Ajax on the website.
Object caching: reduce database hits per backend request.
We ❤ Memcache - a super-fast, in-memory key/value store.
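Object caching through Rails' cache layer is a one-liner; a generic example (the key naming is ours):

# Serve a Video object from Memcache when possible, hitting Postgres only on a
# cache miss.
def cached_video(id)
  Rails.cache.fetch("video/#{id}") do
    Video.find(id)
  end
end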
Optimizing Memcache ;) Our custom tricks!
Avoid key expiration!
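The slide doesn't spell the trick out; one common reading (an assumption on our part) is to refresh cached values explicitly when the data changes, rather than letting TTLs expire hot keys and stampede the database:

# Write-through refresh: update the cached copy on save instead of relying on
# a TTL to expire it and force a miss later.
class Video < ActiveRecord::Base
  after_save :refresh_cache

  def refresh_cache
    Rails.cache.write("video/#{id}", self)
  end
end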
Use get multi: Video.get_by_ids([1, 2, 3, 4, 5]) = one Memcache call.
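Video.get_by_ids is Socialcam's method; a plausible body (ours, not theirs) batches the cache reads into one round trip and backfills misses with a single query:

class Video < ActiveRecord::Base
  def self.get_by_ids(ids)
    keys    = ids.map { |id| "video/#{id}" }
    cached  = Rails.cache.read_multi(*keys)       # one memcache call
    missing = ids.reject { |id| cached.key?("video/#{id}") }
    fresh   = where(id: missing).index_by(&:id)   # one SQL query for the misses
    fresh.each { |id, video| Rails.cache.write("video/#{id}", video) }
    ids.map { |id| cached["video/#{id}"] || fresh[id] }
  end
end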
Compress and serialize!
Cache = Rails.cache.instance_variable_get("@data")
SnappyCache = Memcached::SnappyCache.new(Cache)
SnappyYajlCache = Memcached::SnappyYajlCache.new(Cache)
SnappyMarshalCache = Memcached::SnappyMarshalCache.new(Cache)
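Memcached::SnappyCache and friends are Socialcam's own wrappers around the raw client pulled out of Rails.cache above. A toy equivalent of ours (using the snappy and yajl-ruby gems) just compresses the serialized value before storing it:

require "snappy"
require "yajl"

# Toy wrapper in the spirit of SnappyYajlCache: JSON-encode the value, then
# Snappy-compress it, so less data crosses the wire and sits in memory.
class SnappyJsonCache
  def set(key, value)
    Rails.cache.write(key, Snappy.deflate(Yajl::Encoder.encode(value)))
  end

  def get(key)
    blob = Rails.cache.read(key)
    blob && Yajl::Parser.parse(Snappy.inflate(blob))
  end
end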
DB is breaking again: too many writes!
(And Mongo doesn’t scale)
For us...
Redis to the rescue!
Redis is like Memcache, but supports complex structures: lists, sets, sorted sets, etc.
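Those structures map directly onto redis-rb calls; for example (the keys here are invented for illustration):

require "redis"
redis = Redis.new

redis.lpush("feed:42", 1001)              # list: newest video ids first
redis.sadd("followers:42", 7)             # set: who follows user 42
redis.zadd("popular", 250, "video:1001")  # sorted set: videos scored by view count
redis.zrevrange("popular", 0, 9)          # top 10 videos by score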
Initially, we stored videos in Postgres and computed feeds per request. Then we de-normalized them into Mongo...
This broke.
We switched to a canonical, normalized list of videos stored in PostgreSQL...
Coupled with a de-normalized video feed for active users, stored in Redis.
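A sketch of how such a de-normalized feed can work - fan-out on write, capped per user, and only for active users. The helper names, associations, and the $redis global are our assumptions:

FEED_LENGTH = 200

# When a video is posted, push its id onto each active follower's Redis list;
# Postgres remains the canonical store, Redis is just the hot read path.
def fan_out_video(video)
  video.user.follower_ids.each do |follower_id|
    next unless active_user?(follower_id)   # hypothetical "is this user active?" check
    key = "feed:#{follower_id}"
    $redis.lpush(key, video.id)
    $redis.ltrim(key, 0, FEED_LENGTH - 1)   # cap the feed length
  end
end

# Reading the feed is one Redis call plus a cached multi-get on the ids:
# Video.get_by_ids($redis.lrange("feed:#{user_id}", 0, 49))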
The Mongo problem was fixed, but all was still not well.
We had 1 billion rows stored in Postgres (and there was only 25GB left on our main DB).
Postgres could not take it
Joins are evil. We replaced them with application logic.
Then you can move tables to their own DBs.
(Don’t tell anyone: it’s OK to duplicate some data to kill joins.)
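Concretely, killing a join means issuing two indexed queries from the application, which then no longer need to live in the same database (a generic example, not Socialcam's schema):

# Instead of: SELECT videos.* FROM videos JOIN relations ON videos.user_id = relations.followed_id ...
# do two lookups that can each hit a different server:
followed_ids = Relation.where(follower_id: user.id).pluck(:followed_id)
videos       = Video.where(user_id: followed_ids).order("created_at DESC").limit(50)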
It was not enough! The followers table was too big for one server :/
Time to shard
Sharding is cutting a table into multiple pieces, each on its own server.
Application logic routes queries to the right shard based on some field (id, name, country...).
You can make that simple or complex.
We decided to go simple :) user_id % N
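With user_id % N the routing is pure arithmetic; a minimal sketch (the table naming and the value of N are made up for illustration):

NUM_SHARDS = 16   # N - an illustrative value

# Map a user to one of N physical tables (relations_0 ... relations_15), each
# of which can live on its own database server.
def relation_table_for(user_id)
  "relations_#{user_id % NUM_SHARDS}"
end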
Relation.shard(user_id).count
With dynamic migration:
class Relation
  def self.shard(user_id)
    if Redis.sismember("relation_shard", user_id)
      # return new correct sharded table
    else
      # return old relation table
    end
  end
end
To conclude: we survived!
By... making heavy use of AWS and open source. Postgres is awesome. Redis is awesome. HAProxy is awesome.
By not fearing hacks and not being afraid of breaking things.
By turning off expensive, non-vital features (or entire countries; sorry, Brazil!).
By calling in help from friends.
And most importantly...
By only fixing things that were broken.

STP201 Efficiency at Scale - AWS re:Invent 2012

In May of 2012, Socialcam exploded, gaining tens of millions of new users in just a few weeks. At the time, the service ran on 15 servers in a co-location facility in San Francisco. To meet new user traffic demands and continue to deliver maximum user satisfaction, Socialcam made the move to cloud services. With only two engineers and a constant barrage of users, there was limited time for technical transition, but Socialcam endured with no significant downtime. In this technical session, Socialcam co-founders Guillaume Luccisano and Ammon Bartram talk about their experience scaling Socialcam. They present the challenges they encountered, how they addressed them, and the technologies they used in the process. They focus particularly on how they used Amazon services in conjunction with their own hardware to keep Socialcam active with no significant downtime and no costly system redesign.
