Scaling 40x on the ObjectRocket MongoDB Platform
Jon Hyman & Kenny Gorman
MongoDB World, June 25, 2014
NYC

How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRocket MongoDB Platform

  1. Scaling 40x on the ObjectRocket MongoDB Platform
     Jon Hyman & Kenny Gorman
     MongoDB World, June 25, 2014, NYC
     @appboy @objectrocket @jon_hyman @kennygorman
  2. A LITTLE BIT ABOUT JON & APPBOY
     Jon Hyman, CIO :: @jon_hyman
     Appboy is a marketing automation platform for apps
     Harvard, Bridgewater
  3. A LITTLE BIT ABOUT KENNY & OBJECTROCKET
     Kenny Gorman, Co-Founder & Chief Architect :: @kennygorman
     ObjectRocket is a highly available, sharded, unbelievably fast MongoDB as a service
     ObjectRocket, eBay, Shutterfly
  4. Agenda
     • Evolution of Appboy’s MongoDB installation as we grew to handle billions of data points per month
     • Operational MongoDB issues we worked through
  5. MongoDB Evolution: March, 2013
     [Timeline: Mar 2013 to Mar 2014]
  7. What did Appboy look like in March, 2013?
     • ~2.5 million events per day, tracking 8 million users
     • Event storage: every data point as a new document
     • Single, unsharded replica set on AWS (m2.xlarge)
     • Mostly long-tail customers; biggest app had 2M users
     Growing a lot on disk. :-(
     Started running into locking issues (30-40%). :-(
  8. MongoDB Evolution: April, 2013
     [Timeline: Mar 2013 to Mar 2014. Milestones so far: Scaled vertically]
  10. What happened in April, 2013?
      • First enterprise client signs
      • More than 50 million users
      • They estimated sending us over 1 billion data points per month
      “Btw, we’re going live next month”
  11. MongoDB Evolution: April, 2013: holy crap!
  12. ObjectRocket: Getting Started
      • The landscape of a simple configuration
      • It’s all about choosing shard keys
      • Locks - you know you love them
      [Chart labels: 20%, 80%]
  13. What are we going to do?
      • Contain growth from data points:
        • Shifted to Amazon Redshift for “raw data”
        • Moved MongoDB to storing pre-aggregated analytics for time series data
      • Figure out sharding ASAP
      • Moved to ObjectRocket, worked on shard key selection
      • Sharding was hard:
        • Tough to figure out the right shard key, make tradeoffs
        • Rewrite a lot of application code to include shard keys in queries, inserts, adjust to life without unique indexes
  15. Shard key selections
      • Users
      • Had multiple ways to identify a user
      • Device identifier, “external user id”, BSON ID
      • Often performed large scans of user bases
      {_id: "hashed"}
      • Cache secondary identifiers to BSON ID to reduce scatter-gather queries
      • Doing scatter-gathers goes against conventional wisdom
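A rough sketch of the identifier cache idea on this slide, in Python. With a hashed `_id` shard key, a lookup by `_id` can be routed to a single shard, while a lookup by device identifier has to ask every shard. The slides do not say what cache Appboy used, so a plain dictionary stands in for it, and `Cluster`, `find_by_device`, `lookup`, and the `device_id` field name are all hypothetical names for illustration.

```python
import hashlib

NUM_SHARDS = 4  # assumption: matches the four shards mentioned later in the talk


def shard_for(user_id: str) -> int:
    """Stand-in for hashed sharding on _id: deterministic hash -> shard index."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


class Cluster:
    """Toy sharded user collection that counts how many shards each lookup touches."""

    def __init__(self):
        self.shards = [dict() for _ in range(NUM_SHARDS)]
        self.shards_touched = 0

    def insert(self, user):
        self.shards[shard_for(user["_id"])][user["_id"]] = user

    def find_by_id(self, user_id):
        # Targeted query: the shard key is known, so only one shard is asked.
        self.shards_touched += 1
        return self.shards[shard_for(user_id)].get(user_id)

    def find_by_device(self, device_id):
        # Scatter-gather: the shard key is unknown, so every shard is asked.
        self.shards_touched += NUM_SHARDS
        for shard in self.shards:
            for user in shard.values():
                if user["device_id"] == device_id:
                    return user
        return None


def lookup(cluster, cache, device_id):
    """Resolve device_id -> _id via the cache; fall back to scatter-gather on a miss."""
    user_id = cache.get(device_id)
    if user_id is not None:
        return cluster.find_by_id(user_id)
    user = cluster.find_by_device(device_id)
    if user is not None:
        cache[device_id] = user["_id"]  # later lookups become targeted
    return user
```

In this toy model the first lookup for a device touches all four shards and every later lookup for the same device touches one.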
  16. Shard key selections
      • Pre-aggregated analytics
      • Always query history for a single app
      • 1 document per day per app per metric
      {app_id: 1}
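To make "1 document per day per app per metric" concrete, here is a minimal in-memory sketch in Python. It is not Appboy's schema: the field names (`app_id`, `metric`, `day`, `count`) and the `PreAggregatedStore` class are assumptions. In MongoDB itself the same pattern is usually written as an update with `$inc` and the upsert option, which this sketch imitates with `dict.setdefault`.

```python
from datetime import datetime, timezone


class PreAggregatedStore:
    """In-memory stand-in for a collection with one counter document per (app, metric, day)."""

    def __init__(self):
        self.docs = {}

    def record(self, app_id, metric, when, amount=1):
        day = when.strftime("%Y-%m-%d")
        key = (app_id, metric, day)
        # Upsert semantics: create the document on first sight, then increment in place.
        doc = self.docs.setdefault(
            key, {"app_id": app_id, "metric": metric, "day": day, "count": 0}
        )
        doc["count"] += amount
        return doc

    def history(self, app_id, metric):
        """All daily documents for one app and metric, oldest first."""
        rows = [
            d for d in self.docs.values()
            if d["app_id"] == app_id and d["metric"] == metric
        ]
        return sorted(rows, key=lambda d: d["day"])


store = PreAggregatedStore()
store.record("app1", "sessions", datetime(2013, 5, 1, 9, tzinfo=timezone.utc))
store.record("app1", "sessions", datetime(2013, 5, 1, 23, tzinfo=timezone.utc))
store.record("app1", "sessions", datetime(2013, 5, 2, 1, tzinfo=timezone.utc))
```

Three events produce only two documents here, one per day, which is why this layout grows far more slowly on disk than storing every data point as its own document.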
  17. MongoDB Evolution: May - October, 2013
      [Timeline: Mar 2013 to Mar 2014. Milestones so far: Scaled vertically, Start sharding, Everything sharded]
  18. What did Appboy look like in May - October, 2013?
      • textPlus goes live, as do other customers
      • > 1 billion events per month, doing great!
      • Four 100 GB shards on ObjectRocket
  19. MongoDB Evolution: November, 2013
      [Timeline: Mar 2013 to Mar 2014. Milestones so far: Scaled vertically, Start sharding, Everything sharded, Various customer launches]
  22. What happened in November, 2013?
      • One of the largest European soccer apps
      • Soccer games crushed us: 15 million data points per hour just from this app!
      • Lock percentage ran high, a single shard was pegged
      • Real-time analytics processing got severely delayed, adding more servers did not help (in fact, it made things worse)
      Why a single shard?
  23. Shard key selections
      • Pre-aggregated analytics
      • Always query history for a single app
      • 1 document per day per app per metric
      {app_id: 1}
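A toy Python model of why `{app_id: 1}` pegs one shard: with a ranged key on `app_id`, every document for a given app routes to the same place, so one very busy app concentrates its writes on one shard (15 million data points per hour is roughly 4,167 per second). The chunk boundaries, app names, and event counts below are invented for illustration and are not Appboy's data.

```python
import bisect
from collections import Counter

# Hypothetical chunk boundaries on app_id for a 4-shard cluster:
# shard 0 owns ids below "g", shard 1 owns ["g", "n"), shard 2 owns ["n", "t"),
# and shard 3 owns everything from "t" upward.
BOUNDARIES = ["g", "n", "t"]


def shard_for_app(app_id: str) -> int:
    """Ranged routing on {app_id: 1}: all documents of one app map to one shard."""
    return bisect.bisect_right(BOUNDARIES, app_id)


def writes_per_shard(events):
    """Count the writes each shard receives for a stream of (app_id, metric) events."""
    counts = Counter()
    for app_id, _metric in events:
        counts[shard_for_app(app_id)] += 1
    return counts


# One dominant app plus a few small ones (made-up numbers).
events = (
    [("soccer", "sessions")] * 1000
    + [("alpha", "sessions")] * 10
    + [("harbor", "sessions")] * 10
    + [("zeta", "sessions")] * 10
)
counts = writes_per_shard(events)
```

In this model adding shards does not spread the load of the dominant app, because its key value still maps to exactly one shard.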
  25. ObjectRocket: Capacity, Growth
      • Concurrency
      • Did I mention locks?
      • Cache management
      • Compaction
      • The shell game
      • Indexing at scale
  27. How to fix this?
      • Fundamentally, all updates are going to a single document
      • Can’t shard out a single document
      • Asked ObjectRocket for their suggestions
      Introduce write buffering
  28. Write buffering
      • Buffer writes to something that can be sharded out, then flush to MongoDB
      • Need something transactional, so MongoDB was out for this
      • Decided on multiple Redis instances:
        • Redis has native hash data structure with atomic hash increments, works nicely with MongoDB in this use-case
  29. Write buffering
      [Diagram: Incoming data, then Flush to MongoDB]
  30. Write buffering
      • Wrote write buffering over a weekend; buffered writes are flushed to MongoDB every 3 seconds
      Pre-aggregated analytics bottleneck was solved!
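The write-buffering idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the code Appboy wrote: the talk used multiple Redis instances, where the atomic increment is the `HINCRBY` command on a hash, while this sketch uses an in-process dictionary guarded by a lock so that it runs on its own. `WriteBuffer`, `maybe_flush`, and `flush_fn` are hypothetical names, and the caller supplies the clock.

```python
import threading
from collections import defaultdict


class WriteBuffer:
    """Accumulates counter increments and flushes them as one write per document."""

    def __init__(self, flush_fn, interval_seconds=3.0):
        self._pending = defaultdict(int)  # stand-in for a Redis hash
        self._lock = threading.Lock()     # stand-in for Redis's atomic increments
        self._flush_fn = flush_fn         # called once per key with the summed amount
        self._interval = interval_seconds
        self._last_flush = 0.0

    def incr(self, key, amount=1):
        with self._lock:
            self._pending[key] += amount

    def maybe_flush(self, now):
        """Flush if one interval has elapsed since the last flush; return writes issued."""
        if now - self._last_flush < self._interval:
            return 0
        with self._lock:
            # Swap the buffer out so new increments are not blocked by the flush.
            batch, self._pending = self._pending, defaultdict(int)
        self._last_flush = now
        for key, total in batch.items():
            self._flush_fn(key, total)  # e.g. a single increment update in MongoDB
        return len(batch)


written = []
buf = WriteBuffer(lambda key, total: written.append((key, total)))
for _ in range(12500):
    buf.incr(("soccer", "sessions", "2013-11-09"))
early = buf.maybe_flush(now=1.0)  # too soon, nothing is written
flushed = buf.maybe_flush(now=3.0)
```

If 15 million data points per hour arrived evenly, about 12,500 would land in each 3-second window; in this sketch they collapse into one write for the hot document instead of 12,500.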
  31. MongoDB Evolution: January, 2014
      [Timeline: Mar 2013 to Mar 2014. Milestones so far: Scaled vertically, Start sharding, Everything sharded, Various customer launches, Bad shard key hit upper limit, Added write buffering]
  32. What did Appboy look like in January, 2014?
      • > 3 billion events per month
      • Four 100 GB shards on ObjectRocket
      • Performance became badly bursty: at times the user experience slowed to a level we considered unacceptable for our customers
  34. Why was performance getting worse?
      • Appboy customers send millions of messages in a single campaign, most are sending hundreds of thousands to millions of messages each week
      • Campaign times tend to cluster together across all Appboy customers: evenings, Saturday/Sunday afternoons, etc.
      A lot of enormous read activity
      Reads and writes and more reads start conflicting :-(
      • Users visiting our dashboard during simultaneous large campaign sends would have sporadic poor performance
  35. ObjectRocket: Splits
      • Split out collections to different MongoDB clusters
      [Diagram: Before / After]
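On the application side, splitting collections across clusters comes down to a lookup from collection name to cluster. The Python sketch below only illustrates that lookup; the cluster names and collection names are made up, and the slides do not say which collections were actually moved.

```python
# Hypothetical cluster and collection names, for illustration only.
DEFAULT_CLUSTER = "cluster-main"
COLLECTION_TO_CLUSTER = {
    "users": "cluster-users",
    "analytics": "cluster-analytics",
}


def cluster_for(collection: str) -> str:
    """Return the cluster hosting a collection; unlisted collections stay on the default."""
    return COLLECTION_TO_CLUSTER.get(collection, DEFAULT_CLUSTER)
```

Keeping the mapping in one place means a collection can be moved to its own cluster by changing a single entry, so heavy reads on one collection stop competing with writes on another.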
  37. What did Appboy look like in February, 2014?
      • Splits helped
      • > 4 billion events per month
      • We needed more
      Isolation
  38. ObjectRocket: Isolation
      • Isolate large enterprise customers on their own MongoDB databases/clusters
      • Appboy built this in March, 2014
      [Diagram labels: Enterprise customer, Long-tail customer]
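Isolation adds a second routing dimension: the customer. A minimal Python sketch, assuming a per-app override table; `Router`, `isolate`, and the cluster names are hypothetical and say nothing about how Appboy implemented it.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Maps an app to the MongoDB cluster that holds its data."""

    shared_cluster: str = "shared"                # long-tail customers live here
    isolated: dict = field(default_factory=dict)  # app_id -> dedicated cluster

    def isolate(self, app_id: str, cluster: str) -> None:
        """Pin one enterprise app to its own cluster."""
        self.isolated[app_id] = cluster

    def cluster_for(self, app_id: str) -> str:
        return self.isolated.get(app_id, self.shared_cluster)


router = Router()
router.isolate("big-enterprise-app", "enterprise-1")
```

With this shape, a large campaign send for the isolated app only loads its own cluster, and long-tail customers on the shared cluster are unaffected.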
  39. Summary
      [Timeline: Mar 2013 to Mar 2014. Milestones: Scaled vertically, Start sharding, Everything sharded, Various customer launches, Bad shard key hit upper limit, Added write buffering, Start splitting DBs, Isolation]
  40. What’s next?
      • Figure out capacity planning
      • Continue down isolation path
      [Chart: axis from 0 to 60,000,000 in steps of 15,000,000]
  41. Thanks!
      jon@appboy.com
      kgorman@objectrocket.com
      @appboy @objectrocket @jon_hyman @kennygorman
