From 100s to 100s of Millions
Slides from Cassandra SF 2011: http://www.datastax.com/events/cassandrasf2011

From 100s to 100s of Millions
July 2011
Erik Onnen

About Me
• Director of Platform Engineering at Urban Airship (.75 years)
• Previously Principal Engineer at Jive Software (3 years)
• 12 years large scale, distributed systems experience going back to CORBA
• Cassandra, HBase, Kafka and ZooKeeper contributor - most recently CASSANDRA-2463

In this Talk
• About Urban Airship
• Systems Overview
• A Tale of Storage Engines
• Our Cassandra Deployment
• Battle Scars
  • Development Lessons Learned
  • Operations Lessons Learned
• Looking Forward

What is an Urban Airship?
• Hosting for mobile services that developers should not build themselves
• Unified API for services across platforms
• SLAs for throughput, latency

By The Numbers
• Over 160 million active application installs use our system across over 80 million unique devices
• Freemium API peaks at 700 requests/second, dedicated customer API 10K requests/second
  • Over half of those are device check-ins
  • Transactions - send push, check status, get content
• At any given point in time, we have ~ 1.1 million secure socket connections into our transactional core
• 6 months for the company to deliver 1M messages, just broke 4.2B

Transactional System
• Edge Systems:
  • API - Apache/Python/django+piston+pycassa
  • Device negotiation - Java NIO + Hector
  • Message Delivery - Python, Java NIO + Hector
  • Device data - Java HTTPS endpoint
• Persistence
  • Sharded PostgreSQL
  • Cassandra 0.7
  • MongoDB 1.7

A Tale of Storage Engines
• “Is there a NoSQL system you guys don’t use?”
  • Riak :)
• We do use:
  • Cassandra
  • HBase
  • Redis
  • MongoDB
• We’re converging on Cassandra + PostgreSQL for transactional and HBase for long haul

A Tale of Storage Engines
• PostgreSQL
  • Bootstrapped the company on PostgreSQL in EC2
  • Highly relational, large index model
  • Layered in memcached
  • Writes weren’t scaling after ~ 6 months
  • Continued to use for several silos of data but needed a way to grow more easily

A Tale of Storage Engines
• MongoDB
  • Initially, we loved Mongo
    • Document databases are cool
    • BSON is nice
  • As data set grew, we learned a lot about MongoDB
    • “MongoDB does not wait for a response by default when writing to the database.” (see the sketch below)

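For context on that quoted default: a minimal sketch, assuming the MongoDB Java driver of that era (the 2.x Mongo/DBCollection API) and hypothetical host, database, and collection names, of opting into acknowledged writes instead of the fire-and-forget behavior.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.Mongo;
    import com.mongodb.WriteConcern;

    public class AcknowledgedWriteExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical host/db/collection names.
            Mongo mongo = new Mongo("localhost", 27017);
            DBCollection devices = mongo.getDB("ua").getCollection("devices");

            // SAFE waits for the server's getLastError response, so a failed
            // insert surfaces as an exception instead of silently vanishing.
            devices.setWriteConcern(WriteConcern.SAFE);
            devices.insert(new BasicDBObject("device_id", "device-1234"));

            mongo.close();
        }
    }
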
A Tale of Storage Engines
• MongoDB - Read/Write Problems
  • Early days (1.2) one global lock (reads block writes and vice versa)
  • Later, one read lock, one write lock per server
  • Long running queries were often devastating
  • Replication would fall too far behind and stop
    • No writes or updates
    • Effectively a failure for most clients
  • With replication, queries for anything other than the shard key talk to every node in the cluster

A Tale of Storage Engines
• MongoDB - Update Problems
  • Simple updates (i.e. counters) were fine
  • Bigger updates commonly resulted in large scans of the collection depending on position == heavy disk I/O
  • Frequently spill to end of the collection datafile leaving “holes” but not sparse files
    • Those “holes” get MMap’d even though they’re not used
  • Updates moving data acquire multiple locks, commonly blocking other read/write operations

A Tale of Storage Engines
• MongoDB - Optimization Problems
  • Compacting a collection locks the entire collection
  • Read slave was too busy to be a backup, needed moar RAMs but were already on High-Memory EC2, nowhere else to go
  • Mongo MMaps everything - when your data set is bigger than RAM, you better have fast disks
  • Until 1.8, no support for sparse indexes

A Tale of Storage Engines
• MongoDB - Ops Issues
  • Lots of good information in mongostat
  • Recovering a crashed system was effectively impossible without disabling indexes first (not the default)
  • Replica sets never worked for us in testing, lots of inconsistencies in failure scenarios
  • Scattered records lead to lots of I/O that hurt on bad disks (EC2)

Cassandra at Urban Airship
• Summer of 2010 - no faith left in MongoDB, started a migration to Cassandra
• Lots of L&P testing, client analysis, etc.
• December 2010 - Cassandra backed 85% of our Android stack’s persistence
  • Six EC2 XLs, with each serving:
    • 30GB data
    • ~1000 reads/second/node
    • ~750 writes/second/node

Cassandra at Urban Airship
• Why Cassandra?
  • Well suited for most of our data model (simple DAGs)
    • Lots of UUIDs and hashes partition well
    • Retrievals don’t need ordering beyond keys or TSD
  • Rolling upgrades FTW
  • Dynamic rebalancing and node addition
  • Column TTLs huge for us (sketch below)
  • Awesome community :)

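A minimal sketch of the column-TTL point, assuming the Hector client named in the stack slide; the cluster, keyspace, and column family names ("ua", "Devices", "DeviceTokens") are hypothetical, and older Hector releases set the TTL on the HColumn rather than via the createColumn overload shown.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class TtlWriteExample {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("ua", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("Devices", cluster);
            Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());

            // Write a column that Cassandra expires server-side after 30 days,
            // instead of relying on an application-level cleanup job.
            int ttlSeconds = 30 * 24 * 60 * 60;
            mutator.insert("device-1234", "DeviceTokens",
                    HFactory.createColumn("apid", "token-value", ttlSeconds,
                            StringSerializer.get(), StringSerializer.get()));
        }
    }
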
Cassandra at Urban Airship
• Why Cassandra cont’d?
  • Particularly well suited to working around EC2 availability
  • Needed a cross AZ strategy - we had seen EBS issues in the past, didn’t trust fault containment w/n a zone
  • Didn’t want locality of replication so needed to stripe across AZs
  • Read repair and handoff generally did the right thing when a node would flap (Ubuntu #708920)
  • No SPoF
  • Ability to alter CLs on a per operation basis (sketch below)

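A minimal sketch of tuning consistency per use with Hector's ConfigurableConsistencyLevel policy (keyspace name hypothetical); holding separate Keyspace handles with different policies is one way to approximate per-operation consistency levels from the client.

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class ConsistencyLevelExample {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("ua", "localhost:9160");

            // Relaxed path: single-replica acks for data we can re-derive.
            ConfigurableConsistencyLevel relaxed = new ConfigurableConsistencyLevel();
            relaxed.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
            relaxed.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);
            Keyspace fastPath = HFactory.createKeyspace("Devices", cluster, relaxed);

            // Strict path: quorum reads and writes for data we cannot lose.
            ConfigurableConsistencyLevel strict = new ConfigurableConsistencyLevel();
            strict.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
            strict.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
            Keyspace safePath = HFactory.createKeyspace("Devices", cluster, strict);
        }
    }
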
Battle Scars - Development
• Know your data model
  • Creating indexes after the fact is a PITA
  • Design around wide rows (sketch below)
    • I/O problems
    • Thrift problems
    • Count problems
  • Favor JSON over packed binaries if possible
• Careful with Thrift in the stack
• Don’t fear the StorageProxy

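To make the wide-row point concrete, a minimal sketch (again Hector, with hypothetical cluster/keyspace/column family names): one row per device, one column per event keyed by timestamp, so a recent slice of activity is a single ordered row read rather than an index scan.

    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;
    import me.prettyprint.hector.api.query.QueryResult;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class WideRowExample {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("ua", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("Devices", cluster);

            // Row key = device id, column name = event timestamp, value = event type.
            // Columns sort by name, so time ordering falls out of the data model.
            Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
            mutator.insert("device-1234", "DeviceEvents",
                    HFactory.createColumn(System.currentTimeMillis(), "check-in",
                            LongSerializer.get(), StringSerializer.get()));

            // Read the newest 20 events for the device from the same single row.
            SliceQuery<String, Long, String> query = HFactory.createSliceQuery(
                    keyspace, StringSerializer.get(), LongSerializer.get(), StringSerializer.get());
            query.setColumnFamily("DeviceEvents");
            query.setKey("device-1234");
            query.setRange(null, null, true, 20);
            QueryResult<ColumnSlice<Long, String>> result = query.execute();
            System.out.println(result.get().getColumns().size() + " recent events");
        }
    }
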
Battle Scars - Development
• Assume failure in the client
  • Read timeout vs. connection refused
  • When maintaining your own indexes, try to clean up after failure
    • Be ready to clean up inconsistencies anyway
  • Verify client library assumptions and exception handling
    • Retry now vs. retry later?
    • Compensating action during failures?
• Don’t avoid the Cassandra code
• Embed for testing (sketch below)

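On "embed for testing": a minimal sketch, assuming the EmbeddedCassandraService helper that ships with the Cassandra source tree (0.7-era API; the exact class, start method, and config property vary by version) plus JUnit. The test config path is hypothetical.

    import org.apache.cassandra.service.EmbeddedCassandraService;
    import org.junit.BeforeClass;
    import org.junit.Test;

    public class DeviceStoreTest {
        private static EmbeddedCassandraService cassandra;

        @BeforeClass
        public static void startEmbeddedCassandra() throws Exception {
            // Point the embedded node at a throwaway config before it starts.
            System.setProperty("cassandra.config", "file:src/test/resources/cassandra.yaml");
            cassandra = new EmbeddedCassandraService();
            cassandra.start();
        }

        @Test
        public void roundTripsADeviceRecord() throws Exception {
            // Exercise the real DAO / Hector code against the in-process node here,
            // asserting on reads after writes instead of mocking the client.
        }
    }
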
Battle Scars - Ops
• Cassandra in EC2:
  • Ensure Dynamic Snitch is enabled
  • Disk I/O
    • Avoid EBS except for snapshot backups or use S3
    • Stripe ephemerals, not EBS volumes
    • Avoid smaller instances altogether
  • Don’t always assume traversing close proximity AZs is more expensive
  • Balance RAM cost vs. the cost of additional hosts and spending time w/ GC logs

Battle Scars - Ops
• Java Best Practices:
  • All Java services are managed via the same set of scripts
    • In most cases, operators don’t treat Cassandra differently from HBase
    • Simple mechanism to take thread or heap dump
    • All logging is consistent - GC, application, stdx
    • Init scripts use the same scripts operators do
  • Bare metal will rock your world
  • +UseLargePages will rock your world too

Battle Scars - Ops
[Charts: ParNew GC effectiveness on bare metal vs. EC2 XL - MB collected, mean collection time (ms), and ParNew collection count]

Battle Scars - Ops
• Java Best Practices cont’d:
  • Get familiar with GC logs (-XX:+PrintGCDetails) - see the sampling sketch below
    • Understand what degenerate CMS collection looks like
    • We settled at -XX:CMSInitiatingOccupancyFraction=60
    • Possibly experiment with tenuring threshold
  • When in doubt take a thread dump
    • TDA (http://java.net/projects/tda/)
    • Eclipse MAT (http://www.eclipse.org/mat/)

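A small companion to the GC-log advice and the ParNew chart above: a sketch using the standard JDK GarbageCollectorMXBean API to sample collector counts and cumulative pause time from inside a running JVM. The collector names printed ("ParNew", "ConcurrentMarkSweep") are what HotSpot reports under the ParNew + CMS configuration.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcWatcher {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    // Counts and times are cumulative; diff successive samples to get
                    // collections/sec and mean pause length, as in the chart above.
                    System.out.printf("%s: collections=%d totalTimeMs=%d%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(10000L);
            }
        }
    }
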
Battle Scars - Ops
• Understand when to compact
• Understand upgrade implications for datafiles
• Watch hinted handoff closely
• Monitor JMX religiously (sketch below)

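On monitoring JMX: a minimal sketch of polling one Cassandra MBean attribute over a remote JMX connection. The host, port, and the MBean/attribute names below are illustrative - exact names differ across Cassandra versions, so check JConsole or nodetool for what your build exposes.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class CassandraJmxPoller {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://cassandra-host:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbeans = connector.getMBeanServerConnection();
                // Illustrative attribute: pending compactions; hinted handoff and
                // thread-pool backlogs live under similar org.apache.cassandra MBeans.
                Object pending = mbeans.getAttribute(
                        new ObjectName("org.apache.cassandra.db:type=CompactionManager"),
                        "PendingTasks");
                System.out.println("Pending compactions: " + pending);
            } finally {
                connector.close();
            }
        }
    }
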
Looking Forward
• Cassandra is a great hammer but not everything is a nail
• Coprocessors would be awesome (hint hint)
• Still spend too much time worrying about GC
• Glad to see the ecosystem around the product evolving
  • CQL
  • Pig
  • Brisk
• Guardedly optimistic about off heap data management

Thanks to
• jbellis, driftx
• Datastax
• Whoever wrote TDA
• SAP

Thanks!
• Urban Airship: http://urbanairship.com/
• We’re hiring! http://urbanairship.com/company/jobs/
• Me: @eonnen or erik at
