StampedeCon 2014: Cassandra in the Real World

Three real-world Apache Cassandra use cases and the best practices distilled from them.


  1. STAMPEDECON 2014: CASSANDRA IN THE REAL WORLD. Nate McCall @zznate, Co-Founder & Sr. Technical Consultant. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License.
  2. About The Last Pickle. Work with clients to deliver and improve Apache Cassandra-based solutions. Based in New Zealand & USA.
  3. “…in the Real World?” Lots of hype; stats get attention, as do big names.
  4. “Real World?” “…1.1 million client writes per second. Data was automatically replicated across all three zones making a total of 3.3 million writes per second across the cluster.” http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  5. “Real World?” “+10 clusters, +100s nodes, 250TB provisioned, 9 billion writes/day, 5 billion reads/day” http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-cassandra-summit-2013
  6. “Real World?” • “but I don’t have an ∞ AMZN budget” • “maybe one day I’ll have that much data”
  7. “Real World!” Most folks needed: real fault tolerance, scale-out characteristics.
  8. “Real World!” Most folks have: 3 to 12 nodes with 2-15 TB, commodity hardware, small teams.
  9. Cassandra in the Real World: Cassandra at 10k feet, Case Studies, Common Best Practices.
  10. Cassandra Architecture (briefly). [Diagram: Clients → API's → Cluster Aware → Cluster Unaware → Disk]
  11. Cassandra Cluster Architecture (briefly). [Diagram: Clients → Node 1 and Node 2, each with API's → Cluster Aware → Cluster Unaware → Disk]
  12. Dynamo Cluster Architecture (briefly). [Diagram: Clients → Node 1 and Node 2, each with API's → Dynamo → Database → Disk]
  13. Cassandra Architecture (briefly). API, Dynamo, Database.
  14. API Transports. Thrift, Native Binary.
  15. Thrift transport. Extremely performant for specific workloads; Astyanax; disruptor-based HSHA in 2.0.
  16. API Transports. Thrift, Native Binary.
  17. Native Binary Transport. Focus of future development; uses Netty, CQL 3 only, asynchronous.
  18. API Services. JMX, Thrift, CQL 3.
  19. API Services. JMX, Thrift, CQL 3.
  20. API Services. JMX, Thrift, CQL 3.
  21. Cassandra Architecture (briefly). API, Dynamo, Database. Please see: http://www.slideshare.net/aaronmorton/cassandra-community-webinar-introduction-to-apache-cassandra-12-20353118 http://www.slideshare.net/planetcassandra/c-summit-eu-2013-cassandra-internals http://www.slideshare.net/aaronmorton/cassandra-community-webinar-august-29th-2013-in-case-of-emergency-break-glass
  22. Cassandra in the Real World: Cassandra at 10k feet, Case Studies, Common Best Practices.
  23. Case Studies: Ad Tech, Sensor Data, Mobile Device Diagnostics.
  24. Ad Tech. Latency = $$$
  25. Ad Tech. Large “hot data” set: active users, targeting, display count.
  26. Ad Tech. Huge long tail: who saw what; used for billing, campaign effectiveness over time, all sorts of analytics.
  27. Ad Tech: Software. Java: CQL via DataStax Java Driver. Python: Pycassa (Thrift).
  28. Ad Tech: Cluster. 12 nodes, 2 datacenters, {DC1:R1:3, DC2:R2:3} (a multi-datacenter keyspace sketch follows the transcript below).
  29. Ad Tech: Systems. Physical hardware: commodity 1U, 8x SSD, 36 GB RAM, 10gigE + 4x 1gigE.
  30. Case Studies: Ad Tech, Sensor Data, Mobile Device Diagnostics.
  31. Sensor Data. Latency != $$$
  32. Sensor Data. High write throughput: consistent “shape”, immutable data, large sequential reads, high uptime (for writes).
  33. Sensor Data: Software. REST application: separate reader service, writes to Kafka, ELB to multiple regions.
  34. Sensor Data: Software. Java: Thrift via Astyanax; reads from Kafka and batches insertions to an optimal size (a batching sketch follows the transcript below).
  35. Sensor Data: Cluster. 9 nodes, 1 availability zone, {RF:3}.
  36. Sensor Data: Systems. m1.xlarge: 15 GB, 2 TB RAID0, “high”, tablesnap for backup.
  37. Case Studies: Ad Tech, Sensor Data, Mobile Device Diagnostics.
  38. Device Diagnostics. Latency = battery.
  39. Device Diagnostics. Write bursts: large single payloads, large hot data set.
  40. Device Diagnostics. Huge long tail, but irrelevant after 2 months; external partner API* (*thar be dragons).
  41. Device Diagnostics: Software. Java, CQL / DataStax Java Driver.
  42. Device Diagnostics: Software. REST application: payloads to S3, pointer to the payload in Kafka (a sketch of this pattern follows the transcript below).
  43. Device Diagnostics: Cluster. 12 nodes, 3 availability zones, {us-east-1:1}.
  44. Device Diagnostics: Systems. i2.2xlarge: 61 GB, 1.8 TB RAID0 SSD, “Enhanced Networking”, dedicated ENI.
  45. Device Diagnostics: Systems. No backups.
  46. Device Diagnostics: Systems. No backups. “Replay the front end.”
  47. Cassandra in the Real World: Cassandra at 10k feet, Case Studies, Common Best Practices.
  48. Common Best Practices. [Diagram: Clients → API's → Cluster Aware → Cluster Unaware → Disk]
  49. Client Best Practices. Decouple! Buffer writes for event-based systems, use asynchronous operations (an async-write sketch follows the transcript below).
  50. Client Best Practices. Use official drivers (but there are exceptions).
  51. Client Best Practices. CQL 3: collections, user-defined types, tooling available (a collections sketch follows the transcript below).
  52. Common Best Practices. [Diagram: Clients → API's → Cluster Aware → Cluster Unaware → Disk]
  53. API Best Practices. Understand replication! (a consistency-level sketch follows the transcript below)
  54. API Best Practices. Monitor & instrument.
  55. Common Best Practices. [Diagram: Clients → API's → Cluster Aware → Cluster Unaware → Disk]
  56. Cluster Best Practices. Understand replication! Learn all you can about topology options.
  57. Cluster Best Practices. Verify assumptions: test failure scenarios explicitly.
  58. Common Best Practices. [Diagram: Clients → API's → Cluster Aware → Cluster Unaware → Disk]
  59. Systems Best Practices. Better to have a lot of a little: commodity hardware*, 32-64 GB of RAM (or more). *10gigE is now commodity.
  60. Systems Best Practices. BUT: do you have staff that can tune kernels? Larger hardware needs tuning: “receive packet steering”.
  61. Systems Best Practices. EC2: SSD instances if you can; use VPCs, deployment groups and ENIs.
  62. Common Best Practices. [Diagram: Clients → API's → Cluster Aware → Cluster Unaware → Disk]
  63. Storage Best Practices. Dependent on workload; can mix and match: rotational for commitlog and system.
  64. Storage Best Practices. You can mix and match: rotational for commitlog and system, SSD for data.
  65. Storage Best Practices. SSD: consider JBOD; consumer grade works fine.
  66. Storage Best Practices. “What about SANs?”
  67. Storage Best Practices. “What about SANs?” NO. (You would be moving a distributed system onto a centralized component.)
  68. Storage Best Practices. Backups: tablesnap on EC2, rsync (immutable data FTW!).
  69. Storage Best Practices. Backups: combine rebuild + replay for best results. (Bonus: loading production data to staging is testing your backups!)
  70. Thanks.
  71. Nate McCall @zznate, Co-Founder & Sr. Technical Consultant, www.thelastpickle.com
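
The sketches below expand on a few patterns the slides call out. Slide 28 describes a 12-node Ad Tech cluster spanning two datacenters with three replicas in each. A minimal sketch of how that kind of layout is usually expressed as a keyspace definition with NetworkTopologyStrategy, issued through the DataStax Java Driver; the contact point, keyspace name, and datacenter names ("DC1", "DC2") are illustrative assumptions, not details from the deck.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Hypothetical keyspace mirroring slide 28's layout: three replicas in each of two datacenters.
// Contact point, keyspace name, and DC names are placeholders.
public class CreateAdTechKeyspace {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")   // any node in the local datacenter
                .build();
        try {
            Session session = cluster.connect();
            // NetworkTopologyStrategy places replicas per datacenter, which is what
            // a {DC1:3, DC2:3} style layout amounts to.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS adtech WITH replication = " +
                "{'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3}");
        } finally {
            cluster.close();
        }
    }
}
```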
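
Slide 34 has the sensor-data writer reading from Kafka and batching insertions to an optimal size over Thrift via Astyanax. A minimal sketch of the batching side, assuming a Keyspace handle already built from an AstyanaxContext; the column family name, serializers, and the batch size of 100 are assumptions for illustration.

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

// Hypothetical writer: rows keyed by sensor id, columns keyed by timestamp.
public class SensorBatchWriter {
    private static final ColumnFamily<String, Long> CF_READINGS =
            new ColumnFamily<String, Long>("sensor_readings",
                    StringSerializer.get(), LongSerializer.get());

    private final Keyspace keyspace;   // built elsewhere from an AstyanaxContext
    private MutationBatch batch;
    private int pending = 0;

    public SensorBatchWriter(Keyspace keyspace) {
        this.keyspace = keyspace;
        this.batch = keyspace.prepareMutationBatch();
    }

    /** Buffer one reading (e.g. consumed from Kafka) and flush when the batch is "big enough". */
    public void write(String sensorId, long timestamp, String value) throws Exception {
        batch.withRow(CF_READINGS, sensorId).putColumn(timestamp, value, null);
        if (++pending >= 100) {        // tune the batch size to your payloads and cluster
            flush();
        }
    }

    public void flush() throws Exception {
        if (pending > 0) {
            batch.execute();           // one Thrift batch mutation round trip
            batch = keyspace.prepareMutationBatch();
            pending = 0;
        }
    }
}
```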
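
Slide 42 has the device-diagnostics front end writing large payloads to S3 and passing only a pointer through Kafka, which is also what makes slide 46's "replay the front end" backup strategy possible. A minimal sketch of that pattern; the bucket name, topic, broker address, and the use of the newer Java Kafka producer (rather than whatever client the original system used) are all assumptions.

```java
import java.io.File;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;

// Hypothetical ingest path: store the raw payload in S3, publish only its key to Kafka.
public class DiagnosticsIngest {
    private final AmazonS3 s3 = new AmazonS3Client();   // credentials from the default chain
    private final KafkaProducer<String, String> producer;

    public DiagnosticsIngest() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<String, String>(props);
    }

    /** Upload the payload, then hand downstream consumers a pointer rather than the bytes. */
    public void ingest(String deviceId, File payload) {
        String key = "diagnostics/" + deviceId + "/" + System.currentTimeMillis();
        s3.putObject("device-diagnostics-bucket", key, payload);   // the large blob lives in S3
        producer.send(new ProducerRecord<String, String>("diagnostics", deviceId, key));   // pointer only
    }
}
```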
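
Slide 49's advice is to buffer writes and use asynchronous operations. A minimal sketch of a non-blocking write with the DataStax Java Driver's executeAsync; the keyspace, table, and columns are illustrative, and the callback uses Guava since ResultSetFuture is a ListenableFuture.

```java
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

// Hypothetical async writer; keyspace "events" and its schema are placeholders.
public class AsyncEventWriter {
    private final Session session;
    private final PreparedStatement insert;

    public AsyncEventWriter(Cluster cluster) {
        this.session = cluster.connect("events");
        this.insert = session.prepare(
                "INSERT INTO raw_events (source, ts, payload) VALUES (?, ?, ?)");
    }

    /** Fire the write without blocking the caller; handle success or failure in a callback. */
    public void write(String source, long ts, String payload) {
        BoundStatement bound = insert.bind(source, ts, payload);
        ResultSetFuture future = session.executeAsync(bound);
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            public void onSuccess(ResultSet rs) { /* ack upstream, update metrics, etc. */ }
            public void onFailure(Throwable t)  { /* re-queue or log; don't drop silently */ }
        });
    }
}
```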
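
Slide 51 points to CQL 3 collections and user-defined types. A minimal sketch of a table using set and map collections, executed through the same driver session; the table and column names are assumptions loosely modeled on the Ad Tech "hot data" slide, and user-defined types (CREATE TYPE) only arrive with Cassandra 2.1.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Hypothetical schema showing CQL 3 collections; table and column names are illustrative.
public class CollectionsExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
        try {
            Session session = cluster.connect("adtech");
            session.execute(
                "CREATE TABLE IF NOT EXISTS user_profile (" +
                "  user_id   text PRIMARY KEY," +
                "  segments  set<text>," +         // targeting segments as a set collection
                "  counts    map<text, bigint>" +  // e.g. per-campaign display counts
                ")");
            // Collections are updated in place; no read-before-write is needed:
            session.execute(
                "UPDATE user_profile SET segments = segments + {'sports'} " +
                "WHERE user_id = 'u-123'");
        } finally {
            cluster.close();
        }
    }
}
```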
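
Slides 53 and 56 both stress understanding replication; at the API level that largely means choosing consistency levels deliberately relative to the replication factor. A minimal sketch, assuming the two-datacenter, three-replicas-per-DC layout from slide 28 and the hypothetical table above.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

// Hypothetical read illustrating consistency choices against a 2-DC, RF=3-per-DC keyspace.
public class ConsistencyExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
        try {
            Session session = cluster.connect("adtech");
            // LOCAL_QUORUM = 2 of the 3 replicas in the local datacenter; the request
            // never waits on the remote DC, so a WAN partition does not fail it.
            SimpleStatement stmt = new SimpleStatement(
                "SELECT segments FROM user_profile WHERE user_id = 'u-123'");
            stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            ResultSet rs = session.execute(stmt);
            System.out.println(rs.one());
        } finally {
            cluster.close();
        }
    }
}
```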
