100m Events

SimpleReach is a social intelligence tool for content creators. To handle both the ingestion rate and the overall volume of our data, we've employed Cassandra to store and process that data and to help organize and display it. We've learned a lot along the way about the right and wrong things to do, both with data in general and with Cassandra in particular. These are some of those lessons.

SimpleReach is a social intelligence tool for content creators. We track every social action, on every major network, across the entire web in real time. That means every like, tweet, pin, stumble and many more.

    1. 100 Million Events
        Eric Lubow @elubow elubow@simplereach.com
    2. Overview
        • SimpleReach
        • 100 Million Events
        • Finding Patterns in Your Data
        • What Mistakes?
        • Questions
    8. Socially Intelligent
    9. Size
        • 100m events recorded per day and growing
        • 500m pageviews per month and growing
    12. Right Tool For The Job
    13. Why?
        • Heavier READ loads vs. heavier write loads
        • Data relationships may be less important
        • Different aspects of a system have different requirements
        • Know your compromises
    18. Cassandra
        • Large data volume ingestion
        • Really fast writes to many locations (eventual consistency)
        • Query by column groups within rows
        • Range queries in Hive (slice predicate ranges)
        • Fault tolerant
    24. What Mistakes?
        • Manage how many servers?
        • Re-inventing the wheel (Helenus)
        • Composites rock
        • Snapshots before dropping keyspaces
        • How many experts does it take to run a cluster?
        • You can tune Cassandra?!?
    31. Server Management (Cluster SSH)
        • Hand tools: AWS, csshx
        • Configuration management
        • Monitoring and alerting tools
        • Performance
        • Security
    37. Helenus
        • Built a Node.js driver for Cassandra
        • https://github.com/simplereach/helenus
        • CQL 2/3, composite columns, Thrift interface
        • Parallel querying (split up queries)
        • Fault tolerance and resilience
    43. Data Patterns
        • Storage is cheap
        • Composites are WAY better than underscores
        • Beyond UTF8Type
        • Timestamps as LongType
    48. Safety Mechanisms
        • Snapshots before dropping keyspaces
        • Authorization and authentication
        • (Limit) direct access to the data store
    52. Expertise
        • What happens when you need help?
        • How do you become an expert?
        • What happens when you need more experts?
    56. Tunables
        • Replication factor and read_repair_chance
        • Phi convict threshold and RPC timeout for AWS or DC separation
        • MAX_HEAP_SIZE and HEAP_NEWSIZE (analytics vs. realtime)
    60. Future
        • Priam
        • Asgard
        • Curator
        • Work for ?
        • Hastur
    61. Summary
        • Learn from others’ mistakes
        • Tuning and data patterns
        • It’s ok to re-invent the wheel
        • Applications for/with Cassandra
    66. We’re Hiring
    67. Questions are guaranteed in life. Answers aren’t.
        Thank you.
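
The volumes on the Size slide are easier to reason about as rates. Here is a back-of-the-envelope conversion, assuming a 30-day month and traffic spread evenly over time. Real traffic peaks well above these averages, so treat the results as a floor for capacity planning.

```python
# Convert the Size slide's volumes into average per-second rates.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_MONTH = 30             # assumption: a 30-day month

def per_second(total: float, seconds: float) -> float:
    """Average rate in items per second over a window of `seconds`."""
    return total / seconds

events_per_sec = per_second(100_000_000, SECONDS_PER_DAY)
pageviews_per_sec = per_second(500_000_000, SECONDS_PER_DAY * DAYS_PER_MONTH)

print(f"events:    ~{events_per_sec:,.0f}/s")     # events:    ~1,157/s
print(f"pageviews: ~{pageviews_per_sec:,.0f}/s")  # pageviews: ~193/s
```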
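
The Cassandra slide's "query by column groups within rows" and "slice predicate ranges" both rely on one property: within a row, columns are kept sorted by name, so a contiguous range of columns can be read without scanning the whole row. Below is a toy in-memory model of that idea. It is not Cassandra's storage engine or its Thrift API. The class `WideRow` and its `slice` method are hypothetical names, and `count` and `reverse` only loosely mirror the options of a Thrift `SliceRange`.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

class WideRow:
    """Toy model of one row: (name, value) columns kept sorted by column name."""

    def __init__(self) -> None:
        self._names: List[int] = []
        self._values: List[str] = []

    def insert(self, name: int, value: str) -> None:
        """Upsert: writing an existing column name overwrites its value."""
        i = bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            self._values[i] = value
        else:
            self._names.insert(i, name)
            self._values.insert(i, value)

    def slice(self, start: int, finish: int, count: int = 100,
              reverse: bool = False) -> List[Tuple[int, str]]:
        """Columns with start <= name <= finish, at most `count` of them,
        taken from the high end first when `reverse` is True."""
        lo = bisect_left(self._names, start)   # binary search, no full scan
        hi = bisect_right(self._names, finish)
        cols = list(zip(self._names[lo:hi], self._values[lo:hi]))
        if reverse:
            cols.reverse()
        return cols[:count]

row = WideRow()
for ts, action in [(1000, "like"), (1005, "tweet"), (1010, "pin"), (1020, "stumble")]:
    row.insert(ts, action)

print(row.slice(1004, 1015))                         # [(1005, 'tweet'), (1010, 'pin')]
print(row.slice(1000, 1020, count=2, reverse=True))  # [(1020, 'stumble'), (1010, 'pin')]
```

Using timestamps as column names makes "everything for this item between two times" a single slice, which is the access pattern an event store needs most.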
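
The Helenus slide mentions parallel querying by splitting up queries. The sketch below shows the general technique, not Helenus's actual implementation. A large key or time range is cut into contiguous chunks, each chunk is fetched concurrently, and the results are merged back in range order. `fetch` stands in for whatever client call reads one chunk. It, `fake_fetch` and the other names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Range = Tuple[int, int]

def split_range(start: int, end: int, parts: int) -> List[Range]:
    """Split the half-open interval [start, end) into up to `parts` contiguous chunks."""
    if parts < 1 or end <= start:
        raise ValueError("need parts >= 1 and end > start")
    size = -(-(end - start) // parts)  # ceiling division
    return [(lo, min(lo + size, end)) for lo in range(start, end, size)]

def parallel_query(fetch: Callable[[int, int], List[int]],
                   start: int, end: int, parts: int = 4) -> List[int]:
    """Run `fetch` on each chunk concurrently; concatenate results in range order."""
    chunks = split_range(start, end, parts)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # map() yields results in input order, so the merged list stays ordered;
        # an exception raised by any fetch is re-raised here.
        results = pool.map(lambda r: fetch(*r), chunks)
        return [item for chunk in results for item in chunk]

def fake_fetch(lo: int, hi: int) -> List[int]:
    """Hypothetical stand-in for a driver call that reads one chunk."""
    return list(range(lo, hi))

print(split_range(0, 10, 4))              # [(0, 3), (3, 6), (6, 9), (9, 10)]
print(parallel_query(fake_fetch, 0, 10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Smaller chunks spread load across more replicas and shorten each request. The cost is more round trips, so the number of parts is itself a tunable.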
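
The Data Patterns slide argues that composites beat underscore-joined column names and that timestamps belong in LongType rather than UTF8Type. The sketch below illustrates why, under two stated assumptions. First, `encode_composite` follows a CompositeType-style layout: for each component, a 2-byte length, the component bytes, and one end-of-component byte. Second, Python tuples stand in for the CompositeType comparator, which compares component by component using each component's own type. Both helper names are hypothetical.

```python
import struct

def encode_long(value: int) -> bytes:
    """LongType-style encoding: 8-byte big-endian signed integer."""
    return struct.pack(">q", value)

def encode_composite(*components: bytes) -> bytes:
    """CompositeType-style layout: for each component, a 2-byte big-endian
    length, the component bytes, then one end-of-component byte (0 here)."""
    out = bytearray()
    for c in components:
        out += struct.pack(">H", len(c)) + c + b"\x00"
    return bytes(out)

# 1. Underscore-joined names can collide; length-prefixed components cannot.
collide = "_".join(["a_b", "c"]) == "_".join(["a", "b_c"])
distinct = encode_composite(b"a_b", b"c") != encode_composite(b"a", b"b_c")
print(collide, distinct)  # True True

# 2. Numbers inside one UTF-8 string sort lexically; typed components sort numerically.
names = [("site", 999), ("site", 1000), ("site", 20)]
print(sorted(f"{s}_{t}" for s, t in names))  # ['site_1000', 'site_20', 'site_999']
print(sorted(names))                          # [('site', 20), ('site', 999), ('site', 1000)]

# 3. A millisecond timestamp takes 13 bytes as text but 8 as a long, and the
#    big-endian bytes of non-negative longs sort in numeric order.
ts = 1357000000000
print(len(str(ts)), len(encode_long(ts)))    # 13 8
print(encode_long(999) < encode_long(1000))  # True
```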
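
Here is a minimal sketch of the "snapshot before dropping keyspaces" rule from the Safety Mechanisms slide. It assumes `nodetool` is on the PATH and can reach every node with `-h`. A snapshot covers only the node it runs on, so it has to run on all nodes before the drop. `safe_drop` is hypothetical, and so is its `drop` callback, which represents whatever issues the actual DROP KEYSPACE through your client.

```python
import subprocess
from datetime import datetime, timezone
from typing import Callable, List, Optional, Sequence

def snapshot_commands(hosts: Sequence[str], keyspace: str, tag: str) -> List[List[str]]:
    """One `nodetool snapshot` per node: each node snapshots only its own data."""
    return [["nodetool", "-h", host, "snapshot", "-t", tag, keyspace] for host in hosts]

def safe_drop(
    hosts: Sequence[str],
    keyspace: str,
    drop: Callable[[str], None],                        # hypothetical: issues DROP KEYSPACE
    run: Callable[[List[str]], int] = subprocess.call,  # returns the process exit code
    tag: Optional[str] = None,
) -> str:
    """Snapshot `keyspace` on every host; drop it only if all snapshots succeed."""
    tag = tag or f"pre-drop-{keyspace}-{datetime.now(timezone.utc):%Y%m%d%H%M%S}"
    for cmd in snapshot_commands(hosts, keyspace, tag):
        if run(cmd) != 0:
            raise RuntimeError(f"snapshot failed ({' '.join(cmd)}); not dropping {keyspace}")
    drop(keyspace)
    return tag

# Dry run with fake callbacks instead of a real cluster.
ran: List[List[str]] = []
dropped: List[str] = []
safe_drop(["10.0.0.1", "10.0.0.2"], "events", drop=dropped.append,
          run=lambda cmd: ran.append(cmd) or 0, tag="pre-drop-test")
print(ran[0])   # ['nodetool', '-h', '10.0.0.1', 'snapshot', '-t', 'pre-drop-test', 'events']
print(dropped)  # ['events']
```

Returning the tag makes it easy to log exactly which snapshot to restore from, and failing closed means a single unreachable node blocks the drop.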
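
Several settings on the Tunables slide can be estimated with a little arithmetic. The sketch below rests on three assumptions. First, reads and writes touch `R` and `W` replicas, with QUORUM being a majority of the replication factor. Second, the failure detector, like Cassandra's phi accrual detector, models heartbeat gaps as exponentially distributed, with gossip heartbeats roughly once per second. Third, heap defaults follow the heuristic in the `cassandra-env.sh` scripts of the 1.x era. All function names are hypothetical, and the outputs are starting points rather than recommendations.

```python
import math
from typing import Tuple

def quorum(rf: int) -> int:
    """Replicas that must answer at QUORUM: a majority of the replication factor."""
    return rf // 2 + 1

def reads_see_writes(r: int, w: int, rf: int) -> bool:
    """Read and write replica sets overlap (R + W > RF), so a read is
    guaranteed to include the latest acknowledged write."""
    return r + w > rf

def seconds_to_convict(phi_threshold: float, mean_heartbeat_s: float = 1.0) -> float:
    """Silence before a peer is marked down, assuming phi = t / (mean * ln 10),
    i.e. exponentially distributed heartbeat gaps."""
    return phi_threshold * mean_heartbeat_s * math.log(10)

def default_heap_mb(ram_mb: int, cores: int) -> Tuple[int, int]:
    """Sketch of the cassandra-env.sh default heuristic (assumed 1.x behavior):
    MAX_HEAP_SIZE = max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB))
    HEAP_NEWSIZE  = min(100 MB per core, MAX_HEAP_SIZE / 4)"""
    max_heap = max(min(ram_mb // 2, 1024), min(ram_mb // 4, 8192))
    new_size = min(100 * cores, max_heap // 4)
    return max_heap, new_size

print(quorum(3), reads_see_writes(quorum(3), quorum(3), 3))               # 2 True
print(reads_see_writes(1, 1, 3))                                          # False
print(round(seconds_to_convict(8), 1), round(seconds_to_convict(12), 1))  # 18.4 27.6
print(default_heap_mb(15360, 4))                                          # (3840, 400)
```

Raising the phi convict threshold from its default of 8 trades slower detection of dead nodes for fewer false convictions on jittery links such as AWS or cross-datacenter connections. Likewise, analytics and realtime nodes may warrant different overrides of the default heap sizes.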
