1, 2, 3, 4, Add Another Data Store

To meet all our data needs, including high-volume ingestion, MapReduce capabilities, real-time analytics, historical analytics, and other analysis technologies, we had to combine Redis, MongoDB, a MySQL column store, and Cassandra. Wrap the whole thing in a Node.js API for speed and consistent access patterns, and you have a complete data storage spread.

Talk URL: http://www.youtube.com/watch?v=od6DdB-zJCk

Speaker note: SimpleReach is a social intelligence tool for content creators. We track every social action, on every major network, across the entire web, in real time. That means every like, tweet, pin, stumble, and many more.

Transcript

  • 1. 1, 2, 3, 4, Add Another Data Store (And Other Rhymes). Eric Lubow, @elubow, elubow@simplereach.com, #cassandra12
  • 2-7. Overview
    • SimpleReach
    • Definitions and Data Stores
    • Evolution to Polyglottany
    • Tie It Together
    • Questions
  • 8. Socially Intelligent
  • 9-11. Size
    • 100 million events recorded per day and growing
    • 500 million pageviews per month and growing
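To put the Size slide in per-second terms, here is a small arithmetic sketch. The daily and monthly figures come from the slide; the 30-day month is an assumption, and the results are averages that say nothing about peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 100_000_000       # from the slide: 100 million events per day
pageviews_per_month = 500_000_000  # from the slide: 500 million pageviews per month
DAYS_PER_MONTH = 30                # assumption: a 30-day month

# Average rates only; real traffic is bursty, so peaks will be higher.
avg_events_per_second = events_per_day / SECONDS_PER_DAY
avg_pageviews_per_second = pageviews_per_month / (DAYS_PER_MONTH * SECONDS_PER_DAY)

print(f"{avg_events_per_second:,.0f} events/s on average")
print(f"{avg_pageviews_per_second:,.0f} pageviews/s on average")
```

On average that is a little over a thousand events and roughly two hundred pageviews per second.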
  • 12. Polyglot Persistence: "Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand." (http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence)
  • 13. Right Tool For The Job
  • 14-17. Why?
    • Heavier read loads vs. heavier write loads
    • Data relationships may be less important
    • Different aspects of a system have different requirements
  • 18. No One Size Fits All
  • 19. Tools
  • 20. Free vs. Cost
  • 21. Languages
  • 22. Pre-Scale
  • 23. Scale
  • 24. SimpleReach Pre-Scale
  • 25. SimpleReach
  • 26-30. Cassandra
    • Large data volume ingestion
    • Really fast writes to many locations (eventual consistency)
    • Query by column groups within rows
    • Range queries in Hive (partial CF scans)
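The "query by column groups within rows" bullet can be illustrated with a minimal in-memory sketch of a wide row whose columns stay sorted by name, so a contiguous range of columns can be read in one slice. This is not Cassandra or any driver API; the `WideRow` class, the hourly column names, and the counts are all hypothetical.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Hypothetical stand-in for one wide row: columns kept sorted by name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted so range slices stay cheap.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, finish):
        # Return every column whose name falls in [start, finish].
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(name, self._values[name]) for name in self._names[lo:hi]]


# Hypothetical data: one row per URL, one column per hour of events.
row = WideRow()
for hour, count in [
    ("2012-08-08T10", 40),
    ("2012-08-08T09", 25),
    ("2012-08-08T11", 31),
    ("2012-08-09T00", 7),
]:
    row.insert(hour, count)

day_slice = row.slice("2012-08-08T00", "2012-08-08T23")
print(day_slice)
```

Because the columns are ordered, reading one day of hourly counters touches a single contiguous range instead of scanning the whole row.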
  • 31-35. mongoDB
    • Fast atomic increments (Node.js is native JSON)
    • Sharding for faster distributed increments
    • Solid ORM for Rails (Mongoid)
    • Fast access for pub/sub of durable/persisted documents
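As a rough illustration of the "fast atomic increments" bullet, the sketch below mimics the shape of a MongoDB `$inc` update document in plain Python. It is only a local simulation: in MongoDB the update is applied by the server, atomically per document. The `apply_inc` helper and the document fields are hypothetical.

```python
def apply_inc(document, update):
    """Apply a {"$inc": {...}} update to a dict, following dotted field paths.

    Hypothetical helper that mirrors the update shape only; it provides
    none of the atomicity the database gives you.
    """
    for path, amount in update.get("$inc", {}).items():
        *parents, leaf = path.split(".")
        target = document
        for key in parents:
            # Create intermediate sub-documents when they are missing.
            target = target.setdefault(key, {})
        # A missing field starts from zero, so it ends up equal to the amount.
        target[leaf] = target.get(leaf, 0) + amount
    return document


doc = {"_id": "article-1", "counts": {"tweets": 10}}
apply_inc(doc, {"$inc": {"counts.tweets": 1, "counts.likes": 3}})
print(doc)
```

The update document is plain JSON, which may be part of what the slide means by Node.js being "native JSON": the API layer can build it without any translation step.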
  • 36-40. Redis
    • Supports hundreds of thousands of transactions per second
    • Great caching engine
    • Supports useful data types like sorted sets
    • Pay the SerDe (serialization/deserialization) price on each access
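The sorted-set and SerDe bullets can be sketched without a Redis server. The `SortedSet` class below is a hypothetical in-memory imitation of the idea behind sorted-set score increments and top-N reads, and the plain dict named `cache` stands in for a string-valued cache. Tie-breaking by member name is a choice of this sketch, not a statement about Redis ordering.

```python
import json


class SortedSet:
    """Hypothetical in-memory imitation of a score-ordered set."""

    def __init__(self):
        self._scores = {}

    def incrby(self, member, amount):
        # Add to the member's score, creating it at zero if absent.
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def top(self, n):
        # Highest score first; ties broken by member name (sketch choice).
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


trending = SortedSet()
for url, shares in [("/a", 5), ("/b", 2), ("/a", 4), ("/c", 7)]:
    trending.incrby(url, shares)
top_two = trending.top(2)
print(top_two)

# SerDe price: a structured value must be serialized to a string on write
# and parsed again on every read.
cache = {}
cache["article:/a"] = json.dumps({"title": "A", "shares": 9})
article = json.loads(cache["article:/a"])
print(article["shares"])
```

The counters themselves avoid the serialization round trip; it is the structured values stored as strings that pay it on each access.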
  • 41-45. InfiniDB and Infobright
    • Column stores for ad-hoc analytics queries in SQL
    • Databases built for business intelligence
    • Heavy compression of data
    • Pre-aggregated data (Extents/Knowledge Grid)
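The "pre-aggregated data" bullet refers to per-block metadata that lets a column store answer part of a query without reading rows. The following is a simplified sketch of that general idea, assuming each block of a column keeps its min, max, and sum. It does not reproduce InfiniDB's or Infobright's actual formats; the function names and data are hypothetical.

```python
def build_blocks(values, block_size):
    """Split one column into blocks and record min/max/sum for each block."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append(
            {"min": min(chunk), "max": max(chunk), "sum": sum(chunk), "rows": chunk}
        )
    return blocks


def sum_where_at_least(blocks, threshold):
    """Compute SUM(value) WHERE value >= threshold, using block metadata.

    Returns the total and the number of blocks that had to be scanned.
    """
    total = 0
    scanned = 0
    for block in blocks:
        if block["max"] < threshold:
            continue  # no row in this block can match: skip it entirely
        if block["min"] >= threshold:
            total += block["sum"]  # every row matches: use the stored sum
            continue
        scanned += 1  # mixed block: the only case that reads the rows
        total += sum(v for v in block["rows"] if v >= threshold)
    return total, scanned


pageviews = [3, 5, 4, 2, 50, 60, 55, 70, 8, 90, 1, 6]  # hypothetical column
blocks = build_blocks(pageviews, block_size=4)
total, scanned = sum_where_at_least(blocks, 50)
print(total, scanned)
```

Only one of the three blocks is scanned here. The same metadata is also why in-place DELETEs and UPDATEs tend to be costly in such stores, since the affected blocks and their summaries have to be rebuilt.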
  • 46-50. Ruby, Node.js, Python
    • Polyglottany doesn’t only apply to data stores
    • Each language has its own benefit to each data storage layer
    • Each language has its own individual benefits
    • JSON, APIs, Performance
  • 51. Choice
  • 52-59. Cons
    • Redis - Can only utilize a single core
    • MySQL Column Store - DELETEs/UPDATEs are VERY expensive
    • Cassandra - No btree indexes
    • Mongo - Queries slow down when shard count increases. Indexes must fit in memory
    • Python - Whitespace. Community
    • Ruby - Not high performance enough for our standards
    • JavaScript (Node.js) - Bad for CPU or IO intensive workloads
  • 60-65. Tying It Together
    • Built in the cloud
    • Service Oriented Architecture (Internal API)
    • Built Helenus (Cassandra Node.js driver)
    • Data accuracy checks: visual and programmatic
    • Built framework for testing out storage engines
  • 66. Service Architecture (diagram; labels: Analytics, Real-time, Internal API)
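One way to picture the internal API is as a thin facade that gives callers a single read/write interface and routes each kind of data to a backing store. The sketch below uses in-memory stand-ins rather than real database clients. The class names, the data kinds, and the kind-to-store mapping are assumptions loosely based on the strengths listed on the earlier slides, not the actual SimpleReach design.

```python
class InMemoryStore:
    """Hypothetical stand-in for a real data store client."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class InternalAPI:
    """Facade: callers name the kind of data, not the store behind it."""

    def __init__(self, routes):
        self._routes = routes  # data kind -> store

    def write(self, kind, key, value):
        self._routes[kind].write(key, value)

    def read(self, kind, key):
        return self._routes[kind].read(key)

    def store_for(self, kind):
        return self._routes[kind].name


# Assumed routing, for illustration only.
api = InternalAPI(
    {
        "raw_event": InMemoryStore("cassandra"),
        "counter": InMemoryStore("mongodb"),
        "cache": InMemoryStore("redis"),
        "report": InMemoryStore("column-store"),
    }
)

api.write("counter", "article-1:tweets", 11)
api.write("cache", "article-1:title", "Hello")
print(api.read("counter", "article-1:tweets"), api.store_for("counter"))
```

With this shape, swapping or adding a data store changes the routing table rather than every caller, which is one reading of "consistent access patterns".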
  • 67-71. Helenus
    • Built Node.js driver for Cassandra
    • https://github.com/simplereach/helenus
    • CQL 2/3, Composite Column, Thrift Interface
    • More about Node.js and Cassandra
  • 72-76. Points To Consider
    • Data consistency - Same in all data stores
    • How important is data durability?
    • Managing many servers (Chef, AWS, CSSH)
    • Managing and learning many different applications and tuning for them
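A programmatic check that the same data agrees across stores could look roughly like the sketch below, which compares per-key counters from two sources and reports the keys that disagree. The function name, the tolerance parameter, and the sample counts are hypothetical; the talk does not describe how its accuracy checks were implemented.

```python
def find_mismatches(primary, replica, tolerance=0):
    """Return {key: (primary_value, replica_value)} for keys that disagree.

    A key counts as a mismatch when it is missing on either side or when
    the two values differ by more than `tolerance`.
    """
    mismatches = {}
    for key in sorted(set(primary) | set(replica)):
        a = primary.get(key)
        b = replica.get(key)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical counters read from a real-time store and a historical store.
realtime_counts = {"/a": 120, "/b": 45, "/c": 9}
historical_counts = {"/a": 120, "/b": 44, "/d": 3}

strict = find_mismatches(realtime_counts, historical_counts)
loose = find_mismatches(realtime_counts, historical_counts, tolerance=1)
print(strict)
print(loose)
```

A small tolerance can be useful when one store is eventually consistent and lags slightly behind the other; missing keys are always reported.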
  • 77-81. Summary
    • Polyglottany is not a sin
    • Know your data read/write patterns
    • Know the tools available to you
    • Know your compromises
  • 82. We’re Hiring
  • 83. Questions are guaranteed in life. Answers aren’t. Eric Lubow, @elubow, elubow@simplereach.com, #cassandra12. Thank you.