C* Summit 2013: Dude, Where's My Tweet? Taming the Twitter Firehose by Andrew Noonan

Gnip ingests and must serve out hundreds of millions of social activities every day, and social platforms are only growing. This makes the scalability of applications essential for Gnip. Enter Cassandra. Problem solved, right? Not exactly. Gnip's relationship with Cassandra was not all rainbows and unicorns. In this session we will walk you through why we began looking at Cassandra as a data store in the first place and the valuable lessons we learned with Cassandra that have made it an invaluable part of our infrastructure.

Transcript

  • 1. #Cassandra2013. Dude, Where’s My Tweet? Taming the Twitter Firehose. Andrew Noonan, Software Engineer at Gnip, @noonanisms
  • 2. Gnip; Cassandra; Rainbows; Unicorns; ???
  • 4. Social Data
  • 5. 90% of Fortune 500; 120 Billion Activities Delivered Per Month
  • 6. Lots-O-Data; Redundancy & Reliability; Availability
  • 8. High Write Throughput ✔; Scalable ✔; Highly Available ✔; Persistent ✔
  • 9. Right?
  • 10. Not Exactly…
  • 11. No Maintenance? Bad Idea; Begin Maintenance -> 2X Data Growth; Scalable, Right?; Bootstrap Failures Due To Cluster Load
  • 12. Reconsider (Life) Choices?
  • 13. Size Tiered Compaction vs Leveled Compaction; How Much Data To Store Per Node; Your Write Pattern Matters Too
  • 14. compaction_throughput_mb_per_sec; 16-32X write rate?; Lots-o-options – explore them
  • 15. Lookup by Tweet ID; Read Rate < Write Rate; Dynamic ColumnFamilies
  • 16. For realz this time!?
  • 18. Bloom Filter False Positive Rate; Index Intervals; Only Change Schema On One Node! (For Now)
  • 19. You Won’t Always Fit The Mold and That’s Okay; Explore Your Options No Matter What; Understand The Consequences Of Your Choices; Staging Environment Identical To Production
  • 20. www.gnip.com; @noonanisms
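
Slide 14 pairs compaction_throughput_mb_per_sec with "16-32X write rate?". The arithmetic behind that rule of thumb can be sketched as below. This is a minimal illustration, not Gnip's tooling: the function name, its parameters, and the 0.5 MB/s sample write rate are assumptions made up for the example, and the slide itself poses the multiplier as a question rather than a fixed rule.

```python
def compaction_throughput_range(write_rate_mb_per_sec, low_factor=16, high_factor=32):
    """Return (low, high) candidate values, in MB/s, for compaction_throughput_mb_per_sec.

    Hypothetical helper: it only multiplies a measured write rate by the
    16x and 32x factors quoted on the slide.
    """
    if write_rate_mb_per_sec < 0:
        raise ValueError("write rate must be non-negative")
    return (write_rate_mb_per_sec * low_factor, write_rate_mb_per_sec * high_factor)


if __name__ == "__main__":
    # Assumed sample value: a node sustaining 0.5 MB/s of writes.
    low, high = compaction_throughput_range(0.5)
    print(f"candidate compaction_throughput_mb_per_sec: {low} to {high}")
```

A value picked from this range is only a starting point. The talk's own advice is to explore the options and test them in a staging environment identical to production.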