Cassandra at Twitter - Distributed Counters

Slide notes

  • This is our most successful use case. We had a general need for realtime, high-scale time series data.
  • Counters work in trunk. Some things, like averages, can be modeled as several counters that get combined at read time.
  • You can aggregate by many dimensions at once. Every combination is persisted separately.

Transcript

  • 1. Cassandra at Twitter (one use case)
    Ryan King
    Cassandra Meetup
    January 12, 2011
  • 2. History
  • 3. History
    ‣ Started port of Tweets to Cassandra in June 2009
    ‣ Started other Cassandra projects in 2009 (more on this later)
    ‣ Abandoned the Tweets port in 2010
  • 4. Time Series Data
  • 5. Use cases
    ‣ realtime traffic/engagement analytics
    ‣ systems monitoring
  • 6. Time Series Data
    ‣ write heavy
    ‣ stored temporally
    ‣ viewed temporarily
    ‣ hierarchical aggregation
  • 7. Data Model
    ‣ Distributed Counters (CASSANDRA-1072)
    ‣ each time series is a row (or rows) of counters
    ‣ slice over rows to get recent data
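
The data model on slide 7 can be illustrated with a small in-memory sketch. This is not Twitter's implementation and does not use the Cassandra API; the dictionary `counters` and the helpers `minute_bucket`, `increment`, and `recent` are hypothetical names that only mimic "one row of counters per time series, one column per time bucket".

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical stand-in for a counter column family:
# row key (one per time series) -> {column name (minute bucket) -> counter value}
counters = defaultdict(lambda: defaultdict(int))


def minute_bucket(ts):
    """Truncate a timestamp to its minute, formatted like the slide examples."""
    return ts.strftime("%Y-%m-%dT%H:%M")


def increment(row_key, ts, delta=1):
    """Add delta to the counter of this series for the minute containing ts."""
    counters[row_key][minute_bucket(ts)] += delta


def recent(row_key, now, minutes):
    """Return (bucket, value) pairs for the last `minutes` buckets, oldest first."""
    row = counters[row_key]
    buckets = [
        minute_bucket(now - timedelta(minutes=i))
        for i in range(minutes - 1, -1, -1)
    ]
    # Buckets that were never incremented read as zero.
    return [(bucket, row.get(bucket, 0)) for bucket in buckets]
```

Calling `increment("host:web1:load1", some_timestamp, 5)` and then `recent("host:web1:load1", now, 2)` returns the two most recent minute buckets for that series, which corresponds to slicing a row for recent data.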
  • 8. Data Model
    ‣ An example (not exactly the way we do it):

                               2011-01-12T10:00  2011-01-12T10:01  ...
    host:web1:load1            5                 4                 ...
    host:web2:load1            4                 3                 ...
    cluster:web:load1:sum      576               505               ...
    cluster:web:load1:count    100               95                ...
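
The `sum` and `count` rows in the example suggest how an average can be derived at read time from two counters. The following is a minimal sketch under that assumption; the function `average` and the plain dictionaries standing in for counter rows are hypothetical.

```python
def average(sum_row, count_row, bucket):
    """Combine a sum counter and a count counter into an average at read time.

    Returns None when nothing was counted in the bucket.
    """
    count = count_row.get(bucket, 0)
    if count == 0:
        return None
    return sum_row.get(bucket, 0) / count


# Values copied from the cluster rows in the slide example.
cluster_sum = {"2011-01-12T10:00": 576, "2011-01-12T10:01": 505}
cluster_count = {"2011-01-12T10:00": 100, "2011-01-12T10:01": 95}

load_at_10_00 = average(cluster_sum, cluster_count, "2011-01-12T10:00")
```

Storing the sum and the count separately keeps every write a plain increment; the division happens only when someone reads the series.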
  • 9. Aggregation
    ‣ Measured every minute (or continuously)
    ‣ Rollup to coarser granularities
    ‣ More Counters! (aka, let’s do it live)
  • 10. Aggregation

    Minutes                    2011-01-12T10:00  2011-01-12T10:01  ...
    host:web1:load1:sum        5                 4                 ...
    host:web1:load1:count      1                 1                 ...
    host:web2:load1:sum        4                 3                 ...
    host:web2:load1:count      1                 1                 ...
    cluster:web:load1:sum      576               505               ...
    cluster:web:load1:count    100               95                ...

    Hours                      2011-01-12T10     2011-01-12T11     ...
    host:web1:load1:sum        300               250               ...
    host:web1:load1:count      60                59                ...
    host:web2:load1:sum        4                 3                 ...
    host:web2:load1:count      61                60                ...
    cluster:web:load1:sum      3010              2995              ...
    cluster:web:load1:count    6000              5990              ...
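
One way to read "More Counters! (let's do it live)" is that each measurement increments a counter at every granularity when it is written, so no separate rollup job is needed. The sketch below assumes that reading; `GRANULARITIES`, `store`, and `record` are hypothetical names, and the nested dictionaries only stand in for counter rows.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical granularities: name -> strftime pattern used as the column name.
GRANULARITIES = {
    "minutes": "%Y-%m-%dT%H:%M",
    "hours": "%Y-%m-%dT%H",
}

# store[granularity][row_key][bucket] -> counter value
store = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))


def record(series, value, ts):
    """Increment the sum and count counters of a series at every granularity."""
    for name, pattern in GRANULARITIES.items():
        bucket = ts.strftime(pattern)
        store[name][series + ":sum"][bucket] += value
        store[name][series + ":count"][bucket] += 1


record("host:web1:load1", 5, datetime(2011, 1, 12, 10, 0))
record("host:web1:load1", 4, datetime(2011, 1, 12, 10, 1))
```

After these two calls the minute rows hold 5 and 4 in separate columns, while the hour row for `2011-01-12T10` holds a sum of 9 and a count of 2. The cost is one extra increment per granularity on every write.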
  • 11. Aggregation
    ‣ other dimensions besides time:
      ‣ clusters
      ‣ racks / dcs, etc.
    ‣ And combinations of the above
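
Because every combination of dimensions is persisted as its own counter row, a single measurement fans out into several row keys. The sketch below enumerates them; the function `row_keys` and the `name:value` key layout are assumptions made for illustration, loosely modeled on the `host:web1:load1` keys in the slides.

```python
from itertools import combinations


def row_keys(dimensions, metric):
    """List one row key per non-empty combination of dimensions.

    `dimensions` maps a dimension name to its value for this measurement.
    The key layout "name:value:...:metric" is a hypothetical convention.
    """
    items = list(dimensions.items())
    keys = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            prefix = ":".join(f"{name}:{value}" for name, value in combo)
            keys.append(f"{prefix}:{metric}")
    return keys


keys = row_keys({"host": "web1", "cluster": "web", "dc": "dc1"}, "load1")
```

With n dimensions this produces 2^n - 1 keys (7 for the three dimensions above), each of which would receive an increment per measurement and per time granularity. That growth is one plausible reason the approach is storage-intensive.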
  • 12. Pros / Cons
    ‣ Pros
      ‣ real-time data (average 30s between measurement and visibility)
      ‣ real-time aggregation
      ‣ flexible data retention (once counters and TTLs work together)
    ‣ Cons
      ‣ Storage-intensive
      ‣ Slow reads
  • 13. Questions?
    ryan@twitter.com
    twitter.com/rk
  • 14. Obligatory Plug
    twitter.com/jobs