Cassandra at Twitter - Distributed Counters

  • This is our most successful use case. We had a general need for realtime, high-scale time series data.
  • Counters work in trunk. Some things, like averages, can be modeled as several counters that get combined at read time.
  • You can aggregate by many dimensions at once. Every combination is persisted separately.

    1. Cassandra at Twitter (one use case)
       Ryan King
       Cassandra Meetup
       January 12, 2011
    2. History
    3. History
       ‣ Started port of Tweets to Cassandra June 2009
       ‣ Started other Cassandra projects in 2009 (more on this later)
       ‣ Abandoned the Tweets port in 2010
    4. Time Series Data
    5. Use cases
       ‣ realtime traffic/engagement analytics
       ‣ systems monitoring
    6. Time Series Data
       ‣ write heavy
       ‣ stored temporally
       ‣ viewed temporarily
       ‣ hierarchical aggregation
    7. Data Model
       ‣ Distributed Counters (CASSANDRA-1072)
       ‣ each time series is a row (or rows) of counters
       ‣ slice over rows to get recent data
    8. Data Model
       ‣ An example (not exactly the way we do it):

                                  2011-01-12T10:00  2011-01-12T10:01  ...
         host:web1:load1          5                 4                 ...
         host:web2:load1          4                 3                 ...
         cluster:web:load1:sum    576               505               ...
         cluster:web:load1:count  100               95                ...
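
The layout in the example above, together with the note that averages can be modeled as several counters combined at read time, can be mimicked in memory. This is a minimal sketch under stated assumptions, not Twitter's implementation and not the Cassandra API: `CounterStore`, `incr`, `slice_row`, and `average` are hypothetical names, and a nested dict stands in for a counter column family.

```python
from collections import defaultdict


class CounterStore:
    """In-memory stand-in for a counter column family (hypothetical)."""

    def __init__(self):
        # row key -> column name (timestamp bucket) -> counter value
        self.rows = defaultdict(lambda: defaultdict(int))

    def incr(self, row, column, delta=1):
        """Add delta to one counter; counters start at zero."""
        self.rows[row][column] += delta

    def slice_row(self, row, start, end):
        """Return (column, value) pairs with start <= column <= end, in column order.

        ISO-8601 timestamps sort lexicographically, so string comparison is enough.
        """
        cols = self.rows.get(row, {})
        return [(c, cols[c]) for c in sorted(cols) if start <= c <= end]


def average(store, series, start, end):
    """Combine the ':sum' and ':count' rows of a series at read time."""
    sums = dict(store.slice_row(series + ":sum", start, end))
    counts = dict(store.slice_row(series + ":count", start, end))
    return {c: sums[c] / counts[c] for c in sums if counts.get(c)}


store = CounterStore()
# Values taken from the example table on the slide.
store.incr("cluster:web:load1:sum", "2011-01-12T10:00", 576)
store.incr("cluster:web:load1:count", "2011-01-12T10:00", 100)
store.incr("cluster:web:load1:sum", "2011-01-12T10:01", 505)
store.incr("cluster:web:load1:count", "2011-01-12T10:01", 95)

averages = average(store, "cluster:web:load1",
                   "2011-01-12T10:00", "2011-01-12T10:01")
```

Storing the sum and the count separately keeps every write a plain increment; the division happens only when someone reads the series.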
    9. Aggregation
       ‣ Measured every minute (or continuously)
       ‣ Rollup to coarser granularities
       ‣ More Counters! (aka, let’s do it live)
    10. Aggregation

        Minutes
                                  2011-01-12T10:00  2011-01-12T10:01  ...
         host:web1:load1:sum      5                 4                 ...
         host:web1:load1:count    1                 1                 ...
         host:web2:load1:sum      4                 3                 ...
         host:web2:load1:count    1                 1                 ...
         cluster:web:load1:sum    576               505               ...
         cluster:web:load1:count  100               95                ...

        Hours
                                  2011-01-12T10     2011-01-12T11     ...
         host:web1:load1:sum      300               250               ...
         host:web1:load1:count    60                59                ...
         host:web2:load1:sum      4                 3                 ...
         host:web2:load1:count    61                60                ...
         cluster:web:load1:sum    3010              2995              ...
         cluster:web:load1:count  6000              5990              ...
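
One way to read "More Counters!" is that each measurement increments a counter at every granularity when it is written, so no batch rollup job is needed. The sketch below assumes that reading; `BUCKETS`, `counters`, and `record` are hypothetical names, and a dict again stands in for the counter store.

```python
from collections import defaultdict
from datetime import datetime

# granularity name -> strftime pattern that truncates a timestamp to that bucket
BUCKETS = {"minutes": "%Y-%m-%dT%H:%M", "hours": "%Y-%m-%dT%H"}

# (granularity, row key) -> bucket -> counter value
counters = defaultdict(lambda: defaultdict(int))


def record(series, value, when):
    """Add one measurement to the sum and count counters at every granularity."""
    for granularity, pattern in BUCKETS.items():
        bucket = when.strftime(pattern)
        counters[(granularity, series + ":sum")][bucket] += value
        counters[(granularity, series + ":count")][bucket] += 1


# Two one-minute measurements for the same host, as in the Minutes table.
record("host:web1:load1", 5, datetime(2011, 1, 12, 10, 0))
record("host:web1:load1", 4, datetime(2011, 1, 12, 10, 1))

minute_sum = counters[("minutes", "host:web1:load1:sum")]["2011-01-12T10:00"]
hour_sum = counters[("hours", "host:web1:load1:sum")]["2011-01-12T10"]
hour_count = counters[("hours", "host:web1:load1:count")]["2011-01-12T10"]
```

Here both measurements land in separate minute buckets but in the same hour bucket, so the hourly sum and count grow as the minute data arrives.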
    11. Aggregation
        ‣ other dimensions besides time:
          ‣ clusters
          ‣ racks / dcs, etc
        ‣ And combinations of the above
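
Persisting every combination of dimensions separately means one write fans out to one counter row per non-empty subset of dimensions. The following sketch illustrates that fan-out; the key format, the dimension values (`r1`, `r2`, `dc1`), and the names `rollup_keys` and `record` are assumptions made for the example, not the scheme used in production.

```python
from collections import defaultdict
from itertools import combinations

# (row key, time bucket) -> counter value
counters = defaultdict(int)


def rollup_keys(dimensions, metric):
    """Yield one row key per non-empty combination of dimensions."""
    names = sorted(dimensions)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            prefix = ",".join(f"{name}={dimensions[name]}" for name in combo)
            yield f"{prefix}:{metric}"


def record(dimensions, metric, value, bucket):
    """Increment the sum and count counters for every dimension combination."""
    for key in rollup_keys(dimensions, metric):
        counters[(key + ":sum", bucket)] += value
        counters[(key + ":count", bucket)] += 1


record({"cluster": "web", "rack": "r1", "dc": "dc1"}, "load1", 5, "2011-01-12T10:00")
record({"cluster": "web", "rack": "r2", "dc": "dc1"}, "load1", 4, "2011-01-12T10:00")

cluster_sum = counters[("cluster=web:load1:sum", "2011-01-12T10:00")]
rack_sum = counters[("rack=r1:load1:sum", "2011-01-12T10:00")]
keys_per_write = len(list(rollup_keys({"cluster": "web", "rack": "r1", "dc": "dc1"}, "load1")))
```

With n dimensions each write touches 2^n - 1 keys (7 for the three dimensions here), which is one reason this style of aggregation is storage-intensive.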
    12. Pros / Cons
        ‣ Pros
          ‣ real-time data (average 30s between measurement and visibility)
          ‣ real-time aggregation
          ‣ flexible data retention (once counters and TTLs work together)
        ‣ Cons
          ‣ Storage-intensive
          ‣ Slow reads
    13. Questions?
        ryan@twitter.com
        twitter.com/rk
    14. Obligatory Plug
        twitter.com/jobs
