Cassandra at Twitter - Distributed Counters
 

  • This is our most successful use case. We had a general need for realtime, high-scale time series data.
  • Counters work in trunk. Some things, like averages, can be modeled as several counters that get combined at read time.
  • You can aggregate by many dimensions at once. Every combination is persisted separately.

Presentation Transcript

  • Cassandra at Twitter (one use case). Ryan King, Cassandra Meetup, January 12, 2011
  • History
  • History
      ‣ Started port of Tweets to Cassandra, June 2009
      ‣ Started other Cassandra projects in 2009 (more on this later)
      ‣ Abandoned the Tweets port in 2010
  • Time Series Data
  • Use cases
      ‣ realtime traffic/engagement analytics
      ‣ systems monitoring
  • Time Series Data
      ‣ write heavy
      ‣ stored temporally
      ‣ viewed temporarily
      ‣ hierarchical aggregation
  • Data Model
      ‣ Distributed Counters (CASSANDRA-1072)
      ‣ each time series is a row (or rows) of counters
      ‣ slice over rows to get recent data
  • Data Model
      ‣ An example (not exactly the way we do it):
                                2011-01-12T10:00  2011-01-12T10:01  ...
        host:web1:load1         5                 4                 ...
        host:web2:load1         4                 3                 ...
        cluster:web:load1:sum   576               505               ...
        cluster:web:load1:count 100               95                ...
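The row-of-counters layout above can be sketched with an in-memory stand-in. This is not the Cassandra counter API from CASSANDRA-1072. The names `counters`, `incr`, and `recent` are hypothetical, and the sketch assumes columns are named by ISO-formatted minute timestamps so that they sort chronologically.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# In-memory stand-in for a counter column family: row key -> {column -> count}.
# Assumption: column names are ISO minute timestamps, which sort chronologically.
counters = defaultdict(lambda: defaultdict(int))

def minute_bucket(ts):
    """Name of the one-minute column that a timestamp falls into."""
    return ts.strftime("%Y-%m-%dT%H:%M")

def incr(row_key, ts, amount=1):
    """Add `amount` to the counter for this time series and minute."""
    counters[row_key][minute_bucket(ts)] += amount

def recent(row_key, now, minutes):
    """Slice the last `minutes` one-minute columns of a row, oldest first."""
    start = minute_bucket(now - timedelta(minutes=minutes - 1))
    end = minute_bucket(now)
    row = counters[row_key]
    return [(col, row[col]) for col in sorted(row) if start <= col <= end]

t0 = datetime(2011, 1, 12, 10, 0)
incr("host:web1:load1", t0, 5)
incr("host:web1:load1", t0 + timedelta(minutes=1), 4)
incr("host:web2:load1", t0, 4)
incr("host:web2:load1", t0 + timedelta(minutes=1), 3)

latest = recent("host:web1:load1", t0 + timedelta(minutes=1), minutes=2)
```

Here `latest` holds the two most recent minute columns for `host:web1:load1`, matching the first row of the example table.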
  • Aggregation
      ‣ Measured every minute (or continuously)
      ‣ Rollup to coarser granularities
      ‣ More Counters! (aka, let’s do it live)
  • Aggregation
    Minutes                     2011-01-12T10:00  2011-01-12T10:01  ...
    host:web1:load1:sum         5                 4                 ...
    host:web1:load1:count       1                 1                 ...
    host:web2:load1:sum         4                 3                 ...
    host:web2:load1:count       1                 1                 ...
    cluster:web:load1:sum       576               505               ...
    cluster:web:load1:count     100               95                ...

    Hours                       2011-01-12T10     2011-01-12T11     ...
    host:web1:load1:sum         300               250               ...
    host:web1:load1:count       60                59                ...
    host:web2:load1:sum         4                 3                 ...
    host:web2:load1:count       61                60                ...
    cluster:web:load1:sum       3010              2995              ...
    cluster:web:load1:count     6000              5990              ...
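One way to read the sum/count tables above: each measurement increments a `:sum` and a `:count` counter at every granularity as it is written, and an average is computed only at read time. The sketch below is an in-memory illustration under that assumption, not the production write path. `GRANULARITIES`, `store`, `record`, and `average` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime

# Granularity name -> strftime format used to name the time-bucket column.
GRANULARITIES = {"minutes": "%Y-%m-%dT%H:%M", "hours": "%Y-%m-%dT%H"}

# (granularity, row key) -> {time bucket -> counter value}
store = defaultdict(lambda: defaultdict(int))

def record(metric, ts, value):
    """Write path: bump the sum and count counters at every granularity."""
    for gran, fmt in GRANULARITIES.items():
        bucket = ts.strftime(fmt)
        store[(gran, metric + ":sum")][bucket] += value
        store[(gran, metric + ":count")][bucket] += 1

def average(gran, metric, bucket):
    """Read path: combine the two counters; None if nothing was recorded."""
    count = store[(gran, metric + ":count")].get(bucket, 0)
    if count == 0:
        return None
    return store[(gran, metric + ":sum")][bucket] / count

record("host:web1:load1", datetime(2011, 1, 12, 10, 0), 5)
record("host:web1:load1", datetime(2011, 1, 12, 10, 1), 4)

minute_avg = average("minutes", "host:web1:load1", "2011-01-12T10:00")
hour_avg = average("hours", "host:web1:load1", "2011-01-12T10")
```

Because the hourly counters are incremented in the same write as the minute counters, no separate rollup job is needed in this sketch. The cost is one extra pair of increments per granularity for every measurement.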
  • Aggregation
      ‣ other dimensions besides time:
          ‣ clusters
          ‣ racks / DCs, etc.
      ‣ And combinations of the above
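If every combination of dimensions is persisted as its own row, the set of row keys to increment for one measurement might be generated as below. The key format loosely follows the `host:web1:load1` style of the earlier example, but the function `rollup_keys`, the dimension values `dc1` and `r7`, and the ordering of dimensions within a key are assumptions made for illustration.

```python
from itertools import combinations

def rollup_keys(dimensions, metric):
    """Return one row key per non-empty combination of dimensions.

    `dimensions` maps a dimension name to this measurement's value,
    e.g. {"cluster": "web", "dc": "dc1"}. Names are sorted so that the
    same combination always produces the same key.
    """
    names = sorted(dimensions)
    keys = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            prefix = ":".join(f"{name}:{dimensions[name]}" for name in combo)
            keys.append(f"{prefix}:{metric}")
    return keys

# Hypothetical dimension values for a single load measurement.
keys = rollup_keys({"cluster": "web", "dc": "dc1", "rack": "r7"}, "load1")
```

With n dimensions this produces 2^n - 1 keys per measurement (7 for the three dimensions here), so the number of counters written grows quickly as dimensions are added.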
  • Pros / Cons
      ‣ Pros
          ‣ real-time data (average 30s between measurement and visibility)
          ‣ real-time aggregation
          ‣ flexible data retention (once counters and TTLs work together)
      ‣ Cons
          ‣ Storage-intensive
          ‣ Slow reads
  • Questions? ryan@twitter.com, twitter.com/rk
  • Obligatory Plug: twitter.com/jobs