
Cassandra sf meetup_2013_07_31

Published in: Technology, Business

  1. C* @Disqus · July 31, 2013 · Cassandra SF Meetup · Thursday, August 1, 13
  2. INTRO: Who am I? Software engineer at Disqus. Built the current data pipeline. Enjoy working on large ecosystems.
  3. SO YOU MADE SOME ANALYTICS: 200,000 unique users creating 1,000,000 unique comments on 1,000,000 unique articles on 20,000 unique websites. We needed to build a system to track events from across the Disqus network. On a given day we have 4*10^21 (4,000,000,000,000,000,000,000, that is 4 sextillion, zetta scale) potential combinations PER DAY.
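
The 4*10^21 figure on slide 3 is the product of the four cardinalities listed there. A minimal check of that arithmetic (the variable names are illustrative, not from the talk):

```python
# Cardinalities quoted on the slide.
users = 200_000
comments = 1_000_000
articles = 1_000_000
websites = 20_000

# Every (user, comment, article, website) tuple is a potential combination.
combinations = users * comments * articles * websites

print(f"{combinations:,}")  # 4,000,000,000,000,000,000,000
```
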
  4. INTRO: THE BIG ONE
  5. DESIGNING THE SYSTEM
  6. GOALS: (1) Scalable and available data pipeline. (2) Ability to query and join large data sets. (3) Ability to access a subset in real time.
  7. GOALS: (1) Scalable and available data pipeline. (2) Ability to query and join large data sets. (3) Ability to access a subset in real time. This is where Cassandra comes in.
  8. DATA FORMAT: You need a format for your data.
  9. DATA FORMAT: You need a format for your data. Options: Avro, Thrift, Protobuf, JSON.
  10. DATA FORMAT: We chose JSON (from Avro, Thrift, Protobuf, JSON).
  11. DATA FORMAT: At Disqus we do comments.

        {
          "category": "comment",
          "data": {
            "text": "What's going on",
            "author": "gjcourt"
          },
          "meta": {
            "endpoint": "/event.js",
            "useragent": {
              "flavor": { "version": "X" },
              "browser": { "version": "6.0", "name": "Safari" }
            }
          },
          "timestamp": 1375228800
        }
  16. RANDOM ASIDE: Handling time in Python is a pain in the ass. From the documentation for time.time(): "Return the time in seconds since the epoch as a floating point number. Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second. While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls."
  17. RANDOM ASIDE: Handling time in Python is a pain in the ass. Same time.time() documentation excerpt as slide 16, followed by:

        >>> print time.time(); print time.mktime(time.gmtime())
        1375244678.64
        1375273478.0
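
The two numbers printed on slide 17 differ by roughly 28,800 seconds (8 hours). That is consistent with time.mktime() treating the UTC struct_time as local time on a machine set to US Pacific time, though the slide does not say so. A minimal sketch of a UTC-safe round trip using only the standard library (the helper name utc_epoch_seconds is illustrative):

```python
import calendar
import time

def utc_epoch_seconds() -> int:
    """Current Unix time, derived from a UTC struct_time."""
    # calendar.timegm is the inverse of time.gmtime: it reads the tuple as UTC.
    return calendar.timegm(time.gmtime())

now = time.time()
utc_seconds = utc_epoch_seconds()

# time.mktime reads the same tuple as *local* time, so on a machine whose
# zone is not UTC the result is shifted by the zone offset.
skew = time.mktime(time.gmtime()) - now
print("timegm(gmtime()) - time():", round(utc_seconds - now))
print("mktime(gmtime()) - time():", round(skew))
```

On a host configured for UTC both differences come out near zero; elsewhere only the first one does.
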
  18. PICKING A DATABASE IS HARD
  19. PICKING A DATABASE: Mainly because there are so many choices.
  20. PICKING A DATABASE: In an early startup, opportunity cost is king. While the choice of a system is important, there is a range of possible choices. A system that provides value is more important than choosing a local maximum.
  21. PICKING A DATABASE: We need a large sparse matrix. Requires horizontal scalability. Fast reads and inserts. High cardinality.
  22. PICKING A DATABASE: We need a large sparse matrix. Requires horizontal scalability. Fast reads and inserts. High cardinality. That rules out most RDBMSs.
  23. PICKING A DATABASE: We chose Cassandra.
  25. PICKING A DATABASE: What made the difference. We wanted counters, and Cassandra 0.8.0 has this capability. Fast inserts and reads. Tunable consistency guarantees. Simple data model.
  26. DESIGNING A DATA MODEL
  27. DATA THAT SCALES: (1) High volume sparse matrix (billions of dimensions). (2) Fast and accurate counters. (3) Scalable and available.
  28. DATA MODEL: How do you store arbitrary dimensionality over time? Cassandra is a 2D sorted array.
  29. DATA MODEL: A simple way to build a counter.

        CREATE TABLE counts (
          key text,
          time_dimension text,
          value counter,
          PRIMARY KEY (key, time_dimension)
        );
  30. DATA MODEL: A simple way to build a counter.

        +---------+------+--------+-----------+-------------+
        |         | 2013 | 2013.7 | 2013.7.30 | 2013.7.30.0 |
        +---------+------+--------+-----------+-------------+
        | comment | 1000 |    100 |        10 |           1 |
        +---------+------+--------+-----------+-------------+
  31. DATA MODEL: A simple way to build a counter. Dimensions are easy.

        +------------------------+------+--------+-----------+-------------+
        |                        | 2013 | 2013.7 | 2013.7.30 | 2013.7.30.0 |
        +------------------------+------+--------+-----------+-------------+
        | comment                | 1000 |    100 |        10 |           1 |
        | comment.author.gjcourt |   23 |     17 |         7 |           1 |
        +------------------------+------+--------+-----------+-------------+
  32. DATA MODEL: And if you increment the time bucket 2013-07-31. Dimensions are easy.

        +------------------------+------+--------+-----------+-------------+
        |                        | 2013 | 2013.7 | 2013.7.30 | 2013.7.30.0 |
        +------------------------+------+--------+-----------+-------------+
        | comment                | 1001 |    101 |        10 |           1 |
        | comment.author.gjcourt |   24 |     18 |         7 |           1 |
        +------------------------+------+--------+-----------+-------------+
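
The row and column layout on slides 30 to 32 can be sketched in a few lines. This is an in-memory illustration only, not the Disqus implementation: the dict stands in for the counts table, and the event shape and the helper names (time_buckets, row_keys, record) are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-in for the `counts` table: (row key, time bucket) -> count.
counts = defaultdict(int)

def time_buckets(ts):
    """Year, month, day, and hour buckets in the slides' dotted notation."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    return [
        f"{t.year}",
        f"{t.year}.{t.month}",
        f"{t.year}.{t.month}.{t.day}",
        f"{t.year}.{t.month}.{t.day}.{t.hour}",
    ]

def row_keys(event):
    """One row for the category plus one per (dimension, value) pair."""
    category = event["category"]
    extra = [f"{category}.{dim}.{val}"
             for dim, val in sorted(event["dimensions"].items())]
    return [category] + extra

def record(event):
    """Increment every (row, bucket) cell the event touches."""
    for key in row_keys(event):
        for bucket in time_buckets(event["timestamp"]):
            counts[(key, bucket)] += 1

# 1375228800 is 2013-07-31 00:00:00 UTC, the timestamp used on slide 11.
record({"category": "comment",
        "dimensions": {"author": "gjcourt"},
        "timestamp": 1375228800})
```

A single event therefore touches 2 rows times 4 buckets, which is why the year and month cells move on slide 32 while the 2013.7.30 cells stay put.
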
  33. DATA MODEL: Some major disadvantages. All time intervals are in the same row. Queries are non-linear. Time buckets are in lexical order. Dimensions cannot be indexed. Rows can grow unbounded.
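
The "lexical order" complaint on slide 33 is easy to reproduce: text bucket names sort as strings, not as dates. A small illustration follows; the zero-padding helper is one possible workaround and is an assumption, not something the talk proposes (the talk moves to a timestamp column instead).

```python
buckets = ["2013.7.9", "2013.7.10", "2013.7.30"]

# String comparison puts "10" and "30" before "9".
print(sorted(buckets))  # ['2013.7.10', '2013.7.30', '2013.7.9']

def pad_bucket(bucket):
    """Zero-pad every component after the year to two digits."""
    year, *rest = bucket.split(".")
    return ".".join([year] + [part.zfill(2) for part in rest])

padded = sorted(pad_bucket(b) for b in buckets)
print(padded)  # ['2013.07.09', '2013.07.10', '2013.07.30']
```
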
  34. DATA MODEL: A better version of counters.

        +--------------+------+
        |              | 2013 |
        +--------------+------+
        | comment.year | 1000 |
        +--------------+------+

        +---------------+--------+--------+--------+
        |               | 2013.5 | 2013.6 | 2013.7 |
        +---------------+--------+--------+--------+
        | comment.month |     96 |     78 |    100 |
        +---------------+--------+--------+--------+

        +-------------+-----------+-----------+-----------+
        |             | 2013.7.28 | 2013.7.29 | 2013.7.30 |
        +-------------+-----------+-----------+-----------+
        | comment.day |         8 |         6 |        13 |
        +-------------+-----------+-----------+-----------+
  35. DATA MODEL: This is a large improvement. Efficient range queries. Rollups are possible.
  36. DATA MODEL: However, it still has some problems. Dimensions are not indexed. Rows can grow unbounded.
  37. DATA MODEL: Remember the schema.

        CREATE TABLE counts (
          key text,
          time_dimension text,
          value counter,
          PRIMARY KEY (key, time_dimension)
        );
  39. DATA MODEL: Remember the schema. Should time_dimension be a <timestamp>?

        CREATE TABLE counts (
          key text,
          time_dimension text,
          value counter,
          PRIMARY KEY (key, time_dimension)
        );
  40. DATA MODEL: A better version of counters.

        CREATE TABLE better_counts (
          key text,
          time_dimension 'org.apache.cassandra.db.marshal.ReversedType' <timestamp>,
          value counter,
          PRIMARY KEY (key, time_dimension)
        );
  41. DATA MODEL: The problem with counters. Operations are NOT idempotent. Limited protection against overcounting. https://issues.apache.org/jira/browse/CASSANDRA-4775
  42. DATA MODEL: And you end up having to write code like this.

        def swallow_cassandra_timeouts(func):
            @wraps(func)
            def inner(*args, **kwargs):
                try:
                    return func(*args, **kwargs)
                except TimedOutException, e:
                    logger.warning("processor.pycassa.exception.timeout")
                except UnavailableException, e:
                    # raise so that we retry this batch
                    logger.error("processor.pycassa.exception.unavailable")
                    raise CassandraError(e)
                except MaximumRetryException, e:
                    logger.warning("processor.pycassa.exception.max_retry")
                except Exception, e:
                    logger.error("processor.pycassa.exception.unknown")
                    raise
            return inner
  43. DATA MODEL: And this.

        if LOCAL:
            CASSANDRA_TIMEOUT = 60
            CASSANDRA_RETRIES = 0
        elif "prod" in hostname:
            CASSANDRA_TIMEOUT = 2  # Seconds
            CASSANDRA_RETRIES = 0  # None
        elif "storm" in hostname:
            CASSANDRA_TIMEOUT = 0.2
            CASSANDRA_RETRIES = 0
        else:  # proxy (read only)
            CASSANDRA_TIMEOUT = 60
            CASSANDRA_RETRIES = 3
  44. DATA MODEL: And this too.

        CASSANDRA_CONFIG = {
            'stats': {
                'pool': PoolConfig(CASSANDRA_TIMEOUT, CASSANDRA_RETRIES, CASSANDRA_POOL_SIZE),
                'cf': {
                    'counts': ColumnFamilyConfig(ConsistencyLevel.LOCAL_QUORUM, ConsistencyLevel.ONE),
                    'durable_counts': ColumnFamilyConfig(ConsistencyLevel.LOCAL_QUORUM, ConsistencyLevel.LOCAL_QUORUM),
                    'sets': ColumnFamilyConfig(ConsistencyLevel.LOCAL_QUORUM, ConsistencyLevel.LOCAL_QUORUM),
                }
            }
        }
  45. DATA MODEL: And operations to Cassandra look like this.

        @swallow_cassandra_timeouts
        def side_effecting_function():
            # insert/update into cassandra
            pass
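
The decorator on slide 42 is Python 2 code written against pycassa. Below is a self-contained Python 3 sketch of the same pattern. The exception classes are local stand-ins (assumptions, not pycassa's types), and the comment about why timeouts are dropped is one plausible reading of slides 41 and 42 rather than something the talk states outright.

```python
import logging
from functools import wraps

logger = logging.getLogger("processor")

# Stand-ins for the driver's exception types (hypothetical, not pycassa's).
class TimedOut(Exception): ...
class Unavailable(Exception): ...
class CassandraError(Exception): ...

def swallow_timeouts(func):
    """Log and drop timeouts; escalate unavailability so the batch is retried."""
    @wraps(func)
    def inner(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except TimedOut:
            # The write may or may not have been applied. Retrying a counter
            # increment could count it twice, so the error is only logged.
            logger.warning("timeout")
            return None
        except Unavailable as exc:
            logger.error("unavailable")
            raise CassandraError(str(exc)) from exc
    return inner

@swallow_timeouts
def increment(fail_with=None):
    """Pretend write: raises the given exception, otherwise succeeds."""
    if fail_with is not None:
        raise fail_with
    return "ok"
```

Calling increment(TimedOut()) returns None, while increment(Unavailable("down")) surfaces as CassandraError for the caller to retry.
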
  46. DATA MODEL: Durable counts.

        CREATE TABLE durable_counts (
          key text,
          time_dimension 'org.apache.cassandra.db.marshal.ReversedType'<timestamp>,
          random uuid,
          value int,
          PRIMARY KEY (key, time_dimension, random)
        );
  47. DATA MODEL: Durable counts. Each row (comment.year, comment.month, comment.day) holds the same two cells, keyed by timestamp and uuid.

        +---------------+--------------------------------------+--------------------------------------+
        |               | 2013-07-30 05:21:38+0000             | 2013-07-30 05:23:44+0000             |
        |               | eb401386-f420-11e2-a26b-002590024b08 | b320a95c-f240-11e2-a26b-002590024b08 |
        +---------------+--------------------------------------+--------------------------------------+
        | comment.year  | 20                                   | 50                                   |
        | comment.month | 20                                   | 50                                   |
        | comment.day   | 20                                   | 50                                   |
        +---------------+--------------------------------------+--------------------------------------+
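
One way to read the durable_counts schema is that each write lands in its own cell identified by (key, timestamp, uuid), so replaying the same write overwrites rather than adds. The sketch below contrasts that with a running-total counter. It is an in-memory illustration under that assumption; the dicts and function names are not from the talk.

```python
import uuid

state = {"counter": 0}   # counter-style: only a running total is kept
durable = {}             # durable-style: (key, timestamp, uuid) -> value

def counter_increment(n):
    """Non-idempotent: applying the same increment twice adds twice."""
    state["counter"] += n

def durable_write(key, ts, event_id, n):
    """Idempotent: writing the same cell twice leaves one cell."""
    durable[(key, ts, event_id)] = n

def durable_total(key):
    """Reading a durable count means summing the row's cells."""
    return sum(v for (k, _, _), v in durable.items() if k == key)

event_id = uuid.uuid4()
# Simulate a client that retries after a timeout whose write had succeeded.
for _attempt in range(2):
    counter_increment(20)
    durable_write("comment.day", "2013-07-30 05:21:38+0000", event_id, 20)

print(state["counter"])              # 40: the retry was counted twice
print(durable_total("comment.day"))  # 20: the retry overwrote the same cell
```

The trade-off is that reads must sum cells, and rows grow with every write.
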
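  48. DATA MODEL: And even doing all that hackery (5 weekdays of countable data):

        +------------+------------+-----------+-------------------+-----------+
        | Hive count | C* counter | % Similar | C* durable counts | % Similar |
        +------------+------------+-----------+-------------------+-----------+
        | 8101       | 8179       | 99.046338 | 8179              | 99.046338 |
        | 7328       | 7390       | 99.161028 | 7390              | 99.161028 |
        | 6255       | 6304       | 99.222715 | 6304              | 99.222715 |
        | 6604       | 6665       | 99.150141 | 6665              | 99.150141 |
        | 7700       | 7766       | 99.150141 | 7766              | 99.150141 |
        +------------+------------+-----------+-------------------+-----------+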
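
The "% Similar" column on slide 48 appears to be the Hive count divided by the Cassandra count, expressed as a percentage. That formula is an assumption inferred from the numbers. Recomputing it reproduces four of the five rows. The fourth row (6604 against 6665) comes out near 99.08 rather than the 99.150141 shown, which may be a copy slip on the slide, or the counts themselves may be mistyped.

```python
# (Hive count, C* count) pairs as transcribed from the slide.
rows = [
    (8101, 8179),
    (7328, 7390),
    (6255, 6304),
    (6604, 6665),
    (7700, 7766),
]

def percent_similar(hive, cassandra):
    """Assumed definition: Hive count as a percentage of the C* count."""
    return 100.0 * hive / cassandra

for hive, cass in rows:
    print(hive, cass, f"{percent_similar(hive, cass):.6f}")
```
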
  49. DATA MODEL: Over 99% accuracy. 100% (allegedly) counter parity.
  50. DATA MODEL: Since our data is time series, what if you could view it that way?
  51. DATA MODEL: With arbitrary dimensionality.
  52. DATA MODEL: With arbitrary multi-dimensionality.
  53. DATA MODEL: Sets (our first iteration).

        CREATE TABLE sets (
          key text,
          time_dimension timestamp,
          element blob,
          value double,
          PRIMARY KEY (key, time_dimension)
        );

      Insert-only workload. Items are deleted by TTL.
  54. DATA MODEL: Better sets.

        CREATE TABLE sets (
          key text,
          time_dimension timestamp,
          element blob,
          deleted boolean,
          value double,
          PRIMARY KEY (key, time_dimension)
        );

      Insert-only workload. When you want to delete, you insert with deleted set to true. Reads require you to iterate over all columns in chronological order. You sum values to calculate a score.
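
The read path described on slide 54 can be sketched as a replay of cells in time order. The slide does not spell out how the deleted flag and the score interact, so the rule used here (a later delete removes the element, and the score sums the values of the surviving elements) is an assumption, as are the Cell type and the read_set name.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    time: int        # stand-in for time_dimension
    element: bytes
    deleted: bool
    value: float

def read_set(cells):
    """Replay cells oldest-first; return live members and their summed score."""
    members = {}
    for cell in sorted(cells, key=lambda c: c.time):
        if cell.deleted:
            # A delete is just another insert with the flag set.
            members.pop(cell.element, None)
        else:
            members[cell.element] = cell.value
    return members, sum(members.values())

cells = [
    Cell(3, b"a", True, 0.0),    # "a" is deleted after being added
    Cell(1, b"a", False, 1.5),
    Cell(2, b"b", False, 2.0),
    Cell(4, b"c", False, 0.5),
]
members, score = read_set(cells)
print(sorted(members), score)  # [b'b', b'c'] 2.5
```
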
  55. DATA MODEL: Counters with indexable dimensions.

        CREATE TABLE catalog (
          key text,
          time_dimension 'org.apache.cassandra.db.marshal.ReversedType' <timestamp>,
          dimension_1 text,
          dimension_1_val text,
          dimension_2 text,
          dimension_2_val text,
          ...
          value counter,
          PRIMARY KEY (key, time_dimension)
        );
  56. DATA MODEL: Dimension catalog.

        CREATE TABLE catalog (
          key text,
          dimension text,
          value text,
          PRIMARY KEY (key, dimension)
        );
  57. DATA MODEL: Dimension catalog.

        CREATE TABLE catalog (
          key text,
          dimension text,
          value text,
          PRIMARY KEY (key, dimension)
        );

        cqlsh:> insert into catalog (key, dimension, value) values ('comment', 'author', 'gjcourt');
        cqlsh:> insert into catalog (key, dimension, value) values ('comment', 'forum', 'disqus');
        cqlsh:> select dimension from catalog where key='comment';

         dimension
        -----------
            author
             forum
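
The catalog on slide 57 acts as a lookup from an event key to the dimensions recorded for it. A minimal in-memory sketch, assuming (this is not stated in the talk) that the catalog is then used to derive counter row keys such as comment.author.gjcourt from slide 31. All function names are illustrative.

```python
from collections import defaultdict

# Stand-in for the `catalog` table: key -> {dimension: value}.
# As with PRIMARY KEY (key, dimension), a second insert for the same
# (key, dimension) pair overwrites the first.
catalog = defaultdict(dict)

def register(key, dimension, value):
    catalog[key][dimension] = value

def dimensions(key):
    """Dimensions known for a key, in sorted (clustering) order."""
    return sorted(catalog[key])

def counter_row_keys(key):
    """Counter row keys in the slides' dotted notation."""
    return [f"{key}.{dim}.{val}" for dim, val in sorted(catalog[key].items())]

register("comment", "author", "gjcourt")
register("comment", "forum", "disqus")

print(dimensions("comment"))        # ['author', 'forum']
print(counter_row_keys("comment"))  # ['comment.author.gjcourt', 'comment.forum.disqus']
```
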
  58. WHERE ARE WE GOING
  59. OUR 2013 MISSIONS: (1) Evolve content recommendation and advertising. (2) Productize our data pipeline. (3) Explore new and interesting data products.
  60. THE FUTURE: casscached. Comparable performance. 2GB max “key” (instead of 1MB). Tunable consistency levels. Useful for SSI, mat-views.
  61. THE FUTURE: Postgres Foreign Data Wrapper. Could use a cass_fdw.
  62. THE FUTURE: Graph of users and views. The Netflix algorithm: all articles that people who have viewed the thread I’m currently viewing have also viewed.

        g.V('username','gjcourt').out('thread_views').in('thread_views').except('username', 'gjcourt')
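
The "also viewed" idea on slide 62 can be illustrated without a graph database. The sketch below follows the slide's prose description rather than translating the Gremlin expression step by step, and the view log is made-up sample data.

```python
from collections import Counter

# Hypothetical view log: username -> set of thread ids viewed.
views = {
    "gjcourt": {"t1", "t2"},
    "alice": {"t1", "t3"},
    "bob": {"t1", "t3", "t4"},
    "carol": {"t5"},
}

def also_viewed(current_thread, me):
    """Threads seen by other viewers of current_thread, most co-viewed first."""
    tally = Counter()
    for user, threads in views.items():
        if user != me and current_thread in threads:
            # Count every other thread this co-viewer has seen.
            tally.update(threads - {current_thread})
    return [thread for thread, _ in tally.most_common()]

print(also_viewed("t1", "gjcourt"))  # ['t3', 't4']
```
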
  63. C* @Disqus · July 31, 2013 · Cassandra SF Meetup. Thanks for listening. We’re hiring: http://disqus.com/jobs/
