Welcome to Apache Cassandra Day Silicon Valley 2014: Keynote by Chief Evangelist, Patrick McFadin

Keynote Kick-Off for Apache Cassandra Day Silicon Valley 2014

Published in: Technology, Education

  1. 1. Keynote: The State of Cassandra. Patrick McFadin (@PatrickMcFadin), Chief Evangelist for Apache Cassandra. ©2013 DataStax Confidential. Do not distribute without consent.
  2. 2. Five Years of Cassandra (timeline figure; dates: Jul-08, Jun-09, Mar-10, Jan-11, Nov-11, Sep-12, Jul-13; releases: 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ... 2.0, and DSE)
  3. 3. Core values
       • Massively scalable
       • High performance
       • Reliable/Available
       (Figure labels: Cassandra, HBase, Redis, MySQL)
  4. 4. New Core Value
       • Massively scalable
       • High performance
       • Reliable/Available
       • Productivity + ease of use

       CREATE TABLE users (
         id uuid PRIMARY KEY,
         name text,
         state text,
         birth_date int
       );

       CREATE INDEX ON users(state);

       SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
  5. 5. Collections (the relational version below is marked with an X on the slide)

       CREATE TABLE users (
         id uuid PRIMARY KEY,
         name text,
         state text,
         birth_date int
       );

       CREATE TABLE users_addresses (
         user_id uuid REFERENCES users,
         email text
       );

       SELECT * FROM users NATURAL JOIN users_addresses;
  6. 6. Collections

       UPDATE users
       SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'};

       CREATE TABLE users (
         id uuid PRIMARY KEY,
         name text,
         state text,
         birth_date int,
         email_addresses set<text>
       );
  7. 7. Cassandra Community
  8. 8. Your community
  9. 9. We are growing!
  10. 10. Great projects! Cyanite (https://github.com/pyr/cyanite)
       • Storage backend for Graphite
       • Replacement of whisper
       • Plugs into carbon
       • Real users: Love it!
  11. 11. Great projects! The Kiji Project (http://www.kiji.org/2014/02/11/working-on-cassandra-integration-with-datastax/)
       • Real-time application framework
       • Product and content recommendation systems
       • Risk analysis and fraud monitoring
       • Customer profile and segmentation applications
  12. 12. Get Involved!
       • Talk at a meetup
       • 5 minute interview?
       • Stack Overflow
       • Mailing list
       • Apache Jira
  13. 13. Cassandra 2.0
  14. 14. Reads in a cluster (diagram: client, coordinator, and three replicas that are 40%, 90%, and 30% busy)
  15. 15. A failure (same diagram; the read is marked X: timeout)
  16. 16. Rapid read protection (same diagram; one request is marked X, the read ends in success)
  17. 17. Rapid Read Protection NONE
  18. 18. Latency (mid-compaction)
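
One way to read slides 14 to 16: if the replica the coordinator asked first does not answer within some threshold, the coordinator sends the same read to another replica and uses whichever answer arrives first. The Python sketch below models only that idea. The function name, the latency figures, and the threshold are illustrative assumptions, not Cassandra internals; in Cassandra 2.0 the behaviour is configured per table through the `speculative_retry` option.

```python
import math


def coordinate_read(replica_latencies_ms, timeout_ms, retry_after_ms=math.inf):
    """Model a single read handled by a coordinator (hypothetical helper).

    replica_latencies_ms: time each replica needs to answer, in the order the
        coordinator would try them; math.inf means the replica never answers.
    retry_after_ms: how long to wait for an answer before also asking the
        next replica; math.inf disables the extra request.
    Returns ("success", elapsed_ms) or ("timeout", timeout_ms).
    """
    fastest = math.inf  # earliest time at which any answer arrives
    sent_at = 0.0       # time at which the next request would be sent
    for latency in replica_latencies_ms:
        if sent_at >= timeout_ms or fastest <= sent_at:
            # Either the read has already timed out, or an answer arrived
            # before this extra request was due, so nothing more is sent.
            break
        fastest = min(fastest, sent_at + latency)
        sent_at += retry_after_ms
    if fastest <= timeout_ms:
        return ("success", fastest)
    return ("timeout", timeout_ms)


# First replica never answers, second one answers in 5 ms.
without_protection = coordinate_read([math.inf, 5.0], timeout_ms=100.0)
with_protection = coordinate_read([math.inf, 5.0], timeout_ms=100.0, retry_after_ms=10.0)
```

With the extra request disabled the read times out; with a 10 ms threshold the second replica is asked at 10 ms and the read succeeds at 15 ms.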
  19. 19. Cold data 10,000 req/s 5,000 req/s 4,000 req/s 10 req/s
  20. 20. Cold data 10,000 req/s 5,000 req/s 4,000 req/s 10 req/s
  21. 21. Cold data compaction 10 req/s 10,000 req/s
  22. 22. Cold Data Compaction (http://www.datastax.com/dev/blog/optimizations-around-cold-sstables)
       • Disabled in 2.0
       • Default in 2.1
       • Configuration per table:

       ALTER TABLE mykeyspace.mytable
       WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                          'cold_reads_to_omit': 0.05};
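
To make the `cold_reads_to_omit` setting concrete, here is a minimal Python sketch of the selection step, assuming the option means "leave out the coldest sstables as long as together they serve no more than this share of all reads". It uses the request rates from slides 19 to 21. The function and the sstable names are hypothetical, and the real strategy's hotness measure may differ from a plain read rate.

```python
def omit_cold_sstables(reads_per_sec, cold_reads_to_omit):
    """Split sstables into (kept, omitted) for compaction (hypothetical helper).

    reads_per_sec: dict mapping sstable name -> read rate.
    cold_reads_to_omit: largest share of all reads that the omitted
        sstables may account for together; 0 disables the feature.
    """
    if cold_reads_to_omit <= 0:
        return list(reads_per_sec), []
    total = sum(reads_per_sec.values())
    omitted = []
    cold_reads = 0.0
    # Walk from coldest to hottest and stop once the budget would be exceeded.
    for name, rate in sorted(reads_per_sec.items(), key=lambda item: item[1]):
        if total == 0 or (cold_reads + rate) / total > cold_reads_to_omit:
            break
        cold_reads += rate
        omitted.append(name)
    kept = [name for name in reads_per_sec if name not in omitted]
    return kept, omitted


# Read rates taken from the "Cold data" slides; names a-d are made up.
rates = {"a": 10000, "b": 5000, "c": 4000, "d": 10}
kept, omitted = omit_cold_sstables(rates, 0.05)
```

Here only the sstable serving 10 req/s is left out: it accounts for far less than 5% of reads, while adding the 4,000 req/s sstable would push the omitted share to about 21%.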
  23. 23. Cassandra 2.1
  24. 24. User defined types

       CREATE TYPE address (
         street text,
         city text,
         zip_code int,
         phones set<text>
       )

       CREATE TABLE users (
         id uuid PRIMARY KEY,
         name text,
         addresses map<text, address>
       )

       SELECT id, name, addresses.city, addresses.phones FROM users;

        id       | name    | addresses.city | addresses.phones
       ----------+---------+----------------+--------------------------
        63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}
  25. 25. Collection indexing

       CREATE TABLE songs (
         id uuid PRIMARY KEY,
         artist text,
         album text,
         title text,
         data blob,
         tags set<text>
       );

       CREATE INDEX song_tags_idx ON songs(tags);

       SELECT * FROM songs WHERE tags CONTAINS 'blues';

        id       | album         | artist            | tags                  | title
       ----------+---------------+-------------------+-----------------------+------------------
        5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
  26. 26. (UDT indexing?)
  27. 27. Counters++
       • simpler implementation, no more edge cases
       • possible to properly repair now
       • significantly less garbage and internode traffic generated
       • better performance for 99% of uses
       • RF>1, replicate_on_write=true
       • topology changes not leading to data loss (#4071)
       • commitlog now 100% safe to replay (#4417)
       • Internal format overhaul still coming in 3.0 (#6506)
  28. 28. What hasn't changed
       • same API
       • same average throughput
       • same restrictions on mixing counter and non-counter columns
       • same restrictions on mixing counter and non-counter updates
       • same restrictions on counter deletes
       • same retry limitations
  29. 29. Writes (low contention)
  30. 30. Writes (high contention)
  31. 31. Data directories (2.0)
       /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-CompressionInfo.db
       /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Data.db
       /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Filter.db
       /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Index.db
       /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Statistics.db
       /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Summary.db
       /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-TOC.txt
  32. 32. Data directories (2.1)
       /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4
       /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-CompressionInfo.db
       /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Data.db
       /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Filter.db
       /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Index.db
       /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Statistics.db
       /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Summary.db
       /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-TOC.txt
  33. 33. Inefficient bloom filters + = ?
  34. 34. Inefficient bloom filters + =
  35. 35. Inefficient bloom filters + =
  36. 36. Inefficient bloom filters
  37. 37. HyperLogLog applied
  38. 38. HLL and compaction
  39. 39. HLL and compaction
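
The slides in this part are mostly diagrams, so the mechanism is not spelled out in the transcript. A plausible reading is that when two sstables share keys, adding their key counts overestimates the keys in the compacted result, so a bloom filter sized from that sum is larger than needed, while merging HyperLogLog sketches estimates the size of the union directly. The Python sketch below illustrates that reading with a textbook HyperLogLog. The class, the key names, and the sstable contents are assumptions for illustration, not Cassandra code.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog over 64-bit hashes (illustrative only)."""

    def __init__(self, precision=12):
        self.precision = precision
        self.m = 1 << precision          # number of registers
        self.registers = [0] * self.m

    def add(self, key):
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.precision
        index = h >> width               # top bits choose the register
        rest = h & ((1 << width) - 1)    # remaining bits feed the rank
        # Rank is the position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other):
        # The sketch of a union is the register-wise maximum.
        merged = HyperLogLog(self.precision)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def sketch_of(keys):
    hll = HyperLogLog()
    for key in keys:
        hll.add(key)
    return hll


# Two hypothetical sstables that share 10,000 of their 30,000 keys each.
sstable_a = [f"key-{i}" for i in range(0, 30000)]
sstable_b = [f"key-{i}" for i in range(20000, 50000)]

naive_estimate = len(sstable_a) + len(sstable_b)  # 60000, counts shared keys twice
hll_estimate = sketch_of(sstable_a).merge(sstable_b_sketch := sketch_of(sstable_b)).estimate()
```

The true number of distinct keys is 50,000. The merged sketch should land within a few percent of that, whereas the naive sum is off by the full size of the overlap.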
  40. 40. More-efficient repair
  41. 41. More-efficient repair
  42. 42. More-efficient repair
  43. 43. Implications for LCS (and STCS)
  44. 44. Performance
       • Memtable memory use cut by 40% (85% by beta2)
       • Larger sstables, less compaction
       • ~50% better write performance
       • Full results after beta2
  45. 45. The new query cache
  46. 46. The new row cache

       CREATE TABLE notifications (
         target_user text,
         notification_id timeuuid,
         source_id uuid,
         source_type text,
         activity text,
         PRIMARY KEY (target_user, notification_id)
       )
       WITH CLUSTERING ORDER BY (notification_id DESC)
       AND caching = 'rows_only'
       AND rows_per_partition_to_cache = '3';
  47. 47. The new row cache

        target_user | notification_id | source_id | source_type | activity
       -------------+-----------------+-----------+-------------+-------------------------
        nick        | e1bd2bcb-       | d972b679- | photo       | jbellis liked
        nick        | 321998c-        | d972b679- | photo       | rbranson commented
        nick        | ea1c5d35-       | 88a049d5- | user        | mbulman created account
        nick        | 5321998c-       | 64613f27- | photo       | jbellis commented
        nick        | 07581439-       | 076eab7e- | user        | thobbs created account
        rbranson    | 1c34467a-       | f04e309f- | user        | jbellis created account
  48. 48. The new row cache (same table as slide 47)
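
As a rough illustration of what `rows_per_partition_to_cache = '3'` appears to mean for the notifications table, the Python sketch below keeps only the first three rows of each partition in clustering order (newest first). It is a model under stated assumptions, not Cassandra's cache: the helper name is made up, and small integers stand in for the truncated timeuuid values shown on the slide.

```python
from collections import defaultdict


def rows_to_cache(rows, rows_per_partition):
    """Pick the head of each partition (hypothetical helper).

    rows: iterable of (partition_key, clustering_key, payload) tuples.
    Returns a dict mapping partition_key -> the first `rows_per_partition`
    rows in descending clustering order.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return {
        key: sorted(part, key=lambda r: r[1], reverse=True)[:rows_per_partition]
        for key, part in partitions.items()
    }


# Integer clustering keys are stand-ins for notification_id (higher = newer).
notifications = [
    ("nick", 5, "jbellis liked"),
    ("nick", 4, "rbranson commented"),
    ("nick", 3, "mbulman created account"),
    ("nick", 2, "jbellis commented"),
    ("nick", 1, "thobbs created account"),
    ("rbranson", 1, "jbellis created account"),
]
cached = rows_to_cache(notifications, 3)
```

For the `nick` partition only the three newest notifications would be held in the cache; the `rbranson` partition has a single row, so it is cached whole.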
  49. 49. 2.1 roadmap
       • Beta1 today
       • Beta2 beginning of April
       • RC end of March
       • Final release end of April?
  50. 50. Questions?