Apache Cassandra in the Real World

Given at the Big Data Budapest meetup on 17 March, 2014.



  1. Apache Cassandra in the Real World. Jeremy Hanna, Support Engineer. ©2013 DataStax Confidential. Do not distribute without consent.
  2. Cassandra Design
     • Massive scalability
     • Multi-datacenter
     • High Performance
     • Reliability/Availability
       • no SPOF, no special roles
  3. Linear Scalability
     • Just add more servers
     • No special node roles
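
A minimal sketch of why "just add more servers" can work when no node has a special role: keys are hashed onto a ring and each node owns several ranges of it, so a new node takes over only a share of the keys. Cassandra's default partitioner is Murmur3 and its ring is configured through tokens; the MD5 hash, the `Ring` class, the node names, and the vnode count below are illustrative assumptions, not Cassandra's implementation.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is only a stand-in hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: every node is a peer that owns several ranges."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._tokens = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node claims `vnodes` positions spread around the ring.
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owner[t] = node

    def node_for(self, key: str) -> str:
        # First ring position at or after the key's token, wrapping around.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._owner[self._tokens[i % len(self._tokens)]]


keys = [f"user-{n}" for n in range(10_000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved, all to the new node")
```

Only keys that now fall into the new node's ranges change owner; the rest stay where they were, which is what keeps the cost of growing the cluster bounded.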
  4. Multi-DC Replication
  5. CAP Theorem
     • Select two: Consistency, Availability, Partition Tolerance
  6. Failure is expected
     • When bad things happen to good clusters
     • Rapid read protection
     • Consistency level guarantees
     • Anti-entropy services
       • read-repair
       • hinted handoff
       • regular repairs
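
The "consistency level guarantees" bullet can be made concrete with a little arithmetic. The sketch below assumes the usual rule of thumb: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names are hypothetical helpers, not part of any driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that form a majority: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > rf


rf = 3
print(quorum(rf))                                           # 2
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))   # True
print(reads_see_latest_write(rf, 1, 1))                     # False
```

With a replication factor of 3, QUORUM reads and writes overlap and still tolerate one replica being down, whereas reading and writing at ONE trades that guarantee for lower latency.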
  7. Cassandra Design
     • Massive scalability
     • High Performance
     • Reliability/Availability
     • Ease of use
  8. Developer friendly
     • CQL3
     • Collections (List, Map, Set)
     • User defined types (2.1)
     • Cassandra native drivers
     • Native paging
     • Tracing
     • DataStax DevCenter tool
     • Atomic batches
     • Lightweight transactions
     • Triggers
  9. CQL3 examples

       -- Keyspace replicated across two datacenters
       CREATE KEYSPACE shire WITH REPLICATION =
         {'class': 'NetworkTopologyStrategy', 'eu': 3, 'us-east': 2};

       SELECT * FROM emp WHERE empID IN (130, 104) ORDER BY deptID DESC;

       -- The row expires after 86400 seconds (one day)
       INSERT INTO excelsior.clicks (userid, url, date, name)
       VALUES (
         3715e600-2eb0-11e2-81c1-0800200c9a66,
         'http://cassandra.apache.org',
         '2013-10-09', 'Mary')
       USING TTL 86400;

       -- Lightweight transaction: applied only if the current email matches
       UPDATE users SET email = 'charlie@wonka.com'
       WHERE login = 'cbucket64'
       IF email = 'cbucket@wonka.com';

       CREATE USER bombadil WITH PASSWORD 'goldberry4ever' SUPERUSER;

       GRANT ALTER ON KEYSPACE shire TO gandalf;
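
The conditional `UPDATE ... IF email = ...` on the CQL3 slide is a compare-and-set. The sketch below shows only those semantics, in a single process on a plain dictionary. Cassandra coordinates the same decision across replicas with Paxos and reports the outcome in an `[applied]` column. The `update_if` helper is hypothetical.

```python
def update_if(table: dict, key: str, column: str, expected, new_value) -> bool:
    """Set table[key][column] to new_value only if it currently equals expected."""
    row = table.get(key)
    if row is None or row.get(column) != expected:
        return False  # condition failed, nothing is written
    row[column] = new_value
    return True


users = {"cbucket64": {"email": "cbucket@wonka.com"}}

# The first attempt matches the stored value and is applied.
first = update_if(users, "cbucket64", "email",
                  "cbucket@wonka.com", "charlie@wonka.com")

# A second, identical attempt finds the value already changed and is rejected.
second = update_if(users, "cbucket64", "email",
                   "cbucket@wonka.com", "charlie@wonka.com")

print(first, second, users["cbucket64"]["email"])
```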
  10. User defined types

        CREATE TYPE address (
          street text,
          city text,
          zip_code int,
          phones set<text>
        );

        CREATE TABLE users (
          id uuid PRIMARY KEY,
          name text,
          addresses map<text, address>
        );

        SELECT id, name, addresses.city, addresses.phones FROM users;

         id       | name  | addresses.city | addresses.phones
        ----------+-------+----------------+----------------------------
         63bf691f | frodo | Shire          | {'075114563', '07512314'}
  11. Ops Friendly
      • Simple design
        • no special role, no single point of failure
      • Lots of exposed metrics via JMX
      • Nodes and entire datacenters can go down with no loss of service
      • Rapid read protection
      • DataStax OpsCenter
      • Visual monitoring tool
      • REST interface to metric data
      • Free version
      • Hands-off services
  12. Some C* Users
  13. Spotify
      • Use case began with playlist storage
      • Grew significantly beyond that
      • Some playlist details
        • Essentially a version control system
        • More than 1 billion playlists
        • >40,000 requests/second at peak
        • Off-line mode (both access and changes)
        • Concurrent changes
      See also: http://www.slideshare.net/planetcassandra/c-summit-eu-2013-playlists-at-spotify-using-cassandra-to-store-version-controlled-objects
  14. La Poste
      • Use case: parcel distribution metadata
      • From MySQL to Cassandra
      • Holiday load doubles
      • 4 million parcels/day
      • Average day for one of 70,000 postmen
        • Scan parcels
        • Print parcel list
        • Deliver parcels
        • Scans remaining, held up to 15 days (TTL)
      See also: http://www.slideshare.net/planetcassandra/c-summit-eu-2013-delivering-christmas-gifts-in-france-since-2012
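
The "held up to 15 days (TTL)" bullet relies on data expiring by itself. The sketch below is a toy in-memory store, not Cassandra's mechanism (Cassandra turns expired cells into tombstones and purges them during compaction). It assumes TTLs are given in seconds, as in `USING TTL 86400` on the CQL3 slide; the `ExpiringStore` class and the key name are hypothetical.

```python
import time


class ExpiringStore:
    """Toy key-value store where each entry carries a time-to-live in seconds."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._rows = {}  # key -> (value, expiry timestamp)

    def put(self, key, value, ttl_seconds):
        self._rows[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._rows.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._rows[key]  # lazily drop expired data on read
            return None
        return value


FIFTEEN_DAYS = 15 * 24 * 60 * 60  # 1,296,000 seconds

now = [0.0]  # fake clock so the example runs instantly
store = ExpiringStore(clock=lambda: now[0])
store.put("parcel-123", "scanned", FIFTEEN_DAYS)

now[0] = FIFTEEN_DAYS - 1
print(store.get("parcel-123"))  # still readable one second before expiry

now[0] = FIFTEEN_DAYS
print(store.get("parcel-123"))  # None once the TTL has elapsed
```

No cleanup job is needed in the application: the writer states how long the data matters and the store forgets it afterwards.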
  15. Netflix
      • 50 clusters, 750+ nodes
      • Nearly all data served by Cassandra
        • film metadata
        • user ratings
        • recommendations
      • Interesting use case because:
        • Sheer size and how much they depend on it
        • Multi-region (effectively multi-datacenter) within AWS
        • Highly available (through various AWS outages)
      See also: http://planetcassandra.org/blog/post/case-study-netflix
  16. Rackspace
      • Use case: multi-tenant cloud monitoring services
      • Common time series use case
        • raw metric data at varying intervals
        • raw data expires using TTLs
      • Supports
        • Ingestion through modular sources
        • Rollups
        • Servicing queries at various resolutions
      • Currently ingests 120 million metrics/hour
      • See Blueflood.io for project details
      See also: http://www.slideshare.net/gdusbabek/blueflood-open-source-metrics-processing-at-cassandraeu-2013
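
A rollup, as listed on the Rackspace slide, condenses raw points into coarser buckets so queries can be served at several resolutions. The sketch below is one plausible reading using simple averages. It is not Blueflood's implementation, and the `rollup` function, the bucket widths, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the bucket it falls into.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Raw samples as (seconds since start, metric value).
raw = [(0, 1.0), (30, 3.0), (60, 5.0), (330, 10.0)]

print(rollup(raw, 300))  # {0: 3.0, 300: 10.0}
print(rollup(raw, 60))   # {0: 2.0, 60: 5.0, 300: 10.0}
```

Raw points can then expire under a short TTL while the coarser rollups are kept longer, which matches the pattern the slide describes.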
  17. Questions?
      • @jeromatron on Twitter and #cassandra on IRC
      • More real world cases
        • http://planetcassandra.org/functional-use-cases/
      • DataStax
        • Cassandra documentation
        • Free online training
        • Free developer tools
