Apache Cassandra in the Real World
Jeremy Hanna
Support Engineer

©2013 DataStax Confidential. Do not distribute without consent.


Given at the NoSQL Roadshow in London.

Published in: Technology

  1. Apache Cassandra in the Real World
     Jeremy Hanna, Support Engineer
     ©2013 DataStax Confidential. Do not distribute without consent.
  2. Cassandra Design
     •Massive scalability
     •Multi-datacenter
     •High Performance
     •Reliability/Availability
       •no SPOF, no special roles
  3. Multi-DC Replication
  4. Ops Friendly
     •Simple design
       •no special role, no single point of failure
     •Lots of exposed metrics via JMX
     •Nodes and entire datacenters can go down with no loss of service
     •DataStax OpsCenter
       •Visual monitoring tool
       •REST interface to metric data
       •Free version
     •Hands-off services
  5. Developer friendly
     •CQL3
     •Collections (Set, Map, List)
     •Cassandra native drivers
     •Native paging
     •Tracing
     •DataStax DevCenter tool
     •Atomic batches
     •Lightweight transactions
     •Triggers
  6. CQL3 examples
     CREATE USER bombadil WITH PASSWORD 'goldberry4ever' SUPERUSER;
     CREATE KEYSPACE shire WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'eu' : 3, 'us-east' : 2};
     GRANT ALTER ON KEYSPACE shire TO gandalf;
     SELECT * FROM emp WHERE empID IN (130,104) ORDER BY deptID DESC;
     INSERT INTO excelsior.clicks (userid, url, date, name)
       VALUES (3715e600-2eb0-11e2-81c1-0800200c9a66, 'http://cassandra.apache.org', '2013-10-09', 'Mary')
       USING TTL 86400;
     UPDATE users SET email = 'charlie@wonka.com' WHERE login = 'cbucket64' IF email = 'cbucket@wonka.com';
  7. Some C* Users
  8. Netflix
     •50 clusters, 750 nodes
     •Nearly all data in Cassandra
       •film metadata
       •user ratings
       •recommendations
     •Interesting use case because:
       •Sheer size and how much they depend on it
       •Multi-region (effectively multi-datacenter) within AWS
       •Highly available (through various AWS outages)
     See also: http://planetcassandra.org/blog/post/case-study-netflix
  9. La Poste
     •Use case: parcel distribution metadata
     •From MySQL to Cassandra
     •Holiday load doubles
     •4 million parcels/day
     •Average day for one of 70,000 postmen
       •Scan parcels
       •Print parcel list
       •Deliver parcels
       •Scans remaining, held up to 15 days (TTL)
     See also: http://www.slideshare.net/planetcassandra/c-summit-eu-2013-delivering-christmas-gifts-in-france-since-2012
  10. Rackspace
     •Use case: multi-tenant cloud monitoring services
     •Common time series use case
       •raw metric data at varying intervals
       •raw data expires using TTLs
     •Supports
       •Ingestion through modular sources
       •Rollups
       •Servicing queries at various resolutions
     •Currently ingests 120 million metrics/hour
     •See Blueflood.io for project details
     See also: http://www.slideshare.net/gdusbabek/blueflood-open-source-metrics-processing-at-cassandraeu-2013
  11. Spotify
     •Use case began with playlist storage
     •Grew significantly beyond that
     •Some playlist details
       •Essentially version control system
       •More than 1 billion playlists
       •>40,000 requests/second at peak
       •Off-line mode (both access and changes)
       •Concurrent changes
     See also: http://www.slideshare.net/planetcassandra/c-summit-eu-2013-playlists-at-spotify-using-cassandra-to-store-version-controlled-objects
  12. Questions?
     •@jeromatron on twitter and #cassandra irc
     •More real world cases
       •http://planetcassandra.org/FiveMinuteInterviews
     •DataStax
       •Free online training
       •Free developer tools
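
The Rackspace slide describes a common time series pattern: raw metric points arrive at varying intervals, expire through TTLs, and are rolled up so queries can be served at coarser resolutions. The Python sketch below illustrates that idea only. It is not Blueflood's implementation: the function names `rollup` and `unexpired`, the choice of averaging as the rollup function, and the sample data are all assumptions made for illustration, and the TTL is simulated in memory rather than enforced by Cassandra.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, resolution_s):
    """Average (timestamp_s, value) points into fixed-width time buckets.

    Hypothetical helper: averaging is an assumed rollup function;
    real systems may also keep min, max, or counts.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - (ts % resolution_s)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def unexpired(points, now_s, ttl_s):
    """Keep points younger than ttl_s seconds, mimicking TTL expiry of raw data."""
    return [(ts, value) for ts, value in points if now_s - ts < ttl_s]


# Raw points at irregular intervals (seconds since an arbitrary epoch).
raw = [(0, 1.0), (30, 3.0), (60, 5.0), (330, 10.0)]

# Roll the raw points up to 5-minute (300 s) resolution.
five_minute = rollup(raw, 300)

# With a 100 s TTL evaluated at t=400 s, only the newest raw point survives.
still_live = unexpired(raw, now_s=400, ttl_s=100)
```

In a Cassandra-backed design the expiry step would typically come from writing raw rows with `USING TTL`, as in the CQL3 examples slide, so only the rollup logic would live in application code.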
