Massively Scalable NoSQL with Apache Cassandra
Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Massively scalable NoSQL with Apache Cassandra! Jonathan Ellis, Project Chair, Apache Cassandra; CTO, DataStax; @spyced
    • Big data: Analytics (Hadoop) ? Realtime (“NoSQL”) ©2012 DataStax
    • Some Cassandra users
    • eBay
      Application/Use Case
        • Social Signals: like/want/own features for eBay product and item pages
        • Hunch taste graph for eBay users and items
        • Many time series use cases
      Why Cassandra?
        • Multi-datacenter
        • Scalable
        • Write performance
        • Distributed counters
        • Hadoop support
      ACE
    • Time series data
    • Multi-datacenter support
    • Distributed counters
    • Hadoop support
    • Disney
      Application/Use Case
        • Meet the data management needs of user-facing applications across The Walt Disney Company with a single platform
      Why Cassandra?
        • DataStax Enterprise can tackle real-time and search functions in the same cluster
        • Scalability
        • 24x7 uptime
      NDI
    • Multitenancy
    • Enterprise search
    • SimpleReach
      Application/Use Case
        • SimpleReach tracks social actions for content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior.
      Why Cassandra?
        • Very high velocity data ingest rate and large data volumes
        • Workload separation between realtime and batch applications
      NDE
    • SourceNinja
      Application/Use Case
        • SourceNinja notifies you of performance, security, and bug fixes for the software you depend on
      Why Cassandra?
        • Previous database system could not handle load; HBase has too many points of failure and was too slow
        • Fast real-time capabilities, batch analytics on that data, and enterprise search
      RDE
    • Netflix
      Application/Use Case
        • General purpose backend for large-scale, highly available, cloud-based web services supporting Netflix Streaming
      Why Cassandra?
        • Highly available, highly robust, and no schema change downtime
        • Highly scalable, optimized for SSD
        • Much lower cost than previous Oracle and SimpleDB implementations
        • Flexible data model
        • Ability to directly influence/implement OSS feature set
        • Supports local and wide area distributed operations, spanning US and Europe
      RCE
    • Optimized for SSD
    • Open source
    • Use case patterns
        • Massively scalable
        • High performance
        • Reliable/Available
    • [Chart: reads/s and writes/s for Cassandra 0.6 vs. Cassandra 1.0; y-axis from 0 to 35,000]
    • Classic partitioning with SPOF
      [Diagram: client, router, and partitions 1 to 4]
    • Availability
        • “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’” -- Ben Coverston: DataStax
        • “The biggest problem with failover is that you’re almost never using it until it really hurts. It’s like backups that you never test.” -- Rick Branson: Instagram
    • Fully distributed, no SPOF
      [Diagram: client and nodes, labeled p3, p6, p1, p1, p1]
    • Partitioning
      Primary key determines placement*
        jim      age: 36   car: camaro   gender: M
        carol    age: 37   car: subaru   gender: F
        johnny   age: 12                 gender: M
        suzy     age: 10                 gender: F
    • PK       MD5 Hash
        jim      5e02739678...
        carol    a9a0198010...
        johnny   f4eb27cea7...
        suzy     78b421309e...
      MD5* hash operation yields a 128-bit number for keys of any size.
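The hashing step on this slide can be sketched in a few lines of Python. This is only an illustration of "MD5 yields a 128-bit number for keys of any size"; the helper name `md5_token` is hypothetical, and the exact way Cassandra turns the digest into a token may differ in detail, so the printed values are not guaranteed to match a real cluster.

```python
import hashlib


def md5_token(key: str) -> int:
    """Hash a primary key to a 128-bit integer (hypothetical helper)."""
    # MD5 always produces a 16-byte digest, whatever the key length.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the 16 bytes as one unsigned big-endian integer.
    return int.from_bytes(digest, byteorder="big")


for key in ("jim", "carol", "johnny", "suzy"):
    # Print the token as 32 hex digits (128 bits).
    print(f"{key:<7} {md5_token(key):032x}")
```

Because the output range is fixed at 128 bits, short and very long keys land in the same token space, which is what lets the ring be split into fixed ranges.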
    • The “token ring”
      [Diagram: ring of Node A, Node B, Node C, Node D]
    • Node   Start              End
        A      0xc000000000..1    0x0000000000..0
        B      0x0000000000..1    0x4000000000..0
        C      0x4000000000..1    0x8000000000..0
        D      0x8000000000..1    0xc000000000..0
      Key      MD5 Hash
        jim      5e02739678...
        carol    a9a0198010...
        johnny   f4eb27cea7...
        suzy     78b421309e...
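The Start/End table can be read as a lookup from a key's hash to the node that owns it. The sketch below is a minimal illustration under stated assumptions: the four end-of-range tokens are taken from the slide, the truncated hash prefixes are padded with zeros to 128 bits (the slide elides the remaining digits), and the names `RING`, `prefix_to_token`, and `owner` are hypothetical.

```python
from bisect import bisect_left

# End-of-range token for each node, in ring order (from the slide's table).
# Each node owns the range (previous node's token, its own token].
RING = [
    (0x00 << 120, "A"),
    (0x40 << 120, "B"),
    (0x80 << 120, "C"),
    (0xC0 << 120, "D"),
]
TOKENS = [token for token, _ in RING]


def prefix_to_token(hex_prefix: str) -> int:
    """Pad a truncated hex hash with zeros to 32 hex digits (assumption)."""
    return int(hex_prefix.ljust(32, "0"), 16)


def owner(token: int) -> str:
    """Return the node whose range contains the token."""
    # First node whose end token is >= the key's token.
    idx = bisect_left(TOKENS, token)
    # Tokens above the last end token wrap around to the first node.
    return RING[idx % len(RING)][1]


for key, prefix in [("jim", "5e02739678"), ("carol", "a9a0198010"),
                    ("johnny", "f4eb27cea7"), ("suzy", "78b421309e")]:
    print(key, "->", owner(prefix_to_token(prefix)))
# jim -> C, carol -> D, johnny -> A, suzy -> C
```

Any client or node can compute this lookup locally, so no router sits in front of the partitions.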
    • Replication
      [Diagram: ring of Node A, Node B, Node C, Node D; key carol a9a0198010...]
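One way to picture replica placement for carol is to start at the node that owns her token and continue around the ring. The sketch below assumes simple ring-order placement and a replication factor of 3; neither is stated on the slide, and the function name `replicas` is hypothetical. Topology-aware placement (racks, datacenters) is deliberately left out.

```python
from bisect import bisect_left

NODES = ["A", "B", "C", "D"]
# End-of-range tokens in the same order as NODES (from the token ring table).
TOKENS = [0x00 << 120, 0x40 << 120, 0x80 << 120, 0xC0 << 120]


def replicas(token: int, replication_factor: int = 3) -> list:
    """Owner of the token plus the next nodes clockwise (assumed strategy)."""
    first = bisect_left(TOKENS, token) % len(NODES)
    # Never return more replicas than there are nodes.
    count = min(replication_factor, len(NODES))
    return [NODES[(first + i) % len(NODES)] for i in range(count)]


# carol's truncated hash from the slide, zero-padded to 128 bits (assumption).
carol = int("a9a0198010".ljust(32, "0"), 16)
print(replicas(carol))  # ['D', 'A', 'B']
```

In this model every replica is computed the same way from the token, which fits the "no primary replica" point in the highlights.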
    • Highlights
        • Adding capacity is application-transparent and requires no downtime
        • No SPOF, not even temporarily
        • No “primary” replica
        • Configurable synchronous/asynchronous
        • Tolerates node failure; never have to restart replication “from scratch”
        • “Smart” replication avoids correlated failures
    • CQL: You got SQL in my NoSQL!
        CREATE TABLE users (
            id uuid PRIMARY KEY,
            name text,
            state text,
            birth_date int
        );
        CREATE INDEX ON users(state);
        SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
    • Strictly “realtime” focused
        • No joins
        • No subqueries
        • No aggregation functions* or GROUP BY
        • ORDER BY?
    • Clustered data in CFS
    • Clustering in CQL3
        CREATE TABLE sblocks (
            block_id uuid,
            subblock_id uuid,
            data blob,
            PRIMARY KEY (block_id, subblock_id)
        );

        block_id   subblock_id   data
        Block1     subblock A    data A
        Block1     subblock B    data B
        ...        ...           ...
        Block2     subblock C    data C
        Block2     subblock D    data D
        ...        ...           ...
        Block3     subblock E    data E
        Block3     subblock F    data F
        ...        ...           ...
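The effect of the compound primary key can be modeled in Python: the first column (block_id) groups rows into one partition, and the second (subblock_id) orders rows inside it. This is a rough in-memory sketch, not how Cassandra stores data; it sorts on read for simplicity, uses plain strings in place of uuids, and the names `table`, `insert`, and `select_block` are hypothetical.

```python
from collections import defaultdict

# partition key (block_id) -> {clustering key (subblock_id): data}
table = defaultdict(dict)


def insert(block_id: str, subblock_id: str, data: bytes) -> None:
    # Writing the same (block_id, subblock_id) again overwrites the row.
    table[block_id][subblock_id] = data


def select_block(block_id: str) -> list:
    """All rows of one partition, ordered by the clustering column."""
    rows = table.get(block_id, {})
    return [(block_id, sub, rows[sub]) for sub in sorted(rows)]


insert("Block1", "subblock B", b"data B")
insert("Block2", "subblock C", b"data C")
insert("Block1", "subblock A", b"data A")

for row in select_block("Block1"):
    print(row)
# ('Block1', 'subblock A', b'data A')
# ('Block1', 'subblock B', b'data B')
```

Reading a whole block touches a single partition, which is why this layout suits the file-block use case on the slide.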
    • Collections
        CREATE TABLE users (
            id uuid PRIMARY KEY,
            name text,
            state text,
            birth_date int
        );
        CREATE TABLE users_addresses (
            user_id uuid REFERENCES users,
            email text
        );
        SELECT * FROM users NATURAL JOIN users_addresses;
      [Build: an X is overlaid on this schema]
    • Collections
        CREATE TABLE users (
            id uuid PRIMARY KEY,
            name text,
            state text,
            birth_date int,
            email_addresses set<text>
        );
        UPDATE users SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'};
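The `email_addresses + {...}` update behaves like a set union: adding an address that is already present changes nothing. A minimal Python analogy, assuming an in-memory dictionary in place of the users table (the `users` dict, the `add_emails` helper, and the string row id are all hypothetical):

```python
# One row of the users table, modeled as a dict (illustrative only).
users = {
    "user-1": {"name": "Jonathan Ellis", "email_addresses": set()},
}


def add_emails(user_id: str, new_addresses) -> None:
    """Mimic: UPDATE users SET email_addresses = email_addresses + {...}."""
    # In-place set union; duplicates are ignored.
    users[user_id]["email_addresses"] |= set(new_addresses)


add_emails("user-1", {"jbellis@gmail.com", "jbellis@datastax.com"})
# Repeating part of the update leaves the set unchanged.
add_emails("user-1", {"jbellis@gmail.com"})
print(sorted(users["user-1"]["email_addresses"]))
# ['jbellis@datastax.com', 'jbellis@gmail.com']
```

Keeping the addresses inside the user row avoids the second table and the join shown on the earlier Collections slide.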
    • Big data: Analytics (Hadoop) ? Realtime (“NoSQL”)
    • The evolution of Analytics: Analytics + Realtime
    • The evolution of Analytics [Diagram: Analytics and Realtime, labeled “replication”]
    • The evolution of Analytics: ETL
    • Big data: Analytics (Hadoop), DataStax Enterprise, Realtime (Cassandra)
    • Better Hadoop than Hadoop
        • “Vanilla” Hadoop
            • 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper, Region Server, ...)
            • Single points of failure
            • Can’t separate online and offline processing
        • DataStax Enterprise
            • Single, simplified component
            • Self-organizes based on workload
            • Peer to peer
            • JobTracker failover
    • Enterprise search with Solr
        SELECT title FROM solr WHERE solr_query='title:natio*';

         title
        --------------------------------------------------------------------------
         Bolivia national football team 2002
         List of French born footballers who have played for other national teams
         Lithuania national basketball team at Eurobasket 2009
         Bolivia national football team 2000
         Kenya national under-20 football team
         Bolivia national football team 1999
         Israel men's national inline hockey team
         Bolivia national football team 2001
    • Managing & Monitoring Big Data
    • Questions?
        • http://www.datastax.com/docs
        • http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1
        • http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
        • http://www.datastax.com/products/enterprise