Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Talk at Cassandra conf 2013
 

    Presentation Transcript

    • Modern Apache Cassandra. Jonathan Ellis: Project Chair, Apache Cassandra; CTO, DataStax. ©2013 DataStax Confidential. Do not distribute without consent.
    • Five years of Cassandra 0.1 Jul-08 ... 0.3 Jul-09 0.6 May-10 0.7 Feb-11 1.0 Dec-11 DSE 1.2 Oct-12 2.0 Jul-13
    • Application/Use Case: • Social Signals: like/want/own features for eBay product and item pages • Hunch taste graph for eBay users and items • Many time series use cases. Why Cassandra? • Multi-datacenter • Scalable • Write performance • Distributed counters • Hadoop support
    • Time series data
    • Multi-datacenter support
    • Distributed counters
    • Hadoop support
    • Application/Use Case: • Adobe AudienceManager: web analytics, content management, and online advertising. Why Cassandra? • Low-latency • Scalable • Multi-datacenter • Tuneable consistency
    • Bootstrapping
    • Tuneable consistency •(We’ll come back to this)
    • Application/Use Case: • Logging • Notifications. Why Cassandra? • Efficient writes • Durable • Scalable • High availability
    • Durable + efficient writes: write(k1, c1:v1). [diagram: Memory holds the memtable; the hard drive holds the commit log]
    • write(k1, c1:v). Memory (memtable): k1 → c1:v. Hard drive (commit log): k1 c1:v
    • write(k1, c2:v). Memory (memtable): k1 → c1:v, c2:v. Hard drive (commit log): k1 c1:v; k1 c2:v
    • write(k2, c1:v c2:v). Memory (memtable): k1 → c1:v, c2:v; k2 → c1:v, c2:v. Hard drive (commit log): k1 c1:v; k1 c2:v; k2 c1:v c2:v
    • write(k1, c1:v c3:v). Memory (memtable): k1 → c1:v, c2:v, c3:v; k2 → c1:v, c2:v. Hard drive (commit log): k1 c1:v; k1 c2:v; k2 c1:v c2:v; k1 c1:v c3:v
    • Flush. [diagram: the memtable contents (k1 → c1:v, c2:v, c3:v; k2 → c1:v, c2:v) are flushed from memory to an SSTable on the hard drive; labels: index / BF, cleanup]
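The write sequence on these slides can be modeled in a few lines. The following is a minimal sketch, not Cassandra's implementation: the `Node` class and its attribute names are hypothetical, plain Python lists and dicts stand in for the on-disk commit log and SSTables, and the index / BF step is left out.

```python
class Node:
    """Toy model of the write path: commit log + memtable, flushed to SSTables."""

    def __init__(self):
        self.commit_log = []  # append-only; stands in for the on-disk commit log
        self.memtable = {}    # in memory: row key -> {column: value}
        self.sstables = []    # immutable snapshots; stand in for on-disk SSTables

    def write(self, key, columns):
        # 1. Append to the commit log first so the write survives a crash.
        self.commit_log.append((key, dict(columns)))
        # 2. Merge the new columns into the row held in the memtable.
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # Write the memtable out as one immutable SSTable, sorted by row key.
        self.sstables.append(dict(sorted(self.memtable.items())))
        # Cleanup: flushed data no longer needs the memtable or the commit log.
        self.memtable = {}
        self.commit_log = []


# The four writes from the slides, followed by a flush.
node = Node()
node.write("k1", {"c1": "v"})
node.write("k1", {"c2": "v"})
node.write("k2", {"c1": "v", "c2": "v"})
node.write("k1", {"c1": "v", "c3": "v"})
node.flush()
```

Every write is a sequential append plus an in-memory update, which is why the slides describe writes as both durable and efficient.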
    • High availability •99.9999% availability on Cassandra •(We’ll come back to this, too)
    • Core values •Massive scalability •High performance •Ease of use •Reliability/Availability
    • VLDB benchmark (RWS) [chart: throughput (ops/sec, axis 0 to 80,000) vs. number of nodes (axis 0 to 12); series: Cassandra, MySQL, HBase, Redis]
    • Endpoint benchmark (RW) [chart: throughput (ops/sec, axis 0 to 35,000) vs. number of nodes (1, 2, 4, 8, 16, 32); series: Cassandra, HBase, MongoDB]
    • Ease of use
      CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int );
      CREATE INDEX ON users(state);
      SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
    • Classic partitioning (SPOF) [diagram: client → router → partition 1, partition 2, partition 3, partition 4]
    • (Not a theoretical problem) https://speakerdeck.com/mitsuhiko/a-year-of-mongodb http://aphyr.com/posts/288-the-network-is-reliable
    • Fully distributed, no SPOF [diagram labels: Client; p3, p6, p1, p1, p1]
    • Partitioning: primary key determines placement*. Rows: jim (age: 36, car: camaro, gender: M); carol (age: 37, car: subaru, gender: F); johnny (age: 12, gender: M); suzy (age: 10, gender: F)
    • PK → Murmur hash: jim → 5e02739678...; carol → a9a0198010...; johnny → f4eb27cea7...; suzy → 78b421309e... Murmur* hash operation yields a 64-bit number for keys of any size.
    • The “token ring” [diagram: Node A, Node B, Node C, Node D arranged in a ring]
    • Token ranges (node: start to end): A: 0xc000000000..1 to 0x0000000000..0; B: 0x0000000000..1 to 0x4000000000..0; C: 0x4000000000..1 to 0x8000000000..0; D: 0x8000000000..1 to 0xc000000000..0. Keys to place: jim 5e02739678...; carol a9a0198010...; johnny f4eb27cea7...; suzy 78b421309e...
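The lookup from token to owning node can be sketched as a binary search over the sorted node tokens. This is an illustration under stated assumptions: `token_for` uses MD5 as a stand-in because Murmur is not in the Python standard library, tokens are treated as unsigned 64-bit numbers as on the slide, and the names `RING`, `token_for`, and `owner_of` are hypothetical.

```python
import hashlib
from bisect import bisect_left

# Each node's token is the inclusive end of its range, as in the slide's table.
RING = [
    (0x0000000000000000, "A"),
    (0x4000000000000000, "B"),
    (0x8000000000000000, "C"),
    (0xC000000000000000, "D"),
]
TOKENS = [token for token, _ in RING]


def token_for(key: str) -> int:
    """Map a key of any size to a 64-bit token (MD5 stand-in, not Murmur)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner_of(token: int) -> str:
    """Return the node whose range (previous token, own token] holds the token."""
    index = bisect_left(TOKENS, token)
    # Tokens above the highest node token wrap around to the first node.
    return RING[index % len(RING)][1]


# Hypothetical tokens that share only their leading byte with the slide's hashes.
for name, token in [("jim", 0x5E00000000000000), ("carol", 0xA900000000000000),
                    ("johnny", 0xF400000000000000), ("suzy", 0x7800000000000000)]:
    print(name, owner_of(token))
```

With these ranges, a token starting with 5e or 78 lands on node C, one starting with a9 on node D, and one starting with f4 wraps around to node A.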
    • Replication [diagram: token ring of Node A, Node B, Node C, Node D; key carol (a9a0198010...) is placed on the ring]
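One common way to pick replicas is to start at the node that owns the token and continue clockwise around the ring. The sketch below assumes that simple placement rule and a replication factor of 3; neither is stated on the slide, and `replicas_for` is a hypothetical name.

```python
from bisect import bisect_left

# Node tokens from the token-range slide; each token is the inclusive range end.
RING = [
    (0x0000000000000000, "A"),
    (0x4000000000000000, "B"),
    (0x8000000000000000, "C"),
    (0xC000000000000000, "D"),
]
TOKENS = [token for token, _ in RING]


def replicas_for(token: int, replication_factor: int = 3) -> list:
    """Owner of the token plus the next nodes clockwise (assumed placement rule)."""
    first = bisect_left(TOKENS, token) % len(RING)
    count = min(replication_factor, len(RING))  # cannot exceed the node count
    return [RING[(first + step) % len(RING)][1] for step in range(count)]


# A hypothetical token sharing its leading byte with carol's hash on the slide.
print(replicas_for(0xA900000000000000))
```

Under these assumptions a token starting with a9 is owned by node D and copied to nodes A and B.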
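    • Virtual nodes [diagram: without vnodes, Node A, Node B, Node C, Node D each own one range (A, B, C, D); with vnodes, the ring is split into many smaller ranges (A, A’, A’’, B, B’, C, C’, C’’, D, D’) spread across the nodes]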
    • A closer look at reads [diagram: Client → Coordinator → three replicas at 90% busy, 30% busy, and 40% busy]
    • Rapid read protection [diagram: Client → Coordinator → three replicas at 90% busy, 30% busy, and 40% busy; later frames mark a request with an X]
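Rapid read protection means the coordinator sends a redundant request to another replica when the first one is slow to answer. The following is a small latency model of that idea, not the coordinator's code: the function name, its parameters, and the millisecond figures are all illustrative assumptions.

```python
def read_latency_ms(replica_latencies_ms, speculate_after_ms=None):
    """Latency seen by the coordinator for a read that needs one response.

    replica_latencies_ms: response time of each replica, in the order tried.
    speculate_after_ms: how long to wait before also asking the next replica;
    None disables the redundant request.
    """
    first = replica_latencies_ms[0]
    if speculate_after_ms is None or len(replica_latencies_ms) < 2:
        return first
    if first <= speculate_after_ms:
        return first  # the first replica answered in time; no retry is sent
    # The backup request starts only after the wait, so its cost is added on.
    backup = speculate_after_ms + replica_latencies_ms[1]
    return min(first, backup)


# A busy replica (500 ms) tried first, with a lightly loaded one (20 ms) as backup.
without_protection = read_latency_ms([500, 20])
with_protection = read_latency_ms([500, 20], speculate_after_ms=50)
print(without_protection, with_protection)
```

In this model the slow replica costs the full 500 ms without protection, while a 50 ms threshold caps the read at 70 ms at the price of one extra request.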
    • Rapid Read Protection NONE
    • Consistency levels [diagram: Client → Coordinator → three replicas at 90% busy, 30% busy, and 40% busy]
    • Consistency levels •ONE •QUORUM •LOCAL_QUORUM •LOCAL_ONE •TWO •ALL
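Each consistency level translates into a number of replica responses the coordinator waits for. A minimal sketch for a single datacenter follows; the function names are hypothetical, and the overlap check is the usual rule that a read is guaranteed to see the latest write when read responses plus write responses exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """A majority of the replicas."""
    return replication_factor // 2 + 1


def required_responses(level: str, rf: int, local_rf=None) -> int:
    """Replica responses needed at a consistency level.

    rf is the total replication factor; local_rf is the replication factor in
    the coordinator's datacenter (defaults to rf, i.e. a single datacenter).
    """
    local = rf if local_rf is None else local_rf
    needed = {
        "ONE": 1,
        "LOCAL_ONE": 1,
        "TWO": 2,
        "QUORUM": quorum(rf),
        "LOCAL_QUORUM": quorum(local),
        "ALL": rf,
    }
    return needed[level]


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when the read and write replica sets must overlap (single DC)."""
    reads = required_responses(read_level, rf)
    writes = required_responses(write_level, rf)
    return reads + writes > rf


print(required_responses("QUORUM", 3))
print(read_sees_latest_write("QUORUM", "QUORUM", 3))
print(read_sees_latest_write("ONE", "ONE", 3))
```

Lower levels wait for fewer replicas, so they tolerate more busy or failed nodes at the cost of possibly reading stale data.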
    • Race condition
      Client 1: SELECT name FROM users WHERE username = 'pmcfadin'; → (0 rows)
      Client 2: SELECT name FROM users WHERE username = 'pmcfadin'; → (0 rows)
      Client 1: INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00');
      Client 2: INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01'); (This one wins)
    • Lightweight transactions
      INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00') IF NOT EXISTS;
       [applied]
      -----------
            True
      INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01') IF NOT EXISTS;
       [applied] | username | created_date   | name
      -----------+----------+----------------+-----------------
           False | pmcfadin | 2011-06-20 ... | Patrick McFadin
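The difference between a plain insert and `IF NOT EXISTS` can be shown with an in-memory model. This sketch only mimics the observable behavior: the `UsersTable` class is hypothetical, a local lock stands in for the cluster-wide agreement that Cassandra reaches with Paxos, the plain insert resolves conflicts by call order rather than by timestamp, and the password values are placeholders.

```python
import threading


class UsersTable:
    """Toy table keyed by username, comparing blind and conditional inserts."""

    def __init__(self):
        self._rows = {}
        # Stand-in only: a single-process lock, NOT how Cassandra serializes
        # conditional writes across replicas.
        self._lock = threading.Lock()

    def insert(self, username, row):
        # Plain INSERT: no check, so the later call silently overwrites.
        self._rows[username] = dict(row)

    def insert_if_not_exists(self, username, row):
        # The check and the write happen as one atomic step.
        with self._lock:
            existing = self._rows.get(username)
            if existing is not None:
                return False, dict(existing)  # not applied; report current row
            self._rows[username] = dict(row)
            return True, None

    def get(self, username):
        return self._rows.get(username)


table = UsersTable()
first = table.insert_if_not_exists("pmcfadin", {"password": "first-hash"})
second = table.insert_if_not_exists("pmcfadin", {"password": "second-hash"})
print(first)
print(second)
```

As on the slide, the first conditional insert is applied, and the second is rejected and returns the row that is already there.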
    • Paxos •All operations are quorum-based •Each replica sends information about unfinished operations to the leader during prepare •Paxos Made Simple
    • Details •4 round trips vs 1 for normal updates •Paxos state is durable •Immediate consistency with no leader election or failover •ConsistencyLevel.SERIAL •http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
    • Use with caution •Great for 1% of your application •Eventual consistency is your friend •http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
    • Cassandra 2.1
    • User defined types
      CREATE TYPE address ( street text, city text, zip_code int, phones set<text> );
      CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses map<text, address> );
      SELECT id, name, addresses.city, addresses.phones FROM users;
       id       | name    | addresses.city | addresses.phones
      ----------+---------+----------------+--------------------------
       63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}
    • Collection indexing
      CREATE TABLE songs ( id uuid PRIMARY KEY, artist text, album text, title text, data blob, tags set<text> );
      CREATE INDEX song_tags_idx ON songs(tags);
      SELECT * FROM songs WHERE 'blues' IN tags;
       id       | album         | artist            | tags                  | title
      ----------+---------------+-------------------+-----------------------+------------------
       5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
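Conceptually, an index on a collection column maps each element to the rows that contain it. The sketch below shows that idea as an inverted index in memory; it says nothing about how Cassandra stores its indexes, and the `Songs` class and its method names are hypothetical.

```python
from collections import defaultdict


class Songs:
    """Toy table with an inverted index over the tags set of each row."""

    def __init__(self):
        self.rows = {}                      # id -> row
        self.tags_index = defaultdict(set)  # tag -> ids of rows carrying it

    def insert(self, song_id, title, tags):
        # Drop index entries of a previous version of the row, if any.
        old = self.rows.get(song_id)
        if old is not None:
            for tag in old["tags"]:
                self.tags_index[tag].discard(song_id)
        self.rows[song_id] = {"id": song_id, "title": title, "tags": set(tags)}
        # Index every element of the collection separately.
        for tag in tags:
            self.tags_index[tag].add(song_id)

    def with_tag(self, tag):
        # One index lookup instead of scanning every row's tag set.
        return [self.rows[i] for i in sorted(self.tags_index.get(tag, ()))]


songs = Songs()
songs.insert("5027b27e", "Worrying My Mind", {"acoustic", "blues"})
print([row["title"] for row in songs.with_tag("blues")])
```

Looking up 'blues' returns the one row from the slide, and a tag that no row carries returns an empty list.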
    • More-efficient repair
    • 2.1 roadmap •Efficient handling of cold data •Counters 2.0 •Only repair new-since-last-repair data •January/February 2014
    • Questions?