SlideShare a Scribd company logo
1 of 27
Download to read offline
Building a Key-Value
Store with Cassandra
Kiwi PyCon 2010
Aaron Morton @aaronmorton
Weta Digital
1
Why Cassandra?
• Part of a larger project started earlier
this year to build new systems for code
running on the render farm of 35,000
cores
• Larger project goals were Scalability,
Reliability, Flexible Schema
2
How about MySQL ?
• It works. But...
• Schema changes
• Write redundancy
• Query language mismatch
• So went looking for the right tool for the
job
3
Redis ?
• Fast, flexible. But...
• Single core limit
• Replication, but no cluster (itʼs
coming)
• Limited support options
4
Couch DB ?
• Schema free, scalable (sort of),
redundant (sort of). But...
• Single write thread limit
• Replication, but no cluster (itʼs
coming)
• Low consistency with asynchronous
replication
5
Cassandra ?
• Just right, perhaps. Letʼs see...
• Highly available
• Tuneable synchronous replication
• Scalable writes and reads
• Schema free (sort of)
• Lots of new mistakes to be made
6
Availability
• Row data is kept together and
replicated around the cluster
• Replication Factor is configurable
• Partitioner determines the position of a
row key in the distributed hash table
• Replication Strategy determines where
in the cluster to place the replicas
7
Consistency
• Each read or write request specifies a
Consistency Level
• Individual nodes may be inconsistent with
respect to others
• Reads may give consistent results while some
nodes have inconsistent values
• The entire cluster will eventually mode to a
state where there is one version of each
8
Consistency
• R + W > N
• R = Read Consistency
• W = Write Consistency
• N = Replication Factor
9
Scale
• Distributed hash table
• Scale throughput and capacity with
more nodes, more disk, more memory
• Adding or removing nodes is an
online operation
• Gossip based protocol for discovery
10
Data Model
• Column orientated
• Denormalise
• Cassandra in an index building
machine
• Simple explanation: a row has a key
and stores an ordered hash in one or
more Column Families
11
Data Model
• Keyspace
• Row / Key
• Column Family or Super Column
Family
• Column
12
Data Model
User CF Posts SCF
Fred
email:fred@...
dob:04/03
post_1:{
title: foo,
body: bar}
Bob email:bob
post_100:{
title: monkeys,
body: naughty}
13
API
• Thrift
• Avro (beta)
• Auto generated bindings for many
languages
• Stateful connections
• Python wrappers pycassa, Telephus
(twisted)
14
API
• Client supplied time stamp for all
mutations
• Client supplied Consistency Level for
all mutations and reads
15
API
• insert (key, column_family,
super_column, column, value)
• get(key, column_family,
super_column, column)
• remove(key, column_family,
super_column, column)
16
API
• Slicing columns or super columns
• list of names
• start, finish, count, reversed
• get_slice() to slice one row
• multiget_slice() to slice multiple rows
• get_range_slices() to slice rows and
columns
17
API
• Slicing keys
• start key, finish key, count
• Partitioner effects key order
• get_range_slices() to slice rows and
columns
18
API
• batch_mutate()
• multiple rows and CFʼs
• delete or insert / update
• Individual mutations are atomic
• Request is not atomic, no rollback
19
Our Application
Varnish
Nginx
Tornado
Cassandra Rabbit MQ
20
Our Application
• Similar to Amazon S3.
• REST API.
• Databases, Buckets, Keys+Values.
21
Our Column Families
• Database (super)
• Bucket (super)
• Bucket Index
• Object
• Object Index (super)
22
Our API
http:// db_name.wetafx.co.nz/bucket/key
23
PUT Object
• /bucket/object
• batch_mutate()
• one row in Objects CF with columns
for meta and the body
• one column in ObjectIndex CF row
for the bucket
24
List Objects
• /bucket_name?start=foo
• get_slice()
• for the bucket row in ObjectIndex
CF
• if needed, multiget_slice() to “join”
to the Object CF
25
Delete Bucket
• /bucket_name
• get_slice() on ObjectIndex CF
• batch_mutate() to delete Object CF
and ObjectIndex CF
• delete Bucket CF row
26
Thanks
• http://wetafx.co.nz
• http://cassandra.apache.org/
•
27

More Related Content

What's hot

Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka StreamsKafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streamsconfluent
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupGwen (Chen) Shapira
 
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...Timothy Spann
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineDataWorks Summit
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life ExampleKafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Exampleconfluent
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data AnalyticsAnkur Bansal
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...Timothy Spann
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)W2O Group
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 
Cracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworksCracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworksTimothy Spann
 
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...InfluxData
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...Folio3 Software
 
Streaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka StreamsStreaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka Streamsvito jeng
 
Apache Pulsar and Github
Apache Pulsar and GithubApache Pulsar and Github
Apache Pulsar and GithubStreamNative
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...Paul Brebner
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingGuozhang Wang
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.Data Con LA
 
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021StreamNative
 

What's hot (20)

Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka StreamsKafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka Meetup
 
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life ExampleKafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
Cracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworksCracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworks
 
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
 
Streaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka StreamsStreaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka Streams
 
Apache Pulsar and Github
Apache Pulsar and GithubApache Pulsar and Github
Apache Pulsar and Github
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
 
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
 

Viewers also liked

Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012jbellis
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time CassandraAcunu
 
Cassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APICassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APISimran Kedia
 
A Shortcut to Awesome: Cassandra Data Modeling By Example (Jon Haddad, The La...
A Shortcut to Awesome: Cassandra Data Modeling By Example (Jon Haddad, The La...A Shortcut to Awesome: Cassandra Data Modeling By Example (Jon Haddad, The La...
A Shortcut to Awesome: Cassandra Data Modeling By Example (Jon Haddad, The La...DataStax
 
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...DataStax
 
High performance queues with Cassandra
High performance queues with CassandraHigh performance queues with Cassandra
High performance queues with CassandraMikalai Alimenkou
 
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureUnderstanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureDataStax
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
Graphs in the Database: Rdbms In The Social Networks Age
Graphs in the Database: Rdbms In The Social Networks AgeGraphs in the Database: Rdbms In The Social Networks Age
Graphs in the Database: Rdbms In The Social Networks AgeLorenzo Alberton
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...Amazon Web Services
 
Trees In The Database - Advanced data structures
Trees In The Database - Advanced data structuresTrees In The Database - Advanced data structures
Trees In The Database - Advanced data structuresLorenzo Alberton
 

Viewers also liked (14)

Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 
CQL Under the Hood
CQL Under the HoodCQL Under the Hood
CQL Under the Hood
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time Cassandra
 
Cassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APICassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful API
 
CQL3 in depth
CQL3 in depthCQL3 in depth
CQL3 in depth
 
A Shortcut to Awesome: Cassandra Data Modeling By Example (Jon Haddad, The La...
A Shortcut to Awesome: Cassandra Data Modeling By Example (Jon Haddad, The La...A Shortcut to Awesome: Cassandra Data Modeling By Example (Jon Haddad, The La...
A Shortcut to Awesome: Cassandra Data Modeling By Example (Jon Haddad, The La...
 
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...
 
High performance queues with Cassandra
High performance queues with CassandraHigh performance queues with Cassandra
High performance queues with Cassandra
 
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureUnderstanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Graphs in the Database: Rdbms In The Social Networks Age
Graphs in the Database: Rdbms In The Social Networks AgeGraphs in the Database: Rdbms In The Social Networks Age
Graphs in the Database: Rdbms In The Social Networks Age
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
 
Trees In The Database - Advanced data structures
Trees In The Database - Advanced data structuresTrees In The Database - Advanced data structures
Trees In The Database - Advanced data structures
 

Similar to Building a distributed Key-Value store with Cassandra

Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraEric Evans
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed DatabaseEric Evans
 
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Lviv Startup Club
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practicelarsgeorge
 
Training Slides: Basics 103: The Power of Tungsten Connector / Proxy
Training Slides: Basics 103: The Power of Tungsten Connector / ProxyTraining Slides: Basics 103: The Power of Tungsten Connector / Proxy
Training Slides: Basics 103: The Power of Tungsten Connector / ProxyContinuent
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataJohn Nestor
 
Exploiting NoSQL Like Never Before
Exploiting NoSQL Like Never BeforeExploiting NoSQL Like Never Before
Exploiting NoSQL Like Never BeforeFrancis Alexander
 
Clojure - An Introduction for Lisp Programmers
Clojure - An Introduction for Lisp ProgrammersClojure - An Introduction for Lisp Programmers
Clojure - An Introduction for Lisp Programmerselliando dias
 
Cassandra
CassandraCassandra
Cassandraexsuns
 
JSR 335 / java 8 - update reference
JSR 335 / java 8 - update referenceJSR 335 / java 8 - update reference
JSR 335 / java 8 - update referencesandeepji_choudhary
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical dataOleksandr Semenov
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataRoger Xia
 
Cassandra integrations
Cassandra integrationsCassandra integrations
Cassandra integrationsT Jake Luciani
 
Your backend architecture is what matters slideshare
Your backend architecture is what matters slideshareYour backend architecture is what matters slideshare
Your backend architecture is what matters slideshareColin Charles
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Databricks
 

Similar to Building a distributed Key-Value store with Cassandra (20)

Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache Cassnadra
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir Volk
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
 
KeyValue Stores
KeyValue StoresKeyValue Stores
KeyValue Stores
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
Training Slides: Basics 103: The Power of Tungsten Connector / Proxy
Training Slides: Basics 103: The Power of Tungsten Connector / ProxyTraining Slides: Basics 103: The Power of Tungsten Connector / Proxy
Training Slides: Basics 103: The Power of Tungsten Connector / Proxy
 
Why ruby and rails
Why ruby and railsWhy ruby and rails
Why ruby and rails
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big Data
 
Exploiting NoSQL Like Never Before
Exploiting NoSQL Like Never BeforeExploiting NoSQL Like Never Before
Exploiting NoSQL Like Never Before
 
Clojure - An Introduction for Lisp Programmers
Clojure - An Introduction for Lisp ProgrammersClojure - An Introduction for Lisp Programmers
Clojure - An Introduction for Lisp Programmers
 
Redis Labcamp
Redis LabcampRedis Labcamp
Redis Labcamp
 
Cassandra
CassandraCassandra
Cassandra
 
JSR 335 / java 8 - update reference
JSR 335 / java 8 - update referenceJSR 335 / java 8 - update reference
JSR 335 / java 8 - update reference
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Cassandra integrations
Cassandra integrationsCassandra integrations
Cassandra integrations
 
Your backend architecture is what matters slideshare
Your backend architecture is what matters slideshareYour backend architecture is what matters slideshare
Your backend architecture is what matters slideshare
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 

More from aaronmorton

Cassandra South Bay Meetup - Backup And Restore For Apache Cassandra
Cassandra South Bay Meetup - Backup And Restore For Apache CassandraCassandra South Bay Meetup - Backup And Restore For Apache Cassandra
Cassandra South Bay Meetup - Backup And Restore For Apache Cassandraaaronmorton
 
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.XCassandra SF Meetup - CQL Performance With Apache Cassandra 3.X
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.Xaaronmorton
 
Cassandra Day Atlanta 2016 - Monitoring Cassandra
Cassandra Day Atlanta 2016  - Monitoring CassandraCassandra Day Atlanta 2016  - Monitoring Cassandra
Cassandra Day Atlanta 2016 - Monitoring Cassandraaaronmorton
 
Cassandra London March 2016 - Lightening talk - introduction to incremental ...
Cassandra London March 2016  - Lightening talk - introduction to incremental ...Cassandra London March 2016  - Lightening talk - introduction to incremental ...
Cassandra London March 2016 - Lightening talk - introduction to incremental ...aaronmorton
 
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable CassandraCassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandraaaronmorton
 
Cassandra sf 2015 - Steady State Data Size With Compaction, Tombstones, and TTL
Cassandra sf 2015 - Steady State Data Size With Compaction, Tombstones, and TTL Cassandra sf 2015 - Steady State Data Size With Compaction, Tombstones, and TTL
Cassandra sf 2015 - Steady State Data Size With Compaction, Tombstones, and TTL aaronmorton
 
Cassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large NodesCassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large Nodesaaronmorton
 
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break GlassCassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glassaaronmorton
 
Cassandra Community Webinar - August 22 2013 - Cassandra Internals
Cassandra Community Webinar - August 22 2013 - Cassandra InternalsCassandra Community Webinar - August 22 2013 - Cassandra Internals
Cassandra Community Webinar - August 22 2013 - Cassandra Internalsaaronmorton
 
Cassandra SF 2013 - In Case Of Emergency Break Glass
Cassandra SF 2013 - In Case Of Emergency Break GlassCassandra SF 2013 - In Case Of Emergency Break Glass
Cassandra SF 2013 - In Case Of Emergency Break Glassaaronmorton
 
Cassandra SF 2013 - Cassandra Internals
Cassandra SF 2013 - Cassandra InternalsCassandra SF 2013 - Cassandra Internals
Cassandra SF 2013 - Cassandra Internalsaaronmorton
 
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2aaronmorton
 
Apache Cassandra in Bangalore - Cassandra Internals and Performance
Apache Cassandra in Bangalore - Cassandra Internals and PerformanceApache Cassandra in Bangalore - Cassandra Internals and Performance
Apache Cassandra in Bangalore - Cassandra Internals and Performanceaaronmorton
 
Apache Con NA 2013 - Cassandra Internals
Apache Con NA 2013 - Cassandra InternalsApache Con NA 2013 - Cassandra Internals
Apache Con NA 2013 - Cassandra Internalsaaronmorton
 
Cassandra SF 2012 - Technical Deep Dive: query performance
Cassandra SF 2012 - Technical Deep Dive: query performance Cassandra SF 2012 - Technical Deep Dive: query performance
Cassandra SF 2012 - Technical Deep Dive: query performance aaronmorton
 
Hello @world #cassandra
Hello @world #cassandraHello @world #cassandra
Hello @world #cassandraaaronmorton
 
Cassandra does what ? Code Mania 2012
Cassandra does what ? Code Mania 2012Cassandra does what ? Code Mania 2012
Cassandra does what ? Code Mania 2012aaronmorton
 
Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010aaronmorton
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandraaaronmorton
 
Cassandra - Wellington No Sql
Cassandra - Wellington No SqlCassandra - Wellington No Sql
Cassandra - Wellington No Sqlaaronmorton
 

More from aaronmorton (20)

Cassandra South Bay Meetup - Backup And Restore For Apache Cassandra
Cassandra South Bay Meetup - Backup And Restore For Apache CassandraCassandra South Bay Meetup - Backup And Restore For Apache Cassandra
Cassandra South Bay Meetup - Backup And Restore For Apache Cassandra
 
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.XCassandra SF Meetup - CQL Performance With Apache Cassandra 3.X
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X
 
Cassandra Day Atlanta 2016 - Monitoring Cassandra
Cassandra Day Atlanta 2016  - Monitoring CassandraCassandra Day Atlanta 2016  - Monitoring Cassandra
Cassandra Day Atlanta 2016 - Monitoring Cassandra
 
Cassandra London March 2016 - Lightening talk - introduction to incremental ...
Cassandra London March 2016  - Lightening talk - introduction to incremental ...Cassandra London March 2016  - Lightening talk - introduction to incremental ...
Cassandra London March 2016 - Lightening talk - introduction to incremental ...
 
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable CassandraCassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
 
Cassandra sf 2015 - Steady State Data Size With Compaction, Tombstones, and TTL
Cassandra sf 2015 - Steady State Data Size With Compaction, Tombstones, and TTL Cassandra sf 2015 - Steady State Data Size With Compaction, Tombstones, and TTL
Cassandra sf 2015 - Steady State Data Size With Compaction, Tombstones, and TTL
 
Cassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large NodesCassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large Nodes
 
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break GlassCassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
 
Cassandra Community Webinar - August 22 2013 - Cassandra Internals
Cassandra Community Webinar - August 22 2013 - Cassandra InternalsCassandra Community Webinar - August 22 2013 - Cassandra Internals
Cassandra Community Webinar - August 22 2013 - Cassandra Internals
 
Cassandra SF 2013 - In Case Of Emergency Break Glass
Cassandra SF 2013 - In Case Of Emergency Break GlassCassandra SF 2013 - In Case Of Emergency Break Glass
Cassandra SF 2013 - In Case Of Emergency Break Glass
 
Cassandra SF 2013 - Cassandra Internals
Cassandra SF 2013 - Cassandra InternalsCassandra SF 2013 - Cassandra Internals
Cassandra SF 2013 - Cassandra Internals
 
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
 
Apache Cassandra in Bangalore - Cassandra Internals and Performance
Apache Cassandra in Bangalore - Cassandra Internals and PerformanceApache Cassandra in Bangalore - Cassandra Internals and Performance
Apache Cassandra in Bangalore - Cassandra Internals and Performance
 
Apache Con NA 2013 - Cassandra Internals
Apache Con NA 2013 - Cassandra InternalsApache Con NA 2013 - Cassandra Internals
Apache Con NA 2013 - Cassandra Internals
 
Cassandra SF 2012 - Technical Deep Dive: query performance
Cassandra SF 2012 - Technical Deep Dive: query performance Cassandra SF 2012 - Technical Deep Dive: query performance
Cassandra SF 2012 - Technical Deep Dive: query performance
 
Hello @world #cassandra
Hello @world #cassandraHello @world #cassandra
Hello @world #cassandra
 
Cassandra does what ? Code Mania 2012
Cassandra does what ? Code Mania 2012Cassandra does what ? Code Mania 2012
Cassandra does what ? Code Mania 2012
 
Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Cassandra - Wellington No Sql
Cassandra - Wellington No SqlCassandra - Wellington No Sql
Cassandra - Wellington No Sql
 

Recently uploaded

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Building a distributed Key-Value store with Cassandra

  • 1. Building a Key-Value Store with Cassandra Kiwi PyCon 2010 Aaron Morton @aaronmorton Weta Digital 1
  • 2. Why Cassandra? • Part of a larger project started earlier this year to build new systems for code running on the render farm of 35,000 cores • Larger project goals were Scalability, Reliability, Flexible Schema 2
  • 3. How about MySQL ? • It works. But... • Schema changes • Write redundancy • Query language mismatch • So went looking for the right tool for the job 3
  • 4. Redis ? • Fast, flexible. But... • Single core limit • Replication, but no cluster (itʼs coming) • Limited support options 4
  • 5. Couch DB ? • Schema free, scalable (sort of), redundant (sort of). But... • Single write thread limit • Replication, but no cluster (itʼs coming) • Low consistency with asynchronous replication 5
  • 6. Cassandra ? • Just right, perhaps. Letʼs see... • Highly available • Tuneable synchronous replication • Scalable writes and reads • Schema free (sort of) • Lots of new mistakes to be made 6
  • 7. Availability • Row data is kept together and replicated around the cluster • Replication Factor is configurable • Partitioner determines the position of a row key in the distributed hash table • Replication Strategy determines where in the cluster to place the replicas 7
  • 8. Consistency • Each read or write request specifies a Consistency Level • Individual nodes may be inconsistent with respect to others • Reads may give consistent results while some nodes have inconsistent values • The entire cluster will eventually mode to a state where there is one version of each 8
  • 9. Consistency • R + W > N • R = Read Consistency • W = Write Consistency • N = Replication Factor 9
  • 10. Scale • Distributed hash table • Scale throughput and capacity with more nodes, more disk, more memory • Adding or removing nodes is an online operation • Gossip based protocol for discovery 10
  • 11. Data Model • Column orientated • Denormalise • Cassandra in an index building machine • Simple explanation: a row has a key and stores an ordered hash in one or more Column Families 11
  • 12. Data Model • Keyspace • Row / Key • Column Family or Super Column Family • Column 12
  • 13. Data Model User CF Posts SCF Fred email:fred@... dob:04/03 post_1:{ title: foo, body: bar} Bob email:bob post_100:{ title: monkeys, body: naughty} 13
  • 14. API • Thrift • Avro (beta) • Auto generated bindings for many languages • Stateful connections • Python wrappers pycassa, Telephus (twisted) 14
  • 15. API • Client supplied time stamp for all mutations • Client supplied Consistency Level for all mutations and reads 15
  • 16. API • insert (key, column_family, super_column, column, value) • get(key, column_family, super_column, column) • remove(key, column_family, super_column, column) 16
  • 17. API • Slicing columns or super columns • list of names • start, finish, count, reversed • get_slice() to slice one row • multiget_slice() to slice multiple rows • get_range_slices() to slice rows and columns 17
  • 18. API • Slicing keys • start key, finish key, count • Partitioner effects key order • get_range_slices() to slice rows and columns 18
  • 19. API • batch_mutate() • multiple rows and CFʼs • delete or insert / update • Individual mutations are atomic • Request is not atomic, no rollback 19
  • 21. Our Application • Similar to Amazon S3. • REST API. • Databases, Buckets, Keys+Values. 21
  • 22. Our Column Families • Database (super) • Bucket (super) • Bucket Index • Object • Object Index (super) 22
  • 24. PUT Object • /bucket/object • batch_mutate() • one row in Objects CF with columns for meta and the body • one column in ObjectIndex CF row for the bucket 24
  • 25. List Objects • /bucket_name?start=foo • get_slice() • for the bucket row in ObjectIndex CF • if needed, multiget_slice() to “join” to the Object CF 25
  • 26. Delete Bucket • /bucket_name • get_slice() on ObjectIndex CF • batch_mutate() to delete Object CF and ObjectIndex CF • delete Bucket CF row 26