Cassandra - A Decentralized
Structured Storage System
Presented By
Tejaswi Ganne
Latha Muddu
Khulud Alsultan
Rajaramya Janagama
Marmik Patel
Arunit Gupta
Chaitanya Sai
Prashant Malik
Facebook
Avinash Lakshman
Facebook
Outline
• Data Model.
• Cassandra API.
• System Architecture:
• Partitioning.
• Replication.
• Membership and Failure Detection.
• Bootstrapping and Scaling the Cluster.
• Local Persistence.
Introduction and
Data Model
-Tejaswi Ganne
Introduction
•Apache Cassandra is an open source distributed storage system.
•Manages very large amounts of data spread across many commodity servers, located across many data centers.
•Named after Cassandra, the prophetess of Greek mythology.
•Initially developed at Facebook to power its Inbox Search feature; Facebook later open sourced it, and it became an Apache Incubator project.
•Features –
•High Scalability
•High Availability
•Fault Tolerant
Features
• High Scalability: Read and write throughput increases linearly as new machines are added, with no downtime or interruption to applications.
• High Availability: The system is durable and designed to keep operating continuously, without failure, for long periods.
• Fault Tolerant: Data is automatically replicated to multiple nodes, and failed nodes can be replaced with no downtime. Replication is supported across multiple data centers.
Data Model
•Uses a simple data model instead of a full Relational data
model.
• A table in Cassandra is a distributed multi dimensional map
indexed by a key.
•Value is a structured object.
•Operations are atomic on each row per replica.
Data Model
*Figure taken from http://www.tutorialspoint.com/cassandra/cassandra_data_model.htm
Data Model Contd…
•Each row can have a different number of columns.
•Each column is a triple < Name, Value, Timestamp >.
•Columns can be sorted by name or by timestamp.
•Columns are grouped into Column Families (CF).
•Two types of CFs –
•Simple Column Family
–Has columns.
–A column is accessed using the convention -
» column_family : column
•Super Column Family
–A CF within a CF.
–Contains a Simple CF, or another Super CF.
–A column is accessed using the convention -
» column_family : super_column : column
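The map-of-maps idea above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `table` dictionary and the `insert` and `get` helpers are hypothetical names, not part of any Cassandra API, and only simple column families are modelled.

```python
import time

# Hypothetical in-memory picture of one table:
# row key -> column family -> column name -> (value, timestamp)
table = {}

def insert(row_key, column_family, column, value, timestamp=None):
    """Store one column, keeping whichever write carries the newest timestamp."""
    ts = time.time() if timestamp is None else timestamp
    columns = table.setdefault(row_key, {}).setdefault(column_family, {})
    current = columns.get(column)
    if current is None or ts >= current[1]:
        columns[column] = (value, ts)

def get(row_key, column_family, column):
    """Return the value addressed as column_family : column for a row, or None."""
    entry = table.get(row_key, {}).get(column_family, {}).get(column)
    return None if entry is None else entry[0]

insert("user42", "profile", "firstname", "Ada", timestamp=1)
insert("user42", "profile", "firstname", "Grace", timestamp=2)
insert("user42", "profile", "firstname", "Stale", timestamp=1)      # older write loses
insert("user43", "profile", "email", "x@example.org", timestamp=1)  # rows may differ in columns
print(get("user42", "profile", "firstname"))
```

Rows here hold different sets of columns, and the timestamp decides which value survives when the same column is written more than once.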
Key-Value Model
• It is a column-oriented NoSQL system.
• A row is a collection of columns labeled with a name.
• The key is the column name, and a row must contain at least one column.
https://10kloc.wordpress.com/2012/12/25/cassandra-chapter-three-data-model/
Related Work
•Amazon Dynamo
Dynamo is a storage system used by Amazon to store and retrieve user shopping carts. Every write requires a read first, in order to manage the vector timestamps.
•Google File System (GFS) and Chubby
GFS uses a simple design with a single master server that hosts the entire metadata; the data is split into chunks and stored on chunk servers. The master is made fault tolerant using the Chubby abstraction. Chubby achieves fault tolerance through replication.
References
• https://en.wikipedia.org/wiki/Apache_Cassandra
• http://cassandra.apache.org/
• http://www.planetcassandra.org/what-is-apache-cassandra/
• http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html
Questions???
Cassandra API
-Muddu Latha
Cassandra Query Language
• The Cassandra Query Language (CQL) is the primary language for communicating with the Cassandra database.
CQL Statements :
• Data Definition Statements
• Data Manipulation Statements
• Queries
Data Definition Statements
• Create Keyspace
• Use
• Alter Keyspace
• Drop Keyspace
• Create Table
• Alter Table
• Drop Table
• Create Type
• Alter Type
• Drop Type
• Create Trigger
• Drop Trigger
• Create Function
• Drop Function
• Create Aggregate
• Drop Aggregate
Data Definition Statements(Cont..)
1. Create Keyspace :
cqlsh> CREATE KEYSPACE sample_demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
2. Use Keyspace :
cqlsh> USE sample_demo;
3. Alter Keyspace :
cqlsh> ALTER KEYSPACE sample_demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 5};
Data Definition Statements(Cont..)
4. Drop Keyspace :
DROP KEYSPACE sample_demo ;
5. Create Table :
CREATE TABLE presentors_list ( firstname text,
lastname text, classid int, email text, PRIMARY KEY
(lastname));
6. Alter Table :
ALTER TABLE presentors_list ADD city text ;
Data Definition Statements(Cont..)
7. Drop Table : removes a table.
DROP TABLE presentors_list;
8. Truncate Table : removes all data from a table.
TRUNCATE presentors_list;
CQL Statements Execution Screenshots
Data Manipulation Statements
Insert, Update, Delete, Batch
1. Insert :
INSERT INTO presentors_list (firstname, lastname, classid, email)
VALUES ('lakshmi', 'upadrasta', 29, 'mltx9@mail.umkc.edu');
2. Update :
UPDATE presentors_list USING TTL 400
SET firstname = 'dileep', classid = 28
WHERE lastname = 'upadrasta';
Data Manipulation Statements(Cont..)
3. Delete :
DELETE FROM presentors_list USING TIMESTAMP <timestamp>
WHERE lastname = 'upadrasta';
4. Batch :
BEGIN BATCH
INSERT QUERY;
UPDATE QUERY;
DELETE QUERY;
APPLY BATCH;
Queries
Select
SELECT * FROM presentors_list;
SELECT firstname, email FROM presentors_list WHERE lastname = 'upadrasta';
References
1. http://www.planetcassandra.org/try-cassandra/
2. https://cassandra.apache.org/doc/cql3/CQL.html#createKeyspaceStmt
3. https://cassandra.apache.org/doc/cql3/CQL-2.2.html
Questions???
System Architecture
Partitioning
- Khulud Alsultan
System Architecture
• The core distributed systems techniques:
– Partitioning
– Replication
– Membership and Failure Handling
– Bootstrapping and Scaling the Cluster
– Local Persistence
System Architecture
• These modules work in synchrony to handle read/write requests.
• A read/write request for a key can be routed to any node in the cluster.
• That node determines the replicas for this particular key.
System Architecture
• For writes:
– Routes the request to the replicas and waits for a quorum of replicas to acknowledge the completion of the write.
• For reads (depending on the consistency guarantees the client requires):
– Routes the request to the closest replica
OR
– Routes the request to all replicas and waits for a quorum of responses.
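The quorum rule for writes can be sketched as follows. This is a minimal illustration, assuming a quorum means a simple majority of the replicas; `quorum` and `coordinate_write` are hypothetical helper names, and plain dictionaries stand in for replica nodes.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1

def coordinate_write(replicas, key, value):
    """Send the write to every replica; succeed once a quorum has acknowledged.

    `replicas` is a list of dicts standing in for replica nodes. A None entry
    models a replica that is currently unreachable.
    """
    acks = 0
    for replica in replicas:
        if replica is not None:
            replica[key] = value   # the replica applies the write
            acks += 1              # and acknowledges it
    return acks >= quorum(len(replicas))

print(coordinate_write([{}, None, {}], "k", "v"))      # two of three replicas reachable
print(coordinate_write([{}, None, None], "k", "v"))    # only one of three reachable
```

With three replicas the write tolerates one unreachable replica but not two.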
Partitioning
• Cassandra is designed to scale incrementally.
• This requires dynamically partitioning the data over the set of nodes in the cluster.
• Data is partitioned using consistent hashing.
• Uses an order-preserving hash function.
• The output range of the hash function is treated as a ring.
• Each node is assigned a random value that represents its position on the ring.
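A ring lookup along these lines can be sketched as below. Several things are assumptions for illustration: the `Ring` class and its method names are hypothetical, MD5 is only a stand-in hash (it is not order preserving), and node tokens are fixed rather than random so the example is reproducible. A key belongs to the first node found when walking clockwise from the key's position.

```python
import bisect
import hashlib

def position_of(key):
    """Map a key to a position on a 2**32 ring (MD5 is a stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

class Ring:
    def __init__(self):
        self.tokens = []   # sorted node positions
        self.owners = {}   # token -> node name

    def add_node(self, name, token):
        bisect.insort(self.tokens, token)
        self.owners[token] = name

    def node_for_position(self, position):
        """Walk clockwise from `position` to the first node token, wrapping at the end."""
        index = bisect.bisect_left(self.tokens, position)
        if index == len(self.tokens):
            index = 0
        return self.owners[self.tokens[index]]

    def node_for(self, key):
        return self.node_for_position(position_of(key))

ring = Ring()
for name, token in [("A", 1000), ("B", 2000), ("C", 3000)]:
    ring.add_node(name, token)
print(ring.node_for_position(1500), ring.node_for_position(3500))
```

Adding a node at token 2500 would take over only the positions between 2000 and 2500 from node C; lookups that resolve to A or B are unchanged.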
Partitioning
http://www.ibm.com/developerworks/library/os-apache-cassandra/index.html#resources
Consistent Hashing
• Example:
Cassandra assigns a hash value to each partition key.
[Figure: example data]
https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
Consistent Hashing
https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
Consistent Hashing
• Cassandra places the data on each node according to the value of the partition key and the range that the node is responsible for.
https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
Consistent hashing
Advantage:
• The departure or arrival of a node affects only its immediate neighbors; other nodes remain unaffected.
Some Challenges:
• The random position assignment of each node on the ring leads to non-uniform data and load distribution.
• The basic algorithm is oblivious to the heterogeneity in the performance of nodes.
Partitioning
• Two ways to address these issues:
– Nodes get assigned to multiple positions on the ring.
– Analyze load information on the ring and have lightly loaded nodes move on the ring to alleviate heavily loaded nodes.
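The effect of giving each node multiple positions can be checked with a little arithmetic. In this sketch the ring size of 90 and all token values are made up for illustration, and `ownership` is a hypothetical helper that computes the fraction of the ring each node is responsible for.

```python
def ownership(tokens_by_node, ring_size):
    """Return the fraction of the ring owned by each node.

    A token owns the arc from the previous token (exclusive) up to itself.
    """
    ring = sorted((t, n) for n, ts in tokens_by_node.items() for t in ts)
    width_by_node = {n: 0 for n in tokens_by_node}
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]   # index -1 wraps around to the last token
        width = (token - previous) % ring_size or ring_size
        width_by_node[node] += width
    return {n: w / ring_size for n, w in width_by_node.items()}

single = {"A": [0], "B": [10], "C": [80]}                       # one position each
multiple = {"A": [0, 30, 60], "B": [10, 40, 70], "C": [20, 50, 80]}
print(ownership(single, 90))
print(ownership(multiple, 90))
```

With one unlucky position per node, C is responsible for 70 of the 90 ring slots; with three interleaved positions each, every node owns 30.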
Questions???
Replication
-Rajaramya Janagama
Replication
• Replication determines how data is duplicated across nodes.
• Why replication?
• To achieve high availability and durability.
• To ensure fault tolerance by keeping one or more copies of every row in a column family on different nodes in the cluster.
Replication
• How to achieve replication?
• Each data item is replicated at N nodes, where N is the replication factor.
• Each key is assigned to a coordinator node, which is responsible for replicating the data items that fall within its range.
• In addition to storing each key locally, the coordinator replicates it at N-1 other nodes.
Replication Policies
– Various options to replicate data
• Rack Unaware
• Rack Aware
• Datacenter Aware
Rack Unaware
• Replicate data at the N-1 successive nodes after its coordinator.
Rack Aware
• No two replicas should lie in the same rack.
[Figure: Rack 1 with nodes N1, N2, N3 and Rack 2 with nodes N4, N5, N6; Replica 1 and Replica 2 are marked.]
Data Center Aware
• No two replicas should lie in the same datacenter.
[Figure: nodes N1 to N8 arranged in Rack 1 and Rack 2 across Datacenter 1 and Datacenter 2; Replica 1 and Replica 2 are marked.]
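The rack-unaware and rack-aware rules can be sketched as simple walks around the ring. This is a simplified illustration of the two rules as stated on the slides, not Cassandra's actual placement code; the function names, the node list, and the `rack_of` mapping are all hypothetical.

```python
def rack_unaware(ring, coordinator_index, n):
    """The coordinator plus the N-1 nodes that follow it on the ring."""
    return [ring[(coordinator_index + i) % len(ring)] for i in range(n)]

def rack_aware(ring, rack_of, coordinator_index, n):
    """Walk the ring from the coordinator, skipping nodes whose rack is already used."""
    chosen, used_racks = [], set()
    for i in range(len(ring)):
        node = ring[(coordinator_index + i) % len(ring)]
        if rack_of[node] not in used_racks:
            chosen.append(node)
            used_racks.add(rack_of[node])
        if len(chosen) == n:
            break
    return chosen

ring = ["N1", "N2", "N3", "N4", "N5", "N6"]
rack_of = {"N1": "rack1", "N2": "rack1", "N3": "rack1",
           "N4": "rack2", "N5": "rack2", "N6": "rack2"}
print(rack_unaware(ring, 0, 3))
print(rack_aware(ring, rack_of, 0, 2))
```

The same walk would give a datacenter-aware variant if `rack_of` mapped each node to its datacenter instead of its rack.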
Advantages
• Cassandra provides durability guarantees in the
presence of node failures and network partitions.
• The storage nodes are spread across multiple
datacenters and are connected through high speed data
links.
• This scheme of replicating across multiple datacenters
allows us to handle entire data center failures without
any outage.
Questions???
Membership And Failure
Detection
- Marmik Patel
What is Membership?
• Can be split into two parts:
1. Service Discovery
2. Failure Detection
Service Discovery
• Service discovery comes into the picture when a new node is set up and added to the cluster.
• Based on Scuttlebutt reconciliation, a very efficient anti-entropy, gossip-based mechanism.
• Scuttlebutt makes very efficient use of the CPU and of the gossip channel.
Gossip Protocol and Scuttlebutt
Reconciliation
Gossip Protocol
• The protocol Cassandra uses to discover information about other nodes.
• Each node passes information on to the nodes it knows about.
• Used not only for membership but also to disseminate other system-related control state, such as health, tokens, addresses, and data size.
Scuttlebutt Reconciliation
• Neither participant in a gossip exchange necessarily holds more recent mappings than its peer, so both sides exchange updates.
• Inspired by real-life rumor spreading.
• Repairs replicated data by comparing differences.
Robbert van Renesse, Dan Mihai Dumitriu, Valient Gough, and Chris Thomas. Efficient reconciliation
and flow control for anti-entropy protocols
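The idea of repairing state by comparing differences can be sketched with versioned entries. This is a much simplified illustration, not the Scuttlebutt protocol itself (which also covers flow control); the state layout `{endpoint: (value, version)}` and the function names are assumptions made for the example.

```python
def digest(state):
    """Summarise local knowledge as {endpoint: highest version seen}."""
    return {endpoint: version for endpoint, (_, version) in state.items()}

def deltas_for(state, peer_digest):
    """Entries the peer is missing or holds at an older version."""
    return {
        endpoint: (value, version)
        for endpoint, (value, version) in state.items()
        if version > peer_digest.get(endpoint, -1)
    }

def gossip_exchange(a, b):
    """One round: each side sends only what the other's digest shows as stale."""
    to_b = deltas_for(a, digest(b))
    to_a = deltas_for(b, digest(a))
    b.update(to_b)
    a.update(to_a)

node_a = {"n1": ("up", 3), "n2": ("up", 1)}
node_b = {"n2": ("down", 4), "n3": ("up", 2)}
gossip_exchange(node_a, node_b)
print(node_a == node_b)
```

Only the entries that differ cross the wire: here A sends its view of n1, and B sends its newer view of n2 plus n3.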
Failure Detection
• Comes into the picture when a node is taken down for maintenance or fails due to an error.
• A mechanism by which a node can locally determine whether any other node in the system is up or down.
• Also used to avoid attempts to communicate with unreachable nodes.
• Uses a modified version of the Φ Accrual Failure Detector.
• The gossip protocol is used for exchanging information.
Φ Accrual Failure Detector
• Based on a very simple principle.
• Does not emit a Boolean value stating that a node is up or down; instead it emits a value that represents a suspicion level for each monitored node.
• This value is defined as Φ.
• The idea is to express Φ on a scale that is dynamically adjusted to reflect network and load conditions.
• The difference between a traditional failure detector and an accrual failure detector lies in which component of the system does which part of failure detection.
Traditional Failure Detector vs Accrual Failure Detector
Traditional Failure Detector
• Monitoring and interpretation are combined, and the output of this combination is Boolean.
• The application cannot do any interpretation of its own, because the monitored information has already been interpreted.
Accrual Failure Detector
• Provides a lower-level abstraction that avoids interpreting the monitoring information.
• The value associated with each process represents a suspicion level, which is left for the application to interpret.
http://www.jaist.ac.jp/~defago/files/pdf/IS_RR_2004_010.pdf
Properties of Φ
• Φ expresses how likely it is that node A is wrong when it suspects node B.
• If A suspects B once Φ reaches 1, the likelihood that A is mistaken is about 10%; it is about 1% at Φ = 2, 0.1% at Φ = 3, and so on.
• Each node maintains a sliding window of inter-arrival times of gossip messages, from which Φ is calculated.
• Φ performs very well in both accuracy and speed.
• It also adjusts well to network conditions and server load conditions.
• Cassandra approximates Φ using an exponential distribution.
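Under an exponential model, Φ has a closed form that is easy to sketch. If inter-arrival times have mean m, the probability that the next message arrives later than t seconds after the last one is exp(-t/m), so Φ = -log10(exp(-t/m)) = t / (m · ln 10). The class below is a minimal illustration of that formula; the class name, method names, and the window size of 1000 are assumptions, not Cassandra's implementation.

```python
import math
from collections import deque

class PhiAccrualDetector:
    def __init__(self, window_size=1000):
        # Sliding window of inter-arrival times of gossip messages.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None

    def heartbeat(self, now):
        """Record the arrival of a gossip message at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Suspicion level at time `now`; 0.0 until enough data has been seen."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean == 0:
            return 0.0
        elapsed = now - self.last_arrival
        # phi = -log10(P(next arrival later than elapsed)) with P = exp(-elapsed / mean)
        return elapsed / (mean * math.log(10))

detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)
print(detector.phi(3.5), detector.phi(10.0))
```

The application then picks its own threshold: the longer a node stays silent relative to its usual rhythm, the higher Φ climbs.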
Questions???
Bootstrapping
in
-Arunit Gupta
Bootstrapping
What is Bootstrapping?
Adding new nodes to a cluster is called “bootstrapping”.
Ways of Adding a New Node
There are two ways of adding a node:
– The new node gets assigned a random token, which gives its position in the ring. It gossips its position to the rest of the ring, and the nodes exchange information about one another.
– The new node reads its configuration file to find its initial contact points.
• New nodes are added manually by an administrator, via the CLI or web interface provided by Cassandra.
http://s3.amazonaws.com/ppt-download/cassandraekaterinberg2013-131212053553-phpapp01.pdf
Bootstrapping Contd..
• These initial contact points are known as seeds. A newly added node uses them to learn about the other nodes; the ultimate goal is for all nodes in the cluster to discover one another.
• Seeds can also come from a configuration service such as ZooKeeper, a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
“Because Coordinating Distributed Systems is a Zoo”
Google images
Facts!!!
• Comparison with Amazon’s Dynamo which is a
highly available key-value structured storage system.
“Dynamo’s load is no where close to what we see in
practice over here at Facebook.” –Avinash Lakshman
nosqlmatters2012-130102154135-phpapp01.pdf
Configuration
In addition to seeds, you'll also need to configure the IP interfaces to listen on for gossip and CQL (listen_address and rpc_address respectively). Use a listen_address that is reachable from the listen_address of all other nodes, and an rpc_address that is accessible to clients.
Once everything is configured and the nodes are running, use the bin/nodetool status utility to verify a properly connected cluster.
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
Environment
• Node outages are often transient but may last for extended intervals.
• A node outage rarely signifies a permanent departure and therefore should not result in re-balancing of the partition assignment or repair of the unreachable replicas.
• Manual errors could result in the unintentional startup of new Cassandra nodes. An explicit mechanism is therefore considered appropriate to initiate the addition and removal of nodes from a Cassandra instance.
• An administrator uses a command line tool or a browser to connect to a Cassandra node and issue a membership change to join or leave the cluster.
Scaling the cluster
• When a new node is added to the system, it gets assigned a token such that it can alleviate a heavily loaded node.
• The new node takes over part of a range that another node was previously responsible for.
• The Cassandra bootstrap algorithm is initiated from any other node in the system, using either a command line utility or the web dashboard.
• The node giving up the data streams it over to the new node using kernel copy techniques.
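One way to picture the token choice is to split the range of the busiest node in half. This is only a sketch under stated assumptions: the midpoint rule, the `token_for_new_node` helper, and the `tokens` and `load` inputs are hypothetical, and the split relieves the node evenly only if keys are spread uniformly over its range.

```python
def token_for_new_node(tokens, load, ring_size):
    """Pick a token that halves the range of the most heavily loaded node.

    tokens: {node: token position}, load: {node: amount of data stored}.
    """
    ordered = sorted(tokens.values())
    busiest = max(load, key=load.get)
    end = tokens[busiest]
    start = ordered[ordered.index(end) - 1]    # predecessor on the ring (wraps around)
    width = (end - start) % ring_size or ring_size
    return (start + width // 2) % ring_size

tokens = {"A": 10, "B": 50, "C": 90}
load = {"A": 5, "B": 40, "C": 7}
print(token_for_new_node(tokens, load, 100))
```

Here B owns the range from 10 to 50 and carries the most data, so the new node is placed in the middle of that range and B keeps only the upper half.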
Cassandra Ring showing scalability.
Scaling the Cluster
https://www.google.com/search?q=scalability+in+cassandra&biw=1366&bih=667&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjtztnc4dfLAhWEzoMKHQ1zD4MQ_AUIBygC#imgrc=qusi2veDeVAH4M%3A
Future
What is the Future?
• Operational experience has shown that data can be transferred at a rate of 40 MB/sec from a single node. Work is under way to have multiple replicas take part in the bootstrap transfer, parallelizing the effort, similar to BitTorrent, a peer-to-peer system used to transfer large files to thousands of locations in a short period of time.
• Facebook uses BitTorrent to distribute updates to Facebook servers.
“Bit Torrent is fantastic for this, it’s really great,” Cook said. “It’s ‘super-duper’ fast and it allows us to alleviate a lot of scaling concerns we’ve had in the past, where it took forever to get code to the webservers before you could even boot it up and run it.”
Virtual nodes in Cassandra
• Virtual nodes (vnodes) were one of the new features slated for the Cassandra 1.2 release. They changed the paradigm from one token or range per node to many per node. Within a cluster these can be randomly selected and non-contiguous, giving many smaller ranges that belong to each node.
Advantage?
– Use of heterogeneous machines in a cluster.
– Node failures and backing up.
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
Questions???
Local Persistence
-Chaitanya Sai Manne
Local Persistence
– Cassandra depends on the local file system for data
persistence.
– The data is represented on disk using a format that lends
itself to efficient data retrieval.
– For a data store to be considered persistent, it must write to
non-volatile storage.
Cassandra – more than one server
• All the nodes participate in a cluster
• They are independent – share nothing
• Add or remove as needed
• Need more capacity? Add a server
Focus on a single server
http://www.slideshare.net/patrickmcfadin/introduction-to-cassandra-2014
Write operation
• First, the write is appended to the commit log.
• Then it is put into the in-memory data structure, the memtable.
• Entries in the memtable are identified by primary key.
• The write is acknowledged back to the client.
• This is a simple process, and that is what makes scaling easier.
• As the memtable starts to fill up, a flush process begins.
• The flush process writes the memtable to a file called an SSTable (Sorted String Table).
• The writes here are sequential writes.
http://www.slideshare.net/sameiralk/cassendra
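The write path can be sketched in a few lines. This is an in-memory illustration only: the `Node` class, its attribute names, and the tiny `memtable_limit` are assumptions, lists stand in for on-disk files, and the flush runs inline here rather than as a background process.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []    # stands in for the append-only log on disk
        self.memtable = {}      # in-memory structure keyed by primary key
        self.sstables = []      # flushed, sorted, immutable files (newest last)
        self.memtable_limit = memtable_limit

    def write(self, key, column, value, timestamp):
        # 1. Append to the commit log for durability.
        self.commit_log.append((key, column, value, timestamp))
        # 2. Apply the write to the memtable.
        self.memtable.setdefault(key, {})[column] = (value, timestamp)
        # 3. Flush once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        # 4. Acknowledge back to the client.
        return "ack"

    def flush(self):
        """Write the memtable out as one sorted run and start a fresh memtable."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

node = Node()
node.write("cm7cd", "firstname", "Chaitanya", 1)
node.write("ab1", "firstname", "Ann", 2)
print(len(node.commit_log), len(node.sstables), node.memtable)
```

Nothing on this path reads from disk or updates a file in place, which is why writes stay cheap as data grows.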
Example
UPDATE users
SET firstname = 'Chaitanya'
WHERE id = 'cm7cd';
Write: row key, column
(id = 'cm7cd', firstname = 'Chaitanya')
http://www.slideshare.net/patrickmcfadin/introduction-to-cassandra-2014
Compaction
• Compaction is the process that takes a set of SSTables, reads them back sequentially, merge-sorts their contents, keeps the entry with the latest timestamp for each key, and writes a brand new file.
• It then deletes the old files.
http://www.slideshare.net/patrickmcfadin/introduction-to-cassandra-2014
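The merge step of compaction can be sketched as follows. This is an illustration under assumptions: each SSTable is modelled as a sorted list of `(key, {column: (value, timestamp)})` pairs, `compact` is a hypothetical helper, and a dictionary is used for the merge instead of a streaming merge sort to keep the example short.

```python
def compact(sstables):
    """Merge SSTables into one sorted run, keeping the newest entry per column."""
    merged = {}
    for sstable in sstables:
        for key, columns in sstable:
            row = merged.setdefault(key, {})
            for column, (value, timestamp) in columns.items():
                if column not in row or timestamp > row[column][1]:
                    row[column] = (value, timestamp)
    return sorted(merged.items())

older = [("a", {"x": ("1", 1)}), ("b", {"y": ("2", 1)})]
newer = [("a", {"x": ("9", 5)})]
print(compact([older, newer]))
```

After the new file is written, the input files are no longer needed and can be deleted.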
Compaction
http://www.slideshare.net/patrickmcfadin/introduction-to-cassandra-2014
Read Operation
www.slideshare.net/sameiralk/cassendra
Read Operation
• A read looks in the memtable before going to the files on disk.
• Files are looked up in order from newest to oldest.
• Cassandra checks an in-memory data structure called a Bloom filter.
• The Bloom filter can quickly tell whether a key might exist in a file, so files that cannot contain the key are skipped.
• A key in a column family can have many columns, so column indices are maintained to avoid scanning every column.
• In a cluster, a client can ask any node to retrieve the data.
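The read path with a Bloom filter can be sketched like this. It is a toy illustration: the `BloomFilter` class, its size and hash count, and the `read` helper are assumptions, SHA-256 is just a convenient stand-in hash, and each SSTable is modelled as a `(filter, rows)` pair with the list ordered oldest to newest.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = True

    def might_contain(self, key):
        """False means the key is definitely absent; True means it may be present."""
        return all(self.bits[position] for position in self._positions(key))

def read(key, memtable, sstables):
    """Check the memtable first, then SSTables from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for bloom, rows in reversed(sstables):
        if bloom.might_contain(key) and key in rows:   # skip files the filter rules out
            return rows[key]
    return None

old_filter, new_filter = BloomFilter(), BloomFilter()
old_filter.add("k1")
new_filter.add("k1")
new_filter.add("k2")
sstables = [(old_filter, {"k1": "old"}), (new_filter, {"k1": "new", "k2": "v2"})]
print(read("k1", {}, sstables))
```

A Bloom filter can return a false positive but never a false negative, so the lookup in `rows` is still needed whenever the filter answers "maybe".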
Consistency Levels
• Set on every read and write: ONE, TWO, ALL, QUORUM (a majority of replicas), etc.
Read Operation
http://www.slideshare.net/patrickmcfadin/introduction-to-cassandra-2014
Summary
• Cassandra has demonstrated high scalability, performance, and wide applicability.
• Supports very high update throughput while delivering low latency.
• Future work:
– Adding compression
– Supporting atomicity across keys
– Secondary index support
Questions???
Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured
storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.
For More Information
Thank You

 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraTarun Garg
 
NoSQL - Cassandra & MongoDB.pptx
NoSQL -  Cassandra & MongoDB.pptxNoSQL -  Cassandra & MongoDB.pptx
NoSQL - Cassandra & MongoDB.pptxNaveen Kumar
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Cassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache CassandraCassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache CassandraDave Gardner
 
Apache cassandra
Apache cassandraApache cassandra
Apache cassandraAdnan Siddiqi
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introductionfardinjamshidi
 
Cassandra integrations
Cassandra integrationsCassandra integrations
Cassandra integrationsT Jake Luciani
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka MeetupCliff Gilmore
 

Similar to Cassandra - A decentralized storage system (20)

Cassandra
CassandraCassandra
Cassandra
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 
Cassandra - A Distributed Database System
Cassandra - A Distributed Database System Cassandra - A Distributed Database System
Cassandra - A Distributed Database System
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
 
04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf
 
Cassandra Tutorial
Cassandra Tutorial Cassandra Tutorial
Cassandra Tutorial
 
cassandra
cassandracassandra
cassandra
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting data
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
NoSQL - Cassandra & MongoDB.pptx
NoSQL -  Cassandra & MongoDB.pptxNoSQL -  Cassandra & MongoDB.pptx
NoSQL - Cassandra & MongoDB.pptx
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Cassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache CassandraCassandra's Sweet Spot - an introduction to Apache Cassandra
Cassandra's Sweet Spot - an introduction to Apache Cassandra
 
Apache cassandra
Apache cassandraApache cassandra
Apache cassandra
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introduction
 
Cassandra integrations
Cassandra integrationsCassandra integrations
Cassandra integrations
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 

Recently uploaded

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionBoston Institute of Analytics
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 

Cassandra - A decentralized storage system

  • 1. Cassandra - A Decentralized Structured Storage System Presented By Tejaswi Ganne Latha Muddu Khulud Alsultan Rajaramya Janagama Marmik Patel Arunit Gupta Chaitanya Sai Prashant Malik Facebook Avinash Lakshman Facebook
  • 2. Outline • Data Model. • Cassandra API. • System Architecture: • Partitioning. • Replication. • Membership and Failure Detection. • Bootstrapping and Scaling the Cluster. • Local Persistence.
  • 4. Introduction •Apache Cassandra is an open source distributed storage system. •Manages very large amounts of data spread across many commodity servers located in many data centers. •Named after the Greek mythological prophet Cassandra. •Initially developed at Facebook to power its Inbox Search feature; Facebook later open-sourced it as an Apache Incubator project. •Features – •High Scalability •High Availability •Fault Tolerance
  • 5. Features • High Scalability: Read and write throughput increases linearly as new machines are added, with no downtime or interruption to applications. • High Availability: The system is durable and likely to operate continuously without failure for a long time. • Fault Tolerance: Data is automatically replicated to multiple nodes, and failed nodes can be replaced with no downtime. Replication is supported across multiple data centers.
  • 6. Data Model •Uses a simple data model instead of a full Relational data model. • A table in Cassandra is a distributed multi dimensional map indexed by a key. •Value is a structured object. •Operations are atomic on each row per replica.
  • 7. Data Model *Figure taken from http://www.tutorialspoint.com/cassandra/cassandra_data_model.htm
  • 8. Data Model Contd… •Each row can have a different number of columns. •Each column has < Name, Value, Timestamp >. •Columns can be ordered by name or by timestamp. •Columns are grouped into Column Families (CF). •Two types of CFs – •Simple Column Family –Has columns. –A column can be accessed using the convention » column_family : column •Super Column Family –A CF within a CF. –Has a Simple CF, or another Super CF, in it. –A column can be accessed using the convention » column_family : super_column : column
  • 9. Key-Value Model • It is a column-oriented NoSQL system. • A row is a collection of columns, each labeled with a name. • The key is the column name, and a row must contain at least one column. https://10kloc.wordpress.com/2012/12/25/cassandra-chapter-three-data-model/
  • 10. Related Work •Amazon Dynamo: Dynamo is a storage system used by Amazon to store and retrieve user shopping carts. A write in Dynamo also requires a read in order to manage its vector timestamps, which can be limiting when write throughput is high. •Google File System (GFS) and Chubby: GFS uses a simple design with a single master server that hosts the entire metadata; the data is split into chunks and stored on chunk servers. The GFS master is made fault tolerant using the Chubby abstraction. Chubby achieves fault tolerance through replication.
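The nested-map view of the data model described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the function names `put` and `get`, the dictionary layout, and the "latest timestamp wins" rule applied on write are hypothetical choices for this example, not Cassandra's actual storage code.

```python
# Hypothetical in-memory picture of the data model:
#   row key -> column family -> column name -> (value, timestamp)
from typing import Dict, Tuple

Column = Tuple[str, int]                       # (value, timestamp)
Table = Dict[str, Dict[str, Dict[str, Column]]]


def put(table: Table, row_key: str, column_family: str,
        column: str, value: str, timestamp: int) -> None:
    """Store a column; an older timestamp never overwrites a newer one."""
    family = table.setdefault(row_key, {}).setdefault(column_family, {})
    current = family.get(column)
    if current is None or timestamp >= current[1]:
        family[column] = (value, timestamp)


def get(table: Table, row_key: str, path: str) -> str:
    """Read a column using the 'column_family:column' convention."""
    column_family, column = path.split(":")
    return table[row_key][column_family][column][0]


table: Table = {}
put(table, "user42", "profile", "name", "Ada", timestamp=2)
put(table, "user42", "profile", "name", "Bob", timestamp=1)  # stale, ignored
put(table, "user42", "profile", "city", "Pune", timestamp=3)
print(get(table, "user42", "profile:name"))
```

Rows may hold different sets of columns because each row is just its own inner map; a super column family would add one more level of nesting.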
  • 11. References • https://en.wikipedia.org/wiki/Apache_Cassandra • http://cassandra.apache.org/ • http://www.planetcassandra.org/what-is-apache-cassandra/ • http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html
  • 14. Cassandra Query Language • The Cassandra Query Language (CQL) is the primary language for communicating with the Cassandra database. • CQL statements: Data Definition Statements, Data Manipulation Statements, Queries.
  • 15. Data Definition Statements • Create Keyspace • Use • Alter Keyspace • Drop Keyspace • Create Table • Alter Table • Drop Table • Create Type • Alter Type • Drop Type • Create Trigger • Drop Trigger • Create Function • Drop Function • Create Aggregate • Drop Aggregate
  • 16. Data Definition Statements (Cont.) 1. Create Keyspace: cqlsh> CREATE KEYSPACE sample_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}; 2. Use Keyspace: cqlsh> USE sample_demo; 3. Alter Keyspace: cqlsh> ALTER KEYSPACE sample_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 5};
  • 17. Data Definition Statements(Cont..) 4. Drop Keyspace : DROP KEYSPACE sample_demo ; 5. Create Table : CREATE TABLE presentors_list ( firstname text, lastname text, classid int, email text, PRIMARY KEY (lastname)); 6. Alter Table : ALTER TABLE presentors_list ADD city text ;
  • 18. Data Definition Statements(Cont..) 7 . Drop Table :removes a table. DROP TABLE presentors_list; 8. Truncate Table : removes all data from a table. TRUNCATE presentors_list;
  • 20. Data Manipulation Statements: Insert, Update, Delete, Batch 1. Insert: INSERT INTO presentors_list (firstname, lastname, classid, email) VALUES ('lakshmi', 'upadrasta', 29, 'mltx9@mail.umkc.edu'); 2. Update: UPDATE presentors_list USING TTL 400 SET firstname = 'dileep', classid = 28 WHERE lastname = 'upadrasta';
  • 21. Data Manipulation Statements (Cont.) 3. Delete: DELETE FROM presentors_list USING TIMESTAMP <timestamp> WHERE lastname = 'upadrasta'; (USING TIMESTAMP takes an integer value; <timestamp> is a placeholder.) 4. Batch: BEGIN BATCH <insert statement>; <update statement>; <delete statement>; APPLY BATCH;
  • 22. Queries Select: SELECT * FROM presentors_list; SELECT firstname, email FROM presentors_list WHERE lastname = 'upadrasta';
  • 26. System Architecture • The core distributed systems techniques: – Partitioning – Replication – Membership and Failure handling – Bootstrapping and Scaling the Cluster – Local Persistence
  • 27. System Architecture • These modules work in synchrony to handle read/write requests. • A read/write request for a key gets routed to any node in the cluster. • The node determines the replicas for this particular key.
  • 28. System Architecture • For writes: Routes the requests to the replicas and waits for a quorum of replicas to acknowledge the completion of the writes. • For reads: • Routes the requests to the closest replica OR • Routes the requests to all replicas and waits for a quorum of responses
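The quorum rule used for reads and writes above comes down to simple arithmetic. The sketch below assumes a quorum means a strict majority of the N replicas; the function names are hypothetical and chosen only for this illustration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replication_factor // 2 + 1


def write_acknowledged(acks: int, replication_factor: int) -> bool:
    """A quorum write completes once a majority of replicas has acknowledged."""
    return acks >= quorum(replication_factor)


def quorums_overlap(read_replicas: int, write_replicas: int,
                    replication_factor: int) -> bool:
    """If R + W > N, every read set shares at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


print(quorum(3), write_acknowledged(1, 3), quorums_overlap(2, 2, 3))
```

With N = 3, two acknowledgements are enough for a quorum write, and a quorum read of two replicas is guaranteed to include at least one replica that saw that write.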
  • 29. Partitioning • Scale incrementally. • Dynamically partition the data over the set of nodes in cluster. • Partitions data using consistent hashing. • Uses an order preserving hash function. • Output range is treated as ring. • Each node is assigned a random value which represents its position on the ring.
  • 31. Consistent Hashing • Example: Cassandra assigns a hash value to each partition key. [Figure: example data table] https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
  • 33. Consistent Hashing • Cassandra places the data on each node according to the value of the partition key and the range that the node is responsible for. https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
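The placement rule above (each node is responsible for a range of the ring) can be sketched as follows. This is a minimal illustration, assuming integer tokens are already assigned and that a node owns the range ending at its own token; the `Ring` class and its method names are hypothetical.

```python
import bisect
from typing import Dict


class Ring:
    """Nodes sit at integer positions (tokens) on a ring.

    Assumption for this sketch: a node owns every key token in the range
    (previous node's token, its own token], wrapping around at the end.
    """

    def __init__(self, node_tokens: Dict[str, int]) -> None:
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner(self, key_token: int) -> str:
        # First node whose token is >= the key's token; wrap to the first node.
        index = bisect.bisect_left(self._tokens, key_token)
        return self._entries[index % len(self._entries)][1]


before = Ring({"A": 100, "B": 200, "C": 300})
after = Ring({"A": 100, "B": 200, "C": 300, "D": 250})  # node D joins

for key_token in (50, 150, 225, 275, 350):
    print(key_token, before.owner(key_token), after.owner(key_token))
```

In the printout only the key at token 225 changes owner (from C to D), which illustrates why the arrival of a node affects only its immediate neighbor on the ring.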
  • 34. Consistent hashing Advantage: • Departure or arrival of a node only affects its immediate neighbors and others remain unaffected. Some Challenges: • The random position assignment of each node on the ring leads to non-uniform data and load distribution. • The heterogeneity in the performance of nodes.
  • 35. Partitioning • Two ways to address this issue: – Nodes get assigned to multiple positions in the circle. – Analyze load information on the ring and have lightly loaded nodes move on the ring to alleviate heavily loaded nodes.
  • 38. Replication • How data is duplicated across nodes. • Why replication? • To achieve high availability and durability. • To ensure fault tolerance by keeping one or more copies of every row in a column family on different nodes in the cluster.
  • 39. Replication • How to achieve replication? • Each data item is replicated at N nodes, where N is the replication factor. • Each key is assigned to a coordinator node, which is responsible for replicating the data items that fall within its range. • In addition to storing each key locally, the coordinator replicates it at N-1 other nodes.
  • 40. Replication Policies – Various options to replicate data • Rack Unaware • Rack Aware • Datacenter Aware
  • 41. Rack Unaware • Replicate data at the N-1 successive nodes after its coordinator.
  • 42. Rack Aware • No two replicas should lie in the same rack. [Figure: Rack 1 holds nodes N1, N2, N3 and Rack 2 holds nodes N4, N5, N6; Replica 1 and Replica 2 are placed in different racks. N = node.]
  • 43. Data Center Aware • No two replicas should lie in the same datacenter. [Figure: Datacenter 1 and Datacenter 2, each containing Rack 1 and Rack 2 with nodes N1 to N8; Replica 1 and Replica 2 are placed in different datacenters.]
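The replication policies above can be sketched as ring walks. This is an illustration under stated assumptions: nodes are given in ring order, the rack-aware variant simply skips nodes whose rack already holds a replica, and both function names are hypothetical rather than Cassandra's own.

```python
from typing import Dict, List


def rack_unaware_replicas(ring: List[str], coordinator: int, n: int) -> List[str]:
    """Coordinator plus the N-1 nodes that follow it on the ring."""
    count = min(n, len(ring))
    return [ring[(coordinator + i) % len(ring)] for i in range(count)]


def rack_aware_replicas(ring: List[str], racks: Dict[str, str],
                        coordinator: int, n: int) -> List[str]:
    """Walk the ring from the coordinator, never placing two replicas in one rack."""
    chosen: List[str] = []
    used_racks = set()
    for i in range(len(ring)):
        node = ring[(coordinator + i) % len(ring)]
        if racks[node] not in used_racks:
            chosen.append(node)
            used_racks.add(racks[node])
        if len(chosen) == n:
            break
    return chosen


ring = ["N1", "N2", "N3", "N4", "N5", "N6"]
racks = {"N1": "rack1", "N2": "rack1", "N3": "rack1",
         "N4": "rack2", "N5": "rack2", "N6": "rack2"}

print(rack_unaware_replicas(ring, coordinator=1, n=3))
print(rack_aware_replicas(ring, racks, coordinator=1, n=2))
```

A datacenter-aware policy could follow the same pattern with a node-to-datacenter map in place of the rack map.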
  • 44. Advantages • Cassandra provides durability guarantees in the presence of node failures and network partitions. • The storage nodes are spread across multiple datacenters and are connected through high speed data links. • This scheme of replicating across multiple datacenters allows us to handle entire data center failures without any outage.
  • 47. What is Membership? • Can be split into two parts: 1. Service Discovery 2. Failure Detection Service Discovery • Service discovery comes into the picture when a new node is set up and added to the cluster. • Based on Scuttlebutt Reconciliation, a very efficient anti-entropy, gossip-based mechanism. • Scuttlebutt makes very efficient use of the CPU and the gossip channel.
  • 48. Gossip Protocol and Scuttlebutt Reconciliation Gossip Protocol • The protocol that Cassandra uses to discover information about other nodes. • Information is transferred from a node to the nodes it knows about. • Used not only for membership, but also to disseminate other system-related control state such as health, tokens, addresses, data size, etc. Scuttlebutt Reconciliation • The two participants in a gossip exchange need not send their full state; each sends only the mappings that are more recent than those of its peer. • Inspired by real-life rumor spreading. • Repairs replicated data by comparing differences. Robbert van Renesse, Dan Mihai Dumitriu, Valient Gough, and Chris Thomas. Efficient reconciliation and flow control for anti-entropy protocols.
  • 49. Failure Detection • Comes into the picture when a node is taken down for maintenance or fails due to an error. • A mechanism by which a node can locally determine whether any other node in the system is up or down. • Also used to avoid attempts to communicate with unreachable nodes. • Uses a failure detector that is a modified version of the Φ Accrual Failure Detector. • The gossip protocol is used for exchanging information.
  • 50. Φ Accrual Failure Detector • Based on a very simple principle. • Does not emit a Boolean value stating that a node is up or down, but emits a value that represents a suspicion level for each node. • This value is denoted Φ. • The idea is to express Φ on a scale that is dynamically adjusted to reflect network and load conditions. • The difference between a traditional failure detector and an accrual failure detector is which component of the system does which part of failure detection.
  • 51. Traditional Failure Detector vs Accrual Failure Detector • Traditional Failure Detector: monitoring and interpretation are combined, and the output of this combination is Boolean. The application cannot do any interpretation, as the monitored information has already been interpreted. • Accrual Failure Detector: provides a lower-level abstraction that avoids interpreting the monitoring information. The value associated with each process represents a suspicion level, which is left for the application to interpret. http://www.jaist.ac.jp/~defago/files/pdf/IS_RR_2004_010.pdf
  • 52. Properties of Φ • Φ represents the likelihood that node A is wrong about node B's state. • If the chance that A makes a mistake in deciding the state of B is 10% when Φ = 1, then it is about 1% when Φ = 2, 0.1% when Φ = 3, and so on. • Each node maintains a sliding window of inter-arrival times of gossip messages to calculate the value of Φ. • Φ is very good in both accuracy and speed. • It also adjusts well to network conditions and server load. • Cassandra approximates Φ using an exponential distribution.
  • 55. Bootstrapping What is Bootstrapping? Adding new nodes is called "bootstrapping". Ways of adding a new node: there are two ways of adding a node: – The new node gets assigned a random token, which gives its position in the ring. It gossips its location to the rest of the ring, where nodes exchange information about one another. – The new node reads its config file to contact its initial contact points. • New nodes are added manually by an administrator via the CLI or web interface provided by Cassandra. http://s3.amazonaws.com/ppt-download/cassandraekaterinberg2013-131212053553-phpapp01.pdf?response-content-disposition=attachment&Signature=7pB%2BhMgGqV1vxcRUaqCbCt2%2BH6o%3D&Expires=1458678552&AWSAccessKeyId=AKIAJ6D6SEMXSASXHDAQ
  • 56. Bootstrapping Contd.. • These initial contact points are known as seeds. A newly added node uses them to learn about the other nodes; the ultimate goal is for all nodes in the cluster to discover one another. • Seeds can also come from a configuration service like Zookeeper, a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. "Because Coordinating Distributed Systems is a Zoo" Image source: Google Images.
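The idea of exchanging only the newer mappings can be sketched as below. This is a deliberately simplified illustration, not the Scuttlebutt algorithm from the cited paper: it assumes each piece of state carries one version number and that peers first swap version digests. All names are hypothetical.

```python
from typing import Dict, Tuple

State = Dict[str, Tuple[str, int]]   # key -> (value, version)


def digest(state: State) -> Dict[str, int]:
    """Compact summary sent first: only keys and their versions."""
    return {key: version for key, (_, version) in state.items()}


def newer_entries(local: State, peer_digest: Dict[str, int]) -> State:
    """Entries the peer is missing or holds at an older version."""
    return {key: (value, version)
            for key, (value, version) in local.items()
            if version > peer_digest.get(key, -1)}


def merge(local: State, delta: State) -> None:
    """Apply a delta, keeping the higher version of each key."""
    for key, (value, version) in delta.items():
        if key not in local or version > local[key][1]:
            local[key] = (value, version)


def gossip_round(a: State, b: State) -> None:
    """One exchange: each side sends only what is newer than the peer's copy."""
    delta_for_b = newer_entries(a, digest(b))
    delta_for_a = newer_entries(b, digest(a))
    merge(a, delta_for_a)
    merge(b, delta_for_b)


node_a: State = {"n1": ("up", 3), "n2": ("up", 1)}
node_b: State = {"n2": ("down", 2), "n3": ("up", 1)}
gossip_round(node_a, node_b)
print(node_a == node_b, node_a)
```

After one round both nodes hold the same view, and only the entries that differed were transferred.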
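The exponential approximation of Φ mentioned above can be worked through in a few lines. The sketch assumes inter-arrival times are exponentially distributed with the mean of the sliding window, so the probability that a heartbeat is still to come after t seconds of silence is exp(-t / mean), and Φ = -log10 of that probability. The class and method names are hypothetical.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Suspicion level for one monitored node, from heartbeat arrival times."""

    def __init__(self, window_size: int = 1000) -> None:
        self._intervals = deque(maxlen=window_size)   # sliding window
        self._last_arrival = None

    def heartbeat(self, now: float) -> None:
        """Record the arrival time of a gossip message from the node."""
        if self._last_arrival is not None:
            self._intervals.append(now - self._last_arrival)
        self._last_arrival = now

    def phi(self, now: float) -> float:
        """Phi = -log10(exp(-elapsed / mean)) = elapsed / (mean * ln 10)."""
        if not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self._last_arrival
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for arrival in (0.0, 1.0, 2.0, 3.0):      # one heartbeat per second
    detector.heartbeat(arrival)

for silence in (1.0, 2.3, 4.6, 6.9):
    print(silence, round(detector.phi(3.0 + silence), 2))
```

Each additional ln(10) mean intervals of silence (about 2.3 seconds here) raises Φ by one, matching the 10%, 1%, 0.1% progression on the slide. The application then picks its own threshold for treating a node as down.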
  • 57. Facts!!! • Comparison with Amazon's Dynamo, which is a highly available key-value structured storage system. "Dynamo's load is no where close to what we see in practice over here at Facebook." (Avinash Lakshman) nosqlmatters2012-130102154135-phpapp01.pdf
  • 58. Configuration In addition to seeds, you also need to configure the IP interfaces to listen on for gossip and CQL (listen_address and rpc_address, respectively). Use a listen_address that is reachable from the listen_address of all other nodes, and an rpc_address that is accessible to clients. Once everything is configured and the nodes are running, use the bin/nodetool status utility to verify that the cluster is properly connected. https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
  • 59. Environment • Node outages are often transient but may last for extended intervals. • A node outage rarely signifies a permanent departure and should not result in re-balancing of the partition assignment or repair of the unreachable replicas. • Manual errors could result in unintentional startup of new Cassandra nodes. As a result, an explicit mechanism is considered appropriate to initiate the addition and removal of nodes from a Cassandra instance. • Administrator: uses a command line tool or a browser to connect to a Cassandra node and issue a membership change to join or leave the cluster.
  • 60. Scaling the cluster • Whenever a new node is added to the system, it gets assigned a token such that it can alleviate a heavily loaded node. • The new node takes over part of the range that another node was previously responsible for. • The Cassandra bootstrapping algorithm is initiated from any other node in the system, using either a command line utility or a web dashboard. • The node giving up the data streams it over to the new node using kernel copy techniques. [Figure: Cassandra ring showing scalability.] https://www.google.com/search?q=scalability+in+cassandra&biw=1366&bih=667&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjtztnc4dfLAhWEzoMKHQ1zD4MQ_AUIBygC#imgrc=qusi2veDeVAH4M%3A
  • 61. Future What is the Future? • Operational experience has shown that data can be transferred at a rate of 40 MB/sec from a single node. Work is under way to have multiple replicas take part in the bootstrap transfer by parallelizing the effort, similar to BitTorrent, a p2p system used to transfer large files to thousands of locations in a short period of time. • Facebook uses BitTorrent to distribute updates to Facebook servers. "Bit Torrent is fantastic for this, it's really great," Cook said. "It's 'super-duper' fast and it allows us to alleviate a lot of scaling concerns we've had in the past, where it took forever to get code to the webservers before you could even boot it up and run it."
  • 62. Virtual nodes in Cassandra • One of the new features slated for the Cassandra 1.2 release was virtual nodes (vnodes), a paradigm change from one token or range per node to many per node. Within a cluster these can be randomly selected and non-contiguous, giving many smaller ranges that belong to each node. Advantages: – Use of heterogeneous machines in a cluster. – Node failures and backing up. http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
  • 65. Local Persistence – Cassandra depends on the local file system for data persistence. – The data is represented on disk using a format that lends itself to efficient data retrieval. – For a data store to be considered persistent, it must write to non-volatile storage.
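The move from one token per node to many per node can be sketched as below. This is an illustration only: the function name `assign_vnodes`, the 32-bit ring size, and the idea of giving a stronger machine more tokens are assumptions for this example, not Cassandra's actual token allocation.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def assign_vnodes(tokens_per_node: Dict[str, int],
                  ring_size: int = 2 ** 32,
                  seed: int = 0) -> List[Tuple[int, str]]:
    """Give every node several random, non-contiguous positions on the ring.

    A machine with more capacity can be given more tokens, so it ends up
    owning a larger share of the many small ranges.
    """
    rng = random.Random(seed)
    total = sum(tokens_per_node.values())
    tokens = rng.sample(range(ring_size), total)   # distinct positions
    ring: List[Tuple[int, str]] = []
    position = 0
    for node, count in tokens_per_node.items():
        for token in tokens[position:position + count]:
            ring.append((token, node))
        position += count
    return sorted(ring)


ring = assign_vnodes({"big-machine": 8, "small-machine": 2})
print(Counter(node for _, node in ring))
```

Because each node owns many small ranges, the data of a failed node is spread over many peers, which is one reason the slide lists node failures among the advantages.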
  • 66. Cassandra – more than one server • All the nodes participate in a cluster • They are independent – share nothing • Add or remove as needed • Need more capacity? Add a server
  • 67. Focus on single server http://www.slideshare.net/patrickmcfadin/introduction-to-cassandra-2014
  • 68. Write operation • First, the write is appended to the commit log. • Then it is applied to an in-memory data structure, the memtable. • Rows in the memtable are identified by the primary key. • An acknowledgement is sent back to the client. • The process is simple, which is what makes scaling easier. • As the memtable fills up, a flush process is triggered. • The flush writes the memtable to a file called an SSTable (Sorted String Table). • These writes are sequential. http://www.slideshare.net/sameiralk/cassendra
  • 69. Example • UPDATE users SET firstname = 'Chaitanya' WHERE id = 'cm7cd' • Write: row key, column (id = 'cm7cd', firstname = 'Chaitanya')
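The write steps (commit log, memtable, acknowledgement, flush to an SSTable) can be modelled with a small in-memory sketch. Everything here is a simplification: Python lists stand in for on-disk files, the flush threshold counts rows rather than bytes, and the class name `Node` and its methods are hypothetical rather than part of Cassandra.

```python
class Node:
    """Toy single-server write path: commit log, then memtable, then flush."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []  # append-only list standing in for the on-disk log
        self.memtable = {}    # row key -> {column name: (value, timestamp)}
        self.sstables = []    # flushed tables, oldest first, each sorted by row key
        self.flush_threshold = flush_threshold  # assumed: number of rows

    def write(self, row_key, column, value, timestamp):
        # 1. Append to the commit log for durability.
        self.commit_log.append((row_key, column, value, timestamp))
        # 2. Apply the write to the memtable, keyed by the row's primary key.
        self.memtable.setdefault(row_key, {})[column] = (value, timestamp)
        # 3. Flush once the memtable has filled up.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()
        # 4. Acknowledge back to the client.
        return "ack"

    def flush(self):
        # Write the rows out in sorted key order, as one sequential pass.
        sstable = sorted(self.memtable.items(), key=lambda kv: kv[0])
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed data no longer needs the log


node = Node(flush_threshold=2)
first_ack = node.write("cm7cd", "firstname", "Chaitanya", timestamp=1)
rows_in_memtable_after_first_write = len(node.memtable)
node.write("ab1", "firstname", "Example", timestamp=2)  # second row triggers a flush
```

After the second write the memtable is empty and one SSTable exists, holding both rows in sorted key order.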
  • 76. Compaction • Compaction is the process that takes a set of SSTables, reads them sequentially into memory, merge-sorts them, keeps the entry with the latest timestamp wherever the files overlap, and writes a brand new file. • It then deletes the old files.
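A rough sketch of the merge step is shown below, assuming each SSTable is a list of (row key, columns) pairs sorted by row key and each column stores a (value, timestamp) pair. The function name `compact` and the sample data are illustrative only; a real compaction works on files and also handles details such as deletions, which are omitted here.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def compact(sstables):
    """Merge sorted SSTables into one, keeping the newest value per column."""
    # k-way merge of the already-sorted inputs (the "merge sort" step).
    merged_stream = heapq.merge(*sstables, key=itemgetter(0))
    result = []
    for row_key, group in groupby(merged_stream, key=itemgetter(0)):
        row = {}
        for _, columns in group:
            for name, (value, ts) in columns.items():
                # Keep the column version with the latest timestamp.
                if name not in row or ts > row[name][1]:
                    row[name] = (value, ts)
        result.append((row_key, row))
    return result  # the caller would write this out and delete the old files


# Sample data, invented for the example.
old_table = [
    ("ab1", {"firstname": ("Example", 1)}),
    ("cm7cd", {"firstname": ("Chai", 1), "city": ("Unknown", 1)}),
]
new_table = [("cm7cd", {"firstname": ("Chaitanya", 5)})]
compacted = compact([old_table, new_table])
```

Because every input is already sorted, the merge reads each file once from start to end, which matches the sequential access pattern described on the slide.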
  • 80. Read Operation • A read looks in the memtable before going to the files on disk. • Files are searched in order from newest to oldest. • Cassandra checks an in-memory data structure called a Bloom filter. • A Bloom filter can quickly tell whether a key might exist in a file; it can return false positives but never false negatives. • A key in a column family can have many columns, so to avoid scanning all of them Cassandra maintains column indices. • In a cluster, a client can ask any node to retrieve the data. • Consistency Levels • Set on every read and write, e.g. ONE, TWO, ALL, or QUORUM (a majority of the replicas).
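The read path and the QUORUM level can be sketched under some simplifying assumptions: each SSTable is modelled as a dictionary paired with its own Bloom filter, a read returns the first (newest) hit instead of merging columns across files, and the filter size and hash count are arbitrary. The names `BloomFilter`, `read`, and `quorum` are hypothetical and not Cassandra APIs.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def read(key, memtable, sstables):
    """Check the memtable first, then SSTables ordered newest to oldest."""
    if key in memtable:
        return memtable[key]
    for bloom, data in sstables:
        # Only touch the "file" if the Bloom filter says the key may be there.
        if bloom.might_contain(key) and key in data:
            return data[key]
    return None


def quorum(replication_factor):
    """A majority of the replicas."""
    return replication_factor // 2 + 1


bloom = BloomFilter()
bloom.add("cm7cd")
sstables = [(bloom, {"cm7cd": {"firstname": "Chaitanya"}})]
found = read("cm7cd", memtable={}, sstables=sstables)
missing = read("no-such-key", memtable={}, sstables=sstables)
```

With a replication factor of 3, `quorum(3)` gives 2, which is the sense in which QUORUM means more than half of the replicas must respond.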
  • 82. Summary • Established high scalability, performance and wide applicability • Very high update throughput, delivering low latency • Future work: – Adding compression – Support for atomicity across keys – Secondary index support
  • 84. For More Information • Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.