Presentation by
Dhivya Ramasamy
Email: achuhivi08@gmail.com
• Cassandra is an open-source distributed NoSQL database from Apache.
• It is highly scalable and designed to manage very large amounts of structured and semi-structured data.
• It provides high availability with no single point of failure.
• It is a wide-column store (often loosely described as a column-oriented database).
Cassandra | RDBMS
Handles structured, semi-structured, and unstructured data. | Handles structured data.
Flexible schema. | Fixed schema.
No foreign keys; related data is denormalized or stored in collections. | Relationships are expressed with foreign keys and joins.
Does not support joins. | Supports joins.
• Cassandra is designed to handle big data workloads across multiple nodes without any single point of failure.
• Cassandra uses a peer-to-peer distributed architecture: every node in the cluster plays the same role.
• Data is distributed among all the nodes in a cluster.
Advantages and Applicable Areas
• Open source
• Peer-to-peer architecture
• High availability and performance
• The components of the Cassandra data model are keyspaces, tables, and columns.
• A keyspace is the outermost container for data in Cassandra.
◦ There is no default keyspace for user data; a keyspace must be created before tables can be defined in it.
◦ Replication is specified at the keyspace level.
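Because replication is set per keyspace, every table in a keyspace shares the same replica placement. The sketch below is a simplified model of how a replication factor could map a partition key to nodes on a token ring, in the spirit of SimpleStrategy (first replica on the owning node, the rest on the next nodes clockwise). The function names, the MD5-based token, and the one-token-per-node ring are illustrative assumptions, not Cassandra's implementation (Cassandra's default partitioner is Murmur3 and nodes usually own many virtual tokens).

```python
import bisect
import hashlib


def token(value):
    # Stand-in hash for illustration only; not Cassandra's Murmur3 partitioner.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def replicas_for(partition_key, node_names, replication_factor):
    """Return the nodes that would hold replicas of partition_key."""
    # Assumption: each node owns exactly one token derived from its name.
    ring = sorted((token(name), name) for name in node_names)
    tokens = [t for t, _ in ring]
    # The owner is the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if there is none.
    start = bisect.bisect_left(tokens, token(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    # Remaining replicas go to the next nodes clockwise on the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

Calling replicas_for("user:42", ["n1", "n2", "n3", "n4"], 3) returns three distinct node names, and the same key always maps to the same nodes.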
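• Early versions of CQL had no aggregate functions. Since Cassandra 2.2, CQL provides COUNT, MIN, MAX, SUM, and AVG, but they are efficient only when restricted to a single partition.
• CQL does not support HAVING. GROUP BY (added in Cassandra 3.10) works only on partition key and clustering columns.
• CQL does not support joins.
• CQL does not support OR in WHERE clauses.
• CQL does not support general wildcard queries; LIKE is available only on columns with a SASI index.
• CQL does not support UNION or INTERSECT queries.
• Columns outside the primary key cannot be filtered without a secondary index (or ALLOW FILTERING, which scans data and is usually slow).
• Greater-than (>) and less-than (<) comparisons are normally supported only on clustering columns.
• Because of these limitations, CQL on its own is not well suited to analytics workloads.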
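One way to see why range comparisons are tied to clustering columns: within a partition, rows are stored sorted by clustering key, so a range maps to one contiguous slice. The sketch below is a toy model of that idea using a sorted Python list; the function name and the row layout are assumptions made for illustration and say nothing about Cassandra's storage engine internals.

```python
import bisect


def range_query(partition_rows, low, high):
    """Return rows with low < clustering_key < high.

    partition_rows is a list of (clustering_key, value) tuples that is
    already sorted by clustering_key, as rows are within one partition.
    """
    keys = [clustering_key for clustering_key, _ in partition_rows]
    start = bisect.bisect_right(keys, low)   # first key strictly above low
    end = bisect.bisect_left(keys, high)     # first key at or above high
    return partition_rows[start:end]
```

Because the data is sorted, the lookup is two binary searches and a slice. A column that is not part of the sort order offers no such shortcut, which is why filtering on it needs an index or a scan.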
• Gossip is the internal communication protocol that nodes in a cluster use to talk to each other.
• It runs every second on every node and exchanges state messages with up to three other nodes in the cluster.
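The gossip behaviour described above can be sketched as a small simulation. This is a simplified model, assuming a push-pull exchange in which both sides keep the highest heartbeat version seen for each endpoint; the class and function names are hypothetical, and real Cassandra gossip uses a three-message SYN/ACK/ACK2 exchange and a more specific peer-selection rule.

```python
import random


class Node:
    def __init__(self, name):
        self.name = name
        self.heartbeat = 0
        # endpoint name -> highest heartbeat version seen for that endpoint
        self.known = {name: 0}

    def merge(self, other_state):
        # Keep whichever version is newer for every endpoint.
        for endpoint, version in other_state.items():
            if version > self.known.get(endpoint, -1):
                self.known[endpoint] = version


def gossip_round(nodes, rng, fanout=3):
    """One gossip tick: every node exchanges state with up to `fanout` peers."""
    for node in nodes:
        node.heartbeat += 1
        node.known[node.name] = node.heartbeat
        peers = [p for p in nodes if p is not node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            snapshot = dict(node.known)
            node.merge(peer.known)   # pull the peer's view
            peer.merge(snapshot)     # push our view to the peer


def run_gossip(node_count=6, rounds=2, seed=0):
    rng = random.Random(seed)
    nodes = [Node(f"node{i}") for i in range(node_count)]
    for _ in range(rounds):
        gossip_round(nodes, rng)
    return nodes
```

In this model, state about a node spreads quickly even though each node contacts only a few peers per round: with six nodes and a fanout of three, every node knows about every other node after two rounds.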
• The snitch's job is to determine which data centers and racks the nodes belong to, so that Cassandra knows where to read data from and write data to.
• Types of snitches:
◦ SimpleSnitch
◦ GossipingPropertyFileSnitch
◦ PropertyFileSnitch
◦ Ec2Snitch
◦ Ec2MultiRegionSnitch
◦ RackInferringSnitch
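As an illustration of what a snitch computes, the sketch below mimics the rule used by RackInferringSnitch, which assumes that the second octet of a node's IP address identifies its data center and the third octet its rack. The function name and return shape are hypothetical; this is not Cassandra's code.

```python
def infer_location(ip_address):
    """Infer data center and rack from an IPv4 address, RackInferringSnitch-style."""
    octets = ip_address.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected a dotted IPv4 address, got {ip_address!r}")
    # Second octet -> data center, third octet -> rack.
    return {"datacenter": octets[1], "rack": octets[2]}
```

For example, infer_location("10.20.30.40") places the node in data center "20" and rack "30". The other snitches obtain the same two facts from configuration files or cloud metadata instead of the address.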
• Compaction is a maintenance process in which Cassandra merges and reorganizes SSTables to optimize how data is laid out on disk.
• It is needed because each memtable flush writes a new immutable SSTable, so SSTables accumulate over time.
• There are two types of compaction in Cassandra:
◦ Minor compaction: starts automatically when a new SSTable is created. With the size-tiered strategy, Cassandra merges SSTables of similar size into one.
◦ Major compaction: triggered manually with nodetool (nodetool compact). With the size-tiered strategy, it compacts all SSTables of a table (column family) into one.
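The core of compaction, merging several SSTables and keeping only the newest version of each entry, can be sketched as follows. This is a minimal model under stated assumptions: each SSTable is a dict of key -> (timestamp, value), deletes are represented by a hypothetical TOMBSTONE marker, and tombstones are dropped immediately (Cassandra keeps them until gc_grace_seconds has passed).

```python
# Hypothetical marker standing in for a delete record (tombstone).
TOMBSTONE = object()


def compact(sstables):
    """Merge SSTables into one, keeping the newest (timestamp, value) per key."""
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            # Last write wins: a higher timestamp replaces an older entry.
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Output is sorted by key, and keys whose newest version is a delete are dropped.
    return {
        key: cell
        for key, cell in sorted(merged.items())
        if cell[1] is not TOMBSTONE
    }
```

Merging an older table {"a": (1, "x"), "b": (1, "y")} with a newer one {"a": (2, "z"), "b": (3, TOMBSTONE), "c": (2, "w")} leaves only the latest value of "a" and the new key "c"; "b" disappears because its newest version is a delete.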
• A Bloom filter is a probabilistic data structure that lets Cassandra determine one of two possible states for a given SSTable file:
◦ the data definitely does not exist in the file, or
◦ the data probably exists in the file.
• Cassandra checks the Bloom filter to see whether the requested row may exist in an SSTable before doing any disk I/O.
• To change the Bloom filter false-positive chance on a table (column family):
◦ ALTER TABLE addamsFamily WITH bloom_filter_fp_chance = 0.01;
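The two possible answers of a Bloom filter follow directly from how it is built. The sketch below is a minimal, generic Bloom filter, not Cassandra's implementation; the class name, the SHA-256-based hashing, and the default sizes are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the example simple (a real filter packs bits).
        self.bits = bytearray(size_bits)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False -> definitely absent; True -> probably present.
        return all(self.bits[pos] for pos in self._positions(key))
```

A key that was added always yields True, so there are no false negatives. A key that was never added usually yields False, but can yield True if other keys happened to set the same bits. Using more bits per key makes such false positives rarer at the cost of more memory, which is the trade-off that bloom_filter_fp_chance controls.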

Cassandra advanced-I
