Introduction to Cassandra
Presented on 26th Feb 2014
Scope
• Introduction to Cassandra and NoSQL
• Understanding the Cassandra data model
• Configuration, reading and writing data in Cassandra
• CQL

What is Cassandra
• A Database
• Uses the fully distributed design of Amazon’s Dynamo
• Uses the column-family-based data model of Google’s Bigtable
• Developed at Facebook (the team was led by Jeff Hammerbacher,
with Avinash Lakshman, Karthik Ranganathan, and Prashant Malik
(Search Team))
• Open-sourced in 2008

Problems with RDBMS
• Horizontal scaling: as the data in an RDBMS grows, joins become
slow, so retrieval becomes slow.
• Vertical scaling: adding more hardware, memory, a faster processor or
more disk space. Adding machines creates problems of its own, such as
data replication, consistency, and failover mechanisms.
• Caching layer in large systems: for example memcache, EHCache, Oracle
Coherence. Keeping updates in the cache and the database in step
becomes harder across a cluster.

Cassandra
• “Apache Cassandra is an open source, distributed, decentralized,
elastically scalable, highly available, fault-tolerant, tuneably
consistent, column-oriented database that bases its distribution
design on Amazon’s Dynamo and its data model on Google’s
Bigtable. Created at Facebook.”

Why Cassandra
• Fault tolerant
• Decentralized
• Eventually consistent
• Rich data model
• Elastic
• Highly Available
• No SPOF (single point of failure)

CAP theorem
• Eric Brewer of the University of California at Berkeley proposed the CAP theorem in 2000.
• The theorem states that within a large-scale distributed data system, there are three
requirements that have a relationship of sliding dependency.
• Consistency: All database clients will read the same value for the same query, even given
concurrent updates.
• Availability: All database clients will always be able to read and write data.
• Partition Tolerance: The database can be split into multiple machines; it can continue functioning
in the face of network segmentation breaks.

CAP theorem (cont.)
• According to the theorem, only two of the three can be strongly supported in a distributed data
system.

• CA: the system will block when the network partitions, so such systems are usually
limited to a single data centre to mitigate this.
• CP: the system allows data sharding in order to scale. The data will be consistent, but some data
may become unavailable whenever a node goes down.
• AP: the system may return inaccurate data, but it will always be available, even in the
face of network partitioning. DNS is perhaps the most popular example of a system that is
massively scalable, highly available, and partition-tolerant.

Fault Tolerant
• Data is automatically replicated to multiple nodes based on
replication factor.
• Replication across multiple data centers is supported.
• Failed nodes can be replaced with no downtime.
• Uses Accrual Failure Detector for fault detection.
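
The replication bullet above can be illustrated with a minimal Python sketch. It assumes a simplified token ring in the spirit of SimpleStrategy: a row key is hashed to a ring position and the row is stored on the next `replication_factor` nodes clockwise. The node names, the MD5-based `token` helper and `replicas_for` are hypothetical names for illustration, not Cassandra's actual implementation.

```python
import hashlib
from bisect import bisect_right

def token(value: str) -> int:
    """Map a string to a position on the ring (illustrative MD5 token)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

def replicas_for(key: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Return the nodes that hold a copy of the row with this key."""
    ring = sorted(nodes, key=token)              # nodes ordered by ring position
    positions = [token(n) for n in ring]
    # First node clockwise from the key's position (wraps around the ring).
    start = bisect_right(positions, token(key)) % len(ring)
    count = min(replication_factor, len(ring))   # cannot have more copies than nodes
    return [ring[(start + i) % len(ring)] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
owners = replicas_for("AZC_043", nodes, replication_factor=3)
print(owners)  # three distinct nodes out of the five
```

With a replication factor of 3, losing any one of the three owners still leaves two copies of the row on other nodes.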

Decentralization
• Every node in the cluster is identical (peer-to-peer, no master/slave architecture).
• There is no single point of failure.

Eventual consistency
• Uses BASE (Basically Available, Soft state, Eventual consistency).
• As the data is replicated, the latest version of an item sits on
at least one node in the cluster, but older versions may still be on
other nodes.
• Eventually all nodes will see the latest version.

Eventual consistency (Cont.)
• Tuneable consistency: the replication factor sets the number of nodes
in the cluster you want updates to propagate to.
• The consistency level is a setting that clients must specify on every
operation. It lets you decide how many replicas in the cluster must
acknowledge a write operation or respond to a read operation for it
to be considered successful. This is where Cassandra has pushed the
decision about consistency out to the client. Strict consistency can
therefore be achieved by setting the consistency level to the same
value as the replication factor.
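
The relationship between replication factor and consistency level can be sketched in Python. This sketch assumes the general quorum-overlap rule (a read is guaranteed to see the latest write when read replicas plus write replicas exceed the replication factor); setting the consistency level equal to the replication factor is one case of it. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1

def reads_see_latest_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor

rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(rf, rf, 1))                   # True: write to all, read from one
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))  # True: quorum writes and quorum reads
print(reads_see_latest_write(rf, 1, 1))                    # False: eventual consistency only
```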

Rich Data Model
• Keyspace
• Column family
• Rows
• Column
• Super column

Column family
"ToyStore" : {
  "Toys" : {
    "GumDrop" : {
      "Price" : "0.25",
      "Section" : "Candy"
    },
    "Transformer" : {
      "Price" : "29.99",
      "Section" : "Action Figures"
    },
    "MatchboxCar" : {
      "Price" : "1.49",
      "Section" : "Vehicles"
    }
  }
},
"Keyspace1" : null,
"system" : null

Super Column family
"ToyCorporation" : {
  "ToyStores" : {
    "Ohio Store" : {
      "Transformer" : {"Price" : "29.99", "Section" : "Action Figures"},
      "GumDrop" : {"Price" : "0.25", "Section" : "Candy"},
      "MatchboxCar" : {"Price" : "1.49", "Section" : "Vehicles"}
    },
    "New York Store" : {
      "JawBreaker" : {"Price" : "4.25", "Section" : "Candy"},
      "MatchboxCar" : {"Price" : "8.79", "Section" : "Vehicles"}
    }
  }
}

Keyspace
A keyspace is similar to a schema in an RDBMS. It has a name and a set
of attributes that define keyspace-wide behaviour.
The attributes are:
1. Replication factor: if it is set to 3, then 3 nodes will hold
a copy of each row.
2. Replica placement strategy: SimpleStrategy (formerly
RackUnawareStrategy), OldNetworkTopologyStrategy (formerly
RackAwareStrategy), and NetworkTopologyStrategy (formerly
DatacenterShardStrategy).
3. Column families: discussed on the following slides.
Column family
• A column family is a container for columns, analogous to the table in
a relational system.
• A column family holds an ordered list of columns, each of which is
referenced by its column name.

• [Keyspace][ColumnFamily][Key][Column]
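
The addressing scheme above can be mimicked with nested Python dictionaries. This is only a mental model of the lookup path, reusing the ToyStore data from the earlier slide; `cluster` and `get_column` are hypothetical names, and Cassandra does not store data this way internally.

```python
# Keyspace -> column family -> row key -> column name -> value
cluster = {
    "ToyStore": {
        "Toys": {
            "GumDrop": {"Price": "0.25", "Section": "Candy"},
            "Transformer": {"Price": "29.99", "Section": "Action Figures"},
        }
    }
}

def get_column(keyspace: str, column_family: str, key: str, column: str) -> str:
    """Resolve [Keyspace][ColumnFamily][Key][Column] to a single value."""
    return cluster[keyspace][column_family][key][column]

print(get_column("ToyStore", "Toys", "Transformer", "Price"))  # 29.99
```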

Column family (cont.)
A column family has two attributes: a name and a comparator.
The comparator determines the order in which columns are sorted when
they are returned by a query. The comparator can be one of the
following types: AsciiType, BytesType, LexicalUUIDType, IntegerType,
LongType, TimeUUIDType, UTF8Type, or a custom type (plug in your own
class, which should extend org.apache.cassandra.db.marshal.AbstractType)
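
The effect of choosing a comparator can be shown with a small Python sketch. It only imitates the ordering difference between a text comparator and a numeric one on the same column names; it does not use or reproduce Cassandra's AbstractType classes.

```python
column_names = ["10", "9", "100", "2"]

# UTF8Type / AsciiType style: compare names as text
as_text = sorted(column_names)

# LongType / IntegerType style: compare names as numbers
as_numbers = sorted(column_names, key=int)

print(as_text)     # ['10', '100', '2', '9']
print(as_numbers)  # ['2', '9', '10', '100']
```

The same columns come back in a different order depending on the comparator, which is why it has to be chosen to match the meaning of the column names.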

Column family (cont.)
Hotel {
  key: AZC_043 { name: Cambria Suites Hayden, phone: 480-444-4444,
    address: 400 N. Hayden Rd., city: Scottsdale, state: AZ, zip: 85255}
  key: AZS_011 { name: Clarion Scottsdale Peak, phone: 480-333-3333,
    address: 3000 N. Scottsdale Rd, city: Scottsdale, state: AZ, zip: 85255}
  key: CAS_021 { name: W Hotel, phone: 415-222-2222,
    address: 181 3rd Street, city: San Francisco, state: CA, zip: 94103}
  key: NYN_042 { name: Waldorf Hotel, phone: 212-555-5555,
    address: 301 Park Ave, city: New York, state: NY, zip: 10019}
}

Rows
• Cassandra is a column-oriented database. Each row doesn’t have to
have the same number of columns (as it would in a relational database).
Each row has a unique key, which makes its data accessible.
• Each column family is stored in separate files on disk.

Columns
• A column is a name/value pair together with a client-supplied
timestamp recording when it was last updated. A column family is a
container for rows that have similar, but not identical, column
sets. The timestamp belongs to each column; rows do not have a
timestamp.
• Columns are name/value pairs; a regular column stores a byte
array as its value.
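
A per-column timestamp is what lets replicas that hold different versions settle on the most recent write. The Python sketch below shows that last-write-wins idea under simple assumptions: the `Column` type and `reconcile` function are hypothetical, and the way ties are broken here is an arbitrary choice for the sketch.

```python
from typing import NamedTuple

class Column(NamedTuple):
    name: str
    value: bytes
    timestamp: int   # client-supplied, for example microseconds since the epoch

def reconcile(a: Column, b: Column) -> Column:
    """Keep the version of a column with the newer timestamp."""
    return a if a.timestamp >= b.timestamp else b

old = Column("phone", b"480-444-4444", timestamp=1_000)
new = Column("phone", b"480-555-0000", timestamp=2_000)
print(reconcile(old, new).value)  # b'480-555-0000'
```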

Super column
• The value of a super column is a map of subcolumns (which store
byte array values).
• It’s important to keep columns that you are likely to query together in
the same column family, and a super column can be helpful for this.
• Super columns are not indexed.
• Cassandra looks like a four-dimensional hash table. But for super
columns, it becomes more like a five-dimensional hash:
[Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
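
The extra level can be pictured with nested Python dictionaries, reusing the ToyCorporation data from the earlier slide. This is a mental model only; `corporation` and `items_in_store` are hypothetical names. It shows how one row key (a store) groups every super column (an item) and its subcolumns together.

```python
# Keyspace -> column family -> row key -> super column -> subcolumn -> value
corporation = {
    "ToyCorporation": {
        "ToyStores": {
            "Ohio Store": {
                "Transformer": {"Price": "29.99", "Section": "Action Figures"},
                "GumDrop": {"Price": "0.25", "Section": "Candy"},
            },
            "New York Store": {
                "JawBreaker": {"Price": "4.25", "Section": "Candy"},
                "MatchboxCar": {"Price": "8.79", "Section": "Vehicles"},
            },
        }
    }
}

def items_in_store(store: str) -> dict[str, dict[str, str]]:
    """One row-key lookup returns every super column (item) of that store."""
    return corporation["ToyCorporation"]["ToyStores"][store]

for item, subcolumns in items_in_store("New York Store").items():
    print(item, subcolumns["Price"])
```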

Some points
• You cannot perform joins in Cassandra. If you have designed a data
model and find that you need a join, you’ll have to either do the work
on the client side, or create a denormalized second column family
that represents the join results for you.
• It is not possible to sort by value. Columns are sorted by name only,
which makes it possible to fetch individual columns from a row without
pulling the entire row into memory.
• Column sorting is controllable, but key sorting isn’t; row keys always
sort in byte order.

Elastic/Highly Available
• Read and write throughput both increase linearly as new machines are
added.
• No downtime or interruption to application.

Sharding basic strategies
• Feature-based or functional segmentation: data is sharded by feature,
so features that have nothing in common live in different shards. For
example, user details and items for sale are in different shards, and
movie ratings and comments are in different shards.
• Key-based sharding: choose a key in the data that will distribute it
evenly across shards. So instead of naively assigning one letter of the
alphabet to each server, you use a one-way hash on a key data element
and distribute data across machines according to the hash.
• Lookup table: a table that contains information about the location
of the actual data.
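
Key-based sharding, as described in the second bullet, can be sketched in a few lines of Python. The choice of SHA-256, the `shard_for` name and the modulo placement are assumptions made for illustration; real systems often use other hash functions and placement schemes.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard from a one-way hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

users = ["alice", "adam", "amy", "bob", "zoe"]
for user in users:
    print(user, "->", shard_for(user, shard_count=4))
```

Names that start with the same letter are not forced onto the same server, which is the weakness of the alphabet scheme.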
Design Pattern
1. Materialized view (one table per query): create a second column
family that represents an additional query. “Materialized” means
storing a full copy of the original data so that everything you need
to answer the query is right there, without forcing you to look up
the original data. If the second column family stores only column
names that act like foreign keys, so that you have to perform a
second query to fetch the data, that is a secondary index instead.
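
The difference between a materialized view and a secondary index can be sketched with Python dictionaries standing in for column families. The hotel data is abbreviated from the earlier Hotel slide; the names `hotels_by_city` and `hotel_keys_by_city` are hypothetical.

```python
hotels = {
    "AZC_043": {"name": "Cambria Suites Hayden", "city": "Scottsdale"},
    "AZS_011": {"name": "Clarion Scottsdale Peak", "city": "Scottsdale"},
    "CAS_021": {"name": "W Hotel", "city": "San Francisco"},
}

# Materialized view: full copies of each hotel, keyed by city.
hotels_by_city: dict[str, dict[str, dict[str, str]]] = {}
# Secondary index: only the hotel keys, keyed by city.
hotel_keys_by_city: dict[str, list[str]] = {}

for key, hotel in hotels.items():
    hotels_by_city.setdefault(hotel["city"], {})[key] = dict(hotel)
    hotel_keys_by_city.setdefault(hotel["city"], []).append(key)

# One lookup answers the query from the materialized view.
from_view = [h["name"] for h in hotels_by_city["Scottsdale"].values()]
# The index needs a second lookup into the original data.
from_index = [hotels[k]["name"] for k in hotel_keys_by_city["Scottsdale"]]
print(from_view)
print(from_index)
```

The view costs extra storage and extra writes, and in exchange the read needs no second query.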

Design Pattern (Cont.)
2. Valueless column: store the value of interest as the column name and
leave the column value empty. For example, in a user/usercity design
the city name is the row key and the users of that city are the
column names.
3. Aggregate key: row keys must be unique, so two column values can be
joined with a separator to create an aggregate key.
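
Both patterns can be sketched in Python, with a dictionary standing in for a column family. The user names, the `:` separator and the `aggregate_key` helper are hypothetical choices for illustration.

```python
# Valueless columns: the row key is the city, the column names are the users,
# and the column values are left empty.
user_city = {
    "Scottsdale": {"alice": b"", "bob": b""},
    "New York": {"carol": b""},
}
users_in_scottsdale = sorted(user_city["Scottsdale"])
print(users_in_scottsdale)  # ['alice', 'bob']

def aggregate_key(*parts: str, separator: str = ":") -> str:
    """Join several values into one row key."""
    return separator.join(parts)

print(aggregate_key("Scottsdale", "AZC_043"))  # Scottsdale:AZC_043
```

The separator should be a character that cannot appear in the parts themselves, otherwise two different combinations could produce the same key.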

Reference
• Assembled from various resources on the internet.
Thank You

More Related Content

What's hot

Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
Iraklis Psaroudakis
 
Storage cassandra
Storage   cassandraStorage   cassandra
Storage cassandraPL dream
 
cassandra
cassandracassandra
cassandra
Akash R
 
Introduction To Maxtable
Introduction To MaxtableIntroduction To Maxtable
Introduction To Maxtable
maxtable
 
Vertica
VerticaVertica
Bigtable and Boxwood
Bigtable and BoxwoodBigtable and Boxwood
Bigtable and Boxwood
Evan Weaver
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base InstallCloudera, Inc.
 
The No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra ModelThe No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra Model
Rishikese MR
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
Arunit Gupta
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
Cassandra Tutorial
Cassandra Tutorial Cassandra Tutorial
Cassandra Tutorial
Na Zhu
 
Data Storage Management
Data Storage ManagementData Storage Management
Data Storage Management
Nisheet Mahajan
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented Database
Suvradeep Rudra
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
Fabio Fumarola
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
Cloudera, Inc.
 
Intro to column stores
Intro to column storesIntro to column stores
Intro to column stores
Justin Swanhart
 
4. hbase overview
4. hbase overview4. hbase overview
4. hbase overview
Anuja Gunale
 

What's hot (20)

Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
 
Storage cassandra
Storage   cassandraStorage   cassandra
Storage cassandra
 
cassandra
cassandracassandra
cassandra
 
Hive
HiveHive
Hive
 
Pig
PigPig
Pig
 
Introduction To Maxtable
Introduction To MaxtableIntroduction To Maxtable
Introduction To Maxtable
 
Vertica
VerticaVertica
Vertica
 
Bigtable and Boxwood
Bigtable and BoxwoodBigtable and Boxwood
Bigtable and Boxwood
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base Install
 
The No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra ModelThe No SQL Principles and Basic Application Of Casandra Model
The No SQL Principles and Basic Application Of Casandra Model
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Cassandra Tutorial
Cassandra Tutorial Cassandra Tutorial
Cassandra Tutorial
 
Data Storage Management
Data Storage ManagementData Storage Management
Data Storage Management
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented Database
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
Intro to column stores
Intro to column storesIntro to column stores
Intro to column stores
 
4. hbase overview
4. hbase overview4. hbase overview
4. hbase overview
 

Similar to Introduction to cassandra

cassandra.pptx
cassandra.pptxcassandra.pptx
cassandra.pptx
BRINDHA256909
 
Column db dol
Column db dolColumn db dol
Column db dol
poojabi
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
Andrey Lomakin
 
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsChapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
nehabsairam
 
NoSQL - Cassandra & MongoDB.pptx
NoSQL -  Cassandra & MongoDB.pptxNoSQL -  Cassandra & MongoDB.pptx
NoSQL - Cassandra & MongoDB.pptx
Naveen Kumar
 
2. Lecture2_NOSQL_KeyValue.ppt
2. Lecture2_NOSQL_KeyValue.ppt2. Lecture2_NOSQL_KeyValue.ppt
2. Lecture2_NOSQL_KeyValue.ppt
ShaimaaMohamedGalal
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
Amazon Web Services
 
White paper on cassandra
White paper on cassandraWhite paper on cassandra
White paper on cassandra
Navanit Katiyar
 
04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf
hothyfa
 
Cassandra
CassandraCassandra
Cassandraexsuns
 
NoSql Database
NoSql DatabaseNoSql Database
NoSql Database
Suresh Parmar
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
Kel Graham
 
Cassandra Learning
Cassandra LearningCassandra Learning
Cassandra Learning
Ehsan Javanmard
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
PritamKathar
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
Fayez Shayeb
 
Cassndra (4).pptx
Cassndra (4).pptxCassndra (4).pptx
Cassndra (4).pptx
NikhilAmauriya
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
nehabsairam
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting data
Chen Robert
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
Brent Theisen
 
Talk about apache cassandra, TWJUG 2011
Talk about apache cassandra, TWJUG 2011Talk about apache cassandra, TWJUG 2011
Talk about apache cassandra, TWJUG 2011Boris Yen
 

Similar to Introduction to cassandra (20)

cassandra.pptx
cassandra.pptxcassandra.pptx
cassandra.pptx
 
Column db dol
Column db dolColumn db dol
Column db dol
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsChapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
 
NoSQL - Cassandra & MongoDB.pptx
NoSQL -  Cassandra & MongoDB.pptxNoSQL -  Cassandra & MongoDB.pptx
NoSQL - Cassandra & MongoDB.pptx
 
2. Lecture2_NOSQL_KeyValue.ppt
2. Lecture2_NOSQL_KeyValue.ppt2. Lecture2_NOSQL_KeyValue.ppt
2. Lecture2_NOSQL_KeyValue.ppt
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
White paper on cassandra
White paper on cassandraWhite paper on cassandra
White paper on cassandra
 
04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf
 
Cassandra
CassandraCassandra
Cassandra
 
NoSql Database
NoSql DatabaseNoSql Database
NoSql Database
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
 
Cassandra Learning
Cassandra LearningCassandra Learning
Cassandra Learning
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Cassndra (4).pptx
Cassndra (4).pptxCassndra (4).pptx
Cassndra (4).pptx
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting data
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Talk about apache cassandra, TWJUG 2011
Talk about apache cassandra, TWJUG 2011Talk about apache cassandra, TWJUG 2011
Talk about apache cassandra, TWJUG 2011
 

Recently uploaded

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 

Recently uploaded (20)

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 

Introduction to cassandra

  • 2. Scope • Introduction to Cassandra and NoSql • Understanding Cassandra data model • Configuration, read and writing data in Cassandra • CQL 2
  • 3. What is Cassandra • A Database • Uses Amazon’s Dyanamo’s fully distribution design • Uses Google’s BigTable’s column family based data model • Developed by Facebook (The team was led by Jeff Hammerbacher, with Avinash Lakshman, Karthik Ranganathan, and Prashant Malik (Search Team)) • Open source in 2008 3
  • 4. Problems with RDBMS • Horizontal scaling: In RDBMS as the size grows the joins become slows so the retrieval become slow. • Vertical scaling: adding more hardware, memory, faster processor or upgrading disk space. Adding hardware creates problem like data replication, consistency, fail over mechanism. • Caching layer in large system: like memcache, EHCache, Oracle Coherence. Updation in the cache and data base is exacerbated over a cluster. 4
  • 5. Cassandra
      • “Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook.”
  • 6. Why Cassandra
      • Fault tolerant
      • Decentralized
      • Eventually consistent
      • Rich data model
      • Elastic
      • Highly available
      • No SPOF (single point of failure)
  • 7. CAP theorem
      • Eric Brewer of the University of California at Berkeley posited his CAP theorem in 2000.
      • The theorem states that within a large-scale distributed data system, there are three requirements that have a relationship of sliding dependency.
      • Consistency: all database clients will read the same value for the same query, even given concurrent updates.
      • Availability: all database clients will always be able to read and write data.
      • Partition tolerance: the database can be split across multiple machines, and it can continue functioning in the face of network segmentation breaks.
  • 8. CAP theorem (cont.)
      • According to the theorem, a distributed data system can strongly support only two of the three.
      • CA: the system blocks when the network partitions, so such systems are often limited to a single data center to mitigate this.
      • CP: the system can shard data in order to scale. The data will be consistent, but some data may become unavailable whenever a node goes down.
      • AP: the system may return inaccurate data, but it will always be available, even in the face of network partitioning. DNS is perhaps the most popular example of a system that is massively scalable, highly available, and partition-tolerant.
  • 10. Fault tolerant
      • Data is automatically replicated to multiple nodes, based on the replication factor.
      • Data can be replicated across multiple data centers.
      • Failed nodes can be replaced with no downtime.
      • An accrual failure detector is used for fault detection.
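The replication bullet above can be illustrated with a small sketch. It assumes a simplified ring in which each node owns one token and replicas are the next nodes clockwise from the key's token, which is the idea behind SimpleStrategy. The helper names `token` and `replicas_for` and the node names are hypothetical, and MD5 is used only as a convenient stand-in hash.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is a stand-in hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Return the nodes holding copies of `key`, walking the ring clockwise."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    # A row cannot have more replicas than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]
    print(replicas_for("GumDrop", cluster, replication_factor=3))
```

With a replication factor of 3, every key lands on three distinct nodes, so losing one node still leaves two copies of each row.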
  • 11. Decentralization
      • Every node in the cluster is identical (peer-to-peer, with no master/slave roles).
      • There is no single point of failure.
  • 12. Eventual consistency
      • Uses BASE (Basically Available, Soft state, Eventual consistency).
      • As data is replicated, the latest version of a value sits on at least one node in the cluster, while older versions may still be on other nodes.
      • Eventually all nodes will see the latest version.
  • 13. Eventual consistency (cont.)
      • Tuneable consistency: the replication factor sets the number of nodes in the cluster that you want updates to propagate to.
      • The consistency level is a setting that clients must specify on every operation. It decides how many replicas in the cluster must acknowledge a write operation, or respond to a read operation, for the operation to be considered successful. This is how Cassandra pushes the decision about consistency out to the client.
      • Strict consistency can therefore be achieved by setting the consistency level to the same value as the replication factor.
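A rough sketch of how replication factor and consistency level interact, assuming only the three levels ONE, QUORUM, and ALL. The function names are hypothetical. The overlap rule used here (replicas written plus replicas read must exceed the replication factor) is the usual rule of thumb for a read to see the latest write; it is broader than the slide's case, which corresponds to level ALL.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """How many replicas must respond for an operation at this level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # a strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    written = replicas_required(write_level, replication_factor)
    read = replicas_required(read_level, replication_factor)
    return written + read > replication_factor


if __name__ == "__main__":
    for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        print(w, r, read_sees_latest_write(w, r, replication_factor=3))
```

Writing at ALL matches the slide's "consistency level equals replication factor" case; QUORUM on both sides gives the same guarantee while tolerating one failed replica out of three.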
  • 14. Rich data model
      • Keyspace
      • Column family
      • Row
      • Column
      • Super column
  • 15. Column family
      "ToyStore" : {
        "Toys" : {
          "GumDrop" : { "Price" : "0.25", "Section" : "Candy" },
          "Transformer" : { "Price" : "29.99", "Section" : "Action Figures" },
          "MatchboxCar" : { "Price" : "1.49", "Section" : "Vehicles" }
        }
      },
      "Keyspace1" : null,
      "system" : null
  • 17.
      "ToyCorporation" : {
        "ToyStores" : {
          "Ohio Store" : {
            "Transformer" : { "Price" : "29.99", "Section" : "Action Figures" },
            "GumDrop" : { "Price" : "0.25", "Section" : "Candy" },
            "MatchboxCar" : { "Price" : "1.49", "Section" : "Vehicles" }
          },
          "New York Store" : {
            "JawBreaker" : { "Price" : "4.25", "Section" : "Candy" },
            "MatchboxCar" : { "Price" : "8.79", "Section" : "Vehicles" }
          }
        }
      }
  • 18. Keyspace
      A keyspace is similar to a schema in an RDBMS. It has a name and a set of attributes that define keyspace-wide behaviour. The attributes are:
      1. Replication factor: if it is set to 3, then 3 nodes will hold a copy of each row.
      2. Replica placement strategy: SimpleStrategy (formerly RackUnawareStrategy), OldNetworkTopologyStrategy (formerly RackAwareStrategy), or NetworkTopologyStrategy (formerly DatacenterShardStrategy).
      3. Column families: discussed on the following slides.
  • 19. Column family
      • A column family is a container for columns, analogous to a table in a relational system.
      • A column family holds an ordered list of columns, each of which is referred to by its column name.
      • A value is addressed as [Keyspace][ColumnFamily][Key][Column]
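The [Keyspace][ColumnFamily][Key][Column] addressing above can be pictured as nested maps. The sketch below uses plain Python dictionaries and the ToyStore data from the earlier slide; `get_column` is a hypothetical helper for illustration, not a Cassandra client call.

```python
# Nested maps standing in for keyspace -> column family -> row key -> column.
cluster = {
    "ToyStore": {
        "Toys": {
            "GumDrop": {"Price": "0.25", "Section": "Candy"},
            "Transformer": {"Price": "29.99", "Section": "Action Figures"},
            "MatchboxCar": {"Price": "1.49", "Section": "Vehicles"},
        }
    }
}


def get_column(store: dict, keyspace: str, column_family: str,
               key: str, column: str) -> str:
    """Follow the four-level path to a single column value."""
    return store[keyspace][column_family][key][column]


if __name__ == "__main__":
    print(get_column(cluster, "ToyStore", "Toys", "GumDrop", "Price"))
```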
  • 20. Column family (cont.)
      A column family has two attributes: a name and a comparator. The comparator determines the order in which columns are sorted when they are returned by a query. The comparator can be one of the following types: AsciiType, BytesType, LexicalUUIDType, IntegerType, LongType, TimeUUIDType, UTF8Type, or a custom type (plug in your own class, which must extend org.apache.cassandra.db.marshal.AbstractType).
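To see why the choice of comparator matters, the sketch below sorts the same column names in two ways. It is a simplification: the names are Python strings rather than the byte arrays Cassandra stores, and the `COMPARATORS` table and `sorted_column_names` helper are hypothetical stand-ins for two of the types listed above.

```python
# Each entry maps a comparator name to a sort key for column names.
COMPARATORS = {
    "UTF8Type": lambda name: name,       # compare names as text
    "LongType": lambda name: int(name),  # compare names as integers
}


def sorted_column_names(names: list[str], comparator: str) -> list[str]:
    """Return column names in the order the given comparator would produce."""
    return sorted(names, key=COMPARATORS[comparator])


if __name__ == "__main__":
    names = ["10", "9", "2"]
    print(sorted_column_names(names, "UTF8Type"))
    print(sorted_column_names(names, "LongType"))
```

Text ordering places "10" before "2", while numeric ordering places it last, so the comparator should match how the column names will be read back.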
  • 21. Column family (cont.)
      Hotel {
        key: AZC_043 { name: Cambria Suites Hayden, phone: 480-444-4444, address: 400 N. Hayden Rd., city: Scottsdale, state: AZ, zip: 85255 }
        key: AZS_011 { name: Clarion Scottsdale Peak, phone: 480-333-3333, address: 3000 N. Scottsdale Rd, city: Scottsdale, state: AZ, zip: 85255 }
        key: CAS_021 { name: W Hotel, phone: 415-222-2222, address: 181 3rd Street, city: San Francisco, state: CA, zip: 94103 }
        key: NYN_042 { name: Waldorf Hotel, phone: 212-555-5555, address: 301 Park Ave, city: New York, state: NY, zip: 10019 }
      }
  • 22. Rows
      • Cassandra is a column-oriented database. Rows do not all need to have the same number of columns (as they do in a relational database). Each row has a unique key, which makes its data accessible.
      • Each column family is stored in a separate file.
  • 23. Columns
      • A column is a name/value pair, together with a client-supplied timestamp of when it was last updated. A column family is a container for rows that have similar, but not identical, column sets.
      • Every column carries its own timestamp, which records when that column was last updated. Rows do not have timestamps.
      • Columns are name/value pairs, and a regular column stores its value as a byte array.
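A minimal sketch of a column as a name, value, and timestamp triple, and of how two versions of the same column can be reconciled by keeping the one with the newer timestamp. The `Column` type and `reconcile` function are hypothetical names, and letting the first argument win a timestamp tie is a simplification made for this example.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: bytes
    value: bytes     # regular columns store byte array values
    timestamp: int   # client-supplied, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Keep the most recently written version of a column."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    # Simplification: on equal timestamps the first argument is kept.
    return b if b.timestamp > a.timestamp else a


if __name__ == "__main__":
    old = Column(b"Price", b"0.25", timestamp=1000)
    new = Column(b"Price", b"0.30", timestamp=2000)
    print(reconcile(old, new).value)
```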
  • 24. Super column
      • The value of a super column is a map of subcolumns (which store byte array values).
      • It is important to keep columns that you are likely to query together in the same column family, and a super column can be helpful for this.
      • Super columns are not indexed.
      • Cassandra looks like a four-dimensional hash table. With super columns, it becomes more like a five-dimensional hash: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
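The five-level path can be sketched with the ToyCorporation data from the earlier slide, again using plain Python dictionaries as a stand-in. Here the store name acts as the row key and each toy is a super column whose value is a map of subcolumns; `get_subcolumn` is a hypothetical helper.

```python
toy_corporation = {
    "ToyCorporation": {
        "ToyStores": {
            "Ohio Store": {
                "Transformer": {"Price": "29.99", "Section": "Action Figures"},
                "GumDrop": {"Price": "0.25", "Section": "Candy"},
                "MatchboxCar": {"Price": "1.49", "Section": "Vehicles"},
            },
            "New York Store": {
                "JawBreaker": {"Price": "4.25", "Section": "Candy"},
                "MatchboxCar": {"Price": "8.79", "Section": "Vehicles"},
            },
        }
    }
}


def get_subcolumn(store: dict, keyspace: str, column_family: str, key: str,
                  super_column: str, sub_column: str) -> str:
    """Follow [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]."""
    return store[keyspace][column_family][key][super_column][sub_column]


if __name__ == "__main__":
    for shop in ("Ohio Store", "New York Store"):
        print(shop, get_subcolumn(toy_corporation, "ToyCorporation",
                                  "ToyStores", shop, "MatchboxCar", "Price"))
```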
  • 25. Some points
      • You cannot perform joins in Cassandra. If you have designed a data model and find that you need a join, you will have to either do the work on the client side or create a denormalized second column family that represents the join results for you.
      • It is not possible to sort by value. Columns are sorted only by column name, which lets Cassandra fetch individual columns from a row without pulling the entire row into memory.
      • Column sorting is controllable, but key sorting is not: row keys always sort in byte order.
  • 26. Elastic/Highly available
      • Read and write throughput both increase linearly as new machines are added.
      • There is no downtime or interruption to applications.
  • 27. Basic sharding strategies
      • Feature-based (functional) segmentation: data is split by feature, so that features with nothing in common live in different shards. For example, user details and items for sale go in different shards, as do movie ratings and comments.
      • Key-based sharding: choose a key in the data that will distribute it evenly across shards. Instead of naively assigning one letter of the alphabet to each server, you apply a one-way hash to a key data element and distribute data across machines according to the hash.
      • Lookup table: a table that contains information about the location of the actual data.
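Key-based sharding can be sketched in a few lines: hash the key with a one-way function and take the result modulo the number of shards. The function name `shard_for` and the choice of MD5 are assumptions for illustration. Note that this simple modulo scheme moves most keys when the shard count changes.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard by hashing the key, so placement ignores key ordering."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % shard_count


if __name__ == "__main__":
    for user in ("alice", "albert", "zoe"):
        print(user, shard_for(user, shard_count=4))
```

Unlike the letter-per-server approach, two names starting with the same letter are not forced onto the same shard.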
  • 28. Design patterns
      1. Materialized view (one table per query): create a second column family to represent the additional query. “Materialized” means storing a full copy of the original data, so that everything you need to answer the query is right there, without forcing you to look up the original data. If the second column family stores only the column names you use, like foreign keys, and you must perform a second query to fetch the data, then it is a secondary index.
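The difference between the two can be sketched with the Hotel rows from the earlier slide and a "hotels by city" query. Dictionaries stand in for column families, and both builder functions are hypothetical names. The view copies every column, while the index stores only the hotel keys, so answering the query from the index needs a second lookup.

```python
hotels = {
    "AZC_043": {"name": "Cambria Suites Hayden", "phone": "480-444-4444",
                "city": "Scottsdale", "state": "AZ"},
    "AZS_011": {"name": "Clarion Scottsdale Peak", "phone": "480-333-3333",
                "city": "Scottsdale", "state": "AZ"},
    "CAS_021": {"name": "W Hotel", "phone": "415-222-2222",
                "city": "San Francisco", "state": "CA"},
}


def build_materialized_view(rows: dict, column: str) -> dict:
    """Second column family keyed by `column`, holding full copies of rows."""
    view: dict = {}
    for key, columns in rows.items():
        view.setdefault(columns[column], {})[key] = dict(columns)
    return view


def build_secondary_index(rows: dict, column: str) -> dict:
    """Second column family keyed by `column`, holding only the row keys."""
    index: dict = {}
    for key, columns in rows.items():
        index.setdefault(columns[column], {})[key] = b""  # empty value
    return index


if __name__ == "__main__":
    by_city = build_materialized_view(hotels, "city")
    print(by_city["Scottsdale"]["AZC_043"]["phone"])  # answered in one read
    index = build_secondary_index(hotels, "city")
    for hotel_key in index["Scottsdale"]:
        print(hotels[hotel_key]["phone"])             # needs a second read
```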
  • 29. Design patterns (cont.)
      2. Valueless column: store the data in the column name and leave the value empty. For example, in a user/city column family, the city name can be the row key and the users of that city the column names.
      3. Aggregate key: row keys must be unique, so two column values can be joined with a separator to create an aggregate key.
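Both patterns can be sketched briefly. The user names, the ":" separator, and the helper functions below are assumptions made for illustration; the hotel identifier is taken from the earlier slide.

```python
# Valueless columns: the city is the row key, each user is a column name,
# and every column value is left empty.
users_by_city = {
    "Scottsdale": {"alice": b"", "bob": b""},
    "New York": {"carol": b""},
}


def make_aggregate_key(*parts: str, separator: str = ":") -> str:
    """Join several values into one row key."""
    for part in parts:
        if separator in part:
            raise ValueError(f"{part!r} contains the separator {separator!r}")
    return separator.join(parts)


def split_aggregate_key(key: str, separator: str = ":") -> list[str]:
    """Recover the original values from an aggregate key."""
    return key.split(separator)


if __name__ == "__main__":
    print(sorted(users_by_city["Scottsdale"]))
    key = make_aggregate_key("AZ", "Scottsdale", "AZC_043")
    print(key, split_aggregate_key(key))
```

Rejecting parts that contain the separator keeps the key reversible, which matters if the parts are later needed on their own.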
  • 30. Reference
      • Assembled from various resources on the internet.