Cassandra - A Decentralized
Structured Storage System
Nguyen Tuan Quang
Saltlux – Vietnam Development Center
2016.03.21
Agenda
• Database System Outlines
• Cassandra Overview
• Data Model & Architecture
• Key features
• Comparison
Database Market
Relational DBMS
• Since 1970
• Use SQL to manipulate data
• Well suited to business management applications
(accounting, reservations, staff management, etc.)
Relational DBMS
• Schemas aren't designed for sparse data
• Traditional relational databases were not designed to be distributed
New Trends and Requirements
New Trends and Requirements
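CAP Theorem
• Consistency: all nodes see the same data at the same time
• Partition tolerance: the system continues to operate despite arbitrary message loss
• Availability: every request receives a response about whether it succeeded or failed
• A distributed system can guarantee at most two of these three properties at the same time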
Consistency Level
• Strong (Sequential): After an update completes, any
subsequent access will return the updated value
• Weak (weaker than Sequential): The system does not
guarantee that subsequent accesses will return the
updated value
• Eventual: All updates will propagate to all of the
replicas in a distributed system, but this may take
some time. Eventually, all replicas will be consistent.
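To make the difference between weak and eventual consistency concrete, here is a minimal in-memory sketch. It assumes last-write-wins reconciliation by timestamp (Cassandra does resolve conflicting column versions by timestamp) and uses a naive "copy everything everywhere" step as a stand-in for Cassandra's real propagation mechanisms (hinted handoff, read repair, Merkle-tree anti-entropy). The `Replica` class and `propagate` function are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data, mapping key -> (timestamp, value)."""
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Last-write-wins: keep whichever version carries the higher timestamp.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def propagate(replicas):
    """Naive anti-entropy round: push every version to every replica."""
    for src in replicas:
        for key, (ts, value) in list(src.store.items()):
            for dst in replicas:
                dst.write(key, value, ts)


replicas = [Replica() for _ in range(3)]
replicas[0].write("user:42", "old@example.com", ts=1)
propagate(replicas)

# The update reaches only one replica at first, so reads may return stale data.
replicas[1].write("user:42", "new@example.com", ts=2)
print([r.read("user:42") for r in replicas])
# -> ['old@example.com', 'new@example.com', 'old@example.com']

# After propagation, every replica converges on the newest value.
propagate(replicas)
print([r.read("user:42") for r in replicas])
# -> ['new@example.com', 'new@example.com', 'new@example.com']
```

Between the update and the next propagation round the system only offers weak consistency; once propagation finishes, all replicas agree, which is the eventual guarantee.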
Cassandra
• Apache Cassandra was initially
developed at Facebook to power its
Inbox Search feature
• Its design combines the distributed
architecture of Amazon’s highly available
Dynamo with the data model of Google’s
BigTable
Use-case: Facebook Inbox Search
• Cassandra was developed to address the Inbox Search problem.
• It was tested on 50+ TB of user message data in a 150-node
cluster.
• The per-user index of all messages supports two kinds of search:
– Term search: search by keyword
– Interactions search: search by user ID
Use-cases: Apple
• Judging by job listings, Cassandra is Apple's dominant NoSQL database
– MongoDB: 35 job listings (iTunes, Customer Systems Platform, and
others)
– Couchbase: 4 job listings (iTunes Social)
– HBase: 33 job listings (Maps, Siri, iAd, iCloud, and more)
– Cassandra: 70 job listings (Maps, iAd, iCloud, iTunes, and more)
Replication and Multi Data Center Replication
Use-cases: Netflix
Use-cases: Apple
Data Model
• Keyspace is the outermost container for data in Cassandra
• Columns are grouped into Column Families.
• Each Column has
– Name
– Value
– Timestamp
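The nesting above (keyspace, column family, row, column with name, value and timestamp) can be sketched with plain Python dictionaries. This is a toy model, assuming in-memory maps only; Cassandra's actual storage engine (memtables and SSTables) is not modeled. The keyspace and column family names are borrowed from the Tornado example; `make_store`, `insert` and `get_row` are hypothetical helpers.

```python
from collections import defaultdict
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # client-supplied write time, used to resolve conflicts


def make_store():
    # keyspace -> column family -> row key -> {column name: Column}
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def insert(store, keyspace, column_family, row_key, name, value, timestamp):
    row = store[keyspace][column_family][row_key]
    existing = row.get(name)
    # When the same column is written twice, the higher timestamp wins.
    if existing is None or timestamp > existing.timestamp:
        row[name] = Column(name, value, timestamp)


def get_row(store, keyspace, column_family, row_key):
    row = store[keyspace][column_family][row_key]
    return {col.name: col.value for col in row.values()}


store = make_store()
ks, cf = "metasearch", "Metasearch_korean"
insert(store, ks, cf, "row1", "TOPIC_URL", "URL1", 1)
insert(store, ks, cf, "row1", "TOPIC_TITLE", "TOPIC_TITLE1", 1)
insert(store, ks, cf, "row1", "TOPIC_TITLE", "stale title", 0)  # older write, ignored
insert(store, ks, cf, "row2", "TOPIC_URL", "URL2", 1)  # rows may hold different columns

print(get_row(store, ks, cf, "row1"))  # {'TOPIC_URL': 'URL1', 'TOPIC_TITLE': 'TOPIC_TITLE1'}
print(get_row(store, ks, cf, "row2"))  # {'TOPIC_URL': 'URL2'}
```

Because each row is just a map of the columns actually written, sparse data costs nothing extra, unlike a fixed relational schema.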
Data Model for Tornado Metasearch
Keyspace: metasearch
Column Families: Metasearch_korean
• Row 1 Key: TOPIC_URL = URL1, TOPIC_CONTENT = CONTENT 1, TOPIC_TITLE = TOPIC_TITLE1
• Row 2 Key: TOPIC_URL = URL2, TOPIC_CONTENT = CONTENT 2, TOPIC_TITLE = TOPIC_TITLE2
System Architecture
• Partitioning
How data is partitioned across nodes
• Replication
How data is duplicated across nodes
• Cluster Membership
How nodes are added to and removed from the cluster
Partitioning
• Nodes are logically structured in a ring topology.
• The hash of the key associated with a data partition is used
to assign it to a node in the ring.
• The hash space wraps around after its maximum value, which
gives the ring structure.
• Lightly loaded nodes move their position on the ring to relieve
heavily loaded nodes.
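The ring lookup described above can be sketched as follows. Assumptions: MD5 truncated to 32 bits as the hash and hand-picked node tokens; Cassandra's own partitioners (for example the MD5-based RandomPartitioner, or Murmur3Partitioner) use much larger token ranges. A key belongs to the first node whose token is greater than or equal to the key's token, wrapping around past the largest token. The `Ring` class and its methods are hypothetical names.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # toy token space (assumption for readability)


def token(key: str) -> int:
    """Hash a key onto the ring: first 4 bytes of its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Ring:
    def __init__(self, node_tokens: dict):
        # node_tokens: node name -> position (token) on the ring
        self.entries = sorted((t, n) for n, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]

    def node_for_token(self, t: int) -> str:
        # Walk clockwise to the first node whose token is >= t ...
        i = bisect.bisect_left(self.tokens, t)
        # ... and wrap around to the first node past the largest token.
        return self.entries[i % len(self.entries)][1]

    def node_for(self, key: str) -> str:
        return self.node_for_token(token(key))


ring = Ring({"A": 0, "B": RING_SIZE // 4, "C": RING_SIZE // 2, "D": 3 * RING_SIZE // 4})
print(ring.node_for_token(5))              # B
print(ring.node_for_token(RING_SIZE // 2))  # C (a token equal to C's is owned by C)
print(ring.node_for_token(RING_SIZE - 1))  # A (wraps around)
print(ring.node_for("user:42"))            # whichever node owns this key's hash
```

In this model, moving a lightly loaded node simply means changing its token in `node_tokens` so that it takes over part of a busy neighbor's range.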
Partitioning
[Figures: partitions and the partition key]
Replication
• Each data item is replicated at N (replication factor) nodes.
• Different Replication Policies
– Rack Unaware – replicate data at the N-1 successive nodes after its
coordinator
– Rack Aware – uses ZooKeeper to elect a leader, which tells nodes
the ranges they are replicas for
– Datacenter Aware – similar to Rack Aware, but the leader is chosen at
the datacenter level instead of the rack level.
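A sketch of the Rack Unaware policy only: the key's coordinator is found on the ring as in partitioning, and the key is also stored on the next N-1 nodes clockwise. Node names mirror the ring of nodes A to F with N = 3 in the Partitioning and Replication figure; the token values and the `preference_list` function are hypothetical. Rack Aware and Datacenter Aware placement (with a ZooKeeper-elected leader) is not modeled.

```python
import bisect


def preference_list(ring_tokens: dict, key_token: int, n: int) -> list:
    """Rack Unaware placement: the coordinator plus the next N-1 nodes clockwise."""
    ordered = sorted((t, node) for node, t in ring_tokens.items())
    tokens = [t for t, _ in ordered]
    nodes = [node for _, node in ordered]
    if n > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    # Coordinator: first node whose token is >= the key's token (wrapping around).
    start = bisect.bisect_left(tokens, key_token) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(n)]


ring = {"A": 0, "B": 10, "C": 20, "D": 30, "E": 40, "F": 50}
print(preference_list(ring, 15, 3))  # ['C', 'D', 'E']
print(preference_list(ring, 45, 3))  # ['F', 'A', 'B'] (wraps past the last node)
```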
Partitioning and Replication
[Figure: ring of nodes A to F over the hash range 0 to 1 (with 1/2 marked), replication factor N = 3; h(key1) and h(key2) mark where two keys hash onto the ring]
* Figure taken from Avinash Lakshman and Prashant Malik (authors of the paper) slides.
Cassandra Key features
• Big Data Scalability
– Scalable to petabytes
– New nodes = linear performance increase
– Add new nodes online
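Adding a node online is cheap on a hash ring because only the keys in the new node's range change owner. The toy model below assumes the same 32-bit MD5 token scheme and hypothetical node tokens: it adds node E halfway between C and D and checks which keys move.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Toy token: first 4 bytes of the key's MD5 digest (assumption).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def owner(ring: dict, key: str) -> str:
    # First node clockwise whose token is >= the key's token, wrapping around.
    entries = sorted((t, n) for n, t in ring.items())
    i = bisect.bisect_left([t for t, _ in entries], token(key))
    return entries[i % len(entries)][1]


keys = [f"user:{i}" for i in range(10_000)]
ring = {"A": 0, "B": 2**30, "C": 2**31, "D": 3 * 2**30}
before = {k: owner(ring, k) for k in keys}

ring["E"] = 2**31 + 2**29  # new node placed halfway between C and D
after = {k: owner(ring, k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Only keys in E's new range move, and all of them come from D, E's clockwise successor.
print(len(moved) / len(keys))  # roughly 1/8: E takes half of D's quarter of the ring
print({before[k] for k in moved}, {after[k] for k in moved})  # {'D'} {'E'}
```

By contrast, a naive `hash(key) % number_of_nodes` scheme would reassign most keys whenever the node count changes, which is why ring-based partitioning suits growing a cluster without downtime.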
Cassandra Key features
• No Single Point of Failure
– All nodes are the same
– Reads/writes can go through any node
– Data can be replicated across different data centers
Cassandra Key features
• Easy Replica/Data Distribution
– Transparently handled by Cassandra
– Multiple data centers are supported
– Exploit the benefits of cloud computing
Cassandra Key features
• No need for caching software
– The peer-to-peer architecture removes the need for a special caching layer
– The database cluster uses the memory of its own nodes to cache data
Cassandra Key features
• Tunable Data Consistency
– Choose between strong and eventual consistency
– Can be done on a per-operation basis, for both reads and writes
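A common way to reason about this per-operation choice is the overlap rule R + W > N: if a read waits for R replicas and a write was acknowledged by W of the N replicas, the two sets must share at least one replica, so the read sees the latest write. The sketch below checks the rule for Cassandra's ONE, QUORUM and ALL consistency levels, assuming a single data center and ignoring failures, hinted handoff and levels such as LOCAL_QUORUM; the function names are hypothetical.

```python
LEVELS = ("ONE", "QUORUM", "ALL")


def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    # Read and write replica sets are guaranteed to overlap when R + W > N.
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


for w in LEVELS:
    for r in LEVELS:
        print(f"write={w:<6} read={r:<6} strong={is_strongly_consistent(r, w, rf=3)}")
```

With a replication factor of 3, QUORUM writes plus QUORUM reads (2 + 2 > 3) give strong consistency, while ONE/ONE (1 + 1) trades it for lower latency and higher availability and relies on eventual consistency.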
MongoDB vs. Cassandra
Comparison with MySQL (more than 50 GB of data)
• MySQL: writes average ~300 ms, reads average ~350 ms
• Cassandra: writes average 0.12 ms, reads average 15 ms
• Statistics reported by the authors of the Cassandra paper, using Facebook data.
Key features Recaps
• Distributed and Decentralized
– Many distributed data stores need some nodes to be set up as masters in
order to organize other nodes, which are set up as slaves; in Cassandra
every node plays the same role
– As a result, there is no single point of failure
• High Availability & Fault Tolerance
– You can replace failed nodes in the cluster with no downtime, and
you can replicate data to multiple data centers to offer improved
local performance and prevent downtime if one data center
experiences a catastrophe such as fire or flood.
• Tunable Consistency
– It lets you decide the level of consistency you require, balanced
against the level of availability
Key features Recaps
• Elastic Scalability
– Elastic scalability refers to a special property of horizontal scalability.
It means that your cluster can seamlessly scale up and scale back
down.
Thank You

Introduction to cassandra

  • 1. Cassandra - A Decentralized Structured Storage System Nguyen Tuan Quang Saltlux – Vietnam Development Center 2016.03.21
  • 2. Agenda • Database System Outlines • Cassandra Overview • Data Model & Architecture • Key features • Comparison
  • 4. Relational DBMS • Since 1970 • Use SQL to manipulate data • Excellent for applications such as management (accounting, reservations, staff management, etc)
  • 5. Relational DBMS • Schemas aren't designed for sparse data • Databases are simply not designed to be distributed
  • 6. New Trends and Requirements
  • 7. New Trends and Requirements
  • 8. CAP Theory all nodes see the same data at the same time the system continues to operate despite arbitrary message loss every request receives a response about whether it was successful or failed
  • 9. Consistency Level • Strong (Sequential): After the update completes any subsequent access will return the updated value • Weak (weaker than Sequential): The system does not guarantee that subsequent accesses will return the updated value • Eventual: All updates will propagate throughout all of the replicas in a distributed system, but that this may take some time. Eventually, all replicas will be consistent.
  • 10. Cassandra • Apache Cassandra was initially developed at Facebook to power their Inbox Search • Originally designed at Facebook, Cassandra came from Amazon’s highly available Dynamo and Google’s BigTable data model
  • 11. Use-case: Facebook Inbox Search • Cassandra developed to address this problem. • 50+TB of user messages data in 150 node cluster on which Cassandra is tested. • Search user index of all messages in 2 ways. – Term search : search by a key word – Interactions search : search by a user id
  • 12. Use-cases: Apple • Judging by job listings, Cassandra is Apple's dominant NoSQL database – MongoDB - 35 job listings (iTunes, Customer Systems Platform, and others) – Couchbase - 4 job listings (iTunes Social) – HBase - 33 job listings (Maps, Siri, iAd, iCloud, and more) – Cassandra - 70 job listings (Maps, iAd, iCloud, iTunes, and more)
  • Replication and Multi Data Center Replication
  • 15. Data Model • Keyspace is the outermost container for data in Cassandra • Columns are grouped into Column Families. • Each Column has – Name – Value – Timestamp
  • 16. Data Model for Tornado Metasearch
      Keyspace: metasearch
      Column Family: Metasearch_korean
        Row 1 key: TOPIC_URL = URL1, TOPIC_CONTENT = CONTENT 1, TOPIC_TITLE = TOPIC_TITLE1
        Row 2 key: TOPIC_URL = URL2, TOPIC_CONTENT = CONTENT 2, TOPIC_TITLE = TOPIC_TITLE2
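The nesting described on these slides (keyspace, column family, row key, columns carrying a name, value and timestamp) can be sketched with plain Python dictionaries, reusing the metasearch example. This is an in-memory illustration only: `insert` and `Column` are hypothetical names, and the timestamp is used the way Cassandra reconciles conflicting writes (the newer timestamp wins), simplified here so that a timestamp tie keeps the existing column.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Column:
    """A column as described on the slide: name, value and timestamp."""
    name: str
    value: str
    timestamp: int  # e.g. microseconds since the epoch (assumption)


# keyspace -> column family -> row key -> column name -> Column
Store = Dict[str, Dict[str, Dict[str, Dict[str, Column]]]]


def insert(store: Store, keyspace: str, cf: str, row_key: str, col: Column) -> None:
    row = store.setdefault(keyspace, {}).setdefault(cf, {}).setdefault(row_key, {})
    current = row.get(col.name)
    # Last write wins: only a strictly newer timestamp replaces the stored column.
    if current is None or col.timestamp > current.timestamp:
        row[col.name] = col


store: Store = {}
insert(store, "metasearch", "Metasearch_korean", "row1", Column("TOPIC_URL", "URL1", 1))
insert(store, "metasearch", "Metasearch_korean", "row1", Column("TOPIC_TITLE", "TOPIC_TITLE1", 1))
# An older write arriving late is ignored.
insert(store, "metasearch", "Metasearch_korean", "row1", Column("TOPIC_TITLE", "stale title", 0))
print(store["metasearch"]["Metasearch_korean"]["row1"]["TOPIC_TITLE"].value)  # TOPIC_TITLE1
```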
  • 17. System Architecture • Partitioning: how data is partitioned across nodes • Replication: how data is duplicated across nodes • Cluster Membership: how nodes are added to and removed from the cluster
  • 18. Partitioning • Nodes are logically arranged in a ring topology. • Each data item's key is hashed, and the hash value gives the item's position on the ring; the item is assigned to the first node found by walking the ring clockwise from that position. • The hash range wraps around after its maximum value, which is what closes the ring. • Lightly loaded nodes move their position on the ring to relieve heavily loaded nodes.
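A minimal sketch of the ring lookup, assuming a 32-bit hash space and MD5 as the hash function (both chosen for illustration; the node names and tokens are hypothetical). `bisect.bisect_left` finds the first node token at or after the key's position, and indexing modulo the number of nodes provides the wrap-around.

```python
import bisect
import hashlib
from typing import Dict

RING_SIZE = 2 ** 32  # assumed size of the hash space; positions wrap around after it


def ring_position(key: str) -> int:
    """Hash a key onto the ring (MD5 truncated to 32 bits, chosen for this sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


class Ring:
    def __init__(self, tokens: Dict[str, int]) -> None:
        # tokens maps a (hypothetical) node name to its position on the ring.
        self._nodes = sorted((pos, name) for name, pos in tokens.items())
        self._positions = [pos for pos, _ in self._nodes]

    def node_for_position(self, pos: int) -> str:
        # First node at or after pos when walking clockwise; past the last
        # node the index wraps to 0, which closes the ring.
        i = bisect.bisect_left(self._positions, pos)
        return self._nodes[i % len(self._nodes)][1]

    def node_for(self, key: str) -> str:
        return self.node_for_position(ring_position(key))


ring = Ring({"A": 0, "B": 2 ** 30, "C": 2 ** 31, "D": 3 * 2 ** 30})
print(ring.node_for("user:42"))  # one of A, B, C, D depending on the hash
```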
  • 23. Replication • Each data item is replicated at N nodes, where N is the replication factor. • Different Replication Policies – Rack Unaware – replicate data at the N-1 successive nodes after its coordinator – Rack Aware – uses ZooKeeper to elect a leader, which tells nodes the ranges they are replicas for – Datacenter Aware – similar to Rack Aware, but the leader is chosen at the datacenter level instead of the rack level.
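The Rack Unaware policy can be sketched as follows: the coordinator is the node that owns the key's ring position, and the replicas are that node plus the next N-1 nodes clockwise. The six node names mirror the A to F ring in the paper's figure; their token positions are made up for the example, and the rack- and datacenter-aware policies (which rely on ZooKeeper leader election) are not modeled.

```python
import bisect
from typing import Dict, List


def rack_unaware_replicas(tokens: Dict[str, int], key_position: int, n: int) -> List[str]:
    """Return the coordinator for key_position plus its N-1 clockwise successors."""
    ring = sorted((pos, node) for node, pos in tokens.items())
    if n > len(ring):
        raise ValueError("replication factor exceeds number of nodes")
    positions = [pos for pos, _ in ring]
    # The coordinator is the first node at or after the key's position (wrapping).
    start = bisect.bisect_left(positions, key_position) % len(ring)
    # Replicas are the coordinator and the next n - 1 nodes around the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(n)]


nodes = {"A": 0, "B": 10, "C": 20, "D": 30, "E": 40, "F": 50}  # hypothetical tokens
print(rack_unaware_replicas(nodes, 15, 3))  # ['C', 'D', 'E']
print(rack_unaware_replicas(nodes, 45, 3))  # ['F', 'A', 'B']
```

With N=3, a key at position 45 is coordinated by F and its replica list wraps around the ring to A and B.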
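  • 24. Partitioning and Replication [Figure: ring of hash positions from 0 to 1 (1/2 marked) with nodes A to F; keys h(key1) and h(key2) placed on the ring; replication factor N=3] * Figure taken from the slides of Avinash Lakshman and Prashant Malik (authors of the paper).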
  • 26. Cassandra Key features • Big Data Scalability – Scalable to petabytes – Adding nodes increases performance linearly – New nodes can be added online
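One reason nodes can be added online is that, on a consistent-hashing ring, a new node takes over only part of a single neighbor's range, so most keys stay where they are. The sketch below checks this on a hypothetical 0 to 99 ring with made-up tokens; it does not model data streaming between nodes or the linear-performance claim.

```python
import bisect
from typing import Dict, List


def owner(tokens: List[int], names: List[str], pos: int) -> str:
    """Node owning pos: first token at or after pos, wrapping around the ring."""
    return names[bisect.bisect_left(tokens, pos) % len(tokens)]


def assignments(nodes: Dict[str, int], keys: List[int]) -> Dict[int, str]:
    ring = sorted((pos, name) for name, pos in nodes.items())
    tokens = [pos for pos, _ in ring]
    names = [name for _, name in ring]
    return {k: owner(tokens, names, k) for k in keys}


keys = list(range(100))  # hypothetical key positions on a 0..99 ring
before = assignments({"A": 25, "B": 50, "C": 75, "D": 99}, keys)
# A new node "E" joins at position 60 while the others keep serving.
after = assignments({"A": 25, "B": 50, "C": 75, "D": 99, "E": 60}, keys)
moved = {k: (before[k], after[k]) for k in keys if before[k] != after[k]}
print(sorted(moved))         # positions 51 through 60 only
print(set(moved.values()))   # {('C', 'E')}: E takes keys from C alone
```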
  • 27. Cassandra Key features • No Single Point of Failure – All nodes are the same – Read from and write to any node – Can replicate across multiple data centers
  • 28. Cassandra Key features • Easy Replica/Data Distribution – Transparently handled by Cassandra – Multiple data centers are supported – Exploit the benefits of cloud computing
  • 29. Cassandra Key features • No need for caching software – The peer-to-peer architecture removes the need for a special caching layer – The database cluster uses the memory of its own nodes to cache data
  • 30. Cassandra Key features • Tunable Data Consistency – Choose between strong and eventual consistency – Can be set on a per-operation basis, for both reads and writes
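The usual rule of thumb behind tunable consistency is that a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (N), because the two replica sets must then overlap. The sketch below applies that rule to the Cassandra consistency levels `ONE`, `QUORUM` (a majority, floor(N/2) + 1) and `ALL`; it is a simplified single-datacenter model that ignores node failures and multi-datacenter levels such as `LOCAL_QUORUM`.

```python
from typing import Callable, Dict


def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


# Number of replicas each consistency level waits for, given N.
LEVELS: Dict[str, Callable[[int], int]] = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when read and write replica sets must overlap (R + W > N)."""
    r = LEVELS[read_level](rf)
    w = LEVELS[write_level](rf)
    return r + w > rf


for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_cl, write_cl, is_strongly_consistent(read_cl, write_cl, rf=3))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

Choosing `ONE` for both reads and writes favors latency and availability; `QUORUM` on both sides trades some latency for reads that always overlap the latest write.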
  • 33. Comparison with MySQL (> 50 GB of data)
      System      Average write   Average read
      MySQL       ~300 ms         ~350 ms
      Cassandra   0.12 ms         15 ms
    • Stats provided by the authors using Facebook data.
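Dividing the quoted averages gives a sense of scale: roughly 2500 times lower write latency and about 23 times lower read latency for Cassandra. The snippet only redoes that arithmetic on the slide's numbers; the MySQL figures are approximate, so the ratios are too.

```python
# Average latencies (ms) on > 50 GB of data, as quoted on the slide.
mysql = {"write": 300.0, "read": 350.0}
cassandra = {"write": 0.12, "read": 15.0}

for op in ("write", "read"):
    ratio = mysql[op] / cassandra[op]
    print(f"{op}: Cassandra ~{ratio:.0f}x lower average latency")
# write: Cassandra ~2500x lower average latency
# read: Cassandra ~23x lower average latency
```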
  • 34. Key features Recaps • Distributed and Decentralized – In many distributed systems, some nodes must be set up as masters to organize the other nodes, which are set up as slaves; in Cassandra every node plays the same role – As a result, there is no single point of failure • High Availability & Fault Tolerance – You can replace failed nodes in the cluster with no downtime, and you can replicate data to multiple data centers to improve local performance and to prevent downtime if one data center suffers a catastrophe such as a fire or flood. • Tunable Consistency – You decide the level of consistency you require, balancing it against the level of availability
  • 35. Key features Recaps • Elastic Scalability – Elastic scalability refers to a special property of horizontal scalability. It means that your cluster can seamlessly scale up and scale back down.