A Comprehensive Introduction to Apache Cassandra
Saeid Zebardast
@saeidzeb
zebardast.com
Feb 2015
Agenda
● What is NoSQL?
● What is Cassandra?
● Architecture
● Data Model
● Key Features and Benefits
● Hardware
● Directories and Files
● Cassandra Tools
○ CQL
○ Nodetool
○ DataStax OpsCenter
● Backup and Restore
● Who’s using Cassandra?
What is NoSQL?
● NoSQL (Not Only SQL)
● Simplicity of Design
● Horizontal Scaling (Scale Out)
○ Add as many nodes to the cluster as you need
○ Not supported by every NoSQL database
● Finer Control over availability
● Data Structure
○ Key-Value
○ Column-Oriented
○ Graph
○ Document-Oriented
○ And others
What is Cassandra?
● Since 2008 - Current stable version 2.1.2 (Nov 2014)
● NoSQL
● Distributed
● Open source
● Written in Java
● High performance
● Extremely scalable
● Fault tolerant (i.e., no single point of failure, SPOF)
Architecture Highlights
● Scale out, not up
● Peer-to-Peer, distributed system
○ All nodes the same - masterless with no SPOF
● Online load balancing, cluster growth
● Built to anticipate system/hardware failures
● Custom data replication to ensure fault tolerance
● CAP theorem (Consistency, Availability, Partition tolerance)
○ You cannot have all three at the same time
○ The tradeoff between consistency and latency is tunable
○ Strong Consistency = Increased Latency
● Nodes communicate with each other
○ through the Gossip protocol
Architecture Layers
● Core Layer
○ Messaging service
○ Gossip, failure detection
○ Cluster state
○ Partitioner
○ Replication
● Middle Layer
○ Commit log
○ Memtable
○ SSTable
○ Indexes
○ Compaction
● Top Layer
○ Tombstones
○ Hinted handoff
○ Read repair
○ Bootstrap
○ Monitoring
○ Admin tools
Architecture of a write
1. A write is first appended to an on-disk commit log (sequential I/O).
2. After it is written to the commit log, the write is sent to the appropriate nodes.
3. Each node that receives the write first records it in a local log, then updates the appropriate Memtables (one for each column family).
○ A Memtable is the in-memory representation of data (before the data gets flushed to disk as an SSTable).
○ Memtables are flushed to disk when:
■ Out of space
■ Too many keys (128 is default)
■ Time duration (client provided - no cluster clock)
4. When a Memtable is flushed, two files are written:
○ Data file (SSTable)
○ Index file (SSTable index)
5. When all the column families referenced in a commit log have been flushed to disk, the commit log is deleted.
6. Compaction
○ Periodically, data files are merge-sorted into a new file.
○ Merge keys
○ Combine columns
○ Discard tombstones
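The write path above can be sketched as a toy, single-node model in Python. This is only an illustration under simplifying assumptions: the class `Node`, the `memtable_limit` parameter, and the `TOMBSTONE` sentinel are hypothetical names, flushes are triggered only by key count, and newer files simply win during compaction (real Cassandra resolves conflicts by write timestamp and keeps tombstones for a grace period).

```python
TOMBSTONE = object()  # hypothetical marker for a deleted column


class Node:
    """Toy write path: commit log -> Memtable -> SSTable -> compaction."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stands in for the sequential on-disk log
        self.memtable = {}     # in-memory: key -> {column: value}
        self.sstables = []     # flushed, key-sorted snapshots (oldest first)
        self.memtable_limit = memtable_limit

    def write(self, key, columns):
        self.commit_log.append((key, columns))             # log first
        self.memtable.setdefault(key, {}).update(columns)  # then the Memtable
        if len(self.memtable) >= self.memtable_limit:      # "too many keys"
            self.flush()

    def delete(self, key, column):
        # A delete is just a write of a tombstone.
        self.write(key, {column: TOMBSTONE})

    def flush(self):
        if not self.memtable:
            return
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # everything in the log is now on "disk"

    def compact(self):
        merged = {}
        for sstable in self.sstables:  # oldest first, so newer values win
            for key, columns in sstable.items():
                merged.setdefault(key, {}).update(columns)  # merge keys, combine columns
        live = {}
        for key, columns in sorted(merged.items()):
            kept = {c: v for c, v in columns.items() if v is not TOMBSTONE}
            if kept:  # discard tombstones and rows left empty
                live[key] = kept
        self.sstables = [live]


node = Node(memtable_limit=2)
node.write("k1", {"a": 1})
node.write("k2", {"b": 2})   # second key triggers a flush
node.write("k1", {"a": 10})
node.delete("k2", "b")       # triggers another flush
node.compact()
print(node.sstables)
```

After compaction only the newest value of `k1` survives, and `k2` disappears because its only column was tombstoned.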
Data Model
● [Keyspace][ColumnFamily][Key][Column]
● A keyspace is akin to a database in RDBMS
● Data within a keyspace is stored in a row-oriented, column-based structure
● A column family is similar to an RDBMS table
○ More flexible/dynamic
● A row in a column family is indexed by its key (Primary Key).
○ Cassandra supports up to 2 billion columns per (physical) row.
● Sample code to create keyspace and column family:
○ CREATE KEYSPACE logs WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
○ CREATE TABLE logs.samples (
      node_id text,
      metric text,
      collection_ts timestamp,
      value bigint,
      PRIMARY KEY ((node_id, metric), collection_ts)
  ) WITH CLUSTERING ORDER BY (collection_ts DESC);
Data Model - Primary Keys
● Primary Keys are unique.
● Single Primary Key
○ PRIMARY KEY(keyColumn)
● Composite Primary Key
○ PRIMARY KEY (myPartitionKey, my1stClusteringKey, my2ndClusteringKey)
● Composite Partition Key
○ PRIMARY KEY ((my1stPartitionKey, my2ndPartitionKey), myClusteringKey)
Data Model - Time-To-Live (TTL)
● Set a TTL on a row
○ INSERT INTO users (id, first, last) VALUES ('abc123', 'saeid', 'zeb') USING TTL 3600; // Expires the data in one hour
● Set a TTL on a column
○ UPDATE users USING TTL 30 SET last = 'zebardast' WHERE id = 'abc123';
● TTL is in seconds
● Can also set default TTL at a table level.
● Expired columns/rows are automatically deleted.
● With no TTL specified, columns/values never expire.
● TTL is useful for automatic deletion.
● Re-inserting the same row before it expires overwrites its TTL.
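The TTL rules above can be modelled with a small in-memory store in Python. This is a minimal sketch, not how Cassandra implements expiry: the class `TtlStore` and its injectable `clock` argument are hypothetical and exist only so the behaviour can be stepped through without waiting.

```python
class TtlStore:
    """Toy cell store where each (row, column) may carry a TTL in seconds."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._cells = {}     # (row_id, column) -> (value, expires_at or None)

    def put(self, row_id, column, value, ttl=None):
        # Writing again replaces the previous value and its TTL.
        expires_at = None if ttl is None else self._clock() + ttl
        self._cells[(row_id, column)] = (value, expires_at)

    def get(self, row_id, column):
        cell = self._cells.get((row_id, column))
        if cell is None:
            return None
        value, expires_at = cell
        if expires_at is not None and self._clock() >= expires_at:
            del self._cells[(row_id, column)]  # expired cells vanish
            return None
        return value


now = [0]
store = TtlStore(clock=lambda: now[0])
store.put("abc123", "last", "zeb", ttl=3600)
store.put("abc123", "first", "saeid")          # no TTL: never expires
now[0] = 3599
store.put("abc123", "last", "zebardast", ttl=30)  # overwrites the TTL
now[0] = 3629
print(store.get("abc123", "last"), store.get("abc123", "first"))
```

The second `put` on `last` restarts the countdown from the time of the write, which mirrors the "re-inserting overwrites the TTL" rule.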
Partitioners - Consistent hashing
● A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
● A partitioner is a function for deriving a token representing a row from its partition key (typically by hashing).
name     email               gender
Saeid    saeid@domain.com    M
Kamyar   kamyar@domain.com   M
Nazanin  nazanin@domain.com  F
Masoud   masoud@domain.com   M

Cassandra assigns a hash value to each partition key:

partition key  Murmur3 hash value
Saeid          -2245462676723223822
Kamyar         7723358927203680754
Nazanin        -6723372854036780875
Masoud         1168604627387940318

Cassandra places the data on each node according to the value of the partition key and the range that the node is responsible for:

Node  Start range           End range             Partition key  Hash value
A     -9223372036854775808  -4611686018427387904  Nazanin        -6723372854036780875
B     -4611686018427387903  -1                    Saeid          -2245462676723223822
C     0                     4611686018427387903   Masoud         1168604627387940318
D     4611686018427387904   9223372036854775807   Kamyar         7723358927203680754
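To make the token-to-node mapping concrete, here is a rough Python sketch that looks up the owning node for each hash value listed on the slide. It assumes four nodes that split the signed 64-bit token space into equal parts; the boundaries are computed for illustration, the function `node_for_token` is hypothetical, and the hashes are copied from the slide rather than computed with Murmur3. Real clusters typically use virtual nodes and many more ranges.

```python
from bisect import bisect_right

MIN_TOKEN = -2**63
RANGE_SIZE = 2**62  # four equal slices of the 2**64 token space

# Start token of each node's range, in ascending order (illustrative values).
NODES = ["A", "B", "C", "D"]
STARTS = [MIN_TOKEN + i * RANGE_SIZE for i in range(len(NODES))]

# Hash values as listed on the slide.
TOKENS = {
    "Saeid": -2245462676723223822,
    "Kamyar": 7723358927203680754,
    "Nazanin": -6723372854036780875,
    "Masoud": 1168604627387940318,
}


def node_for_token(token):
    """Return the node whose range contains the given token."""
    index = bisect_right(STARTS, token) - 1
    return NODES[index]


for name, token in TOKENS.items():
    print(name, token, node_for_token(token))
```

Sorting the start tokens once and using a binary search keeps the lookup logarithmic in the number of ranges.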
Key Features and Benefits
● Gigabyte to Petabyte scalability
● Linear performance
● No SPOF
● Easy replication / data distribution
● Multi datacenter and cloud capable
● No need for separate caching layer
● Tunable data consistency
● Flexible schema design
● Data compression
● CQL (an SQL-like query language)
● Support for key languages and platforms
● No need for special hardware or software
Big Data Scalability
● Capable of comfortably scaling to petabytes
● New nodes = linear performance increase
● Add new nodes online
No Single Point of Failure
● All nodes the same
○ Peer-to-Peer - masterless
● Customized replication affords tunable data redundancy
● Read/Write from any node
● Can replicate data among different physical data center racks
Easy Replication / Data Distribution
● Transparently handled by Cassandra
● Multi-data center capable
● Exploits all the benefits of Cloud computing
● Able to do Hybrid Cloud/On-Premise setup
No Need for Caching Software
● Peer-to-Peer architecture
○ removes need for special caching layer
● The database cluster uses the memory from all participating nodes to cache the data assigned to each node.
● No inconsistencies arise between a separate memory cache and the database
Tunable Data Consistency
● Choose between strong and eventual consistency
○ Depends on the need
● Can be chosen on a per-operation basis, for both reads and writes.
● Handles multi-data center operations
● Consistency Level (CL)
○ ALL = all replicas ack
○ QUORUM = a majority of replicas ack (floor(RF / 2) + 1)
○ ONE = only one replica ack
○ Plus more… (see docs)
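As a quick illustration of the consistency levels above, the following Python sketch computes how many replica acknowledgements each level needs and checks the common rule of thumb that reads see the latest write when read replicas plus write replicas exceed the replication factor. The function names `required_acks` and `is_strongly_consistent` are hypothetical helpers, not part of any Cassandra API, and only the three levels named on the slide are covered.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replicas that must acknowledge an operation."""
    if consistency_level == "ALL":
        return replication_factor
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if consistency_level == "ONE":
        return 1
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def is_strongly_consistent(read_level, write_level, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    reads = required_acks(read_level, replication_factor)
    writes = required_acks(write_level, replication_factor)
    return reads + writes > replication_factor


for rf in (1, 3, 5):
    print(rf, required_acks("QUORUM", rf))
print(is_strongly_consistent("QUORUM", "QUORUM", 3))
print(is_strongly_consistent("ONE", "ONE", 3))
```

With a replication factor of 3, QUORUM reads and writes each touch 2 replicas, so at least one replica is always common to both.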
Flexible Schema
● Dynamic schema design
● Handles structured, semi-structured, and unstructured data.
● Counters are supported
● No offline/downtime for schema changes
● Supports primary and secondary indexes
○ Secondary indexes != relational indexes (they are for convenience, not speed)
Data Compression
● Uses Google’s Snappy data compression algorithm
● Compresses data on a per column family level
● Internal tests at DataStax show up to 80%+ compression on raw data
● No performance penalty
○ Some increases in overall performance due to less physical I/O
Locally Distributed
● Client reads or writes to any node
● Node coordinates with others
● Data read or replicated in parallel
● Replication info
○ Replication Factor (RF): how many copies of your data are stored
○ Each node stores roughly (RF / cluster size) × 100% of the cluster's total data.
○ Handy Calculator: http://www.ecyrd.com/cassandracalculator/
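The per-node share mentioned above is simple arithmetic; a short Python sketch makes it easy to try a few cluster sizes. The helper `node_share_percent` is a hypothetical name, and the result assumes data is spread evenly across nodes, which real clusters only approximate.

```python
def node_share_percent(replication_factor, cluster_size):
    """Approximate percentage of the cluster's data held by each node."""
    if cluster_size < 1 or replication_factor < 1:
        raise ValueError("both values must be at least 1")
    if replication_factor > cluster_size:
        raise ValueError("replication factor cannot exceed the cluster size")
    return 100.0 * replication_factor / cluster_size


print(node_share_percent(3, 6))   # each of 6 nodes holds about half the data
print(node_share_percent(1, 4))
print(node_share_percent(3, 3))
```

With RF equal to the cluster size, every node holds a full copy of the data.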
Rack Aware
● Cassandra is aware of which rack (or availability zone) each node resides in.
● It will attempt to place each data copy in a different rack.
Data Center Aware
● Active Everywhere - reads/writes in multiple data centers
● Client writes local
● Data syncs across WAN
● Replication Factor per DC
● Different number of nodes per data center
Node Failure
● A single node failure should not bring down the cluster.
● Replication Factor + Consistency Level = Success
Node Recovery
● When a write is performed and a replica node for the row is unavailable, the coordinator stores a hint locally.
● When the node recovers, the coordinator replays the missed writes.
● Note: a hinted write does not count towards the consistency level.
● Note: you should still run repairs across your cluster.
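The hinted handoff behaviour described above can be sketched with a toy coordinator in Python. This is an illustration only: the class `Coordinator` and its `mark_down` / `mark_up` methods are hypothetical, replicas are plain dictionaries, and details such as hint expiry are left out.

```python
class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}  # name -> key/value store
        self.down = set()
        self.hints = {}  # replica name -> list of missed (key, value) writes

    def write(self, key, value):
        """Apply the write to live replicas and return the number of real acks."""
        acks = 0
        for name, store in self.replicas.items():
            if name in self.down:
                self.hints.setdefault(name, []).append((key, value))
            else:
                store[key] = value
                acks += 1
        return acks  # hinted writes are deliberately not counted

    def mark_down(self, name):
        self.down.add(name)

    def mark_up(self, name):
        """Replay the hints that accumulated while the replica was down."""
        self.down.discard(name)
        for key, value in self.hints.pop(name, []):
            self.replicas[name][key] = value


coordinator = Coordinator(["a", "b", "c"])
coordinator.mark_down("c")
acks = coordinator.write("user:1", "saeid")
coordinator.mark_up("c")
print(acks, coordinator.replicas["c"])
```

Because the hint is not counted as an acknowledgement, a write at consistency level ALL would still fail here even though the data eventually reaches the recovered replica.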
Security in Cassandra
● Internal Authentication
○ Manages login IDs and passwords inside the database.
● Object Permission Management
○ Controls who has access to what and who can do what in the database
○ Uses familiar GRANT/REVOKE from relational systems.
● Client to Node Encryption
○ Protects data in flight to and from a database
Hardware
● RAM
○ The more memory a Cassandra node has, the better its read performance.
■ For dedicated hardware, the optimal price-performance sweet spot is 16GB to 64GB; the minimum is 8GB.
■ For virtual environments, the optimal range may be 8GB to 16GB; the minimum is 4GB.
● CPU
○ More cores are better. Cassandra is built with concurrency in mind.
■ For dedicated hardware, 8-core CPU processors are the current price-performance sweet spot.
■ For virtual environments, consider using a provider that allows CPU bursting, such as Rackspace.
● Disk
○ Cassandra tries to minimize random I/O. Use a minimum of 2 disks. Keep the CommitLog and Data (SSTable) on separate spindles. RAID10 or RAID0 as you see fit.
○ XFS or ext4.
● Network
○ Be sure that your network can handle traffic between nodes without bottlenecks.
■ Recommended bandwidth is 1000 Mbit/s (gigabit) or greater.
● More info: Selecting hardware for enterprise implementations...
Directories and Files
● Configs
○ The main configuration file for Cassandra
■ /etc/cassandra/cassandra.yaml
○ Java Virtual Machine (JVM) configuration settings
■ /etc/cassandra/cassandra-env.sh
● Data directories
○ /var/lib/cassandra
● Log directory
○ /var/log/cassandra
● Environment settings
○ /usr/share/cassandra
● Cassandra user limits
○ /etc/security/limits.d/cassandra.conf
● More info: Package installation directories...
CQL Language
● Very similar to RDBMS SQL syntax
● Create objects via DDL (e.g. CREATE)
● Core DML commands supported: INSERT, UPDATE, DELETE
● Query data with SELECT
● cqlsh, the Python-based command-line client
○ CASSANDRA_PATH/bin/cqlsh
● More info: https://cassandra.apache.org/doc/cql/CQL.html
Nodetool
● A command line interface for managing a cluster.
○ CASSANDRA_PATH/bin/nodetool
● Useful commands:
○ nodetool info - Display node info (uptime, load, etc.).
○ nodetool status [keyspace] - Display cluster info (state, load, etc.).
○ nodetool cfstats [keyspace] - Display statistics of column families.
○ nodetool tpstats - Display usage statistics of thread pools.
○ nodetool netstats - Display network information.
○ nodetool repair - Repair one or more column families.
○ nodetool rebuild - Rebuild data by streaming from other nodes (similarly to bootstrap).
○ nodetool drain - Flush Memtables to SSTables on disk and stop accepting writes. Useful before a restart to make startup quick.
○ nodetool flush [keyspace [columnfamily]] - Flushes one or more column families from the memtable.
○ nodetool cfhistograms keyspace columnfamily - Display statistic histograms for a given column family.
○ nodetool proxyhistograms - Display statistic histograms for network operations.
○ nodetool help - Display help information!
Backup and Restore
● Take Snapshot
○ nodetool snapshot
■ /var/lib/cassandra/data/keyspace_name/table_name-UUID/snapshots/snapshot_name
○ nodetool clearsnapshot
● Restore Procedure
○ Shut down the node.
○ Clear all files in the commitlog directory (/var/lib/cassandra/commitlog)
○ Delete all *.db files in data_directory_location/keyspace_name/table_name-UUID directory.
○ Locate the most recent snapshot folder in this directory:
■ data_directory_location/keyspace_name/table_name-UUID/snapshots/snapshot_name
○ Copy its contents into this directory:
■ data_directory_location/keyspace_name/table_name-UUID
○ Start the node
■ Restarting causes a temporary burst of I/O activity and consumes a large amount of CPU resources.
○ Run nodetool repair
● More info: Restoring from a Snapshot...
DataStax OpsCenter
● Visually create new clusters with a few mouse clicks either on premise or in the cloud
● Add, edit, and remove nodes
● Automatically rebalance a cluster
● Control automatic management services including transparent repair
● Manage and schedule backup and restore operations
● Perform capacity planning with historical trend analysis and forecasting capabilities
● Proactively manage all clusters with threshold and timing-based alerts
● Generate reports and diagnostic reports with the push of a button
● Integrate with other enterprise tools via developer API
● More info: http://www.datastax.com/datastax-opscenter
Who’s Using Cassandra?
● Apple
● CERN
● Cisco
● Digg
● Facebook
● IBM
● Instagram
● Mahalo.com
● Netflix
● Rackspace
● Reddit
● SoundCloud
● Spotify
● Twitter
● Zoho
● http://planetcassandra.org/companies/
Where Can I Learn More?
● https://cassandra.apache.org/
● http://planetcassandra.org/
● http://www.datastax.com
Thank you
Any questions or comments?

An Introduction to Apache Cassandra

  • 1. A Comprehensive Introduction to Apache Cassandra Saeid Zebardast @saeidzeb zebardast.com Feb 2015
  • 2. Agenda ● What is NoSQL? ● What is Cassandra? ● Architecture ● Data Model ● Key Features and Benefits ● Hardware ● Directories and Files ● Cassandra Tools ○ CQL ○ Nodetool ○ DataStax Opscenter ● Backup and Restore ● Who’s using Cassandra? 2
  • 3. What is NoSQL? ● NoSQL (Not Only SQL) ● Simplicity of Design ● Horizontal Scaling (Scale Out) ○ Add nodes to the Cluster as much as you wish ○ Not all NoSQL databases. ● Finer Control over availability ● Data Structure ○ Key-Value ○ Column-Oriented ○ Graph ○ Document-Oriented ○ And etc. 3
  • 4. What is Cassandra? ● Since 2008 - Current stable version 2.1.2 (Nov 2014) ● NoSQL ● Distributed ● Open source ● Written in Java ● High performance ● Extremely scalable ● Fault tolerant (i.e no SPOF) 4
  • 5. Architecture Highlights ● Scale out, not up ● Peer-to-Peer, distributed system ○ All nodes the same - masterless with no SPOF ● Online load balancing, cluster growth ● Understanding System/Hardware failures ● Custom data replication to ensure fault tolerance ● CAP theorem (Consistency, Availability, Partition tolerance) ○ You can not have the tree at the same time ○ Tradeoff between consistency and latency are tunable ○ Strong Consistency = Increased Latency ● Each node communicates with each other ○ through the Gossip protocol 5
  • 6. Architecture Layers Core Layer Middle Layer Top Layer ● Messaging service ● Gossip Failure detection ● Cluster state ● Partitioner ● Replication ● Commit log ● Memtable ● SSTable ● Indexes ● Compaction ● Tombstones ● Hinted handoff ● Read repair ● Bootstrap ● Monitoring ● Admin tools Architecture Layers 6
  • 7. Architecture of a write 1. At first write to a disk commit log (sequential). 2. After write to commit log, it is sent to the appropriate nodes. 3. Each node receiving write, first records it in a local log, then makes update to appropriate Memtables (one for each column family). ○ Memtable is in-memory representation of data (before the data gets flushed to disk as an SSTable). ○ Memtables are flushed to disk when: ■ Out of space ■ Too many keys (128 is default) ■ Time duration (Client provided - no cluster clock) 4. When Memtables written out two files go out: ○ Data File (SSTable). ○ Index File (SSTable Index) 5. When a commit log has had all its column families pushed to disk, it is deleted. 6. Compaction ○ Periodically data files are merged sorted into a new file. ○ Merge keys ○ Combine columns ○ Discard tombstones 7
  • 8. Data Model ● [Keyspace][ColumnFamily][Key][Column] ● A keyspace is akin to a database in RDBMS ● The keyspace is a row-oriented, column structure ● A column family is similar to an RDBMS table ○ More flexible/dynamic ● A row in a column family is indexed by its key (Primary Key). ○ Cassandra supports up to 2 billion columns per (physical) row. ● Sample code to create keyspace and column family: ○ CREATE KEYSPACE logs WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1} ; ○ CREATE TABLE logs.samples ( node_id text, metric text, collection_ts timestamp, value bigint, PRIMARY KEY ((node_id, metric), collection_ts) ) WITH CLUSTERING ORDER BY (collection_ts DESC); 8
  • 9. Data Model - Primary Keys ● Primary Keys are unique. ● Single Primary Key ○ PRIMARY KEY(keyColumn) ● Composite Primary Key ○ PRIMARY KEY (myPartiotionKey, my1stClusteringKey, my2stClusteringKey) ● Composite Partitioning Key ○ PRIMARY KEY ((my1PartiotionKey ,my2PartiotionKey), myClusteringKey) 9
  • 10. Data Model - Time-To-Live (TTL) ● TTL a row ○ INSERT INTO users (id, first, last) VALUES (‘abc123’, ‘saeid’, ‘zeb’) USING TTL 3600; //Expires data in one our ● TTL a column ○ UPDATE users USING TTL 30 SET last = ‘zebardast’ WHERE id = ‘abc123’; ● TTL is in seconds ● Can also set default TTL at a table level. ● Expired columns/rows automatically deleted. ● With no TTL specified, columns/values never expire. ● TTL is useful for automatic deletion. ● Re-inserting the same row before it expires will overwrite TTL. 10
  • 11. Partitioners - Consistent hashing ● A partitioner determines how data is distributed across the nodes in the cluster (including replicas). ● A partitioner is a function for deriving a token representing a row from its partition key (typically by hashing). 11 name email gender Saeid saeid@domain.com M Kamyar kamyar@domain.com M Nazanin nazanin@domain.com F Masoud masoud@domain.com M partition key Murmur3 hash value Saeid -2245462676723223822 Kamyar 7723358927203680754 Nazanin -6723372854036780875 Masoud 1168604627387940318 Cassandra places the data on each node according to the value of partition key and the range that the node is responsible for. Node Start range End range Partition key Hash value A -9223372036854775808 -4611686018427387903 Saeid -6723372854036780875 B -4611686018427387904 -1 Kamyar -2245462676723223822 C 0 4611686018427387903 Nazanin 1168604627387940318 D 4611686018427387904 9223372036854775807 Masoud 7723358927203680754 Cassandra assigns a hash value to each partition key
  • 12. Key Features and Benefits ● Gigabyte to Petabyte scalability ● Linear performance ● No SPOF ● Easy replication / data distribution ● Multi datacenter and cloud capable ● No need for separate caching layer ● Tunable data consistency ● Flexible schema design ● Data compaction ● CQL Language (like SQL) ● Support for key languages and platforms ● No need for special hardware or software 12
  • 13. Big Data Scalability ● Capable of comfortably scaling to petabytes ● New nodes = linear performance increase ● Add new nodes online 13
  • 14. No Single Point of Failure ● All nodes the same ○ Peer-to-Peer - masterless ● Customized replication affords tunales data redundancy ● Read/Write from any node ● Can replicate data among different physical data center racks 14
  • 15. Easy Replication / Data Distribution ● Transparently handled by Cassandra ● Multi-data center capable ● Exploits all the benefits of Cloud computing ● Able to do Hybrid Cloud/On-Premise setup 15
  • 16. No Need for Caching Software
    ● Peer-to-Peer architecture
      ○ removes need for special caching layer
    ● The database cluster uses the memory from all participating nodes to cache the data assigned to each node.
    ● No irregularities between a memory cache and database are encountered
  • 17. Tunable Data Consistency
    ● Choose between strong and eventual consistency
      ○ Depends on the need
    ● Can be done on a per-operation basis, for both reads and writes.
    ● Handles multi-data center operations
    ● Consistency Level (CL)
      ○ ALL = all replicas ack
      ○ QUORUM = a majority of replicas ack (RF / 2 + 1, rounded down)
      ○ ONE = only one replica acks
      ○ Plus more… (see docs)
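As a worked example of the consistency levels above, the following Python sketch computes how many replica acknowledgements each level needs and checks the usual overlap rule for strong consistency (read acks + write acks > RF). The function names are hypothetical; only the three levels named on the slide are modelled.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for ONE, QUORUM, or ALL."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def is_strongly_consistent(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    writes = required_acks(write_level, replication_factor)
    reads = required_acks(read_level, replication_factor)
    return writes + reads > replication_factor


# With RF = 3: QUORUM needs 2 acks, so QUORUM writes + QUORUM reads overlap,
# while ONE + ONE (1 + 1 = 2, not > 3) gives only eventual consistency.
quorum_rf3 = quorum(3)
quorum_both = is_strongly_consistent("QUORUM", "QUORUM", 3)
one_both = is_strongly_consistent("ONE", "ONE", 3)
```

Stronger levels wait for more replicas, which is the consistency versus latency trade-off the deck refers to.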
  • 18. Flexible Schema
    ● Dynamic schema design
    ● Handles structured, semi-structured, and unstructured data.
    ● Counters are supported
    ● No offline/downtime for schema changes
    ● Supports primary and secondary indexes
      ○ Secondary indexes != relational indexes (they are for convenience, not speed)
  • 19. Data Compression
    ● Uses Google's Snappy data compression algorithm
    ● Compresses data on a per column family level
    ● Internal tests at DataStax show up to 80%+ compression on row data
    ● No performance penalty
      ○ Some increases in overall performance due to less physical I/O
  • 20. Locally Distributed
    ● Client reads or writes to any node
    ● Node coordinates with others
    ● Data read or replicated in parallel
    ● Replication info
      ○ Replication Factor (RF): how many copies of your data are kept
      ○ Each node stores (RF / cluster size) × 100% of the cluster's data.
      ○ Handy Calculator: http://www.ecyrd.com/cassandracalculator/
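The RF / cluster size rule can be turned into a small Python calculation. The sketch assumes tokens are evenly balanced, so every node carries the same share; `ownership_fraction` and `per_node_storage_gb` are hypothetical helper names.

```python
def ownership_fraction(replication_factor: int, cluster_size: int) -> float:
    """Share of the data set each node stores, assuming a balanced ring."""
    if not 1 <= replication_factor <= cluster_size:
        raise ValueError("need 1 <= replication_factor <= cluster_size")
    return replication_factor / cluster_size


def per_node_storage_gb(unique_data_gb: float, replication_factor: int,
                        cluster_size: int) -> float:
    """Approximate disk needed per node for a given amount of unique data."""
    return unique_data_gb * ownership_fraction(replication_factor, cluster_size)


# RF = 3 on a 6-node cluster: each node holds half of the data set,
# so 1200 GB of unique data needs about 600 GB per node.
share = ownership_fraction(3, 6)
storage = per_node_storage_gb(1200, 3, 6)
```

The figures ignore compression and compaction overhead, so they are a lower bound for capacity planning rather than a sizing recommendation.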
  • 21. Rack Aware
    ● Cassandra is aware of which rack (or availability zone) each node resides in.
    ● It will attempt to place each data copy in a different rack.
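A simplified Python sketch of rack-aware placement follows. It is not Cassandra's actual replication strategy, only an illustration of the idea: walk the ring from the node that owns the token, prefer nodes in racks that have no copy yet, and fall back to already-used racks when there are fewer racks than replicas. `Node` and `place_replicas` are hypothetical names.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    name: str
    rack: str


def place_replicas(ring: List[Node], start: int, rf: int) -> List[Node]:
    """Pick rf replicas walking clockwise from ring[start], preferring new racks."""
    if not 1 <= rf <= len(ring):
        raise ValueError("need 1 <= rf <= number of nodes")
    ordered = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    replicas: List[Node] = []
    skipped: List[Node] = []  # nodes passed over because their rack was taken
    seen_racks = set()
    for node in ordered:
        if len(replicas) == rf:
            break
        if node.rack in seen_racks:
            skipped.append(node)
        else:
            replicas.append(node)
            seen_racks.add(node.rack)
    # Fewer racks than replicas: fill the remainder in ring order.
    replicas.extend(skipped[: rf - len(replicas)])
    return replicas


ring = [
    Node("n1", "rack1"), Node("n2", "rack1"),
    Node("n3", "rack2"), Node("n4", "rack2"),
    Node("n5", "rack3"), Node("n6", "rack3"),
]
chosen = [node.name for node in place_replicas(ring, start=0, rf=3)]
```

With three racks and RF = 3, each copy lands in a different rack, so losing a whole rack still leaves two replicas available.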
  • 22. Data Center Aware
    ● Active Everywhere - reads/writes in multiple data centers
    ● Client writes locally
    ● Data syncs across WAN
    ● Replication Factor per DC
    ● Different number of nodes per data center
  • 23. Node Failure
    ● A single node failure shouldn't bring the cluster down.
    ● Replication Factor + Consistency Level = Success
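The "Replication Factor + Consistency Level" rule can be checked with a few lines of Python. The sketch assumes a request succeeds when the number of live replicas for a row is at least the number of acknowledgements the consistency level requires; the function names are hypothetical.

```python
def request_succeeds(replication_factor: int, required_acks: int,
                     replicas_down: int) -> bool:
    """True if enough replicas are still up to satisfy the consistency level."""
    live_replicas = replication_factor - replicas_down
    return live_replicas >= required_acks


def tolerated_failures(replication_factor: int, required_acks: int) -> int:
    """How many replicas of a row may be down before requests start failing."""
    return replication_factor - required_acks


# RF = 3 with QUORUM (2 acks): one replica may fail, two may not.
one_down_ok = request_succeeds(3, required_acks=2, replicas_down=1)
two_down_ok = request_succeeds(3, required_acks=2, replicas_down=2)
```

With RF = 3, level ONE (1 ack) tolerates two failed replicas, while ALL (3 acks) tolerates none.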
  • 24. Node Recovery
    ● When a write is performed and a replica node for the row is unavailable, the coordinator will store a hint locally.
    ● When the node recovers, the coordinator replays the missed writes.
    ● Note: a hinted write does not count towards the consistency level.
    ● Note: you should still run repairs across your cluster.
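The hinted handoff behaviour described here can be modelled with a toy Python simulation. `Replica` and `Coordinator` are hypothetical classes, not Cassandra internals; the sketch only shows that a hint is stored instead of an ack and is replayed once the replica is back.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        # replica name -> writes it missed while it was down
        self.hints: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def write(self, key: str, value: str) -> int:
        """Write to all replicas; return the number of real acknowledgements."""
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.write(key, value)
                acks += 1
            else:
                # Stored locally as a hint; it does NOT count as an ack.
                self.hints[replica.name].append((key, value))
        return acks

    def replay_hints(self, replica: Replica) -> None:
        """Deliver the missed writes once the replica has recovered."""
        for key, value in self.hints.pop(replica.name, []):
            replica.write(key, value)


a, b, c = Replica("a"), Replica("b"), Replica("c")
coordinator = Coordinator([a, b, c])
c.up = False
acks = coordinator.write("abc123", "zebardast")  # only a and b acknowledge
c.up = True
coordinator.replay_hints(c)                      # c catches up
```

Because the hint is not an ack, a write at consistency level ALL would still fail in this scenario, which is why the slide recommends regular repairs in addition to hints.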
  • 25. Security in Cassandra
    ● Internal Authentication
      ○ Manages login IDs and passwords inside the database.
    ● Object Permission Management
      ○ Controls who has access to what and who can do what in the database
      ○ Uses familiar GRANT/REVOKE from relational systems.
    ● Client to Node Encryption
      ○ Protects data in flight to and from a database
  • 26. Hardware
    ● RAM
      ○ The more memory a Cassandra node has, the better its read performance.
        ■ For dedicated hardware, the optimal price-performance sweet spot is 16GB to 64GB; the minimum is 8GB.
        ■ For virtual environments, the optimal range may be 8GB to 16GB; the minimum is 4GB.
    ● CPU
      ○ More cores are better. Cassandra is built with concurrency in mind.
        ■ For dedicated hardware, 8-core processors are the current price-performance sweet spot.
        ■ For virtual environments, consider using a provider that allows CPU bursting, such as Rackspace.
    ● Disk
      ○ Cassandra tries to minimize random IO. Minimum of 2 disks. Keep CommitLog and Data (SSTable) on separate spindles. RAID10 or RAID0 as you see fit.
      ○ XFS or ext4.
    ● Network
      ○ Be sure that your network can handle traffic between nodes without bottlenecks.
        ■ Recommended bandwidth is 1000 Mbit/s (gigabit) or greater.
    ● More info: Selecting hardware for enterprise implementations...
  • 27. Directories and Files
    ● Configs
      ○ The main configuration file for Cassandra
        ■ /etc/cassandra/cassandra.yaml
      ○ Java Virtual Machine (JVM) configuration settings
        ■ /etc/cassandra/cassandra-env.sh
    ● Data directories
      ○ /var/lib/cassandra
    ● Log directory
      ○ /var/log/cassandra
    ● Environment settings
      ○ /usr/share/cassandra
    ● Cassandra user limits
      ○ /etc/security/limits.d/cassandra.conf
    ● More info: Package installation directories...
  • 28. CQL Language
    ● Very similar to RDBMS SQL syntax
    ● Create objects via DDL (e.g. CREATE)
    ● Core DML commands supported: INSERT, UPDATE, DELETE
    ● Query data with SELECT
    ● cqlsh, the Python-based command-line client
      ○ CASSANDRA_PATH/bin/cqlsh
    ● More info: https://cassandra.apache.org/doc/cql/CQL.html
  • 29. Nodetool
    ● A command line interface for managing a cluster.
      ○ CASSANDRA_PATH/bin/nodetool
    ● Useful commands:
      ○ nodetool info - Display node info (uptime, load, etc.).
      ○ nodetool status [keyspace] - Display cluster info (state, load, etc.).
      ○ nodetool cfstats [keyspace] - Display statistics of column families.
      ○ nodetool tpstats - Display usage statistics of thread pools.
      ○ nodetool netstats - Display network information.
      ○ nodetool repair - Repair one or more column families.
      ○ nodetool rebuild - Rebuild data by streaming from other nodes (similar to bootstrap).
      ○ nodetool drain - Flush Memtables to SSTables on disk and stop accepting writes. Useful before a restart to make startup quick.
      ○ nodetool flush [keyspace [columnfamily]] - Flush one or more column families from the memtable.
      ○ nodetool cfhistograms keyspace columnfamily - Display statistic histograms for a given column family.
      ○ nodetool proxyhistograms - Display statistic histograms for network operations.
      ○ nodetool help - Display help information.
  • 30. Backup and Restore
    ● Take Snapshot
      ○ nodetool snapshot
        ■ /var/lib/cassandra/data/keyspace_name/table_name-UUID/snapshots/snapshot_name
      ○ nodetool clearsnapshot
    ● Restore Procedure
      ○ Shut down the node.
      ○ Clear all files in the commitlog directory (/var/lib/cassandra/commitlog).
      ○ Delete all *.db files in the data_directory_location/keyspace_name/table_name-UUID directory.
      ○ Locate the most recent snapshot folder in this directory:
        ■ data_directory_location/keyspace_name/table_name-UUID/snapshots/snapshot_name
      ○ Copy its contents into this directory:
        ■ data_directory_location/keyspace_name/table_name-UUID
      ○ Start the node.
        ■ Restarting causes a temporary burst of I/O activity and consumes a large amount of CPU resources.
      ○ Run nodetool repair.
    ● More info: Restoring from a Snapshot...
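The file-handling part of the restore procedure (delete the live *.db files, then copy the newest snapshot back) can be sketched in Python. This is an illustration under stated assumptions, not a supported tool: it assumes the node is already stopped and the commit log already cleared, and it treats the snapshot directory with the latest modification time as "most recent". `restore_table_from_snapshot` is a hypothetical helper, and the demo runs on a throwaway temporary directory rather than a real data directory.

```python
import shutil
import tempfile
from pathlib import Path


def restore_table_from_snapshot(table_dir: Path) -> Path:
    """Replace a table's live *.db files with the files of its newest snapshot."""
    snapshot_root = table_dir / "snapshots"
    snapshots = [p for p in snapshot_root.iterdir() if p.is_dir()]
    if not snapshots:
        raise FileNotFoundError(f"no snapshots under {snapshot_root}")
    # Assumption: newest modification time means most recent snapshot.
    latest = max(snapshots, key=lambda p: p.stat().st_mtime)

    # Delete the live *.db files (non-recursive, so snapshots are untouched).
    for db_file in table_dir.glob("*.db"):
        db_file.unlink()

    # Copy the snapshot contents into the table directory.
    for item in latest.iterdir():
        if item.is_file():
            shutil.copy2(item, table_dir / item.name)
    return latest


# Demo on a temporary directory tree that mirrors the layout on the slide.
demo_root = Path(tempfile.mkdtemp())
demo_table = demo_root / "keyspace_name" / "table_name-UUID"
demo_snapshot = demo_table / "snapshots" / "snapshot_name"
demo_snapshot.mkdir(parents=True)
(demo_table / "stale-Data.db").write_text("stale")
(demo_snapshot / "good-Data.db").write_text("good")
restored_from = restore_table_from_snapshot(demo_table)
```

Starting the node and running nodetool repair afterwards remain manual steps, as listed in the procedure.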
  • 31. DataStax Opscenter
    ● Visually create new clusters with a few mouse clicks either on premise or in the cloud
    ● Add, edit, and remove nodes
    ● Automatically rebalance a cluster
    ● Control automatic management services including transparent repair
    ● Manage and schedule backup and restore operations
    ● Perform capacity planning with historical trend analysis and forecasting capabilities
    ● Proactively manage all clusters with threshold and timing-based alerts
    ● Generate reports and diagnostic reports with the push of a button
    ● Integrate with other enterprise tools via developer API
    ● More info: http://www.datastax.com/datastax-opscenter
  • 32. Who’s Using Cassandra?
    ● Apple
    ● CERN
    ● Cisco
    ● Digg
    ● Facebook
    ● IBM
    ● Instagram
    ● Mahalo.com
    ● Netflix
    ● Rackspace
    ● Reddit
    ● SoundCloud
    ● Spotify
    ● Twitter
    ● Zoho
    ● http://planetcassandra.org/companies/
  • 33. Where Can I Learn More?
    ● https://cassandra.apache.org/
    ● http://planetcassandra.org/
    ● http://www.datastax.com
  • 34. Thank you
    Saeid Zebardast
    @saeidzeb
    zebardast.com
    Feb 2015
    Any Questions, Comments?