SlideShare a Scribd company logo
What is Compaction?
Kazutaka Tomita (INTHEFOREST Co., Ltd.)
Who is this guy?
Kazutaka Tomita (@railute)
• INTHEFOREST Co., Ltd. CEO/CTO
• Consulting for Apache Cassandra and Apache Spark Systems
• Supporting for Cassandra in Japan
• an organizer of Cassandra Summit JPN
Specialty
• RDBMS (Oracle,SQLServer,MySQL,PostgreSQL)
• Apache Cassandra
• Apache Spark
• Apache Hadoop with YARN
• And other NoSQL
• NLP and Text mining for Japanese
Agenda
 Overview of Compaction.
 Compaction Do.
Overview of Compaction.
• Why is the compaction done ?
• When is the compaction done?
• What type is the compaction?
Three points of Cassandra’s Compaction.
Why is the compaction done ?
So, We must purge duplicate or overwritten or deleted data and tombstones.
The most important thing :
The SSTable is immutable.
Writing System for Apache Cassandra
for your reference
memtable
Memory
Disk
Commit Log
Coordinator
node
Flush
SSTable
For
local
1st
NoWriting
node is
alive.
YES
Write Hinted
Sent messages to other node
Writing operation
Receive messages from coordinator node
2nd
memtable memtable
SSTable SSTable
Compacion
Close
YES
No
Sort by token
When is the compaction done?
1.Manually
2.Running in the background
When is the compaction done?
1.Manually
1. nodetool compact
Forces a major compaction on one or more tables.
By size tiered compaction, a major compaction combines each of the
pools of repaired and unrepaired SSTables into one repaired and one
unreparied SSTable.
2. nodetool scrub
Rebuild SSTables for one or more Cassandra tables.
3. nodetool cleanup
Cleans up keyspaces and partition keys no longer belonging to a node.
Use this command to remove unwanted data after adding a new node
to the cluster. Cassandra does not automatically remove data from
nodes that lose part of their partition range to a newly added node.
4. nodetool upgradesstables
Rewrites SSTables for tables that are not running the current version
of Cassandra.
When is the compaction done?
2. Running in the background
1.daemon started
2.after flashing memtables
3.after streaming
4.enable auto compaction by nodetool
5.set compaction threshold by nodetool
What type is the compaction?
1. Minor
2. Major
3. Single-sstable compactions
4. Anti compaction
What type is the compaction?
1. Minor
This compaction runs automatically in the background.
• daemon started
• after flashing memtables
• after streaming
What type is the compaction?
2. Major
This compaction is only called by size tiered compaction.
cf.)
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy#getMaximalTask
The Other compaction is called by “nodetool compact”, but major compaction is not executed.
*n. minor compaction is executed.
cf.)
org.apache.cassandra.db.compaction.DateTieredCompactionStrategy#getMaximalTask
org.apache.cassandra.db.compaction.LeveledCompactionStrategy#getMaximalTask
What type is the compaction?
3. Single-sstable compactions
This Compaction is executed one by one every SSTable.
nodetool upgradesstables
nodetool scrub
nodetool cleanup
What type is the compaction?
4. Anti compaction
This Compaction is for incremental repairs.
After executing incremantal repairs, An anticompaction is called.
*After 2.1
Compaction Strategy
1. SizeTieredCompactionStrategy
For write-intensive workloads
2. LeveledCompactionStrategy
For read-intensive workloads
3. DateTieredCompactionStrategy
For time series data and expiring (TTL) data
Size Tiered Compaction Strategy
When Some SSTables became the similar size, they are merged.
(default is 4.)
SSTable SSTable SSTable SSTable
SSTable SSTable
SSTable SSTable
SSTable
Leveled Compaction Strategy
SSTable SSTable SSTable SSTable SSTable SSTableLebel0
SSTableLebel1 SSTable SSTable
SSTableLebel2 SSTable
The data which
isn't read so much.
DateTieredCompactionStrategy
Default:1hour
The basic idea of DTCS is to group SSTables in windows based on how old the data is in the SSTable.
sstable sstable sstable sstable
sstable
windows
windows
now
sstable
4 sstables 4 sstables
Merge SSTable by Compaction
When Some SSTables became the similar size, they are merged.
(default is 4.)
Name: John
Address: Osaka Address: Tokyo
Tel: xxx-xxx
ages: 20
Name: John
Address: Tokyo
ages: 20

More Related Content

What's hot

Introduction to Civil Infrastructure Platform
Introduction to Civil Infrastructure PlatformIntroduction to Civil Infrastructure Platform
Introduction to Civil Infrastructure Platform
SZ Lin
 
Paxos
PaxosPaxos
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDownscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Databricks
 
Bitcoin
BitcoinBitcoin
Bitcoin
ghanbarianm
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
confluent
 
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
Databricks
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
The Hive
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
confluent
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotDistributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Citus Data
 
Blockchain: The Information Technology of the Future
Blockchain: The Information Technology of the FutureBlockchain: The Information Technology of the Future
Blockchain: The Information Technology of the Future
Melanie Swan
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
Alexander Korotkov
 
Blockchain and Decentralization
Blockchain and DecentralizationBlockchain and Decentralization
Blockchain and Decentralization
Priyab Satoshi
 
lecun-01.ppt
lecun-01.pptlecun-01.ppt
lecun-01.ppt
VenkyChinna8
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent
 
Building real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyBuilding real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case study
Kishore Gopalakrishna
 
The Microsoft vision for Blockchain
The Microsoft vision for BlockchainThe Microsoft vision for Blockchain
The Microsoft vision for Blockchain
ASPEX_BE
 
Blockchain for IoT - Smart Home
Blockchain for IoT - Smart HomeBlockchain for IoT - Smart Home
Blockchain for IoT - Smart Home
Biagio Botticelli
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
confluent
 
Overview of Blockchain Consensus Mechanisms
Overview of Blockchain Consensus MechanismsOverview of Blockchain Consensus Mechanisms
Overview of Blockchain Consensus Mechanisms
Johannes Ahlmann
 

What's hot (20)

Introduction to Civil Infrastructure Platform
Introduction to Civil Infrastructure PlatformIntroduction to Civil Infrastructure Platform
Introduction to Civil Infrastructure Platform
 
Paxos
PaxosPaxos
Paxos
 
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDownscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
 
Bitcoin
BitcoinBitcoin
Bitcoin
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotDistributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
 
Blockchain: The Information Technology of the Future
Blockchain: The Information Technology of the FutureBlockchain: The Information Technology of the Future
Blockchain: The Information Technology of the Future
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
Blockchain and Decentralization
Blockchain and DecentralizationBlockchain and Decentralization
Blockchain and Decentralization
 
lecun-01.ppt
lecun-01.pptlecun-01.ppt
lecun-01.ppt
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Building real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyBuilding real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case study
 
The Microsoft vision for Blockchain
The Microsoft vision for BlockchainThe Microsoft vision for Blockchain
The Microsoft vision for Blockchain
 
Blockchain for IoT - Smart Home
Blockchain for IoT - Smart HomeBlockchain for IoT - Smart Home
Blockchain for IoT - Smart Home
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
 
Overview of Blockchain Consensus Mechanisms
Overview of Blockchain Consensus MechanismsOverview of Blockchain Consensus Mechanisms
Overview of Blockchain Consensus Mechanisms
 

Viewers also liked

Cassandra 2.1 boot camp, Compaction
Cassandra 2.1 boot camp, CompactionCassandra 2.1 boot camp, Compaction
Cassandra 2.1 boot camp, Compaction
Joshua McKenzie
 
Cassandra3.0
Cassandra3.0Cassandra3.0
Cassandra3.0
Kazutaka Tomita
 
Compaction, Compaction Everywhere
Compaction, Compaction EverywhereCompaction, Compaction Everywhere
Compaction, Compaction Everywhere
DataStax Academy
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
DataStax
 
Cassandra v3.0 at Rakuten meet-up on 12/2/2015
Cassandra v3.0 at Rakuten meet-up on 12/2/2015Cassandra v3.0 at Rakuten meet-up on 12/2/2015
Cassandra v3.0 at Rakuten meet-up on 12/2/2015
datastaxjp
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
Vassilis Bekiaris
 
MyRocks: табличный движок для MySQL на основе RocksDB
MyRocks: табличный движок для MySQL на основе RocksDBMyRocks: табличный движок для MySQL на основе RocksDB
MyRocks: табличный движок для MySQL на основе RocksDB
Sergey Petrunya
 
Data Modeling with Cassandra and Time Series Data
Data Modeling with Cassandra and Time Series DataData Modeling with Cassandra and Time Series Data
Data Modeling with Cassandra and Time Series Data
Dani Traphagen
 
Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...
Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...
Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...
DataStax
 
Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?
Johnny Miller
 
DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The Sequel
DataStax Academy
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
Patrick McFadin
 
Cassandra compaction
Cassandra compactionCassandra compaction
Cassandra compaction
zqhxuyuan
 
Cassandra model
Cassandra modelCassandra model
Cassandra model
zqhxuyuan
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for Sysadmins
Nathan Milford
 

Viewers also liked (15)

Cassandra 2.1 boot camp, Compaction
Cassandra 2.1 boot camp, CompactionCassandra 2.1 boot camp, Compaction
Cassandra 2.1 boot camp, Compaction
 
Cassandra3.0
Cassandra3.0Cassandra3.0
Cassandra3.0
 
Compaction, Compaction Everywhere
Compaction, Compaction EverywhereCompaction, Compaction Everywhere
Compaction, Compaction Everywhere
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
 
Cassandra v3.0 at Rakuten meet-up on 12/2/2015
Cassandra v3.0 at Rakuten meet-up on 12/2/2015Cassandra v3.0 at Rakuten meet-up on 12/2/2015
Cassandra v3.0 at Rakuten meet-up on 12/2/2015
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
 
MyRocks: табличный движок для MySQL на основе RocksDB
MyRocks: табличный движок для MySQL на основе RocksDBMyRocks: табличный движок для MySQL на основе RocksDB
MyRocks: табличный движок для MySQL на основе RocksDB
 
Data Modeling with Cassandra and Time Series Data
Data Modeling with Cassandra and Time Series DataData Modeling with Cassandra and Time Series Data
Data Modeling with Cassandra and Time Series Data
 
Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...
Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...
Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...
 
Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?
 
DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The Sequel
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
Cassandra compaction
Cassandra compactionCassandra compaction
Cassandra compaction
 
Cassandra model
Cassandra modelCassandra model
Cassandra model
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for Sysadmins
 

Similar to Cassandra compaction

SSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonSSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax London
Uri Cohen
 
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
JAXLondon2014
 
Methods of Sharding MySQL
Methods of Sharding MySQLMethods of Sharding MySQL
Methods of Sharding MySQL
Laine Campbell
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
John Georgiadis
 
Mysql talk
Mysql talkMysql talk
Mysql talk
LogicMonitor
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
DataStax
 
PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.
DECK36
 
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Spark Summit
 
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
ScyllaDB
 
The Automation Factory
The Automation FactoryThe Automation Factory
The Automation Factory
Nathan Milford
 
Apache Cassandra Lunch #96: Apache Cassandra Change Data Capture (CDC) Strate...
Apache Cassandra Lunch #96: Apache Cassandra Change Data Capture (CDC) Strate...Apache Cassandra Lunch #96: Apache Cassandra Change Data Capture (CDC) Strate...
Apache Cassandra Lunch #96: Apache Cassandra Change Data Capture (CDC) Strate...
Anant Corporation
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
Bigstep
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentation
Ilya Bogunov
 
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Jimin Hsieh
 
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
DataStax
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack
eurobsdcon
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
Shyam Raj
 
London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015
Chris Fregly
 
Cassandra at Pollfish
Cassandra at PollfishCassandra at Pollfish
Cassandra at Pollfish
Pollfish
 
Cassandra at Pollfish
Cassandra at PollfishCassandra at Pollfish
Cassandra at Pollfish
Stavros Kontopoulos
 

Similar to Cassandra compaction (20)

SSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonSSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax London
 
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
 
Methods of Sharding MySQL
Methods of Sharding MySQLMethods of Sharding MySQL
Methods of Sharding MySQL
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 
Mysql talk
Mysql talkMysql talk
Mysql talk
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
 
PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.
 
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
 
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
 
The Automation Factory
The Automation FactoryThe Automation Factory
The Automation Factory
 
Apache Cassandra Lunch #96: Apache Cassandra Change Data Capture (CDC) Strate...
Apache Cassandra Lunch #96: Apache Cassandra Change Data Capture (CDC) Strate...Apache Cassandra Lunch #96: Apache Cassandra Change Data Capture (CDC) Strate...
Apache Cassandra Lunch #96: Apache Cassandra Change Data Capture (CDC) Strate...
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentation
 
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
 
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
 
London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015
 
Cassandra at Pollfish
Cassandra at PollfishCassandra at Pollfish
Cassandra at Pollfish
 
Cassandra at Pollfish
Cassandra at PollfishCassandra at Pollfish
Cassandra at Pollfish
 

More from Kazutaka Tomita

The rethinkingofrepair
The rethinkingofrepairThe rethinkingofrepair
The rethinkingofrepair
Kazutaka Tomita
 
Apache cassandra nio
Apache cassandra nioApache cassandra nio
Apache cassandra nio
Kazutaka Tomita
 
Apache Cassandra 入門編
Apache Cassandra 入門編Apache Cassandra 入門編
Apache Cassandra 入門編
Kazutaka Tomita
 
Apache cassandra 最前線
Apache cassandra 最前線Apache cassandra 最前線
Apache cassandra 最前線
Kazutaka Tomita
 
Apache sparkとapache cassandraで行うテキスト解析
Apache sparkとapache cassandraで行うテキスト解析Apache sparkとapache cassandraで行うテキスト解析
Apache sparkとapache cassandraで行うテキスト解析
Kazutaka Tomita
 
Cassandra2017
Cassandra2017Cassandra2017
Cassandra2017
Kazutaka Tomita
 
Apache cassandraと apache sparkで作るデータ解析プラットフォーム
Apache cassandraと apache sparkで作るデータ解析プラットフォームApache cassandraと apache sparkで作るデータ解析プラットフォーム
Apache cassandraと apache sparkで作るデータ解析プラットフォーム
Kazutaka Tomita
 
米国の事例で学ぶCassandra
米国の事例で学ぶCassandra米国の事例で学ぶCassandra
米国の事例で学ぶCassandraKazutaka Tomita
 
Cassandraのバックアップと運用を考える
Cassandraのバックアップと運用を考えるCassandraのバックアップと運用を考える
Cassandraのバックアップと運用を考えるKazutaka Tomita
 
What is row level isolation on cassandra
What is row level isolation on cassandraWhat is row level isolation on cassandra
What is row level isolation on cassandra
Kazutaka Tomita
 
Gossip事始め
Gossip事始めGossip事始め
Gossip事始め
Kazutaka Tomita
 

More from Kazutaka Tomita (14)

The rethinkingofrepair
The rethinkingofrepairThe rethinkingofrepair
The rethinkingofrepair
 
Apache cassandra nio
Apache cassandra nioApache cassandra nio
Apache cassandra nio
 
Apache Cassandra 入門編
Apache Cassandra 入門編Apache Cassandra 入門編
Apache Cassandra 入門編
 
Apache cassandra 最前線
Apache cassandra 最前線Apache cassandra 最前線
Apache cassandra 最前線
 
Apache sparkとapache cassandraで行うテキスト解析
Apache sparkとapache cassandraで行うテキスト解析Apache sparkとapache cassandraで行うテキスト解析
Apache sparkとapache cassandraで行うテキスト解析
 
Cassandra2017
Cassandra2017Cassandra2017
Cassandra2017
 
Apache cassandraと apache sparkで作るデータ解析プラットフォーム
Apache cassandraと apache sparkで作るデータ解析プラットフォームApache cassandraと apache sparkで作るデータ解析プラットフォーム
Apache cassandraと apache sparkで作るデータ解析プラットフォーム
 
米国の事例で学ぶCassandra
米国の事例で学ぶCassandra米国の事例で学ぶCassandra
米国の事例で学ぶCassandra
 
Cassandra12to20
Cassandra12to20Cassandra12to20
Cassandra12to20
 
Cassandraのバックアップと運用を考える
Cassandraのバックアップと運用を考えるCassandraのバックアップと運用を考える
Cassandraのバックアップと運用を考える
 
What is row level isolation on cassandra
What is row level isolation on cassandraWhat is row level isolation on cassandra
What is row level isolation on cassandra
 
Cassandra0.7
Cassandra0.7Cassandra0.7
Cassandra0.7
 
Gossip事始め
Gossip事始めGossip事始め
Gossip事始め
 
Consistency level
Consistency levelConsistency level
Consistency level
 

Recently uploaded

Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 

Recently uploaded (20)

Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 

Cassandra compaction

  • 1. What is Compaction? Kazutaka Tomita (INTHEFOREST Co., Ltd.)
  • 2. Who is this guy? Kazutaka Tomita (@railute) • INTHEFOREST Co., Ltd. CEO/CTO • Consulting for Apache Cassandra and Apache Spark Systems • Supporting for Cassandra in Japan • an organizer of Cassandra Summit JPN Specialty • RDBMS (Oracle,SQLServer,MySQL,PostgreSQL) • Apache Cassandra • Apache Spark • Apache Hadoop with YARN • And other NoSQL • NLP and Text mining for Japanese
  • 3. Agenda  Overview of Compaction.  Compaction Do.
  • 4. Overview of Compaction. • Why is the compaction done ? • When is the compaction done? • What type is the compaction? Three points of Cassandra’s Compaction.
  • 5. Why is the compaction done ? So, We must purge duplicate or overwritten or deleted data and tombstones. The most important thing : The SSTable is immutable.
  • 6. Writing System for Apache Cassandra for your reference memtable Memory Disk Commit Log Coordinator node Flush SSTable For local 1st NoWriting node is alive. YES Write Hinted Sent messages to other node Writing operation Receive messages from coordinator node 2nd memtable memtable SSTable SSTable Compacion Close YES No Sort by token
  • 7. When is the compaction done? 1.Manually 2.Running in the background
  • 8. When is the compaction done? 1.Manually 1. nodetool compact Forces a major compaction on one or more tables. By size tiered compaction, a major compaction combines each of the pools of repaired and unrepaired SSTables into one repaired and one unreparied SSTable. 2. nodetool scrub Rebuild SSTables for one or more Cassandra tables. 3. nodetool cleanup Cleans up keyspaces and partition keys no longer belonging to a node. Use this command to remove unwanted data after adding a new node to the cluster. Cassandra does not automatically remove data from nodes that lose part of their partition range to a newly added node. 4. nodetool upgradesstables Rewrites SSTables for tables that are not running the current version of Cassandra.
  • 9. When is the compaction done? 2. Running in the background 1.daemon started 2.after flashing memtables 3.after streaming 4.enable auto compaction by nodetool 5.set compaction threshold by nodetool
  • 10. What type is the compaction? 1. Minor 2. Major 3. Single-sstable compactions 4. Anti compaction
  • 11. What type is the compaction? 1. Minor This compaction runs automatically in the background. • daemon started • after flashing memtables • after streaming
  • 12. What type is the compaction? 2. Major This compaction is only called by size tiered compaction. cf.) org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy#getMaximalTask The Other compaction is called by “nodetool compact”, but major compaction is not executed. *n. minor compaction is executed. cf.) org.apache.cassandra.db.compaction.DateTieredCompactionStrategy#getMaximalTask org.apache.cassandra.db.compaction.LeveledCompactionStrategy#getMaximalTask
  • 13. What type is the compaction? 3. Single-sstable compactions This Compaction is executed one by one every SSTable. nodetool upgradesstables nodetool scrub nodetool cleanup
  • 14. What type is the compaction? 4. Anti compaction This Compaction is for incremental repairs. After executing incremantal repairs, An anticompaction is called. *After 2.1
  • 15. Compaction Strategy 1. SizeTieredCompactionStrategy For write-intensive workloads 2. LeveledCompactionStrategy For read-intensive workloads 3. DateTieredCompactionStrategy For time series data and expiring (TTL) data
  • 16. Size Tiered Compaction Strategy When Some SSTables became the similar size, they are merged. (default is 4.) SSTable SSTable SSTable SSTable SSTable SSTable SSTable SSTable SSTable
  • 17. Leveled Compaction Strategy SSTable SSTable SSTable SSTable SSTable SSTableLebel0 SSTableLebel1 SSTable SSTable SSTableLebel2 SSTable The data which isn't read so much.
  • 18. DateTieredCompactionStrategy Default:1hour The basic idea of DTCS is to group SSTables in windows based on how old the data is in the SSTable. sstable sstable sstable sstable sstable windows windows now sstable 4 sstables 4 sstables
  • 19. Merge SSTable by Compaction When Some SSTables became the similar size, they are merged. (default is 4.) Name: John Address: Osaka Address: Tokyo Tel: xxx-xxx ages: 20 Name: John Address: Tokyo ages: 20