SlideShare a Scribd company logo
How We Made Scylla
Maintenance Easier Safer and Faster
Asias He
Asias He
Software Developer, ScyllaDB
Asias He is a long-time open source developer who previously worked on
Debian Project, Solaris Kernel, KVM Virtualization for Linux and OSv
unikernel. He now works on Seastar and Scylla.
2
Agenda
▪ Motivation
▪ Scylla maintenance improvements
▪ Summary
3
Motivation
4
Motivation
▪ Maintenance is an important job to operate Scylla
▪ Require significant data movement
▪ Maintain minimum impact on your workload
▪ Set a goal to make operating Scylla as easy as possible
5
Towards better elasticity and
maintainability
▪ Scylla scales horizontally and vertically
▪ Users enjoy the performance of Scylla
▪ But maintenance has rough edges
• Some maintenance operations are complicated and not
resilient enough
• Compaction causes longer time to add node
6
Scylla Maintenance
Improvements
7
Seedless
▪ Seed nodes
• Help the discovery of the cluster and propagation of gossip
• Special role breaks “every node is symmetric” architecture
• Special treatment and procedures
▪ The seed concept is removed (Scylla 4.3)
• Seed config --seed-provider-parameters
• Remains for compatibility
• Serves as contact points
• Used once when adding new node to the cluster
8
Seedless
▪ Simplified cluster operations
• No more special treatment for seed nodes
• Seed node bootstraps
• Dangerous auto_bootstrap is gone
• No need to replace a dead seed node
• No need to promote a non-seed to seed
• No need to update seed config after node is bootstrapped
• Easier planing
• No more selecting 2 or 3 nodes per DC as seed nodes
▪ More details on blog post
9
Improvements in row-level repair
▪ Introduced since Scylla 3.1
• Repair at the granularity of row instead of partitions
▪ Continuous performance and stability improvements
• Generate RF -1 times less sstables during repair (Scylla 4.4)
• Before: one sstable per peer nodes per vnode
• After: one sstable for all peer nodes per vnode
• Reduce reactor stalls to reduce latency impact
• Futurize operations take time
• Switch to btree_set to store repair_hash
10
Node2
Node1
Node 3
SST for Node1
SST for Node2
Node2
Node1
Node 3
SST for Node1
& Node2
Improvements in row-level repair
▪ Continuous performance and stability
improvements
• Use UUID to identify a repair job
• Synchronous API to query repair status
• Progress report on metrics
▪ More row level repair internals on blog post
11
Repair based node operations
▪ Use repair for all node operations
• Use row level repair as the underlying mechanism to
sync data between nodes instead of streaming
• Single mechanism for all the node operations
• Bootstrap / replace / rebuild
• Decommission / removenode
• Repair
The
Speaker’s
camera
displays
here
12
Repair based node operations
▪ Significant improvements on performance and data safety
• Resumable
• Resume from previous failed bootstrap ops
• Consistency
• Latest replica is guaranteed
• Simplified
• No need to run repair before or after node operations like replace and removenode
• Unified
• All node ops use the same underlying mechanism
▪ More details:
The
Speaker’s
camera
displays
here
13
Redesigned replace operation
▪ Replace an existing dead node
• Common operation to recover a node
▪ Replace operation before Scylla 4.1
• The replace operation takes time to sync data
• Replacing node takes no writes during the operation
• Missing the writes during that time
• Repair is needed after the replace operation for the missing writes
14
Redesigned replace operation
▪ Simpler faster and safer replace operation
(Scylla 4.1)
• No missing writes on the replacing node
• Ensured by the “replacing node takes writes” feature
• Replacing node has latest replica
• Ensured by the “repair based node ops” feature
• Note: Disabled by default, use --enable-repair-based-
node-ops true to enable
• With the above two features
• No repair is needed before or after the replace operation
15
Make removenode safe by default
▪ Current removenode works in “best-effort mode”
• Remove a dead node from the cluster
• Succeed even if part of the data owned by removed
node has not been synced to new owners
• Might cause data consistency issues
16
Make removenode safe by default
▪ With “safe mode” feature (Scylla 4.4)
• Fail the operation if it is not safe
• By default, require all relevant nodes to be alive
• Allow ignore dead nodes explicitly to run in best-effort mode
• E.g., nodetool removenode --ignore-nodes node1,node2
• More resilient in case of error
• Automatically revert to previous state
17
▪ Introduce off-strategy compaction
• Work-in-Progress
• Sstables generated by node operations
are kept in a separate data set
• Compact them together and integrate to main
set when node operation is done
• Less compaction work during node operations
• Faster to add / remove node
Smarter compaction during
node operations
18
No compaction
during replace with
off-strategy
Compaction during
replace without off-
strategy
▪ Introduce bandwidth and IOPS
limiter for repair and compaction
• Work-in-Progress
• Limit the max disk bandwidth and
IOPS allowed
• More control over speed of repair
and other node operations like
add or remove node
• Less impact on your workload
Better control of repair and
compaction intensity
19
20 MB/s Cap
10 MB/s Cap
40 MB/s Cap
▪ Backup throughput is bounded
▪ De-duplicates at file level
▪ Scylla manager and the agent runs under cgroup
to reduce interfere with Scylla Core
Scylla Manager Backup and
Repair improvements
20
▪ Faster repair with Scylla Manager 2.2
• Run repair jobs on multiple nodes in parallel
• Ensures that each node takes part in at most one Scylla
repair job at all times
• Allow adjustment of repair intensity to a node capabilities
• Changing the repair speed in flight
• Small table optimization
• More on blog
Scylla Manager Backup and
Repair improvements
21
Summary
22
Summary
▪ We covered recent improvements to make
maintenance easier safer and faster
• Seedless
• Repair based node operations & redesigned replace / removenode operation
• Off-strategy compaction & IO rate limiter
• Scylla Manger improvements
▪ We will continue to invest more to make operating scylla
as easy as possible
• Add / remove multiple nodes in parallel
• Raft based schema and topology change
• And more …
23
Thank You
@asias_he
asias@scylladb.com
Asias He
24
Download Scylla Open Source:
scylladb.com/download
Talk to an expert:
scylladb.com/consultation
Take a test drive:
scylladb.com/test-drive
Experience Scylla for Yourself
25

More Related Content

What's hot

Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
Spark Summit
 
M|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change MethodsM|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change Methods
MariaDB plc
 
ARM Architecture for Kernel Development
ARM Architecture for Kernel DevelopmentARM Architecture for Kernel Development
ARM Architecture for Kernel Development
GlobalLogic Ukraine
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1
Tanel Poder
 
NoSQL Data Modeling Foundations — Introducing Concepts & Principles
NoSQL Data Modeling Foundations — Introducing Concepts & PrinciplesNoSQL Data Modeling Foundations — Introducing Concepts & Principles
NoSQL Data Modeling Foundations — Introducing Concepts & Principles
ScyllaDB
 
地理分散DBについて
地理分散DBについて地理分散DBについて
地理分散DBについて
Kumazaki Hiroki
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataProblems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Jignesh Shah
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
Mydbops
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
Yoshinori Matsunobu
 
[Oracle DBA & Developer Day 2012] 高可用性システムに適した管理性と性能を向上させるASM と RMAN の魅力
[Oracle DBA & Developer Day 2012] 高可用性システムに適した管理性と性能を向上させるASM と RMAN の魅力[Oracle DBA & Developer Day 2012] 高可用性システムに適した管理性と性能を向上させるASM と RMAN の魅力
[Oracle DBA & Developer Day 2012] 高可用性システムに適した管理性と性能を向上させるASM と RMAN の魅力
オラクルエンジニア通信
 
Azure Synapse Analytics 専用SQL Poolベストプラクティス
Azure Synapse Analytics 専用SQL PoolベストプラクティスAzure Synapse Analytics 専用SQL Poolベストプラクティス
Azure Synapse Analytics 専用SQL Poolベストプラクティス
Microsoft
 
State transfer With Galera
State transfer With GaleraState transfer With Galera
State transfer With Galera
Mydbops
 
Linux Locking Mechanisms
Linux Locking MechanismsLinux Locking Mechanisms
Linux Locking Mechanisms
Kernel TLV
 
Zero Downtime Schema Changes - Galera Cluster - Best Practices
Zero Downtime Schema Changes - Galera Cluster - Best PracticesZero Downtime Schema Changes - Galera Cluster - Best Practices
Zero Downtime Schema Changes - Galera Cluster - Best Practices
Severalnines
 
Oracle Performance Tools of the Trade
Oracle Performance Tools of the TradeOracle Performance Tools of the Trade
Oracle Performance Tools of the Trade
Carlos Sierra
 
作って(壊して?)学ぶインターネットのしくみ サイバーエージェントの実験用ASの紹介 / Introduce experimental AS in ...
作って(壊して?)学ぶインターネットのしくみ サイバーエージェントの実験用ASの紹介 / Introduce experimental AS in ...作って(壊して?)学ぶインターネットのしくみ サイバーエージェントの実験用ASの紹介 / Introduce experimental AS in ...
作って(壊して?)学ぶインターネットのしくみ サイバーエージェントの実験用ASの紹介 / Introduce experimental AS in ...
whywaita
 
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
Jean-François Gagné
 
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
NTT DATA Technology & Innovation
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0
J.B. Langston
 

What's hot (20)

MySQL at Yahoo! JAPAN #dbts2018
MySQL at Yahoo! JAPAN #dbts2018MySQL at Yahoo! JAPAN #dbts2018
MySQL at Yahoo! JAPAN #dbts2018
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
 
M|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change MethodsM|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change Methods
 
ARM Architecture for Kernel Development
ARM Architecture for Kernel DevelopmentARM Architecture for Kernel Development
ARM Architecture for Kernel Development
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1
 
NoSQL Data Modeling Foundations — Introducing Concepts & Principles
NoSQL Data Modeling Foundations — Introducing Concepts & PrinciplesNoSQL Data Modeling Foundations — Introducing Concepts & Principles
NoSQL Data Modeling Foundations — Introducing Concepts & Principles
 
地理分散DBについて
地理分散DBについて地理分散DBについて
地理分散DBについて
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataProblems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
[Oracle DBA & Developer Day 2012] 高可用性システムに適した管理性と性能を向上させるASM と RMAN の魅力
[Oracle DBA & Developer Day 2012] 高可用性システムに適した管理性と性能を向上させるASM と RMAN の魅力[Oracle DBA & Developer Day 2012] 高可用性システムに適した管理性と性能を向上させるASM と RMAN の魅力
[Oracle DBA & Developer Day 2012] 高可用性システムに適した管理性と性能を向上させるASM と RMAN の魅力
 
Azure Synapse Analytics 専用SQL Poolベストプラクティス
Azure Synapse Analytics 専用SQL PoolベストプラクティスAzure Synapse Analytics 専用SQL Poolベストプラクティス
Azure Synapse Analytics 専用SQL Poolベストプラクティス
 
State transfer With Galera
State transfer With GaleraState transfer With Galera
State transfer With Galera
 
Linux Locking Mechanisms
Linux Locking MechanismsLinux Locking Mechanisms
Linux Locking Mechanisms
 
Zero Downtime Schema Changes - Galera Cluster - Best Practices
Zero Downtime Schema Changes - Galera Cluster - Best PracticesZero Downtime Schema Changes - Galera Cluster - Best Practices
Zero Downtime Schema Changes - Galera Cluster - Best Practices
 
Oracle Performance Tools of the Trade
Oracle Performance Tools of the TradeOracle Performance Tools of the Trade
Oracle Performance Tools of the Trade
 
作って(壊して?)学ぶインターネットのしくみ サイバーエージェントの実験用ASの紹介 / Introduce experimental AS in ...
作って(壊して?)学ぶインターネットのしくみ サイバーエージェントの実験用ASの紹介 / Introduce experimental AS in ...作って(壊して?)学ぶインターネットのしくみ サイバーエージェントの実験用ASの紹介 / Introduce experimental AS in ...
作って(壊して?)学ぶインターネットのしくみ サイバーエージェントの実験用ASの紹介 / Introduce experimental AS in ...
 
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
 
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0
 

Similar to How We Made Scylla Maintenance Easier, Safer and Faster

Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2
ScyllaDB
 
High Availability with MySQL
High Availability with MySQLHigh Availability with MySQL
High Availability with MySQL
Thava Alagu
 
High-Load Storage of Users’ Actions with ScyllaDB and HDDs
High-Load Storage of Users’ Actions with ScyllaDB and HDDsHigh-Load Storage of Users’ Actions with ScyllaDB and HDDs
High-Load Storage of Users’ Actions with ScyllaDB and HDDs
ScyllaDB
 
Scalabe MySQL Infrastructure
Scalabe MySQL InfrastructureScalabe MySQL Infrastructure
Scalabe MySQL Infrastructure
Balazs Pocze
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Kristofferson A
 
Feed Burner Scalability
Feed Burner ScalabilityFeed Burner Scalability
Feed Burner Scalability
didip
 
Meeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with ScyllaMeeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with Scylla
ScyllaDB
 
Moving mongo db to the cloud strategies and points to consider
Moving mongo db to the cloud  strategies and points to considerMoving mongo db to the cloud  strategies and points to consider
Moving mongo db to the cloud strategies and points to consider
Vinicius M Grippa
 
Xen_and_Rails_deployment
Xen_and_Rails_deploymentXen_and_Rails_deployment
Xen_and_Rails_deployment
Abhishek Singh
 
Our Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudOur Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent Cloud
HostedbyConfluent
 
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
xKinAnx
 
Scylla on Kubernetes: Introducing the Scylla Operator
Scylla on Kubernetes: Introducing the Scylla OperatorScylla on Kubernetes: Introducing the Scylla Operator
Scylla on Kubernetes: Introducing the Scylla Operator
ScyllaDB
 
MySQL Aquarium Paris
MySQL Aquarium ParisMySQL Aquarium Paris
MySQL Aquarium Paris
Alexis Moussine-Pouchkine
 
Galera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replicationGalera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replication
Codership Oy - Creators of Galera Cluster
 
VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld
 
NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5
UniFabric
 
MySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery PlanningMySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery Planning
Lenz Grimmer
 
LAS16-101: Efficient kernel backporting
LAS16-101: Efficient kernel backportingLAS16-101: Efficient kernel backporting
LAS16-101: Efficient kernel backporting
Linaro
 
Cat on demand emc vplex weakness
Cat on demand emc vplex weaknessCat on demand emc vplex weakness
Cat on demand emc vplex weakness
Sahatma Siallagan
 

Similar to How We Made Scylla Maintenance Easier, Safer and Faster (20)

Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2
 
High Availability with MySQL
High Availability with MySQLHigh Availability with MySQL
High Availability with MySQL
 
High-Load Storage of Users’ Actions with ScyllaDB and HDDs
High-Load Storage of Users’ Actions with ScyllaDB and HDDsHigh-Load Storage of Users’ Actions with ScyllaDB and HDDs
High-Load Storage of Users’ Actions with ScyllaDB and HDDs
 
Scalabe MySQL Infrastructure
Scalabe MySQL InfrastructureScalabe MySQL Infrastructure
Scalabe MySQL Infrastructure
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Feed Burner Scalability
Feed Burner ScalabilityFeed Burner Scalability
Feed Burner Scalability
 
Meeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with ScyllaMeeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with Scylla
 
Moving mongo db to the cloud strategies and points to consider
Moving mongo db to the cloud  strategies and points to considerMoving mongo db to the cloud  strategies and points to consider
Moving mongo db to the cloud strategies and points to consider
 
Xen_and_Rails_deployment
Xen_and_Rails_deploymentXen_and_Rails_deployment
Xen_and_Rails_deployment
 
Our Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudOur Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent Cloud
 
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
Ibm spectrum scale fundamentals workshop for americas part 2 IBM Spectrum Sca...
 
Scylla on Kubernetes: Introducing the Scylla Operator
Scylla on Kubernetes: Introducing the Scylla OperatorScylla on Kubernetes: Introducing the Scylla Operator
Scylla on Kubernetes: Introducing the Scylla Operator
 
MySQL Aquarium Paris
MySQL Aquarium ParisMySQL Aquarium Paris
MySQL Aquarium Paris
 
Galera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replicationGalera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replication
 
VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers
 
NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5
 
MySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery PlanningMySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery Planning
 
LAS16-101: Efficient kernel backporting
LAS16-101: Efficient kernel backportingLAS16-101: Efficient kernel backporting
LAS16-101: Efficient kernel backporting
 
Cat on demand emc vplex weakness
Cat on demand emc vplex weaknessCat on demand emc vplex weakness
Cat on demand emc vplex weakness
 

More from ScyllaDB

Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
ScyllaDB
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
ScyllaDB
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
ScyllaDB
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
ScyllaDB
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
ScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
ScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
ScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
ScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
ScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
ScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
ScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
ScyllaDB
 

More from ScyllaDB (20)

Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 

Recently uploaded

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 

Recently uploaded (20)

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 

How We Made Scylla Maintenance Easier, Safer and Faster

  • 1. How We Made Scylla Maintenance Easier Safer and Faster Asias He
  • 2. Asias He Software Developer, ScyllaDB Asias He is a long-time open source developer who previously worked on Debian Project, Solaris Kernel, KVM Virtualization for Linux and OSv unikernel. He now works on Seastar and Scylla. 2
  • 3. Agenda ▪ Motivation ▪ Scylla maintenance improvements ▪ Summary 3
  • 5. Motivation ▪ Maintenance is an important job to operate Scylla ▪ Require significant data movement ▪ Maintain minimum impact on your workload ▪ Set a goal to make operating Scylla as easy as possible 5
  • 6. Towards better elasticity and maintainability ▪ Scylla scales horizontally and vertically ▪ Users enjoy the performance of Scylla ▪ But maintenance has rough edges • Some maintenance operations are complicated and not resilient enough • Compaction causes longer time to add node 6
  • 8. Seedless ▪ Seed nodes • Help the discovery of the cluster and propagation of gossip • Special role breaks “every node is symmetric” architecture • Special treatment and procedures ▪ The seed concept is removed (Scylla 4.3) • Seed config --seed-provider-parameters • Remains for compatibility • Serves as contact points • Used once when adding new node to the cluster 8
  • 9. Seedless ▪ Simplified cluster operations • No more special treatment for seed nodes • Seed node bootstraps • Dangerous auto_bootstrap is gone • No need to replace a dead seed node • No need to promote a non-seed to seed • No need to update seed config after node is bootstrapped • Easier planing • No more selecting 2 or 3 nodes per DC as seed nodes ▪ More details on blog post 9
  • 10. Improvements in row-level repair ▪ Introduced since Scylla 3.1 • Repair at the granularity of row instead of partitions ▪ Continuous performance and stability improvements • Generate RF -1 times less sstables during repair (Scylla 4.4) • Before: one sstable per peer nodes per vnode • After: one sstable for all peer nodes per vnode • Reduce reactor stalls to reduce latency impact • Futurize operations take time • Switch to btree_set to store repair_hash 10 Node2 Node1 Node 3 SST for Node1 SST for Node2 Node2 Node1 Node 3 SST for Node1 & Node2
  • 11. Improvements in row-level repair ▪ Continuous performance and stability improvements • Use UUID to identify a repair job • Synchronous API to query repair status • Progress report on metrics ▪ More row level repair internals on blog post 11
  • 12. Repair based node operations ▪ Use repair for all node operations • Use row level repair as the underlying mechanism to sync data between nodes instead of streaming • Single mechanism for all the node operations • Bootstrap / replace / rebuild • Decommission / removenode • Repair The Speaker’s camera displays here 12
  • 13. Repair based node operations ▪ Significant improvements on performance and data safety • Resumable • Resume from previous failed bootstrap ops • Consistency • Latest replica is guaranteed • Simplified • No need to run repair before or after node operations like replace and removenode • Unified • All node ops use the same underlying mechanism ▪ More details: The Speaker’s camera displays here 13
  • 14. Redesigned replace operation ▪ Replace an existing dead node • Common operation to recover a node ▪ Replace operation before Scylla 4.1 • The replace operation takes time to sync data • Replacing node takes no writes during the operation • Missing the writes during that time • Repair is needed after the replace operation for the missing writes 14
  • 15. Redesigned replace operation ▪ Simpler faster and safer replace operation (Scylla 4.1) • No missing writes on the replacing node • Ensured by the “replacing node takes writes” feature • Replacing node has latest replica • Ensured by the “repair based node ops” feature • Note: Disabled by default, use --enable-repair-based- node-ops true to enable • With the above two features • No repair is needed before or after the replace operation 15
  • 16. Make removenode safe by default ▪ Current removenode works in “best-effort mode” • Remove a dead node from the cluster • Succeed even if part of the data owned by removed node has not been synced to new owners • Might cause data consistency issues 16
  • 17. Make removenode safe by default ▪ With “safe mode” feature (Scylla 4.4) • Fail the operation if it is not safe • By default, require all relevant nodes to be alive • Allow ignore dead nodes explicitly to run in best-effort mode • E.g., nodetool removenode --ignore-nodes node1,node2 • More resilient in case of error • Automatically revert to previous state 17
  • 18. ▪ Introduce off-strategy compaction • Work-in-Progress • Sstables generated by node operations are kept in a separate data set • Compact them together and integrate to main set when node operation is done • Less compaction work during node operations • Faster to add / remove node Smarter compaction during node operations 18 No compaction during replace with off-strategy Compaction during replace without off- strategy
  • 19. ▪ Introduce bandwidth and IOPS limiter for repair and compaction • Work-in-Progress • Limit the max disk bandwidth and IOPS allowed • More control over speed of repair and other node operations like add or remove node • Less impact on your workload Better control of repair and compaction intensity 19 20 MB/s Cap 10 MB/s Cap 40 MB/s Cap
  • 20. ▪ Backup throughput is bounded ▪ De-duplicates at file level ▪ Scylla manager and the agent runs under cgroup to reduce interfere with Scylla Core Scylla Manager Backup and Repair improvements 20
  • 21. ▪ Faster repair with Scylla Manager 2.2 • Run repair jobs on multiple nodes in parallel • Ensures that each node takes part in at most one Scylla repair job at all times • Allow adjustment of repair intensity to a node capabilities • Changing the repair speed in flight • Small table optimization • More on blog Scylla Manager Backup and Repair improvements 21
  • 23. Summary ▪ We covered recent improvements to make maintenance easier safer and faster • Seedless • Repair based node operations & redesigned replace / removenode operation • Off-strategy compaction & IO rate limiter • Scylla Manger improvements ▪ We will continue to invest more to make operating scylla as easy as possible • Add / remove multiple nodes in parallel • Raft based schema and topology change • And more … 23
  • 25. Download Scylla Open Source: scylladb.com/download Talk to an expert: scylladb.com/consultation Take a test drive: scylladb.com/test-drive Experience Scylla for Yourself 25

Editor's Notes

  1. Version: Scylla Summit 2021-Internal-Maintenance-Asias-Final-V2-Recording-2020-12-21