SlideShare a Scribd company logo
1 of 24
How We Make Adding
and Removing Nodes
Faster and Safer
Asias He, Software Developer
Presenter
Asias He, Software Developer
Asias He is a software developer with over 10 years of experience
in programming. In the past, he worked on Debian Project,
Solaris Kernel, KVM Virtualization for Linux, OSv unikernel. He
now works on Seastar and ScyllaDB.
Overview of the
node operations
1. Replace operation
■ To replace a dead node
● Token ring does not change
● Use same tokens and host_id of the replaced node
■ Suffer from the resumable issue
● Replace failed after 99% of data is streamed
● Run replace again, we need to stream all the data again
■ Suffer from the "not streaming latest copy” issue
● Stream data from only one of the replicas which might not have the latest copy
● Stream from all the replicas solve the problem
● Too heavy and wasteful to stream the same data more than once
Replace operation: Not streaming latest copy
■ What do we expect from a QUORUM read that follows a QUORUM
write?
● Strong Consistency: Write CL + Read CL > RF
● X2 is newer than X1
Node1
=X2
Node2
=X1 Correct quorum read!
Node3
=X2
■ Node 3 dies and Node 4 replaces it
■ What do we expect from a QUORUM read that follows a QUORUM
write?
Replace operation: Not streaming latest copy
Node1
=X2
Node2
=X1
Node4
(replacing)
=?
■ This is what you would expect:
Replace operation: Not streaming latest copy
Node4
(replacing)
=X2
Stream: X2Node1
=X2
Node2
=X1
■ This is what (may) happen:
Replace operation: Not streaming latest copy
Stream: X1
Node1
=X2
Node2
=X1
Node4
(replacing)
=X1
■ This is what (may) happen:
Replace operation: Not streaming latest copy
Wrong quorum read!
Node1
=X2
Node2
=X1
Node4
(replacing)
=X1
■ What if we have to replace again before repair is run?
■ Node 1 dies and Node 5 replaces it
Replace operation: Not streaming latest copy
New data was lost!
Node5
(replacing)
=X1
Node2
=X1
Node4
(replacing)
=X1
Stream: X1
2. Rebuild operation
■ To get all the data this node owns from other nodes
● E.g., rebuild a new DC
● Token ring does not change
■ Suffer from the resumable issue
■ Suffer from the "not streaming latest copy” issue
● Stream data from only one of the replicas which might not have the latest copy
3. Removenode operation
■ To remove a dead node out of the cluster
● Token ring changes
■ Suffer from the resumable issue
■ Suffer from the "not streaming latest copy” issue
● Remaining nodes pull data from other nodes for the new ranges they own
● Stream from only one of the replicas which might not have the latest copy
4. Decommission operation
■ To remove a live node from the cluster
● Token ring changes
■ Suffer from the resumable issue
■ Do not suffer from the “not streaming latest copy” issue
● The leaving node pushes data to other nodes which are the new owner
Stream: Y2
Node1
=X2
Node3
(leaving)
=X2, Y2
Node2
=Y2
Stream: X2
Node 1: new owner of range for X2
Node 2: new owner of range for Y2
Node 3: loses the range for X2 and Y2
5. Bootstrap operation
■ To add a new node into the cluster
● Token ring changes
■ Suffer from the resumable issue
■ Do not suffer from the "not streaming latest copy” issue
● New node pulls data from existing nodes that are losing the token ranges
Stream: Y2
Node1
=X2
Node3
(joining)
=X2, Y2
Node2
=Y2
Stream: X2
Node 1: loses range for X2
Node 2: loses range for Y2
Node 3: new owner of the range for X2 and Y2
Node operation summary
Node
Operations
Token ring
change
Resumable
issue
Latest copy
issue
Replace No Yes Yes
Rebuild No Yes Yes
Removenode Yes Yes Yes
Decommission Yes Yes No
Bootstrap Yes Yes No
Solutions to
the problems
Repair based node operations
The idea is: use repair to sync data
between replicas instead of streaming
Benefits of repair based node operation
■ Latest copy is guaranteed
● The operation node will always have the latest copy
■ Resumable in nature
● Repair skips the already synced data very quickly
● E.g., Restart the replace operation from where it failed
■ No extra data is streamed
● E.g., rebuild twice, will not stream the same data twice
■ Free repair during node operations
● No need to run repair before/after the node operations
● Simplify the procedure and reduce the chance to make mistakes
■ Unify code path for node operations and repair
● Retire regular streaming code
■ The way you operate the cluster stays the same
● You can still use the nodetool rebuild, decommission commands
Isn’t repair a heavy operation
■ Node operations assume the data is already consistent
● The job to make data consistent is repair
● We recommend repair before node operations
● Repair + streaming won’t be faster than doing only repair
■ Old repair is not fast enough: (partition level repair)
● Over-streaming problem
● Granularity is ~100 partitions
■ New repair is fast (row level repair introduced in Scylla 3.1 )
● No overstreaming
● Only mismatched rows are synced
● Foundation of the repair based node operations
Optimize repair for node operations
■ Increased the internal row buffer size
● Old 256 KiB (3.1) to current 32 MiB(3.2+)
● Good for cross DC cluster with high latency links
■ Improved data transfer efficiency between nodes
● From rpc verb (3.1) to rpc stream (3.2+)
● More efficient to transfer large amount of data
● Same as regular stream based node operations
Test results
Repair v.s. Stream based rebuild operation 1/2
Rebuild from 1 DC Method Space after rebuild Time to rebuild Notes
us-east Stream 26GB 573s Stream 10% of vnode
ranges at a time
us-east Repair 26GB 368s Repair works on more
vnode ranges in parallel,
1.5X less time
● 3 nodes in the cluster, 1 node per DC, 3 DCs
● AWS, i3.2xlarge
● 150M partitions on each node
● RF = { eu-west=1, us-east=1, us-west-2=1 }
● 80 ms latency
● Run rebuild on DC us-west-2
Repair v.s. Stream based rebuild operation 2/2
Rebuild from 2 DCs Method Space after rebuild Time to rebuild Notes
us-east and eu-west Stream 39GB 1500s Two rebuild operations,
streams 2X data, total time
573 + 927 = 1500s
us-east and eu-west Repair 26GB 611s Single rebuild to sync from
two DCs, streams no extra
data, 2.5x less time
● 3 nodes in the cluster, 3 DCs, 1 node per DC
● AWS, i3.2xlarge
● 150M partitions on each node
● RF = { eu-west=1,us-east=1,us-west-2=1 }
● 80ms latency
● Run rebuild on DC use-west-2
Thank you Stay in touch
Any questions?
Asias He
asias@scylladb.com
@asias_he

More Related Content

What's hot

C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinDataStax Academy
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaGuozhang Wang
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanVerverica
 
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...HostedbyConfluent
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutOpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutSaju Madhavan
 
Eventually, Scylla Chooses Consistency
Eventually, Scylla Chooses ConsistencyEventually, Scylla Chooses Consistency
Eventually, Scylla Chooses ConsistencyScyllaDB
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesYoshinori Matsunobu
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...HostedbyConfluent
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles BackupsScyllaDB
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafkaconfluent
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterScyllaDB
 
MariaDB MaxScale
MariaDB MaxScaleMariaDB MaxScale
MariaDB MaxScaleMariaDB plc
 

What's hot (20)

C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutOpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
 
Eventually, Scylla Chooses Consistency
Eventually, Scylla Chooses ConsistencyEventually, Scylla Chooses Consistency
Eventually, Scylla Chooses Consistency
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles Backups
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
MariaDB MaxScale
MariaDB MaxScaleMariaDB MaxScale
MariaDB MaxScale
 

Similar to How Scylla Make Adding and Removing Nodes Faster and Safer

Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolversinside-BigData.com
 
Fast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating SystemsFast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating SystemsRuhaim Izmeth
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introductionShehaaz Saif
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overviewSean Murphy
 
Large scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutionsLarge scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutionsHan Zhou
 
Making the Most Out of ScyllaDB's Awesome Concurrency at Optimizely
Making the Most Out of ScyllaDB's Awesome Concurrency at OptimizelyMaking the Most Out of ScyllaDB's Awesome Concurrency at Optimizely
Making the Most Out of ScyllaDB's Awesome Concurrency at OptimizelyScyllaDB
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScyllaDB
 
How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDBPingCAP
 
DB reading group may 16, 2018
DB reading group may 16, 2018DB reading group may 16, 2018
DB reading group may 16, 2018Keisuke Suzuki
 
osdi23_slides_lo_v2.pdf
osdi23_slides_lo_v2.pdfosdi23_slides_lo_v2.pdf
osdi23_slides_lo_v2.pdfgmdvmk
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...Codemotion
 
Scala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkScala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkDemi Ben-Ari
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleMariaDB plc
 
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...Nesreen K. Ahmed
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScyllaDB
 
S3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkS3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkDemi Ben-Ari
 

Similar to How Scylla Make Adding and Removing Nodes Faster and Safer (20)

Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolvers
 
Fast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating SystemsFast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating Systems
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introduction
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overview
 
Large scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutionsLarge scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutions
 
Making the Most Out of ScyllaDB's Awesome Concurrency at Optimizely
Making the Most Out of ScyllaDB's Awesome Concurrency at OptimizelyMaking the Most Out of ScyllaDB's Awesome Concurrency at Optimizely
Making the Most Out of ScyllaDB's Awesome Concurrency at Optimizely
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
 
How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDB
 
DB reading group may 16, 2018
DB reading group may 16, 2018DB reading group may 16, 2018
DB reading group may 16, 2018
 
osdi23_slides_lo_v2.pdf
osdi23_slides_lo_v2.pdfosdi23_slides_lo_v2.pdf
osdi23_slides_lo_v2.pdf
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
 
Scala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkScala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache spark
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScale
 
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
 
C100 k and go
C100 k and goC100 k and go
C100 k and go
 
10 replication
10 replication10 replication
10 replication
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
 
S3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkS3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using spark
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

How Scylla Make Adding and Removing Nodes Faster and Safer

  • 1. How We Make Adding and Removing Nodes Faster and Safer Asias He, Software Developer
  • 2. Presenter Asias He, Software Developer Asias He is a software developer with over 10 years of experience in programming. In the past, he worked on Debian Project, Solaris Kernel, KVM Virtualization for Linux, OSv unikernel. He now works on Seastar and ScyllaDB.
  • 3. Overview of the node operations
  • 4. 1. Replace operation ■ To replace a dead node ● Token ring does not change ● Use same tokens and host_id of the replaced node ■ Suffer from the resumable issue ● Replace failed after 99% of data is streamed ● Run replace again, we need to stream all the data again ■ Suffer from the "not streaming latest copy” issue ● Stream data from only one of the replicas which might not have the latest copy ● Stream from all the replicas solve the problem ● Too heavy and wasteful to stream the same data more than once
  • 5. Replace operation: Not streaming latest copy ■ What do we expect from a QUORUM read that follows a QUORUM write? ● Strong Consistency: Write CL + Read CL > RF ● X2 is newer than X1 Node1 =X2 Node2 =X1 Correct quorum read! Node3 =X2
  • 6. ■ Node 3 dies and Node 4 replaces it ■ What do we expect from a QUORUM read that follows a QUORUM write? Replace operation: Not streaming latest copy Node1 =X2 Node2 =X1 Node4 (replacing) =?
  • 7. ■ This is what you would expect: Replace operation: Not streaming latest copy Node4 (replacing) =X2 Stream: X2Node1 =X2 Node2 =X1
  • 8. ■ This is what (may) happen: Replace operation: Not streaming latest copy Stream: X1 Node1 =X2 Node2 =X1 Node4 (replacing) =X1
  • 9. ■ This is what (may) happen: Replace operation: Not streaming latest copy Wrong quorum read! Node1 =X2 Node2 =X1 Node4 (replacing) =X1
  • 10. ■ What if we have to replace again before repair is run? ■ Node 1 dies and Node 5 replaces it Replace operation: Not streaming latest copy New data was lost! Node5 (replacing) =X1 Node2 =X1 Node4 (replacing) =X1 Stream: X1
  • 11. 2. Rebuild operation ■ To get all the data this node owns from other nodes ● E.g., rebuild a new DC ● Token ring does not change ■ Suffer from the resumable issue ■ Suffer from the "not streaming latest copy” issue ● Stream data from only one of the replicas which might not have the latest copy
  • 12. 3. Removenode operation ■ To remove a dead node out of the cluster ● Token ring changes ■ Suffer from the resumable issue ■ Suffer from the "not streaming latest copy” issue ● Remaining nodes pull data from other nodes for the new ranges they own ● Stream from only one of the replicas which might not have the latest copy
  • 13. 4. Decommission operation ■ To remove a live node from the cluster ● Token ring changes ■ Suffer from the resumable issue ■ Do not suffer from the “not streaming latest copy” issue ● The leaving node pushes data to other nodes which are the new owner Stream: Y2 Node1 =X2 Node3 (leaving) =X2, Y2 Node2 =Y2 Stream: X2 Node 1: new owner of range for X2 Node 2: new owner of range for Y2 Node 3: loses the range for X2 and Y2
  • 14. 5. Bootstrap operation ■ To add a new node into the cluster ● Token ring changes ■ Suffer from the resumable issue ■ Do not suffer from the "not streaming latest copy” issue ● New node pulls data from existing nodes that are losing the token ranges Stream: Y2 Node1 =X2 Node3 (joining) =X2, Y2 Node2 =Y2 Stream: X2 Node 1: loses range for X2 Node 2: loses range for Y2 Node 3: new owner of the range for X2 and Y2
  • 15. Node operation summary Node Operations Token ring change Resumable issue Latest copy issue Replace No Yes Yes Rebuild No Yes Yes Removenode Yes Yes Yes Decommission Yes Yes No Bootstrap Yes Yes No
  • 17. Repair based node operations The idea is: use repair to sync data between replicas instead of streaming
  • 18. Benefits of repair based node operation ■ Latest copy is guaranteed ● The operation node will always have the latest copy ■ Resumable in nature ● Repair skips the already synced data very quickly ● E.g., Restart the replace operation from where it failed ■ No extra data is streamed ● E.g., rebuild twice, will not stream the same data twice ■ Free repair during node operations ● No need to run repair before/after the node operations ● Simplify the procedure and reduce the chance to make mistakes ■ Unify code path for node operations and repair ● Retire regular streaming code ■ The way you operate the cluster stays the same ● You can still use the nodetool rebuild, decommission commands
  • 19. Isn’t repair a heavy operation ■ Node operations assume the data is already consistent ● The job to make data consistent is repair ● We recommend repair before node operations ● Repair + streaming won’t be faster than doing only repair ■ Old repair is not fast enough: (partition level repair) ● Over-streaming problem ● Granularity is ~100 partitions ■ New repair is fast (row level repair introduced in Scylla 3.1 ) ● No overstreaming ● Only mismatched rows are synced ● Foundation of the repair based node operations
  • 20. Optimize repair for node operations ■ Increased the internal row buffer size ● Old 256 KiB (3.1) to current 32 MiB(3.2+) ● Good for cross DC cluster with high latency links ■ Improved data transfer efficiency between nodes ● From rpc verb (3.1) to rpc stream (3.2+) ● More efficient to transfer large amount of data ● Same as regular stream based node operations
  • 22. Repair v.s. Stream based rebuild operation 1/2 Rebuild from 1 DC Method Space after rebuild Time to rebuild Notes us-east Stream 26GB 573s Stream 10% of vnode ranges at a time us-east Repair 26GB 368s Repair works on more vnode ranges in parallel, 1.5X less time ● 3 nodes in the cluster, 1 node per DC, 3 DCs ● AWS, i3.2xlarge ● 150M partitions on each node ● RF = { eu-west=1, us-east=1, us-west-2=1 } ● 80 ms latency ● Run rebuild on DC us-west-2
  • 23. Repair v.s. Stream based rebuild operation 2/2 Rebuild from 2 DCs Method Space after rebuild Time to rebuild Notes us-east and eu-west Stream 39GB 1500s Two rebuild operations, streams 2X data, total time 573 + 927 = 1500s us-east and eu-west Repair 26GB 611s Single rebuild to sync from two DCs, streams no extra data, 2.5x less time ● 3 nodes in the cluster, 3 DCs, 1 node per DC ● AWS, i3.2xlarge ● 150M partitions on each node ● RF = { eu-west=1,us-east=1,us-west-2=1 } ● 80ms latency ● Run rebuild on DC use-west-2
  • 24. Thank you Stay in touch Any questions? Asias He asias@scylladb.com @asias_he

Editor's Notes

  1. Here is an example of not streaming latest copy issue. We have 3 nodes. We do a quorum write. The second node missed the write When we do a quorum read. We will still have the correct result.
  2. This is what you would expect. We would stream the latest copy to the replacing node.
  3. But this what may happen. We stream the old copy to the replacing node.
  4. As a result, the quorum read will be wrong.
  5. In this case, node 1 dies and node 5 replaces it. Unfortunately, node2 streams the old copy to the replacing node. As a result, the new data was lost!
  6. .
  7. The first test is to rebuild from 1DC. 3 nodes in the cluster, ... As we can see, in this test, the repair based operation is actually faster. This is mainly because, internally the repair works on more vnode ranges in parallel than streaming. In streaming, we stream 10% of the vnode ranges at a time. But there is only one pending stream plan in parallel. However, even if repair based rebuild is slower in some cases, for instance, not considering the parallel contribution, it would be acceptable, because repair has more work to do and it is much much safer.
  8. The second test is to rebuild from 2 DCs. For the stream based operation, We have to perform two rebuilds operations. It streams 2 times more data to the rebuilding node. For the repair based operation, we only need to perform one rebuild operation to sync from two DCS. It streams no extra data and uses less time. The time difference is around 2.5 times.