YDB is an open-source distributed SQL database available under the Apache 2.0 license. It scales across thousands of nodes and is designed for continuous availability. Version migration is a natural process for every software system. In the majority of use cases, YDB serves as a mission-critical OLTP database that cannot afford maintenance windows and must remain available during version migrations. In this talk we briefly describe YDB's layered architecture and share some tricks to minimize database unavailability during minor and major version migrations, from the point of view of both the YDB server and applications.
3. 3
YDB — Open-Source
Distributed SQL
Database
• Relational
ACID OLTP transactions
• Consistency
Strongly consistent
Serializable transaction isolation level
• Mission critical database
Works for projects with 24x7 requirements
• Highly available
Survives the failure of an AZ plus a rack without human
intervention and stays available for reads and writes
4. 4
Users operate on tables
to read and write data
• Tables have a primary key (PK)
• Tables are sorted by PK
• Tables can grow
to petabytes of data
• Tables are automatically
partitioned
[Figure: a table with rows 1 to N and columns 1 to N, ordered by primary key and divided at split points into partitions 1 to N]
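Because tables are sorted by primary key and partitioned at split points, locating the partition for a key reduces to a binary search over the split points. The sketch below illustrates that idea only; the function name `find_partition`, the sample split points, and the choice that a key equal to a split point belongs to the partition on its right are all assumptions, not YDB's actual implementation.

```python
from bisect import bisect_right


def find_partition(split_points, key):
    """Return the 0-based index of the partition that would hold `key`.

    `split_points` must be sorted. With N split points there are N + 1
    partitions. Assumption: a key equal to a split point goes to the
    partition on the right of that split point.
    """
    return bisect_right(split_points, key)


# Hypothetical split points: three of them define four partitions.
SPLIT_POINTS = [100, 200, 300]

for sample_key in (50, 100, 250, 999):
    print(sample_key, "-> partition", find_partition(SPLIT_POINTS, sample_key))
```

Splitting a partition in this model only means inserting one more value into the sorted list of split points, which is why sorted storage makes automatic partitioning cheap to describe.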
5. 7
All software requires updates
[Figure: a single DB server (App1 and App2 connected to one DB node) next to a DB cluster (App1 to App4 connected to DB nodes 1 to 3)]
6. 8
What is a rolling update
[Figure: App1, App2, and App3 connected to compute nodes 1 to 3]
7. 9
Rolling update challenges
1. Minimize error rate
2. Keep read and write availability
3. Minimize latency degradation
15. 17
Rolling update is a challenging task
1. Minimize error rate
2. Keep read and write availability
3. Minimize latency degradation
16. 18
Restart storage nodes one by one
[Figure: a rolling restart tool working on a cluster spread across AZ1, AZ2, and AZ3, with storage servers 1 to 3 and compute nodes 100 to 105]
17. 19
YDB helps to restart storage
Storage restart constraints
• Keep read and write availability
• Restart as fast as possible
Large cluster restart is long
• 1 node restarts in about 10s
• A 1k-node cluster restarts in about 3h
[Diagram: rolling restart tool and YDB Cluster Management System]
1: restart storage nodes
2: token + list of all nodes
3: request for restart
4: list of nodes safe to restart
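The restart-time figures and the request/response exchange above can be sketched in a few lines. This is a toy model under stated assumptions: `ToyCms`, `nodes_safe_to_restart`, `max_unavailable`, and `rolling_restart` are hypothetical names, the token from step 2 is left out, and the real Cluster Management System decides which nodes are safe from actual cluster state rather than a fixed limit.

```python
def sequential_restart_hours(nodes, seconds_per_node=10):
    """Time to restart `nodes` one by one, in hours."""
    return nodes * seconds_per_node / 3600


class ToyCms:
    """Hypothetical stand-in for the Cluster Management System."""

    def __init__(self, max_unavailable):
        # Assumption: availability is preserved while at most this many
        # nodes are down at the same time.
        self.max_unavailable = max_unavailable

    def nodes_safe_to_restart(self, pending):
        # Steps 3 and 4: a request for restart is answered with the
        # list of nodes that are safe to restart right now.
        return pending[: self.max_unavailable]


def rolling_restart(nodes, cms, restart):
    """Restart every node in CMS-approved batches; return the round count."""
    pending = list(nodes)
    rounds = 0
    while pending:
        safe = cms.nodes_safe_to_restart(pending)
        if not safe:
            raise RuntimeError("CMS granted no nodes; cannot make progress")
        for node in safe:
            restart(node)
        pending = [n for n in pending if n not in safe]
        rounds += 1
    return rounds


# 1000 nodes * 10 s = 10,000 s, a bit under 3 hours when done one by one.
print(f"{sequential_restart_hours(1000):.2f} h")
```

In this model the total time is proportional to the number of rounds, so every extra node the CMS can safely release per round shortens the restart of a large cluster.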
18. 20
CMS controls compute restarts
Compute restart constraints
• Keep enough resources
• Restart as fast as possible
Restart multiple databases
• Databases are independent
• Different databases can be restarted simultaneously
[Diagram: rolling restart tool and YDB Cluster Management System]
1: restart compute nodes for some databases
2: token + list of all nodes
3: request for restart
4: list of nodes safe to restart
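Since databases are independent, their compute nodes can be restarted in parallel, while each database keeps enough nodes running. The sketch below is one way to picture that, not how YDB or its CMS is implemented: the names `restart_database`, `restart_all_databases`, and `min_available`, and the rule "keep at least `min_available` nodes up per database", are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def restart_database(nodes, min_available, restart):
    """Restart one database's compute nodes in batches.

    Assumption: "enough resources" means at least `min_available` nodes
    of this database stay up while a batch is restarting.
    """
    batch_size = len(nodes) - min_available
    if batch_size < 1:
        raise ValueError("not enough nodes to restart any of them safely")
    batches = [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]
    for batch in batches:
        for node in batch:
            restart(node)
    return batches


def restart_all_databases(databases, min_available, restart):
    """Restart independent databases simultaneously, one worker each."""
    with ThreadPoolExecutor(max_workers=max(1, len(databases))) as pool:
        futures = {
            name: pool.submit(restart_database, nodes, min_available, restart)
            for name, nodes in databases.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

With this shape, the wall-clock time is set by the slowest database rather than by the sum over all databases.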
19. 21
CMS helps keep
the cluster available
Cluster Management System
• YDB Component
• Protects cluster
from unavailability
• Enables restart orchestration
25. 27
YDB solves challenges
• Cluster Management System
Tracks availability
Speeds up restart
• Session and tablet evacuation
Reduces errors
26. 28
Current challenges
• PostgreSQL wire and syntax compatibility (C++)
We are building PostgreSQL compatibility
to simplify migration for our users
• OLAP engine (C++)
We are building an OLAP engine
to improve the analytical experience
• Column-organized tables (C++)
We are building column-organized tables
to improve the performance of analytical queries
• K8S deployment (Go)
We are building our own k8s operator to make
it easy to deploy our database
in a cloud environment
27. 29
It’s easy
to try YDB
ydb.tech
• Managed service in Nebius Cloud
• On-premises deployment options