Fail-Safe Cluster for FirebirdSQL and something more


With HQbird for Firebird it is possible to create a high-availability cluster or a warm-standby solution. This presentation defines the problem and describes ways to create such solutions.


  1. Fail-Safe Cluster for FirebirdSQL and something more
     Alex Koviazin, IBSurgeon
     www.ib-aid.com
  2. • Replication, recovery and optimization for Firebird since 2002
     • In Brazil since 2006 (Firebase)
     • Platinum Sponsor of the Firebird Foundation
     • Based in Moscow, Russia
     www.ib-aid.com
  3. Agenda
     • In general: what is a fail-safe cluster?
     • What do we need to build a fail-safe cluster?
     • Overview of Firebird versions with high availability
     • Trigger-based and native replication as the basis for a fail-safe cluster
     • How HQbird Enterprise implements a fail-safe cluster
     • Alternatives to a fail-safe cluster: mirror / warm standby
     • Special offer for FDD visitors
  4. What is a fail-safe cluster?
     [Diagram: users 1..N connected to a single database service]
  5. What is a fail-safe cluster internally?
     [Diagram: users 1..N connect to a Firebird master; a Firebird replica stands by]
  6. What is a fail-safe cluster?
     [Diagram: users connect to the Firebird master, which synchronizes with the Firebird replica]
  7. When the master fails, users reconnect to the new master
     [Diagram: after failover, users 1..N reconnect to the former replica, now acting as master]
  8. For a fail-safe cluster we need:
     1. At the server side:
        • synchronization of master and replica
        • a monitoring and switching mechanism
     2. At the application side:
        • a reconnection cycle in case of disconnects (the web is a good example)
  9. A database fail-safe cluster is NOT:
     1) An automatically deployed solution (setup is complex, 2-3 hours of work)
     2) A truly scalable solution
     3) Multi-master replication (replicas are read-only)
     4) A geographically distributed solution (a fast network connection is required)
  10. How the fail-safe cluster works: applications
      From the application's point of view:
      1. End-user applications connect to the Firebird database and work as usual
      2. If the Firebird database becomes unavailable, end-user applications try to reconnect to the same host (gracefully, with no error screens; see the sketch below)
      3. End-user applications should use caching (CachedUpdates) to avoid losing uncommitted data
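      To make step 2 concrete, here is a minimal sketch of such a reconnection cycle, assuming the Python fdb driver for Firebird; the host name, database path, credentials and retry counts are illustrative assumptions, not part of the original slides.

          import time
          import fdb  # Python driver for Firebird ('pip install fdb')

          # Hypothetical DSN: in a failover setup, the DNS name
          # 'db-master.example.com' is repointed to the promoted replica,
          # so retrying the same host eventually reaches the new master.
          DSN = "db-master.example.com:/data/test2.fdb"

          def connect_with_retry(retries=30, delay=2.0):
              """Keep reconnecting to the same host until the new master answers."""
              for _ in range(retries):
                  try:
                      return fdb.connect(dsn=DSN, user="SYSDBA", password="masterkey")
                  except fdb.DatabaseError:
                      # Master is down or DNS has not switched yet: wait and
                      # retry instead of showing an error screen to the user.
                      time.sleep(delay)
              raise RuntimeError("cluster did not come back within the retry window")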
  11. How the fail-safe cluster works: server
      1. All nodes monitor each other: the replica watches the master
      2. If the master fails, the replica:
         a) makes a "confirmation shot" at the master (fencing, so the old master cannot come back up as a second master)
         b) promotes itself to master via a script plus a DNS change (a watchdog sketch follows)
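      The monitoring loop above can be pictured with a simplified watchdog sketch; the liveness probe and the two helper actions (stop_firebird_on, point_dns_to) are illustrative stubs, not HQbird's actual mechanism.

          import time
          import fdb

          MASTER_DSN = "db-master.example.com:/data/test2.fdb"

          def master_alive():
              """Probe the master by opening and closing a real connection."""
              try:
                  fdb.connect(dsn=MASTER_DSN, user="SYSDBA", password="masterkey").close()
                  return True
              except fdb.DatabaseError:
                  return False

          def stop_firebird_on(host):
              ...  # fencing ("confirmation shot"), e.g. SSH + service stop; stub

          def point_dns_to(host):
              ...  # repoint the master DNS record to this replica; stub

          # Watchdog loop running on the first replica in the priority list:
          while True:
              if not master_alive():
                  stop_firebird_on("db-master.example.com")  # a) confirmation shot
                  point_dns_to("db-replica1.example.com")    # b) become the master
                  break
              time.sleep(5)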
  12. After the replica fails, there is no cluster anymore! Just a single database!
  13. Minimal fail-safe cluster
      [Diagram: Firebird master + Firebird replica 1 + Firebird replica 2, each with a monitoring agent]
  14. Minimal fail-safe cluster
      [Diagram: after failover, the old master is out; replica 1 has become master 2, and replica 2 keeps replicating]
  15. What is a fail-safe cluster good for?
      • Web applications:
        • the web caches data intensively
        • tolerant to reconnects
        • many small queries
      • Complex database systems with administrator support
  16. HOW TO CREATE A FAIL-SAFE CLUSTER IN FIREBIRD
  17. Two main things implement a fail-safe cluster in Firebird:
      1. Synchronization between master and replica
      2. Monitoring and switching to the new server
  18. Synchronization
      Native:
      • Suitable for high load
      • Easy setup
      • Strict master-slave, no conflicts
      • DDL replication
      • Very low synchronization delay (seconds)
      Trigger-based:
      • <200 users
      • Difficult setup
      • Mixed mode, conflicts are possible
      • No DDL replication
      • Delay is 1-2 minutes
  19. Comparison of Firebird versions with replication

                                    HQbird              RedDatabase      Avalerion   Firebird
      Status (July 2016)            Production          Production       Beta        Alpha
      Supported Firebird version    2.5, 3.0            2.5              3.0         4.0
      ODS                           11.2, 12.0          11.3             ?           13.0
      Compatibility with Firebird   100%                Backup-restore   ?           100%
      Synchronous                   Yes                 Yes              No          Yes
      Asynchronous                  Yes                 Yes              Yes         Yes
      Price per master+replica      R$1990;             ~US$3500         ?           Free (in 2018)
                                    R$1100 (FDD only!)

      http://ib-aid.com/fdd2016
  20. Native replication in HQbird
      [Diagram: Firebird master replicating to Firebird replica]
      1. No backup/restore needed
      2. Sync and async options
      3. Easy setup (configuration only)
      Delay: sync ~1-3 seconds, async ~1 minute
  21. Steps to set up replication
      SYNCHRONOUS:
      1. Stop Firebird
      2. Copy the database to the replica(s)
      3. Configure replication at the master
      4. Configure replication at the replica
      5. Start Firebird at master and replica
      ASYNCHRONOUS:
      1. Stop Firebird
      2. Copy the database on the same computer
      3. Configure replication at the master
      4. Start Firebird
      5. Copy and configure at the replica(s)
  22. Configuration for replication (replication.conf)
      Sync:
        <database c:\data\test2.fdb>
           replica_database sysdba:masterkey@replicaserver:/data/testrepl2.fdb
        </database>
      Async:
        <database c:\db\wmaster.fdb>
           log_directory d:\db\logs
           log_archive_directory d:\db\archlogs
           log_archive_command "copy $(logpathname) $(archpathname)"
        </database>
  23. Downtime to set up replication
      • Synchronous: time to copy the database between servers + time to set up
      • Asynchronous: time to copy the database on the same server
        • suitable for geographically distributed systems
      ASYNC is much faster to set up!
  24. Monitoring and switching from master to replica
      • HQbird (Windows, Linux)
      • Pacemaker (Linux only)
      • Other tools (self-developed)
  25. Elections and impeachment
      • Not about presidential elections :)
      • Replicas must elect a new master after the old master fails
      Priority list: MASTER1, REPLICA1, REPLICA2, REPLICA3, REPLICA4
  26. Algorithm (see the sketch after this list)
      1. We have a priority list: master, replica 1, replica 2
      2. Each node watches all other nodes
      3. If the master node fails, the first available replica takes over:
         1. Disables the original master (stops its Firebird server)
         2. Changes the replica's DNS to the master's
         3. Changes the other servers' configurations
         4. Restarts the Firebird server at the new master for the changes to take effect
         5. Sends notifications
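      A minimal sketch of the election rule in steps 1-3, under the slide's priority-list assumption; the node names and the node_alive probe are illustrative, not HQbird's API.

          # Priority list from slide 25: the first live node becomes the master.
          PRIORITY = ["master1", "replica1", "replica2", "replica3", "replica4"]

          def node_alive(node):
              ...  # illustrative stub: try a Firebird connection or a ping

          def elect_master(priority=PRIORITY):
              """Return the highest-priority node that is still alive."""
              for node in priority:
                  if node_alive(node):
                      return node
              raise RuntimeError("no node available: the cluster is gone")

          # If the winner is not the current master, the takeover steps run:
          # stop Firebird on the old master, change DNS, reconfigure the other
          # servers, restart Firebird on the new master, send notifications.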
  27. After a failure: re-initialization
      • Downtime is needed!
      • Stop all nodes
      • Copy the database to the new/reinitialized node
      • Reconfigure the replication configuration
      • Start Firebird at all nodes
      Too complex?
  28. A fail-safe cluster is a custom solution
      • HQbird gives options; you decide how to implement them
      • Requires careful planning and implementation
      • Requires high-quality infrastructure
      • Requires 2+ replicas
      • A network outage of 10 seconds will cause all replicas to degrade!
      • High-speed network and high-speed drives
  29. Do we really need a cluster and high availability?!
  30. Why we REALLY need high availability
      • If the database is big (20-100 GB):
        • automatic recovery with FirstAID is very slow (downtime > 8 hours) or impossible (>100 GB)
        • manual recovery is expensive (US$1000+) and slow (1 day+)
      • Big database = big company:
        • losses are not tolerated
        • downtime is very expensive ($$$)
  31. Mirror (warm standby): easy to set up, a reliable mirror (one or more)
      [Diagram: Firebird master ships replica segments 1..N to the Firebird replica; both are monitored]
      Based on async replication
  32. Can be geographically distributed (FTP, Amazon, cloud backup)
      [Diagram: the same master-to-replica segment transfer, with segments delivered via FTP or cloud storage]
  33. Licenses
      HQbird Enterprise licenses needed for a FAIL-SAFE CLUSTER:
      master + replica, plus 0.5 license per each additional replica
      For the minimal cluster (1 master and 2 replicas):
      1 HQbird Enterprise + 0.5 HQbird Enterprise = 1.5 licenses
      MIRROR (warm standby) = master + replica:
      1 HQbird Enterprise
  34. Thank you!
  35. Synchronous replication
      Solution:
      • Changes are buffered per transaction, transferred in batches, synchronized at commit
      • Follows the master's priority of locking
      • Replication errors can either interrupt operations or just detach the replica
      • The replica is available for read-only queries (with caveats)
      • Instant takeover (Heartbeat, Pacemaker)
      Issues:
      • Additional CPU and I/O load on the slave side
      • The replica cannot be recreated online
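      The per-transaction buffering just described can be sketched as follows; the send_to_replica transport is an assumed stand-in for HQbird's internal protocol, not its real API.

          class SyncReplicatedTxn:
              """Sketch: each transaction buffers its changes and ships the
              whole batch at commit; the commit completes only after the
              replica acknowledges (hence 'synchronous')."""

              def __init__(self, txn_id, send_to_replica):
                  self.txn_id = txn_id
                  self.buffer = []                 # "Buffer N" for "Txn N"
                  self.send_to_replica = send_to_replica

              def record_change(self, change):
                  self.buffer.append(change)       # buffered, nothing sent yet

              def commit(self):
                  # Transfer the whole batch and wait for the replica to apply
                  # it; only then does the commit complete.
                  ack = self.send_to_replica(self.txn_id, self.buffer)
                  if not ack:
                      # per the slide: interrupt operations or detach the replica
                      raise RuntimeError("replica did not acknowledge the batch")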
  36. Synchronous replication
      [Diagram: Txn 1..N on the master DB each fill Buffer 1..N; the buffers are applied as Txn 1..N on the replica DB]
  37. Asynchronous replication
      Solution:
      • Changes are synchronously journalled on the master side
      • The journal is better placed on a separate disk
      • The journal consists of multiple segments (files) that are rotated
      • Filled segments are transferred to the slave and applied to the replica in the background
      • The replica can be recreated online
      Issues:
      • The replica always lags behind under load
      • Takeover is not immediate
      • Read-only queries work with historical data
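      The journal/segment mechanics above can be sketched like this; the segment size, file naming and archive callable are illustrative assumptions (compare log_archive_command in replication.conf on slide 22).

          import os

          SEGMENT_SIZE = 16 * 1024 * 1024        # rotate after ~16 MB (assumed)

          class Journal:
              """Sketch: changes are journalled synchronously on the master;
              filled segments are rotated and handed to an archive step that
              ships them to the replica in the background."""

              def __init__(self, log_dir, archive):
                  self.log_dir = log_dir
                  self.archive = archive           # callable that ships a segment
                  self.seq = 0
                  self._open_segment()

              def _open_segment(self):
                  self.seq += 1
                  path = os.path.join(self.log_dir, "segment_%08d.jrn" % self.seq)
                  self.current = open(path, "ab")

              def append(self, change):
                  self.current.write(change)       # synchronous journalling
                  if self.current.tell() >= SEGMENT_SIZE:
                      self.current.close()
                      self.archive(self.current.name)  # transferred in background
                      self._open_segment()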
  38. Asynchronous replication
      [Diagram: Txn 1..N fill Buffer 1..N on the master DB; the buffers feed the journal, whose filled segments go to the archive]
  39. Asynchronous replication
      [Diagram: archived segments are applied to the replica DB as Txn 1..N]
