Dokumen.tips edb postgres-failover-manager-guide-get-failover-manager-requires-that-postgresql

EDB Postgres Failover Manager Guide
EDB Postgres Failover Manager Version 3.4
January 23, 2019

Copy right © 2013 - 2019 EnterpriseDB Corporation. All rights reserv ed.
EDB Postgres Failover Manager Guide, Version 3.4
by EnterpriseDB Corporation
Copyright © 2013 -2019 EnterpriseDBCorporation. All rights reserved.
EnterpriseDB Corporation, 34 Crosby Drive Suite 201, Bedford, MA 01730, USA
T +1 781 357 3390 F +1 978 467 1307 E info@enterprisedb.com www.enterprisedb.com

Table of Contents
1 Introduction......................................................................................................................5
1.1 What’s New..............................................................................................................6
1.2 Typographical Conventions Used in this Guide....................................................7
2 Failover Manager - Overview.........................................................................................8
2.1 Supported Platforms..............................................................................................10
2.2 Prerequisites...........................................................................................................11
2.3 Tutorial - Configuring a Simple Failover Manager Cluster................................14
3 Installing and Configuring Failover Manager..............................................................18
3.1 Installing an RPM Package on a RedHat, CentOS,or OEL Host ......................18
3.1.1 Installation Locations........................................................................................20
3.2 Installing an RPM Package on a Debian or Ubuntu Host...................................21
3.3 Installing an RPM Package on a SLES Host.......................................................22
3.4 Extending FailoverManager Permissions............................................................23
3.4.1 Running FailoverManager without sudo.........................................................25
3.5 Configuring FailoverManager..............................................................................27
3.5.1 The Cluster Properties File ...............................................................................27
3.5.1.1 Specifying Cluster Properties...................................................................28
3.5.1.2 Encrypting Your Database Password.......................................................45
3.5.2 The Cluster Members File.................................................................................47
3.6 Using Failover Manager with Virtual IP Addresses............................................48
4 Using Failover Manager...............................................................................................52
4.1 Managing a FailoverManager Cluster.................................................................52
4.1.1 Starting the FailoverManager Cluster.............................................................53
4.1.2 Adding Nodes to a Cluster................................................................................53
4.1.3 Changing the Priority ofa Standby..................................................................54
4.1.4 Promoting a FailoverManager Node...............................................................55
4.1.5 Stopping a Failover ManagerAgent................................................................56
4.1.6 Stopping a Failover Manager Cluster...............................................................57
4.1.7 Removing a Node froma Cluster.....................................................................57
4.2 Monitoring a Failover Manager Cluster...............................................................58
4.2.1 Reviewing the Cluster Status Report...............................................................58
4.2.2 Monitoring Streaming Replication with Postgres Enterprise Manager.........61

4.3 Running Multiple Agents on a Single Node........................................................64
4.3.1 RHEL 6.xor CentOS 6.x..................................................................................66
4.3.2 RHEL 7.xor CentOS 7.x..................................................................................67
5 Controlling the FailoverManager Service...................................................................68
5.1 Using the service Utility on RHEL 6.x and CentOS 6.x.....................................68
5.2 Using the systemctl Utility on RHEL 7.xand CentOS 7.x.................................70
5.3 Using the efmUtility.............................................................................................71
6 Controlling Logging.......................................................................................................75
6.1 Enabling syslog Log File Entries..........................................................................76
7 Notifications..................................................................................................................78
8 Supported Failover and Failure Scenarios....................................................................85
8.1 Master Database is Down......................................................................................86
8.2 Standby Database is Down....................................................................................88
8.3 MasterAgent Exits or Node Fails.........................................................................89
8.4 Standby Agent Exits or Node Fails ......................................................................91
8.5 Dedicated Witness Agent Exits / Node Fails.......................................................92
8.6 Nodes Become Isolated fromthe Cluster.............................................................93
9 Upgrading an Existing Cluster......................................................................................94
9.1 Un-installing FailoverManager............................................................................96
9.2 Performing a Database Update (Minor Version).................................................97
10 Troubleshooting.............................................................................................................98
11 AppendixA - Configuring Streaming Replication......................................................99
11.1 Limited Support for Cascading Replication.......................................................104
12 AppendixB - Configuring SSL Authentication on a FailoverManager Cluster.....105
13 Inquiries ........................................................................................................................107

Copy right © 2013 – 2019 EnterpriseDB Corporation. All rights reserv ed.
5
1 Introduction
EDB Postgres FailoverManager(EFM)is a high-availability module fromEnterpriseDB
that enablesa Postgres Masternode to automatically failoverto a Standbynode in the
event ofa software orhardware failure on theMaster.
This guide providesinformationaboutinstalling,configuringand using Failover
Manager3.4.
This document usesPostgresto mean eitherthe PostgreSQLor EDB PostgresAdvanced
Serverdatabase. Formore information aboutusing EDBPostgres products,please visit
the EnterpriseDBwebsiteat:
http://www.enterprisedb.com/documentation

6
1.1 What’s New
The following changes havebeenmade to EDBPostgres FailoverManagerto create
version 3.4:
 FailoverManagernowallows you touse the master.shutdown.as.failure property
to indicate thatany shutdownofthe agenton themasternode should betreated as
a failure. Fore more information,see Section 3.5.1. A notification hasbeen
added that willalert you when the master.shutdown.as.failure propertyis
set to true.
 Agent exit notificationsare nowa WARNING level; this canhelp drawattentionto
caseswhere an agenthasfailed to restart (forinstance,aftera machine reboot).
Formore information,see Section7.
 FailoverManagerwill retry verifying thata VIP is not in use duringa promotion.
Formore information,see Section3.6.

7
1.2 Typographical Conventions Used in this Guide
Certain typographicalconventionsare used in this manualto clarify the meaning and
usage ofvariouscommands,statements,programs,examples,etc.Thissection providesa
summary ofthese conventions.
In the following descriptionsa termrefers to anyword orgroupofwordsthat are
languagekeywords,user-suppliedvalues,literals,etc.A term’s exact meaning depends
upon thecontext in which it is used.
 Italic font introducesa newterm,typically,in the sentencethatdefinesit forthe
first time.
 Fixed-width (mono-spaced) font is used forterms thatmust be given
literally such as SQLcommands,specific table andcolumn namesused in the
examples,programming language keywords,etc.Forexample, SELECT * FROM
emp;
 Italic fixed-width font is usedforterms forwhich the usermust
substitute valuesin actualusage.Forexample, DELETE FROM table_name;
 A verticalpipe | denotesa choice betweenthe terms oneitherside ofthe pipe.A
verticalpipe is used to separate twoormore alternative termswithin square
brackets (optionalchoices)orbraces(one mandatorychoice).
 Square brackets []denote thatone ornoneofthe enclosedterm(s)may be
substituted.Forexample, [ a | b ], means chooseone of“a” or“b” orneither
of the two.
 Braces {}denote that exactly one ofthe enclosed alternativesmust be specified.
Forexample, { a | b }, means exactly one of“a” or“b” must be specified.
 Ellipses ... denote thatthe proceedingtermmay be repeated.Forexample, [ a |
b ] ... means that youmay havethe sequence,“b a a b a”.

8
2 Failover Manager - Overview
An EDB Postgres FailoverManager(EFM)clusteris comprisedofFailoverManager
processesthat reside onthe following hostson a network:
 A Masternode-The Masternodeis the primary database serverthat is servicing
database clients.
 One ormore Standbynodes -A Standbynodeis a streaming replication server
associatedwith the Masternode.
 A Witness node -The Witnessnode confirms assertionsofeitherthe Masterora
Standbyin a failover scenario. A clusterdoesnotneeda dedicated witnessnode
if the clustercontainsthreeormore nodes;ifyou do not havea third cluster
member that is a database host,youcan add a dedicatedWitnessnode.
Traditionally,a cluster is a single instanceofPostgresmanaging multiple databases. In
this document,the termclusterrefers to a FailoverManagercluster. A FailoverManager
clusterconsistsofa Masteragent,one ormore Standbyagents,andan optionalWitness
agent that reside on serversin a cloud oron a traditionalnetworkand communicateusing
the JGroups toolkit.
Figure 2.1 - A FM scenario employing a Virtual IP address.

9
When a non-witness agentstarts,it connectsto the localdatabase and checksthestateof
the database:
 If the agent cannotreach the database,it will start in idle mode.
 If it finds thatthe database is in recovery,the agent assumes the role of standby;
 If the database is notin recovery,the agent assumes the role ofmaster.
In the event ofa failover,FailoverManagerattemptsto ensure that the promotedstandby
is the most up-to-date standby in the cluster; please note that data lossis possible ifthe
standbynodeis not in sync with the masternode.
JGroups providestechnology thatallows FailoverManagerto createclusterswhose
member nodescan communicatewith each otheranddetectnode failures. Formore
information about JGroups,visit the officialproject siteat:
http://www.jgroups.org
Figure 2.1 illustrates a FailoverManagerclusterthatemploysa virtualIPaddress. You
can use a load balancerin place ofa virtualIPaddressifyou provide yourown fencing
script to re-configure the load balancerin the eventofa failure. Formore information
about usingFailoverManagerwith a virtualIPaddress,see Section 3.6. Formore
information about usinga fencingscript,seeSection3.5.1.

10
2.1 Supported Platforms
FailoverManager3.4is supported onEDB PostgresAdvanced ServerorPostgreSQL
(version 9.3and higher)installations running on:
 CentOS6.x and 7.x
 Red Hat Enterprise Linux6.x and 7.x
 Oracle Enterprise Linux6.x and 7.x
 Red Hat Enterprise Linux(IBM Power8Little Endian orppc64le)7.x
 Debian 9
 SLES 12
 Ubuntu 18.04

11
2.2 Prerequisites
Before configuring a FailoverManagercluster,you mustsatisfy theprerequisites
describedbelow.
Install Java 1.8 (or later)
Before using FailoverManager,you must first installJava (version1.8or later). Failover
Manageris testedwith OpenJDK,and we strongly recommend installing thatversionof
Java. InstallationinstructionsforJava are platformspecific; formore information,visit:
https://openjdk.java.net/install/
Provide an SMTP Server
You can receive notificationsfromFailoverManageras specified by a user-defined
notification script,by email,or both.
 If you are using emailnotifications,an SMTPservermust be runningon each
node ofthe FailoverManagerscenario.
 If you provide a valuein the script.notification property,youcanleave the
user.email field blank; an SMTPserveris not required.
If an event occurs, FailoverManagerinvokesthe script (ifprovided),and sends a
notification emailto any emailaddressesspecified in the user.email parameterofthe
clusterpropertiesfile. Formore information about usingan SMTPserver,visit:
https://access.redhat.com/site/documentation
Formore information,see Section3.5.1.1.
Configure Streaming Replication
FailoverManagerrequires thatPostgreSQLstreaming replication be configured between
the Masternode and the Standbynode ornodes. FailoverManagerdoesnot support
othertypesofreplication.
Unless specified with the -sourcenode option,a recovery.conf file is copied froma
randomstandbynode to the stoppedmasterduringswitchover. You should ensure that
the pathswithin therecovery.conf files on yourstandbynodes are consistent before
performing a switchover. Formore information aboutthe -sourcenode option,please
see Section 4.1.4.
Please note that FailoverManagerdoesnot supportautomatic reconfigurationofthe
standbydatabasesaftera failoverifyou use replication slots tomanageyourWAL

12
segments. Ifyou use replicationslots,youshould set the auto.reconfigure parameter
to false,and manually reconfigure thestandbyserversin the eventofa failover.
Modify the pg_hba.conf File
You must modify the pg_hba.conf file on the MasterandStandbynodes,adding
entries thatallowcommunicationbetween the allofthe nodesin the cluster. The
following example demonstratesentries thatmight be made to the pg_hba.conf file on
the Masternode:
# access for itself
host fmdb efm 127.0.0.1/32 md5
# access for standby
host fmdb efm 192.168.27.1/32 md5
# access for witness
host fmdb efm 192.168.27.34/32 md5
Where:
efm specifies the name ofa valid database user.
fmdb specifiesthe name ofa databaseto which theefm usermay connect.
Formore information aboutthe properties file,see Section 3.5.1.
By default,the pg_hba.conf file resides in the data directory,underyourPostgres
installation. Aftermodifying the pg_hba.conf file,you must reloadthe configuration
file on each node forthe changesto take effect. You can use the following command:
# systemctl reload edb-as-x
Where x specifiesthe Postgresversion.
Using Autostartfor the Database Servers
If a Masternodereboots,FailoverManagermay detectthe database is downon the
Masternodeandpromote a Standbynodeto therole ofMaster.Ifthis happens,the
FailoverManageragenton the(rebooted) Masternode willnot get a chanceto write the
recovery.conf file; the rebootedMasternodewill return to the clusteras a second
Masternode.
To prevent this,start the FailoverManageragentbeforestartingthedatabaseserver. The
agent will start in idle mode,and checkto see ifthere is already a masterin the cluster.
If there is a masternode,the agentwill verify that a recovery.conf file exists,and the
database will not startas a secondmaster.

13
Ensure Communication Through Firewalls
If a Linux firewall (i.e. iptables)is enabled on thehostofa FailoverManagernode,
you may need to addrules tothe firewallconfigurationthat allow tcp communication
between the FailoverManagerprocessesin the cluster. Forexample:
# iptables -I INPUT -p tcp --dport 7800:7810 -j ACCEPT
/sbin/service iptables save
The command shown above opensa smallrange ofports(7800 through7810). Failover
Managerwill connect via the port that correspondsto the port specified in the cluster
propertiesfile.
Ensure that the db.user has SufficientPrivileges
The database userspecified in the efm.properties file must have sufficient privileges
to invoke the following functions onbehalfofFailoverManager:
pg_current_wal_lsn()
pg_last_wal_replay_lsn()
pg_wal_replay_pause()
pg_is_wal_replay_paused()
pg_wal_replay_resume()
Fordetailed information abouteachofthesefunctions,please see the PostgreSQLcore
documentation,available at:
https://www.postgresql.org/docs/10/static/index.html

14
2.3 Tutorial - Configuring a Simple Failover Manager Cluster
This tutorialdescribes quickly configuringa FailoverManagerclusterin a test
environment. Othersectionsin this guideprovide key informationthat you should read
and understandbefore configuring FailoverManagerfora productiondeployment.
This tutorialassumesthat:
 A databaseserveris runningand streaming replicationis set up betweena master
and one ortwo standby nodes.
 You have installed FailoverManageron each node. Formore information about
installing FailoverManager,see Section 3.
The example that follows createsa clusternamed efm.
You should start the configuration process on a masterorstandbynode. Then, copythe
configurationfiles to othernodes to save time.
Step1:Create Working ConfigurationFiles
Copy the providedsample files to create EFM configurationfiles,and correct the
ownership:
cd /etc/edb/efm-3.4
cp efm.properties.in efm.properties
cp efm.nodes.in efm.nodes
chown efm:efm efm.properties
chown efm:efm efm.nodes
Step2:Create an Encrypted Password
Create the encryptedpassword (neededforthe properties file):
/usr/edb/efm-3.4/bin/efm encrypt efm
Follow the onscreeninstructionsto produce theencrypted version ofyourdatabase
password.
Step3:Update the efm.properties File
The cluster_name.properties file contains parameters that specify connection
properties andbehaviorsforyourFailoverManagercluster. Modificationsto property
settingsare applied when FailoverManagerstarts.

15
The following propertiesare the minimal propertiesrequired toconfigurea Failover
Managercluster. Ifyou are configuring a productionsystem,pleasesee 3.5.1fora
complete list ofproperties.
Database connection properties(needed evenon thewitnesssoit can connect toother
databases whenneeded):
db.user
db.password.encrypted
db.port
db.database
Ownerof the data directory (usually postgres orenterprisedb):
db.service.owner
Only one ofthe following properties is needed. Ifyou provide the service name,EFM
will use a service command to controlthe databaseserver whennecessary;ifyou
provide the locationofthe Postgres bin directory,EFM will use pg_ctl to controlthe
database server.
db.service.name
db.bin
The data directory in which EFM will find or create recovery.conf files:
db.recovery.conf.dir
Set to receive email notifications (the notification text is also includedin the agent log):
user.email
This is the localaddressofthe nodeand the portto use forEFM. Othernodeswill use
this addressto reach the agent,and the agentwill also use this addressforconnectingto
the localdatabase (as opposedto connectingto localhost). An example ofthe format is
included below:
bind.address=1.2.3.4:7800
Set this property to true on a witnessnodeand false ifit is a masterorstandby:
is.witness
If you are running ona networkwithoutaccessto the Internet,change thisto an address
that is available on yournetwork:
pingServerIp=8.8.8.8

16
When configuringa productioncluster,the following propertiescanbe either true or
false depending onyoursystemconfiguration and usage. Set thembothto true to
simplify startupifyou're configuring an EFM test cluster.
auto.allow.hosts=true
stable.nodes.file=true
Step4:Update the efm.nodes File
The cluster_name.nodes file is read at startupto tellan agenthowto find the rest of
the clusteror,in the caseofthe first node started,can be used to simplify authorizationof
subsequent nodes.
Add the addressesandportsof each nodein the clusterto this file. One node will act as
the membership coordinator;the list should include at least the membership coordinator's
address:
1.2.3.4:7800
1.2.3.5:7800
1.2.3.6:7800
Please note that the FailoverManageragentwill not verify the contentofthe efm.nodes
file; the agent expectsthatsome ofthe addressesin the file cannot be reached(e.g.that
anotheragent hasn’tbeenstarted yet). Formore information about the efm.nodes file,
see Section 3.5.2.
Step5:Configure the Other Nodes
Copy the efm.properties andefm.nodes files to the /etc/edb/efm-3.4 directory
on the othernodes in yoursample cluster. Aftercopyingthefiles,change the file
ownership sothe files are ownedby efm:efm. The efm.properties file can be the
same on every node, except forthe followingproperties:
 Modify the bind.address propertyto use thenode’slocaladdress.
 Set is.witness to true if the node is a witnessnode. Ifthe node is a witness
node,the propertiesrelatingto a localdatabaseinstallationwillbe ignored.
Step6:Startthe EFM Cluster
On any node,start the FailoverManageragent. The agent is namedefm-3.4; youcan
use yourplatform-specific servicecommand to controlthe service. Forexample,on a
CentOSorRHEL 7.x host usethe command:
systemctl start efm-3.4
On a a CentOSorRHEL 6.x host use the command:

17
service efm-3.4 start
Afterthe agent starts,run the following command to see thestatusofthe single-node
cluster. You should see theaddressesofthe othernodesin the Allowed node host
list.
/usr/edb/efm-3.4/bin/efm cluster-status efm
Start the agenton theothernodes.Run the efmcluster-statusefmcommand on any node
to see the clusterstatus.
If any agent fails to start,see thestartup log forinformationaboutwhat wentwrong:
cat /var/log/efm-3.4/startup-efm.log
Performing aSwitchover
If the clusterstatusoutput showsthat themasterandstandby(s)are in sync,youcan
performa switchoverwith the followingcommand:
/usr/edb/efm-3.4/bin/efm promote efm -switchover
That command will promote a standbyand reconfigure the masterdatabase as a new
standbyin the cluster. To switch back,run the command again.
Formore information aboutusingthe efm command line tool,see Section5.3.

18
3 Installing and Configuring Failover
Manager
Before installing and configuringFailoverManager,you must create a Postgres
streaming replicationscenario,andensure that the nodeshave sufficientpermissionsto
communicate with each other. You must also have credentialsthat allowaccessto the
EnterpriseDBrepository.
To requestcredentialsforthe repository,visit theEnterpriseDBAdvanced Downloads
page at:
https://www.enterprisedb.com/advanced-downloads
Follow the links in the EDB FailoverManagertable to requestcredentials.
3.1 Installing an RPM Package on a RedHat, CentOS, or OEL
Host
Afterreceiving yourcredentials,youmust create the EnterpriseDBrepository
configurationfile on each nodeofthe cluster, and thenmodify the file to enable access.
The following stepsprovide detailed informationaboutaccessingthe EnterpriseDB
repository;the stepsmustbe performed on each nodeofthe cluster:
1. Use the edb-repo packageto create the repositoryconfiguration file. You can
downloadand invoke the edb-repo file,or use rpmoryumto create the
repository. Assume superuserprivilegesanduse either rpm oryum to create the
EnterpriseDBrepository configurationfile. :
rpm -Uvh http://yum.enterprisedb.com/edbrepos/edb-repo-
latest.noarch.rpm
or
yum install -y http://yum.enterprisedb.com/edbrepos/edb-
repo-latest.noarch.rpm
The repositoryconfiguration file is named edb.repo; it resides in
/etc/yum.repos.d.
2. Use yourchoiceofeditorto modify the repositoryconfiguration file,enabling the
[enterprisedb-tools] and the [enterprisedb-dependencies]entries.
To enable a repository,change the valueofthe enabledparameterto 1and replace

19
the username and password placeholdersin the baseurlspecification with your
username and the repositorypassword.
[enterprisedb-tools]
name=EnterpriseDB Tools $releasever - $basearch
baseurl=http://<username>:<password>@yum.enterprisedb.com/t
ools/redhat/rhel-$releasever-$basearch
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/ENTERPRISEDB-GPG-KEY
[enterprisedb-dependencies]
name=EnterpriseDB Dependencies $releasever - $basearch
baseurl=http://<username>:<password>@yum.enterprisedb.com/d
ependencies/redhat/rhel-$releasever-$basearch
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/ENTERPRISEDB-GPG-KEY
3. Aftermodifying applicable entries in the repository configurationfile,save the
configurationfile and exit the editor.
Then,you canuse theyuminstallcommand to installFailoverManager. Forexample, to
installFailoverManagerversion 3.4,use the command:
yum install edb-efm34
When you installan RPM package that is signedby a source that is notrecognized by
yoursystem,yummay askforyourpermissionto import the key to yourlocalserver. If
prompted,andyouare satisfiedthatthe packagescome froma trustworthysource,entera
y, and press Return to continue.
During the installation,yummay encountera dependencythatit cannot resolve. Ifit
does,it will provide a list ofthe required dependenciesthat youmustmanually resolve.
FailoverManagermust be installed by root. During theinstallationprocess,the
installerwill also create a usernamed efm that hassufficient privilegesto invoke scripts
that controlthe FailoverManagerservice forclustersownedby enterprisedb or
postgres.
If you are using FailoverManagerto monitora clusterownedby a userotherthan
enterprisedb orpostgres,see Section 3.4,Extending FailoverManager
Permissions.
Afterinstalling FailoverManageron eachnodeofthe cluster,you must:
1. Modify the clusterproperties file on each node. Fordetailed information about
modifying the clusterpropertiesfile,see Section 3.5.1.

20
2. Modify the clustermembers file on each node. Formore information about the
clustermembers file,see Section 3.5.2.
3. If applicable,configure andtest virtualIPaddresssettingsand any scripts thatare
identified in the clusterpropertiesfile.
4. Start the FailoverManageragent oneachnode ofthecluster. Formore
information about controlling theFailoverManagerservice,seeSection 5.
3.1.1 Installation Locations
FailoverManagercomponents are installed in the following locations:
Component Location
Executables /usr/edb/efm-3.4/bin
Libraries /usr/edb/efm-3.4/lib
Cluster configuration files /etc/edb/efm-3.4
Logs /var/log/efm-3.4
Lock files /var/lock/efm-3.4
Log rotation file /etc/logrotate.d/efm-3.4
sudo configuration file /etc/sudoers.d/efm-34
Binary to access VIP without sudo /usr/edb/efm-3.4/bin/secure

21
3.2 Installing an RPM Package on a Debian or Ubuntu Host
To installFailoverManager,youmustalso havecredentials thatallowaccessto the
EnterpriseDBrepository. To request credentialsforthe repository,visit the
EnterpriseDBAdvanced Downloads page at:
Follow the links in the EDB FailoverManagertable to requestcredentials.
The following stepswill walk you throughusingthe EnterpriseDBapt repository to
installFailoverManager. Whenusingthe commands,replacethe username and
password with the credentialsprovided byEnterpriseDB.
1. Assume superuserprivileges:
sudo su -
2. Configure the EnterpriseDBapt repository:
sh -c 'echo "deb
https://username:password@apt.enterprisedb.com/$(lsb_releas
e -cs)-edb/ $(lsb_release -cs) main" >
/etc/apt/sources.list.d/edb-$(lsb_release -cs).list'
3. Add support to yoursystemforsecure APTrepositories:
apt-get install apt-transport-https
4. Add the EDBsigning key:
wget -q -O - https:// username: password
@apt.enterprisedb.com/edb-deb.gpg.key | apt-key add -
5. Update the repositorymeta data:
apt-get update
6. InstallFailoverManager:
apt-get install edb-efm34

22
3.3 Installing an RPM Package on a SLES Host
To installFailoverManager,youmustalso havecredentials thatallowaccessto the
EnterpriseDBrepository. To request credentialsforthe repository,visit theAdvanced
Downloads page at:
You can use thezypper package managerto installa FailoverManageragenton an
SLES 12 host. zypper will attempt to satisfypackagedependencies asit installs a
package,butrequires accesstospecific repositoriesthat are nothostedat EnterpriseDB.
You must assume superuserprivilegesandstopany firewalls beforeinstalling Failover
Manager. Then,usethe following commandsto add EnterpriseDBrepositoriesto your
system:
zypper addrepo http://zypp.enterprisedb.com/suse/epas96-sles.repo
zypper addrepo http://zypp.enterprisedb.com/suse/epas-sles-
tools.repo
zypper addrepo http://zypp.enterprisedb.com/suse/epas-sles-
dependencies.repo
The commands create the repository configurationfiles in the /etc/zypp/repos.d
directory. Then,use the followingcommand torefreshthe metadata onyourSLES host
to include the EnterpriseDBrepository:
zypper refresh
When prompted,providecredentials forthe repository, and specify a to always trust the
provided key,andupdatethe metadata to include the EnterpriseDBrepository.
You must also addSUSEConnect andtheSUSEPackage Hub extension to the SLES
host,andregisterthehostwith SUSE,allowing accessto SUSErepositories. Use the
commands:
zypper install SUSEConnect
SUSEConnect -r registration_number -e user_id
SUSEConnect -p PackageHub/12/x86_64
SUSEConnect -p sle-sdk/12/x86_64
Then,you canuse thezypperutility to installa FailoverManageragent:
zypper install edb-efm34
Fordetailed information aboutregistering a SUSEhost,visit:
https://www.suse.com/support/kb/doc/?id=7016626

23
3.4 Extending Failover Manager Permissions
During the FailoverManagerinstallation,theinstallercreatesa usernamed efm. efm
does nothave sufficientprivilegesto performmanagement functionsthatare normally
limited to the database owneroroperatingsystemsuperuser.
 When performing managementfunctionsrequiring database superuserprivileges,
efm invokes the efm_db_functions script.
 When performing managementfunctionsrequiring operatingsystemsuperuser
privileges,efm invokesthe efm_root_functions script.
 When assigningorreleasinga virtualIPaddress, efm invokestheefm_address
script.
The efm_db_functions orefm_root_functions scriptsperformmanagement
functions onbehalfofthe efm user.
The sudoers file containsentriesthat allowthe userefm to controlthe FailoverManager
service forclustersownedby postgres orenterprisedb. You can modify a copyof
the sudoersfile to grant permissionto manage Postgresclusters ownedby otherusersto
efm.
The efm-34 file is located in /etc/sudoers.d,andcontainsthe followingentries:
# Copyright EnterpriseDB Corporation, 2014-2019. All Rights
# Reserved.
#
# Do not edit this file. Changes to the file may be overwritten
# during an upgrade.
#
# This file assumes you are running your efm cluster as user
# 'efm'. If not, then you will need to copy this file.
# Allow user 'efm' to sudo efm_db_functions as either 'postgres'
# or 'enterprisedb'. If you run your db service under a
# non-default account, you will need to copy this file to grant
# the proper permissions and specify the account in your efm
# cluster properties file by changing the 'db.service.owner'
# property.
efm ALL=(postgres) NOPASSWD: /usr/edb/efm-3.4 /bin/efm_db_functions
efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-3.4
/bin/efm_db_functions
# Allow user 'efm' to sudo efm_root_functions as 'root' to
# write/delete the PID file, validate the db.service.owner
# property, etc.

24
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-3.4 /bin/efm_root_functions
# Allow user 'efm' to sudo efm_address as root for VIP tasks.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-3.4 /bin/efm_address
# relax tty requirement for user 'efm'
Defaults:efm !requiretty
If you are using FailoverManagerto monitorclustersthat are owned byusersotherthan
postgres or enterprisedb,make a copy ofthe efm-34 file, and modify the content
to allow the userto accesstheefm_functions script to manage theirclusters.
If an agent cannot start becauseofpermission problems,make sure the default
/etc/sudoers file containsthefollowing line at the end ofthe file:
## Read drop-in files from /etc/sudoers.d (the # here does not
# mean a comment)
#includedir /etc/sudoers.d

25
3.4.1 Running Failover Manager without sudo
By default,FailoverManageruses sudo to securely manage accessto system
functionality. Ifyou chooseto configure FailoverManagerto run withoutsudoaccess,
please note that root accessis stillrequired to:
 installthe FailoverManagerRPM.
 performFailoverManagersetuptasks.
To run FailoverManagerwithoutsudo,youmustselect a database processownerthat
will have privilegestoperformmanagement functionson behalfofFailoverManager.
The usercouldbe the default database superuser(forexample, enterprisedb or
postgres)or anotherprivilegeduser. Afterselectingthe user:
1. Use the following command to addtheuserto the efm group:
usermod -a -G efm enterprisedb
This should allowthe userto write to /var/run/efm-3.4 and
/var/lock/efm-3.4.
2. If you are reusing a clustername,remove any previously created log files; the
newuserwill not be able to write to log files created bythe default (orother)
owner.
3. Copy the clusterproperties templatefile and the nodestemplate file:
su - enterprisedb
cp /etc/edb/efm-3.4/efm.properties.in
directory/cluster_name.properties
cp /etc/edb/efm-3.4/efm.nodes.in
directory/cluster_name.nodes
Then,modify the clusterpropertiesfile, providing the name ofthe userin the
db.service.owner property. You must also ensure thatthe db.service.name
propertyis blank; without sudo,youcannot runserviceswithoutroot access.
Aftermodifying the configuration,the newuser cancontrolFailoverManagerwith the
following command:
/usr/edb/efm-3.4/bin/runefm.sh start|stop
directory/cluster_name.properties
Where directory/cluster_name.properties specifies the fullpath and name of
the clusterpropertiesfile. Please note thatthe usermust ensure that the fullpath to the
propertiesfile must be provided wheneverthe non-default useris controlling agentsor
using the efm script.

26
To allow the newuserto manage FailoverManageras a service,youmustprovide a
customscript orunit file.
FailoverManageruses a binary named manage-vip that residesin /usr/edb/efm-
3.4/bin/secure/ to performVIP management operationswithout sudo privileges.
This script usessetuid to acquire with the privilegesneededto manage VirtualIP
addresses.
 This directory is only accessible to root and usersin the efm group.
 The binary is only executable by root andthe efm group.
Forsecurity reasons,we recommend againstmodifyingthe accessprivilegesof the
/usr/edb/efm-3.4/bin/secure/ directory orthe manage-vip script.
Formore information aboutusingFailoverManagerwithout sudo,visit:
https://www.enterprisedb.com/blog/running-edb-postgres-failover-manager-without-sudo

27
3.5 Configuring Failover Manager
Configurable FailoverManagerpropertiesare specified in two user-modifiable files:
 efm.properties
 efm.nodes
The efm.properties file containsthe propertiesofthe individualnodeon whichit
resides,while the efm.nodes file containsa list ofthe currentFailoverManagercluster
members. By default,the installerplacesthefiles in the /etc/edb/efm-3.4 directory.
Please note that alluserscriptsreferenced in the propertiesfile will be invoked as the
FailoverManageruser.
3.5.1 The Cluster Properties File
The FailoverManagerinstallercreatesa file template forthe clusterpropertiesfile named
efm.properties.in in the /etc/edb/efm-3.4 directory. Aftercompletingthe
FailoverManagerinstallation,youmustmake a working copy ofthe templatebefore
modifying the file contents. Forexample,the following command copiesthe
efm.properties.in file, creating a propertiesfile named efm.properties:
# cp /etc/edb/efm-3.4/efm.properties.in /etc/edb/efm-3.4/efm.properties
Aftercopying thetemplatefile,change the ownerofthe file to efm:
# chown efm:efm efm.properties
Please note:By default,FailoverManagerexpects the clusterpropertiesfile to be named
efm.properties. If you name the properties file somethingotherthan
efm.properties,you must modify the service script orunit file to instructFailover
Managerto use a different name.
Aftercreating theclusterpropertiesfile,add (ormodify)configurationparametervalues
as required. Fordetailed information about each property,seeSection 3.5.1.1.
The propertyfiles are ownedby root. The FailoverManagerservicescript expectsto
find the files in the /etc/edb/efm-3.4 directory. Ifyou move the propertyfile to
anotherlocation,youmustcreatea symbolic linkthat specifies thenewlocation.

28
3.5.1.1 Specifying Cluster Properties
You can use thepropertieslisted in the clusterpropertiesfile to specify connection
propertiesandbehaviorsforyourFailoverManagercluster. Modificationsto property
settingswill be applied when FailoverManagerstarts. Ifyou modify a propertyvalue
you must restart FailoverManagerto apply the changes.
Property valuesare case-sensitive. Note that while Postgresuses quoted stringsin
parametervalues,FailoverManagerdoes not allowquoted stringsin propertyvalues.
Forexample, while you might specify an IPaddressin a Postgres configuration
parameteras:
listen_addresses='192.168.2.47'
FailoverManagerrequires thatthe value notbe enclosed in quotes:
bind.address=192.168.2.54:7800
Use the properties in the efm.properties file to specify connection,administrative,and
operationaldetails forFailoverManager.
Use the following properties to specify connection details forthe FailoverManager
cluster:
# The value for the password property should be the output from
# 'efm encrypt' -- do not include a cleartext password here. To
# prevent accidental sharing of passwords among clusters, the
# cluster name is incorporated into the encrypted password. If
# you change the cluster name (the name of this file), you must
# encrypt the password again with the new name.
# The db.port property must be the same for all nodes.
db.user=
db.password.encrypted=
db.port=
db.database=
The db.user specified must havesufficient privilegesto invoke selectedPostgreSQL
commands on behalfofFailoverManager. Formore information,please see Section 2.2.
Forinformation aboutencryptingthepassword forthe database user, see Section 3.5.1.2.
Use the db.service.owner propertyto specify thename ofthe operating systemuser
that ownsthe clusterthat is beingmanaged byFailoverManager. This propertyis not
required on a dedicatedwitnessnode.
# This property tells EFM which OS user owns the $PGDATA dir for
# the 'db.database'. By default, the owner is either 'postgres'

29
# for PostgreSQL or 'enterprisedb' for EDB Postgres Advanced
# Server. However, if you have configured your db to run as a
# different user, you will need to copy the /etc/sudoers.d/efm-XX
# conf file to grant the necessary permissions to your db owner.
#
# This username must have write permission to the
# 'db.recovery.conf.dir' specified below.
db.service.owner=
Specify the name ofthe database servicein the db.service.name property ifyou use the
service or systemctl command when startingorstopping the service.
# Specify the proper service name in order to use service
# commands rather than pg_ctl to start/stop/restart a database.
# For example, if this property is set, then 'service <name>
# restart' or 'systemctl restart <name>' (depending on OS
# version) will be used to restart the database rather than
# pg_ctl. This property is required unless db.bin is set.
db.service.name=
You should use thesame service controlmechanism(pg_ctl,service,or
systemctl)each time you start orstopthe database service. Ifyou use the pg_ctl
programto controlthe service,specify thelocation ofthe pg_ctl programin the db.bin
property.
# Specify the directory containing the pg_ctl command, for
# example: /usr/pgsql-9.6/bin. Unless the db.service.name
# property is used, the pg_ctl command is used to
# start/stop/restart databases as needed after a failover or
# switchover. This property is required unless db.service.name
# is set.
db.bin=
Use the db.recovery.conf.dir property to specify the locationto which a recoveryfile
will be written on the Masternodeofthe cluster,and a triggerfile is written on a
Standby. This propertyis not requiredon a dedicated witnessnode.
# Specify the location of the db recovery.conf file on the node.
# On a standby node, the trigger file location is read from the
# file in this directory. After a failover, the recovery.conf
# files on remaining standbys are changed to point to the new
# master db (a copy of the original is made first). On a master
# node, a recovery.conf file will be written during failover and
# promotion to ensure that the master node can not be restarted
# as the master database.

30
db.recovery.conf.dir=
Use the jdbc.sslmode propertyto instruct FailoverManagerto use SSLconnections;
by default,SSLis disabled.
# Use the jdbc.sslmode property to enable ssl for EFM
# connections. Setting this property to anything but 'disable'
# will force the agents to use 'ssl=true' for all JDBC database
# connections (to both local and remote databases).
# Valid values are:
#
# disable - Do not use ssl for connections.
# verify-ca - EFM will perform CA verification before allowing
# the certificate.
# require - Verification will not be performed on the server
# certificate.
jdbc.sslmode=disable
Forinformation aboutconfiguring andusingSSL,please see:
https://www.postgresql.org/docs/10/static/ssl-tcp.html
and
https://jdbc.postgresql.org/documentation/94/ssl.html
Use the user.email propertyto specify anemailaddress (ormultiple email addresses)
that will receive any notificationssent byFailoverManager.
# Email address(es) for notifications. The value of this
# property must be the same across all agents. Multiple email
# addresses must be separated by space. If using a notification
# script instead, this property can be left blank.
user.email=
Use the notification.level propertyto specify the minimumseveritylevelat which
FailoverManagerwill send usernotifications orwhen a notificationscriptis called. For
a complete list ofnotifications,please see Section 7.
# Minimum severity level of notifications that will be sent by
# the agent. The minimum level also applies to the notification
# script (below). Valid values are INFO, WARNING, and SEVERE.
# A list of notifications is grouped by severity in the user's
# guide.
notification.level=INFO
Use the script.notification propertyto specify thepathto a user-supplied script
that actsas a notificationservice;the script willbe passeda message subject anda

31
messagebody. The scriptwill be invoked eachtime FailoverManagergeneratesa user
notification.
# Absolute path to script run for user notifications.
#
# This is an optional user-supplied script that can be used for
# notifications instead of email. This is required if not using
# email notifications. Either/both can be used. The script will
# be passed two parameters: the message subject and the message
# body.
script.notification=
The bind.address property specifies the IPaddressandportnumberofthe agent onthe
current node ofthe FailoverManagercluster.
# This property specifies the ip address and port that jgroups
# will bind to on this node. The value is of the form
# <ip>:<port>.
# Note that the port specified here is used for communicating
# with other nodes, and is not the same as the admin.port below,
# used only to communicate with the local agent to send control
# signals.
# For example, <provide_your_ip_address_here>:7800
bind.address=
Use the admin.port propertyto specify a port on whichFailoverManagerlistens for
administrative commands.
# This property controls the port binding of the administration
# server which is used for some commands (ie cluster-status). The
# default is 7809; you can modify this value if the port is
# already in use.
admin.port=7809
Set the is.witness propertyto true to indicate that thecurrent node is a witnessnode.
If is.witness is true, the localagent willnot checkto seeifa localdatabaseis
running.
# Specifies whether or not this is a witness node. Witness nodes
# do not have local databases running.
is.witness=
The Postgres pg_is_in_recovery() functionis a booleanfunctionthat reportsthe
recovery state ofa database. The function returns true ifthe databaseis in recovery,or
false if the database is notin recovery. Whenan agentstarts,it connectsto the local

32
database andinvokesthe pg_is_in_recovery() function. Ifthe serverresponds
true, the agentassumestherole ofstandby; ifthe serverresponds false,the agent
assumesthe role ofmaster. Ifthere is no localdatabase,the agent willassume an idle
state.
If is.witness is true, FailoverManagerwill not checktherecoverystate.
The local.period property specifies howmany secondsbetween attemptsto contact
the database server.
The local.timeout property specifies howlongan agent willwait fora positive
response fromthe localdatabase server.
The local.timeout.final propertyspecifies howlong an agent willwait afterthe
final attempt to contact the database serveronthe currentnode. Ifa response is not
received fromthe databasewithin the numberofsecondsspecified by the
local.timeout.final property,the database is assumed tohave failed.
Forexample, given the default valuesoftheseproperties,a checkofthe localdatabase
happensonce every 10seconds. Ifan attempt to contactthe localdatabase doesnotcome
backpositivewithin 60seconds,FailoverManagermakes a finalattempt to contactthe
database. Ifa responseis not receivedwithin 10seconds,FailoverManagerdeclares
database failure and notifies the administratorlisted in the user.email property. These
propertiesare not requiredon a dedicated witnessnode.
# These properties apply to the connection(s) EFM uses to monitor
# the local database. Every 'local.period' seconds, a database
# check is made in a background thread. If the main monitoring
# thread does not see that any checks were successful in
# 'local.timeout' seconds, then the main thread makes a final
# check with a timeout value specified by the
# 'local.timeout.final' value. All values are in seconds.
# Whether EFM uses single or multiple connections for database
# checks is controlled by the 'db.reuse.connection.count'
# property.
local.period=10
local.timeout=60
local.timeout.final=10
If necessary,youshould modify these values tosuit yourbusinessmodel.
Use the remote.timeout propertyto specify howmany secondsan agentwaits fora
response froma remote database server(i.e.,howlong a standby agent waitsto verify
that the masterdatabase is actually down before performing failover).
# Timeout for a call to check if a remote database is responsive.
# For example, this is how long a standby would wait for a
# DB ping request from itself and the witness to the master DB
# before performing failover.

33
remote.timeout=10
Use the node.timeout propertyto specify the numberofsecondsthat an agent will wait
for a responsefroma node when determiningifa node hasfailed. The node.timeout
propertyvalue specifiesa timeout value foragent-to-agent communication; othertimeout
propertiesin the clusterpropertiesfile specify valuesforagent-to-database
communication.
# The total amount of time in seconds to wait before determining
# that a node has failed or been disconnected from this node.
#
# The value of this property must be the same across all agents.
node.timeout=50
Use the stop.isolated.master propertyto instruct FailoverManagerto shut down the
database ifa masteragentdetectsthat it is isolated. Whentrue (the default),Failover
Managerwill stop thedatabase before invokingthescriptspecified in the script.
master.isolated property.
# Shut down the database after a master agent detects that it has
# been isolated from the majority of the efm cluster. If set to
# true, efm will stop the database before running the
# 'script.master.isolated' script, if a script is specified.
stop.isolated.master=true
Use the stop.failed.master property to instructFailoverManagerto attempt to shut
down a masterdatabase ifit can not reach the database. Iftrue, FailoverManagerwill
run the script specified in the script.db.failure propertyafterattempting to shutdownthe
database.
# Attempt to shut down a failed master database after EFM can no
# longer connect to it. This can be used for added safety in the
# case a failover is caused by a failure of the network on the
# master node.
# If specified, a 'script.db.failure' script is run after this
attempt.
stop.failed.master=true
Use the master.shutdown.as.failure parameterto indicate that anyshutdown ofthe
FailoverManageragenton themasternodeshould be treated asa failure. If this
parameteris set to true and the masteragentstops(forany reason),theclusterwill
attempt to confirmif the databaseon themasternodeis running:
 If the database is reached,a notification willbe sent informing you oftheagent
status.

34
 If the database is notreached,a failoverwill occur.
# Treat a master agent shutdown as a failure. This can be set to
# true to treat a master agent shutdown as a failure situation,
# e.g. during the shutdown of a node, accidental or otherwise.
# Caution should be used when using this feature, as it could
# cause an unwanted promotion in the case of performing master
# database maintenance.
# Please see the user's guide for more information.
master.shutdown.as.failure=false
The master.shutdown.as.failure property is meant tocatchusererrorrather
failures,such asthe accidentalshutdownofa masternode. The propershutdownofa
node can appearto therest ofthe cluster like a userhasstoppedthe masterFailover
Manageragent (forexample to performmaintenanceon themasterdatabase). Ifyou set
the master.shutdown.as.failure propertyto true,care must be takenwhen
performing maintenance.
To performmaintenance onthe masterdatabase when master.shutdown.as.failure
is true, you shouldstopthe masteragentand wait to receivea notification thatthe
masteragent hasfailed but thedatabase is stillrunning. Thenit is safe to stopthe master
database. Alternatively,youcan use the efm stop-cluster command to stop allofthe
agentswithoutfailure checksbeingperformed.
Use the pingServer propertyto specify the IPaddressofa serverthatFailover
Managercan use to confirmthat networkconnectivityis not a problem.
# This is the address of a well-known server that EFM can ping
# in an effort to determine network reachability issues. It
# might be the IP address of a nameserver within your corporate
# firewall or another server that *should* always be reachable
# via a 'ping' command from each of the EFM nodes.
#
# There are many reasons why this node might not be considered
# reachable: firewalls might be blocking the request, ICMP might
# be filtered out, etc.
#
# Do not use the IP address of any node in the EFM cluster
# (master, standby, or witness because this ping server is meant
# to provide an additional layer of information should the EFM
# nodes lose sight of each other.
#
# The installation default is Google's DNS server.
pingServerIp=8.8.8.8
Use the pingServerCommand propertyto specify the commandusedto testnetwork
connectivity.

35
# This command will be used to test the reachability of certain
# nodes.
#
# Do not include an IP address or hostname on the end of
# this command - it will be added dynamically at runtime with the
# values contained in 'virtualIp' and 'pingServer'.
#
# Make sure this command returns reasonably quickly - test it
# from a shell command line first to make sure it works properly.
pingServerCommand=/bin/ping -q -c3 -w5
Use the auto.allow.hosts propertyto instruct the serverto usethe addresses
specified in the .nodes file ofthe first node started to update theallowed host list.
Enabling this property (settingauto.allow.hosts to true)can simplify clusterstart-
up.
# Have the first node started automatically add the addresses
# from its .nodes file to the allowed host list. This will make
# it faster to start the cluster when the initial set of hosts
# is already known.
auto.allow.hosts=false
Use the stable.nodes.file property to instructthe serverto notrewrite the nodesfile
when a node joins orleaves the cluster. This propertyis most usefulin clusterswith
unchangingIPaddresses.
# When set to true, EFM will not rewrite the .nodes file whenever
# new nodes join or leave the cluster. This can help starting a
# cluster in the cases where it is expected for member addresses
# to be mostly static, and combined with 'auto.allow.hosts' makes
# startup easier when learning failover manager.
stable.nodes.file=false
The db.reuse.connection.count propertyallows the administratorto specify the
numberoftimes FailoverManagerreuses the same database connection to checkthe
database health. The default value is 0,indicatingthat FailoverManagerwill create a
fresh connectioneachtime. This propertyis not required ona dedicatedwitnessnode.
# This property controls how many times a database connection is
# reused before creating a new one. If set to zero, a new
# connection will be created every time an agent pings its local
# database.
db.reuse.connection.count=0

36
The auto.failover propertyenablesautomatic failover. By default, auto.failover
is set to true.
# Whether or not failover will happen automatically when the master
# fails. Set to false if you want to receive the failover notifications
# but not have EFM actually perform the failover steps.
# The value of this property must be the same across all agents.
auto.failover=true
Use the auto.reconfigure property to instructFailoverManagerto enable ordisable
automatic reconfiguration ofremaining Standbyserversafterthe primary standbyis
promoted to Master. Set the property to true to enable automatic reconfiguration(the
default)orfalse to disable automatic reconfiguration. Thisproperty is notrequiredon
a dedicated witness node.
# After a standby is promoted, failover manager will attempt to
# update the remaining standbys to use the new master. Failover
# manager will back up recovery.conf, change the host parameter
# of the primary_conninfo entry, and restart the database. The
# restart command is contained in either the efm_db_functions or
# efm_root_functions file; default when not running db as an os
# service is:
# "pg_ctl restart -m fast -w -t <timeout> -D <directory>"
# where the timeout is the local.timeout property value and the
# directory is specified by db.recovery.conf.dir. To turn off
# automatic reconfiguration, set this property to false.
auto.reconfigure=true
Please note:primary_conninfo is a space-delimited list ofkeyword=value pairs.
Please note: Ifyou are usingreplication slotsto manage yourWALsegments,automatic
reconfigurationis not supported; youshould set auto.reconfigure to false. For
more information,see Section 2.2.
Use the promotable propertyto indicate that a node should notbe promoted. To
override the setting,usethe efm set-priority command at runtime; formore
information about theefm set-priority command,see Section 5.3.
# A standby with this set to false will not be added to the
# failover priority list, and so will not be available for
# promotion. The property will be used whenever an agent starts
# as a standby or resumes as a standby after being idle. After
# startup/resume, the node can still be added or removed from the
# priority list with the 'efm set-priority' command. This
# property is required for all non-witness nodes.
promotable=true

37
Use the minimum.standbys propertyto specify the minimumnumberofstandbynodes
that will be retained on a cluster; ifthe standbycount dropsto the specified minimum, a
replica node will not be promotedin the eventofa failure of the masternode.
# Instead of setting specific standbys as being unavailable for
# promotion, this property can be used to set a minimum number
# of standbys that will not be promoted. Set to one, for
# example, promotion will not happen if it will drop the number
# of standbys below this value. This property must be the same on
# each node.
minimum.standbys=0
Use the recovery.check.period property to specify thenumberofsecondsthat
FailoverManagerwill wait before checks tosee ifa database is out ofrecovery.
# Time in seconds between checks to see if a promoting database
# is out of recovery.
recovery.check.period=2
Use the auto.resume.period propertyto specify the numberofseconds(aftera
monitored database fails,and anagent hasassumed anidle state,orwhen starting in
IDLE mode)during which an agent will attempt to resume monitoringthat database.
# Period in seconds for IDLE agents to try to resume monitoring
# after a database failure or when starting in IDLE mode. Set to
# 0 for agents to not try to resume (in which case the
# 'efm resume <cluster>' command is used after bringing a
# database back up).
auto.resume.period=0
FailoverManagerprovidessupport forclustersthat use a virtualIP. If yourclusterusesa
virtualIP, provide the host name orIP address in the virtualIp property;specify the
corresponding prefixin the virtualIp.prefix property. IfvirtualIp is left blank,
virtualIP support is disabled.
Use the virtualIp.interface propertyto providethe networkinterface used bythe
VIP.
The specified virtualIPaddressis assignedonly to themasternodeofthe cluster. Ifyou
specify virtualIp.single=true,the same VIP addresswill be used onthe new
masterin the event ofa failover. Specify a value of false to providea uniqueIP
address foreachnode ofthe cluster.
Forinformation aboutusing a virtualIPaddress,see Section 3.6.

38
# These properties specify the IP and prefix length that will be
# remapped during failover. If you do not use a VIP as part of
# your failover solution, leave the virtualIp property blank to
# disable Failover Manager support for VIP processing (assigning,
# releasing, testing reachability, etc).
#
# If you specify a VIP, the interface and prefix are required.
#
# If specify a host name, it will be resolved to an IP address
# when acquiring or releasing the VIP. If the host name resolves
# to more than one IP address, there is no way to predict which
# address Failover Manager will use.
#
# By default, the virtualIp and virtualIp.prefix values must be
# the same across all agents. If you set virtualIp.single to
# false, you can specify unique values for virtualIp and
# virtualIp.prefix on each node.
#
# If you are using an IPv4 address, the virtualIp.interface value
# should not contain a secondary virtual ip id (do not include
# ":1", etc).
virtualIp=
virtualIp.interface=
virtualIp.prefix=
virtualIp.single=true
Provide pathsto scriptsthat reconfigure yourload balancerin the event ofa switchover
or masterfailure scenario. These scriptswill also be invokedin the event ofa standby
failure.
If you are using these properties,theyshould be providedon every nodeofthe cluster
(master,standby,andwitness). This ensuresthat ifa databasenode fails,anothernode
will call the detach script with thefailed node'saddress.
Set the check.vip.before.promotion propertyto false to indicatethatFailover
Managerwill not checkto seeifa VIP is in use before assigningit to a a newmasterin
the event ofa failure. Please notethatthiscould result in multiple nodesbroadcastingon
the same VIP address;unlessthemasternode is isolatedorcan be shutdownvia another
process,you should set thisproperty to true.
# Whether to check if the VIP (when used) is still in use before
# promoting after a master failure. Turning this off may allow
# the new master to have the VIP even though another node is also
# broadcasting it. This should only be used in environments where
# it is known that the failed master node will be isolated or
# shut down through other means.
check.vip.before.promotion=true

39
Provide a script name after the script.load.balancer.attach propertyto identify
a script thatwill be invoked whena nodeshould be attached tothe loadbalancer. Use
the script.load.balancer.detach propertyto specify the name ofa script that will
be invoked whena node should be detachedfromthe load balancer. Include the%h
placeholderto representthe IPaddressofthe node that is being attached orremovedfrom
the cluster.
# Absolute path to load balancer scripts
# The attach script is called when a node should be attached to
# the load balancer, for example after a promotion. The detach
# script is called when a node should be removed, for example
# when a database has failed or is about to be stopped. Use %h to
# represent the IP/hostname of the node that is being
# attached/detached.
#
# Example:
# script.load.balancer.attach=/somepath/attachscript %h
script.load.balancer.attach=
script.load.balancer.detach=
script.fence specifies the path to anoptionaluser-supplied script that willbe invoked
during the promotionofa standbynode to masternode.
# absolute path to fencing script run during promotion
#
# This is an optional user-supplied script that will be run
# during failover on the standby database node. If left blank,
# no action will be taken. If specified, EFM will execute this
# script before promoting the standby.
#
# Parameters can be passed into this script for the failed master
# and new primary node addresses. Use %p for new primary and %f
# for failed master. On a node that has just been promoted, %p
# should be the same as the node's efm binding address.
#
# Example:
# script.fence=/somepath/myscript %p %f
#
# NOTE: FAILOVER WILL NOT OCCUR IF THIS SCRIPT RETURNS A NON-ZERO
EXIT CODE.
script.fence=
Use the script.post.promotion propertyto specify the pathto an optionaluser-
suppliedscriptthatwill be invoked aftera standby node hasbeen promoted tomaster.
# Absolute path to fencing script run after promotion
#
# This is an optional user-supplied script that will be run after

40
# failover on the standby node after it has been promoted and
# is no longer in recovery. The exit code from this script has
# no effect on failover manager, but will be included in a
# notification sent after the script executes.
#
# Parameters can be passed into this script for the failed master
# and new primary node addresses. Use %p for new primary and %f
# for failed master. On a node that has just been promoted, %p
# should be the same as the node's efm binding address.
#
# Example:
# script.post.promotion=/somepath/myscript %f %p
script.post.promotion=
Use the script.resumed propertyto specify an optionalpath toa user-supplied script
that will be invoked when an agent resumesmonitoring ofa database.
# Absolute path to resume script
#
# This script is run before an IDLE agent resumes
# monitoring its local database.
script.resumed=
Use the script.db.failure property to specify thecomplete path to an optionaluser-
suppliedscriptthatFailoverManagerwill invoke if an agent detectsthat the database that
it monitors hasfailed.
# Absolute path to script run after database failure
#
# an agent detects that its local database has failed.
script.db.failure=
Use the script.master.isolated propertyto specify the complete pathto an optional
user-suppliedscriptthatFailoverManagerwill invoke if the agentmonitoringthemaster
database detectsthat themasteris isolatedfromthe majority ofthe FailoverManager
cluster. This script is called immediately afterthe VIPis released (ifa VIP is in use).
# Absolute path to script run on isolated master
#
# a master agent detects that it has been isolated from the
# majority of the efm cluster.
script.master.isolated=

41
Use the script.remote.pre.promotion property to specify thepath andname ofa
script that willbe invoked on anyagent nodesnot involved in the promotionwhena node
is about to promote its database to master.
Include the %p placeholderto identify theaddressofthe newprimary node.
# Absolute path to script invoked on non-promoting agent nodes
# before a promotion.
#
# This optional user-supplied script will be invoked on other
# agents when a node is about to promote its database. The exit
# code from this script has no effect on Failover Manager, but
# will be included in a notification sent after the script
# executes.
#
# Pass a parameter (%p) with the script to identify the new
# primary node address.
#
# Example:
# script.remote.pre.promotion=/path_name/script_name %p
script.remote.pre.promotion=
Use the script.remote.post.promotion property to specify the path andname ofa
script that willbe invoked on any non-masternodes aftera promotion occurs.
Include the %p placeholderto identify theaddressofthe newprimary node.
# Absolute path to script invoked on non-master agent nodes
# after a promotion.
#
# This optional user-supplied script will be invoked on nodes
# (except the new master) after a promotion occurs. The exit code
# from this script has no effect on Failover Manager, but will be
# included in a notification sent after the script executes.
#
# Pass a parameter (%p) with the script to identify the new
# primary node address.
#
# Example:
# script.remote.post.promotion=/path_name/script_name %p
script.remote.post.promotion=
Use the script.custom.monitor property to provide thename and locationofan
optionalscript that willbe invoked on regularintervals (specified in secondsbythe
custom.monitor.interval property).

42
Use custom.monitor.timeout to specify the maximum time that the script will be
allowed to run; if script executiondoesnotcomplete within the time specified,Failover
Managerwill send a notification.
Set custom.monitor.safe.mode to true to instruct FailoverManagerto report non-
zero exit codes fromthe script,but not promote a standbyas a result ofan exit code.
# Absolute path to a custom monitoring script.
#
# Use script.custom.monitor to specify the location and name of
# an optional user-supplied script that will be invoked
# periodically to perform custom monitoring tasks. A non-zero
# exit value means that a check has failed; this will be treated
# as a database failure. On a master node, script failure will
# cause a promotion. On a standby node script failure will
# generate a notification and the agent will become IDLE.
#
# The custom.monitor.* properties are required if a custom
# monitoring script is specified:
#
# custom.monitor.interval is the time in seconds between
executions of the script.
#
# custom.monitor.timeout is a timeout value in seconds for how
# long the script will be allowed to run. If script execution
# exceeds the specified time, the task will be stopped and a
# notification sent. Subsequent runs will continue.
#
# If custom.monitor.safe.mode is set to true, non-zero exit codes
# from the script will be reported but will not cause a promotion
# or be treated as a database failure. This allows testing of the
# script without affecting EFM.
#
script.custom.monitor=
custom.monitor.interval=
custom.monitor.timeout=
custom.monitor.safe.mode=
Use the sudo.command propertyto specify a command thatwill be invoked by Failover
Managerwhen performing tasksthat require extended permissions. Use thisoption to
include command optionsthat might be specific to yoursystemauthentication.
Use the sudo.user.command property to specify a command that willbe invoked by
FailoverManagerwhenexecutingcommandsthatwill be performed by the database
owner.
# Command to use in place of 'sudo' if desired when efm runs
# the efm_db_functions or efm_root_functions, or efm_address
# scripts.
# Sudo is used in the following ways by efm:

43
#
# sudo /usr/edb/efm-<version>/bin/efm_address <arguments>
# sudo /usr/edb/efm-<version>/bin/efm_root_functions <arguments>
# sudo -u <db service owner>
/usr/edb/efm-<version>/bin/efm_db_functions
<arguments>
#
# 'sudo' in the first two examples will be replaced by the value
# of the sudo.command property. 'sudo -u <db service owner>' will
# be replaced by the value of the sudo.user.command property.
# The '%u' field will be replaced with the db owner.
sudo.command=sudo
sudo.user.command=sudo -u %u
Use the lock.dir property to specify an alternate locationforthe FailoverManagerlock
file; the file preventsFailoverManagerfromstartingmultiple (potentially orphaned)
agentsfora single clusteron the node.
# Specify the directory of lock file on the node. Failover
# Manager creates a file named <cluster>.lock at this location to
# avoid starting multiple agents for same cluster. If the path
# does not exist, Failover Manager will attempt to create it. If
# not specified defaults to '/var/lock/efm-<version>'
lock.dir=
Use the log.dir property to specify thelocation towhich agentlog files will be written;
FailoverManagerwill attempt to createthe directory ifthe directorydoesnotexist.
# Specify the directory of agent logs on the node. If the path
# does not exist, Failover Manager will attempt to create it. If
# not specified defaults to '/var/log/efm-<version>'. (To store
# Failover Manager startup logs in a custom location, modify the
# path in the service script to point to an existing, writable
# directory.)
# If using a custom log directory, you must configure
# logrotate separately. Use 'man logrotate' for more information.
log.dir=
Afterenabling the UDPorTCP protocolon a FailoverManagerhost,youcanenable
logging to syslog. Use thesyslog.protocol parameterto specify theprotocoltype
(UDP or TCP) and the syslog.port parameterto specify the listenerport ofthe syslog
host. The syslog.facility valuemay be usedasan identifierforthe processthat
created the entry; thevaluemust be between LOCAL0 and LOCAL7.

44
# Syslog information. The syslog service must be listening on
# the port for the given protocol, which can be UDP or TCP.
# The facilities supported are LOCAL0 through LOCAL7.
syslog.host=localhost
syslog.port=514
syslog.protocol=UDP
syslog.facility=LOCAL1
Use the file.log.enabled and syslog.enabled propertiesto specify the typeof
logging thatyouwish to implement. Set file.log.enabled to true to enable logging
to a file; enable the UDPprotocolorTCPprotocoland set syslog.enabled to true to
enable loggingto syslog. You can enable loggingto botha file and syslog.
# Which logging is enabled.
file.log.enabled=true
syslog.enabled=false
Formore information aboutconfiguring syslog logging,seeSection 6.1.
Use the jgroups.loglevel andefm.loglevel parametersto specify the levelof
detaillogged by FailoverManager. The default value is INFO. Formore information
about logging,seeSection6,ControllingLogging.
# Logging levels for JGroups and EFM.
# Valid values are: TRACE, DEBUG, INFO, WARN, ERROR
# Default value: INFO
# It is not necessary to increase these values unless debugging a
# specific issue. If nodes are not discovering each other at
# startup, increasing the jgroups level to DEBUG will show
# information about the TCP connection attempts that may help
# diagnose the connection failures.
jgroups.loglevel=INFO
efm.loglevel=INFO
Use the jvm.options propertyto passJVM-relatedconfigurationinformation. The
default settingspecifies theamount ofmemory that the FailoverManageragent will be
allowed to use.
# Extra information that will be passed to the JVM when starting
# the agent.
jvm.options=-Xmx128m

45
3.5.1.2 Encrypting Your Database Password
FailoverManagerrequires youto encrypt yourdatabase passwordbeforeincluding it in
the clusterpropertiesfile. Use the efm utility (located in the /usr/edb/efm-3.4 /bin
directory)to encrypt thepassword. When encryptinga password,youcan eitherpassthe
passwordon thecommand line when you invoke the utility,orusethe EFMPASS
environmentvariable.
To encrypt a password,usethe command:
# efm encrypt cluster_name [ --from-env ]
Where cluster_name specifies thename ofthe FailoverManagercluster.
If you include the --from-env option,youmust export thevalueyouwish to encrypt
before invoking the encryptionutility. Forexample:
export EFMPASS=password
If you do not include the--from-env option,FailoverManagerwill prompt you to
enterthe database password twice before generatingan encryptedpassword foryouto
place in yourclusterpropertyfile. When the utility sharestheencrypted password,copy
and paste theencrypted password into the clusterproperty files.
Please note: Many Java vendors shiptheirversionofJava with full-strengthencryption
included,butnotenableddue toexport restrictions. Ifyou encounteran error thatrefers
to an illegal key size when attempting toencrypt thedatabasepassword,you should
downloadand enable a Java Cryptography Extension (JCE)that providesan unlimited
policy foryourplatform.
The following example demonstratesusingtheencrypt utility to encrypt a password for
the acctg cluster:
# efm encrypt acctg
This utility will generate an encrypted password for you to place
in your EFM cluster property file:
/etc/edb/efm-3.4/acctg.properties
Please enter the password and hit enter:
Please enter the password again to confirm:
The encrypted password is: 516b36fb8031da17cfbc010f7d09359c
Please paste this into your acctg.properties file
db.password.encrypted=516b36fb8031da17cfbc010f7d09359c

46
Please note:the utility will notify you ifa propertiesfile does notexist.
Afterreceiving yourencryptedpassword,paste the passwordinto thepropertiesfile and
start the FailoverManagerservice. Ifthere is a problemwith the encrypted password,the
FailoverManagerservice willnot start:
[witness@localhost ~]# service efm-3.4 start
Starting local efm-3.4 service: [FAILED]
If you receive this message whenstartingtheFailoverManagerservice,pleasesee the
startuplog (located in /var/log/efm-3.4/startup-efm.log)formore information.
If you are using RHEL 7.x or CentOS7.x, startup information is alsoavailable with the
following command:
systemctl status efm-3.4
To prevent a clusterfrominadvertently connectingto thedatabaseofanothercluster,the
clustername is incorporatedinto theencrypted password. Ifyou modify the cluster
name,you will need to re-encrypt the databasepassword and update theclusterproperties
file.
Using the EFMPASSEnvironment Variable
The following example demonstratesusingthe--from-env environmentvariable when
encrypting a password. Before invokingthe efm encrypt command,setthe value of
EFMPASS to the password (1safepassword):
# export EFMPASS=1safepassword
Then,invoke efm encrypt,specifyingthe --from-env option:
# efm encrypt acctg --from-env
# 7ceecd8965fa7a5c330eaa9e43696f83
The encryptedpassword(7ceecd8965fa7a5c330eaa9e43696f83)is returned asa
text value; when usinga script,youcan checktheexit code ofthe command to confirm
that the command succeeded. A successfulexecution returns 0.

47
3.5.2 The Cluster Members File
Each node in a FailoverManagerclusterhasa clustermembers file. When an agent
starts,it usesthe file to locate otherclustermembers. The FailoverManagerinstaller
creates a file template forthe clustermembers file named efm.nodes.in in the
/etc/edb/efm-3.4 directory. Aftercompletingthe FailoverManagerinstallation,you
must make a working copy ofthe template:
# cp /etc/edb/efm-3.4/efm.nodes.in /etc/edb/efm-3.4/efm.nodes
Aftercopying thetemplatefile,change the ownerofthe file to efm:
chown efm:efm efm.nodes
By default,FailoverManagerexpectsthe clustermembers file to be named efm.nodes.
If you name the clustermembers file somethingotherthan efm.nodes,you mustmodify
the FailoverManagerservice script to instructFailoverManagerto usethe new name.
The clustermembers file on the first nodestarted canbe empty; thisnodewill become
the Membership Coordinator. On each subsequentnode,theclustermemberfile must
contain theaddressandportnumberofthe Membership Coordinator. Each entryin the
clustermembers file must be listed in an address:port format,with multiple entries
separatedby white space.
The Membership Coordinator will updatethe contentsofthe efm.nodes file to match the
current members ofthe cluster. As agents join orleave the cluster,the efm.nodes files
on otheragents are updatedto reflect the currentclustermembership. Ifyou invoke the
efm stop-cluster command,FailoverManagerdoesnot modify the file.
If the Membership Coordinatorleavesthecluster,anothernode willassume the role.
You can use theefm cluster-status command to find theaddressofthe Membership
Coordinator. Ifa node joins orleaves a clusterwhile an agent is down,youmust
manually ensurethatthe file includesat leastthe currentMembershipCoordinator.
If you knowthe IPaddressesandportsofthe nodesthatwill be joining the cluster,you
can include the addressesin the clustermembers file at any time. At startup,any
addressesthat donot identify clustermembers willbe ignored unlessthe
auto.allow.hosts property(in the clusterproperties file)is set to true. Formore
information,see Section4.1.2.
If the stable.nodes.file property is setto true,the MembershipCoordinatorwillnot
updatethe .nodes file when clustermembers join orleave the cluster; this behavioris
most usefulwhen the IPaddressesofclustermembers donotchange often. For
information about modifyingclusterproperties,see Section 3.5.1.1.

48
3.6 Using Failover Manager with Virtual IP Addresses
FailoverManagerusestheefm_address script to assignorreleasea virtualIPaddress.
Please note that virtualIPaddressesare not supportedby many cloud providers.In those
environments,anothermechanismshould be used(suchas an Elastic IPAddresson
AWS),which can be changedwhenneeded bya fencingorpost-promotionscript.
By default,the script residesin:
/usr/edb/efm-3.4/bin/efm_address
Use the following command variationsto assign orrelease an IPv4orIPv6IP address.
To assigna virtualIPv4IP address:
# efm_address add4 interface_name IPv4_addr/prefix
To assigna virtualIPv6IP address:
# efm_address add6 interface_name IPv6_addr/prefix
To release a virtualaddress:
# efm_address del interface_name IP_address/prefix
Where:
interface_name matchesthe name specified in the virtualIp.interface
propertyin the clusterpropertiesfile.
IPv4_addr or IPv6_addr matches the name specified in the virtualIp
propertyin the clusterpropertiesfile.
prefix matches the value specified in the virtualIp.prefix propertyin the
clusterpropertiesfile.
Formore information aboutpropertiesthatdescribe a virtualIPaddress,see Section
3.5.1.1.
You must invoke the efm_address script astheroot user. The efm useris created
during the installation,andis grantedprivilegesin the sudoers file to run the
efm_address script. Formore information aboutthe sudoers file,see Section 3.4,
Extending FailoverManagerPermissions.

49
Testing the VIP
When usinga virtualIP(VIP) addresswith FailoverManager,it is important to testthe
VIP functionality manually before startingfailovermanager. This will catch any
network-relatedissuesbefore they cause a problemduring an actualfailover. The
following stepstestthe actionsthat failovermanagerwill take. The example usesthe
following propertyvalues:
virtualIp=172.24.38.239
virtualIp.interface=eth0
virtualIp.prefix=24
pingServerCommand=/bin/ping -q -c3 -w5
Please note:the virtualIp.prefix specifies the numberofsignificant bitsin the
virtualIp address.
When instructedto ping the VIPfrom a node,use the commanddefinedbythe
pingServerCommand property.
1. Ping the VIP from all nodesto confirmthat the addressis notalreadyin use:
# /bin/ping -q -c3 -w5 172.24.38.239
PING 172.24.38.239 (172.24.38.239) 56(84) bytes of data.
--- 172.24.38.239 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet
loss, time 3000ms
You should see 100% packet loss.
2. Run the efm_address add4 command on the Masternode to assignthe VIPand then
confirmwith ip address:
# efm_address add4 eth0 172.24.38.239/24
# ip address
<output truncated>
eth0 Link encap:Ethernet HWaddr 36:AA:A4:F4:1C:40
inet addr:172.24.38.239 Bcast:172.24.38.255
...
3. Ping the VIP from the othernodesto verify that theycan reachtheVIP:
# /bin/ping -q -c3 -w5 172.24.38.239
--- 172.24.38.239 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time
1999ms
rtt min/avg/max/mdev = 0.023/0.025/0.029/0.006 ms

50
You should see no packetloss.
4. Use the efm_address del command to release the addressonthe masternodeand
confirmthe nodehasbeenreleasedwith ip address:
# efm_address del eth0 172.24.38.239/24
# ip address
eth0 Link encap:Ethernet HWaddr 22:00:0A:89:02:8E
inet addr:10.137.2.142 Bcast:10.137.2.191
...
The output fromthis stepshould not showan eth0interface
5. Repeat step 3,this time verifying thatthe Standby andWitnessdo not see theVIP in
use:
# /bin/ping -q -c3 -w5 172.24.38.239
--- 172.24.38.239 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet
loss, time 3000ms
You should see 100% packet loss. Repeat this stepon allnodes.
6. Repeat step 2on allStandbynodesto assign the VIPto every node. You can ping the
VIP from any nodeto verify that it is in use.
# efm_address add4 eth0 172.24.38.239/24
# ip address
<output truncated>
eth0 Link encap:Ethernet HWaddr 36:AA:A4:F4:1C:40
inet addr:172.24.38.239 Bcast:172.24.38.255
...
Afterthe teststeps above,releasethe VIPfromany non-Masternode before attempting
to start FailoverManager.
Please note:the networkinterfaceusedforthe VIP does not have tobe the same interface
used forthe FailoverManageragent's bind.address value. Themasteragent will drop
the VIP as needed duringa failover,and FailoverManagerwill verify that the VIP is no
longeravailable before promotinga standby. A failure ofthe bind addressnetworkwill
lead to masterisolationand failover.
If the VIP uses a different interface, youmay encountera timing conditionwhere the rest
of the clusterchecksfora reachable VIPbefore the masteragent hasdropped it. In this
case,EFM will retry the VIP checkforthe numberofsecondsspecified in the
node.timeout propertyto help ensurethata failoverhappensas expected.

51

52
4 Using Failover Manager
FailoverManageroffers support formonitoringand failoverofclusterswith one ormore
Standbyservers. You can add orremove nodesfromthe clusteras yourdemandfor
resourcesgrowsorshrinks.
If a Masternodereboots,FailoverManagermay detectthe database is downon the
Masternodeandpromote a Standbynodeto therole ofMaster.Ifthis happens,the
FailoverManageragenton the(rebooted)Masternode willnot get a chanceto write the
recovery.conf file; the rebootedMasternodewill return to the clusteras a second
Masternode. To prevent this,start theFailoverManageragent before startingthe
database server. The agent willstart in idle mode,and checkto see ifthere is already a
masterin the cluster.Ifthere is a masternode,theagentwill verify that a
recovery.conf file exists,and the database willnot start as a secondmaster.
4.1 Managing a Failover Manager Cluster
Once configured,a FailoverManagerclusterrequires noregular maintenance. The
following sectionsprovideinformationaboutperforming the management tasksthat may
occasionally be requiredby a FailoverManagerCluster.
By default,some ofthe commandslistedbelowmust beinvoked by efm orby an OS
superuser;an administratorcan selectively permit users toinvoke thesecommandsby
adding the userto the efm group. Thecommandsare:
 efm allow-node
 efm disallow-node
 efm promote
 efm resume
 efm set-priority
 efm stop-cluster
 efm upgrade-conf

53
4.1.1 Starting the Failover Manager Cluster
You can start the nodesofa FailoverManagerclusterin any order.
To start the FailoverManagercluster on RHEL 6.x orCentOS6.x, assume superuser
privileges,and invoke thecommand:
To start the FailoverManagerclusteron RHEL 7.x orCentOS7.x, assume superuser
If the clusterpropertiesfile for the node specifiesthat is.witness is true,the node
will start as a Witnessnode.
If the node is not a dedicatedWitnessnode,FailoverManagerwill connectto thelocal
database andinvoke the pg_is_in_recovery() function. Ifthe serverresponds
false, the agentassumesthenode is a Masternode,andassigns a virtualIPaddressto
the node (ifapplicable). Ifthe serverresponds true,the FailoverManageragent
assumesthat the nodeis a Standby server. Ifthe serverdoesnot respond,theagentwill
start in an idle state.
Afterjoining the cluster,theFailoverManageragent checksthesupplied database
credentialsto ensure that it can connectto allofthe databaseswithin the cluster. Ifthe
agent cannot connect,the agentwill shut down.
If a newmasterorstandbynodejoins a cluster,allofthe existing nodeswill also confirm
that theycan connect to the databaseon thenewnode.
4.1.2 Adding Nodes to a Cluster
You can add a nodeto a FailoverManagerclusterat any time. Whenyouadda nodeto a
cluster,youmust modify theclusterto allowthe newnode,and thentellthe newnode
howto find the cluster. The following stepsdetailaddinga nodeto a cluster:
1. Unless auto.allow.hosts is setto true,use the efm allow-node command,
to add the IPaddressofthe newnodeto the FailoverManager allowed node host
list. When invokingthe command,specify the clustername andthe IPaddressof
the newnode:
efm allow-node cluster_name ip_address

54
Formore information aboutusingthe efm allow-node command orcontrolling
a FailoverManagerservice,see Section5.
Installa FailoverManageragentand configure the clusterpropertiesfile on the
newnode. Formore information about modifyingthepropertiesfile,see Section
3.5.1.
2. Configure the clustermembers file on the newnode,adding an entry forthe
Membership Coordinator. Formore information aboutmodifyingthe cluster
members file, see Section3.5.2.
3. Assume superuserprivilegesonthe newnode,andstart theFailoverManager
agent. To start theFailoverManagerclusteron RHEL6.x or CentOS6.x, assume
superuserprivileges,and invoke thecommand:
To start the FailoverManagerclusteron RHEL 7.x orCentOS7.x, assume
superuserprivileges,and invoke thecommand:
When thenewnode joinsthecluster,FailoverManagerwill send a notificationto the
administratoremail providedin the user.email property,and/orwill invoke the
specified notificationscript.
Please Note:To be a usefulStandbyforthe current node,the node must bea standbyin
the PostgreSQLStreaming Replication scenario.
4.1.3 Changing the Priority of a Standby
If yourFailoverManagerclusterincludesmore thanoneStandby server,youcan usethe
efm set-priority command to influence thepromotionpriority ofa Standbynode.
Invoke the command onanyexisting memberofthe FailoverManagercluster,and
specify a priority value afterthe IPaddressofthe member.
Forexample, the following command instructsFailoverManagerthat the acctg cluster
member that is monitoring 10.0.1.9:7800 is the primary Standby(1):
efm set-priority acctg 10.0.1.9:7800 1
In the event ofa failover,FailoverManagerwill first retrieve information fromPostgres
streaming replicationto confirmwhich Standbynode hasthe mostrecent data,and
promote the node with theleastchance ofdata loss. Iftwo Standby nodescontain
equally up-to-datedata,the nodewith a higheruser-specified priority valuewill be

55
promoted to Master. To checkthe priorityvalueofyourStandbynodes,use the
command:
efm cluster-status cluster_name
Please note:The promotion prioritymay change ifa node becomesisolatedfromthe
cluster,and laterre-joinsthe cluster.
4.1.4 Promoting a Failover Manager Node
You can invoke efm promote on any nodeofa FailoverManagerclusterto start a
manualpromotion ofa Standbydatabaseto Masterdatabase.
Manualpromotionshould only be performed duringa maintenance windowforyour
database cluster. Ifyou do not have anup-to-date Standbydatabase available,youwill
be prompted before continuing. To starta manualpromotion,assume the identity ofefm
or the OSsuperuser,andinvoke the command:
efm promote cluster_name [-switchover]
[-sourcenode <address>] [-quiet]
Where:
cluster_name is the name ofthe FailoverManagercluster.
Include the –switchover option to reconfigure the originalMasteras a Standby.
If you include the –switchover keyword,the clustermustincludea masternode
and at least onestandby,and the nodesmust be in sync.
Include the –sourcenode keyword tospecify the nodefromwhich the
recovery.conf file will be copied to the master.
Include the -quiet switchto suppressnotifications duringswitchover.
During switchover:
 A recovery.conf file is copied froman existing standbyto the masternode.
 The masterdatabase is stopped.
 If you are using a VIP, the addressis releasedfromthe masternode.
 A standbyis promoted toreplace the masternode,andacquirestheVIP.
 The addressofthe newmasternode is addedto therecovery.conf file.
 The old masteris restarted;theagent willresume monitoring it as a standby.

56
During a manualpromotion,the Masteragent releasesthe virtualIPaddressbefore
creating a recovery.conf file in the directory specified by the db.recovery.conf.dir
property. The Masteragentremains running,andassumesa statusof Idle.
The Standby agent confirms thatthe virtualIPaddressis no longerin use before pinging
a well-known addressto ensure thatthe agent is not isolated fromthe network. The
Standbyagentrunsthefencing script andpromotesthe Standbydatabase to Master. The
StandbyagentthenassignsthevirtualIPaddressto theStandbynode,andrunsthe post-
promotion script (ifapplicable).
Please note that this command instructs the serviceto ignore the value specified in the
auto.failover parameterin the clusterpropertiesfile.
To return a node to therole ofmaster,place the node first in the promotion list:
efm set-priority cluster_name ip_address priority
Then,performa manualpromotion:
efm promote cluster_name -switchover
Formore information aboutusingthe efmutility,see Section5.3.
4.1.5 Stopping a Failover Manager Agent
When you stop an agent,FailoverManagerwill remove the node'saddressfromthe
clustermembers list on allofthe running nodesofthe cluster,but willnot remove the
address fromthe FailoverManager Allowed node host list.
To stop theFailoverManageragent onRHEL 6.x orCentOS6.x, assume superuser
service efm-3.4 stop
To stop theFailoverManageragent onRHEL 7.x orCentOS7.x, assume superuser
systemctl stop efm-3.4
Until you invoke theefm disallow-node command (removing the node's addressof
the node fromthe Allowed node host list), you can use theservice efm-3.4
start command to restart thenode at a laterdate withoutfirst runningthe efm allow-
node command again.

57
Please note that stopping an agent doesnot signalthe clusterthatthe agent hasfailed.
4.1.6 Stopping a Failover Manager Cluster
To stop a FailoverManagercluster,connectto anynode ofa FailoverManagercluster,
assume theidentity ofefm orthe OSsuperuser,andinvoke the command:
efm stop-cluster cluster_name
The command will cause all FailoverManageragentsto exit. Terminating the Failover
Manageragents completely disables allfailoverfunctionality.
Please Note:when youinvoke theefm stop-cluster command,allauthorized node
information is lost fromthe Allowed node host list.
4.1.7 Removing a Node from a Cluster
The efm disallow-node command removesthe IPaddressofa node fromthe
FailoverManagerAllowed node host list. Assume theidentity ofefmorthe OS
superuseron any existingnode (that is currently part ofthe runningcluster),and invoke
the efm disallow-node command,specifyingtheclustername andthe IPaddressof
the node:
efm disallow-node cluster_name ip_address
The efm disallow-node command will not stopa runningagent; the service will
continue to run onthe node untilyou stoptheagent (forinformation aboutcontrolling the
agent,seeSection5). If the agent orclusteris subsequently stopped,the nodewill not be
allowed to rejoin the cluster, and willbe removed fromthe failoverpriority list (and will
be ineligible for promotion).
Afterinvoking theefm disallow-node command,you mustuse theefm allow-
node command to add thenode to the clusteragain. Formore information aboutusing
the efmutility,see Section 5.3.

58
4.2 Monitoring a Failover Manager Cluster
You can use eitherthe FailoverManager efm cluster-status command orthe PEM
Client graphicalinterface to checkthe current status ofa monitorednode ofa Failover
Managercluster.
4.2.1 Reviewing the Cluster Status Report
The cluster-status command returnsa report that contains information about the
status oftheFailoverManagercluster. To invoke the command,enter:
# efm cluster-status cluster_name
The following statusreport is fora cluster namededb that has fournodesrunning:
efm cluster-status efm
Cluster Status: efm
Agent Type Address Agent DB VIP
-----------------------------------------------------
Witness 172.19.12.170 UP N/A
Master 172.19.13.105 UP UP 172.19.13.107*
Standby 172.19.13.113 UP UP 172.19.13.106
Standby 172.19.14.106 UP UP 172.19.13.108
Allowed node host list:
172.19.12.170 172.19.13.113 172.19.13.105 172.19.14.106
Membership coordinator: 172.19.12.170
Standby priority host list:
172.19.13.113 172.19.14.106
Promote Status:
DB Type Address XLog Loc Info
-------------------------------------------------------
Master 172.19.13.105 0/31000140
Standby 172.19.13.113 0/31000140
Standby 172.19.14.106 0/31000140
Standby database(s) in sync with master. It is safe to
promote.
[root@FOUR efm-3.4]}:
The Cluster Status section providesan overviewofthe statusofthe agentsthat reside
on each node ofthe cluster:

Dokumen.tips edb postgres-failover-manager-guide-get-failover-manager-requires-that-postgresql

Dokumen.tips edb postgres-failover-manager-guide-get-failover-manager-requires-that-postgresql

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Dokumen.tips edb postgres-failover-manager-guide-get-failover-manager-requires-that-postgresql

Similar to Dokumen.tips edb postgres-failover-manager-guide-get-failover-manager-requires-that-postgresql (20)

Recently uploaded

Recently uploaded (20)

Dokumen.tips edb postgres-failover-manager-guide-get-failover-manager-requires-that-postgresql