Pacemaker Operations Notes

Documentation
• Clusters from Scratch: Step-by-Step Instructions for Building Your First High-Availability Cluster
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html
http://clusterlabs.org/doc/Cluster_from_Scratch.pdf
• Configuration Explained: An A-Z guide to Pacemaker's Configuration Options
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html
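Pacemaker: The resource-control part (start / stop / monitor) of an HA environment. It monitors and controls resources.
It does not work as cluster software on its own; it must be combined with a cluster-control component (Corosync / Heartbeat) that handles node liveness monitoring.
Corosync: The cluster-control part of an HA environment.
Corosync succeeded Heartbeat as the cluster layer used with Pacemaker.
After Corosync detects that a node has gone down, Pacemaker performs the resource operations (promote / demote / stop, etc.).
Resource: Anything that has to be controlled in the HA setup (a target of Pacemaker operations).
Resource agent (RA): An agent that connects a resource to Pacemaker.
Pacemaker operates a resource by issuing commands to its resource agent.
By following the OCF (Open Cluster Framework) specification, you can also write your own RA as a shell script.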
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/ap-ocf.html
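watchdog: A monitoring mechanism that can reboot the OS when the main process fails (the have-watchdog setting).
STONITH: Shoot The Other Node In The Head
A mechanism for preventing split brain. When communication between nodes fails, it forcibly restarts (fences) the peer node so that both nodes cannot become master (that is, access the resource at the same time).
Plugin scripts are stored in /usr/lib/stonith/plugins/external.
Fencing: The act of forcibly restarting the peer node in order to prevent split brain.
Typically it works together with the hardware's IPMI to perform the stop / restart.
A plugin can be used to prevent both nodes from taking each other down at the same time (mutual fencing).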
stonith-wrapper
https://ja.osdn.net/projects/linux-ha/wiki/stonith-wrapper
STONITH (pronounced "stonith")
• Clustering MySQL on Linux using a load-balanced set
https://docs.microsoft.com/ja-jp/azure/virtual-machines/linux/classic/mysql-cluster
• Basics and case studies of Pacemaker Master/Slave configurations (DRBD, PostgreSQL replication) @ Open Source Conference 2014
https://www.slideshare.net/tatsuyaw/pacemaker-osc2014tokyo-31882542
• Chapter 4. Fencing
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/high_availability_add-on_overview/ch-fencing
• Rescuing HA clusters from failover failures
https://ja.osdn.net/projects/linux-ha/docs/Pacemaker_OSC2013Kyoto_20130803/ja/1/Pacemaker_OSC2013Kyoto_20130803.pdf
• Hands-on introduction to Pacemaker: exclusive control
http://linux-ha.osdn.jp/wp/wp-content/uploads/076783ca53a363270d253bbb98b59e83.pdf
• VIPcheck resource agent
https://ja.osdn.net/projects/linux-ha/wiki/VIPcheck
• Building a cluster environment with Pacemaker + Corosync [CentOS]
http://dan-project.blog.so-net.ne.jp/2016-05-09
• Global cluster options
https://www.suse.com/ja-jp/documentation/sle_ha/book_sleha/data/sec_ha_config_basics_global.html
• Building HA with two machines (CentOS 6.9) (work in progress)
https://qiita.com/tukiyo3/items/162e131007365fc4fe80
• Fencing and Stonith
http://clusterlabs.org/doc/crm_fencing.html
• Corosync quorum configuration
https://qiita.com/ngyuki/items/f8111de17b470b5509c7
• Trying automatic system restart with the STONITH plugin "external/ssh"
http://labunix.hateblo.jp/entry/20130722/1374496401
STONITH in cloud environments
• Setting up high availability on SUSE using STONITH
https://docs.microsoft.com/ja-jp/azure/virtual-machines/workloads/sap/ha-setup-with-stonith
• High availability of SAP HANA on Azure Virtual Machines (VMs) | Microsoft Docs
https://docs.microsoft.com/ja-jp/azure/virtual-machines/workloads/sap/sap-hana-high-availability
For SAP, the fence agent for SAP on Azure (stonith:fence_azure_arm) is used.
• Clustering MySQL on Linux using a load-balanced set
https://docs.microsoft.com/ja-jp/azure/virtual-machines/linux/classic/mysql-cluster
This setup uses a fence agent written as an external script and performs the shutdown through the Azure CLI.
https://github.com/bureado/aztonith
• STONITH on EC2
https://bcblog.sios.jp/ec2-stonith/
Interconnect
• Pacemaker-1.0 installation: CentOS 5 edition
http://linux-ha.osdn.jp/wp/archives/4219
Component / Reference / Configuration / Service / Log
pacemaker
(cluster resource manager: controls cluster resources)
https://github.com/ClusterLabs/pacemaker
Availability group resources are created as Master / Slave type resources.
Corosync quorum configuration
https://qiita.com/ngyuki/items/f8111de17b470b5509c7
/etc/default/pacemaker
/lib/systemd/system/pacemaker.service
/var/lib/pacemaker/cib/cib.xml
-> can be viewed with pcs cluster cib
CIB : Cluster Information Base
/etc/logrotate.d/pacemaker
/var/log/pacemaker.log
Pacemaker
November 1, 2017, 9:04
SQL Server on Linux - page 1

corosync
cluster engine daemon and utilities
Monitors node liveness
https://github.com/ClusterLabs/corosync
/usr/sbin/corosync
/usr/sbin/corosync-cfgtool
/usr/sbin/corosync-cmapctl
/usr/sbin/corosync-cpgtool
/usr/sbin/corosync-keygen
/usr/sbin/corosync-quorumtool
/lib/systemd/system/corosync.service
/etc/corosync/corosync.conf
/etc/logrotate.d/corosync
/var/log/corosync/corosync.log
pcs
Pacemaker Configuration System
Control command for Pacemaker (takes the place of the earlier crm command)
https://github.com/ClusterLabs/pcs
/usr/sbin/pcs
(the executable is actually a Python script)
/etc/default/pcsd
/etc/init.d/pcsd
/etc/logrotate.d/pcsd
/etc/pam.d/pcsd
/lib/systemd/system/pcsd.service
/etc/logrotate.d/pcsd
/var/log/pcsd
fence-agents
Fence Agents for Red Hat Cluster
Controls fencing (isolation) behavior when a cluster failure occurs
https://github.com/ClusterLabs/fence-agents
• 1.3.3. Fencing
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/5/html/cluster_suite_overview/s2-fencing-overview-cso
resource-agents
Cluster Resource Agents
Controls resources (Resource Agent: RA)
https://github.com/ClusterLabs/resource-agents
OCF (Open Cluster Framework) location: /usr/lib/ocf/lib
• OCF Resource Agent Developer's Guide
http://linux-ha.osdn.jp/wp/archives/4328
OCF agents for SQL Server:
ocf:mssql:ag - Availability Group resource agent.
ocf:mssql:fci - Failover Cluster Instance resource agent.
Listed with pcs resource list
pcs resource describe ocf:mssql:ag
Cluster management commands
Management: pcs
• High Availability Add-On Reference
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/
• Chapter 1. Creating a Red Hat High Availability cluster with Pacemaker
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/7/html/high_availability_add-on_administration/ch-startup-haaa
• Chapter 3. The pcs command line interface
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/ch-pcscommand-haar
• Appendix B. Usage examples of the pcs command
https://access.redhat.com/documentation/ja-JP/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/ap-configfile-HAAR.html
crm: The command used with Pacemaker 1.1 and earlier. It can be installed with the following command.
apt install -y crmsh
• Switching the Pacemaker control command back to "crm" for operations (1/3)
http://www.atmarkit.co.jp/ait/articles/1611/10/news005.html
Monitoring: crm_mon
cibadmin --query
pcs commands
Cluster operations
Destroy the cluster: sudo pcs cluster destroy --all
Checking configuration and status
Check status: pcs status
pcs status --full
pcs status resources
pcs status cluster
Check configuration: pcs config
Node operations
Stop all nodes: pcs cluster stop --all
Stop a specific node: pcs cluster stop <node name>
Start all nodes: pcs cluster start --all
Start a specific node: pcs cluster start <node name>
Put all nodes into standby: pcs cluster standby --all
Put a specific node into standby: pcs cluster standby <node name>
Take all nodes out of standby: pcs cluster unstandby --all
Take a specific node out of standby: pcs cluster unstandby <node name>
Cluster operations
Force-stop the cluster: pcs cluster kill
Dump the configuration: pcs cluster cib <file name (*.cib)>
Load the configuration: pcs cluster cib-push <file name>
Inspect or modify the dumped file: pcs -f <file name> <command>
pcs -f cluster.cib config
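The dump / edit / load commands above can be chained into one offline-edit workflow. The following is only a sketch: the helper names run and edit_cib_offline and the DRY_RUN switch are made up for illustration, and by default the script just prints the pcs commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: print commands by default; set DRY_RUN=0 to really
# execute them on a cluster node.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Dump the CIB to a file, apply one pcs command to the file only,
# review the result, then push the file back into the cluster.
edit_cib_offline() {
    cib_file=$1
    shift
    run pcs cluster cib "$cib_file"        # dump the live configuration
    run pcs -f "$cib_file" "$@"            # change the file, not the cluster
    run pcs -f "$cib_file" config          # review the edited configuration
    run pcs cluster cib-push "$cib_file"   # load the file into the cluster
}

edit_cib_offline cluster.cib property set default-resource-stickiness=200
```

Working on a file first means a mistake shows up in the review step before anything reaches the running cluster.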
Clearing resource failover history
All resources: pcs resource cleanup
A specific resource (there is currently no per-node cleanup): pcs resource cleanup ag_cluster-master
Checking resources: pcs resource list
pcs resource show <resource name>
STONITH settings
Disable: sudo pcs property set stonith-enabled=false
Enable: sudo pcs property set stonith-enabled=true
When this is true and no STONITH resource is defined, the Promotion Score of every node that is not the Master becomes -INFINITY, and starting the resource is refused.
Checking STONITH (fence) agents
• Chapter 4. Fencing: Configuring STONITH
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/configuring_the_red_hat_high_availability_add-on_with_pacemaker/ch-fencing-haar
sudo pcs stonith list
Checking logs: journalctl -xe -u pacemaker
tail -n 100 -f /var/log/syslog
tail -f /var/log/corosync/corosync.log

Checking quorum votes: pcs status corosync
corosync configuration file: /etc/corosync/corosync.conf
For a two-node cluster, configure the quorum section as follows (normally this should already be set automatically).
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
or, when the nodes are listed in a nodelist:
quorum {
    provider: corosync_votequorum
    two_node: 1
}
nodelist {
    node {
        ring0_addr: 192.168.1.1
    }
    node {
        ring0_addr: 192.168.1.2
    }
}
Help: man corosync.conf
man votequorum
Keep resources running even when a majority of the total quorum votes is not met
The default is stop.
pcs property set no-quorum-policy=ignore
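Why a two-node cluster needs two_node: 1 or no-quorum-policy=ignore becomes clearer with the arithmetic written out. This is a minimal sketch that assumes a plain majority rule (more than half of the total votes); quorum_threshold and has_quorum are illustrative names, not corosync tools.

```shell
#!/bin/sh
# Votes needed for quorum under a simple majority rule: floor(total / 2) + 1.
quorum_threshold() {
    total_votes=$1
    echo $(( total_votes / 2 + 1 ))
}

# Succeeds (exit status 0) when the surviving partition still holds a majority.
has_quorum() {
    total_votes=$1
    live_votes=$2
    [ "$live_votes" -ge "$(quorum_threshold "$total_votes")" ]
}

for nodes in 2 3 5; do
    echo "nodes=$nodes quorum=$(quorum_threshold "$nodes")"
done
```

With two nodes the threshold is 2, so a single surviving node (1 vote) can never reach a majority on its own. That is the case the two-node settings above are meant to cover.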
Settings for SQL Server
Failover by query:
ALTER AVAILABILITY GROUP [SoLAG1] SET (ROLE = SECONDARY)
GO
EXEC sp_set_session_context @key = N'external_cluster', @value = N'yes', @read_only = 1
GO
ALTER AVAILABILITY GROUP [SoLAG1] FAILOVER
GO
Resource operations
Failover: pcs resource move ag_cluster-master SoL01 --master
pcs resource move ag_cluster-master SoL02 --master
Adjusting the number of synchronous commits
Change the number of synchronous commits: sudo pcs resource update ag_cluster required_synchronized_secondaries_to_commit=0
sudo pcs resource update ag_cluster required_synchronized_secondaries_to_commit=1
Check the setting: sudo pcs resource show ag_cluster
Operations
Checking settings
Pacemaker settings: pcs property
Resource settings: pcs resource show ag_cluster --full
Constraint settings: pcs constraint
Location settings: pcs constraint location show --full
List of fence agents: pcs stonith list
Vote count: pcs status corosync
Failover using Pacemaker commands: pcs resource move ag_cluster-master SoL02 --master
Resetting the failover count: pcs resource failcount reset ag_cluster
• Part 5: Let's operate Pacemaker! [Maintenance and operations (2)]
http://gihyo.jp/admin/serial/01/pacemaker/0005
Checking scores
Promotion Score (total score = location score + master score)
Promote: raise a node to Master
Demote: lower a node to Slave
Check constraints: pcs constraint
Promotion Score
The node with the highest Promotion Score is promoted to Master.
Promotion Score = location setting value + Master Score
Location setting value: pcs constraint location
Master Score: set by the resource agent according to the node's state
crm_simulate -sL | grep -i promotion
Master Score: crm_mon -fAr
crm_mon -rfotcARj
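The Promotion Score formula above can be tried out with made-up numbers. This is a sketch of the arithmetic only, not of Pacemaker's scheduler: it handles plain integers (no INFINITY / -INFINITY), the per-node scores are hypothetical, and letting the first node win a tie is an assumption of the sketch.

```shell
#!/bin/sh
# Promotion Score = location setting value + Master Score (integers only;
# INFINITY / -INFINITY handling is deliberately left out of this sketch).
promotion_score() {
    echo $(( $1 + $2 ))
}

# Reads "node location_score master_score" lines on stdin and prints the node
# with the highest Promotion Score (the first one wins on a tie).
pick_master() {
    best_node=""
    best_score=""
    while read -r node location master; do
        score=$(promotion_score "$location" "$master")
        if [ -z "$best_score" ] || [ "$score" -gt "$best_score" ]; then
            best_node=$node
            best_score=$score
        fi
    done
    echo "$best_node"
}

# Hypothetical scores for the three nodes used in these notes.
pick_master <<EOF
SoL01 100 10
SoL02 100 20
SoL03 1 20
EOF
```

With these numbers SoL02 is picked (120 against 110 and 21). The real values can be read from the crm_simulate and crm_mon commands above.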
• Basics and case studies of Pacemaker Master/Slave configurations (DRBD, PostgreSQL replication) @ Open Source Conference 2014
https://www.slideshare.net/tatsuyaw/pacemaker-osc2014tokyo-31882542
• Understanding Pacemaker by running it: CRM configuration, part 3
https://linux-ha.osdn.jp/wp/archives/3868
• How to build a "high-availability WordPress system" that automatically switches to a sub-server on failure, part 2 (1/3)
http://www.atmarkit.co.jp/ait/articles/1601/21/news007.html
• How to fail over without delay when the active node's data disk breaks
https://blog.3ware.co.jp/2015/04/アクティブ機のデータディスクが壊れたら遅滞な/
Adjusting scores
stickiness = how strongly a resource tends to stay where it is
Property settings
Change the setting: pcs property set default-resource-stickiness=200
Set the maximum value: pcs property set default-resource-stickiness="INFINITY"
Remove the setting: pcs property unset default-resource-stickiness
Check the setting: pcs property
Change the number of attempts (tried in order of priority): pcs property set start-failure-is-fatal=false
Specifies whether a start failure is treated as fatal for the resource. When set to false, the cluster uses the resource's failcount and migration-threshold values instead.
(For setting a resource's migration-threshold option, see "Moving Resources Due to Failure".)
https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/configuring_the_red_hat_high_availability_add-on_with_pacemaker/ch-clusteropts-haar
Resource defaults
Set the maximum value: pcs resource defaults resource-stickiness=INFINITY migration-threshold=1
migration-threshold: upper limit on the number of retries on a node
Remove the setting: pcs resource defaults resource-stickiness= migration-threshold=
Check the setting: pcs resource defaults
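How failcount and migration-threshold interact once start-failure-is-fatal is false can be sketched as a single comparison. This is a simplified reading, not Pacemaker's implementation: the function name should_move is hypothetical, and treating a threshold of 0 as "disabled" is an assumption of this sketch.

```shell
#!/bin/sh
# Succeeds when the resource should be moved away from the node: the failcount
# has reached migration-threshold (a threshold of 0 is treated as "disabled").
should_move() {
    failcount=$1
    threshold=$2
    [ "$threshold" -gt 0 ] && [ "$failcount" -ge "$threshold" ]
}

# With migration-threshold=1, a single failure is already enough.
for failcount in 0 1 2; do
    if should_move "$failcount" 1; then
        echo "failcount=$failcount threshold=1 -> move"
    else
        echo "failcount=$failcount threshold=1 -> stay"
    fi
done
```

A larger threshold lets the resource retry on the same node a few times before it is moved. In that case resetting the failcount after a repair matters more.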
Per-resource settings
Assign scores as resource meta attributes: pcs resource meta ag_cluster resource-stickiness="INFINITY" migration-threshold=1
pcs resource meta ag_cluster migration-threshold=1
pcs resource meta ag_cluster migration-threshold=
pcs resource show ag_cluster
pcs resource defaults
pcs resource meta ag_cluster resource-stickiness=200
pcs resource meta ag_cluster migration-threshold=
pcs resource show ag_cluster
Node (location / colocation) settings
Check the settings: pcs constraint show --full
pcs constraint location show --full
Remove a setting: pcs constraint location remove cli-prefer-ag_cluster-master
# After a manual failover, a constraint that gives the failover target a score of INFINITY remains, so it has to be removed.
Avoid running on a specific node: pcs constraint location ag_cluster-master avoids SoL03
pcs constraint location remove location-ag_cluster-master-SoL03--INFINITY
Adjust priority: pcs constraint location ag_cluster-master prefers SoL01=100
pcs constraint location remove location-ag_cluster-master-SoL01-100
pcs constraint location ag_cluster-master prefers SoL02=100
pcs constraint location ag_cluster-master prefers SoL03=1
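The note above about the constraint left behind by a manual failover suggests pairing the move with the cleanup. This is a sketch only: run, failover_and_clear and the DRY_RUN switch are illustrative names, the constraint ID is assumed to follow the cli-prefer-<resource> pattern shown above, and by default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print commands by default; set DRY_RUN=0 to really
# execute them on a cluster node.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Move the master role to target_node, then drop the leftover cli-prefer
# location constraint so later failovers are not pinned to that node.
failover_and_clear() {
    resource=$1
    target_node=$2
    run pcs resource move "$resource" "$target_node" --master
    run pcs constraint location remove "cli-prefer-$resource"
}

failover_and_clear ag_cluster-master SoL02
```

On a real cluster, confirm with pcs status that the promotion has finished before removing the constraint. The sketch does not wait for that.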
