© 2015 NTT Software Innovation Center
Sampath Priyankara
NTT Software Innovation Center
2016/06/23
Current Status and Future Direction of the Virtual Machine HA Feature
Copyright © 2015 NTT corp. All Rights Reserved.
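What is the virtual machine HA feature?
It provides high availability for virtual machines
When a virtual machine (VM) fails, the feature recovers it automatically, without operator intervention.
Why the virtual machine HA feature is needed
• Pet VMs exist [1]
• Not every application is cloud native
• OSS for achieving high availability of virtual machines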
[1] http://www.slideshare.net/randybias/pets-vs-cattle-the-elastic-cloud-story
http://www.theregister.co.uk/2013/03/18/servers_pets_or_cattle_cern/
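Requirements for virtual machine HA
The community is discussing a user story for virtual machine HA. [2]
[2] http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html
Operations:
• Capacity Reservation
  Always keep enough free capacity to recover virtual machines
• Host Maintenance
  Disable the virtual machine HA feature during planned maintenance and similar work
• Event History
  History of past events, etc.
Failure detection:
• Compute node failure
• Process failure
  • qemu-kvm process failure (VM crashes)
  • nova-compute process failure
• VM failure
  • I/O errors, etc.
• Other failures
  • Network component fails
  • AZ, DC, Region failure
• Virtual machines recover from failures automatically
• The virtual machine HA feature can be configured per VM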
OpenStack HA Team
Main members
• Andrew Beekhof @REDHAT
• Ken Gaillot @REDHAT
• Michele Baldessari @REDHAT
• Adam Spiers @SUSE
• Dawid Deja @Intel
NTT Group
• SIC: Sampath Priyankara, Masahito Muroi, Toshikazu Ichikawa
• NTT Data: Takashi Kajinami
• NTT Data Inc: Tushar Patil
Team information
ML: openstack-dev@lists.openstack.org subject: [openstack-dev] [HA]
IRC: http://eavesdrop.openstack.org/#High_Availability_Meeting
Meeting time:
Every Monday, 17:00 JST
Questions and chat: #openstack-ha
Discussion at the Austin Summit
http://clusterlabs.org/pipermail/users/2016-May/002864.html
Photo credit goes to Ken Gaillot
We discussed how to move virtual machine HA forward.
When: ad hoc during the Austin Summit, 4/26 – 4/29
Etherpad:
https://etherpad.openstack.org/p/newton-instance-ha
The etherpad records the discussion from the first day onward, in order from the top
Lunch meeting at the summit venue
Existing virtual machine HA solutions
Source:
Presenters: Adam Spiers, Dawid Deja
https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation
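What was discussed at the Austin Summit
Several approaches have been proposed, each with its own strengths and weaknesses, and the community has not yet settled on a direction
• Outcome of this discussion
  • The team will pursue official project status (already discussed with the release team)
  • As the proper (long-term) solution, implement Masakari-equivalent functionality with Mistral workflows
    → Discussion continues in the community
  • As the short-term solution, proceed with integrating Masakari and resource-agent
• Team ToDo items
  • Improve the quality of Mistral (high availability, reliability, etc.) (Intel)
  • Integrate Masakari with resource-agent (NTT, SUSE, Red Hat)
  • Add an alert feature to Pacemaker (Red Hat)
  • Add VM state monitoring to resource-agent (SUSE)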
OpenStack Instance HA - Short term implementation
[Diagram: two ComputeNodes, each running pacemaker with Resource-agent (libvirt-monitor) and Resource-agent (process-monitor); VM; HA Controller; Mistral workflows; Nova; Congress; Monasca; operator]
1. Resource-agent/pacemaker sends the alert to the HA Controller
3. Mistral workflow calls the Nova API to evacuate/restart/rebuild, etc.
4. Nova performs the rescue task
Role of HA Controller (leverage Masakari’s code)
1. Avoid false/duplicate alerts, prioritize alerts
2. API to get past/ongoing Alert/Action details
3. Stop recursive rescue of a VM after a configurable retry count
4. Control the parallel processing evacuate/migrate count
5. Fail-over-host management
6. Capacity reservation of fail-over-hosts (i.e. block the Nova scheduler from selecting the node for new VMs)
7. Resume non-processed alerts when the HA Controller fails over
Need HA !!!
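Roles 1 and 3 of the HA Controller (dropping duplicate alerts and stopping rescue after a retry count) can be sketched roughly as below. This is a minimal illustration, not Masakari's code: the `Alert` shape, the `HAController` class, and the `accept`/`finish` method names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert shape; the slides do not define one."""
    kind: str    # e.g. "vm", "process", "host"
    target: str  # VM ID or host name


@dataclass
class HAController:
    """Toy stand-in for the HA Controller's alert bookkeeping."""
    max_retries: int = 3
    in_progress: set = field(default_factory=set)
    attempts: dict = field(default_factory=dict)

    def accept(self, alert: Alert) -> bool:
        """Return True if a rescue should be started for this alert."""
        if alert in self.in_progress:
            return False  # duplicate of an alert that is already being handled
        if self.attempts.get(alert.target, 0) >= self.max_retries:
            return False  # stop rescuing a target that keeps failing
        self.in_progress.add(alert)
        self.attempts[alert.target] = self.attempts.get(alert.target, 0) + 1
        return True

    def finish(self, alert: Alert) -> None:
        """Mark the rescue for this alert as completed."""
        self.in_progress.discard(alert)
```

A real controller would also need to persist this state so that unprocessed alerts can be resumed after a fail-over (role 7); the sketch keeps everything in memory.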
OpenStack Instance HA - Long term implementation
[Diagram: two ComputeNodes, each running pacemaker with Resource-agent (libvirt-monitor) and Resource-agent (process-monitor); VM; Monasca; Mistral workflows; Congress; Nova; Nova scheduler]
Resource-agent/pacemaker sends the alert to Monasca
Mistral workflow calls the Nova API to evacuate/restart/rebuild, etc.
Nova performs the rescue task
1. Need to ignore false alerts and prioritize alerts; some cases need to be neglected, such as a VM failure that comes right before a host failure.
2. Mistral API to get the past flow execution details
3. If a VM fails frequently even after auto-recovery, rescue of the VM must stop after a configurable retry count
4. Control the parallel processing evacuate/migrate count
5. Need to implement a custom filter in Nova to select the appropriate fail-over-host using the same shared storage
6. For capacity reservation, add Congress consultation to Nova (#5) so that the request is passed or rejected based on the request context, such as VM creation by a user or evacuation by the system to the fail-over-hosts
7. Avoid alert losses in Monasca failover
Still under discussion in the community
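Item 1 above (neglecting a VM failure that comes right before a host failure) might look something like the following filter. This is only a sketch under assumptions: the `TimedAlert` fields, the 30-second `window` default, and the "host alerts first" ordering are all invented for illustration and are not taken from Monasca, Mistral, or the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedAlert:
    """Hypothetical alert record with an arrival time in seconds."""
    kind: str         # "vm" or "host"
    host: str         # host the alert refers to
    timestamp: float
    vm_id: str = ""


def drop_superseded_vm_alerts(alerts, window=30.0):
    """Drop VM alerts that a host alert on the same host follows within `window` seconds."""
    host_failures = [a for a in alerts if a.kind == "host"]
    kept = []
    for a in alerts:
        superseded = a.kind == "vm" and any(
            h.host == a.host and 0 <= h.timestamp - a.timestamp <= window
            for h in host_failures
        )
        if not superseded:
            kept.append(a)
    # Assumed priority: host failures first, then by arrival time.
    return sorted(kept, key=lambda a: (a.kind != "host", a.timestamp))
```

The idea is that evacuating the whole host already covers the VM, so rebuilding it separately would be wasted (or conflicting) work.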
Discussion continues in the community
(1) Work in the Product Working Group
High Availability for VMs
https://wiki.openstack.org/wiki/ProductTeam/User_Stories/HA_VMs
User story
http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html
Gerrit:
https://review.openstack.org/#/c/318431/
(2) Discussion in the OpenStack HA Team
ML: openstack-dev@lists.openstack.org subject: [openstack-dev] [HA]
IRC: http://eavesdrop.openstack.org/#High_Availability_Meeting
Past logs: http://eavesdrop.openstack.org/meetings/ha/
Meeting time:
Every Monday, 17:00 JST
Questions and chat: #openstack-ha
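NTT's work
Masakari
https://github.com/ntt-sic/masakari
• Automatically recovers virtual machines when they fail
• Recovers within 5 minutes
Failure detection:
• VM failure
  • Libvirt event monitoring
• Process failure
  • Process liveness monitoring
• Compute node failure
  • Uses pacemaker
Try Masakari
A PoC that uses Vagrant is available.
https://github.com/ntt-sic/masakari-deploy
Operations:
Capacity Reservation
• Management of reserved hosts
Event History
• Masakari's various logs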
Masakari Architecture Overview
[Diagram labels: Compute Nodes; Controller Nodes & Backend Nodes]
Recovery from VM failure
[Diagram: two Hosts, each running Libvirt and a Libvirt Monitor; VMs shown are VM1, VM2, VM3, VM5, VM6, with one VM marked Down; Masakari and Nova sit above the hosts]
Libvirt Monitor: Detect VM down
1. Notify down VM’s Info (VM-ID, Host Name, etc.)
2. Call Rebuild API for the down VM
3. Rebuild the VM
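The flow above (the monitor notifies the controller, and the controller asks Nova to rebuild) can be sketched as follows. This is a hedged illustration rather than Masakari's implementation: `VMDownNotification`, the `ComputeAPI` protocol, and `RecordingCompute` are hypothetical names, and the fake compute object stands in for Nova so the example runs without a cloud.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class VMDownNotification:
    """Step 1 payload, using the fields named on the slide (VM-ID, Host Name)."""
    vm_id: str
    host_name: str


class ComputeAPI(Protocol):
    """Hypothetical wrapper around Nova; not a real client class."""
    def rebuild(self, vm_id: str) -> None: ...


class RecordingCompute:
    """In-memory fake so the sketch runs without an OpenStack cloud."""
    def __init__(self) -> None:
        self.rebuilt = []

    def rebuild(self, vm_id: str) -> None:
        self.rebuilt.append(vm_id)


def handle_vm_down(notification: VMDownNotification, compute: ComputeAPI) -> None:
    """Step 2: ask the compute service to rebuild the VM that went down."""
    compute.rebuild(notification.vm_id)
```

Keeping the compute dependency behind a small interface makes the controller logic testable on its own; a real deployment would plug in an actual Nova client there.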
Recovery from process failure
[Diagram: Host A and Host B, each running Libvirt, Nova-compute, and a Process Monitor, with one process marked Down; Masakari and Nova sit above the hosts]
1. Restart the manager process when it’s down
2. Notify Masakari that the manager process is down if it fails to restart after a few attempts
3. Notify Nova to disable scheduling for Host A
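Steps 1 and 2 above (try to restart locally, and escalate only after repeated failures) could be sketched like this. The function name `supervise_process`, its callback parameters, and the default of three restart attempts are assumptions for illustration; the slide only says "few times".

```python
from typing import Callable


def supervise_process(
    is_running: Callable[[], bool],
    restart: Callable[[], None],
    notify_down: Callable[[], None],
    max_restarts: int = 3,
) -> str:
    """Run one monitoring pass and return "ok", "restarted", or "notified"."""
    if is_running():
        return "ok"
    # Step 1: try to bring the process back locally.
    for _ in range(max_restarts):
        restart()
        if is_running():
            return "restarted"
    # Step 2: local restarts failed, so escalate to the controller.
    notify_down()
    return "notified"
```

Passing the checks and actions in as callables keeps the sketch independent of how a process is actually inspected or restarted on the host.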
Recovery from Compute node failure
[Diagram: Host A and Host B, each running pacemaker (RAs, CIB, Node’s Status; Start/Stop/Monitor), a WatchDog & Shutdowner, and a Host Fail Monitor that polls pacemaker; the hosts exchange heartbeat communications; one host is marked Down; Masakari and Nova sit above the hosts]
Host Fail Monitor: Check its Host’s status (polling)
1. Notify another host down
2. Call Evacuate API for all VMs on Host B
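Step 2 above (evacuating every VM on the failed host) might be sketched as below. `FakeCompute` and its `list_vms`/`evacuate` methods are hypothetical stand-ins for Nova, used so the example is runnable on its own; they are not real client calls.

```python
from typing import Dict, List


class FakeCompute:
    """In-memory stand-in for Nova; method names are hypothetical."""

    def __init__(self, placement: Dict[str, List[str]]) -> None:
        self.placement = placement  # host name -> IDs of VMs on that host
        self.evacuated: List[str] = []

    def list_vms(self, host: str) -> List[str]:
        return list(self.placement.get(host, []))

    def evacuate(self, vm_id: str) -> None:
        self.evacuated.append(vm_id)


def handle_host_down(host: str, compute: FakeCompute) -> List[str]:
    """Evacuate every VM that was running on the failed host."""
    vms = compute.list_vms(host)
    for vm_id in vms:
        compute.evacuate(vm_id)
    return vms
```

For example, `handle_host_down("host-b", FakeCompute({"host-b": ["vm-5", "vm-6"]}))` asks the fake to evacuate both VMs in order. The sketch evacuates sequentially; limiting how many evacuations run in parallel is a separate concern.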
The future of Masakari
Masakari is being proposed as an OpenStack project
https://review.openstack.org/#/c/330370/
https://launchpad.net/masakari
ToDo:
• Masakari API
  • Failover Segment
  • Capacity Reservation
  • Host Maintenance
  • Event History
  https://github.com/ntt-sic/masakari/wiki/Masakari-API-Design
• OpenStack resource-agent integration
  • Use openstack-resource-agent in place of each masakari-monitor
Thank you for your attention.


Editor's Notes

  • #13 This is a quick architecture overview of Masakari. Masakari is roughly divided into 2 parts. One is the Masakari controller, shown in the light blue box at the top of the slide. The other is the set of state monitoring processes, shown in the boxes at the bottom of the slide. The controller process is in charge of calling the OpenStack API depending on the type of notification it receives from the monitoring processes. Each monitoring process watches for one type of error that Masakari wants to detect. In the following slides, I’ll show how each monitoring process detects a different kind of error.
  • #17 Here are quick instructions for Masakari. Before setting up Masakari, there are 2 prerequisites. Masakari assumes that Compute Nodes use KVM as their virtualization technology and share storage for ephemeral disks, such as NFS or Ceph. 2.