© 2015 NTT Software Innovation Center
Virtual Machine High Availability and
Destructive Testing Framework
Sampath Priyankara
SIC I三P I方D
Copyright©2015 NTT corp. All Rights Reserved
Virtual Machine High Availability
Masakari
What is Masakari?
Service : Instances High Availability Service
Mission:
Provide instances high availability service
for OpenStack clouds by automatically
recovering the instances from failures.
https://governance.openstack.org/tc/reference/projects/masakari.html
Masakari in action
Virtual Machine High Availability
with Platform9 OpenStack
• https://platform9.com/blog/virtual-machine-high-availability-platform9-openstack/
• https://www.openstack.org/summit/boston-2017/summit-schedule/events/17502/zero-touch-high-availability-with-masakari-and-consul
Project History
[2015 Jul] Started on Github
https://github.com/ntt-sic/masakari [no longer maintained]
[2016 Jun] Moved into the OpenStack namespace and rebuilt
[2017 Oct] Became official OpenStack project
Related work:
[openstack-ha team]
Define specs for a converged upstream solution for VMHA
Wiki: https://wiki.openstack.org/wiki/Meetings/HATeamMeeting
Spec: https://review.openstack.org/#/q/project:openstack/openstack-resource-agents-specs
[OpenStack Product Working Group]
Development Proposal for VMHA
Wiki: https://wiki.openstack.org/wiki/ProductTeam
Feature: http://featuretracker.openstack.org/projectDetail/0003
Project Structure
Masakari
• Masakari
• Masakari Engine and APIs
• Repositories: openstack/masakari
• masakari-monitors
• Monitors to detect failures and send notifications to masakari API
• Repositories: openstack/masakari-monitors
• python-masakariclient
• Python wrapper for masakari APIs
• Repositories: openstack/python-masakariclient
• Masakari Specifications
• masakari-specs
• Specifications for all masakari projects
• Repositories: openstack/masakari-specs
Masakari Architecture
masakari
masakari-monitors
python-masakariclient
Masakari Features
• Supported failure models
  • Host failures, Process failures, Instance failures
• Full set of monitors to detect failures
• Client library for easy operation
• Customizable host recovery workflow
  • Auto (use nova scheduler), Reserved host, Hybrid
• Masakari APIs:
  • Failover Segments (segments): Define boundaries and behaviors for failover
  • Hosts (hosts): Define compute nodes and reserved compute nodes
  • Notifications (notifications): Receive failure notifications from monitors
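The relationship between segments, hosts, notifications, and the host recovery workflow can be sketched as a small toy model. This is not Masakari's code or API: every class, field, and return string below is hypothetical, and the meaning of "hybrid" (try a reserved host first, then fall back to the scheduler) is an assumption, since the slide does not define it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    """A compute node registered in a segment (hypothetical model)."""
    name: str
    reserved: bool = False  # reserved hosts are kept free as failover targets


@dataclass
class FailoverSegment:
    """Boundary for failover: instances only move between hosts of one segment."""
    name: str
    recovery_method: str  # "auto", "reserved_host", or "hybrid" (labels from the slide)
    hosts: List[Host] = field(default_factory=list)


@dataclass
class Notification:
    """Failure report sent by a monitor (hypothetical fields)."""
    failure_type: str  # "host", "process", or "instance"
    hostname: str


def find_reserved_host(segment: FailoverSegment, failed: str) -> str:
    """Return the first reserved host other than the failed one, or ''."""
    for host in segment.hosts:
        if host.reserved and host.name != failed:
            return host.name
    return ""


def plan_recovery(segment: FailoverSegment, note: Notification) -> str:
    """Decide what to do for one notification in this toy model."""
    if note.failure_type != "host":
        # Process and instance failures do not need another host here.
        return "recover-in-place"
    if segment.recovery_method == "auto":
        return "evacuate-via-scheduler"
    target = find_reserved_host(segment, note.hostname)
    if target:
        return "evacuate-to:" + target
    if segment.recovery_method == "hybrid":
        # Assumption: hybrid falls back to the scheduler when no reserved host is free.
        return "evacuate-via-scheduler"
    raise RuntimeError("no reserved host available in segment " + segment.name)
```

For example, a segment using "reserved_host" with hosts c1 and a reserved c2 would plan "evacuate-to:c2" when a host failure is reported for c1.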
Current contributors
Mitaka to Queens: http://stackalytics.com/?module=masakari&release=all
In Queens: http://stackalytics.com/?module=masakari&release=queens
Work in progress
• Recovery Method customization
• [spec] https://review.openstack.org/#/c/458023/
• Intrusive Instance Monitoring
• [spec] https://review.openstack.org/#/c/469070/
• Add event notification feature to masakari
• [spec] https://review.openstack.org/#/c/473057/
• Masakari Horizon plugin
• In progress
Future work items
• Ironic Bare Metal Instance HA
• Volume boot Bare Metal Instance
• External monitoring mechanism
• Fencing mechanism
• Instance recovery workflow
• Force STONITH and Fencing
• Node health prediction
• Resource migration
• Forcefully bring down the node
• New masakari-monitors
• Something other than Pacemaker (e.g., Consul)
Masakari Community
Where to find us
IRC: #openstack-masakari
ML: openstack-dev with the [masakari] subject tag
Weekly IRC meeting
Agenda:
https://wiki.openstack.org/wiki/Meetings/Masakari
Tuesdays at 04:00 UTC
http://eavesdrop.openstack.org/#Masakari_Team_Meeting
Destructive Testing Framework
Eris
What is Eris?
Destructive test framework for OpenStack
• Test robustness/resiliency of OpenStack at the
CI/CD gates.
• Test different architectures on 3rd party CIs.
• Common test cases developed by community.
• Automated evaluation of KPIs and report
generation.
Current Team
• NTT, AT&T, Intel (LCOO WG members)
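The idea of injecting a failure and then automatically evaluating a KPI can be illustrated with a minimal, self-contained sketch. It does not use Eris or any OpenStack API: the fake service, the tick-based clock, and the "recovery within N ticks" KPI are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeService:
    """Stand-in for a service under test; it recovers a fixed number of ticks after a fault."""
    recovery_ticks: int
    down_for: int = -1  # -1 means healthy; otherwise ticks elapsed since the fault

    def inject_fault(self) -> None:
        """Simulate a destructive action such as killing a process."""
        self.down_for = 0

    def tick(self) -> None:
        """Advance simulated time by one step and recover when due."""
        if self.down_for >= 0:
            self.down_for += 1
            if self.down_for >= self.recovery_ticks:
                self.down_for = -1

    def healthy(self) -> bool:
        return self.down_for < 0


def run_destructive_test(service: FakeService, max_recovery_ticks: int,
                         timeout: int = 100) -> dict:
    """Inject a fault, wait for recovery, and evaluate the recovery-time KPI."""
    service.inject_fault()
    elapsed = 0
    while not service.healthy() and elapsed < timeout:
        service.tick()
        elapsed += 1
    recovered = service.healthy()
    return {
        "recovered": recovered,
        "recovery_ticks": elapsed,
        "kpi_met": recovered and elapsed <= max_recovery_ticks,
    }
```

A real test would replace the fake service with an actual fault injector and health probe, and would feed the resulting dictionary into report generation.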
About Eris
Eris Wiki:
https://openstack-lcoo.atlassian.net/wiki/spaces/LCOO/pages/13393034/Eris+-+Extreme+Testing+Framework+for+OpenStack
Summit Forum:
• https://etherpad.openstack.org/p/SYD-extreme-testing
• https://etherpad.openstack.org/p/LCOO-Extreme_Testing-QA-ERIS
Demo:
https://openstack-lcoo.atlassian.net/wiki/spaces/LCOO/pages/22872097/Extreme+Testing+Demo
Forum Discussion
Similar Efforts
• OPNFV Yardstick [1]
• enos: Experimental eNvironment for OpenStack
New requirements
• Scale/performance testing for nova
• Integrate with OPNFV testing tools
Discussion with QA team
• PoC demo
• Discuss future roadmap
[1] https://wiki.opnfv.org/display/yardstick/Yardstick
Work Items
• Create SIG
• Discuss with other communities such as OPNFV, nova, Rally, etc.
• Update QA spec with details in the Eris wiki
• Probably divide into several specs
• Enhance Eris PoC and integrate into Masakari CI
Self-healing SIG
Self-healing SIG
Manage OpenStack infrastructure in a policy-driven
fashion, reacting to failures and other events by
automatically healing or optimizing services.
Self-healing SIG Members
HA of individual services
Monasca: monitoring
Aodh: alarming
Congress: policy-based governance
Mistral: workflow
Senlin: clustering service
Vitrage: root cause analysis
Watcher: optimization
Masakari: compute plane HA
Freezer-dr: compute plane HA
Heat: orchestration (normally used for cloud applications, but can also deploy
cloud infrastructure via TripleO)
Doctor: fault management and maintenance for NFV
Fault Genes Working Group: Fault classification & Recovery Strategy
Craton: Fleet management
Kolla: Containerized OpenStack deployment tool
kolla-k8s: same as above, but in a Kubernetes cluster

OpenStack Sydney summit - OpenStack HA and Testing
