AUTO-SCALE A SELF-HEALING CLUSTER IN OPENSTACK
2018 Việt Nam OpenInfraDay
Rico Lin, irc: ricolin <rico.lin@easystack.cn> @ EasyStack
Hello everyone, my name is Rico Lin and I come from Taiwan. This is my first time in Vietnam, and I am really enjoying it. Today I will share with you the topic AUTO-SCALE A SELF-HEALING CLUSTER IN OPENSTACK.
October 2018
_____________ A _______________ CLUSTER IN OPENSTACK
A Unit in an Application Cluster
Diagram: a unit made of a Network and Subnet, a Load balancer with a Floating IP, a Pool with a Health monitor, and a Pool Member pointing at a Nova server that runs Nginx.
Unit with Heat
Diagram: a Software Config is delivered to the Nova Server through a Software Deploy. On the server, os-collect-config, os-refresh-config and os-apply-config hand the deployment to a hook (for example, the kubelet hook runs kubelet to start the web server); when the hook is done, config-notify signals Heat. The rest of the unit is unchanged: Network, Subnet, Load balancer, Floating IP, Pool, Health monitor, Pool Member and Nginx.
Hooks you can install:
● heat-config-ansible
● heat-config-apply-config
● heat-config-cfn-init
● heat-config-chef
● heat-config-docker-cmd
● heat-config-docker-compose
● heat-config-hiera
● heat-config-json-file
● heat-config-kubelet
● heat-config-puppet
● heat-config-salt
● heat-config-script
You can also write your own hook.
Signal transports:
● CFN_SIGNAL
● TEMP_URL_SIGNAL
● NO_SIGNAL
● HEAT_SIGNAL
● ZAQAR_SIGNAL
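The slide says you can write your own hook. Below is a minimal sketch of one in Python, assuming the contract the bundled heat-config hooks follow as we understand it: the deployment arrives as JSON on stdin, and the hook answers with JSON carrying deploy_stdout, deploy_stderr, deploy_status_code and one key per declared output. The hook's behaviour (reporting a summary) and the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical custom heat-config hook (a sketch, not a shipped hook).

Assumed contract: the deployment arrives as JSON on stdin, and the hook
prints a JSON object with deploy_stdout, deploy_stderr and
deploy_status_code, plus one key per declared output.
"""
import json
import sys


def handle_deployment(deployment):
    """Build the response for one deployment."""
    # Inputs are a list of {"name": ..., "value": ...} entries.
    inputs = {i["name"]: i.get("value") for i in deployment.get("inputs") or []}
    config = str(deployment.get("config", ""))
    # A real hook would apply `config` here, e.g. run a tool or write files.
    message = "received %d bytes of config and %d inputs" % (len(config), len(inputs))
    response = {
        "deploy_stdout": message,
        "deploy_stderr": "",
        "deploy_status_code": 0,
    }
    for output in deployment.get("outputs") or []:
        # Echo a same-named input if present, otherwise the summary message.
        response[output["name"]] = inputs.get(output["name"], message)
    return response


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read one deployment from stdin and write the JSON response."""
    json.dump(handle_deployment(json.load(stdin)), stdout)
    return 0
```

To use it as a real hook, the file would end with sys.exit(main()) and be installed as an executable in the agent's hooks directory, named after the SoftwareConfig group (we believe the directory is /var/lib/heat-config/hooks/, but check it against your agent image).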
Heat Container Agent
Heat container agents [sample in repo]
Diagram: the same unit as on the previous slide, except that the agent (os-collect-config, os-refresh-config, os-apply-config, the hooks and config-notify) runs in Docker containers on the Nova server instead of being installed in the server image.
Heat container agents [sample in repo]
config:
  type: OS::Heat::SoftwareConfig
  properties:
    group: script
    outputs:
      - name: result
    config: { get_file: example-script.sh }
deployment:
  type: OS::Heat::SoftwareDeployment
  properties:
    config: { get_resource: config }
    server: { get_resource: server }
start_container_agent:
  # Boot-time script (start-container-agent.sh) that installs and starts the agent
  type: OS::Heat::SoftwareConfig
  properties:
    group: ungrouped
    config: {get_file: ./start-container-agent.sh}
server:
  type: OS::Nova::Server
  properties:
    image: {get_param: image}
    flavor: {get_param: flavor}
    key_name: {get_param: key_name}
    networks:
      - network: {get_param: private_net}
    security_groups:
      - {get_resource: the_sg}
    user_data_format: SOFTWARE_CONFIG
    user_data: {get_attr: [start_container_agent, config]}
#!/bin/bash
# start-container-agent.sh: install and start the heat-container-agent service
set -ux
# heat-container-agent systemd service
cat <<EOF > /etc/systemd/system/heat-container-agent.service
[Unit]
Description=Heat Container Agent
After=docker.service
Requires=docker.service
[Service]
TimeoutSec=5min
RestartSec=5min
User=root
Restart=on-failure
ExecStartPre=-/usr/bin/docker rm -f heat-container-agent
ExecStartPre=-/usr/bin/docker pull docker.io/rico/heat-container-agent
ExecStart=/usr/bin/docker run --name heat-container-agent \
    --privileged \
    --net=host \
    -v /run/systemd:/run/systemd \
    -v /etc/sysconfig:/etc/sysconfig \
    -v /etc/systemd/system:/etc/systemd/system \
    -v /var/lib/heat-cfntools:/var/lib/heat-cfntools \
    -v /var/lib/cloud:/var/lib/cloud \
    -v /tmp:/tmp \
    -v /etc/hosts:/etc/hosts \
    docker.io/rico/heat-container-agent
ExecStop=/usr/bin/docker stop heat-container-agent
[Install]
WantedBy=multi-user.target
EOF
# enable and start heat-container-agent
chmod 0640 /etc/systemd/system/heat-container-agent.service
/usr/bin/systemctl enable heat-container-agent.service
/usr/bin/systemctl start --no-block heat-container-agent.service
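The template declares an output named result for example-script.sh, but the slide does not show the script. As a hedged illustration, here is a hypothetical Python stand-in (the script hook runs the config as an executable, so a Python shebang should work), assuming the hook passes inputs as environment variables and reads each output back from the file "<heat_outputs_path>.<output name>".

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for example-script.sh, which the slide does not show.

Assumes the heat-config "script" hook convention: deployment inputs arrive
as environment variables, and each declared output (here "result") is read
back from the file "<heat_outputs_path>.<output name>" after the run.
"""
import os
import socket


def write_output(name, value, env=None):
    """Write one declared output where the script hook will collect it."""
    env = os.environ if env is None else env
    path = "%s.%s" % (env["heat_outputs_path"], name)
    with open(path, "w") as out:
        out.write(value)
    return path


def run(env=None):
    """Do the (trivial) configuration work and report it as "result"."""
    message = "configured on %s" % socket.gethostname()
    return write_output("result", message, env)


# Only act when launched by the hook, which sets heat_outputs_path.
if __name__ == "__main__" and "heat_outputs_path" in os.environ:
    run()
```

Once the deployment completes, the contents of that file should show up as the deployment's result output, e.g. via get_attr on the deployment resource.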
Demo
_____________ A SELF-HEALING CLUSTER IN OPENSTACK
Self Healing
Diagram: a self-healing loop with the stages Meter, Signal, Trigger and Fix, built from XXX::Server, XXX::Alarm, XXX::Signal and XXX::Workflow resources (XXX::AutoScaling is shown as well).
How do you meter?
How do you handle the signal?
How do you trigger a fix job?
What does "meter" mean to you?
Self Healing
server:
  type: OS::Nova::Server
  properties:
    ...
alarm_queue:
  type: OS::Zaqar::Queue
error_event_alarm:
  # Fires when Nova reports this server in the error state
  type: OS::Aodh::EventAlarm
  properties:
    event_type: compute.instance.update
    query:
      - field: traits.instance_id
        value: {get_resource: server}
        op: eq
      - field: traits.state
        value: error
        op: eq
    alarm_queues:
      - {get_resource: alarm_queue}
alarm_subscription:
  # Starts the autoheal workflow for each message posted to the queue
  type: OS::Zaqar::MistralTrigger
  properties:
    queue_name: {get_resource: alarm_queue}
    workflow_id: {get_resource: autoheal}
    input:
      stack_id: {get_param: "OS::stack_id"}
      root_stack_id:
        if:
          - is_standalone
          - {get_param: "OS::stack_id"}
          - {get_param: "root_stack_id"}
autoheal:
  type: OS::Mistral::Workflow
  properties:
    description: >
      Mark a server as unhealthy and commence a stack update
      to replace it.
    input:
      stack_id:
      root_stack_id:
    type: direct
    tasks:
      - name: resources_mark_unhealthy
        action:
          list_join:
            - ' '
            - - heat.resources_mark_unhealthy
              - stack_id=<% $.stack_id %>
              - resource_name=<% env().notification.body.reason_data.event.traits.where($[0] = 'instance_id').select($[2]).first() %>
              - mark_unhealthy=true
              - resource_status_reason='Marked by alarm'
        on_success:
          - stacks_update
      - name: stacks_update
        action: heat.stacks_update stack_id=<% $.root_stack_id %> existing=true
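The least obvious part of the autoheal workflow is the YAQL that picks the failed server out of the alarm notification. A minimal Python sketch of the same lookup, assuming each event trait is a [name, type, value] list as the expression's $[0] and $[2] imply; the function names are hypothetical, and the second function rebuilds the action string that list_join assembles.

```python
"""Sketch: what the autoheal workflow's expressions evaluate to.

Assumption: the Aodh event-alarm notification body carries the triggering
event under reason_data.event, with each trait as a [name, type, value] list.
"""


def instance_id_from_body(body):
    """Python equivalent of
    traits.where($[0] = 'instance_id').select($[2]).first()."""
    traits = body["reason_data"]["event"]["traits"]
    values = [trait[2] for trait in traits if trait[0] == "instance_id"]
    if not values:
        # Fail loudly instead of marking some other resource unhealthy.
        raise LookupError("alarm event has no instance_id trait")
    return values[0]


def mark_unhealthy_action(stack_id, body):
    """Rebuild the action string that list_join(' ', [...]) produces."""
    return " ".join([
        "heat.resources_mark_unhealthy",
        "stack_id=%s" % stack_id,
        "resource_name=%s" % instance_id_from_body(body),
        "mark_unhealthy=true",
        "resource_status_reason='Marked by alarm'",
    ])
```

In the real workflow, Heat evaluates list_join when it creates the workflow, and Mistral evaluates the <% %> parts at run time; the sketch collapses both steps into one call.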
OpenStack Self Healing SIG [link]
Demo
AUTO-SCALE A SELF-HEALING CLUSTER IN OPENSTACK
Auto Scaling
Diagram: an auto-scaling loop with the stages Meter, Trigger, Signal and Scale, built from an XXX::Alarm, a ScalingPolicy and an AutoScalingGroup.
What should you alarm on?
What should you scale?
Auto Scaling https://github.com/openstack/heat-templates/tree/master/hot/autoscaling.yaml
resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 3
      resource:
        type: lb_server.yaml
        properties:
          flavor: {get_param: flavor}
          image: {get_param: image}
  web_server_scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: 1
      # min_adjustment_step:
  web_server_scaledown_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: -1
  cpu_alarm_high:
    type: OS::Aodh::GnocchiAggregationByResourcesAlarm
    properties:
      description: Scale up if CPU > 80%
      metric: cpu_util
      aggregation_method: mean
      granularity: 300
      evaluation_periods: 1
      threshold: 80
      resource_type: instance
      comparison_operator: gt
      alarm_actions:
        - str_replace:
            template: trust+url
            params:
              url: {get_attr: [web_server_scaleup_policy, signal_url]}
      query:
        list_join:
          - ''
          - - {'=': {server_group: {get_param: "OS::stack_id"}}}
  cpu_alarm_low:
    type: OS::Aodh::GnocchiAggregationByResourcesAlarm
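To reason about cpu_alarm_high, it helps to model when it changes state. This is a simplified sketch, not Aodh's code: it assumes Gnocchi has already reduced cpu_util to one mean value per 300-second granularity bucket across the group's instances, and that the alarm only moves to alarm or ok when all of the last evaluation_periods values agree. When the state becomes alarm, Aodh calls the trust+ signal URL listed in alarm_actions.

```python
"""Simplified model of when a threshold alarm like cpu_alarm_high fires.

Assumptions: `points` are already-aggregated values (one mean per
granularity bucket, oldest first), and a state change needs all of the
last `evaluation_periods` values to agree. Illustration only.
"""
import operator

COMPARATORS = {
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
    "eq": operator.eq, "ne": operator.ne,
}


def evaluate(points, threshold=80.0, comparison_operator="gt", evaluation_periods=1):
    """Return 'alarm', 'ok', 'insufficient data' or 'unchanged'."""
    recent = points[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "insufficient data"
    compare = COMPARATORS[comparison_operator]
    breaches = [compare(value, threshold) for value in recent]
    if all(breaches):
        return "alarm"
    if not any(breaches):
        return "ok"
    return "unchanged"  # mixed results: keep the previous state
```

With evaluation_periods: 1, a single five-minute mean above 80% is enough to signal the scale-up policy; raising evaluation_periods makes the alarm less sensitive to short spikes.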
resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 3
      resource:
        type: lb_server.yaml
        properties:
          flavor: {get_param: flavor}
          image: {get_param: image}
  web_server_scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: 1
  monitoring:
    type: monitor.yaml
    properties:
      url: {get_attr: [web_server_scaleup_policy, signal_url]}
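Diagram: one Stack holds the monitor service, the ScalingPolicy and the AutoScalingGroup (instances 1 to N). 1. Metering: the monitor service meters the instances. 2. Alarm: it signals the ScalingPolicy. 3. Scale: the policy resizes the AutoScalingGroup.
Choose your own structure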
resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 3
      resource:
        type: lb_server.yaml
        properties:
          flavor: {get_param: flavor}
          image: {get_param: image}
  web_server_scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: 1
outputs:
  signal_url:
    value: {get_attr: [web_server_scaleup_policy, signal_url]}
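Diagram: the monitor service runs outside the Stack, which exports the ScalingPolicy's signal_url as an output. 1. Metering: the monitor service meters instances 1 to N in the AutoScalingGroup. 2. Alarm: it signals the ScalingPolicy through signal_url. 3. Scale: the policy resizes the AutoScalingGroup.
Choose your own structure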
# 1. Get a token from Keystone; it is returned in the X-Subject-Token response header.
curl -i -H "Content-Type: application/json" \
  -d '{ "auth": { "identity": { "methods": ["password"],
        "password": { "user": { "name": "admin", "domain": { "id": "default" }, "password": "password" } } },
        "scope": { "project": { "name": "admin", "domain": { "id": "default" } } } }}' \
  http://$KEYSTONE/identity/v3/auth/tokens ; echo
# 2. Signal the scaling policy with that token.
curl -i -H "X-Auth-Token: $TOKEN" -X POST $Signal_url
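The same two calls can be made from Python with only the standard library. This is a sketch: the host name, credentials and signal URL are placeholders, and the requests are only built here; send() is the function that performs the network call.

```python
"""Build the Keystone token request and the scaling-policy signal request.

Assumptions: "keystone.example" and the signal URL are placeholders, and the
admin/password credentials only mirror the slide. Keystone returns the new
token in the X-Subject-Token response header.
"""
import json
import urllib.request


def token_request(keystone, user, password, project, domain_id="default"):
    """Keystone v3 password-auth request scoped to a project."""
    body = {"auth": {
        "identity": {
            "methods": ["password"],
            "password": {"user": {"name": user,
                                  "domain": {"id": domain_id},
                                  "password": password}},
        },
        "scope": {"project": {"name": project, "domain": {"id": domain_id}}},
    }}
    return urllib.request.Request(
        "http://%s/identity/v3/auth/tokens" % keystone,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def signal_request(signal_url, token):
    """POST to a ScalingPolicy signal_url; like the curl call, no body."""
    return urllib.request.Request(
        signal_url, data=b"", headers={"X-Auth-Token": token}, method="POST")


def send(request):
    """Send a prepared request and return (status, headers)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.headers
```

For example: status, headers = send(token_request("keystone.example", "admin", "password", "admin")), then token = headers["X-Subject-Token"], then send(signal_request(signal_url, token)). Note that urlopen raises HTTPError for 4xx and 5xx responses.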
Look into options for auto-scaling
OS::Heat::AutoScalingGroup
● Properties
○ resource:
■ type: web_server.yaml
■ properties
○ min_size: 10
○ max_size: 100
○ cooldown: 30
○ desired_capacity: 30
○ rolling_updates
■ min_in_service: 5
■ max_batch_size: 10
■ pause_time: 15
● Attributes
○ outputs
○ outputs_list
○ current_size
○ refs [IDs]
○ refs_map {[names: IDs]}
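rolling_updates controls how the group replaces members when the template changes. Here is a simplified model of how min_in_service, max_batch_size and pause_time interact, assuming in-place replacement where a batch can take at most capacity - min_in_service members out of service. Heat's real batching has more cases (it may, for instance, add members temporarily), so treat this only as intuition.

```python
"""Simplified model of rolling_updates batching (not Heat's algorithm).

Assumption: members are replaced in place, so each batch takes at most
(capacity - min_in_service) members out of service, never more than
max_batch_size, with pause_time seconds between batches.
"""


def rolling_batches(capacity, max_batch_size, min_in_service):
    """Return the list of batch sizes needed to update every member."""
    if capacity <= 0:
        return []
    batch = min(max_batch_size, capacity - min_in_service)
    if batch < 1:
        raise ValueError("min_in_service leaves no room to update any member")
    batches, remaining = [], capacity
    while remaining > 0:
        size = min(batch, remaining)
        batches.append(size)
        remaining -= size
    return batches


def total_pause(batches, pause_time):
    """Seconds spent pausing between consecutive batches."""
    return max(len(batches) - 1, 0) * pause_time
```

With the values above (desired_capacity 30, max_batch_size 10, min_in_service 5, pause_time 15), this model gives three batches of 10 and 30 seconds of pauses.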
Look into options for auto-scaling
OS::Heat::ScalingPolicy
● Properties
○ adjustment_type: change_in_capacity
■ exact_capacity
■ change_in_capacity
■ percent_change_in_capacity
○ auto_scaling_group_id: asg_id
○ cooldown: 60
○ scaling_adjustment: 5
○ # min_adjustment_step:
● Attributes
○ alarm_url
○ signal_url
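How a signal becomes a new group size depends on adjustment_type. Below is a sketch of the three types as we understand Heat's behaviour: the result is clamped to min_size and max_size, percent changes smaller than one member are rounded away from zero and larger ones toward zero, and min_adjustment_step sets a floor on percent changes. It is a model for reasoning, not Heat's source code.

```python
"""Model of how a ScalingPolicy signal could change the group size."""
import math


def new_capacity(current, adjustment, adjustment_type, minimum, maximum,
                 min_adjustment_step=None):
    if adjustment_type == "exact_capacity":
        target = adjustment
    elif adjustment_type == "change_in_capacity":
        target = current + adjustment
    elif adjustment_type == "percent_change_in_capacity":
        delta = current * adjustment / 100.0
        if abs(delta) < 1.0:
            # Round small changes away from zero so the group still moves.
            step = math.ceil(delta) if delta > 0 else math.floor(delta)
        else:
            # Round larger changes toward zero.
            step = math.floor(delta) if delta > 0 else math.ceil(delta)
        if step and min_adjustment_step and abs(step) < min_adjustment_step:
            # Enforce the minimum step, keeping the direction of the change.
            step = min_adjustment_step if step > 0 else -min_adjustment_step
        target = current + step
    else:
        raise ValueError("unknown adjustment_type: %s" % adjustment_type)
    # Never leave the [min_size, max_size] range of the group.
    return max(minimum, min(maximum, int(target)))
```

With the slide's values (min_size 10, max_size 100, scaling_adjustment 5), a change_in_capacity signal on a group of 10 gives 15, while exact_capacity 5 is clamped up to 10.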
Demo
• Review https://goo.gl/4KL1gN
• StoryBoard (Bugs/BP)
https://storyboard.openstack.org/#!/project_group/82
• StoryBoard guide
https://etherpad.openstack.org/p/Heat-StoryBoard-Migration-Info
• Documents https://docs.openstack.org/heat/latest/
• Release Notes https://docs.openstack.org/releasenotes/heat/
• Give feedback or share ideas: IRC #heat
• Share your use cases
https://etherpad.openstack.org/p/heat-usecases
• Team meeting time Wednesday 14:00 UTC #heat (meeting wiki and
archive)
Join Heat
➔ Boston Summit
◆ Heat project update [ slide & video ]
◆ Heat Onboarding [ slide & video ]
➔ Sydney Summit
◆ Heat project update [ slide & video ]
◆ Heat Onboarding [ slide & video ]
➔ Vancouver Summit
◆ Heat project update [ slide & video ]
◆ Heat Onboarding [ slide & video ]
➔ Heat templates
➔ PTG Etherpad
Q & A
Links: demo video
If you are wondering how you or your product can interact with the open source cloud community: embrace the community! Embrace life!
