ceilometer
Ceilometer

Heat

Alarming

Alarming

Julien Danjou
jd__@Freenode // @juldanjou
julien@danjou.info

Nick Barcet
nijaba@Freenode // @nijaba
nick@enovance.com

Eoghan Glynn
eglynn@Freenode
eglynn@redhat.com
Speakers
● Nick Barcet co-founded the Ceilometer project at the
Folsom summit and led the project through incubation
● Julien Danjou has been a core Ceilometer contributor
from the outset, taking over the PTL reins for Havana
● Eoghan Glynn drove the addition of the Alarming
feature to Ceilometer over the Havana cycle
Two seemingly disjoint
projects intersect
● Heat is a template-driven orchestration engine
○ automates complex deployments via declarative
configuration
● Ceilometer is a metering infrastructure
○ collects data measuring resource usage and
performance
● Appear on the surface to have minimal commonality ...
Ceilometer Workflow
Collect

Transform

Publish

Store

Aggregate

● Collect from OpenStack components
● Transform metering data if necessary
● Publish meters to any destination (including
Ceilometer itself)
● Store received meters
● Aggregate samples via a REST API
Heat Workflow
my_stack.template
{ "AWSTemplateFormat" : "2010-09-09",
"Parameters": { "VolumeSize" : { … } }
"Mappings": {
"Flavor2Arch" : { "tiny": {"Arch" : "64" },
... },
"Resources": {
"MyInstance" : {
"Type" : "AWS::EC2::Instance",
"Properties" : { “Volumes” : […] }
} } },
"Outputs": { "DNS" : { "Value" : { … } } } }
Heat Workflow
{ "AWSTemplateFormat" : "2010-0909",
"Parameters": { "VolumeSize" : { …
}}
"Mappings": {
"Flavor2Arch" : { "tiny": {"Arch" : "64"
},
... },
"Resources": {
"MyInstance" : {
"Type" : "AWS::EC2::Instance",
"Properties" : { “Volumes” : […] }
} } },
"Outputs": { "DNS" : { "Value" : { … }
}}}

consumed by

Heat Engine
Heat Workflow
{ "AWSTemplateFormat" : "2010-0909",
"Parameters": { "VolumeSize" : { …
}}
"Mappings": {
"Flavor2Arch" : { "tiny": {"Arch" : "64"
},
... },
"Resources": {
"MyInstance" : {
"Type" : "AWS::EC2::Instance",
"Properties" : { “Volumes” : […] }
} } },
"Outputs": { "DNS" : { "Value" : { … }
}}}

consumed by

interacts with

Heat Engine
Heat Workflow
{ "AWSTemplateFormat" : "2010-0909",
"Parameters": { "VolumeSize" : { …
}}
"Mappings": {
"Flavor2Arch" : { "tiny": {"Arch" : "64"
},
... },
"Resources": {
"MyInstance" : {
"Type" : "AWS::EC2::Instance",
"Properties" : { “Volumes” : […] }
} } },
"Outputs": { "DNS" : { "Value" : { … }
}}}

consumed by

interacts with

Heat Engine

my_stack

Instance

Volume

spins up
Heat Autoscaling v1.0
CW-lite

reports
load
my_stack

push-stats
Heat Autoscaling v1.0
Heat Engine
reports
load

scales out
stack
my_stack

Instance
Instance
Instance
Heat Autoscaling v1.0
Heat Engine
reports
load

scales out
stack
my_stack

Instance
Instance
Instance
Instance
Ceilometer to the rescue!
● compute agent already collects most
relevant stats from outside the instance
● API service exposes aggregation over the
evaluation window
● define new API exposing alarm lifecycle
● provide new service to evaluate alarms
against their defined rules
● additional service driving asynchronous
notifications when alarms fire
How it all hangs together
{ "AWSTemplateFormat" : "2010-0909",
"Parameters": { "VolumeSize" : { …
}}
"Mappings": {
"Flavor2Arch" : { "tiny": {"Arch" : "64"
},
... },
"Resources": {
"MyInstance" : {
"Type" : "AWS::EC2::Instance",
"Properties" : { “Volumes” : […] }
} } },
"Outputs": { "DNS" : { "Value" : { … }
}}}

added to template

● alarms bounding busy/idleness of
instances
● membership of autoscale group
represented via user metadata
● alarm actions refer to scale
up/down policies
● action URLs are pre-signed
● policies define adjustment step size
& cooldown period
How it all hangs together
"CPUAlarmHigh": {
"Type": "OS::Metering::Alarm",
"Properties": {
"meter_name": "cpu_util", threshold: "75"
"evaluation_periods": "5", "period": "60",
"statistic": "avg", "comparison_operator": "gt",
"description": "Scale-up if CPU > 75% for 300s",
"alarm_actions":[…"ScaleUpPolicy", "AlarmUrl"…],
"matching_metadata": {
"metadata.user_metadata.server_group":
"MyWebServerGroup"
}}}
How it all hangs together
Heat Engine
injects user
metadata
my_stack

Instance
How it all hangs together
Heat Engine
injects user
metadata
my_stack

Instance

API service

Ceilometer

creates
alarms
How it all hangs together
API service

Heat Engine
injects user
metadata
my_stack

Instance

monitors
instances

Compute
Agent

Ceilometer

creates
alarms
How it all hangs together
API service

Heat Engine
injects user
metadata
my_stack

Instance

triggers
alarm

monitors
instances

Alarm
evaluator

Compute
Agent

Ceilometer

creates
alarms
How it all hangs together
alarming

Heat Engine
Alarms
injects user
metadata
my_stack

Instance
Instance
Instance

scales out
stack

Compute

Ceilometer

API
How it all hangs together
alarming

Heat Engine
Alarms
injects user
metadata
my_stack

Instance
Instance
Instance
Instance
Instance

scales out
stack

Compute

Ceilometer

API
How it all hangs together
API service

my_stack
Meter store

Instance

provides
alarm rules

queries
stats

reports
samples

Compute
Agent

Alarm
evaluator

Ceilometer

Heat Engine
Lessons learned
Keys to successful intra-project interactions:
● buy-in from stakeholders on both sides
● early validation and proof-points
● protect consuming project from churn during
the development cycle
● split deliverables into bite-sized separately
consumable chunks
Future directions
● expand metering coverage to also capture:
○ memory utilization %
○ LBaaS statistics
○ network & disk I/O rates

● add combination alarm support to Heat
templates
○ allow thresholds over multiple metrics to be modeled

● exclude low-quality datapoints
○ avoid scaling when only outliers have reported metrics
Future directions
● monitor baremetal via IPMI or SNMP
○ autoscale groups of hosts managed as Ironic instances

● constrain alarms for time-of-day or day-of-week
○ e.g. set the bar higher on weekends, lower on weekdays

● decouple autoscaling usage from Heat templates
● authenticate webhook calls with keystone trusts
○ avoid ec2-signer use without keystone EC2 tokens ext
Further questions?
● Chat on Freenode:
○ #openstack-metering
○ #heat

● Mail the dev list:
○ openstack-dev@lists.openstack.org

● Harangue us via Launchpad:
○ https://launchpad.net/ceilometer/+filebug

Ceilometer + Heat = Alarming

  • 1.
    ceilometer Ceilometer Heat Alarming Alarming Julien Danjou jd__@Freenode //@juldanjou julien@danjou.info Nick Barcet nijaba@Freenode // @nijaba nick@enovance.com Eoghan Glynn eglynn@Freenode eglynn@redhat.com
  • 2.
    Speakers ● Nick Barcetco-founded the Ceilometer project at the Folsom summit and led the project through incubation ● Julien Danjou has been a core Ceilometer contributor from the outset, taking over the PTL reins for Havana ● Eoghan Glynn drove the addition of the Alarming feature to Ceilometer over the Havana cycle
  • 3.
    Two seemingly disjoint projectsintersect ● Heat is a template-driven orchestration engine ○ automates complex deployments via declarative configuration ● Ceilometer is a metering infrastructure ○ collects data measuring resource usage and performance ● Appear on the surface to have minimal commonality ...
  • 4.
    Ceilometer Workflow Collect Transform Publish Store Aggregate ● Collectfrom OpenStack components ● Transform metering data if necessary ● Publish meters to any destination (including Ceilometer itself) ● Store received meters ● Aggregate samples via a REST API
  • 5.
    Heat Workflow my_stack.template { "AWSTemplateFormat": "2010-09-09", "Parameters": { "VolumeSize" : { … } } "Mappings": { "Flavor2Arch" : { "tiny": {"Arch" : "64" }, ... }, "Resources": { "MyInstance" : { "Type" : "AWS::EC2::Instance", "Properties" : { “Volumes” : […] } } } }, "Outputs": { "DNS" : { "Value" : { … } } } }
  • 6.
    Heat Workflow { "AWSTemplateFormat": "2010-0909", "Parameters": { "VolumeSize" : { … }} "Mappings": { "Flavor2Arch" : { "tiny": {"Arch" : "64" }, ... }, "Resources": { "MyInstance" : { "Type" : "AWS::EC2::Instance", "Properties" : { “Volumes” : […] } } } }, "Outputs": { "DNS" : { "Value" : { … } }}} consumed by Heat Engine
  • 7.
    Heat Workflow { "AWSTemplateFormat": "2010-0909", "Parameters": { "VolumeSize" : { … }} "Mappings": { "Flavor2Arch" : { "tiny": {"Arch" : "64" }, ... }, "Resources": { "MyInstance" : { "Type" : "AWS::EC2::Instance", "Properties" : { “Volumes” : […] } } } }, "Outputs": { "DNS" : { "Value" : { … } }}} consumed by interacts with Heat Engine
  • 8.
    Heat Workflow { "AWSTemplateFormat": "2010-0909", "Parameters": { "VolumeSize" : { … }} "Mappings": { "Flavor2Arch" : { "tiny": {"Arch" : "64" }, ... }, "Resources": { "MyInstance" : { "Type" : "AWS::EC2::Instance", "Properties" : { “Volumes” : […] } } } }, "Outputs": { "DNS" : { "Value" : { … } }}} consumed by interacts with Heat Engine my_stack Instance Volume spins up
  • 9.
  • 10.
    Heat Autoscaling v1.0 HeatEngine reports load scales out stack my_stack Instance Instance Instance
  • 11.
    Heat Autoscaling v1.0 HeatEngine reports load scales out stack my_stack Instance Instance Instance Instance
  • 12.
    Ceilometer to therescue! ● compute agent already collects most relevant stats from outside the instance ● API service exposes aggregation over the evaluation window ● define new API exposing alarm lifecycle ● provide new service to evaluate alarms against their defined rules ● additional service driving asynchronous notifications when alarms fire
  • 13.
    How it allhangs together { "AWSTemplateFormat" : "2010-0909", "Parameters": { "VolumeSize" : { … }} "Mappings": { "Flavor2Arch" : { "tiny": {"Arch" : "64" }, ... }, "Resources": { "MyInstance" : { "Type" : "AWS::EC2::Instance", "Properties" : { “Volumes” : […] } } } }, "Outputs": { "DNS" : { "Value" : { … } }}} added to template ● alarms bounding busy/idleness of instances ● membership of autoscale group represented via user metadata ● alarm actions refer to scale up/down policies ● action URLs are pre-signed ● policies define adjustment step size & cooldown period
  • 14.
    How it allhangs together "CPUAlarmHigh": { "Type": "OS::Metering::Alarm", "Properties": { "meter_name": "cpu_util", threshold: "75" "evaluation_periods": "5", "period": "60", "statistic": "avg", "comparison_operator": "gt", "description": "Scale-up if CPU > 75% for 300s", "alarm_actions":[…"ScaleUpPolicy", "AlarmUrl"…], "matching_metadata": { "metadata.user_metadata.server_group": "MyWebServerGroup" }}}
  • 15.
    How it allhangs together Heat Engine injects user metadata my_stack Instance
  • 16.
    How it allhangs together Heat Engine injects user metadata my_stack Instance API service Ceilometer creates alarms
  • 17.
    How it allhangs together API service Heat Engine injects user metadata my_stack Instance monitors instances Compute Agent Ceilometer creates alarms
  • 18.
    How it allhangs together API service Heat Engine injects user metadata my_stack Instance triggers alarm monitors instances Alarm evaluator Compute Agent Ceilometer creates alarms
  • 19.
    How it allhangs together alarming Heat Engine Alarms injects user metadata my_stack Instance Instance Instance scales out stack Compute Ceilometer API
  • 20.
    How it allhangs together alarming Heat Engine Alarms injects user metadata my_stack Instance Instance Instance Instance Instance scales out stack Compute Ceilometer API
  • 21.
    How it allhangs together API service my_stack Meter store Instance provides alarm rules queries stats reports samples Compute Agent Alarm evaluator Ceilometer Heat Engine
  • 22.
    Lessons learned Keys tosuccessful intra-project interactions: ● buy-in from stakeholders on both sides ● early validation and proof-points ● protect consuming project from churn during the development cycle ● split deliverables into bite-sized separately consumable chunks
  • 23.
    Future directions ● expandmetering coverage to also capture: ○ memory utilization % ○ LBaaS statistics ○ network & disk I/O rates ● add combination alarm support to Heat templates ○ allow thresholds over multiple metrics to be modeled ● exclude low-quality datapoints ○ avoid scaling when only outliers have reported metrics
  • 24.
    Future directions ● monitorbaremetal via IPMI or SNMP ○ autoscale groups of hosts managed as Ironic instances ● constrain alarms for time-of-day or day-of-week ○ e.g. set the bar higher on weekends, lower on weekdays ● decouple autoscaling usage from Heat templates ● authenticate webhook calls with keystone trusts ○ avoid ec2-signer use without keystone EC2 tokens ext
  • 25.
    Further questions? ● Chaton Freenode: ○ #openstack-metering ○ #heat ● Mail the dev list: ○ openstack-dev@lists.openstack.org ● Harangue us via Launchpad: ○ https://launchpad.net/ceilometer/+filebug