Enforcing Application SLA with Congress and Monasca

Integration between OpenStack Congress and Monasca to achieve alarm-based policy enforcement for SLAs

  1. Enforcing Application SLAs with Congress and Monasca
     Fabio Giannetti, Ken Owens
     April 28, 2016
  2. Outline
     • Vision
     • Congress and Monasca implementing:
       • OPS/NOC SLA Policies
       • App Intent SLA Policies
     • Current State and Next Steps
  3. Vision
  4. Vision
     • Application owners/developers do not care about the underlying infrastructure unless it is a problem.
     • Microservice-based architectures inherently demand granular application design.
     • SLAs for applications must be holistic and independent of the underlying infrastructure.
     [Figure: Host, Virtualization, and Container layers hosting services (Srvc) that make up Application A and Application B]
  5. Vision
     Enable business/application owners to easily define the aspects that are relevant to running their applications within the budget constraints imposed by IT.
  6. Vision
     Monitoring is now holistic: it has to consider the various levels of virtualization and harmonize data across the different layers. Containers are short-lived and are moved around the available infrastructure.
     [Figure: Host, Virtualization, and Container layers]
  7. Vision
     Soft limits (alarms) are reported back to application owners, and hard limits (actions) are enforced whenever required.
  8. OPS/NOC SLA using Congress and Monasca
  9. Underutilized Servers: OPS/NOC Policy Example
     If a VM has less than 5% CPU utilization for the last 2 months, then notify its owner via email.

       error(vm, email) :-
           nova:server_owner(vm, owner),
           two_months_before_today(start, end),
           ceilometer:statistics(vm, start, end, "cpu_util", cpu),
           cpu < 5,
           keystone:email(owner, email)

       two_months_before_today(start, end) :-
           date:today(end),
           date:minus(end, "2 months", start)
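The helper rule two_months_before_today derives the evaluation window from date:today and date:minus. A minimal Python sketch of the same calendar arithmetic, assuming "2 months" means calendar months with the day clamped to the end of a shorter target month (the slide does not say how date:minus handles that case); `minus_months` and `two_months_before_today` here are hypothetical helpers, not Congress builtins:

```python
import calendar
from datetime import date
from typing import Tuple


def minus_months(d: date, months: int) -> date:
    """Subtract calendar months, clamping the day to the target month's length."""
    # Work in a single month index so that year rollover is handled by divmod.
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def two_months_before_today(today: date) -> Tuple[date, date]:
    """Mirror of the rule: end = today, start = end minus 2 months."""
    end = today
    start = minus_months(end, 2)
    return start, end


# Window ending on the day of the talk.
print(two_months_before_today(date(2016, 4, 28)))
```

With the clamping choice, a window ending on 2016-04-30 starts on 2016-02-29 rather than failing on a non-existent February 30.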
  10. Current Solution
      The Ceilometer Datasource in Congress polls the Ceilometer API every <n>s with GET /v2/meters/cpu_util/statistics?resource_id=… and passes the results to the Congress Policy Engine (exposed through the Congress API). The datasource table holds the CPU utilization of every VM:

        VM UUID (Resource ID)             CPU
        xxxxxxxx-0001-xxxx-xxxxxxxxxxx     40
        xxxxxxxx-0002-xxxx-xxxxxxxxxxx     30
        xxxxxxxx-0003-xxxx-xxxxxxxxxxx      2
        xxxxxxxx-0004-xxxx-xxxxxxxxxxx     70
        xxxxxxxx-0005-xxxx-xxxxxxxxxxx     55
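The current solution is pull-based: the datasource re-queries Ceilometer every <n> seconds and replaces its cached table. A minimal Python sketch of that loop, where `fetch_cpu_stats` is a hypothetical stand-in for the GET /v2/meters/cpu_util/statistics call (no HTTP client is used) and `PollingDatasource` is an illustrative class, not the actual Congress datasource driver:

```python
import time
from typing import Callable, Dict


class PollingDatasource:
    """Hypothetical pull-based datasource caching a VM -> CPU table."""

    def __init__(self, fetch: Callable[[], Dict[str, float]], interval_s: float):
        self.fetch = fetch            # stand-in for the Ceilometer statistics query
        self.interval_s = interval_s  # the "<n>s" polling period
        self.table: Dict[str, float] = {}

    def poll_once(self) -> None:
        # Replace the whole snapshot: every VM is fetched on every poll,
        # whether or not its value matters to any policy.
        self.table = dict(self.fetch())

    def run(self, rounds: int) -> None:
        for i in range(rounds):
            self.poll_once()
            if i < rounds - 1:
                time.sleep(self.interval_s)


def fetch_cpu_stats() -> Dict[str, float]:
    # Example values from the slide's table (IDs shortened).
    return {
        "xxxxxxxx-0001-xxxx": 40,
        "xxxxxxxx-0002-xxxx": 30,
        "xxxxxxxx-0003-xxxx": 2,
        "xxxxxxxx-0004-xxxx": 70,
        "xxxxxxxx-0005-xxxx": 55,
    }


ds = PollingDatasource(fetch_cpu_stats, interval_s=0.0)
ds.run(rounds=2)
print(len(ds.table))
```

The cost of this design is that all five rows cross the wire on every poll, even though only xxxxxxxx-0003-xxxx will ever satisfy cpu < 5.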
  11. Current Solution
      The Policy Engine joins the Ceilometer, Nova, and Keystone datasource tables:

      Ceilometer Datasource:
        VM UUID (Resource ID)   CPU
        xxxxxxxx-0001-xxxx       40
        xxxxxxxx-0002-xxxx       30
        xxxxxxxx-0003-xxxx        2
        xxxxxxxx-0004-xxxx       70
        xxxxxxxx-0005-xxxx       55

      Nova Datasource (from the Nova API):
        VM                      Owner
        xxxxxxxx-0001-xxxx      Ann
        xxxxxxxx-0002-xxxx      Fabio
        xxxxxxxx-0003-xxxx      Fabio
        xxxxxxxx-0004-xxxx      Ken
        xxxxxxxx-0005-xxxx      Ken

      Keystone Datasource (from the Keystone API):
        Owner    Email
        Ann      AnnNotRealEmail@cisco.com
        Fabio    FabioNotRealEmail@cisco.com
        Ken      KenNotRealEmail@cisco.com

      Result:
        VM                      Email
        xxxxxxxx-0003-xxxx      FabioNotRealEmail@cisco.com
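Evaluating error(vm, email) is a join of the three tables on vm and owner, filtered by cpu < 5. A minimal Python sketch of that join over the slide's example rows, assuming the CPU statistics are already restricted to the two-month window; plain dictionaries stand in for the datasource tables, and this is not how the Congress engine evaluates Datalog internally:

```python
from typing import Dict, List, Tuple

# Example rows from the slide.
ceilometer_cpu: Dict[str, float] = {
    "xxxxxxxx-0001-xxxx": 40,
    "xxxxxxxx-0002-xxxx": 30,
    "xxxxxxxx-0003-xxxx": 2,
    "xxxxxxxx-0004-xxxx": 70,
    "xxxxxxxx-0005-xxxx": 55,
}
nova_server_owner: Dict[str, str] = {
    "xxxxxxxx-0001-xxxx": "Ann",
    "xxxxxxxx-0002-xxxx": "Fabio",
    "xxxxxxxx-0003-xxxx": "Fabio",
    "xxxxxxxx-0004-xxxx": "Ken",
    "xxxxxxxx-0005-xxxx": "Ken",
}
keystone_email: Dict[str, str] = {
    "Ann": "AnnNotRealEmail@cisco.com",
    "Fabio": "FabioNotRealEmail@cisco.com",
    "Ken": "KenNotRealEmail@cisco.com",
}


def underutilized_errors(threshold: float = 5) -> List[Tuple[str, str]]:
    """error(vm, email) :- server_owner(vm, owner), statistics(vm, cpu),
    cpu < threshold, email(owner, email)."""
    rows = []
    for vm, owner in nova_server_owner.items():
        cpu = ceilometer_cpu.get(vm)
        if cpu is None or cpu >= threshold:
            continue  # no statistics for this VM, or it is not underutilized
        email = keystone_email.get(owner)
        if email is not None:
            rows.append((vm, email))
    return rows


print(underutilized_errors())
```

With the default threshold the result is the single row (xxxxxxxx-0003-xxxx, FabioNotRealEmail@cisco.com), matching the slide.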
  12. From Policy to Alarm
      The policy is rewritten against the Monasca alarm datasource, and its threshold becomes a Monasca alarm definition:

        error(vm, email) :-
            nova:server_owner(vm, owner),
            two_months_before_today(start, end),
            monasca_alarm:stats(vm, start, end, "cpu.user_perc", cpu),
            cpu < 5,
            keystone:email(owner, email)

        two_months_before_today(start, end) :-
            date:today(end),
            date:minus(end, "2 months", start)

        {
          "name": "Average CPU percent is less than 5",
          "description": "The average CPU percent is less than 5",
          "expression": "(avg(cpu.user_perc{resource_id=vm}) < 5)",
          "match_by": ["resource_id"],
          "severity": "HIGH",
          "ok_actions": ["action_id_for_ok"],
          "alarm_actions": ["action_id_for_alarm"]
        }
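The translation step takes the metric, comparison, and threshold from the policy body and emits an alarm-definition body. A minimal Python sketch of such a converter; `policy_to_alarm_definition` and its parameters are hypothetical (the slides list this component as future work), and it only builds the JSON body for POST /v2.0/alarm-definitions without calling Monasca:

```python
import json
from typing import Dict, List, Optional

# Symbolic relational operators and the severities Monasca alarm definitions accept
# (Monasca also accepts lt/lte/gt/gte spellings, omitted here).
MONASCA_OPERATORS = {"<", "<=", ">", ">="}
MONASCA_SEVERITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}


def policy_to_alarm_definition(
    name: str,
    description: str,
    metric: str,
    operator: str,
    threshold: float,
    dimensions: Dict[str, str],
    severity: str = "HIGH",
    ok_actions: Optional[List[str]] = None,
    alarm_actions: Optional[List[str]] = None,
) -> Dict:
    """Hypothetical converter: one policy threshold -> one alarm-definition body."""
    if operator not in MONASCA_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if severity not in MONASCA_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    dims = ",".join(f"{key}={value}" for key, value in dimensions.items())
    return {
        "name": name,
        "description": description,
        # e.g. avg(cpu.user_perc{resource_id=vm}) < 5
        "expression": f"avg({metric}{{{dims}}}) {operator} {threshold:g}",
        # One alarm per distinct value of each matched dimension (per VM here).
        "match_by": list(dimensions),
        "severity": severity,
        "ok_actions": list(ok_actions or []),
        "alarm_actions": list(alarm_actions or []),
    }


example = policy_to_alarm_definition(
    name="Average CPU percent is less than 5",
    description="The average CPU percent is less than 5",
    metric="cpu.user_perc",
    operator="<",
    threshold=5,
    dimensions={"resource_id": "vm"},
    ok_actions=["action_id_for_ok"],
    alarm_actions=["action_id_for_alarm"],
)
print(json.dumps(example, indent=2))
```

One caveat for a real converter: Monasca computes avg() over the alarm's evaluation period (60 seconds unless a period is given in the expression), so neither this sketch nor the slide's definition encodes the policy's two-month window.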
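  13. Proposed Solution (receiving notifications)
      Monasca components: Monasca Agents, Monasca API, Kafka Cluster, Threshold Engine, Notification Engine, Persister, Metrics DB, Settings DB. Congress components: Congress API, Policy Engine, Monasca Alarm Datasource.
      • The alarm definition is created with POST /v2.0/alarm-definitions.
      • A webhook notification pointing at Congress is registered:
          monasca notification-create congress WEBHOOK http:…/v1/data-sources/monasca_alarm?execute&action=handle_alarm
      • When the alarm fires, the Notification Engine calls the webhook …/v1/data-sources/monasca_alarm?execute&action=handle_alarm, which invokes handle_alarm(params) in the Monasca Alarm Datasource.
      • The datasource table then contains only the alarmed VM:
          VM UUID (Resource ID)   CPU
          xxxxxxxx-0003-xxxx        2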
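On the Congress side, the webhook lands in handle_alarm(params), which only has to update the rows for the alarmed resource. A minimal Python sketch, assuming the webhook body carries `alarm_name`, `state`, and a `metrics` list whose `dimensions` include `resource_id` (check the notification payload of your Monasca release); `MonascaAlarmTable` is a hypothetical class, not the Congress datasource driver API:

```python
import json
from typing import Dict, Set, Tuple


class MonascaAlarmTable:
    """Hypothetical push-based table: one row per (resource_id, alarm_name) in ALARM."""

    def __init__(self) -> None:
        self.rows: Set[Tuple[str, str]] = set()

    def handle_alarm(self, params: Dict) -> None:
        # Assumed payload fields: alarm_name, state, metrics[i].dimensions.resource_id
        name = params.get("alarm_name", "")
        state = params.get("state")
        for metric in params.get("metrics", []):
            resource_id = metric.get("dimensions", {}).get("resource_id")
            if resource_id is None:
                continue
            row = (resource_id, name)
            if state == "ALARM":
                self.rows.add(row)       # alarm raised: insert the row
            else:
                self.rows.discard(row)   # OK or UNDETERMINED: retract it


body = json.dumps({
    "alarm_name": "Average CPU percent is less than 5",
    "state": "ALARM",
    "metrics": [{"name": "cpu.user_perc",
                 "dimensions": {"resource_id": "xxxxxxxx-0003-xxxx"}}],
})
table = MonascaAlarmTable()
table.handle_alarm(json.loads(body))
print(sorted(table.rows))
```

Unlike polling, the table only ever holds resources that are currently in alarm, which is why the slide's datasource table has a single row.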
  14. Proposed Solution (receiving notifications)
      The Policy Engine now joins only the alarmed VM's row with the Nova and Keystone tables:

      Monasca Alarm Datasource:
        VM UUID (Resource ID)   CPU
        xxxxxxxx-0003-xxxx        2

      Nova Datasource (from the Nova API):
        VM                      Owner
        xxxxxxxx-0003-xxxx      Fabio

      Keystone Datasource (from the Keystone API):
        Owner    Email
        Fabio    FabioNotRealEmail@cisco.com

      Result:
        VM                      Email
        xxxxxxxx-0003-xxxx      FabioNotRealEmail@cisco.com
  15. Application Intent SLA using Congress and Monasca
  16. App Intent Policy Example: VM Evacuation for a Business-Critical App if Its Host Has Potential Health Issues

        error(vm) :-
            nova:show(vm, hostID),
            monasca_alarm:host_issues(hostID)

      A VM is flagged if its host has issues, for instance:
        1. Unhealthy: the host cannot be pinged and/or reached via SSH
        2. Network errors and packet loss
        3. Free disk space below a certain threshold
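Evaluating error(vm) amounts to joining nova:show (VM to host) with the set of hosts that currently have a host_issues alarm. A minimal Python sketch; the VM and host names are made up for illustration, and the evacuation itself (for example, a Nova evacuate call) is outside the sketch:

```python
from typing import Dict, List, Set


def vms_to_evacuate(vm_host: Dict[str, str], hosts_with_issues: Set[str]) -> List[str]:
    """error(vm) :- nova:show(vm, hostID), monasca_alarm:host_issues(hostID)."""
    # Sorted so that the output order is deterministic.
    return sorted(vm for vm, host in vm_host.items() if host in hosts_with_issues)


# Hypothetical placement of a business-critical app's VMs.
vm_host = {"app-vm-1": "compute-1", "app-vm-2": "compute-2", "app-vm-3": "compute-1"}
print(vms_to_evacuate(vm_host, {"compute-1"}))
```

Because the rule is keyed on hostID, a single host alarm flags every VM of the application on that host at once.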
  17. App Intent Policy: Metrics Correlation

        error(vm) :-
            nova:show(vm, hostID),
            monasca_alarm:host_issues(hostID)

      Metric Name                   Dimensions                                  Value
      host_alive_status             observer_host=fqdn,                         0=online, 1=offline
                                    hostname=supplied hostname being checked,
                                    test_type=ping or ssh
      disk.space_used_perc          device, mount_point                         Percentage of disk space used on a device
      net.in_packets_dropped_sec    device                                      Inbound network packets dropped per second
      net.out_packets_dropped_sec   device                                      Outbound network packets dropped per second
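host_issues(hostID) holds when any one of these metrics crosses its threshold, which is why it is backed by several alarms rather than one. A minimal Python sketch of that OR-correlation over the latest value of each metric, using the thresholds from the Multi-Alarms slides (host offline, more than 90% disk used, more than 30 dropped packets per second in either direction); the flat metric-to-value input format is an assumption:

```python
from typing import Callable, Dict, List

# Metric name -> predicate on its latest value (thresholds from the Multi-Alarms slides).
HOST_ISSUE_RULES: Dict[str, Callable[[float], bool]] = {
    "host_alive_status": lambda v: v == 1,           # 1 = offline
    "disk.space_used_perc": lambda v: v > 90,
    "net.in_packets_dropped_sec": lambda v: v > 30,
    "net.out_packets_dropped_sec": lambda v: v > 30,
}


def host_issues(latest: Dict[str, float]) -> List[str]:
    """Return the metrics that flag the host; an empty list means no issues."""
    return [name for name, breached in HOST_ISSUE_RULES.items()
            if name in latest and breached(latest[name])]


healthy = {"host_alive_status": 0, "disk.space_used_perc": 55,
           "net.in_packets_dropped_sec": 0, "net.out_packets_dropped_sec": 2}
degraded = {"host_alive_status": 0, "disk.space_used_perc": 95,
            "net.in_packets_dropped_sec": 45}
print(host_issues(healthy))
print(host_issues(degraded))
```

Returning the list of breached metrics, rather than a bare boolean, keeps the reason for an evacuation visible to the operator.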
  18. App Intent Policy: Multi-Alarms #1

        {
          "name": "Host is Unhealthy",
          "description": "The host is considered unhealthy",
          "expression": "host_alive_status{host_id=hostID} >= 1",
          "match_by": ["host_id"],
          ...
        }

        {
          "name": "Host disk getting full",
          "description": "The host disk is reaching capacity",
          "expression": "disk.space_used_perc{host_id=hostID} > 90",
          "match_by": ["host_id"],
          ...
        }
  19. App Intent Policy: Multi-Alarms #2

        {
          "name": "Host inbound packet loss",
          "description": "The host is dropping inbound network packets",
          "expression": "net.in_packets_dropped_sec{host_id=hostID} > 30",
          "match_by": ["host_id"],
          ...
        }

        {
          "name": "Host outbound packet loss",
          "description": "The host is dropping outbound network packets",
          "expression": "net.out_packets_dropped_sec{host_id=hostID} > 30",
          "match_by": ["host_id"],
          ...
        }
  20. Current State and Future Work
  21. Overall Architecture
      [Figure: Monasca (Monasca Agents, Monasca API, Kafka Cluster, Threshold Engine, Notification Engine, Persister, Metrics DB, Settings DB) and Keystone, connected via webhook and RPC to Congress (Congress API, Policy Engine, and a Monasca Alarm Datasource backed by an in-memory DB of metric/value rows: metric1 val1 … metricN valN)]
  22. Current Status and Next Steps
      • Done:
        • Developed a Monasca datasource to validate the integration.
        • Designed the solution and identified the main integration points.
      • To be done:
        • Develop a Monasca Alarm datasource leveraging the RPC capabilities in Congress.
        • Create a Congress notification webhook for Monasca.
        • Develop a policy-to-alarm conversion component for policies prefixed with monasca-alarm.
  23. Thank You!
      OpenStack Summit, Austin, Texas, 2016
