Understanding and Extending
Prometheus AlertManager
Lee Calcote
calcotestudios.com/talks
Presented at CloudNativeCon+KubeCon EU, March, 2017.


  1. Understanding and Extending Prometheus AlertManager. Lee Calcote, calcotestudios.com/talks
  2. Lee Calcote: clouds, containers, infrastructure, applications and their management. linkedin.com/in/leecalcote, @lcalcote, blog.gingergeek.com, lee@calcotestudios.com, calcotestudios.com/talks
  3. Show of Hands
  4. AlertManager, Prometheus
  5. Purpose: Alertmanager is an alert ingester, grouper, de-duplicator, silencer, throttler and notifier.
  6. Nomenclature (ˈnō-mən-ˌklā-chər): a brief Prometheus AlertManager construct review.
     - Receivers: where and how to send alerts.
     - Routes: match alerts to their receiver and how often to notify.
  7. Nomenclature (ˈnō-mən-ˌklā-chər): a brief Prometheus AlertManager construct review.
     - Silencers (muting): match alerts with specific labels and prevent them from being included in notifications.
     - Inhibitors (suppressing): suppress specific notifications when other specific alerts are already firing.
     - Grouping (correlating): categorizes alerts of similar nature into a single notification, for example:
         group_by: ['alertname', 'cluster']
         group_wait: 30s
         group_interval: 5m
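The grouping behavior on slide 7 can be illustrated with a minimal Go sketch. It is not AlertManager's implementation: the `Alert` type, `groupKey` and `groupAlerts` are hypothetical names, and the sketch only shows the idea that alerts sharing the same values for the `group_by` labels land in the same notification group.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Alert is a simplified, hypothetical stand-in for an alert: just its labels.
type Alert map[string]string

// groupKey builds a key from the values of the groupBy labels only, so that
// labels outside group_by (such as instance) do not split a group.
func groupKey(a Alert, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, name := range groupBy {
		parts = append(parts, name+"="+a[name])
	}
	return strings.Join(parts, ",")
}

// groupAlerts buckets alerts that share the same groupBy label values.
func groupAlerts(alerts []Alert, groupBy []string) map[string][]Alert {
	groups := make(map[string][]Alert)
	for _, a := range alerts {
		k := groupKey(a, groupBy)
		groups[k] = append(groups[k], a)
	}
	return groups
}

func main() {
	// Example data (assumed for illustration).
	alerts := []Alert{
		{"alertname": "instance_down", "cluster": "eu", "instance": "a:8080"},
		{"alertname": "instance_down", "cluster": "eu", "instance": "b:8081"},
		{"alertname": "instance_down", "cluster": "us", "instance": "c:8082"},
	}
	groups := groupAlerts(alerts, []string{"alertname", "cluster"})

	// Sort the keys so the output order is stable.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d alert(s)\n", k, len(groups[k]))
	}
}
```

In this sketch the two `eu` alerts would be delivered as a single notification, while the `us` alert forms its own group.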
  8. Multiple approaches to suppression: repeat_interval (per route) vs Inhibition (global) vs Silences (via UI / API).
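Of the suppression approaches above, a silence is the easiest to sketch in code. The following Go example is a simplified illustration, not AlertManager's code: the `Silence` type, its `Mutes` method and `exampleSilence` are hypothetical, only equality matchers are modeled (real silences also accept regular-expression matchers), and the date used is arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// Silence is a hypothetical, simplified silence: a set of equality matchers
// plus the time window in which the silence is active.
type Silence struct {
	Matchers map[string]string // label name -> required value
	StartsAt time.Time
	EndsAt   time.Time
}

// Mutes reports whether an alert with the given labels is silenced at now.
// The window is treated as [StartsAt, EndsAt), and every matcher must match.
func (s Silence) Mutes(labels map[string]string, now time.Time) bool {
	if now.Before(s.StartsAt) || !now.Before(s.EndsAt) {
		return false
	}
	for name, want := range s.Matchers {
		if labels[name] != want {
			return false
		}
	}
	return true
}

// exampleSilence returns a two-hour silence for one instance (assumed values).
func exampleSilence() Silence {
	start := time.Date(2017, time.March, 29, 9, 0, 0, 0, time.UTC)
	return Silence{
		Matchers: map[string]string{"alertname": "instance_down", "instance": "localhost:8080"},
		StartsAt: start,
		EndsAt:   start.Add(2 * time.Hour),
	}
}

func main() {
	s := exampleSilence()
	now := s.StartsAt.Add(30 * time.Minute)

	silenced := map[string]string{"alertname": "instance_down", "instance": "localhost:8080"}
	other := map[string]string{"alertname": "instance_down", "instance": "localhost:8081"}

	fmt.Println("8080 muted:", s.Mutes(silenced, now))
	fmt.Println("8081 muted:", s.Mutes(other, now))
	fmt.Println("8080 muted after expiry:", s.Mutes(silenced, s.EndsAt))
}
```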
  9. Alerts: a shared construct between Prometheus and AlertManager.
     - Prometheus alert rule syntax:
         ALERT <alert name>
           IF <PromQL vector expression>
           FOR <duration>
           LABELS { ... }
           ANNOTATIONS { ... }
     - Prometheus alert states: inactive, pending, firing.
     - AlertManager supports clients other than Prometheus.
     - AlertManager is notified when alerts transition state (inactive, firing) and sends notifications.
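The inactive, pending and firing states can be sketched as a small state function. This is an illustrative Go sketch rather than Prometheus's rule evaluator: `evaluate`, `simulate`, `demoStep` and `demoFor` are hypothetical names, and it assumes one evaluation per second with a `FOR 5s` clause. An alert is pending while its expression has held for less than the `FOR` duration and firing once it has held for at least that long.

```go
package main

import (
	"fmt"
	"time"
)

// State is the lifecycle state of an alert as listed on the slide.
type State int

const (
	Inactive State = iota
	Pending
	Firing
)

func (s State) String() string {
	return [...]string{"inactive", "pending", "firing"}[s]
}

// Demo values (assumptions): one rule evaluation per second and FOR 5s.
const (
	demoStep = time.Second
	demoFor  = 5 * time.Second
)

// evaluate returns the state for a single rule evaluation, given whether the
// IF expression currently holds and how long it has held continuously.
func evaluate(exprTrue bool, heldFor, forDur time.Duration) State {
	switch {
	case !exprTrue:
		return Inactive
	case heldFor < forDur:
		return Pending
	default:
		return Firing
	}
}

// simulate replays a series of expression results taken every step and
// returns the alert state after each evaluation.
func simulate(samples []bool, step, forDur time.Duration) []State {
	states := make([]State, 0, len(samples))
	var heldFor time.Duration
	wasTrue := false
	for _, exprTrue := range samples {
		if exprTrue && wasTrue {
			heldFor += step
		} else {
			heldFor = 0 // expression just became true, or is false
		}
		wasTrue = exprTrue
		states = append(states, evaluate(exprTrue, heldFor, forDur))
	}
	return states
}

func main() {
	// The target is up, then down for several evaluations, then up again.
	samples := []bool{false, true, true, true, true, true, true, false}
	fmt.Println(simulate(samples, demoStep, demoFor))
}
```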
  10. Notification Integrations
  11. Notifying to Multiple Destinations
      Use a receiver that goes to both destinations:
          route:
            receiver: email_webhook
          receivers:
          - name: email_webhook
            email_configs:
            - to: 'lee@example.io'
            webhook_configs:
            - url: <webhook url here>
      or use continue to advance to the next receiver:
          route:
            receiver: ops-team-all # default
            routes:
            - match:
                severity: page
              receiver: ops-team-b
              continue: true
            - match:
                severity: critical
              receiver: ops-team-a
          receivers:
          - name: ops-team-all
            email_configs:
            - to: ops-team-all@example.io
          - name: ops-team-a
            email_configs:
            - to: ops-team-a@example.io
          - name: ops-team-b
            email_configs:
            - to: ops-team-b@example.io
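How `continue` affects route matching can be sketched in Go. This is a simplified model, not AlertManager's router: the `Route` type, `Receivers` and `slideTree` are hypothetical names, and only equality matchers are modeled. Child routes are tried in order, matching stops at the first matching child unless that child sets `continue`, and the parent's receiver is used when no child matches.

```go
package main

import "fmt"

// Route is a hypothetical, simplified routing-tree node.
type Route struct {
	Receiver string
	Match    map[string]string // equality matchers; empty matches everything
	Continue bool
	Routes   []*Route
}

func (r *Route) matches(labels map[string]string) bool {
	for name, want := range r.Match {
		if labels[name] != want {
			return false
		}
	}
	return true
}

// Receivers walks the tree depth-first and returns the receivers selected
// for an alert with the given labels.
func (r *Route) Receivers(labels map[string]string) []string {
	if !r.matches(labels) {
		return nil
	}
	var out []string
	for _, child := range r.Routes {
		got := child.Receivers(labels)
		out = append(out, got...)
		// Stop at the first matching child unless it asks to continue.
		if len(got) > 0 && !child.Continue {
			break
		}
	}
	if len(out) == 0 {
		out = append(out, r.Receiver) // no child matched: use this node
	}
	return out
}

// slideTree mirrors the routing tree from the slide's second configuration.
func slideTree() *Route {
	return &Route{
		Receiver: "ops-team-all",
		Routes: []*Route{
			{Receiver: "ops-team-b", Match: map[string]string{"severity": "page"}, Continue: true},
			{Receiver: "ops-team-a", Match: map[string]string{"severity": "critical"}},
		},
	}
}

func main() {
	tree := slideTree()
	for _, sev := range []string{"page", "critical", "warning"} {
		fmt.Println(sev, "->", tree.Receivers(map[string]string{"severity": sev}))
	}

	// Hypothetical tree in which both siblings match the same alert, so
	// continue causes two receivers to be selected.
	both := &Route{Receiver: "default", Routes: []*Route{
		{Receiver: "team-x", Match: map[string]string{"team": "x"}, Continue: true},
		{Receiver: "pager", Match: map[string]string{"severity": "page"}},
	}}
	fmt.Println(both.Receivers(map[string]string{"team": "x", "severity": "page"}))
}
```

Note that `continue` only yields multiple destinations when more than one sibling route matches the same alert, as in the second, hypothetical tree.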
  12. Non-HA AlertManager Architecture. The Dispatcher sorts incoming alerts into aggregation groups and assigns the correct notifiers to each. [Architecture diagram; components shown: API, UI, Alert Provider, Silence Provider, store, subscribe, Router, Dispatcher, Inhibitor, Silencer, de-duplication, batched alerts, notification pipeline, Notify Provider (checks for previously sent notifications), Retry, Maintenance Script.]
  13. High Availability (being introduced in 0.5)
      - Uses gossip protocols; built atop Weave Mesh.
      - With HA, you no longer have to monitor the monitor.
      - Designed for an alert to be sent to all instances in the cluster.
      - All Prometheus instances send alerts to all Alertmanager instances.
      - Guarantees notifications to be sent at least once.
  14. AlertManager UI
  15. (no text)
  16. Story: As an Operator, I would like to see not only a list of firing alerts, but also a list of all transpired alerts, so that I may have additional context as to the thresholding behavior of a given defined alert.
      Prologue: Alert troubleshooting is improved when operators can see what is firing, what has recently fired and what is normal, and can also go back in time to see what fired an hour ago. Understanding firing order assists in root cause analysis and helps identify problem areas.
      Limitations:
      1. AlertManager database (SQLite) is not intended to provide long-term storage.
      Acceptance Criteria:
      1. Once fired, whether actively firing or not, alerts will be displayed on the History page.
      2. Optionally, fired alerts will be notified to a Slack channel.
      Stretch:
      - Include pagination
      - Add a date range picker
      - Add a host filter
  17. Environment: test setup
  18. Random Sample Targets
      Fetch and compile the client library code example:
          $ git clone https://github.com/prometheus/client_golang.git
          $ cd client_golang/examples/random
          $ go get -d
          $ go build
      Start example targets in separate terminals:
          $ ./random -listen-address=:8080
          $ ./random -listen-address=:8081
          $ ./random -listen-address=:8082
      Be sure to create and run the random sample targets and point it at your soon-to-be AlertManager.
  19. Prometheus and Alert Rules Setup
      Follow the getting started instructions to download, configure and run Prometheus.
          $ ./prometheus -config.file=prometheus.yml -alertmanager.url=http://localhost:9093
      /alert.rules: a simple alert rule that will fire when any given target is unreachable for longer than 5 seconds.
          ALERT instance_down
            IF up == 0
            FOR 5s
            LABELS {severity="page"}
            ANNOTATIONS {
              DESCRIPTION="{{$labels.instance}} of job {{$labels.job}} has been down for more than 5 seconds.",
              SUMMARY="Instance {{$labels.instance}} down"}
      /prometheus.yml:
          ...
          # Load and evaluate rules in this file every 'evaluation_interval' seconds.
          rule_files:
            - "alert.rules"
          ...
  20. Environment: development setup
  21. Grab Repos
      Clone AlertManager repo:
          $ git clone https://github.com/prometheus/alertmanager.git
      Given that our user story includes making front-end changes to AlertManager, ensure that you install a small utility to generate Go code from any file. Get, build and copy go-bindata into any directory on your PATH:
          $ go get -u github.com/jteeuwen/go-bindata/...
          $ cd $GOPATH/src/github.com/jteeuwen/go-bindata/go-bindata
          $ go build
  22. Notification Integration
      Of the supported AlertManager receivers, let’s opt for integrating Slack. Create an alert notification receiver:
          route:
            group_by: [cluster]
            # If an alert isn't caught by a route, send it to Slack.
            receiver: slack_general
            routes:
            # Send severity=page alerts to Slack.
            - match:
                severity: page
              receiver: slack_general
          receivers:
          - name: slack_general
            slack_configs:
            - api_url: '<your-web-url-here>'
              channel: '#<your-channel-name-here>'
              send_resolved: true
  23. The visual editor can assist in building routing trees.
  24. Build, Run, Test
      Verify you have a functional development environment by building and running the project:
          $ make assets # invokes go-bindata to inject static web files
          $ go build # compiles go code
          $ ./alertmanager -config.file=slack.yml # runs alertmanager with the specified configuration
      Reload Prometheus or AlertManager configs:
          $ curl -X POST http://localhost:9090/-/reload
          $ kill -HUP `pgrep alertmanager`
      Validate Prometheus config and alert rules:
          $ ./promtool check-config <config file>
          $ ./promtool check-rules <rules file>
  25. Test: If you choose to set up a Slack channel, you should now see new alerts firing as and when your random targets go up and down.
  26. Changelog
      - /ui/app/js/app.js (Angular)
      - /ui/app/partials/history.html (HTML)
      - /api.go (Go)
      - /provider/provider.go, /provider/sqlite/sqlite.go, /provider/boltmem/boltmem.go (Go & SQL)
  27. All UI functionality should be addressable via API. Let’s register a new /history API endpoint in /api.go:
          r.Get("/history", ihf("history", api.listAllAlerts))
      With /api/v1/history now an addressable API endpoint, we’ll need to build a function to handle requests made to it. The api.listAllAlerts function will handle inbound HTTP requests made to the new endpoint:
          func (api *API) listAllAlerts(w http.ResponseWriter, r *http.Request) {
              alerts := api.alerts.GetAll()
              defer alerts.Close()
              ...
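The slide shows only the first lines of `listAllAlerts`. As a rough, self-contained sketch of what such a handler might do, the Go example below serves all stored alerts as JSON. It is not the code from the talk's fork: `Alert`, `AlertLister`, `memStore`, `historyHandler` and `get` are hypothetical names, and the `status`/`data` envelope is an assumption chosen to line up with the Angular controller reading `data.data`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Alert is a simplified, hypothetical stand-in for AlertManager's alert type.
type Alert struct {
	Labels   map[string]string `json:"labels"`
	Resolved bool              `json:"resolved"`
}

// AlertLister is a hypothetical abstraction over the alert provider.
type AlertLister interface {
	GetAll() []Alert
}

// memStore is an in-memory AlertLister used only for this example.
type memStore []Alert

func (m memStore) GetAll() []Alert { return m }

// response is the assumed JSON envelope: {"status": ..., "data": ...}.
type response struct {
	Status string  `json:"status"`
	Data   []Alert `json:"data"`
}

// historyHandler returns a handler that lists every stored alert, whether
// it is still firing or already resolved.
func historyHandler(store AlertLister) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := json.Marshal(response{Status: "success", Data: store.GetAll()})
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	}
}

// get issues an in-process GET request against h and returns status and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	store := memStore{
		{Labels: map[string]string{"alertname": "instance_down"}, Resolved: true},
	}
	code, body := get(historyHandler(store), "/api/v1/history")
	fmt.Println(code, body)
}
```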
  28. /provider: With the API endpoint in place, let’s turn our attention to the backend for collecting the right recordset from our data provider.
      1. Add a new AlertIterator (e.g. GetAll() AlertIterator) to /provider/provider.go
      2. Add a new AlertProvider and SQL query to /provider/sqlite/sqlite.go
      3. Add a new AlertIterator and AlertProvider to /provider/boltmem/boltmem.go
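The provider steps above can be sketched with an in-memory provider. The Go example below is an assumption-laden illustration, not AlertManager's provider package: the `Alert`, `AlertIterator`, `memProvider` and `names` definitions are hypothetical simplifications. The point is only that `GetAll` returns every stored alert, whereas a firing-only accessor (here `GetPending`) filters out resolved ones.

```go
package main

import (
	"fmt"
	"sync"
)

// Alert is a simplified, hypothetical alert record.
type Alert struct {
	Name     string
	Resolved bool
}

// AlertIterator is a hypothetical, simplified channel-based iterator.
type AlertIterator interface {
	Next() <-chan *Alert
	Err() error
	Close()
}

type alertIterator struct {
	ch   chan *Alert
	done chan struct{}
	once sync.Once
}

// newIterator streams the given alerts until exhausted or closed.
func newIterator(alerts []*Alert) *alertIterator {
	it := &alertIterator{ch: make(chan *Alert), done: make(chan struct{})}
	go func() {
		defer close(it.ch)
		for _, a := range alerts {
			select {
			case it.ch <- a:
			case <-it.done:
				return
			}
		}
	}()
	return it
}

func (it *alertIterator) Next() <-chan *Alert { return it.ch }
func (it *alertIterator) Err() error          { return nil }
func (it *alertIterator) Close()              { it.once.Do(func() { close(it.done) }) }

// memProvider is an in-memory store standing in for the SQLite/boltmem ones.
type memProvider struct {
	mu     sync.RWMutex
	alerts []*Alert
}

// filter returns a snapshot of the stored alerts accepted by keep.
func (p *memProvider) filter(keep func(*Alert) bool) []*Alert {
	p.mu.RLock()
	defer p.mu.RUnlock()
	var out []*Alert
	for _, a := range p.alerts {
		if keep(a) {
			out = append(out, a)
		}
	}
	return out
}

// GetPending iterates over unresolved alerts only.
func (p *memProvider) GetPending() AlertIterator {
	return newIterator(p.filter(func(a *Alert) bool { return !a.Resolved }))
}

// GetAll iterates over every stored alert, resolved or not.
func (p *memProvider) GetAll() AlertIterator {
	return newIterator(p.filter(func(*Alert) bool { return true }))
}

// names drains an iterator and returns the alert names it yielded.
func names(it AlertIterator) []string {
	defer it.Close()
	var out []string
	for a := range it.Next() {
		out = append(out, a.Name)
	}
	return out
}

func main() {
	p := &memProvider{alerts: []*Alert{
		{Name: "instance_down_8080"},
		{Name: "instance_down_8081", Resolved: true},
	}}
	fmt.Println("pending:", names(p.GetPending()))
	fmt.Println("all:    ", names(p.GetAll()))
}
```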
  29. /ui/app/js/app.js
      NavCtrl for the History menu item:
          angular.module('am.controllers').controller('NavCtrl',
            function($scope, $location) {
              $scope.items = [{
                name: 'History',
                url: 'history'
              },
              ...
      as well as a new History service:
          angular.module('am.services').factory('History',
            function($resource) {
              return $resource('', {}, {
                'query': {
                  method: 'GET',
                  url: 'api/v1/history'
                }
              });
            }
          );
      and a new History controller:
          angular.module('am.controllers').controller('HistoryCtrl',
            function($scope, History) {
              $scope.refresh = function () {
                History.query({},
                  function(data) {
                    $scope.groups = data.data;
                    console.log($scope.groups);
                  },
                  function(data) {
                    console.log(data.data);
                  })
              }
              $scope.refresh();
            }
          );
      Insert a new History directive:
          angular.module('am.directives').directive('history',
            function() {
              return {
                restrict: 'E',
                scope: {
                  alert: '=',
                  group: '='
                },
                templateUrl: 'app/partials/history.html'
              };
            }
          );
  30. /ui/app/partials/history.html: Finally, we’ll need a page in which to view the transpired alerts. So, create a new file, history.html, under /ui/app/partials. history.html will simply format and display a tabular recordset. A new recordset will be retrieved from our data provider.
  31. Summary
      - This example enhancement provides a view of transient history: that of the period that the SQLite database holds.
      - AlertManager is not currently intended to provide long-term storage.
      - Contributing is easier than you may think.
      Reference: Alert History fork, Alert History tutorial
  32. Resources
      - IRC: #prometheus on irc.freenode.net
      - Mailing lists:
        prometheus-users: discussing Prometheus usage and community support
        prometheus-developers: contributing to Prometheus development
      - @PrometheusIO
      - Prometheus repositories to file bugs and feature requests
  33. Lee Calcote. Thank you. Questions? clouds, containers, infrastructure, applications and their management. linkedin.com/in/leecalcote, @lcalcote, blog.gingergeek.com, lee@calcotestudios.com, calcotestudios.com/talks. yes, we're hiring
