Intro to Kapacitor for Alerting and Anomaly Detection
© 2017 InfluxData. All rights reserved.
© 2018 InfluxData. All rights reserved.
What we will be covering
✓ What is Kapacitor
✓ How to Install Kapacitor
✓ Understand TICKscript
    • Quoting rules
✓ Create Stream Tasks
    • Trigger alerts
✓ Create Batch Tasks
✓ Understand Batch vs Stream
✓ Use Kapacitor as a Downsampling engine
    • Instead of Continuous queries
InfluxData Products and Offerings
Key Kapacitor Capabilities
1 Alerting
2 Processing ETL jobs
Kapacitor: Real-time Streaming Data Processing Engine
• Native data processing engine
• Processes both stream and batch data from InfluxDB
• Plug in custom logic or user-defined functions to
    – process alerts with dynamic thresholds
    – match metrics for patterns
    – compute statistical anomalies
    – perform specific actions based on these alerts, like dynamic load rebalancing
• Downsample data
• Integrates with HipChat, OpsGenie, Alerta, Sensu, PagerDuty, Slack, and more
Install & Start Kapacitor
Download (All builds available at https://portal.influxdata.com/downloads)
wget https://dl.influxdata.com/kapacitor/releases/kapacitor_1.5.1_amd64.deb
Install
sudo dpkg -i kapacitor_1.5.1_amd64.deb
Start
sudo systemctl start kapacitor
Install with apt
Configure InfluxData Repository
curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -
source /etc/lsb-release
echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
Install & Start Kapacitor
sudo apt-get update && sudo apt-get install kapacitor
sudo service kapacitor start
Defining Tasks
Batch
● Queries InfluxDB periodically
● Does not buffer as much data in RAM
● Places additional query load on InfluxDB
Stream
● Writes to the InfluxDB instance are mirrored to Kapacitor
● Buffers all data in RAM
● Places additional write load on Kapacitor
TICKscript
• Chain invocation language
    – | chains together different nodes
    – . refers to specific attributes on a node
• Variables refer to values
    – Strings
    – Ints, Floats
    – Durations
    – Pipelines
var measurement = 'requests'
var data = stream
    |from()
        .measurement(measurement)
    |where(lambda: "is_up" == TRUE)
    |where(lambda: "my_field" > 10)
    |window()
        .period(5m)
        .every(5m)
// Count number of points in window
data
    |count('value')
        .as('the_count')
// Compute mean of data window
data
    |mean('value')
        .as('the_average')
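
To make the chaining and the reuse of the `data` variable concrete, here is a minimal Python sketch of what the script above computes. It is not Kapacitor code: the sample points, the minute-based timestamps, and the bucket arithmetic are assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical points: (minute offset, fields). Not Kapacitor's data model.
points = [
    (0, {"is_up": True, "my_field": 12, "value": 1.0}),
    (1, {"is_up": True, "my_field": 5, "value": 2.0}),
    (2, {"is_up": False, "my_field": 30, "value": 3.0}),
    (3, {"is_up": True, "my_field": 42, "value": 4.0}),
    (7, {"is_up": True, "my_field": 11, "value": 5.0}),
]

# Stand-in for the two |where() nodes: a point must pass both filters.
filtered = [(t, f) for t, f in points if f["is_up"] and f["my_field"] > 10]

# Stand-in for |window().period(5m).every(5m): non-overlapping 5-minute buckets.
windows = {}
for t, fields in filtered:
    windows.setdefault(t // 5, []).append(fields["value"])

# The TICKscript reuses `data` for two branches; both branches read `windows` here.
the_count = {w: len(values) for w, values in windows.items()}
the_average = {w: mean(values) for w, values in windows.items()}

print(the_count)
print(the_average)
```

Each branch consumes the same windowed data independently, which is what storing a pipeline in a variable allows.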
TICKscript Syntax - Quoting Rules
• Double Quotes
    – References data in lambda expression
• Single Quotes
    – Literal String value
// ' means use the literal string value
var measurement = 'requests'
var data = stream
    |from()
        .measurement(measurement)
    // " means use the reference value
    |where(lambda: "is_up" == TRUE)
    |where(lambda: "my_field" > 10)
    |window()
        .period(5m)
        .every(5m)
// ' means use the literal string value
data
    |count('value')
        .as('the_count')
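
The difference between the two quote styles can be modeled in a few lines of Python. This is only an analogy under stated assumptions: the `point` dictionary and the `reference` and `literal` helpers are hypothetical and are not part of Kapacitor.

```python
# A hypothetical point, modeled as a dict of tags and fields.
point = {"is_up": True, "my_field": 42, "value": 3.5}


def reference(name, current_point):
    """Double-quoted identifier: looked up on the point being processed."""
    return current_point[name]


def literal(text):
    """Single-quoted string: used exactly as written."""
    return text


# lambda: "my_field" > 10 compares the field's value (42) with 10.
passes_filter = reference("my_field", point) > 10

# |count('value') receives the field name itself as a plain string argument.
field_name = literal("value")

print(passes_filter, field_name)
```

Swapping the quote styles changes the meaning: a single-quoted name inside a lambda would be treated as text rather than as the field's value.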
Create a Stream TICKscript
• Logs all data from the measurement cpu
// cpu.tick
stream
    |from()
        .measurement('cpu')
    |log()
Create a Stream Task
$ kapacitor define cpu \
    -tick cpu.tick \
    -type stream \
    -dbrp telegraf.autogen
// cpu.tick
stream
    |from()
        .measurement('cpu')
    |log()
Show a Stream Task
$ kapacitor define cpu \
    -tick cpu.tick \
    -type stream \
    -dbrp telegraf.autogen
$ kapacitor enable cpu
$ kapacitor show cpu
ID: cpu
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 10 Oct 17 16:05 EDT
Modified: 10 Oct 17 16:05 EDT
LastEnabled: 01 Jan 01 00:00 UTC
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
// cpu.tick
stream
    |from()
        .measurement('cpu')
    |log()
DOT:
digraph cpu {
stream0 -> from1;
from1 -> log2;
}
Create a More Interesting Stream TICKscript
• Create 5m windows of data that emit every 1m
• Compute the average of the field usage_user
• Log the result
// cpu.tick
stream
    |from()
        .measurement('cpu')
    |window()
        .period(5m)
        .every(1m)
    |mean('usage_user')
        .as('mean_usage_user')
    |log()
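
The interplay of `.period(5m)` and `.every(1m)` is easier to see with numbers. The following Python sketch is an illustration, not Kapacitor's implementation: it assumes one `usage_user` sample per minute, and it emits partially filled windows at the start, which the real engine may or may not do depending on the window node's settings.

```python
from statistics import mean


def sliding_means(points, period, every):
    """Return (emit_minute, mean) pairs for overlapping windows.

    `points` is a time-sorted list of (minute, value). Each window covers
    the `period` minutes before the emit time; one is emitted every `every` minutes.
    """
    if not points:
        return []
    first, last = points[0][0], points[-1][0]
    results = []
    emit = first + every
    while emit <= last + every:
        window = [v for t, v in points if emit - period <= t < emit]
        if window:
            results.append((emit, mean(window)))
        emit += every
    return results


# Hypothetical usage_user samples, one per minute.
samples = [(0, 10.0), (1, 20.0), (2, 30.0), (3, 40.0), (4, 50.0), (5, 60.0), (6, 70.0)]
means = sliding_means(samples, period=5, every=1)

for emit_minute, value in means:
    print(emit_minute, value)
```

Because the period is longer than the emit interval, consecutive windows overlap and each point contributes to up to five emitted means.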
An even more interesting TICKscript
• Filter on the tag cpu=cpu-total
• Create 5m windows of data that emit every 1m
• Compute the average of the field usage_user
• Log the result
// cpu.tick
stream
    |from()
        .measurement('cpu')
    |where(lambda: "cpu" == 'cpu-total')
    |window()
        .period(5m)
        .every(1m)
    |mean('usage_user')
        .as('mean_usage_user')
    |log()
Adding Alerts
• Filter on the tag cpu=cpu-total
• Create 5m windows of data that emit every 1m
• Compute the average of the field usage_user
• Alert when mean_usage_user > 80
• Send alert to
    – Slack channel alerts
    – Email oncall@example.com
// cpu.tick
stream
    |from()
        .measurement('cpu')
    |where(lambda: "cpu" == 'cpu-total')
    |window()
        .period(5m)
        .every(1m)
    |mean('usage_user')
        .as('mean_usage_user')
    |alert()
        .crit(lambda: "mean_usage_user" > 80)
        .message('cpu usage high!')
        .slack()
        .channel('alerts')
        .email('oncall@example.com')
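
The threshold logic behind `.crit(lambda: "mean_usage_user" > 80)` can be sketched in Python as follows. This is an assumption-laden illustration: the function names and sample values are hypothetical, and reporting only state changes is one possible notification policy, not something the slide specifies.

```python
def alert_level(mean_usage_user, crit_threshold=80):
    """Return 'CRITICAL' when the value is strictly above the threshold, else 'OK'."""
    return "CRITICAL" if mean_usage_user > crit_threshold else "OK"


def state_changes(values):
    """Return (value, level) pairs only where the level differs from the previous one."""
    previous = "OK"
    events = []
    for value in values:
        level = alert_level(value)
        if level != previous:
            events.append((value, level))
        previous = level
    return events


# Hypothetical sequence of windowed means, one per emitted window.
window_means = [42.0, 79.9, 80.0, 85.5, 91.2, 60.3]

levels = [alert_level(v) for v in window_means]
events = state_changes(window_means)

print(levels)
print(events)
```

Note that a mean of exactly 80 stays at OK because the comparison is strict.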
Create a Batch TICKscript
• Query 5m windows of data every 1m
• Compute the average of the field usage_user
• Log the result
// batch_cpu.tick
batch
    |query('''
        SELECT mean("usage_user") AS mean_usage_user
        FROM "telegraf"."autogen"."cpu"
    ''')
        .period(5m)
        .every(1m)
    |log()
Create a Batch Task
$ kapacitor define batch_cpu \
    -tick batch_cpu.tick \
    -type batch \
    -dbrp telegraf.autogen
// batch_cpu.tick
batch
    |query('''
        SELECT mean("usage_user") AS mean_usage_user
        FROM "telegraf"."autogen"."cpu"
    ''')
        .period(5m)
        .every(1m)
    |log()
Create a Batch TICKscript
• Query 5m windows of data every 1m
• Compute the average of the field usage_user
• Alert when mean_usage_user > 80
• Send alert to
    – Slack channel alerts
    – Email oncall@example.com
// batch_cpu.tick
batch
    |query('''
        SELECT mean("usage_user") AS mean_usage_user
        FROM "telegraf"."autogen"."cpu"
    ''')
        .period(5m)
        .every(1m)
    |alert()
        .crit(lambda: "mean_usage_user" > 80)
        .message('cpu usage high!')
        .slack()
        .channel('alerts')
        .email('oncall@example.com')
Batching
● Queries InfluxDB periodically
● Does not buffer as much data in RAM
● Places additional query load on InfluxDB
Streaming
● Writes to the InfluxDB instance are mirrored to Kapacitor
● Buffers all data in RAM
● Places additional write load on Kapacitor
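
The trade-off between the two task types can be modeled with a toy Python simulation. Everything here is hypothetical (the `Database`, `StreamTask`, and `BatchTask` classes and the minute-based clock); it only illustrates where the load lands, not how Kapacitor or InfluxDB work internally.

```python
class Database:
    """Hypothetical stand-in for InfluxDB: stores points and counts queries."""

    def __init__(self):
        self.points = []
        self.queries = 0
        self.subscribers = []

    def write(self, t, value):
        self.points.append((t, value))
        # Stream tasks see every write as it happens.
        for task in self.subscribers:
            task.receive(t, value)

    def query(self, start, stop):
        self.queries += 1
        return [v for pt, v in self.points if start <= pt < stop]


class StreamTask:
    """Receives mirrored writes and keeps the current window in memory."""

    def __init__(self, period):
        self.period = period
        self.buffer = []
        self.writes_received = 0

    def receive(self, t, value):
        self.writes_received += 1
        self.buffer.append((t, value))
        # Drop points that have fallen out of the window.
        self.buffer = [(pt, v) for pt, v in self.buffer if pt > t - self.period]


class BatchTask:
    """Holds nothing between runs; asks the database for each window."""

    def __init__(self, db, period):
        self.db = db
        self.period = period

    def run(self, now):
        return self.db.query(now - self.period, now)


db = Database()
stream = StreamTask(period=5)
db.subscribers.append(stream)
batch = BatchTask(db, period=5)

batch_results = []
for minute in range(10):
    db.write(minute, float(minute))
    if (minute + 1) % 5 == 0:  # the batch task runs every 5 minutes
        batch_results.append(batch.run(now=minute + 1))

print("queries issued by batch task:", db.queries)
print("writes received by stream task:", stream.writes_received)
print("points buffered by stream task:", len(stream.buffer))
```

In this model the batch task costs the database one query per run, while the stream task costs the processing side one write per point plus the memory for its window.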
Using Kapacitor for Downsampling
What is Downsampling?
Downsampling is the process of reducing a sequence of points in a series to a single data point
• Computing the average, max, min, etc. of a window of data for a particular series
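
As a small worked example of that reduction, the Python snippet below collapses one window of values into single aggregate points. The sample values are made up for illustration.

```python
from statistics import mean

# Hypothetical window of five raw values from one series.
window = [12.0, 15.0, 9.0, 20.0, 14.0]

# Each aggregate reduces the whole window to a single data point.
downsampled = {
    "mean": mean(window),
    "max": max(window),
    "min": min(window),
}

print(downsampled)
```

Five stored points become one per aggregate, which is where the storage and query savings come from.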
Why Downsample?
• Faster queries
• Store less data
Why Downsample using Kapacitor?
Offload computation to a separate host
• Create a task that aggregates your data
– Can be stream or batch depending on your use-case
• Writes data back into InfluxDB
– Typically back into another retention policy
Expanded capabilities beyond InfluxQL
• Math across measurements
• User-defined Functions
Example
• Downsample the data into 5m windows
• Store that data back into the 5m retention policy in the telegraf database
// batch_cpu.tick
batch
    |query('''
        SELECT mean("usage_user") AS usage_user
        FROM "telegraf"."autogen"."cpu"
    ''')
        .period(5m)
        .every(5m)
    |influxDBOut()
        .database('telegraf')
        .retentionPolicy('5m')
        .tag('source', 'kapacitor')
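
The effect of this task on the data can be sketched in Python. This is a rough model under stated assumptions: timestamps are Unix seconds, buckets are aligned to multiples of five minutes for simplicity, and the output rows are plain dictionaries rather than anything InfluxDB-specific.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds=300):
    """Average (unix_seconds, value) points per fixed bucket and tag each result."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Round the timestamp down to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [
        {"time": start, "usage_user": mean(values), "tags": {"source": "kapacitor"}}
        for start, values in sorted(buckets.items())
    ]


# Hypothetical raw usage_user points spread over three 5-minute buckets.
raw = [(0, 10.0), (60, 20.0), (240, 30.0), (300, 40.0), (540, 60.0), (600, 5.0)]
rows = downsample(raw)

for row in rows:
    print(row)
```

Six raw points become three rows, each carrying the `source` tag so that downsampled data can be told apart from the original writes.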