Kapacitor
- Real Time Data Processing Engine
Agenda
● Kapacitor introduction
● Integration and Installation of Kapacitor in TICK Stack
● TICK Script
● CQ/Downsampling
● Join Node
● User-defined functions in TICK Script
● Enriching Data with Kapacitor
● Anomaly Detection using Kapacitor
TICK Stack
How does Kapacitor fit?
Kapacitor
● Kapacitor is a native data processing engine.
● It can process both stream and batch data from InfluxDB.
● It lets you plug in your own custom logic or user-defined
functions to process alerts with dynamic thresholds.
● Key Kapacitor Capabilities
○ Alerting
○ ETL (Extraction, Transformation and Loading)
○ Action Oriented
○ Streaming Analytics
○ Anomaly Detection
Installing TICK Stack
#Installing InfluxDB
wget https://dl.influxdata.com/influxdb/releases/influxdb_1.5.2_amd64.deb
sudo dpkg -i influxdb_1.5.2_amd64.deb
#Installing Telegraf
wget https://dl.influxdata.com/telegraf/releases/telegraf_1.6.1-1_amd64.deb
sudo dpkg -i telegraf_1.6.1-1_amd64.deb
#Installing Chronograf
wget https://dl.influxdata.com/chronograf/releases/chronograf_1.4.4.1_amd64.deb
sudo dpkg -i chronograf_1.4.4.1_amd64.deb
#Installing Kapacitor
wget https://dl.influxdata.com/kapacitor/releases/kapacitor_1.4.1_amd64.deb
sudo dpkg -i kapacitor_1.4.1_amd64.deb
Source: https://portal.influxdata.com/downloads
Kapacitor Components
● Server daemon (kapacitord)
● CLI (kapacitor)
○ Calls the HTTP API
○ Non-interactive
● Tasks (Unit of work)
○ Defined by TICK Script
○ Stream or Batch
○ DAG Pipeline
● Recordings
○ Useful for isolated testing
● Replay
○ Useful for isolated testing (see the CLI sketch below)
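Recordings and replays are driven from the kapacitor CLI. A minimal sketch, assuming a stream task named cpu_alert has already been defined (the task name is a placeholder):

# Record 60 seconds of live stream data for the task; prints a recording ID.
kapacitor record stream -task cpu_alert -duration 60s
# List recordings to confirm it was saved.
kapacitor list recordings
# Replay the recording against the task, as if the data were arriving live.
kapacitor replay -recording <recording-id> -task cpu_alert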
TICK Script
● Kapacitor uses a DSL (Domain Specific Language) called TICKscript to define tasks.
● Each TICKscript defines a pipeline that tells Kapacitor which data to process and how.
● A pipeline is a directed acyclic graph (DAG)
● Components:
○ Statements
○ Variables
○ Comments
○ Literals
■ Booleans - TRUE and FALSE
■ Numbers - int or float
■ Strings
■ Durations - e.g. 1u, 10ms, 5m
Source: https://docs.influxdata.com/kapacitor/v1.4/tick/syntax/
dbrp "kss"."autogen"
stream
// Select just the cpu measurement from our example database.
|from()
.measurement('cpu')
.groupBy('cpu', 'host')
|alert()
.id('{{ index .Tags "host" }}/{{ index .Tags "cpu" }}')
// Email subject
.message('{{ .ID }} is {{ .Level }} Usage Idle Value: {{ index .Fields "usage_idle" }}')
//Email body as HTML
.details('''
<h1>{{ .ID }}</h1>
<b>{{ .Message }}</b><br>
Usage Idle Value: {{ index .Fields "usage_idle" }}
''')
.crit(lambda: int("usage_idle") < 100)
// Whenever we get an alert write it to a file.
.log('/tmp/alerts.log')
// Whenever we get an alert send a mail.
.email('prashant.vats@tothenew.com')
TICKscript nodes overview
● Nodes represent process invocation units that either take data as a batch or a point-by-point stream,
and then alter the data, store the data, or trigger some other activity based on changes in the data (e.g.,
an alert).
● The property methods of the two root nodes, stream and batch, define the type of task that you
are running.
● Available Nodes
○ https://docs.influxdata.com/kapacitor/v1.4/nodes/
● TICK Script Specification
○ https://docs.influxdata.com/kapacitor/v1.4/reference/spec/
|from()
|alert()
.id('{{ index .Tags "host" }}')
.exec('script/handler.py')
|window()
|eval()
Lambda Expressions
TICKscript uses lambda expressions to define transformations on data points, as well as
boolean conditions that act as filters.
WHERE Expression
.where(lambda: "host" == 'server1')
Built-in Functions
.where(lambda: sqrt("value") < 5)
EVAL Node
|eval(lambda: if("field" > threshold AND "field" != 0, 'true', 'false'))
.as('value')
Example
Write a TICKscript that streams the measurement "cpu" from the kss database in InfluxDB and
generates an alert.
Kapacitor CLI Hands-On
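A possible CLI walkthrough for this exercise, assuming the alert script above is saved as cpu_alert.tick (file and task names are placeholders):

# Define the task; the dbrp "kss"."autogen" line in the script supplies the database/RP.
kapacitor define cpu_alert -type stream -tick cpu_alert.tick
# Enable it so it starts processing live points.
kapacitor enable cpu_alert
# Show the task status, its DAG, and per-node statistics.
kapacitor show cpu_alert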
Single Versus Double Quotes
var data = stream
|from()
.database('telegraf')
.retentionPolicy('autogen')
.measurement('cpu')
// NOTE: Double quotes on server1
.where(lambda: "host" == "server1")
var data = stream
|from()
.database('telegraf')
.retentionPolicy('autogen')
.measurement('cpu')
// NOTE: Single quotes on server1
.where(lambda: "host" == 'server1')
The result of this search will always be empty, because double quotes reference a field or tag
named server1 rather than matching the string literal 'server1'.
Template Tasks
1. Write a generic TICKscript with var placeholders.
2. Define a template from it.
3. Write a JSON variable file (a YAML definition also works).
4. Define a task using the template and the variable file, as sketched below.
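A minimal sketch of the flow using the CLI, with hypothetical names (template.tick, generic_alert, cpu_vars.json); the var declarations in the script become the template's parameters:

# template.tick declares its parameters, e.g.:
#   var measurement string
#   var crit lambda
kapacitor define-template generic_alert -tick template.tick
# cpu_vars.json supplies concrete values for those vars, e.g.:
#   {
#     "measurement": {"type": "string", "value": "cpu"},
#     "crit":        {"type": "lambda", "value": "\"usage_idle\" < 10"}
#   }
kapacitor define cpu_idle_alert -template generic_alert -vars cpu_vars.json -dbrp telegraf.autogen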
Downsampling of Data
● Continuous Queries (CQ)
○ Continuous queries are created on a database.
○ Only database admins are allowed to create continuous queries.
○ Downsampling is one of the primary use cases of continuous queries.
○ Continuous queries are not applied to historical data.
● Using a Kapacitor task instead of a CQ
CREATE CONTINUOUS QUERY cpu_idle_mean ON telegraf
BEGIN
SELECT mean("usage_idle") AS usage_idle
INTO mean_cpu_idle
FROM cpu
GROUP BY time(5m), *
END
stream
|from()
.database('telegraf')
.measurement('cpu')
.groupBy(*)
|window()
.period(5m)
.every(5m)
|mean('usage_idle')
.as('usage_idle')
|influxDBOut()
.database('telegraf')
.retentionPolicy('one_year')
.measurement('mean_cpu_idle')
.precision('s')
Stream Task
batch
|query('SELECT mean(usage_idle) as usage_idle FROM "telegraf"."default".cpu')
.period(5m)
.every(5m)
.groupBy(*)
|influxDBOut()
.database('telegraf')
.retentionPolicy('one_year')
.measurement('mean_cpu_idle')
.precision('s')
Batch Task
CQ vs Batch vs Stream
● When should we use Kapacitor instead of CQs?
○ When you want to perform more actions than just downsampling.
○ To isolate the workload from InfluxDB.
○ If you have only a handful of CQs that just downsample data for retention policies, there is no
need to add Kapacitor to your infrastructure.
● When should we use stream tasks vs batch tasks in Kapacitor?
○ RAM and the time period are the two major deciding factors.
○ A stream task has to keep all data in RAM for the specified period.
○ If this period is too long for the available RAM, first store the data in
InfluxDB and then query it with a batch task.
○ If you have tight timestamp constraints and need real-time processing, use a stream task.
○ A stream task has one slight advantage: since it watches the stream of data, it
understands time from the timestamps on the data itself, while a batch task understands time through its query.
○ As such, a stream task has no race conditions over whether a given point makes it into a window.
With a batch task it is still possible for a point to arrive late and be missed by a
window.
○ Data arrives in timestamp order in a stream task; in a batch task, not necessarily.
Join Node
● The join node joins the data from two or more nodes. Points are joined on their
timestamps.
● We can define the join type and a tolerance.
var errors = batch
|query('''
SELECT value
FROM "join"."autogen".errors
''')
.groupBy(*)
.period(5s)
.every(1s)

var requests = batch
|query('''
SELECT value
FROM "join"."autogen".requests
''')
.groupBy(*)
.period(5s)
.every(1s)

errors
|join(requests)
.as('errors', 'requests')
// Points within 1 second of each other are considered the same time.
.tolerance(1s)
// Fill missing values with 0; implies an outer join.
.fill(0.0)
|eval(lambda: "errors.value" / "requests.value")
.as('error_rate')
|influxDBOut()
.database('join')
.retentionPolicy('autogen')
.measurement('join_wala')
User Defined Functions (UDF)
● Write your own algorithm/function and plug it into Kapacitor.
● A custom function runs in its own process, and Kapacitor communicates with it via a defined
protocol.
● Currently the supported languages are Go and Python.
● A UDF handler has a set of methods that must be implemented to provide the functionality.
● We will see an example using Python.
Writing a UDF
● Implement a UDF handler interface
● Write a TICK script which uses the UDF
● Configure the UDF inside of Kapacitor
UDF Handler Interface
● info
○ When Kapacitor starts, it calls the info method.
○ The info method describes the options the UDF accepts, including the value types it
takes (e.g. int, float).
○ It also declares which edge type the UDF wants (receives) and provides (emits): stream or batch.
● init
○ init runs when a task containing the UDF starts executing.
○ It receives the option values specified in the TICKscript.
● begin_batch
○ Used if the UDF wants (receives) data in batch form; marks the start of a batch.
● end_batch
○ Used if the UDF provides (sends out) data in batch form; marks the end of a batch.
● point
○ Used if the UDF wants and/or provides data in stream form; called per point.
● snapshot/restore
○ Used to save and restore the state of the UDF process.
○ Not strictly required.
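To make the interface concrete, below is a minimal pass-through stream handler in Python. It closely follows the mirror/moving_avg examples that ship with Kapacitor's Python agent, but it is a sketch, not the exact upstream code (the class name and the pass-through point logic are illustrative):

from kapacitor.udf.agent import Agent, Handler
from kapacitor.udf import udf_pb2

class PassThroughHandler(Handler):
    def __init__(self, agent):
        self._agent = agent

    def info(self):
        # Declare that this UDF consumes and produces a stream,
        # and accepts a 'field' option taking a single string value.
        response = udf_pb2.Response()
        response.info.wants = udf_pb2.STREAM
        response.info.provides = udf_pb2.STREAM
        response.info.options['field'].valueTypes.append(udf_pb2.STRING)
        return response

    def init(self, init_req):
        # Pull the option values supplied in the TICKscript.
        for opt in init_req.options:
            if opt.name == 'field':
                self._field = opt.values[0].stringValue
        response = udf_pb2.Response()
        response.init.success = True
        return response

    def snapshot(self):
        # No state to save in this sketch.
        response = udf_pb2.Response()
        response.snapshot.snapshot = b''
        return response

    def restore(self, restore_req):
        response = udf_pb2.Response()
        response.restore.success = True
        return response

    def begin_batch(self, begin_req):
        raise Exception('batch data not supported')

    def end_batch(self, end_req):
        raise Exception('batch data not supported')

    def point(self, point):
        # Forward each point unchanged; a real handler would transform it here.
        response = udf_pb2.Response()
        response.point.CopyFrom(point)
        self._agent.write_response(response, True)

if __name__ == '__main__':
    # The agent speaks Kapacitor's UDF protocol over stdin/stdout.
    agent = Agent()
    agent.handler = PassThroughHandler(agent)
    agent.start()
    agent.wait()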
...Continued
● UDF Handler:
https://github.com/influxdata/kapacitor/tree/master/udf/agent/examples/moving_avg
● TICK Script
stream
|from()
.measurement('cpu')
.where(lambda: "cpu" == 'cpu-total')
@pyavg()
.field('usage_idle')
.size(10)
.as('cpu_avg')
|influxDBOut()
.database('udf')
● Kapacitor configuration (kapacitor.conf):
[udf]
[udf.functions]
[udf.functions.pyavg]
prog = "/usr/bin/python2"
args = ["-u", "/etc/kapacitor/script/kapacitor/udf/agent/examples/moving_avg/moving_avg.py"]
timeout = "10s"
[udf.functions.pyavg.env]
PYTHONPATH = "/etc/kapacitor/script/kapacitor/udf/agent/py"
Enriching Your Data with Kapacitor
Problem: How do I summarize my data for the entire month of August, restricted to business hours,
defined as Monday through Friday between 08:00 AM and 05:00 PM?
● With InfluxQL alone we are limited to: SELECT * FROM "mymeasurement" WHERE time >= '2017-08-01
08:00:00.000000' AND time <= '2017-08-31 17:00:00.000000'; this cannot exclude nights or weekends within the range.
● Configure Telegraf to write to Kapacitor (default port 9092) instead of directly to InfluxDB:
[[outputs.influxdb]]
urls = ["http://localhost:9092"]
database = "kap_telegraf"
retention_policy = "autogen"
...Continued
stream
|from()
.database('kap_telegraf')
|eval(lambda: if(((weekday("time") >= 1 AND weekday("time") <= 5)
AND (hour("time") >= 8 AND (hour("time")*100 + minute("time")) <= 1700)),
'true', 'false'))
.as('business_hours')
.tags('business_hours')
.keep()
|delete()
.field('business_hours')
|influxDBOut()
.database('telegraf')
.retentionPolicy('autogen')
.tag('kapacitor_augmented','true')
Once data begins to flow through Kapacitor into InfluxDB, you can add the condition AND business_hours='true' to
the first query we specified:
SELECT * FROM "mymeasurement" WHERE time >= '2017-08-01 08:00:00.000000' AND time <= '2017-08-31
17:00:00.000000' AND business_hours='true';
What can be explored further?
● Anomaly Detection Algorithms
○ https://docs.influxdata.com/kapacitor/v1.4/guides/anomaly_detection/
● Kapacitor Nodes
○ EC2 Autoscale
○ Kubernetes (K8s) Autoscale
○ Docker Swarm Autoscaling
○ Service Discovery in Kubernetes
● Machine Learning using Kapacitor (Smart Alerting)
