"Digital Signal Processing powered by Kafka Streams can open IoT to a new era of possibilities by bringing computational power closer to where the data originates.
The purpose of this presentation is to demonstrate the power of Kafka Streams as a backbone for mathematical methods of signal processing. In this session, we’ll explore transforming signals from the time domain to the frequency domain using the Fast Fourier Transform (FFT), maximizing the compression of input signals while building a precise frequency alert system.
This presentation focuses on the following areas:
-Signal simulators of periodic waveforms (sine, triangle, square, sawtooth, etc.) that comply with the OpenMetrics standard.
-A pipeline of logical operators (AND, OR, XOR, etc.) that simulates the superposition of multiple signal inputs.
-A processor that performs the FFT, converting input signals into their individual spectral components and thereby providing frequency information about each signal.
-Real-time Prometheus/Grafana visualization of the FFT of input signals.
By the end of the session, you’ll understand the fundamentals of digital signal processing with Kafka and have the tools you need to build and implement an FFT in Kafka Streams."
Agenda
25. Fourier Analysis
https://www.mathworks.com/matlabcentral/fileexchange/106725-fourier-analysis
Fs = 1000;          % Sampling frequency
T = 1/Fs;           % Sampling period
L = 1500;           % Length of signal
t = (0:L-1)*T;      % Time vector
1). Form a signal containing a DC offset of amplitude 0.8, a 50 Hz sinusoid of amplitude 0.7, and a 120 Hz sinusoid of amplitude 1.
S = 0.8 + 0.7*sin(2*pi*50*t) + sin(2*pi*120*t);
2). Corrupt the signal with zero-mean random noise with a variance of 4.
X = S + 2*randn(size(t));   % standard deviation 2, i.e. variance 4
26. Complex Magnitude of FFT spectrum
https://www.mathworks.com/help/matlab/ref/fft.html
1). Compute the Fourier transform of the signal.
Y = fft(X);
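The signal is expected to have three frequency peaks, at 0 Hz, 50 Hz, and 120 Hz. The two-sided plot shows five peaks because the spectrum of a real signal is symmetric: the second half mirrors the first, apart from the 0 Hz peak.
2). Convert the complex FFT coefficients to their magnitudes and plot them against frequency; bin k (counting from 0) corresponds to k*Fs/L Hz.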
plot(Fs/L*(0:L-1),abs(Y),"LineWidth",3)
title("Complex Magnitude of fft Spectrum")
xlabel("f (Hz)")
ylabel("|fft(X)|")
27. Single-Sided Amplitude Spectrum of X(t)
https://www.mathworks.com/help/matlab/ref/fft.html
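3). Find the amplitudes of the three frequency peaks: compute the two-sided spectrum P2, then the single-sided spectrum P1 by keeping the first L/2+1 bins and doubling every bin except 0 Hz and the Nyquist frequency.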
P2 = abs(Y/L);
P1 = P2(1:L/2+1);
P1(2:end-1) = 2*P1(2:end-1);
4). Because of the added noise, the amplitudes obtained from X are not exact. Take the Fourier transform of the original, uncorrupted signal S to retrieve the exact amplitudes, 0.8, 0.7, and 1.0.
Y = fft(S);
P2 = abs(Y/L);
P1 = P2(1:L/2+1);
P1(2:end-1) = 2*P1(2:end-1);
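The MATLAB steps above can be mirrored in Java, the language of the UDF later in this deck. The sketch below is an illustration, not the MATLAB implementation: it uses a naive O(L²) DFT instead of an FFT, assumes an even signal length, and the names SingleSidedSpectrum, cleanSignal and amplitudeSpectrum are hypothetical. It reproduces P2 = abs(Y/L), P1 = P2(1:L/2+1) and the doubling of the interior bins for the uncorrupted signal S.

```java
class SingleSidedSpectrum {

    // Builds the uncorrupted signal S: 0.8 DC + 0.7*sin(50 Hz) + 1.0*sin(120 Hz).
    static double[] cleanSignal(double fs, int length) {
        double[] s = new double[length];
        for (int i = 0; i < length; i++) {
            double t = i / fs; // t = (0:L-1)*T
            s[i] = 0.8 + 0.7 * Math.sin(2 * Math.PI * 50 * t) + Math.sin(2 * Math.PI * 120 * t);
        }
        return s;
    }

    // Naive DFT (O(L^2), for illustration only) returning the single-sided amplitude spectrum P1.
    // Assumes an even length L, so P1 has L/2 + 1 bins and bin k corresponds to k * fs / L Hz.
    static double[] amplitudeSpectrum(double[] x) {
        int n = x.length;
        int half = n / 2;
        double[] p1 = new double[half + 1];
        for (int k = 0; k <= half; k++) {
            double re = 0;
            double im = 0;
            for (int t = 0; t < n; t++) {
                // Reduce k*t modulo n to keep the angle small and accurate
                double angle = -2 * Math.PI * ((long) k * t % n) / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            p1[k] = Math.hypot(re, im) / n; // P2 = abs(Y/L)
        }
        for (int k = 1; k < half; k++) {
            p1[k] *= 2; // P1(2:end-1) = 2*P1(2:end-1)
        }
        return p1;
    }

    public static void main(String[] args) {
        double fs = 1000;
        int length = 1500;
        double[] p1 = amplitudeSpectrum(cleanSignal(fs, length));
        for (double hz : new double[] {0, 50, 120}) {
            int bin = (int) Math.round(hz * length / fs);
            System.out.printf("%.0f Hz -> amplitude %.4f%n", hz, p1[bin]);
        }
    }
}
```

With Fs = 1000 and L = 1500 the bin spacing is 2/3 Hz, so 50 Hz and 120 Hz fall exactly on bins 75 and 180, which is why the clean signal yields amplitudes of 0.8, 0.7 and 1.0 without leakage.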
32. UDF
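import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfParameter;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.DoubleStream;
// Declared inside a class annotated with @UdfDescription(name = "fft", ...)
@Udf(description = "fft time series List<T>, returns frequency List<T>")
public <T> List<T> fft(
    @UdfParameter(value = "timeDomain", description = "time domain") final List<T> timeDomain) {
  // Unbox the samples; assumes numeric elements (V is ARRAY<DOUBLE> in WAVES_WINDOWED_JET)
  double[] arr = timeDomain.stream().mapToDouble(d -> ((Number) d).doubleValue()).toArray();
  // FFT: arrayHelper.adjustPowerTwo and transform are helpers not shown on this slide
  double[] adjustedArr = arrayHelper.adjustPowerTwo(arr);
  double[] frequencyDomain = transform(adjustedArr);
  @SuppressWarnings("unchecked")
  final List<T> result = (List<T>) DoubleStream.of(frequencyDomain).boxed().collect(Collectors.toList());
  return result;
}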
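The slide does not show arrayHelper.adjustPowerTwo or transform. The sketch below is one plausible way to write such helpers, assuming that adjustPowerTwo zero-pads to the next power of two and that transform returns the magnitude |X[k]| of every bin; the names RadixTwoFft, padToPowerOfTwo and magnitudes are hypothetical, and the author's helpers may differ. It uses the textbook iterative radix-2 Cooley-Tukey algorithm.

```java
import java.util.Arrays;

class RadixTwoFft {

    // Hypothetical stand-in for arrayHelper.adjustPowerTwo: zero-pads to the next power of two.
    static double[] padToPowerOfTwo(double[] x) {
        int n = 1;
        while (n < x.length) {
            n <<= 1;
        }
        return Arrays.copyOf(x, n); // extra slots are filled with 0.0
    }

    // Hypothetical stand-in for transform: iterative radix-2 Cooley-Tukey FFT of a real
    // signal, returning the magnitude |X[k]| of every bin k. The length must be a power of two.
    static double[] magnitudes(double[] signal) {
        int n = signal.length;
        if (Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("length must be a power of two: " + n);
        }
        double[] re = Arrays.copyOf(signal, n);
        double[] im = new double[n];

        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double tr = re[i]; re[i] = re[j]; re[j] = tr;
                double ti = im[i]; im[i] = im[j]; im[j] = ti;
            }
        }

        // Combine butterflies of size 2, 4, ..., n using twiddle factors exp(-2*pi*i*k/len).
        for (int len = 2; len <= n; len <<= 1) {
            double step = -2 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(step * k);
                    double wi = Math.sin(step * k);
                    int a = start + k;
                    int b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }

        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            mag[k] = Math.hypot(re[k], im[k]);
        }
        return mag;
    }
}
```

For a 64-sample sine completing 4 cycles, the magnitude is 32 (that is, N/2) at bins 4 and 60, so dividing by N/2 recovers the unit amplitude; this is the kind of normalization WAVES_STEP_3_JET applies with SAMPLES_NUM/2.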
33. FFT.jar
How to deploy the UDF:
1. Copy this JAR file to the ksqlDB extension directory, which is set by ksql.extension.dir in ksql-server.properties.
2. Restart your ksqlDB server so that it can pick up the new JAR containing your custom ksqlDB function.
/Users/ikhalitov/CFLT/cp_zip/confluent-7.4.1/etc/ksqldb/ext/ksql-server.properties
...
ksql.extension.dir=/Users/ikhalitov/CFLT/cp_zip/confluent-7.4.1/etc/ksqldb/ext
3. Verify that the FFT function is registered:
SHOW FUNCTIONS;
DESCRIBE FUNCTION FFT;
35. WAVES_JET
CREATE STREAM IF NOT EXISTS WAVES_JET (
NAME VARCHAR
, TYPE VARCHAR
, timestamp BIGINT
, dimensions STRUCT<
formula VARCHAR,
frequency DOUBLE,
msStampInterval BIGINT
>
, `values` STRUCT<
`value` DOUBLE
>
) WITH (
KAFKA_TOPIC = 'wave',
PARTITIONS = 1,
REPLICAS = 1,
VALUE_FORMAT = 'JSON'
);
{
"name": "sensor#1",
"type": "curve",
"timestamp": 1695756674961,
"dimensions": {
"formula": "math.sin(x)",
"msStampInterval": 10
},
"values": {
"value": -0.2426000471283583
}
}
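The record above carries one sample of math.sin(x), with a new value stamped every msStampInterval = 10 ms. As an illustration of a signal simulator that produces records of this shape, the hypothetical WaveformSimulator class below evaluates unit-amplitude sine, square, triangle and sawtooth waves and formats one JSON record with the field names of the sample. The phase conventions (for example, the triangle starting at -1) are assumptions, not the author's generator.

```java
import java.util.Locale;

class WaveformSimulator {

    enum Shape { SINE, SQUARE, TRIANGLE, SAWTOOTH }

    // Unit-amplitude periodic waveform evaluated at tSec seconds for a frequency of hz.
    static double sample(Shape shape, double hz, double tSec) {
        double cycles = hz * tSec;
        double phase = cycles - Math.floor(cycles); // fraction of the current cycle, in [0, 1)
        switch (shape) {
            case SINE:
                return Math.sin(2 * Math.PI * phase);
            case SQUARE:
                return phase < 0.5 ? 1.0 : -1.0;
            case TRIANGLE:
                return phase < 0.5 ? 4 * phase - 1 : 3 - 4 * phase; // -1 up to 1, back to -1
            case SAWTOOTH:
                return 2 * phase - 1;                               // ramps from -1 towards 1
            default:
                throw new IllegalArgumentException("unknown shape " + shape);
        }
    }

    // One JSON record shaped like the WAVES_JET sample. No string escaping is done,
    // so name and formula must not contain quotes or backslashes.
    static String toJson(String name, String formula, long timestampMs, long msStampInterval, double value) {
        return String.format(Locale.ROOT,
            "{\"name\":\"%s\",\"type\":\"curve\",\"timestamp\":%d,"
                + "\"dimensions\":{\"formula\":\"%s\",\"msStampInterval\":%d},"
                + "\"values\":{\"value\":%s}}",
            name, timestampMs, formula, msStampInterval, Double.toString(value));
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        long intervalMs = 10;
        for (int i = 0; i < 5; i++) {
            double v = sample(Shape.SINE, 1.0, i * intervalMs / 1000.0);
            System.out.println(toJson("sensor#1", "math.sin(x)", start + i * intervalMs, intervalMs, v));
        }
    }
}
```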
36. WAVES_WINDOWED_TBL
CREATE TABLE IF NOT EXISTS WAVES_WINDOWED_TBL
WITH (KAFKA_TOPIC='wave-windowed',
VALUE_FORMAT='JSON',
PARTITIONS=6,
REPLICAS = 1
, retention_ms=3600000
) AS
SELECT
-- EARLIEST_BY_OFFSET(ROWKEY)
-- EARLIEST_BY_OFFSET(AS_VALUE(ROWKEY)) AS MQTT_KEY
NAME AS SENSOR
, COUNT(*) AS SAMPLES
, EARLIEST_BY_OFFSET(dimensions->frequency) AS FREQUENCY
, EARLIEST_BY_OFFSET(dimensions->msStampInterval) AS MS_SAMPLING_INTERVAL
, COLLECT_LIST(timestamp) AS T
, COLLECT_LIST(`values`->`value`) AS V
, FROM_UNIXTIME(WINDOWSTART) AS WINDOW_START
, FROM_UNIXTIME(WINDOWEND) AS WINDOW_END
, FROM_UNIXTIME(max(ROWTIME)) AS WINDOW_EMIT
FROM WAVES_JET
-- WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 5 SECOND)
WINDOW HOPPING (SIZE 20 SECONDS, ADVANCE BY 2 SECOND)
GROUP BY NAME
EMIT FINAL;
{
"SAMPLES": 1596,
"FREQUENCY": null,
"MS_SAMPLING_INTERVAL": 10,
"T": [
1695756698005,
…
1695756717976,
1695756717988
],
"V": [
0.03141014189275034,
0.11285466728516695,
0.18738077087567337,
…
-0.15022507859000683,
-0.07532702075491876
],
"WINDOW_START": 1695756698000,
"WINDOW_END": 1695756718000,
"WINDOW_EMIT": 1695756717989
}
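With WINDOW HOPPING (SIZE 20 SECONDS, ADVANCE BY 2 SECOND) the windows overlap, so every sample is aggregated into SIZE / ADVANCE = 10 windows and appears in 10 emitted rows. A minimal sketch of that membership rule, assuming epoch-aligned windows as in Kafka Streams hopping windows (the class name HoppingWindows is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

class HoppingWindows {

    // Start times (epoch ms) of every hopping window [start, start + sizeMs) that contains tsMs,
    // assuming windows are aligned to the epoch, i.e. every start is a multiple of advanceMs.
    static List<Long> windowStarts(long tsMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        long latest = tsMs - Math.floorMod(tsMs, advanceMs); // latest aligned start <= tsMs
        for (long start = latest; start > tsMs - sizeMs && start >= 0; start -= advanceMs) {
            starts.add(0, start); // prepend so the list ends up in ascending order
        }
        return starts;
    }

    public static void main(String[] args) {
        // Last timestamp in the sample output; SIZE 20 SECONDS, ADVANCE BY 2 SECOND
        List<Long> starts = windowStarts(1695756717976L, 20_000, 2_000);
        System.out.println(starts.size() + " windows, first starts at " + starts.get(0));
    }
}
```

For the last timestamp of the sample output, 1695756717976, the earliest of the ten windows starts at 1695756698000, the WINDOW_START shown above.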
37. WAVES_WINDOWED_JET
CREATE STREAM IF NOT EXISTS WAVES_WINDOWED_JET (
SENSOR VARCHAR KEY,
MQTT_KEY VARCHAR,
SAMPLES INTEGER,
MS_SAMPLING_INTERVAL INTEGER,
T ARRAY<BIGINT>,
V ARRAY<DOUBLE>,
WINDOW_START TIMESTAMP,
WINDOW_END TIMESTAMP
) WITH (
kafka_topic = 'wave-windowed',
VALUE_FORMAT = 'JSON'
);
KStream-KTable duality: WAVES_WINDOWED_JET reads the output topic of WAVES_WINDOWED_TBL, 'wave-windowed', back as a stream.
38. WAVES_STEP_1_JET
CREATE OR REPLACE STREAM IF NOT EXISTS WAVES_STEP_1_JET
WITH (
kafka_topic = 'wave-step-1',
VALUE_FORMAT = 'JSON'
) AS
SELECT
SENSOR,
AS_VALUE(SENSOR) AS SENSOR_VAL,
SAMPLES AS SAMPLES_NUM,
MS_SAMPLING_INTERVAL,
CAST(MS_SAMPLING_INTERVAL AS DOUBLE)/1000 AS SAMPLING_PERIOD,
UNIX_TIMESTAMP(WINDOW_START) AS UNIX_TIMESTAMP_START,
UNIX_TIMESTAMP(WINDOW_END) AS UNIX_TIMESTAMP_END,
T,
FFT(V) AS FOURIER
FROM WAVES_WINDOWED_JET EMIT CHANGES;
{
"SENSOR_VAL": "sensor#1\u0000\u0000\u0001 ",
"SAMPLES_NUM": 1588,
"MS_SAMPLING_INTERVAL": 10,
"SAMPLING_PERIOD": 0.01,
"UNIX_TIMESTAMP_START": 1695757040000,
"UNIX_TIMESTAMP_END": 1695757060000,
"T": [
1695757040007,
1695757040020,
…
],
"FOURIER": [
0.03141014189275034,
…
-0.15022507859000683,
-0.07532702075491876
]
}
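The unprintable characters after sensor#1 in SENSOR_VAL are consistent with the key of a windowed table: for time windows, Kafka Streams serializes the record key followed by the window start as an 8-byte big-endian long, while WAVES_WINDOWED_JET declares the key as a plain VARCHAR. The bytes 0x00 0x00 0x01 are exactly the three high-order bytes of the window start 1695757040000. The hypothetical WindowedKeyDecoder sketch below splits such a key under that assumed layout. Within ksqlDB, the WINDOW_TYPE and WINDOW_SIZE source properties are intended for reading windowed topics; check the CREATE STREAM reference for your version.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class WindowedKeyDecoder {

    // Assumed layout of a time-windowed key: the serialized key bytes followed by
    // the window start as an 8-byte big-endian long (epoch milliseconds).
    static String sensor(byte[] rawKey) {
        return new String(rawKey, 0, rawKey.length - Long.BYTES, StandardCharsets.UTF_8);
    }

    static long windowStartMs(byte[] rawKey) {
        return ByteBuffer.wrap(rawKey, rawKey.length - Long.BYTES, Long.BYTES).getLong();
    }

    // Builds a key in the same assumed layout, e.g. to test the decoder.
    static byte[] encode(String sensor, long windowStartMs) {
        byte[] name = sensor.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(name.length + Long.BYTES).put(name).putLong(windowStartMs).array();
    }

    public static void main(String[] args) {
        byte[] key = encode("sensor#1", 1695757040000L);
        System.out.println(sensor(key) + " @ " + windowStartMs(key));
    }
}
```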
39. WAVES_STEP_2_JET
CREATE STREAM IF NOT EXISTS WAVES_STEP_2_JET
WITH (
kafka_topic = 'wave-step-2',
VALUE_FORMAT = 'JSON'
) AS
SELECT
SENSOR,
…
SAMPLING_PERIOD,
1 / SAMPLING_PERIOD AS SAMPLING_FREQ,
SAMPLES_NUM * SAMPLING_PERIOD * 2 AS TIME_VECTOR,
UNIX_TIMESTAMP_END - UNIX_TIMESTAMP_START AS MS_TIME_VECTOR,
1 / (SAMPLES_NUM * SAMPLING_PERIOD * 2) AS DELTA_FREQ,
ARRAY_LENGTH(T) AS T_LENGTH,
T,
TRANSFORM(T, x => x - UNIX_TIMESTAMP_START) AS NORM_T,
ARRAY_LENGTH(FOURIER) AS FOURIER_LENGTH,
FOURIER
FROM WAVES_STEP_1_JET EMIT CHANGES;
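Plugging the WAVES_STEP_1_JET sample (SAMPLES_NUM = 1588, SAMPLING_PERIOD = 0.01 s, a 20000 ms window) into these expressions gives SAMPLING_FREQ = 100 Hz, TIME_VECTOR = 31.76 and DELTA_FREQ = 1/31.76, about 0.0315 Hz. Because of the factor 2, DELTA_FREQ * (SAMPLES_NUM - 1) is about 49.97 Hz, just below the Nyquist frequency of 50 Hz; reading the factor 2 this way is an interpretation, not something the slide states. The hypothetical StepTwoFields class mirrors the expressions so the arithmetic can be checked.

```java
class StepTwoFields {

    // 1 / SAMPLING_PERIOD
    static double samplingFreq(double samplingPeriodSec) {
        return 1 / samplingPeriodSec;
    }

    // SAMPLES_NUM * SAMPLING_PERIOD * 2
    static double timeVector(int samplesNum, double samplingPeriodSec) {
        return samplesNum * samplingPeriodSec * 2;
    }

    // 1 / (SAMPLES_NUM * SAMPLING_PERIOD * 2)
    static double deltaFreq(int samplesNum, double samplingPeriodSec) {
        return 1 / timeVector(samplesNum, samplingPeriodSec);
    }

    // Top of the frequency axis used later: DELTA_FREQ * (SAMPLES_NUM - 1)
    static double maxFrequency(int samplesNum, double samplingPeriodSec) {
        return deltaFreq(samplesNum, samplingPeriodSec) * (samplesNum - 1);
    }

    public static void main(String[] args) {
        int n = 1588;          // SAMPLES_NUM from the WAVES_STEP_1_JET sample
        double period = 0.01;  // SAMPLING_PERIOD in seconds
        System.out.printf("SAMPLING_FREQ=%.1f, TIME_VECTOR=%.2f, DELTA_FREQ=%.5f, max=%.2f, Nyquist=%.1f%n",
            samplingFreq(period), timeVector(n, period), deltaFreq(n, period),
            maxFrequency(n, period), samplingFreq(period) / 2);
    }
}
```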
40. WAVES_STEP_3_JET
CREATE STREAM IF NOT EXISTS WAVES_STEP_3_JET
WITH (
kafka_topic = 'wave-step-3',
VALUE_FORMAT = 'JSON',
RETENTION_MS=600000
) AS
SELECT
SENSOR,
…
T_LENGTH,
T,
TRANSFORM(T, x => DELTA_FREQ * (SAMPLES_NUM-1) * ((CAST(x AS DOUBLE) -
UNIX_TIMESTAMP_START)/MS_TIME_VECTOR) ) AS FREQUENCIES,
FOURIER_LENGTH,
TRANSFORM(FOURIER, x => x/(CAST(SAMPLES_NUM AS DOUBLE)/2)) AS MAGNITUDE
FROM WAVES_STEP_2_JET EMIT CHANGES;
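For a single record, the two TRANSFORM expressions above reduce to the following Java sketch (hypothetical names, assuming FOURIER holds FFT magnitudes). FREQUENCIES maps each timestamp linearly from 0 Hz at UNIX_TIMESTAMP_START to DELTA_FREQ * (SAMPLES_NUM - 1) at the window end. MAGNITUDE rescales the FFT output by SAMPLES_NUM/2 so that a bin-aligned sinusoid of amplitude A reads as roughly A. The sketch divides by 2.0, because SAMPLES_NUM/2 on an INTEGER column is an integer division that drops the half for an odd sample count.

```java
class StepThreeFields {

    // FREQUENCIES: DELTA_FREQ * (SAMPLES_NUM - 1) * ((x - UNIX_TIMESTAMP_START) / MS_TIME_VECTOR)
    static double frequencyOf(long tsMs, double deltaFreq, int samplesNum, long windowStartMs, long msTimeVector) {
        return deltaFreq * (samplesNum - 1) * ((double) (tsMs - windowStartMs) / msTimeVector);
    }

    // MAGNITUDE: x / (SAMPLES_NUM / 2), with the division done in floating point
    static double[] magnitudes(double[] fourier, int samplesNum) {
        double scale = samplesNum / 2.0;
        double[] out = new double[fourier.length];
        for (int i = 0; i < fourier.length; i++) {
            out[i] = fourier[i] / scale;
        }
        return out;
    }
}
```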
41. WAVES_STEP_4_JET
CREATE STREAM IF NOT EXISTS WAVES_STEP_4_JET
WITH (
kafka_topic = 'wave-step-4',
VALUE_FORMAT = 'JSON',
RETENTION_MS=600000
) AS
SELECT
SENSOR,
SENSOR_VAL,
UNIX_TIMESTAMP_START,
UNIX_TIMESTAMP_END,
SAMPLES_NUM,
MS_SAMPLING_INTERVAL,
MS_TIME_VECTOR,
DELTA_FREQ,
T_LENGTH,
TRANSFORM(T, x => CAST(x AS VARCHAR)) AS T_VARCHAR,
FOURIER_LENGTH,
MAGNITUDE
FROM WAVES_STEP_3_JET EMIT CHANGES;
42. WAVES_STEP_5_JET
CREATE STREAM IF NOT EXISTS WAVES_STEP_5_JET
WITH (
kafka_topic = 'wave-step-5',
VALUE_FORMAT = 'JSON',
RETENTION_MS=600000
) AS
SELECT
SENSOR,
SENSOR_VAL,
UNIX_TIMESTAMP_START,
UNIX_TIMESTAMP_END,
SAMPLES_NUM,
MS_SAMPLING_INTERVAL,
MS_TIME_VECTOR,
DELTA_FREQ,
AS_MAP(T_VARCHAR, MAGNITUDE) AS POINTS
FROM WAVES_STEP_4_JET EMIT CHANGES;
43. WAVES_STEP_6_JET
CREATE STREAM IF NOT EXISTS WAVES_STEP_6_JET
WITH (
kafka_topic = 'wave-step-6',
VALUE_FORMAT = 'JSON',
RETENTION_MS=600000
) AS
SELECT
SENSOR,
SENSOR_VAL,
UNIX_TIMESTAMP_START,
UNIX_TIMESTAMP_END,
SAMPLES_NUM,
MS_SAMPLING_INTERVAL,
MS_TIME_VECTOR,
DELTA_FREQ,
FILTER(POINTS, (k,v) => v > 0.4 ) FILTERED
FROM WAVES_STEP_5_JET EMIT CHANGES;
44. WAVES_STEP_7_JET
CREATE STREAM IF NOT EXISTS WAVES_STEP_7_JET
WITH (
kafka_topic = 'wave-step-7',
VALUE_FORMAT = 'JSON',
RETENTION_MS=600000
) AS
SELECT
SENSOR,
SENSOR_VAL,
UNIX_TIMESTAMP_START,
UNIX_TIMESTAMP_END,
SAMPLES_NUM,
MS_SAMPLING_INTERVAL,
MS_TIME_VECTOR,
DELTA_FREQ,
MAP_KEYS(FILTERED) AS X_TIME,
MAP_VALUES(FILTERED) AS Y_AMPL
FROM WAVES_STEP_6_JET EMIT CHANGES;
45. WAVES_STEP_8_JET
CREATE STREAM IF NOT EXISTS WAVES_STEP_8_JET
WITH (
kafka_topic = 'wave-step-8',
VALUE_FORMAT = 'JSON',
RETENTION_MS=3600000
) AS
SELECT
SENSOR,
SENSOR_VAL,
UNIX_TIMESTAMP_START,
UNIX_TIMESTAMP_END,
SAMPLES_NUM,
MS_SAMPLING_INTERVAL,
MS_TIME_VECTOR,
DELTA_FREQ,
TRANSFORM(X_TIME, x => DELTA_FREQ * (SAMPLES_NUM-1) * ((CAST(x AS DOUBLE) -
UNIX_TIMESTAMP_START)/MS_TIME_VECTOR) ) AS X_FREQ,
Y_AMPL
FROM WAVES_STEP_7_JET EMIT CHANGES;
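Steps 4 to 8 pair each timestamp (as a string key) with its magnitude, keep the pairs whose magnitude exceeds 0.4, split the surviving map into keys and values, and convert the keys back to frequencies. The hypothetical PeakAlert sketch below runs the same sequence on plain Java arrays. It pairs elements by position, as AS_MAP does, and only for indexes present in both arrays; that handling of unequal lengths is an assumption of the sketch, not ksqlDB behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PeakAlert {

    // Steps 4-6: AS_MAP(T_VARCHAR, MAGNITUDE), then FILTER(POINTS, (k, v) => v > threshold)
    static Map<String, Double> filteredPoints(long[] t, double[] magnitude, double threshold) {
        Map<String, Double> points = new LinkedHashMap<>(); // keeps insertion order
        int n = Math.min(t.length, magnitude.length);       // assumption: ignore unmatched tail
        for (int i = 0; i < n; i++) {
            if (magnitude[i] > threshold) {
                points.put(Long.toString(t[i]), magnitude[i]);
            }
        }
        return points;
    }

    // Steps 7-8: MAP_KEYS, then X_FREQ = DELTA_FREQ * (SAMPLES_NUM - 1) * ((x - START) / MS_TIME_VECTOR)
    static List<Double> toFrequencies(Iterable<String> xTime, double deltaFreq, int samplesNum,
                                      long windowStartMs, long msTimeVector) {
        List<Double> freqs = new ArrayList<>();
        for (String key : xTime) {
            double x = Double.parseDouble(key);
            freqs.add(deltaFreq * (samplesNum - 1) * ((x - windowStartMs) / msTimeVector));
        }
        return freqs;
    }
}
```

The Y_AMPL column corresponds to the map's values, for example new ArrayList<>(points.values()).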
47. Transform
CREATE TYPE IF NOT EXISTS TAGS_TYPE AS STRUCT<sensor_id VARCHAR, formula VARCHAR>;
-- DROP TYPE TAGS_TYPE;
CREATE OR REPLACE STREAM IF NOT EXISTS WAVE_INFLUXDB_JET
(
`measurement` VARCHAR,
`tags` TAGS_TYPE,
`time` VARCHAR,
`wave` DOUBLE
)
WITH (
kafka_topic = 'wave-influxdb',
VALUE_FORMAT = 'JSON',
PARTITIONS = 1,
RETENTION_MS=600000
);
48. INSERT INTO SELECT FROM
INSERT INTO WAVE_INFLUXDB_JET
SELECT
'WAVE' AS `measurement`
, STRUCT(sensor_id := NAME, formula := dimensions->formula) AS `tags`
, TIMESTAMPTOSTRING(TIMESTAMP, 'yyyy-MM-dd HH:mm:ss.SSS') AS `time`
,`values`->`value` AS `wave`
FROM WAVES_JET
EMIT CHANGES;
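TIMESTAMPTOSTRING turns the BIGINT epoch-millisecond timestamp into text using a format pattern in Java DateTimeFormatter syntax. The sketch below reproduces the `time` field for the WAVES_JET sample record under the assumption that UTC applies; ksqlDB's function also accepts an optional time-zone argument, which this INSERT does not pass. The class name InfluxTime is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class InfluxTime {

    // Same pattern as TIMESTAMPTOSTRING(TIMESTAMP, 'yyyy-MM-dd HH:mm:ss.SSS'); UTC is an assumption.
    private static final DateTimeFormatter TIME_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    static String format(long epochMilli) {
        return TIME_FORMAT.format(Instant.ofEpochMilli(epochMilli));
    }

    public static void main(String[] args) {
        System.out.println(format(1695756674961L)); // timestamp from the WAVES_JET sample record
    }
}
```

For the sample timestamp 1695756674961, this yields 2023-09-26 19:31:14.961 in UTC.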