🚂 On Track with Apache Kafka®: Building a Streaming ETL solution with Rail Data
@rmoff @confluentinc
h/t @kennybastani / @gamussa
https://wiki.openraildata.com
Real-time visualisation of rail data
Why do trains get cancelled?
Do cancellation reasons vary by Train Operator?
Tell me when trains are delayed at my station
Config
Config
SQL
Code!
http://rmoff.dev/kafka-trains-code-01
Getting the data in 📥
Data Sources
• Train Movements: continuous feed
• Schedules: updated daily
• Reference data: static
Streaming Integration with Kafka Connect
Kafka Brokers
Kafka Connect
syslog
Amazon S3
Google BigQuery
kafkacat
• Kafka CLI producer & consumer
• Also metadata inspection
• Integrates beautifully with unix pipes!
• Get it!
  • https://github.com/edenhill/kafkacat/
  • apt-get install kafkacat
  • brew install kafkacat
  • https://hub.docker.com/r/confluentinc/cp-kafkacat/
curl -s "https://my.api.endpoint" | \
  kafkacat -b localhost:9092 -t topic -P
https://rmoff.net/2018/05/10/quick-n-easy-population-of-realistic-test-data-into-kafka/
Movement data
• A train's movements: arrived/departed, where, what time, variation from timetable
• A train was cancelled: where, when, why
• A train was 'activated': links a train to a schedule
https://wiki.openraildata.com/index.php?title=Train_Movements
Ingest ActiveMQ with Kafka Connect
"connector.class" : "io.confluent.connect.activemq.ActiveMQSourceConnector",
"activemq.url" : "tcp://datafeeds.networkrail.co.uk:61619",
"jms.destination.type": "topic",
"jms.destination.name": "TRAIN_MVT_EA_TOC",
"kafka.topic" : "networkrail_TRAIN_MVT",
Northern Trains
GNER*
Movement events
TransPennine
* Remember them?
networkrail_TRAIN_MVT
Kafka Connect
Kafka
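The connector configuration above is submitted to a Kafka Connect worker over its REST API. A minimal sketch, assuming a worker listening on localhost:8083; the connector name is illustrative, and the feed credentials (not shown on the slide) would also need to be added:
# Hypothetical deployment: PUT the connector config to the Connect REST API.
# The worker URL and connector name are assumptions.
curl -X PUT -H "Content-Type: application/json" \
  http://localhost:8083/connectors/source-activemq-networkrail-TRAIN_MVT/config \
  -d '{
        "connector.class"     : "io.confluent.connect.activemq.ActiveMQSourceConnector",
        "activemq.url"        : "tcp://datafeeds.networkrail.co.uk:61619",
        "jms.destination.type": "topic",
        "jms.destination.name": "TRAIN_MVT_EA_TOC",
        "kafka.topic"         : "networkrail_TRAIN_MVT"
      }'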
Envelope
Payload
Message transformation
kafkacat -b localhost:9092 -G tm_explode networkrail_TRAIN_MVT -u | \
  jq -c '.text|fromjson[]' | \
  kafkacat -b localhost:9092 -t networkrail_TRAIN_MVT_X -T -P
Extract element → Convert JSON → Explode Array
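To make the jq step concrete, here is a sketch with a simplified, hypothetical envelope (real messages carry more fields): the JSON array of events arrives as an escaped string in the "text" field, and .text|fromjson[] parses it and emits one event per line, ready to produce back to Kafka.
echo '{"messageID":"ID:1","text":"[{\"header\":{\"msg_type\":\"0003\"}},{\"header\":{\"msg_type\":\"0002\"}}]"}' | \
  jq -c '.text|fromjson[]'
# Output: one compact JSON document per array element
# {"header":{"msg_type":"0003"}}
# {"header":{"msg_type":"0002"}}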
Schedule data
https://wiki.openraildata.com/index.php?title=SCHEDULE
• Train routes
• Type of train (diesel/electric)
• Classes, sleeping accommodation, reservations, etc.
Load JSON from S3 into Kafka
curl -s -L -u "$USER:$PW" "$FEED_URL" | 
gunzip | 
kafkacat -b localhost -P -t CIF_FULL_DAILY
Reference data
https://wiki.openraildata.com/index.php?title=SCHEDULE
• Train company names
• Location codes
• Train type codes
• Location geocodes
Geocoding
Static reference data
AA:{"canx_reason_code":"AA","canx_reason":"Waiting acceptance into off Network Terminal or Yard"}
AC:{"canx_reason_code":"AC","canx_reason":"Waiting train preparation or completion of TOPS list/RT3973"}
AE:{"canx_reason_code":"AE","canx_reason":"Congestion in off Network Terminal or Yard"}
AG:{"canx_reason_code":"AG","canx_reason":"Wagon load incident including adjusting loads or open door"}
AJ:{"canx_reason_code":"AJ","canx_reason":"Waiting Customer’s traffic including documentation"}
DA:{"canx_reason_code":"DA","canx_reason":"Non Technical Fleet Holding Code"}
DB:{"canx_reason_code":"DB","canx_reason":"Train Operations Holding Code"}
DC:{"canx_reason_code":"DC","canx_reason":"Train Crew Causes Holding Code"}
DD:{"canx_reason_code":"DD","canx_reason":"Technical Fleet Holding Code"}
DE:{"canx_reason_code":"DE","canx_reason":"Station Causes Holding Code"}
DF:{"canx_reason_code":"DF","canx_reason":"External Causes Holding Code"}
DG:{"canx_reason_code":"DG","canx_reason":"Freight Terminal and or Yard Holding Code"}
DH:{"canx_reason_code":"DH","canx_reason":"Adhesion and or Autumn Holding Code"}
FA:{"canx_reason_code":"FA","canx_reason":"Dangerous goods incident"}
FC:{"canx_reason_code":"FC","canx_reason":"Freight train driver"}
kafkacat -l canx_reason_code.dat \
  -b localhost:9092 -t canx_reason_code -P -K:
Transforming the data
Location
Movement Activation Schedule
Streams of events
Time
Stream Processing with KSQL
Stream: widgets
Stream: widgets_red
Stream Processing with KSQL
Stream: widgets
Stream: widgets_red
CREATE STREAM widgets_red AS
SELECT * FROM widgets
WHERE colour='RED';
Event data
NETWORKRAIL_TRAIN_MVT_X
ActiveMQ
Kafka Connect
Message routing with KSQL
NETWORKRAIL_TRAIN_MVT_X
TRAIN_ACTIVATIONS (MSG_TYPE='0001')
TRAIN_CANCELLATIONS (MSG_TYPE='0002')
TRAIN_MOVEMENTS (MSG_TYPE='0003')
Time
Split a stream of data (message routing)
CREATE STREAM TRAIN_CANCELLATIONS_00
AS
SELECT *
FROM NETWORKRAIL_TRAIN_MVT_X
WHERE header->msg_type = '0002';
CREATE STREAM TRAIN_MOVEMENTS_00
AS
SELECT *
FROM NETWORKRAIL_TRAIN_MVT_X
WHERE header->msg_type = '0003';
Source topic
NETWORKRAIL_TRAIN_MVT_X
MSG_TYPE='0003'
Movement events
MSG_TYPE='0002'
Cancellation events
Joining events to lookup data (stream-table joins)
CREATE STREAM TRAIN_MOVEMENTS_ENRICHED AS
SELECT TM.LOC_ID AS LOC_ID,
TM.TS AS TS,
L.DESCRIPTION AS LOC_DESC,
[…]
FROM TRAIN_MOVEMENTS TM
LEFT JOIN LOCATION L
ON TM.LOC_ID = L.ID;
TRAIN_MOVEMENTS
(Stream)
LOCATION
(Table)
TRAIN_MOVEMENTS_ENRICHED
(Stream)
TS 10:00:01
LOC_ID 42
TS 10:00:02
LOC_ID 43
TS 10:00:01
LOC_ID 42
LOC_DESC Menston
TS 10:00:02
LOC_ID 43
LOC_DESC Ilkley
ID DESCRIPTION
42 MENSTON
43 Ilkley
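The LOCATION table in this join is itself registered over a Kafka topic of reference data. A minimal sketch of how such a lookup table could be declared; the topic name and column list here are assumptions, not the exact statement from the demo:
-- Hypothetical lookup table over a (compacted) reference-data topic
CREATE TABLE LOCATION (ID VARCHAR,
                       DESCRIPTION VARCHAR)
  WITH (KAFKA_TOPIC='location_lookup',
        VALUE_FORMAT='JSON',
        KEY='ID');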
Decode values, and generate composite key
+---------------+----------------+---------+---------------------+----------------+------------------------+
| CIF_train_uid | sched_start_dt | stp_ind | SCHEDULE_KEY | CIF_power_type | POWER_TYPE |
+---------------+----------------+---------+---------------------+----------------+------------------------+
| Y82535 | 2019-05-20 | P | Y82535/2019-05-20/P | E | Electric |
| Y82537 | 2019-05-20 | P | Y82537/2019-05-20/P | DMU | Other |
| Y82542 | 2019-05-20 | P | Y82542/2019-05-20/P | D | Diesel |
| Y82535 | 2019-05-24 | P | Y82535/2019-05-24/P | EMU | Electric Multiple Unit |
| Y82542 | 2019-05-24 | P | Y82542/2019-05-24/P | HST | High Speed Train |
SELECT JsonScheduleV1->CIF_train_uid + '/'
         + JsonScheduleV1->schedule_start_date + '/'
         + JsonScheduleV1->CIF_stp_indicator AS SCHEDULE_KEY,
       CASE
         WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'D'
           THEN 'Diesel'
         WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'E'
           THEN 'Electric'
         WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'ED'
           THEN 'Electro-Diesel'
       END AS POWER_TYPE
Schemas
CREATE STREAM SCHEDULE_RAW (
TiplocV1 STRUCT<transaction_type VARCHAR,
tiploc_code VARCHAR,
NALCO VARCHAR,
STANOX VARCHAR,
crs_code VARCHAR,
description VARCHAR,
tps_description VARCHAR>)
WITH (KAFKA_TOPIC='CIF_FULL_DAILY',
VALUE_FORMAT='JSON');
Handling JSON data
CREATE STREAM NETWORKRAIL_TRAIN_MVT_X (
header STRUCT< msg_type VARCHAR, […] >,
body VARCHAR)
WITH (KAFKA_TOPIC='networkrail_TRAIN_MVT_X',
VALUE_FORMAT='JSON');
CREATE STREAM TRAIN_CANCELLATIONS_00 AS
SELECT HEADER,
EXTRACTJSONFIELD(body,'$.canx_timestamp') AS canx_timestamp,
EXTRACTJSONFIELD(body,'$.canx_reason_code') AS canx_reason_code,
[…]
FROM networkrail_TRAIN_MVT_X
WHERE header->msg_type = '0002';
Serialisation & Schemas
-> Confluent Schema Registry
Avro Protobuf JSON CSV
https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf
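Once streams are re-serialised as Avro (see the following slides), their schemas can be inspected over the Schema Registry's REST API. A quick sketch, assuming the registry runs on localhost:8081 and the subject follows the default <topic>-value naming:
# List registered subjects, then fetch the latest value schema for one stream's topic
curl -s http://localhost:8081/subjects
curl -s http://localhost:8081/subjects/TRAIN_CANCELLATIONS_00-value/versions/latest | jq '.'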
The Confluent Schema Registry
(Diagram: a source writes Avro messages to Kafka and a target reads them; KSQL and Kafka Connect store and retrieve the Avro schemas via the Schema Registry.)
Reserialise data
CREATE STREAM TRAIN_CANCELLATIONS_00
WITH (VALUE_FORMAT='AVRO') AS
SELECT *
FROM networkrail_TRAIN_MVT_X
WHERE header->msg_type = '0002';
CREATE STREAM TIPLOC_FLAT_KEYED AS
  SELECT TiplocV1->TRANSACTION_TYPE AS TRANSACTION_TYPE,
         TiplocV1->TIPLOC_CODE      AS TIPLOC_CODE,
         TiplocV1->NALCO            AS NALCO,
         TiplocV1->STANOX           AS STANOX,
         TiplocV1->CRS_CODE         AS CRS_CODE,
         TiplocV1->DESCRIPTION      AS DESCRIPTION,
         TiplocV1->TPS_DESCRIPTION  AS TPS_DESCRIPTION
    FROM SCHEDULE_RAW
    PARTITION BY TIPLOC_CODE;
Flatten and re-key a stream
{
"TiplocV1": {
"transaction_type": "Create",
"tiploc_code": "ABWD",
"nalco": "513100",
"stanox": "88601",
"crs_code": "ABW",
"description": "ABBEY WOOD",
"tps_description": "ABBEY WOOD"
}
}
ksql> DESCRIBE TIPLOC_FLAT_KEYED;
Name : TIPLOC_FLAT_KEYED
Field | Type
----------------------------------------------
TRANSACTION_TYPE | VARCHAR(STRING)
TIPLOC_CODE | VARCHAR(STRING)
NALCO | VARCHAR(STRING)
STANOX | VARCHAR(STRING)
CRS_CODE | VARCHAR(STRING)
DESCRIPTION | VARCHAR(STRING)
TPS_DESCRIPTION | VARCHAR(STRING)
----------------------------------------------
Access nested element
Change partitioning key
Using the data
Streaming Integration with Kafka Connect
Kafka Brokers
Kafka Connect
syslog
Amazon S3
Google BigQuery
Kafka -> Elasticsearch
{
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00",
"connection.url": "http://elasticsearch:9200",
"type.name": "type.name=kafkaconnect",
"key.ignore": "false",
"schema.ignore": "true",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
}
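To confirm that documents are landing, the target index can be queried directly; a quick sketch, with the index name assumed here to be the lower-cased topic name:
# Hypothetical check: count the documents indexed from the topic
curl -s "http://localhost:9200/train_movements_activations_schedule_00/_count" | jq '.'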
Kibana for real-time visualisation of rail data
Kibana for real-time analysis of rail data
Kafka -> Postgres
{
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"connection.url": "jdbc:postgresql://postgres:5432/",
"connection.user": "postgres",
"connection.password": "postgres",
"auto.create": true,
"auto.evolve": true,
"insert.mode": "upsert",
"pk.mode": "record_key",
"pk.fields": "MESSAGE_KEY",
"topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00",
"transforms": "dropArrays",
"transforms.dropArrays.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.dropArrays.blacklist": "SCHEDULE_SEGMENT_LOCATION, HEADER"
}
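With auto.create enabled the sink builds the target table itself, named after the topic. A sketch of a sanity-check query from psql; the column name used here is an assumption based on the enriched stream:
-- Hypothetical check against the auto-created table
SELECT "VARIATION_STATUS", COUNT(*)
  FROM "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00"
 GROUP BY "VARIATION_STATUS";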
Kafka -> Postgres
But do you even need a database?
SELECT TIMESTAMPTOSTRING(WINDOWSTART(),'yyyy-MM-dd') AS WINDOW_START_TS, VARIATION_STATUS,
SUM(CASE WHEN TOC = 'Arriva Trains Northern' THEN 1 ELSE 0 END) AS Arriva_CT,
SUM(CASE WHEN TOC = 'East Midlands Trains' THEN 1 ELSE 0 END) AS EastMidlands_CT,
SUM(CASE WHEN TOC = 'London North Eastern Railway' THEN 1 ELSE 0 END) AS LNER_CT,
SUM(CASE WHEN TOC = 'TransPennine Express' THEN 1 ELSE 0 END) AS TransPennineExpress_CT
FROM TRAIN_MOVEMENTS WINDOW TUMBLING (SIZE 1 DAY)
GROUP BY VARIATION_STATUS;
+------------+-------------+--------+----------+------+-------+
| Date | Variation | Arriva | East Mid | LNER | TPE |
+------------+-------------+--------+----------+------+-------+
| 2019-07-02 | OFF ROUTE | 46 | 78 | 20 | 167 |
| 2019-07-02 | ON TIME | 19083 | 3568 | 1509 | 2916 |
| 2019-07-02 | LATE | 30850 | 7953 | 5420 | 9042 |
| 2019-07-02 | EARLY | 11478 | 3518 | 1349 | 2096 |
| 2019-07-03 | OFF ROUTE | 79 | 25 | 41 | 213 |
| 2019-07-03 | ON TIME | 19512 | 4247 | 1915 | 2936 |
| 2019-07-03 | LATE | 37357 | 8258 | 5342 | 11016 |
| 2019-07-03 | EARLY | 11825 | 4574 | 1888 | 2094 |
TRAIN_MOVEMENTS → (Aggregate) → TOC_STATS
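To keep the TOC_STATS aggregate continuously maintained as a Kafka-backed table rather than running it as a one-off query, the aggregation can be wrapped in CREATE TABLE ... AS. A minimal sketch with simplified columns (not the exact statement from the demo):
CREATE TABLE TOC_STATS AS
  SELECT VARIATION_STATUS,
         COUNT(*) AS MOVEMENT_CT
    FROM TRAIN_MOVEMENTS WINDOW TUMBLING (SIZE 1 DAY)
   GROUP BY VARIATION_STATUS;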
Kafka -> S3
{
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"tasks.max": "1",
"topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00",
"s3.region": "us-west-2",
"s3.bucket.name": "rmoff-rail-streaming-demo",
"flush.size": "65536",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
"schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
"schema.compatibility": "NONE",
"partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"path.format":"'year'=YYYY/'month'=MM/'day'=dd",
"timestamp.extractor":"Record",
"partition.duration.ms": 1800000,
"locale":"en",
"timezone": "UTC"
}
Kafka -> S3
$ aws s3 ls s3://rmoff-rail-streaming-demo/topics/TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00/year=2019/month=07/day=07/
2019-07-07 10:05:41 200995548 TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00+0+0004963416.avro
2019-07-07 19:47:11 87631494 TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00+0+0005007151.avro
Inferring table config from S3 files with AWS Glue
Analysis with AWS Athena
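With the files catalogued by Glue they can be queried in place with Athena. A sketch of the kind of query this enables; the database name, table name, and columns here are assumptions based on the S3 layout above:
-- Hypothetical Athena query over the Glue-crawled Avro files
SELECT toc, variation_status, COUNT(*) AS movements
FROM   rail.train_movements_activations_schedule_00
WHERE  year = '2019' AND month = '07'
GROUP  BY toc, variation_status
ORDER  BY movements DESC;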
Graph analysis
{
"connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
"topics": "TRAIN_CANCELLATIONS_02",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "sink-neo4j-train-00_dlq",
"errors.deadletterqueue.topic.replication.factor": 1,
"errors.deadletterqueue.context.headers.enable":true,
"neo4j.server.uri": "bolt://neo4j:7687",
"neo4j.authentication.basic.username": "neo4j",
"neo4j.authentication.basic.password": "connect",
"neo4j.topic.cypher.TRAIN_CANCELLATIONS_02": "MERGE (train:train{id: event.TRAIN_ID}) MERGE
(toc:toc{toc: event.TOC}) MERGE (canx_reason:canx_reason{reason: event.CANX_REASON}) MERGE
(canx_loc:canx_loc{location: coalesce(event.CANCELLATION_LOCATION,'<unknown>')}) MERGE (train)-
[:OPERATED_BY]->(toc) MERGE (canx_loc)<-[:CANCELLED_AT{reason:event.CANX_REASON,
time:event.CANX_TIMESTAMP}]-(train)-[:CANCELLED_BECAUSE]->(canx_reason)"
}
Graph relationships
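With cancellations modelled as a graph, Cypher queries can explore the relationships; a sketch of one such question (node labels and properties follow the MERGE statements in the sink config above):
// Hypothetical query: most common cancellation reasons per operator
MATCH (train:train)-[:OPERATED_BY]->(toc:toc),
      (train)-[:CANCELLED_BECAUSE]->(reason:canx_reason)
RETURN toc.toc, reason.reason, count(*) AS cancellations
ORDER BY cancellations DESC
LIMIT 10;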
Tell me when trains are delayed at my station
Event-driven Alerting - logic
CREATE STREAM TRAINS_DELAYED_ALERT AS
SELECT […]
FROM TRAIN_MOVEMENTS T
INNER JOIN ALERT_CONFIG A
ON T.LOC_NLCDESC = A.STATION
WHERE TIMETABLE_VARIATION
> A.ALERT_OVER_MINS;
TRAIN_MOVEMENTS
Time
TRAIN_DELAYED_ALERT
Time
ALERT_CONFIG
Key/Value state
Sending messages to Telegram
curl -X POST -H 'Content-Type: application/json' \
  -d '{"chat_id": "-364377679", "text": "This is a test from curl"}' \
  https://api.telegram.org/botxxxxx/sendMessage
{
"connector.class": "io.confluent.connect.http.HttpSinkConnector",
"request.method": "post",
"http.api.url": "https://api.telegram.org/[…]/sendMessage",
"headers": "Content-Type: application/json",
"topics": "TRAINS_DELAYED_ALERT_TG",
"tasks.max": "1",
"batch.prefix": "{\"chat_id\":\"-364377679\",\"parse_mode\": \"markdown\",",
"batch.suffix": "}",
"batch.max.size": "1",
"regex.patterns":".*{MESSAGE=(.*)}.*",
"regex.replacements": "\"text\":\"$1\"",
"regex.separator": "~",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
}
Kafka → Kafka Connect HTTP Sink → Telegram
Event-driven Alerting - config
$ kafkacat -b localhost:9092 -t alert_config -P -K: <<EOF
LEEDS:{"STATION":"LEEDS","ALERT_OVER_MINS":"45"}
EOF
CREATE TABLE ALERT_CONFIG (STATION VARCHAR,
ALERT_OVER_MINS INT)
WITH (KAFKA_TOPIC='alert_config',
VALUE_FORMAT='JSON',
KEY='STATION');
ksql> SELECT STATION, ALERT_OVER_MINS
FROM ALERT_CONFIG WHERE STATION='LEEDS';
LEEDS | 45
$ kafkacat -b localhost:9092 -t alert_config -P -K: <<EOF
LEEDS:{"STATION":"LEEDS","ALERT_OVER_MINS":"20"}
EOF
ksql> SELECT STATION, ALERT_OVER_MINS
FROM ALERT_CONFIG WHERE STATION='LEEDS';
LEEDS | 20
Compacted topic
CREATE STREAM ALERT_CONFIG_STREAM (STATION VARCHAR,
ALERT_OVER_MINS INT)
WITH (KAFKA_TOPIC='alert_config',
VALUE_FORMAT='JSON');
ksql> SELECT STATION, ALERT_OVER_MINS
FROM ALERT_CONFIG_STREAM WHERE STATION='LEEDS';
LEEDS | 45
ksql> SELECT STATION, ALERT_OVER_MINS
FROM ALERT_CONFIG_STREAM WHERE STATION='LEEDS';
LEEDS | 45
LEEDS | 20
* until topic compaction runs*
Running it & monitoring & maintenance
Consumer lag
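Consumer lag for the KSQL queries and sink connectors can also be checked from the command line; a sketch using the standard Kafka tooling (the group name shown is illustrative, as KSQL and Connect generate their own group IDs):
# List consumer groups, then describe one to see per-partition offsets and lag
kafka-consumer-groups --bootstrap-server localhost:9092 --list
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --describe --group connect-sink-elastic-train-movements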
System health & Throughput
Restart failing connectors
rmoff@proxmox01 > crontab -l
*/5 * * * * /home/rmoff/restart_failed_connector_tasks.sh
rmoff@proxmox01 > cat /home/rmoff/restart_failed_connector_tasks.sh
#!/usr/bin/env bash
# @rmoff / June 6, 2019
# Restart any connector tasks that are FAILED
curl -s "http://localhost:8083/connectors" | \
  jq '.[]' | \
  xargs -I{connector_name} curl -s "http://localhost:8083/connectors/"{connector_name}"/status" | \
  jq -c -M '[select(.tasks[].state=="FAILED") | .name,"§±§",.tasks[].id]' | \
  grep -v "\[\]" | \
  sed -e 's/^\["//g' | sed -e 's/","§±§",/\/tasks\//g' | sed -e 's/\]$//g' | \
  xargs -I{connector_and_task} curl -X POST "http://localhost:8083/connectors/"{connector_and_task}"/restart"
Is everything running? http://rmoff.dev/kafka-trains-code-01
check_latest_timestamp.sh
# Start at the latest message (-o-1) and read just one (-c1), then convert the
# epoch timestamp from milliseconds to a human-readable date.
kafkacat -b localhost:9092 -t networkrail_TRAIN_MVT_X -o-1 -c1 -C | \
  jq '.header.msg_queue_timestamp' | \
  sed -e 's/"//g' | \
  sed -e 's/000$//g' | \
  xargs -Ifoo date --date=@foo
$ ./check_latest_timestamp.sh
Fri Jul 5 16:28:09 BST 2019
✅ Stream raw data ✅ Stream enriched data ✅ Historic data & data replay
KSQL
Kafka Connect
Kafka
Config
Config
SQL
Fully Managed Kafka as a Service
Free Books!
http://cnfl.io/book-bundle
#EOF
https://talks.rmoff.net
@rmoff @confluentinc
💬 Join the Confluent Community Slack group at http://cnfl.io/slack
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Data

  • 22. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Envelope Payload
  • 23. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Message transformation kafkacat -b localhost:9092 -G tm_explode networkrail_TRAIN_MVT -u | jq -c '.text|fromjson[]' | kafkacat -b localhost:9092 -t networkrail_TRAIN_MVT_X -T -P Extract element Convert JSON Explode Array
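The pipeline above is easier to follow split across lines with comments (a restatement of the slide's command, assuming kafkacat and jq installed locally and Kafka on localhost:9092):

    # Consume the raw ActiveMQ envelope, pull out the 'text' field, explode the
    # JSON array it contains, and write each element to a new topic.
    kafkacat -b localhost:9092 -G tm_explode networkrail_TRAIN_MVT -u | \
      jq -c '.text|fromjson[]' | \
      kafkacat -b localhost:9092 -t networkrail_TRAIN_MVT_X -T -P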
  • 24. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Schedule data https://wiki.openraildata.com/index.php?title=SCHEDULE Train routes Type of train (diesel/electric) Classes, sleeping accommodation, reservations, etc
  • 25. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Load JSON from S3 into Kafka curl -s -L -u "$USER:$PW" "$FEED_URL" | gunzip | kafkacat -b localhost -P -t CIF_FULL_DAILY
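The same load, spelled out with comments; USER, PW and FEED_URL are placeholders for Network Rail credentials and the schedule feed endpoint, and the broker address is assumed to be localhost:9092:

    # Pull the gzipped CIF schedule feed and write it straight into a Kafka topic.
    curl -s -L -u "$USER:$PW" "$FEED_URL" | \
      gunzip | \
      kafkacat -b localhost:9092 -P -t CIF_FULL_DAILY

    # Spot-check that a record landed (sketch; assumes jq is installed).
    kafkacat -b localhost:9092 -t CIF_FULL_DAILY -C -c1 -e | jq '.'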
  • 26. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Reference data https://wiki.openraildata.com/index.php?title=SCHEDULE Train company names Location codes Train type codes Location geocodes
  • 27. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Geocoding
  • 28. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Static reference data
  • 29. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc AA:{"canx_reason_code":"AA","canx_reason":"Waiting acceptance into off Network Terminal or Yard"} AC:{"canx_reason_code":"AC","canx_reason":"Waiting train preparation or completion of TOPS list/RT3973"} AE:{"canx_reason_code":"AE","canx_reason":"Congestion in off Network Terminal or Yard"} AG:{"canx_reason_code":"AG","canx_reason":"Wagon load incident including adjusting loads or open door"} AJ:{"canx_reason_code":"AJ","canx_reason":"Waiting Customer’s traffic including documentation"} DA:{"canx_reason_code":"DA","canx_reason":"Non Technical Fleet Holding Code"} DB:{"canx_reason_code":"DB","canx_reason":"Train Operations Holding Code"} DC:{"canx_reason_code":"DC","canx_reason":"Train Crew Causes Holding Code"} DD:{"canx_reason_code":"DD","canx_reason":"Technical Fleet Holding Code"} DE:{"canx_reason_code":"DE","canx_reason":"Station Causes Holding Code"} DF:{"canx_reason_code":"DF","canx_reason":"External Causes Holding Code"} DG:{"canx_reason_code":"DG","canx_reason":"Freight Terminal and or Yard Holding Code"} DH:{"canx_reason_code":"DH","canx_reason":"Adhesion and or Autumn Holding Code"} FA:{"canx_reason_code":"FA","canx_reason":"Dangerous goods incident"} FC:{"canx_reason_code":"FC","canx_reason":"Freight train driver"} kafkacat -l canx_reason_code.dat -b localhost:9092 -t canx_reason_code -P -K:
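A quick way to check the keyed reference data just loaded (a sketch, assuming the topic above and kafkacat's format-string output):

    # Print key and value separately; -K: on the producer side split each line
    # on the first colon, so %k should show the two-letter cancellation code.
    kafkacat -b localhost:9092 -t canx_reason_code -C -e -f 'key: %k  value: %s\n'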
  • 30. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc
  • 31. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Transforming the data
  • 32. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Location Movement Activation Schedule
  • 33. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Streams of events Time
  • 34. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Stream Processing with KSQL Stream: widgets Stream: widgets_red
  • 35. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Stream Processing with KSQL Stream: widgets Stream: widgets_red CREATE STREAM widgets_red AS SELECT * FROM widgets WHERE colour='RED';
  • 36. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc
  • 37. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Event data NETWORKRAIL_TRAIN_MVT_X ActiveMQ Kafka Connect
  • 38. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Message routing with KSQL NETWORKRAIL_TRAIN_MVT_X TRAIN_ACTIVATIONS TRAIN_CANCELLATIONS TRAIN_MOVEMENTS MSG_TYPE='0001' MSG_TYPE='0003' MSG_TYPE='0002' Time Time Time Time
  • 39. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Split a stream of data (message routing) CREATE STREAM TRAIN_CANCELLATIONS_00 AS SELECT * FROM NETWORKRAIL_TRAIN_MVT_X WHERE header->msg_type = '0002'; CREATE STREAM TRAIN_MOVEMENTS_00 AS SELECT * FROM NETWORKRAIL_TRAIN_MVT_X WHERE header->msg_type = '0003'; Source topic NETWORKRAIL_TRAIN_MVT_X MSG_TYPE='0003' Movement events MSG_TYPE='0002' Cancellation events
  • 40. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Joining events to lookup data (stream-table joins) CREATE STREAM TRAIN_MOVEMENTS_ENRICHED AS SELECT TM.LOC_ID AS LOC_ID, TM.TS AS TS, L.DESCRIPTION AS LOC_DESC, […] FROM TRAIN_MOVEMENTS TM LEFT JOIN LOCATION L ON TM.LOC_ID = L.ID; TRAIN_MOVEMENTS (Stream) LOCATION (Table) TRAIN_MOVEMENTS_ENRICHED (Stream) TS 10:00:01 LOC_ID 42 TS 10:00:02 LOC_ID 43 TS 10:00:01 LOC_ID 42 LOC_DESC Menston TS 10:00:02 LOC_ID 43 LOC_DESC Ilkley ID DESCRIPTION 42 MENSTON 43 Ilkley
  • 41. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Decode values, and generate composite key +---------------+----------------+---------+---------------------+----------------+------------------------+ | CIF_train_uid | sched_start_dt | stp_ind | SCHEDULE_KEY | CIF_power_type | POWER_TYPE | +---------------+----------------+---------+---------------------+----------------+------------------------+ | Y82535 | 2019-05-20 | P | Y82535/2019-05-20/P | E | Electric | | Y82537 | 2019-05-20 | P | Y82537/2019-05-20/P | DMU | Other | | Y82542 | 2019-05-20 | P | Y82542/2019-05-20/P | D | Diesel | | Y82535 | 2019-05-24 | P | Y82535/2019-05-24/P | EMU | Electric Multiple Unit | | Y82542 | 2019-05-24 | P | Y82542/2019-05-24/P | HST | High Speed Train | SELECT JsonScheduleV1->CIF_train_uid + '/' + JsonScheduleV1->schedule_start_date + '/' + JsonScheduleV1->CIF_stp_indicator AS SCHEDULE_KEY, CASE WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'D' THEN 'Diesel' WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'E' THEN 'Electric' WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'ED' THEN 'Electro-Diesel' END AS POWER_TYPE
  • 42. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Schemas CREATE STREAM SCHEDULE_RAW ( TiplocV1 STRUCT<transaction_type VARCHAR, tiploc_code VARCHAR, NALCO VARCHAR, STANOX VARCHAR, crs_code VARCHAR, description VARCHAR, tps_description VARCHAR>) WITH (KAFKA_TOPIC='CIF_FULL_DAILY', VALUE_FORMAT='JSON');
  • 43. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Handling JSON data CREATE STREAM NETWORKRAIL_TRAIN_MVT_X ( header STRUCT< msg_type VARCHAR, […] >, body VARCHAR) WITH (KAFKA_TOPIC='networkrail_TRAIN_MVT_X', VALUE_FORMAT='JSON'); CREATE STREAM TRAIN_CANCELLATIONS_00 AS SELECT HEADER, EXTRACTJSONFIELD(body,'$.canx_timestamp') AS canx_timestamp, EXTRACTJSONFIELD(body,'$.canx_reason_code') AS canx_reason_code, […] FROM networkrail_TRAIN_MVT_X WHERE header->msg_type = '0002';
  • 44. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Serialisation & Schemas -> Confluent Schema Registry Avro Protobuf JSON CSV https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf
  • 45. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc The Confluent Schema Registry Source Avro Message Target Schema Registry Avro Schema KSQL Kafka Connect Avro Message
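The Schema Registry exposes a REST API you can inspect directly; a sketch, assuming it runs on localhost:8081 and that subjects follow the default <topic>-value naming:

    # List registered subjects...
    curl -s http://localhost:8081/subjects | jq '.'

    # ...and fetch the latest Avro schema for one of them.
    curl -s http://localhost:8081/subjects/TRAIN_CANCELLATIONS_00-value/versions/latest | \
      jq '.schema|fromjson'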
  • 46. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Reserialise data CREATE STREAM TRAIN_CANCELLATIONS_00 WITH (VALUE_FORMAT='AVRO') AS SELECT * FROM networkrail_TRAIN_MVT_X WHERE header->msg_type = '0002';
  • 47. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc CREATE STREAM TIPLOC_FLAT_KEYED AS SELECT TiplocV1->TRANSACTION_TYPE AS TRANSACTION_TYPE , TiplocV1->TIPLOC_CODE AS TIPLOC_CODE , TiplocV1->NALCO AS NALCO , TiplocV1->STANOX AS STANOX , TiplocV1->CRS_CODE AS CRS_CODE , TiplocV1->DESCRIPTION AS DESCRIPTION , TiplocV1->TPS_DESCRIPTION AS TPS_DESCRIPTION FROM SCHEDULE_RAW PARTITION BY TIPLOC_CODE; Flatten and re-key a stream { "TiplocV1": { "transaction_type": "Create", "tiploc_code": "ABWD", "nalco": "513100", "stanox": "88601", "crs_code": "ABW", "description": "ABBEY WOOD", "tps_description": "ABBEY WOOD" } } ksql> DESCRIBE TIPLOC_FLAT_KEYED; Name : TIPLOC_FLAT_KEYED Field | Type ---------------------------------------------- TRANSACTION_TYPE | VARCHAR(STRING) TIPLOC_CODE | VARCHAR(STRING) NALCO | VARCHAR(STRING) STANOX | VARCHAR(STRING) CRS_CODE | VARCHAR(STRING) DESCRIPTION | VARCHAR(STRING) TPS_DESCRIPTION | VARCHAR(STRING) ---------------------------------------------- Access nested element Change partitioning key
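To confirm that PARTITION BY really changed the message key, read a few records back with the key printed alongside the value (a sketch; the backing topic is assumed to take the stream's name, which is KSQL's default):

    kafkacat -b localhost:9092 -t TIPLOC_FLAT_KEYED -C -c3 -e -f 'key: %k\nvalue: %s\n\n'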
  • 48. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc
  • 49. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Using the data
  • 50. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc
  • 51. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc
  • 52. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Streaming Integration with Kafka Connect Kafka Brokers Kafka Connect syslog Amazon S3 Google BigQuery
  • 53. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Kafka -> Elasticsearch { "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00", "connection.url": "http://elasticsearch:9200", "type.name": "kafkaconnect", "key.ignore": "false", "schema.ignore": "true", "key.converter": "org.apache.kafka.connect.storage.StringConverter" }
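That JSON is the payload you submit to the Kafka Connect REST API; a sketch, assuming Connect listens on localhost:8083 and using an illustrative connector name:

    # Create or update the Elasticsearch sink (PUT to /config is idempotent).
    curl -s -X PUT -H "Content-Type: application/json" \
      http://localhost:8083/connectors/sink-elastic-train-movements/config \
      -d '{
            "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
            "topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00",
            "connection.url": "http://elasticsearch:9200",
            "type.name": "kafkaconnect",
            "key.ignore": "false",
            "schema.ignore": "true",
            "key.converter": "org.apache.kafka.connect.storage.StringConverter"
          }'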
  • 54. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Kibana for real-time visualisation of rail data
  • 55. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Kibana for real-time analysis of rail data
  • 56. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Kafka -> Postgres { "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "connection.url": "jdbc:postgresql://postgres:5432/", "connection.user": "postgres", "connection.password": "postgres", "auto.create": true, "auto.evolve": true, "insert.mode": "upsert", "pk.mode": "record_key", "pk.fields": "MESSAGE_KEY", "topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00", "transforms": "dropArrays", "transforms.dropArrays.type": "org.apache.kafka.connect.transforms.ReplaceField$Value", "transforms.dropArrays.blacklist": "SCHEDULE_SEGMENT_LOCATION, HEADER" }
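With auto.create enabled the JDBC sink names the target table after the topic by default, so a quick row count from psql looks roughly like this (a sketch; the host and database names are assumptions based on the config above):

    PGPASSWORD=postgres psql -h localhost -U postgres -d postgres \
      -c 'SELECT COUNT(*) FROM "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00";'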
  • 57. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Kafka -> Postgres
  • 58. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc But do you even need a database? SELECT TIMESTAMPTOSTRING(WINDOWSTART(),'yyyy-MM-dd') AS WINDOW_START_TS, VARIATION_STATUS, SUM(CASE WHEN TOC = 'Arriva Trains Northern' THEN 1 ELSE 0 END) AS Arriva_CT, SUM(CASE WHEN TOC = 'East Midlands Trains' THEN 1 ELSE 0 END) AS EastMidlands_CT, SUM(CASE WHEN TOC = 'London North Eastern Railway' THEN 1 ELSE 0 END) AS LNER_CT, SUM(CASE WHEN TOC = 'TransPennine Express' THEN 1 ELSE 0 END) AS TransPennineExpress_CT FROM TRAIN_MOVEMENTS WINDOW TUMBLING (SIZE 1 DAY) GROUP BY VARIATION_STATUS; +------------+-------------+--------+----------+------+-------+ | Date | Variation | Arriva | East Mid | LNER | TPE | +------------+-------------+--------+----------+------+-------+ | 2019-07-02 | OFF ROUTE | 46 | 78 | 20 | 167 | | 2019-07-02 | ON TIME | 19083 | 3568 | 1509 | 2916 | | 2019-07-02 | LATE | 30850 | 7953 | 5420 | 9042 | | 2019-07-02 | EARLY | 11478 | 3518 | 1349 | 2096 | | 2019-07-03 | OFF ROUTE | 79 | 25 | 41 | 213 | | 2019-07-03 | ON TIME | 19512 | 4247 | 1915 | 2936 | | 2019-07-03 | LATE | 37357 | 8258 | 5342 | 11016 | | 2019-07-03 | EARLY | 11825 | 4574 | 1888 | 2094 | TRAIN_MOVEMENTS Time TOC_STATS Time Aggregate Aggregate
  • 59. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Kafka -> S3 { "connector.class": "io.confluent.connect.s3.S3SinkConnector", "key.converter":"org.apache.kafka.connect.storage.StringConverter", "tasks.max": "1", "topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00", "s3.region": "us-west-2", "s3.bucket.name": "rmoff-rail-streaming-demo", "flush.size": "65536", "storage.class": "io.confluent.connect.s3.storage.S3Storage", "format.class": "io.confluent.connect.s3.format.avro.AvroFormat", "schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator", "schema.compatibility": "NONE", "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner", "path.format":"'year'=YYYY/'month'=MM/'day'=dd", "timestamp.extractor":"Record", "partition.duration.ms": 1800000, "locale":"en", "timezone": "UTC" }
  • 60. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Kafka -> S3 $ aws s3 ls s3://rmoff-rail-streaming-demo/topics/TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00/year=2019/month=07/day=07/ 2019-07-07 10:05:41 200995548 TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00+0+0004963416.avro 2019-07-07 19:47:11 87631494 TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00+0+0005007151.avro
  • 61. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Inferring table config from S3 files with AWS Glue
  • 62. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Inferring table config from S3 files with AWS Glue
  • 63. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Analysis with AWS Athena
  • 64. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Graph analysis { "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector", "topics": "TRAIN_CANCELLATIONS_02", "key.converter":"org.apache.kafka.connect.storage.StringConverter", "errors.tolerance": "all", "errors.deadletterqueue.topic.name": "sink-neo4j-train-00_dlq", "errors.deadletterqueue.topic.replication.factor": 1, "errors.deadletterqueue.context.headers.enable":true, "neo4j.server.uri": "bolt://neo4j:7687", "neo4j.authentication.basic.username": "neo4j", "neo4j.authentication.basic.password": "connect", "neo4j.topic.cypher.TRAIN_CANCELLATIONS_02": "MERGE (train:train{id: event.TRAIN_ID}) MERGE (toc:toc{toc: event.TOC}) MERGE (canx_reason:canx_reason{reason: event.CANX_REASON}) MERGE (canx_loc:canx_loc{location: coalesce(event.CANCELLATION_LOCATION,'<unknown>')}) MERGE (train)-[:OPERATED_BY]->(toc) MERGE (canx_loc)<-[:CANCELLED_AT{reason:event.CANX_REASON, time:event.CANX_TIMESTAMP}]-(train)-[:CANCELLED_BECAUSE]->(canx_reason)" }
  • 65. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Graph relationships
  • 66. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Tell me when trains are delayed at my station
  • 67. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Event-driven Alerting - logic CREATE STREAM TRAINS_DELAYED_ALERT AS SELECT […] FROM TRAIN_MOVEMENTS T INNER JOIN ALERT_CONFIG A ON T.LOC_NLCDESC = A.STATION WHERE TIMETABLE_VARIATION > A.ALERT_OVER_MINS; TRAIN_MOVEMENTS Time TRAIN_DELAYED_ALERT Time ALERT_CONFIG Key/Value state
  • 68. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Sending messages to Telegram curl -X POST -H 'Content-Type: application/json' -d '{"chat_id": "-364377679", "text": "This is a test from curl"}' https://api.telegram.org/botxxxxx/sendMessage { "connector.class": "io.confluent.connect.http.HttpSinkConnector", "request.method": "post", "http.api.url": "https://api.telegram.org/[…]/sendMessage", "headers": "Content-Type: application/json", "topics": "TRAINS_DELAYED_ALERT_TG", "tasks.max": "1", "batch.prefix": "{\"chat_id\":\"-364377679\",\"parse_mode\": \"markdown\",", "batch.suffix": "}", "batch.max.size": "1", "regex.patterns":".*{MESSAGE=(.*)}.*", "regex.replacements": "\"text\":\"$1\"", "regex.separator": "~", "key.converter": "org.apache.kafka.connect.storage.StringConverter" } Kafka Kafka Connect HTTP Sink Telegram
  • 69. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Event-driven Alerting - config $ kafkacat -b localhost:9092 -t alert_config -P -K: <<EOF LEEDS:{"STATION":"LEEDS","ALERT_OVER_MINS":"45"} EOF CREATE TABLE ALERT_CONFIG (STATION VARCHAR, ALERT_OVER_MINS INT) WITH (KAFKA_TOPIC='alert_config', VALUE_FORMAT='JSON', KEY='STATION'); ksql> SELECT STATION, ALERT_OVER_MINS FROM ALERT_CONFIG WHERE STATION='LEEDS'; LEEDS | 45 $ kafkacat -b localhost:9092 -t alert_config -P -K: <<EOF LEEDS:{"STATION":"LEEDS","ALERT_OVER_MINS":"20"} EOF ksql> SELECT STATION, ALERT_OVER_MINS FROM ALERT_CONFIG WHERE STATION='LEEDS'; LEEDS | 20 Compacted topic CREATE STREAM ALERT_CONFIG_STREAM (STATION VARCHAR, ALERT_OVER_MINS INT) WITH (KAFKA_TOPIC='alert_config', VALUE_FORMAT='JSON'); ksql> SELECT STATION, ALERT_OVER_MINS FROM ALERT_CONFIG_STREAM WHERE STATION='LEEDS'; LEEDS | 45 ksql> SELECT STATION, ALERT_OVER_MINS FROM ALERT_CONFIG_STREAM WHERE STATION='LEEDS'; LEEDS | 45 LEEDS | 20 *until topic compaction runs*
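The pattern above relies on alert_config being a compacted topic so that only the latest value per station key is retained; a sketch of creating it up front with the standard Kafka CLI (partition count and replication factor are illustrative):

    kafka-topics --bootstrap-server localhost:9092 --create \
      --topic alert_config \
      --partitions 1 --replication-factor 1 \
      --config cleanup.policy=compact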
  • 70. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Running it & monitoring & maintenance
  • 71. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Consumer lag
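Consumer lag can also be checked from the CLI; sink connectors commit offsets under a consumer group named connect-<connector name>, so for the illustrative Elasticsearch sink above:

    kafka-consumer-groups --bootstrap-server localhost:9092 \
      --describe --group connect-sink-elastic-train-movements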
  • 72. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc System health & Throughput
  • 73. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Restart failing connectors rmoff@proxmox01 > crontab -l */5 * * * * /home/rmoff/restart_failed_connector_tasks.sh rmoff@proxmox01 > cat /home/rmoff/restart_failed_connector_tasks.sh #!/usr/bin/env bash # @rmoff / June 6, 2019 # Restart any connector tasks that are FAILED curl -s "http://localhost:8083/connectors" | jq '.[]' | xargs -I{connector_name} curl -s "http://localhost:8083/connectors/"{connector_name}"/status" | jq -c -M '[select(.tasks[].state=="FAILED") | .name,"§±§",.tasks[].id]' | grep -v "\[\]" | sed -e 's/^\["//g' | sed -e 's/","§±§",/\/tasks\//g' | sed -e 's/\]$//g' | xargs -I{connector_and_task} curl -X POST "http://localhost:8083/connectors/"{connector_and_task}"/restart"
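The cron one-liner above is dense; an equivalent, more readable sketch against the Kafka Connect REST API (Connect assumed on localhost:8083) is:

    #!/usr/bin/env bash
    # Restart any Kafka Connect tasks currently in FAILED state.
    for conn in $(curl -s http://localhost:8083/connectors | jq -r '.[]'); do
      curl -s "http://localhost:8083/connectors/$conn/status" | \
        jq -r '.tasks[] | select(.state=="FAILED") | .id' | \
        while read -r task; do
          echo "Restarting $conn task $task"
          curl -s -X POST "http://localhost:8083/connectors/$conn/tasks/$task/restart"
        done
    done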
  • 74. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Is everything running? http://rmoff.dev/kafka-trains-code-01
  • 75. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Start at latest message check_latest_timestamp.sh kafkacat -b localhost:9092 -t networkrail_TRAIN_MVT_X -o-1 -c1 -C | jq '.header.msg_queue_timestamp' | sed -e 's/"//g' | sed -e 's/000$//g' | xargs -Ifoo date --date=@foo Just read one message Convert from epoch to human-readable timestamp format $ ./check_latest_timestamp.sh Fri Jul 5 16:28:09 BST 2019
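The same check split across lines with comments (the feed's epoch timestamp is in milliseconds, hence trimming the trailing three digits):

    # Read only the most recent message (-o-1 -c1), extract its queue timestamp,
    # strip the quotes and millisecond suffix, and render it as a readable date.
    kafkacat -b localhost:9092 -t networkrail_TRAIN_MVT_X -o-1 -c1 -C | \
      jq '.header.msg_queue_timestamp' | \
      sed -e 's/"//g' -e 's/000$//g' | \
      xargs -Ifoo date --date=@foo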
  • 76. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc ✅ Stream raw data ✅ Stream enriched data ✅ Historic data & data replay KSQL Kafka Connect Kafka Connect Kafka
  • 77. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Config Config SQL
  • 79. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Free Books! http://cnfl.io/book-bundle
  • 80. #EOF https://talks.rmoff.net @rmoff @confluentinc 💬 Join the Confluent Community Slack group at http://cnfl.io/slack