🚂 On Track with Apache Kafka®: Building a Streaming ETL solution with Rail Data
@rmoff @confluentinc
h/t @kennybastani / @gamussa
@rmoff | @confluentinc
https://wiki.openraildata.com
Config / Config / SQL
Real-time visualisation of rail data
DEMO
Code!
http://rmoff.dev/kafka-trains-code-01
Graph relationships
Realtime Alerts
Getting the data in 📥
Data Sources
• Train Movements: continuous feed
• Schedules: updated daily
• Reference data: static
Streaming Integration with Kafka Connect
(Kafka brokers ⇄ Kafka Connect ⇄ external systems: syslog, Amazon, Google, ...)
kafkacat
• Kafka CLI producer & consumer
• Also metadata inspection
• Integrates beautifully with unix pipes!
• Get it!
• https://github.com/edenhill/kafkacat/
• apt-get install kafkacat
• brew install kafkacat
• https://hub.docker.com/r/confluentinc/cp-kafkacat/

curl -s "https://my.api.endpoint" | \
  kafkacat -b localhost:9092 -t topic -P
https://rmoff.net/2018/05/10/quick-n-easy-population-of-realistic-test-data-into-kafka/
Movement data
• A train's movements (arrived / departed): where, what time, variation from timetable
• A train was cancelled: where, when, why
• A train was 'activated': links a train to a schedule
https://wiki.openraildata.com/index.php?title=Train_Movements
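These fields arrive as JSON on the Train Movements feed. As a quick sketch (the field names follow the feed documentation, but the values here are invented), jq can pull out the parts the pipeline uses downstream:

```shell
# Hypothetical movement-message body (simplified; values invented)
msg='{"event_type":"ARRIVAL","loc_stanox":"52215","train_id":"172N37MK30","variation_status":"LATE","timetable_variation":"2"}'

# Extract the event type, the variation status, and the minutes of variation
echo "$msg" | jq -r '[.event_type, .variation_status, .timetable_variation] | @csv'
# → "ARRIVAL","LATE","2"
```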
Ingest ActiveMQ with Kafka Connect
"connector.class" : "io.confluent.connect.activemq.ActiveMQSourceConnector",
"activemq.url" : "tcp://datafeeds.networkrail.co.uk:61619",
"jms.destination.type": "topic",
"jms.destination.name": "TRAIN_MVT_EA_TOC",
"kafka.topic" : "networkrail_TRAIN_MVT",
Movement events from the train operators (Northern Trains, GNER*, TransPennine) flow through Kafka Connect into the networkrail_TRAIN_MVT topic in Kafka.
* Remember them?
Envelope
Payload
Message transformation

kafkacat -b localhost:9092 -G tm_explode networkrail_TRAIN_MVT -u | \
  jq -c '.text|fromjson[]' | \
  kafkacat -b localhost:9092 -t networkrail_TRAIN_MVT_X -T -P

Extract element → convert JSON → explode array
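The ActiveMQ envelope wraps the real messages in a text field holding a JSON array serialised as a string; `.text|fromjson[]` parses that string and emits one message per array element. A minimal stand-in (invented payload) shows the effect:

```shell
# Envelope as delivered by the ActiveMQ source connector (payload invented):
# the actual messages are a JSON array serialised into the "text" string field
envelope='{"text":"[{\"msg_type\":\"0003\"},{\"msg_type\":\"0002\"}]"}'

# Parse the embedded string and explode the array: one message per line
echo "$envelope" | jq -c '.text|fromjson[]'
# → {"msg_type":"0003"}
# → {"msg_type":"0002"}
```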
Schedule data
https://wiki.openraildata.com/index.php?title=SCHEDULE
Train routes
Type of train (diesel/electric)
Classes, sleeping accommodation, reservations, etc.
Load JSON from S3 into Kafka
curl -s -L -u "$USER:$PW" "$FEED_URL" | 
gunzip | 
kafkacat -b localhost -P -t CIF_FULL_DAILY
Reference data
https://wiki.openraildata.com/index.php?title=SCHEDULE
Train company names
Location codes
Train type codes
Location geocodes
Geocoding
Static reference data
AA:{"canx_reason_code":"AA","canx_reason":"Waiting acceptance into off Network Terminal or Yard"}
AC:{"canx_reason_code":"AC","canx_reason":"Waiting train preparation or completion of TOPS list/RT3973"}
AE:{"canx_reason_code":"AE","canx_reason":"Congestion in off Network Terminal or Yard"}
AG:{"canx_reason_code":"AG","canx_reason":"Wagon load incident including adjusting loads or open door"}
AJ:{"canx_reason_code":"AJ","canx_reason":"Waiting Customer’s traffic including documentation"}
DA:{"canx_reason_code":"DA","canx_reason":"Non Technical Fleet Holding Code"}
DB:{"canx_reason_code":"DB","canx_reason":"Train Operations Holding Code"}
DC:{"canx_reason_code":"DC","canx_reason":"Train Crew Causes Holding Code"}
DD:{"canx_reason_code":"DD","canx_reason":"Technical Fleet Holding Code"}
DE:{"canx_reason_code":"DE","canx_reason":"Station Causes Holding Code"}
DF:{"canx_reason_code":"DF","canx_reason":"External Causes Holding Code"}
DG:{"canx_reason_code":"DG","canx_reason":"Freight Terminal and or Yard Holding Code"}
DH:{"canx_reason_code":"DH","canx_reason":"Adhesion and or Autumn Holding Code"}
FA:{"canx_reason_code":"FA","canx_reason":"Dangerous goods incident"}
FC:{"canx_reason_code":"FC","canx_reason":"Freight train driver"}
kafkacat -l canx_reason_code.dat
-b localhost:9092 -t canx_reason_code -P -K:
Transforming the data
Location
Movement Activation Schedule
Streams of events
Time
Stream Processing with ksqlDB

Stream: widgets → Stream: widgets_red

CREATE STREAM widgets_red AS
  SELECT * FROM widgets
  WHERE colour='RED';
ksqlDB
Event data: ActiveMQ → Kafka Connect → NETWORKRAIL_TRAIN_MVT_X
Message routing with ksqlDB

NETWORKRAIL_TRAIN_MVT_X splits by message type:
• TRAIN_ACTIVATIONS (MSG_TYPE='0001')
• TRAIN_CANCELLATIONS (MSG_TYPE='0002')
• TRAIN_MOVEMENTS (MSG_TYPE='0003')
Split a stream of data (message routing)
CREATE STREAM TRAIN_CANCELLATIONS_00
AS
SELECT *
FROM NETWORKRAIL_TRAIN_MVT_X
WHERE header->msg_type = '0002';
CREATE STREAM TRAIN_MOVEMENTS_00
AS
SELECT *
FROM NETWORKRAIL_TRAIN_MVT_X
WHERE header->msg_type = '0003';
Source topic
NETWORKRAIL_TRAIN_MVT_X
MSG_TYPE='0003'
Movement events
MSG_TYPE='0002'
Cancellation events
Joining events to lookup data (stream-table joins)
CREATE STREAM TRAIN_MOVEMENTS_ENRICHED AS
SELECT TM.LOC_ID AS LOC_ID,
TM.TS AS TS,
L.DESCRIPTION AS LOC_DESC,
[…]
FROM TRAIN_MOVEMENTS TM
LEFT JOIN LOCATION L
ON TM.LOC_ID = L.ID;
TRAIN_MOVEMENTS (Stream):
  TS 10:00:01, LOC_ID 42
  TS 10:00:02, LOC_ID 43

LOCATION (Table):
  ID 42, DESCRIPTION MENSTON
  ID 43, DESCRIPTION Ilkley

TRAIN_MOVEMENTS_ENRICHED (Stream):
  TS 10:00:01, LOC_ID 42, LOC_DESC Menston
  TS 10:00:02, LOC_ID 43, LOC_DESC Ilkley
Decode values, and generate composite key
+---------------+----------------+---------+---------------------+----------------+------------------------+
| CIF_train_uid | sched_start_dt | stp_ind | SCHEDULE_KEY | CIF_power_type | POWER_TYPE |
+---------------+----------------+---------+---------------------+----------------+------------------------+
| Y82535 | 2019-05-20 | P | Y82535/2019-05-20/P | E | Electric |
| Y82537 | 2019-05-20 | P | Y82537/2019-05-20/P | DMU | Other |
| Y82542 | 2019-05-20 | P | Y82542/2019-05-20/P | D | Diesel |
| Y82535 | 2019-05-24 | P | Y82535/2019-05-24/P | EMU | Electric Multiple Unit |
| Y82542 | 2019-05-24 | P | Y82542/2019-05-24/P | HST | High Speed Train |
SELECT JsonScheduleV1->CIF_train_uid + '/'
+ JsonScheduleV1->schedule_start_date + '/'
+ JsonScheduleV1->CIF_stp_indicator AS SCHEDULE_KEY,
CASE
WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'D'
THEN 'Diesel'
WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'E'
THEN 'Electric'
WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'ED'
THEN 'Electro-Diesel'
END AS POWER_TYPE
Schemas
CREATE STREAM SCHEDULE_RAW (
TiplocV1 STRUCT<transaction_type VARCHAR,
tiploc_code VARCHAR,
NALCO VARCHAR,
STANOX VARCHAR,
crs_code VARCHAR,
description VARCHAR,
tps_description VARCHAR>)
WITH (KAFKA_TOPIC='CIF_FULL_DAILY',
VALUE_FORMAT='JSON');
Handling JSON data
CREATE STREAM NETWORKRAIL_TRAIN_MVT_X (
header STRUCT< msg_type VARCHAR, […] >,
body VARCHAR)
WITH (KAFKA_TOPIC='networkrail_TRAIN_MVT_X',
VALUE_FORMAT='JSON');
CREATE STREAM TRAIN_CANCELLATIONS_00 AS
SELECT HEADER,
EXTRACTJSONFIELD(body,'$.canx_timestamp') AS canx_timestamp,
EXTRACTJSONFIELD(body,'$.canx_reason_code') AS canx_reason_code,
[…]
FROM networkrail_TRAIN_MVT_X
WHERE header->msg_type = '0002';
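EXTRACTJSONFIELD reaches into body, which the stream otherwise treats as an opaque VARCHAR. Outside ksqlDB, the equivalent with jq (sample body invented) looks like this:

```shell
# "body" is a string column holding serialised JSON (sample values invented)
body='{"canx_timestamp":"1561964400000","canx_reason_code":"DC"}'

# Equivalent of EXTRACTJSONFIELD(body,'$.canx_reason_code')
echo "$body" | jq -r '.canx_reason_code'
# → DC
```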
The Confluent Schema Registry
(Source → Avro message → Kafka → Avro message → Target; ksqlDB and Kafka Connect store and retrieve the Avro schema via the Schema Registry)
Reserialise data
CREATE STREAM TRAIN_CANCELLATIONS_00
WITH (VALUE_FORMAT='AVRO') AS
SELECT *
FROM networkrail_TRAIN_MVT_X
WHERE header->msg_type = '0002';
CREATE STREAM TIPLOC_FLAT_KEYED AS
SELECT TiplocV1->TRANSACTION_TYPE AS TRANSACTION_TYPE ,
TiplocV1->TIPLOC_CODE AS TIPLOC_CODE ,
TiplocV1->NALCO AS NALCO ,
TiplocV1->STANOX AS STANOX ,
TiplocV1->CRS_CODE AS CRS_CODE ,
TiplocV1->DESCRIPTION AS DESCRIPTION ,
TiplocV1->TPS_DESCRIPTION AS TPS_DESCRIPTION
FROM SCHEDULE_RAW
PARTITION BY TIPLOC_CODE;
Flatten and re-key a stream
{
"TiplocV1": {
"transaction_type": "Create",
"tiploc_code": "ABWD",
"nalco": "513100",
"stanox": "88601",
"crs_code": "ABW",
"description": "ABBEY WOOD",
"tps_description": "ABBEY WOOD"
}
}
ksql> DESCRIBE TIPLOC_FLAT_KEYED;
Name : TIPLOC_FLAT_KEYED
Field | Type
----------------------------------------------
TRANSACTION_TYPE | VARCHAR(STRING)
TIPLOC_CODE | VARCHAR(STRING)
NALCO | VARCHAR(STRING)
STANOX | VARCHAR(STRING)
CRS_CODE | VARCHAR(STRING)
DESCRIPTION | VARCHAR(STRING)
TPS_DESCRIPTION | VARCHAR(STRING)
----------------------------------------------
Access nested element; change the partitioning key
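The same flatten-and-rekey can be sketched with jq against the sample TiplocV1 record shown above; this one-liner (an illustration, not part of the pipeline) accesses the nested element and emits a key:value pair in the format kafkacat -K: expects:

```shell
# Sample nested record, as it appears in the CIF_FULL_DAILY topic
record='{"TiplocV1":{"transaction_type":"Create","tiploc_code":"ABWD","description":"ABBEY WOOD"}}'

# Lift nested fields to the top level, keyed on the TIPLOC code
echo "$record" | jq -r '.TiplocV1 | "\(.tiploc_code):\(.description)"'
# → ABWD:ABBEY WOOD
```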
Using the data
ksqlDB
Streaming Integration with Kafka Connect
(Kafka brokers ⇄ Kafka Connect ⇄ external systems: syslog, Amazon, Google, ...)
Kafka -> Elasticsearch
{
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00",
"connection.url": "http://elasticsearch:9200",
"type.name": "kafkaconnect",
"key.ignore": "false",
"schema.ignore": "true",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
}
Kibana for real-time visualisation of rail data
Kibana for real-time analysis of rail data
Kafka -> Postgres
{
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"connection.url": "jdbc:postgresql://postgres:5432/",
"connection.user": "postgres",
"connection.password": "postgres",
"auto.create": true,
"auto.evolve": true,
"insert.mode": "upsert",
"pk.mode": "record_key",
"pk.fields": "MESSAGE_KEY",
"topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00",
"transforms": "dropArrays",
"transforms.dropArrays.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.dropArrays.blacklist": "SCHEDULE_SEGMENT_LOCATION, HEADER"
}
But do you even need a database?
SELECT TIMESTAMPTOSTRING(WINDOWSTART(),'yyyy-MM-dd') AS WINDOW_START_TS, VARIATION_STATUS,
SUM(CASE WHEN TOC = 'Arriva Trains Northern' THEN 1 ELSE 0 END) AS Arriva_CT,
SUM(CASE WHEN TOC = 'East Midlands Trains' THEN 1 ELSE 0 END) AS EastMidlands_CT,
SUM(CASE WHEN TOC = 'London North Eastern Railway' THEN 1 ELSE 0 END) AS LNER_CT,
SUM(CASE WHEN TOC = 'TransPennine Express' THEN 1 ELSE 0 END) AS TransPennineExpress_CT
FROM TRAIN_MOVEMENTS WINDOW TUMBLING (SIZE 1 DAY)
GROUP BY VARIATION_STATUS;
+------------+-------------+--------+----------+------+-------+
| Date | Variation | Arriva | East Mid | LNER | TPE |
+------------+-------------+--------+----------+------+-------+
| 2019-07-02 | OFF ROUTE | 46 | 78 | 20 | 167 |
| 2019-07-02 | ON TIME | 19083 | 3568 | 1509 | 2916 |
| 2019-07-02 | LATE | 30850 | 7953 | 5420 | 9042 |
| 2019-07-02 | EARLY | 11478 | 3518 | 1349 | 2096 |
| 2019-07-03 | OFF ROUTE | 79 | 25 | 41 | 213 |
| 2019-07-03 | ON TIME | 19512 | 4247 | 1915 | 2936 |
| 2019-07-03 | LATE | 37357 | 8258 | 5342 | 11016 |
| 2019-07-03 | EARLY | 11825 | 4574 | 1888 | 2094 |
(TRAIN_MOVEMENTS stream → aggregate → TOC_STATS)
Kafka -> S3
{
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"tasks.max": "1",
"topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00",
"s3.region": "us-west-2",
"s3.bucket.name": "rmoff-rail-streaming-demo",
"flush.size": "65536",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
"schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
"schema.compatibility": "NONE",
"partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"path.format":"'year'=YYYY/'month'=MM/'day'=dd",
"timestamp.extractor":"Record",
"partition.duration.ms": 1800000,
"locale":"en",
"timezone": "UTC"
}
Kafka -> S3
$ aws s3 ls s3://rmoff-rail-streaming-demo/topics/TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00/year=2019/month=07/day=07/
2019-07-07 10:05:41 200995548 TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00+0+0004963416.avro
2019-07-07 19:47:11 87631494 TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00+0+0005007151.avro
Analysis with AWS Athena
Graph analysis
{
"connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
"topics": "TRAIN_CANCELLATIONS_02",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "sink-neo4j-train-00_dlq",
"errors.deadletterqueue.topic.replication.factor": 1,
"errors.deadletterqueue.context.headers.enable":true,
"neo4j.server.uri": "bolt://neo4j:7687",
"neo4j.authentication.basic.username": "neo4j",
"neo4j.authentication.basic.password": "connect",
"neo4j.topic.cypher.TRAIN_CANCELLATIONS_02": "MERGE (train:train{id: event.TRAIN_ID}) MERGE
(toc:toc{toc: event.TOC}) MERGE (canx_reason:canx_reason{reason: event.CANX_REASON}) MERGE
(canx_loc:canx_loc{location: coalesce(event.CANCELLATION_LOCATION,'<unknown>')}) MERGE (train)-
[:OPERATED_BY]->(toc) MERGE (canx_loc)<-[:CANCELLED_AT{reason:event.CANX_REASON,
time:event.CANX_TIMESTAMP}]-(train)-[:CANCELLED_BECAUSE]->(canx_reason)"
}
Graph relationships
Tell me when trains are delayed at my station
Event-driven Alerting - logic
CREATE STREAM TRAINS_DELAYED_ALERT AS
SELECT […]
FROM TRAIN_MOVEMENTS T
INNER JOIN ALERT_CONFIG A
ON T.LOC_NLCDESC = A.STATION
WHERE TIMETABLE_VARIATION
> A.ALERT_OVER_MINS;
(TRAIN_MOVEMENTS stream + ALERT_CONFIG key/value state → TRAINS_DELAYED_ALERT stream)
Sending messages to Telegram
curl -X POST -H 'Content-Type: application/json' \
  -d '{"chat_id": "-364377679", "text": "This is a test from curl"}' \
  https://api.telegram.org/botxxxxx/sendMessage
{
"connector.class": "io.confluent.connect.http.HttpSinkConnector",
"request.method": "post",
"http.api.url": "https://api.telegram.org/[…]/sendMessage",
"headers": "Content-Type: application/json",
"topics": "TRAINS_DELAYED_ALERT_TG",
"tasks.max": "1",
"batch.prefix": "{\"chat_id\":\"-364377679\",\"parse_mode\": \"markdown\",",
"batch.suffix": "}",
"batch.max.size": "1",
"regex.patterns":".*{MESSAGE=(.*)}.*",
"regex.replacements": "\"text\":\"$1\"",
"regex.separator": "~",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
}
(Kafka → Kafka Connect HTTP Sink → Telegram)
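The batch.prefix/batch.suffix and regex.* settings above assemble the JSON body that Telegram's sendMessage API expects. What the regex replacement does to a record's string form can be approximated with sed; the Struct{MESSAGE=...} rendering and the message text here are illustrative assumptions, not output from the real pipeline:

```shell
# A sink record as the connector's regex sees it (illustrative)
record='Struct{MESSAGE=Train 1A99 is 11 minutes late at LEEDS}'

# Mirror regex.patterns / regex.replacements from the connector config
echo "$record" | sed -E 's/.*\{MESSAGE=(.*)\}.*/"text":"\1"/'
# → "text":"Train 1A99 is 11 minutes late at LEEDS"
```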
Running it & monitoring & maintenance
System health & Throughput
Configuring and monitoring Kafka Connect
Data Flow
Consumer lag
Restart failing connectors

rmoff@proxmox01 > crontab -l
*/5 * * * * /home/rmoff/restart_failed_connector_tasks.sh

rmoff@proxmox01 > cat /home/rmoff/restart_failed_connector_tasks.sh
#!/usr/bin/env bash
# @rmoff / June 6, 2019
# Restart any connector tasks that are FAILED
curl -s "http://localhost:8083/connectors" | \
  jq '.[]' | \
  xargs -I{connector_name} curl -s "http://localhost:8083/connectors/"{connector_name}"/status" | \
  jq -c -M '[select(.tasks[].state=="FAILED") | .name,"§±§",.tasks[].id]' | \
  grep -v "\[\]" | \
  sed -e 's/^\["//g' | sed -e 's/","§±§",/\/tasks\//g' | sed -e 's/]$//g' | \
  xargs -I{connector_and_task} curl -X POST "http://localhost:8083/connectors/"{connector_and_task}"/restart"
Is everything running? http://rmoff.dev/kafka-trains-code-01
check_latest_timestamp.sh: start at the latest message, read just one, and convert its epoch timestamp into a human-readable format:

kafkacat -b localhost:9092 -t networkrail_TRAIN_MVT_X -o-1 -c1 -C | \
  jq '.header.msg_queue_timestamp' | \
  sed -e 's/"//g' | \
  sed -e 's/000$//g' | \
  xargs -Ifoo date --date=@foo

$ ./check_latest_timestamp.sh
Mon 25 Jan 2021 13:39:40 GMT
✅ Stream raw data ✅ Stream enriched data ✅ Historic data & data replay
Kafka + Kafka Connect + ksqlDB
Fully Managed Kafka as a Service
Free money! Promo code RMOFF200: $200 USD off your bill each calendar month for the first three months when you sign up (an additional $200 towards your bill 😄)
https://rmoff.dev/ccloud
* T&C: https://www.confluent.io/confluent-cloud-promo-disclaimer
developer.confluent.io
Learn Kafka. Start building with Apache Kafka at Confluent Developer.
#EOF

 
Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on Flink
 
Streaming sql w kafka and flink
Streaming sql w  kafka and flinkStreaming sql w  kafka and flink
Streaming sql w kafka and flink
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
EC-WEB 2014 "Automotive ranges as e-commerce data"
EC-WEB 2014 "Automotive ranges as e-commerce data"EC-WEB 2014 "Automotive ranges as e-commerce data"
EC-WEB 2014 "Automotive ranges as e-commerce data"
 
TPS Rail Brochure - Final V3 (Spreads in low quality no bleeds)
TPS Rail Brochure - Final V3 (Spreads in low quality no bleeds)TPS Rail Brochure - Final V3 (Spreads in low quality no bleeds)
TPS Rail Brochure - Final V3 (Spreads in low quality no bleeds)
 
Orchestration with Kubernetes
Orchestration with KubernetesOrchestration with Kubernetes
Orchestration with Kubernetes
 
Sparkling Water Meetup
Sparkling Water MeetupSparkling Water Meetup
Sparkling Water Meetup
 
Deep dive into N1QL: SQL for JSON: Internals and power features.
Deep dive into N1QL: SQL for JSON: Internals and power features.Deep dive into N1QL: SQL for JSON: Internals and power features.
Deep dive into N1QL: SQL for JSON: Internals and power features.
 
Interactive Session on Sparkling Water
Interactive Session on Sparkling WaterInteractive Session on Sparkling Water
Interactive Session on Sparkling Water
 
Interoperable Web Services with JAX-WS
Interoperable Web Services with JAX-WSInteroperable Web Services with JAX-WS
Interoperable Web Services with JAX-WS
 
GE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTGE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoT
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
Odp
OdpOdp
Odp
 
Advanced kapacitor
Advanced kapacitorAdvanced kapacitor
Advanced kapacitor
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data | Robin Moffat, Confluent

  • 1. Building a Streaming ETL solution with Rail Data @rmoff @confluentinc On Track with Apache Kafka®: 🚂
  • 2.
  • 4.
  • 10. @rmoff | @confluentinc Real-time visualisation of rail data
  • 11. @rmoff | @confluentinc Real-time visualisation of rail data DEMO
  • 16. @rmoff | @confluentinc Data Sources: Train Movements (continuous feed), Schedules (updated daily), Reference data (static)
  • 17. @rmoff | @confluentinc Streaming Integration with Kafka Connect Kafka Brokers Kafka Connect syslog Amazon Google
  • 18. @rmoff | @confluentinc • Kafka CLI producer & consumer • Also metadata inspection • Integrates beautifully with unix pipes! • Get it! • https://github.com/edenhill/kafkacat/ • apt-get install kafkacat • brew install kafkacat • https://hub.docker.com/r/confluentinc/cp-kafkacat/ kafkacat curl -s "https://my.api.endpoint" | kafkacat -b localhost:9092 -t topic -P https://rmoff.net/2018/05/10/quick-n-easy-population-of-realistic-test-data-into-kafka/
  • 19. @rmoff | @confluentinc Movement data A train's movements Arrived / departed Where What time Variation from timetable A train was cancelled Where When Why A train was 'activated' Links a train to a schedule https://wiki.openraildata.com/index.php?title=Train_Movements
  • 20. @rmoff | @confluentinc Ingest ActiveMQ with Kafka Connect "connector.class" : "io.confluent.connect.activemq.ActiveMQSourceConnector", "activemq.url" : "tcp://datafeeds.networkrail.co.uk:61619", "jms.destination.type": "topic", "jms.destination.name": "TRAIN_MVT_EA_TOC", "kafka.topic" : "networkrail_TRAIN_MVT", Northern Trains GNER* Movement events TransPennine * Remember them? networkrail_TRAIN_MVT Kafka Connect Kafka
  • 22. @rmoff | @confluentinc Message transformation kafkacat -b localhost:9092 -G tm_explode networkrail_TRAIN_MVT -u | jq -c '.text|fromjson[]' | kafkacat -b localhost:9092 -t networkrail_TRAIN_MVT_X -T -P Extract element Convert JSON Explode Array
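The `jq` explode step on slide 22 can be sketched in Python. The envelope shape below is illustrative (the ActiveMQ source connector wraps the Network Rail payload in a record whose `text` field is itself a JSON-encoded array of messages); the function mirrors `jq -c '.text|fromjson[]'` by emitting one compact JSON line per array element:

```python
import json

def explode_mvt_envelope(envelope):
    """Mimic `jq -c '.text|fromjson[]'`: parse the outer envelope, parse the
    JSON-encoded `text` field, and emit each element as its own compact
    JSON string, ready to produce to a new topic."""
    payload = json.loads(json.loads(envelope)["text"])
    return [json.dumps(msg, separators=(",", ":")) for msg in payload]

# Hypothetical two-message envelope for illustration
envelope = json.dumps({"text": json.dumps([
    {"header": {"msg_type": "0003"}, "body": {"loc_id": "42"}},
    {"header": {"msg_type": "0002"}, "body": {"canx_reason_code": "FA"}},
])})
for line in explode_mvt_envelope(envelope):
    print(line)
```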
  • 23. @rmoff | @confluentinc Schedule data https://wiki.openraildata.com/index.php?title=SCHEDULE Train routes Type of train (diesel/electric) Classes, sleeping accommodation, reservations, etc.
  • 24. @rmoff | @confluentinc Load JSON from S3 into Kafka curl -s -L -u "$USER:$PW" "$FEED_URL" | gunzip | kafkacat -b localhost -P -t CIF_FULL_DAILY
  • 25. @rmoff | @confluentinc Reference data https://wiki.openraildata.com/index.php?title=SCHEDULE Train company names Location codes Train type codes Location geocodes
  • 28. @rmoff | @confluentinc AA:{"canx_reason_code":"AA","canx_reason":"Waiting acceptance into off Network Terminal or Yard"} AC:{"canx_reason_code":"AC","canx_reason":"Waiting train preparation or completion of TOPS list/RT3973"} AE:{"canx_reason_code":"AE","canx_reason":"Congestion in off Network Terminal or Yard"} AG:{"canx_reason_code":"AG","canx_reason":"Wagon load incident including adjusting loads or open door"} AJ:{"canx_reason_code":"AJ","canx_reason":"Waiting Customer’s traffic including documentation"} DA:{"canx_reason_code":"DA","canx_reason":"Non Technical Fleet Holding Code"} DB:{"canx_reason_code":"DB","canx_reason":"Train Operations Holding Code"} DC:{"canx_reason_code":"DC","canx_reason":"Train Crew Causes Holding Code"} DD:{"canx_reason_code":"DD","canx_reason":"Technical Fleet Holding Code"} DE:{"canx_reason_code":"DE","canx_reason":"Station Causes Holding Code"} DF:{"canx_reason_code":"DF","canx_reason":"External Causes Holding Code"} DG:{"canx_reason_code":"DG","canx_reason":"Freight Terminal and or Yard Holding Code"} DH:{"canx_reason_code":"DH","canx_reason":"Adhesion and or Autumn Holding Code"} FA:{"canx_reason_code":"FA","canx_reason":"Dangerous goods incident"} FC:{"canx_reason_code":"FC","canx_reason":"Freight train driver"} kafkacat -l canx_reason_code.dat -b localhost:9092 -t canx_reason_code -P -K:
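The `key:value` lines that `kafkacat -K:` loads on slide 28 can be generated from a plain mapping; this is a sketch of one way the `canx_reason_code.dat` file above could be built (the helper name is hypothetical):

```python
import json

def to_keyed_lines(codes):
    """Render a {code: description} mapping as the `key:value` lines that
    `kafkacat -K:` expects, one cancellation reason code per line."""
    return "\n".join(
        f"{code}:{json.dumps({'canx_reason_code': code, 'canx_reason': desc})}"
        for code, desc in codes.items())

print(to_keyed_lines({"AA": "Waiting acceptance into off Network Terminal or Yard"}))
```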
  • 36. @rmoff | @confluentinc Stream Processing with ksqlDB Stream: widgets Stream: widgets_red
  • 37. @rmoff | @confluentinc Stream Processing with ksqlDB Stream: widgets Stream: widgets_red CREATE STREAM widgets_red AS SELECT * FROM widgets WHERE colour='RED';
  • 39. @rmoff | @confluentinc Event data NETWORKRAIL_TRAIN_MVT_X ActiveMQ Kafka Connect
  • 40. @rmoff | @confluentinc Message routing with ksqlDB NETWORKRAIL_TRAIN_MVT_X TRAIN_ACTIVATIONS TRAIN_CANCELLATIONS TRAIN_MOVEMENTS MSG_TYPE='0001' MSG_TYPE='0003' MSG_TYPE='0002' Time Time Time Time
  • 41. @rmoff | @confluentinc Split a stream of data (message routing) CREATE STREAM TRAIN_CANCELLATIONS_00 AS SELECT * FROM NETWORKRAIL_TRAIN_MVT_X WHERE header->msg_type = '0002'; CREATE STREAM TRAIN_MOVEMENTS_00 AS SELECT * FROM NETWORKRAIL_TRAIN_MVT_X WHERE header->msg_type = '0003'; Source topic NETWORKRAIL_TRAIN_MVT_X MSG_TYPE='0003' Movement events MSG_TYPE=‘0002' Cancellation events
  • 42. @rmoff | @confluentinc Joining events to lookup data (stream-table joins) CREATE STREAM TRAIN_MOVEMENTS_ENRICHED AS SELECT TM.LOC_ID AS LOC_ID, TM.TS AS TS, L.DESCRIPTION AS LOC_DESC, […] FROM TRAIN_MOVEMENTS TM LEFT JOIN LOCATION L ON TM.LOC_ID = L.ID; TRAIN_MOVEMENTS (Stream) LOCATION (Table) TRAIN_MOVEMENTS_ENRICHED (Stream) TS 10:00:01 LOC_ID 42 TS 10:00:02 LOC_ID 43 TS 10:00:01 LOC_ID 42 LOC_DESC Menston TS 10:00:02 LOC_ID 43 LOC_DESC Ilkley ID DESCRIPTION 42 MENSTON 43 Ilkley
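The stream-table join on slide 42 behaves like a per-event dictionary lookup; a minimal sketch, with the LOCATION table held as a dict:

```python
LOCATIONS = {42: "Menston", 43: "Ilkley"}  # the LOCATION lookup table as a dict

def enrich(movements):
    """LEFT JOIN semantics: every movement event passes through, picking up
    the location description when the key matches (None otherwise)."""
    for event in movements:
        yield {**event, "LOC_DESC": LOCATIONS.get(event["LOC_ID"])}

rows = list(enrich([{"TS": "10:00:01", "LOC_ID": 42},
                    {"TS": "10:00:02", "LOC_ID": 43}]))
```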
  • 43. @rmoff | @confluentinc Decode values, and generate composite key +---------------+----------------+---------+---------------------+----------------+------------------------+ | CIF_train_uid | sched_start_dt | stp_ind | SCHEDULE_KEY | CIF_power_type | POWER_TYPE | +---------------+----------------+---------+---------------------+----------------+------------------------+ | Y82535 | 2019-05-20 | P | Y82535/2019-05-20/P | E | Electric | | Y82537 | 2019-05-20 | P | Y82537/2019-05-20/P | DMU | Other | | Y82542 | 2019-05-20 | P | Y82542/2019-05-20/P | D | Diesel | | Y82535 | 2019-05-24 | P | Y82535/2019-05-24/P | EMU | Electric Multiple Unit | | Y82542 | 2019-05-24 | P | Y82542/2019-05-24/P | HST | High Speed Train | SELECT JsonScheduleV1->CIF_train_uid + '/' + JsonScheduleV1->schedule_start_date + '/' + JsonScheduleV1->CIF_stp_indicator AS SCHEDULE_KEY, CASE WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'D' THEN 'Diesel' WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'E' THEN 'Electric' WHEN JsonScheduleV1->schedule_segment->CIF_power_type = 'ED' THEN 'Electro-Diesel' END AS POWER_TYPE
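The composite-key and power-type decoding on slide 43 can be sketched outside ksqlDB too; the mapping below is filled in from the sample table on the slide (codes not in the table fall back to 'Other'):

```python
# Decodes taken from the slide's sample output; unknown codes map to "Other"
POWER_TYPES = {"D": "Diesel", "E": "Electric", "ED": "Electro-Diesel",
               "EMU": "Electric Multiple Unit", "HST": "High Speed Train"}

def schedule_key(sched):
    """Build the uid/start-date/STP-indicator composite key and decode the
    CIF power-type code, mirroring the ksqlDB expression above."""
    key = "/".join([sched["CIF_train_uid"],
                    sched["schedule_start_date"],
                    sched["CIF_stp_indicator"]])
    power = POWER_TYPES.get(sched["CIF_power_type"], "Other")
    return key, power
```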
  • 44. @rmoff | @confluentinc Schemas CREATE STREAM SCHEDULE_RAW ( TiplocV1 STRUCT<transaction_type VARCHAR, tiploc_code VARCHAR, NALCO VARCHAR, STANOX VARCHAR, crs_code VARCHAR, description VARCHAR, tps_description VARCHAR>) WITH (KAFKA_TOPIC='CIF_FULL_DAILY', VALUE_FORMAT='JSON');
  • 45. @rmoff | @confluentinc Handling JSON data CREATE STREAM NETWORKRAIL_TRAIN_MVT_X ( header STRUCT< msg_type VARCHAR, […] >, body VARCHAR) WITH (KAFKA_TOPIC='networkrail_TRAIN_MVT_X', VALUE_FORMAT='JSON'); CREATE STREAM TRAIN_CANCELLATIONS_00 AS SELECT HEADER, EXTRACTJSONFIELD(body,'$.canx_timestamp') AS canx_timestamp, EXTRACTJSONFIELD(body,'$.canx_reason_code') AS canx_reason_code, […] FROM networkrail_TRAIN_MVT_X WHERE header->msg_type = '0002';
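A rough Python analogue of the `EXTRACTJSONFIELD` pattern on slide 45: filter on `header->msg_type`, then pull named fields out of the JSON-encoded `body` string (record shape assumed from the stream definition above):

```python
import json

def extract_cancellation_fields(record):
    """Keep only cancellation messages (msg_type 0002) and extract fields
    from the stringly-typed `body`, like EXTRACTJSONFIELD in ksqlDB."""
    if record["header"]["msg_type"] != "0002":
        return None
    body = json.loads(record["body"])
    return {"canx_timestamp": body.get("canx_timestamp"),
            "canx_reason_code": body.get("canx_reason_code")}
```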
  • 46. @rmoff | @confluentinc The Confluent Schema Registry Source Avro Message Target Schema Registry Avro Schema ksqlDB Kafka Connect Avro Message
  • 47. @rmoff | @confluentinc Reserialise data CREATE STREAM TRAIN_CANCELLATIONS_00 WITH (VALUE_FORMAT='AVRO') AS SELECT * FROM networkrail_TRAIN_MVT_X WHERE header->msg_type = '0002';
  • 48. @rmoff | @confluentinc CREATE STREAM TIPLOC_FLAT_KEYED AS SELECT TiplocV1->TRANSACTION_TYPE AS TRANSACTION_TYPE , TiplocV1->TIPLOC_CODE AS TIPLOC_CODE , TiplocV1->NALCO AS NALCO , TiplocV1->STANOX AS STANOX , TiplocV1->CRS_CODE AS CRS_CODE , TiplocV1->DESCRIPTION AS DESCRIPTION , TiplocV1->TPS_DESCRIPTION AS TPS_DESCRIPTION FROM SCHEDULE_RAW PARTITION BY TIPLOC_CODE; Flatten and re-key a stream { "TiplocV1": { "transaction_type": "Create", "tiploc_code": "ABWD", "nalco": "513100", "stanox": "88601", "crs_code": "ABW", "description": "ABBEY WOOD", "tps_description": "ABBEY WOOD" } } ksql> DESCRIBE TIPLOC_FLAT_KEYED; Name : TIPLOC_FLAT_KEYED Field | Type ---------------------------------------------- TRANSACTION_TYPE | VARCHAR(STRING) TIPLOC_CODE | VARCHAR(STRING) NALCO | VARCHAR(STRING) STANOX | VARCHAR(STRING) CRS_CODE | VARCHAR(STRING) DESCRIPTION | VARCHAR(STRING) TPS_DESCRIPTION | VARCHAR(STRING) ---------------------------------------------- Access nested element Change partitioning key
  • 53. @rmoff | @confluentinc Streaming Integration with Kafka Connect Kafka Brokers Kafka Connect syslog Amazon Google
  • 54. @rmoff | @confluentinc Kafka -> Elasticsearch { "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00", "connection.url": "http://elasticsearch:9200", "type.name": "kafkaconnect", "key.ignore": "false", "schema.ignore": "true", "key.converter": "org.apache.kafka.connect.storage.StringConverter" }
  • 55. @rmoff | @confluentinc Kibana for real-time visualisation of rail data
  • 56. @rmoff | @confluentinc Kibana for real-time analysis of rail data
  • 57. @rmoff | @confluentinc Kafka -> Postgres { "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "connection.url": "jdbc:postgresql://postgres:5432/", "connection.user": "postgres", "connection.password": "postgres", "auto.create": true, "auto.evolve": true, "insert.mode": "upsert", "pk.mode": "record_key", "pk.fields": "MESSAGE_KEY", "topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00", "transforms": "dropArrays", "transforms.dropArrays.type": "org.apache.kafka.connect.transforms.ReplaceField$Value", "transforms.dropArrays.blacklist": "SCHEDULE_SEGMENT_LOCATION, HEADER" }
  • 59. @rmoff | @confluentinc But do you even need a database? SELECT TIMESTAMPTOSTRING(WINDOWSTART(),'yyyy-MM-dd') AS WINDOW_START_TS, VARIATION_STATUS, SUM(CASE WHEN TOC = 'Arriva Trains Northern' THEN 1 ELSE 0 END) AS Arriva_CT, SUM(CASE WHEN TOC = 'East Midlands Trains' THEN 1 ELSE 0 END) AS EastMidlands_CT, SUM(CASE WHEN TOC = 'London North Eastern Railway' THEN 1 ELSE 0 END) AS LNER_CT, SUM(CASE WHEN TOC = 'TransPennine Express' THEN 1 ELSE 0 END) AS TransPennineExpress_CT FROM TRAIN_MOVEMENTS WINDOW TUMBLING (SIZE 1 DAY) GROUP BY VARIATION_STATUS; +------------+-------------+--------+----------+------+-------+ | Date | Variation | Arriva | East Mid | LNER | TPE | +------------+-------------+--------+----------+------+-------+ | 2019-07-02 | OFF ROUTE | 46 | 78 | 20 | 167 | | 2019-07-02 | ON TIME | 19083 | 3568 | 1509 | 2916 | | 2019-07-02 | LATE | 30850 | 7953 | 5420 | 9042 | | 2019-07-02 | EARLY | 11478 | 3518 | 1349 | 2096 | | 2019-07-03 | OFF ROUTE | 79 | 25 | 41 | 213 | | 2019-07-03 | ON TIME | 19512 | 4247 | 1915 | 2936 | | 2019-07-03 | LATE | 37357 | 8258 | 5342 | 11016 | | 2019-07-03 | EARLY | 11825 | 4574 | 1888 | 2094 | TRAIN_MOVEMENTS Time TOC_STATS Time Aggregate Aggregate
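The `SUM(CASE WHEN …)` pivot in the tumbling-window query above is just a count per (date, variation status, operator) bucket. A plain-Python sketch of the same aggregation, with made-up sample events:

```python
from collections import Counter

# Hypothetical movement events: in the real pipeline these come from the
# TRAIN_MOVEMENTS stream, and the window start supplies the date.
events = [
    {"date": "2019-07-02", "variation_status": "LATE",    "toc": "TransPennine Express"},
    {"date": "2019-07-02", "variation_status": "LATE",    "toc": "TransPennine Express"},
    {"date": "2019-07-02", "variation_status": "ON TIME", "toc": "East Midlands Trains"},
]

# One counter cell per (window date, variation status, operator).
stats = Counter()
for e in events:
    stats[(e["date"], e["variation_status"], e["toc"])] += 1
```

The difference is that ksqlDB maintains this state continuously as events arrive, rather than recomputing over a batch — which is why the slide asks whether you even need a database for this.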
  • 60. @rmoff | @confluentinc Kafka -> S3 { "connector.class": "io.confluent.connect.s3.S3SinkConnector", "key.converter":"org.apache.kafka.connect.storage.StringConverter", "tasks.max": "1", "topics": "TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00", "s3.region": "us-west-2", "s3.bucket.name": "rmoff-rail-streaming-demo", "flush.size": "65536", "storage.class": "io.confluent.connect.s3.storage.S3Storage", "format.class": "io.confluent.connect.s3.format.avro.AvroFormat", "schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator", "schema.compatibility": "NONE", "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner", "path.format":"'year'=YYYY/'month'=MM/'day'=dd", "timestamp.extractor":"Record", "partition.duration.ms": 1800000, "locale":"en", "timezone": "UTC" }
  • 61. @rmoff | @confluentinc Kafka -> S3 $ aws s3 ls s3://rmoff-rail-streaming-demo/topics/TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00/year=2019/month=07/day=07/ 2019-07-07 10:05:41 200995548 TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00+0+0004963416.avro 2019-07-07 19:47:11 87631494 TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00+0+0005007151.avro
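The directory layout in that `aws s3 ls` output comes from the `TimeBasedPartitioner` config two slides back: `timestamp.extractor=Record` takes each record's own timestamp, and `path.format='year'=YYYY/'month'=MM/'day'=dd` turns it into a prefix. A sketch of that mapping (the function name is mine, not the connector's):

```python
from datetime import datetime, timezone

def s3_partition_path(topic: str, record_ts_ms: int) -> str:
    # Approximation of what path.format 'year'=YYYY/'month'=MM/'day'=dd
    # produces for a record timestamp (milliseconds since epoch, UTC).
    ts = datetime.fromtimestamp(record_ts_ms / 1000, tz=timezone.utc)
    return f"topics/{topic}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"

path = s3_partition_path("TRAIN_MOVEMENTS_ACTIVATIONS_SCHEDULE_00",
                         1562493941000)  # 2019-07-07T10:05:41Z
```

Partitioning by record time rather than ingest time means replayed or late-arriving data still lands in the day it describes.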
  • 63. @rmoff | @confluentinc Graph analysis { "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector", "topics": "TRAIN_CANCELLATIONS_02", "key.converter":"org.apache.kafka.connect.storage.StringConverter", "errors.tolerance": "all", "errors.deadletterqueue.topic.name": "sink-neo4j-train-00_dlq", "errors.deadletterqueue.topic.replication.factor": 1, "errors.deadletterqueue.context.headers.enable":true, "neo4j.server.uri": "bolt://neo4j:7687", "neo4j.authentication.basic.username": "neo4j", "neo4j.authentication.basic.password": "connect", "neo4j.topic.cypher.TRAIN_CANCELLATIONS_02": "MERGE (train:train{id: event.TRAIN_ID}) MERGE (toc:toc{toc: event.TOC}) MERGE (canx_reason:canx_reason{reason: event.CANX_REASON}) MERGE (canx_loc:canx_loc{location: coalesce(event.CANCELLATION_LOCATION,'<unknown>')}) MERGE (train)- [:OPERATED_BY]->(toc) MERGE (canx_loc)<-[:CANCELLED_AT{reason:event.CANX_REASON, time:event.CANX_TIMESTAMP}]-(train)-[:CANCELLED_BECAUSE]->(canx_reason)" }
  • 65. @rmoff | @confluentinc Tell me when trains are delayed at my station
  • 66. @rmoff | @confluentinc Event-driven Alerting - logic CREATE STREAM TRAINS_DELAYED_ALERT AS SELECT […] FROM TRAIN_MOVEMENTS T INNER JOIN ALERT_CONFIG A ON T.LOC_NLCDESC = A.STATION WHERE TIMETABLE_VARIATION > A.ALERT_OVER_MINS; TRAIN_MOVEMENTS Time TRAIN_DELAYED_ALERT Time ALERT_CONFIG Key/Value state
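The stream-table join above enriches each movement event with the alert configuration for its station and keeps only those exceeding the threshold. A minimal sketch of that logic, with a hypothetical config table and events:

```python
# Per-station thresholds, standing in for the ALERT_CONFIG table
# (station -> alert_over_mins). Values are made up.
alert_config = {"LEEDS": 5, "BRADFORD INTERCHANGE": 10}

def check_alert(movement: dict) -> bool:
    # Inner join on station: a movement with no config row never alerts.
    threshold = alert_config.get(movement["loc_nlcdesc"])
    return threshold is not None and movement["timetable_variation"] > threshold

alerts = [m for m in (
    {"loc_nlcdesc": "LEEDS",     "timetable_variation": 12},  # over threshold
    {"loc_nlcdesc": "LEEDS",     "timetable_variation": 3},   # within threshold
    {"loc_nlcdesc": "SHEFFIELD", "timetable_variation": 30},  # no config row
) if check_alert(m)]
```

Because the config side is a table (key/value state), updating a station's threshold takes effect on the very next movement event — no redeploy needed.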
  • 67. @rmoff | @confluentinc Sending messages to Telegram curl -X POST -H 'Content-Type: application/json' -d '{"chat_id": "-364377679", "text": "This is a test from curl"}' https://api.telegram.org/botxxxxx/sendMessage { "connector.class": "io.confluent.connect.http.HttpSinkConnector", "request.method": "post", "http.api.url": "https://api.telegram.org/[…]/sendMessage", "headers": "Content-Type: application/json", "topics": "TRAINS_DELAYED_ALERT_TG", "tasks.max": "1", "batch.prefix": "{\"chat_id\":\"-364377679\",\"parse_mode\": \"markdown\",", "batch.suffix": "}", "batch.max.size": "1", "regex.patterns": ".*\\{MESSAGE=(.*)\\}.*", "regex.replacements": "\"text\":\"$1\"", "regex.separator": "~", "key.converter": "org.apache.kafka.connect.storage.StringConverter" } Kafka Kafka Connect HTTP Sink Telegram
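The HTTP Sink builds the Telegram request body by concatenating `batch.prefix`, a `"text"` field extracted from the record via the regex replacement, and `batch.suffix`. A sketch of what that assembly produces (the function is mine; the chat_id is the example value from the slide):

```python
import json

def build_telegram_payload(message: str, chat_id: str = "-364377679") -> dict:
    # Approximation of prefix + regex-extracted text field + suffix,
    # then parsed back to show the resulting JSON document.
    body = ('{"chat_id":"%s","parse_mode": "markdown",'
            '"text":%s}' % (chat_id, json.dumps(message)))
    return json.loads(body)

payload = build_telegram_payload("Train delayed at LEEDS")
```

Seeing the assembled document makes it clear why every quote in `batch.prefix` and `regex.replacements` has to be escaped in the connector config: they are fragments of a JSON body, not JSON themselves.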
  • 68. @rmoff | @confluentinc Running it & monitoring & maintenance
  • 69. @rmoff | @confluentinc System health & Throughput
  • 70. @rmoff | @confluentinc Configuring and monitoring Kafka Connect
  • 73. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Restart failing connectors rmoff@proxmox01 > crontab -l */5 * * * * /home/rmoff/restart_failed_connector_tasks.sh rmoff@proxmox01 > cat /home/rmoff/restart_failed_connector_tasks.sh #!/usr/bin/env bash # @rmoff / June 6, 2019 # Restart any connector tasks that are FAILED curl -s "http://localhost:8083/connectors" | jq '.[]' | xargs -I{connector_name} curl -s "http://localhost:8083/connectors/"{connector_name}"/status" | jq -c -M '[select(.tasks[].state=="FAILED") | .name,"§±§",.tasks[].id]' | grep -v "\[\]" | sed -e 's/^\["//g' | sed -e 's/","§±§",/\/tasks\//g' | sed -e 's/]$//g' | xargs -I{connector_and_task} curl -X POST "http://localhost:8083/connectors/"{connector_and_task}"/restart"
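The heart of that jq/sed pipeline is: take each connector's `/status` document, find tasks in state `FAILED`, and build the `<name>/tasks/<id>` path to POST a `/restart` to. A sketch of that transformation as a pure function (the HTTP calls are left out so the logic is testable; the sample status documents are made up):

```python
def failed_task_restart_paths(statuses: list[dict]) -> list[str]:
    # Given /connectors/<name>/status documents, return the
    # '<name>/tasks/<id>' paths for every FAILED task — the same list
    # the cron script's jq/sed pipeline builds before POSTing /restart.
    paths = []
    for status in statuses:
        for task in status.get("tasks", []):
            if task.get("state") == "FAILED":
                paths.append(f"{status['name']}/tasks/{task['id']}")
    return paths

statuses = [
    {"name": "sink-elastic-00",
     "tasks": [{"id": 0, "state": "RUNNING"}]},
    {"name": "source-activemq-00",
     "tasks": [{"id": 0, "state": "FAILED"}, {"id": 1, "state": "RUNNING"}]},
]
```

Each resulting path would be POSTed to `http://localhost:8083/connectors/<path>/restart`, exactly as the shell script does.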
  • 74. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Is everything running? http://rmoff.dev/kafka-trains-code-01
  • 75. 🚂 On Track with Apache Kafka: Building a Streaming ETL solution with Rail Data @rmoff @confluentinc Start at latest message check_latest_timestamp.sh kafkacat -b localhost:9092 -t networkrail_TRAIN_MVT_X -o-1 -c1 -C | jq '.header.msg_queue_timestamp' | sed -e 's/"//g' | sed -e 's/000$//g' | xargs -Ifoo date --date=@foo Just read one message Convert from epoch to human-readable timestamp format $ ./check_latest_timestamp.sh Mon 25 Jan 2021 13:39:40 GMT
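The sed/date part of that pipeline strips the JSON quotes from `msg_queue_timestamp`, drops the trailing three digits (milliseconds to seconds), and formats the epoch as a readable time. The same conversion as a Python sketch (function name is mine; the timestamp is the one behind the slide's example output):

```python
from datetime import datetime, timezone

def msg_queue_ts_to_str(raw: str) -> str:
    # '"1611581980000"' -> strip quotes, drop trailing ms digits,
    # then format the epoch seconds as a human-readable UTC time.
    epoch_s = int(raw.strip('"')[:-3])
    return datetime.fromtimestamp(epoch_s, tz=timezone.utc).strftime(
        "%a %d %b %Y %H:%M:%S %Z")

ts = msg_queue_ts_to_str('"1611581980000"')
```

A quick way to confirm the pipeline is keeping up: if the latest message's timestamp is minutes old rather than seconds, something upstream has stalled.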
  • 76. @rmoff | @confluentinc ✅ Stream raw data ✅ Stream enriched data ✅ Historic data & data replay ksqlDB Kafka Connect Kafka Connect Kafka
  • 78. Fully Managed Kafka as a Service R M O F F 2 0 0 Free money! (additional $200 towards your bill 😄) * T&C: https://www.confluent.io/confluent-cloud-promo-disclaimer $200 USD off your bill each calendar month for the first three months when you sign up https://rmoff.dev/ccloud
  • 79. developer.confluent.io Learn Kafka. Start building with Apache Kafka at Confluent Developer.
  • 80. #EOF