"Capturing geospatial data of aircraft is data-intensive: position and velocity change quickly and must be updated frequently. A real-time system well tuned for high throughput and low latency is a must, because data that arrives even a few seconds late is useless to operators, who cannot make meaningful real-time decisions with it. In addition, collaboration between operators must also happen in real time. This system, called Raft's Data Fabric, modernizes this process for the United States Air Force as the communication layer for the systems acting upon real-time geospatial data of all aircraft over United States airspace.
Data Fabric is built with Apache Kafka at its core. It uses Keycloak for authentication, Open Policy Agent for authorization, Kafka Streams for stateful processing, database systems (relational, OLAP, and document) for search, WebSockets for visualization integration, and Grafana for observability.
From this real-world use case, you will walk away knowing how to build end-to-end real-time systems with Apache Kafka at their core. Specifically:
* Tuning Apache Kafka for high throughput and low latency, for both small and large messages.
* Building a WebSocket Kafka consumer for web-based visualization.
* Securely fusing and moving real-time data from unsecured to secured environments to provide better decision-making abilities.
* Integration with database systems for real-time and historical use cases.
* Reasons to consider leveraging OAuth for authentication and Open Policy Agent for authorization for your Kafka cluster."
March 2024 · Kafka Summit London
Message Format XML
● XML ⇒ JSON ✔
● JSON ⇒ XML ✖
downstream consumers expected XML (valid against an XSD schema)
● Additional Challenges
○ marshaling bytes into XML
○ schema validation speed
○ thread safety
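The thread-safety challenge can be sketched with the JDK's own `javax.xml.validation` API: a `Schema` is thread-safe and expensive to build, while a `Validator` is not, so one common pattern is a `ThreadLocal` `Validator` per consumer thread. The one-element schema below is hypothetical, not the project's actual XSD:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class XmlValidation {
    // Hypothetical single-element schema for illustration only.
    private static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"track\" type=\"xs:string\"/>"
        + "</xs:schema>";

    // Schema is thread-safe and expensive to build: create it once.
    private static final Schema SCHEMA = buildSchema();

    // Validator is NOT thread-safe: give each consumer thread its own.
    private static final ThreadLocal<Validator> VALIDATOR =
        ThreadLocal.withInitial(SCHEMA::newValidator);

    private static Schema buildSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("bad schema", e);
        }
    }

    public static boolean isValid(String xml) {
        try {
            VALIDATOR.get().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // SAXException on invalid documents
            return false;
        }
    }
}
```

Reusing one cached `Validator` per thread also addresses the validation-speed point: building a `Schema` or `Validator` per message is far more expensive than validating with one already in hand.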
Message Format XML
● Staying with XML is easier than converting back to XML.
● Optimize Read-Only for speed by compromising on Storage.
● Lessons learned here would apply to other formats; they are just more obvious with XML.
Configurations - Topics
Challenge 2 – Throughput and Latency
● 12 partitions
○ even distribution across availability zones (÷3)
○ even consumer workloads
■ 1/2/3/4/6/12 (6)
● Currently evaluating 24 for some topics
○ also evenly distributed across availability zones (÷3)
○ 1/2/3/4/6/8/12/24 (8)
■ 30->1/2/3/5/6/10/15/30 (8)
■ 36->1/2/3/4/6/9/12/18/36 (9)
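The divisor lists above are simply the consumer-group sizes that divide the partition count evenly, so every consumer owns the same number of partitions. A small helper (illustrative, not from the talk) makes the trade-off easy to explore:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionDivisors {
    // Consumer counts that divide the partition count evenly —
    // each consumer then owns exactly partitions / count partitions.
    static List<Integer> evenConsumerCounts(int partitions) {
        List<Integer> counts = new ArrayList<>();
        for (int c = 1; c <= partitions; c++) {
            if (partitions % c == 0) counts.add(c);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(evenConsumerCounts(12)); // [1, 2, 3, 4, 6, 12]
        System.out.println(evenConsumerCounts(24)); // [1, 2, 3, 4, 6, 8, 12, 24]
    }
}
```

This is why 12 and 24 are attractive counts: they keep the partition total divisible by the three availability zones while offering many even consumer-group sizes.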
Configurations - Producer
Challenge 2 – Throughput and Latency
● batch.size=200_000
○ or more
● linger.ms=10-50
○ balance of latency & throughput
● compression.type=lz4
○ Always test your data against your compression
○ Never compress compressed data
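Assembled as producer configuration, the slide's settings look roughly like this; the batch size is the slide's 200 KB floor, while linger.ms is pinned to one value from the suggested 10–50 ms range:

```java
import java.util.Properties;

public class ProducerTuning {
    public static Properties producerProps() {
        Properties props = new Properties();
        // Larger batches amortize per-request overhead (200 KB or more).
        props.setProperty("batch.size", "200000");
        // Wait up to 25 ms to fill a batch: trades a little latency for
        // throughput; tune within the 10-50 ms range for your workload.
        props.setProperty("linger.ms", "25");
        // lz4 is cheap to compress and decompress; always benchmark against
        // your own data, and never compress already-compressed payloads.
        props.setProperty("compression.type", "lz4");
        return props;
    }
}
```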
Configurations - Consumer
Challenge 2 – Throughput and Latency
● 12 partitions
○ even distribution across availability zones (÷3)
○ even consumer workloads
■ 1/2/3/4/6/12
● max.partition.fetch.bytes & fetch.max.bytes
○ adding partitions can increase latency
(especially if number of consumers isn't increased)
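As configuration, the two fetch knobs look like this; the values below are illustrative, since the slide names the settings but not the numbers:

```java
import java.util.Properties;

public class ConsumerTuning {
    public static Properties consumerProps() {
        Properties props = new Properties();
        // Per-partition cap on a fetch response. When each consumer owns more
        // partitions, raise this (or add consumers) or latency creeps up.
        props.setProperty("max.partition.fetch.bytes", "2097152"); // 2 MiB (example)
        // Overall cap on a fetch response across all assigned partitions.
        props.setProperty("fetch.max.bytes", "52428800"); // 50 MiB (example)
        return props;
    }
}
```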
Configurations - Performance
Challenge 2 – Throughput and Latency
● Start with the producer
● Then the partitions
● Kafka Streams
○ State Store - Caching (Disable for Latency)
○ Commit Interval (Reduce for Visibility and Latency)
○ Threading - depends on topology & number of containers.
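The Kafka Streams tunings above might look like this as configuration. The values are illustrative, and `statestore.cache.max.bytes` assumes Kafka 3.4+ (older releases use `cache.max.bytes.buffering`):

```java
import java.util.Properties;

public class StreamsTuning {
    public static Properties streamsProps() {
        Properties props = new Properties();
        // Disable state-store caching so updates flow downstream immediately
        // (lower latency at the cost of more downstream records).
        props.setProperty("statestore.cache.max.bytes", "0");
        // Shorter commit interval = faster visibility; 100 ms is the low end
        // of the 100-5000 ms range discussed in the talk.
        props.setProperty("commit.interval.ms", "100");
        // Thread count depends on the topology and how many containers run;
        // one thread per container is just this sketch's starting point.
        props.setProperty("num.stream.threads", "1");
        return props;
    }
}
```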
Custom Kafka Rest API
Challenge 4 – Pattern of Life Analysis
● One of the easiest parts to build
○ Producer
■ linger.ms - major impact - choose wisely
○ Consumer -- not RESTful ☞ WebSockets
● One of the easiest mistakes to make - waiting...
○ producer flushing
○ linger.ms
○ Callback vs. Waiting on Future
Leverage framework support, e.g. Spring's DeferredResult
● HTTP Response Codes
○ 200 - OK
○ 201 - Created (try to avoid using this one, but some clients....)
○ 202 - Accepted
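The "callback vs. waiting on a future" pitfall can be shown with a plain `CompletableFuture` standing in for `producer.send(record, callback)`; the `send` helper here is a hypothetical stub, not the project's code:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncSend {
    // Hypothetical stand-in for an async producer send; real code would
    // complete the future from the producer.send(record, callback) callback.
    static CompletableFuture<String> send(String msg) {
        return CompletableFuture.supplyAsync(() -> "ack:" + msg);
    }

    public static void main(String[] args) {
        // Anti-pattern: blocking the HTTP request thread until the ack arrives.
        String blocked = send("a").join();

        // Preferred: chain a callback and return immediately; in Spring you
        // would complete a DeferredResult here instead of blocking.
        CompletableFuture<Integer> status = send("b").thenApply(ack -> 202);

        System.out.println(blocked);       // ack:a
        System.out.println(status.join()); // 202
    }
}
```

Returning 202 Accepted from the callback path matches the REST semantics above: the write is acknowledged for processing, not synchronously completed.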
WebSocket - Consumer
Challenge 5 – Real Time Data – Hot Data
● consumer.subscribe() vs. consumer.assign()
● handling backpressure
● tried Java 21 and Virtual Threads - did not help...
● 2 implementations
○ web-socket consumption drains queue
○ 30 second eviction (independent of consumption)
● garbage collection
topic → poll() → LinkedBlockingQueue → push() → websocket
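The queue hand-off in the diagram can be sketched with a bounded `LinkedBlockingQueue`: the consumer thread offers after each `poll()`, and when the websocket client falls behind, the oldest entries are evicted rather than buffering without bound. The capacity and drop policy here are illustrative, not the talk's actual values:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class WebSocketBridge {
    // Bounded: if the websocket client falls behind, we drop the oldest
    // message instead of growing the heap. (capacity is illustrative)
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

    // Called from the Kafka consumer thread after poll().
    public void onRecord(String message) {
        while (!queue.offer(message)) {
            queue.poll(); // backpressure: evict the oldest entry to make room
        }
    }

    // Called from the websocket's push thread: drain a batch and send it.
    public List<String> drainBatch(int max) {
        List<String> batch = new ArrayList<>(max);
        queue.drainTo(batch, max);
        return batch;
    }
}
```

Draining in batches keeps the push side efficient, and an eviction sweep (the slide's 30-second variant) can run independently of consumption against the same queue.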
WebSocket - Consumer
Challenge 5 – Real Time Data – Hot Data
● Data Mashing over Websocket
○ XML - not great
○ JSON - better
○ Apache Arrow - best
topic → poll() → LinkedBlockingQueue → push() → websocket
(consumer thread)                      (one thread per websocket)
Kafka Streams
Challenge 6 – Real Time Data Enrichment
● Avoid initial rekeying (trust but verify); every rekey adds latency.
filter((k, v) -> {
    if (!k.equals(v.id())) {
        log.error("invalid key {} for value {}", k, v.id());
        return false;
    }
    return true;
})
● Global Tables / KTables
Kafka Streams
Challenge 6 – Real Time Data Enrichment
● Commit Interval
○ 100ms - 5000ms
● Threads
○ Containers vs Stream Threads
● If Scaling down and up (and not using static membership)
internal.leave.group.on.close = true
● If Scaling down and up (and static membership + Kafka Streams 3.3+)
KafkaStreams.CloseOptions closeOptions =
    new KafkaStreams.CloseOptions().timeout(SHUTDOWN).leaveGroup(true);
streams.close(closeOptions);