SlideShare a Scribd company logo
1 of 44
Download to read offline
Near real time processing of spatial data in
Kafka Streams
Near real time processing of spatial data using Kafka StreamsNear real time processing of spatial data using Kafka Streams
Current 2022
Ian Feeney and Roman Kolesnev
October 2022
Austin, Texas
Spatial data in motion
Near real time processing of spatial data using Kafka StreamsNear real time processing of spatial data using Kafka Streams
Who are we?
2
Ian Feeney
Customer Innovation Engineer
ifeeney@confluent.io
Roman Kolesnev
Staff Customer Innovation Engineer
ifeeney@confluent.io
Confluent Solution
and Innovation
Division
https://www.confluent.io/confluent-accelerators
Everything happens somewhere
3
● Proliferation of Smartphones, GPS devices and IoT
● Vast quantities of spatial data being produced
● Realtime geolocation at the heart of UX, business processes
and business models
● Consider Uber
● Everything event happens somewhere
Spatial data in Kafka Streams
4
● Processing fast moving
unbounded event
streams
● React to events from
many source in real time
● How to process
geospatial data in Kafka
streams?
Agenda
5
1. Geospatial 101 - Spatial data and operations
2. Use case and demo application
3. Standard Kafka Streams approach
4. Custom state store approach
Geospatial Data
6
● Represent and encode the geographies of real world entities
● Raster
○ image based format
○ pixels carry value
● Vector
○ represent points lines and polygons using pairs of x,y coordinates
Geospatial Data
7
POINT(-122.098, 37.657)
Geospatial Data
8
LINSETRING (-122.130 37.686, -122.129 37.701, ….. -122.091
37.649)
Geospatial Data
9
Polygon((-122.112 37.715, -122.097 37.717 ….-122.112 37.715))
Geospatial Operations
10
● Geometry based functions
● Analysis of spatial data
○ Determine spatial relationships
○ Resolve spatial interactions
● Produce new data and insights
Geospatial Operations
11
● Spatial semantics for processing and querying our data
● contains, overlaps, touches, within, intersects
Geospatial Operations
12
ST_INTERSECTION(polygon_1, polygon_2)
Geospatial Operations
13
ST_BUFFER(point_1, distance)
Spatial Join
14
SELECT taxi_id, person_name, ST_AsText(taxi_positions.geom) taxi_geom
FROM taxi_positions
JOIN (
SELECT person_name, ST_BUFFER(geom, 2.5) geom
FROM people
WHERE person_name = 'Ian Feeney'
) buffered_person
ON ST_INTERSECTS(taxi_positions.geom, buffered_person.geom)
;
Use Case
15
Find taxis which are within 2.5 miles
of miles of a person’s location in
real time
Demo
16
● Kafka Streams Domain Specific Language (DSL) for the
processing topology
● Geohashing for rekeying
● Java Spatial4J library for spatial operations
17
Standard Kafka streams Approach
Kafka Streams Topology
18
Taxis People
Buffer
(mapValues)
Rekey (flatMap)
Rekey (map)
Join
Filter
TaxiPerson
Buffer people
19
Taxis People
Buffer
(mapValues)
Rekey (flatMap)
Rekey (map)
Join
Filter
TaxiPerson
Rekey
20
Records on both sides of a join must be:
● keyed by the same field
● routed to the same partition and handled
by the same processing task
Taxis People
Buffer
(mapValues)
Rekey (flatMap)
Rekey (map)
Join
Filter
TaxiPerson
Why Rekey?
Why Rekey?
Ensure related events routed
to the same partition
Geohashing for rekeying
23
image source: https://www.pubnub.com/learn/glossary/what-is-geohashing/
Geohashing for rekeying
24
geohash(POINT(-122.116 37.696), 1) -> 9
geohash(POINT(-122.116 37.696), 2) -> 9q
geohash(POINT(-122.116 37.696), 3) -> 9q9
geohash(POINT(-122.116 37.696), 4) -> 9q9n
geohash(POINT(-122.116 37.696), 5) -> 9q9nm
geohash(POINT(-122.116 37.696), 6) -> 9q9nmn
geohash(POINT(-122.116 37.696), 7) -> 9q9nmnm
.
.
.
geohash(POINT(-122.116 37.696), 20) -> 9q9nmnmwykky1x55w3uc
1. geohash(POINT(-122.116 37.696), 6) -> 9q9nmn
2. geohash(POINT(-122.114 37.695), 6) -> 9q9nmn
Geohashing - level 6
25
1. geohash(POINT(-122.116 37.696), 7) -> 9q9nmnm
2. geohash(POINT(-122.114 37.695), 7) -> 9q9nmnq
Geohashing - level 7
26
How are geohashes useful for us?
● Geospatial event key
● Taxis and people events which are within 2.5 miles of each other can
be:
● keyed on the same field
● routed to the same partition
● matched in the join
Rekey taxi events
28
Taxis People
Buffer
(mapValues)
Rekey (flatMap)
Rekey (map)
Join
Filter
TaxiPerson
Rekey buffered people events
29
Taxis People
Buffer
(mapValues)
Rekey (flatMap)
Rekey (map)
Join
Filter
TaxiPerson
Join Rekeyed Buffered People and Taxi events
30
Taxis People
Buffer
(mapValues)
Rekey (flatMap)
Rekey (map)
Join
Filter
TaxiPerson
False positives
31
Filter false positives
32
Taxis People
Buffer
(mapValues)
Rekey (flatMap)
Rekey (map)
Join
Filter
TaxiPerson
Standard Kafka Streams Approach Downsides
● False positives = Extra processing
● Person events duplicated = Extra data
● Limitations of state store
○ No spatial operations
○ Non key based lookups not possible*
*Even foreign key joins use key based lookups under the hood
Spatially enabled state store
34
Can we support spatial joins at the
state store layer?
Custom State Store
35
Requirements:
• Embeddable in Java
• Supports Spatial operations
Apache Lucene - High performance, embeddable, written in Java. Supports indexing, search,
spatial operations.
Incremental steps:
• WindowStore implementation using Lucene as storage engine.
• Extension of WindowStore interface.
• Custom SpatialJoin DSL operation implementation.
Custom State Store - Lucene Window Store
36
WindowStore implementation using Lucene as storage engine
Custom State Store - Lucene Window Store
37
Lucene WindowStore construction flow
Custom State Store - Lucene Spatial Window Store
38
Extension of WindowStore interface - adding spatial operations
Custom State Store - SpatialJoin
39
Custom SpatialJoin operation implementation
• Recreates KStreamToKStream join logic flow
• Join based on Key, Time and Point within radius lookup
• ValueTransformer implementation to stay at DSL level
SpatialJoinTransformer
• SpatialWindowStore - this
• SpatialWindowStore - other
• JoinWindows
• ValueJoiner
• PointExtractor
• joinRadius
Custom State Store - SpatialJoin
40
Custom SpatialJoin operation implementation architecture
Vanilla Kafka Streams vs. Custom State Store
41
Taxis People
Buffer
(mapValues)
Rekey (flatMap)
Rekey (map)
SpatialJoin
TaxiPerson
Lucene
Taxis People
Buffer
(mapValues)
Rekey (flatMap)
Rekey (map)
Join
Filter
TaxiPerson
RocksDB RocksDB Lucene
Reduce/Eliminate Person Duplication
42
Taxis People
Buffer
(mapValues)
Rekey (flatMap)
Rekey (map)
SpatialJoin
TaxiPerson
Lucene Lucene
Level 4 geohash
Level 5 geohash
And Finally……
43
1. You can use spatial semantics for event processing in Kafka Streams
2. Implementation of custom state store enables:
a. Spatial operations at the state store layer
b. Usage of secondary indexes and composite queries in state store
data lookups, including non key based lookups
We would love to hear from you.
44
Ian Feeney
ifeeney@confluent.io
Roman Kolesnev
rkolesnev@confluent.io
https://www.confluent.io/confluent-accelerators
https://confluent.io/community/ask-the-community

More Related Content

What's hot

Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoDatabricks
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionWes McKinney
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberXiang Fu
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 
Implementing zero trust in IBM Cloud Pak for Integration
Implementing zero trust in IBM Cloud Pak for IntegrationImplementing zero trust in IBM Cloud Pak for Integration
Implementing zero trust in IBM Cloud Pak for IntegrationKim Clark
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Benefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use CasesBenefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use Casesconfluent
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Kai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache PinotAltinity Ltd
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteStreamNative
 
Application modernization patterns with apache kafka, debezium, and kubernete...
Application modernization patterns with apache kafka, debezium, and kubernete...Application modernization patterns with apache kafka, debezium, and kubernete...
Application modernization patterns with apache kafka, debezium, and kubernete...Bilgin Ibryam
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkmxmxm
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseSnowflake Computing
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouseAltinity Ltd
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsKetan Gote
 

What's hot (20)

Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan Flonenko
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
kafka
kafkakafka
kafka
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
Implementing zero trust in IBM Cloud Pak for Integration
Implementing zero trust in IBM Cloud Pak for IntegrationImplementing zero trust in IBM Cloud Pak for Integration
Implementing zero trust in IBM Cloud Pak for Integration
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Benefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use CasesBenefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use Cases
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
 
Application modernization patterns with apache kafka, debezium, and kubernete...
Application modernization patterns with apache kafka, debezium, and kubernete...Application modernization patterns with apache kafka, debezium, and kubernete...
Application modernization patterns with apache kafka, debezium, and kubernete...
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouse
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 

Similar to Real-Time Processing of Spatial Data Using Kafka Streams, Ian Feeney & Roman Kolesnev | Current 2022

Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Data Con LA
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...confluent
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
 
True Reusable Code - DevSum2016
True Reusable Code - DevSum2016True Reusable Code - DevSum2016
True Reusable Code - DevSum2016Eduard Lazar
 
The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...Karthik Murugesan
 
Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker seriesMonal Daxini
 
5 Levels of High Availability: From Multi-instance to Hybrid Cloud
5 Levels of High Availability: From Multi-instance to Hybrid Cloud5 Levels of High Availability: From Multi-instance to Hybrid Cloud
5 Levels of High Availability: From Multi-instance to Hybrid CloudRafał Leszko
 
5 levels of high availability from multi instance to hybrid cloud
5 levels of high availability  from multi instance to hybrid cloud5 levels of high availability  from multi instance to hybrid cloud
5 levels of high availability from multi instance to hybrid cloudRafał Leszko
 
Real-time Fraudulent Trips Detection with Xueyao Jiang
Real-time Fraudulent Trips Detection with Xueyao JiangReal-time Fraudulent Trips Detection with Xueyao Jiang
Real-time Fraudulent Trips Detection with Xueyao JiangHostedbyConfluent
 
Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Zhenxiao Luo
 
유연하고 확장성 있는 빅데이터 처리
유연하고 확장성 있는 빅데이터 처리유연하고 확장성 있는 빅데이터 처리
유연하고 확장성 있는 빅데이터 처리NAVER D2
 
Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!Globus
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Thomas Weise
 
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward
 
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward
 
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...HostedbyConfluent
 
OSDC 2018 - Distributed monitoring
OSDC 2018 - Distributed monitoringOSDC 2018 - Distributed monitoring
OSDC 2018 - Distributed monitoringGianluca Arbezzano
 
OSDC 2018 | Distributed Monitoring by Gianluca Arbezzano
OSDC 2018 | Distributed Monitoring by Gianluca ArbezzanoOSDC 2018 | Distributed Monitoring by Gianluca Arbezzano
OSDC 2018 | Distributed Monitoring by Gianluca ArbezzanoNETWAYS
 
FIWARE: Managing Context Information at large scale
FIWARE: Managing Context Information at large scaleFIWARE: Managing Context Information at large scale
FIWARE: Managing Context Information at large scaleFermin Galan
 

Similar to Real-Time Processing of Spatial Data Using Kafka Streams, Ian Feeney & Roman Kolesnev | Current 2022 (20)

Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
 
True Reusable Code - DevSum2016
True Reusable Code - DevSum2016True Reusable Code - DevSum2016
True Reusable Code - DevSum2016
 
The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...
 
Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker series
 
5 Levels of High Availability: From Multi-instance to Hybrid Cloud
5 Levels of High Availability: From Multi-instance to Hybrid Cloud5 Levels of High Availability: From Multi-instance to Hybrid Cloud
5 Levels of High Availability: From Multi-instance to Hybrid Cloud
 
5 levels of high availability from multi instance to hybrid cloud
5 levels of high availability  from multi instance to hybrid cloud5 levels of high availability  from multi instance to hybrid cloud
5 levels of high availability from multi instance to hybrid cloud
 
Real-time Fraudulent Trips Detection with Xueyao Jiang
Real-time Fraudulent Trips Detection with Xueyao JiangReal-time Fraudulent Trips Detection with Xueyao Jiang
Real-time Fraudulent Trips Detection with Xueyao Jiang
 
Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017
 
유연하고 확장성 있는 빅데이터 처리
유연하고 확장성 있는 빅데이터 처리유연하고 확장성 있는 빅데이터 처리
유연하고 확장성 있는 빅데이터 처리
 
Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
 
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
 
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
 
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...
Building a Scalable Real-Time Fleet Management IoT Data Tracker with Kafka St...
 
OSDC 2018 - Distributed monitoring
OSDC 2018 - Distributed monitoringOSDC 2018 - Distributed monitoring
OSDC 2018 - Distributed monitoring
 
OSDC 2018 | Distributed Monitoring by Gianluca Arbezzano
OSDC 2018 | Distributed Monitoring by Gianluca ArbezzanoOSDC 2018 | Distributed Monitoring by Gianluca Arbezzano
OSDC 2018 | Distributed Monitoring by Gianluca Arbezzano
 
FIWARE: Managing Context Information at large scale
FIWARE: Managing Context Information at large scaleFIWARE: Managing Context Information at large scale
FIWARE: Managing Context Information at large scale
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Real-Time Processing of Spatial Data Using Kafka Streams, Ian Feeney & Roman Kolesnev | Current 2022

  • 1. Near real time processing of spatial data in Kafka Streams Near real time processing of spatial data using Kafka StreamsNear real time processing of spatial data using Kafka Streams Current 2022 Ian Feeney and Roman Kolesnev October 2022 Austin, Texas Spatial data in motion Near real time processing of spatial data using Kafka StreamsNear real time processing of spatial data using Kafka Streams
  • 2. Who are we? 2 Ian Feeney Customer Innovation Engineer ifeeney@confluent.io Roman Kolesnev Staff Customer Innovation Engineer ifeeney@confluent.io Confluent Solution and Innovation Division https://www.confluent.io/confluent-accelerators
  • 3. Everything happens somewhere 3 ● Proliferation of Smartphones, GPS devices and IoT ● Vast quantities of spatial data being produced ● Realtime geolocation at the heart of UX, business processes and business models ● Consider Uber ● Everything event happens somewhere
  • 4. Spatial data in Kafka Streams 4 ● Processing fast moving unbounded event streams ● React to events from many source in real time ● How to process geospatial data in Kafka streams?
  • 5. Agenda 5 1. Geospatial 101 - Spatial data and operations 2. Use case and demo application 3. Standard Kafka Streams approach 4. Custom state store approach
  • 6. Geospatial Data 6 ● Represent and encode the geographies of real world entities ● Raster ○ image based format ○ pixels carry value ● Vector ○ represent points lines and polygons using pairs of x,y coordinates
  • 8. Geospatial Data 8 LINSETRING (-122.130 37.686, -122.129 37.701, ….. -122.091 37.649)
  • 9. Geospatial Data 9 Polygon((-122.112 37.715, -122.097 37.717 ….-122.112 37.715))
  • 10. Geospatial Operations 10 ● Geometry based functions ● Analysis of spatial data ○ Determine spatial relationships ○ Resolve spatial interactions ● Produce new data and insights
  • 11. Geospatial Operations 11 ● Spatial semantics for processing and querying our data ● contains, overlaps, touches, within, intersects
  • 14. Spatial Join 14 SELECT taxi_id, person_name, ST_AsText(taxi_positions.geom) taxi_geom FROM taxi_positions JOIN ( SELECT person_name, ST_BUFFER(geom, 2.5) geom FROM people WHERE person_name = 'Ian Feeney' ) buffered_person ON ST_INTERSECTS(taxi_positions.geom, buffered_person.geom) ;
  • 15. Use Case 15 Find taxis which are within 2.5 miles of miles of a person’s location in real time
  • 17. ● Kafka Streams Domain Specific Language (DSL) for the processing topology ● Geohashing for rekeying ● Java Spatial4J library for spatial operations 17 Standard Kafka streams Approach
  • 18. Kafka Streams Topology 18 Taxis People Buffer (mapValues) Rekey (flatMap) Rekey (map) Join Filter TaxiPerson
  • 19. Buffer people 19 Taxis People Buffer (mapValues) Rekey (flatMap) Rekey (map) Join Filter TaxiPerson
  • 20. Rekey 20 Records on both sides of a join must be: ● keyed by the same field ● routed to the same partition and handled by the same processing task Taxis People Buffer (mapValues) Rekey (flatMap) Rekey (map) Join Filter TaxiPerson
  • 22. Why Rekey? Ensure related events routed to the same partition
  • 23. Geohashing for rekeying 23 image source: https://www.pubnub.com/learn/glossary/what-is-geohashing/
  • 24. Geohashing for rekeying 24 geohash(POINT(-122.116 37.696), 1) -> 9 geohash(POINT(-122.116 37.696), 2) -> 9q geohash(POINT(-122.116 37.696), 3) -> 9q9 geohash(POINT(-122.116 37.696), 4) -> 9q9n geohash(POINT(-122.116 37.696), 5) -> 9q9nm geohash(POINT(-122.116 37.696), 6) -> 9q9nmn geohash(POINT(-122.116 37.696), 7) -> 9q9nmnm . . . geohash(POINT(-122.116 37.696), 20) -> 9q9nmnmwykky1x55w3uc
  • 25. 1. geohash(POINT(-122.116 37.696), 6) -> 9q9nmn 2. geohash(POINT(-122.114 37.695), 6) -> 9q9nmn Geohashing - level 6 25
  • 26. 1. geohash(POINT(-122.116 37.696), 7) -> 9q9nmnm 2. geohash(POINT(-122.114 37.695), 7) -> 9q9nmnq Geohashing - level 7 26
  • 27. How are geohashes useful for us? ● Geospatial event key ● Taxis and people events which are within 2.5 miles of each other can be: ● keyed on the same field ● routed to the same partition ● matched in the join
  • 28. Rekey taxi events 28 Taxis People Buffer (mapValues) Rekey (flatMap) Rekey (map) Join Filter TaxiPerson
  • 29. Rekey buffered people events 29 Taxis People Buffer (mapValues) Rekey (flatMap) Rekey (map) Join Filter TaxiPerson
  • 30. Join Rekeyed Buffered People and Taxi events 30 Taxis People Buffer (mapValues) Rekey (flatMap) Rekey (map) Join Filter TaxiPerson
  • 32. Filter false positives 32 Taxis People Buffer (mapValues) Rekey (flatMap) Rekey (map) Join Filter TaxiPerson
  • 33. Standard Kafka Streams Approach Downsides ● False positives = Extra processing ● Person events duplicated = Extra data ● Limitations of state store ○ No spatial operations ○ Non key based lookups not possible* *Even foreign key joins use key based lookups under the hood
  • 34. Spatially enabled state store 34 Can we support spatial joins at the state store layer?
  • 35. Custom State Store 35 Requirements: • Embeddable in Java • Supports Spatial operations Apache Lucene - High performance, embeddable, written in Java. Supports indexing, search, spatial operations. Incremental steps: • WindowStore implementation using Lucene as storage engine. • Extension of WindowStore interface. • Custom SpatialJoin DSL operation implementation.
  • 36. Custom State Store - Lucene Window Store 36 WindowStore implementation using Lucene as storage engine
  • 37. Custom State Store - Lucene Window Store 37 Lucene WindowStore construction flow
  • 38. Custom State Store - Lucene Spatial Window Store 38 Extension of WindowStore interface - adding spatial operations
  • 39. Custom State Store - SpatialJoin 39 Custom SpatialJoin operation implementation • Recreates KStreamToKStream join logic flow • Join based on Key, Time and Point within radius lookup • ValueTransformer implementation to stay at DSL level SpatialJoinTransformer • SpatialWindowStore - this • SpatialWindowStore - other • JoinWindows • ValueJoiner • PointExtractor • joinRadius
  • 40. Custom State Store - SpatialJoin 40 Custom SpatialJoin operation implementation architecture
  • 41. Vanilla Kafka Streams vs. Custom State Store 41 Taxis People Buffer (mapValues) Rekey (flatMap) Rekey (map) SpatialJoin TaxiPerson Lucene Taxis People Buffer (mapValues) Rekey (flatMap) Rekey (map) Join Filter TaxiPerson RocksDB RocksDB Lucene
  • 42. Reduce/Eliminate Person Duplication 42 Taxis People Buffer (mapValues) Rekey (flatMap) Rekey (map) SpatialJoin TaxiPerson Lucene Lucene Level 4 geohash Level 5 geohash
  • 43. And Finally…… 43 1. You can use spatial semantics for event processing in Kafka Streams 2. Implementation of custom state store enables: a. Spatial operations at the state store layer b. Usage of secondary indexes and composite queries in state store data lookups, including non key based lookups
  • 44. We would love to hear from you. 44 Ian Feeney ifeeney@confluent.io Roman Kolesnev rkolesnev@confluent.io https://www.confluent.io/confluent-accelerators https://confluent.io/community/ask-the-community