gschmutz
Stream Processing –
Concepts and Frameworks
JEE Conf 2019
Guido Schmutz (guido.schmutz@trivadis.com)
gschmutz http://guidoschmutz.wordpress.com
Agenda
1. Motivation for Stream Processing?
2. Capabilities for Stream Processing
3. Implementing Stream Processing Solutions
4. Demo
5. Summary
Guido Schmutz
Working at Trivadis for more than 22 years
Oracle Groundbreaker Ambassador & Oracle ACE Director
Consultant, Trainer, Software Architect for Java, AWS, Azure,
Oracle Cloud, SOA and Big Data / Fast Data
Platform Architect & Head of Trivadis Architecture Board
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
155th edition
Motivation for Stream Processing?
Big Data solves Volume and Variety – not Velocity
[Figure: bulk sources (DB extract, file) are loaded via file import / SQL import into a Big Data platform (Hadoop cluster with raw and refined storage and parallel processing); results are served with high latency through SQL, search and export services to BI tools, the enterprise data warehouse, search / explore and enterprise apps]
Big Data solves Volume and Variety – not Velocity
[Figure: the same Big Data platform, now also receiving an event stream from event sources (location, telemetry, IoT data, mobile apps, social); latency remains high]
Big Data solves Volume and Variety – not Velocity
[Figure: event hubs are added between the event sources (location, telemetry, IoT data, mobile apps, social) and the Big Data platform; parallel processing covers machine learning, graph algorithms and natural language processing; latency remains high]
"Data at Rest" vs. "Data in Motion"
[Figure: two pipelines side by side. Data at Rest: Store, Analyze, Visualize, (Re)Act. Data in Motion: Analyze, Act, Store, Visualize]
Stream Processing Architecture solves Velocity
[Figure: event sources (location, telemetry, IoT data, mobile apps, social) publish event streams to an event hub; a Stream Analytics platform processes them with reference data / models and delivers results to dashboards, search / explore, enterprise apps, BI tools and the enterprise data warehouse]
Low(est) latency, no history
Big Data for all historical data analysis
[Figure: in addition to the Stream Analytics platform, a data flow from the event hub and the file import / SQL import from bulk sources feed the Big Data platform (Hadoop cluster with raw and refined storage and parallel processing)]
Integrate existing systems with lower latency through CDC
[Figure: the previous architecture, extended with Change Data Capture on the bulk-source databases, which publishes changes as event streams to the event hub]
New systems participate in event-oriented fashion
[Figure: a microservice platform (microservices with their own state and API) exchanges event streams with the event hub, next to the Stream Analytics platform (stream processor with state and API) and the Big Data platform]
Edge computing allows processing close to data sources
[Figure: an edge node with rules, its own event hub and storage sits next to the event sources and forwards event streams through a data flow to the central event hub]
Unified Architecture for Modern Data Analytics Solutions
[Figure: the combined architecture. Bulk sources (file import / SQL import, Change Data Capture) and event sources (directly or via an edge node with rules, event hub and storage) feed the event hub; data flows connect it to Big Data (storage, parallel processing), Stream Analytics (stream processor with state and API) and microservices (state and API); results reach BI tools, the enterprise data warehouse, search / explore and enterprise apps]
Two Types of Stream Processing (by Gartner)
Stream Data Integration
• focuses on the ingestion and processing of data sources targeting real-time extract-transform-load (ETL) and data integration use cases
• filters and enriches the data
Stream Analytics
• targets analytics use cases
• calculates aggregates and detects patterns to generate higher-level, more relevant summary information (complex events)
• complex events may signify threats or opportunities that require a response from the business
Source: Gartner, Market Guide for Event Stream Processing, Nick Heudecker, W. Roy Schulte
Stream Processing & Analytics Ecosystem
[Figure: product landscape grouped into Edge, Event Hub, Stream Data Integration and Stream Analytics, split into open source and closed source. Source: adapted from Tibco]
Important Capabilities for Stream
Processing
Capabilities: Stream Data Integration vs. Stream Analytics

Capability                                                             | Stream Data Integration | Stream Analytics
Support for Various Data Sources                                       | yes                     | -
Streaming ETL (Transformation/Format Translation, Routing, Validation) | yes                     | partial
Execution Mode: Native Streaming                                       | yes                     | yes
Execution Mode: Non-Native Streaming - Micro-Batching                  | yes                     | partial
Delivery Guarantees                                                    | yes                     | yes
API: GUI-Based API / Declarative API / Programmatic                    | yes                     | yes
API: Streaming SQL                                                     | -                       | yes
Event Time vs. Ingestion / Processing Time                             | -                       | yes
Windowing                                                              | -                       | yes
Stream-to-Static Joins (Lookup/Enrichment)                             | partial                 | yes
Stream-to-Stream Joins                                                 | -                       | yes
State Management                                                       | -                       | yes
Queryable State (aka Interactive Queries)                              | -                       | yes
Event Pattern Detection                                                | -                       | yes
Integrating Data Sources
• Sensor Stream
• SQL Polling
• Change Data Capture (CDC)
• File Polling
• File Stream (File Tailing)
• File Stream (Appender)
Streaming ETL
• Streaming Extract – Transform – Load
• Flow-based "programming"
• High-throughput, straight-through data flows
• Visual coding with flow editor
• Stream Data Integration but not Stream Analytics
Execution Mode: Native Streaming
[Figure: event sources, ingestion, then every individual event is processed (P) on its own]
• Events processed as they arrive
• Low(est) latency
• Fault tolerance expensive
Execution Mode: Non-Native Streaming - Micro-Batching
[Figure: event sources, ingestion, then events are processed (P) in small groups]
• Splits incoming stream into small batches
• Fault tolerance easier to achieve
• Higher latency
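The contrast between the two execution modes can be sketched in plain Java. This is only an illustration under simplifying assumptions: the class `MicroBatcher` and its method `toBatches` are hypothetical names, and batches are cut by event count, whereas real engines usually cut them by a time interval.

```java
import java.util.ArrayList;
import java.util.List;

class MicroBatcher {
    // Splits a sequence of events into batches of at most batchSize events.
    static <T> List<List<T>> toBatches(List<T> events, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < events.size(); i += batchSize) {
            int end = Math.min(i + batchSize, events.size());
            batches.add(new ArrayList<>(events.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> events = List.of("e1", "e2", "e3", "e4", "e5");

        // Native streaming: each event is handed to the processing logic on arrival.
        for (String event : events) {
            System.out.println("native: " + event);
        }

        // Micro-batching: events are buffered first and then processed per batch.
        for (List<String> batch : toBatches(events, 2)) {
            System.out.println("batch: " + batch);
        }
    }
}
```

The buffering step is where the extra latency comes from: the first event of a batch waits until the batch is complete.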
Delivery Guarantees
At most once (fire-and-forget) [ 0 | 1 ]
• message is sent, but the sender doesn't care if it's received or lost
At least once [ 1+ ]
• retransmission of messages can cause messages to be sent one or more times
Exactly once [ 1 ]
• ensures that a message is received once and only once (never lost and never repeated)
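One common way to get an exactly-once effect on top of at-least-once delivery is to make the consumer idempotent. The following is a minimal sketch of that idea, not how any particular engine implements exactly-once; the class `IdempotentConsumer`, the message ids and the amounts are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class IdempotentConsumer {
    private final Set<String> seenIds = new HashSet<>();
    private long total = 0;

    // Applies the message only if its id has not been processed before.
    // Returns true if the state changed, false if the message was a duplicate.
    boolean handle(String messageId, long amount) {
        if (!seenIds.add(messageId)) {
            return false; // redelivery caused by a retry, ignore it
        }
        total += amount;
        return true;
    }

    long total() {
        return total;
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        // At-least-once delivery: message "m2" arrives twice after a retry.
        String[] ids = {"m1", "m2", "m2", "m3"};
        long[] amounts = {10, 20, 20, 30};
        for (int i = 0; i < ids.length; i++) {
            consumer.handle(ids[i], amounts[i]);
        }
        System.out.println(consumer.total()); // prints 60
    }
}
```

In a real system the set of seen ids would itself have to be stored durably and bounded in size, which this sketch leaves out.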
API
GUI-based / Drag-and-Drop
• A graphical way of designing a pipeline
• Often web-based
Declarative
• A streaming engine configured declaratively
• JSON, YAML
"config": {
"connector.class": "..MqttSourceConnector",
"tasks.max": "1",
"mqtt.server.uri": "tcp://mosquitto-1:1883",
"mqtt.topics": "truck/+/position",
"kafka.topic":"truck_position",
...
API (II)
Programmatic
• Low-level (class) or high-level fluent API
• Higher-order functions as operators (filter, mapWithState, …)
val filteredDf = truckPosDf.where("eventType != 'Normal'")
Streaming SQL
• Use a stream in the FROM clause
• Extensions support windowing, pattern matching, spatial, …
SELECT *
FROM truck_position_s
WHERE eventType != 'Normal'
Event Time vs. Ingestion / Processing Time
Event time
• time at which events actually occurred
Ingestion time / Processing time
• time at which events are ingested into / processed by the system
Not all use cases care about event times, but lots do!
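Why the distinction matters can be shown with a single delayed event. In this sketch the class `TimeSemantics`, the helper `windowStart` and all timestamps are hypothetical; it assumes one-minute windows and an event that reaches the processor 2.5 seconds after it occurred.

```java
class TimeSemantics {
    // Start of the fixed-size window that contains the given timestamp (milliseconds).
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    public static void main(String[] args) {
        long windowSizeMs = 60_000;      // one-minute windows
        long eventTimeMs = 59_000;       // when the event occurred at the source
        long processingTimeMs = 61_500;  // when it reached the stream processor

        // By event time the event belongs to the first minute ...
        System.out.println(windowStart(eventTimeMs, windowSizeMs));      // prints 0
        // ... by processing time it is counted in the second minute.
        System.out.println(windowStart(processingTimeMs, windowSizeMs)); // prints 60000
    }
}
```

The same event lands in different windows depending on which timestamp is used, so aggregates per window can differ between the two time notions.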
Windowing
• Computations over events are done using windows of data
• It is not feasible to keep the entire stream of data in memory
• A window represents a certain amount of data to perform computations on
[Figure: a window of data selected from a stream of data along the time axis]
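A windowed computation can be illustrated with a count per fixed-size window. This is a sketch under assumptions: `TumblingWindowCount` and `countPerWindow` are hypothetical names, timestamps are in milliseconds, and the input is a finite array standing in for the stream. Note that only one counter per window is kept, not the events themselves.

```java
import java.util.Map;
import java.util.TreeMap;

class TumblingWindowCount {
    // Counts events per fixed-size window; keys are window start times in milliseconds.
    static Map<Long, Integer> countPerWindow(long[] eventTimesMs, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : eventTimesMs) {
            long start = t - Math.floorMod(t, windowSizeMs);
            counts.merge(start, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] eventTimesMs = {100, 900, 1_200, 2_500, 2_700, 2_999};
        // One-second windows: [0, 1000), [1000, 2000), [2000, 3000)
        System.out.println(countPerWindow(eventTimesMs, 1_000)); // prints {0=2, 1000=1, 2000=3}
    }
}
```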
Windowing
Sliding / Hopping Window
• eviction and trigger based on window length and sliding interval length
Fixed / Tumbling Window
• eviction based on the window being full, trigger based on either count of items or time
Session Window
• sequences of temporally related events, terminated by a gap of inactivity longer than some timeout
[Figure: the three window types drawn along a time axis]
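How events are assigned to hopping and session windows can be sketched as follows. The class `WindowTypes` and both method names are hypothetical, timestamps are in milliseconds, and the session example assumes the input is already sorted by time.

```java
import java.util.ArrayList;
import java.util.List;

class WindowTypes {
    // Start times of all hopping windows [start, start + sizeMs) that contain t.
    // Assumes 0 < hopMs <= sizeMs and t >= 0; windows start at multiples of hopMs.
    static List<Long> hoppingWindowStarts(long t, long sizeMs, long hopMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = t - (t % hopMs); // latest window starting at or before t
        for (long s = lastStart; s > t - sizeMs; s -= hopMs) {
            if (s >= 0) {
                starts.add(0, s); // insert at the front to keep ascending order
            }
        }
        return starts;
    }

    // Groups sorted timestamps into sessions; a gap longer than gapMs closes a session.
    static List<List<Long>> sessions(long[] sortedTimesMs, long gapMs) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long t : sortedTimesMs) {
            if (!current.isEmpty() && t - current.get(current.size() - 1) > gapMs) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(t);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // Window length 2 s, hop 1 s: the event at 2.5 s falls into two windows.
        System.out.println(hoppingWindowStarts(2_500, 2_000, 1_000)); // prints [1000, 2000]
        // Inactivity gap of 1 s splits the events into two sessions.
        System.out.println(sessions(new long[] {0, 400, 900, 5_000, 5_300}, 1_000));
        // prints [[0, 400, 900], [5000, 5300]]
    }
}
```

A tumbling window is the special case of a hopping window where the hop equals the window length, so every event falls into exactly one window.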
Joining – Stream-to-Static
Challenges of joining streams
• Data streams need to be aligned because of their different timestamps
• Joins must be limited; otherwise they will never end
• A join needs to produce results continuously
• There is no end to the data
Stream-to-Static (Table) Join
[Figure: events of a stream joined against a static table over time]
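A stream-to-static join is essentially a lookup per event. The sketch below assumes events encoded as "truckId,eventType" strings and an in-memory map as the static side; the class `StreamToStaticJoin`, the driver name and the event values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class StreamToStaticJoin {
    // Enriches each "truckId,eventType" event with the driver name from a static table.
    static List<String> enrich(List<String> events, Map<String, String> driverByTruck) {
        List<String> enriched = new ArrayList<>();
        for (String event : events) {
            String truckId = event.split(",")[0];
            // Left-join semantics: keep the event even if no reference row exists.
            String driver = driverByTruck.getOrDefault(truckId, "unknown");
            enriched.add(event + "," + driver);
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, String> driverByTruck = Map.of("87", "Alice"); // hypothetical reference data
        List<String> events = List.of("87,Normal", "12,Overspeed");
        System.out.println(enrich(events, driverByTruck));
        // prints [87,Normal,Alice, 12,Overspeed,unknown]
    }
}
```

Because the static side does not move, no time alignment is needed; each event is joined against the table as it is at lookup time.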
Joining – Stream-to-Stream
• Stream-to-Stream Join (one-window join)
• Stream-to-Stream Join (two-window join)
[Figure: two streams joined over time, once with a single shared window and once with one window per stream]
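A stream-to-stream join has to be bounded in time, which is what the window provides. The sketch below pairs events with the same key whose timestamps lie within a given distance. It is an illustration only: `StreamToStreamJoin` and its `Event` type are hypothetical, and the nested loop over finite lists stands in for the window buffers with eviction that a real engine would maintain.

```java
import java.util.ArrayList;
import java.util.List;

class StreamToStreamJoin {
    static final class Event {
        final String key;
        final long timeMs;
        final String value;

        Event(String key, long timeMs, String value) {
            this.key = key;
            this.timeMs = timeMs;
            this.value = value;
        }
    }

    // Joins two streams on key, pairing only events at most windowMs apart in time.
    static List<String> join(List<Event> left, List<Event> right, long windowMs) {
        List<String> joined = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                if (l.key.equals(r.key) && Math.abs(l.timeMs - r.timeMs) <= windowMs) {
                    joined.add(l.value + "+" + r.value);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Event> positions = List.of(
                new Event("87", 1_000, "pos@1000"),
                new Event("87", 9_000, "pos@9000"));
        List<Event> alerts = List.of(
                new Event("87", 1_400, "alert@1400"),
                new Event("12", 1_200, "alert@1200"));
        System.out.println(join(positions, alerts, 500)); // prints [pos@1000+alert@1400]
    }
}
```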
State Management
• Needed if the use case depends on previously seen data
• Windowing, Joining and Pattern Detection use State Management behind the scenes
• State needs to be as close to the stream processor as possible
• How does it handle failures?
Options for State Management
• In-Memory
• Replicated, Distributed Store
• Local, Embedded Store
[Figure: the options placed on an axis of operational complexity and features, from low to high]
Queryable State (aka Interactive Queries)
• Exposes the state managed by the Stream Analytics solution
• Allows applications to query the managed state, e.g. to visualize it
• Can eliminate the need for an external database to keep results
[Figure: within the stream processing infrastructure, a stream processor uses reference data and a model and keeps state; a Query API exposes that state to search / explore, online & mobile apps and dashboards]
Event Pattern Detection
• Streaming data often contains interesting patterns
• Special operators allow finding complex relationships between events
• Absence Pattern - event A not followed by event B within time window
• Sequence Pattern - event A followed by event B followed by event C
• Increasing Pattern - up trend of a value of a certain attribute
• Decreasing Pattern - down trend of a value of a certain attribute
• …
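Two of the patterns listed above can be expressed as simple predicates. This is a minimal sketch over finite arrays, not the operator set of any specific product; the class `PatternDetection`, its method names and the sample values are hypothetical.

```java
class PatternDetection {
    // Absence pattern: true if no B event occurs in (aTimeMs, aTimeMs + windowMs].
    static boolean absence(long aTimeMs, long[] bTimesMs, long windowMs) {
        for (long b : bTimesMs) {
            if (b > aTimeMs && b <= aTimeMs + windowMs) {
                return false;
            }
        }
        return true;
    }

    // Increasing pattern: true if there are at least two values and each one
    // is strictly greater than its predecessor.
    static boolean increasing(double[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i] <= values[i - 1]) {
                return false;
            }
        }
        return values.length > 1;
    }

    public static void main(String[] args) {
        // Event A at 1 s, B events at 0.5 s and 7 s, window of 5 s: no B follows in time.
        System.out.println(absence(1_000, new long[] {500, 7_000}, 5_000)); // prints true
        // Three rising readings of some attribute.
        System.out.println(increasing(new double[] {61.0, 64.5, 70.2}));    // prints true
    }
}
```

In a streaming engine the absence check can only fire once the time window has elapsed, which is one reason pattern detection depends on state and timers.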
Implementing Stream Processing
Solutions
Event Hub: Apache Kafka
[Figure: a Kafka cluster with three brokers, coordinated by a three-node ZooKeeper ensemble; producers write to and consumers read from the brokers; a Schema Registry service and management tools (Control Center, Kafka Manager, KAdmin, kafkacat) surround the cluster]
Data Retention
• Never
• Time (TTL) or size-based
• Log-compacted
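The idea behind log-compacted retention is that only the most recent value per key needs to be kept. The sketch below is a conceptual model of that idea, not Kafka's actual cleaner, and it ignores delete markers; the class `LogCompaction` and the sample records are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LogCompaction {
    // Keeps only the latest value per key, in the order of each key's last write.
    // Each log record is a two-element array: {key, value}.
    static Map<String, String> compact(String[][] log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            latest.remove(record[0]);         // drop the older value for this key
            latest.put(record[0], record[1]); // re-insert at the latest position
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] log = {{"k1", "a"}, {"k2", "b"}, {"k1", "c"}};
        System.out.println(compact(log)); // prints {k2=b, k1=c}
    }
}
```

Time-based and size-based retention, in contrast, discard old records regardless of their key.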
Stream Data Integration: Kafka Connect
• Declarative-style data flows
• Framework is part of Kafka
• Many connectors available
• Single Message Transforms (SMT)
curl -X "POST" "http://192.168.69.138:8083/connectors" \
     -H "Content-Type: application/json" -d $'{
  "name": "mqtt-source",
  "config": {
    "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
    "tasks.max": "1",
    "mqtt.server.uri": "tcp://mosquitto:1883",
    "mqtt.topics": "truck/+/position",
    "kafka.topic": "truck_position" }
}'
Stream Data Integration: StreamSets
• GUI-based, drag-and-drop Data Flow Pipelines
• Both stream and batch processing
• Special option for Edge computing
• Custom sources, sinks, processors
• Monitoring and Error Detection
Stream Analytics: Kafka Streams
• Programmatic API, "just" a Java library
• Native streaming
• Fault-tolerant local state
• Fixed, Sliding and Session Windowing
• Stream-Stream / Stream-Table Joins
• At-least-once and exactly-once
KTable<Integer, Customer> customers = builder.table("customer");
KStream<Integer, Order> orders = builder.stream("order");
KStream<Integer, String> joined = orders.leftJoin(customers, …);
joined.to("orderEnriched");
[Figure: a Java application embedding the Kafka Streams library reads the trucking_driver topic from a Kafka broker]
Stream Analytics: KSQL
• Stream processing with zero coding using a SQL-like language
• Part of Confluent Platform (community edition)
• Built on top of Kafka Streams
• Interactive (CLI) and headless (command file)
CREATE STREAM customer_s WITH (kafka_topic='customer', value_format='AVRO');
SELECT * FROM customer_s WHERE address->country = 'Switzerland';
...
[Figure: KSQL CLI commands are sent to the KSQL engine, which runs on Kafka Streams and reads the trucking_driver topic from a Kafka broker]
Stream Analytics: Spark Structured Streaming
• 2nd generation of Spark Streaming, using DataFrame instead of RDD
• Programmatic API
• Code reuse between batch and streaming
• Supports Java, Scala, Python, R and SQL
val orderDf = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker-1:9092")
  .option("subscribe", "order")
  .load()
// assumes the Kafka message value has already been parsed into columns such as address
val orderFilteredDf = orderDf.where("address.country = 'Switzerland'")
Demo
Sample Use Case
[Figure: position and driving info is published via MQTT on truck/nn/position; mqtt-to-kafka moves it to the Kafka topic truck_position (stream); detect_dangerous_driving writes to dangerous_driving (stream); count_by_eventType aggregates into dangerous_driving_count (table)]
{"timestamp":1537343400827,"truckId":87,"driverId":13,"routeId":987179512,"eventType":"Normal","latitude":38.65,"longitude":-90.21,"correlationId":"-3208700263746910537"}
Source: https://github.com/gschmutz/iot-truck-demo
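The logic of the two processing steps in this use case (keep everything that is not "Normal", then count by event type) can be sketched in plain Java. This is not the code of the demo repository, which runs on Kafka; the class `DangerousDrivingCount` is hypothetical, and so are the event type names other than "Normal".

```java
import java.util.Map;
import java.util.TreeMap;

class DangerousDrivingCount {
    // Filters out "Normal" events and counts the remaining ones by event type.
    static Map<String, Integer> countByEventType(String[] eventTypes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String type : eventTypes) {
            if (!"Normal".equals(type)) {
                counts.merge(type, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "Overspeed" and "Lane Departure" are made-up event types for illustration.
        String[] eventTypes = {"Normal", "Overspeed", "Normal", "Lane Departure", "Overspeed"};
        System.out.println(countByEventType(eventTypes));
        // prints {Lane Departure=1, Overspeed=2}
    }
}
```

The filter corresponds to the stream-to-stream step and the count to the stream-to-table step: the table holds one continuously updated row per event type.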
Summary
• Stream processing is the solution when low latency is required
• Event Hub, Stream Data Integration and Stream Analytics are the main building blocks in your architecture
• Kafka is currently the de facto standard for the Event Hub
• Various options exist for Stream Data Integration and Stream Analytics
• SQL is becoming a valid option for implementing Stream Analytics
Technology on its own won't help you.
You need to know how to use it properly.

More Related Content

What's hot

Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
confluent
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
datamantra
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
Vadim Y. Bichutskiy
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
Edureka!
 
Spark
SparkSpark
Kafka Retry and DLQ
Kafka Retry and DLQKafka Retry and DLQ
Kafka Retry and DLQ
George Teo
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb Sharding
Araf Karsh Hamid
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
DataStax
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
Databricks
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
Databricks
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
HostedbyConfluent
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQ
Araf Karsh Hamid
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
Kai Wähner
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
DataWorks Summit
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Databricks
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta Lake
Knoldus Inc.
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
Knoldus Inc.
 
kafka
kafkakafka

What's hot (20)

Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
 
Spark
SparkSpark
Spark
 
Kafka Retry and DLQ
Kafka Retry and DLQKafka Retry and DLQ
Kafka Retry and DLQ
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb Sharding
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQ
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta Lake
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
kafka
kafkakafka
kafka
 

Similar to Stream Processing – Concepts and Frameworks

Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
Guido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
Guido Schmutz
 
Oracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingOracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream Processing
Guido Schmutz
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
Guido Schmutz
 
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
Sriskandarajah Suhothayan
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data Lösungen
Guido Schmutz
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
Guido Schmutz
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
WSO2
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
Amazon Web Services
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
Amazon Web Services
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI Architecture
Guido Schmutz
 
Big Data - in the cloud or rather on-premises?
Big Data - in the cloud or rather on-premises?Big Data - in the cloud or rather on-premises?
Big Data - in the cloud or rather on-premises?
Guido Schmutz
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Streamsets Inc.
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 

Similar to Stream Processing – Concepts and Frameworks (20)

Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Stream Processing – Concepts and Frameworks

  • 1. gschmutz Stream Processing – Concepts and Frameworks JEE Conf 2019 Guido Schmutz (guido.schmutz@trivadis.com) gschmutz http://guidoschmutz.wordpress.com
  • 2. Agenda
      1. Motivation for Stream Processing?
      2. Capabilities for Stream Processing
      3. Implementing Stream Processing Solutions
      4. Demo
      5. Summary
  • 3. Guido Schmutz
      • Working at Trivadis for more than 22 years
      • Oracle Groundbreaker Ambassador & Oracle ACE Director
      • Consultant, Trainer, Software Architect for Java, AWS, Azure, Oracle Cloud, SOA and Big Data / Fast Data
      • Platform Architect & Head of Trivadis Architecture Board
      • More than 30 years of software development experience
      • Contact: guido.schmutz@trivadis.com
      • Blog: http://guidoschmutz.wordpress.com
      • Slideshare: http://www.slideshare.net/gschmutz
      • Twitter: gschmutz
      155th edition
  • 4. Motivation for Stream Processing?
  • 5. Big Data solves Volume and Variety – not Velocity [architecture diagram; labels: Bulk Source (DB Extract, File), File Import / SQL Import, Big Data Platform (Hadoop Cluster: Parallel Processing, Storage Raw / Refined, Results), SQL, Search Service, Export, BI Tools, Enterprise Data Warehouse, Search / Explore, Enterprise Apps (Logic, API), high latency]
  • 6. Big Data solves Volume and Variety – not Velocity [same architecture diagram as slide 5, with added labels: Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social), Event Stream]
  • 7. Big Data solves Volume and Variety – not Velocity [same architecture diagram as slide 6, with added labels: Event Hub; Parallel Processing: Machine Learning, Graph Algorithms, Natural Language Processing]
  • 8. "Data at Rest" vs. "Data in Motion" [diagram contrasting the two; labels for Data at Rest: Store, Analyze, Visualize, Act; labels for Data in Motion: Analyze, Visualize, (Re)Act, Store]
  • 9. Stream Processing Architecture solves Velocity [architecture diagram; labels: Bulk Source (DB Extract, File), Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social), Event Stream, Event Hub, Stream Analytics Platform (Hadoop Cluster: Stream Analytics, Reference / Models, Results), Dashboard, SQL, Search Service, BI Tools, Enterprise Data Warehouse, Search / Explore, Enterprise Apps (Logic, API); annotation: low(est) latency, no history]
  • 10. Big Data for all historical data analysis [architecture diagram as on slide 9, with added labels: Big Data Platform (Hadoop Cluster: Parallel Processing, Storage Raw / Refined, Results), Data Flow, File Import / SQL Import]
  • 11. Integrate existing systems with lower latency through CDC [architecture diagram as on slide 10, with added label: Change Data Capture]
  • 12. New systems participate in event-oriented fashion [architecture diagram as on slide 11, with added labels: Microservice Platform (Microservice, State, API), Stream Analytics Platform (Stream Processor, State, API), SQL Export]
  • 13. Edge computing allows processing close to data sources [architecture diagram as on slide 12, with added labels: Edge Node (Rules, Event Hub, Storage)]
  • 14. Unified Architecture for Modern Data Analytics Solutions [architecture diagram; labels: Bulk Source (DB Extract, File), Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social), Edge Node (Rules, Event Hub, Storage), Event Stream, Data Flow, Change Data Capture, File Import / SQL Import, Event Hub, Big Data (Hadoop Cluster: Parallel Processing, Storage Raw / Refined, Results), Stream Analytics (Stream Processor, State, API), Microservices (Microservice, State, API), SQL, Search Service, SQL Export, BI Tools, Enterprise Data Warehouse, Search / Explore, Enterprise Apps (Logic, API)]
  • 15. Two Types of Stream Processing (by Gartner)
      Stream Data Integration
      • focuses on the ingestion and processing of data sources, targeting real-time extract-transform-load (ETL) and data integration use cases
      • filter and enrich the data
      Stream Analytics
      • targets analytics use cases
      • calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events)
      • complex events may signify threats or opportunities that require a response from the business
      Source: Gartner, Market Guide for Event Stream Processing, Nick Heudecker, W. Roy Schulte
  • 16. Stream Processing & Analytics Ecosystem [product landscape figure; groups: Stream Analytics, Event Hub, Stream Data Integration, each split into Open Source and Closed Source. Source: adapted from Tibco Edge]
  • 17. Important Capabilities for Stream Processing
  • 18. Capabilities: Stream Data Integration vs. Stream Analytics
      Capability | Stream Data Integration | Stream Analytics
      Support for Various Data Sources | yes | -
      Streaming ETL (Transformation/Format Translation, Routing, Validation) | yes | partial
      Execution Mode: Native Streaming | yes | yes
      Execution Mode: Non-Native Streaming - Micro-Batching | yes | partial
      Delivery Guarantees | yes | yes
      API: GUI-Based API / Declarative API / Programmatic | yes | yes
      API: Streaming SQL | - | yes
      Event Time vs. Ingestion / Processing Time | - | yes
      Windowing | - | yes
      Stream-to-Static Joins (Lookup/Enrichment) | partial | yes
      Stream-to-Stream Joins | - | yes
      State Management | - | yes
      Queryable State (aka Interactive Queries) | - | yes
      Event Pattern Detection | - | yes
  • 19. Integrating Data Sources
      • Sensor Stream
      • SQL Polling
      • Change Data Capture (CDC)
      • File Polling
      • File Stream (File Tailing)
      • File Stream (Appender)
  • 20. Streaming ETL
      • Streaming Extract – Transform – Load
      • Flow-based "programming"
      • High-throughput, straight-through data flows
      • Visual coding with flow editor
      • Stream Data Integration but not Stream Analytics
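A minimal sketch of the streaming ETL steps named in the capability table (format translation, validation, transformation, routing), written as a plain Python generator so that each record flows straight through. The field names `truck_id` and `speed_ms`, the route names and the 80 km/h threshold are assumptions made up for illustration; they do not come from the talk or from any product.

```python
import json

def streaming_etl(raw_lines):
    """Validate, transform and route each record as it flows through."""
    for line in raw_lines:
        try:
            record = json.loads(line)  # format translation: JSON text -> dict
        except json.JSONDecodeError:
            yield "errors", line       # unparsable input goes to an error route
            continue
        if not isinstance(record, dict) or "truck_id" not in record:
            yield "errors", line       # validation: required field missing
            continue
        # Transformation: convert the (hypothetical) speed from m/s to km/h.
        record["speed_kmh"] = round(record.pop("speed_ms", 0.0) * 3.6, 1)
        # Routing: pick a destination based on the record content.
        route = "alerts" if record["speed_kmh"] > 80 else "positions"
        yield route, record

lines = [
    '{"truck_id": 1, "speed_ms": 10}',
    "not json",
    '{"truck_id": 2, "speed_ms": 30}',
]
routed = list(streaming_etl(lines))
```

Because the function is a generator, nothing is buffered: a record is emitted on its route before the next one is read, which is the straight-through behaviour the slide describes.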
  • 21. Execution Mode: Native Streaming [diagram: Event Sources, Ingestion, individual events processed one by one]
      • Events processed as they arrive
      • Low(est) latency
      • Fault tolerance is expensive
  • 22. Execution Mode: Non-Native Streaming - Micro-Batching [diagram: Event Sources, Ingestion, events processed in small batches]
      • Splits incoming stream into small batches
      • Fault tolerance easier to achieve
      • Higher latency
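The difference between the two execution modes can be sketched in a few lines of Python. The helper below is an illustration only: it cuts batches by event count, whereas micro-batching engines typically cut them by a time interval, and the name `micro_batches` is made up for this example.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def micro_batches(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group an incoming stream into small batches of at most batch_size events."""
    batch: List[T] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch   # the whole batch is handed over at once
            batch = []
    if batch:
        yield batch       # flush the last, possibly smaller, batch

batches = list(micro_batches(range(7), batch_size=3))
```

An event that arrives first in a batch has to wait until the batch is full before it is processed, which is where the higher latency comes from. On the other hand, a failed batch can simply be recomputed as a unit, which is one reason fault tolerance is easier to achieve.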
  • 23. Delivery Guarantees
      At most once (fire-and-forget) [ 0 | 1 ]
      • message is sent, but the sender doesn't care if it's received or lost
      At least once [ 1+ ]
      • retransmission of messages can cause messages to be sent one or more times
      Exactly once [ 1 ]
      • ensures that a message is received once and only once (never lost and never repeated)
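One common way to get an exactly-once effect on top of at-least-once delivery is to make the consumer idempotent. The sketch below simulates this under stated assumptions: every message carries a unique id, and the consumer can remember the ids it has seen. The class and function names are hypothetical and do not describe how any particular framework implements its guarantees.

```python
def deliver_at_least_once(messages, duplicate_ids):
    """Simulate retransmission: messages whose id is in duplicate_ids arrive twice."""
    for msg_id, payload in messages:
        yield msg_id, payload
        if msg_id in duplicate_ids:
            yield msg_id, payload  # the retransmitted copy

class IdempotentConsumer:
    """Processes each message id once, even if it is delivered several times."""

    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def handle(self, msg_id, payload):
        if msg_id in self.seen_ids:
            return False           # duplicate delivery, ignore it
        self.seen_ids.add(msg_id)
        self.processed.append(payload)
        return True

messages = [(1, "a"), (2, "b"), (3, "c")]
received = list(deliver_at_least_once(messages, duplicate_ids={2}))
consumer = IdempotentConsumer()
for msg_id, payload in received:
    consumer.handle(msg_id, payload)
```

Four deliveries arrive for three messages, but each payload is processed once. In a real system the set of seen ids would itself have to be stored durably and bounded in size.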
  • 24. API
      GUI-based / Drag-and-Drop
      • A graphical way of designing a pipeline
      • Often web-based
      Declarative
      • A streaming engine configured declaratively
      • JSON, YAML
      Example (excerpt):
        "config": {
          "connector.class": "..MqttSourceConnector",
          "tasks.max": "1",
          "mqtt.server.uri": "tcp://mosquitto-1:1883",
          "mqtt.topics": "truck/+/position",
          "kafka.topic": "truck_position",
          ...
  • 25. API (II)
      Programmatic
      • Low-level (class) or high-level fluent API
      • Higher-order functions as operators (filter, mapWithState, …)
      Example:
        val filteredDf = truckPosDf.where("eventType != 'Normal'")
      Streaming SQL
      • Use a stream in the FROM clause
      • Extensions support windowing, pattern matching, spatial, …
      Example:
        SELECT * FROM truck_position_s WHERE eventType != 'Normal'
  • 26. Event Time vs. Ingestion / Processing Time
      Event time
      • time at which events actually occurred
      Ingestion time / Processing time
      • time at which events are ingested into / processed by the system
      Not all use cases care about event times, but lots do!
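A small sketch of why the distinction matters: the same four events are counted per minute, once by event time and once by processing time. The timestamps (in seconds) are invented for illustration; the last event is a late arrival that occurred in minute 0 but was processed in minute 1.

```python
from collections import Counter

# (event_time_s, processing_time_s) pairs; values are made up for this example.
events = [(5, 6), (58, 61), (62, 63), (59, 65)]

def per_minute(timestamps):
    """Count how many timestamps fall into each one-minute bucket."""
    return dict(Counter(ts // 60 for ts in timestamps))

by_event_time = per_minute(e for e, _ in events)
by_processing_time = per_minute(p for _, p in events)
```

By event time, three events belong to minute 0 and one to minute 1. By processing time it is the other way round, so an aggregate that relies on processing time reports when the system saw the data, not when things happened.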
  • 27. Windowing [diagram: Stream of Data over Time, with a Window of Data]
      • Computations over events are done using windows of data
      • It is not feasible to keep the entire stream of data in memory
      • A window represents a certain amount of data to perform computations on
  • 28. Windowing [diagrams: one timeline per window type]
      Sliding / Hopping Window
      • eviction and trigger based on window length and sliding interval length
      Fixed / Tumbling Window
      • eviction based on the window being full; trigger based on either count of items or time
      Session Window
      • sequences of temporally related events, terminated by a gap of inactivity greater than some timeout
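The three window types can be sketched as functions that assign integer timestamps to windows. This is a simplified illustration, assuming time-based windows written as half-open intervals `[start, end)` and window starts aligned to multiples of the size or advance; the function names are made up for this example.

```python
def tumbling_window(ts, size):
    """Return the single fixed window [start, end) that contains ts."""
    start = ts - ts % size
    return (start, start + size)

def hopping_windows(ts, size, advance):
    """Return every window of length size, starting at a multiple of advance, that contains ts."""
    windows = []
    start = ts - ts % advance          # latest window start that is <= ts
    while start > ts - size and start >= 0:
        windows.append((start, start + size))
        start -= advance
    return sorted(windows)

def session_windows(timestamps, gap):
    """Group timestamps into sessions that end after more than gap of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)    # still within the current session
        else:
            sessions.append([ts])      # inactivity gap exceeded, start a new session
    return sessions

fixed = tumbling_window(7, size=5)
hops = hopping_windows(7, size=10, advance=5)
sessions = session_windows([1, 2, 3, 10, 11, 30], gap=5)
```

An event belongs to exactly one tumbling window but, when the advance is smaller than the size, to several hopping windows. Session windows have no fixed length at all; their boundaries depend only on the data.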
  • 29. Joining – Stream-to-Static [diagram: Stream-to-Static (Table) Join over Time]
      Challenges of joining streams
      • Data streams need to be aligned because of their different timestamps
      • Joins must be limited; otherwise they will never end
      • A join needs to produce results continuously
      • There is no end to the data
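A stream-to-static join is essentially a lookup per event. The sketch below enriches each event from an in-memory table with left-join behaviour, so events without a match are kept. The table contents and the field names `driver_id` and `driver_name` are assumptions for illustration.

```python
# Hypothetical static reference data, keyed by driver id.
drivers = {11: "Alice", 12: "Bob"}

def enrich(stream, table):
    """Left-join each event against the static table; unmatched events get None."""
    for event in stream:
        yield {**event, "driver_name": table.get(event["driver_id"])}

positions = [
    {"driver_id": 11, "lat": 47.4},
    {"driver_id": 99, "lat": 46.9},
]
enriched = list(enrich(positions, drivers))
```

Because the table side is bounded, this join needs no window: each event produces its result immediately, which satisfies the requirement that a join produce results continuously.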
  • 30. Joining – Stream-to-Stream [diagrams over Time: Stream-to-Stream Join (one window join); Stream-to-Stream Join (two window join)]
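A stream-to-stream join has to be limited by a window, since neither side ever ends. The following sketch joins two streams of `(timestamp, key, value)` events when the keys match and the timestamps differ by at most `window`. It is a simplified model, assuming both streams can be merged in timestamp order; the event contents are invented.

```python
from collections import defaultdict

def windowed_join(left, right, window):
    """Join (ts, key, value) events from two streams on key when |ts_l - ts_r| <= window."""
    # Merge both streams in timestamp order, tagging each event with its side.
    merged = sorted(
        [(ts, key, val, "L") for ts, key, val in left]
        + [(ts, key, val, "R") for ts, key, val in right]
    )
    buffers = {"L": defaultdict(list), "R": defaultdict(list)}
    results = []
    for ts, key, val, side in merged:
        other = "R" if side == "L" else "L"
        # Evict buffered events of the other side that are too old to match again.
        buffers[other][key] = [(t, v) for t, v in buffers[other][key] if ts - t <= window]
        for _, other_val in buffers[other][key]:
            pair = (val, other_val) if side == "L" else (other_val, val)
            results.append((key, *pair))
        buffers[side][key].append((ts, val))
    return results

orders = [(1, "o1", "order-created"), (20, "o2", "order-created")]
payments = [(3, "o1", "paid"), (40, "o2", "paid")]
joined = windowed_join(orders, payments, window=5)
```

Order `o1` is paid within the window and produces a result; the payment for `o2` arrives 20 time units later and finds nothing to join with, because the buffered order has already been evicted. The eviction step is what keeps the join state bounded.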
  • 31. State Management
      • Needed if the use case depends on previously seen data
      • Windowing, Joining and Pattern Detection use State Management behind the scenes
      • State needs to be as close to the stream processor as possible
      • How does it handle failures?
      Options for State Management [diagram; options placed along an axis "Operational Complexity and Features" from low to high]
      • In-Memory
      • Replicated, Distributed Store
      • Local, Embedded Store
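To make the failure question concrete, here is a minimal sketch of a stateful processor that keeps a running count per key in memory and can write and restore a snapshot. It only illustrates the idea of checkpointing state; the class name, the JSON snapshot format and the truck keys are assumptions, and real engines use more elaborate mechanisms.

```python
import json

class CountingProcessor:
    """Keeps a running count per key; the counts are the processor's state."""

    def __init__(self, state=None):
        self.counts = dict(state or {})

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def checkpoint(self):
        """Serialize the state so it can be stored outside the process."""
        return json.dumps(self.counts)

    @classmethod
    def restore(cls, snapshot):
        """Rebuild a processor from a previously written snapshot."""
        return cls(json.loads(snapshot))

processor = CountingProcessor()
for key in ["truck-1", "truck-2", "truck-1"]:
    processor.process(key)
snapshot = processor.checkpoint()

# Simulate a crash: the original processor is gone, a new one resumes from the snapshot.
recovered = CountingProcessor.restore(snapshot)
after_recovery = recovered.process("truck-1")
```

Without the snapshot, the recovered processor would start counting at 1 again. Events processed between the last snapshot and the crash would still need to be replayed, which is where the delivery guarantees come back in.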
  • 32. Queryable State (aka Interactive Queries) [diagram; labels: Stream Processing Infrastructure, Stream Analytics (Stream Processor, State, Query API), Reference Data, Model, Dashboard, Search / Explore, Online & Mobile Apps]
      • Exposes state managed by the Stream Analytics solution
      • Allows an application to query the managed state, e.g. to visualize it
      • Can eliminate the need for an external database to keep results
  • 33. Event Pattern Detection
      • Streaming data often contains interesting patterns
      • Special operators allow finding complex relationships between events
      • Absence Pattern - event A not followed by event B within a time window
      • Sequence Pattern - event A followed by event B followed by event C
      • Increasing Pattern - upward trend in the value of a certain attribute
      • Decreasing Pattern - downward trend in the value of a certain attribute
      • …
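The absence pattern is the least obvious of these, because it has to report something that did not happen. A minimal sketch, assuming events are `(timestamp, key, kind)` tuples and that time only advances when a new event arrives (a real engine would use timers instead). The event kinds `engine_start` and `seatbelt_on` are invented for illustration.

```python
def detect_absence(events, first, second, within):
    """Find `first` events not followed by `second` for the same key within `within`.

    Returns (alerts, pending): alerts are (key, ts) pairs whose deadline has passed,
    pending holds `first` events whose deadline was not reached by the end of the input.
    """
    alerts = []
    pending = {}  # key -> timestamp of an unmatched `first` event
    for ts, key, kind in sorted(events):
        # Every pending event whose deadline has passed is an absence match.
        for k, t in list(pending.items()):
            if ts - t > within:
                alerts.append((k, t))
                del pending[k]
        if kind == first:
            pending[key] = ts
        elif kind == second:
            pending.pop(key, None)  # the expected follow-up arrived in time
    return alerts, pending

events = [
    (0, "t1", "engine_start"),
    (2, "t1", "seatbelt_on"),
    (10, "t2", "engine_start"),
    (30, "t3", "engine_start"),
]
alerts, pending = detect_absence(events, "engine_start", "seatbelt_on", within=5)
```

Truck `t1` matches the follow-up in time, `t2` triggers an alert once a later event shows that its deadline has passed, and `t3` is still undecided when the input ends. The `pending` dictionary is exactly the kind of state that pattern detection keeps behind the scenes.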
  • 34. Capabilities: Stream Data Integration vs. Stream Analytics [same comparison table as slide 18]
  • 35. Implementing Stream Processing Solutions
  • 36. Stream Processing & Analytics Ecosystem [same product landscape figure as slide 16. Source: adapted from Tibco Edge]
  • 37. Event Hub: Apache Kafka [diagram; labels: Producers, Kafka Cluster (Broker 1, Broker 2, Broker 3), Consumers, Zookeeper Ensemble (ZK 1, ZK 2, ZK 3), Schema Registry (Service 1), Management (Control Center, Kafka Manager, KAdmin), kafkacat]
      Data Retention
      • Never
      • Time (TTL) or size-based
      • Log-compaction-based
  • 38. Stream Data Integration: Kafka Connect
      • Declarative-style data flows
      • Framework is part of Kafka
      • Many connectors available
      • Single Message Transforms (SMT)
      Example:
        curl -X "POST" "http://192.168.69.138:8083/connectors" \
             -H "Content-Type: application/json" \
             -d $'{
          "name": "mqtt-source",
          "config": {
            "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
            "tasks.max": "1",
            "mqtt.server.uri": "tcp://mosquitto:1883",
            "mqtt.topics": "truck/+/position",
            "kafka.topic": "truck_position"
          }
        }'
  • 39. Stream Data Integration: StreamSets
      • GUI-based, drag-and-drop Data Flow Pipelines
      • Both stream and batch processing
      • Special option for Edge computing
      • Custom sources, sinks, processors
      • Monitoring and Error Detection
  • 40. Stream Analytics: Kafka Streams
      • Programmatic API, "just" a Java library
      • Native streaming
      • Fault-tolerant local state
      • Fixed, Sliding and Session Windowing
      • Stream-Stream / Stream-Table Joins
      • At-least-once and exactly-once processing guarantees

      StreamsBuilder builder = new StreamsBuilder();
      // "customer" is read as a table (latest value per key), "order" as a stream
      KTable<Integer, Customer> customers = builder.table("customer");
      KStream<Integer, Order> orders = builder.stream("order");
      // enrich each order with the matching customer, if there is one
      KStream<Integer, String> joined = orders.leftJoin(customers, …);
      joined.to("orderEnriched");

      [Diagram: Kafka Broker (topic trucking_driver) connected to a Java Application that embeds Kafka Streams]
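The semantics of the stream-table left join on this slide can be sketched without Kafka. In this simplified model the table side only updates the local state, while each stream record triggers exactly one output that is joined against the table as it looks at that moment; a missing match yields `None` on the table side. The function name, the event layout and the joiner are assumptions for illustration, not the Kafka Streams API.

```python
def left_join_stream_table(events, joiner):
    """Yield (key, joined_value) for every stream record.

    `events` is an iterable of (source, key, value) tuples in arrival order,
    where source is "table" (changelog update) or "stream" (event).
    """
    table = {}
    for source, key, value in events:
        if source == "table":
            if value is None:
                table.pop(key, None)  # tombstone: remove the key
            else:
                table[key] = value    # upsert the latest value
        else:
            # Left join: emit even when the table has no entry for the key.
            yield key, joiner(value, table.get(key))


# Hypothetical interleaving of customer updates and orders.
events = [
    ("table", 1, "Alice"),
    ("stream", 1, "order-100"),
    ("stream", 2, "order-101"),  # customer 2 not known yet
    ("table", 2, "Bob"),
    ("stream", 2, "order-102"),
]
enriched = list(
    left_join_stream_table(
        events, lambda order, customer: f"{order} for {customer or 'unknown'}"
    )
)
```

Note that the later arrival of customer 2 does not re-emit `order-101`: in this model, as in a stream-table join, only the stream side produces output.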
  • 41. Stream Analytics: KSQL
      • Stream Processing with zero coding using a SQL-like language
      • part of Confluent Platform (community edition)
      • built on top of Kafka Streams
      • interactive (CLI) and headless (command file)

      CREATE STREAM customer_s
        WITH (kafka_topic='customer', value_format='AVRO');

      SELECT * FROM customer_s
        WHERE address->country = 'Switzerland';
      ...

      [Diagram: Kafka Broker (topic trucking_driver) connected to the KSQL Engine (built on Kafka Streams), which receives commands from the KSQL CLI]
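What distinguishes the `SELECT ... WHERE` above from a database query is that it runs continuously over an unbounded stream. A rough analogy, under the assumption that records are plain dictionaries with a nested `address` field, is a lazy generator that evaluates the predicate as each record arrives. The names `select_where` and `customer_stream` are hypothetical.

```python
def select_where(stream, predicate):
    """Lazily pass through the records of `stream` that satisfy `predicate`."""
    for record in stream:
        if predicate(record):
            yield record


def customer_stream():
    """Hypothetical stand-in for the customer topic."""
    yield {"id": 1, "address": {"country": "Switzerland"}}
    yield {"id": 2, "address": {"country": "Germany"}}
    yield {"id": 3, "address": {"country": "Switzerland"}}


swiss_ids = [
    customer["id"]
    for customer in select_where(
        customer_stream(), lambda c: c["address"]["country"] == "Switzerland"
    )
]
```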
  • 42. Stream Analytics: Spark Structured Streaming
      • 2nd generation of Spark Streaming, using DataFrame instead of RDD
      • Programmatic API
      • Code reuse between batch and streaming
      • Supports Java, Scala, Python, R and SQL

      val orderDf = spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")
        .option("subscribe", "order")
        .load()
      // assumes the Kafka value has already been parsed into columns, including address
      val orderFilteredDf = orderDf.where("address.country = 'Switzerland'")
  • 43. Demo
  • 45. Summary
  • 46. Summary
      • Stream Processing is the solution for low latency
      • Event Hub, Stream Data Integration and Stream Analytics are the main building blocks in your architecture
      • Kafka is currently the de facto standard for the Event Hub
      • Various options exist for Stream Data Integration and Stream Analytics
      • SQL is becoming a valid option for implementing Stream Analytics
  • 47. Technology on its own won't help you. You need to know how to use it properly.