Introduction to Stream Processing
Guido Schmutz
Java Zone 2018 – 13.9.2018
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 21 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Agenda
1. Motivation for Stream Processing?
2. Capabilities for Stream Processing
3. Implementing Stream Processing Solutions
4. Summary
Motivation for Stream Processing?
Why not use Traditional BI Infrastructures?
[Diagram: traditional BI architecture. Bulk sources (DB, files) are extracted via ETL / stored procedures into the Enterprise Data Warehouse; BI tools, search / explore and enterprise apps (logic, { } API) consume from it. High latency.]
Big Data solves Volume and Variety – not Velocity

[Diagram: batch Big Data architecture. Bulk sources (DB extract, files) arrive via file import / SQL import in the Big Data platform (Hadoop cluster: raw and refined storage, parallel processing); results feed BI tools, the Enterprise Data Warehouse, SQL, search / explore and enterprise apps. Still high latency.]
Big Data solves Volume and Variety – not Velocity

[Same diagram, now with event sources added: location, telemetry, IoT data, mobile apps, social. The batch architecture has no low-latency path for them.]
Big Data solves Volume and Variety – not Velocity

[Final build-up of the diagram: event streams from the event sources flow through Event Hubs into the Big Data platform, which adds machine learning, graph algorithms and natural language processing. Analysis still happens at high latency.]
"Data at Rest" vs. "Data in Motion"
Data at Rest: Store, then Analyze, then Act; analysis happens after the data has landed.
Data in Motion: Analyze and Act while the data flows, and Store afterwards (if at all).
Typical Stream Processing Use Cases
• Notifications and Alerting - a notification or alert should be triggered if
some sort of event or series of events occurs.
• Real-Time Reporting – run real-time dashboards that
employees/customers can look at
• Incremental ETL – still ETL, but in continuous, streaming mode rather than batch
• Update data to serve in real-time – compute data that gets served
interactively by other applications
• Real-Time decision making – analyzing new inputs and responding to
them automatically using business logic, e.g. Fraud Detection
• Online Machine Learning – train a model on a combination of historical
and streaming data and use it for real-time decision making
When to Stream / When not?
Real-Time: constant low latency, milliseconds and under
Near-Real-Time: low milliseconds to seconds; delay in case of failures
Batch: tens of seconds or more; re-run in case of failures
Source: adapted from Cloudera
"No free lunch"
Moving toward real-time buys lower latency at the price of "difficult" architectures; moving toward batch gives "easier" architectures at the price of higher latency.
Stream Processing Architecture solves Velocity

[Diagram: event sources (location, telemetry, IoT data, mobile apps, social) and bulk sources (DB extract, files) feed Event Hubs; event streams flow into a Stream Analytics platform (stream analytics, reference data / models) that serves dashboards, BI tools, search / explore, the Enterprise Data Warehouse and enterprise apps.]

Low(est) latency, but no history.
Big Data for all historical data analysis

[Diagram: the Stream Analytics platform keeps the low-latency path, while the Event Hub also routes event streams via data flow into the Big Data platform (raw / refined storage, parallel processing) for all historical analysis; bulk sources still arrive via file import / SQL import.]
Integrate existing systems through CDC

[Diagram: a traditional silo-based system (user interface, logic, data store) is tapped by a CDC connector, which publishes change events to the Event Hub; consuming systems (integration, logic, state) subscribe to the resulting event streams.]

• Capture changes directly on the database
• Change Data Capture (CDC) => think of it like a global database trigger
• Transforms existing systems into event producers
Integrate existing systems with lower latency through CDC

[Diagram: CDC event streams from existing systems flow through the Event Hub into both the Stream Analytics platform and, via data flow, the Big Data platform, next to the other event and bulk sources.]
New systems participate in an event-oriented fashion

[Diagram: microservices (state, { } API) and stream processors now both consume and produce event streams via the Event Hub, alongside the Big Data platform, BI tools, SQL / search, the Enterprise Data Warehouse and enterprise apps.]
Edge computing allows processing close to data sources

[Diagram: an edge node (rules, local event hub, storage) pre-processes events close to the IoT and device sources and forwards event streams to the central Event Hub and the rest of the platform.]
Unified Architecture for Modern Data Analytics Solutions

[Diagram: the complete picture. Edge nodes, event sources and bulk sources feed the Event Hub; stream analytics, microservices and the Big Data platform consume and produce event streams; BI tools, SQL / search, the Enterprise Data Warehouse and enterprise apps sit on top.]
Two Types of Stream Processing
(from Gartner)
Stream Data Integration
• primarily focuses on the ingestion and processing of data sources, targeting real-time extract-transform-load (ETL) and data integration use cases
• filters and enriches the data
• optionally calculates time-windowed aggregations before storing the results in a database or file system

Stream Analytics
• targets analytics use cases
• calculates aggregates and detects patterns to generate higher-level, more relevant summary information (complex events)
• complex events may signify threats or opportunities that require a response from the business through real-time dashboards, alerts or decision automation
Important Capabilities for Stream Processing
Streaming Model: Event-at-a-time vs. Micro Batch
Event-at-a-time Processing
• Events processed as they
arrive
• low-latency
• fault tolerance expensive
Micro-Batch Processing
• Splits incoming stream in
small batches
• Fault tolerance easier
• Better throughput
Fault Tolerance: Delivery Guarantees
• At most once (fire-and-forget) [0 or 1 deliveries] - the message is sent, but the
sender does not care whether it is received or lost. Imposes no additional overhead
to ensure message delivery, such as requiring acknowledgments from consumers;
hence it is the easiest and most performant behavior to support.
• At least once [1 or more deliveries] - retransmission of a message occurs until an
acknowledgment is received. Since a delayed acknowledgment from the receiver
could be in flight while the sender retransmits the message, the message may be
received one or more times.
• Exactly once [exactly 1 delivery] - ensures that a message is received once and
only once, and is never lost and never repeated. The system must implement
whatever mechanisms are required to ensure that a message is received and
processed just once.
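A minimal sketch of how these guarantees map to producer settings, using the Kafka producer API as an example (broker address and serializers are assumptions):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

Properties props = new Properties();
props.put("bootstrap.servers", "broker-1:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// at most once: no acknowledgments, no retries (fire-and-forget)
props.put("acks", "0");
props.put("retries", "0");

// at least once: wait for all replicas and retry on failure (duplicates possible)
// props.put("acks", "all");
// props.put("retries", Integer.toString(Integer.MAX_VALUE));

// exactly once (within Kafka): idempotent producer, optionally transactions
// props.put("enable.idempotence", "true");
// props.put("transactional.id", "truck-position-tx");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);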
API

Compositional / Programmatic
• highly customizable operators based on basic building blocks
• manual topology definition and optimization

Declarative
• high-level, fluent API
• higher-order functions as operators (filter, mapWithState, ...)
• logical plan optimization

SQL
• query language allowing a stream in the FROM clause
• extensions supporting windowing, pattern matching, spatial operators, ...
• e.g. SELECT * FROM <stream> WHERE f1 = 10
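As an illustration of the declarative style, a fluent filter in Kafka Streams expressing the same query as the SQL example above (topic names and the integer value type are assumptions):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, Integer> stream = builder.stream("events");

// declarative, higher-order operator: keep only records where the value is 10
stream.filter((key, value) -> value == 10)
      .to("filtered-events");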
Windowing
Computations over events are done using windows of data. Due to the size and
never-ending nature of a stream, it is not feasible to keep the entire stream of data
in memory. A window represents a bounded amount of data on which we can
perform computations; windows give us the power to keep a working memory and
to look back at recent data efficiently.
Sliding Window (aka Hopping Window) - uses eviction and trigger policies that are
based on time: window length and sliding interval length
Fixed Window (aka Tumbling Window) - eviction policy always based on the window
being full, and trigger policy based on either the count of items in the window or time
Session Window - composed of sequences of temporally related events terminated by
a gap of inactivity greater than some timeout
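A sketch of the three window types in the Kafka Streams DSL (topic name and window sizes are assumptions; durations are in milliseconds to match the API generation used later in this talk):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> events = builder.stream("truck_position");

// fixed (tumbling) window: 1 minute
KTable<Windowed<String>, Long> counts = events
    .groupByKey()
    .windowedBy(TimeWindows.of(60_000))
    .count();

// sliding (hopping) window: 1 minute length, advancing every 10 seconds
// .windowedBy(TimeWindows.of(60_000).advanceBy(10_000))

// session window: a 5 minute gap of inactivity ends a session
// .windowedBy(SessionWindows.with(300_000))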
Joining
Challenges of joining streams:
1. Data streams need to be aligned as they come, because their events carry different timestamps
2. Since streams are never-ending, the joins must be limited; otherwise the join will never end
3. A join needs to produce results continuously, as there is no end to the data
Stream to Static (Table) Join
Stream to Stream Join (one window join)
Stream to Stream Join (two window join)
[Diagram: timelines illustrating the stream-to-static join and the one- and two-window stream-to-stream joins.]
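A sketch of a windowed stream-to-stream join in the Kafka Streams DSL; the two input streams, their value types and the 5 minute join window are assumptions:

import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

// positions and speeds are two assumed KStream<String, String> keyed by truck id
KStream<String, String> joined = positions.join(
    speeds,
    (position, speed) -> position + "," + speed, // combine the two matching events
    JoinWindows.of(300_000));                    // only join events at most 5 minutes apart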
Pattern Detection
• Streaming Data often contains interesting patterns that only emerge as new
streaming data arrives, e.g.
• Absence Pattern: event A not followed by event B within time window
• Sequence Pattern: event A followed by event B followed by event C
• Increasing Pattern: up trend of a value of a certain attribute
• Decreasing Pattern: down trend of a value of a certain attribute
• …
• Pattern operators allow developers to define complex relationships between
streaming events
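Pattern operators are engine-specific; as one illustration, a sequence pattern ("Overspeed followed by Lane Departure within 10 seconds") expressed with Flink CEP, where the TruckEvent type and the truckEvents input stream are assumptions:

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

Pattern<TruckEvent, ?> sequence = Pattern.<TruckEvent>begin("first")
    .where(new SimpleCondition<TruckEvent>() {
        @Override
        public boolean filter(TruckEvent e) { return "Overspeed".equals(e.eventType); }
    })
    .followedBy("second")
    .where(new SimpleCondition<TruckEvent>() {
        @Override
        public boolean filter(TruckEvent e) { return "Lane Departure".equals(e.eventType); }
    })
    .within(Time.seconds(10));

// truckEvents is an assumed DataStream<TruckEvent>
PatternStream<TruckEvent> matches = CEP.pattern(truckEvents, sequence);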
State Management
Necessary whenever a stream processing use case depends on previously seen data
or on external data. Windowing, joining and pattern detection all use state
management behind the scenes, and state management services can be made
available for custom state handling logic. State needs to be managed as close to the
stream processor as possible.

Options for state management, in order of increasing operational complexity and
features: in-memory, local embedded store, replicated distributed store. The key
question for each option: how does it handle failures if a machine crashes and some
or all of the state is lost?
Queryable State (aka Interactive Queries)

Exposes the state managed by the Stream Analytics solution to the outside world,
allowing an application to query the managed state, e.g. to visualize it. For some
scenarios, Queryable State can eliminate the need for an external database to keep
results.
[Diagram: a stream processor in the stream processing cluster keeps state (reference
data, models) and exposes it through a Query API to dashboards, search / explore
and online & mobile apps.]
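A minimal sketch of Kafka Streams "Interactive Queries": reading a value out of a local state store by key. The store name matches the aggregation example later in this talk; the running KafkaStreams instance and the key/value types are assumptions:

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// streams is an assumed, already-started KafkaStreams instance
ReadOnlyKeyValueStore<String, Long> store =
    streams.store("store", QueryableStoreTypes.keyValueStore());

Long count = store.get("truck-42"); // serve this via a REST endpoint or a dashboard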
Event Time vs. Ingestion Time vs. Processing Time
Event time
• the time at which events actually occurred
Ingestion time / Processing Time
• the time at which events are ingested /
observed in the system
Not all use cases care about event times but many do
Examples
• characterizing user behavior over time
• most billing applications
• anomaly detection
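A sketch of opting into event time in Kafka Streams via a custom TimestampExtractor; the assumption is that the CSV payload (as in the demos below) starts with the event timestamp in milliseconds:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class EventTimeExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record, long previousTimestamp) {
        // pull the event time out of the message instead of using processing time
        String csv = (String) record.value();
        return Long.parseLong(csv.split(",")[0]);
    }
}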
Capabilities: Stream Data Integration vs. Stream Analytics
Capability | Stream Data Integration | Stream Analytics
Support for various data sources | yes | (yes)
Streaming ETL (transformation, routing, validation) | yes | (yes)
Format translation | yes | (yes)
Micro-batching | yes | (yes)
Auto scaling | yes |
Backpressure support | yes | -
Processing time vs. event time | - | yes
Stream-to-static joins (lookup / enrichment) | yes | yes
Stream-to-stream joins | - | yes
Windowing | - | yes
State management | - | yes
Queryable state (aka interactive queries) | - | yes
Streaming SQL | (yes) | yes
Event pattern matching / detection | - | yes
Implementing Stream Processing Solutions
Stream Processing & Analytics Ecosystem

[Diagram: landscape of open source and closed source products across the Event Hub, Stream Data Integration, Stream Analytics and Edge categories. Source: adapted from Tibco.]
Implementing Stream Processing Solutions: "Event Hub"
Implementing "Event Hub"

[Diagram: the unified architecture with the Event Hub highlighted. It decouples event sources from Big Data batch analytics, stream analytics and modern applications, and supports replay of event streams.]
Apache Kafka – A Streaming Platform
• High-Level Architecture
• Distributed Log at the Core
• Scale-Out Architecture
• Logs do not (necessarily) forget

Hold Data for Long-Term – Data Retention
1. Never
2. Time based (TTL): log.retention.{ms | minutes | hours}
3. Size based: log.retention.bytes
4. Log compaction based (entries with the same key are removed):

kafka-topics.sh --zookeeper zk:2181 \
  --create --topic customers \
  --replication-factor 1 \
  --partitions 1 \
  --config cleanup.policy=compact
Keep Topics in Compacted Form
Before compaction:
Offset | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9   | 10  | 11
Key    | K1 | K2 | K1 | K1 | K3 | K2 | K4 | K5 | K5 | K2  | K6  | K2
Value  | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 |

After compaction:
Offset | 3  | 4  | 6  | 8  | 9   | 10
Key    | K1 | K3 | K4 | K5 | K2  | K6
Value  | V4 | V5 | V7 | V9 | V10 | V11
Demo (I) – Kafka

[Diagram: Truck-1, Truck-2 and Truck-3 publish to the truck position topic, read by a console consumer. Test data generator by Hortonworks.]

Sample message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (I) – Create Kafka Topic
$ kafka-topics --zookeeper zookeeper:2181 --create \
    --topic truck_position --partitions 8 --replication-factor 1

$ kafka-topics --zookeeper zookeeper:2181 --list
__consumer_offsets
_confluent-metrics
_schemas
docker-connect-configs
docker-connect-offsets
docker-connect-status
truck_position
Demo (I) – Run Producer and Kafka-Console-Consumer
Implementing Stream Processing Solutions: Streaming Data Sources
Integrating Streaming Data Sources

[Diagram: the unified architecture with the event sources (location, telemetry, IoT data, mobile apps, social, weather) and their event streams into the Event Hub highlighted.]
Integrating (Streaming) Data Sources
SQL Polling
Change Data Capture
(CDC)
File Polling
File Stream (File Tailing)
File Stream (Appender)
Sensor Stream
Demo (II) – devices send to MQTT instead of Kafka

[Diagram: Truck-1, Truck-2 and Truck-3 publish to the MQTT topic truck/nn/position.]

Sample message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Message Queue Telemetry Transport (MQTT)
• reduces the number of bytes flowing over the wire compared to HTTP, which reduces power and bandwidth usage
• built-in retry / QoS mechanism
• pub/sub architecture and message broker allow for flexibility in communication patterns
• simple API, requires minimal "plumbing" code
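A sketch of what the device side could look like, publishing one position message with the Eclipse Paho Java client; broker URL, client id and topic follow the demo's conventions and are assumptions (exception handling omitted):

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;

MqttClient client = new MqttClient("tcp://mosquitto:1883", "truck-10");
client.connect();

MqttMessage message = new MqttMessage(
    "1522846456703,101,31,1927624662,Normal,37.31,-94.31".getBytes());
message.setQos(0); // fire-and-forget, i.e. at-most-once delivery
client.publish("truck/10/position", message);

client.disconnect();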
Demo (II) – devices send to MQTT instead of Kafka: how to get the data into Kafka?

[Diagram: Truck-1, Truck-2 and Truck-3 publish to the MQTT topic truck/nn/position; some yet-unknown component (?) has to bridge the messages into the Kafka topic truck position raw.]

Sample message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Implementing Stream Processing Solutions: "Stream Data Integration"
Implementing "Stream Data Integration"
Hadoop Clusterd
Hadoop Cluster
Cluster Infrastructure
Parallel
Processing
Storage
Storage
RawRefined
Results
Microservice State
{ }
API
Stream
Processor
State
{ }
API
SQL
Search
BI Tools
Enterprise Data
Warehouse
Search / Explore
Service
Enterprise Apps
Logic
{ }
API
Bulk Source
Event Source
Location
DB
Extract
File
Weather
DB
IoT
Data
Mobile
Apps
Social
Edge Node
Rules
Event Hub
Storage
File Import / SQL Import
Event
Hub
Event
Stream
Event
Stream
Event Stream
Replay
Big Data Batch Analytics
Stream Analytics
Modern Applications
Introduction to Stream Processing
Integration with or without Transformation?

Zero Transformation
• no transformation, plain ingest, no schema validation
• keeps the original format (text, CSV, ...)
• allows storing data that may have schema errors

Format Transformation
• better called Format Translation: simply change the format, e.g. from text to Avro
• performs schema validation

Enrichment Transformation
• adds new data to the message without changing existing values
• e.g. convert a value from one system to another and add it to the message

Value Transformation
• replaces values in the message
• e.g. convert a value from one system to another and change the value in place
• destroys the raw data!
Apache NiFi
• Originated at NSA as Niagarafiles – developed
behind closed doors for 8 years
• Open sourced December 2014, Apache Top
Level Project July 2015
• Look-and-Feel modernized in 2016
• Opaque, "file-oriented" payload
• Distributed system of processors with
centralized control
• Based on flow-based programming concepts
• Data Provenance and Data Lineage
• Web-based user interface
StreamSets Data Collector
Founded by ex-Cloudera, Informatica
employees
Continuous open source, intent-driven, big
data ingest
Visible, record-oriented approach fixes
combinatorial explosion
Batch or stream processing
• Standalone, Spark cluster, MapReduce
cluster
IDE for pipeline development by ‘civilians’
Relatively new - first public release
September 2015
So far, the vast majority of commits are from
StreamSets staff
Demo (III) – StreamSets Data Collector

[Diagram: Truck-1, Truck-2 and Truck-3 publish to the MQTT topic truck/nn/position; a StreamSets 'mqtt to kafka' pipeline writes to the Kafka topic truck_position, read by a console consumer. In a variant, the 'mqtt to kafka' pipeline runs on StreamSets Edge.]

Sample message:
2016-06-02 14:39:56.605,98,27,803014426,Wichita to Little Rock Route2,Normal,38.65,90.21,5187297736652502631

[Screenshots: the dataflow for MQTT to Kafka, the MQTT source configuration and the Kafka sink configuration.]
Kafka Connect – Overview

[Diagram: source connectors pull data from external systems into Kafka topics; sink connectors push data from Kafka topics out to external systems.]
Demo (IV)

[Diagram: Truck-1, Truck-2 and Truck-3 publish to the MQTT topic truck/nn/position; a Kafka Connect 'mqtt to kafka' source connector writes to the Kafka topic truck_position, read by a console consumer.]

Sample message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (IV) – Create MQTT Connect through REST API
#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
    "name": "mqtt-source",
    "config": {
      "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
      "connect.mqtt.connection.timeout": "1000",
      "tasks.max": "1",
      "name": "mqtt-source",
      "mqtt.server.uri": "tcp://mosquitto:1883",
      "mqtt.topics": "truck/+/position",
      "kafka.topic": "truck_position",
      "mqtt.clean.session.enabled": "true",
      "mqtt.keepalive.interval.seconds": "60",
      "mqtt.qos": "0"
    }
  }'
Demo (IV)

[Diagram: the same pipeline as before... what about some analytics?]

Sample message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Implementing Stream Processing Solutions: "Stream Analytics"
Implementing "Stream Analytics"
Hadoop Clusterd
Hadoop Cluster
Cluster Infrastructure
Parallel
Processing
Storage
Storage
RawRefined
Results
Microservice State
{ }
API
Stream
Processor
State
{ }
API
SQL
Search
BI Tools
Enterprise Data
Warehouse
Search / Explore
Service
Enterprise Apps
Logic
{ }
API
Bulk Source
Event Source
Location
DB
Extract
File
Weather
DB
IoT
Data
Mobile
Apps
Social
Edge Node
Rules
Event Hub
Storage
File Import / SQL Import
Event
Hub
Event
Stream
Event
Stream
Event Stream
Replay
Big Data Batch Analytics
Stream Analytics
Modern Applications
Introduction to Stream Processing
Kafka Streams – Overview
• designed as a simple and lightweight library in Apache Kafka; no external dependencies other than Kafka
• leverages Kafka as its internal messaging layer
• supports fault-tolerant local state
• event-at-a-time processing (not micro-batch) with millisecond latency
• supports windowing (fixed, sliding and session) and stream-stream / stream-table joins
• interactive queries
• support for event time with late arrivals
• at-least-once and exactly-once processing guarantees

KStream<Integer, String> stream1 = builder.stream("in-1");
KStream<Integer, String> stream2 = builder.stream("in-2");

KStream<Integer, String> joined = stream1.leftJoin(stream2, …);
KTable<…> aggregated = joined.groupBy(…).count("store");

aggregated.to("out-1");
Demo (V) – Kafka Streams

[Diagram: Truck-1, Truck-2 and Truck-3 publish via MQTT (truck/nn/position); the 'mqtt to kafka' bridge writes to the topic truck_position; the Kafka Streams application 'detect_dangerous_driving' writes to the topic dangerous_driving, read by a console consumer.]

Sample message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (V) - Create Stream
final KStreamBuilder builder = new KStreamBuilder();

KStream<String, String> source =
    builder.stream(stringSerde, stringSerde, "truck_position");

KStream<String, TruckPosition> positions =
    source.map((key, value) ->
        new KeyValue<>(key, TruckPosition.create(key, value)));

KStream<String, TruckPosition> filtered =
    positions.filter(TruckPosition::filterNonNORMAL);

filtered.map((key, value) -> new KeyValue<>(key, value.toCSV()))
    .to("dangerous_driving");
KSQL: Streaming SQL for Apache Kafka
• enables stream processing with zero coding required
• the simplest way to process streams of data in real-time
• powered by Kafka and Kafka Streams: scalable, distributed, mature
• all you need is Kafka – no complex deployments
• STREAM and TABLE as first-class citizens
  • STREAM = data in motion
  • TABLE = collected state of a stream
  • STREAM and TABLE can be joined
Source: Confluent
Demo (VI) – KSQL

[Diagram: the same pipeline, now with the KSQL stream truck_position_s and a 'detect_dangerous_driving' query writing to dangerous_driving, read by a console consumer.]

Sample message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (VI) - Start Kafka KSQL
$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
[KSQL ASCII-art banner]
= Streaming SQL Engine for Kafka =
Copyright 2017 Confluent Inc.

CLI v0.1, Server v0.1 located at http://localhost:9098
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!

ksql>
Demo (VI) - Create Stream
ksql> CREATE STREAM truck_position_s
        (ts VARCHAR,
         truckId VARCHAR,
         driverId BIGINT,
         routeId BIGINT,
         eventType VARCHAR,
         latitude DOUBLE,
         longitude DOUBLE,
         correlationId VARCHAR)
      WITH (kafka_topic='truck_position',
            value_format='DELIMITED');

Message
----------------
Stream created
Demo (VI) – Query Stream

ksql> SELECT * FROM truck_position_s;
1522847870317 | truck/13/position | 1522847870310 | 44 | 13 | 1390372503 | Normal | 41.71 | -91.32 | -2458274393837068406
1522847870376 | truck/14/position | 1522847870370 | 35 | 14 | 1961634315 | Normal | 37.66 | -94.3 | -2458274393837068406
1522847870418 | truck/21/position | 1522847870410 | 58 | 21 | 137128276 | Normal | 36.17 | -95.99 | -2458274393837068406
1522847870397 | truck/29/position | 1522847870390 | 18 | 29 | 1090292248 | Normal | 41.67 | -91.24 | -2458274393837068406

ksql> SELECT * FROM truck_position_s WHERE eventType != 'Normal';
1522847914246 | truck/11/position | 1522847914240 | 54 | 11 | 1198242881 | Lane Departure | 40.86 | -89.91 | -2458274393837068406
1522847915125 | truck/10/position | 1522847915120 | 93 | 10 | 1384345811 | Overspeed | 40.38 | -89.17 | -2458274393837068406
1522847919216 | truck/12/position | 1522847919210 | 75 | 12 | 24929475 | Overspeed | 42.23 | -91.78 | -2458274393837068406
Demo (VI) – Describe Stream

ksql> describe truck_position_s;

Field         | Type
---------------------------------
ROWTIME       | BIGINT
ROWKEY        | VARCHAR(STRING)
TS            | VARCHAR(STRING)
TRUCKID       | VARCHAR(STRING)
DRIVERID      | BIGINT
ROUTEID       | BIGINT
EVENTTYPE     | VARCHAR(STRING)
LATITUDE      | DOUBLE
LONGITUDE     | DOUBLE
CORRELATIONID | VARCHAR(STRING)
Demo (VI) - Create Stream
ksql> CREATE STREAM dangerous_driving_s
        WITH (kafka_topic='dangerous_driving_s',
              value_format='JSON')
        AS SELECT * FROM truck_position_s
        WHERE eventtype != 'Normal';

Message
----------------------------
Stream created and running

ksql> SELECT * FROM dangerous_driving_s;
1522848286143 | truck/15/position | 1522848286125 | 98 | 15 | 987179512 | Overspeed | 34.78 | -92.31 | -2458274393837068406
1522848295729 | truck/11/position | 1522848295720 | 54 | 11 | 1198242881 | Unsafe following distance | 38.43 | -90.35 | -2458274393837068406
1522848313018 | truck/11/position | 1522848313000 | 54 | 11 | 1198242881 | Overspeed | 41.87 | -87.67 | -2458274393837068406
Demo (VII) – KSQL Join

[Diagram: truck positions flow via the mqtt-source connector into the topic truck_position and on into dangerous_driving via the 'detect_dangerous_driving' query; driver master data (truck driver table) is pulled via a jdbc-source connector into the topic trucking_driver; a 'join_dangerous_driving_driver' query produces dangerous_driving_driver, read by a console consumer.]

Sample driver row:
27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00

Sample driver message:
{"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}

Sample position message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (VII) – Create JDBC Connect through REST API
#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
    "name": "jdbc-driver-source",
    "config": {
      "connector.class": "JdbcSourceConnector",
      "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
      "mode": "timestamp",
      "timestamp.column.name": "last_update",
      "table.whitelist": "driver",
      "validate.non.null": "false",
      "topic.prefix": "trucking_",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "key.converter.schemas.enable": "false",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter.schemas.enable": "false",
      "name": "jdbc-driver-source",
      "transforms": "createKey,extractInt",
      "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
      "transforms.createKey.fields": "id",
      "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
      "transforms.extractInt.field": "id"
    }
  }'
Demo (VII) - Create Table with Driver State
ksql> CREATE TABLE driver_t
        (id BIGINT,
         first_name VARCHAR,
         last_name VARCHAR,
         available VARCHAR)
      WITH (kafka_topic='trucking_driver',
            value_format='JSON',
            key='id');

Message
----------------
Table created
Demo (VII) – Join Stream with Driver Table

ksql> CREATE STREAM dangerous_driving_and_driver_s
        WITH (kafka_topic='dangerous_driving_and_driver_s',
              value_format='JSON')
        AS SELECT driverId, first_name, last_name, truckId, routeId, eventtype
        FROM truck_position_s
        LEFT JOIN driver_t
        ON truck_position_s.driverId = driver_t.id;

Message
----------------------------
Stream created and running

ksql> SELECT * FROM dangerous_driving_and_driver_s;
1511173352906 | 21 | 21 | Lila | Page | 58 | 1594289134 | Unsafe tail distance
1511173353669 | 12 | 12 | Laurence | Lindsey | 93 | 1384345811 | Lane Departure
1511173435385 | 11 | 11 | Micky | Isaacson | 22 | 1198242881 | Unsafe tail distance
Spark Structured Streaming
• Stream processing on the Structured API (2nd generation, with DataFrames / Datasets rather than RDDs)
• Code reuse between batch and streaming
• Supports Windowing (Fixed, Sliding) and Stream-to-Stream / Stream-to-Static Joins
• Support for Event Time with late arrivals
• Currently only supports Micro-Batching
• At-least-once and partial Exactly-once support
• Support for Java, Scala, Python
val schema = new StructType()
  .add(...)

val inputDf = spark
  .readStream
  .format(...)
  .option(...)
  .load()

val filteredDf = inputDf.where(...)

val query = filteredDf
  .writeStream
  .format(...)
  .option(...)
  .start()

query.stop
Demo (VIII) – Spark Structured Streaming

[Diagram: MQTT positions flow via the 'mqtt to kafka' bridge into the topic truck_position; a Spark Structured Streaming job detects dangerous driving and writes to the topic dangerous_driving, read by a console consumer.]

Sample message:
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (VIII) – Spark Structured Streaming
val rawDf = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker-1:9092")
  .option("subscribe", "truck_position")
  .load()

val jsonDf = rawDf.select($"value" cast "string" as "json")
  .select(from_json($"json", truckPositionSchema) as "data")
  .selectExpr("data.*",
    "cast(cast(data.timestamp as double) / 1000 as timestamp) as eventTime")

val jsonFilteredDf = jsonDf.where("data.eventType != 'Normal'")
Demo (VIII) – Spark Structured Streaming
val query = jsonFilteredDf
  .selectExpr("to_json(struct(*)) AS value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker-1:9092")
  .option("topic", "dangerous_driving_spark")
  .option("checkpointLocation", "/tmp")
  .start()
Summary

• Stream Processing is the solution for low-latency use cases
• Event Hub, Stream Data Integration and Stream Analytics are the main building blocks in your architecture
• Kafka is currently the de-facto standard for the Event Hub
• Various options exist for Stream Data Integration and Stream Analytics
• Kafka offers a full-blown streaming platform with support for all 3 building blocks (although not as flexible in Stream Data Integration, and not yet complete: SQL, event pattern detection, streaming machine learning)
Technology on its own won't help you.
You need to know how to use it properly.

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 

Recently uploaded (20)

04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 

Introduction to Stream Processing

  • 1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH Introduction to Stream Processing Guido Schmutz Java Zone 2018 – 13.9.2018 @gschmutz guidoschmutz.wordpress.com
  • 2. Guido Schmutz Working at Trivadis for more than 21 years Oracle ACE Director for Fusion Middleware and SOA Consultant, Trainer Software Architect for Java, Oracle, SOA and Big Data / Fast Data Head of Trivadis Architecture Board Technology Manager @ Trivadis More than 30 years of software development experience Contact: guido.schmutz@trivadis.com Blog: http://guidoschmutz.wordpress.com Slideshare: http://www.slideshare.net/gschmutz Twitter: gschmutz Introduction to Stream Processing
  • 3. Agenda Introduction to Stream Processing 1. Motivation for Stream Processing? 2. Capabilities for Stream Processing 3. Implementing Stream Processing Solutions 4. Summary
  • 4. Motivation for Stream Processing? Introduction to Stream Processing
  • 5. Why not use Traditional BI Infrastructures? (architecture diagram: bulk sources – DB extracts, files, databases – are loaded via ETL / stored procedures into an Enterprise Data Warehouse; BI tools and search/explore sit on top; enterprise apps attach through logic and APIs; high latency)
  • 6. Big Data solves Volume and Variety – not Velocity (architecture diagram: bulk sources feed a Big Data platform on a Hadoop cluster via file/SQL import; parallel processing over raw and refined storage produces results; BI tools, an Enterprise Data Warehouse (SQL) and search/explore consume them; enterprise apps attach via APIs; high latency)
  • 7. Big Data solves Volume and Variety – not Velocity (same diagram, extended with event sources – location, telemetry, IoT data, mobile apps, social – emitting event streams that the batch-oriented platform cannot serve with low latency)
  • 8. Big Data solves Volume and Variety – not Velocity (diagram now adds event hubs collecting the event streams in front of the Big Data platform; parallel processing covers machine learning, graph algorithms and natural language processing – still high latency)
  • 9. "Data at Rest" vs. "Data in Motion" Data at Rest Data in Motion Store Act Analyze StoreAct Analyze 1110 1010 1010 110 1110 1010 1010 110 Introduction to Stream Processing
  • 10. Typical Stream Processing Use Cases • Notifications and Alerting – a notification or alert should be triggered if some sort of event or series of events occurs • Real-Time Reporting – run real-time dashboards that employees/customers can look at • Incremental ETL – still ETL, but in a streaming, continuous mode rather than in batch • Update data to serve in real-time – compute data that gets served interactively by other applications • Real-Time decision making – analyzing new inputs and responding to them automatically using business logic, e.g. fraud detection • Online Machine Learning – train a model on a combination of historical and streaming data and use it for real-time decision making
  • 11. When to Stream / When not? (latency spectrum: Real-Time – constant low latency, milliseconds and under; Near-Real-Time – low milliseconds to seconds, with delays in case of failures; Batch – tens of seconds or more, re-run in case of failures) Source: adapted from Cloudera
  • 12. "No free lunch" Introduction to Stream Processing Constant low Milliseconds & under Low milliseconds to seconds, delay in case of failures 10s of seconds of more, Re-run in case of failures Real-Time Near-Real-Time Batch "Difficult" architectures, lower latency "Easier architectures", higher latency
  • 13. Stream Processing Architecture solves Velocity (diagram: event sources – location, telemetry, IoT data, mobile apps, social – publish event streams to an event hub; a stream analytics platform processes them against reference data/models and serves dashboards, search and enterprise apps; low(est) latency, but no history)
  • 14. Big Data for all historical data analysis (diagram: the event hub additionally feeds the Big Data platform via a data flow, so the same streams land in raw/refined storage for batch analytics alongside the stream analytics path)
  • 15. Integrate existing systems through CDC: capture changes directly on the database – Change Data Capture (CDC), think of it as a global database trigger – and thereby transform existing silo-based systems into event producers (diagram: a CDC connector streams the data changes of a traditional system into the event hub for consuming systems)
  • 16. Integrate existing systems with lower latency through CDC (diagram: CDC-sourced event streams join the other event sources on the event hub, feeding both the stream analytics platform and, via data flow, the Big Data platform)
  • 17. New systems participate in an event-oriented fashion (diagram: microservices on a microservice platform and stream processors on the stream analytics platform both consume and produce event streams over the event hub, alongside the batch path)
  • 18. Edge computing allows processing close to the data sources (diagram: edge nodes with rules, local storage and an event hub pre-process events from the sources before forwarding event streams to the central platforms)
  • 19. Unified Architecture for Modern Data Analytics Solutions (diagram: the full picture – bulk and event sources, edge nodes, event hub, Big Data platform for batch, stream analytics, microservices, and BI/search/enterprise-app consumers)
  • 20. Two Types of Stream Processing (from Gartner) Stream Data Integration • primarily focuses on the ingestion and processing of data sources targeting real-time extract-transform-load (ETL) and data integration use cases • filter and enrich the data • optionally calculate time-windowed aggregations before storing the results in a database or file system Stream Analytics • targets analytics use cases • calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events) • complex events may signify threats or opportunities that require a response from the business through real-time dashboards, alerts or decision automation
  • 21. Introduction to Stream Processing Important Capabilities for Stream Processing
  • 22. Streaming Model: Event-at-a-time vs. Micro Batch Introduction to Stream Processing Event-at-a-time Processing • Events processed as they arrive • low-latency • fault tolerance expensive Micro-Batch Processing • Splits incoming stream in small batches • Fault tolerance easier • Better throughput
  • 23. Fault Tolerance: Delivery Guarantees Introduction to Stream Processing • At most once (fire-and-forget) - means the message is sent, but the sender doesn’t care if it’s received or lost. Imposes no additional overhead to ensure message delivery, such as requiring acknowledgments from consumers. Hence, it is the easiest and most performant behavior to support. • At least once - means that retransmission of a message will occur until an acknowledgment is received. Since a delayed acknowledgment from the receiver could be in flight when the sender retransmits the message, the message may be received one or more times. • Exactly once - ensures that a message is received once and only once, and is never lost and never repeated. The system must implement whatever mechanisms are required to ensure that a message is received and processed just once [ 0 | 1 ] [ 1+ ] [ 1 ]
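To make the trade-off concrete, here is a minimal sketch of how these guarantees map to Kafka producer settings (the broker address and topic name are assumptions; the settings for the other two modes are shown as comments):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class DeliveryGuaranteesSketch {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // At most once: fire-and-forget, no acknowledgments requested
        // props.put(ProducerConfig.ACKS_CONFIG, "0");

        // At least once: wait for broker acknowledgment and retry on failure
        // (a delayed ack plus a retransmission can produce duplicates)
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));

        // Exactly once on the produce path: idempotence de-duplicates retries
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
          producer.send(new ProducerRecord<>("truck_position", "truck-1", "..."));
        }
      }
    }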
  • 24. API Introduction to Stream Processing Compositional / Programmatic • Highly customizable operator based on basic building blocks • Manual topology definition and optimization Declarative • High-Level, fluent API • Higher order function as operators (filter, mapWithState …) • Logical plan optimization SQL • Query language allowing to use stream in FROM clause • Extensions supporting windowing, pattern matching, spatial, …. Operators • SELECT * FROM <stream> WHERE f1 = 10
  • 25. Windowing – computations over events are done using windows of data. Due to the size and never-ending nature of a stream, it is not feasible to keep the entire stream of data in memory. A window of data represents a bounded amount of data on which we can perform computations. Windows give us the power to keep a working memory and look back at recent data efficiently. (diagram: a window sliding over a stream of data along the time axis)
  • 26. Sliding Window (aka Hopping Window) - uses eviction and trigger policies that are based on time: window length and sliding interval length Fixed Window (aka Tumbling Window) - eviction policy always based on the window being full and trigger policy based on either the count of items in the window or time Session Window – composed of sequences of temporarily related events terminated by a gap of inactivity greater than some timeout Windowing Introduction to Stream Processing Time TimeTime
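A minimal Kafka Streams sketch of the three window types, counting truck positions per key (the topic name follows the demos later in the deck; the API shown is Kafka Streams 3.x):

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.SessionWindows;
    import org.apache.kafka.streams.kstream.TimeWindows;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> positions = builder.stream("truck_position");

    // Fixed (tumbling) window: non-overlapping 1-minute buckets
    positions.groupByKey()
             .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
             .count();

    // Sliding (hopping) window: 1-minute windows advancing every 10 seconds
    positions.groupByKey()
             .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1))
                                    .advanceBy(Duration.ofSeconds(10)))
             .count();

    // Session window: a session closes after 30 seconds of inactivity per key
    positions.groupByKey()
             .windowedBy(SessionWindows.ofInactivityGapWithNoGrace(Duration.ofSeconds(30)))
             .count();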
  • 27. Joining – challenges of joining streams: 1. data streams need to be aligned as they arrive, because they have different timestamps 2. since streams are never-ending, the joins must be limited, otherwise the join will never end 3. the join needs to produce results continuously, as there is no end to the data. Variants (see the sketch after this slide): Stream-to-Static (Table) Join, Stream-to-Stream Join (one-window join), Stream-to-Stream Join (two-window join)
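A sketch of the first two join flavors in Kafka Streams (topic names are assumptions; the stream-to-stream case needs a window to bound the join, as noted above):

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> positions = builder.stream("truck_position");
    KStream<String, String> weather = builder.stream("weather");          // assumed topic
    KTable<String, String> drivers = builder.table("trucking_driver");    // static/reference side

    // Stream-to-static join: enrich each event with the latest table value for its key
    KStream<String, String> enriched =
        positions.leftJoin(drivers, (position, driver) -> position + "|" + driver);

    // Stream-to-stream join: events must fall within a 5-minute window of each other
    KStream<String, String> correlated =
        positions.join(weather,
                       (position, observation) -> position + "|" + observation,
                       JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)));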
  • 28. Pattern Detection Introduction to Stream Processing • Streaming Data often contains interesting patterns that only emerge as new streaming data arrives, e.g. • Absence Pattern: event A not followed by event B within time window • Sequence Pattern: event A followed by event B followed by event C • Increasing Pattern: up trend of a value of a certain attribute • Decreasing Pattern: down trend of a value of a certain attribute • … • Pattern operators allow developers to define complex relationships between streaming events
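Kafka Streams has no dedicated pattern operator, but engines such as Apache Flink expose one. A sketch of a sequence pattern using Flink CEP (TruckEvent is an assumed POJO with eventType/driverId accessors, and events an assumed DataStream<TruckEvent> read from the event hub):

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // Sequence pattern: "Lane Departure" followed by "Overspeed"
    // for the same driver within 10 minutes
    Pattern<TruckEvent, ?> pattern =
        Pattern.<TruckEvent>begin("laneDeparture")
            .where(new SimpleCondition<TruckEvent>() {
              @Override
              public boolean filter(TruckEvent e) {
                return "Lane Departure".equals(e.getEventType());
              }
            })
            .followedBy("overspeed")
            .where(new SimpleCondition<TruckEvent>() {
              @Override
              public boolean filter(TruckEvent e) {
                return "Overspeed".equals(e.getEventType());
              }
            })
            .within(Time.minutes(10));

    PatternStream<TruckEvent> matches =
        CEP.pattern(events.keyBy(TruckEvent::getDriverId), pattern);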
  • 29. State Management – necessary if a stream processing use case depends on previously seen data or on external data. Windowing, joining and pattern detection use state management behind the scenes; state management services can also be made available for custom state-handling logic. State needs to be managed as close to the stream processor as possible. Key question for any option: how does it handle failures, i.e. if a machine crashes and the/some state is lost? (diagram: options for state management – in-memory; replicated, distributed store; local, embedded store – ordered along an axis of operational complexity and features, low to high)
  • 30. Queryable State (aka. Interactive Queries) Introduction to Stream Processing Exposes the state managed by the Stream Analytics solution to the outside world Allows an application to query the managed state, i.e. to visualize it For some scenarios, Queryable State can eliminate the need for an external database to keep results Stream Processing Cluster Reference Data Stream Analytics { } Query API State Stream Processor Search / Explore Online & Mobile Apps Model Dashboard
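In Kafka Streams this is called Interactive Queries; a sketch of querying a materialized store from a running application (the store name position-counts is an assumption, and the API shown is Kafka 2.5+):

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    // "streams" is a running KafkaStreams instance whose topology
    // materialized a count into a store named "position-counts"
    ReadOnlyKeyValueStore<String, Long> store = streams.store(
        StoreQueryParameters.fromNameAndType("position-counts",
                                             QueryableStoreTypes.keyValueStore()));

    // Serve this, e.g., via a REST endpoint instead of an external database
    Long eventsForTruck = store.get("truck-1");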
  • 31. Event Time vs. Ingestion Time vs. Processing Time Introduction to Stream Processing Event time • the time at which events actually occurred Ingestion time / Processing Time • the time at which events are ingested / observed in the system Not all use cases care about event times but many do Examples • characterizing user behavior over time • most billing applications • anomaly detection
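In Kafka Streams, for instance, event time can be derived from the payload instead of the broker timestamp by supplying a TimestampExtractor; a sketch assuming the CSV truck message used in the demos, with the epoch-millis event time in the first field:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.processor.TimestampExtractor;

    // Reads the event time from the first CSV field,
    // e.g. "1522846456703,101,31,...": when the truck reported the position
    public class TruckEventTimeExtractor implements TimestampExtractor {
      @Override
      public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        String csv = (String) record.value();
        return Long.parseLong(csv.split(",")[0]);
      }
    }

    // registered via:
    // props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
    //           TruckEventTimeExtractor.class);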
  • 32. Capabilities: Stream Data Integration vs. Stream Analytics (each capability listed as Stream Data Integration / Stream Analytics): support for various data sources: yes / (yes); streaming ETL (transformation, routing, validation): yes / (yes); format translation: yes / (yes); micro-batching: yes / (yes); auto scaling: yes; backpressure support: yes / -; processing time vs. event time: - / yes; stream-to-static joins (lookup/enrichment): yes / yes; stream-to-stream joins: - / yes; windowing: - / yes; state management: - / yes; queryable state (aka interactive queries): - / yes; streaming SQL: (yes) / yes; event pattern matching/detection: - / yes
  • 34. Stream Processing & Analytics Ecosystem (landscape diagram: products grouped into stream analytics, stream data integration, event hub and edge, split by open source vs. closed source; source: adapted from Tibco)
  • 35. Implementing Stream Processing Solutions: "Event Hub" Introduction to Stream Processing
  • 36. Implementing "Event Hub" (architecture diagram with the event hub highlighted: it receives event streams from sources and edge nodes, supports replay, and feeds big data batch analytics, stream analytics and modern applications)
  • 37. Apache Kafka – A Streaming Platform Introduction to Stream Processing High-Level Architecture Distributed Log at the Core Scale-Out Architecture Logs do not (necessarily) forget
  • 38. Hold Data for Long-Term – Data Retention (diagram: a producer writing to a three-broker cluster) 1. Never 2. Time based (TTL): log.retention.{ms | minutes | hours} 3. Size based: log.retention.bytes 4. Log compaction based (entries with the same key are removed): kafka-topics.sh --zookeeper zk:2181 --create --topic customers --replication-factor 1 --partitions 1 --config cleanup.policy=compact
  • 39. Keep Topics in Compacted Form (diagram: a log with keys and values at offsets 0-11 is compacted so that only the latest value per key survives, e.g. K1 keeps V4 at offset 3, K3 keeps V5, K4 keeps V7, K5 keeps V9, K2 keeps V10, K6 keeps V11)
  • 40. Demo (I) - Kafka (diagram: Truck-1/2/3 send truck position messages to Kafka, read by a console consumer; test data generator by Hortonworks) Sample record: 1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
  • 41. Demo (I) – Create Kafka Topic
    $ kafka-topics --zookeeper zookeeper:2181 --create --topic truck_position --partitions 8 --replication-factor 1
    $ kafka-topics --zookeeper zookeeper:2181 --list
    __consumer_offsets
    _confluent-metrics
    _schemas
    docker-connect-configs
    docker-connect-offsets
    docker-connect-status
    truck_position
  • 42. Demo (I) – Run Producer and Kafka-Console-Consumer Introduction to Stream Processing
  • 43. Implementing Stream Processing Solutions: Streaming Data Sources Introduction to Stream Processing
  • 44. Integrating Streaming Data Sources (architecture diagram with the event sources and edge nodes highlighted: location, telemetry, IoT data, mobile apps and social feeds stream into the event hub)
  • 45. Integrating (Streaming) Data Sources Introduction to Stream Processing SQL Polling Change Data Capture (CDC) File Polling File Stream (File Tailing) File Stream (Appender) Sensor Stream
  • 46. Demo (II) – devices send to MQTT instead of Kafka (diagram: Truck-1/2/3 publish to MQTT topics truck/nn/position) Message Queue Telemetry Transport (MQTT): reduces the number of bytes flowing over the wire compared to HTTP, which reduces power and bandwidth usage; built-in retry / QoS mechanism; pub/sub architecture and a message broker allow for flexibility in communication patterns; simple API that requires minimal "plumbing" code. Sample record: 1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
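For illustration, publishing such a position message from a device could look like this with the Eclipse Paho Java client (the broker URL tcp://mosquitto:1883 and topic layout follow the demo setup; the client id is an assumption):

    import org.eclipse.paho.client.mqttv3.MqttClient;
    import org.eclipse.paho.client.mqttv3.MqttException;
    import org.eclipse.paho.client.mqttv3.MqttMessage;

    public class TruckPositionPublisher {
      public static void main(String[] args) throws MqttException {
        MqttClient client = new MqttClient("tcp://mosquitto:1883", "truck-10");
        client.connect();

        MqttMessage message = new MqttMessage(
            "1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837"
                .getBytes());
        message.setQos(0);   // fire-and-forget, matching the demo's QoS setting

        client.publish("truck/10/position", message);
        client.disconnect();
      }
    }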
  • 47. Demo (II) – devices send to MQTT instead of Kafka Introduction to Stream Processing
  • 48. Demo (II) - devices send to MQTT instead of Kafka – how to get the data into Kafka? (diagram: Truck-1/2/3 publish to truck/nn/position on MQTT; a yet-unknown component must forward the messages to a raw truck position topic in Kafka) Sample record: 1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
  • 49. Implementing Stream Processing Solutions: "Stream Data Integration" Introduction to Stream Processing
  • 50. Implementing "Stream Data Integration" (architecture diagram with the data flow / stream data integration layer highlighted, moving event streams from sources and edge nodes into the event hub and onwards)
  • 51. Integration with or without Transformation? Zero Transformation • no transformation, plain ingest, no schema validation • keep the original format – Text, CSV, … • allows storing data that may have errors in the schema Format Transformation • better named Format Translation • simply changes the format, e.g. from Text to Avro • does schema validation Enrichment Transformation • adds new data to the message • does not change existing values • converts a value from one system to another and adds it to the message Value Transformation • replaces values in the message • converts a value from one system to another and changes the value in place • destroys the raw data!
  • 52. Apache NiFi • Originated at NSA as Niagarafiles – developed behind closed doors for 8 years • Open sourced December 2014, Apache Top Level Project July 2015 • Look-and-Feel modernized in 2016 • Opaque, "file-oriented" payload • Distributed system of processors with centralized control • Based on flow-based programming concepts • Data Provenance and Data Lineage • Web-based user interface Introduction to Stream Processing
  • 53. StreamSets Data Collector Founded by ex-Cloudera, Informatica employees Continuous open source, intent-driven, big data ingest Visible, record-oriented approach fixes combinatorial explosion Batch or stream processing • Standalone, Spark cluster, MapReduce cluster IDE for pipeline development by ‘civilians’ Relatively new - first public release September 2015 So far, vast majority of commits are from StreamSets staff Introduction to Stream Processing
  • 54. Demo (III) – StreamSets Data Collector (diagram: trucks publish to truck/nn/position on MQTT; an "mqtt to kafka" pipeline – optionally running on a StreamSets Edge node – writes to the truck_position topic, read by a console consumer) Sample record: 2016-06-02 14:39:56.605,98,27,803014426,Wichita to Little Rock Route2,Normal,38.65,90.21,5187297736652502631
  • 55. Demo (III): Dataflow for MQTT to Kafka Introduction to Stream Processing
  • 56. Demo (III): MQTT Source Introduction to Stream Processing
  • 57. Demo (III): Kafka Sink Introduction to Stream Processing
  • 58. Demo (III): Dataflow for MQTT to Kafka Introduction to Stream Processing
  • 59. Kafka Connect - Overview (diagram: Source Connectors bring data from external systems into Kafka; Sink Connectors deliver data from Kafka to external systems)
  • 60. Demo (IV) (diagram: trucks publish to truck/nn/position on MQTT; an "mqtt to kafka" connector writes to the truck_position topic, read by a console consumer) Sample record: 1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
  • 61. Demo (IV) – Create MQTT Connect through REST API
    #!/bin/bash
    curl -X "POST" "http://192.168.69.138:8083/connectors" \
         -H "Content-Type: application/json" \
         -d $'{
      "name": "mqtt-source",
      "config": {
        "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
        "connect.mqtt.connection.timeout": "1000",
        "tasks.max": "1",
        "name": "mqtt-source",
        "mqtt.server.uri": "tcp://mosquitto:1883",
        "mqtt.topics": "truck/+/position",
        "kafka.topic": "truck_position",
        "mqtt.clean.session.enabled": "true",
        "mqtt.keepalive.interval.seconds": "60",
        "mqtt.qos": "0"
      }
    }'
  • 62. Demo (IV) (diagram: the same MQTT-to-Kafka pipeline into truck_position and a console consumer – but what about some analytics?) Sample record: 1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
  • 63. Implementing Stream Processing Solutions: "Stream Analytics" Introduction to Stream Processing
  • 64. Implementing "Stream Analytics" (architecture diagram with the stream analytics platform highlighted: stream processors with state and APIs consuming event streams from the event hub)
  • 65. Kafka Streams - Overview • Designed as a simple and lightweight library in Apache Kafka • no external dependencies other than Kafka • Leverages Kafka as its internal messaging layer • Supports fault-tolerant local state • Event-at-a-time processing (not micro-batch) with millisecond latency • Supports Windowing (Fixed, Sliding and Session) and Stream-Stream / Stream-Table Joins • Interactive Queries • Support for Event Time with late arrivals • At-least-once and exactly-once processing guarantees
    KStream<Integer, String> stream1 = builder.stream("in-1");
    KStream<Integer, String> stream2 = builder.stream("in-2");
    KStream<Integer, String> joined = stream1.leftJoin(stream2, …);
    KTable<…, …> aggregated = joined.groupBy(…).count("store");
    aggregated.to("out-1");
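Wiring such a topology into a runnable application needs little more than a configuration object and a KafkaStreams instance. A minimal skeleton using the newer StreamsBuilder API (application id, broker address and topic names are assumptions):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class StreamingApp {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "truck-analytics");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("in-1");
        input.filter((key, value) -> value != null)   // stand-in for real logic
             .to("out-1");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
      }
    }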
  • 66. Demo (V) – Kafka Streams (diagram: MQTT to Kafka into truck_position_s; a Kafka Streams app detect_dangerous_driving writes to dangerous_driving, read by a console consumer) Sample record: 1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
  • 67. Demo (V) - Create Stream
    final KStreamBuilder builder = new KStreamBuilder();

    KStream<String, String> source =
        builder.stream(stringSerde, stringSerde, "truck_position");

    KStream<String, TruckPosition> positions =
        source.map((key, value) -> new KeyValue<>(key, TruckPosition.create(key, value)));

    KStream<String, TruckPosition> filtered =
        positions.filter(TruckPosition::filterNonNORMAL);

    filtered.map((key, value) -> new KeyValue<>(key, value.toCSV()))
            .to("dangerous_driving");
  • 68. KSQL: Streaming SQL for Apache Kafka • Enables stream processing with zero coding required • The simplest way to process streams of data in real-time • Powered by Kafka and Kafka Streams: scalable, distributed, mature • All you need is Kafka – no complex deployments • STREAM and TABLE as first-class citizens • STREAM = data in motion • TABLE = collected state of a stream • join STREAM and TABLE Source: Confluent
  • 69. Demo (VI) - KSQL (diagram: MQTT to Kafka into truck_position_s; a KSQL query detect_dangerous_driving writes to dangerous_driving, read by a console consumer) Sample record: 1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
  • 70. Demo (VI) - Start Kafka KSQL
    $ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
    (KSQL ASCII-art banner) Streaming SQL Engine for Kafka
    Copyright 2017 Confluent Inc.
    CLI v0.1, Server v0.1 located at http://localhost:9098
    Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
    ksql>
  • 71. Demo (VI) - Create Stream
    ksql> CREATE STREAM truck_position_s
            (ts VARCHAR, truckId VARCHAR, driverId BIGINT, routeId BIGINT,
             eventType VARCHAR, latitude DOUBLE, longitude DOUBLE, correlationId VARCHAR)
          WITH (kafka_topic='truck_position', value_format='DELIMITED');

    Message
    ----------------
    Stream created
  • 72. Demo (VI) - Query Stream
    ksql> SELECT * FROM truck_position_s;
    1522847870317 | truck/13/position | 1522847870310 | 44 | 13 | 1390372503 | Normal | 41.71 | -91.32 | -2458274393837068406
    1522847870376 | truck/14/position | 1522847870370 | 35 | 14 | 1961634315 | Normal | 37.66 | -94.3 | -2458274393837068406
    1522847870418 | truck/21/position | 1522847870410 | 58 | 21 | 137128276 | Normal | 36.17 | -95.99 | -2458274393837068406
    1522847870397 | truck/29/position | 1522847870390 | 18 | 29 | 1090292248 | Normal | 41.67 | -91.24 | -2458274393837068406

    ksql> SELECT * FROM truck_position_s WHERE eventType != 'Normal';
    1522847914246 | truck/11/position | 1522847914240 | 54 | 11 | 1198242881 | Lane Departure | 40.86 | -89.91 | -2458274393837068406
    1522847915125 | truck/10/position | 1522847915120 | 93 | 10 | 1384345811 | Overspeed | 40.38 | -89.17 | -2458274393837068406
    1522847919216 | truck/12/position | 1522847919210 | 75 | 12 | 24929475 | Overspeed | 42.23 | -91.78 | -2458274393837068406
  • 73. Demo (VI) - Describe Stream
    ksql> describe truck_position_s;

    Field         | Type
    ---------------------------------
    ROWTIME       | BIGINT
    ROWKEY        | VARCHAR(STRING)
    TS            | VARCHAR(STRING)
    TRUCKID       | VARCHAR(STRING)
    DRIVERID      | BIGINT
    ROUTEID       | BIGINT
    EVENTTYPE     | VARCHAR(STRING)
    LATITUDE      | DOUBLE
    LONGITUDE     | DOUBLE
    CORRELATIONID | VARCHAR(STRING)
  • 74. Demo (VI) - Create Derived Stream
    ksql> CREATE STREAM dangerous_driving_s
          WITH (kafka_topic='dangerous_driving_s', value_format='JSON')
          AS SELECT * FROM truck_position_s WHERE eventtype != 'Normal';

    Message
    ----------------------------
    Stream created and running

    ksql> SELECT * FROM dangerous_driving_s;
    1522848286143 | truck/15/position | 1522848286125 | 98 | 15 | 987179512 | Overspeed | 34.78 | -92.31 | -2458274393837068406
    1522848295729 | truck/11/position | 1522848295720 | 54 | 11 | 1198242881 | Unsafe following distance | 38.43 | -90.35 | -2458274393837068406
    1522848313018 | truck/11/position | 1522848313000 | 54 | 11 | 1198242881 | Overspeed | 41.87 | -87.67 | -2458274393837068406
  • 75. Demo (VII) – KSQL Join (diagram: the MQTT source feeds truck_position; a JDBC source pulls the driver table into trucking_driver; joining dangerous driving events with drivers produces dangerous_driving_driver, read by a console consumer) Sample driver row: 27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00 – as JSON: {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012} Sample position: 1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
  • 76. Demo (VII) – Create JDBC Connect through REST API
    #!/bin/bash
    curl -X "POST" "http://192.168.69.138:8083/connectors" \
         -H "Content-Type: application/json" \
         -d $'{
      "name": "jdbc-driver-source",
      "config": {
        "connector.class": "JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
        "mode": "timestamp",
        "timestamp.column.name": "last_update",
        "table.whitelist": "driver",
        "validate.non.null": "false",
        "topic.prefix": "trucking_",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "key.converter.schemas.enable": "false",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        "name": "jdbc-driver-source",
        "transforms": "createKey,extractInt",
        "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
        "transforms.createKey.fields": "id",
        "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
        "transforms.extractInt.field": "id"
      }
    }'
  • 77. Demo (VII) - Create Table with Driver State
    ksql> CREATE TABLE driver_t
            (id BIGINT, first_name VARCHAR, last_name VARCHAR, available VARCHAR)
          WITH (kafka_topic='trucking_driver', value_format='JSON', key='id');

    Message
    ----------------
    Table created
  • 78. Demo (VII) - Join Stream with Driver Table
    ksql> CREATE STREAM dangerous_driving_and_driver_s
          WITH (kafka_topic='dangerous_driving_and_driver_s', value_format='JSON')
          AS SELECT driverId, first_name, last_name, truckId, routeId, eventtype
             FROM truck_position_s
             LEFT JOIN driver_t ON truck_position_s.driverId = driver_t.id;

    Message
    ----------------------------
    Stream created and running

    ksql> SELECT * FROM dangerous_driving_and_driver_s;
    1511173352906 | 21 | 21 | Lila | Page | 58 | 1594289134 | Unsafe tail distance
    1511173353669 | 12 | 12 | Laurence | Lindsey | 93 | 1384345811 | Lane Departure
    1511173435385 | 11 | 11 | Micky | Isaacson | 22 | 1198242881 | Unsafe tail distance
  • 79. Spark Structured Streaming • Stream processing on the Structured API (2nd generation with DataFrames / Datasets rather than RDDs) • Code reuse between batch and streaming • Supports Windowing (Fixed, Sliding) and Stream-to-Stream / Stream-to-Static Joins • Support for Event Time with late arrivals • Currently only supports Micro-Batching • At-least-once and partial exactly-once support • Support for Java, Scala, Python
    val schema = new StructType().add(...)

    val inputDf = spark.readStream
      .format(...)
      .option(...)
      .load()

    val filteredDf = inputDf.where(...)

    val query = filteredDf.writeStream
      .format(...)
      .option(...)
      .start()

    query.stop()
  • 80. Demo (VIII) – Spark Structured Streaming (diagram: MQTT to Kafka into truck_position_s; Spark detects dangerous driving and writes to dangerous_driving, read by a console consumer) Sample record: 1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
  • 81. Demo (VIII) – Spark Structured Streaming
    val rawDf = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092")
      .option("subscribe", "truck_position")
      .load()

    val jsonDf = rawDf.select($"value" cast "string" as "json")
      .select(from_json($"json", truckPositionSchema) as "data")
      .selectExpr("data.*",
        "cast(cast(data.timestamp as double) / 1000 as timestamp) as eventTime")

    val jsonFilteredDf = jsonDf.where("data.eventType != 'Normal'")
  • 82. Demo (VIII) – Spark Structured Streaming
    val query = jsonFilteredDf
      .selectExpr("to_json(struct(*)) AS value")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092")
      .option("topic", "dangerous_driving_spark")
      .option("checkpointLocation", "/tmp")
      .start()
  • 84. Summary – Stream processing is the solution for low latency. Event Hub, Stream Data Integration and Stream Analytics are the main building blocks in your architecture. Kafka is currently the de-facto standard for the Event Hub. Various options exist for Stream Data Integration and Stream Analytics. Kafka offers a full-blown streaming platform with support for all three building blocks (although it is not as flexible for Stream Data Integration). It is not yet complete (SQL, Event Pattern Detection, Streaming Machine Learning).
  • 85. Introduction to Stream Processing Technology on its own won't help you. You need to know how to use it properly.