Design Patterns for Streaming
Applications
Advanced Training
© 2018 data Artisans

Our Use Case for the Session
“Suspicious Behaviour” Detection/Reporting
• We have an organization using 3rd-party services:
  • Dropbox for file sharing
  • Google Suite for collaborative editing and email
  • Slack for communication
• We want a platform to detect suspicious behaviour, e.g.:
  • Is anyone sharing confidential docs with competitors?
  • Were there any attempts to log in from outside?
  • ...
• We assume each action (e.g. sharing a doc) creates an event with
userID, itemID, actionType.
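Since the examples in this deck are in Java, here is one possible shape for these events as a Flink-friendly POJO; the field names follow the slide, while the timestamp field is our addition for the event-time examples later on.

public class Event {
    public String userID;
    public String itemID;
    public String actionType;
    public long timestamp;  // epoch millis; used later for event-time windowing

    public Event() {}       // Flink POJOs need a public no-arg constructor
}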
“Suspicious Behaviour” Detection/Reporting
• Requirements for the platform:
  • Detect the behaviour and raise alerts
  • Report the user and the affected items
  • Be able to update the “suspicious behaviour” rules

“Suspicious Behaviour” Detection/Reporting
• The same requirements, restated:
  • Act on the incoming events themselves to extract knowledge
  • Fetch meta-info for incoming events (e.g. userID -> user_data)
  • Guarantee the ability to evolve the logic of the system
These requirements correspond to the 3 patterns we describe in the following.
Time-Based Aggregations
Detect “Suspicious Behaviour”
• Raise an alert when we have:
  • more than 100 failed login attempts within 1 second
  • someone sharing more than 100 files within 1 hour
  • someone logging in from multiple remote locations within 1 second (“magic carpet” travel)
→ Time-Based Aggregations
Blueprint: Time-Based Aggregations
source → windowed aggregation → sink
state: contents of all the in-flight windows
Blueprint: Time-Based Aggregations
Some things to look out for:
• Do I want to window by event time or processing time?
• How do I set my watermark emission strategy?
• How do I handle out-of-order event arrivals?
• If using event time, when is data considered late?
• How do I handle late data?
Blueprint: Time-Based Aggregations
Flink features to look at (covered in basic training):
• Windowing API
• Timestamp assigners/watermark extractors
• Allowed lateness for defining when data is late
• Side output of late data as a special flow path
• ProcessFunction
• CEP
Blueprint: Time-Based Aggregations
kinesis source → extract timestamps/watermarks → windowed aggregation (allowed lateness: 10 min) → write to Elastic
late data → side output → alert real humans
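A minimal sketch of this pipeline, using 2018-era Flink 1.x APIs. The Event POJO is the one sketched earlier; the source, the window functions and the sinks are placeholders, and the 5-second out-of-orderness bound is our assumption.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

// assumes env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
DataStream<Event> events = ...; // e.g. a Kinesis source

// events later than the allowed lateness are routed here instead of being dropped
final OutputTag<Event> lateTag = new OutputTag<Event>("late-data") {};

SingleOutputStreamOperator<Alert> alerts = events
    // event-time timestamps plus watermarks tolerating 5s of out-of-orderness
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(Event e) {
                return e.timestamp;
            }
        })
    .keyBy("userID")
    .timeWindow(Time.hours(1))          // "more than 100 files within 1 hour"
    .allowedLateness(Time.minutes(10))  // late events still update the window
    .sideOutputLateData(lateTag)
    .aggregate(new CountShares(), new AlertIfOver100()); // placeholder functions

alerts.addSink(elasticSink);                             // placeholder sink
alerts.getSideOutput(lateTag).addSink(alertHumansSink);  // placeholder sink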
Data Enrichment
Report who did it / what was affected
• You have the alert, which contains:
  • the userID of the perpetrator
  • the itemID of the item (e.g. doc) that was affected (e.g. shared)
• Report:
  • who is behind the userID
  • what is behind the itemID
→ Data Enrichment
Blueprint: Enriching data with “side input”
filter → enrich
For each incoming element:
• extract a key
• query a DB or KV-store for info on that key
• emit an enriched version of the input element

Naïve approach: synchronous access to the external data store for every element.
Slightly better approach: asynchronous access to the external data store for every element.
Either way, communication delay can dominate application throughput and latency.
Flink’s Async I/O
• Requires:
  ‒ a client to the data store that supports asynchronous calls
• Offers:
  ‒ integration with Flink’s APIs
  ‒ fault tolerance
  ‒ order guarantees for the emitted elements
  ‒ correct time semantics (event/processing time)
Flink’s Async I/O

/** Example async function call. */
DataStream<...> result = AsyncDataStream.(un)orderedWait(
    stream,                       // pre-enriched stream
    new MyAsyncFunction(),        // the function that will query the DB
    1000, TimeUnit.MILLISECONDS,  // timeout for the query to complete
    100);                         // the max number of in-flight requests

unorderedWait: emit results in order of completion
orderedWait: emit results in order of arrival
INVARIANT: watermarks never overtake elements and vice versa
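The slides leave MyAsyncFunction abstract. Here is a minimal sketch of what it could look like against the Flink 1.5+ AsyncFunction interface; Alert, EnrichedAlert, UserData and the async DB client are placeholders for your own types.

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class MyAsyncFunction extends RichAsyncFunction<Alert, EnrichedAlert> {

    private transient AsyncUserDbClient userDb; // hypothetical async client,
                                                // opened in open(), closed in close()

    @Override
    public void asyncInvoke(Alert alert, ResultFuture<EnrichedAlert> resultFuture) {
        // issue the lookup without blocking the operator thread
        CompletableFuture<UserData> lookup = userDb.getUser(alert.userID);
        lookup.whenComplete((userData, err) -> {
            if (err != null) {
                resultFuture.completeExceptionally(err);
            } else {
                resultFuture.complete(
                    Collections.singleton(new EnrichedAlert(alert, userData)));
            }
        });
    }

    @Override
    public void timeout(Alert alert, ResultFuture<EnrichedAlert> resultFuture) {
        // invoked when the 1000 ms timeout from the snippet above fires
        resultFuture.completeExceptionally(new RuntimeException("lookup timed out"));
    }
}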
Blueprint: Enriching data with “side input”
filter → enrich
“Next-level” approach: keep the enrichment data in Flink state itself, by feeding a changelog of the enrichment data in as a second input.
Blueprint: Enriching data with “side input”
• We use ConnectedStreams (see basic training):
  ‒ 1st input stream: the pre-enriched data
  ‒ 2nd input stream: the changelog of the enrichment data
  ‒ key the two streams on the same key, e.g. userID
  ‒ connect() the two keyed streams
  ‒ specify a KeyedCoProcessFunction/CoProcessFunction that:
    • on the changelog side stores the data in Flink’s keyed state
    • on the other side looks up the incoming key in the state and enriches the data accordingly

// the two KeySelectors below must return keys in the same key space
KeyedStream<TypeA, K> keyedPreEnrichedStream = preEnrichedStream.keyBy(...);
KeyedStream<TypeB, K> keyedEnrichmentData = enrichmentData.keyBy(...);

keyedPreEnrichedStream
    .connect(keyedEnrichmentData) // inputs are keyed so...
    .process(myCoProcessFun);     // ...the function can access state and timers
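A minimal sketch of the co-process function described above, assuming placeholder types Alert, UserData and EnrichedAlert; both inputs are keyed by userID, so the keyed state below is per user.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

public class EnrichmentFunction
        extends CoProcessFunction<Alert, UserData, EnrichedAlert> {

    private transient ValueState<UserData> userState;

    @Override
    public void open(Configuration parameters) {
        userState = getRuntimeContext().getState(
            new ValueStateDescriptor<>("user-data", UserData.class));
    }

    // 1st input: the pre-enriched data; look up the enrichment data in state
    @Override
    public void processElement1(Alert alert, Context ctx,
                                Collector<EnrichedAlert> out) throws Exception {
        UserData user = userState.value(); // may be null if no changelog entry yet
        out.collect(new EnrichedAlert(alert, user));
    }

    // 2nd input: the changelog; store the latest version in keyed state
    @Override
    public void processElement2(UserData update, Context ctx,
                                Collector<EnrichedAlert> out) throws Exception {
        userState.update(update);
    }
}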
Blueprint: Enriching data with “side input”
Time to see what we learned:
https://training.data-artisans.com/exercises/eventTimeJoin.html

Dynamic Processing
Evolve the set of rules
• Your organization evolves:
  • More services are added
  • More people are added
  • More departments use your software
  • New types of “suspicious behaviour” emerge and others become obsolete
• Your rule set evolves as well...
→ Dynamic Processing
Blueprint: Dynamic Processing
input → pre-processing → dynamic processing
rules → broadcast stream → broadcast state (read inside the dynamic-processing step)
Dynamic Processing: Broadcast State
Example:
• Stream A (user actions) is partitioned with keyBy; each parallel task keeps keyed state.
• Stream B (rules) is broadcast to every parallel task; each task stores it in broadcast state.
• The two streams are connected, so the broadcast rules can be applied to the keyed user actions.
Dynamic Processing: Broadcast State
REQUIREMENTS
• Partition elements by key
• Access to keyed state
• Broadcast elements
• State to store the broadcast elements
  ‒ Non-keyed
  ‒ Identical on all tasks, even after restoring/rescaling
• Ability to connect the two streams and react to incoming elements
  ‒ Connect a keyed with a non-keyed stream
  ‒ Have access to the respective states
Dynamic Processing: Broadcast State API

// key the actions by user
KeyedStream<Action, UserID> perUserActionStream = actionStream
    .keyBy(new KeySelector<Action, UserID>(...));

// broadcast the rules and create the broadcast state
BroadcastStream<Rules> broadcastRuleStream = ruleStream
    .broadcast(myMapStateDescriptor);

// connect the two streams and apply myFunction
DataStream<...> resultStream = perUserActionStream
    .connect(broadcastRuleStream)
    .process(myFunction);
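One possible definition of myMapStateDescriptor from the snippet above, assuming rules are stored under a String rule name; the Rules type is the placeholder from the slides.

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;

MapStateDescriptor<String, Rules> myMapStateDescriptor =
    new MapStateDescriptor<>(
        "rules",                           // name of the broadcast state
        BasicTypeInfo.STRING_TYPE_INFO,    // key: the rule name
        TypeInformation.of(Rules.class));  // value: the rule itself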
Dynamic Processing: Broadcast State API
• The Broadcast State has a map format (<K,V> pairs)
• The user-defined function is applied on a type of ConnectedStreams:
  ‒ Two “sides”: the broadcast side and the non-broadcast one
  ‒ Special type of CoProcessFunction in two “flavors”:
    • Non-keyed non-broadcast side: BroadcastProcessFunction
    • Keyed non-broadcast side: KeyedBroadcastProcessFunction
Dynamic Processing: Broadcast State API
Focusing on the function
• Depending on whether the non-broadcast stream is keyed:
  ‒ Non-keyed: BroadcastProcessFunction<IN1, IN2, OUT>
    • void processElement(IN1 value, ReadOnlyContext ctx, Collector<OUT> out)
    • void processBroadcastElement(IN2 value, Context ctx, Collector<OUT> out)
  ‒ Keyed: KeyedBroadcastProcessFunction<K, IN1, IN2, OUT>
    • void processElement(IN1 value, KeyedReadOnlyContext ctx, Collector<OUT> out)
    • void processBroadcastElement(IN2 value, KeyedContext ctx, Collector<OUT> out)
    • void onTimer(long timestamp, OnTimerContext ctx, Collector<OUT> out)
Dynamic Processing: Broadcast State API
Non-Keyed Non-Broadcast Side: BroadcastProcessFunction
• Non-keyed non-broadcast side:
  ‒ has read-only access to the broadcast state
• Broadcast side:
  ‒ has read-write access to the broadcast state
  ‒ each parallel task acts independently of the rest
  ‒ there is no communication between parallel tasks
Dynamic Processing: Broadcast State API
Keyed Non-Broadcast Side: KeyedBroadcastProcessFunction
• Keyed non-broadcast side:
  ‒ has read-only access to the broadcast state
  ‒ has access to keyed state
  ‒ can register timers
• Broadcast side:
  ‒ has read-write access to the broadcast state
  ‒ can register a function to be applied to the state of all keys
  ‒ each parallel task acts independently of the rest
  ‒ there is no communication between parallel tasks
Dynamic Processing: Broadcast State API
Focusing on the keyed case: KeyedBroadcastProcessFunction<K, IN1, IN2, OUT> (signatures as above).
Dynamic Processing: Broadcast State API
Keyed Non-Broadcast Side: KeyedBroadcastProcessFunction
• KeyedReadOnlyContext: non-broadcast side (processElement)
  ‒ ReadOnlyBroadcastState<K,V> getBroadcastState(MapStateDescriptor<K,V> stateDesc)
  ‒ TimerService timerService()
• KeyedContext: broadcast side (processBroadcastElement)
  ‒ BroadcastState<K,V> getBroadcastState(MapStateDescriptor<K,V> broadcastStateDesc)
  ‒ void applyToKeyedState(StateDescriptor<S,VS> stateDesc, KeyedStateFunction<KS, S> function)
• OnTimerContext: upon timer firing
  ‒ TimerService timerService()
  ‒ TimeDomain timeDomain()
  ‒ KS getCurrentKey()
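To tie the pattern together, here is a minimal sketch of myFunction from the API snippet, assuming the placeholder types Action, Rules and Alert. Note that in released Flink versions the context parameters are the inner ReadOnlyContext/Context types of KeyedBroadcastProcessFunction rather than the names shown on the slides.

import java.util.Map;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class RuleEvaluator
        extends KeyedBroadcastProcessFunction<String, Action, Rules, Alert> {

    // must be the same descriptor that was passed to ruleStream.broadcast(...)
    public static final MapStateDescriptor<String, Rules> RULES =
        new MapStateDescriptor<>(
            "rules", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(Rules.class));

    // non-broadcast side: evaluate each user action against the current rule set
    @Override
    public void processElement(Action action, ReadOnlyContext ctx,
                               Collector<Alert> out) throws Exception {
        for (Map.Entry<String, Rules> e : ctx.getBroadcastState(RULES).immutableEntries()) {
            if (e.getValue().matches(action)) { // hypothetical rule-matching method
                out.collect(new Alert(action.userID, e.getKey()));
            }
        }
    }

    // broadcast side: update the rule set; every parallel task does this
    // independently, keeping the broadcast state identical everywhere
    @Override
    public void processBroadcastElement(Rules rule, Context ctx,
                                        Collector<Alert> out) throws Exception {
        ctx.getBroadcastState(RULES).put(rule.name, rule); // hypothetical `name` field
    }
}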
Blueprint: Dynamic Processing
Time to see what we learned:
http://training.data-artisans.com/exercises/taxiQuery.html
Closing
Thank you!
aljoscha@apache.org
kkloudas@apache.org
@dataArtisans
@ApacheFlink
We are hiring!
data-artisans.com/careers
More Related Content

What's hot

Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsAlexander Korotkov
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetOwen O'Malley
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on KubernetesDatabricks
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)PRISMA CSI
 
Hive join optimizations
Hive join optimizationsHive join optimizations
Hive join optimizationsSzehon Ho
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Web Services
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
2.apache spark 실습
2.apache spark 실습2.apache spark 실습
2.apache spark 실습동현 강
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System OverviewFlink Forward
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Araf Karsh Hamid
 

What's hot (20)

Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
Presto
PrestoPresto
Presto
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
 
Hive join optimizations
Hive join optimizationsHive join optimizations
Hive join optimizations
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
Cassandra in e-commerce
Cassandra in e-commerceCassandra in e-commerce
Cassandra in e-commerce
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
2.apache spark 실습
2.apache spark 실습2.apache spark 실습
2.apache spark 실습
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics
 

Similar to Advanced Flink Training - Design patterns for streaming applications

Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...Aljoscha Krettek
 
Data Collection and Consumption
Data Collection and ConsumptionData Collection and Consumption
Data Collection and ConsumptionBrian Greig
 
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...confluent
 
From ICT Event Management to Big Data Management - Roberto Raguseo - Codemoti...
From ICT Event Management to Big Data Management - Roberto Raguseo - Codemoti...From ICT Event Management to Big Data Management - Roberto Raguseo - Codemoti...
From ICT Event Management to Big Data Management - Roberto Raguseo - Codemoti...Codemotion
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in MotionRuhani Arora
 
Artificial Intelligence Powered Event Monitoring_4-11-2022.pptx
Artificial Intelligence Powered Event Monitoring_4-11-2022.pptxArtificial Intelligence Powered Event Monitoring_4-11-2022.pptx
Artificial Intelligence Powered Event Monitoring_4-11-2022.pptxPerfomatix Solutions
 
Enabling Event Driven Architecture with PubSub+
Enabling Event Driven Architecture with PubSub+Enabling Event Driven Architecture with PubSub+
Enabling Event Driven Architecture with PubSub+Himanshu Gupta
 
Analytics in Your Enterprise
Analytics in Your EnterpriseAnalytics in Your Enterprise
Analytics in Your EnterpriseWSO2
 
Growing into a proactive Data Platform
Growing into a proactive Data PlatformGrowing into a proactive Data Platform
Growing into a proactive Data PlatformLivePerson
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Data Con LA
 
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...HostedbyConfluent
 
Moving To MicroServices
Moving To MicroServicesMoving To MicroServices
Moving To MicroServicesDavid Walker
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
 
Event-based API Patterns and Practices - AsyncAPI Online Conference
Event-based API Patterns and Practices - AsyncAPI Online ConferenceEvent-based API Patterns and Practices - AsyncAPI Online Conference
Event-based API Patterns and Practices - AsyncAPI Online ConferenceLaunchAny
 
SplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with SplunkSplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with SplunkSplunk
 
Event-Based API Patterns and Practices
Event-Based API Patterns and PracticesEvent-Based API Patterns and Practices
Event-Based API Patterns and PracticesLaunchAny
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for CybersecurityVMware Tanzu
 

Similar to Advanced Flink Training - Design patterns for streaming applications (20)

Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...
 
Data Collection and Consumption
Data Collection and ConsumptionData Collection and Consumption
Data Collection and Consumption
 
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
 
WebAction-Sami Abkay
WebAction-Sami AbkayWebAction-Sami Abkay
WebAction-Sami Abkay
 
From ICT Event Management to Big Data Management - Roberto Raguseo - Codemoti...
From ICT Event Management to Big Data Management - Roberto Raguseo - Codemoti...From ICT Event Management to Big Data Management - Roberto Raguseo - Codemoti...
From ICT Event Management to Big Data Management - Roberto Raguseo - Codemoti...
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
Artificial Intelligence Powered Event Monitoring_4-11-2022.pptx
Artificial Intelligence Powered Event Monitoring_4-11-2022.pptxArtificial Intelligence Powered Event Monitoring_4-11-2022.pptx
Artificial Intelligence Powered Event Monitoring_4-11-2022.pptx
 
Enabling Event Driven Architecture with PubSub+
Enabling Event Driven Architecture with PubSub+Enabling Event Driven Architecture with PubSub+
Enabling Event Driven Architecture with PubSub+
 
Analytics in Your Enterprise
Analytics in Your EnterpriseAnalytics in Your Enterprise
Analytics in Your Enterprise
 
Growing into a proactive Data Platform
Growing into a proactive Data PlatformGrowing into a proactive Data Platform
Growing into a proactive Data Platform
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...
 
Moving To MicroServices
Moving To MicroServicesMoving To MicroServices
Moving To MicroServices
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
Event-based API Patterns and Practices - AsyncAPI Online Conference
Event-based API Patterns and Practices - AsyncAPI Online ConferenceEvent-based API Patterns and Practices - AsyncAPI Online Conference
Event-based API Patterns and Practices - AsyncAPI Online Conference
 
SplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with SplunkSplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with Splunk
 
Event-Based API Patterns and Practices
Event-Based API Patterns and PracticesEvent-Based API Patterns and Practices
Event-Based API Patterns and Practices
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
 
Application Security Logging with Splunk using Java
Application Security Logging with Splunk using JavaApplication Security Logging with Splunk using Java
Application Security Logging with Splunk using Java
 

More from Aljoscha Krettek

Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorAljoscha Krettek
 
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...Aljoscha Krettek
 
The Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data ProcessingThe Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data ProcessingAljoscha Krettek
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®Aljoscha Krettek
 
(Past), Present, and Future of Apache Flink
(Past), Present, and Future of Apache Flink(Past), Present, and Future of Apache Flink
(Past), Present, and Future of Apache FlinkAljoscha Krettek
 
The Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache FlinkThe Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache FlinkAljoscha Krettek
 
Robust stream processing with Apache Flink
Robust stream processing with Apache FlinkRobust stream processing with Apache Flink
Robust stream processing with Apache FlinkAljoscha Krettek
 
Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)Aljoscha Krettek
 
Apache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing EngineApache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing EngineAljoscha Krettek
 
Adventures in Timespace - How Apache Flink Handles Time and Windows
Adventures in Timespace - How Apache Flink Handles Time and WindowsAdventures in Timespace - How Apache Flink Handles Time and Windows
Adventures in Timespace - How Apache Flink Handles Time and WindowsAljoscha Krettek
 
Flink 0.10 - Upcoming Features
Flink 0.10 - Upcoming FeaturesFlink 0.10 - Upcoming Features
Flink 0.10 - Upcoming FeaturesAljoscha Krettek
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Aljoscha Krettek
 

More from Aljoscha Krettek (14)

Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
 
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
Talk Python To Me: Stream Processing in your favourite Language with Beam on ...
 
The Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data ProcessingThe Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data Processing
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®
 
(Past), Present, and Future of Apache Flink
(Past), Present, and Future of Apache Flink(Past), Present, and Future of Apache Flink
(Past), Present, and Future of Apache Flink
 
The Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache FlinkThe Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache Flink
 
Robust stream processing with Apache Flink
Robust stream processing with Apache FlinkRobust stream processing with Apache Flink
Robust stream processing with Apache Flink
 
Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)
 
Apache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing EngineApache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing Engine
 
Adventures in Timespace - How Apache Flink Handles Time and Windows
Adventures in Timespace - How Apache Flink Handles Time and WindowsAdventures in Timespace - How Apache Flink Handles Time and Windows
Adventures in Timespace - How Apache Flink Handles Time and Windows
 
Flink 0.10 - Upcoming Features
Flink 0.10 - Upcoming FeaturesFlink 0.10 - Upcoming Features
Flink 0.10 - Upcoming Features
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
 
Apache Flink Hands-On
Apache Flink Hands-OnApache Flink Hands-On
Apache Flink Hands-On
 

Recently uploaded

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 

Recently uploaded (20)

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 

Advanced Flink Training - Design patterns for streaming applications

  • 1. Design Patterns for Streaming Applications Advanced Training
  • 2. © 2018 data Artisans2 Our Usecase for the Session
  • 3. © 2018 data Artisans3 “Suspicious Behaviour“ Detection/Reporting Your organization • We have an organization using 3rd party services: • Dropbox for file sharing • Google Suite for collaborative editing and emails • Slack for communication • We want a platform to detect suspicious behaviour like: • Does anyone share confidential docs with competitors • Did we have any attempts to login from outside • ... • We assume each action (e.g. share a doc) creates an event with userID, itemID, actionType.
  • 4. © 2018 data Artisans4 “Suspicious Behaviour“ Detection/Reporting Your organization • Requirements for the platform: • Detect the behaviour and raise alerts • Report the user and the affected items • Be able to update the ”suspicious behaviour” rules
  • 5. © 2018 data Artisans5 “Suspicious Behaviour“ Detection/Reporting Your organization • Requirements for the platform: • Detect the behaviour and raise alerts • Report the user and the affected items • Be able to update the ”suspicious behaviour” rules
  • 6. © 2018 data Artisans6 “Suspicious Behaviour“ Detection/Reporting Your organization • Requirements for the platform: • Act on the incoming events themselves to extract knowledge • Report the user and the affected items • Be able to update the ”suspicious behaviour” rules
  • 7. © 2018 data Artisans7 “Suspicious Behaviour“ Detection/Reporting Your organization • Requirements for the platform: • Act on the incoming events themselves to extract knowledge • Report the user and the affected items • Be able to update the ”suspicious behaviour” rules
  • 8. © 2018 data Artisans8 “Suspicious Behaviour“ Detection/Reporting Your organization • Requirements for the platform: • Act on the incoming events themselves to extract knowledge • Fetch meta-info for incoming events (e.g. userID -> user_data) • Be able to update the ”suspicious behaviour” rules
  • 9. © 2018 data Artisans9 “Suspicious Behaviour“ Detection/Reporting Your organization • Requirements for the platform: • Act on the incoming events themselves to extract knowledge • Fetch meta-info for incoming events (e.g. userID -> user_data) • Be able to update the ”suspicious behaviour” rules
  • 10. © 2018 data Artisans10 “Suspicious Behaviour“ Detection/Reporting Your organization • Requirements for the platform: • Act on the incoming events themselves to extract knowledge • Fetch meta-info for incoming events (e.g. userID -> user_data) • Ability to evolve the logic of the system
  • 11. © 2018 data Artisans11 “Suspicious Behaviour“ Detection/Reporting Your organization • Requirements for the platform: • Act on the incoming events themselves to extract knowledge • Fetch meta-info for incoming events (e.g. userID -> user_data) • Guarantee the ability to evolve the logic of the system These requirements represent the 3 patterns we describe in the following
  • 12. © 2018 data Artisans12 Time-based Aggregations
  • 13. © 2018 data Artisans13 Detect “Suspicious Behaviour” Your organization • Raise an alert when we have: • more than 100 failed login attempts within 1 sec • someone sharing more than 100 files within 1 hour • someone logging in from multiple remote locations within 1 sec / Magic carpet travel
  • 14. © 2018 data Artisans14 Detect “Suspicious Behaviour” Your organization • Raise an alert when we have: • more than 100 failed login attempts within 1 sec • someone sharing more than 100 files within 1 hour • someone logging in from multiple remote locations within 1 sec / Magic carpet travel
  • 15. © 2018 data Artisans15 Detect “Suspicious Behaviour” Your organization • Raise an alert when we have: • more than 100 failed login attempts within 1 sec • someone sharing more than 100 files within 1 hour • someone logging in from multiple remote locations within 1 sec / Magic carpet travel Time-Based Aggregations
  • 16. © 2018 data Artisans16 Blueprint: Time-Based Aggregations windowed aggregation source sink state: contents of all the in-flight windows
  • 17. © 2018 data Artisans17 Blueprint: Time-Based Aggregations • Do I want to window by event-time or processing time? • How do I set my watermark emission strategy? • How do I handle out-of-order event arrivals? • If using event-time, when is data considered late? • How to handle late data? Some things to look out for.
  • 18. © 2018 data Artisans18 • Windowing API • Timestamp assigners/watermark extractors • Allowed lateness for defining when data is late • Side output of late data as a special flow path • ProcessFunction • CEP Flink features to look at. Covered in basic training Blueprint: Time-Based Aggregations
  • 19. © 2018 data Artisans19 Blueprint: Time-Based Aggregations windowed aggregation kinesis write to Elastic alert real humanslate data allowed lateness: 10 min extract timestamps/watermarks side output
  • 20. © 2018 data Artisans20 Data Enrichment
  • 21. © 2018 data Artisans21 Report who did it / what was affected Your organization • You have the alert which contains: • the userID of the perpetrator • the itemID of the item (e.g. doc) that was affected (e.g. shared) • Report: • who is behind the userID • what is behind the itemID
  • 22. © 2018 data Artisans22 Report who did it / what was affected Your organization • You have the alert which contains: • the userID of the perpetrator • the itemID of the item (e.g. doc) that was affected (e.g. shared) • Report: • who is behind the userID • what is behind the itemID
  • 23. © 2018 data Artisans23 Report who did it / what was affected Your organization • You have the alert which contains: • the userID of the perpetrator • the itemID of the item (e.g. doc) that was affected (e.g. shared) • Report: • who is behind the userID • what is behind the itemID Data Enrichment
  • 24. © 2018 data Artisans24 Blueprint: Enriching data with “side input” filter enrich
  • 25. © 2018 data Artisans25 Blueprint: Enriching data with “side input” filter enrich For each incoming element: • extract a key • query a DB or KV-store for info on that key • emit an enriched version of the input element
  • 26. © 2018 data Artisans26 Blueprint: Enriching data with “side input” filter enrich Naïve approach synchronous access to external data store for every element
  • 27. © 2018 data Artisans27 Blueprint: Enriching data with “side input” filter enrich Slightly better approach asynchronous access to external data store for every element
  • 28. © 2018 data Artisans28 Blueprint: Enriching data with “side input” Communication delay can dominate application throughput and latency
  • 29. © 2018 data Artisans29 Blueprint: Enriching data with “side input”
  • 30. © 2018 data Artisans30 Blueprint: Enriching data with “side input”. Flink’s Async I/O • Requires: ‒ a client to the data store that supports asynchronous calls • Offers: ‒ integration with Flink’s APIs ‒ fault tolerance ‒ order guarantees for the emitted elements ‒ correct time semantics (event/processing time)
  • 31.–32. © 2018 data Artisans31–32 Blueprint: Enriching data with “side input”. Flink’s Async I/O:
    /** Example async function call; (un)orderedWait stands for unorderedWait or orderedWait. */
    DataStream<...> result = AsyncDataStream.(un)orderedWait(
        stream,                      // pre-enriched stream
        new MyAsyncFunction(),       // the function that will query the DB
        1000, TimeUnit.MILLISECONDS, // timeout for the query to complete
        100);                        // the max number of in-flight requests
    unorderedWait: emit results in order of completion; orderedWait: emit results in order of arrival. INVARIANT: watermarks never overtake elements and vice versa
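A minimal sketch of what MyAsyncFunction above might look like, assuming a hypothetical non-blocking AsyncUserDao client whose lookup() returns a CompletableFuture<User> (the Alert/EnrichedAlert types are assumptions as well):
    import java.util.Collections;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    public class MyAsyncFunction extends RichAsyncFunction<Alert, EnrichedAlert> {

        private transient AsyncUserDao dao; // hypothetical async DB client

        @Override
        public void open(Configuration parameters) {
            dao = new AsyncUserDao();
        }

        @Override
        public void asyncInvoke(Alert alert, ResultFuture<EnrichedAlert> resultFuture) {
            dao.lookup(alert.userId).thenAccept(user ->
                // hand the result back to Flink; emission order follows (un)orderedWait
                resultFuture.complete(
                    Collections.singletonList(new EnrichedAlert(alert, user))));
        }

        @Override
        public void timeout(Alert alert, ResultFuture<EnrichedAlert> resultFuture) {
            // design choice: emit nothing on timeout instead of failing the job
            resultFuture.complete(Collections.emptyList());
        }
    }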
  • 33. © 2018 data Artisans34 Blueprint: Enriching data with “side input”. “Next-level” approach: keep the enrichment data in Flink state itself (filter → enrich, with a changelog input feeding the enrich step)
  • 34. © 2018 data Artisans35 Blueprint: Enriching data with “side input” • We use ConnectedStreams (see Basic Training): ‒ 1st input stream: the pre-enriched data ‒ 2nd input stream: the changelog of the enrichment data ‒ key the two streams on the same key, e.g. userID ‒ connect() the two keyed streams ‒ specify a KeyedCoProcessFunction/CoProcessFunction that: • on the changelog side stores the data in Flink’s keyed state • on the other side looks up the incoming key in the state and enriches the data accordingly
    // the two KeySelectors below must return keys in the same key-space
    KeyedStream<TypeA, K> keyedPreEnrichedStream = preEnrichedStream.keyBy(...);
    KeyedStream<TypeB, K> keyedEnrichmentData = enrichmentData.keyBy(...);
    keyedPreEnrichedStream
        .connect(keyedEnrichmentData) // inputs are keyed so...
        .process(myCoProcessFun);     // ... the function can access state and timers
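A minimal sketch of what myCoProcessFun could look like, assuming hypothetical Alert (pre-enriched), UserUpdate (changelog), User and EnrichedAlert types, with both streams keyed by userID:
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
    import org.apache.flink.util.Collector;

    public class EnrichCoProcessFunction
            extends CoProcessFunction<Alert, UserUpdate, EnrichedAlert> {

        private transient ValueState<User> userState; // latest enrichment data per key

        @Override
        public void open(Configuration parameters) {
            userState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("user", User.class));
        }

        @Override // 1st input: look up this key's state and enrich
        public void processElement1(Alert alert, Context ctx, Collector<EnrichedAlert> out)
                throws Exception {
            // may be null if no changelog entry has arrived for this key yet
            out.collect(new EnrichedAlert(alert, userState.value()));
        }

        @Override // 2nd input (changelog): store the latest data for this key
        public void processElement2(UserUpdate update, Context ctx, Collector<EnrichedAlert> out)
                throws Exception {
            userState.update(update.user);
        }
    }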
  • 35. © 2018 data Artisans36 Blueprint: Enriching data with “side input” Time to see what we learned: https://training.data-artisans.com/exercises/eventTimeJoin.html
  • 36. © 2018 data Artisans37 Dynamic Processing
  • 40. © 2018 data Artisans41 Evolve the set of rules Your organization • Your organization evolves: • More services are added • More people are added • More departments use your software • New types of “suspicious behaviour” emerge and others become obsolete • Your rule set evolves as well... Dynamic Processing
  • 41. © 2018 data Artisans42 Blueprint: Dynamic processing. Diagram: input → pre-processing → dynamic processing (holding the broadcast state); rules input → broadcast stream → broadcast state
  • 42.–48. © 2018 data Artisans43–49 Dynamic Processing: Broadcast State. Example, built up over several slides: Stream A: user actions, partitioned with keyBy and given Keyed State for the partial matches; Stream B: rules, sent to every parallel task with broadcast and stored in Broadcast State; the two streams are then combined with connect
  • 49. © 2018 data Artisans50 Dynamic Processing: Broadcast State. REQUIREMENTS • Partition elements by key • Access to keyed state • Broadcast elements • State to store the broadcast elements ‒ Non-keyed ‒ Identical on all tasks even after restoring/rescaling • Ability to connect the two streams and react to incoming elements ‒ Connect keyed with non-keyed stream ‒ Have access to respective states
  • 51. © 2018 data Artisans52 Dynamic Processing: Broadcast State API
    // key the actions by user
    KeyedStream<Action, UserID> perUserActionStream = actionStream
        .keyBy(new KeySelector<Action, UserID>(...));
    // broadcast the rules and create the broadcast state
    BroadcastStream<Rule> broadcastRuleStream = ruleStream
        .broadcast(myMapStateDescriptor);
    // connect the two streams and apply myFunction
    DataStream<...> resultStream = perUserActionStream
        .connect(broadcastRuleStream)
        .process(myFunction);
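A minimal sketch of the myMapStateDescriptor used above; broadcast state always has a <K,V> map format (next slide), and the String rule name plus the Rule type are assumptions:
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;

    MapStateDescriptor<String, Rule> myMapStateDescriptor =
        new MapStateDescriptor<>(
            "rules",                         // name of the broadcast state
            BasicTypeInfo.STRING_TYPE_INFO,  // key: the rule's name/identifier
            TypeInformation.of(Rule.class)); // value: the rule itself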
  • 52. © 2018 data Artisans53 Dynamic Processing: Broadcast State API • The Broadcast State has a map format (<K,V> pairs) • The user-defined function is applied on a type of ConnectedStreams: ‒ Two “sides”: the broadcast side and the non-broadcast one ‒ Special type of CoProcessFunction in two “flavors”: • Non-keyed non-broadcast side: BroadcastProcessFunction • Keyed non-broadcast side: KeyedBroadcastProcessFunction
  • 56. © 2018 data Artisans57 Dynamic Processing: Broadcast State API. Focusing on the function • Depending on whether the non-broadcast stream is keyed: ‒ Non-keyed: BroadcastProcessFunction<IN1, IN2, OUT> • void processElement(IN1 value, ReadOnlyContext ctx, Collector<OUT> out) • void processBroadcastElement(IN2 value, Context ctx, Collector<OUT> out) ‒ Keyed: KeyedBroadcastProcessFunction<K, IN1, IN2, OUT> • void processElement(IN1 value, KeyedReadOnlyContext ctx, Collector<OUT> out) • void processBroadcastElement(IN2 value, KeyedContext ctx, Collector<OUT> out) • void onTimer(long timestamp, OnTimerContext ctx, Collector<OUT> out)
  • 57. © 2018 data Artisans58 Dynamic Processing: Broadcast State API. Non-Keyed Non-Broadcast Side: BroadcastProcessFunction • Non-Keyed Non-Broadcast side: ‒ has read-only access to the broadcast state • Broadcast side: ‒ has read-write access to the broadcast state ‒ each parallel task acts independently of the rest ‒ there is no communication between parallel tasks
  • 58. © 2018 data Artisans59 Dynamic Processing: Broadcast State API. Keyed Non-Broadcast Side: KeyedBroadcastProcessFunction • Keyed Non-Broadcast side: ‒ has read-only access to the broadcast state ‒ has access to keyed state ‒ can register timers • Broadcast side: ‒ has read-write access to the broadcast state ‒ can register a function to be applied to the state of all keys ‒ each parallel task acts independently of the rest ‒ there is no communication between parallel tasks
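Putting the pieces together, a minimal sketch of myFunction from the earlier API snippet, assuming hypothetical UserID/Action/Rule/Alert types, a Rule.matches(Action) check, and the descriptor sketched above (in the released Flink API, the context parameters are the function's inner ReadOnlyContext/Context types):
    import java.util.Map;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    public class RuleEvaluator
            extends KeyedBroadcastProcessFunction<UserID, Action, Rule, Alert> {

        private final MapStateDescriptor<String, Rule> rulesDesc; // same as passed to broadcast()

        public RuleEvaluator(MapStateDescriptor<String, Rule> rulesDesc) {
            this.rulesDesc = rulesDesc;
        }

        @Override // keyed, non-broadcast side: read-only access to the broadcast state
        public void processElement(Action action, ReadOnlyContext ctx, Collector<Alert> out)
                throws Exception {
            for (Map.Entry<String, Rule> rule :
                    ctx.getBroadcastState(rulesDesc).immutableEntries()) {
                if (rule.getValue().matches(action)) {
                    out.collect(new Alert(action.userId, rule.getKey())); // report user + rule
                }
            }
        }

        @Override // broadcast side: read-write access; must act identically on every task
        public void processBroadcastElement(Rule rule, Context ctx, Collector<Alert> out)
                throws Exception {
            ctx.getBroadcastState(rulesDesc).put(rule.name, rule);
        }
    }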
  • 59. © 2018 data Artisans60 Dynamic Processing: Broadcast State API. Focusing on the keyed case • Depending on whether the non-broadcast stream is keyed: ‒ Non-keyed: BroadcastProcessFunction<IN1, IN2, OUT> • void processElement(IN1 value, ReadOnlyContext ctx, Collector<OUT> out) • void processBroadcastElement(IN2 value, Context ctx, Collector<OUT> out) ‒ Keyed: KeyedBroadcastProcessFunction<K, IN1, IN2, OUT> • void processElement(IN1 value, KeyedReadOnlyContext ctx, Collector<OUT> out) • void processBroadcastElement(IN2 value, KeyedContext ctx, Collector<OUT> out) • void onTimer(long timestamp, OnTimerContext ctx, Collector<OUT> out)
  • 62. © 2018 data Artisans63 Dynamic Processing: Broadcast State API. Keyed Non-Broadcast Side: KeyedBroadcastProcessFunction • KeyedContext: broadcast side ‒ BroadcastState<K,V> getBroadcastState(MapStateDescriptor<K,V> broadcastStateDesc) ‒ void applyToKeyedState(StateDescriptor<S,VS> stateDesc, KeyedStateFunction<KS, S> function) • KeyedReadOnlyContext: non-broadcast side ‒ ReadOnlyBroadcastState<K,V> getBroadcastState(MapStateDescriptor<K,V> stateDesc) ‒ TimerService timerService() • OnTimerContext: upon timer firing ‒ TimerService timerService() ‒ TimeDomain timeDomain() ‒ KS getCurrentKey()
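A minimal sketch of applyToKeyedState(), called from the broadcast side (i.e. inside processBroadcastElement()); the "partial-matches" ListState and the retired-rule scenario are assumptions:
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.runtime.state.KeyedStateFunction;

    // e.g. when a rule is retired, wipe every key's partial matches for it
    ListStateDescriptor<Action> partialMatchesDesc =
        new ListStateDescriptor<>("partial-matches", Action.class);

    ctx.applyToKeyedState(partialMatchesDesc,
        new KeyedStateFunction<UserID, ListState<Action>>() {
            @Override
            public void process(UserID key, ListState<Action> state) throws Exception {
                state.clear(); // drop this key's partial matches for the removed rule
            }
        });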
  • 63. © 2018 data Artisans64 Blueprint: Dynamic processing Time to see what we learned: http://training.data-artisans.com/exercises/taxiQuery.html
  • 64. © 2018 data Artisans65 Closing

Editor's Notes

  1. As a running example for this session, we will use the case of an organization which uses 3rd party software for its day-to-day operations, e.g. ... And given that the organization grows, we want to build a platform for suspicious behavior detection. Each action on these services ...
  2. Focusing on the first, we can rephrase/generalize it by the ... (slide)
  3. Same as the first...
  4. Same as before
  5. Other use cases: give me the number of tweet impressions per tweet for every hour/day/…; calculate the average temperature over 10-minute intervals for each sensor in my warehouse; aggregate user interaction data for my website to display on my internal dashboards
  6. We observe that they all have a time constraint, and this is reasonable as we have a continuous stream of incoming data
  7. All the above fall into the category of time-based aggregations. Now let’s see what Flink offers for these use cases...
  8. Now you have the alert, but you only have the userID and for example the documentID of the doc that was shared. Other Use cases Enrich user events with known user data Add geolocation information to geotagged events
  11. NOTE: The fact that this happens for each incoming element puts the enrichment process in the critical path of your application ...so be careful.
  12. Finally, a more pure stream-y approach is to “connect” your main stream with the stream containing the changelog of your enrichment data, keep the enrichment data in state managed by Flink, and use this state for the actual enrichment, as shown in the figure. Once again, Flink guarantees that state will be fault tolerant.
  18. Other use cases: update of processing rules via a DSL (think dynamic fraud-detection rules/policies); live update of machine learning models
  19. Imagine that we have our stream of user actions. In the figure, this is the top stream of objects of different colors and shapes, with the color representing the userID and the shape, the type of action. Now we want to find pairs of actions of the same user (color) that follow a certain pattern, e.g. a rectangle followed by a triangle (i.e. a login from location A followed by a login from location B). In addition, the set of interesting patterns evolves over time. In this case, we would have our stream of user actions (streamA) and our rules (streamB), and we want to feed these streams into our green operator of parallelism 3, which will detect the matching sequences.
  20. We want the matches to have objects of the same color, so we first partition our data stream by the color of each object, using keyBy on the color. This will give us a keyed stream, where elements are partitioned by color, as shown in the figure.
  21. Then, given that we want to detect pairs of objects, we need to store each matching first element somewhere. Given that our stream is now keyed, we can use Flink’s keyed state for that, as shown in the figure.
  22. Now let’s move on to the second stream, the one containing our rules. We want those rules to be applied to all the objects of streamA, i.e. all the colors. For this, we need to broadcast the rules to all the parallel tasks of our operator.
  23. And, as before, we need to be able to store these rules for future use. This is where the new type of state comes into play, as it allows us to store the elements of a broadcast stream, as shown in the figure.
  24. Now that our operator has the necessary data from both streams (data and rules), it needs to be able to connect these two streams, i.e. the one side needs to be able to ”see” the state of the other.
  25. This is done so that when the yellow triangle arrives, the rules will be read from the broadcast state, the already received yellow rectangle will be read from the keyed state, and each rule is going to be evaluated.
  26. The grey requirements are the ones that Flink already offers by default, without Broadcast State.
  27. KEYBY: ... BROADCAST: Then for the rule stream, we will do a broadcast with a MapStateDescriptor. This will broadcast the elements in the stream to all downstream tasks and create the state to store them. As we will also see later, broadcast state has a map format, so it stores pairs of a key associated with a value. In this case we can have a String representing the name of the rule (or an identifier) and a list of all the currently accepted but not matched elements. CONNECT: Finally, we will connect the keyed stream with the non-keyed, broadcast stream, and we will call process() on the result with the function containing our matching logic.
  28. As said earlier, we use a MapStateDescriptor in the broadcast() command, as the Broadcast State has a map format. In addition, as shown in the previous slide, we connect() the keyed stream with the broadcast stream. For those of you familiar with Flink’s APIs, this means that your function will have “two sides”, each describing how to react to an incoming element from one of the two streams. In the case of the Broadcast State pattern, the broadcast side has read-write access to the broadcast state, while the non-broadcast side has only read access. The reason for that is that ... Broadcast state does NOT mean that whatever your function does on one parallel instance (task) gets sent to all other parallel instances. So make sure that your computation on an element is the same across all instances.
  29. ... Finally, the KeyedBroadcastProcessFunction has the onTimer() method, which contains the logic to execute when a timer fires. As we will see later, when operating on the keyed side of our KeyedBroadcastProcessFunction, we have access to an internal TimerService, which allows us to register timers in either event or processing time. This is also aligned with the ProcessFunction and KeyedProcessFunction offered by Flink.
  30. To guarantee that the contents of the Broadcast State are the same across all parallel instances of our operator, we give read-write access only to the broadcast side, and we require the computation on each incoming element to be identical across all tasks.
  31. On the non-broadcast side, apart from having access to the broadcast state, processElement() can do the same things as in the normal ProcessFunction.
  32. And all of them can emit to side outputs, ask for the timestamp of the element, the current processing time, and the current watermark.
  36. (Keep this slide up during the Q&A part of your talk. Having this up in the final 5-10 minutes of the session gives the audience something useful to look at.)