Dawid Wysakowicz, SOFTWARE ENGINEER
Detecting Patterns in Event Streams with Flink SQL
© 2018 data Artisans
ABOUT DATA ARTISANS
Original Creators of
Apache Flink®
Enterprise Ready
Real Time Stream Processing
FREE TRIAL DOWNLOAD
data-artisans.com/download
Why do people like SQL?
• Well known interface: "lingua franca of data processing"
• No programming required - easier to learn
• Declarative way to express your query, WHAT not HOW
• Out-of-the box optimization
• Reusability
‒ built-in functions
‒ user defined functions
SQL on Streams
● Unified semantics for stream and batch input
○ Same query, same input, same result
○ A stream is converted into a dynamic table
○ A continuous query on the dynamic table produces a result table
○ The result table is converted back into a stream
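The stream↔table duality above can be sketched in a few lines of plain Python (a conceptual illustration only, not Flink code): each incoming event updates a materialized result table, and every change to that table is emitted back as a changelog stream.

```python
# Conceptual sketch of the dynamic-table model (NOT Flink's implementation):
# an event stream is materialized into a result table by a continuous
# query, and each change to the table is emitted as a stream of updates.

def continuous_count(stream):
    """Continuously count events per key, emitting every table update."""
    result = {}      # the dynamic result table
    changelog = []   # the stream derived from the result table
    for key in stream:
        result[key] = result.get(key, 0) + 1
        changelog.append((key, result[key]))  # upsert-style change record
    return result, changelog

table, updates = continuous_count(["a", "b", "a"])
```

On the input stream `["a", "b", "a"]` the result table ends up as `{"a": 2, "b": 1}` with updates `[("a", 1), ("b", 1), ("a", 2)]` — the same result a batch query would produce on the same input, which is the unified-semantics point above.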
The New York Taxi Rides Data Set
• The New York City & Limousine Commission provides a public data set about
past taxi rides in New York City
• We can derive a streaming table from the data
• Table: TaxiRides
rideId: BIGINT     // ID of the taxi ride
taxiId: BIGINT     // ID of the taxi cab
driverId: BIGINT   // ID of the driver
isStart: BOOLEAN   // flag for pick-up (true) or drop-off (false) event
lon: DOUBLE        // longitude of pick-up or drop-off location
lat: DOUBLE        // latitude of pick-up or drop-off location
rowTime: TIMESTAMP // time of pick-up or drop-off event
Number of rides per 30 minutes per area
(chart: number of rides over time)
SQL Example
SELECT
  toCellId(lat, lon) AS cellId,
  COUNT(DISTINCT rideId) AS rideCount,
  TUMBLE_ROWTIME(rowTime, INTERVAL '30' MINUTE) AS rowTime,
  CAST(TUMBLE_START(rowTime, INTERVAL '30' MINUTE) AS TIMESTAMP) AS startTime,
  CAST(TUMBLE_END(rowTime, INTERVAL '30' MINUTE) AS TIMESTAMP) AS endTime
FROM TaxiRides
GROUP BY
  toCellId(lat, lon),
  TUMBLE(rowTime, INTERVAL '30' MINUTE)
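What the tumbling-window query computes can be sketched in plain Python (illustrative only, not Flink): bucket each ride event into its 30-minute window per cell and count distinct ride IDs. The `to_cell_id` helper is a hypothetical stand-in for the talk's `toCellId` UDF.

```python
# Sketch of the tumbling-window aggregation above (NOT Flink code).
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)

def to_cell_id(lat, lon):
    # hypothetical stand-in for the toCellId UDF: a coarse 0.01-degree grid
    return (round(lat, 2), round(lon, 2))

def tumbling_ride_counts(events):
    """events: (ride_id, lat, lon, row_time) tuples.
    Returns {(cell, window_start): distinct ride count}."""
    buckets = defaultdict(set)
    for ride_id, lat, lon, row_time in events:
        # align the timestamp to the start of its 30-minute window
        start = datetime.min + ((row_time - datetime.min) // WINDOW) * WINDOW
        buckets[(to_cell_id(lat, lon), start)].add(ride_id)
    return {k: len(v) for k, v in buckets.items()}
```

Two events of the same ride in one window count once (`COUNT(DISTINCT rideId)`); an event 40 minutes later falls into the next window.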
SQL Support in Flink 1.6
• SELECT FROM WHERE
• GROUP BY / HAVING
‒ Non-windowed, TUMBLE, HOP, SESSION windows
• JOIN
‒ Windowed INNER, LEFT / RIGHT / FULL OUTER JOIN
‒ Non-windowed INNER / LEFT / RIGHT / FULL OUTER JOIN
• Scalar, aggregation, table-valued UDFs
• Many built-in functions
• [streaming only] OVER / WINDOW
‒ UNBOUNDED / BOUNDED PRECEDING
12:20-13:00 Kesselhaus
Timo Walther, data Artisans
Flink SQL in Action
What is hard with SQL?
• Find rides with mid-stops
Mid stops
Pattern.<Row>begin("S").where((row) -> {
    return row.isStart == true;
}).next("E").where((row) -> {
    return row.isStart == true;
});

CEP.pattern(input.keyBy("driverId"), pattern)
    .flatSelect(
        new PatternFlatSelectFunction<Row, Row>() {
            @Override
            public void flatSelect(
                    Map<String, List<Row>> pattern,
                    Collector<Row> out) throws Exception {
                out.collect(pattern.get("S").get(0)); // emit the matched S event (its rideId)
            }
        });
MATCH_RECOGNIZE
SQL:2016 extension
Common use-cases
• stock market analysis
• customer behaviour
• tracking money laundering
• service quality
• network intrusion detection
Syntax
tableReference:
tablePrimary
[ matchRecognize ]
[ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]
select:
SELECT [ STREAM ] [ ALL | DISTINCT ]
{ * | projectItem [, projectItem ]* }
FROM tableExpression
[matchRecognize]
[ WHERE booleanExpression ]
[ GROUP BY { groupItem [, groupItem ]* } ]
….
SELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
PARTITION BY driverId
ORDER BY rowTime
MEASURES
S.rideId as sRideId
AFTER MATCH SKIP PAST LAST ROW
PATTERN (S E)
DEFINE
S AS S.isStart = true,
E AS E.isStart = true
)
Rides with mid-stops
partition the data by a given field (= keyBy)
Rides with mid-stops
SELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
PARTITION BY driverId
ORDER BY rowTime
MEASURES
S.rideId as sRideId
AFTER MATCH SKIP PAST LAST ROW
PATTERN (S E)
DEFINE
S AS S.isStart = true,
E AS E.isStart = true
)
● specify the order of events
● the primary order must be event time or processing time
SELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
PARTITION BY driverId
ORDER BY rowTime
MEASURES
S.rideId as sRideId
AFTER MATCH SKIP PAST LAST ROW
PATTERN (S E)
DEFINE
S AS S.isStart = true,
E AS E.isStart = true
)
Rides with mid-stops
construct the pattern
SELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
PARTITION BY driverId
ORDER BY rowTime
MEASURES
S.rideId as sRideId
AFTER MATCH SKIP PAST LAST ROW
PATTERN (S E)
DEFINE
S AS S.isStart = true,
E AS E.isStart = true
)
Rides with mid-stops
extract measures from the matched sequence
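The semantics of this query can be sketched in plain Python (illustrative only, not Flink's NFA-based matcher): scan one driver's events in rowTime order; every pair of consecutive pick-up events is an (S E) match, and AFTER MATCH SKIP PAST LAST ROW resumes the scan after the matched pair.

```python
# Sketch of PATTERN (S E) with AFTER MATCH SKIP PAST LAST ROW
# over ONE driver's partition (NOT Flink code).
def find_mid_stop_rides(events):
    """events: list of (ride_id, is_start), ordered by rowTime.
    Returns S.rideId for every (S E) match."""
    matches = []
    i = 0
    while i < len(events) - 1:
        s, e = events[i], events[i + 1]
        if s[1] and e[1]:             # S.isStart = true AND E.isStart = true
            matches.append(s[0])      # MEASURES S.rideId AS sRideId
            i += 2                    # SKIP PAST LAST ROW: resume after the match
        else:
            i += 1
    return matches
```

Note how the skip strategy matters: on four consecutive pick-ups the sketch reports two non-overlapping matches rather than three overlapping ones.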
Multi-Stop
Rides with more than one mid-stop
SELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
PARTITION BY driverId
ORDER BY rowTime
MEASURES
S.rideId as sRideId
AFTER MATCH SKIP PAST LAST ROW
PATTERN (S M{2,} E)
DEFINE
S AS S.isStart = true,
M AS M.rideId <> S.rideId,
E AS E.isStart = false
AND E.rideId = S.rideId
)
SELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
PARTITION BY driverId
ORDER BY rowTime
MEASURES
S.rideId as sRideId
AFTER MATCH SKIP PAST LAST ROW
PATTERN (S E)
DEFINE
S AS S.isStart = true,
E AS E.isStart = true
)
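The PATTERN clause behaves like a regular expression over classified rows. Encoding each row as one character (S, M, or E) lets Python's `re` module illustrate what `(S M{2,} E)` accepts versus the two-event `(S E)` pattern — a didactic sketch, not how Flink executes the query.

```python
# Sketch: PATTERN (S M{2,} E) as a regex over per-row classifications.
import re

MULTI_STOP = re.compile(r"SM{2,}E")

def classify(symbols):
    """symbols: string of per-row classifications, e.g. 'SMMME'.
    True iff the whole sequence matches S M{2,} E."""
    return MULTI_STOP.fullmatch(symbols) is not None
```

A ride with only one mid-row ('SME') or none ('SE') is rejected, while two or more ('SMME', 'SMMMME') match.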
Rush (peak) hours - V shape
(chart: number of rides over time)
Statistics per Area
CREATE VIEW RidesInArea AS
SELECT
  toCellId(lat, lon) AS cellId,
  COUNT(DISTINCT rideId) AS rideCount,
  TUMBLE_ROWTIME(rowTime, INTERVAL '30' MINUTE) AS rowTime,
  CAST(TUMBLE_START(rowTime, INTERVAL '30' MINUTE) AS TIMESTAMP) AS startTime,
  CAST(TUMBLE_END(rowTime, INTERVAL '30' MINUTE) AS TIMESTAMP) AS endTime
FROM TaxiRides
GROUP BY
  toCellId(lat, lon),
  TUMBLE(rowTime, INTERVAL '30' MINUTE)
Rush (peak) hours
SELECT * FROM RidesInArea
MATCH_RECOGNIZE(
PARTITION BY cellId
MEASURES
FIRST(UP.rideCount) AS preCnt,
LAST(UP.rideCount) AS rushCnt, FIRST(UP.startTime) as rushStart,
LAST(UP.endTime) as rushEnd,
LAST(DOWN.rideCount) AS postCnt
AFTER MATCH SKIP PAST LAST ROW
PATTERN (UP{4,} DOWN{2,} E)
DEFINE
  UP AS UP.rideCount > PREV(UP.rideCount) OR PREV(UP.rideCount) IS NULL,
  DOWN AS DOWN.rideCount < PREV(DOWN.rideCount) AND DOWN.rideCount < LAST(UP.rideCount),
  E AS E.rideCount > LAST(DOWN.rideCount)
)
Use previous table/view
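What `PATTERN (UP{4,} DOWN{2,} E)` detects on one cell's ride counts can be sketched in plain Python (illustrative only; Flink's matcher works differently): a run of at least four increasing counts, then at least two decreasing counts, then a rebound, skipping past each match.

```python
# Sketch of the rush-hour peak pattern on one cell's window counts
# (NOT Flink code).
def find_rush_hours(counts):
    """counts: rideCount per 30-min window. Returns (start, peak, end)
    index triples, skipping past each match (SKIP PAST LAST ROW)."""
    matches, i = [], 0
    while i < len(counts):
        j = i
        while j + 1 < len(counts) and counts[j + 1] > counts[j]:
            j += 1                                # extend the UP run
        up_len = j - i + 1                        # first row counts as UP (PREV IS NULL)
        k = j
        while k + 1 < len(counts) and counts[k + 1] < counts[k]:
            k += 1                                # extend the DOWN run
        down_len = k - j
        # E: the next row must rebound above the last DOWN value
        if up_len >= 4 and down_len >= 2 and k + 1 < len(counts) and counts[k + 1] > counts[k]:
            matches.append((i, j, k + 1))
            i = k + 2                             # skip past the match
        else:
            i += 1
    return matches
```

On counts like `[1, 2, 3, 5, 8, 6, 4, 7]` the sketch reports one peak: rise over indices 0–4, fall over 5–6, rebound at 7.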
Rush (peak) hours
SELECT * FROM RidesInArea
MATCH_RECOGNIZE(
PARTITION BY cellId
MEASURES
FIRST(UP.rideCount) AS preCnt,
LAST(UP.rideCount) AS rushCnt, FIRST(UP.startTime) as rushStart,
LAST(UP.endTime) as rushEnd,
LAST(DOWN.rideCount) AS postCnt
AFTER MATCH SKIP PAST LAST ROW
PATTERN (UP{4,} DOWN{2,} E)
DEFINE
  UP AS UP.rideCount > PREV(UP.rideCount) OR PREV(UP.rideCount) IS NULL,
  DOWN AS DOWN.rideCount < PREV(DOWN.rideCount) AND DOWN.rideCount < LAST(UP.rideCount),
  E AS E.rideCount > LAST(DOWN.rideCount)
)
apply match to the result of the inner query
Use previous table/view
Rush (peak) hours
SELECT * FROM RidesInArea
MATCH_RECOGNIZE(
PARTITION BY cellId
MEASURES
FIRST(UP.rideCount) AS preCnt,
LAST(UP.rideCount) AS rushCnt, FIRST(UP.startTime) as rushStart,
LAST(UP.endTime) as rushEnd,
LAST(DOWN.rideCount) AS postCnt
AFTER MATCH SKIP PAST LAST ROW
PATTERN (UP{4,} DOWN{2,} E)
DEFINE
  UP AS UP.rideCount > PREV(UP.rideCount) OR PREV(UP.rideCount) IS NULL,
  DOWN AS DOWN.rideCount < PREV(DOWN.rideCount) AND DOWN.rideCount < LAST(UP.rideCount),
  E AS E.rideCount > LAST(DOWN.rideCount)
)
access elements of the looping pattern
Feature set of MATCH_RECOGNIZE
(Experimental)
• Quantifier support:
‒ + (one or more), * (zero or more), {x,y} (times)
‒ greedy (default), ? (reluctant)
• with some restrictions (not working for the last pattern variable)
• AFTER MATCH SKIP strategies:
‒ SKIP TO FIRST/LAST, SKIP PAST LAST ROW, SKIP TO NEXT ROW
• Not supported:
‒ alternation (|), PERMUTE, exclusion '{- -}'
‒ aggregates within MATCH_RECOGNIZE
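The greedy/reluctant distinction mirrors regular-expression quantifiers, which Python's `re` module can illustrate (a didactic sketch over row classifications; not Flink code): a greedy quantifier grabs as many rows as it can, a reluctant one stops as soon as the rest of the pattern can match.

```python
# Sketch: greedy vs. reluctant quantifiers over row classifications.
import re

# greedy A{2,}: consumes all four A-rows before matching B
greedy = re.match(r"(A{2,})B", "AAAAB")

# reluctant A{2,}?: consumes the minimum (two A-rows) and lets the
# rest of the pattern absorb the remainder
reluctant = re.match(r"(A{2,}?)A*B", "AAAAB")
```

Against the same input, the greedy variant binds four rows to the quantified variable, the reluctant one only two.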
© 2018 data Artisans29
Summary
● opens up new use cases that rely on the order of events
● enables reuse of SQL's strengths (built-in functions, UDFs, optimization)
● still in an experimental phase
○ https://github.com/dawidwys/flink/SQL_MATCH_RECOGNIZE
THANK YOU!
@dwysakowicz
@dataArtisans
@ApacheFlink
WE ARE HIRING
data-artisans.com/careers
Under the hood
Dissecting MATCH_RECOGNIZE
Pattern.<Row>begin("UP", SKIP_PAST_LAST).timesOrMore(4).where((row, ctx) -> {
    Long prevRideCount = ctx.get("UP").tail().get("rideCount");
    return prevRideCount == null || row.get("rideCount") > prevRideCount;
}).next("DOWN").timesOrMore(2).where((row, ctx) -> {
    ...
}).next("E").where((row, ctx) -> {
    ...
});

CEP.pattern(input.keyBy("cellId"), pattern).flatSelect(
    new PatternFlatSelectFunction<Row, Row>() {
        @Override
        public void flatSelect(Map<String, List<Row>> pattern,
                Collector<Row> out) throws Exception {
            Row last = pattern.get("UP").tail();
            Row first = pattern.get("UP").head();
            out.collect(Row.of(...));
        }
    });

SELECT * FROM TaxiStatistics
MATCH_RECOGNIZE (
  PARTITION BY cellId
  MEASURES
    LAST(UP.rideCount) AS rushCnt,
    FIRST(UP.startTime) AS rushStart,
    LAST(UP.endTime) AS rushEnd
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (UP{4,} DOWN{2,} E)
  DEFINE
    UP AS UP.rideCount > PREV(UP.rideCount) OR PREV(UP.rideCount) IS NULL,
    DOWN AS DOWN.rideCount < LAST(UP.rideCount),
    E AS E.rideCount > LAST(DOWN.rideCount)
)
Flink Forward Berlin 2018: Dawid Wysakowicz - "Detecting Patterns in Event Streams with Flink SQL"
  • 1. Dawid Wysakowicz, SOFTWARE ENGINEER Detecting Patterns in Event Streams with Flink SQL
  • 2. © 2018 data Artisans2 ABOUT DATA ARTISANS Original Creators of Apache Flink® Enterprise Ready Real Time Stream Processing
  • 3. © 2018 data Artisans3 FREE TRIAL DOWNLOAD data-artisans.com/download
  • 4. © 2018 data Artisans4 Why people like SQL? • Well known interface: "lingua franca of data processing" • No programming required - easier to learn • Declarative way to express your query, WHAT not HOW • Out-of-the box optimization • Reusability ‒ built-in functions ‒ user defined functions
  • 5. © 2018 data Artisans5 SQL on Streams ● Unified semantics for stream and batch input ○ Same query, same input, same result Stream is converted into dynamic table Dynamic table is converted into stream Continuous query on dynamic table produces result d table
  • 6. © 2018 data Artisans6 The New York Taxi Rides Data Set • The New York City & Limousine Commission provides a public data set about past taxi rides in New York City • We can derive a streaming table from the data • Table: TaxiRides rideId: BIGINT // ID of the taxi ride taxiId: BIGINT // ID of the taxi cab driverId: BIGINT // ID of the drive isStart: BOOLEAN // flag for pick-up (true) or drop-off (false) event lon: DOUBLE // longitude of pick-up or drop-off location lat: DOUBLE // latitude of pick-up or drop- off location rowTime: TIMESTAMP // time of pick-up or drop-off event
  • 7. © 2018 data Artisans7 Number of rides per 30 minutes per area Time Number of rides
  • 8. © 2018 data Artisans8 SQL Example SELECT toCellId(lat, lon) as cellId, COUNT(distinct rideId) as rideCount, TUMBLE_ROWTIME(rowTime, INTERVAL '30' minute) AS rowTime, cast (TUMBLE_START(rowTime, INTERVAL '30' minute) as TIMESTAMP) AS startTime, cast(TUMBLE_END(rowTime, INTERVAL '30' minute) as TIMESTAMP) AS endTime FROM TaxiRides GROUP BY toCellId(lat, lon), TUMBLE(rowTime, INTERVAL '30' minute)
  • 9. © 2018 data Artisans9 SQL Support in Flink 1.6 • SELECT FROM WHERE • GROUP BY / HAVING ‒ Non-windowed, TUMBLE, HOP, SESSION windows • JOIN ‒ Windowed INNER, LEFT / RIGHT / FULL OUTER JOIN ‒ Non-windowed INNER / LEFT / RIGHT / FULL OUTER JOIN • Scalar, aggregation, table-valued UDFs • Many built-in functions • [streaming only] OVER / WINDOW ‒ UNBOUNDED / BOUNDED PRECEDING
  • 10. © 2018 data Artisans10 SQL Support in Flink 1.6 • SELECT FROM WHERE • GROUP BY / HAVING ‒ Non-windowed, TUMBLE, HOP, SESSION windows • JOIN ‒ Windowed INNER, LEFT / RIGHT / FULL OUTER JOIN ‒ Non-windowed INNER / LEFT / RIGHT / FULL OUTER JOIN • Scalar, aggregation, table-valued UDFs • Many built-in functions • [streaming only] OVER / WINDOW ‒ UNBOUNDED / BOUNDED PRECEDING 12:20-13:00 Kesselhaus Timo Walther, data Artisans Flink SQL in Action
  • 11. © 2018 data Artisans11 • Find rides with mid-stops What is hard with SQL?
  • 12. © 2018 data Artisans12 Mid stops Pattern.<Row>begin("S”).where((row) -> { return row.isStart == true; }).next("E").where( (row) -> { return row.isStart == true; }); CEP.pattern(input.keyBy(“driverId”), pattern) .flatSelect( new PatternFlatSelectFunction<Row, Row>() { @Override public void flatSelect( Map<String, List<Row>> pattern, Collector<Row> out) throws Exception { out.collect(( pattern.get(“S”).get(0).getRideId )); } );
  • 13. © 2018 data Artisans13 MATCH_RECOGNIZE SQL:2016 extension
  • 14. © 2018 data Artisans14 Common use-cases • stock market analysis • customer behaviour • tracking money laundering • service quality • network intrusion detection
  • 15. © 2018 data Artisans15 Syntax tableReference: tablePrimary [ matchRecognize ] [ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ] select: SELECT [ STREAM ] [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* } FROM tableExpression [matchRecognize] [ WHERE booleanExpression ] [ GROUP BY { groupItem [, groupItem ]* } ] ….
  • 16. © 2018 data Artisans16 SELECT * FROM TaxiRides MATCH_RECOGNIZE ( PARTITION BY driverId ORDER BY rowTime MEASURES S.rideId as sRideId AFTER MATCH SKIP PAST LAST ROW PATTERN (S E) DEFINE S AS S.isStart = true, E AS E.isStart = true ) Rides with mid-stops partition the data by given field = keyBy
  • 17. © 2018 data Artisans17 Rides with mid-stops SELECT * FROM TaxiRides MATCH_RECOGNIZE ( PARTITION BY driverId ORDER BY rowTime MEASURES S.rideId as sRideId AFTER MATCH SKIP PAST LAST ROW PATTERN (S E) DEFINE S AS S.isStart = true, E AS E.isStart = true ) ● specify order ● primary order = event time or processing time
  • 18. © 2018 data Artisans18 SELECT * FROM TaxiRides MATCH_RECOGNIZE ( PARTITION BY driverId ORDER BY rowTime MEASURES S.rideId as sRideId AFTER MATCH SKIP PAST LAST ROW PATTERN (S E) DEFINE S AS S.isStart = true, E AS E.isStart = true ) Rides with mid-stops construct pattern S E
  • 19. © 2018 data Artisans19 SELECT * FROM TaxiRides MATCH_RECOGNIZE ( PARTITION BY driverId ORDER BY rowTime MEASURES S.rideId as sRideId AFTER MATCH SKIP PAST LAST ROW PATTERN (S E) DEFINE S AS S.isStart = true, E AS E.isStart = true ) Rides with mid-stops extract measures from matched sequence S E
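Outside Flink, the semantics of this pattern can be sketched in a few lines of Python: two consecutive start events for the same driver signal a ride with a mid-stop. This is a rough simulation of PATTERN (S E) with AFTER MATCH SKIP PAST LAST ROW over a keyed stream, not the Flink CEP runtime:

```python
def find_mid_stops(events):
    """events: list of (driverId, rideId, isStart) in time order.
    Emits the rideId of the first start whenever a driver produces two
    consecutive start events (strict contiguity, as with CEP's next())."""
    pending = {}  # driverId -> rideId of an unmatched start event (candidate S)
    matches = []
    for driver, ride, is_start in events:
        if not is_start:
            pending.pop(driver, None)  # a drop-off breaks the S-E contiguity
            continue
        if driver in pending:
            # S followed directly by another start: mid-stop detected.
            # SKIP PAST LAST ROW: the E event does not seed a new match.
            matches.append(pending.pop(driver))
        else:
            pending[driver] = ride
    return matches
```

The per-driver dictionary plays the role of PARTITION BY driverId (keyBy in the DataStream API).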
  • 20. © 2018 data Artisans20 Multi-Stop
  • 21. © 2018 data Artisans21 Rides with more than one mid-stop SELECT * FROM TaxiRides MATCH_RECOGNIZE ( PARTITION BY driverId ORDER BY rowTime MEASURES S.rideId as sRideId AFTER MATCH SKIP PAST LAST ROW PATTERN (S M{2,} E) DEFINE S AS S.isStart = true, M AS M.rideId <> S.rideId, E AS E.isStart = false AND E.rideId = S.rideId ) SELECT * FROM TaxiRides MATCH_RECOGNIZE ( PARTITION BY driverId ORDER BY rowTime MEASURES S.rideId as sRideId AFTER MATCH SKIP PAST LAST ROW PATTERN (S E) DEFINE S AS S.isStart = true, E AS E.isStart = true )
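The PATTERN clause follows regular-expression semantics over rows. As a rough analogy (not how Flink executes the query), PATTERN (S M{2,} E) behaves like the regex below once each row in a driver's partition is reduced to a single symbol:

```python
import re

# One symbol per row in a single driver's partition:
# 's' = pick-up of the tracked ride, 'm' = event of a different ride,
# 'e' = drop-off of the tracked ride.
MULTI_STOP = re.compile(r"sm{2,}e")  # analogue of PATTERN (S M{2,} E)

def count_multi_stop_matches(symbols):
    # findall is non-overlapping, mirroring AFTER MATCH SKIP PAST LAST ROW
    return len(MULTI_STOP.findall(symbols))
```

The DEFINE clause is what assigns a symbol to each row; the regex engine then does the same job as the pattern automaton.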
  • 22. © 2018 data Artisans22 Rush (peak) hours - V shape (chart: number of rides over time)
  • 23. © 2018 data Artisans23 Statistics per Area CREATE VIEW RidesInArea AS SELECT toCellId(lat, lon) AS cellId, COUNT(DISTINCT rideId) AS rideCount, TUMBLE_ROWTIME(rowTime, INTERVAL '30' MINUTE) AS rowTime, CAST(TUMBLE_START(rowTime, INTERVAL '30' MINUTE) AS TIMESTAMP) AS startTime, CAST(TUMBLE_END(rowTime, INTERVAL '30' MINUTE) AS TIMESTAMP) AS endTime FROM TaxiRides GROUP BY toCellId(lat, lon), TUMBLE(rowTime, INTERVAL '30' MINUTE)
  • 24. © 2018 data Artisans24 Rush (peak) hours SELECT * FROM RidesInArea MATCH_RECOGNIZE( PARTITION BY cellId MEASURES FIRST(UP.rideCount) AS preCnt, LAST(UP.rideCount) AS rushCnt, FIRST(UP.startTime) AS rushStart, LAST(UP.endTime) AS rushEnd, LAST(DOWN.rideCount) AS postCnt AFTER MATCH SKIP PAST LAST ROW PATTERN (UP{4,} DOWN{2,} E) DEFINE UP AS UP.rideCount > PREV(UP.rideCount) OR PREV(UP.rideCount) IS NULL, DOWN AS DOWN.rideCount < PREV(DOWN.rideCount) AND DOWN.rideCount < LAST(UP.rideCount), E AS E.rideCount > LAST(DOWN.rideCount) ) Use previous table/view
  • 25. © 2018 data Artisans25 Rush (peak) hours SELECT * FROM RidesInArea MATCH_RECOGNIZE( PARTITION BY cellId MEASURES FIRST(UP.rideCount) AS preCnt, LAST(UP.rideCount) AS rushCnt, FIRST(UP.startTime) AS rushStart, LAST(UP.endTime) AS rushEnd, LAST(DOWN.rideCount) AS postCnt AFTER MATCH SKIP PAST LAST ROW PATTERN (UP{4,} DOWN{2,} E) DEFINE UP AS UP.rideCount > PREV(UP.rideCount) OR PREV(UP.rideCount) IS NULL, DOWN AS DOWN.rideCount < PREV(DOWN.rideCount) AND DOWN.rideCount < LAST(UP.rideCount), E AS E.rideCount > LAST(DOWN.rideCount) ) apply match to result of the inner query
  • 26. © 2018 data Artisans26 Rush (peak) hours SELECT * FROM RidesInArea MATCH_RECOGNIZE( PARTITION BY cellId MEASURES FIRST(UP.rideCount) AS preCnt, LAST(UP.rideCount) AS rushCnt, FIRST(UP.startTime) AS rushStart, LAST(UP.endTime) AS rushEnd, LAST(DOWN.rideCount) AS postCnt AFTER MATCH SKIP PAST LAST ROW PATTERN (UP{4,} DOWN{2,} E) DEFINE UP AS UP.rideCount > PREV(UP.rideCount) OR PREV(UP.rideCount) IS NULL, DOWN AS DOWN.rideCount < PREV(DOWN.rideCount) AND DOWN.rideCount < LAST(UP.rideCount), E AS E.rideCount > LAST(DOWN.rideCount) ) access elements of looping pattern
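The V-shape detection above can be mimicked in plain Python over one cell's ordered window counts. This is a greedy simulation of PATTERN (UP{4,} DOWN{2,} E) under the assumptions stated in the comments, not the Flink CEP engine:

```python
def find_v_shapes(counts):
    """counts: ride counts of consecutive 30-minute windows for one cell.
    Greedy simulation of PATTERN (UP{4,} DOWN{2,} E): at least 4 rising
    windows, at least 2 falling windows below the peak, then one rising
    window. Returns (preCnt, rushCnt, postCnt) per match."""
    matches = []
    i, n = 0, len(counts)
    while i < n:
        # UP{4,}: the first row matches via "PREV(UP.rideCount) IS NULL",
        # then greedily take strictly increasing windows.
        j = i + 1
        while j < n and counts[j] > counts[j - 1]:
            j += 1
        if j - i >= 4:
            peak = counts[j - 1]  # LAST(UP.rideCount)
            # DOWN{2,}: strictly decreasing and below the UP peak.
            k = j
            while k < n and counts[k] < counts[k - 1] and counts[k] < peak:
                k += 1
            # E: one window rising again above LAST(DOWN.rideCount).
            if k - j >= 2 and k < n and counts[k] > counts[k - 1]:
                matches.append((counts[i], peak, counts[k - 1]))
                i = k + 1  # AFTER MATCH SKIP PAST LAST ROW
                continue
        i += 1
    return matches
```

The tuple positions correspond to the MEASURES FIRST(UP.rideCount), LAST(UP.rideCount), and LAST(DOWN.rideCount).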
  • 28. © 2018 data Artisans28 Feature set of MATCH_RECOGNIZE (Experimental) • Quantifiers support: ‒ + (one or more), * (zero or more), {x,y} (times) ‒ greedy (default), ? (reluctant) • with some restrictions (reluctant mode does not work for the last pattern variable) • After Match Skip ‒ skip_to_first/last, skip_past_last, skip_to_next • Not supported: ‒ alternation (|), permute, exclusion '{- -}' ‒ aggregates within MATCH_RECOGNIZE
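The greedy and reluctant quantifiers listed above follow ordinary regular-expression semantics, which Python's re module illustrates directly (the one-symbol-per-row encoding is an analogy, not Flink's execution model):

```python
import re

rows = "aaab"  # four rows, each reduced to one symbol

# Greedy (the default): A+ absorbs as many matching rows as possible.
assert re.search(r"a+", rows).group() == "aaa"

# Reluctant (A+?): stops at the fewest rows that still match.
assert re.search(r"a+?", rows).group() == "a"

# Bounded {x,y}, as in PATTERN (UP{4,} DOWN{2,} E).
assert re.fullmatch(r"a{2,3}b", rows) is not None
```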
  • 29. © 2018 data Artisans29 Summary ● opens up new use cases that rely on the order of events ● enables reuse of existing SQL features ● still in an experimental phase ○ https://github.com/dawidwys/flink/SQL_MATCH_RECOGNIZE
  • 31. © 2018 data Artisans31 Under the hood Dissecting MATCH_RECOGNIZE
  • 32. © 2018 data Artisans32 Under the hood Pattern.<Row>begin("UP", SKIP_PAST_LAST).timesOrMore(4).where((row, ctx) -> { Long prevRideCount = ctx.get("UP").tail().get("rideCount"); return prevRideCount == null || row.get("rideCount") > prevRideCount; }).next("DOWN").timesOrMore(2).where((row, ctx) -> { ... }).next("E").where((row, ctx) -> { ... }); CEP.pattern(input.keyBy("cellId"), pattern).flatSelect( new PatternFlatSelectFunction<Row, Row>() { @Override public void flatSelect(Map<String, List<Row>> pattern, Collector<Row> out) throws Exception { Row last = pattern.get("UP").tail(); Row first = pattern.get("UP").head(); out.collect(Row.of(...)); } }); SELECT * FROM TaxiStatistics MATCH_RECOGNIZE( PARTITION BY cellId MEASURES LAST(UP.rideCount) AS rushCnt, FIRST(UP.startTime) as rushStart, LAST(UP.endTime) as rushEnd AFTER MATCH SKIP PAST LAST ROW PATTERN (UP{4,} DOWN{2,} E) DEFINE UP AS UP.rideCount > PREV(UP.rideCount) or PREV(UP.rideCount) IS NULL, DOWN AS DOWN.rideCount < LAST(UP.rideCount), E AS E.rideCount > LAST(DOWN.rideCount) )