Flink Forward San Francisco 2018: - Jinkui Shi and Radu Tudoran "Flink real-time analysis in CloudStream Service of Huawei Cloud"

Huawei Cloud
Flink real-time analysis
in Cloud Stream Service
Jinkui Shi
Radu Tudoran
2018/04

Speakers
Jinkui Shi
Principal Engineer @ Huawei Cloud
shijinkui@huawei.com
Radu Tudoran
Staff Engineer @ Huawei Cloud
Radu.Tudoran@huawei.com

Background about Huawei Cloud
❖ Cloud BU
❖ Founded in 2017/06
❖ Huawei Cloud
❖ HUAWEI CLOUD services let enterprises use ICT services in the same way as they use water and electricity utilities.

Why choose Flink
❖ Graceful Runtime framework
❖ Rich Stream SQL function
❖ Lightweight async checkpoint
❖ Real low latency and high throughput
❖ Extensibility: ML, Graph, Edge

Cloud Stream Service
❖ Cloud Stream Service (CS):
A real-time big data stream analysis service on Huawei Cloud. Compatible with Apache Flink and Spark APIs, CS also provides fully managed computing clusters. Users just focus on StreamSQL or UDFs and run jobs in real time.
❖ CS is the first public cloud-native service in the world to choose Flink as its runtime computing engine.
https://www.huaweicloud.com/en-us/product/cs.html

CS Overview
- Industrial IoT
- Internet of Vehicles
- Exchanges (BitCoin/stock)
- Banking/insurance industry
- E-commerce …
Make computing easier
- Unify batch and stream
- SQL and Job visualization
- Streaming monitoring
Connect everything
- Open Source source/sink
- Cloud Service source/sink

Features
Easy to use, serverless, fully managed, secure, cost-effective

Cost Comparison (Reference)
Item | Offline Environment Buildup | CS | Saved Cost
Hardware cost | 80,000 x 3 = 240,000 CNY (the hardware cost of a single physical machine is 80 thousand CNY; the cost is for reference only) | 0.5 x 20 x 24 x 30 x 12 x 3 ≈ 259,000 CNY (users are charged 0.5 CNY per hour for a single SPU; 20 SPUs are purchased) |
O&M manpower cost | 200,000 CNY/man-year | 0 |
Water/Electricity/DC maintenance | 76,300 CNY/year | 0 |
Total | 516,300 CNY | 259,000 CNY | 42.9%
To achieve the same computing capability, CS saves 42.9% of the cost.
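The arithmetic behind the comparison can be checked with a short sketch. It assumes the hardware line is 80,000 x 3 = 240,000 CNY, which is the value the stated total of 516,300 CNY implies; the variable names are illustrative only. With these inputs the saving comes out near 49.8% rather than the 42.9% shown on the slide, so the slide's percentage presumably rests on inputs that are not shown here.

```python
# Offline build-out, using the figures listed on the slide (CNY).
hardware = 80_000 * 3          # three physical machines (assumed from the total)
om_manpower = 200_000          # one man-year of O&M
utilities = 76_300             # water / electricity / DC maintenance per year
offline_total = hardware + om_manpower + utilities

# Cloud Stream Service: 0.5 CNY per SPU-hour, 20 SPUs, 24 h x 30 d x 12 x 3.
cs_total = 0.5 * 20 * 24 * 30 * 12 * 3

# Relative saving of CS against the offline build-out.
saving = (offline_total - cs_total) / offline_total

print(offline_total)            # 516300
print(cs_total)                 # 259200.0
print(round(saving * 100, 1))   # 49.8
```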

Job types
❖ Flink SQL: first-class citizen, for ease of use
❖ Flink Jar job: FlinkML, Gelly, CEP, SQL
❖ Spark Streaming and structured streaming Jar job
❖ PySpark Jar Job
❖ Edge Computing Job: beta now

Connect to Ecosystem
❖ Open source connectors (Flink connectors and Bahir Flink)
❖ Connect to cloud-native services in Huawei Cloud
Problems of the connection API adapter:
1. Define a unified connector API between Flink and Spark, such as for Kafka and JDBC connectors.
2. Define a general connector API for cloud services, such as object bucket storage.
Apache Bahir needs more contributions.

Online Stream SQL editor
SPU: Stream Processing Unit, 1 core and 4 GB memory
https://console.huaweicloud.com/cs

Visualization[vɪʒʊəlɪ'zeʃən]
❖ runtime monitoring
❖ for dev: editor, notebook
❖ for prod: pipeline, DSL

Flink Benchmark - "chicken ribs" (of little value, yet a pity to discard)
❖ Problems with standard benchmarks:
❖ They focus only on performance and on assumed use cases
❖ They can't cover all the APIs and features
❖ Performance figures only show the best case, never the worst case
❖ Enterprises care more about reliability and best practices

Flink Reliability benchmark
❖ Test metric dimensions for every API:
❖ overall source generating rate:
❖ fixed rate, rapid rate, index rate
❖ data skew and backpressure
❖ Job.ratio = max{Vertex.ratio | Vertex ∈ Job}
❖ Vertex.ratio = max{SubTask.ratio | SubTask ∈ Vertex}
❖ latency
❖ job latency: source generating rate versus job processing rate
❖ event latency: the time elapsed between source and sink
❖ throughput and GC …
Automatically run large-scale tests to find cases where Flink may encounter runtime memory overflow, incorrect calculation results, or run-time reliability problems, and collect metrics on backpressure, latency, throughput, memory, CPU, and rate to analyze the causes of those reliability problems.
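The two ratio definitions above roll a per-subtask measurement up to the job level by taking the maximum at each level. A minimal sketch, assuming the subtask ratios have already been collected into a plain dictionary (the vertex names and values below are made up for illustration):

```python
def vertex_ratio(subtask_ratios):
    """Vertex.ratio = max{SubTask.ratio | SubTask in Vertex}."""
    return max(subtask_ratios)


def job_ratio(job):
    """Job.ratio = max{Vertex.ratio | Vertex in Job}."""
    return max(vertex_ratio(ratios) for ratios in job.values())


# Hypothetical job: vertex name -> ratio of each parallel subtask.
job = {
    "source": [0.0, 0.1],
    "map": [0.2, 0.75],
    "sink": [0.05],
}

print(job_ratio(job))  # 0.75
```

Taking the maximum means that a single slow subtask, for example one hit by data skew, determines the ratio reported for the whole job.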

Flink ReliabilityBench project
❖ The generated report includes all APIs
❖ In the next half year, we'll publish the Flink reliability bench and the standard benchmark to Cloud Stream Service
❖ Users just set the needed resources; the bench then runs automatically and generates a final report for tuning and a best-practice guide
We welcome everyone and the Flink community to try it.

Some problems
❖ In SQL, how to express JSON, OpenTSDB, and other data formats?
❖ SQL WITH clause:
❖ How to make a general and extensible rule that supports all connectors?
❖ How to support general and extensible cloud standards, such as object bucket storage?
❖ API server?
❖ Manage job lifetime and metrics
❖ For a job: input the source data, …, output the sink data with the Streaming API
❖ Sink reliability: support for an external write-ahead log framework:
❖ source1 - processing - sink1 - source2 - processing - sink - source2 - …
data may be lost

Intelligent Streaming Computing
❖ Open source frameworks
❖ Streaming+ML: Spark MLlib, PySpark, Flink ML
❖ Streaming+Graph: Spark GraphX, Flink Gelly
❖ SQL: binding the above together through UDFs
Stream analysis is not enough; an intelligent framework is needed.
If we make less effort, we may be surpassed by others quickly.
Stay hungry

Scenario 1: streaming trading analysis
Just an example diagram for illustration, from the Sohu site.
BitCoin trading pain points:
1. Out-of-order stream data for K-line charts of 5 min, 15 min, 30 min, and 60 min
2. Aggregating streaming data over windows
3. Low latency
[Diagram: Huawei Cloud solution. Labels: Cloud Stream, DIS, Kafka, Flink, CloudTable, OpenTSDB, HBase, Spark, DCS (Redis)]
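The first two pain points, out-of-order data and window aggregation for K-line charts, can be illustrated with a small batch sketch. It is not how Flink handles event time (there are no watermarks or state here); it only shows that bucketing trades by their own timestamps makes the result independent of arrival order. The function name, the `(timestamp, price)` tuple layout, and the sample trades are assumptions for illustration.

```python
from collections import defaultdict


def kline(trades, window_seconds=300):
    """Build open/high/low/close bars per tumbling window.

    trades: iterable of (event_time_seconds, price), in any arrival order.
    """
    buckets = defaultdict(list)
    for ts, price in trades:
        # Assign each trade to its window by event time, not arrival order.
        buckets[ts - ts % window_seconds].append((ts, price))

    bars = {}
    for start, items in buckets.items():
        items.sort()  # restore event-time order inside the window
        prices = [p for _, p in items]
        bars[start] = {
            "open": prices[0],
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],
        }
    return bars


# Trades arrive out of order (timestamp 120 arrives after 250).
trades = [(10, 100.0), (250, 90.0), (120, 110.0), (310, 95.0), (305, 97.0)]
bars = kline(trades)  # 5-minute windows starting at 0 and 300
```

Changing `window_seconds` to 900, 1800, or 3600 gives the 15, 30, and 60 minute charts from the same input.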

Scenario 3: Stream Analysis and ETL
CS uses jobs of the Flink SQL, Flink, and Spark Streaming types to conduct exception detection, real-time alarm reporting, and CEP-based processing on stream data.
Feedback/decision-making/monitoring: Based on the positive feedback during service running and monitoring information, CS provides guidance for positive product optimization, loss stop, quantization, and visualization.

Enhanced Statistics and ML Features Extraction
Design Principles
• Incremental computation
• Fixed-size memory
• Constant to sub-linear time complexity
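Enhanced Statistics and ML Features Extraction
In general:
$S^2 = \sum_i \left( y_i - f(x_i, \beta_1, \beta_2, \beta_3, \ldots, \beta_n) \right)^2$
For the linear fit:
$S^2 = \sum_i \left( y_i - (\beta_1 + \beta_2 x_i) \right)^2$
Regression parameters:
$\beta_2 = \frac{s_{xy,t}}{m_{2,t}}$
$\beta_1 = \bar{y} - \beta_2 \bar{x}$
Incremental variance (2nd central moment):
$m_{2,t} = m_{2,t-1} + (x_t - \bar{x}_{t-1})(x_t - \bar{x}_t)$
Incremental mean:
$\bar{x}_t = \bar{x}_{t-1} + \frac{1}{t}(x_t - \bar{x}_{t-1})$
Incremental covariance:
$s_{xy,t} = \frac{t-2}{t-1} s_{xy,t-1} + \frac{1}{t}(x_t - \bar{x}_{t-1})(y_t - \bar{y}_{t-1})$
Online Linear Regression Learner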
[Charts: Latency analysis and Throughput analysis. Axis labels: Execution time (s), Throughput (ev), Time range (ms), Events]
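The incremental mean, second-moment, and covariance updates can be combined into an online linear regression learner that keeps a fixed amount of state and does constant work per event. The sketch below is one possible reading of the slide, not the CS implementation: it accumulates unnormalized sums (the sum of squared deviations of x and the co-moment of x and y), so the slope is their ratio, which equals covariance over variance because the normalizing factors cancel. The class and attribute names are illustrative.

```python
class OnlineLinearRegression:
    """Fits y = beta1 + beta2 * x one event at a time, in O(1) memory."""

    def __init__(self):
        self.t = 0          # number of events seen
        self.mean_x = 0.0   # running mean of x
        self.mean_y = 0.0   # running mean of y
        self.m2 = 0.0       # sum of squared deviations of x
        self.cxy = 0.0      # sum of (x - mean_x) * (y - mean_y)

    def update(self, x, y):
        self.t += 1
        dx = x - self.mean_x              # x_t minus the previous mean
        dy = y - self.mean_y
        self.mean_x += dx / self.t        # incremental mean
        self.mean_y += dy / self.t
        self.m2 += dx * (x - self.mean_x)   # uses previous and updated mean
        self.cxy += dx * (y - self.mean_y)

    def coefficients(self):
        if self.t < 2 or self.m2 == 0.0:
            raise ValueError("need at least two events with distinct x values")
        beta2 = self.cxy / self.m2
        beta1 = self.mean_y - beta2 * self.mean_x
        return beta1, beta2


model = OnlineLinearRegression()
for x in range(5):
    model.update(float(x), 2.0 * x + 1.0)   # points on the line y = 2x + 1

beta1, beta2 = model.coefficients()
```

For the sample points on y = 2x + 1, the learner recovers an intercept close to 1 and a slope close to 2.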

Geospatial
• DDL for Time Geospatial
  • ST_Point
  • ST_Line
  • ST_Polygon
• SQL Geospatial Scalar Functions
  • ST_DISTANCE
  • ST_PERIMETER
  • ST_AREA (polygon)
  • ST_OVERLAPS
  • ST_INTERSECTS
  • ST_WITHIN
  • ST_CONTAINS
  • ST_COVERS
  • ST_DISJOINT
  • ST_BUFFER
  • ST_INTERSECTION
  • ST_ENVELOPE
• SQL Time Geospatial
  • AGG_DISTANCE
  • AVG_SPEED
  • … on HOP/TUMBLE/OVER/SESSION windows
  • … on count/time windows
  • … on rowtime/proctime windows
• Huawei offers complete coverage of the geospatial standard plus extra time-based functions
Realtime IoT Analytics
[Architecture diagram. Labels: Continuous data; Submit; Stream SQL Time GeoSpatial Analytics; Flink IoT Stream Engine (Stream SQL IoT Translation, Optimization, IoT Op. Library, SQL IoT Functions); Deploy; Execute; Stream Topology; User Defined Function; Geometry Engine; GeoSpatial function]
SQL IoT Functions
• ST_DISTANCE, ST_PERIMETER, ST_AREA (polygon), ST_OVERLAPS, ST_INTERSECTS, ST_WITHIN, …
• ST_CONTAINS, ST_COVERS, ST_DISJOINT, ST_BUFFER, ST_INTERSECTION, ST_ENVELOPE, …
Stream IoT Operators
• Window Tumble Count/Time
• Window Hop Count/Time
• Window Session Count/Time
• Process Function
• Map
• FlatMap

Geospatial Examples
Select if cars deviate from road
SELECT carId FROM CarStream
WHERE ST_WITHIN(
ST_POINT( car.lat, car.lon),
ST_BUFFER( ST_ROAD_FROM_FILE(file), 2.0))
Compute Time Aggregates over Spatial Data
SELECT timestampa, lat, lon,
AGG_DISTANCE( ST_POINT(lat, lon)) OVER (
PARTITION BY carid ORDER BY proctime RANGE BETWEEN
INTERVAL '1' HOUR PRECEDING AND CURRENT ROW),
AVG_SPEED( ST_POINT(lat, lon)) OVER (
PARTITION BY carid ORDER BY proctime RANGE BETWEEN
INTERVAL '1' HOUR PRECEDING AND CURRENT ROW)
FROM CarStream
Filter by region
SELECT timestampr, lat, lon, speed
FROM CarStream
WHERE ST_WITHIN( ST_POINT(lat, lon), ST_POLYGON( ARRAY[
ST_POINT(53.454326,7.334517),
ST_POINT(53.682480, 13.906822),
ST_POINT(47.761194, 12.607594),
ST_POINT(47.722358, 7.601213),
ST_POINT(53.454326,7.334517)]))
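To make the "filter by region" query concrete, the following sketch shows the kind of test a point-in-polygon predicate performs, using the ray-casting technique on the polygon from the slide. It is an illustration only: the real ST_WITHIN in CS is not shown in the slides and may treat boundaries and coordinate systems differently, and the function name and sample points are assumptions.

```python
def point_in_polygon(px, py, polygon):
    """Ray casting: count how many polygon edges a ray from the point crosses."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the horizontal line through the point count.
        if (yi > py) != (yj > py):
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            if px < x_cross:
                inside = not inside
        j = i
    return inside


# Polygon from the slide as (lat, lon) pairs; the first point is repeated to close it.
region = [
    (53.454326, 7.334517),
    (53.682480, 13.906822),
    (47.761194, 12.607594),
    (47.722358, 7.601213),
    (53.454326, 7.334517),
]

in_region = point_in_polygon(50.0, 10.0, region)       # a point inside the region
out_of_region = point_in_polygon(40.0, 10.0, region)   # a point south of the region
```

A streaming filter would apply this predicate to every incoming `(lat, lon)` record and keep only the rows for which it returns true.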

Flink CEP on SQL enhancements
SQL CEP Syntax
SELECT * FROM stream...
MATCH_RECOGNIZE (
[row_pattern_partition_by ]
[row_pattern_order_by ]
[row_pattern_measures ]
[row_pattern_rows_per_match ]
[row_pattern_skip_to ]
PATTERN (row_pattern) [with_in clause]
[duration clause]
[row_pattern_subset_clause]
DEFINE row_pattern_definition_list )
Define pattern matching computation
Offer complete syntax coverage for real-time CEP analytics
SELECT * FROM Ticker
MATCH_RECOGNIZE (
PARTITION BY symbol
MEASURES
FINAL FIRST(A.price) AS firstAPrice,
FINAL FIRST(B.price) AS firstBPrice,
FINAL FIRST(C.price) AS firstCPrice,
FINAL LAST(A.price) AS lastAPrice,
FINAL LAST(B.price) AS lastBPrice,
FINAL LAST(C.price) AS lastCPrice
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN ((A B C){2})
DEFINE A AS A.price < 50, B AS B.price < 30,
C AS C.price < 70 )
# Events: ~2.5M
# Matched events: ~100K
# Stocks: 7
Average latency: ~27.13 ms
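The semantics of the Ticker query can be sketched outside SQL for a single symbol. The pattern (A B C){2} has a fixed length of six rows, so a simple scan is enough here; a general MATCH_RECOGNIZE engine needs an automaton with backtracking, which this sketch does not attempt. The function name and the sample prices are assumptions for illustration.

```python
def match_abc_twice(prices):
    """Find non-overlapping matches of (A B C){2} in one partition's prices."""
    # DEFINE clause of the query, repeated twice for the {2} quantifier.
    conditions = [
        lambda p: p < 50,  # A
        lambda p: p < 30,  # B
        lambda p: p < 70,  # C
    ] * 2
    width = len(conditions)

    matches = []
    i = 0
    while i + width <= len(prices):
        rows = prices[i:i + width]
        if all(cond(p) for cond, p in zip(conditions, rows)):
            # MEASURES with ONE ROW PER MATCH: first and last price per variable.
            matches.append({
                "firstAPrice": rows[0], "firstBPrice": rows[1], "firstCPrice": rows[2],
                "lastAPrice": rows[3], "lastBPrice": rows[4], "lastCPrice": rows[5],
            })
            i += width  # AFTER MATCH SKIP PAST LAST ROW
        else:
            i += 1
    return matches


prices = [40, 20, 60, 45, 25, 65, 80, 10, 20, 60, 10, 20, 60]
matches = match_abc_twice(prices)
```

In the sample, rows 0 to 5 form the first match, the price 80 breaks the pattern, and rows 7 to 12 form the second match.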
