2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2

Seattle Apache Flink Meetup
Jan 17, 2018

Attendees of our very first meetup (01/17/2018)!

Our Goal
Help each other learn more about Apache Flink and
share best practices of running Flink applications.

Meetup Page Sponsor
founded by the original creators of Apache Flink®
If you haven’t joined our meetup group yet, please register at
https://www.meetup.com/seattle-apache-flink/

Organizers
● Bowen Li
● Haitao Wang
● Fabian Hueske (PMC and committer of Apache Flink)

Future planning
● Schedule:
○ Meetup every two months.
○ Two talks, ~1.5 h in total
○ Food and drinks will be provided.
● Location:
○ Either in Seattle or East Side, depending on our event sponsor

Sponsor our events
● Contact the meetup team for event sponsorship!
○ If you are giving a talk, we’d recommend sponsoring our event at the same time
● Sponsorship includes providing meetup space, food, and drinks for at least 60 people
○ An event may not have 60 attendees, but sponsors need to commit at least this budget
upfront

Give talks in our events
● Submit your abstract to organizers
● Organizers will work with you to shape the content
○ Content must appeal to a wide audience and be useful to attendees
○ Speakers must guarantee presentation quality
● (Optional but highly appreciated) Speakers sign a simple clause granting usage
of their content to the Apache Software Foundation, data Artisans, and Seattle Apache Flink
Meetup

Agenda Today
● Opening - Bowen Li
● Presentation 1 - Haitao Wang
● Presentation 2 - Bowen Li

OfferUp Confidential
Large-scale Near-real-time Stream Processing
with Apache Flink @ OfferUp
Bowen Li

User Survey
● Who has used OfferUp?
● Who has used Apache Flink?
● Who has developed code in Apache Flink?

OfferUp: create the simplest, most trustworthy way to buy and sell locally
At a Glance
● The largest mobile marketplace for local
buyers and sellers in the U.S.
● Top shopping app on iOS and Android
● $14+ Billion In Transactions in 2016

Speaker Background
Bowen Li
○ OfferUp
■ Develop stream processing infra with Apache Flink
■ Apache Airflow, Apache Avro, etc
○ Tableau
■ Lots of Apache ZooKeeper and Apache Curator

Stream Processing @ OfferUp
We are expanding our stream processing footprint.
We developed OfferUp’s stream processing platform with a few primitives:
● Flink installation on EMR
● Configuration help for the Flink cluster: HDFS, YARN, metrics, checkpoints/savepoints, etc.
● User apps deployment
● Connecting to streams
● Data serialization/deserialization

Use Case
Business Requirement:
● Calculate time-decaying personalization scores based on user activity within
the last month
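
To make "time-decaying score" concrete, here is a minimal sketch in Python. The talk does not say which decay function OfferUp uses; the exponential decay, the 7-day half-life, and the names `decayed_score` and `HALF_LIFE_DAYS` are all assumptions for illustration only.

```python
import math

SECONDS_PER_DAY = 86400.0
HALF_LIFE_DAYS = 7.0   # assumed half-life; the talk does not give one
WINDOW_DAYS = 30.0     # "within the last month"


def decayed_score(events, now_ts):
    """Sum event weights, halving each weight per HALF_LIFE_DAYS of age.

    events: iterable of (timestamp_seconds, weight) pairs.
    Events from the future or older than WINDOW_DAYS are ignored.
    """
    total = 0.0
    for ts, weight in events:
        age_days = (now_ts - ts) / SECONDS_PER_DAY
        if age_days < 0 or age_days > WINDOW_DAYS:
            continue
        total += weight * math.pow(0.5, age_days / HALF_LIFE_DAYS)
    return total
```

With this shape, a recent activity counts fully, a week-old one counts half, and anything older than a month drops out.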

Use Case - The old pipeline
The old pipeline:
● batch processing
● ~3 hours of end-to-end latency

Use Case - The new near-real-time pipeline!

Pipeline Stats
● Data Volume: processing billions of records per day
● Average end-to-end latency: ~1 min
○ The 2 min aggregation in the 1st Flink cluster (NRT scores) dominates the latency
○ Depending on when an event enters the 2 min window, the minimum latency of
this pipeline can be a few seconds; the expected maximum is ~2 min
We dramatically lowered the end-to-end latency from ~3h to ~1min!
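
The ~1 min average follows from the window size. A small sketch of the arithmetic, assuming a tumbling (non-overlapping) 2 min window and ignoring all other processing time; the function name `wait_until_window_closes` is made up for this example:

```python
WINDOW_SECONDS = 120  # the 2 min aggregation window


def wait_until_window_closes(arrival_ts, window=WINDOW_SECONDS):
    """Seconds an event waits before its tumbling window fires."""
    window_end = (arrival_ts // window + 1) * window
    return window_end - arrival_ts


# One event arriving every second across one full window.
waits = [wait_until_window_closes(t) for t in range(WINDOW_SECONDS)]
average_wait = sum(waits) / len(waits)
```

Events that arrive just before the window closes wait about a second, events that arrive right after it opens wait the full 2 min, and evenly spread arrivals average out to about 1 min.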

Design Considerations Explained
● Latency - why 2 min aggregation
○ User-facing services only batch-read new scores every 5 min
○ So any latency under 5 min is good enough

Design Considerations Explained (cont’d)
● Data Correctness
○ at-least-once guarantee
○ Be careful with merging NRT scores with historical scores, ensure no overlap and
no gap
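
One common way to get "no overlap and no gap" is a single cutoff timestamp: the historical score covers everything strictly before the cutoff, and NRT events count only from the cutoff onward. The sketch below assumes scores are additive, which the talk does not state; `merge_scores` and its parameters are hypothetical names.

```python
def merge_scores(historical, nrt_events, cutoff_ts):
    """Merge batch scores with near-real-time events around one cutoff.

    historical: dict of user_id -> score covering events with ts < cutoff_ts.
    nrt_events: iterable of (user_id, ts, score) tuples.
    Only NRT events with ts >= cutoff_ts are added, so each event is
    counted exactly once: never twice (overlap) and never dropped (gap).
    """
    merged = dict(historical)
    for user_id, ts, score in nrt_events:
        if ts >= cutoff_ts:
            merged[user_id] = merged.get(user_id, 0.0) + score
    return merged
```

The key point is that both sides use the same cutoff with complementary comparisons (`<` on one side, `>=` on the other).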

● Failure Recovery (assuming AWS Kinesis is reliable)
○ Both Flink clusters have checkpointing enabled and can recover automatically
○ Redis data has a 3-day TTL, enough time to fix things
○ Other parts are stateless

● Replay Capability
○ Each component can handle the data load of a replay without its latency being greatly
impacted

Contribution to Apache Flink
I’ve been contributing to Apache Flink since Mar 2017

Related Contributions to Apache Flink
● FLINK-7508 Improved flink-connector-kinesis write performance by 10+X
○ Released in version 1.3.2
○ Many other comprehensive improvements of flink-connector-kinesis
○ 13 out of my 43 commits
● FLINK-7475 Improved Flink’s ListState APIs and boosted their performance by 15~35X
○ Will be released in version 1.5.0
● FLINK-6013 Created flink-metrics-datadog module
● Other contributions include Flink’s DataStream APIs, side output, build system, etc

Contribution: Improved flink-connector-kinesis
● Problem: flink-connector-kinesis used to create one HTTP connection for each request
● Improvement: Switched it to a connection pool mode
● Result: Improved flink-connector-kinesis’s write performance by 10+X
[Charts: # Records Sent; # Records Pending in Client]
In this basic test, you can tell that
1) write throughput goes up
2) the number of pending records drops significantly
Contribution: Improved flink-connector-kinesis
Benchmarking:
● Running a Flink hourly-sliding windowing job
● Enough Kinesis shards
● ~70 bytes/record
Limitations: My Flink job was not developed for benchmarking. It generates at most 21 million records, which gives
an estimate of a 10X or greater improvement. In reality, the performance improvement should be more than 10X. You’re welcome to do your
own benchmarking.

Contribution: Improved Flink’s ListState performance 15~35X
Background on RocksDBStateBackend
Problems:
● ListState has only two APIs: add() and get()
● RocksDBListState translates add() into RocksDB.merge()
○ Adding 100 elements takes 100 memtable writes, which is very slow
Improvements:
● Developed two new APIs in Flink 1.5 - update() and addAll()
● update() and addAll() both simulate RocksDB’s byte merge operation: they pre-merge all elements
upfront and write to the memtable only once
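
The pre-merge idea can be illustrated with a toy model in Python. This is not Flink or RocksDB code: `FakeMemtable` is a hypothetical stand-in that only counts writes, and the comma delimiter and raw byte strings are assumptions standing in for the real serialization format.

```python
DELIMITER = b","  # assumed separator between serialized elements


class FakeMemtable:
    """Stand-in for a RocksDB memtable that counts write operations."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def merge(self, key, value):
        # Append to the existing bytes, as a concatenating merge would.
        existing = self.data.get(key)
        if existing is None:
            self.data[key] = value
        else:
            self.data[key] = existing + DELIMITER + value
        self.writes += 1


def add_one_by_one(table, key, elements):
    # Old path: one merge (one memtable write) per element.
    for element in elements:
        table.merge(key, element)


def add_all(table, key, elements):
    # New path: pre-merge all elements, then write only once.
    if elements:
        table.merge(key, DELIMITER.join(elements))
```

Both paths leave the same bytes under the key, but adding 100 elements costs 100 writes one by one and a single write when pre-merged.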

Benchmarking added to source code: org.apache.flink.contrib.streaming.state.benchmark.RocksDBListStatePerformanceTest
Result: 15~35X faster!

Q&A
Thank you!
We’re hiring!
