OfferUp Confidential
Large-scale Near-real-time Stream Processing
with Apache Flink @ OfferUp
Bowen Li
User Survey
● Who has used OfferUp?
● Who has used Apache Flink?
● Who has developed code in Apache Flink?
OfferUp: creating the simplest, most trustworthy way to buy and sell locally
At a Glance
● The largest mobile marketplace for local buyers and sellers in the U.S.
● Top shopping app on iOS and Android
● $14+ Billion In Transactions in 2016
Speaker Background
Bowen Li
○ OfferUp
■ Develops stream processing infra with Apache Flink
■ Apache Airflow, Apache Avro, etc.
○ Tableau
■ Lots of Apache ZooKeeper and Apache Curator
Stream Processing @ OfferUp
We are expanding our stream processing footprint.
We developed OfferUp’s stream processing platform with a few primitives:
● Flink installation on EMR
● HDFS, YARN, metrics, checkpoint/savepoint, etc. configuration help for Flink cluster
● User apps deployment
● Connecting to streams
● Data serialization/deserialization (Ser/De)
Use Case
Business Requirement:
● Calculate time-decaying personalization scores based on user activity within
the last month
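The time-decay computation can be sketched as exponential decay over event age. A minimal illustration — the half-life, the event weights, and the scoring function here are hypothetical stand-ins, not OfferUp's actual model:

```python
import math
import time

HALF_LIFE_SECONDS = 7 * 24 * 3600  # hypothetical 7-day half-life

def decayed_score(events, now=None):
    """Sum event weights, discounting each by its age.

    `events` is a list of (timestamp, weight) pairs from the last month.
    An event exactly one half-life old contributes half its weight.
    """
    now = now if now is not None else time.time()
    decay_rate = math.log(2) / HALF_LIFE_SECONDS
    return sum(w * math.exp(-decay_rate * (now - ts)) for ts, w in events)

now = time.time()
events = [(now - HALF_LIFE_SECONDS, 1.0), (now, 1.0)]
print(round(decayed_score(events, now), 2))  # → 1.5
```

The batch pipeline would recompute such sums hourly; the streaming version updates them as events arrive.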
Use Case - The old pipeline
The old pipeline:
● batch processing
● ~3 hours of end-to-end latency
Use Case - The new near-real-time pipeline!
Pipeline Stats
● Data Volume: processing billions of records per day
● Average end-to-end latency: ~1 min
○ The 2-min aggregation in the 1st Flink cluster (NRT scores) dominates the latency
○ Depending on when an event enters the 2-min window, the minimum end-to-end latency
can be a few seconds; the expected maximum is ~2 min
We dramatically lowered the end-to-end latency from ~3h to ~1min!
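The latency bounds follow from tumbling-window mechanics: an event that lands just before the window closes waits only seconds, one that lands just after a window opens waits nearly the full 2 min, and uniform arrivals wait half a window on average — consistent with the ~1 min average above. A back-of-envelope sketch (window size is the only number taken from the slides):

```python
WINDOW_SECONDS = 120  # the 2-min aggregation window

def window_wait(offset_in_window):
    """Seconds an event waits until its tumbling window fires.

    `offset_in_window` is the event's arrival position within [0, 120).
    """
    return WINDOW_SECONDS - offset_in_window

worst = window_wait(0)                   # arrives right as a window opens
best = window_wait(WINDOW_SECONDS - 1)   # arrives just before it closes
average = WINDOW_SECONDS / 2             # uniform arrivals: half a window
print(worst, best, average)  # → 120 1 60.0
```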
Design Considerations Explained
● Latency - why 2min aggregation
○ User-facing services will only batch-read new scores every 5min
○ Any latency smaller than 5min is good
Design Considerations Explained (cont’d)
● Data Correctness
○ At-least-once guarantee
○ Be careful when merging NRT scores with historical scores: ensure no overlap and
no gap
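"No overlap and no gap" means the NRT and historical paths must split at a single cutoff timestamp, with every event attributed to exactly one side. A hypothetical sketch of that invariant using a half-open split (the data shapes are illustrative, not the actual pipeline's):

```python
def merge_scores(historical, nrt, cutoff):
    """Merge per-event scores, counting each event exactly once.

    `historical` and `nrt` map event timestamps to scores. Events before
    `cutoff` come from the batch path, events at or after it from the NRT
    path: the half-open split [start, cutoff) / [cutoff, now) guarantees
    no event is dropped (gap) or double-counted (overlap).
    """
    merged = {ts: s for ts, s in historical.items() if ts < cutoff}
    merged.update({ts: s for ts, s in nrt.items() if ts >= cutoff})
    return merged

historical = {100: 1.0, 200: 2.0, 300: 9.9}  # 300 also seen by batch
nrt = {300: 3.0, 400: 4.0}
print(merge_scores(historical, nrt, cutoff=300))
# → {100: 1.0, 200: 2.0, 300: 3.0, 400: 4.0}
```

With at-least-once delivery the NRT side must additionally tolerate duplicates, e.g. by keying on event IDs.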
Design Considerations Explained (cont’d)
● Failure Recovery (assuming AWS Kinesis is reliable)
○ Both Flink clusters have checkpointing enabled and can auto-recover
○ Redis data has a 3-day TTL, enough time to fix things up
○ Other parts are stateless
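Checkpoint-based auto recovery is typically wired up through Flink's configuration; a minimal sketch (the paths are hypothetical placeholders, and exact keys vary by Flink version):

```yaml
# flink-conf.yaml — hypothetical values
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoints   # placeholder path
state.savepoints.dir: hdfs:///flink/savepoints     # placeholder path
```

The checkpoint interval itself is usually enabled in application code via `StreamExecutionEnvironment.enableCheckpointing(intervalMs)`.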
Design Considerations Explained (cont’d)
● Replay Capability
○ Each component can handle the data load of a replay without its latency being
greatly impacted
Contribution to Apache Flink
I’ve been contributing to Apache Flink since March 2017
Related Contributions to Apache Flink
● FLINK-7508 Improved flink-connector-kinesis write performance by 10+X
○ Released in version 1.3.2
○ Many other comprehensive improvements to flink-connector-kinesis
○ 13 out of my 43 commits
● FLINK-7475 Improved Flink’s ListState APIs and boosted their performance by 15–35X
○ Will be released in version 1.5.0
● FLINK-6013 Created the flink-metrics-datadog module
● Other contributions include Flink’s DataStream APIs, side outputs, the build system, etc.
● Problem: flink-connector-kinesis used to create one HTTP connection per request
● Improvement: Switched it to a connection pool
● Result: Improved flink-connector-kinesis’s write performance by 10+X
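The per-request vs. pooled difference can be illustrated generically — this toy sketch is not the actual connector code, just a model of why reusing connections removes the dominant per-request setup cost:

```python
class Connection:
    """Toy connection whose setup (handshake) is the expensive part."""
    handshakes = 0  # class-wide counter of connections ever opened

    def __init__(self):
        Connection.handshakes += 1

    def send(self, record):
        pass  # network write elided

def send_per_request(records):
    """Old behavior: open a fresh connection for every request."""
    for r in records:
        Connection().send(r)

def send_pooled(records, pool_size=4):
    """New behavior: reuse a small pool of long-lived connections."""
    pool = [Connection() for _ in range(pool_size)]
    for i, r in enumerate(records):
        pool[i % pool_size].send(r)

Connection.handshakes = 0
send_per_request(range(1000))
print(Connection.handshakes)  # → 1000

Connection.handshakes = 0
send_pooled(range(1000))
print(Connection.handshakes)  # → 4
```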
Contribution: Improved flink-connector-kinesis
[Charts: # Records Sent; # Records Pending in Client]
In this basic test, you can see that:
1) write throughput goes up
2) the number of pending records drops significantly
Benchmarking:
● Running a Flink hourly-sliding-window job
● Enough Kinesis shards
● ~70 bytes/record
Limitations: my Flink job was not built for benchmarking. It generates at most 21 million records, which yields
a 10X-or-more improvement estimate; in reality, the performance gain should exceed 10X. You’re welcome to run your
own benchmarks.
Contribution: Improved flink-connector-kinesis
Contribution: Improved Flink’s ListState performance 15–35X
Background on RocksDBStateBackend
Problems:
● ListState has only two APIs: add() and get()
● RocksDBListState translates add() into RocksDB.merge()
○ Adding 100 elements takes 100 memtable writes, which is very slow
Improvements:
● Developed two new APIs in Flink 1.5: update() and addAll()
● update() and addAll() both simulate RocksDB’s byte-merge operation, pre-merging all elements
upfront and writing to the memtable only once
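The pre-merge idea can be modeled simply: instead of N separate merge writes, serialize and join all elements once, then issue a single write. This is a simplified stand-in for RocksDBListState — the delimiter, store, and class here are illustrative, not Flink's actual implementation:

```python
DELIMITER = b","  # stand-in for RocksDB's merge-operator separator

class ToyListState:
    def __init__(self):
        self.store = {}   # stand-in for the RocksDB memtable
        self.writes = 0   # count memtable writes

    def add(self, key, element):
        """Old path: one merge write per element."""
        self.writes += 1
        if key in self.store:
            self.store[key] = self.store[key] + DELIMITER + element
        else:
            self.store[key] = element

    def add_all(self, key, elements):
        """New path: pre-merge all elements in memory, write once."""
        self.writes += 1
        joined = DELIMITER.join(elements)
        if key in self.store:
            self.store[key] = self.store[key] + DELIMITER + joined
        else:
            self.store[key] = joined

old, new = ToyListState(), ToyListState()
for e in [b"a", b"b", b"c"]:
    old.add("k", e)          # 3 writes
new.add_all("k", [b"a", b"b", b"c"])  # 1 write, same stored bytes

print(old.writes, new.writes)  # → 3 1
print(old.store == new.store)  # → True
```

Since the stored bytes are identical either way, get() needs no changes; only the write path gets cheaper.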
Benchmarking added to source code: org.apache.flink.contrib.streaming.state.benchmark.RocksDBListStatePerformanceTest
Result: 15–35X faster!
Q&A
Thank you!
We’re hiring!
