OfferUp Confidential
Large-scale Near-real-time Stream Processing
with Apache Flink @ OfferUp
Bowen Li
User Survey
● Who has used OfferUp?
● Who has used Apache Flink?
● Who has developed code in Apache Flink?
OfferUp: creating the simplest, most trustworthy way to buy and sell locally
At a Glance
● The largest mobile marketplace for local buyers and sellers in the U.S.
● Top shopping app on iOS and Android
● $14+ Billion In Transactions in 2016
Speaker Background
Bowen Li
○ OfferUp
■ Develops stream processing infra with Apache Flink
■ Apache Airflow, Apache Avro, etc.
○ Tableau
■ Lots of Apache ZooKeeper and Apache Curator
Stream Processing @ OfferUp
We are expanding our stream processing footprint.
We developed OfferUp’s stream processing platform with a few primitives:
● Flink installation on EMR
● HDFS, YARN, metrics, checkpoint/savepoint, etc. configuration help for Flink cluster
● User apps deployment
● Connecting to streams
● Data serialization/deserialization (Ser/De)
Use Case
Business Requirement:
● Calculate time-decaying personalization scores based on user activity within
the last month
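The time-decay computation can be sketched as exponential decay over event age. A minimal illustration — the half-life, the event weights, and the scoring function here are hypothetical stand-ins, not OfferUp's actual model:

```python
import math
import time

HALF_LIFE_SECONDS = 7 * 24 * 3600  # hypothetical 7-day half-life

def decayed_score(events, now=None):
    """Sum event weights, discounting each by its age.

    `events` is a list of (timestamp, weight) pairs from the last month.
    An event exactly one half-life old contributes half its weight.
    """
    now = now if now is not None else time.time()
    decay_rate = math.log(2) / HALF_LIFE_SECONDS
    return sum(w * math.exp(-decay_rate * (now - ts)) for ts, w in events)

now = time.time()
events = [(now - HALF_LIFE_SECONDS, 1.0), (now, 1.0)]
print(round(decayed_score(events, now), 2))  # → 1.5
```

The batch pipeline would recompute such sums hourly; the streaming version updates them as events arrive.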
Use Case - The old pipeline
The old pipeline:
● batch processing
● ~3 hours of end-to-end latency
Use Case - The new near-real-time pipeline!
Pipeline Stats
● Data Volume: processing billions of records per day
● Average end-to-end latency: ~1 min
○ The 2-min aggregation in the 1st Flink cluster (NRT scores) dominates the latency
○ Depending on when an event enters the 2-min window, the minimum end-to-end latency
can be a few seconds; the expected maximum is ~2 min
We dramatically lowered the end-to-end latency from ~3h to ~1min!
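The latency bounds follow from tumbling-window mechanics: an event that lands just before the window closes waits only seconds, one that lands just after a window opens waits nearly the full 2 min, and uniform arrivals wait half a window on average — consistent with the ~1 min average above. A back-of-envelope sketch (window size is the only number taken from the slides):

```python
WINDOW_SECONDS = 120  # the 2-min aggregation window

def window_wait(offset_in_window):
    """Seconds an event waits until its tumbling window fires.

    `offset_in_window` is the event's arrival position within [0, 120).
    """
    return WINDOW_SECONDS - offset_in_window

worst = window_wait(0)                   # arrives right as a window opens
best = window_wait(WINDOW_SECONDS - 1)   # arrives just before it closes
average = WINDOW_SECONDS / 2             # uniform arrivals: half a window
print(worst, best, average)  # → 120 1 60.0
```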
Design Considerations Explained
● Latency - why 2min aggregation
○ User-facing services will only batch-read new scores every 5min
○ Any latency smaller than 5min is good
Design Considerations Explained (cont’d)
● Data Correctness
○ At-least-once guarantee
○ Be careful when merging NRT scores with historical scores: ensure no overlap and
no gap
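"No overlap and no gap" means the NRT and historical paths must split at a single cutoff timestamp, with every event attributed to exactly one side. A hypothetical sketch of that invariant using a half-open split (the data shapes are illustrative, not the actual pipeline's):

```python
def merge_scores(historical, nrt, cutoff):
    """Merge per-event scores, counting each event exactly once.

    `historical` and `nrt` map event timestamps to scores. Events before
    `cutoff` come from the batch path, events at or after it from the NRT
    path: the half-open split [start, cutoff) / [cutoff, now) guarantees
    no event is dropped (gap) or double-counted (overlap).
    """
    merged = {ts: s for ts, s in historical.items() if ts < cutoff}
    merged.update({ts: s for ts, s in nrt.items() if ts >= cutoff})
    return merged

historical = {100: 1.0, 200: 2.0, 300: 9.9}  # 300 also seen by batch
nrt = {300: 3.0, 400: 4.0}
print(merge_scores(historical, nrt, cutoff=300))
# → {100: 1.0, 200: 2.0, 300: 3.0, 400: 4.0}
```

With at-least-once delivery the NRT side must additionally tolerate duplicates, e.g. by keying on event IDs.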
Design Considerations Explained (cont’d)
● Failure Recovery (assuming AWS Kinesis is reliable)
○ Both Flink clusters have checkpointing enabled and can auto-recover
○ Redis data has a 3-day TTL, enough time to fix things up
○ Other parts are stateless
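Checkpoint-based auto recovery is typically wired up through Flink's configuration; a minimal sketch (the paths are hypothetical placeholders, and exact keys vary by Flink version):

```yaml
# flink-conf.yaml — hypothetical values
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoints   # placeholder path
state.savepoints.dir: hdfs:///flink/savepoints     # placeholder path
```

The checkpoint interval itself is usually enabled in application code via `StreamExecutionEnvironment.enableCheckpointing(intervalMs)`.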
Design Considerations Explained (cont’d)
● Replay Capability
○ Each component can handle the data load of a replay without its latency being
greatly impacted
Contribution to Apache Flink
I’ve been contributing to Apache Flink since March 2017
Related Contributions to Apache Flink
● FLINK-7508 Improved flink-connector-kinesis write performance by 10+X
○ Released in version 1.3.2
○ Many other comprehensive improvements to flink-connector-kinesis
○ 13 out of my 43 commits
● FLINK-7475 Improved Flink’s ListState APIs and boosted their performance by 15–35X
○ Will be released in version 1.5.0
● FLINK-6013 Created the flink-metrics-datadog module
● Other contributions include Flink’s DataStream APIs, side outputs, the build system, etc.
● Problem: flink-connector-kinesis used to create one HTTP connection per request
● Improvement: Switched it to a connection pool
● Result: Improved flink-connector-kinesis’s write performance by 10+X
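The per-request vs. pooled difference can be illustrated generically — this toy sketch is not the actual connector code, just a model of why reusing connections removes the dominant per-request setup cost:

```python
class Connection:
    """Toy connection whose setup (handshake) is the expensive part."""
    handshakes = 0  # class-wide counter of connections ever opened

    def __init__(self):
        Connection.handshakes += 1

    def send(self, record):
        pass  # network write elided

def send_per_request(records):
    """Old behavior: open a fresh connection for every request."""
    for r in records:
        Connection().send(r)

def send_pooled(records, pool_size=4):
    """New behavior: reuse a small pool of long-lived connections."""
    pool = [Connection() for _ in range(pool_size)]
    for i, r in enumerate(records):
        pool[i % pool_size].send(r)

Connection.handshakes = 0
send_per_request(range(1000))
print(Connection.handshakes)  # → 1000

Connection.handshakes = 0
send_pooled(range(1000))
print(Connection.handshakes)  # → 4
```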
Contribution: Improved flink-connector-kinesis
[Charts: # Records Sent; # Records Pending in Client]
In this basic test, you can see that:
1) write throughput goes up
2) the number of pending records drops significantly
Benchmarking:
● Running a Flink hourly-sliding-window job
● Enough Kinesis shards
● ~70 bytes/record
Limitations: my Flink job was not built for benchmarking. It generates at most 21 million records, which yields
a 10X-or-more improvement estimate; in reality, the performance gain should exceed 10X. You’re welcome to run your
own benchmarks.
Contribution: Improved flink-connector-kinesis
Contribution: Improved Flink’s ListState performance 15–35X
Background on RocksDBStateBackend
Problems:
● ListState has only two APIs: add() and get()
● RocksDBListState translates add() into RocksDB.merge()
○ Adding 100 elements takes 100 memtable writes, which is very slow
Improvements:
● Developed two new APIs in Flink 1.5: update() and addAll()
● update() and addAll() both simulate RocksDB’s byte-merge operation, pre-merging all elements
upfront and writing to the memtable only once
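The pre-merge idea can be modeled simply: instead of N separate merge writes, serialize and join all elements once, then issue a single write. This is a simplified stand-in for RocksDBListState — the delimiter, store, and class here are illustrative, not Flink's actual implementation:

```python
DELIMITER = b","  # stand-in for RocksDB's merge-operator separator

class ToyListState:
    def __init__(self):
        self.store = {}   # stand-in for the RocksDB memtable
        self.writes = 0   # count memtable writes

    def add(self, key, element):
        """Old path: one merge write per element."""
        self.writes += 1
        if key in self.store:
            self.store[key] = self.store[key] + DELIMITER + element
        else:
            self.store[key] = element

    def add_all(self, key, elements):
        """New path: pre-merge all elements in memory, write once."""
        self.writes += 1
        joined = DELIMITER.join(elements)
        if key in self.store:
            self.store[key] = self.store[key] + DELIMITER + joined
        else:
            self.store[key] = joined

old, new = ToyListState(), ToyListState()
for e in [b"a", b"b", b"c"]:
    old.add("k", e)          # 3 writes
new.add_all("k", [b"a", b"b", b"c"])  # 1 write, same stored bytes

print(old.writes, new.writes)  # → 3 1
print(old.store == new.store)  # → True
```

Since the stored bytes are identical either way, get() needs no changes; only the write path gets cheaper.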
Benchmarking added to source code: org.apache.flink.contrib.streaming.state.benchmark.RocksDBListStatePerformanceTest
Result: 15–35X faster!
Q&A
Thank you!
We’re hiring!
