"We will discuss how we at Pinterest transformed real-time user engagement event consumption.
Every day, we log hundreds of billions of user engagement events across different domains to a few common Kafka topics, which are consumed by hundreds of real-time applications. These applications were built on divergent frameworks (e.g. Spark Streaming, Storm, Flink, and internally developed frameworks using the Kafka Consumer API) without standardized processing logic. This led to repeated implementations of similar logic, multiple codebases to maintain, low data quality, and inconsistency with offline datasets. These issues hurt the scalability, reliability, efficiency, and data accuracy of the applications, and ultimately the quality of real-time content recommendations and the user experience.
To address these challenges, we unified event consumption in our real-time applications by consolidating compute engines onto Flink, splitting the events in those common topics by engagement type, and generating cleansed events with standardized processing aligned on business concepts. Through these efforts, we achieved multi-million-dollar infrastructure savings and a double-digit engagement gain after applications adopted the cleansed events.
Moving forward, we are implementing frameworks to better track and govern the Kafka events and real-time use cases."
4. What is Pinterest?
Pinterest is the visual inspiration platform people around the world use to shop products personalized to their taste, find ideas to do offline and discover the most inspiring creators.
Pinterest’s mission is to bring everyone the inspiration to create a life that they love.
5. Who are we?
We are engineers from Pinterest Data Eng.
Data Eng’s mission is to create and run reliable, efficient and planet-scale data platforms and services to accelerate innovation and sustain Pinterest business.
13. Scaling challenges and solutions 2019 ~ 2020 Stability
Challenge
● outbound traffic = number of jobs * inbound traffic
● Kafka clusters hosting the event topics had very high resource saturation
Observation
● each job only needs to process a few common event types (e.g. click, view)
● events of those common types are a small portion of all events
[Diagram: one shared "event" topic (type 1, type 2, …, type M) consumed in full by each of streaming jobs 1…N]
14. Solution
Stream Splitter v1
● Flink DataStream API
● Job graph consists of source, filter and sink
● Filter operator only keeps events of a small set of types required by downstream
[Diagram: "event" topic (type 1, type 2, …, type M) → Stream Splitter v1 → "event_core" topic (type i, type j, type k)]
Scaling challenges and solutions 2019 ~ 2020 Stability
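The v1 filter operator described above can be sketched as a simple type-set predicate. This is a plain-Java illustration (type names are hypothetical; in the real job this logic would live in a Flink FilterFunction):

```java
import java.util.Set;

public class EventTypeFilter {
    // Small set of event types required by downstream jobs (illustrative values).
    private final Set<String> keepTypes;

    public EventTypeFilter(Set<String> keepTypes) {
        this.keepTypes = keepTypes;
    }

    // Keep an event only if its type is in the required set.
    public boolean filter(String eventType) {
        return keepTypes.contains(eventType);
    }
}
```

Because the kept types are a small portion of all events, the derived topic is far smaller than the source topic.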
15. Win
● The derived topic was about 10% of the original event topics, and the high Kafka cluster resource saturation issue was mitigated.
● Due to the smaller input QPS, jobs processing the derived topic required less CPU / memory, and AWS cross-AZ traffic cost was reduced. Infra savings!!!
[Diagram: "event_core" topic (type i, type j, type k) consumed by streaming jobs 1…N]
Scaling challenges and solutions 2019 ~ 2020 Stability
16. Challenge
● With new jobs requiring more event types, the derived topic grew larger and larger (10% -> 30% of the original event topics)
● Infra cost grew significantly as new jobs onboarded
Observation
● QPS for each job became larger due to the growth of the derived topic, so each job required more resources
● Each job still had to filter input events by type to get what it needed
Scaling challenges and solutions 2021 ~ 2022 Efficiency
17. Solution
Stream Splitter v2
● Flink SQL
● Job consists of a statement set of DML statements, e.g. insert into event_type_i (select * from event where type = type_i)
● one DML statement per per-type event topic
[Diagram: "event" topic (type 1, type 2, …, type M) → Stream Splitter v2 → per-type topics event_type_i, event_type_j, event_type_k, …]
Scaling challenges and solutions 2021 ~ 2022 Efficiency
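The statement set can be sketched in Flink SQL like this (table and type names are illustrative; the real jobs have one INSERT per per-type topic):

```sql
-- All INSERTs run as a single Flink job sharing one scan of the source topic.
EXECUTE STATEMENT SET
BEGIN
  INSERT INTO event_type_i SELECT * FROM event WHERE type = 'type_i';
  INSERT INTO event_type_j SELECT * FROM event WHERE type = 'type_j';
  INSERT INTO event_type_k SELECT * FROM event WHERE type = 'type_k';
END;
```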
18. Scaling challenges and solutions 2021 ~ 2022 Efficiency
Win
● Downstream jobs only process the several per-type event topics that they need
● Downstream jobs no longer need filter logic
● Downstream jobs require much less infra resources (infra savings!!!)
● Setting up a new pipeline only requires a new topic and a SQL statement
[Diagram: per-type topics event_type_i, event_type_j, event_type_k, … consumed by streaming jobs 1…N]
19. Scaling challenges and solutions 2021 ~ 2022 Efficiency
Issues with Stream Splitter v2
● All the records coming out of the source operator are forwarded to every pipeline
● Stream Splitter v2 jobs cost twice as much as v1 jobs
Note: the job graph is generated by the internal SQL planner from the DML statements; other operators that do not affect the data transportation pattern are not shown for better visualization.
[Diagram: Kafka Source (QPS M) forwards all M records to each filter (filter i: type = type_i; filter j: type = type_j; filter k: type = type_k; …); each filter i then passes only Mi records to Kafka Sink i]
20. Scaling challenges and solutions 2021 ~ 2022 Data quality
Challenge
● Streaming and batch workflows generated inconsistent results
Observation
● Streaming jobs re-implemented much of the batch ETL logic without standardization
[Diagram: from the same "event" source, streaming jobs counted 70 impressions while the batch workflows producing the DWH SOT tables counted 100 impressions]
21. Scaling challenges and solutions 2021 ~ 2022 Data quality
Solution
Real time DWH streams
● Built with NRTG - a mini framework on top of a subset of the Flink DataStream API (the Flink state API is not supported)
● Job graph consists of source, filter, enrich, dedup and sink
● filter, enrich and dedup logic reuses that of the batch ETL
● dedup keys are stored in off-heap memory (with pre-configured memory size) via the 3rd-party library ohc
[Diagram: "event" topic (type 1, type 2, …, type M) → Real time DWH streams job → "dwh_event" topic (enriched and deduped; type i, type j, type k)]
Caveat: dedup accuracy is compromised during task restarts or job deployments, as the in-memory dedup keys are lost. It takes up to 1 day’s raw events to rebuild the state.
22. Scaling challenges and solutions 2021 ~ 2022 Data quality
Improved Solution
Real time DWH streams with native Flink state
● Native Flink state API support is added to NRTG
● Dedup operator is re-written using Flink MapState to store dedup keys with a 1-day TTL
● RocksDB state backend, with S3 storing active (read / write) keys and backups
● Savepoint size is tens of TB. Full state is preserved during task restarts and job redeployments
[Diagram: "event" topic (type 1, type 2, …, type M) → improved job → "dwh_event" topic (enriched and deduped; type i, type j, type k)]
Dedup accuracy is guaranteed during task restarts or job redeployments with a specified checkpoint (from S3).
23. Scaling challenges and solutions 2021 ~ 2022 Data quality
Win
● Downstream jobs reading dwh_event can generate results consistent with the batch workflows; the computed real-time signals used in recommendation helped boost Pinterest engagement metrics by double digits.
● Downstream jobs no longer need to implement enrich and dedup logic, and job graphs are simplified to focus only on the business logic.
[Diagram: streaming jobs reading "dwh_event" now count 70 impressions, matching the 70 impressions in the batch workflows' DWH SOT tables]
24. Scaling challenges and solutions 2021 ~ 2022 Data quality
Issues with the Real-time DWH streams job
● The generated dwh_event topic contains multiple types, so downstream jobs read unnecessary data and thus still implement filter logic
● The mini framework introduces extra overhead
● Supporting a new type is slow - the logic for processing different types is coupled together due to the mini framework’s API requirements
25. Two solutions for pre-processing engagement events
Stream Splitter
● Pros: efficient downstream consumption; fast onboarding
● Cons: no data quality; repetitive processing logic in downstream jobs; inefficient job runtime (data duplication)
Realtime DWH
● Pros: data quality; simplified downstream job logic
● Cons: slow onboarding; inefficient downstream consumption; inefficient job runtime (framework overhead)
With two solutions, downstream job developers are confused about which to use, and both infra cost and KTLO cost double.
26. Unified Solution - Requirements
● Efficiency
○ Pre-processing jobs have an efficient runtime
○ Downstream jobs only read the events they need to process
● Data quality
○ Downstream jobs read enriched and deduped events that can generate results consistent with the batch workflows
● Dev velocity
○ Supporting a new type in the pre-processing jobs should be simple and easy to enable without affecting the existing pipelines
○ Downstream jobs no longer port the filter-and-enrich logic from batch ETL and no longer implement deduplication on the data source
● KTLO
○ maintain one unified solution rather than 2 solutions
27. Unified solution - API choice
● Flink DataStream API
● Flink SQL
● Mini framework like NRTG
● Flink Table API - our final choice
○ It is more expressive than Flink SQL - complex logic can’t be easily implemented in SQL
○ It is very flexible
■ sources and sinks are registered and accessed as Flink tables
■ a Table is easily converted to a DataStream when we want to leverage low-level features
○ It does not have any extra framework overhead like NRTG
28. Unified solution - job framework
Framework design
● Each output stream is generated through a pipeline made up of filter, enrich, dedup and sink operators
● Pipelines are pluggable and independent from each other
● Classes from the batch ETL are re-used to maintain consistent semantics
● Java reflection is leveraged to easily configure each pipeline
Job graph optimization - side outputs
● A job operator assigns every source event, based on its type, to the matching pipeline through side outputs
● Essentially we are implementing “filter pushdown” to reduce unnecessary data transportation
31. Platinum Event Streams - What it offers
[Diagram: raw event topic → platinum event streams → streaming applications, providing Standardized Event Selection, Event Deduplication and Downstream Efficiency]
32. Platinum Event Streams - User Flow
Before: streaming app developers had to ask the logging / metric owners directly - “I want to use event A as one of my signals, what’s the correct logic to process it from raw events?”
After: platinum event streams sit between the logging owners, metric definition owners and data warehouse team on one side and streaming app developers on the other. Faster onboarding w/ guaranteed quality and efficiency!
34. Platinum Event Stream - Flink Processing
[Diagram: Kafka Source Table (QPS M) → Splitter w/ Filters → side outputs 1…N (QPS M1…MN) → per-pipeline Enrich i → Dedup i → Kafka Sink i]
35. Platinum Event Stream - Splitter w/ Filters
Splitter Functionalities:
1. Filter out the events we don’t need.
2. Split the stream into different sub-pipelines according to event types.
36. Platinum Event Stream - Splitter w/ Filters
Metric Repository (shared by batch and streaming processing) defines the per-event logic, e.g. for Event / Metric X:
  def filter(event: Event): Boolean = ……
  def createDedupKey(event: Event) = ……
→ Standardized event selection, consistent w/ batch applications.
37. Platinum Event Stream - Splitter w/ Filters
Splitter functionalities: (1) Filtering (2) Split streams
Solution 1 - FlatMapFunc: the Kafka Source Table forwards the full stream (QPS M) to each of the N FlatMapFuncs. Severe back pressure and scalability issues when input traffic is high.
Solution 2 - Side Output: the splitter initializes a Map<event type, pipeline tag>; when processing, it emits each event with the corresponding pipeline tag and throws the event away if not needed. Each side output i carries only Mi, the QPS needed by pipeline i, and ΣMi << M. Scalability issue solved!
(M - QPS of the input raw event stream; Mi - QPS of side output i)
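The side-output routing can be sketched outside Flink as a tag-routing map. Class, tag and type names below are illustrative; in the real job this is a Flink process function emitting to OutputTags:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Splitter {
    // event type -> pipeline tag (in Flink, the tag would be an OutputTag).
    private final Map<String, String> typeToTag;
    // pipeline tag -> events emitted to that side output (stand-in for Flink's side outputs).
    private final Map<String, List<String>> sideOutputs = new HashMap<>();

    public Splitter(Map<String, String> typeToTag) {
        this.typeToTag = typeToTag;
    }

    public void process(String eventType, String payload) {
        String tag = typeToTag.get(eventType);
        if (tag == null) {
            return; // type not needed by any pipeline: throw the event away
        }
        sideOutputs.computeIfAbsent(tag, t -> new ArrayList<>()).add(payload);
    }

    public List<String> sideOutput(String tag) {
        return sideOutputs.getOrDefault(tag, List.of());
    }
}
```

Because unneeded types are dropped at the splitter, each downstream pipeline only receives its own Mi records instead of all M.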
38. Platinum Event Stream - Enrich
● Decoded info: decode some commonly used fields (e.g. BASE64-encoded) for downstream to use.
● Derived info: derive spam flags from a couple of different fields logged in the raw event data.
● Latency info: additional latency information (in ms) helps latency-sensitive downstreams take different actions according to per-event latency.
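The "decoded info" and "latency info" enrichments can be sketched as below. Class, method and field names are illustrative, not the actual job code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EnrichDecoder {
    // Decode a BASE64-encoded raw-event field once, so every downstream
    // consumer reads the decoded value instead of repeating the decode.
    public static String decodeField(String base64Field) {
        return new String(Base64.getDecoder().decode(base64Field), StandardCharsets.UTF_8);
    }

    // Attach per-event latency (ms) so latency-sensitive downstreams can
    // act differently on stale events.
    public static long latencyMs(long eventTimeMs, long processingTimeMs) {
        return processingTimeMs - eventTimeMs;
    }
}
```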
39. Platinum Event Stream - Dedup
Why do we need deduplication?
● Duplicate events exist in Pinterest’s raw event data.
● In some cases, duplicate rates vary from ~10-40% depending on the type of event.
Causes of duplicates:
1. Users repeating actions when interacting with the Pinterest app.
2. Incorrect client logging implementations.
3. Clients resending logging messages.
Solution:
● Deduplicate in both batch and streaming pipelines before exporting to dashboards or flowing into ML systems.
40. Platinum Event Stream - Dedup
[Diagram: events are keyed by UserID; if DedupKey(e) does not exist in state, update the state and output the event. Keyed state has a 24hr TTL and totals 2-10 TB.]
Implemented with stateful Flink functions.
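The dedup step can be sketched in plain Java as a keyed first-seen check. In the production job the map below is Flink MapState on the RocksDB backend with a 24h TTL; class and key names here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class Deduper {
    // 24h TTL window, matching the dedup key TTL described in the slides.
    private static final long TTL_MS = 24L * 60 * 60 * 1000;
    // dedup key -> timestamp of the first occurrence.
    private final Map<String, Long> seen = new HashMap<>();

    /** Returns true if the event should be emitted (first occurrence within the TTL window). */
    public boolean firstSeen(String dedupKey, long nowMs) {
        Long first = seen.get(dedupKey);
        if (first != null && nowMs - first < TTL_MS) {
            return false; // duplicate within the 24h window: drop
        }
        seen.put(dedupKey, nowMs); // new key, or expired entry refreshed
        return true;
    }
}
```

Unlike this in-heap sketch, Flink keyed state survives restarts via checkpoints, which is what guarantees dedup accuracy across redeployments.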
41. Platinum Event Stream - Dedup
Incremental checkpoints for large state (24hr TTL, 2-10 TB)
● Full state size: 2-10 TB
● Size of each incremental checkpoint: tens of GB
Re-deployment
● From savepoint: ~10 - 20 mins
● From checkpoint: < 2 mins
42. Easy-to-Extend Framework with Java Reflection
Metric Repository (referenced by both online and offline processing) holds the event definitions: event_definitions/EventA.scala, EventB.scala, EventC.scala.
One-line configuration per pipeline in *.properties:
  pipeline1.eventClass=A
  pipeline2.eventClass=B
  pipeline3.eventClass=C
Java reflection looks up the Event class by its name when building the job graph, and invokes the functions defined for each metric at runtime in each pipeline, e.g. MetricA.filter(), MetricA.createDedupKey().
1. Only a few lines of configuration changes are needed to add a new streaming pipeline.
2. Batch and streaming logic consistency is guaranteed by referencing the same code repo.
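The reflection-based wiring can be sketched as follows. The nested EventA class stands in for an event definition from the shared metric repository; all names are illustrative:

```java
public class PipelineLoader {
    // Stand-in for an event definition class (e.g. EventA.scala in the repo).
    public static class EventA {
        public boolean filter(String event) {
            return event.contains("click");
        }
    }

    // Look up and instantiate the event class named in the pipeline config.
    public static Object newEventDefinition(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("failed to load event class " + className, e);
        }
    }

    // Invoke a metric function (e.g. filter) on the loaded definition at runtime.
    public static Object invoke(Object def, String methodName, Object arg) {
        try {
            return def.getClass().getMethod(methodName, arg.getClass()).invoke(def, arg);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("failed to invoke " + methodName, e);
        }
    }
}
```

Adding a pipeline then only requires a new properties line naming the class, with no job-code change.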
43. Platinum Event Stream - Data Quality Monitoring
Before: 30-40% discrepancies between streaming and batch applications.
After: >99% match rate between streaming and batch applications.
Daily comparison with the batch SOT dataset:
● platinum event streams are dumped to offline tables (Kafka topic → S3 dump via an internal framework)
● an internal offline data checker system compares them against the offline SOT tables
● alerts fire on match-rate violations; dashboards for continuous monitoring
44. Platinum Event Streams - Cost Efficiency
Previously, the Efficiency solution (Stream Splitter) and the Data Quality solution (Realtime DWH) each cost 600 vcores. The Unified Solution (Efficiency + Data Quality) runs at 600 vcores in total - both functionalities with a single copy of the cost, similar to either previous offering alone!
45. 5. Wins and Learns
1. User engagement boost brought by cleaner source data!
2. Highly simplified onboarding flow for downstream streaming applications!
3. Hundreds of thousands of dollars in infra savings, as well as maintenance cost savings!
47. Ongoing efforts - streaming governance
We are building a streaming lineage & catalog, integrated with the batch lineage and catalog for unified data governance:
● a catalog of Flink tables registered for all the external systems that interact with Flink jobs
● lineage between Flink jobs and external systems
48. Ongoing efforts - streaming and incremental ETL
We are building solutions on top of CDC, Kafka, Flink, Iceberg and Spark to
● ingest data in near real time from online systems to the offline data lake
● incrementally process offline data in the data lake