Speakers: Liquan Pei, Software Engineer, Ads Infrastructure Team, Pinterest; Boyang Chen, Software Engineer, Ads Infrastructure Team, Pinterest & Shawn Nguyen, Platform Engineer, Pinterest
Apache Kafka® is widely used at Pinterest to power the recommendation systems for both organic and promoted content to its 200+ million monthly active users.
With the recent adoption of the Confluent Go client and Kafka Streams, Pinterest has experienced significantly improved system stability and performance of its Kafka clients and real-time processing framework, as well as improved scalability and lower maintenance costs.
In this talk, members of the Pinterest team:
-Share details of their results
-Offer lessons learned from their Confluent Go client migration
-Discuss their use cases for adopting Kafka Streams
2. Agenda
What are we talking about today?
● Pinterest Overview
● Ads on Pinterest
● Ads Budgeting
● Ads Budgeting Kafka Client Migration
● Predictive Budgeting with Kafka Streams
4. 200M+ Monthly Active Users
100B+ Pins created by people saving images from around the web
2B+ idea searches monthly
A great platform for ads
Pinterest
6. Ads Product
What can advertisers get?
● CPC: charged by clicks.
● CPM: charged by impressions.
7. Ads Budgeting
● Prevent overdelivery.
● Pace advertiser spend throughout the billing cycle.
● To achieve the above goals, we need realtime spend data.
Goals
11. Ads Budgeting
● Every X minutes, the realtime spend information for all ads is published to S3.
● Every Y seconds, the realtime spend information for ads whose spend changed is published to Kafka.
● Ads servers tail Kafka to get the latest spend information.
● Each Kafka message is relatively big, and the amount of data transferred over the network can be huge.
Workload
13. Sarama Go Client
● Ads Budgeting used a legacy Sarama Go client to interact with Kafka.
● The legacy client had unexpected behavior during Kafka failovers, high network usage, or ISR changes.
○ The Sarama client can have inconsistent metadata during broker leader transitions, requiring service restarts to establish connections to the correct brokers.
○ Metadata fetch requests occasionally hang.
Overview
16. Zero Downtime Migration
● Introduce backward-compatible interfaces with Sarama.
● Introduce conversions between Sarama and Confluent's Go client event types.
● Introduce conversions between configurations for consumers and producers.
● Add switches in existing code to allow fast rollback within seconds.
● Lots of integration and unit tests.
Plan
17. Backward Compatible Interface
func (c *Client) Produce(topic string, partition int32, key, value []byte) error {
	p := c.producer // assumed: the wrapped confluent-kafka-go producer held by Client
	// create a librdkafka message struct from the Sarama-style arguments
	msg := ConvertedProducerMsg(topic, partition, key, value)
	if err := p.Produce(msg, p.Events()); err != nil {
		return err
	}
	// Check the delivery report before returning (sync-producer semantics)
	e := <-p.Events()
	m, ok := e.(*kafka.Message)
	if !ok {
		return fmt.Errorf("delivery report chan returned a non-message type: %v", e)
	}
	return m.TopicPartition.Error
}
Producer
18. Backward Compatible Interface
func (c *Client) Consume(group string, topic string, partition int32, offset int64,
	conf sarama.ConsumerConfig) (<-chan *sarama.ConsumerEvent, func(), error) {
	// (consumer, evCh, and err are created in setup code elided on the slide)
	go func() {
		run := true
		for run {
			select {
			case ev := <-consumer.Events():
				switch e := ev.(type) {
				case *kafka.Message:
					// convert the librdkafka Message to a Sarama ConsumerEvent
					consumeEvent := ConvertedConsumerEvent(e)
					evCh <- consumeEvent
				case kafka.Error:
					…
				}
			}
		}
	}()
	return evCh, func() { stopConsumer(consumer, evCh) }, err
}
Consumer
19. Metadata Management
// topicMetadata fetches all metadata info (leader/partition ISRs/etc.) for a given topic
func (c *Client) topicMetadata(topic string) ([]kafka.PartitionMetadata, error) {
	// There is no exposed client interface in confluent-kafka other than
	// consumer/producer, so use a producer for now to send metadata requests.
	p, err := c.createDefaultProducer()
	if err != nil {
		return nil, err
	}
	defer p.Close() // only defer Close once the constructor has succeeded
	metadata, err := p.GetMetadata(&topic, false, metadataTimeoutMs)
	if err != nil {
		return nil, err
	}
	return metadata.Topics[topic].Partitions, nil
}
Fetching Topic Metadata
20. Metadata Management
// FetchOffset fetches the requested earliest/latest offset for a given snapshot.
// Useful for bootstrapping from Kafka as well as batch updates.
func (c *Client) FetchOffset(clientID, topic string, partition int32, method int) (int64, error) {
	if method != kafkasource.OffsetEarliest && method != kafkasource.OffsetLatest {
		return -1, fmt.Errorf("bad offset type requested: %d", method)
	}
	p, err := c.createDefaultProducer()
	if err != nil {
		return -1, err
	}
	defer p.Close() // only defer Close once the constructor has succeeded
	low, high, err := p.QueryWatermarkOffsets(topic, partition, metadataTimeoutMs)
	…
}
Fetching Offsets for Batches
22. Learnings
● Producer latency is higher than expected.
● Too many open sockets and threads.
● Which APIs to use?
● Broker network saturation.
Overview
23. Producer latency higher than expected
● Produce latency was around 75 ms per message with the sync producer (the client waits for the delivery report of each message before sending subsequent records).
● Reduce queue.buffering.max.ms to 0 ms for the sync producer; otherwise messages are not sent until the buffer fills or the buffering window elapses (see the config sketch below).
Message Delays
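Below is a minimal sketch of setting this with confluent-kafka-go. The broker addresses are placeholders, and the right value depends on your own latency versus throughput trade-off.

package main

import (
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	// For a sync producer, disable client-side buffering so each message is
	// sent immediately instead of waiting for the buffering window to fill.
	p, err := kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers":      "broker1:9092,broker2:9092", // placeholder brokers
		"queue.buffering.max.ms": 0,                           // send with no artificial delay
	})
	if err != nil {
		panic(err)
	}
	defer p.Close()
	fmt.Println("sync producer created with queue.buffering.max.ms=0:", p)
}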
24. Too many open sockets and threads
● File descriptor usage on brokers increased by over 10x.
● Redundant connections, N + len(bootstrap.servers), are created per client.
* 10.1.239.229 and 10.1.225.203 were supplied in the client bootstrap list.
Resource Leak
25. Where did 10x sockets come from?
In an ideal world:

Partition | Leader | Replica Set
1         | 1      | 1,2,3
2         | 2      | 3,4,5
3         | 3      | 5,6,7

= 3 clients total with 1 client per partition = 3 threads and 3 sockets for consuming

In librdkafka (3 clients total and a bootstrap list containing brokers 1, 2, 3), we have:
Client A = 9 (1 socket per broker in cluster) + 3 (1 socket per broker in bootstrap list) = 12 connections
Client B = 12 connections
Client C = 12 connections
= 36 connections total = ~10x the original connection count
26. Too many open sockets and threads
● Redundant connections, N + len(bootstrap.servers), are created per client if the broker's IP/hostname in advertised.listeners does not exactly match the bootstrap broker name. The client compares these in the metadata response.
● See librdkafka issue #825 for fixes coming soon:
○ Sparse connections/threads: only connect and create threads to the brokers we need to talk to.
○ Purge old brokers: remove brokers no longer reported in metadata.
Reducing socket usage
27. Which Consume API to use?
● We read directly from the Events channel because it gave slightly better consume latency, and reading from a chan is more natural in Go (a minimal sketch follows).
● Also, the Poll method returns only one record per call, so reading from the Events channel was simpler to express in code.
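As a rough illustration, here is a minimal Events-channel consumer sketch with confluent-kafka-go; the broker, group, and topic names are made up, and go.events.channel.enable must be set for messages to arrive on Events().

package main

import (
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers":        "broker1:9092",          // placeholder broker
		"group.id":                 "ads-budgeting-example", // placeholder group
		"go.events.channel.enable": true,                    // deliver messages on c.Events()
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	if err := c.Subscribe("realtime_spend", nil); err != nil { // placeholder topic
		panic(err)
	}

	// Reading from the channel is natural in Go and avoids the
	// one-record-per-call shape of Poll().
	for ev := range c.Events() {
		switch e := ev.(type) {
		case *kafka.Message:
			fmt.Printf("message at %v: %d bytes\n", e.TopicPartition, len(e.Value))
		case kafka.Error:
			fmt.Println("consumer error:", e)
		}
	}
}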
28. Broker network saturation
● With the default librdkafka consumer configuration, the broker network bandwidth limit was reached during consumer initialization.
● We observed that the message consumption rate stayed constant.
● We changed the following configs to mitigate this issue (see the sketch below):
○ go.events.channel.size (max # of msgs buffered in each consumer's events channel)
○ queued.min.messages (max # of msgs the consumer pre-fetches in the background)
○ Enable compression on producers (snappy in our case).
Congestion Issue
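A hedged sketch of the kind of tuning described above; the values, broker address, and group name are illustrative rather than Pinterest's production settings.

package main

import "github.com/confluentinc/confluent-kafka-go/kafka"

// Illustrative consumer settings: bound how much data is pulled ahead of processing.
var consumerConfig = &kafka.ConfigMap{
	"bootstrap.servers":        "broker1:9092", // placeholder broker
	"group.id":                 "ads-budgeting-example",
	"go.events.channel.enable": true,
	"go.events.channel.size":   1000,  // messages buffered in the Go events channel
	"queued.min.messages":      10000, // messages librdkafka keeps pre-fetched per partition
}

// Illustrative producer settings: compress on the producer side to cut broker network usage.
var producerConfig = &kafka.ConfigMap{
	"bootstrap.servers": "broker1:9092",
	"compression.codec": "snappy",
}

func main() {
	_, _ = kafka.NewConsumer(consumerConfig)
	_, _ = kafka.NewProducer(producerConfig)
}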
29. Takeaways
● If a sync producer is necessary and latency is critical, keep buffering as minimal as possible.
● Tuning the buffer, channel, and queue sizes is critical for achieving high performance. In our use case, consumers are frequently paused, making pre-fetching unnecessary.
● Understand the internals of the consumer/producer APIs. This can uncover details about performance and resource usage.
● Integration and performance tests are crucial! Performance is often overlooked until code is deployed.
30. Librdkafka migration Q&A
● See https://github.com/edenhill/librdkafka/wiki/FAQ for more info and best
practices.
32. Advertiser X wants to show ads on your website...

Advertiser | Daily Budget | Spend | Impressions | Spend per impression
X          | $100         | N/A   | 1000        | $0.10
What is Overdelivery
33. What is Overdelivery

Advertiser / Entity                 | Daily Budget | Spend / Revenue | Impressions | Spend per impression
X                                   | $100         | $100 (spend)    | 1100        | $0.09
Website operator (Internet Company) | N/A          | $100 (revenue)  | 1100        | N/A

→ overdelivered by 100 impressions

The result was unexpected.
● Advertiser X gets 100 extra impressions.
● Solvable with realtime spend data.
34. What is Overdelivery

Advertiser / Entity                 | Daily Budget | Spend / Revenue | Impressions | Spend per impression
X                                   | $100         | $100 (spend)    | 1010        | $0.10
Website operator (Internet Company) | N/A          | $100 (revenue)  | 1010        | N/A

By making the system faster, overdelivery is reduced.
However, this is not the end of the story...
35. Advertiser Y wants to run CPC (cost per click) ads on your website

Advertiser | Daily Budget | Spend | Clicks | Spend per click
Y          | $100         | N/A   | 50     | $2

Overdelivery continuing...
36. Overdelivery continuing...

The result is unexpected, again.
● Advertiser Y gets more clicks.
● Because user actions can be naturally delayed.

Advertiser / Entity                 | Daily Budget | Spend / Revenue | Clicks | Spend per click
Y                                   | $100         | $100 (spend)    | 60     | $1.67
Website operator (Internet Company) | N/A          | $100 (revenue)  | 60     | N/A

→ overdelivered by 10 clicks
37. The system needs to be able to predict spend that might occur in the future and slow down campaigns close to reaching their budget.
39. Inflight Spend
Methodology
● Predict potential spend based on insertions from the previous 3 minutes. Output to downstream every 10 seconds.
● inflight_spend = price * impression_rate * action_rate (see the Go sketch below)
○ price: the value of this ad.
○ impression_rate: the historical rate at which an insertion converts to an impression. Note that an insertion is not guaranteed to convert to an impression.
○ action_rate: for an advertiser paying by click, the probability that the user will click on this ad insertion; for an advertiser paying by impression, this is 1.
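A minimal Go sketch of the formula above; the struct and field names are hypothetical, not the production schema.

package main

import "fmt"

type insertion struct {
	price          float64 // value of this ad
	impressionRate float64 // historical insertion -> impression conversion rate
	actionRate     float64 // P(click) for CPC ads; 1.0 for CPM ads
}

// inflightSpend estimates the spend an ad insertion may still generate.
func inflightSpend(i insertion) float64 {
	return i.price * i.impressionRate * i.actionRate
}

func main() {
	cpc := insertion{price: 2.0, impressionRate: 0.8, actionRate: 0.05}
	cpm := insertion{price: 0.1, impressionRate: 0.8, actionRate: 1.0}
	fmt.Printf("CPC inflight spend: %.3f, CPM inflight spend: %.3f\n",
		inflightSpend(cpc), inflightSpend(cpm))
}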
40. Inflight Spend
Single Stage Aggregation
● We could use hopping windows to calculate the expected spend over the previous three minutes.
● Windows would be three minutes long, spaced 10 seconds apart, giving 180 seconds / 10 seconds = 18 open windows.
● However, each event may update all 18 windows.
● Writes outweigh reads (each state store update is also written through Kafka).
Hopping window
41. Inflight Spend
Two stage aggregation
● Switched from large hopping windows to small tumbling windows.
● One 180-second overlapping window -> eighteen 10-second non-overlapping windows.
● Each event now updates only one window.
● By reducing the number of updates per event from 18 to 1, overall throughput increased by 18x (see the sketch below).
Tumbling window
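An illustrative Go sketch of the two-stage idea (the real pipeline uses Kafka Streams): each event lands in exactly one 10-second tumbling bucket, and the 3-minute value is the sum of the last 18 buckets at publish time. Keying by ad ID and bucket expiry are omitted for brevity.

package main

import (
	"fmt"
	"time"
)

const (
	bucketSize = 10 * time.Second
	numBuckets = 18 // 180s / 10s
)

type windowedSpend struct {
	buckets [numBuckets]float64
}

// add places an event's inflight spend into the single bucket covering its timestamp.
func (w *windowedSpend) add(ts time.Time, spend float64) {
	idx := int(ts.Unix()/int64(bucketSize.Seconds())) % numBuckets
	w.buckets[idx] += spend
}

// total is computed once every 10 seconds when publishing downstream.
func (w *windowedSpend) total() float64 {
	sum := 0.0
	for _, b := range w.buckets {
		sum += b
	}
	return sum
}

func main() {
	var w windowedSpend
	now := time.Now()
	w.add(now, 0.10)
	w.add(now.Add(15*time.Second), 0.05)
	fmt.Printf("3-minute inflight spend: %.2f\n", w.total())
}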
42. Inflight Spend
Data publishing
[Diagram: each input partition produces an (ad ID -> inflight spend) message map that is consumed by Ads Serving]
● For each input partition, the latest 3-minute inflight spend data is encoded as a Kafka message.
● Each message is a map from ad ID to inflight spend, keyed by input partition.
● The client only needs to swap the snapshot locally (see the sketch below).
Message Map
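A hypothetical sketch of the consumer-side swap; the Snapshot type and onMessage handler are invented names to illustrate the idea, and a real client would keep one snapshot per input partition.

package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot maps ad ID to its inflight spend over the last 3 minutes.
type Snapshot map[int64]float64

// latest holds the most recent snapshot; readers always see a complete map.
var latest atomic.Value

// onMessage is called per Kafka message, which carries the full map for one
// input partition, so the client simply swaps in the new snapshot.
func onMessage(snapshot Snapshot) {
	latest.Store(snapshot)
}

func main() {
	onMessage(Snapshot{10000: 1.25, 10002: 0.40})
	current := latest.Load().(Snapshot)
	fmt.Printf("inflight spend for ad 10000: %.2f\n", current[10000])
}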
43. Inflight Spend
Data encoding
● Since inflight spend accuracy is not strict, we can use lossy lookup-table encoding for the data.
● Use delta encoding to encode ad IDs (see the sketch below):
○ List of ad IDs: [10000, 10002, 9999, 9980, 20000, 20010]
○ Sorted: [9980, 9999, 10000, 10002, 20000, 20010]
○ Encoded list: [9980, 19, 1, 2, 9998, 10]
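A minimal sketch of the delta encoding in the example above; deltaEncode is a hypothetical helper, not the actual encoder.

package main

import (
	"fmt"
	"sort"
)

// deltaEncode sorts the IDs and stores each one as the difference from the previous ID.
func deltaEncode(ids []int64) []int64 {
	sorted := append([]int64(nil), ids...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	encoded := make([]int64, len(sorted))
	prev := int64(0)
	for i, id := range sorted {
		encoded[i] = id - prev
		prev = id
	}
	return encoded
}

func main() {
	fmt.Println(deltaEncode([]int64{10000, 10002, 9999, 9980, 20000, 20010}))
	// Output: [9980 19 1 2 9998 10]
}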
46. Takeaways
● To measure spend precisely, a predictive approach needs to be taken.
● Consider whether reads or writes are heavier when designing windows.
47. We are hiring!
● Ads Infrastructure:
○ Software Engineer, Ads Infrastructure
● Data:
○ Data Engineer, Logging Platform
○ Data Engineer, Stream Platform
○ Software Engineer, Big Data Platform
● Contact:
○ liquanpei@pinterest.com
○ bychen@pinterest.com
○ shnguyen@pinterest.com