Speakers: Liquan Pei, Software Engineer, Ads Infrastructure Team, Pinterest; Boyang Chen, Software Engineer, Ads Infrastructure Team, Pinterest & Shawn Nguyen, Platform Engineer, Pinterest
Apache Kafka® is widely used at Pinterest to power the recommendation systems for both organic and promoted content to its 200+ million monthly active users.
With the recent adoption of the Confluent Go client and Kafka Streams, Pinterest has experienced significantly improved system stability and performance of its Kafka clients and real-time processing framework, as well as improved scalability and lower maintenance costs.
In this talk, members of the Pinterest team:
-Share details of their results
-Offer lessons learned from their Confluent Go client migration
-Discuss their use cases for adopting Kafka Streams
2. Agenda
What are we talking about today?
● Pinterest Overview
● Ads on Pinterest
● Ads Budgeting
● Ads Budgeting Kafka Client Migration
● Predictive Budgeting with Kafka Streams
4. 200M+ Monthly Active Users
100B+ Pins created by people saving images from around the web
2B+ idea searches monthly
A great platform for ads
Pinterest
6. Ads Product
What can advertisers get?
● CPC: charged by clicks.
● CPM: charged by impressions.
7. Ads Budgeting
● Prevent overdelivery.
● Pace advertiser spend throughout the billing cycle.
● To achieve the above goals, we need realtime spend data.
Goals
11. Ads Budgeting
● Every X minutes, the realtime spend information for all ads is published to S3.
● Every Y seconds, the realtime spend information for ads whose spend changed is published to Kafka.
● Ads servers tail Kafka to get the latest spend information.
● Each Kafka message is relatively big, and the amount of data transferred over the network can be huge.
Workload
13. Sarama Go Client
● Ads Budgeting used a legacy Sarama Go client to interact with Kafka.
● The legacy client had unexpected behavior during Kafka failovers, high network usage, or ISR changes.
○ The Sarama client can have inconsistent metadata during broker leader transitions, requiring service restarts to establish connections to the correct brokers.
○ Metadata fetch requests occasionally hang.
Overview
16. Zero Downtime Migration
● Introduce backward-compatible interfaces with Sarama.
● Introduce conversions between Sarama and Confluent's Go client event types.
● Introduce conversions between configurations for consumers and producers.
● Add switches in existing code to allow fast rollback within seconds.
● Lots of integration and unit tests.
Plan
17. Backward Compatible Interface
func (c *Client) Produce(topic string, partition int32, key, value []byte) error {
	p := c.producer // assumed: the wrapped confluent-kafka-go producer held by Client
	// create a librdkafka message struct from the Sarama-style arguments
	msg := ConvertedProducerMsg(topic, partition, key, value)
	if err := p.Produce(msg, p.Events()); err != nil {
		return err
	}
	// Check the delivery report before returning (sync-producer semantics)
	e := <-p.Events()
	m, ok := e.(*kafka.Message)
	if !ok {
		return fmt.Errorf("delivery report chan returned a non-message type: %v", e)
	}
	return m.TopicPartition.Error
}
Producer
18. Backward Compatible Interface
func (c *Client) Consume(group string, topic string, partition int32, offset int64,
	conf sarama.ConsumerConfig) (<-chan *sarama.ConsumerEvent, func(), error) {
	// (consumer, evCh, and err are created in setup code elided on the slide)
	go func() {
		run := true
		for run {
			select {
			case ev := <-consumer.Events():
				switch e := ev.(type) {
				case *kafka.Message:
					// convert the librdkafka Message to a Sarama ConsumerEvent
					consumeEvent := ConvertedConsumerEvent(e)
					evCh <- consumeEvent
				case kafka.Error:
					…
				}
			}
		}
	}()
	return evCh, func() { stopConsumer(consumer, evCh) }, err
}
Consumer
19. Metadata Management
// topicMetadata fetches all metadata info (leader/partition ISRs/etc.) for a given topic
func (c *Client) topicMetadata(topic string) ([]kafka.PartitionMetadata, error) {
	// There is no exposed client interface in confluent-kafka other than
	// consumer/producer, so use a producer for now to send metadata requests.
	p, err := c.createDefaultProducer()
	if err != nil {
		return nil, err
	}
	defer p.Close() // only defer Close once the constructor has succeeded
	metadata, err := p.GetMetadata(&topic, false, metadataTimeoutMs)
	if err != nil {
		return nil, err
	}
	return metadata.Topics[topic].Partitions, nil
}
Fetching Topic Metadata
20. Metadata Management
// FetchOffset fetches the requested earliest/latest offset for a given snapshot.
// Useful for bootstrapping from Kafka as well as batch updates.
func (c *Client) FetchOffset(clientID, topic string, partition int32, method int) (int64, error) {
	if method != kafkasource.OffsetEarliest && method != kafkasource.OffsetLatest {
		return -1, fmt.Errorf("bad offset type requested: %d", method)
	}
	p, err := c.createDefaultProducer()
	if err != nil {
		return -1, err
	}
	defer p.Close() // only defer Close once the constructor has succeeded
	low, high, err := p.QueryWatermarkOffsets(topic, partition, metadataTimeoutMs)
	…
}
Fetching Offsets for Batches
22. Learnings
● Producer latency is higher than expected.
● Too many open sockets and threads.
● Which APIs to use?
● Broker network saturation.
Overview
23. Producer latency higher than expected
● Produce latency was around 75 ms per message with the sync producer (the client waits for the delivery report of each message before sending subsequent records).
● Reduce queue.buffering.max.ms to 0 ms for the sync producer; otherwise messages are not sent until the buffer fills or the buffering window elapses (see the config sketch below).
Message Delays
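Below is a minimal sketch of setting this with confluent-kafka-go. The broker addresses are placeholders, and the right value depends on your own latency versus throughput trade-off.

package main

import (
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	// For a sync producer, disable client-side buffering so each message is
	// sent immediately instead of waiting for the buffering window to fill.
	p, err := kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers":      "broker1:9092,broker2:9092", // placeholder brokers
		"queue.buffering.max.ms": 0,                           // send with no artificial delay
	})
	if err != nil {
		panic(err)
	}
	defer p.Close()
	fmt.Println("sync producer created with queue.buffering.max.ms=0:", p)
}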
24. Too many open sockets and threads
● File descriptor usage on brokers increased by over 10x.
● Redundant connections, N + len(bootstrap.servers), are created per client.
* 10.1.239.229 and 10.1.225.203 were supplied in the client bootstrap list.
Resource Leak
25. Where did 10x sockets come from?
In an ideal world:

Partition | Leader | Replica Set
1         | 1      | 1,2,3
2         | 2      | 3,4,5
3         | 3      | 5,6,7

= 3 clients total with 1 client per partition = 3 threads and 3 sockets for consuming

In librdkafka (3 clients total and a bootstrap list containing brokers 1, 2, 3), we have:
Client A = 9 (1 socket per broker in cluster) + 3 (1 socket per broker in bootstrap list) = 12 connections
Client B = 12 connections
Client C = 12 connections
= 36 connections total = ~10x the original connection count
26. Too many open sockets and threads
● Redundant connections, N + len(bootstrap.servers), are created per client if the broker's IP/hostname in advertised.listeners does not exactly match the bootstrap broker name. The client compares these in the metadata response.
● See librdkafka issue #825 for fixes coming soon:
○ Sparse connections/threads: only connect and create threads to the brokers we need to talk to.
○ Purge old brokers: remove brokers no longer reported in metadata.
Reducing socket usage
27. Which Consume API to use?
● We read directly from the Events channel because it gave slightly better consume latency, and reading from a chan is more natural in Go (a minimal sketch follows).
● Also, the Poll method returns only one record per call, so reading from the Events channel was simpler to express in code.
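As a rough illustration, here is a minimal Events-channel consumer sketch with confluent-kafka-go; the broker, group, and topic names are made up, and go.events.channel.enable must be set for messages to arrive on Events().

package main

import (
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers":        "broker1:9092",          // placeholder broker
		"group.id":                 "ads-budgeting-example", // placeholder group
		"go.events.channel.enable": true,                    // deliver messages on c.Events()
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	if err := c.Subscribe("realtime_spend", nil); err != nil { // placeholder topic
		panic(err)
	}

	// Reading from the channel is natural in Go and avoids the
	// one-record-per-call shape of Poll().
	for ev := range c.Events() {
		switch e := ev.(type) {
		case *kafka.Message:
			fmt.Printf("message at %v: %d bytes\n", e.TopicPartition, len(e.Value))
		case kafka.Error:
			fmt.Println("consumer error:", e)
		}
	}
}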
28. Broker network saturation
● With the default librdkafka consumer configuration, the broker network bandwidth limit was reached during consumer initialization.
● We observed that the message consumption rate stayed constant.
● We changed the following configs to mitigate this issue (see the sketch below):
○ go.events.channel.size (max # of msgs buffered in each consumer's events channel)
○ queued.min.messages (max # of msgs the consumer pre-fetches in the background)
○ Enable compression on producers (snappy in our case).
Congestion Issue
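A hedged sketch of the kind of tuning described above; the values, broker address, and group name are illustrative rather than Pinterest's production settings.

package main

import "github.com/confluentinc/confluent-kafka-go/kafka"

// Illustrative consumer settings: bound how much data is pulled ahead of processing.
var consumerConfig = &kafka.ConfigMap{
	"bootstrap.servers":        "broker1:9092", // placeholder broker
	"group.id":                 "ads-budgeting-example",
	"go.events.channel.enable": true,
	"go.events.channel.size":   1000,  // messages buffered in the Go events channel
	"queued.min.messages":      10000, // messages librdkafka keeps pre-fetched per partition
}

// Illustrative producer settings: compress on the producer side to cut broker network usage.
var producerConfig = &kafka.ConfigMap{
	"bootstrap.servers": "broker1:9092",
	"compression.codec": "snappy",
}

func main() {
	_, _ = kafka.NewConsumer(consumerConfig)
	_, _ = kafka.NewProducer(producerConfig)
}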
29. Takeaways
● If a sync producer is necessary and latency is critical, keep buffering as minimal as possible.
● Tuning the buffer, channel, and queue sizes is critical for achieving high performance. In our use case, consumers are frequently paused, making pre-fetching unnecessary.
● Understand the internals of the consumer/producer APIs. This can uncover details about performance and resource usage.
● Integration and performance tests are crucial! Performance is often overlooked until code is deployed.
30. Librdkafka migration Q&A
● See https://github.com/edenhill/librdkafka/wiki/FAQ for more info and best
practices.
32. Advertiser X wants to show ads on your website...

Advertiser | Daily Budget | Spend | Impressions | Spend per impression
X          | $100         | N/A   | 1000        | $0.10
What is Overdelivery
33. What is Overdelivery

Advertiser / Entity                 | Daily Budget | Spend / Revenue | Impressions | Spend per impression
X                                   | $100         | $100 (spend)    | 1100        | $0.09
Website operator (Internet Company) | N/A          | $100 (revenue)  | 1100        | N/A

→ overdelivered by 100 impressions

The result was unexpected.
● Advertiser X gets 100 extra impressions.
● Solvable with realtime spend data.
34. What is Overdelivery

Advertiser / Entity                 | Daily Budget | Spend / Revenue | Impressions | Spend per impression
X                                   | $100         | $100 (spend)    | 1010        | $0.10
Website operator (Internet Company) | N/A          | $100 (revenue)  | 1010        | N/A

By making the system faster, overdelivery is reduced.
However, this is not the end of the story...
35. Advertiser Y wants to run CPC (cost per click) ads on your website

Advertiser | Daily Budget | Spend | Clicks | Spend per click
Y          | $100         | N/A   | 50     | $2

Overdelivery continuing...
36. Overdelivery continuing...

The result is unexpected, again.
● Advertiser Y gets more clicks.
● Because user actions can be naturally delayed.

Advertiser / Entity                 | Daily Budget | Spend / Revenue | Clicks | Spend per click
Y                                   | $100         | $100 (spend)    | 60     | $1.67
Website operator (Internet Company) | N/A          | $100 (revenue)  | 60     | N/A

→ overdelivered by 10 clicks
37. The system needs to be able to predict spend that might occur in the future and slow down campaigns close to reaching their budget.
39. Inflight Spend
Methodology
● Predict potential spend based on insertions from the previous 3 minutes. Output to downstream every 10 seconds.
● inflight_spend = price * impression_rate * action_rate (see the Go sketch below)
○ price: the value of this ad.
○ impression_rate: the historical rate at which an insertion converts to an impression. Note that an insertion is not guaranteed to convert to an impression.
○ action_rate: for an advertiser paying by click, the probability that the user will click on this ad insertion; for an advertiser paying by impression, this is 1.
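A minimal Go sketch of the formula above; the struct and field names are hypothetical, not the production schema.

package main

import "fmt"

type insertion struct {
	price          float64 // value of this ad
	impressionRate float64 // historical insertion -> impression conversion rate
	actionRate     float64 // P(click) for CPC ads; 1.0 for CPM ads
}

// inflightSpend estimates the spend an ad insertion may still generate.
func inflightSpend(i insertion) float64 {
	return i.price * i.impressionRate * i.actionRate
}

func main() {
	cpc := insertion{price: 2.0, impressionRate: 0.8, actionRate: 0.05}
	cpm := insertion{price: 0.1, impressionRate: 0.8, actionRate: 1.0}
	fmt.Printf("CPC inflight spend: %.3f, CPM inflight spend: %.3f\n",
		inflightSpend(cpc), inflightSpend(cpm))
}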
40. Inflight Spend
Single Stage Aggregation
● We could use hopping windows to calculate the expected spend over the previous three minutes.
● Windows would be three minutes long, spaced 10 seconds apart, giving 180 seconds / 10 seconds = 18 open windows.
● However, each event may update all 18 windows.
● Writes outweigh reads (each state store update is also written through Kafka).
Hopping window
41. Inflight Spend
Two stage aggregation
● Switched from large hopping windows to small tumbling windows.
● One 180-second overlapping window -> eighteen 10-second non-overlapping windows.
● Each event now updates only one window.
● By reducing the number of updates per event from 18 to 1, overall throughput increased by 18x (see the sketch below).
Tumbling window
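An illustrative Go sketch of the two-stage idea (the real pipeline uses Kafka Streams): each event lands in exactly one 10-second tumbling bucket, and the 3-minute value is the sum of the last 18 buckets at publish time. Keying by ad ID and bucket expiry are omitted for brevity.

package main

import (
	"fmt"
	"time"
)

const (
	bucketSize = 10 * time.Second
	numBuckets = 18 // 180s / 10s
)

type windowedSpend struct {
	buckets [numBuckets]float64
}

// add places an event's inflight spend into the single bucket covering its timestamp.
func (w *windowedSpend) add(ts time.Time, spend float64) {
	idx := int(ts.Unix()/int64(bucketSize.Seconds())) % numBuckets
	w.buckets[idx] += spend
}

// total is computed once every 10 seconds when publishing downstream.
func (w *windowedSpend) total() float64 {
	sum := 0.0
	for _, b := range w.buckets {
		sum += b
	}
	return sum
}

func main() {
	var w windowedSpend
	now := time.Now()
	w.add(now, 0.10)
	w.add(now.Add(15*time.Second), 0.05)
	fmt.Printf("3-minute inflight spend: %.2f\n", w.total())
}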
42. Inflight Spend
Data publishing
[Diagram: each input partition produces an (ad ID -> inflight spend) message map that is consumed by Ads Serving]
● For each input partition, the latest 3-minute inflight spend data is encoded as a Kafka message.
● Each message is a map from ad ID to inflight spend, keyed by input partition.
● The client only needs to swap the snapshot locally (see the sketch below).
Message Map
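A hypothetical sketch of the consumer-side swap; the Snapshot type and onMessage handler are invented names to illustrate the idea, and a real client would keep one snapshot per input partition.

package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot maps ad ID to its inflight spend over the last 3 minutes.
type Snapshot map[int64]float64

// latest holds the most recent snapshot; readers always see a complete map.
var latest atomic.Value

// onMessage is called per Kafka message, which carries the full map for one
// input partition, so the client simply swaps in the new snapshot.
func onMessage(snapshot Snapshot) {
	latest.Store(snapshot)
}

func main() {
	onMessage(Snapshot{10000: 1.25, 10002: 0.40})
	current := latest.Load().(Snapshot)
	fmt.Printf("inflight spend for ad 10000: %.2f\n", current[10000])
}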
43. Inflight Spend
Data encoding
● Since inflight spend accuracy is not strict, we can use lossy lookup-table encoding for the data.
● Use delta encoding to encode ad IDs (see the sketch below):
○ List of ad IDs: [10000, 10002, 9999, 9980, 20000, 20010]
○ Sorted: [9980, 9999, 10000, 10002, 20000, 20010]
○ Encoded list: [9980, 19, 1, 2, 9998, 10]
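A minimal sketch of the delta encoding in the example above; deltaEncode is a hypothetical helper, not the actual encoder.

package main

import (
	"fmt"
	"sort"
)

// deltaEncode sorts the IDs and stores each one as the difference from the previous ID.
func deltaEncode(ids []int64) []int64 {
	sorted := append([]int64(nil), ids...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	encoded := make([]int64, len(sorted))
	prev := int64(0)
	for i, id := range sorted {
		encoded[i] = id - prev
		prev = id
	}
	return encoded
}

func main() {
	fmt.Println(deltaEncode([]int64{10000, 10002, 9999, 9980, 20000, 20010}))
	// Output: [9980 19 1 2 9998 10]
}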
46. Takeaways
● To measure spend precisely, a predictive approach needs to be taken.
● Consider whether reads or writes are heavier when designing windows.
47. We are hiring!
● Ads Infrastructure:
○ Software Engineer, Ads Infrastructure
● Data:
○ Data Engineer, Logging Platform
○ Data Engineer, Stream Platform
○ Software Engineer, Big Data Platform
● Contact:
○ liquanpei@pinterest.com
○ bychen@pinterest.com
○ shnguyen@pinterest.com