Serving ads at Twitter scale requires fast data: making use of real-time data sets about what users are engaged with and interested in right now. Twitter is live, and the data that we use for ads personalization is unmatched. We’ll give an overview of Twitter’s revenue science stack, with an emphasis on the real-time production pipeline that powers how we show the right user the right ad for what they’re interested in right now.
3. Built ML stack & team
“Data quarterback”
Michigan PhD in AI/ML
AVID M sports fan
2015: TellApart -> Twitter
Ads Marketplace
Three things to know about me
4. Team of 11 ML eng and
tech DS
We select, place, and
price ads for Twitter
Seattle & SF
Three things about Ads Marketplace
5. Agenda
1. Intro to Twitter ads
2. Revenue Science production ML stack
3. Data science across the rest of Twitter
7. Here’s where our Kona coffee
beans are grown on the Big Island
in Hawaii. Feast your eyes!
The Barista Bar
@baristabar
Promoted
RETWEETS 27 LIKES 27
Twitter invented
native ads
14. Hydrating the timeline
● Maximize Twitter revenue
○ Use “rankscore”, or the expected revenue for showing an ad to a user
○ Calculated per ad-slot in a request
● Minimize impact to user engagement
○ Long-running holdout experiments to evaluate impact of ad load on users
○ Requires valuing user activity in dollars
● Minimize cost of computation
○ More relevant ads cost more money
○ Straightforward trade-off
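A minimal sketch of the rankscore idea, assuming rankscore is the predicted engagement probability times the advertiser's bid (the same product used in the auction example later in this deck). The `user_cost_dollars` threshold and every name below are hypothetical; they only illustrate what "valuing user activity in dollars" could look like for a single ad slot.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ad_id: str
    p_engage: float   # predicted probability of a billable engagement
    bid: float        # advertiser bid in dollars per engagement


def rankscore(c: Candidate) -> float:
    """Expected revenue in dollars for showing this ad once."""
    return c.p_engage * c.bid


def pick_for_slot(candidates, user_cost_dollars):
    """Return the best ad for one slot, or None if no ad is worth showing.

    user_cost_dollars is a hypothetical dollar value for the user
    engagement lost by putting any ad in this slot.
    """
    best = max(candidates, key=rankscore, default=None)
    if best is None or rankscore(best) <= user_cost_dollars:
        return None
    return best
```

Raising `user_cost_dollars` lowers ad load, which is the kind of trade-off the holdout experiments on the slide would be used to tune.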
15. Some numbers
● XXXk Twitter ad requests served every second
● YYYk active ad campaigns
● 80 ms to pick ads for timeline
17. Lifecycle of an ad request
● Serving: Receive ad request from app
● Data: Look-up data about the user
● Targeting: Identify eligible campaigns
● Prediction: Calculate probability of relevance
● Marketplace: Choose the best ads
● Serving: Send the ad to the app
● Formats: Render the ad as an impression
● Feedback: Did the user engage (e.g. click)?
● Billing: Charge advertiser for engagements
19. Data Targeting Prediction Marketplace
● Twitter product: Follow-graph, Tweet engagements, user behavior
● Advertisers: Website browsing behavior, in-app behavior
● Partners: Data aggregators, 3rd party data sets
And Twitter has much more data than that.
20. ● To predict user behavior, need:
○ Audiences
■ Custom - advertiser site visitors
■ Tailored - advertiser lists of emails or Twitter handles
■ Modeled - based on interests
○ Features of interactions with
■ Twitter product
■ Ads
■ Advertiser sites
■ Advertiser applications
Example: serving user data
24. ● Advertisers specify the types of
people they want to target
○ User Attributes
(age, gender, geo, device, etc.)
○ User Interest
(topics, keywords, handles,
events, tv shows, etc.)
● Many user attributes are modeled
● Campaign eligibility is binary
(for now)
Data Targeting Prediction Marketplace
25. ● Age, gender, geo, etc. - logistic regression models applied offline
● Interest modeling
○ Seems straightforward, but:
■ What does it mean to target “interests”?
■ How to structure Twitter data?
■ How can we handle scale?
○ Three applications
■ Follower interest
■ Keyword interest
■ Topic interest
Reaching a relevant audience
26. ● Most organic interest targeting
○ People follow handles
corresponding to their interests
○ Handles perform well for
advertisers
● Features: cosine similarity between
handles, local topology from social
network, etc.
● LR model predicts probability of
future links being added to social
network
Follower targeting
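As a rough illustration of the follower-targeting model, the sketch below computes the cosine similarity between two handles from their follower sets and feeds it, with one other feature, into a logistic function. The follower ids, the `mutual_follows` feature, the weights, and the bias are all made up; this is not the production feature set or model.

```python
import math


def cosine_similarity(followers_a: set, followers_b: set) -> float:
    """Cosine similarity of two handles viewed as binary follower vectors."""
    if not followers_a or not followers_b:
        return 0.0
    overlap = len(followers_a & followers_b)
    return overlap / math.sqrt(len(followers_a) * len(followers_b))


def p_follow(features: dict, weights: dict, bias: float) -> float:
    """Logistic regression: probability that a follow edge is added."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Toy data; follower ids, features, and weights are invented for illustration.
followers = {
    "@espresso_daily": {1, 2, 3, 4},
    "@baristabar": {3, 4, 5, 6},
}
sim = cosine_similarity(followers["@espresso_daily"], followers["@baristabar"])
features = {"handle_cosine": sim, "mutual_follows": 2.0}
weights = {"handle_cosine": 3.0, "mutual_follows": 0.5}
prob = p_follow(features, weights, bias=-2.0)
```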
27. ● Versatile & flexible
● But ambiguous & limited reach
● CBOW model embeds n-grams into 300d latent space
● Cosine similarity -> semantic similarity
Keyword targeting
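The step from cosine similarity to semantic similarity can be sketched as a nearest-neighbor lookup over embedding vectors. The 3-d vectors below are invented stand-ins for the 300-d CBOW embeddings, and `expand_keyword` is a hypothetical helper, not a Twitter API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_keyword(keyword, embeddings, min_similarity):
    """Return (n-gram, similarity) pairs close to the keyword, best first."""
    target = embeddings[keyword]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != keyword
    ]
    close = [(o, s) for o, s in scored if s >= min_similarity]
    return sorted(close, key=lambda pair: pair[1], reverse=True)


# Made-up 3-d vectors stand in for the 300-d CBOW embeddings.
toy_embeddings = {
    "coffee": [0.9, 0.1, 0.0],
    "espresso": [0.8, 0.2, 0.1],
    "basketball": [0.0, 0.1, 0.9],
}
related = expand_keyword("coffee", toy_embeddings, min_similarity=0.8)
```

Expanding a keyword to its embedding neighbors is one way to address the limited reach noted on the slide, at the price of a similarity threshold that has to be tuned.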
28. ● Organized into taxonomy
○ XX parent topics (“music”, “sports”)
○ YYY children (“jazz”, “basketball”)
● No ambiguity, large reach
● Random forest classifier
● Survey users for labeled data
● Features:
○ Of topic
○ Of user
○ Cross of topic and user
Topic interest
31. ● Prediction: What is the probability of an engagement if we show this ad?
● Learns P(click | user, ad, t)
● Uses a state-of-the-art, home-rolled ML platform (Lolly)
● Full prediction is expensive
○ XXXX dense user features
○ Sparse features hashed into 2^28 space
○ YYY ad features
○ Global features
○ Crosses of user x ad features
Data Targeting Prediction Marketplace
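A minimal sketch of hashing sparse features and user x ad crosses into a 2^28 index space. The feature names and the choice of `zlib.crc32` as the hash are assumptions made for a deterministic, dependency-free example; the slides do not say which hash function Lolly uses.

```python
import zlib

NUM_BUCKETS = 2 ** 28  # size of the hashed feature space from the slide


def bucket(name: str, value: str) -> int:
    """Map one categorical feature to an index in [0, NUM_BUCKETS)."""
    key = f"{name}={value}".encode("utf-8")
    return zlib.crc32(key) % NUM_BUCKETS


def featurize(user: dict, ad: dict) -> set:
    """Hash user features, ad features, and every user x ad cross."""
    indices = set()
    for name, value in user.items():
        indices.add(bucket(f"user.{name}", value))
    for name, value in ad.items():
        indices.add(bucket(f"ad.{name}", value))
    for u_name, u_value in user.items():
        for a_name, a_value in ad.items():
            cross_name = f"user.{u_name}*ad.{a_name}"
            indices.add(bucket(cross_name, f"{u_value}*{a_value}"))
    return indices


# Hypothetical feature values, for illustration only.
active = featurize({"geo": "US", "device": "ios"}, {"advertiser": "baristabar"})
```

Because every feature lands in a fixed-size index space, the model's memory footprint is bounded no matter how many distinct feature values appear; collisions are the cost.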
32. ● Lolly was built to replace Vowpal Wabbit
○ Distributed online learning
○ SGD LR
● Powerful: adaptive learning rates, variety of loss functions, regularization,
importance weighting, adaptive normalization, contextual bandits, warm
start
● Feature representation: GBDT, transforms
● Memory management: controls model size, will always hash into RAM
● 5x faster than VW online service
Lolly - Twitter’s ML infrastructure
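To make "SGD LR" with importance weighting concrete, here is a generic single-machine sketch of online logistic regression over sparse binary features. It is not Lolly's code or API; the class name, the fixed learning rate, and the clamping constant are assumptions.

```python
import math


class OnlineLogisticRegression:
    """Sparse logistic regression trained one example at a time with SGD."""

    def __init__(self, learning_rate: float = 0.1):
        self.learning_rate = learning_rate
        self.weights = {}  # feature index -> weight

    def predict(self, indices) -> float:
        """Probability of a positive label given the active feature indices."""
        z = sum(self.weights.get(i, 0.0) for i in indices)
        z = max(-35.0, min(35.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, indices, label: int, importance: float = 1.0) -> None:
        """One SGD step on log loss; importance scales the gradient."""
        error = self.predict(indices) - label  # d(log loss)/dz
        step = self.learning_rate * importance * error
        for i in indices:
            self.weights[i] = self.weights.get(i, 0.0) - step
```

A typical use of `importance` is to up-weight examples that were kept after downsampling, so that the model sees roughly the original class balance.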
33. ● P(engagement) prediction can be very good, but it is also very expensive
● Balance performance and cost with tunable heuristics
[Diagram: an ad request passes through Filtering, Light pCTR, Eligibility, and Full pCTR stages to produce (ad, placement) tuples; candidate counts at the stages are labeled A LOT, ~1000, ~100s, and ~50]
Balancing revenue with compute
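The trade-off above can be sketched as a scoring cascade: a cheap model prunes the candidate set so that the expensive model only runs on a short list. The function names and the cut-off parameters `light_keep` and `final_keep` are hypothetical; the real stages and thresholds are not given in the slides.

```python
import heapq


def cascade(candidates, is_eligible, light_score, full_score,
            light_keep=100, final_keep=5):
    """Score candidates in stages, spending more compute on fewer ads.

    is_eligible, light_score, and full_score are caller-supplied functions;
    light_keep and final_keep are the tunable cut-offs.
    """
    eligible = [ad for ad in candidates if is_eligible(ad)]
    # Cheap model prunes the eligible set down to light_keep survivors.
    survivors = heapq.nlargest(light_keep, eligible, key=light_score)
    # Expensive model runs only on the survivors.
    return heapq.nlargest(final_keep, survivors, key=full_score)
```

Lowering `light_keep` saves compute but risks dropping an ad the full model would have ranked highly, which is why the cut-offs are worth tuning against revenue.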
34. Prediction in production
[Diagram labels: Ad request; AdMixer (several instances); AdShard; Filtering; Light pCTR; Eligibility; Full pCTR; Prediction Service; batch; online; (ad, placement) tuples]
35. ● Ads ecosystem is non-stationary - ADAM
● Online regularization - L1 with lazy updates
● Importance weighting - gradient scaling
● Delayed feedback
● Explore/exploit
● Hyperparameter tuning
● Model calibration
● Distributed systems break
Challenges of online learning
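One common reading of "L1 with lazy updates" is that a weight is only shrunk when its feature next appears, with all the shrinkage it missed applied in one shot. The sketch below implements that generic idea with a constant learning rate; it is an assumption about the technique, not a description of Lolly's implementation, and all names are hypothetical.

```python
class LazyL1Weights:
    """Sparse weights with L1 shrinkage applied lazily, on access."""

    def __init__(self, learning_rate: float, l1: float):
        self.learning_rate = learning_rate
        self.shrink_per_step = learning_rate * l1
        self.weights = {}     # feature index -> weight
        self.last_step = {}   # feature index -> step when last shrunk
        self.step = 0

    def _catch_up(self, i) -> float:
        """Apply the shrinkage this weight skipped since it was last seen."""
        w = self.weights.get(i, 0.0)
        missed = self.step - self.last_step.get(i, self.step)
        penalty = missed * self.shrink_per_step
        # Soft-threshold: move toward zero, but never cross it.
        if w > 0.0:
            w = max(0.0, w - penalty)
        elif w < 0.0:
            w = min(0.0, w + penalty)
        self.weights[i] = w
        self.last_step[i] = self.step
        return w

    def weight(self, i) -> float:
        """Current weight with all pending shrinkage applied."""
        if i not in self.weights:
            return 0.0
        return self._catch_up(i)

    def apply_gradient(self, gradients: dict) -> None:
        """One SGD step; gradients maps active feature index -> gradient."""
        self.step += 1
        for i, g in gradients.items():
            w = self._catch_up(i)
            self.weights[i] = w - self.learning_rate * g
```

Each step touches only the active features, so the per-example cost stays proportional to the number of non-zero features rather than to the size of the model.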
37. Data Targeting Prediction Marketplace
● Core marketplace problem: translate P(engage) to bid, run an internal
auction to determine (ad, position) tuples
● Complexity arises from:
○ Different bid types
○ Different bid optimizations
○ Campaign pacing
○ User experience
38. eCPA1 = 5% link click chance * $4 link click bid = $0.20
eCPA2 = 30% video view chance * $0.50 video view bid = $0.15
Second price: bid needed to win = $0.15 / 5% = $3 charge on link click
Many advertisers bidding for different events, competing for the same slots.
Auction dynamics
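The arithmetic on this slide can be reproduced with a small second-price sketch: rank bids by probability times bid, then charge the winner the lowest bid that would still have matched the runner-up. The function name, the tuple layout, and the single-bidder fallback (pay your own bid) are assumptions for illustration.

```python
def run_auction(bids):
    """Rank bids by expected value per impression; charge second price.

    bids: list of (ad_id, p_event, bid_dollars), where p_event is the
    predicted probability of the event the advertiser is bidding on.
    Returns (winner_id, price_per_event), or None when nothing can win.
    """
    bids = [b for b in bids if b[1] > 0.0]  # zero-probability ads cannot win
    if not bids:
        return None
    ranked = sorted(bids, key=lambda b: b[1] * b[2], reverse=True)
    winner_id, winner_p, winner_bid = ranked[0]
    if len(ranked) == 1:
        return winner_id, winner_bid  # assumption: no competitor, pay own bid
    runner_up_value = ranked[1][1] * ranked[1][2]
    # Lowest bid that would still have matched the runner-up's value.
    price = runner_up_value / winner_p
    return winner_id, price


winner, price = run_auction([
    ("link_click_ad", 0.05, 4.00),   # expected value $0.20
    ("video_view_ad", 0.30, 0.50),   # expected value $0.15
])
print(winner, round(price, 2))
```

Up to floating-point rounding, this gives the $3 link-click charge from the slide: the link-click ad wins and pays less than its $4 bid.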
39. Bids are not actually constant; Twitter offers bidding strategies:
○ Target bidding: guarantee daily spend / clicks <= target
○ Conversion optimization: e.g. pay per click, but optimize for app installs
○ Budget pacing: get the most clicks on a fixed budget
Billions of auctions a day, each with 10k+ ads, huge distributed system, ...
Auction complexities
43. Lifecycle of an ad request: 80 ms at p99
45. “As I look into 2017 and we look at what we can do, I just think the
superpower we really provide the world is we
can break news and get information to people
faster than any other service in the world. And in order to do that,
people have to do just a ton of work right now to dig through
everything that may not matter to them to find something that really
does. And that’s why I am excited about really making sure that we
apply artificial intelligence and machine learning
in the right ways and that we really meet that superpower of
being that little bird that told you something that you couldn’t find
anywhere else.”
-Jack Dorsey, Twitter CEO
46. Consumer Twitter product
Ranked timeline, who to follow, notifications, recommended tweets, moments, …
Cortex (deep learning science team)
Semantic tagging, image tagging, video tagging, NSFX, …
Abuse
Inappropriate content, user penalty box, safe search, ...
Identity
Deterministic & probabilistic identifier bridging
And others
Machine Learning at Twitter
Outside of ads