Nicholas Gorski: Real-time revenue science at Twitter

Real-time Revenue Science @Twitter
@nicholasgorski
ngorski@twitter.com
28 March 2017

Nick Gorski
Engineering Manager, Ads Marketplace
@nicholasgorski

Three things to know about me
Built ML stack & team
“Data quarterback”
Michigan PhD in AI/ML
Avid M sports fan
2015: TellApart -> Twitter
Ads Marketplace

Three things about Ads Marketplace
1. Team of 11 ML eng and tech DS
2. We select, place, and price ads for Twitter
3. Seattle & SF

Agenda
1. Intro to Twitter ads
2. Revenue Science production ML stack
3. Data science across the rest of Twitter

Twitter invented native ads
[Example Promoted Tweet] The Barista Bar (@baristabar), Promoted:
“Here’s where our Kona coffee beans are grown on the Big Island in Hawaii. Feast your eyes!”
Retweets: 27, Likes: 27

Why do people buy ads?
Create Awareness
Show Value
Engage Customers
Inspire Action

Users should find Twitter ads as
engaging as organic content.

So. Ads.
(We prefer “ad tech”).

Hydrating the timeline
● Maximize Twitter revenue
○ Use “rankscore”, the expected revenue for showing an ad to a user
○ Calculated per ad slot in a request
● Minimize impact on user engagement
○ Long-running holdout experiments evaluate the impact of ad load on users
○ Requires valuing user activity in dollars
● Minimize cost of computation
○ More relevant ads cost more money
○ Straightforward trade-off
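
Sketch (not from the talk): one way to read "rankscore" is as predicted engagement probability times the advertiser's bid, computed for every candidate in every ad slot. The `Candidate` type, field names, and `best_ad_per_slot` helper below are hypothetical, and the real rankscore likely includes terms this toy version omits.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ad_id: str
    p_engage: float  # assumed: predicted probability of the billable engagement
    bid: float       # assumed: advertiser bid in dollars per engagement


def rankscore(candidate: Candidate) -> float:
    """Expected revenue in dollars for showing this ad once."""
    return candidate.p_engage * candidate.bid


def best_ad_per_slot(slots):
    """Pick the highest-rankscore candidate for each slot.

    slots: dict mapping slot name -> list of Candidate.
    Returns a dict mapping slot name -> (ad_id, rankscore).
    """
    result = {}
    for slot, candidates in slots.items():
        if candidates:  # a slot with no candidates simply stays empty
            best = max(candidates, key=rankscore)
            result[slot] = (best.ad_id, rankscore(best))
    return result
```

Because the score is computed per slot, the same ad can carry a different `p_engage` in different positions of one request.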

Some numbers
XXXk Twitter ad requests served every second
YYYk active ad campaigns
80 ms to pick ads for timeline

Twitter’s Revenue
Science Production
Stack

Lifecycle of an ad request
● Serving: Receive ad request from app
● Data: Look-up data about the user
● Targeting: Identify eligible campaigns
● Prediction: Calculate probability of relevance
● Marketplace: Choose the best ads
● Serving: Send the ad to the app
● Formats: Render the ad as an impression
● Feedback: Did the user engage (e.g. click)?
● Billing: Charge advertiser for engagements

Data Targeting Prediction Marketplace
● Twitter product: Follow-graph, Tweet engagements, user behavior
● Advertisers: Website browsing behavior, in-app behavior
● Partners: Data aggregators, 3rd party data sets
And Twitter has much more data than that.

Example: serving user data
● To predict user behavior, need:
○ Audiences
■ Custom - advertiser site visitors
■ Tailored - advertiser lists of emails or Twitter handles
■ Modeled - based on interests
○ Features of interactions with
■ Twitter product
■ Ads
■ Advertiser sites
■ Advertiser applications

Serving fast data: lambda architecture
[Diagram. Sources: Twitter client, advertiser tags, advertiser apps. Components: Eventbus, HDFS, Scalding, Summingbird, Manhattan (RO), Nighthawk (RW), serving layer, Adserver. Also shown: Kafka, Redis, Cassandra, Voldemort, Hadoop, Storm.]

● Advertisers specify the types of people they want to target
○ User attributes (age, gender, geo, device, etc.)
○ User interests (topics, keywords, handles, events, TV shows, etc.)
● Many user attributes are modeled
● Campaign eligibility is binary (for now)
Reaching a relevant audience
● Age, gender, geo, etc. - logistic regression models applied offline
● Interest modeling
○ Seems straightforward, but:
■ What does it mean to target “interests”?
■ How to structure Twitter data?
■ How can we handle scale?
○ Three applications
■ Follower interest
■ Keyword interest
■ Topic interest

Follower targeting
● The most organic form of interest targeting
○ People follow handles corresponding to their interests
○ Handles perform well for advertisers
● Features: cosine similarity between handles, local topology from the social network, etc.
● LR model predicts the probability of future links being added to the social network
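
Sketch (not from the talk): the slide names cosine similarity between handles as a feature but does not say how handles are represented. The version below assumes each handle is a binary vector over the users who follow it, so the cosine reduces to follower overlap divided by the geometric mean of the follower counts. `handle_cosine` and `similar_handles` are hypothetical names.

```python
import math


def handle_cosine(followers_a: set, followers_b: set) -> float:
    """Cosine similarity of two handles viewed as binary follower vectors."""
    if not followers_a or not followers_b:
        return 0.0
    overlap = len(followers_a & followers_b)
    return overlap / math.sqrt(len(followers_a) * len(followers_b))


def similar_handles(target, followers_by_handle, top_k=2):
    """Rank the other handles by follower-overlap cosine with `target`."""
    target_followers = followers_by_handle[target]
    scores = [
        (handle, handle_cosine(target_followers, followers))
        for handle, followers in followers_by_handle.items()
        if handle != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```

In the setup the slide describes, a score like this would be one input feature to the LR link-prediction model rather than the final targeting decision.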

Keyword targeting
● Versatile & flexible
● But ambiguous & limited reach
● CBOW model embeds n-grams into a 300-dimensional latent space
● Cosine similarity -> semantic similarity
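
Sketch (not from the talk): once n-grams live in an embedding space, limited reach can be addressed by expanding an advertiser's keyword to its nearest neighbours by cosine similarity. The 3-dimensional vectors below are made-up toy values standing in for the 300-dimensional CBOW embeddings, and `expand_keyword` and its `min_similarity` threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy embeddings (assumed values, for illustration only).
EMBEDDINGS = {
    "espresso": [0.9, 0.1, 0.0],
    "latte": [0.8, 0.2, 0.1],
    "basketball": [0.0, 0.1, 0.9],
}


def expand_keyword(keyword, embeddings, min_similarity=0.8):
    """Return (other keyword, similarity) pairs above the threshold, best first."""
    query = embeddings[keyword]
    matches = [
        (other, cosine(query, vector))
        for other, vector in embeddings.items()
        if other != keyword
    ]
    matches = [pair for pair in matches if pair[1] >= min_similarity]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches
```

With these toy vectors, "espresso" expands to "latte" but not to "basketball".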

Topic interest
● Organized into taxonomy
○ XX parent topics (“music”, “sports”)
○ YYY children (“jazz”, “basketball”)
● No ambiguity, large reach
● Random forest classifier
● Survey users for labeled data
● Features:
○ Of topic
○ Of user
○ Cross of topic and user

Poll: which campaign performs best?

● Prediction: What is the probability of an engagement if we show this ad?
● Learns P(click | user, ad, t)
● Uses a state-of-the-art, in-house ML platform (Lolly)
● Full prediction is expensive
○ XXXX dense user features
○ Sparse features hashed into 2^28 space
○ YYY ad features
○ Global features
○ Crosses of user x ad features
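
Sketch (not from the talk): hashing sparse features into a fixed 2^28 space is commonly done with the "hashing trick", shown here together with user x ad crosses. The feature-name format, the use of MD5, and the `bucket` / `hash_features` helpers are assumptions for illustration; the slides do not describe Lolly's actual hashing scheme.

```python
import hashlib

HASH_BITS = 28
NUM_BUCKETS = 1 << HASH_BITS  # 2^28 buckets, as on the slide


def bucket(feature_name: str) -> int:
    """Map a feature name to a stable bucket index in [0, NUM_BUCKETS)."""
    digest = hashlib.md5(feature_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def hash_features(user_features, ad_features):
    """Build a sparse vector (bucket -> value) with user, ad, and cross features."""
    names = [f"user:{u}" for u in user_features]
    names += [f"ad:{a}" for a in ad_features]
    # Crosses let a linear model learn user x ad interactions.
    names += [f"cross:{u}^{a}" for u in user_features for a in ad_features]
    vector = {}
    for name in names:
        index = bucket(name)
        # Colliding features share a bucket, so their values add up.
        vector[index] = vector.get(index, 0.0) + 1.0
    return vector
```

The model size is bounded by the bucket count no matter how many distinct feature strings appear, at the cost of occasional collisions.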

Lolly - Twitter’s ML infrastructure
● Lolly was built to replace Vowpal Wabbit
○ Distributed online learning
○ SGD LR
● Powerful: adaptive learning rates, variety of loss functions, regularization, importance weighting, adaptive normalization, contextual bandits, warm start
● Feature representation: GBDT, transforms
● Memory management: controls model size, will always hash into RAM
● 5x faster than VW online service

Balancing revenue with compute
● P(engagement) prediction can be very good, but is also very expensive
● Balance performance and cost with tunable heuristics
[Funnel diagram. Input: ad request. Stages shown: Filtering, Light pCTR, Eligibility, Full pCTR. Output: (ad, placement) tuples. Candidate counts shown: A LOT, ~1000, ~100s, ~50.]
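
Sketch (not from the talk): the funnel can be read as a cascade in which cheap checks and a light model prune candidates so that the expensive full model only scores a short list. The stage order, the cut-off sizes, and every function name below are assumptions; the slides do not say which count belongs to which stage.

```python
def select_candidates(ads, is_eligible, passes_filters, light_score, full_score,
                      light_keep=100, final_keep=50):
    """Return up to `final_keep` (full_score, ad) pairs, best first.

    is_eligible / passes_filters: cheap boolean checks per ad.
    light_score: cheap approximate pCTR used only for pruning.
    full_score: expensive pCTR, evaluated on at most `light_keep` ads.
    """
    eligible = [ad for ad in ads if is_eligible(ad)]
    filtered = [ad for ad in eligible if passes_filters(ad)]
    # The cheap model decides which ads are worth the expensive prediction.
    shortlist = sorted(filtered, key=light_score, reverse=True)[:light_keep]
    scored = [(full_score(ad), ad) for ad in shortlist]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:final_keep]
```

`light_keep` and `final_keep` are the kind of tunable heuristics the slide mentions: raising them costs more compute and may find better ads.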

Prediction in production
[Diagram. Input: ad request. Components shown: multiple AdMixer instances; AdShard; Filtering, Light pCTR, Eligibility, Full pCTR; Prediction Service; batch; online. Output: (ad, placement) tuples.]
Challenges of online learning
● Ads ecosystem is non-stationary - ADAM
● Online regularization - L1 with lazy updates
● Importance weighting - gradient scaling
● Delayed feedback
● Explore/exploit
● Hyperparameter tuning
● Model calibration
● Distributed systems break
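
Sketch (not from the talk): two of the items above, importance weighting by gradient scaling and L1 with lazy updates, can be illustrated on a sparse online logistic regression. This toy version uses a constant learning rate rather than ADAM, and the class and method names are hypothetical. The "lazy" part is that the L1 shrinkage a weight missed while its feature was absent is applied only when the weight is next read.

```python
import math


class OnlineLogisticRegression:
    def __init__(self, learning_rate=0.1, l1=0.0):
        self.learning_rate = learning_rate
        self.l1 = l1
        self.weights = {}      # feature -> weight as of its last update
        self.last_update = {}  # feature -> step of its last update
        self.step = 0

    def current_weight(self, feature):
        """Stored weight minus the L1 shrinkage accrued since it was last touched."""
        w = self.weights.get(feature, 0.0)
        pending = self.step - self.last_update.get(feature, self.step)
        shrink = self.learning_rate * self.l1 * pending
        if w > 0:
            return max(0.0, w - shrink)
        return min(0.0, w + shrink)

    def predict(self, features):
        z = sum(self.current_weight(f) * v for f, v in features.items())
        z = max(-35.0, min(35.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label, importance=1.0):
        """One SGD step; `importance` scales the gradient of this example."""
        p = self.predict(features)
        caught_up = {f: self.current_weight(f) for f in features}
        self.step += 1
        scale = self.learning_rate * importance * (p - label)
        penalty = self.learning_rate * self.l1
        for f, v in features.items():
            w = caught_up[f] - scale * v
            # Apply this step's L1 shrinkage, clipping at zero.
            w = max(0.0, w - penalty) if w > 0 else min(0.0, w + penalty)
            self.weights[f] = w
            self.last_update[f] = self.step
        return p
```

Only the features present in an example are touched on each update, which is what keeps the per-step cost proportional to the number of active features rather than to the model size.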

● Core marketplace problem: translate P(engage) to bid, run an internal auction to determine (ad, position) tuples
● Complexity arises from:
○ Different bid types
○ Different bid optimizations
○ Campaign pacing
○ User experience

Auction dynamics
Many advertisers bidding for different events, competing for the same slots.
eCPA_1 = 5% link click chance * $4 link click bid = $0.20
eCPA_2 = 30% video view chance * $0.50 video view bid = $0.15
Second price: bid needed to win = $0.15 / 5% = $3 charge on link click
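
Sketch (not from the talk): the arithmetic on this slide can be written as a two-bidder second-price auction in which ads are ranked by expected value per impression and the winner pays, per event, the smallest bid that would still have won. The function and bidder names are hypothetical, and a production auction would also handle ties, reserve prices, and more than one slot.

```python
def expected_value(p_event: float, bid: float) -> float:
    """Expected revenue per impression: event probability times bid per event."""
    return p_event * bid


def run_second_price(bidders):
    """bidders: list of (name, p_event, bid). Returns (winner name, charge per event)."""
    if len(bidders) < 2:
        raise ValueError("need at least two bidders for a second-price auction")
    ranked = sorted(bidders, key=lambda b: expected_value(b[1], b[2]), reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    # The winner must match the runner-up's expected value, so convert that
    # value back into the winner's own event units.
    charge = expected_value(runner_up[1], runner_up[2]) / winner[1]
    return winner[0], charge


winner, charge = run_second_price([
    ("link_click_ad", 0.05, 4.00),   # 5% link click chance, $4 bid
    ("video_view_ad", 0.30, 0.50),   # 30% video view chance, $0.50 bid
])
```

With the slide's numbers the link-click ad wins and is charged about $3 per click, below its $4 bid.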

Auction complexities
Bids are not actually constant; Twitter offers bidding strategies:
○ Target bidding: guarantee daily spend / clicks <= target
○ Conversion optimization: e.g. pay per click but “optimize” for app installs
○ Budget pacing: get the most clicks on a fixed budget
Billions of auctions a day, each with 10k+ ads, huge distributed system, ...

p99 latency: 80 ms

The Rest of Twitter
Data science everywhere

“As I look into 2017 and we look at what we can do, I just think the superpower we
really provide the world is we can break news and get information to people faster
than any other service in the world. And in order to do that, people have to do just
a ton of work right now to dig through everything that may not matter to them to find
something that really does. And that’s why I am excited about really making sure that
we apply artificial intelligence and machine learning in the right ways and that we
really meet that superpower of being that little bird that told you something that
you couldn’t find anywhere else.”
-Jack Dorsey, Twitter CEO

Machine Learning at Twitter
Outside of ads
● Consumer Twitter product: Ranked timeline, who to follow, notifications, recommended tweets, moments, …
● Cortex (deep learning science team): Semantic tagging, image tagging, video tagging, NSFX, …
● Abuse: Inappropriate content, user penalty box, safe search, ...
● Identity: Deterministic & probabilistic identifier bridging
● And others

careers.twitter.com
Insanely great team
Doing hard things
Uncommon opportunity for impact
ksimmons@twitter.com
ngorski@twitter.com
