More likely to click
Logistic Regression
(Maximum entropy)
Sum of traffic properties
Less likely to click
$$\Pr(y = 1 \mid x) = \frac{1}{1 + \exp(-w^{\top} x)}$$
Softmax of binary (1/0) output
Find w that minimizes the negative log likelihood (with L2 regularization)
Control model complexity
NLL for logistic regression
$$\arg\min_{w} \sum_{i=1}^{n} \log\left(1 + \exp(-y_i w^{\top} x_i)\right) + \frac{\lambda}{2} \lVert w \rVert_2^2$$
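A minimal sketch of the objective above, using only the Python standard library. It assumes labels y_i in {-1, +1}, which is what the form log(1 + exp(-y_i wᵀx_i)) implies, and it uses plain gradient descent as a stand-in optimizer. The function names and the toy data are illustrative, not taken from the slides.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def log1p_exp(t):
    """log(1 + exp(t)) without overflow for large |t|."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def objective(w, X, y, lam):
    """Sum of log(1 + exp(-y_i w^T x_i)) plus (lam / 2) * ||w||_2^2."""
    nll = sum(log1p_exp(-yi * dot(w, xi)) for xi, yi in zip(X, y))
    return nll + 0.5 * lam * dot(w, w)

def gradient(w, X, y, lam):
    grad = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        # d/dw log(1 + exp(-y w^T x)) = -y * sigmoid(-y w^T x) * x
        coeff = -yi * sigmoid(-yi * dot(w, xi))
        for j, xj in enumerate(xi):
            grad[j] += coeff * xj
    return grad

def fit(X, y, lam=0.1, lr=0.1, steps=500):
    """Plain gradient descent; adequate for a toy example only."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = gradient(w, X, y, lam)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w

# Toy data: first column is a constant bias feature.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -2.0]]
y = [1, -1, 1, -1]
w = fit(X, y)
print("Pr(y = 1 | x) for x = [1, 2]:", sigmoid(dot(w, [1.0, 2.0])))
```

A larger λ shrinks w toward zero, which is the "control model complexity" effect noted on the slide.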
Registration #
Activity/Service Log
Gender, Age Far far ago
Naive Bayes (GA)
Ad Feedback (Click)
Mapping & Counting (Interest)
Clustering (k-means)
Topic Modeling (LDA)
FM & DNN
Subscription (Channel)
Feature Embedding with Dimensionality Reduction
• Reliability / Speed / Scalability
• Robustness (+) vs Information loss (-)
• Abstraction (anonymity) vs Less interpretability (-)
Lessons learned
• 30 ~ 50 topics enough
• Multiple sources in one embedding? Does not work properly
• How to retain previous dimension structure (topic semantics)
- Syntactic hashing (short term) and re-training (long term)
Prediction Layer
Embedding Layer 2
Softmax = Logistic Regression
Deep Aggregate Embedding
(Dimensionality reduction / projection)
Embedding for each feature
(Raw data to numerical vectors)
Embedding Layer 1
𝛔
Prediction
Pr(Y = 1|X)
Deep & Cross Embedding
Primitive Embedding
Demography
AD response
Subscription
AD
Pooling & Concat.
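The layering on this slide (primitive embedding per source, pooling and concatenation, then a sigmoid prediction) can be sketched as below. This is a hedged illustration only: the embedding width, the vocabularies, the mean-pooling choice, and the untrained random tables are all assumptions, and the "Deep & Cross" layer is omitted, so the prediction layer here is logistic regression applied directly to the concatenated vector.

```python
import math
import random

random.seed(0)
DIM = 4  # hypothetical embedding width

def make_table(vocab):
    """Primitive embedding: one random (untrained) vector per raw value."""
    return {v: [random.gauss(0.0, 0.1) for _ in range(DIM)] for v in vocab}

# Hypothetical vocabularies for the four sources named on the slide.
TABLES = {
    "demography": make_table(["male", "female", "20s", "30s"]),
    "ad_response": make_table(["clicked_sports", "clicked_travel"]),
    "subscription": make_table(["channel_a", "channel_b"]),
    "ad": make_table(["ad_1", "ad_2"]),
}

def mean_pool(vectors):
    """Average a variable-length list of vectors into one DIM-wide vector."""
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def embed(example):
    """Look up, pool per source, then concatenate in a fixed source order."""
    out = []
    for field in TABLES:
        vecs = [TABLES[field][v] for v in example.get(field, [])]
        out.extend(mean_pool(vecs))
    return out

def predict(example, weights, bias=0.0):
    """Prediction layer: sigmoid over a linear map of the concatenation."""
    z = bias + sum(w * h for w, h in zip(weights, embed(example)))
    return 1.0 / (1.0 + math.exp(-z))

example = {
    "demography": ["male", "20s"],
    "ad_response": ["clicked_sports"],
    "subscription": ["channel_a", "channel_b"],
    "ad": ["ad_1"],
}
weights = [0.5] * (DIM * len(TABLES))
print("Pr(Y = 1 | X) =", predict(example, weights))
```

Pooling makes each source a fixed width regardless of how many raw values a user has, which is what lets the sources be concatenated.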
Research / Academia: Maximize Accuracy
Production / Industry: Maximize f(I, S, E, …) subject to Accuracy > X
Reliability & Robustness
- Scale up & out
- Slim model
- Simple architecture
- Few #hidden layers & nodes
- Limited features -> incremental model
- Starport (C++) (vs deployment time)
- Candidate generation
- Hybrid (Off-Heavy + On-Light)
Training Time
-> Model update delay
-> Lack of recency
Inference Time
-> Time-out (No Ad)
Daily traffic: 1,000,000
Avg(eCPM): 2,000
Conversion/Traffic: 0.01%
Daily budget: 1,000,000
Avg(pCTR): 1%
BAcpc: 100? 200? 500?
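The bid question above can be worked through numerically. The sketch assumes the usual conversion eCPM = 1000 × BA_cpc × pCTR, which the slide does not state explicitly, and it assumes every click is charged the full bid. The helper names are illustrative.

```python
def ecpm(bid_cpc, pctr):
    """Expected revenue per 1,000 impressions for a CPC bid (assumed formula)."""
    return 1000.0 * bid_cpc * pctr

def clicks_affordable(daily_budget, bid_cpc):
    """Upper bound on clicks if every click is charged the full bid."""
    return daily_budget / bid_cpc

PCTR = 0.01               # Avg(pCTR) from the slide
DAILY_BUDGET = 1_000_000  # Daily budget from the slide

for bid in (100, 200, 500):
    e = ecpm(bid, PCTR)
    clicks = clicks_affordable(DAILY_BUDGET, bid)
    impressions = clicks / PCTR
    print(f"BAcpc={bid}: eCPM={e:.0f}, clicks<={clicks:.0f}, "
          f"impressions needed~{impressions:.0f}")
```

Under these assumptions, a bid of 200 yields an eCPM equal to the average of 2,000 quoted above, a bid of 100 falls below it, and a bid of 500 exceeds it while exhausting the budget after fewer clicks.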
Ryan LLC
RUN with RYAN
Gift for YOU
Buy one get one free
Shop Now
It’s Travel Time
Refresh yourself. Booking
Congratulations!
Happy birthday~~ Purchase
Male or young
Outdoor activity
Rider
Potential customers
Inventory buying / Audience buying
Static Info.
• Gender, age, region
• Interest
Context
• Placement (inventory)
• Current time & location
• Device / OS
• Wi-Fi / Cellular
Custom
• Upload customers
• Inclusive / Exclusive
Dynamic (behavior) Info.
• Site visit
• Product (Page) view
• Keyword query
• Category
• Cohort
LookALike
Effective & Coverage
It’s Travel Time
Refresh yourself. Booking
90% traffic pCTR
pCTR’ = pCTR + α
Random bucket MAB
(Multi-armed bandit)
Thompson sampling
Posterior
Observed
10% traffic
make unstable to make stable
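Thompson sampling, named above, can be sketched with a Beta posterior over each ad's CTR. This is an illustrative Beta-Bernoulli version only: the Beta(1, 1) prior, the ad names, and the simulated true CTRs are assumptions, not values from the slides.

```python
import random

class BetaArm:
    """Beta posterior over one ad's CTR, starting from a Beta(1, 1) prior."""

    def __init__(self):
        self.alpha = 1.0  # 1 + observed clicks
        self.beta = 1.0   # 1 + observed non-clicks

    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)

    def update(self, clicked):
        if clicked:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

def choose(arms, rng):
    """Serve the ad whose sampled CTR is highest."""
    return max(arms, key=lambda name: arms[name].sample(rng))

def simulate(true_ctr, rounds, seed=0):
    """Run the bandit against hypothetical true CTRs and count serves."""
    rng = random.Random(seed)
    arms = {name: BetaArm() for name in true_ctr}
    served = {name: 0 for name in true_ctr}
    for _ in range(rounds):
        name = choose(arms, rng)
        served[name] += 1
        arms[name].update(rng.random() < true_ctr[name])
    return arms, served

arms, served = simulate({"new_ad": 0.20, "old_ad": 0.05}, rounds=3000, seed=42)
print(served, {name: round(arm.mean(), 3) for name, arm in arms.items()})
```

Early on the posteriors are wide, so both ads get traffic (the "unstable" phase); as observations accumulate the posterior narrows and serving concentrates on the better ad.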
Cold-start and Exploration
- Random bucket
- Thompson sampling
- Stochastic feature augmentation (drop-out)
- Transfer learning (with hierarchy)
- Model initialization
- Semantic embedding (learning to hash)
- Jitter (tie-breaking)
Explore to get more training data
Proximity
Negative Feedback
• Hide (Do Not Show Ads)
• AdBlock
• DNT (Do Not Track) / LMT (Limit Ad Tracking)
• ITP / ATT
• NDNC (No Response)
• Abusing / Fraud
Auction with Reserve Price
Auction with Hard Bid Floor
Auction with Soft Bid Floor
[Figure: three auction panels. Outcome labels: No Bid, No Ad, Win (2nd price), Win & 1st price]
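The hard and soft bid floor outcomes can be sketched as follows. The figure's exact rules are not recoverable from the extracted text, so the payment rules here are an assumption based on a common convention: with a hard floor, bids below the floor are rejected and the winner pays the larger of the second price and the floor; with a soft floor, a top bid below the floor still wins but pays its own bid (first price).

```python
def hard_floor_auction(bids, floor):
    """Return (winning_bid, price), or None when no bid clears the floor (No Ad)."""
    eligible = sorted((b for b in bids if b >= floor), reverse=True)
    if not eligible:
        return None
    second = eligible[1] if len(eligible) > 1 else floor
    return eligible[0], max(second, floor)

def soft_floor_auction(bids, soft_floor):
    """Return (winning_bid, price, rule), or None when there are no bids."""
    if not bids:
        return None
    ranked = sorted(bids, reverse=True)
    top = ranked[0]
    if top < soft_floor:
        # Below the soft floor: still wins, but pays its own bid.
        return top, top, "1st price"
    second = ranked[1] if len(ranked) > 1 else soft_floor
    # At or above the soft floor: second price, never less than the floor.
    return top, max(second, soft_floor), "2nd price"

print(hard_floor_auction([120, 90, 60], floor=100))
print(hard_floor_auction([80, 60], floor=100))
print(soft_floor_auction([120, 90, 60], soft_floor=100))
print(soft_floor_auction([80, 60], soft_floor=100))
```

The practical difference is in the last two cases: a hard floor turns low-bid requests into unfilled inventory, while a soft floor fills them at the bidder's own price.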
Data Overload & Imbalance
Millions of clicks over billions of impressions
Negative downsampling ($\omega$): $q = \dfrac{p}{p + \dfrac{1 - p}{\omega}}$
Clicked
Not clicked
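A short sketch of how the recalibration formula above is typically used. It assumes ω is the fraction of non-clicked impressions kept for training, p is the prediction of a model trained on the downsampled data, and q is the corrected probability. The helper names and the example rates are illustrative.

```python
import random

def downsample_negatives(rows, omega, rng):
    """Keep every clicked row; keep each non-clicked row with probability omega."""
    return [(x, y) for x, y in rows if y == 1 or rng.random() < omega]

def recalibrate(p, omega):
    """Map a prediction from the downsampled space back: q = p / (p + (1 - p) / omega)."""
    return p / (p + (1.0 - p) / omega)

def downsampled_rate(ctr, omega):
    """Click rate expected in the training data after downsampling negatives."""
    return ctr / (ctr + (1.0 - ctr) * omega)

true_ctr, omega = 0.01, 0.1
p = downsampled_rate(true_ctr, omega)  # what a well-fit model would predict
q = recalibrate(p, omega)              # corrected back to the serving scale
print(f"p (downsampled space) = {p:.4f}, q (recalibrated) = {q:.4f}")
```

Without this correction the model overpredicts CTR, which matters whenever the prediction is multiplied by a bid rather than used only for ordering.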
Research -> Offline Test -> Online Test -> Production
• Model validity
• Log-loss, RIG
• Simulation
• Validity & revenue
• CTR, calibration
• 0 Bucket
Problem & ideation Complexity & Stability
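The offline and online metrics listed above (log-loss, RIG, calibration) can be computed as in the sketch below. RIG is taken here as 1 minus the ratio of the model's log-loss to the log-loss of always predicting the average CTR, and calibration as predicted clicks over observed clicks. These are common definitions, assumed rather than confirmed by the slides.

```python
import math

def log_loss(labels, preds, eps=1e-12):
    """Mean negative log likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def relative_information_gain(labels, preds):
    """1 - logloss(model) / logloss(constant average-CTR predictor)."""
    base_ctr = sum(labels) / len(labels)
    baseline = log_loss(labels, [base_ctr] * len(labels))
    return 1.0 - log_loss(labels, preds) / baseline

def calibration(labels, preds):
    """Sum of predicted click probabilities divided by observed clicks."""
    return sum(preds) / sum(labels)

labels = [1, 0, 0, 0]
preds = [0.7, 0.1, 0.1, 0.1]
print("log-loss:", log_loss(labels, preds))
print("RIG:", relative_information_gain(labels, preds))
print("calibration:", calibration(labels, preds))
```

A RIG of 0 means the model is no better than predicting the average CTR for every impression, and a calibration of 1.0 means predicted and observed click totals agree.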
Random
A’
B
C
D
A
• 5 ~ 10%
• Exploration (i.e., cold-start), serving-unbiased, reference (worst case)
• Main bucket (control group)
• Current serving version
• Identical model to main bucket
• To check the effect of serving bias
• Do not reject null hypothesis (A = A’)
• Test bucket (treatment group)
• 10% (up to 50%, except random bucket)
• Hours to weeks
• Buckets are randomly assigned to users or traffic.
• User-based buckets are periodically re-assigned.
• B’?
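One way to assign users to the buckets described above is to hash the user ID together with a salt, and to change the salt when user-based buckets are re-assigned. The sketch below is an assumption about mechanics, not the deck's implementation: the bucket shares, the salt strings, and the user ID format are hypothetical.

```python
import hashlib
from collections import Counter

# Hypothetical shares in percent; they must sum to 100.
BUCKETS = [
    ("random", 10),   # exploration, serving-unbiased reference
    ("A", 50),        # main bucket (control)
    ("A_prime", 10),  # identical model to A, checks serving bias
    ("B", 10),        # test buckets (treatment)
    ("C", 10),
    ("D", 10),
]

def assign_bucket(user_id, salt):
    """Deterministically map a user to a bucket for a given salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    slot = int(digest, 16) % 100
    upper = 0
    for name, share in BUCKETS:
        upper += share
        if slot < upper:
            return name
    raise ValueError("bucket shares must sum to 100")

users = [f"user-{i}" for i in range(20000)]
counts = Counter(assign_bucket(u, salt="2024-w01") for u in users)
print(counts)

# Changing the salt re-assigns users while keeping the shares the same.
moved = sum(assign_bucket(u, "2024-w01") != assign_bucket(u, "2024-w02")
            for u in users)
print("users who changed bucket after re-salting:", moved)
```

Because A and A_prime serve the same model, any metric gap between them estimates noise and serving bias, which sets the bar a test bucket must clear.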
Rank by Group/Adv
Rank by Creative
BA * pCTR | Targeting(1/0)
Group Creative
BA * pCTR(G)
MAB or Generate
CTR, RPM (5~10%p lift)
Calibration -> bucket size
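The two-stage ranking above can be sketched as: filter ad groups by targeting (1/0), rank the remaining groups by BA * pCTR(G), then pick a creative inside the winning group. In this sketch the creative step simply takes the highest estimated CTR as a stand-in for the MAB step. The class, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AdGroup:
    name: str
    bid: float         # BA: bid amount
    pctr_group: float  # pCTR(G): predicted CTR at the group level
    targets: set = field(default_factory=set)      # segments the group targets
    creatives: dict = field(default_factory=dict)  # creative -> estimated CTR

def rank_groups(groups, user_segments):
    """Keep groups whose targeting matches the user, sort by BA * pCTR(G)."""
    eligible = [g for g in groups if g.targets & user_segments]
    return sorted(eligible, key=lambda g: g.bid * g.pctr_group, reverse=True)

def pick_creative(group):
    """Stand-in for the MAB step: take the creative with the best estimate."""
    return max(group.creatives, key=group.creatives.get)

def select_ad(groups, user_segments):
    ranked = rank_groups(groups, user_segments)
    if not ranked:
        return None  # nothing eligible for this user
    top = ranked[0]
    return top.name, pick_creative(top)

groups = [
    AdGroup("running_shoes", 200, 0.010, {"outdoor", "rider"},
            {"run_with_ryan": 0.012, "gift_for_you": 0.008}),
    AdGroup("travel", 500, 0.003, {"outdoor"},
            {"travel_time": 0.004}),
    AdGroup("birthday", 900, 0.020, {"birthday"},
            {"congratulations": 0.020}),
]
print(select_ad(groups, user_segments={"outdoor"}))
```

Ranking at the group level first keeps the number of pCTR evaluations small, since creatives are only compared inside the winning group.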
Ad Automation
• User Response Prediction
• Auto-Targeting (Performance)
• AutoBid
• Creative Generation (DCO/Gen)
• Set Objectives
• Budget Setting
• (Agent?)
• Go or Stop
• Nothing to do