Recent advances in computational advertising:
design and analysis of ad retrieval systems

Evgeniy Gabrilovich
gabr@yahoo-inc.com

1

What is “Computational Advertising”?

• A new scientific sub-discipline that provides the
foundation for building online ad retrieval platforms
– To wit: given a certain user in a certain context,
find the most suitable ad

• At the intersection of
– Large scale text analysis
– Information retrieval
– Statistical modeling and machine learning
– Optimization
– Microeconomics

2
© Yahoo! Research 2010 Technologies described might or might not be in actual use at Yahoo!

Computational Advertising at
Yahoo! Research

3

Online advertising spending

4

Textual advertising

1. Ads driven by search keywords –
Sponsored Search (a.k.a. “keyword driven
ads”, “paid search”, etc.)
2. Ads directly driven by the content of a web
page – Content Match (a.k.a. “context
driven ads”, “contextual ads”, etc.)

Textual advertising on the Web is strongly related
to NLP and information retrieval
5

Sponsored search
Text-based ads driven by a keyword search

6

Content match ads
Text-based ads driven by the page content

[Figure callout: content match ads]

7

Anatomy of an ad

Bid phrases: {SIGIR 2010,
computational advertising,
Evgeniy Gabrilovich, ...}
Bid: $0.10
Title
Creative
Display URL

Landing URL:
http://research.yahoo.com/tutorials/sigir10_compadv

Landing page

8

So when do advertising dollars
actually change hands?

– CPM = cost per thousand impressions
• Typically used for graphical/banner ads
(brand advertising)
– CPC = cost per click
• Typically used for textual ads
– CPT/CPA = cost per transaction/action
a.k.a. referral fees or affiliate fees
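
The three pricing models differ only in which event triggers a charge. A minimal sketch of the arithmetic follows; the function names and the campaign numbers are illustrative assumptions, not figures from the talk.

```python
def cpm_cost(impressions: int, cpm: float) -> float:
    """CPM: the advertiser pays a fixed price per thousand impressions."""
    return impressions / 1000 * cpm


def cpc_cost(clicks: int, cpc: float) -> float:
    """CPC: the advertiser pays only when the ad is clicked."""
    return clicks * cpc


def cpa_cost(actions: int, cpa: float) -> float:
    """CPT/CPA: the advertiser pays only for completed transactions/actions."""
    return actions * cpa


# Hypothetical campaign: 100,000 impressions, 500 clicks, 10 conversions.
impressions, clicks, conversions = 100_000, 500, 10
print("CPM at $2.00:", cpm_cost(impressions, 2.00))
print("CPC at $0.10:", cpc_cost(clicks, 0.10))
print("CPA at $5.00:", cpa_cost(conversions, 5.00))
```

The same traffic can therefore cost very different amounts depending on which event is billed.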

9

Beyond keyword matching

• Matching ads is relatively simple for explicitly bid keywords
What about queries on which there are no bids?
– Advertisers should be able to bid on “broad queries” and/or
“concept queries”
– Advertisers need volume – the total amount of searches on bid
phrases is not enough!

• Suppose your ad is “Good prices on Seattle hotels”
• Naïve approach: bid on any query that contains the word Seattle
• Problems
• “Seattle's Best Coffee Chicago”
• “Alaska cruises start point”
• Ideally: bid on any query related to Seattle as a travel destination
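
The two problem queries fail in opposite directions, which a few lines of code make concrete. This is only a sketch of the naïve approach; `naive_match` and its tokenizer are illustrative assumptions.

```python
import re


def naive_match(query: str, keyword: str = "seattle") -> bool:
    """Naive approach: match any query that contains the keyword as a word."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return keyword in tokens


for query in ["Seattle's Best Coffee Chicago", "Alaska cruises start point"]:
    print(f"{query!r}: matched={naive_match(query)}")
```

The first query matches although it is about a coffee chain in Chicago (a false positive). The second does not match because it never mentions the word Seattle, even though the slide lists it as a missed travel query (a false negative).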
10

The old school:
heuristic ad matching

• Sponsored search
– Exact match between the query and the bid phrase
of the ad (modulo simple normalization, e.g.,
stemming)
– Advertisers cannot possibly bid on all relevant
queries (especially rare ones)
• Use advanced match (e.g., through query-to-query rewrites)
• Content match
– Extract bid phrases from pages, thus reducing the
problem to exact match
→ Both essentially perform record lookup

11
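
The “record lookup” view of heuristic ad matching on the preceding slide can be sketched as a dictionary keyed by normalized bid phrases. The normalization below (lowercasing plus a crude plural-stripping rule) is an assumption standing in for a real stemmer, and all names are hypothetical.

```python
import re
from collections import defaultdict


def normalize(phrase: str) -> str:
    """Lowercase, tokenize, and strip a trailing 's' (a crude stand-in for stemming)."""
    tokens = re.findall(r"[a-z0-9]+", phrase.lower())
    stemmed = [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]
    return " ".join(stemmed)


# Normalized bid phrase -> list of ad ids bidding on it.
bid_index = defaultdict(list)


def add_ad(ad_id: str, bid_phrases: list) -> None:
    for phrase in bid_phrases:
        bid_index[normalize(phrase)].append(ad_id)


def exact_match(query: str) -> list:
    """Return ads whose bid phrase equals the query after normalization."""
    return bid_index.get(normalize(query), [])


add_ad("ad-1", ["Seattle hotels"])
print(exact_match("seattle hotel"))         # same phrase modulo normalization
print(exact_match("cheap seattle hotels"))  # no bid on this exact phrase
```

A query that differs from every bid phrase by even one word retrieves nothing, which is why the slide turns to query rewrites.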

The old school (cont’d)
[Diagram] Query (“Abbey Road lyrics”) → Front end → Query rewriting module → Query rewrites → Exact match → Candidate ads → Revenue reordering → Ad slate
Annotations:
• Query rewriting module: simplistic query expansion
• Exact match: ignoring (or underusing) the multitude of information available

12

The new approach:
knowledge-based ad retrieval

• Ad indexing and scoring based on all the information
available (bid terms, title, creative, URL, landing page, ...)
– Similar to document indexing in IR
• Use standard IR tools (text preprocessing – tokenization, stemming,
entity extraction; inverted indexes etc.)
– Use multiple features of the query and the ad

• Elaborate query expansion

• 2nd pass relevance reordering (re-ranking)
– Using features not available to the 1st pass model (e.g., set-level
features, click history)
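
Indexing every field of an ad, as one would index a document in IR, can be sketched with a small inverted index. The TF-IDF-style scoring, the `AdIndex` class, and the sample ads are illustrative assumptions, not the system described in the talk.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


class AdIndex:
    """Toy inverted index over all textual fields of an ad."""

    def __init__(self) -> None:
        self.postings = defaultdict(dict)  # term -> {ad_id: term frequency}
        self.num_ads = 0

    def add(self, ad_id: str, fields: dict) -> None:
        counts = Counter()
        for text in fields.values():  # bid terms, title, creative, landing page, ...
            counts.update(tokenize(text))
        for term, tf in counts.items():
            self.postings[term][ad_id] = tf
        self.num_ads += 1

    def search(self, query: str, k: int = 10) -> list:
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + self.num_ads / len(docs))
            for ad_id, tf in docs.items():
                scores[ad_id] += (1 + math.log(tf)) * idf
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


index = AdIndex()
index.add("a1", {"bid": "miele", "title": "Brand name appliances",
                 "landing_page": "dishwasher repair and parts"})
index.add("a2", {"bid": "garden tools", "title": "New Year deals",
                 "landing_page": "lawn mowers and trimmers"})
print(index.search("dishwasher repair"))
```

Here ad `a1` is retrieved through its landing page text even though its bid phrase does not match the query, which is the point of indexing all available information.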

13

The new approach (cont’d)
[Diagram] Query (“Miele”) → Front end → Rich query generation → Ad query → Ad search engine (first pass retrieval) → Candidate ads → Relevance reordering → Revenue reordering → Ad slate
Ad query: <Miele, appliances, kitchen,
“appliances repair”, “appliance parts”,
Business/Shopping/Home/Appliances>
Annotation: The hidden parts of ads (bid phrases + landing pages)
allow us to augment the ads (cf. query expansion)

14

Research questions

• How to select relevant ads?
• How to index the ad corpus?
• Should we show ads at all?
• Can we generate bid phrases (or even entire ad campaigns) automatically?
• What is the interplay between the organic and sponsored results?
• Should we use the landing page for indexing?
• Can we optimally choose the landing page?

15

How to
select
relevant
ads?

Feature generation for
improved ad retrieval
(SIGIR 2007, w. Broder et al.;
ACM TWEB 2009, Gabrilovich et al.)

16

Query classification using
Web search results

• Humans often find it hard to readily see what the
query is about …
– But they can easily make sense of it once they look at
the search results…
• Let computers do the same thing
– Infer the query intent from the top algorithmic search
results (“pseudo relevance feedback”)
• Classify search results (either summaries or full pages)
• Let these results “vote” to determine the query class(es) in a
large taxonomy of commercial topics
• Our goal: Construct additional features to retrieve better ads
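
The voting step described above can be sketched as follows. The sketch assumes each search result has already been classified into (class, confidence) pairs; the function name, the class labels, and the confidence values are hypothetical.

```python
from collections import Counter


def classify_query_by_voting(result_classes: list, top_n: int = 1) -> list:
    """Aggregate per-result class predictions into query classes.

    result_classes holds one list per search result, each containing
    (class_label, confidence) pairs produced by a page classifier.
    """
    votes = Counter()
    for classes in result_classes:
        for label, confidence in classes:
            votes[label] += confidence  # each result "votes" with its confidence
    return [label for label, _ in votes.most_common(top_n)]


# Hypothetical classifications of the top three results for an opaque query.
results = [
    [("Computer Modems", 0.9), ("Networking", 0.4)],
    [("Computer Modems", 0.8)],
    [("Networking", 0.3), ("Computer Modems", 0.6)],
]
print(classify_query_by_voting(results))
```

The query text itself is never classified; only the pages it retrieves are.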

17

Example: ex560lku

CATEGORIES
1. Computing/Computer/Hardware/Computer/Peripherals/Computer Modems

18

If we know it is about actiontec usb modem
then we have plenty of ads …

19

Our approach

Traditional approach:
Query → Classifier (insufficient data ☹)

Our approach:
Query → Search engine → Search results → Classifier
• Very large scale
• Pre-classify all pages just once!
• Using Web as external knowledge
20

Research questions

• Number of search results to obtain
• Snippets or full pages?
• Number of classes per search result
• Aggregation: bundling or voting?

21

The effect of using Web search results

22

Beyond the bag of
words: matching
textual ads in the
enriched feature space
(SIGIR 2007, Broder et al.;
CIKM 2008, w. Broder et al.)

23

What can we do about non-English queries?
(iNEWS @ CIKM 2008, w. Wang et al.;
WSDM 2009, w. Wang et al.)

• Developing a taxonomy and building a query
classifier for every language is prohibitively
expensive
• Solution: apply off-the-shelf MT to the
search results in the source language

[Diagram] Machine Translation
Very short text ☹
Sufficiently long text ☺

24

The effect of query expansion
prior to applying MT

The gap for infrequent queries is wider

Baseline = translate the query (using MT),
then classify the result as an English query

[Figure x-axis: query frequency, from (Head) more frequent to (Tail) less frequent]
25

How to
index the
ad corpus?

The Anatomy of an ad:
Structured indexing and retrieval
for sponsored search
(WWW 2010, w. Bendersky et al.)

26

Structure of online ad campaigns: the
ad schema
[Diagram: hierarchy of the ad schema]
Advertiser
→ Account 1, Account 2, …
→ Campaign 1, Campaign 2, …
→ Ad group 1, Ad group 2, …
→ Ad = Creatives + Bid phrases

Example labels shown in the diagram: “Buy appliances on Black Friday”,
“New Year deals on lawn & garden tools”, “Kitchen appliances”

Creative example:
Brand name appliances
Compare prices and save money
www.appliances-r-us.com

Bid phrases example: {Miele, KitchenAid, Cuisinart, …}
Can be just a single bid phrase, or thousands of bid phrases
(which are not necessarily topically coherent)
27

Implications of the campaign
structure

• What is the appropriate indexing unit?
– Cartesian product of creatives and bid phrases? Ad group?

• Leveraging information from higher levels to address data sparsity
at children nodes

• What is the right approach to document length normalization?
– Large variability of document lengths
– Probability of shorter documents (smaller ad groups) to be retrieved is
higher than their probability of being relevant

• How to index and score templated ads?

• Prior work mostly considered ads as independent atomic units and
ignored hierarchical campaign structure

28

Possible approaches

1. Term index (Cartesian product of all creatives and bid terms)
• Huge index, small focused documents

2. Creative index (a creative is coupled with all the bid terms in
the ad group)
• Two-stage retrieval (first choose the creative, then pick the term)
• Bid terms are duplicated across creatives

3. Ad group index
• Indexing units are entire ad groups
• Three-stage retrieval (first choose
the ad group, then the creative,
and finally pick the term)
• Most compact index
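
The difference in index size between the three options can be seen by generating the indexing units for one ad group. This is a sketch under assumed names (`AdGroup`, the three `*_units` functions) with made-up sample data; it only illustrates how the units are formed.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class AdGroup:
    group_id: str
    creatives: list
    bid_terms: list


def term_index_units(groups: list) -> list:
    """Option 1: one small document per (creative, bid term) pair."""
    return [f"{c} {t}" for g in groups for c, t in product(g.creatives, g.bid_terms)]


def creative_index_units(groups: list) -> list:
    """Option 2: one document per creative, coupled with all bid terms of its group."""
    return [f"{c} {' '.join(g.bid_terms)}" for g in groups for c in g.creatives]


def ad_group_index_units(groups: list) -> list:
    """Option 3: one document per ad group."""
    return [f"{' '.join(g.creatives)} {' '.join(g.bid_terms)}" for g in groups]


groups = [AdGroup("g1",
                  creatives=["Brand name appliances", "Compare prices and save money"],
                  bid_terms=["Miele", "KitchenAid", "Cuisinart"])]
print(len(term_index_units(groups)),
      len(creative_index_units(groups)),
      len(ad_group_index_units(groups)))
```

With 2 creatives and 3 bid terms the term index holds 6 units, the creative index 2, and the ad group index 1; for ad groups with thousands of bid terms the gap grows accordingly.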

29

Retrieval speed vs. relevance

Term index yields most relevant
ads, yet is least efficient (20x slower
than the ad group index)
Are we trading
effectiveness
for efficiency?

Ad group index is most efficient
(2x faster than creative index), yet
least effective

30

Using learning to rank techniques:
structured re-ranking
• Step 1: Retrieve an initial set of candidates using the ad group index

• Step 2: Re-rank the candidate set using structural features (instead of
ignoring the structure and scoring creatives and terms independently)
– Ad group score, creative-term pair score
– # bid terms in the ad group
– Unigram entropy (cohesiveness)
of the ad group
– Ratio of query words covered
by the ad group text
– Fraction of the titles / terms /
URLs that contain at least
one query term
– Other features are possible!

feature functions
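
Some of the structural features listed on this slide are simple to compute. The sketch below shows three of them under assumed definitions; the talk does not give exact formulas, so the tokenizer, the base-2 entropy, and the function names are illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_entropy(texts: list) -> float:
    """Entropy of the unigram distribution over the ad group text (lower = more cohesive)."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def query_coverage(query: str, texts: list) -> float:
    """Ratio of distinct query words that appear anywhere in the ad group text."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    vocabulary = {tok for text in texts for tok in tokenize(text)}
    return len(query_terms & vocabulary) / len(query_terms)


def fraction_containing_query_term(query: str, items: list) -> float:
    """Fraction of items (titles, bid terms, or URLs) with at least one query term."""
    query_terms = set(tokenize(query))
    if not items:
        return 0.0
    hits = sum(1 for item in items if query_terms & set(tokenize(item)))
    return hits / len(items)


bid_terms = ["Miele", "KitchenAid", "Miele dishwasher"]
print(unigram_entropy(bid_terms))
print(query_coverage("miele oven", bid_terms))
print(fraction_containing_query_term("miele oven", bid_terms))
```

Features like these would be fed, together with the ad group and creative-term scores, to a learned re-ranking model.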
31

Re-ranking retrieval performance

nDCG@5                 Len 1           Len 2-3         Len 4+
                       (143 queries)   (443 queries)   (187 queries)
Term index             0.841           0.716           0.656
Structured re-ranking  0.849           0.731           0.686
                       (+0.95%)        (+2.1%)         (+4.6%)

• Structured re-ranking is superior
for all query lengths
• Most notable improvements are
obtained for longer queries
• Still very efficient!
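
For reference, nDCG@5 can be computed as in the sketch below. The slides do not state which gain and discount variant was used; the exponential gain with a log2 discount here is a common formulation and an assumption on our part.

```python
import math


def dcg_at_k(relevances: list, k: int = 5) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: list, k: int = 5) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments for the top five ads returned for one query.
print(ndcg_at_k([2, 3, 0, 1, 0]))
```

A perfectly ordered list scores 1.0, and any misordering of relevant ads lowers the score.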
32

To swing or not to swing: learning when (not)
to advertise (CIKM 2008, w. Broder et al.)

Should we show ads at all?

• Repeatedly showing non-relevant ads can have
detrimental long-term effects

• Want to be able to predict
when (not) to show individual
ads or a set of ads (“swing”)

• Modeling actual short- and
long-term costs of showing
non-relevant ads is very
difficult

33

Thresholding approach

• Decision made on individual ads based on
ad scores
– Set a global score threshold
– Only retrieve ads with scores above it
– If none of the ad scores are above the
threshold, then no ads are shown (“no swing”)

• Scores are not necessarily comparable
across queries!
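
The thresholding approach amounts to a single filter over scored ads. A minimal sketch, with an assumed function name and made-up scores:

```python
def select_ads(scored_ads: list, threshold: float) -> list:
    """Keep only ads scoring above a global threshold; an empty list means "no swing"."""
    return [(ad_id, score) for ad_id, score in scored_ads if score > threshold]


print(select_ads([("ad-1", 0.9), ("ad-2", 0.2)], threshold=0.5))
print(select_ads([("ad-2", 0.2)], threshold=0.5))
```

Because one threshold is shared by all queries, it implicitly assumes scores are comparable across queries, which is the weakness the slide points out.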

34

Machine learning approach

• Decision made on sets of ads based on a
variety of features
– Learn a binary prediction model (“swing” /
“no swing”) for sets of ads
– If we swing, then all ads are retrieved
– If we do not swing, then no ads are retrieved
• Features defined over sets of ads, rather
than individual ads

35

Features

• Relevance features
– Word overlap, cosine similarity between ad and query/page
• Vocabulary mismatch features
– Translation models
– PMI between query/page terms and bid terms
• Ad-based features
– Bid price (higher bids may indicate better ads)
• Result set cohesiveness features
– Coefficient of variation of ad scores (std/mean)
– Result set clarity
• If the set of ads is very cohesive and focused on 1-2 topics, the
relevance language model is very different from the collection
model
– Entropy
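
Two of the result set cohesiveness features can be sketched directly. The coefficient of variation follows the std/mean definition on the slide; for clarity we assume the usual KL divergence between the language model of the retrieved ads and the collection model, since the slide does not spell out the formula. The function names and sample numbers are illustrative.

```python
import math
import statistics


def coefficient_of_variation(scores: list) -> float:
    """std / mean of the ad scores in the result set."""
    mean = statistics.fmean(scores)
    return statistics.pstdev(scores) / mean if mean else 0.0


def clarity(result_lm: dict, collection_lm: dict) -> float:
    """KL divergence (in bits) of the result-set language model from the collection model.

    Assumes every word with non-zero probability in result_lm also has
    non-zero probability in collection_lm.
    """
    return sum(p * math.log2(p / collection_lm[word])
               for word, p in result_lm.items() if p > 0)


print(coefficient_of_variation([2.0, 4.0]))
print(clarity({"radio": 1.0}, {"radio": 0.5, "hotel": 0.5}))
```

A focused set of ads concentrates probability on a few words and yields a high clarity value, while a set that resembles the whole collection scores near zero.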

36

What happens after an ad click?
Quantifying the impact of landing
pages in Web advertising
(CIKM 2009, w. Becker et al.)

Can we
optimally
choose the
landing page?

37

Conceptually: context transfer

[Diagram] Search engine result page → Click! → Landing page → User’s activity on the advertiser’s Web site → Conversion
(e.g., purchase of the product or service being advertised)

38

All landing pages are not created equal
(and neither are the corresponding conversion rates)

• We propose a concise taxonomy of landing page types:
I. Homepage (25%) – top-level page of the advertiser’s site
(e.g., Verizon.com)
II. Category browse (37.5%) – main page of a sub-section of
the advertiser’s site, which describes a category of related
products
III. Search transfer (26%) – search within the advertiser’s site
OR on other Web sites
IV. Other (11.5%) – terminal pages (e.g., promotion pages or
forms)

39

Examples: Homepage

40

Examples: Category browse

41

Examples: search transfer

42

Landing page classifier
• Features: bag of words, HTML patterns
– [ST] “search results”, “found”
– [CB] “Home > Verizon > LG phones”
– [HP] HTML overlap between given URL and base URL
– [O] ratio of form elements to text, few outgoing links
• Accuracy on the pilot dataset (10-fold xval): 83%
• Accuracy on additional 100 labeled pages: 80%

• Distribution of landing page types in a set of 20,000
landing pages from Yahoo! Toolbar logs:
Homepage   Search Transfer   Category Browse   Other
34.4%      22.3%             36.0%             7.3%
43

Using the landing page taxonomy

Picking the right landing page
type for each ad

Improving the conversion rate

Improving advertisers’ ROI!

44

Landing page type usage vs. conversion:
breakdown by query frequency

[Figure label: Navigational queries]

Category and search transfer become more
popular for rare queries

Observed conversion rates are in
sharp contrast with usage frequency
of the different page types

45

Landing page type usage vs. conversion:
breakdown by query price

Category and search
transfer are dominant
for cheaper queries

As the price goes up, so
does the conversion rate
(higher quality pages?)

46

What is the
interplay between
the organic and
sponsored results?

Competing for users’ attention:
On the interplay between organic and
sponsored search results
(WWW 2010, w. Danescu-Niculescu-Mizil et al.)

47

The interplay between ads and
organic results
“... in an information-rich world, the wealth of information means a
dearth of something else: a scarcity of whatever it is that
information consumes. What information consumes is rather
obvious: it consumes the attention of its recipients. Hence a
wealth of information creates a poverty of attention and a
need to allocate that attention efficiently among the
overabundance of information sources that might consume it.”
-- Herbert Simon, “Designing Organizations for an Information-Rich
World”, 1971.

• Is there competition for clicks between ads and organic results?
• Do users prefer ads that are similar to the organic results, or do
they prefer diversity?

→ We found that the nature of this interplay depends
on the type of the query

48

Relation between the CTR of ads
and the CTR of organic results

• Negative correlation (competition)
– Users are only willing to spend limited time and effort on
each query
• Positive correlation (depends on the quality of
results)
– Easy query (“online radio”) – decent ads and organic
results – clicks on both
– Hard query (“who is giving this talk?”) – poor results on
both sides – no clicks on either
• Independence (null hypothesis)
– Users consider ads and organic results as two
independent sources of information
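
The three hypotheses above correspond to the sign of the correlation between ad CTR and organic CTR measured over queries. A minimal sketch using the Pearson coefficient follows; the choice of Pearson and the CTR values are assumptions for illustration, not data from the talk.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0  # undefined for constant input; treated as "no correlation" here
    return cov / (std_x * std_y)


# Hypothetical per-query CTRs: one value per query for ads and for organic results.
ad_ctr = [0.02, 0.05, 0.08, 0.01]
organic_ctr = [0.60, 0.45, 0.30, 0.65]
print(pearson(ad_ctr, organic_ctr))  # negative value would suggest competition
```

A value near zero would be consistent with the independence hypothesis, and a positive value with result quality driving clicks on both sides.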
49

Findings:
competition + positive correlation

50

Decoupling the forces

• Users are willing to invest limited effort in
each query → competition
• In order to single out the competition effect, we
tried to explicitly model the amount of effort
the user is willing to invest
• Low effort = navigational queries [Broder, 2002]
(27% of queries)
– “Pandora radio”, “Bank of America”
• High effort = non-navigational queries
– “Meaning of life”, “academia vs. industry”

51

Competition clearly exists for
navigational queries

We also examined different
degrees of navigationality:
the less navigational the query
is, the less competition we
observed

52

Another viewpoint:
Do users prefer ads that are more similar to
the organic results or more diverse ads?

• Both have been argued for in prior work
• Preference for similarity
– Ads are more likely to be relevant
– This assumption is often made in query
expansion for advertising [Broder et al., 2008]
• Preference for diversity
– Diversity among organic search results has
often been shown to be desirable (e.g., entire
session on diversity @ WWW 2010)
53

We found evidence for users’ preferring
both diversity and similarity

So we need to
dig deeper
again ...

Overlap measured using the Jaccard coefficient
between titles of ads and organic results

54
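
The Jaccard overlap between ad titles and organic result titles can be sketched as below. How the titles were tokenized and pooled is not stated in the slides; pooling all title words into one set per side is an assumption, as are the function names.

```python
import re


def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def title_overlap(ad_titles: list, organic_titles: list) -> float:
    """Jaccard coefficient between the word sets of ad titles and organic titles."""
    ad_words = set().union(*(tokenize(t) for t in ad_titles))
    organic_words = set().union(*(tokenize(t) for t in organic_titles))
    union = ad_words | organic_words
    if not union:
        return 0.0
    return len(ad_words & organic_words) / len(union)


organic = ["Free Internet Radio"]
print(title_overlap(["Pandora Internet Radio"], organic))
print(title_overlap(["Discount Bose Computer Speakers"], organic))
```

The first ad shares two of four distinct words with the organic title, while the second shares none.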

Let’s break down
by navigationality again

55

Break down by navigationality
(cont’d)

56

Counterintuitive?

57

Responsive and incidental ads

• Responsive ads directly address the user’s
information need
– More likely to be similar to the organic results
• Incidental ads are only somewhat related to the
user’s information need
– Unreasonable as organic results, but ok for ads
– More likely to be different from the organic results

• Example: query = “free internet radio”
– Responsive: “Pandora Internet Radio”
– Incidental: “Discount Bose Computer Speakers”

58

Now it all makes sense ...

Using the features
that quantify this
interplay,
we improved the
accuracy of CTR
prediction by 5%

59

Summary

1. The financial scale is huge
2. Advertising is a form of information
3. Finding the “best ad” is an information
retrieval problem
– Multiple, possibly contradictory utility functions
– Classical IR needs significant adaptation
4. The optimal solution requires extensive
use of external knowledge

60

Thank you!
gabr@yahoo-inc.com

http://research.yahoo.com/~gabr

61

This talk is Copyright Yahoo! 2010.
Yahoo! and the Author retain all rights, including
copyright and distribution rights. No publication or
further distribution in full or in part is permitted
without explicit written permission.

The opinions expressed herein are the responsibility
of the author and do not necessarily reflect the
opinion of Yahoo! Inc.

This talk benefitted from the contributions of many
colleagues and co-authors at Yahoo! and elsewhere.
Their help is gratefully acknowledged.
62
