Recent advances in computational advertising:
 design and analysis of ad retrieval systems



            Evgeniy Gabrilovich
             gabr@yahoo-inc.com




What is “Computational Advertising”?


     • A new scientific sub-discipline that provides the
       foundation for building online ad retrieval platforms
         – To wit: given a certain user in a certain context,
           find the most suitable ad

     • At the intersection of
        – Large scale text analysis
        – Information retrieval
        – Statistical modeling and machine learning
        – Optimization
        – Microeconomics

© Yahoo! Research 2010   Technologies described might or might not be in actual use at Yahoo!
Computational Advertising at
                     Yahoo! Research




Online advertising spending




Textual advertising

     1. Ads driven by search keywords –
        Sponsored Search (a.k.a. “keyword driven
        ads”, “paid search”, etc.)
     2. Ads directly driven by the content of a web
        page – Content Match (a.k.a. “context
        driven ads”, “contextual ads”, etc.)

      Textual advertising on the Web is strongly related
      to NLP and information retrieval
Sponsored search
                  Text-based ads driven by a keyword search




Content match ads
                  Text-based ads driven by the page content




  [Screenshot callout: Content match ads]

Anatomy of an ad

  Bid phrases: {SIGIR 2010, computational advertising,
                Evgeniy Gabrilovich, ...}
  Bid: $0.10

  Figure labels: Title, Creative, Display URL, Landing page

  Landing URL:
  http://research.yahoo.com/tutorials/sigir10_compadv
So when do advertising dollars
                         actually change hands?

          – CPM = cost per thousand impressions
               • Typically used for graphical/banner ads
                 (brand advertising)
          – CPC = cost per click
               • Typically used for textual ads
          – CPT/CPA = cost per transaction/action
            a.k.a. referral fees or affiliate fees
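
A small worked sketch may make the three pricing models concrete. The campaign numbers below (impressions, clicks, purchases, and all prices except the $0.10 bid from the example ad) are invented for illustration and do not come from the talk.

```python
def cpm_cost(impressions: int, price_per_thousand: float) -> float:
    """CPM: the advertiser pays per thousand impressions, clicked or not."""
    return impressions / 1000 * price_per_thousand


def cpc_cost(clicks: int, price_per_click: float) -> float:
    """CPC: the advertiser pays only when the ad is clicked."""
    return clicks * price_per_click


def cpa_cost(actions: int, price_per_action: float) -> float:
    """CPT/CPA: the advertiser pays only when a transaction/action occurs."""
    return actions * price_per_action


# Hypothetical campaign: 100,000 impressions, 500 clicks, 10 purchases.
impressions, clicks, actions = 100_000, 500, 10
print("CPM cost:", cpm_cost(impressions, 2.00))  # assumed $2.00 per 1000 impressions
print("CPC cost:", cpc_cost(clicks, 0.10))       # $0.10 bid, as in the example ad
print("CPA cost:", cpa_cost(actions, 5.00))      # assumed $5.00 per purchase
```

The same traffic costs the advertiser different amounts under each model, because each model charges for a different event (impression, click, or action).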

Beyond keyword matching

    • Matching ads is relatively simple for explicitly bid keywords.
      What about queries on which there are no bids?
         – Advertisers should be able to bid on “broad queries” and/or
           “concept queries”
         – Advertisers need volume – the total number of searches on bid
           phrases is not enough!

    • Suppose your ad is “Good prices on Seattle hotels”
    • Naïve approach: bid on any query that contains the word Seattle
    • Problems
          • “Seattle's Best Coffee Chicago”
          • “Alaska cruises start point”
    • Ideally: bid on any query related to Seattle as a travel destination
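
The two problem queries can be read as one false positive and one false negative of the naïve rule. The sketch below illustrates this under stated assumptions: the relevance labels are illustrative judgments, not data from the talk, and the second query is assumed to be relevant because such cruises may start in Seattle.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens ("Seattle's" -> {"seattle", "s"})."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def naive_match(query: str, bid_word: str = "seattle") -> bool:
    """Naive rule from the slide: match any query containing the bid word."""
    return bid_word in tokens(query)


# (query, is it really about Seattle as a travel destination?)
# The labels are illustrative assumptions.
examples = [
    ("Seattle hotels downtown", True),
    ("Seattle's Best Coffee Chicago", False),  # contains the word, wrong intent
    ("Alaska cruises start point", True),      # right intent, word is absent
]

for query, relevant in examples:
    print(f"{query!r}: matched={naive_match(query)}, relevant={relevant}")
```

Word containment alone cannot separate these cases, which is the motivation for matching on the topic of the query rather than on its surface words.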
The old school:
                  heuristic ad matching

     • Sponsored search
          – Exact match between the query and the bid phrase
            of the ad (modulo simple normalization, e.g.,
            stemming)
          – Advertisers cannot possibly bid on all relevant
            queries (especially rare ones)
               • Use advanced match (e.g., through query-to-query rewrites)
     • Content match
          – Extract bid phrases from pages, thus reducing the
            problem to exact match
      Both essentially perform record lookup
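
A minimal sketch of exact match as record lookup follows. The `crude_stem` function is a toy stand-in for a real stemmer, and the ad identifiers and bid phrases are hypothetical.

```python
def crude_stem(word: str) -> str:
    """Toy stand-in for a real stemmer: strip a trailing "s" from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def normalize(phrase: str) -> str:
    """Simple normalization: lowercase, collapse whitespace, stem each word."""
    return " ".join(crude_stem(w) for w in phrase.lower().split())


# Normalized bid phrase -> ad ids. Exact match is then a dictionary lookup.
bid_index: dict[str, list[str]] = {}


def add_bid(phrase: str, ad_id: str) -> None:
    bid_index.setdefault(normalize(phrase), []).append(ad_id)


def exact_match(query: str) -> list[str]:
    return bid_index.get(normalize(query), [])


add_bid("Seattle hotels", "ad-1")
add_bid("computational advertising", "ad-2")

print(exact_match("seattle hotel"))         # matches after normalization
print(exact_match("cheap seattle hotels"))  # no bid on this exact phrase
```

The second lookup returns nothing even though the query is clearly related, which is the gap that advanced match and query rewrites try to close.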

The old school (cont’d)
  [Diagram: ad retrieval pipeline]
    Query (example: “Abbey Road lyrics”) → Front end →
    Query rewriting module (query → query rewrites) → Exact match →
    Candidate ads → Revenue reordering → Ad slate

  Annotations:
    • Simplistic query expansion
    • Ignoring (or underusing) the multitude of information available
The new approach:
                  knowledge-based ad retrieval

    • Ad indexing and scoring based on all the information
      available (bid terms, title, creative, URL, landing page, ...)
         – Similar to document indexing in IR
              • Use standard IR tools (text preprocessing – tokenization, stemming,
                entity extraction; inverted indexes etc.)
         – Use multiple features of the query and the ad

    • Elaborate query expansion

    • 2nd pass relevance reordering (re-ranking)
         – Using features not available to the 1st pass model (e.g., set-level
           features, click history)
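
The first-pass idea, indexing every part of the ad like a document, can be sketched as a tiny inverted index. The ads, field names, and the TF-IDF style scoring below are illustrative assumptions. The talk does not specify the scoring function, and the second-pass re-ranking is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical ads; every field (visible or hidden) is indexed.
ads = {
    "ad-1": {
        "bid_terms": "miele kitchenaid cuisinart",
        "title": "brand name appliances",
        "creative": "compare prices and save money",
        "landing_page": "kitchen appliances dishwashers ovens",
    },
    "ad-2": {
        "bid_terms": "lawn mower garden tools",
        "title": "new year deals",
        "creative": "lawn and garden tools on sale",
        "landing_page": "garden tools mowers trimmers",
    },
}


def tokenize(text: str) -> list[str]:
    return text.lower().split()


# Inverted index: term -> {ad_id: term frequency summed over all fields}.
index: dict[str, Counter] = defaultdict(Counter)
for ad_id, fields in ads.items():
    for text in fields.values():
        for term in tokenize(text):
            index[term][ad_id] += 1


def score(query: str) -> list[tuple[str, float]]:
    """Rank ads by a simple TF-IDF sum over the query terms."""
    scores: Counter = Counter()
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + len(ads) / len(postings))
        for ad_id, tf in postings.items():
            scores[ad_id] += tf * idf
    return scores.most_common()


print(score("kitchen appliances"))  # found via title and landing page text
print(score("miele"))               # found via the bid terms
```

Neither query needs an explicit bid on the full phrase, since title, creative, and landing page text all contribute to the match.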


The new approach (cont’d)
  [Diagram: ad retrieval pipeline]
    Query (example: “Miele”) → Front end → Ad query generation → Ad query →
    Ad search engine (First pass retrieval → Relevance reordering) →
    Candidate ads → Revenue reordering → Ad slate

  Rich query produced by ad query generation:
    <Miele, appliances, kitchen, “appliances repair”, “appliance parts”,
     Business/Shopping/Home/Appliances>

  Annotation: The hidden parts of ads (bid phrases + landing pages)
  allow us to augment the ads (cf. query expansion)
Research questions

     • How to index the ad corpus?
     • How to select relevant ads?
     • Should we show ads at all?
     • Can we generate bid phrases (or even entire ad campaigns)
       automatically?
     • What is the interplay between the organic and sponsored results?
     • Should we use the landing page for indexing?
     • Can we optimally choose the landing page?
How to select relevant ads?

                     Feature generation for
                         improved ad retrieval
                    (SIGIR 2007, w. Broder et al.;
             ACM TWEB 2009, Gabrilovich et al.)



Query classification using
                         Web search results

     • Humans often find it hard to readily see what the
       query is about …
          – But they can easily make sense of it once they look at
            the search results …
     • Let computers do the same thing
          – Infer the query intent from the top algorithmic search
            results (“pseudo relevance feedback”)
               • Classify search results (either summaries or full pages)
               • Let these results “vote” to determine the query class(es) in a
                 large taxonomy of commercial topics
     • Our goal: Construct additional features to retrieve better ads
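
The voting step can be sketched as follows. The page-to-class table stands in for the output of a page classifier run offline. The URLs, class names, and confidence values are hypothetical, and confidence-weighted voting is only one possible aggregation scheme.

```python
from collections import Counter

# Hypothetical offline classifier output: URL -> [(class, confidence), ...].
PAGE_CLASSES = {
    "example.com/modem-review": [("Computing/Hardware/Modems", 0.9)],
    "example.com/usb-modem-driver": [
        ("Computing/Hardware/Modems", 0.7),
        ("Computing/Software/Drivers", 0.3),
    ],
    "example.com/forum-thread": [("Computing/Software/Drivers", 0.5)],
}


def classify_query(result_urls: list[str], top_k: int = 2) -> list[str]:
    """Let the top search results vote for the classes of the query."""
    votes: Counter = Counter()
    for url in result_urls:
        for cls, confidence in PAGE_CLASSES.get(url, []):
            votes[cls] += confidence
    return [cls for cls, _ in votes.most_common(top_k)]


# The search results for an opaque query (e.g. a model number) vote on its class.
print(classify_query(list(PAGE_CLASSES)))
```

The predicted classes can then be appended to the query as extra features for ad retrieval.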

Example: ex560lku




                                         CATEGORIES
                                         1. Computing/Computer/Hardware/Computer/Peripherals/Computer Modems




If we know it is about actiontec usb modem
                  then we have plenty of ads …




Our approach

     Traditional approach:

          Query → Classifier
          (insufficient data)

     Our approach:

          Query → Search engine → Search results → Classifier

          • Very large scale
          • Pre-classify all pages just once!
          • Using the Web as external knowledge
Research questions



     • Snippets or full pages?
     • Number of search results to obtain
     • Number of classes per search result
     • Aggregation: bundling or voting?
The effect of using Web search results




Beyond the bag of words: matching
 textual ads in the enriched feature space
 (SIGIR 2007, Broder et al.;
  CIKM 2008, w. Broder et al.)
What can we do about non-English queries ?
                  (iNEWS @ CIKM 2008, w. Wang et al.;
                  WSDM 2009, w. Wang et al.)

     • Developing a taxonomy and building a query
       classifier for every language is prohibitively
       expensive
     • Solution: apply off-the-shelf MT to the
       search results in the source language

     [Diagram: Machine Translation, with the input labels
      “Very short text” and “Sufficiently long text”]
The effect of query expansion
                         prior to applying MT

     [Chart: x-axis runs from (Head) more frequent queries
      to (Tail) less frequent queries]

     • The gap for infrequent queries is wider
     • Baseline = translate the query (using MT),
       then classify the result as an English query
How to index the ad corpus?

                    The Anatomy of an ad:
        Structured indexing and retrieval
                         for sponsored search
               (WWW 2010, w. Bendersky et al.)
Structure of online ad campaigns: the
                      ad schema
  Hierarchy (top to bottom):
    Advertiser
      → Account 1, Account 2, …
        → Campaign 1, Campaign 2, …
          → Ad group 1, Ad group 2, …
            → Ad = Creatives + Bid phrases

  Example labels shown in the figure:
    • “Buy appliances on Black Friday”
    • “New Year deals on lawn & garden tools”
    • “Kitchen appliances”
    • Creative: “Brand name appliances / Compare prices and save money /
      www.appliances-r-us.com”
    • Bid phrases: { Miele, KitchenAid, Cuisinart, … }

  Note on bid phrases: can be just a single bid phrase, or thousands of bid
  phrases (which are not necessarily topically coherent)
Implications of the campaign
                  structure

     •   What is the appropriate indexing unit?
          – Cartesian product of creatives and bid phrases? Ad group?

     •   Leveraging information from higher levels to address data sparsity
         at child nodes

     •   What is the right approach to document length normalization?
          – Large variability of document lengths
          – Shorter documents (smaller ad groups) are retrieved with a
            probability higher than their probability of being relevant

     •   How to index and score templated ads?

     •   Prior work mostly considered ads as independent atomic units and
         ignored the hierarchical campaign structure


Possible approaches

    1. Term index (Cartesian product of all creatives and bid terms)
           • Huge index, small focused documents

    2. Creative index (a creative is coupled with all the bid terms in
       the ad group)
           • Two-stage retrieval (first choose the creative, then pick the term)
           • Bid terms are duplicated across creatives

    3. Ad group index
           • Indexing units are entire ad groups
           • Three-stage retrieval (first choose
             the ad group, then the creative,
             and finally pick the term)
           • Most compact index
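
The three choices of indexing unit can be compared on a single ad group. The sketch below only shows how many "documents" each option produces. The `AdGroup` class and its contents are illustrative assumptions, not the schema used in the talk.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class AdGroup:
    name: str
    creatives: list[str]
    bid_terms: list[str]


group = AdGroup(
    name="Kitchen appliances",
    creatives=["Brand name appliances", "Compare prices and save money"],
    bid_terms=["Miele", "KitchenAid", "Cuisinart"],
)


def term_index_docs(g: AdGroup) -> list[str]:
    """1. Term index: one document per (creative, bid term) pair."""
    return [f"{c} {t}" for c, t in product(g.creatives, g.bid_terms)]


def creative_index_docs(g: AdGroup) -> list[str]:
    """2. Creative index: each creative coupled with all bid terms."""
    all_terms = " ".join(g.bid_terms)
    return [f"{c} {all_terms}" for c in g.creatives]


def ad_group_index_docs(g: AdGroup) -> list[str]:
    """3. Ad group index: the whole ad group is one document."""
    return [" ".join(g.creatives + g.bid_terms)]


for build in (term_index_docs, creative_index_docs, ad_group_index_docs):
    print(build.__name__, len(build(group)))
```

With 2 creatives and 3 bid terms the three options yield 6, 2, and 1 documents. For ad groups with thousands of bid phrases the term index grows accordingly, while the ad group index stays at one document per group.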

Retrieval speed vs. relevance

     • Term index yields most relevant ads, yet is least efficient
       (20x slower than the ad group index)
     • Ad group index is most efficient (2x faster than creative index),
       yet least effective
     • Are we trading effectiveness for efficiency?
Using learning to rank techniques:
                  structured re-ranking
 •   Step 1: Retrieve an initial set of candidates using the ad group index

 •   Step 2: Re-rank the candidate set using structural features (instead of
     ignoring the structure and scoring creatives and terms independently)
      – Ad group score, creative-term pair score
      – # bid terms in the ad group
      – Unigram entropy (cohesiveness)
        of the ad group
      – Ratio of query words covered
        by the ad group text
      – Fraction of the titles / terms /
        URLs that contain at least
        one query term
      – Other features are possible!
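
Several of the structural features listed above can be computed directly from the ad group text. The following is a minimal sketch: whitespace tokenization, the feature names, and the example ad group are assumptions, and the retrieval scores (ad group score, creative-term pair score) are omitted because they depend on the underlying ranker.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def unigram_entropy(texts: list[str]) -> float:
    """Entropy (bits) of the unigram distribution; lower = more cohesive."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def query_coverage(query: str, texts: list[str]) -> float:
    """Ratio of query words that occur anywhere in the ad group text."""
    q = set(tokenize(query))
    if not q:
        return 0.0
    vocab = {tok for t in texts for tok in tokenize(t)}
    return len(q & vocab) / len(q)


def fraction_containing(query: str, items: list[str]) -> float:
    """Fraction of items (titles, terms, ...) with at least one query term."""
    q = set(tokenize(query))
    if not items:
        return 0.0
    return sum(1 for it in items if q & set(tokenize(it))) / len(items)


def structural_features(query: str, titles: list[str], bid_terms: list[str]) -> dict:
    texts = titles + bid_terms
    return {
        "num_bid_terms": len(bid_terms),
        "unigram_entropy": unigram_entropy(texts),
        "query_coverage": query_coverage(query, texts),
        "frac_titles_with_query_term": fraction_containing(query, titles),
        "frac_terms_with_query_term": fraction_containing(query, bid_terms),
    }


features = structural_features(
    "miele dishwasher",
    titles=["Brand name appliances", "Miele appliances on sale"],
    bid_terms=["miele", "kitchenaid", "cuisinart"],
)
print(features)
```

A learning-to-rank model would take such a feature vector for each candidate and learn how to weight the features. The choice of learner is not fixed by this sketch.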


      [Figure: feature functions]
Re-ranking retrieval performance

     nDCG@5                 Len 1          Len 2-3          Len 4+
                         (143 queries)   (443 queries)   (187 queries)
     Term index             0.841           0.716           0.656
     Structured             0.849           0.731           0.686
     re-ranking            (+0.95%)        (+2.1%)         (+4.6%)

     • Structured re-ranking is superior
         for all query lengths
     • Most notable improvements are
         obtained for longer queries
     • Still very efficient!
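
For readers unfamiliar with the metric, here is a minimal sketch of nDCG@5, plus the arithmetic behind the relative improvements in the table. The linear-gain variant of DCG and the ideal ranking computed from the retrieved list alone are assumptions. The talk does not say which variant was used.

```python
import math


def dcg_at_k(gains: list[float], k: int = 5) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(gains: list[float], k: int = 5) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def relative_improvement(baseline: float, new: float) -> float:
    """Relative change in percent, as reported in parentheses in the table."""
    return (new - baseline) / baseline * 100


print(ndcg_at_k([3, 2, 1]))  # ideal ordering
print(ndcg_at_k([1, 2, 3]))  # same ads, worst ordering

# Term index vs. structured re-ranking, values taken from the table above.
for base, new in [(0.841, 0.849), (0.716, 0.731), (0.656, 0.686)]:
    print(f"{base} -> {new}: {relative_improvement(base, new):+.2f}%")
```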
To swing or not to swing: learning when (not)
                  to advertise (CIKM 2008, w. Broder et al.)

    Should we show ads at all?

    • Repeatedly showing non-relevant ads can have
      detrimental long-term effects

    • Want to be able to predict
      when (not) to show individual
      ads or a set of ads (“swing”)

    • Modeling actual short- and
      long-term costs of showing
      non-relevant ads is very
      difficult
Thresholding approach

    • Decision made on individual ads based on
      ad scores
         – Set a global score threshold
         – Only retrieve ads with scores above it
         – If none of the ad scores are above the
           threshold, then no ads are shown (“no swing”)

    • Scores are not necessarily comparable
      across queries!
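
A minimal sketch of the thresholding approach, with invented scores and an invented threshold value, shows both the mechanism and the comparability problem.

```python
def ads_above_threshold(scored_ads: list[tuple[str, float]], threshold: float) -> list[str]:
    """Keep only the ads whose score exceeds one global threshold."""
    return [ad for ad, score in scored_ads if score > threshold]


GLOBAL_THRESHOLD = 0.5  # hypothetical value

# Hypothetical retrieval scores for two different queries.
ads_for_query_a = [("ad-1", 0.92), ("ad-2", 0.61), ("ad-3", 0.30)]
ads_for_query_b = [("ad-4", 0.45), ("ad-5", 0.20)]

print(ads_above_threshold(ads_for_query_a, GLOBAL_THRESHOLD))  # two ads shown
print(ads_above_threshold(ads_for_query_b, GLOBAL_THRESHOLD))  # "no swing"
```

If scores for query B are systematically lower (for example because the query is longer or rarer), ad-4 may be a perfectly good ad that is suppressed only because a score of 0.45 means something different for that query.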
Machine learning approach

     • Decision made on sets of ads based on a
       variety of features
          – Learn a binary prediction model (“swing” /
            “no swing”) for sets of ads
          – If we swing, then all ads are retrieved
          – If we do not swing, then no ads are retrieved
     • Features defined over sets of ads, rather
       than individual ads
Features

     • Relevance features
          – Word overlap, cosine similarity between ad and query/page
     • Vocabulary mismatch features
          – Translation models
          – PMI between query/page terms and bid terms
     • Ad-based features
          – Bid price (higher bids may indicate better ads)
     • Result set cohesiveness features
          – Coefficient of variation of ad scores (std/mean)
          – Result set clarity
               • If the set of ads is very cohesive and focused on 1-2 topics, the
                 relevance language model is very different from the collection
                 model
          – Entropy
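
Two of the result set cohesiveness features can be sketched concretely. Clarity is implemented here as the KL divergence between a unigram language model of the retrieved ads and a unigram model of the whole collection. Unsmoothed maximum-likelihood models, whitespace tokenization, and the toy collection are assumptions, and the result set is assumed to be drawn from the collection so that every term has non-zero collection probability.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def coefficient_of_variation(scores: list[float]) -> float:
    """std/mean of the ad scores in the result set."""
    m = mean(scores)
    return pstdev(scores) / m if m else 0.0


def unigram_lm(texts: list[str]) -> dict[str, float]:
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def clarity(result_texts: list[str], collection_texts: list[str]) -> float:
    """KL divergence (bits) between the result-set LM and the collection LM."""
    p = unigram_lm(result_texts)
    q = unigram_lm(collection_texts)
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items())


# Toy ad collection (hypothetical).
collection = [
    "miele dishwasher repair",
    "miele oven parts",
    "cheap flights seattle",
    "lawn mower sale",
]
focused_set = collection[:2]  # two ads on the same topic

print(coefficient_of_variation([1.0, 3.0]))
print(clarity(focused_set, collection))  # focused set: high clarity
print(clarity(collection, collection))   # whole collection: zero clarity
```

Such set-level values would be fed, together with the other features, into the binary “swing” / “no swing” model. The choice of classifier is left open here.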

What happens after an ad click?
         Quantifying the impact of landing
            pages in Web advertising
                       (CIKM 2009, w. Becker et al.)

Can we optimally choose the landing page?
Conceptually: context transfer

     [Diagram: Search engine result page → (Click!) → Landing page →
      User’s activity on the advertiser’s Web site → Conversion
      (e.g., purchase of the product or service being advertised)]
All landing pages are not created equal
                    (and neither are the corresponding conversion rates)


     •     We propose a concise taxonomy of landing page types:
             I.      Homepage (25%) – top-level page of the advertiser’s site
                     (e.g., Verizon.com)
             II.     Category browse (37.5%) – main page of a sub-section of
                     the advertiser’s site, which describes a category of related
                     products
             III.    Search transfer (26%) – search within the advertiser’s site
                     OR on other Web sites
             IV.     Other (11.5%) – terminal pages (e.g., promotion pages or
                     forms)




Examples: Homepage




Examples: Category browse




Examples: search transfer




Landing page classifier
     • Features: bag of words, HTML patterns
          – [ST] “search results”, “found”
          – [CB] “Home > Verizon > LG phones”
          – [HP] HTML overlap between given URL and base URL
          – [O] ratio of form elements to text, few outgoing links
     • Accuracy on the pilot dataset (10-fold xval): 83%
     • Accuracy on additional 100 labeled pages: 80%
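
The kind of HTML-pattern features listed above could be extracted roughly as follows. This is a loose sketch and not the classifier from the talk: the regular expressions, the feature names, and the example page are invented, and the [HP] feature is replaced by a much simpler check (is the URL path the site root?) instead of the HTML overlap with the base URL.

```python
import re
from html import unescape
from urllib.parse import urlparse


def landing_page_features(url: str, page_html: str) -> dict:
    """Toy feature extractor for landing page type classification."""
    # Visible text: drop tags, decode entities such as "&gt;", lowercase.
    text = unescape(re.sub(r"<[^>]+>", " ", page_html)).lower()
    n_words = max(len(text.split()), 1)
    n_form = len(re.findall(r"<(?:input|select|textarea)\b", page_html, flags=re.I))
    n_links = len(re.findall(r"<a\b", page_html, flags=re.I))
    return {
        # [ST] typical search-result phrases
        "st_phrase": bool(re.search(r"\bsearch results\b|\bfound\b", text)),
        # [CB] breadcrumb such as "Home > Verizon > LG phones"
        "cb_breadcrumb": bool(re.search(r"\w+\s*>\s*\w+\s*>\s*\w+", text)),
        # [HP] simplified stand-in: the URL points at the site root
        "hp_root_url": urlparse(url).path in ("", "/"),
        # [O] ratio of form elements to text, number of links
        "o_form_ratio": n_form / n_words,
        "o_num_links": n_links,
    }


page = (
    '<body><a href="/">Home</a> &gt; <a href="/verizon">Verizon</a> '
    "&gt; LG phones</body>"
)
print(landing_page_features("http://shop.example.com/phones/lg", page))
```

In a real system these values would be combined with bag-of-words features and passed to a trained classifier.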

     • Distribution of landing page types in a set of 20,000
       landing pages from Yahoo! Toolbar logs:
                    Homepage   Search Transfer   Category Browse   Other
                     34.4%         22.3%             36.0%          7.3%
Using the landing page taxonomy

                    Picking the right landing page
                           type for each ad



                    Improving the conversion rate




                         Improving advertisers’ ROI !

Landing page type usage vs. conversion:
                  breakdown by query frequency

     [Charts: landing page type usage and conversion rate by query frequency;
      one region is labeled “Navigational queries”]

     • Category and search transfer become more popular for rare queries
     • Observed conversion rates are in sharp contrast with usage frequency
       of the different page types
Landing page type usage vs. conversion:
                     breakdown by query price

     [Charts: landing page type usage and conversion rate by query price]

     • Category and search transfer are dominant for cheaper queries
     • As the price goes up, so does the conversion rate
       (higher quality pages?)
What is the interplay between the organic and sponsored results?

         Competing for users’ attention:
      On the interplay between organic and
            sponsored search results
        (WWW 2010, w. Danescu-Niculescu-Mizil et al.)
The interplay between ads and
                  organic results
     “... in an information-rich world, the wealth of information means a
          dearth of something else: a scarcity of whatever it is that
          information consumes. What information consumes is rather
          obvious: it consumes the attention of its recipients. Hence a
          wealth of information creates a poverty of attention and a
          need to allocate that attention efficiently among the
          overabundance of information sources that might consume it.”
     -- Herbert Simon, “Designing Organizations for an Information-Rich
          World”, 1971.

     •   Is there competition for clicks between ads and organic results?
     •   Do users prefer ads that are similar to the organic results, or do
         they prefer diversity?

                 We found that the nature of this interplay depends
                             on the type of the query


Relation between the CTR of ads
                  and the CTR of organic results

     • Negative correlation (competition)
          – Users are only willing to spend limited time and effort on
            each query
     • Positive correlation (depends on the quality of
       results)
          – Easy query (“online radio”) – decent ads and organic
            results – clicks on both
          – Hard query (“who is giving this talk?”) – poor results on
            both sides – no clicks on either
     • Independence (null hypothesis)
          – Users consider ads and organic results as two
            independent sources of information
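
The three hypotheses correspond to the sign of the correlation between ad CTR and organic CTR across queries. The sketch below computes a Pearson correlation on invented per-query CTR values. The numbers are synthetic and only illustrate what each hypothesis would look like.

```python
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Synthetic per-query click-through rates (one entry per query).
ad_ctr = [0.10, 0.08, 0.05, 0.02]
organic_ctr_competition = [0.30, 0.40, 0.55, 0.70]  # clicks shift to organic
organic_ctr_quality = [0.70, 0.60, 0.40, 0.20]      # both rise and fall together

print(pearson(ad_ctr, organic_ctr_competition))  # negative: competition
print(pearson(ad_ctr, organic_ctr_quality))      # positive: shared quality
```

A value near zero would be consistent with the independence hypothesis.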
Findings:
                  competition + positive correlation




Decoupling the forces

     • Users are willing to invest limited effort in
       each query → competition
     • In order to single out the competition effect, we
       tried to explicitly model the amount of effort
       the user is willing to invest
     • Low effort = navigational queries [Broder, 2002]
       (27% of queries)
          – “Pandora radio”, “Bank of America”
     • High effort = non-navigational queries
          – “Meaning of life”, “academia vs. industry”
Competition clearly exists for
                  navigational queries

                                    We also examined different
                                    degrees of navigationality:
                                  the less navigational the query
                                    is, the less competition we
                                              observed




Another viewpoint:
                    Do users prefer ads that are more similar to
                    the organic results or more diverse ads?

     • Both have been argued for in prior work
     • Preference for similarity
          – Ads are more likely to be relevant
          – This assumption is often made in query
            expansion for advertising [Broder et al., 2008]
     • Preference for diversity
          – Diversity among organic search results has
            often been shown to be desirable (e.g., entire
            session on diversity @ WWW 2010)
We found evidence for users’ preferring
                  both diversity and similarity




                                                 So we need to
                                                  dig deeper
                                                    again ...




                                                 Overlap measured
                                                 using the Jaccard
                                                    coefficient
                                                 between titles of
                                                  ads and organic
                                                       results
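
The overlap measure named on this slide can be sketched as follows. This is a minimal illustration, assuming titles are lowercased and split into word sets, and assuming the per-query overlap is the mean Jaccard coefficient over all ad/organic title pairs; the talk does not specify tokenization or aggregation. The ad titles are borrowed from the “free internet radio” example later in the deck, the organic title is hypothetical, and the helper names `title_tokens`, `jaccard` and `mean_overlap` are made up for this sketch.

```python
import re


def title_tokens(title):
    """Lowercase a title and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(title_a, title_b):
    """Jaccard coefficient |A & B| / |A | B| over the word sets of two titles."""
    a, b = title_tokens(title_a), title_tokens(title_b)
    if not a and not b:
        return 0.0  # convention for two empty titles (an assumption)
    return len(a & b) / len(a | b)


def mean_overlap(ad_titles, organic_titles):
    """Mean Jaccard coefficient over all (ad, organic) title pairs."""
    pairs = [(ad, org) for ad in ad_titles for org in organic_titles]
    if not pairs:
        return 0.0
    return sum(jaccard(ad, org) for ad, org in pairs) / len(pairs)


organic = ["Free Internet Radio Stations"]  # hypothetical organic result title
ads = ["Pandora Internet Radio", "Discount Bose Computer Speakers"]

for ad in ads:
    print(ad, "->", jaccard(ad, organic[0]))
print("mean overlap:", mean_overlap(ads, organic))
```

Under these assumptions the first ad shares two of five distinct words with the organic title, while the second shares none, so the two ads fall at opposite ends of the similarity scale.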
Let’s break down
                    by navigationality again




Break down by navigationality
                  (cont’d)




Counterintuitive ?




Responsive and incidental ads

     • Responsive ads directly address the user’s
       information need
          – More likely to be similar to the organic results
     • Incidental ads are only somewhat related to the
       user’s information need
          – Unreasonable as organic results, but OK for ads
          – More likely to be different from the organic results

     • Example: query = “free internet radio”
          – Responsive: “Pandora Internet Radio”
          – Incidental: “Discount Bose Computer Speakers”

Now it all makes sense ...

                                         Using the features
                                          that quantify this
                                              interplay,
                                          we improved the
                                         accuracy of CTR
                                         prediction by 5%




Summary

     1. The financial scale is huge
     2. Advertising is a form of information
     3. Finding the “best ad” is an information
        retrieval problem
          – Multiple, possibly contradictory utility functions
          – Classical IR needs significant adaptation
     4. The optimal solution requires extensive
        use of external knowledge

Thank you!
     gabr@yahoo-inc.com

http://research.yahoo.com/~gabr




This talk is Copyright Yahoo! 2010.
       Yahoo! and the Author retain all rights, including
      copyright and distribution rights. No publication or
        further distribution in full or in part is permitted
              without explicit written permission.

     The opinions expressed herein are the responsibility
       of the author and do not necessarily reflect the
                   opinion of Yahoo! Inc.

      This talk benefitted from the contributions of many
     colleagues and co-authors at Yahoo! and elsewhere.
             Their help is gratefully acknowledged.

Recent advances in computational advertising

  • 1. Recent advances in computational advertising: design and analysis of ad retrieval systems Evgeniy Gabrilovich g @y gabr@yahoo-inc.com 1
  • 2. What is “Computational Advertising”? • A new scientific sub-discipline that provides the foundation f building online ad retrieval platforms f d i for b ildi li d i l l f – To wit: given a certain user in a certain context, find the most suitable ad • At the intersection of – Large scale text analysis – Information retrieval – Statistical modeling and machine learning – Optimization – Microeconomics 2 © Yahoo! Research 2010 Technologies described might or might not be in actual use at Yahoo!
  • 3. Computational Advertising at Yahoo! Research 3 © Yahoo! Research 2010
  • 4. Online advertising spending 4 © Yahoo! Research 2010
  • 5. Textual advertising 1. 1 Ads driven by search keywords – Sponsored Search (a.k.a. “keyword driven ads”, “paid search”, etc.) , p , ) 2. Ads directly driven by the content of a web page – Content Match (a k a “context (a.k.a. context driven ads”, “contextual ads”, etc.) Textual advertising on the Web is strongly related to NLP and information retrieval 5 © Yahoo! Research 2010
  • 6. Sponsored search Text-based Text based ads driven by a keyword search 6 © Yahoo! Research 2010
  • 7. Content match ads Text-based ads driven by the page content Content C t t match ads 7 © Yahoo! Research 2010
  • 8. Anatomy of an ad Bid phrases: {SIGIR 2010, computational advertising, advertising Evgeniy Gabrilovich, ...} Bid: $0.10 Title Creative Display URL Landing URL: http://research.yahoo.com/t utorials/sigir10_compadv Landing page 8 © Yahoo! Research 2010
  • 9. So when do advertising dollars actually change hands? – CPM = cost per thousand i t th d impressions i • Typically used for graphical/banner ads (brand advertising) – CPC = cost per click p • Typically used for textual ads – CPT/CPA = cost per transaction/action a.k.a. referral fees or affiliate fees 9 © Yahoo! Research 2010
  • 10. Beyond keyword matching • Matching ads is relatively simple for explicitly bid keywords What about queries on which there are no bids ? – Advertisers should be able to bid on “broad queries” and/or “concept queries” – Advertisers need volume – the total amount of searches on bid phrases is not enough ! • Suppose your ad is “Good prices on Seattle hotels” Good hotels • Naïve approach: bid on any query that contains the word Seattle • Problems • “Seattle's Best Coffee Chicago” • “Alaska cruises start point” • Ideally: bid on any query related to Seattle as a travel destination 10 © Yahoo! Research 2010
  • 11. The old school: heuristic ad matching • Sponsored search p – Exact match between the query and the bid phrase of the ad (modulo simple normalization, e.g., stemming) – Advertisers cannot possibly bid on all relevant queries (especially rare ones) • Use advanced match (e.g., through query-to-query rewrites) • Content match – Extract bid phrases from pages, thus reducing the problem to exact match  Both essentially perform record lookup 11 © Yahoo! Research 2010
  • 12. The old school (cont’d) Query Abbey Road lyrics Front end Simplistic Query rewriting module query Query expansion Query rewrites Ignoring (or underusing) Exact match the multitude of information available il bl Candidate ads Revenue reordering d i Ad slate 12 © Yahoo! Research 2010
  • 13. The new approach: knowledge based knowledge-based ad retrieval • Ad indexing and scoring based on all the information available (bid terms, title, creative, URL, landing page, ...) – Similar to document indexing in IR • Use standard IR tools (text preprocessing – tokenization, stemming, entity extraction; inverted indexes etc.) – Use multiple features of the query and the ad • Elaborate query expansion • 2nd pass relevance reordering ( l d i (re-ranking) ki ) – Using features not available to the 1st pass model (e.g., set-level features, click history) 13 © Yahoo! Research 2010
  • 14. The new approach (cont’d) Query Miele Front end <Miele, appliances, kitchen, Ad query “appliances repair”, “appliance parts”, appliances repair appliance parts Rich query generation Business/Shopping/Home/Appliances> Ad query The hidden Ad search engine parts of ads (bid phrases + landing pages) First Fi pass allow us to retrieval augment the ads (cf. query Relevance expansion) reordering Revenue reordering Ad slate Candidate © Yahoo! Research 2010 ads 14
  • 15. Research How to Should we How to questions index the select relevant show ads Can we generate bid ad corpus? at all? ads? phrases (or even entire ad campaigns) automatically? What is the Wh t i th interplay between the organic and sponsored p results? Should Sh ld we use the landing for indexing? g Can we optimally p y choose the landing page? 15 © Yahoo! Research 2010
  • 16. How to select relevant ads? Feature generation for improved ad retrieval (SIGIR 2007 w. B d et al.; 2007, Broder t l ACM TWEB 2009, Gabrilovich et al.) ) 16 © Yahoo! Research 2010
  • 17. Query classification using Web search results • Humans often find it hard to readily see what the y query is about … – But they can easily make sense of it once they look at the th search results… h lt • Let computers do the same thing – Infer the query intent from the top algorithmic search q er results (“pseudo relevance feedback”) • Classify search results (either summaries or full pages) • Let these results “vote” to determine the query class(es) in a large taxonomy of commercial topics • Our goal: Construct additional features to retrieve better ads 17 © Yahoo! Research 2010
  • 18. Example: ex560lku CATEGORIES 1. Computing/Computer/ Hardware/Computer/Peri- pherals/Computer Modems 18 © Yahoo! Research 2010
  • 19. If we know it is about actiontec usb modem then we have plenty of ads … p y 19 © Yahoo! Research 2010
  • 20. Our approach Traditional approach: Insufficient Query Classifier data  Our approach: Very large scale Query y Search engine Search results Pre-classify all pages Using Web just once ! Classifier as external knowledge 20 © Yahoo! Research 2010
  • 21. Research questions Number of search Snippets or results to full pages? obtain Number f N b of classes per search result Aggregation: bundling or voting? 21 © Yahoo! Research 2010
  • 22. The effect of using Web search results 22 © Yahoo! Research 2010
  • 23. Beyond the bag of B d th b f words: matching textual ads in the enriched feature space ( (SIGIR 2007, Broder et al.; , ; CIKM 2008, w. Broder et al.) 23 © Yahoo! Research 2010
  • 24. What can we do about non-English queries ? (iNEWS @ CIKM 2008, w. Wang et al.; WSDM 2009, w. W 2009 Wang et al.) t l) • Developing a taxonomy and building a query classifier for every language is prohibitively expensive • Solution: apply off-the-shelf MT to the search results in the source language g g Machine Translation Very short text  Sufficiently long text  24 © Yahoo! Research 2010
  • 25. The effect of query expansion prior to applying MT. MT The gap for infrequent queries is wider Baseline = translate the th query ( i MT) (using MT), then classify the result as an English query (Head) (Tail) more frequent less frequent 25 © Yahoo! Research 2010
  • 26. How to index the ad corpus? The Anatomy of an ad: Structured indexing and retrieval for sponsored search (WWW 2010, w. Bendersky et al ) 2010 w al.) 26 © Yahoo! Research 2010
  • 27. Structure of online ad campaigns: the ad schema Advertiser New Year deals on Buy appliances on lawn & garden tools Account 1 Account 2 … Black Friday Kitchen appliances Campaign Campaign … 1 2 Ad group Ad group … 1 2 Creatives Ad Bid phrases Can be just a single bid phrase, or thousands of bid Brand name appliances { Miele, phrases (which are Compare prices and save money KitchenAid, not necessarily www.appliances-r-us.com Cuisinart, …} topically coherent) 27 © Yahoo! Research 2010
  • 28. Implications of the campaign structure • What is the appropriate indexing unit? g – Cartesian product of creatives and bid phrases? Ad group? • Leveraging information from higher levels to address data sparsity at children nodes • What is the right approach to document length normalization? – Large variability of document lengths – Probability of shorter documents (smaller ad groups) to be retrieved is higher than their probability of being relevant • How to index and score templated ads? p • Prior work mostly considered ads as independent atomic units and ignored hierarchical campaign structure g p g 28 © Yahoo! Research 2010
  • 29. Possible approaches 1. Term index (Cartesian product of all creatives and bid terms) • Huge index, small focused documents 2. Creative index (a creative is coupled with all the bid terms in the ad group) • Two-stage retrieval (first choose the creative, then pick the term) • Bid terms are duplicated across creatives 3. Ad group index • Indexing units are entire ad groups • Three stage retrieval (first choose Three-stage the ad group, then the creative, and finally pick the term) • M t compact index Most ti d 29 © Yahoo! Research 2010
  • 30. Retrieval speed vs. relevance Term index yields most relevant ads, yet is least efficient (20x slower than the ad group index) Are we trading effectiveness for efficiency ? Ad group index is most efficient (2x faster than creative index), yet least effective 30 © Yahoo! Research 2010
  • 31. Using learning to rank techniques: structured re-ranking re ranking • Step 1: Retrieve an initial set of candidates using the ad group index • Step 2: Re-rank the candidate set using structural features (instead of ignoring the structure and scoring creatives and terms independently) – Ad group score, creative-term pair score g p , p – # bid terms in the ad group – Unigram entropy (cohesiveness) of the ad group – Ratio of query words covered by the ad group text – Fraction of the titles / terms / URLs that contain at least one query term – Other features are possible ! feature functions 31 © Yahoo! Research 2010
  • 32. Re-ranking retrieval performance nDCG@5 Len 1 Len 2-3 Len 4+ (143 queries) i ) (443 queries) i ) (187 queries) i ) Term index 0.841 0.716 0.656 Structured St t d 0.849 0 849 0.731 0 731 0.686 0 686 re-ranking (+ 0.95%) (+ 2.1%) (+ 4.6%) • Structured re-ranking is superior for all query lengths • Most notable improvements are obtained for longer queries • Still very efficient! 32 © Yahoo! Research 2010
  • 33. To swing or not to swing: learning when (not) to advertise (CIKM 2008, w. Broder et al.) • Should we show ads at all? • Repeatedly showing non-relevant ads can have detrimental long-term effects • Want to be able to predict when (not) to show individual ads or a set of ads (“swing”) • Modeling actual short- and long-term costs of showing non-relevant ads is very difficult
  • 34. Thresholding approach • Decision made on individual ads based on ad scores – Set a global score threshold – Only retrieve ads with scores above it – If none of the ad scores are above the threshold, then no ads are shown (“no swing”) • Scores are not necessarily comparable across queries!
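The thresholding rule above amounts to a simple filter. The function name, the `(ad, score)` pair representation, and the sample scores are assumptions for illustration only.

```python
def threshold_ads(scored_ads, threshold):
    """Keep only ads scoring above a global threshold, best first.

    An empty result corresponds to the "no swing" decision.
    """
    kept = [(ad, score) for ad, score in scored_ads if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


candidates = [("ad_a", 0.42), ("ad_b", 0.91), ("ad_c", 0.67)]
print(threshold_ads(candidates, 0.5))   # two ads survive
print(threshold_ads(candidates, 0.95))  # no ad survives: "no swing"
```

Because a single global threshold is applied to every query, the approach inherits the caveat from the slide: raw scores are not necessarily comparable across queries.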
  • 35. Machine learning approach • Decision made on sets of ads based on a variety of features – Learn a binary prediction model (“swing” / “no swing”) for sets of ads – If we swing, then all ads are retrieved – If we do not swing, then no ads are retrieved • Features defined over sets of ads, rather than individual ads
  • 36. Features • Relevance features – Word overlap, cosine similarity between ad and query/page • Vocabulary mismatch features – Translation models – PMI between query/page terms and bid terms • Ad-based features – Bid price (higher bids may indicate better ads) • Result set cohesiveness features – Coefficient of variation of ad scores (std/mean) – Result set clarity • If the set of ads is very cohesive and focused on 1-2 topics, the relevance language model is very different from the collection model – Entropy
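Two of the features above have standard closed forms and can be sketched directly: the coefficient of variation of the ad scores (std/mean) and pointwise mutual information (PMI). The function names and the count-based PMI estimate are assumptions, as is the choice of the population standard deviation; the talk does not say which estimator was used.

```python
import math
import statistics


def coefficient_of_variation(scores):
    """Population standard deviation of the ad scores divided by their mean."""
    mean = statistics.fmean(scores)
    if mean == 0:
        return 0.0
    return statistics.pstdev(scores) / mean


def pmi(joint_count, x_count, y_count, total):
    """Pointwise mutual information (in bits) estimated from co-occurrence counts.

    joint_count: contexts containing both terms
    x_count, y_count: contexts containing each term
    total: total number of contexts
    """
    p_xy = joint_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))


print(coefficient_of_variation([1.0, 3.0]))  # std 1.0 over mean 2.0
print(pmi(joint_count=10, x_count=20, y_count=50, total=1000))
```

A low coefficient of variation indicates that the retrieved ads have similar scores, and a high PMI between a query term and a bid term indicates that the two co-occur more often than chance, even when they do not match lexically.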
  • 37. What happens after an ad click? Quantifying the impact of landing pages in Web advertising (CIKM 2009, w. Becker et al.) Can we optimally choose the landing page?
  • 38. Conceptually: context transfer • Search engine result page → Click! → Landing page → User’s activity on the advertiser’s Web site → Conversion (e.g., purchase of the product or service being advertised)
  • 39. All landing pages are not created equal (and neither are the corresponding conversion rates) • We propose a concise taxonomy of landing page types: I. Homepage (25%) – top-level page of the advertiser’s site (e.g., Verizon.com) II. Category browse (37.5%) – main page of a sub-section of the advertiser’s site, which describes a category of related products III. Search transfer (26%) – search within the advertiser’s site OR on other Web sites IV. Other (11.5%) – terminal pages (e.g., promotion pages or forms)
  • 40. Examples: Homepage
  • 41. Examples: Category browse
  • 42. Examples: search transfer
  • 43. Landing page classifier • Features: bag of words, HTML patterns – [ST] “search results”, “found” – [CB] “Home > Verizon > LG phones” – [HP] HTML overlap between given URL and base URL – [O] ratio of form elements to text, few outgoing links • Accuracy on the pilot dataset (10-fold xval): 83% • Accuracy on additional 100 labeled pages: 80% • Distribution of landing page types in a set of 20,000 landing pages from Yahoo! Toolbar logs:
      Homepage   Search Transfer   Category Browse   Other
      34.4%      22.3%             36.0%             7.3%
  • 44. Using the landing page taxonomy • Picking the right landing page type for each ad • Improving the conversion rate • Improving advertisers’ ROI!
  • 45. Landing page type usage vs. conversion: breakdown by query frequency • Chart label: “Navigational queries” • Category and search transfer become more popular for rare queries • Observed conversion rates are in sharp contrast with usage frequency of the different page types
  • 46. Landing page type usage vs. conversion: breakdown by query price • Category and search transfer are dominant for cheaper queries • As the price goes up, so does the conversion rate (higher quality pages?)
  • 47. What is the interplay between the organic and sponsored results? Competing for users’ attention: On the interplay between organic and sponsored search results (WWW 2010, w. Danescu-Niculescu-Mizil et al.)
  • 48. The interplay between ads and organic results “... in an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.” -- Herbert Simon, “Designing Organizations for an Information-Rich World”, 1971. • Is there competition for clicks between ads and organic results? • Do users prefer ads that are similar to the organic results, or do they prefer diversity? → We found that the nature of this interplay depends on the type of the query
  • 49. Relation between the CTR of ads and the CTR of organic results • Negative correlation (competition) – Users are only willing to spend limited time and effort on each query • Positive correlation (depends on the quality of the results) – Easy query (“online radio”) – decent ads and organic results – clicks on both – Hard query (“who is giving this talk?”) – poor results on both sides – no clicks on either • Independence (null hypothesis) – Users consider ads and organic results as two independent sources of information
  • 50. Findings: competition + positive correlation
  • 51. Decoupling the forces • Users are willing to invest limited effort in each query → competition • In order to single out the competition effect, we tried to explicitly model the amount of effort the user is willing to invest • Low effort = navigational queries [Broder, 2002] (27% of queries) – “Pandora radio”, “Bank of America” • High effort = non-navigational queries – “Meaning of life”, “academia vs. industry”
  • 52. Competition clearly exists for navigational queries • We also examined different degrees of navigationality: the less navigational the query is, the less competition we observed
  • 53. Another viewpoint: Do users prefer ads that are more similar to the organic results or more diverse ads? • Both have been argued for in prior work • Preference for similarity – Ads are more likely to be relevant – This assumption is often made in query expansion for advertising [Broder et al., 2008] • Preference for diversity – Diversity among organic search results has often been shown to be desirable (e.g., entire session on diversity @ WWW 2010)
  • 54. We found evidence for users’ preferring both diversity and similarity • So we need to dig deeper again ... • Overlap measured using the Jaccard coefficient between titles of ads and organic results
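The Jaccard coefficient used for the overlap measurement is the size of the intersection of two sets divided by the size of their union. The sketch below assumes the sets are lowercased, whitespace-tokenized title words pooled over all ads and over all organic results; the talk does not state how the titles were tokenized or aggregated, so `title_overlap` is a hypothetical helper.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def title_overlap(ad_titles, organic_titles):
    """Jaccard overlap between the words of ad titles and organic result titles."""
    ad_words = {w for title in ad_titles for w in title.lower().split()}
    organic_words = {w for title in organic_titles for w in title.lower().split()}
    return jaccard(ad_words, organic_words)


print(title_overlap(["Pandora Internet Radio"], ["Free Internet Radio Online"]))
```

Here the two title sets share "internet" and "radio" out of five distinct words, so the overlap is 2/5.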
  • 55. Let’s break down by navigationality again
  • 56. Break down by navigationality (cont’d)
  • 57. Counterintuitive?
  • 58. Responsive and incidental ads • Responsive ads directly address the user’s information need – More likely to be similar to the organic results • Incidental ads are only somewhat related to the user’s information need – Unreasonable as organic results, but ok for ads – More likely to be different from the organic results • Example: query = “free internet radio” – Responsive: “Pandora Internet Radio” – Incidental: “Discount Bose Computer Speakers”
  • 59. Now it all makes sense ... • Using the features that quantify this interplay, we improved the accuracy of CTR prediction by 5%
  • 60. Summary 1. The financial scale is huge 2. Advertising is a form of information 3. Finding the “best ad” is an information retrieval problem – Multiple, possibly contradictory utility functions – Classical IR needs significant adaptation 4. The optimal solution requires extensive use of external knowledge
  • 61. Thank you! gabr@yahoo-inc.com http://research.yahoo.com/~gabr
  • 62. This talk is Copyright Yahoo! 2010. Yahoo! and the Author retain all rights, including copyright and distribution rights. No publication or further distribution in full or in part is permitted without explicit written permission. The opinions expressed herein are the responsibility of the author and do not necessarily reflect the opinion of Yahoo! Inc. This talk benefitted from the contributions of many colleagues and co-authors at Yahoo! and elsewhere. Their help is gratefully acknowledged.