SCALABLE,PRIVACY-FRIENDLY,TARGETED INTERNETADSClaudia PerlichChief ScientistThanks to INFORMS for the2009 Design Science Award
OUTLINE Industry The high level M6D approach M6D data science for finding the right audience
BACKGROUNDIn a nutshell: our customers are brands who ask us to find a good audience and run an online ad campaign for themMain player in social targeting for (non-premium) display advertisementFounded in 2008Growth > 20Million Revenue in 2010
DISPLAY ADVERTISING TARGETING LANDSCAPE
CURRENT AD SPENDING SEEMS DISPROPORTIONATE… SHARE OF TIME IN ATYPICAL WEEK THAT US ADULTS SPEND WITH SELECT MEDIA* VS. SHARE OF US ADVERTISING SPENDING BY MEDIA, 2007 Note: *consumer media time excludes time spent using a mobile phone, watching DVDs or playing video games; Source: Forrester Research, “Teleconference: The US Interactive Marketing Forecast 2007-2012,” January 4, 2008
ONLINE ADVERTISING SPENDING BREAKDOWN (2009)Source: IAB Internet Advertising Revenue ReportConducted by PricewaterhouseCoopers and Sponsored by theInteractive Advertising Bureau (IAB)April 2010
Every BRAND has a unique SocialSignature, determined by the collection ofsites where current customers cluster.Every CONSUMER can be scoredagainst the brand‟s Social Signature,based on his/her visits to those sites.
STEP 1IDENTIFY BRAND‟S SOCIAL SIGNATUREWe place pixels on your site to cookie current brand customers.Our algorithm identifies and weights sites where the customers visit.• Sites with the highest visit density determine the brand‟s Social Signature.
STEP 2MATCH CONSUMERS WITH BRAND SIGNATUREWe score prospects based on how well they match your brand‟s Social Signature.Close match = better prospect…more likely to convert.
STEP 3PLACE THE AD WHEREVER THE BROWSER MAY BEWe find the best prospects in the bestenvironments at whatever scale yourcampaign requires, and deliver the ads.1. Prospects who match a brand‟s 3. Users respond to the adsSocial Signature visit sites and visit brand site.in the ad exchanges. 2. We buy impressions and serve ads to users with the highest scores, via the major ad exchanges.
STEP 4REFINE YOUR SIGNATURE BUYWe track results via our full feedback loop:- Refine your Social Signature.- Refresh each user‟s score to optimize performance in real-time. Prospect Ad Buying Scoring & ServingSignature ConversionOptimization Tracking
MAIN POINTS FOR THIS MORNING1. Machine learning can be used as the basis for EFFECTIVE, PRIVACY-FRIENDLY targeting for online advertising2. Important to consider carefully the TARGET variable used for training3. Many other exciting analytic challenges• Bid optimization• Across brand optimization• Server side cookie consistency• Proof ad effectiveness• Customer site optimization
BROWSER COOKIESINTERACTIONS NY Times PIXEL PIXEL 1. Browsing with 2. Shopping at one 3. General one of our data of our campaign browsing with partners sites display ads Browser interaction with ad (click) If we win an auction we serve an ad AD EXCHANGE
1. DOUBLY-ANONYMIZEDBIPARTITE GRAPHFor each browser, we collect from our data From this data, we induce a bipartitepartners hashed tokens of URL‟s previously browser-content graph.visited. BrowserID: 1234 ContentIDs: abkcc kkllo 88iok 7uiol
2. ADDING LABELSTO THE BIPARTITEGRAPHAdditionally, our brand partners provide From this data, we include labels in theinformation on what cookies have bipartite browser-content graph.previously taken some brand-relatedaction.* BRAND ACTORS *note: this is done separately for every brand we work with
3. THE TARGETINGGOALWhich of the browsers that have notpreviously taken a brand action are likely todo so? P(conv|imp, φbi)* P(conv|imp, φbi) In effect, this is a standard binary classification task. *φbi is a set of features that capture unconditional brand affinity based on graph structures…next section
KEY MODELING CHALLENGES How do you find a signal when the DATA LIMITATIONS nature of the ecosystem and self No contextual informationregulation limits your data availability? No demographic or PII Conversion rates ~ .01% DIMENSIONALITY & DATA SIZE > 100 MM Browsers > 50 MM hashed URL‟s > 150 brands data is very sparse How do you operate efficiently on so OPERATIONAL CONSTRAINTS much data for so many clients? Training models in linear time Scoring browsers in ms
OUR ITERATIVE MODELING PROCESS CONDITIONAL PROPENSITY MODEL UNCONDITIONAL GRAPH/BRAND PROXIMITY P(conv|imp,φbi) AD ALGORITHMS SERVIN G P(conv|φbi) HADOOP DATA STORE RAW(GRAPH) DATA CONVERSIONS Brand Actor Seeds
TWO-STAGEMODELING 1. STAGE: UNCONDITIONAL BRAND PROXIMITY & DIMENSIONALITY REDUCTION • Goal: estimate the browser-specific brand affinity • Draw from some network inference and relational learning techniques to incorporate the signal in the bi-partite graph • Can be build prior to campaign start 2. STAGE: CONDITIONAL MODEL • Goal: find the best prospect given the opportunity of an ad impression • Can potentially incorporate instance-specific features such as time of day, etc. • Can be continuously optimizes during the life of campaign
BAYESIAN RELATIONAL LEARNING1 BRAND ACTORSMEASURINGBRANDPROXIMITY BRAND PROXIMITY= P(conv|φbi) SCORING IN LINEAR TIME21 ACORA: Distribution-Based Aggregation for Relational Learning from Identifier Attributes. C. 2 Audience Selection for On-line Brand Advertising: Privacy-friendly SocialPerlich, F. Provost. Special Issue on Statistical Relational Learning and Multi-Relational Data Network Targeting. Provost, F., B. Dalessandro, R. Hook, X. Zhang, and A.Mining, Journal of Machine Learning 62 (2006) 65-105 Murray.Proceedings of the Fifteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2009).
OPTIMIZING TYPICAL CAMPAIGN ~ 100- 10000 PURCHASE PROPENSITY CONVERSIONS. GIVEN AD GOAL IS TO PREDICT P(CONVERSION| IMP,ΦBI) • For each browser bi, a feature vector φbi can be composed of the various brand proximity measures and impression information Every Impression/Conversion is • The different evidence can be combined via a logged and can be used to train a ranking function f(φbi) linear model that combines all graph based features to MULTIVARIATE LOGISTIC maximize probability of a future FUNCTION, TRAINED VIA conversion STANDARD LOGISTIC REGRESSION • Take full sample of positives, down sample conversions, can build a robust model with <100k examples* • Using Greedy Stepwise Forward Selection with 10-Fold Cross Validation, can automate the model building process and support >100 custom client models / week.Tree Induction vs. Logistic Regression: A Learning-curve Analysis. C. Perlich, F. Provost, and J. Simonoff. Journal of Machine Learning Research 4 (2003) 211-255.
„LIFE‟ PERFORMANCE OFM6D TARGETING NN Uniqs Median NN:RON SV/Uniq 16.0M 25 NN UNIQUES5.0M 204.0M 153.0M 102.0M1.0M 5 0 0
PERFORMANCE IMPROVES SUBSTANTIALLYWITH MORE DATA AND FEEDBACK LOOP 10% 50% 100%Shows lift for a particular size targeted population (%)Left-to-right decreases targeting threshold
HIGHER LIFT OR LARGER BUDGET 10% 50% 100%The orange horizontal line intersects the curves at the % of the population we can reach to get a 2x lift.The maroon vertical line intersects the curves at the lift multiple for the best 10% of each population.
SO WHY DOES ITWORK SO WELL?INTEGRATION WITH THE BRANDAND OBSERVING MEANINGFULBRAND ACTION IS KEY!• Use supervised learning instead of buying segments• It might be tempting to consider modeling based on clicks• No hassle dropping a pixel …• There are some brands who have no real good online presenceWE FIND GREAT SIGNAL IN OURBREADTH OF DATA BRAND• Social principles work even with „pseudo‟ social data ACTORS• Having „shallow‟ long-tail content on wide universe is very useful• Unique browser pool since we don‟t buy or sell
BRAND ACTIONS:CLICKS, CONVERSIONS AND SITE VISITS • Purchase conversions are rare • Clicks and Site Visits are common Correlation between actions Clicks and Purchase• Clicks and Purchases are basically uncorrelated!• Site visits are much better indicators of purchase conversion Site Visits and Purchase
WHAT NOT TO DO: USE CLICKS AS ASURROGATE FOR CONVERSIONS AUC: TRAIN PURCHASE and EVAL PURCHASE AUC: TRAIN CLICK and EVAL PURCHASE
BUT SITE VISITS ARE A GREATSURROGATE SITE VISITS 0.8 AUC: TRAIN SV and EVAL PURCHASE 0.7 auc_sv[auc_sv > 0.4] 0.6 0.5 0.4 0.4 0.5 0.6 0.7 0.8 AUC: TRAIN PURCHASE and EVAL PURCHASE auc_p[auc_p > 0.4]
OFTEN, SITE VISITS IS BETTER THAN CONVERSIONS FOR PREDICTING CONVERSIONS WHEN THERE ARE RELATIVELY FEW CONVERSIONS BUBBLE SIZE REPRESENTS #SV/#PURCHASERS 15% 10% PURCHASENormalized difference in AUC 5% BETTER 0% 0 1 2 3 4 5 6 7 8 9 -5% SV BETTER -10% -15% -20% N umber of Purchasers - Log Scale
“PRIVACY” ONLINE?Where would we like firms to operate on the spectrum between the twounacceptable extremes: “We can do whatever “You can’t do anything we want with whatever with MY data!” data we can get our hands on.” Are there points between the extremes that give us acceptable tradeoffs between “privacy” and efficacy? ML PROVIDES MANY POSSIBILITIES. 32
SOME M6D While our targeting is based on social science, we areTHOUGHTS not collecting a true social networkON PRIVACY There is NO link to the „real you‟ • What we „know‟ about you has no meaning in reality • Double-anonymization We collect a very thin but wide layer of information We are part of the IAB: Interactive Advertising Bureau, the driving force of self-regulation We fall into the category of „third party cookies‟ You can opt out (see our homepage) We do not sell data or derivatives
SUMMARY Many exciting opportunities for analytics in the space of advertisement Machine learning can be the basis for EFFECTIVE, PRIVACY- FRIENDLY targeting in online advertising Important to consider carefully the target used for training models
REFERENCES Audience Selection for On-Line Brand Advertising: Privacy Friendly Social Network Targeting – White paper published in 2009 SigKDD Conference Proceedings. Winner of 2009 INFORMS ISS Design Science Award. Predicting Conversions for Targeting On-Line Advertising Using Social Variables Constructed from User-Generated Content - working white paper for Marketing Science Journal. (Privacy Friendly!) Social Network Targeting for Online Advertising – Invited talk at 2010 SigKDD, presented by Foster Provost.