Machine learning on big data for personalized Internet advertising

Michael Recce discusses how advertising works and what algorithms Quantcast uses to analyze large amounts of data in order to find out what people are interested in.

http://www.infoq.com/presentations/Machine-Learning-on-Big-Data-for-Personalized-Internet-Advertising

Published in: Data & Analytics

  1. Machine Learning on Big Data for Personalized Advertising. M. Recce, QCon, 11/18/2011. © 2011 Quantcast. All Rights Reserved.
  2. Advertising has long wanted better algorithms. “Half the money I spend on advertising is wasted; the trouble is I don't know which half.” John Wanamaker, “The Father of Modern Advertising”
  3. Outline • Internet advertising (the business) • Internet advertising (the data) • Understanding consumers (the models) • Organizing for success
  4. The Personalized Media Economy. Media is transitioning from a “one size fits all” broadcast model to dynamic real-time choice. [Figure: Online Advertising Ecosystem]
  5. Money Follows Media Consumption. Globally, hundreds of billions of dollars of ad spend will shift. [Figure label: $30B opportunity]
  6. Why the Spending Disparity? • Media spend processes are well established • New media channels lag until audiences and value can be properly quantified • Historically, digital audiences were poorly quantified – Stratified sampling has been the norm in media measurement for decades – Bias and sampling error prevail
  7. Enter Quantcast • Launched September 2006 to enable addressable advertising at scale • First we had to fix audience measurement • Launched a free service based on direct measurement of media consumption • Use machine learning to infer audience characteristics
  8. Broad Participation. World’s Favorite Audience Measurement Service
  9. An Advertising Data Explosion • Massive expansion in number of decisions – Individuals, not whole audiences – Impressions, not whole sites – Screens/times/locations/… • Decision timeframe reduced from weeks to milliseconds • This problem can only be solved algorithmically
  10. Data Rich Environment: WHOLE LOT OF DATA! • 4 billion cookies observed per month • 400,000+ events per second (real-time transactions) • 600+ billion events per month (media consumption) • 1.3 billion global users • 240 million U.S. users (“everyone”) • 800x average observations per person per month • 5 petabytes of data processed per day • 100+ million destinations with QC tags
  11. Rise of Real-Time Audience Targeting. “…let advertisers buy ads in the milliseconds between the time someone enters a site’s Web address and the moment the page appears. The technology, called real-time bidding, allows advertisers to examine site visitors one by one and bid to serve them ads almost instantly…A consumer would barely notice the shift, except that ads might seem more relevant to exactly what they are shopping for.” - New York Times, March 12. More relevant ads, more effective campaigns, higher inventory utilization & higher CPMs
  12. RTB – A Rapid & Transformational Industry Shift. [Figure: Quantcast Auction Volume (UK & US), billions of auctions per day] Feb ‘10 300M • Apr ‘10 400M • Jul ‘10 800M • Oct ‘10 1.2B • Jan ‘11 2.0B • Apr ‘11 3.2B • Jul ‘11 5.4B • Sep ‘11 7.2B
  13. Media Buying & Execution is Changing. 2005 ($200B) → Now ($200B) • Transaction: Buy Whole Sites → Real-Time Bidding • Supply Portfolio: 100 Publishers → 100’s of 1000’s Impressions/Second • Data/Tools: Aggregate Report, Human Analysis → Petascale Computing + Machine Learning
  14. Data Mining Challenges • Audience Estimation: Using reference data from a small number of people and a small number of web sites, infer the demographics/attributes of the audience of all sites. • User Estimation: Using media consumption records and audience estimates, determine the characteristics of an Internet user across arbitrary dimensions. • Lookalike Selection: From the behavior of a small number of buyers of a product, determine the set of people who will buy it next. • Live Traffic Modeling: Compute the value for showing an advertisement to a user as a function of the user, advertising environment, time of day, etc.
  15. Quantcast Lookalikes for Marketers. Revolutionary Ad Targeting for Performance and Brand. (1) Understand marketer’s BEST CUSTOMERS with Quantcast Measurement (2) Isolate DISTINCTIVE INTERESTS (3) Find MILLIONS OF LOOKALIKES (4) Reach them ANYWHERE. PERFORMANCE LOOKALIKES • Quantcast technology continually optimizes real-time media for advertiser. BRAND LOOKALIKES • Buy custom audiences from trusted media partners. [Figure label: Your Site Traffic]
  16. Lookalike Selection • Given an archetype group of users, find the feature set that best separates them from their complement • Features can be positive or negative indicators of content relevance • Find more that look like them
  17. Problem Statement • Math competition • Largest number of “conversions” (purchasers) during contest “wins” • Leverage information on prior purchasers to find more • Decide how to compete • Bring mathematicians • More data on each converter • Management by metrics • Know what the competitors are doing
  18. Lookalike Mass-Production Pipeline. [Diagram labels: Model • 500 TB • 1000s of Concurrent Models • Trained Models • Scoring • 10M Potential Converters • 1.3 Billion Internet Users • 20 TB / Day • Multi PB • Training • 10,000 Converters • Model Configuration]
  19. Lookalikes Identify Consumers that Will Take Action. (1) Start with consumers who purchased (2) Select consumers who didn’t purchase (3) Evaluate world’s largest database of human interests (4) Identify Positive & Negative indicators of purchase (5) If a new consumer looks more like a purchaser than a non-purchaser, they’re a Lookalike. [Figure panels: Consumers who purchased product; Consumers who did not purchase product; horizontal axis in days]
  20. Wide Range of Activity. Websites, keywords, geo-location, ads and more. [Figure label: Conversion Event]
  21. RTLAL Bidding Architecture. [Diagram labels: Model Definition • Pixel Data • Real Time Ad Exchange • Model Training and Scoring • Auction Mgmt • Bidding]
  22. Activity Level Variations
  23. Cookie Deletion Rates
  24. Media consumption is non-stationary. [Figure: ‘Michael Jackson’ Media Consumption, June 25, 2009; pages consumed per minute, 13:00 to 19:00]
  25. Choose the Right Objective! Clicks don’t always lead to conversions. The right metric is critical! [Figure: Indexed Click vs. Conversion Rates]
  26. Machines. High Performance Platform • Multiple Global Datacenters • Ultra-high availability with advanced traffic management • 450,000 / Second real-time events • 5PB / Day processing throughput
  27. Math Team Environment. Collaboration • Regular brainstorming • Group review meetings • Shared wiki environment • Team goals. Independence • Everyone free to implement their own ideas • Improved models • Better metrics • Visualization methods, etc.
  28. Measuring Lift – ROC
  29. Cumulative Lift
  30. Learning ∝ experimentation. [Timeline labels, in extracted order: To process 100TB with first MapReduce job • 6 Hours • 2 Days • Mins • New model development • New model in production • Hours • Live performance assessment • 2 Weeks • To influence billions of real-time decisions every day and millions of dollars of advertising spend]
  31. Technology Matters. Leaders will be world-class in every discipline, and will operate all as a fully integrated whole. • Machine Learning & Optimization • Comprehensive Coherent Data • Petascale Big-Data Computing • Real-Time Tech Mastery
  32. If you have all that then.... Having more Data really matters.
  33. Numerous Open Challenges • Dealing with sparsity • Feature selection • Real-time scoring and bidding • ‘True’ performance & attribution modeling • Lift, lift and more lift! • Handling 100,000’s of concurrent models
  34. Summary • Digital advertising is a vast analytical environment – Enormous data volumes – Rich behaviors – Objective performance metrics • Marketing will be transformed by computational approaches • Hundreds of billions of dollars of spend are at stake
  35. Quantcast
  36. Contact: mrecce@quantcast.com
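
The lookalike slides describe starting from purchasers and non-purchasers, identifying positive and negative indicators of purchase, and calling a new consumer a Lookalike when they look more like a purchaser than a non-purchaser. A minimal sketch of that idea follows. It is not Quantcast's method, which the slides do not specify: the smoothed log-odds weighting, the function names, and the representation of a user as a set of feature strings are all assumptions made for illustration.

```python
import math
from collections import Counter


def feature_weights(purchasers, non_purchasers, alpha=1.0):
    """Return a smoothed log-odds weight per feature (assumed weighting scheme).

    A weight above 0 marks a positive indicator of purchase, below 0 a
    negative one. Each user is a set of feature strings (hypothetical
    representation). `alpha` is additive smoothing so unseen counts do not
    produce log(0).
    """
    pos = Counter(f for user in purchasers for f in set(user))
    neg = Counter(f for user in non_purchasers for f in set(user))
    n_pos, n_neg = len(purchasers), len(non_purchasers)
    weights = {}
    for f in set(pos) | set(neg):
        p = (pos[f] + alpha) / (n_pos + 2 * alpha)  # share of purchasers with f
        q = (neg[f] + alpha) / (n_neg + 2 * alpha)  # share of non-purchasers with f
        weights[f] = math.log(p / q)
    return weights


def lookalike_score(user, weights):
    """Sum the weights of the features this user shows; unknown features add 0."""
    return sum(weights.get(f, 0.0) for f in set(user))


def is_lookalike(user, weights):
    """True when the user looks more like a purchaser than a non-purchaser."""
    return lookalike_score(user, weights) > 0.0
```

A feature that is equally common in both groups gets weight 0 under this scheme, so it neither helps nor hurts a candidate; only features that separate the archetype group from its complement move the score.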

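
The slides titled "Measuring Lift" and "Cumulative Lift" survive only as headings, so the sketch below uses the common textbook definition rather than anything confirmed by the deck: rank users by model score, then divide the conversion rate among the top k users by the overall conversion rate. The function name and the input layout (parallel lists of scores and 0/1 conversion flags) are assumptions.

```python
def cumulative_lift(scores, converted):
    """Return cumulative lift at each depth k = 1..n (textbook definition).

    scores:    model scores, higher meaning more likely to convert
    converted: 0/1 flags in the same order as `scores`
    Lift at depth k is (conversion rate in the top k by score) divided by
    (overall conversion rate).
    """
    if len(scores) != len(converted) or not scores:
        raise ValueError("scores and converted must be non-empty and equal length")
    overall = sum(converted) / len(converted)
    if overall == 0:
        raise ValueError("lift is undefined when there are no conversions")
    # Sort by score only, highest first; ties keep their input order.
    ranked = sorted(zip(scores, converted), key=lambda pair: pair[0], reverse=True)
    lifts = []
    hits = 0
    for k, (_, flag) in enumerate(ranked, start=1):
        hits += flag
        lifts.append((hits / k) / overall)
    return lifts
```

By construction the final value is 1.0, since the top n users are the whole population; a useful model shows values well above 1.0 at small depths.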