Predictive Modeling 
Claudia Perlich, 
Chief Scientist 
@claudia_perlich
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/display-advertising-big-data
Presented at QCon New York 
www.qconnewyork.com 
Targeted Online Display Advertising
Predictive Modeling: 
Algorithms that Learn Functions
P(Buy|Age,Income) 
Estimating conditional probabilities 
[Figure: scatter plot of Income against Age with points labeled "Buy" and "Not interested"; marks at 50K (Income) and 45 (Age)]
Logistic Regression 
p(+|x) = 1 / (1 + exp(-(β0 + β1·x1 + β2·x2 + …)))
β0 = 3.7 
β1 = 0.00013 
p(buy|37,78000) = 0.48
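A minimal sketch of how a logistic regression turns features such as age and income into a conditional probability. The intercept and weights below are hypothetical values chosen only for illustration; they are not the coefficients shown on the slide.

```python
import math


def logistic_probability(intercept, weights, features):
    """Return p(+|x) for a logistic model with the given coefficients."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients (assumptions, NOT the slide's values):
# one weight per year of age and one per dollar of income.
intercept = -4.0
weights = [0.02, 0.00004]

# Probability of buying for a 37-year-old with an income of 78,000.
p_buy = logistic_probability(intercept, weights, [37, 78000])
print(round(p_buy, 3))
```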
Scale:
• 200 Million browsers (cookies)
• 10 Million URLs
• 20 Billion bid requests per day (Ad Exchange)
• conversion: shopping at one of our campaign sites
Questions:
• Who should we target for a marketer?
• Where should we advertise and at what price?
• Does the ad have a causal effect?
• What data should we pay for?
• Attribution?
• What requests are fraudulent?
Our Browser Data: Agnostic 
A consumer’s online/mobile activity gets recorded like this:
The Non-Branded Web: Browsing History, hashed URLs:
date1 abkcc
date2 kkllo
date3 88iok
date4 7uiol
…
The Branded Web: Brand Event, encoded:
date1 3012L20
date2 4199L30
…
date n 3075L50
I do not want to ‘understand’ who you are …
The Heart and Soul 
Targeting 
Model 
P(Buy|URL,inventory,ad)
• Predictive modeling on hashed browsing history
• 10 Million dimensions for URLs (binary indicators)
• extremely sparse data
• positives are extremely rare
How can we learn from 10M features with no/few positives?
• We cheat.
In ML, cheating is called “Transfer Learning”
The heart and soul 
Targeting 
Model P(Buy|URL,inventory,ad)
• Has to deal with the 10 Million URLs
• Need to find more positives!
Experiment 
Data
• Randomized targeting across 58 different large display ad campaigns.
• Served ads to users with active, stable cookies
• Targeted ~5000 random users per day for each marketer. Campaigns ran for 1 to 5 months, with between 100K and 4 million impressions per campaign
• Observed outcomes: clicks on ads, post-impression (PI) purchases (conversions)
Targeting
• Optimize targeting using Click and PI Purchase
• Technographic info and web history as input variables
• Evaluate each separately trained model on its ability to rank order users for PI Purchase, using AUC (Mann-Whitney-Wilcoxon statistic)
• Each model is trained/evaluated using Logistic Regression
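The evaluation metric above can be sketched directly from its Mann-Whitney-Wilcoxon definition: AUC is the fraction of (positive, negative) pairs in which the positive user gets the higher score, with ties counted as half. The scores and labels below are made-up illustration data, and the quadratic pair loop is only meant for small examples.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores and purchase labels for six users.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
auc = auc_mann_whitney(scores, labels)
print(auc)
```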
Predictive performance* (AUC) for purchase learning
[Figure: AUC (axis from 0.2 to 0.8) of models trained on clicks vs. models trained on purchases] [Dalessandro et al. 2012]
*Restricted feature set used for these modeling results; qualitative conclusions generalize
Predictive performance* (AUC) for click learning
[Figure: AUC (axis from 0.2 to 0.8) of models trained on clicks vs. models trained on purchases; evaluated on predicting purchases (AUC in the target domain)] [Dalessandro et al. 2012]
*Restricted feature set used for these modeling results; qualitative conclusions generalize
Clickers in the Dark 
Top 10 Apps by CTR
Predictive performance* (AUC) for Site Visit learning
[Figure: AUC distribution (axis from 0.2 to 1) of models trained on clicks, on site visits, and on purchases; evaluated on predicting purchases (AUC in the target domain)] [Dalessandro et al. 2012]
Significantly better targeting training on source task
The heart and soul 
Targeting 
Model 
P(Buy|URL,inventory,ad) 
Organic: P(SiteVisit|URLs)
• Has to deal with the 10 Million URLs
• Transfer learning:
• Use all kinds of site visits instead of new purchases
• Biased sample in every possible way to reduce variance
• Negatives are ‘everything else’
• Pre-campaign without impression
• Stacking for transfer learning
MLJ 2014
Logistic regression in 10 Million dimensions
• Stochastic Gradient Descent
• L1 and L2 constraints
• Automatic estimation of optimal learning rates
• Bayesian empirical industry priors
• Streaming updates of the models
• Fully automated: ~10,000 models per week
KDD 2014 
Targeting 
Model 
p(sv|urls) =
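A minimal sketch of stochastic gradient descent for logistic regression on sparse binary URL indicators, under simplifying assumptions: a fixed learning rate rather than automatically estimated rates, an L2 penalty applied only to the features active in each example, and no L1 term, priors, or streaming. The hashed URL tokens and labels are made up; this is not the production system described in the talk.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, active_urls):
    """p(sv|urls): only URLs present in the history contribute (binary features)."""
    return sigmoid(bias + sum(weights.get(u, 0.0) for u in active_urls))


def sgd_train(examples, epochs=50, learning_rate=0.1, l2=1e-4, seed=0):
    """Train on (set_of_hashed_urls, label) pairs, one example at a time."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for active_urls, label in data:
            error = predict(weights, bias, active_urls) - label
            bias -= learning_rate * error
            for u in active_urls:
                w = weights.get(u, 0.0)
                # Gradient of the log loss plus the L2 penalty for this weight.
                weights[u] = w - learning_rate * (error + l2 * w)
    return weights, bias


# Hypothetical browsing histories (hashed URLs) with site-visit labels.
examples = [
    ({"abkcc", "kkllo"}, 1), ({"abkcc"}, 1), ({"kkllo"}, 1),
    ({"88iok"}, 0), ({"7uiol", "88iok"}, 0), ({"7uiol"}, 0),
]
weights, bias = sgd_train(examples)
print(predict(weights, bias, {"abkcc"}), predict(weights, bias, {"88iok"}))
```

Storing weights in a dictionary keyed by hashed URL means memory grows with the URLs actually seen, not with the nominal 10 million dimensions.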
Dimensionality Reduction 
• There are a few obvious options for dimensionality reduction.
• Hashing: Run each URL through a hash function, and spit out a specified number of buckets.
• Categorization: We had both free and commercial website category data. Binary URL space → binary category space.
www.baseball-reference.com → Sports/Baseball/Major_League/Statistics
• SVD: Singular Value Decomposition in Mahout to transform large, sparse feature space into small dense feature space.
© 2013 Media6Degrees. All Rights Reserved. Proprietary and Confidential
www.dmoz.org
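The hashing option can be sketched as follows. The choice of SHA-256 and the bucket count are assumptions for illustration (4,318 mirrors the feature count reported for the hashed challenger later in the deck); the talk does not say which hash function was used.

```python
import hashlib


def bucket_for(url, num_buckets):
    """Map a URL to one of num_buckets buckets with a deterministic hash."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hashed_features(urls, num_buckets):
    """Binary feature set: the buckets hit by at least one URL in the history."""
    return {bucket_for(u, num_buckets) for u in urls}


NUM_BUCKETS = 4318  # assumed bucket count
history = ["www.baseball-reference.com", "www.dmoz.org"]
features = hashed_features(history, NUM_BUCKETS)
print(sorted(features))
```

Unrelated URLs can collide in the same bucket, which is the price paid for a fixed-size feature space.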
Algorithm: Intuition & Multitasking 
• Hierarchical clustering in the space of model parameters.
  - Naïve Bayes(ish) model: It’s not a bug, it’s a feature!
• Distance function: Pearson Correlation
• Cutting the dendrogram:
  - Most algorithms cut the tree at a specific “height” in order to produce a desired number of clusters.
  - In our case, we need clusters with sufficient representation in the data.
  - Recursively traverse the tree and cut when we reach a certain minimum popularity.
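One possible reading of the popularity-based cut, as a sketch: walk down the dendrogram and stop splitting a node as soon as one of its children would fall below the minimum popularity. The `Node` class, the popularity counts, and the exact stopping rule are assumptions for illustration; the slide does not spell out these details.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    popularity: int  # assumed measure, e.g. browsers seen on URLs in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def cut_by_popularity(node: Node, min_popularity: int) -> List[str]:
    """Return cluster names; keep splitting only while every child stays popular enough."""
    children = [c for c in (node.left, node.right) if c is not None]
    if not children or any(c.popularity < min_popularity for c in children):
        return [node.name]  # cut here: this subtree becomes one cluster
    clusters: List[str] = []
    for child in children:
        clusters.extend(cut_by_popularity(child, min_popularity))
    return clusters


# Hypothetical dendrogram.
tree = Node("root", 1000,
            left=Node("A", 700, left=Node("A1", 400), right=Node("A2", 300)),
            right=Node("B", 300, left=Node("B1", 250), right=Node("B2", 50)))
clusters = cut_by_popularity(tree, min_popularity=200)
print(clusters)
```

Unlike a fixed-height cut, this yields clusters at different depths: well-represented branches are split further, while sparse ones stay merged.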
Results 
[Figure: clusters labeled Kids, Health, Home, News, Games & Videos, Home]
Experiments 
• We built models off data from 28 campaigns. 
• Our production cluster definitions have 4,318 features. 
• We tried to get each of the “challengers” as close to this as 
we possibly could. 
• We evaluate on Lift (5%) and AUC. 
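Lift at 5% can be sketched as the conversion rate among the top-scored 5% of users divided by the overall conversion rate. The scores and labels below are made up so that the result is easy to check by hand.

```python
def lift_at(scores, labels, fraction=0.05):
    """Positive rate in the top `fraction` of users by score, relative to the base rate."""
    n = len(scores)
    base_rate = sum(labels) / n
    if base_rate == 0:
        raise ValueError("lift is undefined without any positives")
    k = max(1, int(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


# Hypothetical data: 40 users with strictly decreasing scores and 4 converters.
scores = [1.0 - i / 40 for i in range(40)]
labels = [0] * 40
for idx in (0, 10, 20, 30):
    labels[idx] = 1

lift = lift_at(scores, labels)
print(lift)
```

Here the top 5% is two users, one of whom converts (50%), against a 10% base rate, so the lift is about 5.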
Results
Method        Average Lift (5%)  Average Relative Perf.  Win  Loss  Tie  Features
Cluster       4.024              100%                    -    -     -    4,318
SVD           3.539              86.0%                   4    20    4    1,000
Hash          3.035              70.0%                   1    26    1    4,318
Commercial    3.195              71.3%                   2    24    2    1,183
Free Context  3.643              84.4%                   1    17    10   5,984
To reduce or not to reduce? 
Conclusions 
• We use the cluster based models for some 
things 
• Targeting is still using high-dimensional 
models whenever possible 
Real-time Scoring of a User
[Figure: a user’s ProspectRank score over time relative to a Threshold, across OBSERVATION and ENGAGEMENT phases; markers for site visits with positive correlation, site visits with negative correlation, ads, and a purchase]
Some prospects fall out of favor once their in-market indicators decline.
What exactly is Inventory? 
© 2012 Media6Degrees. All Rights Reserved. Proprietary and Confidential 
Where the ad will be shown: 
7K unique inventories + default 
buckets
Example of Model Scores for Hotel Campaign 
• Scores are calculated on 
de-duplicated training 
pairs (i,s) 
• We even integrate out s 
• Nicely centered around 1 
Bidding Strategies 
Strategy 0 – do nothing special: 
• always bid base price for segment 
• equivalent to constant score of 1 across all inventories 
• consistent with an uninformative inventory model 
Strategy 1 – minimize CPA:
• auction-theoretic view: bid what it is worth in relative terms
• multiply the base price by the model ratio
Strategy 2 – maximize conversion rate:
• optimal performance is not to bid what it is worth but to trade off value for quality and only bid on the best opportunities
• apply a step function to the model ratio to translate it into a factor applied to the price:
  - ratios below 0.8 yield a bid price of 0 (so not bidding)
  - ratios between 0.8 and 1.2 are set to 1 (bid the base price)
  - ratios above 1.2 bid twice the base price
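The three bidding strategies can be sketched as functions of the base price and the inventory model ratio. The function names are hypothetical, and treating the 0.8 and 1.2 boundaries as part of the middle band is an assumption, since the slide does not say which side they belong to.

```python
def bid_strategy_0(base_price, ratio):
    """Do nothing special: always bid the base price (constant score of 1)."""
    return base_price


def bid_strategy_1(base_price, ratio):
    """Minimize CPA: bid what the opportunity is worth in relative terms."""
    return base_price * ratio


def bid_strategy_2(base_price, ratio):
    """Maximize conversion rate: step function over the model ratio."""
    if ratio < 0.8:
        return 0.0               # do not bid
    if ratio <= 1.2:
        return base_price        # factor 1
    return 2.0 * base_price      # factor 2


base_price = 2.0  # hypothetical base price for the segment
for ratio in (0.5, 1.0, 1.5):
    print(ratio,
          bid_strategy_0(base_price, ratio),
          bid_strategy_1(base_price, ratio),
          bid_strategy_2(base_price, ratio))
```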
Results
[Figure: CR Index, CPM Index, and CPA Index for Strat 1 and Strat 2; index axis from 0.50 to 1.40]
Annotations: “Increased CR, same CPM = Free Lunch!” and “Increased CR, but higher CPM. Lowest CPA.”
Both lowered CPA. Optimal decision making depends on long vs short term thinking (note: we chose long term, thus Strategy 2).
Lift over random for 66 campaigns for online display ad prospecting
[Figure: lift over baseline per campaign (“NN Lift over RON”, axis from 0 to 25) and total impressions (axis from 0 to 6.0M); median lift = 5x]
Note: the top prospects are consistently rated as being excellent compared to alternatives by advertising clients’ internal measures, and when measured by their analysis partners (e.g., Nielsen): high ROI, low cost-per-acquisition, etc.
Relative Performance to Third Party
Measuring causal effect? 
A/B Testing 
Practical concerns 
Estimate causal effects from observational data
• Using targeted maximum likelihood estimation (TMLE) to estimate causal impact
• Can be done ex-post for different questions
• Need to control for confounding
• Data has to be ‘rich’ and cover all combinations of confounding and treatment
ADKDD 2011
E[Y_{A=ad}] − E[Y_{A=no ad}]
An important decision… 
I think she is hot! 
Hmm – so what should I write 
to her to get her number?
Source: OK Trends 
Hardships of causality. 
Beauty is Confounding
Beauty determines both the probability of getting the number and the probability that James will say “You are beautiful.”
We need to control for the actual beauty, or it can appear that making compliments is a bad idea.
Hardships of causality. 
Targeting is Confounding 
We only show ads to people we know are more likely to convert (ad or not)
[Figure: conversion rates, SAW AD vs. DID NOT SEE AD]
Need to control for confounding
Data has to be ‘rich’ and cover all combinations of confounding and treatment
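To illustrate why targeting confounds the naive comparison, here is a sketch that uses plain stratification (adjusting within confounder strata and averaging by stratum size). It is a much simpler stand-in for TMLE, not the method from the talk, and all counts are invented.

```python
# Hypothetical counts: (stratum, saw_ad) -> (users, conversions).
counts = {
    ("likely buyer", True): (800, 80),
    ("likely buyer", False): (200, 16),
    ("unlikely buyer", True): (200, 4),
    ("unlikely buyer", False): (800, 8),
}


def naive_effect(counts):
    """Difference in conversion rate between exposed and unexposed, ignoring strata."""
    totals = {True: [0, 0], False: [0, 0]}
    for (_, saw_ad), (users, conversions) in counts.items():
        totals[saw_ad][0] += users
        totals[saw_ad][1] += conversions
    return totals[True][1] / totals[True][0] - totals[False][1] / totals[False][0]


def stratified_effect(counts):
    """Average of within-stratum differences, weighted by stratum size."""
    strata = {stratum for stratum, _ in counts}
    total_users = sum(users for users, _ in counts.values())
    effect = 0.0
    for stratum in strata:
        # Fails with KeyError if a stratum lacks either arm, which is the
        # "cover all combinations of confounding and treatment" requirement.
        users_ad, conv_ad = counts[(stratum, True)]
        users_no, conv_no = counts[(stratum, False)]
        weight = (users_ad + users_no) / total_users
        effect += weight * (conv_ad / users_ad - conv_no / users_no)
    return effect


print(naive_effect(counts), stratified_effect(counts))
```

In this toy data the naive difference is 6 percentage points, while the adjusted estimate is 1.5, because ads were mostly shown to users who were likely to buy anyway.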
Observational Causal Methods: TMLE 
Negative Test: wrong ad 
Positive Test: A/B comparison
Some creatives do not work … 
Data Quality in Exchanges 
Fraud 
KDD 2013
Ensure location quality before using it 
Almost 30% of users with more than one location 
travel faster than the speed of sound
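A sketch of such a location sanity check: compute the great-circle distance between two consecutive observations of the same user and flag the pair if the implied speed exceeds the speed of sound. The observation format `(timestamp_seconds, latitude, longitude)` and the constants are assumptions for illustration.

```python
import math

SPEED_OF_SOUND_KMH = 1235.0  # approximate value at sea level
EARTH_RADIUS_KM = 6371.0     # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_supersonic(obs_a, obs_b):
    """True if moving between the two observations implies faster-than-sound travel."""
    (t1, lat1, lon1), (t2, lat2, lon2) = obs_a, obs_b
    distance = haversine_km(lat1, lon1, lat2, lon2)
    hours = abs(t2 - t1) / 3600.0
    if hours == 0:
        return distance > 0
    return distance / hours > SPEED_OF_SOUND_KMH


new_york = (0, 40.7128, -74.0060)
los_angeles = (3600, 34.0522, -118.2437)   # one hour later
boston = (4 * 3600, 42.3601, -71.0589)     # four hours later
print(is_supersonic(new_york, los_angeles), is_supersonic(new_york, boston))
```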
Unreasonable Performance Increase Spring 12 
[Figure: Performance Index over time; annotations “2 weeks” and “2x”]
Oddly predictive websites?
36% of traffic is Non-Intentional
[Figure: 6% in 2011, 36% in 2012]
Traffic patterns are ‘non-human’
[Figure: website 1 and website 2, labeled “50%”]
Data from Bid Requests in Ad-Exchanges
WWW 2010 
Node: hostname
Edge: 50% co-visitation
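A sketch of building such a co-visitation network from (browser, hostname) observations. It assumes a directed edge from hostname A to hostname B whenever at least 50% of the browsers seen on A were also seen on B; the slide only states "50% co-visitation", so that direction and the example hostnames are assumptions.

```python
from collections import defaultdict
from itertools import permutations


def covisitation_edges(visits, threshold=0.5):
    """Return directed edges (a, b) where >= threshold of a's browsers also visited b."""
    browsers_by_host = defaultdict(set)
    for browser, host in visits:
        browsers_by_host[host].add(browser)
    edges = set()
    for a, b in permutations(browsers_by_host, 2):
        shared = len(browsers_by_host[a] & browsers_by_host[b])
        if shared / len(browsers_by_host[a]) >= threshold:
            edges.add((a, b))
    return edges


# Hypothetical bid-request observations: (browser id, hostname).
visits = [
    ("b1", "site-a.example"), ("b1", "site-b.example"),
    ("b2", "site-a.example"), ("b2", "site-b.example"),
    ("b3", "site-a.example"),
    ("b4", "news.example"),
    ("b5", "news.example"), ("b5", "site-a.example"),
]
edges = covisitation_edges(visits)
print(sorted(edges))
```

The pairwise loop is quadratic in the number of hostnames, so it only suits small illustrations. Dense clusters of mutually linked hostnames are the kind of pattern that would look suspicious in such a graph.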
Boston Herald
womenshealthbase?
WWW 2012
Now it is also coming to brands
• ‘Cookie Stuffing’ increases the value of the ad for retargeting
• Messes up Web analytics …
• Messes up my models because a botnet is easier to predict than a human
Fraud pollutes my models 
• Don’t show ads on those sites 
• Don’t show ads to a hijacked browser
• Need to remove the visits to the fraud sites
• Need to remove the fraudulent brand visits
When we see a browser caught up in fraudulent activity: send it to the penalty box, where we ignore all its actions
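The penalty box can be sketched as a filter over the event log: any browser seen on a flagged site is boxed, and all of its events are dropped before they reach model training. The event format and site names are hypothetical, and a real system would presumably release browsers from the box after some time, which this sketch omits.

```python
def apply_penalty_box(events, fraud_sites):
    """Drop every event from browsers that touched a flagged site.

    events: list of (browser_id, hostname) pairs.
    Returns (clean_events, penalty_box).
    """
    penalty_box = {browser for browser, host in events if host in fraud_sites}
    clean_events = [(b, h) for b, h in events if b not in penalty_box]
    return clean_events, penalty_box


# Hypothetical event log.
events = [
    ("b1", "news.example"),
    ("b2", "fraud-site.example"),
    ("b2", "brand.example"),   # brand visit from a boxed browser: ignored too
    ("b3", "brand.example"),
]
clean_events, penalty_box = apply_penalty_box(events, {"fraud-site.example"})
print(clean_events, penalty_box)
```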
Using the penalty box: all back to normal 
[Figure: Performance Index during 3 more weeks in spring 2012]
[Figure: website 1, labeled “50%”]
Somebody is posing as nytimes.com
Bottom line
It is all a question of how good you are at cheating! 
And that you can catch the bad guys at cheating …
On a personal note
claudia.perlich@gmail.com
Some References
1. B. Dalessandro, F. Provost, R. Hook. Audience Selection for On-Line Brand Advertising: Privacy Friendly Social Network Targeting. KDD 2009
2. O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Estimating the Effect of Online Display Advertising on Browser Conversion. ADKDD 2011
3. C. Perlich, O. Stitelman, B. Dalessandro, T. Raeder, F. Provost. Bid Optimizing and Inventory Scoring in Targeted Online Advertising. KDD 2012 (Best Paper Award)
4. T. Raeder, O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Design Principles of Massive, Robust Prediction Systems. KDD 2012
5. B. Dalessandro, O. Stitelman, C. Perlich, F. Provost. Causally Motivated Attribution for Online Advertising. In Proceedings of KDD, ADKDD 2012
6. B. Dalessandro, R. Hook, C. Perlich, F. Provost. Transfer Learning for Display Advertising. MLJ 2014
7. T. Raeder, C. Perlich, B. Dalessandro, O. Stitelman, F. Provost. Scalable Supervised Dimensionality Reduction Using Clustering. KDD 2013
8. O. Stitelman, C. Perlich, B. Dalessandro, R. Hook, T. Raeder, F. Provost. Using Co-visitation Networks for Classifying Non-Intentional Traffic. KDD 2013
Watch the video with slide synchronization on 
InfoQ.com! 
http://www.infoq.com/presentations/display-advertising- 
big-data

More Related Content

Viewers also liked

Pruevas medio ambiente
Pruevas medio ambientePruevas medio ambiente
Pruevas medio ambientesneidertavera
 
Hero 131112074510-phpapp02 (1)
Hero 131112074510-phpapp02 (1)Hero 131112074510-phpapp02 (1)
Hero 131112074510-phpapp02 (1)Mahesh Relangi
 
Effects of Gem Stones
Effects of Gem StonesEffects of Gem Stones
Effects of Gem StonesLoay Ghazaleh
 
Unidad 3 estetica-1-espana_b
Unidad 3 estetica-1-espana_bUnidad 3 estetica-1-espana_b
Unidad 3 estetica-1-espana_bEbrocap Ltda
 
Preparacion de la Piel
Preparacion de la PielPreparacion de la Piel
Preparacion de la Pielgchapperon
 
Como desmaquillar las pestañas
Como desmaquillar las pestañasComo desmaquillar las pestañas
Como desmaquillar las pestañascomomaquillarse
 
Flora intestinal y Probióticos: mitos y realidades
Flora intestinal y Probióticos: mitos y realidadesFlora intestinal y Probióticos: mitos y realidades
Flora intestinal y Probióticos: mitos y realidadesDaniel Fuentes
 

Viewers also liked (11)

Pruevas medio ambiente
Pruevas medio ambientePruevas medio ambiente
Pruevas medio ambiente
 
Objek wisata air terjun terujak
Objek wisata air terjun terujakObjek wisata air terjun terujak
Objek wisata air terjun terujak
 
Hero 131112074510-phpapp02 (1)
Hero 131112074510-phpapp02 (1)Hero 131112074510-phpapp02 (1)
Hero 131112074510-phpapp02 (1)
 
Blue Sky Thinking
Blue Sky ThinkingBlue Sky Thinking
Blue Sky Thinking
 
Effects of Gem Stones
Effects of Gem StonesEffects of Gem Stones
Effects of Gem Stones
 
Flora comensal
Flora comensal Flora comensal
Flora comensal
 
Unidad 3 estetica-1-espana_b
Unidad 3 estetica-1-espana_bUnidad 3 estetica-1-espana_b
Unidad 3 estetica-1-espana_b
 
Preparacion de la Piel
Preparacion de la PielPreparacion de la Piel
Preparacion de la Piel
 
Ras piでrt linux
Ras piでrt linuxRas piでrt linux
Ras piでrt linux
 
Como desmaquillar las pestañas
Como desmaquillar las pestañasComo desmaquillar las pestañas
Como desmaquillar las pestañas
 
Flora intestinal y Probióticos: mitos y realidades
Flora intestinal y Probióticos: mitos y realidadesFlora intestinal y Probióticos: mitos y realidades
Flora intestinal y Probióticos: mitos y realidades
 

More from C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

More from C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Recently uploaded

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men

Weathering the data storm

  • 1. Predictive Modeling Claudia Perlich, Chief Scientist @claudia_perlich
  • 2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/display-advertising-big-data InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 5. Predictive Modeling: Algorithms that Learn Functions
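  • 6. P(Buy|Age,Income): Estimating conditional probabilities. [Figure: Income versus Age, with regions labeled "Buy" and "Not interested"; axis marks at Income = 50K and Age = 45.] Logistic Regression: p(+|x) = [formula not preserved in the transcript]; β0 = 3.7, β1 = 0.00013; p(buy|37,78000) = 0.48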
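A minimal sketch of how a logistic regression turns features such as age and income into a conditional probability. The transcript lists only β0 and β1, so the slide's value of 0.48 cannot be reproduced from it; the coefficients below (`B0`, `B_AGE`, `B_INCOME`) are hypothetical and chosen purely for illustration.

```python
import math

def logistic_probability(intercept, weights, features):
    """Return p(+|x) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients, NOT the ones from the slide.
B0 = -4.0
B_AGE = 0.02
B_INCOME = 0.00004

# Probability of buying for a 37-year-old with an income of 78,000.
p = logistic_probability(B0, [B_AGE, B_INCOME], [37, 78000])
print(round(p, 3))
```

A score of zero maps to a probability of 0.5; larger scores push the probability towards 1 and smaller ones towards 0.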
  • 7. [Diagram: cookies; 200 Million browsers; 10 Million URLs; Ad Exchange with 20 Billion bid requests per day; shopping at one of our campaign sites; conversion.] Questions: Does the ad have causal effect? Where should we advertise and at what price? Who should we target for a marketer? What data should we pay for? Attribution? What requests are fraudulent?
  • 8. Our Browser Data: Agnostic. A consumer’s online/mobile activity (The Non-Branded Web, The Branded Web) gets recorded like this:
      - Browsing History, Hashed URLs: date1 abkcc, date2 kkllo, date3 88iok, date4 7uiol, …
      - Brand Event, Encoded: date1 3012L20, date2 4199L30, …, date n 3075L50
    I do not want to ‘understand’ who you are …
  • 9. The Heart and Soul: Targeting Model P(Buy|URL,inventory,ad)
      - Predictive modeling on hashed browsing history
      - 10 Million dimensions for URLs (binary indicators)
      - Extremely sparse data
      - Positives are extremely rare
  • 10. How can we learn from 10M features with no/few positives?
      - We cheat. In ML, cheating is called “Transfer Learning”
  • 11. The Heart and Soul: Targeting Model P(Buy|URL,inventory,ad)
      - Has to deal with the 10 Million URLs
      - Need to find more positives!
  • 12. Experiment Data
      - Randomized targeting across 58 different large display ad campaigns.
      - Served ads to users with active, stable cookies.
      - Targeted ~5000 random users per day for each marketer. Campaigns ran for 1 to 5 months, with between 100K and 4MM impressions per campaign.
      - Observed outcomes: clicks on ads, post-impression (PI) purchases (conversions).
    Targeting
      - Optimize targeting using Click and PI Purchase.
      - Technographic info and web history as input variables.
      - Evaluate each separately trained model on its ability to rank-order users for PI Purchase, using AUC (the Mann-Whitney-Wilcoxon statistic).
      - Each model is trained/evaluated using Logistic Regression.
  • 13. Predictive performance* (AUC) for purchase learning [Dalessandro et al. 2012]. [Figure: AUC (axis 0.2 to 0.8) for "Train on Click" versus "Train on Purchase".] *Restricted feature set used for these modeling results; qualitative conclusions generalize
  • 14. Predictive performance* (AUC) for click learning [Dalessandro et al. 2012]. Evaluated on predicting purchases (AUC in the target domain). [Figure: AUC (axis 0.2 to 0.8) for "Train on Click" versus "Train on Purchase".] *Restricted feature set used for these modeling results; qualitative conclusions generalize
  • 15. Clickers in the Dark: Top 10 Apps by CTR
  • 16. Predictive performance* (AUC) for Site Visit learning [Dalessandro et al. 2012]. Significantly better targeting training on source task. Evaluated on predicting purchases (AUC in the target domain). [Figure: AUC distribution (axis 0.2 to 1) for "Train on Clicks", "Train on Site Visits", and "Train on Purchase".]
  • 17. The Heart and Soul: Targeting Model P(Buy|URL,inventory,ad). Organic: P(SiteVisit|URLs)
      - Has to deal with the 10 Million URLs
      - Transfer learning:
          - Use all kinds of site visits instead of new purchases
          - Biased sample in every possible way to reduce variance
          - Negatives are ‘everything else’
          - Pre-campaign without impression
          - Stacking for transfer learning (MLJ 2014)
  • 18. Logistic regression in 10 Million dimensions (KDD 2014)
      - Stochastic Gradient Descent
      - L1 and L2 constraints
      - Automatic estimation of optimal learning rates
      - Bayesian empirical industry priors
      - Streaming updates of the models
      - Fully automated: ~10000 models per week
    Targeting Model p(sv|urls) = [formula not preserved in the transcript]
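To make the idea of stochastic gradient descent on sparse binary indicators concrete, here is a minimal sketch. It is not the production system described on the slide: it uses a fixed learning rate and a simple L2 penalty applied only to active features, and it omits the L1 constraint, learning-rate estimation, and priors. The function names, the toy data, and all parameter values are assumptions made for illustration.

```python
import math

def sgd_logistic(examples, num_features, epochs=50, lr=0.1, l2=0.001):
    """Train logistic regression on sparse binary features.

    examples: list of (active_feature_indices, label), label in {0, 1}.
    Only the weights of active features are touched in each update,
    which is what keeps training cheap when the data is very sparse.
    """
    w = [0.0] * num_features
    b = 0.0
    for _ in range(epochs):
        for active, y in examples:
            z = b + sum(w[i] for i in active)
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            b -= lr * g
            for i in active:
                w[i] -= lr * (g + l2 * w[i])
    return w, b

def predict(w, b, active):
    """Predicted probability of the positive class for one user."""
    return 1.0 / (1.0 + math.exp(-(b + sum(w[i] for i in active))))

# Toy data: features 0 and 1 appear only for site visitors, 2 and 3 only for others.
examples = [([0, 1], 1), ([0], 1), ([2], 0), ([2, 3], 0), ([1], 1), ([3], 0)]
w, b = sgd_logistic(examples, num_features=4)
print(predict(w, b, [0]), predict(w, b, [2]))
```

Because each update touches one example at a time, the same loop can in principle consume a stream of new examples, which matches the streaming-update idea on the slide.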
  • 19. Dimensionality Reduction
      - There are a few obvious options for dimensionality reduction.
      - Hashing: run each URL through a hash function and map it into a specified number of buckets.
      - Categorization: we had both free and commercial website category data. Binary URL space → binary category space, e.g. www.baseball-reference.com → Sports/Baseball/Major_League/Statistics (www.dmoz.org)
      - SVD: Singular Value Decomposition in Mahout to transform the large, sparse feature space into a small, dense feature space.
    © 2013 Media6Degrees. All Rights Reserved. Proprietary and Confidential
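The hashing option can be sketched as follows. This is an illustrative version only: the choice of SHA-256 and the helper names `hash_bucket` and `hashed_features` are assumptions, not details from the talk.

```python
import hashlib

def hash_bucket(url, num_buckets):
    """Map a URL to a stable bucket index in [0, num_buckets)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def hashed_features(urls, num_buckets):
    """Binary feature vector in sparse form: the set of active buckets."""
    return {hash_bucket(u, num_buckets) for u in urls}

history = ["www.baseball-reference.com", "www.dmoz.org"]
features = hashed_features(history, num_buckets=1000)
print(sorted(features))
```

Different URLs can collide in the same bucket, so the reduced space is smaller but less precise than the original binary URL space.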
  • 20. Algorithm: Intuition & Multitasking
      - Hierarchical clustering in the space of model parameters.
          - Naïve Bayes(ish) model: It’s not a bug, it’s a feature!
      - Distance function: Pearson correlation
      - Cutting the dendrogram:
          - Most algorithms cut the tree at a specific “height” in order to produce a desired number of clusters.
          - In our case, we need clusters with sufficient representation in the data.
          - Recursively traverse the tree and cut when we reach a certain minimum popularity.
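One possible reading of the popularity-based cut is sketched below: descend the dendrogram and keep splitting a node only while every child still has enough popularity, otherwise emit the whole subtree as one cluster. The tree layout (dictionaries with `popularity`, `children`, and `urls`), the stopping rule, and the example numbers are assumptions for illustration, not the authors' implementation.

```python
def collect_urls(node):
    """Gather the URLs at the leaves below a node."""
    if "children" not in node:
        return list(node["urls"])
    urls = []
    for child in node["children"]:
        urls.extend(collect_urls(child))
    return urls

def cut_by_popularity(node, min_popularity):
    """Return clusters (lists of URLs), splitting a node only while
    every child keeps at least min_popularity."""
    children = node.get("children", [])
    if children and all(c["popularity"] >= min_popularity for c in children):
        clusters = []
        for child in children:
            clusters.extend(cut_by_popularity(child, min_popularity))
        return clusters
    return [collect_urls(node)]

# Hypothetical dendrogram with made-up popularity counts.
tree = {
    "popularity": 1000,
    "children": [
        {"popularity": 600, "children": [
            {"popularity": 350, "urls": ["a.example"]},
            {"popularity": 250, "urls": ["b.example"]},
        ]},
        {"popularity": 400, "children": [
            {"popularity": 390, "urls": ["c.example"]},
            {"popularity": 10, "urls": ["d.example"]},
        ]},
    ],
}
clusters = cut_by_popularity(tree, min_popularity=200)
print(clusters)
```

Unlike a fixed-height cut, this yields clusters of varying depth, each with enough representation in the data.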
  • 21. Results. [Figure: example clusters; labels: Kids Health Home News Games & Videos Home]
  • 22. Experiments
      - We built models off data from 28 campaigns.
      - Our production cluster definitions have 4,318 features.
      - We tried to get each of the “challengers” as close to this as we possibly could.
      - We evaluate on Lift (5%) and AUC.
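The two evaluation metrics can be sketched in a few lines. AUC is computed here directly as the Mann-Whitney statistic (the share of positive/negative pairs ranked correctly, ties counting half), and lift at 5% as the conversion rate in the top-scored 5% divided by the overall rate. The function names and the toy data are assumptions for illustration.

```python
def auc(scores, labels):
    """AUC as the Mann-Whitney statistic over all (positive, negative) pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def lift_at(scores, labels, fraction=0.05):
    """Positive rate among the top-scored fraction, relative to the base rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Toy data: 20 users with increasing scores, two of whom convert.
scores = [i / 20 for i in range(20)]
labels = [0] * 20
labels[19] = 1
labels[10] = 1

print(auc(scores, labels), lift_at(scores, labels, fraction=0.05))
```

The pairwise AUC loop is quadratic in the number of users, so it is only suitable for small samples; a rank-based formulation would be needed at scale.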
  • 23. Results
      Method         Average Lift (5%)   Average Relative Perf.   Win   Loss   Tie   Features
      Cluster        4.024               100%                     -     -      -     4,318
      SVD            3.539               86.0%                    4     20     4     1,000
      Hash           3.035               70.0%                    1     26     1     4,318
      Commercial     3.195               71.3%                    2     24     2     1,183
      Free Context   3.643               84.4%                    1     17     10    5,984
  • 24. To reduce or not to reduce?
  • 25. Conclusions
      - We use the cluster-based models for some things
      - Targeting is still using high-dimensional models whenever possible
  • 26. Real-time Scoring of a User. [Figure labels: Ad, OBSERVATION, ENGAGEMENT, Purchase, ProspectRank Threshold, site visit with positive correlation, site visit with negative correlation.] Some prospects fall out of favor once their in-market indicators decline.
  • 27. What exactly is Inventory? Where the ad will be shown: 7K unique inventories + default buckets. © 2012 Media6Degrees. All Rights Reserved. Proprietary and Confidential
  • 28. Example of Model Scores for Hotel Campaign
      - Scores are calculated on de-duplicated training pairs (i,s)
      - We even integrate out s
      - Nicely centered around 1
  • 29. Bidding Strategies
      Strategy 0 – do nothing special:
        - always bid base price for segment
        - equivalent to constant score of 1 across all inventories
        - consistent with an uninformative inventory model
      Strategy 1 – minimize CPA:
        - auction-theoretic view: bid what it is worth in relative terms
        - multiply the base price by the model ratio
      Strategy 2 – maximize conversion rate:
        - optimal performance is not to bid what it is worth but to trade off value for quality and only bid on the best opportunities
        - apply a step function to the model ratio to translate it into a factor applied to the price:
            - ratios below 0.8 yield a bid price of 0 (so not bidding),
            - ratios between 0.8 and 1.2 are set to 1, and
            - ratios above 1.2 bid twice the base price
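The three bidding strategies translate directly into small functions of the base price and the inventory model ratio. The thresholds 0.8 and 1.2 and the factors 0, 1, and 2 come from the slide; the function names and the treatment of ratios exactly at 0.8 or 1.2 (counted as the middle band here) are assumptions.

```python
def strategy_0(base_price, ratio):
    """Do nothing special: always bid the base price."""
    return base_price

def strategy_1(base_price, ratio):
    """Minimize CPA: scale the base price by the model ratio."""
    return base_price * ratio

def strategy_2(base_price, ratio):
    """Maximize conversion rate: step function on the model ratio."""
    if ratio < 0.8:
        return 0.0                 # do not bid
    if ratio <= 1.2:
        return base_price          # factor 1
    return 2.0 * base_price        # bid twice the base price

for r in (0.5, 1.0, 1.5):
    print(r, strategy_0(2.0, r), strategy_1(2.0, r), strategy_2(2.0, r))
```

Strategy 1 moves the bid smoothly with the ratio, while Strategy 2 drops weak inventory entirely and pays a premium only for the best opportunities.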
  • 30. Results. [Figure: CR Index, CPM Index, and CPA Index (axis 0.50 to 1.40) for Strat 1 and Strat 2; annotations: “Increased CR, same CPM = Free Lunch!” and “Increased CR, but higher CPM. Lowest CPA.”] Both lowered CPA. Optimal decision making depends on long vs short term thinking (note: we chose long term, thus Strategy 2).
  • 31. Real-time Scoring of a User. [Figure labels: Ad, OBSERVATION, ENGAGEMENT, Purchase, ProspectRank Threshold, site visit with positive correlation, site visit with negative correlation.] Some prospects fall out of favor once their in-market indicators decline.
  • 32. Lift over random for 66 campaigns for online display ad prospecting. [Figure: NN lift over RON (axis 0 to 25) and total impressions (axis 0 to 6.0M); lift over baseline; median lift = 5x.] Note: the top prospects are consistently rated as being excellent compared to alternatives by advertising clients’ internal measures, and when measured by their analysis partners (e.g., Nielsen): high ROI, low cost-per-acquisition, etc.
  • 33. Relative Performance to Third Party
  • 34. Measuring causal effect?
      - A/B Testing: practical concerns
      - Estimate causal effects from observational data (ADKDD 2011)
          - Using targeted maximum likelihood (TMLE) to estimate causal impact
          - Can be done ex-post for different questions
          - Need to control for confounding
          - Data has to be ‘rich’ and cover all combinations of confounding and treatment
    Causal effect: E[Y_{A=ad}] - E[Y_{A=no ad}]
  • 35. An important decision… I think she is hot! Hmm – so what should I write to her to get her number?
  • 37. Hardships of causality: Beauty is Confounding. Beauty determines both the probability of getting the number and the probability that James will say “You are beautiful.” We need to control for the actual beauty, or it can appear that making compliments is a bad idea.
  • 38. Hardships of causality: Targeting is Confounding. We only show ads to people we know are more likely to convert (ad or not). [Figure: conversion rates, SAW AD versus DID NOT SEE AD.] Need to control for confounding. Data has to be ‘rich’ and cover all combinations of confounding and treatment.
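A small worked example shows why targeting confounds the naive comparison. It uses plain stratification (averaging the within-stratum differences), which is a much simpler technique than the TMLE method named in the talk, and all counts are invented for illustration. Users are split into a "high" and a "low" propensity stratum, and ads are shown mostly to the "high" group.

```python
# (stratum, saw_ad) -> (number of users, number of conversions); made-up counts.
counts = {
    ("high", True): (800, 80),
    ("high", False): (200, 18),
    ("low", True): (200, 4),
    ("low", False): (800, 8),
}

def naive_effect(counts):
    """Difference in conversion rates, ignoring who was targeted."""
    def rate(saw_ad):
        users = sum(n for (_, a), (n, _) in counts.items() if a == saw_ad)
        conv = sum(c for (_, a), (_, c) in counts.items() if a == saw_ad)
        return conv / users
    return rate(True) - rate(False)

def stratified_effect(counts):
    """Average of within-stratum differences, weighted by stratum size.

    Requires every (stratum, treatment) cell to be populated, which is the
    'cover all combinations of confounding and treatment' condition.
    """
    strata = {s for s, _ in counts}
    total = sum(n for n, _ in counts.values())
    effect = 0.0
    for s in strata:
        n_ad, c_ad = counts[(s, True)]
        n_no, c_no = counts[(s, False)]
        weight = (n_ad + n_no) / total
        effect += weight * (c_ad / n_ad - c_no / n_no)
    return effect

print(naive_effect(counts), stratified_effect(counts))
```

In this toy data the naive difference is 5.8 percentage points, while within each stratum the ad adds only 1 point; the rest is the targeting itself.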
  • 39. Observational Causal Methods: TMLE. Negative test: wrong ad. Positive test: A/B comparison.
  • 40. Some creatives do not work …
  • 41. Data Quality in Exchanges: Fraud (KDD 2013)
  • 42. Ensure location quality before using it. Almost 30% of users with more than one location travel faster than the speed of sound.
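A location sanity check of this kind can be sketched as follows: compute the great-circle distance between consecutive sightings of a user and flag the user if the implied speed exceeds the speed of sound. The data layout (time-sorted `(unix_seconds, lat, lon)` tuples), the constant of roughly 343 m/s, and the function names are assumptions for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0     # approximate value in air at about 20 °C
EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def travels_faster_than_sound(sightings):
    """sightings: list of (unix_seconds, lat, lon), sorted by time."""
    for (t1, la1, lo1), (t2, la2, lo2) in zip(sightings, sightings[1:]):
        distance = haversine_m(la1, lo1, la2, lo2)
        elapsed = t2 - t1
        if elapsed <= 0:
            if distance > 0:
                return True  # two places at the same moment
            continue
        if distance / elapsed > SPEED_OF_SOUND_M_S:
            return True
    return False

new_york = (40.7128, -74.0060)
los_angeles = (34.0522, -118.2437)
one_hour_apart = [(0, *new_york), (3600, *los_angeles)]
six_hours_apart = [(0, *new_york), (21600, *los_angeles)]
print(travels_faster_than_sound(one_hour_apart), travels_faster_than_sound(six_hours_apart))
```

Users flagged by such a check are candidates for exclusion before location is used as a feature.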
  • 43. Unreasonable Performance Increase, Spring 12. [Figure: Performance Index over 2 weeks, annotated “2x”.]
  • 45. 36% of traffic is Non-Intentional. [Figure: 6% in 2011, 36% in 2012.]
  • 46. Traffic patterns are ‘non-human’. [Figure labels: website 1, website 2, 50%.] Data from Bid Requests in Ad-Exchanges
  • 47. WWW 2010. Node: hostname. Edge: 50% co-visitation.
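A co-visitation network with hostnames as nodes can be sketched as below. The edge rule used here is an assumption about what "50% co-visitation" means: a directed edge from host A to host B is drawn when at least half of the browsers seen on A were also seen on B. The hostnames and browser ids are invented.

```python
from itertools import permutations

def covisitation_edges(visitors_by_host, threshold=0.5):
    """Return directed edges (a, b) where at least `threshold` of the
    browsers seen on host a were also seen on host b."""
    edges = set()
    for a, b in permutations(visitors_by_host, 2):
        seen_a = visitors_by_host[a]
        if seen_a and len(seen_a & visitors_by_host[b]) / len(seen_a) >= threshold:
            edges.add((a, b))
    return edges

# Hypothetical browser ids observed per hostname.
visitors_by_host = {
    "site1.example": {1, 2, 3, 4},
    "site2.example": {1, 2, 3, 9},
    "site3.example": {7, 8},
}
edges = covisitation_edges(visitors_by_host)
print(sorted(edges))
```

Dense groups of unrelated hosts that share most of their audience are the kind of pattern the talk describes as non-human.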
  • 55. Unreasonable Performance Increase, Spring 12. [Figure: Performance Index over 2 weeks, annotated “2x”.]
  • 56. Now it is coming also to brands
      - ‘Cookie Stuffing’ increases the value of the ad for retargeting
      - Messes up Web analytics …
      - Messes up my models because a botnet is easier to predict than a human
  • 57. Fraud pollutes my models
      - Don’t show ads on those sites
      - Don’t show ads to a hijacked browser
      - Need to remove the visits to the fraud sites
      - Need to remove the fraudulent brand visits
    When we see a browser caught up in fraudulent activity: send him to the penalty box, where we ignore all his actions.
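The penalty box can be sketched as a simple filter: any browser ever seen on a flagged site is penalized, and all of its events are dropped before they reach model training. The event layout `(browser_id, host)`, the function name, and the sample hostnames are assumptions; how fraud sites are identified is outside this sketch.

```python
def apply_penalty_box(events, fraud_hosts):
    """Drop every event of any browser seen on a fraudulent host.

    events: list of (browser_id, host) tuples.
    Returns (clean_events, penalized_browser_ids).
    """
    penalized = {browser for browser, host in events if host in fraud_hosts}
    clean = [(browser, host) for browser, host in events if browser not in penalized]
    return clean, penalized

events = [
    ("b1", "news.example"),
    ("b2", "fraud.example"),
    ("b2", "brand.example"),  # brand visit by a penalized browser, also removed
    ("b3", "brand.example"),
]
clean, penalized = apply_penalty_box(events, fraud_hosts={"fraud.example"})
print(clean, penalized)
```

Removing the whole browser, rather than only its visits to the fraud sites, also removes the fraudulent brand visits mentioned on the slide.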
  • 58. Using the penalty box: all back to normal. [Figure: Performance Index, 3 more weeks in spring 2012.]
  • 60. Somebody is posing as nytimes.com
  • 61. Bottom line: It is all a question of how good you are at cheating! And that you can catch the bad guys at cheating …
  • 62. On a personal note: claudia.perlich@gmail.com
  • 63. Some References
      1. B. Dalessandro, F. Provost, R. Hook. Audience Selection for On-Line Brand Advertising: Privacy Friendly Social Network Targeting. KDD 2009.
      2. O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Estimating The Effect Of Online Display Advertising On Browser Conversion. ADKDD 2011.
      3. C. Perlich, O. Stitelman, B. Dalessandro, T. Raeder, F. Provost. Bid Optimizing and Inventory Scoring in Targeted Online Advertising. KDD 2012 (Best Paper Award).
      4. T. Raeder, O. Stitelman, B. Dalessandro, C. Perlich, F. Provost. Design Principles of Massive, Robust Prediction Systems. KDD 2012.
      5. B. Dalessandro, O. Stitelman, C. Perlich, F. Provost. Causally Motivated Attribution for Online Advertising. In Proceedings of KDD, ADKDD 2012.
      6. B. Dalessandro, R. Hook, C. Perlich, F. Provost. Transfer Learning for Display Advertising. MLJ 2014.
      7. T. Raeder, C. Perlich, B. Dalessandro, O. Stitelman, F. Provost. Scalable Supervised Dimensionality Reduction Using Clustering. KDD 2013.
      8. O. Stitelman, C. Perlich, B. Dalessandro, R. Hook, T. Raeder, F. Provost. Using Co-visitation Networks For Classifying Non-Intentional Traffic. KDD 2013.