SlideShare a Scribd company logo
Online Display Advertising
Optimization with H2O
Hassan Namarvar
Principal Data Scientist
SF DATA MINING MEETUP
December 9th, 2014
2
OUTLINE
 Introducing ShareThis
 Online display advertising problem
 Estimation of conversion rate using H2O
 Results from live campaigns
 Ongoing work
 Q&A
SHARING TOOLS AT SCALE
23 Billion PAGE
VIEWS
120 SOCIAL
CHANNELS
1. comScore Media Matrix Report * Includes PC, Tablet, and Mobile sites.
210 MM US USERS1
95% REACH*
2.4 MM SITES AND
APPS
This is Missy!
She is busy chatting
and browsing on the
web…
USER
Missy reads an article and
shares it to her Facebook
page using the ShareThis
widget
SOCIAL ACTIVITY
ShareThis observes the
share and can then target
Missy and her friends with
advertising messages
tailored to their interests
SOCIAL DATA
MAKING SOCIAL DATA ACTIONABLE
CATEGORY TARGETING: TECHNOLOGY
TVS
1.1 MM
AUDIO
800K
SMARTPHONES
13.7 MM
TABLETS
5.3 MM
PCs
6 MM
GAMING
7 MM
CAMERAS
1.3 MM
28.6 MM
USERS
35 MM+
SOCIAL ACTIONS
1.2 MM
SOCIAL ACTIONS/DAY
STANDARD TARGETING
THRESHOLD
INTEREST
TIME
TRIGGER
EXCITEMENT
PEAK READINESS
FOR
ENGAGEMENT
FADING INTEREST
 MALE 25-45
 TECH ENTHUSIAST
 $HHI $75K+
“DAN”
6
ShareThis ONLY targets users within 24 hours to ensure ads reach them at the most
relevant moment
SHARETHIS
MESSAGING TRIGGER
REAL TIME MESSAGING REACHES USERS
DURING PEAK INTEREST
7
ONLINE DISPLAY ADVERTISING
Advertisers’ goal is to target the most receptive online audience
in the right context and right time, so that to influence users to
engage with the ad.
Publisher Web
Page
Ad Ad
Exchange
Model Pipeline
(Production)
Real Time
Bidding (RTB)
System
ShareThis Data
Campaign DataMeta Data
Models
8
ONLINE DISPLAY ADVERTISING
Campaign Performance
Advertisers seek the optimal price to bid for each ad call.
 Cost per Click (CPC) Model
 Cost per Action (CPA) Model
9
MODELING CONVERSION RATE (CVR)
CTR and CVR are directly related to the user interacting with the
ad in a given context.
Challenge
 They are fundamentally difficult to directly model and predict.
 Even CVR is harder than CTR since conversion are very rare
events
 View-through conversions have longer delays in the logging
system.
10
PROBLEM SETUP
Let define Users, Publishers, Ads, Devices, and Locations as:
Goal
Find the optimal ad such that the probability of conversion is the
highest.
11
PROBLEM SETUP
At single user level, the problem is a binary problem: conversion
or no conversion.
Conversion event is a random binary event
Transactional (low-level) data features are poorly correlated with
user’s direct response on a display ad.
12
DATA HIERARCHIES
A2
A1
A0 Root
Advertiser1
Campaign
1
Campaign
2
Advertiser2
Campaign
3
Campaign
K
L2
L1
L0 Root
Location
1
Zipcode 1
Zipcode 2
Location
2
Zipcode 3
Zipcode
N
U2
U1
U0 Root
UserClust 1
UserGroup
1
UserGroup
2
UserClust 2
UserGroup
3
UserGroup
I
P2
P1
P0 Root
PubType
1
Publisher
1
Publisher
2
PubType
2
Publisher
3
Publisher
J
13
HIGH LEVEL MODELING
Compute conversions for similar users, contexts, ads, …
Maximum Likelihood Estimate (MLE):
14
COMBINING EESTIMATORS
LOGISTIC REGRESSION
Let denote MLE of the CVR’s of events at Q
different levels.
Goal
Estimate CVR using combination of estimators:
Log-likelihood
Logistic Regression
15
PRACTICAL ISSUES
Data Imbalance
 CVR is inherently very low
 Need to up-sample conversions or down-sample non conversions
Remove Anomalies
 Retargeting visit data as proxy for cnv when cnv data is not available
 Remove outliers
Missing Features
 Sometimes features are missing or not enough conversions
 Impute features
Feature Selection
 Discard feature if more than 70% of the training examples are missing
 Variance of attribution is lower than a threshold (10e-9)
16
WHY NEW MACHINE LEARNING TOOL?
Available large-scale ML tools such as Apache Mahout, Vowpal Wabbit, Hadoop
RMR, native Spark MLLib have their own issues.
Critical Features for a state-of-the-art ML package:
 Ease of use
 System reliability
 In-memory (fast)
 Distributed
 Extensible (API/SDK)
 Accurate algorithms
 Visualization (data and results)
 Easy to deploy to production
17
H2O PLATFORM
Screen shot for H2O platform web API
18
H2O PLATFORM: GLM MODEL
Screen shot for the CPA model using the GLM algorithm.
19
SCORE CALIBRATION
Calibrate Model Scores
 Find best threshold from AUC
 Ad server attributes a conversion to the last impression
 RTB needs to deliver certain amount of impressions per day
 There is a trade-off between wasting impressions and winning
conversions.
20
BUILDING A CPA MODEL
RETARGETED VISITS AS A PROXY FOR CONVERSIONS
USER-CENTRIC
Focus on RT Users
Deliver Ads at the optimal
times
BETTER
PERFORMANCE
Leverage optimization
opportunities
OPTIMAL TIME
Target Users Who Likely
Convert
DON’T WASTE IMP.
21
LIVE TEST ON A CAR INSURANCE CAMPAIGN
TESTED FOR TWO MONTHS AND MEASURED THE PERFORMANCE BY DFA.
The CPA test for a car insurance campaign showed 58% improvement on
eCPA and 57% on conversion rate (CVR).
22
LIVE TESTS ON DIFFERENT CAMPAIGNS
OBSERVED CPA LIFT
23
ONGOING WORK
 Tests are expensive and time consuming
 We need to evaluate models before deploying to production
 Build many models and evaluate them offline
 Different datasets
 Different features
 Different algorithms
24
COMBINING ESTIMATORS
GRADIENT BOOSTING MACHINE
Let denote categorical features.
Goal
Estimate CVR using an ensemble of weak prediction models,
decision trees:
Gradient boosting combines weak learners into a single strong
learner, in an iterative fashion.
25
MODEL COMPARISON
Comparing AUC plots of GBM and RF models on test data:
26
OFFLINE SIMULATIONS
Comparing AUC plots of GBM and RF models on test data:
27
OFFLINE SIMULATIONS
Selecting models in practice
 Accuracy of prediction on unseen data
 Scoring time at production
 Remove anomalies using Deep Learning
 Correlations with other campaign KPIs (CTR, Brand lift,
Viewability, Winning Price, …)
 Performance Stability
28
EVALUATION ON IMPRESSION DATA
Correlation of GBM model scores with CTR
29
EVALUATION ON IMPRESSION DATA
Correlation of GBM model scores with average winning bid price
30
GBM MODEL TESTS vs GLM MODEL CONTROL
A/B TEST: OBSERVED CPA LIFT
31
CONCLUSION
How H2O helped us?
 Maximized ROI by optimizing campaign performance and
budget allocation.
 Empowered advanced ML algorithms in Hadoop cluster
 Used all data and build models much faster
 Reduced R&D time significantly
 Building a smooth model building pipeline (R and Spark API)
ACKNOWLEDGEMENT
THE TEAM:
Prasanta Behera
Xibin Chen
Wahid Chrabakh
Jinghao Miao
Hassan Namarvar
Yan Qu
THANK YOU!
SHARETHIS IS HIRING!
Please check out:
www.sharethis.com/about/careers
Q&A

More Related Content

Similar to Online Display Advertising Optimization with H2O at ShareThis

20141209 meetup hassan
20141209 meetup hassan20141209 meetup hassan
20141209 meetup hassan
Nanda Kishore
 
Revolutionizing a broken market by perion
Revolutionizing a broken market by perionRevolutionizing a broken market by perion
Revolutionizing a broken market by perion
perionnetwork
 
IRJET- Advertisement Delivery in Vanet Based on User Preference
IRJET- Advertisement Delivery in Vanet Based on User PreferenceIRJET- Advertisement Delivery in Vanet Based on User Preference
IRJET- Advertisement Delivery in Vanet Based on User Preference
IRJET Journal
 
Core principles for successful Ad monetization / Vlad Muntean (Google)
Core principles for successful Ad monetization / Vlad Muntean (Google)Core principles for successful Ad monetization / Vlad Muntean (Google)
Core principles for successful Ad monetization / Vlad Muntean (Google)
DevGAMM Conference
 
Onyx Beacon: technology and commercial presentation 2015
Onyx Beacon: technology and commercial presentation 2015Onyx Beacon: technology and commercial presentation 2015
Onyx Beacon: technology and commercial presentation 2015
Onyx Beacon
 
Deepak-Computational Advertising-The LinkedIn Way
Deepak-Computational Advertising-The LinkedIn WayDeepak-Computational Advertising-The LinkedIn Way
Deepak-Computational Advertising-The LinkedIn Wayyingfeng
 
Deconstructing the Programmatic Ecosystem
Deconstructing the Programmatic EcosystemDeconstructing the Programmatic Ecosystem
Deconstructing the Programmatic Ecosystem
Katana Media
 
Connecting Applications from Mobile to Mainframe in the Application Economy
Connecting Applications from Mobile to Mainframe in the Application EconomyConnecting Applications from Mobile to Mainframe in the Application Economy
Connecting Applications from Mobile to Mainframe in the Application Economy
CA Technologies
 
Burt Presentation
Burt PresentationBurt Presentation
Burt Presentation
Webtisan Studio
 
ETARGET RTB Banner Advertising
ETARGET RTB Banner AdvertisingETARGET RTB Banner Advertising
ETARGET RTB Banner Advertising
Etarget
 
Secrets from the trenches
Secrets from the trenchesSecrets from the trenches
Secrets from the trenches
VWO
 
Powering Developer Communities for Prize Challenges
Powering Developer Communities for Prize ChallengesPowering Developer Communities for Prize Challenges
Powering Developer Communities for Prize Challenges
Crowdsourcing Week
 
Marketing in the Moment: Trends and Innovations in Real-Time Omni-Channel Mar...
Marketing in the Moment: Trends and Innovations in Real-Time Omni-Channel Mar...Marketing in the Moment: Trends and Innovations in Real-Time Omni-Channel Mar...
Marketing in the Moment: Trends and Innovations in Real-Time Omni-Channel Mar...
Ensighten
 
Webinar Deck - How to Achieve True Mobile Marketing Agility
Webinar Deck - How to Achieve True Mobile Marketing Agility Webinar Deck - How to Achieve True Mobile Marketing Agility
Webinar Deck - How to Achieve True Mobile Marketing Agility
Ensighten
 
Devtodev
DevtodevDevtodev
Devtodev
devtodev
 
The Evolution of OOH through Programmatic
The Evolution of OOH through Programmatic The Evolution of OOH through Programmatic
The Evolution of OOH through Programmatic
Drew Thachuk
 
Appodeal general deck
Appodeal general deckAppodeal general deck
Appodeal general deck
Pavel Golubev
 
Mobile Acquisition Strategy for New IPs
Mobile Acquisition Strategy for New IPsMobile Acquisition Strategy for New IPs
Mobile Acquisition Strategy for New IPs
Yong Park
 
J. Phenix Streamlining Ops and Maximizing Revenue from Ad Network Social De...
J.  Phenix  Streamlining Ops and Maximizing Revenue from Ad Network Social De...J.  Phenix  Streamlining Ops and Maximizing Revenue from Ad Network Social De...
J. Phenix Streamlining Ops and Maximizing Revenue from Ad Network Social De...
Mediabistro
 

Similar to Online Display Advertising Optimization with H2O at ShareThis (20)

20141209 meetup hassan
20141209 meetup hassan20141209 meetup hassan
20141209 meetup hassan
 
Revolutionizing a broken market by perion
Revolutionizing a broken market by perionRevolutionizing a broken market by perion
Revolutionizing a broken market by perion
 
140521 babylon quarto_meeting_nielsen
140521 babylon quarto_meeting_nielsen140521 babylon quarto_meeting_nielsen
140521 babylon quarto_meeting_nielsen
 
IRJET- Advertisement Delivery in Vanet Based on User Preference
IRJET- Advertisement Delivery in Vanet Based on User PreferenceIRJET- Advertisement Delivery in Vanet Based on User Preference
IRJET- Advertisement Delivery in Vanet Based on User Preference
 
Core principles for successful Ad monetization / Vlad Muntean (Google)
Core principles for successful Ad monetization / Vlad Muntean (Google)Core principles for successful Ad monetization / Vlad Muntean (Google)
Core principles for successful Ad monetization / Vlad Muntean (Google)
 
Onyx Beacon: technology and commercial presentation 2015
Onyx Beacon: technology and commercial presentation 2015Onyx Beacon: technology and commercial presentation 2015
Onyx Beacon: technology and commercial presentation 2015
 
Deepak-Computational Advertising-The LinkedIn Way
Deepak-Computational Advertising-The LinkedIn WayDeepak-Computational Advertising-The LinkedIn Way
Deepak-Computational Advertising-The LinkedIn Way
 
Deconstructing the Programmatic Ecosystem
Deconstructing the Programmatic EcosystemDeconstructing the Programmatic Ecosystem
Deconstructing the Programmatic Ecosystem
 
Connecting Applications from Mobile to Mainframe in the Application Economy
Connecting Applications from Mobile to Mainframe in the Application EconomyConnecting Applications from Mobile to Mainframe in the Application Economy
Connecting Applications from Mobile to Mainframe in the Application Economy
 
Burt Presentation
Burt PresentationBurt Presentation
Burt Presentation
 
ETARGET RTB Banner Advertising
ETARGET RTB Banner AdvertisingETARGET RTB Banner Advertising
ETARGET RTB Banner Advertising
 
Secrets from the trenches
Secrets from the trenchesSecrets from the trenches
Secrets from the trenches
 
Powering Developer Communities for Prize Challenges
Powering Developer Communities for Prize ChallengesPowering Developer Communities for Prize Challenges
Powering Developer Communities for Prize Challenges
 
Marketing in the Moment: Trends and Innovations in Real-Time Omni-Channel Mar...
Marketing in the Moment: Trends and Innovations in Real-Time Omni-Channel Mar...Marketing in the Moment: Trends and Innovations in Real-Time Omni-Channel Mar...
Marketing in the Moment: Trends and Innovations in Real-Time Omni-Channel Mar...
 
Webinar Deck - How to Achieve True Mobile Marketing Agility
Webinar Deck - How to Achieve True Mobile Marketing Agility Webinar Deck - How to Achieve True Mobile Marketing Agility
Webinar Deck - How to Achieve True Mobile Marketing Agility
 
Devtodev
DevtodevDevtodev
Devtodev
 
The Evolution of OOH through Programmatic
The Evolution of OOH through Programmatic The Evolution of OOH through Programmatic
The Evolution of OOH through Programmatic
 
Appodeal general deck
Appodeal general deckAppodeal general deck
Appodeal general deck
 
Mobile Acquisition Strategy for New IPs
Mobile Acquisition Strategy for New IPsMobile Acquisition Strategy for New IPs
Mobile Acquisition Strategy for New IPs
 
J. Phenix Streamlining Ops and Maximizing Revenue from Ad Network Social De...
J.  Phenix  Streamlining Ops and Maximizing Revenue from Ad Network Social De...J.  Phenix  Streamlining Ops and Maximizing Revenue from Ad Network Social De...
J. Phenix Streamlining Ops and Maximizing Revenue from Ad Network Social De...
 

More from Sri Ambati

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Sri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
Sri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
Sri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
Sri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
Sri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
Sri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
Sri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
Sri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
Sri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
Sri Ambati
 

More from Sri Ambati (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 

Recently uploaded

Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 

Recently uploaded (20)

Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 

Online Display Advertising Optimization with H2O at ShareThis

  • 1. Online Display Advertising Optimization with H2O Hassan Namarvar Principal Data Scientist SF DATA MINING MEETUP December 9th, 2014
  • 2. 2 OUTLINE  Introducing ShareThis  Online display advertising problem  Estimation of conversion rate using H2O  Results from live campaigns  Ongoing work  Q&A
  • 3. SHARING TOOLS AT SCALE 23 Billion PAGE VIEWS 120 SOCIAL CHANNELS 1. comScore Media Matrix Report * Includes PC, Tablet, and Mobile sites. 210 MM US USERS1 95% REACH* 2.4 MM SITES AND APPS
  • 4. This is Missy! She is busy chatting and browsing on the web… USER Missy reads an article and shares it to her Facebook page using the ShareThis widget SOCIAL ACTIVITY ShareThis observes the share and can then target Missy and her friends with advertising messages tailored to their interests SOCIAL DATA MAKING SOCIAL DATA ACTIONABLE
  • 5. CATEGORY TARGETING: TECHNOLOGY TVS 1.1 MM AUDIO 800K SMARTPHONES 13.7 MM TABLETS 5.3 MM PCs 6 MM GAMING 7 MM CAMERAS 1.3 MM 28.6 MM USERS 35 MM+ SOCIAL ACTIONS 1.2 MM SOCIAL ACTIONS/DAY
  • 6. STANDARD TARGETING THRESHOLD INTEREST TIME TRIGGER EXCITEMENT PEAK READINESS FOR ENGAGEMENT FADING INTEREST  MALE 25-45  TECH ENTHUSIAST  $HHI $75K+ “DAN” 6 ShareThis ONLY targets users within 24 hours to ensure ads reach them at the most relevant moment SHARETHIS MESSAGING TRIGGER REAL TIME MESSAGING REACHES USERS DURING PEAK INTEREST
  • 7. 7 ONLINE DISPLAY ADVERTISING Advertisers’ goal is to target the most receptive online audience in the right context and right time, so that to influence users to engage with the ad. Publisher Web Page Ad Ad Exchange Model Pipeline (Production) Real Time Bidding (RTB) System ShareThis Data Campaign DataMeta Data Models
  • 8. 8 ONLINE DISPLAY ADVERTISING Campaign Performance Advertisers seek the optimal price to bid for each ad call.  Cost per Click (CPC) Model  Cost per Action (CPA) Model
  • 9. 9 MODELING CONVERSION RATE (CVR) CTR and CVR are directly related to the user interacting with the ad in a given context. Challenge  They are fundamentally difficult to directly model and predict.  Even CVR is harder than CTR since conversion are very rare events  View-through conversions have longer delays in the logging system.
  • 10. 10 PROBLEM SETUP Let define Users, Publishers, Ads, Devices, and Locations as: Goal Find the optimal ad such that the probability of conversion is the highest.
  • 11. 11 PROBLEM SETUP At single user level, the problem is a binary problem: conversion or no conversion. Conversion event is a random binary event Transactional (low-level) data features are poorly correlated with user’s direct response on a display ad.
  • 12. 12 DATA HIERARCHIES A2 A1 A0 Root Advertiser1 Campaign 1 Campaign 2 Advertiser2 Campaign 3 Campaign K L2 L1 L0 Root Location 1 Zipcode 1 Zipcode 2 Location 2 Zipcode 3 Zipcode N U2 U1 U0 Root UserClust 1 UserGroup 1 UserGroup 2 UserClust 2 UserGroup 3 UserGroup I P2 P1 P0 Root PubType 1 Publisher 1 Publisher 2 PubType 2 Publisher 3 Publisher J
  • 13. 13 HIGH LEVEL MODELING Compute conversions for similar users, contexts, ads, … Maximum Likelihood Estimate (MLE):
  • 14. 14 COMBINING EESTIMATORS LOGISTIC REGRESSION Let denote MLE of the CVR’s of events at Q different levels. Goal Estimate CVR using combination of estimators: Log-likelihood Logistic Regression
  • 15. 15 PRACTICAL ISSUES Data Imbalance  CVR is inherently very low  Need to up-sample conversions or down-sample non conversions Remove Anomalies  Retargeting visit data as proxy for cnv when cnv data is not available  Remove outliers Missing Features  Sometimes features are missing or not enough conversions  Impute features Feature Selection  Discard feature if more than 70% of the training examples are missing  Variance of attribution is lower than a threshold (10e-9)
  • 16. 16 WHY NEW MACHINE LEARNING TOOL? Available large-scale ML tools such as Apache Mahout, Vowpal Wabbit, Hadoop RMR, native Spark MLLib have their own issues. Critical Features for a state-of-the-art ML package:  Ease of use  System reliability  In-memory (fast)  Distributed  Extensible (API/SDK)  Accurate algorithms  Visualization (data and results)  Easy to deploy to production
  • 17. 17 H2O PLATFORM Screen shot for H2O platform web API
  • 18. 18 H2O PLATFORM: GLM MODEL Screen shot for the CPA model using the GLM algorithm.
  • 19. 19 SCORE CALIBRATION Calibrate Model Scores  Find best threshold from AUC  Ad server attributes a conversion to the last impression  RTB needs to deliver certain amount of impressions per day  There is a trade-off between wasting impressions and winning conversions.
  • 20. 20 BUILDING A CPA MODEL RETARGETED VISITS AS A PROXY FOR CONVERSIONS USER-CENTRIC Focus on RT Users Deliver Ads at the optimal times BETTER PERFORMANCE Leverage optimization opportunities OPTIMAL TIME Target Users Who Likely Convert DON’T WASTE IMP.
  • 21. 21 LIVE TEST ON A CAR INSURANCE CAMPAIGN TESTED FOR TWO MONTHS AND MEASURED THE PERFORMANCE BY DFA. The CPA test for a car insurance campaign showed 58% improvement on eCPA and 57% on conversion rate (CVR).
  • 22. 22 LIVE TESTS ON DIFFERENT CAMPAIGNS OBSERVED CPA LIFT
  • 23. 23 ONGOING WORK  Tests are expensive and time consuming  We need to evaluate models before deploying to production  Build many models and evaluate them offline  Different datasets  Different features  Different algorithms
  • 24. 24 COMBINING ESTIMATORS GRADIENT BOOSTING MACHINE Let denote categorical features. Goal Estimate CVR using an ensemble of weak prediction models, decision trees: Gradient boosting combines weak learners into a single strong learner, in an iterative fashion.
  • 25. 25 MODEL COMPARISON Comparing AUC plots of GBM and RF models on test data:
  • 26. 26 OFFLINE SIMULATIONS Comparing AUC plots of GBM and RF models on test data:
  • 27. 27 OFFLINE SIMULATIONS Selecting models in practice  Accuracy of prediction on unseen data  Scoring time at production  Remove anomalies using Deep Learning  Correlations with other campaign KPIs (CTR, Brand lift, Viewability, Winning Price, …)  Performance Stability
  • 28. 28 EVALUATION ON IMPRESSION DATA Correlation of GBM model scores with CTR
  • 29. 29 EVALUATION ON IMPRESSION DATA Correlation of GBM model scores with average winning bid price
  • 30. 30 GBM MODEL TESTS vs GLM MODEL CONTROL A/B TEST: OBSERVED CPA LIFT
  • 31. 31 CONCLUSION How H2O helped us?  Maximized ROI by optimizing campaign performance and budget allocation.  Empowered advanced ML algorithms in Hadoop cluster  Used all data and build models much faster  Reduced R&D time significantly  Building a smooth model building pipeline (R and Spark API)
  • 32. ACKNOWLEDGEMENT THE TEAM: Prasanta Behera Xibin Chen Wahid Chrabakh Jinghao Miao Hassan Namarvar Yan Qu THANK YOU!
  • 33. SHARETHIS IS HIRING! Please check out: www.sharethis.com/about/careers Q&A

Editor's Notes

  1. You’ve probably seen and used the ShareThis widget and tools … which isn’t surprising. We allow content to be shared seamlessly at nearly ubiquitous scale web-wide. As consumers are celebrating, entertaining and educating their circles of friends, colleagues and community members, ShareThis is at the center of each of those moments … I can share an example of one of those moments and how it leads to your brand engaging with one of your customers …
  2. Let’s take Missy, for example. As Missy is browsing the web, she comes across an article about laptops. Now because she’s just started her research for that new laptop before heading off to college in the fall, she’s going to post and email that article to various networks. ShareThis observes this share as well as the downstream social activity which allows us to effectively deliver the right message tailored on your behalf. Let’s explore more about how this happens …
  3. With our category shareblock, you can surround entire categories or subverticals that matter most to your brand. As an example, EVERY DAY nearly 1M social actions happen around technology-related topics. Here, you could immerse your brand into that technology content & the actions that accompany it. GET READY FOR MUTLIPLE FORMATS AND SCREENS …
  4. MOVING AWAY FROM OUTDATED AUDIENCE TARGETING BUCKETS – TO UTILIZING “FRESHER” REAL-TIME DATA . Other companies use standard audience targeting and bucket Dan as a “tech enthusiast”, we message him at the moments when it’s most relevant.