Presentation for School of Continuing Studies
Data Science / Engineering
Section I: Advertising Technology Landscape
About Me
- Name: Larkin Liu
- Role: Data Scientist @ StackAdapt since 2016
- Specialties: Apache Spark, Scala, Python, R
- Education: MASc in Industrial Engineering, Specializing
in Operations Research, University of Toronto
- Other Fun Facts:
- Chinese / Canadian
- Competitive MMA fighter, and kickboxer.
- I really like race cars.
What I Do
Agenda
Increase Profitability of Campaigns
- Ad Tech Landscape
- ML Models
- Logistic Regression
- Bagging: Random Forest
- Boosting: AdaBoost (Gradient Boosted Trees, XGBoost)
- Survival Regression (Proportional Hazards, Accelerated Failure Time Model)
- (Natural Language Processing)
- AB Testing
- RTB Auction Strategy
Real Time Bidding
- Online advertising goes through a process known as Real Time Bidding (RTB).
- StackAdapt is a Demand Side Platform (DSP).
- DSPs interface with clients running advertising campaigns on one side and with the Ad Exchange on the other.
- Our objective is to win valuable ad impressions for our clients' campaigns.
Overview (Objectives)
- The ad exchange runs a second-price auction.
- We bid on advertisements that are valuable to our clients.
- To accomplish this, we predict the likelihood of a defined conversion using ML models.
- We set our bid price proportional to our predicted probability of a conversion.
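A minimal sketch of the bidding idea above, assuming a hypothetical `value_per_conversion` scaling factor and made-up competing bids; neither comes from the slides. It shows a bid set proportional to the predicted conversion probability and the price paid when winning a second-price auction.

```python
def bid_price(p_conversion, value_per_conversion):
    """Bid proportional to the predicted conversion probability.

    value_per_conversion is a hypothetical scaling constant (an assumption).
    """
    if not 0.0 <= p_conversion <= 1.0:
        raise ValueError("p_conversion must be in [0, 1]")
    return p_conversion * value_per_conversion


def second_price_outcome(our_bid, competing_bids):
    """Return (won, price_paid) for a sealed-bid second-price auction.

    The winner pays the highest competing bid, not its own bid.
    """
    highest_other = max(competing_bids, default=0.0)
    if our_bid > highest_other:
        return True, highest_other
    return False, 0.0


# Illustrative numbers only.
bid = bid_price(p_conversion=0.02, value_per_conversion=50.0)
won, paid = second_price_outcome(bid, [0.40, 0.75, 0.60])
print(bid, won, paid)
```

In this setup the win price (actual cost) can be lower than the bid price, which is why the two are tracked separately.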
Key Terms
- KPI - Key Performance Indicator
- Win Price - the price paid for the advertisement on the ad exchange (the actual cost).
- Bid Price - what the DSP bid for the advertisement.
- CPM - Cost per Mille, cost per 1000 impressions.
- eCPC - Effective cost per click (total cost/number of clicks)
- eCPE - Effective cost per engagement (total cost/number of engagements)
- eCPA - Effective cost per action (total cost/number of conversions)
- AB Testing - split testing of algorithms between a control group and a treatment group.
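The cost metrics above are simple ratios. The following sketch computes them from campaign totals; the totals are hypothetical values chosen for illustration, not figures from the deck.

```python
def cpm(total_cost, impressions):
    """Cost per mille: cost per 1000 impressions."""
    return 1000.0 * total_cost / impressions


def effective_cost(total_cost, events):
    """eCPC, eCPE or eCPA depending on which event count is passed in.

    Returns None when there are no events, since the ratio is undefined.
    """
    return total_cost / events if events else None


# Hypothetical campaign totals (assumptions for illustration).
total_cost = 120.0
impressions = 60000
clicks = 150
conversions = 6

campaign_cpm = cpm(total_cost, impressions)
ecpc = effective_cost(total_cost, clicks)
ecpa = effective_cost(total_cost, conversions)
print(campaign_cpm, ecpc, ecpa)
```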
Expectation
Initially, we believed that each optimizer we designed would have the desired effect on the intermediate KPIs (CTR, eCPC, eCPE, eCPA, etc.), which in turn would affect the overall profit of each campaign.
Reality
In reality, we discovered that the effect of each optimizer on the various intermediate KPIs follows a more complex interaction scheme, which also depends on market dynamics.
Data Science / Engineering
Section II: ML Models
Logistic Regression
We interpret the probability $p_i$, provided predictor variables $x_{0,i}, x_{1,i}, \ldots, x_{m,i}$.
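Univariate logistic regression model F(x), with intercept $\beta_0$ and coefficient $\beta_1$:
$$F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
This can be rewritten so that the left-hand side is the log odds, where F(x) is interpreted as the probability of response = 1 (p):
$$\ln\frac{F(x)}{1 - F(x)} = \beta_0 + \beta_1 x$$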
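As a small illustration of the univariate model, the sketch below evaluates the standard logistic function and checks that taking the log odds of its output recovers the linear predictor. The coefficient names `beta0` and `beta1` and their values are assumptions for illustration.

```python
import math


def logistic(x, beta0, beta1):
    """F(x): probability of response = 1 under a univariate logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients.
beta0, beta1 = -1.0, 0.5

p = logistic(3.0, beta0, beta1)
# The log odds of F(x) equal the linear predictor beta0 + beta1 * x.
print(p, log_odds(p), beta0 + beta1 * 3.0)
```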
Logistic Regression with
Interaction (IX) Terms
- A basic (main-effects) logistic regression assumes that each variable affects the log odds independently of the others. This is not the case in our data set.
- Interaction terms take into account the interaction between variables. For example, variables X and Z may not be independent, and the interaction between X and Z may produce its own effect on the log odds.
- When deploying logistic regression to predict key KPIs, adding interaction terms is crucial for accurate prediction, because the variables are not independent and the interaction between them may have a key effect on the predicted KPIs.
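A hedged sketch of what an interaction term does, using two predictors X and Z and hypothetical coefficients (`b0`, `b_x`, `b_z`, `b_xz` are illustrative names, not from the slides). With a non-zero interaction coefficient, the effect of X on the log odds depends on the value of Z.

```python
import math


def predict_proba(x, z, b0, b_x, b_z, b_xz=0.0):
    """Logistic model with main effects and an optional X*Z interaction term."""
    eta = b0 + b_x * x + b_z * z + b_xz * x * z
    return 1.0 / (1.0 + math.exp(-eta))


def log_odds_effect_of_x(z, b_x, b_xz):
    """Change in log odds per unit increase in X, at a given Z."""
    return b_x + b_xz * z


# Hypothetical coefficients: without the interaction the effect of X is
# constant; with it, the effect changes with Z.
print(log_odds_effect_of_x(z=0.0, b_x=0.3, b_xz=0.5))
print(log_odds_effect_of_x(z=1.0, b_x=0.3, b_xz=0.5))
print(predict_proba(1.0, 1.0, b0=-1.0, b_x=0.3, b_z=0.2, b_xz=0.5))
```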
AdaBoost
- Adaptive Boosting (AdaBoost) is a well-established boosting algorithm.
- Unlike bagging, which trains its trees independently, it trains weak classifiers sequentially and produces a weighted linear combination of their results.
- Each weak classifier is trained on the entire dataset.
- After each round, misclassified observations are given more weight and correctly classified observations less weight, depending on the results of each weak classifier.
- Boosting can overcome the inherent limitations of a specific class of weak classifiers, and can help mitigate the effect of class imbalance.
AdaBoost
Illustrative example of boosting.
AdaBoost Algorithm
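The following is a minimal sketch of discrete AdaBoost with one-dimensional decision stumps as weak classifiers, written for illustration rather than as the production implementation. The toy data set and the choice of stumps are assumptions; labels are in {-1, +1}.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` when x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Pick the stump with the lowest weighted error on the full data set."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, polarity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Increase weights of misclassified points, decrease the others.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted linear combination of weak classifiers."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

Each stump is fitted to the whole data set; only the observation weights change between rounds.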
Random Forest
- Random Forest (Breiman 2001) is a well-established bagging algorithm that allows us to perform both classification and regression.
- An extension of the decision tree algorithm, RF combines a random sample of the data with a random sample of the features, and aggregates the results of many small, weak predictors.
- This approach makes RF much more robust than a single decision tree and helps reduce overfitting.
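To illustrate the two sources of randomness, the sketch below draws a bootstrap sample of rows and a random subset of features for one tree, and combines tree outputs by majority vote. For simplicity it draws the feature subset once per tree, whereas Breiman's Random Forest re-draws candidate features at each split; the function names and sizes are hypothetical.

```python
import random
from collections import Counter


def draw_tree_sample(n_rows, n_features, feature_fraction, rng):
    """Return (row indices, feature indices) for training a single tree.

    Rows are sampled with replacement (bootstrap); features without.
    """
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    k = max(1, round(feature_fraction * n_features))
    features = sorted(rng.sample(range(n_features), k))
    return rows, features


def majority_vote(predictions):
    """Combine the class predictions of many trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # seeded for reproducibility
rows, features = draw_tree_sample(n_rows=1000, n_features=30,
                                  feature_fraction=0.33, rng=rng)
print(len(rows), len(set(rows)), features)
print(majority_vote([1, 0, 1, 1, 0]))
```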
Survival Regression
- Proportional Hazards Model
- Accelerated Failure Time Model
- Models were evaluated using the Akaike Information Criterion (AIC) and Root Mean Square Error (RMSE).
- Primarily used to model how long users remain on a site (time on site). The survival probability, i.e. the probability that a user is still on the site after a given time, decreases as that time grows.
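The two evaluation criteria named above can be sketched as follows, using their standard definitions. The log-likelihood, parameter count, and time-on-site values are hypothetical inputs, not results from the deck.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical fitted-model summary and time-on-site values (seconds).
print(aic(log_likelihood=-100.0, n_params=3))
print(rmse([30.0, 120.0, 300.0], [45.0, 100.0, 310.0]))
```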
Survival Modelling
- We used a Random Forest model with the following parameters:
- m: 33% of the total number of features
- Number of trees: 100
- Max depth: 10 layers
- Average RMSE across 10-fold cross-validation: 145 (a 25% improvement over the survival models investigated earlier).
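A rough sketch of how an average RMSE across k folds is computed. A mean-of-training-targets baseline stands in for the Random Forest here, which is purely an assumption to keep the example self-contained; the target values are made up.

```python
import math


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cv_rmse_mean_baseline(ys, k=10):
    """Average per-fold RMSE of a model that predicts the training mean."""
    scores = []
    for train, test in k_fold_indices(len(ys), k):
        mean = sum(ys[i] for i in train) / len(train)
        mse = sum((ys[i] - mean) ** 2 for i in test) / len(test)
        scores.append(math.sqrt(mse))
    return sum(scores) / len(scores)


# Hypothetical time-on-site targets (seconds).
ys = [float(10 * i % 70 + 20) for i in range(25)]
print(cv_rmse_mean_baseline(ys, k=10))
```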
Data Science / Engineering
Section III: RTB Deployment
AB Testing
- Currently our tests run 50/50 splits (S = 0.5): 50% of bid requests go to the A group (control) and 50% go to the B group (experimental treatment).
- Our goal is to maximize profit and minimize eCPC, which we aim to achieve by deploying our ML models.
- However, the effect of any model on any specific campaign can vary.
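One common way to implement such a split is deterministic hashing of a request identifier, sketched below. The hashing scheme, the `request_id` format, and the convention that S is the share sent to group B are all assumptions, not details from the slides.

```python
import hashlib


def assign_group(request_id, split=0.5):
    """Assign a bid request to "B" (experimental) with probability `split`.

    The same request_id always maps to the same group.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2 ** 64  # uniform in [0, 1)
    return "B" if u < split else "A"


# Hypothetical request identifiers.
groups = [assign_group(f"request-{i}", split=0.5) for i in range(10000)]
print(groups.count("A"), groups.count("B"))
```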
EMR-AB13-IX-5day-dailyUpdate
- Experimental Model Avg eCPC: 0.819
- Control Group eCPC: 0.833
- Experimental Model Profit: 2457.11
- Control Group Profit: 2773.32
EMR-AB14-mean_encoded_logistic_regression
- Experimental Model Avg eCPC: 1.246
- Control Group eCPC: 0.675
- Experimental Model Profit: 2285.23
- Control Group Profit: 1086.80
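The two tests can be compared on a relative basis. The sketch below computes the relative change of the experimental model against the control for each KPI, using only the figures reported above.

```python
def relative_change(experimental, control):
    """Relative difference of the experimental value versus the control."""
    return (experimental - control) / control


# (experimental, control) pairs as reported for each test.
results = {
    "EMR-AB13-IX-5day-dailyUpdate": {
        "eCPC": (0.819, 0.833),
        "profit": (2457.11, 2773.32),
    },
    "EMR-AB14-mean_encoded_logistic_regression": {
        "eCPC": (1.246, 0.675),
        "profit": (2285.23, 1086.80),
    },
}

for test_name, kpis in results.items():
    for kpi, (experimental, control) in kpis.items():
        change = relative_change(experimental, control)
        print(f"{test_name} {kpi}: {change:+.1%}")
```

In the first test the experimental model has a slightly lower eCPC (about 1.7%) but lower profit (about 11.4%); in the second it has a much higher eCPC (about 84.6%) and yet roughly double the profit (about 110.3% higher).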
But wait
- Models perform differently with regard to the various KPIs, and on a campaign-specific basis.
- Solution: a larger proportion of bid requests should go to the model with better KPI performance.
RTB Optimizer
Our Min*/Max* framework is based on a PID controller: we adjust the split (S) according to how far we are from our objective of attaining a minimum or maximum value.
- Proportional: Immediate Error
- Integral: Cumulative Error
- Derivative: Rate of Change
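A minimal sketch of a PID controller adjusting the split, under stated assumptions: S is taken to be the share of bid requests sent to the experimental model, the error signal is some KPI gap supplied by the caller (for example control eCPC minus experimental eCPC), and the gains `kp`, `ki`, `kd` are hypothetical. It is not the Min*/Max* framework itself.

```python
class SplitPID:
    """PID controller that nudges the traffic split S within [0, 1]."""

    def __init__(self, kp, ki, kd, split=0.5):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.split = split
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt=1.0):
        """Apply one control step and return the new split."""
        self.integral += error * dt                       # cumulative error
        if self.prev_error is None:
            derivative = 0.0                              # no history yet
        else:
            derivative = (error - self.prev_error) / dt   # rate of change
        self.prev_error = error
        adjustment = (self.kp * error                     # immediate error
                      + self.ki * self.integral
                      + self.kd * derivative)
        self.split = min(1.0, max(0.0, self.split + adjustment))
        return self.split


# Hypothetical gains and error sequence.
pid = SplitPID(kp=0.1, ki=0.01, kd=0.05)
first = pid.update(1.0)
second = pid.update(0.5)
print(first, second)
```

A positive error moves more traffic to the experimental model and a negative error moves it back; clamping keeps S a valid proportion.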
References
- Zhang, Weinan, “Optimal Real-Time Bidding for Display Advertising”, 2016
- Freund & Schapire, “Experiments with a New Boosting Algorithm”, 1996
QUESTIONS

StackAdapt Machine Learning Pipeline
