SlideShare a Scribd company logo
1 of 6
Base paper Title: A Comparative Analysis of Sampling Techniques for Click-Through Rate
Prediction in Native Advertising
Abstract
Native advertising is a popular form of online advertisements that has similar styles and
functions with the native content displayed on online platforms, such as news, sports and social
websites. It can better capture users’ attention, and they have gained increasing popularity in
many online platforms and among advertisers. In advertising, Click Trough Rate (CTR)
prediction is essential but challenging due to data sparsity: the non-clicks constitute most of
the data, whereas clicks form a significantly smaller portion. The performance of 19 class
imbalance approaches is compared in this study with the use of four traditional classifiers, to
determine the most effective imbalance methods for our native ads dataset. The data used is
real traffic data from Finland over the course of seven days provided by the native advertising
platform ReadPeak. The resampling methods used include seven undersampling techniques,
four oversampling techniques, four hybrid sampling techniques, and four ensemble systems.
The findings demonstrate that class imbalance learning can enhance the model’s capacity for
classification by as much as 20%. In general, oversampling is more stable comparatively. But,
undersampling performed the best with Random Forest. Our study also demonstrates that the
imbalance ratio plays an important role in the performance of the model and the features
importance.
Existing System
In a class imbalanced dataset, there are significantly less samples in one of its classes
than the other [3]. The difficulties of learning from such imbalanced data are inevitable.
Standard learning classifiers are biased toward the majority class due to the skewed distribution
of the training samples, making them unable to recognize unusual occurrences. It is possible to
mistake noise for rare minority samples and vise-versa [24]. This kind of problem with uneven
data is particularly prevalent in the advertising industry. The dataset contains a lot more non-
clicked advertising than clicked advertisements, and the difference between the two is typically
significant. To solve these issues, researchers have created numerous class imbalance
techniques and performance evaluation criteria that play an important role in this paper. In this
context, we implemented several class imbalance methods on a real-world class imbalanced
dataset provided by the native advertising platform ReadPeak [1]. The dataset used contains
information about the in-screen advertisements and whether they have been clicked or not.
According to the dataset, there is an extreme imbalance between clicks and non-clicks, with
250 non-clicks for every 1 click. Advertising is a key and crucial part of corporate operations.
In 2021, advertisers are estimated to have spent 118.72 billion dollars on display advertising
[52]. Demand-side platform (DSP)’s cost per click pricing model makes advertisers earnings
directly correlated to the number of clicks. Predicting performance indicators such as the click-
through rate (CTR) is crucial for DSP research [35]. The available literature presents a valid
solution to the problem at hand, but addressing the unbalance of data specifically and the
benefits we can get either in terms of training speed and prediction accuracy has been neglected.
In that spirit, the purpose of the current work is to explore the effect of sampling techniques on
prediction accuracy with the combination of a number of classical machine learning models
applied to the prediction of CTR in native advertisement. The contributions of this paper are
summarized as follows: • The performance of 19 class imbalance methods with 4 classical
classifiers is evaluated. The best performing methods are identified based on the AUC and
LogLoss metrics. • An analysis of the unbalanced data problem was conducted using real-world
data from ReadPeak, including complete outline of the steps for feature engineering, selection,
and data cleaning. • Through experimentation, it was found that feature importance varies with
different Imbalance Ratios. The Boruta algorithm was used to compute feature importance
using three levels of sampling, highlighting the importance of balancing the data for accurate
results. • Using Random Forest and random undersampling, it is shown that as the balance
between positive and negative samples is improved, the performance also improves. However,
there is a threshold of imbalance ratio beyond which the performance improvement becomes
marginal. • We available the community access to a real world datasets that can help
researchers study the phenomenon and find solutions to CTR Prediction in the native
advertisement field.
Drawback in Existing System
 Data Representation and Generalization:
Drawback: The choice of a particular sampling technique may lead to biases in the
representation of the dataset. Oversampling or undersampling may impact the generalization
of the models to real-world scenarios, especially when the dataset is not fully representative of
the target population.
 Impact on Model Complexity:
Drawback: Some sampling techniques may inadvertently impact the complexity of the
predictive models. For instance, oversampling may lead to overfitting, especially when dealing
with small datasets, while undersampling might result in loss of valuable information.
 Limited Transparency:
Drawback: Some advanced sampling techniques, especially those involving synthetic
data generation, may lack transparency in terms of how they modify the underlying data
distribution. This can make it challenging to interpret the model's behavior and decision-
making processes.
 Impact on Evaluation Metrics:
Drawback: The choice of sampling technique may influence the performance metrics
used for evaluation. For instance, oversampling may artificially boost metrics like accuracy
but may not necessarily improve the model's ability to generalize to new, unseen data.
Proposed System
 Sampling Techniques:
Implement and analyze various sampling techniques, including but not limited to
random sampling, stratified sampling, undersampling, oversampling, and hybrid methods.
Investigate advanced techniques such as SMOTE (Synthetic Minority Over-sampling
Technique) and ADASYN (Adaptive Synthetic Sampling).
 Predictive Models:
Employ state-of-the-art machine learning models for click-through rate prediction,
including logistic regression, decision trees, random forests, and gradient boosting
machines.
Utilize deep learning models, such as neural networks, to assess the impact of sampling
techniques on more complex predictive architectures.
 Evaluation Metrics:
Measure the performance of each sampling technique using standard evaluation metrics
such as precision, recall, F1 score, and area under the receiver operating characteristic curve
(AUC-ROC).
Assess the impact on model calibration and reliability.
 Comparison and Recommendations:
Present a comparative analysis of the various sampling techniques, highlighting their
strengths, weaknesses, and suitability for different scenarios.
Offer recommendations for selecting an appropriate sampling strategy based on specific
campaign characteristics and data distributions.
Algorithm
 Stratified Sampling:
Algorithm: Stratified sampling based on target classes.
Description: Divides the dataset into strata (subgroups) based on target classes and
samples proportionally from each stratum, ensuring representation across classes.
 ADASYN (Adaptive Synthetic Sampling):
Algorithm: ADASYN algorithm.
Description: An adaptive extension of SMOTE that focuses on generating synthetic
instances for regions of the minority class that are harder to learn.
 Cluster-Based Sampling:
Algorithm: K-means clustering for cluster-based sampling.
Description: Groups instances into clusters and samples proportionally from each
cluster, facilitating localized sampling.
Advantages
 Addressing Class Imbalance:
Advantage: Class imbalance is a common issue in CTR prediction, where the number
of non-click instances often significantly outweighs click instances. Sampling techniques,
such as oversampling or undersampling, help balance class distribution, ensuring that the
model is not biased towards the majority class.
 Enhanced Generalization to Unseen Data:
Advantage: Effective sampling techniques contribute to the generalization of
predictive models to new, unseen data. This is crucial in native advertising where the goal
is to predict CTR for new ad campaigns.
 Mitigation of Bias and Unfairness:
Advantage: Carefully chosen sampling techniques can help mitigate biases in predictive
models, ensuring fair representation and treatment of different classes. This is essential for
ethical considerations in advertising.
 Increased Interpretability:
Advantage: Certain sampling techniques, like undersampling or clustering-based
sampling, may lead to a reduction in the number of instances, making it easier to interpret
model decisions. Enhanced interpretability is valuable for stakeholders who need to
understand the factors influencing CTR predictions.
Software Specification
 Processor : I3 core processor
 Ram : 4 GB
 Hard disk : 500 GB
Software Specification
 Operating System : Windows 10 /11
 Frond End : Python
 Back End : Mysql Server
 IDE Tools : Pycharm

More Related Content

Similar to A Comparative Analysis of Sampling Techniques for Click-Through Rate Prediction in Native Advertising.

direct marketing in banking using data mining
direct marketing in banking using data miningdirect marketing in banking using data mining
direct marketing in banking using data miningHossein Malekinezhad
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Applicationaciijournal
 
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONaciijournal
 
Causal Attribution - Proposing a better industry standard for measuring digit...
Causal Attribution - Proposing a better industry standard for measuring digit...Causal Attribution - Proposing a better industry standard for measuring digit...
Causal Attribution - Proposing a better industry standard for measuring digit...Peter Weingard
 
Ieee transactions on 2018 knowledge and data engineering topics with abstract .
Ieee transactions on 2018 knowledge and data engineering topics with abstract .Ieee transactions on 2018 knowledge and data engineering topics with abstract .
Ieee transactions on 2018 knowledge and data engineering topics with abstract .tsysglobalsolutions
 
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRYPROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRYIJITCA Journal
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Applicationaciijournal
 
Data Mining on Customer Churn Classification
Data Mining on Customer Churn ClassificationData Mining on Customer Churn Classification
Data Mining on Customer Churn ClassificationKaushik Rajan
 
Customer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesCustomer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesSindhujanDhayalan
 
Driving Results through Strategic Data Sourcing and Optimization: Life Line G...
Driving Results through Strategic Data Sourcing and Optimization: Life Line G...Driving Results through Strategic Data Sourcing and Optimization: Life Line G...
Driving Results through Strategic Data Sourcing and Optimization: Life Line G...Vivastream
 
Campaign response modeling
Campaign response modelingCampaign response modeling
Campaign response modelingEsteban Ribero
 
Top 20 Data Science Interview Questions and Answers in 2023.pptx
Top 20 Data Science Interview Questions and Answers in 2023.pptxTop 20 Data Science Interview Questions and Answers in 2023.pptx
Top 20 Data Science Interview Questions and Answers in 2023.pptxAnanthReddy38
 
Data Science Meetup 2017
Data Science Meetup 2017Data Science Meetup 2017
Data Science Meetup 2017Adello
 
Machine learning - session 4
Machine learning - session 4Machine learning - session 4
Machine learning - session 4Luis Borbon
 
Simply Data driven behavioural algorithms
Simply Data driven behavioural algorithmsSimply Data driven behavioural algorithms
Simply Data driven behavioural algorithmsNana Bianca
 
Dwdm chapter 5 data mining a closer look
Dwdm chapter 5  data mining a closer lookDwdm chapter 5  data mining a closer look
Dwdm chapter 5 data mining a closer lookShengyou Lin
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfAnanthReddy38
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxVishalLabde
 

Similar to A Comparative Analysis of Sampling Techniques for Click-Through Rate Prediction in Native Advertising. (20)

direct marketing in banking using data mining
direct marketing in banking using data miningdirect marketing in banking using data mining
direct marketing in banking using data mining
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
 
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
 
Causal Attribution - Proposing a better industry standard for measuring digit...
Causal Attribution - Proposing a better industry standard for measuring digit...Causal Attribution - Proposing a better industry standard for measuring digit...
Causal Attribution - Proposing a better industry standard for measuring digit...
 
Ieee transactions on 2018 knowledge and data engineering topics with abstract .
Ieee transactions on 2018 knowledge and data engineering topics with abstract .Ieee transactions on 2018 knowledge and data engineering topics with abstract .
Ieee transactions on 2018 knowledge and data engineering topics with abstract .
 
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRYPROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
 
Data Mining on Customer Churn Classification
Data Mining on Customer Churn ClassificationData Mining on Customer Churn Classification
Data Mining on Customer Churn Classification
 
Customer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesCustomer churn classification using machine learning techniques
Customer churn classification using machine learning techniques
 
Driving Results through Strategic Data Sourcing and Optimization: Life Line G...
Driving Results through Strategic Data Sourcing and Optimization: Life Line G...Driving Results through Strategic Data Sourcing and Optimization: Life Line G...
Driving Results through Strategic Data Sourcing and Optimization: Life Line G...
 
Campaign response modeling
Campaign response modelingCampaign response modeling
Campaign response modeling
 
Top 20 Data Science Interview Questions and Answers in 2023.pptx
Top 20 Data Science Interview Questions and Answers in 2023.pptxTop 20 Data Science Interview Questions and Answers in 2023.pptx
Top 20 Data Science Interview Questions and Answers in 2023.pptx
 
Data Science Meetup 2017
Data Science Meetup 2017Data Science Meetup 2017
Data Science Meetup 2017
 
Manuscript dss
Manuscript dssManuscript dss
Manuscript dss
 
Machine learning - session 4
Machine learning - session 4Machine learning - session 4
Machine learning - session 4
 
Simply Data driven behavioural algorithms
Simply Data driven behavioural algorithmsSimply Data driven behavioural algorithms
Simply Data driven behavioural algorithms
 
Dwdm chapter 5 data mining a closer look
Dwdm chapter 5  data mining a closer lookDwdm chapter 5  data mining a closer look
Dwdm chapter 5 data mining a closer look
 
Media synergy studies_methodology
Media synergy studies_methodologyMedia synergy studies_methodology
Media synergy studies_methodology
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdf
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 

More from Shakas Technologies

A Review on Deep-Learning-Based Cyberbullying Detection
A Review on Deep-Learning-Based Cyberbullying DetectionA Review on Deep-Learning-Based Cyberbullying Detection
A Review on Deep-Learning-Based Cyberbullying DetectionShakas Technologies
 
A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...
A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...
A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...Shakas Technologies
 
A Novel Framework for Credit Card.
A Novel Framework for Credit Card.A Novel Framework for Credit Card.
A Novel Framework for Credit Card.Shakas Technologies
 
NS2 Final Year Project Titles 2023- 2024
NS2 Final Year Project Titles 2023- 2024NS2 Final Year Project Titles 2023- 2024
NS2 Final Year Project Titles 2023- 2024Shakas Technologies
 
MATLAB Final Year IEEE Project Titles 2023-2024
MATLAB Final Year IEEE Project Titles 2023-2024MATLAB Final Year IEEE Project Titles 2023-2024
MATLAB Final Year IEEE Project Titles 2023-2024Shakas Technologies
 
Latest Python IEEE Project Titles 2023-2024
Latest Python IEEE Project Titles 2023-2024Latest Python IEEE Project Titles 2023-2024
Latest Python IEEE Project Titles 2023-2024Shakas Technologies
 
EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER ...
EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER ...EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER ...
EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER ...Shakas Technologies
 
CYBER THREAT INTELLIGENCE MINING FOR PROACTIVE CYBERSECURITY DEFENSE
CYBER THREAT INTELLIGENCE MINING FOR PROACTIVE CYBERSECURITY DEFENSECYBER THREAT INTELLIGENCE MINING FOR PROACTIVE CYBERSECURITY DEFENSE
CYBER THREAT INTELLIGENCE MINING FOR PROACTIVE CYBERSECURITY DEFENSEShakas Technologies
 
Detecting Mental Disorders in social Media through Emotional patterns-The cas...
Detecting Mental Disorders in social Media through Emotional patterns-The cas...Detecting Mental Disorders in social Media through Emotional patterns-The cas...
Detecting Mental Disorders in social Media through Emotional patterns-The cas...Shakas Technologies
 
COMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTION
COMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTIONCOMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTION
COMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTIONShakas Technologies
 
CO2 EMISSION RATING BY VEHICLES USING DATA SCIENCE
CO2 EMISSION RATING BY VEHICLES USING DATA SCIENCECO2 EMISSION RATING BY VEHICLES USING DATA SCIENCE
CO2 EMISSION RATING BY VEHICLES USING DATA SCIENCEShakas Technologies
 
Toward Effective Evaluation of Cyber Defense Threat Based Adversary Emulation...
Toward Effective Evaluation of Cyber Defense Threat Based Adversary Emulation...Toward Effective Evaluation of Cyber Defense Threat Based Adversary Emulation...
Toward Effective Evaluation of Cyber Defense Threat Based Adversary Emulation...Shakas Technologies
 
Optimizing Numerical Weather Prediction Model Performance Using Machine Learn...
Optimizing Numerical Weather Prediction Model Performance Using Machine Learn...Optimizing Numerical Weather Prediction Model Performance Using Machine Learn...
Optimizing Numerical Weather Prediction Model Performance Using Machine Learn...Shakas Technologies
 
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...Shakas Technologies
 
Multi-Class Stress Detection Through Heart Rate Variability A Deep Neural Net...
Multi-Class Stress Detection Through Heart Rate Variability A Deep Neural Net...Multi-Class Stress Detection Through Heart Rate Variability A Deep Neural Net...
Multi-Class Stress Detection Through Heart Rate Variability A Deep Neural Net...Shakas Technologies
 
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...Shakas Technologies
 
Fighting Money Laundering With Statistics and Machine Learning.docx
Fighting Money Laundering With Statistics and Machine Learning.docxFighting Money Laundering With Statistics and Machine Learning.docx
Fighting Money Laundering With Statistics and Machine Learning.docxShakas Technologies
 
Explainable Artificial Intelligence for Patient Safety A Review of Applicatio...
Explainable Artificial Intelligence for Patient Safety A Review of Applicatio...Explainable Artificial Intelligence for Patient Safety A Review of Applicatio...
Explainable Artificial Intelligence for Patient Safety A Review of Applicatio...Shakas Technologies
 
Ensemble Deep Learning-Based Prediction of Fraudulent Cryptocurrency Transact...
Ensemble Deep Learning-Based Prediction of Fraudulent Cryptocurrency Transact...Ensemble Deep Learning-Based Prediction of Fraudulent Cryptocurrency Transact...
Ensemble Deep Learning-Based Prediction of Fraudulent Cryptocurrency Transact...Shakas Technologies
 
Effective Software Effort Estimation Leveraging Machine Learning for Digital ...
Effective Software Effort Estimation Leveraging Machine Learning for Digital ...Effective Software Effort Estimation Leveraging Machine Learning for Digital ...
Effective Software Effort Estimation Leveraging Machine Learning for Digital ...Shakas Technologies
 

More from Shakas Technologies (20)

A Review on Deep-Learning-Based Cyberbullying Detection
A Review on Deep-Learning-Based Cyberbullying DetectionA Review on Deep-Learning-Based Cyberbullying Detection
A Review on Deep-Learning-Based Cyberbullying Detection
 
A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...
A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...
A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...
 
A Novel Framework for Credit Card.
A Novel Framework for Credit Card.A Novel Framework for Credit Card.
A Novel Framework for Credit Card.
 
NS2 Final Year Project Titles 2023- 2024
NS2 Final Year Project Titles 2023- 2024NS2 Final Year Project Titles 2023- 2024
NS2 Final Year Project Titles 2023- 2024
 
MATLAB Final Year IEEE Project Titles 2023-2024
MATLAB Final Year IEEE Project Titles 2023-2024MATLAB Final Year IEEE Project Titles 2023-2024
MATLAB Final Year IEEE Project Titles 2023-2024
 
Latest Python IEEE Project Titles 2023-2024
Latest Python IEEE Project Titles 2023-2024Latest Python IEEE Project Titles 2023-2024
Latest Python IEEE Project Titles 2023-2024
 
EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER ...
EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER ...EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER ...
EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER ...
 
CYBER THREAT INTELLIGENCE MINING FOR PROACTIVE CYBERSECURITY DEFENSE
CYBER THREAT INTELLIGENCE MINING FOR PROACTIVE CYBERSECURITY DEFENSECYBER THREAT INTELLIGENCE MINING FOR PROACTIVE CYBERSECURITY DEFENSE
CYBER THREAT INTELLIGENCE MINING FOR PROACTIVE CYBERSECURITY DEFENSE
 
Detecting Mental Disorders in social Media through Emotional patterns-The cas...
Detecting Mental Disorders in social Media through Emotional patterns-The cas...Detecting Mental Disorders in social Media through Emotional patterns-The cas...
Detecting Mental Disorders in social Media through Emotional patterns-The cas...
 
COMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTION
COMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTIONCOMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTION
COMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTION
 
CO2 EMISSION RATING BY VEHICLES USING DATA SCIENCE
CO2 EMISSION RATING BY VEHICLES USING DATA SCIENCECO2 EMISSION RATING BY VEHICLES USING DATA SCIENCE
CO2 EMISSION RATING BY VEHICLES USING DATA SCIENCE
 
Toward Effective Evaluation of Cyber Defense Threat Based Adversary Emulation...
Toward Effective Evaluation of Cyber Defense Threat Based Adversary Emulation...Toward Effective Evaluation of Cyber Defense Threat Based Adversary Emulation...
Toward Effective Evaluation of Cyber Defense Threat Based Adversary Emulation...
 
Optimizing Numerical Weather Prediction Model Performance Using Machine Learn...
Optimizing Numerical Weather Prediction Model Performance Using Machine Learn...Optimizing Numerical Weather Prediction Model Performance Using Machine Learn...
Optimizing Numerical Weather Prediction Model Performance Using Machine Learn...
 
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
 
Multi-Class Stress Detection Through Heart Rate Variability A Deep Neural Net...
Multi-Class Stress Detection Through Heart Rate Variability A Deep Neural Net...Multi-Class Stress Detection Through Heart Rate Variability A Deep Neural Net...
Multi-Class Stress Detection Through Heart Rate Variability A Deep Neural Net...
 
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...
 
Fighting Money Laundering With Statistics and Machine Learning.docx
Fighting Money Laundering With Statistics and Machine Learning.docxFighting Money Laundering With Statistics and Machine Learning.docx
Fighting Money Laundering With Statistics and Machine Learning.docx
 
Explainable Artificial Intelligence for Patient Safety A Review of Applicatio...
Explainable Artificial Intelligence for Patient Safety A Review of Applicatio...Explainable Artificial Intelligence for Patient Safety A Review of Applicatio...
Explainable Artificial Intelligence for Patient Safety A Review of Applicatio...
 
Ensemble Deep Learning-Based Prediction of Fraudulent Cryptocurrency Transact...
Ensemble Deep Learning-Based Prediction of Fraudulent Cryptocurrency Transact...Ensemble Deep Learning-Based Prediction of Fraudulent Cryptocurrency Transact...
Ensemble Deep Learning-Based Prediction of Fraudulent Cryptocurrency Transact...
 
Effective Software Effort Estimation Leveraging Machine Learning for Digital ...
Effective Software Effort Estimation Leveraging Machine Learning for Digital ...Effective Software Effort Estimation Leveraging Machine Learning for Digital ...
Effective Software Effort Estimation Leveraging Machine Learning for Digital ...
 

Recently uploaded

Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxabhijeetpadhi001
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 

Recently uploaded (20)

Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 

A Comparative Analysis of Sampling Techniques for Click-Through Rate Prediction in Native Advertising.

  • 1. Base paper Title: A Comparative Analysis of Sampling Techniques for Click-Through Rate Prediction in Native Advertising Abstract Native advertising is a popular form of online advertisements that has similar styles and functions with the native content displayed on online platforms, such as news, sports and social websites. It can better capture users’ attention, and they have gained increasing popularity in many online platforms and among advertisers. In advertising, Click Trough Rate (CTR) prediction is essential but challenging due to data sparsity: the non-clicks constitute most of the data, whereas clicks form a significantly smaller portion. The performance of 19 class imbalance approaches is compared in this study with the use of four traditional classifiers, to determine the most effective imbalance methods for our native ads dataset. The data used is real traffic data from Finland over the course of seven days provided by the native advertising platform ReadPeak. The resampling methods used include seven undersampling techniques, four oversampling techniques, four hybrid sampling techniques, and four ensemble systems. The findings demonstrate that class imbalance learning can enhance the model’s capacity for classification by as much as 20%. In general, oversampling is more stable comparatively. But, undersampling performed the best with Random Forest. Our study also demonstrates that the imbalance ratio plays an important role in the performance of the model and the features importance. Existing System In a class imbalanced dataset, there are significantly less samples in one of its classes than the other [3]. The difficulties of learning from such imbalanced data are inevitable. Standard learning classifiers are biased toward the majority class due to the skewed distribution of the training samples, making them unable to recognize unusual occurrences. It is possible to mistake noise for rare minority samples and vise-versa [24]. This kind of problem with uneven data is particularly prevalent in the advertising industry. The dataset contains a lot more non- clicked advertising than clicked advertisements, and the difference between the two is typically significant. To solve these issues, researchers have created numerous class imbalance techniques and performance evaluation criteria that play an important role in this paper. In this context, we implemented several class imbalance methods on a real-world class imbalanced dataset provided by the native advertising platform ReadPeak [1]. The dataset used contains
  • 2. information about the in-screen advertisements and whether they have been clicked or not. According to the dataset, there is an extreme imbalance between clicks and non-clicks, with 250 non-clicks for every 1 click. Advertising is a key and crucial part of corporate operations. In 2021, advertisers are estimated to have spent 118.72 billion dollars on display advertising [52]. Demand-side platform (DSP)’s cost per click pricing model makes advertisers earnings directly correlated to the number of clicks. Predicting performance indicators such as the click- through rate (CTR) is crucial for DSP research [35]. The available literature presents a valid solution to the problem at hand, but addressing the unbalance of data specifically and the benefits we can get either in terms of training speed and prediction accuracy has been neglected. In that spirit, the purpose of the current work is to explore the effect of sampling techniques on prediction accuracy with the combination of a number of classical machine learning models applied to the prediction of CTR in native advertisement. The contributions of this paper are summarized as follows: • The performance of 19 class imbalance methods with 4 classical classifiers is evaluated. The best performing methods are identified based on the AUC and LogLoss metrics. • An analysis of the unbalanced data problem was conducted using real-world data from ReadPeak, including complete outline of the steps for feature engineering, selection, and data cleaning. • Through experimentation, it was found that feature importance varies with different Imbalance Ratios. The Boruta algorithm was used to compute feature importance using three levels of sampling, highlighting the importance of balancing the data for accurate results. • Using Random Forest and random undersampling, it is shown that as the balance between positive and negative samples is improved, the performance also improves. However, there is a threshold of imbalance ratio beyond which the performance improvement becomes marginal. • We available the community access to a real world datasets that can help researchers study the phenomenon and find solutions to CTR Prediction in the native advertisement field. Drawback in Existing System  Data Representation and Generalization: Drawback: The choice of a particular sampling technique may lead to biases in the representation of the dataset. Oversampling or undersampling may impact the generalization of the models to real-world scenarios, especially when the dataset is not fully representative of the target population.
  • 3.  Impact on Model Complexity: Drawback: Some sampling techniques may inadvertently impact the complexity of the predictive models. For instance, oversampling may lead to overfitting, especially when dealing with small datasets, while undersampling might result in loss of valuable information.  Limited Transparency: Drawback: Some advanced sampling techniques, especially those involving synthetic data generation, may lack transparency in terms of how they modify the underlying data distribution. This can make it challenging to interpret the model's behavior and decision- making processes.  Impact on Evaluation Metrics: Drawback: The choice of sampling technique may influence the performance metrics used for evaluation. For instance, oversampling may artificially boost metrics like accuracy but may not necessarily improve the model's ability to generalize to new, unseen data. Proposed System  Sampling Techniques: Implement and analyze various sampling techniques, including but not limited to random sampling, stratified sampling, undersampling, oversampling, and hybrid methods. Investigate advanced techniques such as SMOTE (Synthetic Minority Over-sampling Technique) and ADASYN (Adaptive Synthetic Sampling).  Predictive Models:
  • 4. Employ state-of-the-art machine learning models for click-through rate prediction, including logistic regression, decision trees, random forests, and gradient boosting machines. Utilize deep learning models, such as neural networks, to assess the impact of sampling techniques on more complex predictive architectures.  Evaluation Metrics: Measure the performance of each sampling technique using standard evaluation metrics such as precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC-ROC). Assess the impact on model calibration and reliability.  Comparison and Recommendations: Present a comparative analysis of the various sampling techniques, highlighting their strengths, weaknesses, and suitability for different scenarios. Offer recommendations for selecting an appropriate sampling strategy based on specific campaign characteristics and data distributions. Algorithm  Stratified Sampling: Algorithm: Stratified sampling based on target classes. Description: Divides the dataset into strata (subgroups) based on target classes and samples proportionally from each stratum, ensuring representation across classes.  ADASYN (Adaptive Synthetic Sampling): Algorithm: ADASYN algorithm. Description: An adaptive extension of SMOTE that focuses on generating synthetic instances for regions of the minority class that are harder to learn.
  • 5.  Cluster-Based Sampling: Algorithm: K-means clustering for cluster-based sampling. Description: Groups instances into clusters and samples proportionally from each cluster, facilitating localized sampling. Advantages  Addressing Class Imbalance: Advantage: Class imbalance is a common issue in CTR prediction, where the number of non-click instances often significantly outweighs click instances. Sampling techniques, such as oversampling or undersampling, help balance class distribution, ensuring that the model is not biased towards the majority class.  Enhanced Generalization to Unseen Data: Advantage: Effective sampling techniques contribute to the generalization of predictive models to new, unseen data. This is crucial in native advertising where the goal is to predict CTR for new ad campaigns.  Mitigation of Bias and Unfairness: Advantage: Carefully chosen sampling techniques can help mitigate biases in predictive models, ensuring fair representation and treatment of different classes. This is essential for ethical considerations in advertising.  Increased Interpretability: Advantage: Certain sampling techniques, like undersampling or clustering-based sampling, may lead to a reduction in the number of instances, making it easier to interpret model decisions. Enhanced interpretability is valuable for stakeholders who need to understand the factors influencing CTR predictions.
  • 6. Software Specification  Processor : I3 core processor  Ram : 4 GB  Hard disk : 500 GB Software Specification  Operating System : Windows 10 /11  Frond End : Python  Back End : Mysql Server  IDE Tools : Pycharm