Detecting Adversarial Advertisements in the Wild
Reporter: Youngmi Huang
Report Date: 2019/5/6
Paper Summary
Outline
• Introduction
• What Problem They Solved
• How They Solved
• Summary
Introduction
- Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD (2011)
- Situation: Online Advertising System
- Google's main source of income was ad revenue (approximately 80%), which grew yearly.
- Types of adversarial advertisement
• Counterfeit goods
• User safety issues
• Phishing
• Unclear or deceptive billing
• Malware
(source) Google (GOOGL) business strategy analysis
What problem they solved
Challenges
• High cost of false positives (FPs) and false negatives (FNs)
• Minority-class and multi-class issues
• Training many models at scale
Goal
• to detect and block adversarial advertisements
• to protect users and ensure that online advertising remains a trustworthy source of commercial information
How they solved: automated and semi-automated
(Figure) System overview. Ad Crawl and Data Feed supply ads to several models, each trained and evaluated on Labeled Ad Data (Ensemble + MapReduce). Model Aggregation combines their outputs. High confidence? yes: Allow to Serve or Block from Serving; no: Ensemble-Aided Sampling sends ads to Domain Experts, supported by Exploratory Tools, Unbiased Metrics, and Human Expert Quality Monitoring. The parts are labeled (I), (II), (III).
How they solved (I): Learning methods (1/2)
• Features
- string-based, page type, crawl-based, link-based, non-textual content-based, advertiser account level, policy-specific, etc.
• Minority-class and multi-class issues
- One-vs-Good Multi-Class Classification
- Learning-to-Rank Methods for Classification (ROC-SVM), trained on labeled pairwise examples (x_i − x_j, +1)
- Cascade Models
(Figure2) Class Structure.
(Figure4) Multi-class Cascade Framework (stages annotated "high recall" and "high precision").
(Figure5) Performance on Cascade Models vs. Single Models: improvement in recall at high precision level.
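The pairwise idea behind the learning-to-rank bullet can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear model, a hinge loss, and plain SGD, and the function names, toy data, and hyperparameters are all hypothetical. Each step samples one bad ad and one good ad and treats their difference vector as a single example with label +1.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise_ranker(bad, good, steps=500, lr=0.1, lam=1e-3, seed=0):
    """SGD on a pairwise hinge loss over (bad - good) difference vectors."""
    rng = random.Random(seed)
    w = [0.0] * len(bad[0])
    for _ in range(steps):
        x_bad, x_good = rng.choice(bad), rng.choice(good)
        diff = [b - g for b, g in zip(x_bad, x_good)]  # example (diff, +1)
        violated = dot(w, diff) < 1.0  # margin check before the update
        w = [wi * (1.0 - lr * lam) for wi in w]  # L2 shrinkage
        if violated:
            w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def pairwise_auc(w, bad, good):
    """Fraction of (bad, good) pairs in which the bad ad scores higher."""
    wins = sum(dot(w, b) > dot(w, g) for b in bad for g in good)
    return wins / (len(bad) * len(good))

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
# The bad class is deliberately the minority.
data_rng = random.Random(42)
bad_ads = [[data_rng.uniform(3, 4), data_rng.uniform(0, 1)] for _ in range(20)]
good_ads = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(200)]

w = train_pairwise_ranker(bad_ads, good_ads)
auc = pairwise_auc(w, bad_ads, good_ads)
print("weights:", w, "pairwise AUC:", auc)
```

Because every update draws one example from each class, the minority class is seen as often as the majority class, which is one reason pairwise training suits rare-class problems.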
How they solved (I): Learning methods (2/2)
• Training many models at scale
- Focus on scalability and engineering work
- MapReduce SGD
- Control model size (feature hashing + projected gradient)
(Figure6) SGD learning via MapReduce: do the expensive work in parallel and the cheap work sequentially. Preprocessing is parallelized; training is sequential.
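A minimal sketch of this split, under stated assumptions: a thread pool stands in for the map phase, logistic-loss SGD for the sequential phase, CRC32 for the hash function, and an L1 ball for the projection set. The slide does not specify any of these, and the bucket count, radius, and toy ad texts are hypothetical.

```python
import math
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 2 ** 16  # hypothetical size of the hashed feature space

def hash_features(tokens):
    """Feature hashing: map string features into a fixed number of buckets."""
    x = {}
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % NUM_BUCKETS
        x[idx] = x.get(idx, 0.0) + 1.0
    return x

def preprocess_shard(shard):
    """Expensive per-example work; shards are independent, so run in parallel."""
    return [(hash_features(text.split()), label) for text, label in shard]

def project_l1(w, radius):
    """Euclidean projection of a sparse weight dict onto an L1 ball."""
    if sum(abs(v) for v in w.values()) <= radius:
        return w
    mags = sorted((abs(v) for v in w.values()), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, m in enumerate(mags, start=1):
        cumsum += m
        t = (cumsum - radius) / k
        if m > t:
            theta = t  # soft-threshold from the last index that passes
    out = {}
    for i, v in w.items():
        shrunk = abs(v) - theta
        if shrunk > 0:  # weights shrunk to zero are dropped, keeping w sparse
            out[i] = math.copysign(shrunk, v)
    return out

def train_sequential(examples, epochs=20, lr=0.5, radius=5.0):
    """Cheap sequential phase: logistic-loss SGD with projection each step."""
    w = {}
    for _ in range(epochs):
        for x, y in examples:  # y is 1 for a bad ad, 0 for a good ad
            z = sum(w.get(i, 0.0) * v for i, v in x.items())
            p = 1.0 / (1.0 + math.exp(-z))
            for i, v in x.items():
                w[i] = w.get(i, 0.0) - lr * (p - y) * v
            w = project_l1(w, radius)
    return w

def score(text, w):
    return sum(w.get(i, 0.0) * v for i, v in hash_features(text.split()).items())

# Hypothetical shards of (ad text, label) pairs.
shards = [
    [("cheap replica watches", 1), ("official store shoes", 0)],
    [("replica handbags cheap", 1), ("official brand shoes", 0)],
]
with ThreadPoolExecutor(max_workers=2) as pool:
    processed = list(pool.map(preprocess_shard, shards))
examples = [ex for shard in processed for ex in shard]
weights = train_sequential(examples)
print(score("cheap replica", weights), score("official shoes", weights))
```

Hashing bounds the number of weights regardless of vocabulary size, and the projection step bounds their total magnitude, so model size stays controlled from both sides.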
• Model management
- Calibration
- Monitoring
- whether model performance is good (precision/recall; no production)
- whether input features are stable (if not, re-tune the model)
- whether model output scores (ground truth y) drift (be aware of drift; re-tune the model)
- system quality, based on the pipeline (ensemble-aided stratified sampling)
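One way to picture the score-drift check is to compare a histogram of recent model scores with a baseline histogram. The sketch below is an illustration only: the slide does not say how drift is measured, so the total variation distance, the bin count, and the alert threshold are assumptions, and scores are assumed to be calibrated to the range [0, 1].

```python
def score_histogram(scores, num_bins=10):
    """Normalized histogram of scores assumed to lie in [0, 1]."""
    counts = [0] * num_bins
    for s in scores:
        idx = min(int(s * num_bins), num_bins - 1)
        counts[idx] += 1
    return [c / len(scores) for c in counts]

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def has_drifted(baseline_scores, recent_scores, threshold=0.2):
    """Flag drift when the two score distributions differ by more than threshold."""
    distance = total_variation(
        score_histogram(baseline_scores), score_histogram(recent_scores)
    )
    return distance > threshold

# Hypothetical data: a uniform baseline and a batch concentrated near 1.0.
baseline = [i / 100 for i in range(100)]
shifted = [0.9 + i / 1000 for i in range(100)]
print(has_drifted(baseline, baseline), has_drifted(baseline, shifted))
```

A flagged model would then be a candidate for re-tuning, as the monitoring bullets suggest.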
How they solved (II): Ensemble-aided stratified sampling
(Figure7) Ensemble-Aided Stratified Sampling.
• The multiple needs from hand-labeled data:
- Catching hard adversaries
- Improving learned models
- Detecting new trends
• Ensemble-aided stratification
- Divided into 3 categories
- Scores from the ensemble model are used to divide the ads in each category into score bins containing different numbers of ads.
- How many ads to select from each bin depends on the goals above.
• Priority sampling from bins
- Impression counts follow a heavy-tailed distribution
- Priority Sampling ads from bins (Duffield et al.)
- near-optimally low variance
• Increased the effective impact of human experts by 50%
→ selecting from new & all others
→ mid-probability
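The binning and sampling steps can be sketched as follows. Priority sampling is written as described by Duffield et al.: each item gets priority weight / u with u uniform on (0, 1], the k highest priorities are kept, and each kept item is assigned the estimate max(weight, τ), where τ is the (k+1)-th highest priority. Everything else is an assumption for illustration: the field names, the five bins, the per-bin quotas (heavier in the middle bins), and the synthetic Pareto-distributed impression counts.

```python
import random

def priority_sample(items, k, rng):
    """items: list of (ad_id, weight), weight > 0.
    Returns up to k pairs (ad_id, estimated_weight)."""
    if k >= len(items):
        return list(items)
    prioritized = []
    for ad_id, weight in items:
        u = 1.0 - rng.random()  # uniform on (0, 1]
        prioritized.append((weight / u, ad_id, weight))
    prioritized.sort(key=lambda t: t[0], reverse=True)
    tau = prioritized[k][0]  # (k+1)-th highest priority
    return [(ad_id, max(weight, tau)) for _, ad_id, weight in prioritized[:k]]

def stratify(ads, num_bins):
    """Group ads into bins by ensemble score, assumed to lie in [0, 1]."""
    bins = [[] for _ in range(num_bins)]
    for ad in ads:
        idx = min(int(ad["score"] * num_bins), num_bins - 1)
        bins[idx].append((ad["id"], ad["impressions"]))
    return bins

def sample_for_labeling(ads, per_bin, rng):
    """Draw per_bin[i] ads from score bin i, weighted by impressions."""
    selected = []
    for bin_items, k in zip(stratify(ads, len(per_bin)), per_bin):
        selected.extend(priority_sample(bin_items, k, rng))
    return selected

# Hypothetical ads with heavy-tailed impression counts.
rng = random.Random(7)
ads = [
    {"id": n, "score": rng.random(), "impressions": rng.paretovariate(1.2)}
    for n in range(500)
]
per_bin = [2, 5, 10, 5, 2]  # assumed quotas, favoring mid-probability bins
selected = sample_for_labeling(ads, per_bin, rng)
print(len(selected), "ads selected for expert review")
```

Summing the estimated weights within a bin gives an unbiased estimate of that bin's total impressions, which is what makes the sample usable for metrics as well as for labeling.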
How they solved (III): Leveraging Expert Knowledge & Data Quality Evaluation
• Active Learning
- Periodically detect new categories of bad ads
- Margin-based uncertainty sampling (crowd-sourcing & experts)
- When to stop?
• Exploring Adversaries
- Information retrieval system
• Rule-Based Model
- Rule-based models account for only 4% of the overall system impact, but they provide an important capability: responding to new adversarial attacks within minutes of discovery.
• Monitor
- Human rater quality
- User experience
Actively select hard samples through algorithms
(source): Active Learning: a solution for reducing the time, space, and economic cost of deep learning
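Margin-based uncertainty sampling can be sketched in a few lines, assuming a linear scorer: the ads whose scores lie closest to the decision boundary are sent for labeling first. The weights, bias, ad identifiers, and feature vectors below are hypothetical.

```python
def margin(w, bias, x):
    """Signed distance-like score of x under a linear model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias

def select_uncertain(w, bias, unlabeled, k):
    """Return the ids of the k unlabeled ads with the smallest |margin|."""
    ranked = sorted(unlabeled, key=lambda item: abs(margin(w, bias, item[1])))
    return [ad_id for ad_id, _ in ranked[:k]]

# Hypothetical model and unlabeled pool of (ad_id, features).
w, bias = [1.0, -1.0], 0.0
pool = [
    ("a", [3.0, 0.0]),   # confidently on one side
    ("b", [1.0, 0.9]),   # near the boundary
    ("c", [0.0, 2.5]),   # confidently on the other side
    ("d", [0.5, 0.7]),   # near the boundary
]
to_label = select_uncertain(w, bias, pool, k=2)
print(to_label)
```

The selected ads are the ones the current model is least sure about, so each expert label is more likely to change the model than a label on a confidently scored ad.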
Combining automated and semi-automated effort is powerful
(Figure) System overview, as in "How they solved: automated and semi-automated": automated models (I), ensemble-aided sampling (II), and domain experts with quality monitoring (III).
More research is needed on automated classification methods and system-level challenges.
Thank You.
