SlideShare a Scribd company logo
1 of 20
Download to read offline
1
Auto Content Moderation
in C2C e-Commerce
Shunya Ueta, Suganprabu Nagarajan, Mizuki Sango
(Mercari, inc)
2020 USENIX Conference
on Operational Machine Learning
JULY 27–AUGUST 7, 2020
2
1. Content Moderation
2. Auto Content Moderation in C2C e-Commerce
3. Task design and model strategy
4. Offline/online evaluation
5. System architecture
6. Business Impact
Contents
3
Identify potentially unsafe or inappropriate content in service
● App Discovery with Google Play, Part 3: Machine Learning to Fight Spam and Abuse
at Scale
● YouTube Community Guidelines enforcement
● AI advances to better detect hate speech by Facebook
● Advances in content understanding, self-supervision to protect people by Facebook
● Facebook Transparency Report
● A Safe and Secure Marketplace by Mercari
● etc.
Content Moderation
4
The Mercari app is a C2C marketplace where individuals can easily sell used items
What is Mercari?
Japan
U.S.
Monthly active users: 16+ Million
Total number of items: 1.5+ Billion
5
Why Content Moderation in C2C e-Commerce?
C2C e-Commerce
Sellers Buyers
We want to decrease risk for customer and marketplace
Sellers unintentionally violate policy. Buyers buy violated items without
knowing
Policy case: counterfeits, weapons, etc.
6
Content Moderation system
C2C e-Commerce
Sell items Discover
Moderator
Manual review
Moderation
Service
Hide items & Alert
marketplace
Sellers Buyers
screened
7
Concept of Moderation Service: Rule based
Moderation
Service
Rule based
Pros
● Easy to develop and can be
quickly released to production
Cons
● Hard to manage
● Difficult to cover the
inconsistencies in spellings
e.g. {NIKE, nike, ないき, ナイキ}
Moderator
Manual review
8
Concept of Moderation Service: ML
Moderation
Service
Rule based
Pros
● Automatically learns the features
of items deleted by moderators
● Adapts to spelling inconsistencies
Cons
● Model update is hard
● Concept drift
(a.k.a. training-serving skew)
Moderator
Manual review
Machine
Learning
9
How to create the data for ML
Rule based
Moderator
Machine
Learning
Sell items
Report items
Hide items & Alert
Positive
Deleted items by Moderator
Negative
Not deleted items by Moderator
Dataset
Moderation Service
Review
10
Task Design
● Data is highly imbalanced
● Each violated topic’s total
number of alerts is bounded
by moderator team
All models trained as one-vs-all
● No side-effect when deploying
a trained model to other class
● Hard to improve performance
for each topic in a multi-class
model
Negative
Violated
Topic A
Violated
Topic N
...
Positive
Model
A
Model
B
...
counterfeits weapons
11
Multimodality of content
Case of items
Items have multimodal data
● Image
● Text
● Category
● Brand
● Price, etc.
We use multimodal model to improve model
performance.
See our article:
https://tech.mercari.com/entry/2019/09/12/130000
12
Model selection based on dataset size
● Gradient Boosted Decision Trees (GBDT)
→ Efficient for training and inference when training data size is not large
*Image feature is not used in GBDT
● Gated Multimodal Unit (GMU)
→ Potentially most accurate using multimodal data
13
Offline evaluation
Metric is Precision@K: K is the bound on the daily total number of alerts
in each violated topic decided by Moderators
2020-07-13
Current model’s prediction
result In production
Top K
Evaluate new model against current
model using the same item ids
item ids same
as production
top K
2020-07-13
New model’s prediction result
In test dataset.
e.g.
14
Online evaluation
→ Faster decision making leads to efficient operation
Current
Model
New
Model
Same traffic
Moderator
Manual review
Each model alert number: K/2
Metrics: Precision@K/2
After a certain time after a
new model is released, we
decide which model should
be deprecated based on the
above metrics.
Classic A/B testing can take several months. It was difficult to collect enough
transactions for t-test.
15
Offline/online evaluation result
Algorithms Offline Online
GBDT +18.2% Not Released
GMU +21.2% +23.2%
Table shows the relative performance gain of
offline evaluation metric is precision@K ,
online evaluation metric is precision@K/2
on one violated topic
Baseline model is Logistic regression that was already released in production
16
Container based Training Pipeline
Data Load
Write manifest files containing requirements like CPU, GPU and Storage
CPU CPU or GPU
Training Offline
Evaluation
CPU
BigQueryBigQuery
17
Serving system architecture
Message
queue
Message
queue
proxy layer prediction layer
.
Preprocessing + inference
Container
Pod
GBDT
based model
Preprocessing
Container
..Proxy
container
subscribe
publish
Pod
Inference
Container
Caffe2
Pod
Deep Learning
based model
We manage over 15 Machine Learning models in production
Pod
Deep Learning
based model
18
Horizontal Pod Autoscaler by kubernetes
● Reliable system: Traffic changes with time,
HPA can adopt to varying traffic
● Cheaper billing cost: Reduce to 1/6 by HPA
Billing cost transition after applying HPA
Billing cost
day
Each color is each machine learning model
19
Impact of Machine Learning system
Discovered 100 violating items
Moderator
Manual review
Moderation
Service
Rule based
Machine
Learning
Hide & Alert
+Discovered 554 violating items
Machine Learning system
has increased coverage by 554% ↑ over rule based approach
e.g.
20
If you have a question to this talk
First author is Shunya UETA, please e-mail: hurutoriya@mercari.com
Acknowledgements
Co-Authors: Suganprabu Nagarajan, Mizuki Sango
Contributter:
● Abhishek Vilas Munagekar, Yusuke Shido, Vamshi Teja Racha, Sumit Verma and
Keisuke Umezawa for their contribute to this system
● Dr. Antony for his feedback about the paper
● Yushi Kurita, Yuki Ito as Product Manager, All Trust and Safety project member and
all Customer Service as Moderator to success this project.
Question and Thanks collaborator

More Related Content

Similar to Auto Content Moderation in C2C e-Commerce at OpML20

Smart Traffic Monitoring System Report
Smart Traffic Monitoring System ReportSmart Traffic Monitoring System Report
Smart Traffic Monitoring System ReportALi Baker
 
Ad Click Prediction - Paper review
Ad Click Prediction - Paper reviewAd Click Prediction - Paper review
Ad Click Prediction - Paper reviewMazen Aly
 
Digital and data journey demystified: how it all works
Digital and data journey demystified: how it all worksDigital and data journey demystified: how it all works
Digital and data journey demystified: how it all worksMichal Hodinka
 
MAGE PROCESSING BASED BILLING STRUCTURE USING EDGE COMPUTING AND REACTJS
MAGE PROCESSING BASED BILLING STRUCTURE USING EDGE COMPUTING AND REACTJSMAGE PROCESSING BASED BILLING STRUCTURE USING EDGE COMPUTING AND REACTJS
MAGE PROCESSING BASED BILLING STRUCTURE USING EDGE COMPUTING AND REACTJSIRJET Journal
 
2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptxgdgsurrey
 
Drupal content management system (cms) based e commerce portal
Drupal content management system (cms) based e commerce portalDrupal content management system (cms) based e commerce portal
Drupal content management system (cms) based e commerce portalSandeep Kumbhar
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLMárton Kodok
 
IRJET- Multi Design - Pattern React Application with Django Backend
IRJET- Multi Design - Pattern React Application with Django BackendIRJET- Multi Design - Pattern React Application with Django Backend
IRJET- Multi Design - Pattern React Application with Django BackendIRJET Journal
 
GECBSP Guide GDSC India ML Study Jam organizer guide modified.pdf
GECBSP Guide GDSC India ML Study Jam organizer guide modified.pdfGECBSP Guide GDSC India ML Study Jam organizer guide modified.pdf
GECBSP Guide GDSC India ML Study Jam organizer guide modified.pdfDomendra Sahu
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makerszekeLabs Technologies
 
Using Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemUsing Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemVMware Tanzu
 
Applying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analyticsApplying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analyticsMárton Kodok
 
Productionalizing Machine Learning Models: The Good, The Bad and The Ugly
Productionalizing Machine Learning Models: The Good, The Bad and The UglyProductionalizing Machine Learning Models: The Good, The Bad and The Ugly
Productionalizing Machine Learning Models: The Good, The Bad and The UglyIrina Kukuyeva, Ph.D.
 
IRJET- Online Shopping System
IRJET-  	  Online Shopping SystemIRJET-  	  Online Shopping System
IRJET- Online Shopping SystemIRJET Journal
 
Zycus Online E- Auction
Zycus Online E- AuctionZycus Online E- Auction
Zycus Online E- AuctionSanjay Mitra
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...Alok Singh
 
How to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart LogisticsHow to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart LogisticsAlibaba Cloud
 
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...Databricks
 
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 -  Automated ML - Panagiotis PapaemmanouilGDG DEvFest Hellas 2020 -  Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis PapaemmanouilPanagiotis Papaemmanouil
 

Similar to Auto Content Moderation in C2C e-Commerce at OpML20 (20)

Smart Traffic Monitoring System Report
Smart Traffic Monitoring System ReportSmart Traffic Monitoring System Report
Smart Traffic Monitoring System Report
 
Ad Click Prediction - Paper review
Ad Click Prediction - Paper reviewAd Click Prediction - Paper review
Ad Click Prediction - Paper review
 
Digital and data journey demystified: how it all works
Digital and data journey demystified: how it all worksDigital and data journey demystified: how it all works
Digital and data journey demystified: how it all works
 
MAGE PROCESSING BASED BILLING STRUCTURE USING EDGE COMPUTING AND REACTJS
MAGE PROCESSING BASED BILLING STRUCTURE USING EDGE COMPUTING AND REACTJSMAGE PROCESSING BASED BILLING STRUCTURE USING EDGE COMPUTING AND REACTJS
MAGE PROCESSING BASED BILLING STRUCTURE USING EDGE COMPUTING AND REACTJS
 
2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx
 
Drupal content management system (cms) based e commerce portal
Drupal content management system (cms) based e commerce portalDrupal content management system (cms) based e commerce portal
Drupal content management system (cms) based e commerce portal
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
Project report
Project reportProject report
Project report
 
IRJET- Multi Design - Pattern React Application with Django Backend
IRJET- Multi Design - Pattern React Application with Django BackendIRJET- Multi Design - Pattern React Application with Django Backend
IRJET- Multi Design - Pattern React Application with Django Backend
 
GECBSP Guide GDSC India ML Study Jam organizer guide modified.pdf
GECBSP Guide GDSC India ML Study Jam organizer guide modified.pdfGECBSP Guide GDSC India ML Study Jam organizer guide modified.pdf
GECBSP Guide GDSC India ML Study Jam organizer guide modified.pdf
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
Using Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemUsing Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation System
 
Applying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analyticsApplying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analytics
 
Productionalizing Machine Learning Models: The Good, The Bad and The Ugly
Productionalizing Machine Learning Models: The Good, The Bad and The UglyProductionalizing Machine Learning Models: The Good, The Bad and The Ugly
Productionalizing Machine Learning Models: The Good, The Bad and The Ugly
 
IRJET- Online Shopping System
IRJET-  	  Online Shopping SystemIRJET-  	  Online Shopping System
IRJET- Online Shopping System
 
Zycus Online E- Auction
Zycus Online E- AuctionZycus Online E- Auction
Zycus Online E- Auction
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
How to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart LogisticsHow to Leverage Big Data to Deliver Smart Logistics
How to Leverage Big Data to Deliver Smart Logistics
 
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed...
 
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 -  Automated ML - Panagiotis PapaemmanouilGDG DEvFest Hellas 2020 -  Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis Papaemmanouil
 

More from Shunya Ueta

Introducing "Challenges and research opportunities in eCommerce search and re...
Introducing "Challenges and research opportunities in eCommerce search and re...Introducing "Challenges and research opportunities in eCommerce search and re...
Introducing "Challenges and research opportunities in eCommerce search and re...Shunya Ueta
 
Introduction to argo
Introduction to argoIntroduction to argo
Introduction to argoShunya Ueta
 
Introduction to TFX (TFDV+TFT+TFMA)
Introduction to TFX (TFDV+TFT+TFMA)Introduction to TFX (TFDV+TFT+TFMA)
Introduction to TFX (TFDV+TFT+TFMA)Shunya Ueta
 
Kubeflowで何ができて何ができないのか #DEvFest18
Kubeflowで何ができて何ができないのか #DEvFest18Kubeflowで何ができて何ができないのか #DEvFest18
Kubeflowで何ができて何ができないのか #DEvFest18Shunya Ueta
 
How to break the machine learning system barrier ?
How to break the machine learning system barrier ?How to break the machine learning system barrier ?
How to break the machine learning system barrier ?Shunya Ueta
 
TFX: A tensor flow-based production-scale machine learning platform
TFX: A tensor flow-based production-scale machine learning platformTFX: A tensor flow-based production-scale machine learning platform
TFX: A tensor flow-based production-scale machine learning platformShunya Ueta
 
Applied machine learning at facebook a datacenter infrastructure perspective...
Applied machine learning at facebook  a datacenter infrastructure perspective...Applied machine learning at facebook  a datacenter infrastructure perspective...
Applied machine learning at facebook a datacenter infrastructure perspective...Shunya Ueta
 
C-IMAGE: city cognitive mapping through geo-tagged photos 解説
C-IMAGE: city cognitive mapping through geo-tagged photos 解説C-IMAGE: city cognitive mapping through geo-tagged photos 解説
C-IMAGE: city cognitive mapping through geo-tagged photos 解説Shunya Ueta
 
Self-turning Spectral Clustering (NIPS2004)
Self-turning Spectral Clustering (NIPS2004)Self-turning Spectral Clustering (NIPS2004)
Self-turning Spectral Clustering (NIPS2004)Shunya Ueta
 
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions (ICML2003)
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions (ICML2003)Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions (ICML2003)
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions (ICML2003)Shunya Ueta
 
Detecting Research Topics via the Correlation between Graphs and Texts
 Detecting Research Topics via the Correlation between Graphs and Texts Detecting Research Topics via the Correlation between Graphs and Texts
Detecting Research Topics via the Correlation between Graphs and TextsShunya Ueta
 
Fast normalized cut with linear constraint (CVPR2009)
Fast normalized cut with linear constraint (CVPR2009)Fast normalized cut with linear constraint (CVPR2009)
Fast normalized cut with linear constraint (CVPR2009)Shunya Ueta
 
"Spectral graph reduction for efficient image and streaming video segmentatio...
"Spectral graph reduction for efficient image and streaming video segmentatio..."Spectral graph reduction for efficient image and streaming video segmentatio...
"Spectral graph reduction for efficient image and streaming video segmentatio...Shunya Ueta
 
コミュニティサイトを爆速で作成し、お手軽に運用する方法
コミュニティサイトを爆速で作成し、お手軽に運用する方法コミュニティサイトを爆速で作成し、お手軽に運用する方法
コミュニティサイトを爆速で作成し、お手軽に運用する方法Shunya Ueta
 

More from Shunya Ueta (14)

Introducing "Challenges and research opportunities in eCommerce search and re...
Introducing "Challenges and research opportunities in eCommerce search and re...Introducing "Challenges and research opportunities in eCommerce search and re...
Introducing "Challenges and research opportunities in eCommerce search and re...
 
Introduction to argo
Introduction to argoIntroduction to argo
Introduction to argo
 
Introduction to TFX (TFDV+TFT+TFMA)
Introduction to TFX (TFDV+TFT+TFMA)Introduction to TFX (TFDV+TFT+TFMA)
Introduction to TFX (TFDV+TFT+TFMA)
 
Kubeflowで何ができて何ができないのか #DEvFest18
Kubeflowで何ができて何ができないのか #DEvFest18Kubeflowで何ができて何ができないのか #DEvFest18
Kubeflowで何ができて何ができないのか #DEvFest18
 
How to break the machine learning system barrier ?
How to break the machine learning system barrier ?How to break the machine learning system barrier ?
How to break the machine learning system barrier ?
 
TFX: A tensor flow-based production-scale machine learning platform
TFX: A tensor flow-based production-scale machine learning platformTFX: A tensor flow-based production-scale machine learning platform
TFX: A tensor flow-based production-scale machine learning platform
 
Applied machine learning at facebook a datacenter infrastructure perspective...
Applied machine learning at facebook  a datacenter infrastructure perspective...Applied machine learning at facebook  a datacenter infrastructure perspective...
Applied machine learning at facebook a datacenter infrastructure perspective...
 
C-IMAGE: city cognitive mapping through geo-tagged photos 解説
C-IMAGE: city cognitive mapping through geo-tagged photos 解説C-IMAGE: city cognitive mapping through geo-tagged photos 解説
C-IMAGE: city cognitive mapping through geo-tagged photos 解説
 
Self-turning Spectral Clustering (NIPS2004)
Self-turning Spectral Clustering (NIPS2004)Self-turning Spectral Clustering (NIPS2004)
Self-turning Spectral Clustering (NIPS2004)
 
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions (ICML2003)
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions (ICML2003)Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions (ICML2003)
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions (ICML2003)
 
Detecting Research Topics via the Correlation between Graphs and Texts
 Detecting Research Topics via the Correlation between Graphs and Texts Detecting Research Topics via the Correlation between Graphs and Texts
Detecting Research Topics via the Correlation between Graphs and Texts
 
Fast normalized cut with linear constraint (CVPR2009)
Fast normalized cut with linear constraint (CVPR2009)Fast normalized cut with linear constraint (CVPR2009)
Fast normalized cut with linear constraint (CVPR2009)
 
"Spectral graph reduction for efficient image and streaming video segmentatio...
"Spectral graph reduction for efficient image and streaming video segmentatio..."Spectral graph reduction for efficient image and streaming video segmentatio...
"Spectral graph reduction for efficient image and streaming video segmentatio...
 
コミュニティサイトを爆速で作成し、お手軽に運用する方法
コミュニティサイトを爆速で作成し、お手軽に運用する方法コミュニティサイトを爆速で作成し、お手軽に運用する方法
コミュニティサイトを爆速で作成し、お手軽に運用する方法
 

Recently uploaded

Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 

Recently uploaded (20)

Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 

Auto Content Moderation in C2C e-Commerce at OpML20

  • 1. 1 Auto Content Moderation in C2C e-Commerce Shunya Ueta, Suganprabu Nagarajan, Mizuki Sango (Mercari, inc) 2020 USENIX Conference on Operational Machine Learning JULY 27–AUGUST 7, 2020
  • 2. 2 1. Content Moderation 2. Auto Content Moderation in C2C e-Commerce 3. Task design and model strategy 4. Offline/online evaluation 5. System architecture 6. Business Impact Contents
  • 3. 3 Identify potentially unsafe or inappropriate content in service ● App Discovery with Google Play, Part 3: Machine Learning to Fight Spam and Abuse at Scale ● YouTube Community Guidelines enforcement ● AI advances to better detect hate speech by Facebook ● Advances in content understanding, self-supervision to protect people by Facebook ● Facebook Transparency Report ● A Safe and Secure Marketplace by Mercari ● etc. Content Moderation
  • 4. 4 The Mercari app is a C2C marketplace where individuals can easily sell used items What is Mercari? Japan U.S. Monthly active users: 16+ Million Total number of items: 1.5+ Billion
  • 5. 5 Why Content Moderation in C2C e-Commerce? C2C e-Commerce Sellers Buyers We want to decrease risk for customer and marketplace Sellers unintentionally violate policy. Buyers buy violated items without knowing Policy case: counterfeits, weapons, etc.
  • 6. 6 Content Moderation system C2C e-Commerce Sell items Discover Moderator Manual review Moderation Service Hide items & Alert marketplace Sellers Buyers screened
  • 7. 7 Concept of Moderation Service: Rule based Moderation Service Rule based Pros ● Easy to develop and can be quickly released to production Cons ● Hard to manage ● Difficult to cover the inconsistencies in spellings e.g. {NIKE, nike, ないき, ナイキ} Moderator Manual review
  • 8. 8 Concept of Moderation Service: ML Moderation Service Rule based Pros ● Automatically learns the features of items deleted by moderators ● Adapts to spelling inconsistencies Cons ● Model update is hard ● Concept drift (a.k.a. training-serving skew) Moderator Manual review Machine Learning
  • 9. 9 How to create the data for ML Rule based Moderator Machine Learning Sell items Report items Hide items & Alert Positive Deleted items by Moderator Negative Not deleted items by Moderator Dataset Moderation Service Review
  • 10. 10 Task Design ● Data is highly imbalanced ● Each violated topic’s total number of alerts is bounded by moderator team All models trained as one-vs-all ● No side-effect when deploying a trained model to other class ● Hard to improve performance for each topic in a multi-class model Negative Violated Topic A Violated Topic N ... Positive Model A Model B ... counterfeits weapons
  • 11. 11 Multimodality of content Case of items Items have multimodal data ● Image ● Text ● Category ● Brand ● Price, etc. We use multimodal model to improve model performance. See our article: https://tech.mercari.com/entry/2019/09/12/130000
  • 12. 12 Model selection based on dataset size ● Gradient Boosted Decision Trees (GBDT) → Efficient for training and inference when training data size is not large *Image feature is not used in GBDT ● Gated Multimodal Unit (GMU) → Potentially most accurate using multimodal data
  • 13. 13 Offline evaluation Metric is Precision@K: K is the bound on the daily total number of alerts in each violated topic decided by Moderators 2020-07-13 Current model’s prediction result In production Top K Evaluate new model against current model using the same item ids item ids same as production top K 2020-07-13 New model’s prediction result In test dataset. e.g.
  • 14. 14 Online evaluation → Faster decision making leads to efficient operation Current Model New Model Same traffic Moderator Manual review Each model alert number: K/2 Metrics: Precision@K/2 After a certain time after a new model is released, we decide which model should be deprecated based on the above metrics. Classic A/B testing can take several months. It was difficult to collect enough transactions for t-test.
  • 15. 15 Offline/online evaluation result Algorithms Offline Online GBDT +18.2% Not Released GMU +21.2% +23.2% Table shows the relative performance gain of offline evaluation metric is precision@K , online evaluation metric is precision@K/2 on one violated topic Baseline model is Logistic regression that was already released in production
  • 16. 16 Container based Training Pipeline Data Load Write manifest files containing requirements like CPU, GPU and Storage CPU CPU or GPU Training Offline Evaluation CPU BigQueryBigQuery
  • 17. 17 Serving system architecture Message queue Message queue proxy layer prediction layer . Preprocessing + inference Container Pod GBDT based model Preprocessing Container ..Proxy container subscribe publish Pod Inference Container Caffe2 Pod Deep Learning based model We manage over 15 Machine Learning models in production Pod Deep Learning based model
  • 18. 18 Horizontal Pod Autoscaler by kubernetes ● Reliable system: Traffic changes with time, HPA can adopt to varying traffic ● Cheaper billing cost: Reduce to 1/6 by HPA Billing cost transition after applying HPA Billing cost day Each color is each machine learning model
  • 19. 19 Impact of Machine Learning system Discovered 100 violating items Moderator Manual review Moderation Service Rule based Machine Learning Hide & Alert +Discovered 554 violating items Machine Learning system has increased coverage by 554% ↑ over rule based approach e.g.
  • 20. 20 If you have a question to this talk First author is Shunya UETA, please e-mail: hurutoriya@mercari.com Acknowledgements Co-Authors: Suganprabu Nagarajan, Mizuki Sango Contributter: ● Abhishek Vilas Munagekar, Yusuke Shido, Vamshi Teja Racha, Sumit Verma and Keisuke Umezawa for their contribute to this system ● Dr. Antony for his feedback about the paper ● Yushi Kurita, Yuki Ito as Product Manager, All Trust and Safety project member and all Customer Service as Moderator to success this project. Question and Thanks collaborator