SlideShare a Scribd company logo

PyParis2017 / Machine learning to moderate classifieds, by Vaibhav Singh

PyParis 2017 http://pyparis.org

1 of 18
Download to read offline
Machine Learning to
moderate Classifieds
Vaibhav Singh, Machine Learning Scientist
Content Moderation & Quality, OLX
Agenda
➔ Scale and Problem
➔ Feature generation
➔ Model Generation Pipeline
➔ Model Performance
➔ Architecture
➔ Model Validation and Management
Scale of business at OLX
4.4
APP
RATING
#1 app
+22 COUNTRIES (1)
1)	Google	play	store;	shopping/lifestyle	categories	
Note:	excludes	Letgo.		Associates	at	propor>onate	share	
→ People spend more than twice as long in
OLX apps versus competitors
	
	
became one of the top 3 classifieds app in US
less than a year after its launch
130 Countries
+60 million monthly listings
+18 million monthly sellers
+52 million cars are listed every year in our platforms;
77% of the total amount of cars manufactured!
+160,000 properties are listed daily
•  2 houses
•  2 cars
•  3 fashion items
•  2.5 mobile phones
At OLX, are listed every second:
●  Change title, description in a paid category so that they don’t need
to buy another ad post.
●  Duplicate Ads to get higher ranking and also to get higher chances
for selling
●  Add Phone numbers, Company information on image rather than in
description
●  Create multiple accounts to bypass free ad per user limit
●  Try to sell forbidden items with a title and description that may
evade keyword filters
Problem with User Posted Ads
“Feature engineering is the process of transforming raw
data into features that better represent the underlying
problem to the predictive models, resulting in improved
model accuracy on unseen data”
Feature Engineering
Data Leakage
➔  Remove obvious fields
eg: id, account numbers
➔  Remove variance and
standardize
➔  Cross Validation
➔  Add Noise

Recommended

Machine Learning to moderate ads in real world classified's business
Machine Learning to moderate ads in real world classified's businessMachine Learning to moderate ads in real world classified's business
Machine Learning to moderate ads in real world classified's businessJaroslaw Szymczak
 
31-Steps Conversion & Retention Optimization Checklist.pptx
31-Steps Conversion & Retention Optimization Checklist.pptx31-Steps Conversion & Retention Optimization Checklist.pptx
31-Steps Conversion & Retention Optimization Checklist.pptxRamanParashar3
 
SearchLeeds 2018 - Elizabeth Clark - Dream Agility - The future of Shopping
SearchLeeds 2018 - Elizabeth Clark - Dream Agility - The future of ShoppingSearchLeeds 2018 - Elizabeth Clark - Dream Agility - The future of Shopping
SearchLeeds 2018 - Elizabeth Clark - Dream Agility - The future of ShoppingBranded3
 
How Euroflorist is preparing for Artificial Intelligence
How Euroflorist is preparing for Artificial IntelligenceHow Euroflorist is preparing for Artificial Intelligence
How Euroflorist is preparing for Artificial IntelligenceGuido X Jansen
 
Crafting Your Product Strategy
Crafting Your Product StrategyCrafting Your Product Strategy
Crafting Your Product StrategyRyan Glasgow
 
The Latest PPC Updates to Catapult Campaigns in 2019
The Latest PPC Updates to Catapult Campaigns in 2019The Latest PPC Updates to Catapult Campaigns in 2019
The Latest PPC Updates to Catapult Campaigns in 2019Hanapin Marketing
 
SEO Friendly Migrations - Tea-Time SEO' Series of Daily SEO Live Talks
SEO Friendly Migrations - Tea-Time SEO' Series of Daily SEO Live TalksSEO Friendly Migrations - Tea-Time SEO' Series of Daily SEO Live Talks
SEO Friendly Migrations - Tea-Time SEO' Series of Daily SEO Live TalksAuthoritas
 

More Related Content

Similar to PyParis2017 / Machine learning to moderate classifieds, by Vaibhav Singh

Agile Testing Framework - The Art of Automated Testing
Agile Testing Framework - The Art of Automated TestingAgile Testing Framework - The Art of Automated Testing
Agile Testing Framework - The Art of Automated TestingDimitri Ponomareff
 
Transforming B2B Sales with Spark-Powered Sales Intelligence with Songtao Guo...
Transforming B2B Sales with Spark-Powered Sales Intelligence with Songtao Guo...Transforming B2B Sales with Spark-Powered Sales Intelligence with Songtao Guo...
Transforming B2B Sales with Spark-Powered Sales Intelligence with Songtao Guo...Databricks
 
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceSpark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceWei Di
 
Product Development in the Cloud
Product Development in the Cloud Product Development in the Cloud
Product Development in the Cloud Amazon Web Services
 
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETHacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETTanyaRaina3
 
Transforming B2B Sales with Spark Powered Sales Intelligence
Transforming B2B Sales with Spark Powered Sales IntelligenceTransforming B2B Sales with Spark Powered Sales Intelligence
Transforming B2B Sales with Spark Powered Sales IntelligenceSongtao Guo
 
How To Build a Winning Conversion Optimization Strategy
How To Build a Winning Conversion Optimization StrategyHow To Build a Winning Conversion Optimization Strategy
How To Build a Winning Conversion Optimization StrategyVWO
 
Tech-Talk Tuesday: How to Develop and Grow Your Optimization Efforts Into a S...
Tech-Talk Tuesday: How to Develop and Grow Your Optimization Efforts Into a S...Tech-Talk Tuesday: How to Develop and Grow Your Optimization Efforts Into a S...
Tech-Talk Tuesday: How to Develop and Grow Your Optimization Efforts Into a S...Daniel Caridi
 
Dashlane Mission Teams
Dashlane Mission TeamsDashlane Mission Teams
Dashlane Mission TeamsDashlane
 
Twin Cities Eloqua User Group 092413
Twin Cities Eloqua User Group 092413Twin Cities Eloqua User Group 092413
Twin Cities Eloqua User Group 092413Ron Corbisier
 
Beyond the Primary KPI: Leveraging Bad Test Results | Masters of Conversion b...
Beyond the Primary KPI: Leveraging Bad Test Results | Masters of Conversion b...Beyond the Primary KPI: Leveraging Bad Test Results | Masters of Conversion b...
Beyond the Primary KPI: Leveraging Bad Test Results | Masters of Conversion b...VWO
 
Reimagine Growth 3 - Session 2 - Planning your ASO strategy from 0 to 100
Reimagine Growth 3 - Session 2 - Planning your ASO strategy from 0 to 100Reimagine Growth 3 - Session 2 - Planning your ASO strategy from 0 to 100
Reimagine Growth 3 - Session 2 - Planning your ASO strategy from 0 to 100CleverTap
 
ENT206 Product Development in the Cloud
ENT206 Product Development in the CloudENT206 Product Development in the Cloud
ENT206 Product Development in the CloudAmazon Web Services
 
Group 3 slide presentation
Group 3 slide presentationGroup 3 slide presentation
Group 3 slide presentationMichael Young
 
Monetization: Unlock More Value from Your APIs
Monetization: Unlock More Value from Your APIs Monetization: Unlock More Value from Your APIs
Monetization: Unlock More Value from Your APIs Apigee | Google Cloud
 
Preparing for AI - Measurefest
Preparing for AI - MeasurefestPreparing for AI - Measurefest
Preparing for AI - MeasurefestGuido X Jansen
 

Similar to PyParis2017 / Machine learning to moderate classifieds, by Vaibhav Singh (20)

Agile Testing Framework - The Art of Automated Testing
Agile Testing Framework - The Art of Automated TestingAgile Testing Framework - The Art of Automated Testing
Agile Testing Framework - The Art of Automated Testing
 
Transforming B2B Sales with Spark-Powered Sales Intelligence with Songtao Guo...
Transforming B2B Sales with Spark-Powered Sales Intelligence with Songtao Guo...Transforming B2B Sales with Spark-Powered Sales Intelligence with Songtao Guo...
Transforming B2B Sales with Spark-Powered Sales Intelligence with Songtao Guo...
 
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligenceSpark summit 2017- Transforming B2B sales with Spark powered sales intelligence
Spark summit 2017- Transforming B2B sales with Spark powered sales intelligence
 
Product Development in the Cloud
Product Development in the Cloud Product Development in the Cloud
Product Development in the Cloud
 
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETHacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
 
Transforming B2B Sales with Spark Powered Sales Intelligence
Transforming B2B Sales with Spark Powered Sales IntelligenceTransforming B2B Sales with Spark Powered Sales Intelligence
Transforming B2B Sales with Spark Powered Sales Intelligence
 
How To Build a Winning Conversion Optimization Strategy
How To Build a Winning Conversion Optimization StrategyHow To Build a Winning Conversion Optimization Strategy
How To Build a Winning Conversion Optimization Strategy
 
Tech-Talk Tuesday: How to Develop and Grow Your Optimization Efforts Into a S...
Tech-Talk Tuesday: How to Develop and Grow Your Optimization Efforts Into a S...Tech-Talk Tuesday: How to Develop and Grow Your Optimization Efforts Into a S...
Tech-Talk Tuesday: How to Develop and Grow Your Optimization Efforts Into a S...
 
Dashlane Mission Teams
Dashlane Mission TeamsDashlane Mission Teams
Dashlane Mission Teams
 
Twin Cities Eloqua User Group 092413
Twin Cities Eloqua User Group 092413Twin Cities Eloqua User Group 092413
Twin Cities Eloqua User Group 092413
 
An overview of SugarCRM
An overview of SugarCRMAn overview of SugarCRM
An overview of SugarCRM
 
Beyond the Primary KPI: Leveraging Bad Test Results | Masters of Conversion b...
Beyond the Primary KPI: Leveraging Bad Test Results | Masters of Conversion b...Beyond the Primary KPI: Leveraging Bad Test Results | Masters of Conversion b...
Beyond the Primary KPI: Leveraging Bad Test Results | Masters of Conversion b...
 
Reimagine Growth 3 - Session 2 - Planning your ASO strategy from 0 to 100
Reimagine Growth 3 - Session 2 - Planning your ASO strategy from 0 to 100Reimagine Growth 3 - Session 2 - Planning your ASO strategy from 0 to 100
Reimagine Growth 3 - Session 2 - Planning your ASO strategy from 0 to 100
 
Software for startups
Software for startupsSoftware for startups
Software for startups
 
ENT206 Product Development in the Cloud
ENT206 Product Development in the CloudENT206 Product Development in the Cloud
ENT206 Product Development in the Cloud
 
Group 3 slide presentation
Group 3 slide presentationGroup 3 slide presentation
Group 3 slide presentation
 
Monetization: Unlock More Value from Your APIs
Monetization: Unlock More Value from Your APIs Monetization: Unlock More Value from Your APIs
Monetization: Unlock More Value from Your APIs
 
Frappe Open Day - August 2018
Frappe Open Day - August 2018Frappe Open Day - August 2018
Frappe Open Day - August 2018
 
Preparing for AI - Measurefest
Preparing for AI - MeasurefestPreparing for AI - Measurefest
Preparing for AI - Measurefest
 
Smazee Profile 2020
Smazee  Profile  2020Smazee  Profile  2020
Smazee Profile 2020
 

More from Pôle Systematic Paris-Region

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...Pôle Systematic Paris-Region
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...Pôle Systematic Paris-Region
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...Pôle Systematic Paris-Region
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyPôle Systematic Paris-Region
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAPôle Systematic Paris-Region
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentPôle Systematic Paris-Region
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...Pôle Systematic Paris-Region
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotPôle Systematic Paris-Region
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...Pôle Systematic Paris-Region
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...Pôle Systematic Paris-Region
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...Pôle Systematic Paris-Region
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)Pôle Systematic Paris-Region
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPôle Systematic Paris-Region
 

More from Pôle Systematic Paris-Region (20)

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
 
Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?
 
Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
 
Osis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritageOsis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritage
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelat
 

Recently uploaded

Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVARobert McDermott
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...UiPathCommunity
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringMassimo Talia
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceSusan Ibach
 
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxThe Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxNeo4j
 
Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?MENGSAYLOEM1
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaISPMAIndia
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor FesenkoFwdays
 
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu...
Building Bridges:  Merging RPA Processes, UiPath Apps, and Data Service to bu...Building Bridges:  Merging RPA Processes, UiPath Apps, and Data Service to bu...
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu...DianaGray10
 
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions..."How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...Fwdays
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolProduct School
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...htrindia
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, TripadvisorProduct School
 
LF Energy Webinar: Introduction to TROLIE
LF Energy Webinar: Introduction to TROLIELF Energy Webinar: Introduction to TROLIE
LF Energy Webinar: Introduction to TROLIEDanBrown980551
 
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes", Volodymyr TsapFwdays
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Product School
 
Pragmatic UI testing with Compose Semantics.pdf
Pragmatic UI testing with Compose Semantics.pdfPragmatic UI testing with Compose Semantics.pdf
Pragmatic UI testing with Compose Semantics.pdfinfogdgmi
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1Inbay UK
 
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17Ana-Maria Mihalceanu
 

Recently uploaded (20)

Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVA
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineering
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data science
 
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxThe Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
 
Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko
 
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu...
Building Bridges:  Merging RPA Processes, UiPath Apps, and Data Service to bu...Building Bridges:  Merging RPA Processes, UiPath Apps, and Data Service to bu...
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu...
 
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions..."How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
LF Energy Webinar: Introduction to TROLIE
LF Energy Webinar: Introduction to TROLIELF Energy Webinar: Introduction to TROLIE
LF Energy Webinar: Introduction to TROLIE
 
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
 
Pragmatic UI testing with Compose Semantics.pdf
Pragmatic UI testing with Compose Semantics.pdfPragmatic UI testing with Compose Semantics.pdf
Pragmatic UI testing with Compose Semantics.pdf
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1
 
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
 

PyParis2017 / Machine learning to moderate classifieds, by Vaibhav Singh

  • 1. Machine Learning to moderate Classifieds Vaibhav Singh, Machine Learning Scientist Content Moderation & Quality, OLX
  • 2. Agenda ➔ Scale and Problem ➔ Feature generation ➔ Model Generation Pipeline ➔ Model Performance ➔ Architecture ➔ Model Validation and Management
  • 3. Scale of business at OLX 4.4 APP RATING #1 app +22 COUNTRIES (1) 1) Google play store; shopping/lifestyle categories Note: excludes Letgo. Associates at propor>onate share → People spend more than twice as long in OLX apps versus competitors became one of the top 3 classifieds app in US less than a year after its launch 130 Countries +60 million monthly listings +18 million monthly sellers +52 million cars are listed every year in our platforms; 77% of the total amount of cars manufactured! +160,000 properties are listed daily •  2 houses •  2 cars •  3 fashion items •  2.5 mobile phones At OLX, are listed every second:
  • 4. ●  Change title, description in a paid category so that they don’t need to buy another ad post. ●  Duplicate Ads to get higher ranking and also to get higher chances for selling ●  Add Phone numbers, Company information on image rather than in description ●  Create multiple accounts to bypass free ad per user limit ●  Try to sell forbidden items with a title and description that may evade keyword filters Problem with User Posted Ads
  • 5. “Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data” Feature Engineering
  • 6. Data Leakage ➔  Remove obvious fields eg: id, account numbers ➔  Remove variance and standardize ➔  Cross Validation ➔  Add Noise
  • 7. Feature hashing ➔  Good when dealing high dimensional, sparse features -- dimensionality reduction ➔  Memory efficient ➔  Cons - Getting back to feature names is difficult ➔  Cons - Hash collisions can have negative effects
  • 8. SVM Light Data Format ➔  Memory Efficient. Features can be created on one machine and does not requires huge clusters ➔  Cons - Number of features is unknown
  • 9. Lessons Learnt ➔  Choose your tech dependent on data size. Do not go for hype driven development ➔  Spend time on Feature Generation and selection ➔  Increase relevance and minimize redundancy ➔  Use the same Feature Generation pipeline for both training and prediction
  • 12. Lessons Learnt ➔  Automate and makes things deterministic ➔  Airflow, Luigi and many others are good choice for Job dependency management
  • 13. Measuring Classifier Performance ➔  Accuracy not always the best metric ➔  PR good for measuring classifier performance ➔  Can use ROC for general classifier performance ➔  Choose one evaluation metric
  • 14. Architecture Flask API Queue Prediction Module Mongo Monitoring & Stats Graphite, Grafana Learning Module Scikit XGBoost Luigi Ask Prediction Return Prediction Learning Ads
  • 15. Lessons Learnt ➔  Always Batch Batching will reduce CPU Utilization and the same machines would be able to handle much more requests ➔  Modularize, Dockerize and Orchestrate Containerize your code so that it is transparent to Machine configurations ➔  Monitoring Use a monitoring service ➔  Choose simple and easy tech
  • 16. Validating Models ➔  Sample predictions and manually verify ➔  Measure error rate ➔  Modify thresholds to achieve desired error rate