Tracking counterfeiting on the Web
with Python and ML
Valerio Cosentino
Software Engineer
PyConEs, October 3rd, 2021
[1] https://www.cbc.ca/news/business/marketplace-counterfeits-fakes-online-shopping-1.5470639
[2] https://apnews.com/press-release/pr-businesswire/ef15478fa38649b5ba29b434c8e87c94
[3] https://www.cnbc.com/2020/03/02/shop-safe-act-2020-cracks-down-on-counterfeits-on-ecommerce-platforms.html
Buyer Marketplace Brand
[1] https://arstechnica.com/tech-policy/2021/05/amazon-seized-and-destroyed-2-million-counterfeit-products-in-2020/
[2] https://www.ebay.com/help/policies/prohibited-restricted-items/counterfeit-item-policy?id=4276#section1
[3] https://www.aliexpress.com/buyerprotection/how_to_be_eligible.html
[4] https://ec.europa.eu/growth/industry/policy/intellectual-property/enforcement/memorandum-understanding-sale-counterfeit-goods-internet_en
How can a brand know if its products are being counterfeited on the Web?
search → extract → evaluate → get crazy
Can Python and ML help?
SEARCH → EXTRACT → ANALYSIS → REPORT
SEARCH
queries → marketplace → product URLs
How to write effective queries?
How to set the frequency of queries?
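One way to approach the two questions above is to generate queries systematically and to tie the polling interval to how often a query turns up new listings. The sketch below is only an illustration: the brand, product names, modifier terms, and interval bounds are hypothetical and do not come from the talk.

```python
from itertools import product


def build_queries(brand, products, modifiers):
    """Combine a brand with product names and optional modifier terms."""
    queries = []
    for name, modifier in product(products, [""] + list(modifiers)):
        # Join the non-empty parts and normalise case and whitespace.
        parts = [brand, name, modifier]
        queries.append(" ".join(p.strip().lower() for p in parts if p.strip()))
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(queries))


def next_interval_hours(current_hours, new_urls_found, min_hours=1, max_hours=168):
    """Halve the wait when a query found new URLs, double it otherwise."""
    if new_urls_found > 0:
        return max(min_hours, current_hours / 2)
    return min(max_hours, current_hours * 2)


if __name__ == "__main__":
    # Hypothetical brand, products, and modifiers.
    for query in build_queries("Acme", ["sneakers", "backpack"], ["replica", "cheap"]):
        print(query)
    print(next_interval_hours(24, new_urls_found=3))
```

The halving/doubling rule is a simple heuristic chosen for the example; a real schedule would also have to respect each marketplace's rate limits.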
SEARCH
queries → queue → search lambda (scraping, API calls) → product URLs → queue
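The queue-to-lambda-to-queue flow can be sketched as a Lambda-style handler. This is a minimal sketch under assumptions, not the speaker's implementation: the message layout (`marketplace`, `query`), the `search_marketplace` helper, and the `.example` URLs are hypothetical, and an in-memory deque stands in for the output queue. The `Records`/`body` shape follows the event that Lambda receives from an SQS queue.

```python
import json
from collections import deque

# Stand-in for the output queue; a real deployment would use a managed queue.
url_queue = deque()


def search_marketplace(marketplace, query):
    """Hypothetical search step: scraping or an API call would go here."""
    slug = query.replace(" ", "-")
    return [f"https://{marketplace}.example/item/{slug}-{i}" for i in range(2)]


def handler(event, context=None):
    """Consume query messages and enqueue the product URLs they return."""
    seen = set()
    for record in event["Records"]:
        # Each queued message arrives as a JSON string in the record body.
        message = json.loads(record["body"])
        for url in search_marketplace(message["marketplace"], message["query"]):
            if url not in seen:  # avoid enqueueing the same URL twice
                seen.add(url)
                url_queue.append(url)
    return {"enqueued": len(seen)}


if __name__ == "__main__":
    body = json.dumps({"marketplace": "shop", "query": "acme sneakers"})
    print(handler({"Records": [{"body": body}]}))
    print(list(url_queue))
```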
EXTRACT
queue (product URLs) → extract lambda → products info → Dynamo
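As an illustration of the extract step, the sketch below pulls product information out of a page's HTML and stores it keyed by URL. It assumes the page exposes Open Graph `<meta>` tags such as `og:title`, which is not stated in the talk; the sample HTML is made up, and a plain dict stands in for the Dynamo table.

```python
from html.parser import HTMLParser


class ProductInfoParser(HTMLParser):
    """Collect Open Graph <meta> properties from a product page."""

    def __init__(self):
        super().__init__()
        self.info = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            self.info[prop] = attributes["content"]


def extract_product_info(url, html):
    """Return the product URL together with the properties found in the page."""
    parser = ProductInfoParser()
    parser.feed(html)
    return {"url": url, **parser.info}


# Stand-in for the Dynamo table: items keyed by product URL.
table = {}


def store(item):
    table[item["url"]] = item


if __name__ == "__main__":
    sample = (
        '<html><head><meta property="og:title" content="Acme Sneakers">'
        '<meta property="og:image" content="https://shop.example/img/1.jpg" />'
        "</head></html>"
    )
    store(extract_product_info("https://shop.example/item/1", sample))
    print(table)
```

Marketplaces that do not publish such tags would need per-site selectors or an API call instead.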
ANALYSIS
mandatory fields, optional fields
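The split between mandatory and optional fields can be sketched as a small validation step. The field names below are hypothetical, since the slides do not list the actual ones; the idea is only that a record missing a mandatory field is reported, while a missing optional field defaults to None.

```python
# Hypothetical split; the talk does not list the actual fields.
MANDATORY_FIELDS = ("url", "title", "price")
OPTIONAL_FIELDS = ("seller", "image_url", "description")


def split_fields(raw):
    """Return (record, missing) for one extracted product.

    record keeps the known fields; missing lists absent mandatory ones.
    """
    missing = [name for name in MANDATORY_FIELDS if raw.get(name) in (None, "")]
    record = {name: raw[name] for name in MANDATORY_FIELDS if name not in missing}
    for name in OPTIONAL_FIELDS:
        record[name] = raw.get(name)  # optional fields default to None
    return record, missing


if __name__ == "__main__":
    print(split_fields({"url": "https://shop.example/item/1", "title": "Acme Sneakers"}))
```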
ANALYSIS
Dynamo → transform (contents) → Aurora
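A transform from Dynamo to Aurora could look roughly like the sketch below, which flattens an item in DynamoDB's typed attribute format (`{"S": ...}`, `{"N": ...}`) into a relational row. It rests on assumptions: SQLite stands in for Aurora so the example runs locally, and the `products` table and its columns are hypothetical.

```python
import sqlite3


def flatten_item(item):
    """Turn a DynamoDB-style typed item into a flat dict of Python values."""
    row = {}
    for name, typed_value in item.items():
        type_tag, value = next(iter(typed_value.items()))
        if type_tag == "N":  # numbers are serialised as strings
            row[name] = float(value)
        elif type_tag == "S":
            row[name] = value
        elif type_tag == "NULL":
            row[name] = None
        else:
            raise ValueError(f"unsupported type tag: {type_tag}")
    return row


def load_rows(connection, rows):
    """Insert flattened rows into a relational table (hypothetical schema)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(url TEXT PRIMARY KEY, title TEXT, price REAL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO products (url, title, price) "
        "VALUES (:url, :title, :price)",
        rows,
    )
    connection.commit()


if __name__ == "__main__":
    item = {
        "url": {"S": "https://shop.example/item/1"},
        "title": {"S": "Acme Sneakers"},
        "price": {"N": "19.90"},
    }
    conn = sqlite3.connect(":memory:")
    load_rows(conn, [flatten_item(item)])
    print(conn.execute("SELECT * FROM products").fetchall())
```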
ANALYSIS
What is relevant content?
What is legal/illegal content?
Relevance Detection
rule-based, manual, text analysis, feature analysis
manual, text analysis, image features
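To make the combination of text analysis, rules, and manual checks more concrete, here is a minimal sketch. It is not the ML approach of the talk: token overlap (Jaccard similarity) is swapped in as a simple stand-in for text analysis, and the suspicious terms, price rule, and thresholds are hypothetical. Listings that trip a rule are routed to manual review rather than labelled as illegal.

```python
import re


def tokens(text):
    """Lower-case alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(listing_title, product_name):
    """Jaccard similarity between the token sets of two strings (0.0 to 1.0)."""
    a, b = tokens(listing_title), tokens(product_name)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical rules; real ones would come from the brand's own knowledge.
SUSPICIOUS_TERMS = {"replica", "fake", "copy"}


def rule_flags(listing, reference_price, max_discount=0.7):
    """Return the list of rules that a listing triggers."""
    flags = []
    if tokens(listing["title"]) & SUSPICIOUS_TERMS:
        flags.append("suspicious term in title")
    if listing["price"] < reference_price * (1 - max_discount):
        flags.append("price far below reference")
    return flags


def triage(listing, product_name, reference_price, min_relevance=0.3):
    """Drop irrelevant listings, send flagged ones to manual review."""
    if relevance(listing["title"], product_name) < min_relevance:
        return "irrelevant"
    if rule_flags(listing, reference_price):
        return "manual review"
    return "no flags"


if __name__ == "__main__":
    listing = {"title": "Acme Sneakers Replica", "price": 15.0}
    print(triage(listing, "Acme Sneakers", reference_price=100.0))
```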
REPORT
Fake product URLs → Takedown
[1] https://www.amazon.com/report/infringement
[2] https://sell.aliexpress.com/zh/__pc/77Y4QdcvjD.htm
[3] https://pages.ebay.com/seller-center/listing-and-marketing/verified-rights-owner-program.html
[4] https://merchant.wish.com/brand-protection/brand-violation-report
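Before a takedown request can be filed, the fake product URLs have to be sorted by marketplace. The sketch below groups them by domain and pairs each group with the reporting page referenced on this slide. The domain matching and the sample listing URLs are assumptions made for the example; URLs from unknown sites are grouped under None for manual handling.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Reporting pages from the slide references, keyed by marketplace domain.
REPORT_PAGES = {
    "amazon.com": "https://www.amazon.com/report/infringement",
    "aliexpress.com": "https://sell.aliexpress.com/zh/__pc/77Y4QdcvjD.htm",
    "ebay.com": "https://pages.ebay.com/seller-center/listing-and-marketing/verified-rights-owner-program.html",
    "wish.com": "https://merchant.wish.com/brand-protection/brand-violation-report",
}


def marketplace_of(url):
    """Return the known marketplace domain a URL belongs to, or None."""
    host = (urlparse(url).hostname or "").lower()
    for domain in REPORT_PAGES:
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def group_for_takedown(fake_urls):
    """Group fake product URLs by marketplace domain."""
    groups = defaultdict(list)
    for url in fake_urls:
        groups[marketplace_of(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical listing URLs.
    urls = ["https://www.amazon.com/dp/X1", "https://www.ebay.com/itm/1"]
    for domain, group in group_for_takedown(urls).items():
        print(REPORT_PAGES.get(domain, "manual handling"), group)
```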
Takeaways
● Counterfeiting is a growing problem
● Python and Machine Learning can help
● Manual intervention is still needed
● The approach can be applied to other scenarios
What’s next?
● More data, more questions to answer
○ Evolutionary analysis
○ Comparative analysis
Q&A
