Applying NLP to Product Comparison at Visual Meta
Ross Turner
Elasticsearch Meetup Berlin 22/02/17
Overview
Applying NLP to Product Comparison
1. Product Comparison on the Visual Meta Platform
2. Using NLP to Maintain a Product Catalogue
3. Making Product Discovery Conversational
About Me
Previously…
• Researcher in Natural Language Generation (NLG)
• Software Engineer on Local Search
• Co-founder and Principal Engineer at an NLG Start Up
Currently…
• Engineering Head at Visual Meta
Product Comparison on the Visual Meta Platform
Product Comparison at Visual Meta
‘All shops, one site’
• Online marketing platform with shopping portals in 12 different countries
• 3 brands: Ladenzeile, ShopAlike, UmSóLugar
• 100,000,000+ items
• 6,000+ partner shops
Faceted Search at Visual Meta
Discover fashion, furniture and more…
• 800,000 platform visits per day
• 80 filter types across 21 categories
• Currently porting filter search to Elasticsearch
Maintaining a Product Catalogue at Visual Meta
Product feeds are continuously synced from partner shops:
• Feed items must be categorised in order to be discoverable on the platform
We want to:
• Identify all variants of a product
• Compare offers across shops
• Make it easy for our users to browse through millions of products
Model            Colour      Memory
Apple iPhone 6s  Space Grey  32GB
Apple iPhone 6s  Space Grey  128GB
Apple iPhone 6s  Gold        32GB
Apple iPhone 6s  Gold        128GB
Apple iPhone 6s  Rose Gold   32GB
Apple iPhone 6s  Rose Gold   128GB
Apple iPhone 6s  Silver      32GB
Apple iPhone 6s  Silver      128GB
Assigning Tags Based on Textual Attributes
String Matching
Index item names and descriptions, query product variant tag names against the index
Lucene query:
• +(Name:apple Description:apple) +(Name:iphone Description:iphone) +(Name:6s Description:6s) +(Name:16gb Description:16gb) +(Name:space Description:space) +(Name:grey)
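A query of this shape could be generated from a tag name roughly as follows. This is a minimal sketch, not the code used at Visual Meta: the function name `build_variant_query` is hypothetical, the field names `Name` and `Description` are taken from the query above, and tokens are split on whitespace without escaping Lucene special characters.

```python
def build_variant_query(tag_name, fields=("Name", "Description")):
    """Build a Lucene query string from a product variant tag name.

    Every token is required (leading '+') and may match in any of the
    given fields. Assumption: whitespace tokenisation, no escaping of
    Lucene special characters.
    """
    clauses = []
    for token in tag_name.lower().split():
        alternatives = " ".join(f"{field}:{token}" for field in fields)
        clauses.append(f"+({alternatives})")
    return " ".join(clauses)


print(build_variant_query("Apple iPhone 6s"))
```

Because every clause is required, an item that omits a single token (for example the colour) is not matched, which is one reason recall suffers with this approach.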
Test by manually assigning items to a random sample of products
Recall  Precision  Fscore
0.59    0.64       0.61
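The reported figures are consistent with the balanced F-score (F1), the harmonic mean of precision and recall. The sketch below assumes that definition; the slide does not state which F-measure was used.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for string matching.
print(round(f_score(0.64, 0.59), 2))  # 0.61
```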
Error Analysis
Naming for the same product is not consistent across feeds:
1. abc.com: “Apple iPhone 6 (Space Grey, 64GB)”
2. efg.com: “Apple iPhone 6 64 GB Space Grey”
3. xyz.com: “Apple iPhone 6”
Naming for the same product is not consistent within the same feed:
1. “Apple Iphone 6 - 64GB”
2. “Apple Iphone 6 64GB Space Grey”
3. “Kamakshi Apple iPhone 6 (Latest Model) - 64 GB - Space Gray - Smartphone”
Wrongly categorised products in the feed:
• “Cover for Apple Iphone 6 - 64GB”
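Some of these inconsistencies are purely surface-level and could be reduced by normalising names before matching. The sketch below is an illustration only, not the approach taken in the talk: the helper `normalise_name` and the `SPELLING_VARIANTS` table are hypothetical, and the character filter assumes ASCII product names.

```python
import re

# Hypothetical table of spelling variants; a real system would need a curated list.
SPELLING_VARIANTS = {"gray": "grey"}


def normalise_name(name):
    """Return the set of normalised tokens in an item name."""
    text = name.lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)                 # punctuation -> spaces
    text = re.sub(r"(\d+)\s+(gb|mb|tb)\b", r"\1\2", text)   # "64 gb" -> "64gb"
    return {SPELLING_VARIANTS.get(token, token) for token in text.split()}


a = normalise_name("Apple iPhone 6 (Space Grey, 64GB)")
b = normalise_name("Apple iPhone 6 64 GB Space Grey")
c = normalise_name("Apple iPhone 6")

print(a == b)  # True: same tokens once punctuation and spacing are normalised
print(c < a)   # True: the short name is a proper subset, so the variant is ambiguous
```

Normalisation does not help with underspecified names such as “Apple iPhone 6” or with miscategorised items such as covers, which is where a classifier becomes useful.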
Comparing Tag Names to Item Names
Comparing Names Between Item Feeds
Text Classification
Language Models
Drawbacks of bag of words / n-grams:
• Words are equally distant
• Vectors are sparse
Word embeddings capture semantics:
• Vectors are continuous
• Similar words are close in vector space
1. Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781
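The "equally distant" drawback can be seen directly with one-hot vectors, the representation underlying bag of words. The sketch below uses a made-up four-word vocabulary; every pair of distinct words has cosine similarity 0, so the representation cannot express that "galaxy" is more related to "samsung" than to "cover".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up vocabulary for illustration.
vocabulary = ["galaxy", "samsung", "s5", "cover"]


def one_hot(word):
    """Sparse one-hot vector: 1.0 at the word's index, 0.0 elsewhere."""
    return [1.0 if entry == word else 0.0 for entry in vocabulary]


print(cosine(one_hot("galaxy"), one_hot("samsung")))  # 0.0
print(cosine(one_hot("galaxy"), one_hot("cover")))    # 0.0
```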
Word2Vec for Mobile Phone Items
Mobile phone item corpus:
• 7,890 feed items
• 863k tokens, 41.5k unique
Closest words to “Galaxy”:
Rank  Word     Cosine Distance
1     Samsung  0.51
2     S2       0.48
3     S5       0.46
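A lookup like the one above amounts to ranking all other words by how close their vectors are to the query word. The sketch below shows the idea with cosine similarity (higher means closer). The three-dimensional vectors are invented for illustration; real Word2Vec vectors are learned from the corpus and have many more dimensions.

```python
import math

# Invented vectors for illustration only, not trained embeddings.
vectors = {
    "galaxy":  [0.9, 0.8, 0.1],
    "samsung": [0.8, 0.9, 0.2],
    "s5":      [0.7, 0.5, 0.3],
    "cover":   [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_words(word, k=3):
    """Return the k words most similar to `word`, best first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


for other, score in closest_words("galaxy"):
    print(other, round(score, 2))
```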
Classification Performance
                       Best BOW Classifier         Decision Tree with Word2Vec
Tag                    Fscore  Precision  Recall   Fscore  Precision  Recall
“Smartphone”           0.95    0.99       0.84     0.92    0.94       0.90
“Home Speakers”        0.55    0.67       0.32     0.79    0.79       0.80
“Creeper Cipök”        0.58    0.97       0.22     0.75    0.86       0.66
“Leder Schuhe”         0.52    0.73       0.25     0.71    0.93       0.58
“Bett mit Schubladen”  0.52    0.65       0.29     0.70    0.81       0.62
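The slides do not say how item text was turned into input for the decision tree. One common option, shown here purely as an assumption, is to average the embeddings of the words in an item name so that every item gets a dense vector of fixed length. The function `item_vector` and the two-dimensional vectors are hypothetical.

```python
def item_vector(text, vectors, dim):
    """Average the embeddings of known tokens; zero vector if none are known."""
    tokens = [token for token in text.lower().split() if token in vectors]
    if not tokens:
        return [0.0] * dim
    return [sum(vectors[token][i] for token in tokens) / len(tokens) for i in range(dim)]


# Invented two-dimensional embeddings for illustration only.
toy_vectors = {"samsung": [1.0, 0.0], "galaxy": [0.0, 1.0]}

print(item_vector("Samsung Galaxy S6", toy_vectors, dim=2))  # [0.5, 0.5]
```

The resulting vectors can then be passed to any standard classifier as features.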
Feed Enhancement
Two Descriptions of a Samsung TV
Samsung UE40H6400AK. Display diagonal: 101.6 cm (40"), HD type: Full HD, Display resolution: 1920 x 1080 pixels. Tuner type: Analog & Digital, Digital signal format system: DVB-C, DVB-T. RMS rated power: 20 W. Consumer Electronics Control (CEC): Anynet+. Picture processing technology: Samsung Wide Color Enhancer
The Samsung UE40H6400 has a 101.6cm screen size and a resolution of 1920 x 1080 pixels. It is a Full HD TV, has an Analog & Digital tuner and comes with Anynet+.
Generating Product Descriptions
• Choosing what to say
• Deciding how to say it
3. E Reiter (2007). An Architecture for Data-to-Text Systems. In Proceedings of ENLG-2007, pages 97-104
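The two stages can be sketched as a content selection step followed by a template-based realisation step. This is a minimal illustration, not the generator described in the talk: the attribute values are copied from the TV specification, while the `SELECTED` list, the function names and the sentence template are assumptions.

```python
# Attribute values copied from the TV specification.
spec = {
    "Display diagonal": "101.6 cm",
    "HD type": "Full HD",
    "Display resolution": "1920 x 1080 pixels",
    "Tuner type": "Analog & Digital",
    "RMS rated power": "20 W",
}

# Hypothetical choice of which attributes are worth mentioning.
SELECTED = ["Display diagonal", "Display resolution"]


def select_content(spec, wanted):
    """Choosing what to say: keep only the wanted attributes that are present."""
    return {key: spec[key] for key in wanted if key in spec}


def realise(name, messages):
    """Deciding how to say it: fill a fixed sentence template."""
    parts = []
    if "Display diagonal" in messages:
        parts.append(f"a {messages['Display diagonal']} screen size")
    if "Display resolution" in messages:
        parts.append(f"a resolution of {messages['Display resolution']}")
    if not parts:
        return ""
    return f"The {name} has " + " and ".join(parts) + "."


print(realise("Samsung UE40H6400", select_content(spec, SELECTED)))
```

Keeping the two steps separate means the same selected messages can be realised differently, for example as a feed description or as a spoken answer.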
Two Descriptions of a Samsung Smartphone
Samsung SM-G920F, Galaxy. Display diagonal: 12.9 cm (5.1"), Display resolution: 2560 x 1440 pixels, Display type: SAMOLED. Processor frequency: 2.1 GHz, Coprocessor frequency: 1.5 GHz. Internal storage capacity: 32 GB, Internal RAM: 3072 MB. Main camera resolution (numeric): 16 MP, Video recording modes: 1080p, 2160p, Maximum frame rate: 30 fps. SIM card capability: Single SIM, SIM card type: NanoSIM, 2G standards: GSM
The Samsung GALAXY S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072MB of internal RAM with 32GB of internal storage capacity.
Building Messages from a Product Catalogue
The Samsung Galaxy S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072MB of internal RAM with 32GB of internal storage capacity.
Making Product Discovery Conversational
Entity Recognition for Voice Search
Input - “I’d like some red adidas trainers”
Output:
• <brands, [adidas]>
• <categories, [trainers]>
• <colours, [red]>
4. http://visual-meta.com/tech-corner/hi-lara-building-a-conversational-agent-for-visual-metas-first-hackathon.html
Lucene index is built from labels to tag tree tokens
1. Word shingles are extracted from the input query
2. Each shingle is queried against the index (top down, greedy)
Labeled tokens are used to:
1. Query the product index
2. Keep track of the dialogue state
Using the Product Catalogue to Parse Queries
• “I’d like some red adidas trainers”
• “I’d like some red adidas”
• “like some red adidas trainers”
• “I’d like some red”
• “like some red adidas”
• “some red adidas trainers”
• ...
• “red”
• “adidas”
• “trainers”
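The shingle list above runs from the longest span to single words. A minimal sketch of that top-down, greedy lookup follows, with a plain dictionary standing in for the Lucene index; `LABEL_INDEX` and the function names are hypothetical.

```python
# Hypothetical label -> tag type table standing in for the Lucene index.
LABEL_INDEX = {"adidas": "brands", "trainers": "categories", "red": "colours"}


def shingles(tokens):
    """Yield (start, end) token spans, longest first, left to right."""
    n = len(tokens)
    for size in range(n, 0, -1):
        for start in range(0, n - size + 1):
            yield start, start + size


def parse_query(query):
    """Label the query greedily: longer matches claim their tokens first."""
    tokens = query.lower().split()
    used = [False] * len(tokens)
    entities = {}
    for start, end in shingles(tokens):
        if any(used[start:end]):
            continue  # some token already belongs to a longer match
        label = " ".join(tokens[start:end])
        tag_type = LABEL_INDEX.get(label)
        if tag_type:
            entities.setdefault(tag_type, []).append(label)
            for i in range(start, end):
                used[i] = True
    return entities


print(parse_query("I'd like some red adidas trainers"))
```

Trying long shingles first means a multi-word label, if one exists in the index, wins over its individual words.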
Putting It all Together: Answering Queries
How big is the Samsung Galaxy S6’s screen?
The Samsung Galaxy S6 has a 12.9 cm display
How much RAM does it have?
It has 3072MB of RAM
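The follow-up question works only because the system remembers which product is under discussion. The sketch below illustrates that kind of dialogue state under strong simplifying assumptions: the `CATALOGUE` and `KEYWORDS` tables and the `Dialogue` class are hypothetical, matching is naive substring search, and the answer is returned as a bare value rather than a generated sentence.

```python
# Hypothetical catalogue entry; values taken from the smartphone specification.
CATALOGUE = {
    "samsung galaxy s6": {"display": "12.9 cm", "ram": "3072MB"},
}

# Hypothetical mapping from question keywords to catalogue attributes.
KEYWORDS = {"screen": "display", "display": "display", "ram": "ram"}


class Dialogue:
    def __init__(self):
        self.current_product = None  # dialogue state: last product mentioned

    def answer(self, question):
        text = question.lower()
        # Naive substring matching, for illustration only.
        mentioned = next((name for name in CATALOGUE if name in text), None)
        if mentioned:
            self.current_product = mentioned
        if self.current_product is None:
            return "Which product do you mean?"
        attribute = next((attr for word, attr in KEYWORDS.items() if word in text), None)
        if attribute is None:
            return "Sorry, I do not know that."
        return CATALOGUE[self.current_product][attribute]


dialogue = Dialogue()
print(dialogue.answer("How big is the Samsung Galaxy S6's screen?"))
print(dialogue.answer("How much RAM does it have?"))
```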
Wrapping Up
Takeaways
1. Word embeddings, even when trained on limited data, can:
a. provide significant improvement over bag of words models for text classification; and
b. reduce the amount of manually curated data required for the task
2. Product catalogues provide a rich information source for conversational apps
3. NLG can be utilised for product feed enhancement as well as conversation
Thank you
Applying NLP to product comparison at visual meta

  • 1. Applying NLP to Product Comparison at Visual Meta 1 Ross Turner Elasticsearch Meetup Berlin 22/02/17
  • 2. Overview Product Comparison on the Visual Meta Platform1 Applying NLP to Product Comparison Using NLP to Maintain a Product Catalogue2 Making Product Discovery Conversational3 2
  • 3. About Me Previously… • Researcher in Natural Language Generation (NLG) • Software Engineer on Local Search • Co-founder and Principal Engineer at an NLG Start Up Currently… • Engineering Head at Visual Meta
  • 4. Product Comparison on the Visual Meta Platform
  • 5. Product Comparison at Visual Meta ‘All shops, one site’ • Online marketing platform with shopping portals in 12 different countries • 3 brands: Ladenzeile, ShopAlike, UmSóLugar • 100,000,000+ items • 6,000+ partner shops
  • 6. Faceted Search at Visual Meta Discover fashion, furniture and more… • 800,000 platform visits per day • 80 filter types across 21 categories • Currently porting filter search to ElasticSearch
  • 7. Maintaining a Product Catalogue at Visual Meta Product feeds are continuously synced from partner shops: • Feed items must be categorised in order to be discoverable on the platform We want to: • Identify all variants of a product • Compare offers across shops • Make it easy for our users to browse through millions of products
        Model            Colour      Memory
        Apple iPhone 6s  Space Grey  32GB
        Apple iPhone 6s  Space Grey  128GB
        Apple iPhone 6s  Gold        32GB
        Apple iPhone 6s  Gold        128GB
        Apple iPhone 6s  Rose Gold   32GB
        Apple iPhone 6s  Rose Gold   128GB
        Apple iPhone 6s  Silver      32GB
        Apple iPhone 6s  Silver      128GB
  • 8. Assigning Tags Based on Textual Attributes
  • 9. String Matching Index item names and descriptions, then query product variant tag names against the index. Lucene query: +(Name:apple Description:apple) +(Name:iphone Description:iphone) +(Name:6s Description:6s) +(Name:16gb Description:16gb) +(Name:space Description:space) +(Name:grey) Test by manually assigning items to a random sample of products. Results: Recall 0.59, Precision 0.64, F-score 0.61
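The string-matching step on slide 9 can be sketched roughly as follows. This is a minimal illustration, not the code used at Visual Meta: the function names `build_variant_query` and `f_score` are hypothetical, tokens are assumed to contain no Lucene special characters, and every token is searched in both fields (the query on the slide searches only `Name` for the last token).

```python
def build_variant_query(tag_name, fields=("Name", "Description")):
    """Build a Lucene query string that requires every tag token in at least one field."""
    clauses = []
    for token in tag_name.lower().split():
        # Each token becomes a required (+) clause that may match any of the fields.
        alternatives = " ".join(f"{field}:{token}" for field in fields)
        clauses.append(f"+({alternatives})")
    return " ".join(clauses)


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


query = build_variant_query("Apple iPhone 6s 16GB Space Grey")
print(query)
print(round(f_score(0.64, 0.59), 2))
```

With the precision and recall reported on the slide, the harmonic mean rounds to the reported F-score of 0.61.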
  • 10. Error Analysis
        Naming for the same product is not consistent across feeds: 1. abc.com: “Apple iPhone 6 (Space Grey, 64GB)” 2. efg.com: “Apple iPhone 6 64 GB Space Grey” 3. xyz.com: “Apple iPhone 6”
        Naming for the same product is not consistent within the same feed: 1. “Apple Iphone 6 - 64GB” 2. “Apple Iphone 6 64GB Space Grey” 3. “Kamakshi Apple iPhone 6 (Latest Model) - 64 GB - Space Gray - Smartphone”
        Wrongly categorised products in the feed: • “Cover for Apple Iphone 6 - 64GB”
  • 11. Comparing Tag Names to Item Names
  • 14. Language Models Drawbacks of bag of words / n-grams: • Words are equally distant • Vectors are sparse Word embeddings capture semantics: • Vectors are continuous • Similar words are close in vector space
        1. Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781
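The contrast drawn on slide 14 can be made concrete with a small sketch. The three-word vocabulary and the dense vectors below are made up for illustration and are not taken from any trained model; only the cosine similarity formula is standard.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vocabulary.
vocabulary = ["galaxy", "samsung", "sofa"]


def one_hot(word):
    """Bag-of-words style vector: one dimension per vocabulary word."""
    return [1.0 if w == word else 0.0 for w in vocabulary]


# Made-up dense vectors standing in for learned embeddings.
embeddings = {
    "galaxy": [0.9, 0.8, 0.1],
    "samsung": [0.8, 0.9, 0.2],
    "sofa": [0.1, 0.0, 0.9],
}

# One-hot vectors: every pair of distinct words is equally (un)related.
print(cosine_similarity(one_hot("galaxy"), one_hot("samsung")))
print(cosine_similarity(one_hot("galaxy"), one_hot("sofa")))

# Dense vectors: related words can be closer than unrelated ones.
print(cosine_similarity(embeddings["galaxy"], embeddings["samsung"]))
print(cosine_similarity(embeddings["galaxy"], embeddings["sofa"]))
```

In the one-hot case both similarities are zero, which is the "words are equally distant" drawback; with continuous vectors the similarity can reflect how related two words are.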
  • 15. Word2Vec for Mobile Phone Items Mobile phone item corpus: • 7,890 feed items • 863k tokens, 41.5k unique Closest words to “Galaxy”:
        Rank  Word     Cosine Distance
        1     Samsung  0.51
        2     S2       0.48
        3     S5       0.46
  • 16. Classification Performance
                               Best BOW Classifier          Decision Tree with Word2Vec
        Tag                    Fscore  Precision  Recall    Fscore  Precision  Recall
        “Smartphone”           0.95    0.99       0.84      0.92    0.94       0.90
        “Home Speakers”        0.55    0.67       0.32      0.79    0.79       0.80
        “Creeper Cipök”        0.58    0.97       0.22      0.75    0.86       0.66
        “Leder Schuhe”         0.52    0.73       0.25      0.71    0.93       0.58
        “Bett mit Schubladen”  0.52    0.65       0.29      0.70    0.81       0.62
  • 18. Two Descriptions of a Samsung TV
        Samsung UE40H6400AK. Display diagonal: 101.6 cm (40"), HD type: Full HD, Display resolution: 1920 x 1080 pixels. Tuner type: Analog & Digital, Digital signal format system: DVB-C, DVB-T. RMS rated power: 20 W. Consumer Electronics Control (CEC): Anynet+. Picture processing technology: Samsung Wide Color Enhancer
        The Samsung UE40H6400 has a 101.6cm screen size and a resolution of 1920 x 1080 pixels. It is a Full HD TV, has an Analog & Digital tuner and comes with Anynet+.
  • 19. Generating Product Descriptions • Choosing what to say • Deciding how to say it
        3. E. Reiter (2007). An Architecture for Data-to-Text Systems. In Proceedings of ENLG-2007, pages 97-104
  • 20. Two Descriptions of a Samsung Smartphone
        Samsung SM-G920F, Galaxy. Display diagonal: 12.9 cm (5.1"), Display resolution: 2560 x 1440 pixels, Display type: SAMOLED. Processor frequency: 2.1 GHz, Coprocessor frequency: 1.5 GHz. Internal storage capacity: 32 GB, Internal RAM: 3072 MB. Main camera resolution (numeric): 16 MP, Video recording modes: 1080p, 2160p, Maximum frame rate: 30 fps. SIM card capability: Single SIM, SIM card type: NanoSIM, 2G standards: GSM
        The Samsung Galaxy S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072 MB of internal RAM with 32 GB of internal storage capacity.
  • 21. Building Messages from a Product Catalogue The Samsung Galaxy S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072 MB of internal RAM with 32 GB of internal storage capacity.
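The two stages named on slide 19 (choosing what to say, deciding how to say it) can be illustrated with a deliberately simple template-based sketch. The field names, the `SELECTED` list and the fixed sentence templates are assumptions made for this example; the talk does not describe the actual implementation.

```python
# Hypothetical catalogue record; values follow the smartphone specification on slide 20.
spec = {
    "name": "Samsung Galaxy S6",
    "display_diagonal_cm": 12.9,
    "display_resolution": "2560 x 1440",
    "processor_ghz": 2.1,
    "camera_mp": 16,
    "ram_mb": 3072,
    "storage_gb": 32,
    "sim_card_type": "NanoSIM",
}

# Content selection: the attributes considered worth mentioning (an assumption).
SELECTED = [
    "display_diagonal_cm",
    "display_resolution",
    "processor_ghz",
    "camera_mp",
    "ram_mb",
    "storage_gb",
]


def select_content(record, keys):
    """Choosing what to say: keep only the selected attributes."""
    return {key: record[key] for key in keys}


def realise(name, msg):
    """Deciding how to say it: fill fixed sentence templates."""
    first = (
        f"The {name} has a {msg['display_diagonal_cm']} cm display "
        f"with {msg['display_resolution']} pixel resolution."
    )
    second = (
        f"It has a {msg['processor_ghz']} GHz processor, "
        f"a {msg['camera_mp']} megapixel camera and "
        f"{msg['ram_mb']} MB of internal RAM with "
        f"{msg['storage_gb']} GB of internal storage capacity."
    )
    return first + " " + second


def describe(record):
    return realise(record["name"], select_content(record, SELECTED))


print(describe(spec))
```

Attributes left out of `SELECTED`, such as the SIM card type, never reach the text, which is the content selection step in miniature.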
  • 22. Making Product Discovery Conversational
  • 23. Entity Recognition for Voice Search Input: “I’d like some red adidas trainers” Output: • <brands, [adidas]> • <categories, [trainers]> • <colours, [red]>
        4. http://visual-meta.com/tech-corner/hi-lara-building-a-conversational-agent-for-visual-metas-first-hackathon.html
  • 24. Using the Product Catalogue to Parse Queries Lucene index is built from labels to tag tree tokens: 1. Word shingles are extracted from the input query 2. Each shingle is queried against the index (top down, greedy) Labelled tokens are used to: 1. Query the product index 2. Keep track of the dialogue state Example shingles: • “I’d like some red adidas trainers” • “I’d like some red adidas” • “like some red adidas trainers” • “I’d like some red” • “like some red adidas” • “some red adidas trainers” • ... • “red” • “adidas” • “trainers”
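The shingle-based parsing on slide 24 might look roughly like the sketch below. A plain dictionary stands in for the Lucene index, and the entries in `LABEL_INDEX` as well as the function names are hypothetical. Shingles are tried from longest to shortest, and tokens already covered by a match are not reused, which is one possible reading of "top down, greedy".

```python
# Hypothetical stand-in for the Lucene index: tag tree token(s) -> label.
LABEL_INDEX = {
    "red": "colours",
    "gold": "colours",
    "rose gold": "colours",
    "adidas": "brands",
    "trainers": "categories",
}


def shingles(tokens):
    """Yield (start, end) spans over the tokens, longest shingles first."""
    n = len(tokens)
    for size in range(n, 0, -1):
        for start in range(n - size + 1):
            yield start, start + size


def parse_query(query, index):
    """Label the longest matching shingles; skip spans that overlap an earlier match."""
    tokens = query.lower().split()
    used = [False] * len(tokens)
    labels = {}
    for start, end in shingles(tokens):
        if any(used[start:end]):
            continue
        phrase = " ".join(tokens[start:end])
        label = index.get(phrase)
        if label is not None:
            labels.setdefault(label, []).append(phrase)
            for i in range(start, end):
                used[i] = True
    return labels


print(parse_query("I'd like some red adidas trainers", LABEL_INDEX))
```

Trying longer shingles first means a multi-word entry such as "rose gold" wins over its single-word part "gold".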
  • 25. Putting It All Together: Answering Queries Q: How big is the Samsung Galaxy S6’s screen? A: The Samsung Galaxy S6 has a 12.9 cm display. Q: How much RAM does it have? A: It has 3072MB of RAM.
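The exchange on slide 25 depends on remembering which product is under discussion, so that "it" in the follow-up question can be resolved. The sketch below illustrates that idea only; the catalogue entry, keyword table, templates and the `Dialogue` class are all hypothetical, and keyword matching stands in for the entity recognition described earlier.

```python
import re

# Hypothetical catalogue entry; values are taken from the slides.
CATALOGUE = {
    "samsung galaxy s6": {
        "name": "Samsung Galaxy S6",
        "display": "12.9 cm",
        "ram": "3072MB",
    },
}

# Hypothetical mapping from question keywords to catalogue attributes.
ATTRIBUTE_KEYWORDS = {"screen": "display", "display": "display", "ram": "ram"}

TEMPLATES = {
    "display": "{subject} has a {value} display",
    "ram": "{subject} has {value} of RAM",
}


class Dialogue:
    def __init__(self, catalogue):
        self.catalogue = catalogue
        self.current_product = None  # dialogue state: the last product mentioned

    def answer(self, question):
        text = question.lower()
        words = set(re.findall(r"[a-z0-9]+", text))
        subject = "It"
        for key, product in self.catalogue.items():
            if key in text:
                # An explicit mention updates the dialogue state.
                self.current_product = key
                subject = "The " + product["name"]
                break
        attribute = next(
            (attr for kw, attr in ATTRIBUTE_KEYWORDS.items() if kw in words), None
        )
        if self.current_product is None or attribute is None:
            return None  # nothing to refer to, or unknown attribute
        value = self.catalogue[self.current_product][attribute]
        return TEMPLATES[attribute].format(subject=subject, value=value)


dialogue = Dialogue(CATALOGUE)
print(dialogue.answer("How big is the Samsung Galaxy S6's screen?"))
print(dialogue.answer("How much RAM does it have?"))
```

A follow-up question asked before any product has been mentioned returns `None`, since there is no state to resolve the pronoun against.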
  • 27. Takeaways 1. Word embeddings, even when trained on limited data, can: a. provide significant improvement over bag-of-words models for text classification; and b. reduce the amount of manually curated data required for the task 2. Product catalogues provide a rich information source for conversational apps 3. NLG can be utilised for product feed enhancement as well as conversation