Applying NLP to Product Comparison at Visual Meta
Ross Turner
Elasticsearch Meetup Berlin 22/02/17
Overview
Applying NLP to Product Comparison
1. Product Comparison on the Visual Meta Platform
2. Using NLP to Maintain a Product Catalogue
3. Making Product Discovery Conversational
About Me
Previously…
• Researcher in Natural Language Generation (NLG)
• Software Engineer on Local Search
• Co-founder and Principal Engineer at an NLG Start Up
Currently…
• Engineering Head at Visual Meta
Product Comparison on the Visual Meta Platform
Product Comparison at Visual Meta
‘All shops, one site’
• Online marketing platform with shopping portals in 12 different countries
• 3 brands: Ladenzeile, ShopAlike, UmSóLugar
• 100,000,000+ items
• 6,000+ partner shops
Faceted Search at Visual Meta
Discover fashion, furniture and more…
• 800,000 platform visits per day
• 80 filter types across 21 categories
• Currently porting filter search to Elasticsearch
Maintaining a Product Catalogue at Visual Meta
Product feeds are continuously synced from partner shops:
• Feed items must be categorised in order to be discoverable on the platform
We want to:
• Identify all variants of a product
• Compare offers across shops
• Make it easy for our users to browse through millions of products
Model            Colour      Memory
Apple iPhone 6s  Space Grey  32GB
Apple iPhone 6s  Space Grey  128GB
Apple iPhone 6s  Gold        32GB
Apple iPhone 6s  Gold        128GB
Apple iPhone 6s  Rose Gold   32GB
Apple iPhone 6s  Rose Gold   128GB
Apple iPhone 6s  Silver      32GB
Apple iPhone 6s  Silver      128GB
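The variant table above is the cross product of one model with its colour and memory options. A minimal sketch of enumerating those variant names follows; the function name `enumerate_variants` is illustrative and not part of the Visual Meta platform.

```python
from itertools import product


def enumerate_variants(model, colours, memories):
    """Return one tag name per colour/memory combination of a model."""
    return [
        f"{model} {colour} {memory}"
        for colour, memory in product(colours, memories)
    ]


variants = enumerate_variants(
    "Apple iPhone 6s",
    ["Space Grey", "Gold", "Rose Gold", "Silver"],
    ["32GB", "128GB"],
)
for name in variants:
    print(name)
```

Four colours and two memory sizes give the eight variants listed in the table.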
Assigning Tags Based on Textual Attributes
String Matching
Index item names and descriptions, query product variant tag names against the index
Lucene query:
• +(Name:apple Description:apple) +(Name:iphone Description:iphone) +(Name:6s Description:6s)
+(Name:16gb Description:16gb) +(Name:space Description:space) +(Name:grey)
Test by manually assigning items to a random sample of products
Recall  Precision  F-score
0.59    0.64       0.61
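The query shown above requires every token of a variant tag name to occur in the item name or description. A rough sketch of building such a query string and of computing the F-score is given below. The helper names are hypothetical, the field names `Name` and `Description` are taken from the slide, and the F-score is assumed to be the usual harmonic mean (F1), which is consistent with the reported figures.

```python
def build_variant_query(tag_name, fields=("Name", "Description")):
    """Build a Lucene-style query in which every token is mandatory."""
    clauses = []
    for token in tag_name.lower().split():
        # Each token may match in any of the fields ...
        alternatives = " ".join(f"{field}:{token}" for field in fields)
        # ... but the '+' prefix makes the clause itself required.
        clauses.append(f"+({alternatives})")
    return " ".join(clauses)


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(build_variant_query("Apple iPhone 6s"))
print(round(f_score(precision=0.64, recall=0.59), 2))
```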
Error Analysis
Naming for the same product is not consistent across feeds:
1. abc.com: “Apple iPhone 6 (Space Grey, 64GB)”
2. efg.com: “Apple iPhone 6 64 GB Space Grey”
3. xyz.com: “Apple iPhone 6”
Naming for the same product is not consistent within the same feed:
1. “Apple Iphone 6 - 64GB”
2. “Apple Iphone 6 64GB Space Grey”
3. “Kamakshi Apple iPhone 6 (Latest Model) - 64 GB - Space Gray - Smartphone”
Wrongly categorised Products in the feed:
• “Cover for Apple Iphone 6 - 64GB”
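To illustrate why these cases are hard for token matching, the sketch below normalises names into token sets before comparing them. It is an illustration only, not the approach used in the talk; the normalisation rules (lower-casing, treating "gray" as "grey", joining "64 GB" into "64gb") are assumptions chosen to fit the examples above.

```python
import re


def normalise(name):
    """Reduce an item name to a set of lower-case alphanumeric tokens."""
    text = name.lower().replace("gray", "grey")
    # Join a number and a following "gb" so "64 GB" and "64GB" agree.
    text = re.sub(r"(\d+)\s*gb\b", r"\1gb", text)
    return frozenset(re.findall(r"[a-z0-9]+", text))


abc = normalise("Apple iPhone 6 (Space Grey, 64GB)")
efg = normalise("Apple iPhone 6 64 GB Space Grey")
xyz = normalise("Apple iPhone 6")
phone = normalise("Apple Iphone 6 - 64GB")
cover = normalise("Cover for Apple Iphone 6 - 64GB")

print(abc == efg)     # formatting differences disappear after normalisation
print(xyz < abc)      # the short name only carries part of the variant
print(phone < cover)  # the cover contains every token of the phone name
```

Normalisation removes the formatting differences, but an underspecified name still cannot be assigned to a single variant, and the wrongly categorised cover still contains every token of the phone's name.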
Comparing Tag Names to Item Names
Comparing Names Between Item Feeds
Text Classification
Language Models
Drawbacks of bag of words / n-grams:
• Words are equally distant
• Vectors are sparse
Word embeddings capture semantics:
• Vectors are continuous
• Similar words are close in vector space
1. Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781
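The first drawback can be shown in a few lines. In a one-hot (bag-of-words style) encoding every pair of distinct words is exactly the same distance apart, so the representation says nothing about which words are related. The four-word vocabulary below is made up for illustration.

```python
import math

# Hypothetical toy vocabulary; a real one has tens of thousands of entries.
vocabulary = ["galaxy", "samsung", "iphone", "sofa"]


def one_hot(word):
    """Sparse encoding: a single 1.0 at the word's index, 0.0 elsewhere."""
    return [1.0 if word == entry else 0.0 for entry in vocabulary]


distances = {
    other: math.dist(one_hot("galaxy"), one_hot(other))
    for other in vocabulary
    if other != "galaxy"
}
for other, distance in distances.items():
    print(f"galaxy -> {other}: {distance:.4f}")
```

Every distance printed is the square root of 2, whether the other word is "samsung" or "sofa".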
Word2Vec for Mobile Phone Items
Mobile phone item corpus:
• 7,890 feed items
• 863k tokens, 41.5k unique
Closest words to “Galaxy”:
Rank  Word     Cosine Distance
1     Samsung  0.51
2     S2       0.48
3     S5       0.46
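A nearest-word lookup of this kind can be sketched as a ranking by cosine similarity. The three-dimensional vectors below are invented for illustration; real Word2Vec embeddings are learned from the corpus and have many more dimensions. The scores in the table above appear to be similarities (higher means closer), which is how the sketch ranks.

```python
import math

# Made-up vectors for illustration only, not trained embeddings.
EMBEDDINGS = {
    "galaxy": [0.9, 0.8, 0.1],
    "samsung": [0.8, 0.9, 0.2],
    "s5": [0.7, 0.6, 0.3],
    "cover": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_words(word, top_n=3):
    """Rank all other words by similarity to `word`, most similar first."""
    target = EMBEDDINGS[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in EMBEDDINGS.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


for rank, (word, score) in enumerate(closest_words("galaxy"), start=1):
    print(rank, word, round(score, 2))
```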
Classification Performance
                       Best BOW Classifier          Decision Tree with Word2Vec
Tag                    Fscore  Precision  Recall    Fscore  Precision  Recall
“Smartphone”           0.95    0.99       0.84      0.92    0.94       0.90
“Home Speakers”        0.55    0.67       0.32      0.79    0.79       0.80
“Creeper Cipök”        0.58    0.97       0.22      0.75    0.86       0.66
“Leder Schuhe”         0.52    0.73       0.25      0.71    0.93       0.58
“Bett mit Schubladen”  0.52    0.65       0.29      0.70    0.81       0.62
Feed Enhancement
Two Descriptions of a Samsung TV
Samsung UE40H6400AK. Display diagonal:
101.6 cm (40"), HD type: Full HD, Display
resolution: 1920 x 1080 pixels. Tuner type:
Analog & Digital, Digital signal format
system: DVB-C, DVB-T. RMS rated power:
20 W. Consumer Electronics Control (CEC):
Anynet+. Picture processing technology:
Samsung Wide Color Enhancer
The Samsung UE40H6400 has a 101.6cm
screen size and a resolution of 1920 x
1080 pixels. It is a Full HD TV, has an
Analog & Digital tuner and comes with
Anynet+.
Generating Product Descriptions
• Choosing what to say
• Deciding how to say it
3. E Reiter (2007). An Architecture for Data-to-Text Systems. In Proceedings of ENLG-2007, pages 97-104
Two Descriptions of a Samsung Smartphone
Samsung SM-G920F, Galaxy. Display
diagonal: 12.9 cm (5.1"), Display
resolution: 2560 x 1440 pixels, Display
type: SAMOLED. Processor frequency: 2.1
GHz, Coprocessor frequency: 1.5 GHz.
Internal storage capacity: 32 GB, Internal
RAM: 3072 MB. Main camera resolution
(numeric): 16 MP, Video recording modes:
1080p, 2160p, Maximum frame rate: 30
fps. SIM card capability: Single SIM, SIM
card type: NanoSIM, 2G standards: GSM
The Samsung Galaxy S6 has a 12.9 cm
display with 2560 x 1440 pixel resolution.
It has a 2.1 GHz processor, a 16 megapixel
camera and 3072MB of internal RAM with
32GB of internal storage capacity.
Building Messages from a Product Catalogue
The Samsung Galaxy S6 has a 12.9 cm display
with 2560 x 1440 pixel resolution. It has a
2.1 GHz processor, a 16 megapixel camera
and 3072MB of internal RAM with 32GB of
internal storage capacity.
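The two stages named earlier, choosing what to say and deciding how to say it, can be sketched as content selection followed by template-based realisation. The attribute values below come from the structured smartphone description; the attribute keys, the selection list and the sentence templates are hypothetical and much simpler than a full data-to-text architecture.

```python
# Values from the structured description; the key names are assumptions.
SPEC = {
    "name": "Samsung Galaxy S6",
    "display_diagonal_cm": 12.9,
    "display_resolution": "2560 x 1440",
    "processor_ghz": 2.1,
    "camera_mp": 16,
    "ram_mb": 3072,
    "storage_gb": 32,
    "sim_card_type": "NanoSIM",
}

# Choosing what to say: the attributes judged worth mentioning.
SELECTED = [
    "display_diagonal_cm",
    "display_resolution",
    "processor_ghz",
    "camera_mp",
    "ram_mb",
    "storage_gb",
]


def select_content(spec):
    """Keep only the selected attributes that the spec provides."""
    return {key: spec[key] for key in SELECTED if key in spec}


def realise(name, message):
    """Deciding how to say it: fill two fixed sentence templates.

    Assumes every selected attribute is present in `message`.
    """
    first = (
        f"The {name} has a {message['display_diagonal_cm']} cm display "
        f"with {message['display_resolution']} pixel resolution."
    )
    second = (
        f"It has a {message['processor_ghz']} GHz processor, "
        f"a {message['camera_mp']} megapixel camera and "
        f"{message['ram_mb']} MB of internal RAM with "
        f"{message['storage_gb']} GB of internal storage capacity."
    )
    return f"{first} {second}"


description = realise(SPEC["name"], select_content(SPEC))
print(description)
```

Attributes left out of `SELECTED`, such as the SIM card type, never reach the generated text.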
Making Product Discovery Conversational
Entity Recognition for Voice Search
Input - “I’d like some red adidas trainers”
Output:
• <brands, [adidas]>
• <categories, [trainers]>
• <colours, [red]>
4. http://visual-meta.com/tech-corner/hi-lara-building-a-conversational-agent-for-visual-metas-first-hackathon.html
Using the Product Catalogue to Parse Queries
A Lucene index is built from labels to tag tree tokens:
1. Word shingles are extracted from the input query
2. Each shingle is queried against the index (top down, greedy)
Labeled tokens are used to:
1. Query the product index
2. Keep track of the dialogue state
• “I’d like some red adidas trainers”
• “I’d like some red adidas”
• “like some red adidas trainers”
• “I’d like some red”
• “like some red adidas”
• “some red adidas trainers”
• ...
• “red”
• “adidas”
• “trainers”
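The shingle list above runs from the longest span to single words. A minimal sketch of that top-down, greedy lookup follows; a plain dictionary stands in for the Lucene index, and its three entries are hypothetical labels chosen to match the example query.

```python
# Hypothetical stand-in for the Lucene index: label text -> tag type.
LABEL_INDEX = {
    "adidas": "brands",
    "trainers": "categories",
    "red": "colours",
}


def shingles(tokens):
    """Yield (start, end) spans, longest first and left to right."""
    count = len(tokens)
    for size in range(count, 0, -1):
        for start in range(count - size + 1):
            yield start, start + size


def parse(query):
    """Label the longest matching shingles and skip tokens already used."""
    tokens = query.lower().split()
    used = [False] * len(tokens)
    labelled = {}
    for start, end in shingles(tokens):
        if any(used[start:end]):
            continue  # greedy: a longer match already claimed these tokens
        text = " ".join(tokens[start:end])
        tag_type = LABEL_INDEX.get(text)
        if tag_type is not None:
            labelled.setdefault(tag_type, []).append(text)
            used[start:end] = [True] * (end - start)
    return labelled


print(parse("I’d like some red adidas trainers"))
```

Because longer shingles are tried first, a multi-word label in the index would be matched as a whole before its individual words are considered.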
Putting It All Together: Answering Queries
How big is the Samsung Galaxy S6’s screen?
The Samsung Galaxy S6 has a 12.9 cm display
How much RAM does it have?
It has 3072MB of RAM
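The follow-up question only works if the system remembers which product is under discussion. A rough sketch of that dialogue-state tracking is shown below; the catalogue entry, the keyword table and the `Dialogue` class are all hypothetical, and the answer phrases are stored ready-made rather than generated.

```python
import re

# Hypothetical catalogue: product key -> display name and answer phrases.
CATALOGUE = {
    "samsung galaxy s6": {
        "name": "Samsung Galaxy S6",
        "screen": "a 12.9 cm display",
        "ram": "3072MB of RAM",
    },
}

# Hypothetical mapping from question words to catalogue attributes.
ATTRIBUTE_KEYWORDS = {"screen": "screen", "display": "screen", "ram": "ram"}


class Dialogue:
    def __init__(self):
        self.product = None  # dialogue state: the product being discussed

    def answer(self, question):
        text = question.lower()
        mentioned = next((key for key in CATALOGUE if key in text), None)
        if mentioned is not None:
            self.product = mentioned  # update the state on explicit mention
        if self.product is None:
            return "Which product do you mean?"
        tokens = re.findall(r"[a-z0-9]+", text)
        attribute = next(
            (ATTRIBUTE_KEYWORDS[t] for t in tokens if t in ATTRIBUTE_KEYWORDS),
            None,
        )
        if attribute is None:
            return "Sorry, I don't know that."
        entry = CATALOGUE[self.product]
        # Use a pronoun when the product was carried over from the state.
        subject = f"The {entry['name']}" if mentioned else "It"
        return f"{subject} has {entry[attribute]}"


dialogue = Dialogue()
print(dialogue.answer("How big is the Samsung Galaxy S6’s screen?"))
print(dialogue.answer("How much RAM does it have?"))
```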
Wrapping Up
Takeaways
1. Word embeddings, even when trained on limited data, can:
a. provide significant improvement over bag of words models for text classification; and
b. reduce the amount of manually curated data required for the task
2. Product catalogues provide a rich information source for conversational apps
3. NLG can be utilised for product feed enhancement as well as conversation
Thank you
