What’s in a Query? Understanding query intent
Bharat Thakarar
Subhadeep Maji
Mohit Kumar
Flipkart confidential - For Internal use only. Not to be shared externally.
E-commerce Search
Query: rectangle room mat
● Search over a structured product catalog
○ Products belong to a ‘store’
■ E.g.: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’
○ Products have key-value attributes
■ E.g.: Shape: ‘Rectangle’; Style: ‘Iranian’; Place of use: ‘Living room’
● Intent of the query ‘rectangle room mat’
○ Store: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’
○ Attribute tagging: <shape>: ‘rectangle’, <place of use>: ‘living room’, <store>: ‘mat’
Life of a query - simplified view
Ranking
- Relevance
- Query-independent signals
- ...
Augmentation
- Normalisation
- Spell Correction
- Phrasing
- Stemming
- Synonymization
- ...
Intent Understanding
- Store identification
- Intent Tagging
- …
Query to Store identification: Why? (Customer Focused)
● Different stores present results differently:
○ Lifestyle: bigger images, less text
○ Mobiles & Large: spec heavy
○ Furniture: aspect ratio, swatches
Query to Store identification: Why? (Internal)
● Establishes context for query attribute tagging
○ Restricts the labeling space
● Backend efficiency
● ...
Query to Store identification: Statistical approach (baseline)
● Source: click-log data (query -> products clicked -> stores)
● Statistical aggregation of a click measure per (query, store) pair
● Empirically determined confidence threshold for redirection
○ Sample data: ‘rectangle room mat’ : ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’ : 95% confidence
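A minimal sketch of the click-aggregation baseline, assuming the click measure is simply the share of a query's clicks that land in each store and that the redirection threshold is a tunable parameter (the 0.9 default and the function names are hypothetical, not from the talk). Keying on the raw query string shows why the baseline only memorises: a one-character change is an unseen query.

```python
from collections import Counter, defaultdict


def build_store_table(click_log):
    """click_log: iterable of (query, store) pairs, one per clicked product.

    Returns query -> (most-clicked store, share of that query's clicks on it).
    """
    clicks = defaultdict(Counter)
    for query, store in click_log:
        clicks[query][store] += 1  # exact query string: no normalisation, no generalisation
    table = {}
    for query, per_store in clicks.items():
        store, n = per_store.most_common(1)[0]
        table[query] = (store, n / sum(per_store.values()))
    return table


def redirect(table, query, threshold=0.9):
    """Store to redirect to, or None if the query is unseen or below the threshold."""
    if query not in table:
        return None
    store, confidence = table[query]
    return store if confidence >= threshold else None
```

With 19 of 20 clicks for ‘rectangle room mat’ on the carpets store, the table holds a 0.95 confidence for that store, while ‘rectangular room mat’ gets no store at all.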
Statistical approach - Challenges
● Works only on exact queries: it memorises and does not generalize
● Cannot learn anything useful for verticals where query volume and product clicks are low
L1 level store identification
● ML problem setup
○ Short-text multi-label, multi-class classification
○ On the order of 10s of L1 classes
● Model: Linear SVM (one-vs-all)
● Feature sets
○ Bag-of-words (BOW) features (tf-idf)
○ Store-name overlap features (tf-idf)
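A pure-Python sketch of this setup, under stated assumptions: a smoothed idf, a hypothetical overlap feature (tf-idf mass of query tokens that also occur in a store name), and Pegasos-style SGD as the linear SVM solver. The talk does not say which solver or exact feature definitions were used; all function names here are illustrative. One binary SVM is trained per L1 store, and every store scoring above zero is returned, which gives multi-label output and allows "no store".

```python
import math
from collections import Counter


def fit_idf(token_lists):
    """Smoothed idf weights; every weight is >= 1."""
    n = len(token_lists)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def featurize(query, idf, store_names):
    """L2-normalised tf-idf bag of words plus store-name overlap features."""
    toks = query.lower().split()
    tf = Counter(t for t in toks if t in idf)  # out-of-vocabulary tokens are dropped
    x = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in x.values())) or 1.0
    x = {t: v / norm for t, v in x.items()}
    for store in store_names:
        shared = set(toks) & set(store.lower().split())
        if shared:  # tf-idf mass of query tokens that also occur in the store name
            x["overlap=" + store] = sum(x.get(t, 0.0) for t in shared)
    return x


def train_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style SGD for a linear SVM (hinge loss + L2 penalty), without a bias term."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(w.get(k, 0.0) * v for k, v in x.items())
            for k in w:  # shrink all weights (gradient of the L2 penalty)
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: move towards the example
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * label * v
    return w


def train_one_vs_all(queries, label_sets, stores):
    idf = fit_idf([q.lower().split() for q in queries])
    X = [featurize(q, idf, stores) for q in queries]
    models = {s: train_svm(X, [1 if s in labels else -1 for labels in label_sets]) for s in stores}
    return idf, models


def predict(query, idf, models):
    """Multi-label output: every L1 store whose classifier scores above zero."""
    x = featurize(query, idf, models)
    scores = {s: sum(w.get(k, 0.0) * v for k, v in x.items()) for s, w in models.items()}
    return sorted(s for s, score in scores.items() if score > 0)
```

On a toy training set with car-accessory queries labeled ‘Automotive’ and TV queries labeled ‘TVs & Appliances’, the unseen brand token in ‘T-Series led tv’ is simply dropped and the query still lands in ‘TVs & Appliances’, while a query with no known tokens gets no store.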
L1 level store identification: Results
[Figure: search results before and after; query: canvas car body covers]
[Figure: search results before and after; query: T-Series led tv]
L1 level store identification: Impact
● Backend metrics
○ Nearly 40% drop in queries without a store (saving valuable compute resources)
● First user-path deployment of the ML platform’s modelhost
[Figure: backend requests without stores]
Leaf level store identification
● ML problem setup
○ Short-text *multi-label* multi-class classification
○ On the order of 1000s of leaf stores
● Challenges in extending the L1 model:
○ Data sparsity
○ Linear SVM (one-vs-all) scaling to 1000s of classes
○ BOW features (no generalisation, no sharing)
Leaf level store identification
● Approach: fastText
● Key ideas:
○ Use the word2vec CBOW model, but predict the label instead of the target word
○ Hierarchical softmax to scale to a large number of classes
fastText: https://github.com/facebookresearch/fastText
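fastText's supervised mode reads plain-text lines in which one or more labels, each prefixed with `__label__` by default, precede the text. A small, hypothetical helper for turning (query, leaf stores) pairs, for example mined from click logs, into that format could look like this. Spaces inside store names are replaced because fastText splits on whitespace.

```python
def to_fasttext_line(query, leaf_stores, prefix="__label__"):
    """Format one (query, leaf stores) example as a fastText supervised-training line.

    A query with several leaf stores simply carries several labels (multi-label).
    """
    labels = " ".join(prefix + "_".join(store.split()) for store in leaf_stores)
    return labels + " " + " ".join(query.lower().split())


def write_training_file(examples, path):
    """examples: iterable of (query, [leaf store, ...]) pairs."""
    with open(path, "w", encoding="utf-8") as f:
        for query, stores in examples:
            f.write(to_fasttext_line(query, stores) + "\n")
```

A file written this way could then be passed to the Python binding, e.g. `fasttext.train_supervised(input="train.txt", loss="hs", pretrainedVectors="catalog.vec", dim=100)`, where `loss="hs"` selects hierarchical softmax and `pretrainedVectors` seeds the word vectors (its dimension must match `dim`). The file names and `dim` value are illustrative, not the settings used in the talk.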
Leaf level store identification: How were the challenges addressed?
● Data sparsity
○ Catalog data is used to seed the embeddings
○ This helps the model learn from a smaller amount of labeled data
● BOW features (no generalisation, no sharing)
○ Embeddings provide the abstraction: related tokens get similar vectors, so what is learned for one token carries over to others
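To illustrate why seeding helps, here is a from-scratch sketch of the fastText architecture (average the query's word vectors, then a linear softmax over stores), with word vectors initialised from catalog-trained vectors. Assumptions: a full softmax stands in for hierarchical softmax, there are no subword n-grams, the class name and the 2-d seed vectors are hypothetical, and one label per training example is used for brevity.

```python
import math
import random


class AvgEmbeddingClassifier:
    """fastText-style classifier: averaged word vectors feeding a linear softmax layer."""

    def __init__(self, labels, dim, seed_vectors=None, rng_seed=0):
        self.labels = list(labels)
        self.dim = dim
        self.rng = random.Random(rng_seed)
        # Word vectors start from catalog-trained vectors when available (copied, not shared)
        self.emb = {w: list(v) for w, v in (seed_vectors or {}).items()}
        self.out = [[0.0] * dim for _ in self.labels]  # one output row per store

    def _vec(self, word):
        if word not in self.emb:
            self.emb[word] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.emb[word]

    def probs(self, query):
        """Return (known tokens, averaged hidden vector, softmax over stores)."""
        known = [t for t in query.lower().split() if t in self.emb]  # unknown words are ignored
        h = [0.0] * self.dim
        for tok in known:
            for i, v in enumerate(self.emb[tok]):
                h[i] += v / len(known)
        scores = [sum(a * b for a, b in zip(row, h)) for row in self.out]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return known, h, [e / z for e in exps]

    def update(self, query, label, lr=0.5):
        """One SGD step on the cross-entropy loss, updating output rows and word vectors."""
        for tok in query.lower().split():
            self._vec(tok)  # unseen training words get a small random vector
        known, h, p = self.probs(query)
        target = self.labels.index(label)
        grad_h = [0.0] * self.dim
        for k, row in enumerate(self.out):
            g = p[k] - (1.0 if k == target else 0.0)  # d loss / d score_k
            for i in range(self.dim):
                grad_h[i] += g * row[i]  # uses the row before this step's update
                row[i] -= lr * g * h[i]
        for tok in known:  # back-propagate through the average into each word vector
            vec = self.emb[tok]
            for i in range(self.dim):
                vec[i] -= lr * grad_h[i] / len(known)

    def predict(self, query):
        _, _, p = self.probs(query)
        return self.labels[max(range(len(p)), key=p.__getitem__)]
```

If the catalog places "rug" next to "carpet", a model trained only on the labeled queries "carpet" and "mattress" can still route "rug" to ‘Carpets & Rugs’, because the seeded vector carries the similarity even though "rug" never appeared in labeled data.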
Leaf level store identification - Impact
● Significant A/B test metrics
○ +3 bps Search Conversion
○ +2 bps Visit Conversion
● SQA analysis (PBAGE): 8% improvement
Leaf level store identification: Some Learnings
● A classifier trained only on catalog data (far more labeled data) did not work well on queries as-is
● Seed embeddings trained with store context in catalog space do work
Query Intent Tagging
Given a query, predict the attributes that best describe the terms (chunks) in the query.
Query: kids party dress 4-5 years pack of 2
Tagging: <ideal_for>: kids <occasion>: party <store>: dress <size>: 4-5 years <pack_of>: pack of 2
Statistical Aggregation
● Use query -> product click-through logs
● For each (query, clicked product) pair
○ Identify which attributes in the product description match the query tokens
○ Store, for each query token, the fraction of its matches that fall on each attribute
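A minimal sketch of this aggregation, assuming each clicked product is available as an attribute -> value-string map and that a "match" means the query token appears as a whole word in the attribute value (both assumptions; the function name is hypothetical). The two example clicks already show the ambiguity the next slides discuss.

```python
from collections import Counter, defaultdict


def aggregate_token_attributes(click_pairs):
    """click_pairs: iterable of (query, product_attrs); product_attrs maps attribute -> value.

    Returns token -> {attribute: fraction of that token's matches on the attribute}.
    """
    counts = defaultdict(Counter)
    for query, attrs in click_pairs:
        for tok in query.lower().split():
            for attr, value in attrs.items():
                if tok in value.lower().split():  # whole-word match against the attribute value
                    counts[tok][attr] += 1
    return {
        tok: {attr: n / sum(per_attr.values()) for attr, n in per_attr.items()}
        for tok, per_attr in counts.items()
    }


clicks = [
    ("samsung galaxy j7", {"brand": "Samsung", "model_name": "Galaxy J7"}),
    ("samsung galaxy j7 covers",
     {"brand": "Spigen", "designed_for": "Samsung Galaxy J7", "category": "Covers"}),
]
token_attrs = aggregate_token_attributes(clicks)
```

Here `token_attrs["samsung"]` splits evenly between `brand` and `designed_for`, while `covers` maps only to `category`. Every click counts equally, which is exactly the noise issue listed under Limitations.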
Limitations
● Works in query-token space, so generalization is weak
● Treats all clicks equally, but clicks are noisy
● Cannot learn anything useful for verticals where query volume is low
Problem Complexity
● samsung galaxy j7
○ brand model_name model_name
● samsung galaxy j7 covers
○ designed_for designed_for designed_for category
Some Exploratory Analysis
● ~40% of catalog tokens cannot be identified unambiguously
● “Cotton” appears in the vocabulary of 23 attributes in “HomeFurnishing”
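A statistic like the ~40% figure can be computed by collecting, for every catalog token, the set of attributes whose value vocabulary contains it. A sketch under the assumption that a token is "ambiguous" when it occurs under more than one attribute; the input shape and the function name are hypothetical.

```python
from collections import defaultdict


def ambiguity_report(attribute_values):
    """attribute_values: attribute -> list of catalog value strings for one store.

    Returns (fraction of tokens seen under more than one attribute, token -> attributes).
    """
    token_attrs = defaultdict(set)
    for attr, values in attribute_values.items():
        for value in values:
            for tok in value.lower().split():
                token_attrs[tok].add(attr)
    if not token_attrs:
        return 0.0, {}
    ambiguous = [tok for tok, attrs in token_attrs.items() if len(attrs) > 1]
    return len(ambiguous) / len(token_attrs), dict(token_attrs)
```

For a toy store where "Cotton" is both a `fabric` and a `filling_material` value, one of its six tokens is ambiguous; on the real catalog the same count gives figures like the ones above.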
Is Sequence necessary?
● The attribute label at one position depends on the tokens at other positions in the query
● Attributes have affinity: (brand, model_name) is more likely than (brand, color) in mobiles
Sequence Formulation
● Let X be the query, X = (x1, x2, ..., xn), where xj is a query token
● Let Y be the intent, Y = (y1, y2, ..., yn), where yj ∈ attributes
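The slides do not write the model out. For reference, the standard linear-chain CRF over this formulation scores the whole label sequence at once; the feature functions $f_k$, weights $\lambda_k$ and start label $y_0$ are textbook notation, not taken from the slides, and $Y'$ ranges over all label sequences of length $n$:

```latex
\begin{aligned}
p(Y \mid X) &= \frac{1}{Z(X)} \exp\Big( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y_{j-1}, y_j, X, j) \Big), \\
Z(X) &= \sum_{Y'} \exp\Big( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y'_{j-1}, y'_j, X, j) \Big), \\
\hat{Y} &= \arg\max_{Y} \; p(Y \mid X).
\end{aligned}
```

Because each $f_k$ may look at the whole query $X$ and at the adjacent pair $(y_{j-1}, y_j)$, the form covers both observations under "Is Sequence necessary?": context from other positions and attribute affinity such as (brand, model_name).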
Supervised - Conditional Random Field
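At prediction time a linear-chain CRF picks the best label sequence with Viterbi decoding. The sketch below decodes per-position emission scores plus label-to-label transition scores; the numbers are hypothetical hand-set values standing in for learned CRF weights, chosen to reproduce the ‘samsung galaxy j7’ vs ‘samsung galaxy j7 covers’ example from Problem Complexity.

```python
def viterbi(tokens, labels, emission, transition):
    """Highest-scoring label sequence under a linear-chain model.

    emission(j, y): score for label y at position j (e.g. the weighted features at j)
    transition(prev, y): score for y following prev; prev is None at the first position
    """
    if not tokens:
        return []
    # best[j][y] = (score of the best path ending in y at j, previous label on that path)
    best = [{y: (transition(None, y) + emission(0, y), None) for y in labels}]
    for j in range(1, len(tokens)):
        column = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[j - 1][p][0] + transition(p, y))
            column[y] = (best[j - 1][prev][0] + transition(prev, y) + emission(j, y), prev)
        best.append(column)
    y = max(labels, key=lambda lab: best[-1][lab][0])  # best label at the last position
    path = [y]
    for j in range(len(tokens) - 1, 0, -1):  # follow back-pointers to the start
        y = best[j][y][1]
        path.append(y)
    return path[::-1]


# Hypothetical hand-set scores standing in for learned CRF weights
LABELS = ["brand", "model_name", "designed_for", "category"]
EMISSION = {
    "samsung": {"brand": 2, "designed_for": 1},
    "galaxy": {"model_name": 2, "designed_for": 1},
    "j7": {"model_name": 2, "designed_for": 1},
    "covers": {"category": 3},
}
TRANSITION = {
    ("brand", "model_name"): 1, ("model_name", "model_name"): 1,
    ("designed_for", "designed_for"): 1, ("designed_for", "category"): 3,
    ("model_name", "category"): -3,
    ("brand", "designed_for"): -2, ("model_name", "designed_for"): -2,
    ("designed_for", "brand"): -2, ("designed_for", "model_name"): -2,
}


def tag(query):
    tokens = query.lower().split()

    def emission(j, y):
        return EMISSION.get(tokens[j], {}).get(y, 0)

    def transition(prev, y):
        if prev is None:
            return 0
        if prev == "category":  # a category word usually ends the query
            return -2
        return TRANSITION.get((prev, y), 0)

    return viterbi(tokens, LABELS, emission, transition)
```

`tag("samsung galaxy j7")` returns `["brand", "model_name", "model_name"]`, and appending "covers" flips the earlier tokens to `designed_for`, since the transition into `category` rewards the whole sequence. A per-token classifier cannot make that change.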
Feature Design
● looks_like_attribute features
○ For attributes like brand, color, model_name
○ Generated with a Multinomial Naive Bayes model
● Defined over a window around each position in the query
● Global features such as is_alnum, is_shortword
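A sketch of such features, with these assumptions: the Naive Bayes model is trained on character trigrams of catalog attribute values with a uniform prior, its posterior P(attribute | token) becomes the `looks_like_<attribute>` feature, the window is one token on each side, `is_alnum` means "mixes letters and digits", and `is_shortword` means at most 3 characters. None of these details come from the talk, and the names are hypothetical. The output is one feature dict per position, a shape commonly fed to CRF toolkits.

```python
import math
from collections import Counter


def char_ngrams(token, n=3):
    padded = "<" + token + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class AttributeNB:
    """Multinomial Naive Bayes over character trigrams of catalog attribute values."""

    def __init__(self, attribute_values, alpha=1.0):
        self.alpha = alpha
        self.counts = {
            attr: Counter(g for v in values for tok in v.lower().split() for g in char_ngrams(tok))
            for attr, values in attribute_values.items()
        }
        self.vocab_size = len(set().union(*self.counts.values())) + 1  # +1 slot for unseen n-grams

    def posterior(self, token):
        """P(attribute | token) under a uniform prior and Laplace smoothing."""
        logp = {}
        for attr, c in self.counts.items():
            total = sum(c.values()) + self.alpha * self.vocab_size
            logp[attr] = sum(math.log((c[g] + self.alpha) / total) for g in char_ngrams(token))
        top = max(logp.values())
        exps = {a: math.exp(s - top) for a, s in logp.items()}
        z = sum(exps.values())
        return {a: e / z for a, e in exps.items()}


def token_features(tokens, j, nb, window=1):
    """Feature dict for position j: window words, looks_like_* scores, global token shape."""
    feats = {"bias": 1.0}
    for off in range(-window, window + 1):
        k = j + off
        if 0 <= k < len(tokens):
            feats["word[%d]=%s" % (off, tokens[k])] = 1.0
            for attr, p in nb.posterior(tokens[k]).items():
                feats["looks_like_%s[%d]" % (attr, off)] = p
    tok = tokens[j]
    feats["is_alnum"] = float(any(ch.isalpha() for ch in tok) and any(ch.isdigit() for ch in tok))
    feats["is_shortword"] = float(len(tok) <= 3)  # hypothetical cut-off
    return feats
```

Because the CRF sees `looks_like_brand` rather than the literal word, a brand token never seen in labeled queries can still be tagged, which is the generalization gain claimed on the next slide.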
What did we gain?
● Moving from query-token space to attribute-feature space improves generalization
● The model can generate multiple partial labellings, enabling better ranking of search results
What did we gain? Metrics
● Significant A/B test metrics
○ +5 bps Search Conversion
○ +2 bps Visit Conversion
● SQA analysis (PBAGE): 4% improvement
Some Examples
[Figure: search results before and after; query: samsung galaxy s7 edge 2017]
[Figure: search results before and after; query: Watches with steel belt with square dial]
Why is CRF not enough?
● Low volume of high-confidence labeled data in some verticals
● Click noise: users sometimes click randomly, especially in lifestyle
● The labeled data for the CRF suffers from both issues
Some Exploratory Analysis (continued)
● Labeled data covers only ~10% of unique queries
● A supervised model will fail to generalize for these stores
Weakly-Supervised Models
● A generative setting, as opposed to a discriminative one like CRF
● Learning from unlabeled queries
● Catalog and limited labeled data used as weak supervision
● WIP… research paper … production
Summary
● Pattern of solution evolution
○ Statistical -> Supervised -> Supervised++ (side information)
● Common challenges
○ Not enough labeled data (side information / weak supervision)
○ Label/presentation bias
Query: ‘diamond ring’
Questions?

More Related Content

What's hot

Amazon DSP Strategy: How to Leverage DSP Capabilities to Capture Audience Demand
Amazon DSP Strategy: How to Leverage DSP Capabilities to Capture Audience DemandAmazon DSP Strategy: How to Leverage DSP Capabilities to Capture Audience Demand
Amazon DSP Strategy: How to Leverage DSP Capabilities to Capture Audience DemandTinuiti
 
Search Impression Share - Myths & Realities
Search Impression Share - Myths & RealitiesSearch Impression Share - Myths & Realities
Search Impression Share - Myths & RealitiesPierre M. Fiorini, Ph.D.
 
Social Media Marketing Roadmap - TopRankMarketing.com
Social Media Marketing Roadmap - TopRankMarketing.comSocial Media Marketing Roadmap - TopRankMarketing.com
Social Media Marketing Roadmap - TopRankMarketing.comTopRank Marketing Agency
 
What is Technical SEO ?
What is Technical SEO ? What is Technical SEO ?
What is Technical SEO ? intern_jaguar
 
Whatsapp | Success Diaries
Whatsapp | Success DiariesWhatsapp | Success Diaries
Whatsapp | Success DiariesKalaari Capital
 
Top 10 reasons to learn Digital marketing
Top 10 reasons to learn Digital marketingTop 10 reasons to learn Digital marketing
Top 10 reasons to learn Digital marketingSimplilearn
 

What's hot (6)

Amazon DSP Strategy: How to Leverage DSP Capabilities to Capture Audience Demand
Amazon DSP Strategy: How to Leverage DSP Capabilities to Capture Audience DemandAmazon DSP Strategy: How to Leverage DSP Capabilities to Capture Audience Demand
Amazon DSP Strategy: How to Leverage DSP Capabilities to Capture Audience Demand
 
Search Impression Share - Myths & Realities
Search Impression Share - Myths & RealitiesSearch Impression Share - Myths & Realities
Search Impression Share - Myths & Realities
 
Social Media Marketing Roadmap - TopRankMarketing.com
Social Media Marketing Roadmap - TopRankMarketing.comSocial Media Marketing Roadmap - TopRankMarketing.com
Social Media Marketing Roadmap - TopRankMarketing.com
 
What is Technical SEO ?
What is Technical SEO ? What is Technical SEO ?
What is Technical SEO ?
 
Whatsapp | Success Diaries
Whatsapp | Success DiariesWhatsapp | Success Diaries
Whatsapp | Success Diaries
 
Top 10 reasons to learn Digital marketing
Top 10 reasons to learn Digital marketingTop 10 reasons to learn Digital marketing
Top 10 reasons to learn Digital marketing
 

Similar to What’s in a Query? Understanding query intent

Actionable Insight Extraction from Reviews and Images - slash n 2019
Actionable Insight Extraction from Reviews and Images - slash n 2019Actionable Insight Extraction from Reviews and Images - slash n 2019
Actionable Insight Extraction from Reviews and Images - slash n 2019FlipkartStories
 
Slash n 2018 - Just In Time Personalization
Slash n  2018 - Just In Time Personalization Slash n  2018 - Just In Time Personalization
Slash n 2018 - Just In Time Personalization FlipkartStories
 
Flipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 repriseFlipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 repriseFlipkartStories
 
Emergency SEO: How To Recover When SERP Rankings Suddenly Drop
Emergency SEO: How To Recover When SERP Rankings Suddenly DropEmergency SEO: How To Recover When SERP Rankings Suddenly Drop
Emergency SEO: How To Recover When SERP Rankings Suddenly DropSearch Engine Journal
 
Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​Somnath Banerjee
 
JAB2012 Smart Search Presentation
JAB2012 Smart Search PresentationJAB2012 Smart Search Presentation
JAB2012 Smart Search PresentationChris Davenport
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...Databricks
 
How to Use AI in Your Digital Marketing (1).pdf
How to Use AI in Your Digital Marketing (1).pdfHow to Use AI in Your Digital Marketing (1).pdf
How to Use AI in Your Digital Marketing (1).pdfVolume Nine
 
Course outline for affiliate with amazon and seo
Course outline for affiliate with amazon and seoCourse outline for affiliate with amazon and seo
Course outline for affiliate with amazon and seozameerulhasaann
 
Selling on Walmart.com: Navigating Through the Extensive Style Guide
Selling on Walmart.com: Navigating Through the Extensive Style GuideSelling on Walmart.com: Navigating Through the Extensive Style Guide
Selling on Walmart.com: Navigating Through the Extensive Style GuideTinuiti
 
Startup Secrets - Getting Behind the Perfect Investor Pitch
Startup Secrets - Getting Behind the Perfect Investor PitchStartup Secrets - Getting Behind the Perfect Investor Pitch
Startup Secrets - Getting Behind the Perfect Investor PitchMichael Skok
 
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETHacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETTanyaRaina3
 
ChatGPT For Business Use
ChatGPT For Business UseChatGPT For Business Use
ChatGPT For Business UseSanjay Willie
 
WWV2015: Findologic Matthias Heimbeckjan
WWV2015: Findologic Matthias HeimbeckjanWWV2015: Findologic Matthias Heimbeckjan
WWV2015: Findologic Matthias Heimbeckjanwebwinkelvakdag
 
How to Boost Your SEO using Structured Data
How to Boost Your SEO using Structured DataHow to Boost Your SEO using Structured Data
How to Boost Your SEO using Structured DataMartin Tang
 
10 Ways to Get More from Your Pardot Solution
10 Ways to Get More from Your Pardot Solution10 Ways to Get More from Your Pardot Solution
10 Ways to Get More from Your Pardot SolutionPardot
 

Similar to What’s in a Query? Understanding query intent (20)

Actionable Insight Extraction from Reviews and Images - slash n 2019
Actionable Insight Extraction from Reviews and Images - slash n 2019Actionable Insight Extraction from Reviews and Images - slash n 2019
Actionable Insight Extraction from Reviews and Images - slash n 2019
 
Slash n 2018 - Just In Time Personalization
Slash n  2018 - Just In Time Personalization Slash n  2018 - Just In Time Personalization
Slash n 2018 - Just In Time Personalization
 
Flipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 repriseFlipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 reprise
 
Emergency SEO: How To Recover When SERP Rankings Suddenly Drop
Emergency SEO: How To Recover When SERP Rankings Suddenly DropEmergency SEO: How To Recover When SERP Rankings Suddenly Drop
Emergency SEO: How To Recover When SERP Rankings Suddenly Drop
 
Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​
 
JAB2012 Smart Search Presentation
JAB2012 Smart Search PresentationJAB2012 Smart Search Presentation
JAB2012 Smart Search Presentation
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 
How to Use AI in Your Digital Marketing (1).pdf
How to Use AI in Your Digital Marketing (1).pdfHow to Use AI in Your Digital Marketing (1).pdf
How to Use AI in Your Digital Marketing (1).pdf
 
Timothy Resnik - Advanced Search Summit Napa 2021
Timothy Resnik - Advanced Search Summit Napa 2021Timothy Resnik - Advanced Search Summit Napa 2021
Timothy Resnik - Advanced Search Summit Napa 2021
 
Course outline for affiliate with amazon and seo
Course outline for affiliate with amazon and seoCourse outline for affiliate with amazon and seo
Course outline for affiliate with amazon and seo
 
Keyword research webinar 256
Keyword research   webinar 256Keyword research   webinar 256
Keyword research webinar 256
 
Selling on Walmart.com: Navigating Through the Extensive Style Guide
Selling on Walmart.com: Navigating Through the Extensive Style GuideSelling on Walmart.com: Navigating Through the Extensive Style Guide
Selling on Walmart.com: Navigating Through the Extensive Style Guide
 
Role of Data Science in eCommerce
Role of Data Science in eCommerceRole of Data Science in eCommerce
Role of Data Science in eCommerce
 
Startup Secrets - Getting Behind the Perfect Investor Pitch
Startup Secrets - Getting Behind the Perfect Investor PitchStartup Secrets - Getting Behind the Perfect Investor Pitch
Startup Secrets - Getting Behind the Perfect Investor Pitch
 
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETHacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
 
ChatGPT For Business Use
ChatGPT For Business UseChatGPT For Business Use
ChatGPT For Business Use
 
WWV2015: Findologic Matthias Heimbeckjan
WWV2015: Findologic Matthias HeimbeckjanWWV2015: Findologic Matthias Heimbeckjan
WWV2015: Findologic Matthias Heimbeckjan
 
How to Boost Your SEO using Structured Data
How to Boost Your SEO using Structured DataHow to Boost Your SEO using Structured Data
How to Boost Your SEO using Structured Data
 
Amazon mp
Amazon mpAmazon mp
Amazon mp
 
10 Ways to Get More from Your Pardot Solution
10 Ways to Get More from Your Pardot Solution10 Ways to Get More from Your Pardot Solution
10 Ways to Get More from Your Pardot Solution
 

Recently uploaded

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Recently uploaded (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

What’s in a Query? Understanding query intent

  • 1. What’s in a Query? Understanding query intent Bharat Thakarar Subhadeep Maji Mohit Kumar
  • 2. Flipkart confidential - For Internal use only. Not to be shared externally. E-commerce Search Query: rectangle room mat
  • 3. Flipkart confidential - For Internal use only. Not to be shared externally. ● Search over structured product catalog ○ Products belong to a ‘store’ ■ Eg: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’ E-commerce Search
  • 4. Flipkart confidential - For Internal use only. Not to be shared externally. ● Search over structured product catalog ○ Products belong to a ‘store’ ■ Eg: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’ ○ Products have key-value attributes ■ Eg: Shape: ‘Rectangle’; Style: ‘Iranian’; Place of use: ‘Living room’ E-commerce Search
  • 5. Flipkart confidential - For Internal use only. Not to be shared externally. ● Search over structured product catalog ○ Products belong to a ‘store’ ■ Eg: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’ ○ Products have key-value attributes ■ Eg: Shape: ‘Rectangle’; Style: ‘Iranian’; Place of use: ‘Living room’ ● Intent of a query: ‘rectangle room mat’ ○ Store: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’ ○ Attribute Tagging: <shape>: ‘rectangle’ <place of use>: ‘living room’ <store>: ‘mat’ E-commerce Search
  • 6. Flipkart confidential - For Internal use only. Not to be shared externally. Life of a query - simplified view Ranking - Relevance - Query independent signals - ... Augmentation - Normalisation - Spell Correction - Phrasing - Stemming - Synonymization - ... Intent Understanding - Store identification - Intent Tagging - …
  • 7. Flipkart confidential - For Internal use only. Not to be shared externally. Query to Store identification : Why? (Customer Focused)
  • 8. Flipkart confidential - For Internal use only. Not to be shared externally. Query to Store identification : Why? (Customer Focused) Lifestyle Bigger Images, Less Text Mobiles & Large Spec heavy Furniture Aspect Ratio, Swatches
  • 9. Flipkart confidential - For Internal use only. Not to be shared externally. Query to Store identification : Why? (Internal) ● Establishes context for the query attribute tagging ○ Restricts labeling space ● Backend efficiency ● ...
  • 10. Flipkart confidential - For Internal use only. Not to be shared externally. ● Source: click - log data (query -> products clicked -> stores) ● Statistical aggregation of click measure ● Empirically determined confidence level for redirection ○ Sample data: ‘rectangle room mat’ : ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’ : 95% confidence Query to Store identification : Statistical approach (baseline)
  • 11. Flipkart confidential - For Internal use only. Not to be shared externally. ● Works on exact queries, memorises (no generalization) ● Cannot learn anything useful for verticals where query volume and product clicks are low Statistical approach - Challenges
  • 12. L1 level store identification
    ● ML problem setup
      ○ Short-text multi-label multi-class classification
      ○ On the order of tens of L1 classes
    ● Model: Linear SVM (One vs All)
    ● Feature sets
      ○ BOW features (tf.idf)
      ○ Store name overlap features (tf.idf)
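The setup above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the production model: tf.idf is a common smoothed variant, the store-name overlap feature is assumed to be the cosine between the query and each store's name, each one-vs-all linear SVM is trained with Pegasos-style hinge-loss SGD, and the stores, queries and hyperparameters (`lam`, `epochs`) are made-up toy values.

```python
import math
import random
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TfIdf:
    """Minimal tf.idf vectoriser returning L2-normalised sparse dicts."""

    def fit(self, docs):
        df = Counter(t for d in docs for t in set(tokenize(d)))
        n = len(docs)
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}  # smoothed idf
        return self

    def transform(self, text):
        tf = Counter(t for t in tokenize(text) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}


def features(query, tfidf, store_names):
    q = tfidf.transform(query)
    feats = {"w=" + t: v for t, v in q.items()}  # BOW tf.idf features
    for store, name in store_names.items():  # assumed overlap feature: cosine(query, store name)
        s = tfidf.transform(name)
        overlap = sum(q[t] * s[t] for t in q.keys() & s.keys())
        if overlap:
            feats["name=" + store] = overlap
    feats["bias"] = 1.0
    return feats


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_ova_svm(X, Y, labels, lam=0.01, epochs=50, seed=0):
    """One binary linear SVM per label (one vs all), trained with Pegasos SGD on hinge loss."""
    rng = random.Random(seed)
    models = {}
    for label in labels:
        w, t, idx = {}, 0, list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(idx)
            for i in idx:
                t += 1
                eta = 1.0 / (lam * t)
                y = 1.0 if label in Y[i] else -1.0
                margin = y * dot(w, X[i])  # margin before this step's update
                for k in w:
                    w[k] *= 1.0 - eta * lam  # shrinkage from the L2 regulariser
                if margin < 1.0:
                    for k, v in X[i].items():
                        w[k] = w.get(k, 0.0) + eta * y * v
        models[label] = w
    return models


def predict(models, x):
    """Multi-label output: every label with a positive margin, best first (else the top one)."""
    ranked = sorted(models, key=lambda lab: dot(models[lab], x), reverse=True)
    return [lab for lab in ranked if dot(models[lab], x) > 0] or ranked[:1]


def build_demo():
    # Toy stores and queries, illustrative only.
    stores = {"Mobiles": "mobiles accessories",
              "Home Furnishing": "home furnishing carpets rugs",
              "TVs": "tvs televisions appliances"}
    train = [("samsung galaxy j7", {"Mobiles"}), ("redmi note mobiles", {"Mobiles"}),
             ("samsung galaxy cover", {"Mobiles"}), ("rectangle room mat", {"Home Furnishing"}),
             ("living room carpets", {"Home Furnishing"}), ("cotton bedsheet", {"Home Furnishing"}),
             ("samsung led tv", {"TVs"}), ("sony led tv 32 inch", {"TVs"}), ("lg televisions", {"TVs"})]
    tfidf = TfIdf().fit([q for q, _ in train] + list(stores.values()))
    X = [features(q, tfidf, stores) for q, _ in train]
    models = train_ova_svm(X, [y for _, y in train], list(stores))
    return tfidf, stores, models


if __name__ == "__main__":
    tfidf, stores, models = build_demo()
    for q in ["samsung galaxy s7", "room rugs", "t-series led tv"]:
        print(q, "->", predict(models, features(q, tfidf, stores)))
```

The store-name feature lets "room rugs" reach Home Furnishing through the word "rugs" even though no training query contains it; the BOW weights alone could not do that.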
  • 13. L1 level store identification: Results
    Query: canvas car body covers [screenshots: Before, After]
  • 14. L1 level store identification: Results
    Query: T-Series led tv [screenshots: Before, After]
  • 15. L1 level store identification: Impact
    ● Backend metrics
      ○ Nearly 40% drop in queries without a store (saving valuable compute resources)
    ● First user-path deployment of the ML platform’s modelhost
    [Chart: Backend requests without stores]
  • 16. Leaf level store identification
    ● ML problem setup
      ○ Short-text *multi-label* multi-class classification
      ○ On the order of 1000s of leaf stores
  • 17. Leaf level store identification
    ● ML problem setup
      ○ Short-text *multi-label* multi-class classification
      ○ On the order of 1000s of leaf stores
    ● Challenges in extending the L1 model:
      ○ Data sparsity
        ■ Linear SVM (One vs All) scaling for 1000s of classes
      ○ BOW features (no generalisation, no sharing)
  • 18. Leaf level store identification
    ● Approach: fastText (https://github.com/facebookresearch/fastText)
    ● Key idea(s):
      ○ Adapt the word2vec (CBOW) model: predict the label in place of the target word
      ○ Hierarchical softmax for scaling to a large number of classes
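A toy sketch of the key idea, written from scratch rather than with the fastText library: the query's word vectors are averaged, as the CBOW context would be, and a linear layer with softmax predicts the store label where word2vec would predict the centre word. Two assumptions to flag: a full softmax replaces fastText's hierarchical softmax, so this version does not show the scaling benefit, and the optional `pretrained` argument is a hypothetical hook for seeding word vectors (for example from catalog text). Class name, labels and queries are illustrative.

```python
import math
import random

TOY_TRAIN = [
    ("samsung galaxy j7", "Mobiles"), ("redmi note mobiles", "Mobiles"),
    ("samsung galaxy cover", "Mobiles"), ("rectangle room mat", "Home Furnishing"),
    ("living room carpets", "Home Furnishing"), ("cotton bedsheet", "Home Furnishing"),
    ("samsung led tv", "TVs"), ("sony led tv 32 inch", "TVs"), ("lg televisions", "TVs"),
]


class FastTextStyleClassifier:
    """Averaged word vectors -> linear layer -> full softmax over labels (sketch only)."""

    def __init__(self, labels, dim=16, lr=0.1, seed=0, pretrained=None):
        self.labels, self.dim, self.lr = list(labels), dim, lr
        self.rng = random.Random(seed)
        self.emb = {w: list(v) for w, v in (pretrained or {}).items()}  # optional seed vectors
        self.out = [[0.0] * dim for _ in self.labels]  # one output row per label

    def _hidden(self, words, grow):
        if grow:  # create vectors for new words only while training
            for w in words:
                if w not in self.emb:
                    self.emb[w] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
        vecs = [self.emb[w] for w in words if w in self.emb]
        if not vecs:
            return [], [0.0] * self.dim
        return vecs, [sum(col) / len(vecs) for col in zip(*vecs)]

    def _softmax(self, h):
        scores = [sum(a * b for a, b in zip(row, h)) for row in self.out]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def update(self, query, label):
        """One SGD step on the cross-entropy loss for a (query, label) pair."""
        vecs, h = self._hidden(query.lower().split(), grow=True)
        if not vecs:
            return
        p = self._softmax(h)
        target = self.labels.index(label)
        grad_h = [0.0] * self.dim
        for k, row in enumerate(self.out):
            g = p[k] - (1.0 if k == target else 0.0)  # d loss / d score_k
            for d in range(self.dim):
                grad_h[d] += g * row[d]
                row[d] -= self.lr * g * h[d]
        for v in vecs:  # h is a mean, so each word vector gets a 1/len share of the gradient
            for d in range(self.dim):
                v[d] -= self.lr * grad_h[d] / len(vecs)

    def fit(self, data, epochs=200):
        data = list(data)
        for _ in range(epochs):
            self.rng.shuffle(data)
            for query, label in data:
                self.update(query, label)
        return self

    def predict(self, query):
        _, h = self._hidden(query.lower().split(), grow=False)
        p = self._softmax(h)
        best = max(range(len(p)), key=p.__getitem__)
        return self.labels[best], p[best]


if __name__ == "__main__":
    model = FastTextStyleClassifier(["Mobiles", "Home Furnishing", "TVs"]).fit(TOY_TRAIN)
    for q in ["galaxy j7 cover", "living room mat", "sony tv"]:
        print(q, "->", model.predict(q))
```

With a full softmax every update touches all K label rows; hierarchical softmax instead walks a binary (Huffman) tree over the labels, so an update costs on the order of log2(K) binary decisions, which is what keeps thousands of leaf stores tractable.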
  • 19. Leaf level store identification: How were the challenges addressed?
    ● Data sparsity
      ○ Using catalog data to seed the embeddings
      ○ Helps learn from less labeled data
    ● BOW features (no generalisation, no sharing)
      ○ Embeddings provide the abstraction
  • 20. Leaf level store identification - Impact
    ● Significant A/B metrics
      ○ +3 bps Search Conversion
      ○ +2 bps Visit Conversion
    ● SQA analysis (PBAGE): 8% improvement
  • 21. Leaf level store identification: Some Learnings
    ● A classifier trained only on catalog space (much more labeled data) did not work well in query space as-is
    ● Seed embeddings trained with store context in catalog space do work
  • 22. Life of a query - simplified view
    ● Ranking: Relevance, Query-independent signals, ...
    ● Augmentation: Normalisation, Spell Correction, Phrasing, Stemming, Synonymization, ...
    ● Intent Understanding: Store identification, Intent Tagging, …
  • 23. Query Intent Tagging
    Given a query, predict the attributes that best describe the terms (chunks) in the query.
    Query: kids party dress 4-5 years pack of 2
    Tagging: <ideal_for>: kids <occasion>: party <store>: dress <size>: 4-5 years <pack_of>: pack of 2
  • 24. Statistical Aggregation
    ● Use query-product click-through logs
    ● For each (query, clicked product) pair
      ○ Identify the attributes whose values in the product description match query tokens
      ○ Store, for each query token, the fraction of its matches that fall on each attribute
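A minimal sketch of this aggregation, assuming each clicked product is available as an attribute -> value map and that a "match" means a query token appears verbatim among an attribute's value tokens (the slide does not say how matching is done). The function names and the toy click pairs are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_token_attributes(click_pairs):
    """click_pairs: iterable of (query, product_attributes), where product_attributes
    maps attribute name -> value string for the clicked product.
    Returns token -> {attribute: fraction of that token's matches}."""
    matches = defaultdict(Counter)
    for query, attrs in click_pairs:
        for token in query.lower().split():
            for attr, value in attrs.items():
                if token in value.lower().split():  # assumed exact token match
                    matches[token][attr] += 1
    return {
        token: {attr: n / sum(per_attr.values()) for attr, n in per_attr.items()}
        for token, per_attr in matches.items()
    }


def tag(query, table):
    """Tag each token with its most frequent attribute; unseen tokens get None."""
    return [(t, max(table[t], key=table[t].get) if t in table else None)
            for t in query.lower().split()]


if __name__ == "__main__":
    clicks = [
        ("kids party dress", {"ideal_for": "Kids", "occasion": "Party", "store": "Dress"}),
        ("party dress", {"occasion": "Party", "store": "Dress", "pattern": "Party Wear"}),
    ]
    table = aggregate_token_attributes(clicks)
    print(table["party"])
    print(tag("red party dress", table))
```

Every click counts with equal weight here and unseen tokens such as "red" get no tag, which are the limitations listed on the next slide.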
  • 25. Limitations
    ● Works in query token space; weak generalization
    ● Considers all clicks equally, but clicks are noisy
    ● Cannot learn anything useful for verticals where query volume is low
  • 26. Problem Complexity
    ● samsung galaxy j7
      ○ brand model_name model_name
    ● samsung galaxy j7 covers
      ○ designed_for designed_for category
  • 27. Some Exploratory Analysis
    ● ~40% of catalog tokens cannot be identified unambiguously
    ● “Cotton” appears in the vocabulary of 23 attributes in “HomeFurnishing”
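A sketch of how such an ambiguity statistic could be computed from catalog attribute values. It counts distinct tokens that occur under more than one attribute; whether the slide's ~40% is over distinct tokens or token occurrences is not stated, so treat the metric definition and the toy products as assumptions.

```python
from collections import defaultdict


def ambiguity_stats(catalog):
    """catalog: iterable of dicts mapping attribute -> value string.
    Returns (fraction of distinct tokens seen under more than one attribute,
    token -> set of attributes whose values contain it)."""
    token_attrs = defaultdict(set)
    for product in catalog:
        for attr, value in product.items():
            for token in value.lower().split():
                token_attrs[token].add(attr)
    ambiguous = sum(1 for attrs in token_attrs.values() if len(attrs) > 1)
    fraction = ambiguous / len(token_attrs) if token_attrs else 0.0
    return fraction, token_attrs


if __name__ == "__main__":
    toy_catalog = [  # illustrative Home Furnishing products
        {"material": "Cotton", "color": "White", "type": "Bedsheet"},
        {"filling_material": "Cotton", "color": "Blue", "type": "Pillow"},
        {"outer_material": "Cotton Blend", "color": "White", "type": "Cushion Cover"},
    ]
    fraction, token_attrs = ambiguity_stats(toy_catalog)
    print(fraction, sorted(token_attrs["cotton"]))
```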
  • 28. Is Sequence necessary?
    ● The attribute label at a position depends on tokens at other positions in the query
    ● Attributes have affinity: in mobiles, (brand, model_name) is more likely than (brand, color)
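Attribute affinity can be estimated directly from the tag sequences of labelled queries. A minimal sketch with toy tag sequences (not real data): counts of adjacent attribute pairs are normalised into P(next attribute | attribute). Log-scores of this kind are one natural starting point for the transition weights of a sequence model.

```python
from collections import Counter, defaultdict

# Toy tag sequences for mobile queries, illustrative only.
TOY_TAGGED = [
    ["brand", "model_name", "model_name"],
    ["brand", "model_name", "color"],
    ["brand", "color"],
]


def attribute_affinity(tag_sequences):
    """Estimate P(next attribute | attribute) from adjacent pairs in tagged queries."""
    pairs = defaultdict(Counter)
    for tags in tag_sequences:
        for a, b in zip(tags, tags[1:]):
            pairs[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()} for a, c in pairs.items()}


if __name__ == "__main__":
    print(attribute_affinity(TOY_TAGGED)["brand"])
```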
  • 29. Sequence Formulation
    ● Let X be the query s.t. X = {x1, x2, ..., xn}, where xj is a query token
    ● Let Y be the intent s.t. Y = {y1, y2, ..., yn}, where yj ∈ attributes
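The slides do not write out the model. Assuming the standard linear-chain CRF over the X and Y just defined, with feature functions f_k, learned weights λ_k and a fixed start symbol y_0, the conditional distribution and the decoded tagging are:

```latex
p(Y \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y_{j-1}, y_j, X, j) \Big),
\qquad
Z(X) = \sum_{Y'} \exp\Big( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y'_{j-1}, y'_j, X, j) \Big),

\hat{Y} = \arg\max_{Y} \, p(Y \mid X)
```

Because each f_k sees the whole query X and the previous tag y_{j-1}, the model can express both dependencies noted on the "Is Sequence necessary?" slide: context from other positions and affinity between adjacent attributes.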
  • 30. Supervised - Conditional Random Field
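Decoding a linear-chain model means finding the highest-scoring tag sequence, which Viterbi dynamic programming does exactly in O(n·T²) time for n tokens and T tags. In this sketch the state and transition scores are hypothetical hand-set numbers chosen to mirror the "Problem Complexity" example, not learned CRF weights; in a trained CRF each would be a weighted sum of feature functions.

```python
TAGS = ["brand", "model_name", "designed_for", "category"]

# Hypothetical log-space scores: per-token state scores and tag-to-tag transitions.
LEX = {
    "samsung": {"brand": 2.0, "designed_for": 1.5},
    "galaxy": {"model_name": 2.0, "designed_for": 1.5},
    "j7": {"model_name": 2.0, "designed_for": 1.5},
    "covers": {"category": 3.0},
}
TRANS = {
    ("brand", "model_name"): 1.0,
    ("model_name", "model_name"): 1.0,
    ("designed_for", "designed_for"): 1.5,
    ("designed_for", "category"): 2.0,
    ("model_name", "category"): -3.0,
}


def viterbi(n, tags, emit, trans, start="<s>"):
    """Highest-scoring tag sequence of length n.

    emit(j, tag) is the state score at position j; trans[(prev, tag)] is the
    transition score (missing pairs score 0; `start` precedes position 0)."""
    if n == 0:
        return []
    best = [{t: trans.get((start, t), 0.0) + emit(0, t) for t in tags}]
    back = [{}]
    for j in range(1, n):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[j - 1][p] + trans.get((p, t), 0.0))
            best[j][t] = best[j - 1][prev] + trans.get((prev, t), 0.0) + emit(j, t)
            back[j][t] = prev  # back-pointer to the best predecessor of tag t at j
    path = [max(tags, key=lambda t: best[n - 1][t])]
    for j in range(n - 1, 0, -1):
        path.append(back[j][path[-1]])
    return path[::-1]


def tag_query(query, missing=-5.0):
    tokens = query.lower().split()

    def emit(j, t):
        return LEX.get(tokens[j], {}).get(t, missing)  # unlisted (token, tag) pairs are penalised

    return list(zip(tokens, viterbi(len(tokens), TAGS, emit, TRANS)))


if __name__ == "__main__":
    print(tag_query("samsung galaxy j7"))
    # [('samsung', 'brand'), ('galaxy', 'model_name'), ('j7', 'model_name')]
    print(tag_query("samsung galaxy j7 covers"))
    # [('samsung', 'designed_for'), ('galaxy', 'designed_for'), ('j7', 'designed_for'), ('covers', 'category')]
```

The penalty on model_name -> category and the bonus on designed_for -> category are what flip "samsung galaxy j7" from a phone to an accessory query once "covers" is appended.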
  • 31. Feature Design
    ● looks_like_attribute
      ○ Attributes like brand, color, model_name
      ○ Multinomial NB to generate features
    ● Defined over a window at each position in the query
    ● Global features such as is_alnum, is_shortword
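A sketch of this feature design under explicit assumptions: looks_like_attribute is a multinomial Naive Bayes over character bigrams of catalog attribute values (the slides do not say what the NB's inputs are), the window is ±1 token, and is_alnum / is_shortword are read as "mixes letters and digits" and "at most 3 characters". Class and function names, thresholds and training pairs are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(token, n=2):
    """Character n-grams with boundary markers, e.g. 'j7' -> ['^j', 'j7', '7$']."""
    padded = "^" + token.lower() + "$"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]


class LooksLikeAttributeNB:
    """Multinomial Naive Bayes over character n-grams: P(attribute | token)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing

    def fit(self, examples):
        """examples: (token, attribute) pairs, e.g. harvested from catalog attribute values."""
        self.prior, self.counts = Counter(), defaultdict(Counter)
        for token, attr in examples:
            self.prior[attr] += 1
            self.counts[attr].update(char_ngrams(token))
        self.vocab_size = len({g for c in self.counts.values() for g in c})
        return self

    def predict_proba(self, token):
        grams, total = char_ngrams(token), sum(self.prior.values())
        logp = {}
        for attr, c in self.counts.items():
            denom = sum(c.values()) + self.alpha * self.vocab_size
            logp[attr] = math.log(self.prior[attr] / total) + sum(
                math.log((c[g] + self.alpha) / denom) for g in grams)
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return {a: math.exp(v - m) / z for a, v in logp.items()}


def token_features(tokens, j, nb, window=1):
    """Features for position j: NB attribute likelihoods over a +/-window, plus token shape."""
    feats = {}
    for off in range(-window, window + 1):
        k = j + off
        if 0 <= k < len(tokens):
            for attr, p in nb.predict_proba(tokens[k]).items():
                feats[f"looks_like_{attr}[{off:+d}]"] = p
        else:
            feats[f"pad[{off:+d}]"] = 1.0  # window runs past the query boundary
    tok = tokens[j]
    # Assumed reading of the slide's flags: mixed letters and digits; length <= 3.
    feats["is_alnum"] = float(tok.isalnum() and not tok.isalpha() and not tok.isdigit())
    feats["is_shortword"] = float(len(tok) <= 3)
    return feats


TOY_PAIRS = [("samsung", "brand"), ("apple", "brand"), ("motorola", "brand"),
             ("j7", "model_name"), ("s7", "model_name"), ("a5", "model_name"),
             ("black", "color"), ("gold", "color")]

if __name__ == "__main__":
    nb = LooksLikeAttributeNB().fit(TOY_PAIRS)
    print(token_features(["samsung", "galaxy", "j8"], 2, nb))
```

Because the NB works on sub-word n-grams, an unseen token such as "j8" still scores as model_name, so the CRF sees attribute-likeness rather than the raw token.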
  • 32. What did we gain?
    ● Moving from query token space to attribute feature space improves generalization
    ● Can generate multiple partial labellings, enabling better ranking of search results
  • 33. What did we gain? Metrics
    ● Significant A/B metrics
      ○ +5 bps Search Conversion
      ○ +2 bps Visit Conversion
    ● SQA analysis (PBAGE): 4% improvement
  • 34. Some Examples
    Query: samsung galaxy s7 edge 2017 [screenshots: Before, After]
  • 35. Some Examples (continued)
    Query: Watches with steel belt with square dial [screenshots: Before, After]
  • 36. Why is CRF not enough?
    ● Low volume of high-confidence labeled data in some verticals
    ● Click noise: users sometimes click randomly, especially in lifestyle
    ● The labeled data for the CRF suffers from both issues above
  • 37. Some Exploratory Analysis (continued)
    ● Labeled data covers only ~10% of unique queries
    ● A supervised model will fail to generalize for these stores
  • 38. Weakly-Supervised Models
    ● A generative setting, as opposed to a discriminative one like CRF
    ● Learning from unlabeled queries
    ● Catalog and limited labeled data used as weak supervision
    ● WIP… research paper … production
  • 39. Summary
  • 40. Summary
    ● Pattern of solution evolution
      ○ Statistical -> Supervised -> Supervised ++ (side information)
    ● Common challenges
      ○ Not enough labeled data (side information / weak supervision)
      ○ Label/presentation bias
  • 41. Query: ‘diamond ring’
  • 42. Query: ‘diamond ring’
  • 43. Questions?