Taxonomical Semantical
Magical Search
OpenSource Connections
Doug Turnbull
Relevance Lead
dturnbull@o19s.com
@softwaredoug
© OpenSource Connections, 2017
Who are we?
Solr/ES consulting: team 100% focused on relevance
Learn to rank – semantic search – relevance – personalization – findability
Reflect:
What problem are you trying to solve
when you jump to 'semantic search'?
"We studied spontaneous word choice for objects in five
application-related domains, and found the variability to be
surprisingly large. In every case two people favored the same
term with probability <0.20."
"Simulations show how this fundamental
property of language limits the success of
various design methodologies for
vocabulary-driven interaction."
Solve with keyword stuffing?
- Content creators guarantee every "shoe" has a
"shoe" keyword somewhere!
- And every wing-tip mentions dress shoes…
- ...Ad infinitum…
Solve with tagging?
- Java is a type of JVM language. Should this be
tagged JVM too? What is a "query string"? Which
of these tags is useful for search?
- Who tags everything? Is it consistent? What are
the rules?
(taken from Stackoverflow)
Solve with synonyms?
Yes! Synonyms can be a tool that can help us. But
it's easy to mess up:
shoes => dress shoes
wing tips,shoes
tennis shoes,shoes
When I search for tennis shoes, why do I get wing
tips; why do I get dresses?!?
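A minimal sketch of why the comma-separated rules above misfire, assuming equivalent synonyms are expanded for both the query and the document. The helper names are hypothetical and this only mimics the matching behaviour, not Solr's synonym filter itself.

```python
# Hypothetical equivalence groups mirroring the comma-separated rules above.
EQUIVALENT = [
    {"wing tips", "shoes"},
    {"tennis shoes", "shoes"},
]

def expand(term):
    """Return the term plus every term that shares an equivalence group with it."""
    expanded = {term}
    for group in EQUIVALENT:
        if term in group:
            expanded |= group
    return expanded

def matches(query, doc_term):
    """A doc matches when the query and doc expansions share any term."""
    return bool(expand(query) & expand(doc_term))

# Both sides expand to include "shoes", so unrelated items match each other.
print(matches("tennis shoes", "wing tips"))  # True
```

The broad term "shoes" acts as a bridge between two narrow terms, which is the behaviour a directed type-of relationship avoids.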
Talking teaches/reminds vocab
(Searching) shoes -> dress shoes -> brown wing tips
- Searcher uncertain: uses broad queries to experiment
- Searcher learning: results give clues to help shopper refine further
- Searcher trusting: more confident on terms to use
Searchers get more specific...
Hierarchy of Ideas (less specific -> more specific):
shoes
  NP (item): "shoe"
wing tips
  NP (item): "wing tips"
    type_of:"dress shoes"
    type_of:"shoe"
… and try types of modifiers
wing tips
  NP (item): "wing tips"
    type_of:"dress shoes"
    type_of:"shoe"
sapphire wing tips
  NP (item): "wing tips"
    type_of:"dress shoes"
    type_of:"shoe"
  ADJ (color): "sapphire"
    type_of:"blue"
Semantic search: enable semantic exploration
- Low term specificity: search term specifies a wide category to explore
  (Searching for "shoes")
- High term specificity: search term too specific, try semantically broader/similar items
  ("Show 'dress shoes' for 'oxfords'")
Make Solr grok type-of relationships
"wing tip" is a type of "dress shoe" is a type of "shoe"
Search here, only
show wing tips
Search here, show all
things that are a
type-of shoe
Beyond the actual terms used in docs
Per-entity terms: a taxonomy
Shoes
  Athletic Shoes
    Running Shoes
    Tennis Shoes
  Dress Shoes
    High Heels
    Oxfords
    Wing Tips
Blue
  Sapphire
  Sky blue
A search taxonomy (not the taxonomy for your site nav)
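One way to hold such a taxonomy in code is a child-to-parent map. This is only a sketch: the wing tips -> dress shoes -> shoes chain comes from the slides, while the remaining parent links and the `broader` helper are assumptions made for illustration.

```python
# Hypothetical child -> parent map for the search taxonomy.
PARENT = {
    "wing tips": "dress shoes",
    "oxfords": "dress shoes",
    "high heels": "dress shoes",
    "tennis shoes": "athletic shoes",
    "running shoes": "athletic shoes",
    "dress shoes": "shoes",
    "athletic shoes": "shoes",
    "sapphire": "blue",
    "sky blue": "blue",
}

def broader(term):
    """Return the term followed by its ancestors, narrowest first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

print(broader("wing tips"))  # ['wing tips', 'dress shoes', 'shoes']
print(broader("sapphire"))   # ['sapphire', 'blue']
```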
Index-time tax. expansion
Entity fields: Item, Color, Size
Substrings -> Entities -> Expand to broad/narrow
tennis shoes => footwear/shoes/athletic/tennis_shoes
sapphire => blue/sapphire
In Solr...
Entity fields: Item, Color, Size
- Possible to build from simple keepwords
- Query- or index-time synonyms use the TF*IDF of the concept
Substrings -> Entities -> Expand to broad/narrow
tennis shoes => tennis_shoes,athletic_shoes,shoes,...
sapphire => sapphire,blue
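Rules in this shape can be generated from a taxonomy rather than written by hand. The sketch below assumes a hypothetical child-to-parent map and emits one explicit-mapping line per term; the `PARENT` data and helper names are illustrative, not part of any Solr API.

```python
# Hypothetical child -> parent links; a real taxonomy would be loaded from elsewhere.
PARENT = {
    "tennis shoes": "athletic shoes",
    "athletic shoes": "shoes",
    "sapphire": "blue",
}

def token(term):
    """Collapse a multi-word entity into a single underscore-joined token."""
    return term.replace(" ", "_")

def synonym_rule(term):
    """Build 'term => term,parent,grandparent' for a synonyms file."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return f"{term} => {','.join(token(t) for t in chain)}"

print(synonym_rule("tennis shoes"))  # tennis shoes => tennis_shoes,athletic_shoes,shoes
print(synonym_rule("sapphire"))      # sapphire => sapphire,blue
```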
In Solr, index time...
"Item" copy field:
(Input Text) You will love these maroon dress shoes
(tokenization & maybe stemming) [you] [will] [love] [these] [maroon] [dress] [shoes]
compound/decompound (syn filter) [you] [will] [love] [these] [maroon] [dress_shoes]
Keepwords for entity [dress_shoes]
Semantic expansion (syn filter) [dress_shoes] [shoes]
"Color" copy field:
(Input Text) You will love these maroon dress shoes
(tokenization & maybe stemming) [you] [will] [love] [these] [maroon] [dress] [shoes]
compound/decompound (syn filter) [you] [will] [love] [these] [maroon] [dress_shoes]
Keepwords for entity [maroon]
Semantic expansion (syn filter) [maroon] [brown]
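The analysis steps for the two copy fields can be imitated in a few lines of Python. This is a toy model under stated assumptions: the compound table, keepword sets, and expansion maps are hypothetical stand-ins for the synonym and keepword files a real Solr field type would load.

```python
import re

# Hypothetical per-field configuration standing in for synonym/keepword files.
COMPOUNDS = {("dress", "shoes"): "dress_shoes", ("wing", "tips"): "wing_tips"}
FIELDS = {
    "item": {"keep": {"dress_shoes", "wing_tips", "shoes"},
             "expand": {"dress_shoes": ["shoes"], "wing_tips": ["dress_shoes", "shoes"]}},
    "color": {"keep": {"maroon", "brown"},
              "expand": {"maroon": ["brown"]}},
}

def tokenize(text):
    """Lowercase and split on non-letters (no stemming in this sketch)."""
    return re.findall(r"[a-z]+", text.lower())

def compound(tokens):
    """Merge known two-word entities into single tokens."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:
            out.append(COMPOUNDS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def analyze(text, field):
    """Tokenize, compound, keep only entity tokens, then add broader terms."""
    cfg = FIELDS[field]
    kept = [t for t in compound(tokenize(text)) if t in cfg["keep"]]
    out = []
    for t in kept:
        out.append(t)
        out.extend(cfg["expand"].get(t, []))
    return out

text = "You will love these maroon dress shoes"
print(analyze(text, "item"))   # ['dress_shoes', 'shoes']
print(analyze(text, "color"))  # ['maroon', 'brown']
```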
Index time solution
(Input Text) brown wing tips
(Item analyzer output) [wing_tips] [dress_shoes] [shoes]
IDF highest for wing_tips, lowest for shoes (eliminate TF? norms?)
(Input Text) brown wing tips
(Color analyzer output) [brown]
Matches maroon, because at index time: maroon => brown, maroon
q=brown wing tips
&defType=edismax
&sow=false
&qf=item^100 color^10
(you'll want to search more than these semantic fields)
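The IDF ordering follows from how often each expanded token lands in the index. A small sketch, assuming four made-up documents and a common smoothed IDF variant (not necessarily the formula your Solr similarity uses):

```python
import math

# Hypothetical item-field tokens for four docs after index-time expansion.
docs = [
    ["wing_tips", "dress_shoes", "shoes"],
    ["dress_shoes", "shoes"],
    ["tennis_shoes", "athletic_shoes", "shoes"],
    ["shoes"],
]

def idf(term):
    """Smoothed inverse document frequency: rarer terms score higher."""
    df = sum(term in doc for doc in docs)
    return math.log((len(docs) + 1) / (df + 1)) + 1

for term in ("wing_tips", "dress_shoes", "shoes"):
    print(term, round(idf(term), 3))
```

Because every narrow item also indexes its broader concepts, the broad token appears in the most documents and so contributes the least to the score.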
Query-time tax. expansion
How do users think of your items?
Entity fields: Item, Color, Size (trained/built from query logs)
Substrings -> Entities -> Expand to broad/narrow
Example query: sapphire tennis shoes
tennis shoes => item:"tennis shoes" OR item:"athletic shoes" OR item:"shoes" ...
sapphire => color:blue OR color:sapphire
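A rough sketch of this query-time rewrite, assuming a hypothetical phrase-to-expansion table and naive substring matching for entity detection. Joining the per-entity groups with AND is an assumption of this sketch, not something the slides specify.

```python
# Hypothetical entity table: phrase -> (field, broad/narrow expansions).
TAXONOMY = {
    "tennis shoes": ("item", ["tennis shoes", "athletic shoes", "shoes"]),
    "sapphire": ("color", ["blue", "sapphire"]),
}

def clause(field, term):
    """Quote multi-word terms so they are searched as phrases."""
    value = f'"{term}"' if " " in term else term
    return f"{field}:{value}"

def expand_query(query):
    """Replace each recognized entity with an OR group over its expansions."""
    clauses = []
    remaining = query.lower()
    # Longest phrases first so "tennis shoes" wins over any shorter overlap.
    for phrase in sorted(TAXONOMY, key=len, reverse=True):
        if phrase in remaining:
            field, terms = TAXONOMY[phrase]
            clauses.append("(" + " OR ".join(clause(field, t) for t in terms) + ")")
            remaining = remaining.replace(phrase, " ")
    return " AND ".join(clauses)

print(expand_query("sapphire tennis shoes"))
```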
Query Phrase In Solr...
Item Semantic Analyzer:
(Input Text) Brown wing tips
Semantic expansion (syn filter) [wing tips] [dress shoes] [shoes]
Color Semantic Analyzer:
(Input Text) Brown wing tips
Semantic expansion (syn filter) [brown] [maroon]
Transform to description:("dress shoes" OR "wing tips" OR shoes OR maroon OR brown)
Problems:
- two query analyzers for the same field are not possible in Solr
- Can't re-tokenize [dress shoes] -> "dress shoes" phrase query
Match Query Parser
https://github.com/o19s/match-query-parser
q=brown wing tips
&defType=edismax
&qf=description title
&bq={!match analyze_as=item_tax search_with=phrase qf=description v=$q}^100
&bq={!match analyze_as=color_tax search_with=phrase qf=description v=$q}
analyze_as: how to analyze the query string
search_with=phrase: retokenize multi-word tokens and do a phrase search
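When sending this request from code, the repeated `bq` parameter and the local-params braces need URL encoding. A small sketch using only the standard library; the parameter values are taken from the slide, the `match_boost` helper is hypothetical, and it assumes the match query parser plugin is installed on the Solr side.

```python
from urllib.parse import urlencode, parse_qs

def match_boost(analyze_as, boost=None):
    """Build a {!match ...} boost query, optionally with a ^boost suffix."""
    local = f"{{!match analyze_as={analyze_as} search_with=phrase qf=description v=$q}}"
    return f"{local}^{boost}" if boost else local

# A list of pairs (not a dict) so that bq can appear twice.
params = [
    ("q", "brown wing tips"),
    ("defType", "edismax"),
    ("qf", "description title"),
    ("bq", match_boost("item_tax", 100)),
    ("bq", match_boost("color_tax")),
]
query_string = urlencode(params)
print(query_string)
```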
Other building blocks
Auto Phrase Token Filter / Query Auto Filtering:
- https://github.com/lucidworks/auto-phrase-tokenfilter
- https://lucidworks.com/2015/02/17/introducing-query-autofiltering/
Health-on-net Lucene Synonyms
- https://github.com/healthonnet/hon-lucene-synonyms
Sematext Query Segmenter:
- https://github.com/sematext/query-segmenter
Shopping 24 Bmax Query Parser
- https://github.com/shopping24/solr-bmax-queryparser
Deriving Querqy rules from taxonomies
https://github.com/renekrie/querqy
Query Time vs Index Time
Query Time:
PROS
- No need to reindex when
updating managed vocab
CONS
- Relevance scoring of terms
(boosts help)
- Complex / slow queries
Index Time:
PROS
- TF*IDF more accurate scoring
(broad concepts score low,
narrow score high)
- Faster queries
CONS
- Reindexing for synonym
changes
Structure your docs for query understanding
Relevance engineer's challenge:
- Where can we begin with a taxonomy?
- Reuse filters & facets
- Reuse your page's navigational taxonomy?
- Track which searches land on pages (old school click
tracking)?
- Zero results tracking?
- How do we incentivize content creators to move away from
keyword stuffing to organizing to search keyword taxonomy?
- Finally: we don't care about the source data model, only what helps
users find things
SHReC Algorithm
Simple doc frequency in-content to look for super-concepts / sub-concepts
term/phrase x subsumes y (x parent concept?) when:
df(x) > df(y)
df(x ∧ y) / df(y) >= α (α = 1 complete subsumption)
SHReC Algorithm Example
Shoes
Wing Tips
df("shoes") > df("wing tips")
df("shoes" ∧ "wing tips") / df("wing tips") >= 0.8
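The two conditions translate directly into a small predicate. A minimal sketch; the function name and the counts are illustrative, not real data.

```python
def subsumes(df_x, df_y, df_xy, alpha=0.8):
    """True when x looks like a parent concept of y.

    df_x, df_y: document frequencies of x and y
    df_xy: number of documents containing both x and y
    alpha: required share of y's documents that also contain x
    """
    if df_y == 0:
        return False
    return df_x > df_y and df_xy / df_y >= alpha

# x = "shoes", y = "wing tips" with made-up counts.
print(subsumes(df_x=1000, df_y=50, df_xy=45))  # True: 45/50 = 0.9 >= 0.8
print(subsumes(df_x=1000, df_y=50, df_xy=20))  # False: 20/50 = 0.4 < 0.8
```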
SHReC Algorithm with Solr
Shoes
Wing Tips
df("shoes") > df("wing tips")
df("shoes" ∧ "wing tips") / df("wing tips") >= 0.8
Cache doc freq (q=*:*&facet.field=item&facet=true)
q=item:"wing tips" AND item:shoes, num results
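A sketch of how the two Solr responses could feed the check. It assumes the `item` field indexes each entity as a single token (so facet values are whole phrases) and that facet counts come back in Solr's default flat list form; the response fragments and counts below are made up.

```python
def facet_counts(flat):
    """Turn a flat facet list [term, count, term, count, ...] into a dict."""
    return dict(zip(flat[::2], flat[1::2]))

def subsumption_ratio(df, x, y, num_found_both):
    """Return df(x AND y) / df(y), or None when x cannot be a parent of y."""
    if df.get(y, 0) == 0 or df.get(x, 0) <= df[y]:
        return None
    return num_found_both / df[y]

# Hypothetical fragments of the cached facet response and the AND query response.
facet_response = {"facet_counts": {"facet_fields": {"item": ["shoes", 1000, "wing tips", 50]}}}
both_response = {"response": {"numFound": 45}}

df = facet_counts(facet_response["facet_counts"]["facet_fields"]["item"])
ratio = subsumption_ratio(df, "shoes", "wing tips", both_response["response"]["numFound"])
print(ratio)  # 0.9
```

Caching the facet counts once means each candidate pair costs only one extra query for the co-occurrence count.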
Unfortunately reality is messy
Shoes
Wing Tips
Your data
probably
looks like
Idea: mine another corpus?
Shoes, Wing Tips
● But still, what phrases do you test?
Statistically significant collocations
Wing Tips
WingTips
Student's t-test against the null hypothesis that "wing" and "tips" are unrelated
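A minimal sketch of the t-test for a two-word collocation, using the usual approximation that the sample variance of a rare bigram is close to its observed probability. The counts are invented for illustration, and 2.576 is the conventional critical value for the 0.005 significance level.

```python
import math

def t_score(count_w1, count_w2, count_bigram, total_bigrams):
    """t statistic for 'w1 w2' against the hypothesis that w1 and w2 are independent."""
    if count_bigram == 0:
        raise ValueError("the bigram must occur at least once")
    observed = count_bigram / total_bigrams
    expected = (count_w1 / total_bigrams) * (count_w2 / total_bigrams)
    # Variance of a Bernoulli trial with small p is approximately p.
    return (observed - expected) / math.sqrt(observed / total_bigrams)

# Made-up counts: "wing" 50 times, "tips" 80 times, "wing tips" 30 times.
t = t_score(count_w1=50, count_w2=80, count_bigram=30, total_bigrams=100_000)
print(t > 2.576)  # True: reject independence, treat "wing tips" as a phrase
```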
Refinements
shoe
- dress shoe (12%)
- wing tip (23%)
- tennis shoe (11%)
- running shoe (34%)
- brown dress shoe (20%)
- blue dress shoe (1%)
- sapphire brooks brothers dress shoe (0.001%)
Sub concepts, likely child phrases
Colors scattered throughout
Siblings refine each other: tennis shoe (11%), running shoe (34%)
Should these be in supercategory "athletic shoes"?
Refinement mining in Solr
docs = [{
"query": "shoe",
"refinement": "dress shoe"
},
{
"query": "shoe",
"refinement": "brown shoe"
},
{
"query": "tie",
"refinement": "brown tie"
}]
q=query:shoe&
facet=true&
facet.field=refinement
Refinements:
- dress shoe (4)
- tennis shoe (2)
- ...
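The facet query above is essentially a filtered group-by count. A sketch of the same computation in plain Python over a hypothetical handful of query/refinement pairs:

```python
from collections import Counter

# Hypothetical query-log pairs: an initial query and the query it was refined to.
docs = [
    {"query": "shoe", "refinement": "dress shoe"},
    {"query": "shoe", "refinement": "brown shoe"},
    {"query": "shoe", "refinement": "dress shoe"},
    {"query": "tie", "refinement": "brown tie"},
]

def refinements(query):
    """Count refinements for one starting query, most frequent first."""
    counts = Counter(d["refinement"] for d in docs if d["query"] == query)
    return counts.most_common()

print(refinements("shoe"))  # [('dress shoe', 2), ('brown shoe', 1)]
```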
SHReC w/ Refinements
q=query:shoe&
facet=true&
facet.field=refinement
Seed the corpus exploration: refinement facets -> SHReC
Num results for q=shoe (slow, but you do this rarely)
SHReC w/ sig terms
scoreNodes(
select(
facet(collectionName,
q="query:shoes",
buckets="refinements",
bucketSorts="count(*) desc",
bucketSizeLimit="100",
count(*)),
refine_graph as node,
"count(*)",
replace(collection, null, withValue=collectionName),
replace(field, null, withValue=refine_graph))
)
What's actually happening in SHReC is significance scoring, which is baked into Solr:
relationship of local vs global
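The local-vs-global idea can be illustrated with a tf-idf style weight: how often a term shows up in the local result set, discounted by how common it is across the whole collection. This is a sketch of the general idea only; the formula and names are assumptions and may differ from what scoreNodes computes.

```python
import math

def significance(local_count, global_doc_freq, num_docs):
    """Score a term higher when it is frequent locally and rare globally."""
    return local_count * (math.log((num_docs + 1) / (global_doc_freq + 1)) + 1)

# Made-up counts: both refinements appear 40 times after "shoes",
# but one is rare overall and the other appears almost everywhere.
rare_overall = significance(local_count=40, global_doc_freq=50, num_docs=10_000)
common_overall = significance(local_count=40, global_doc_freq=5_000, num_docs=10_000)
print(rare_overall > common_overall)  # True
```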
Other ways of measuring term stat. significance
● Trey G. Solr knowledge graph (hope you saw his talk)!
https://lucidworks.com/video/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine/
● Mark Harwood Elastic Graph / Sig Terms
https://www.elastic.co/elasticon/conf/2016/sf/graph-capabilities-in-the-elastic-stack
But word2vec, LDA, etc
- Focused on content, not users: they discover topics/synonyms in content, but we often need mappings from search query vernacular to content vernacular
- Traditional topic modeling is flat
- Hierarchies extracted from content don't reflect users' hierarchies and how they map to content
- Don't confuse co-occurrences with synonyms without extensive data modeling/munging to get your content here
Questions?
Further Reading:
- Relevant Search!
  Discount code: relsearch
  http://manning.com
- Blog articles:
- Building Entity-focused search w/ Keyphrases:
- http://opensourceconnections.com/blog/2016/12/02/solr-elasticsearch-synonyms-better-patterns-keyphrases/
- Synonym best practices:
- http://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/
- Match Query Parser:
- http://opensourceconnections.com/blog/2017/01/23/our-solution-to-solr-multiterm-synonyms/
Shameless plug
- <shoutout BLOOOMBERG!!>
- We built a learning to rank plugin for that other search engine...

From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 


Taxonomical Semantical Magical Search - Doug Turnbull, OpenSource Connections

  • 1. Taxonomical Semantical Magical Search OpenSource Connections Doug Turnbull Relevance Lead dturnbull@o19s.com @softwaredoug © OpenSource Connections, 2017
  • 2. Solr/ES consulting: team 100% focused on relevance Learn to rank – semantic search – relevance – personalization – findability Who are we?
  • 3. Reflect: What problem are you trying to solve when you jump to 'semantic search'?
  • 4. "We studied spontaneous word choice for objects in five application-related domains, and found the variability to be surprisingly large. In every case two people favored the same term with probability <0.20." "Simulations show how this fundamental property of language limits the success of various design methodologies for vocabulary-driven interaction."
  • 5. Solve with keyword stuffing?
      - Content creators guarantee every "shoe" has a "shoe" keyword somewhere!
      - And every wing-tip mentions dress shoes…
      - ...Ad infinitum…
  • 6. Solve with tagging?
      - Java is a type of JVM language. Should this be tagged JVM too? What is a "query string"? Which of these tags is useful for search?
      - Who tags everything? Is it consistent? What are the rules?
      (taken from Stackoverflow)
  • 7. Solve with synonyms? Yes! Synonyms can be a tool that can help us. But it's easy to mess up:
      shoes => dress shoes
      wing tips,shoes
      tennis shoes,shoes
      When I search for tennis shoes, why do I get wing tips; why do I get dresses?!?
  • 8. Talking teaches/reminds vocab (Searching)
      Query progression: shoes → dress shoes → brown wing tips
      Searcher uncertain: uses broad queries to experiment
      Searcher learning: results give clues to help the shopper refine further
      Searcher trusting: more confident about which terms to use
  • 9. Searchers get more specific...
      shoes: NP(item): "shoe"
      wing tips (more specific): Hierarchy of Ideas: NP (item): "wing tips" type_of:"dress shoes" type_of:"shoe"
  • 10. … and try types of modifiers
      wing tips: NP (item): "wing tips" type_of:"dress shoes" type_of:"shoe"
      sapphire wing tips: NP (item): "wing tips" type_of:"dress shoes" type_of:"shoe"; ADJ (color) "sapphire" type_of:"blue"
  • 11. Semantic search: enable semantic exploration
      Low term specificity: search term specifies a wide category to explore (searching for "shoes")
      High term specificity: search term too specific, try semantically broader/similar items ("Show 'dress shoes' for 'oxfords'")
  • 12. Make Solr grok type-of relationships
      "wing tip" is a type of "dress shoe" is a type of "shoe"
      Diagram labels: "Search here, only show wing tips"; "Search here, show all things that are a type-of shoe"
      Beyond the actual terms used in docs
  • 13. Per-entity terms a taxonomy
      Taxonomy trees (top level first):
      Shoes; Athletic Shoes, Dress Shoes; High Heels, Oxfords, Wing Tips, Running Shoes, Tennis Shoes
      Blue; Sapphire, Sky blue
      A search taxonomy (not the taxonomy for your site nav)
  • 14. Index-time tax. expansion
      Entity types: Item, Color, Size
      Substrings -> Entities
      Expand to broad/narrow:
      tennis shoes => footwear shoes athletic tennis_shoes
      sapphire => blue sapphire
  • 15. In Solr...
      Entity types: Item, Color, Size
      Possible to build from simple keepwords
      Query or Index time synonyms
      uses TF*IDF of concept
      Substrings -> Entities
      Expand to broad/narrow:
      tennis shoes => tennis_shoes,athletic_shoes,shoes,...
      sapphire => sapphire,blue
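The broad/narrow expansion rules on slides 14 and 15 could be generated from a taxonomy rather than written by hand. The following is a minimal Python sketch under stated assumptions: `PARENT`, `ancestors` and `synonym_rules` are hypothetical names, the parent map is one possible reading of the example taxonomy, and multi-word terms are assumed to have already been compounded into single tokens such as `tennis_shoes`.

```python
# Hypothetical child -> parent map; an assumed reading of the slides' example taxonomy.
PARENT = {
    "tennis_shoes": "athletic_shoes",
    "running_shoes": "athletic_shoes",
    "athletic_shoes": "shoes",
    "wing_tips": "dress_shoes",
    "oxfords": "dress_shoes",
    "dress_shoes": "shoes",
    "sapphire": "blue",
    "sky_blue": "blue",
}


def ancestors(term, parent=PARENT):
    """Return the term followed by each broader concept, narrowest first."""
    chain = [term]
    while chain[-1] in parent:
        broader = parent[chain[-1]]
        if broader in chain:
            # Guard against a malformed taxonomy that loops back on itself.
            raise ValueError(f"cycle in taxonomy at {broader!r}")
        chain.append(broader)
    return chain


def synonym_rules(parent=PARENT):
    """Build one 'term => term,broader,...' line per term that has a parent."""
    return [f"{term} => {','.join(ancestors(term, parent))}" for term in sorted(parent)]


if __name__ == "__main__":
    # Print the rules in the explicit-mapping style used on slide 15.
    print("\n".join(synonym_rules()))
```

Each generated line keeps the narrow term and adds every broader concept, so a document about tennis shoes is also indexed under `athletic_shoes` and `shoes`.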
  • 16. In Solr, index time...
      "Item" copy field:
        (Input Text) You will love these maroon dress shoes
        (tokenization & maybe stemming) [you] [will] [love] [these] [maroon] [dress] [shoes]
        compound/decompound (syn filter) [you] [will] [love] [these] [maroon] [dress_shoes]
        Keepwords for entity [dress_shoes]
        Semantic expansion (syn filter) [dress_shoes] [shoes]
      "Color" copy field:
        (Input Text) You will love these maroon dress shoes
        (tokenization & maybe stemming) [you] [will] [love] [these] [maroon] [dress] [shoes]
        compound/decompound (syn filter) [you] [will] [love] [these] [maroon] [dress_shoes]
        Keepwords for entity [maroon]
        Semantic expansion (syn filter) [maroon] [brown]
  • 17. Index time solution
      (Input Text) brown wing tips
      (Item analyzer output) [wing_tips] [dress_shoes] [shoes]
      IDF: highest for wing_tips, lowest for shoes (eliminate TF? norms?)
      (Input Text) brown wing tips
      (Color analyzer output) [brown]
      Matches maroon, because at index time: maroon => brown, maroon
      q=brown wing tips &defType=edismax &sow=false &qf=item^100 color^10
      (you'll want to search more than these semantic fields)
  • 18. Query-time tax. expansion
      How do users think of your items?
      Entity types: Item, Color, Size
      Trained/built from query logs
      Substrings -> Entities
      Expand to broad/narrow, for the query "sapphire tennis shoes":
      tennis shoes => item:"tennis shoes" OR item:"athletic shoes" OR item:"shoes" ...
      sapphire => color:blue OR color:sapphire
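The query-time flow on slide 18 (spot entities in the query string, then expand each to its broader concepts) could look roughly like this in Python. It is only a sketch: `ITEM_PARENT`, `COLOR_PARENT`, `spot_entities`, `expand` and `build_query` are hypothetical names, entity spotting is a simple greedy longest match, and joining the per-field clauses with AND is an assumption rather than something the slides specify.

```python
# Hypothetical child -> parent maps for two entity types.
ITEM_PARENT = {"tennis shoes": "athletic shoes", "athletic shoes": "shoes"}
COLOR_PARENT = {"sapphire": "blue"}


def vocabulary(parent):
    """All known phrases for one entity type (children and parents)."""
    return set(parent) | set(parent.values())


def spot_entities(query, vocab):
    """Greedy longest match of known phrases over whitespace tokens."""
    tokens = query.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in vocab:
                found.append(phrase)
                i = j
                break
        else:
            # No known phrase starts at this token; move on.
            i += 1
    return found


def expand(field, term, parent):
    """OR together the term and each broader concept as fielded phrases."""
    chain = [term]
    while chain[-1] in parent and parent[chain[-1]] not in chain:
        chain.append(parent[chain[-1]])
    return " OR ".join(f'{field}:"{t}"' for t in chain)


def build_query(query):
    """Assumed combination: one parenthesised clause per spotted entity, ANDed."""
    clauses = []
    for field, parent in (("item", ITEM_PARENT), ("color", COLOR_PARENT)):
        for entity in spot_entities(query, vocabulary(parent)):
            clauses.append("(" + expand(field, entity, parent) + ")")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_query("sapphire tennis shoes"))
```

In practice the vocabularies would come from query logs, as the slide suggests, rather than from hard-coded maps.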
  • 19. Query Phrase In Solr...
      Item Semantic Analyzer: (Input Text) Brown wing tips; Semantic expansion (syn filter) [wing tips] [dress shoes] [shoes]
      Color Semantic Analyzer: (Input Text) Brown wing tips; Semantic expansion (syn filter) [brown] [maroon]
      Transform to: description("dress shoes" OR "wing tips" OR shoes OR maroon OR brown)
      Problems:
      - two query analyzers for same field not possible in Solr
      - Can't re-tokenize [dress shoes] -> "dress shoes" phrase q
  • 20. Match Query Parser: https://github.com/o19s/match-query-parser
      q=brown wing tips
      &defType=edismax
      &qf=description title
      &bq={!match analyze_as=item_tax search_with=phrase qf=description v=$q}^100
      &bq={!match analyze_as=color_tax search_with=phrase qf=description v=$q}
      analyze_as: how to analyze the query string
      search_with=phrase: retokenize multi-word tokens and do a phrase search
  • 21. Other building blocks
      Auto Phrase Token Filter / Query Auto Filtering:
      - https://github.com/lucidworks/auto-phrase-tokenfilter
      - https://lucidworks.com/2015/02/17/introducing-query-autofiltering/
      Health-on-net Lucene Synonyms:
      - https://github.com/healthonnet/hon-lucene-synonyms
      Sematext Query Segmenter:
      - https://github.com/sematext/query-segmenter
      Shopping 24 Bmax Query Parser:
      - https://github.com/shopping24/solr-bmax-queryparser
  • 22. Deriving Querqy rules from taxonomies: https://github.com/renekrie/querqy
  • 23. Query Time vs Index Time
      Query Time:
        PROS
        - No need to reindex when updating managed vocab
        CONS
        - Relevance scoring of terms (boosts help)
        - Complex / slow queries
      Index Time:
        PROS
        - TF*IDF more accurate scoring (broad concepts score low, narrow score high)
        - Faster queries
        CONS
        - Reindexing for synonym changes
  • 24. Structure your docs for query understanding
      Relevance engineer's challenge:
      - Where can we begin with a taxonomy?
        - Reuse filters & facets
        - Reuse your page's navigational taxonomy?
        - Track which searches land on pages (old school click tracking)?
        - Zero results tracking?
      - How do we incentivize content creators to move away from keyword stuffing to organizing to search keyword taxonomy?
      - Finally: we don't care about the source data model, only what helps users find things
  • 25. SHReC Algorithm
  • 26. SHReC Algorithm
      Simple doc frequency in-content to look for super-concepts / sub-concepts
      term/phrase x subsumes y (x parent concept?) when:
        df(x) > df(y)
        df(x ∧ y) / df(y) >= α (α = 1 complete subsumption)
  • 27. SHReC Algorithm Example
      Shoes / Wing Tips:
        df("shoes") > df("wing tips")
        df("shoes" ∧ "wing tips") / df("wing tips") >= 0.8
  • 28. SHReC Algorithm with Solr
      Shoes / Wing Tips:
        df("shoes") > df("wing tips"): cache doc freq (q=*:*&facet.field=item&facet=true)
        df("shoes" ∧ "wing tips") / df("wing tips") >= 0.8: q=item:"wing tips" AND item:shoes, num results
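The two subsumption conditions from slides 26 to 28 can be sketched in a few lines of Python. This is a minimal illustration, assuming each document has already been reduced to a set of terms or phrases; `subsumes`, the `docs` layout and the default `alpha=0.8` are illustrative choices, and with Solr the three document frequencies would come from facet counts and result counts instead.

```python
def subsumes(x, y, docs, alpha=0.8):
    """Return True if x looks like a parent concept of y.

    docs is an iterable of sets of terms/phrases (an assumed layout).
    Conditions, as on the slides:
      df(x) > df(y)  and  df(x AND y) / df(y) >= alpha
    """
    docs = list(docs)
    docs_x = {i for i, terms in enumerate(docs) if x in terms}
    docs_y = {i for i, terms in enumerate(docs) if y in terms}
    if not docs_y:
        # y never occurs, so there is nothing to subsume.
        return False
    overlap = len(docs_x & docs_y) / len(docs_y)
    return len(docs_x) > len(docs_y) and overlap >= alpha


if __name__ == "__main__":
    # Toy corpus: most "wing tips" docs also mention "shoes".
    corpus = (
        [{"shoes", "wing tips"}] * 5
        + [{"wing tips"}]
        + [{"shoes", "tennis shoes"}, {"shoes"}]
    )
    print(subsumes("shoes", "wing tips", corpus))
    print(subsumes("wing tips", "shoes", corpus))
```

Setting `alpha=1` demands complete subsumption, which real content rarely satisfies; a lower threshold tolerates documents that mention the narrow term without the broad one.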
  • 29. Unfortunately reality is messy
      Shoes / Wing Tips: Your data probably looks like
  • 30. Idea: mine other corpus?
      Shoes / Wing Tips
      ● but still, what phrases do you test?
  • 31. Statistically sig. collocations
      Wing Tips / WingTips
      Student t-test against null hypothesis that wing / tips unrelated
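One common way to run the t-test from slide 31 on a word pair is the textbook collocation t-score: compare the observed bigram probability with the probability expected if the two words were independent. The sketch below assumes simple corpus counts are available; `collocation_t_score` and its parameter names are hypothetical, and the slides do not say which exact variant of the test was used.

```python
import math


def collocation_t_score(count_w1, count_w2, count_bigram, n_tokens):
    """t-score for the null hypothesis that w1 and w2 occur independently.

    observed = P(w1 w2), expected = P(w1) * P(w2); the variance of the
    Bernoulli 'is this bigram w1 w2?' variable is p * (1 - p).
    """
    if count_bigram == 0:
        # Never seen together: treated here as no evidence for a collocation.
        return 0.0
    observed = count_bigram / n_tokens
    expected = (count_w1 / n_tokens) * (count_w2 / n_tokens)
    variance = observed * (1.0 - observed)
    return (observed - expected) / math.sqrt(variance / n_tokens)


if __name__ == "__main__":
    # Made-up counts: "wing" 30 times, "tips" 20 times, "wing tips" 20 times.
    t = collocation_t_score(30, 20, 20, 1_000_000)
    # 2.576 is the usual critical value at the 0.005 level for large samples.
    print(t, t > 2.576)
```

Pairs whose score clears the critical value are candidates for treating as a single phrase, such as "wing tips", before running the subsumption test.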
  • 32. Refinements
      shoe → dress shoe (12%), wing tip (23%), tennis shoe (11%): sub concepts, likely child phrases
      blue dress shoe (1%), sapphire brooks brothers dress shoe (0.001%), brown dress shoe (20%): colors scattered throughout
      tennis shoe (11%) → running shoe (34%): siblings refine each other. Should these be in supercategory "athletic shoes"?
  • 33. Refinement mining in Solr
      docs = [{ "query": "shoe", "refinement": "dress shoe" },
              { "query": "shoe", "refinement": "brown shoe" },
              { "query": "tie", "refinement": "brown tie" }]
      q=query:shoe& facet=true& facet.field=refinement
      Refinements:
      - dress shoe (4)
      - tennis shoe (2)
      - ...
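Outside Solr, the same refinement counts can be reproduced with a plain counter, which is handy for checking what the facet query should return. This is a small Python sketch; `DOCS` copies the example documents from the slide, while `refinement_counts` and `refinement_shares` are hypothetical helper names.

```python
from collections import Counter

# Query/refinement pairs, as in the slide's example documents.
DOCS = [
    {"query": "shoe", "refinement": "dress shoe"},
    {"query": "shoe", "refinement": "brown shoe"},
    {"query": "tie", "refinement": "brown tie"},
]


def refinement_counts(docs, query):
    """Count refinements for one query, like faceting on the refinement field."""
    return Counter(d["refinement"] for d in docs if d["query"] == query)


def refinement_shares(docs, query):
    """Share of each refinement among all refinements of the query."""
    counts = refinement_counts(docs, query)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {refinement: n / total for refinement, n in counts.items()}


if __name__ == "__main__":
    print(refinement_counts(DOCS, "shoe"))
    print(refinement_shares(DOCS, "shoe"))
```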
  • 34. SHReC w/ Refinements
      docs = [{ "query": "shoe", "refinement": "dress shoe" },
              { "query": "shoe", "refinement": "brown shoe" },
              { "query": "tie", "refinement": "brown tie" }]
      q=query:shoe& facet=true& facet.field=refinement
  • 35. SHReC w/ Refinements
      q=query:shoe& facet=true& facet.field=refinement
      Num results for q=shoe (Slow, but you do this rarely)
      Seed the corpus exploration
      SHReC
  • 36. SHReC w/ sig terms
      scoreNodes(
        select(
          facet(collectionName, q="query:shoes", buckets="refinements", bucketSorts="count(*) desc", bucketSizeLimit="100", count(*)),
          refine_graph as node,
          "count(*)",
          replace(collection, null, withValue=collectionName),
          replace(field, null, withValue=refine_graph))
      )
      What's actually happening in SHReC is significance scoring, which is baked into Solr: Relationship of local vs global
  • 37. Other ways of measuring term stat. significance
      ● Trey G. Solr knowledge graph (hope you saw his talk)! https://lucidworks.com/video/leveraging-lucenesolr-as-a-knowledge-graph-and-intent-engine/
      ● Mark Harwood Elastic Graph / Sig Terms https://www.elastic.co/elasticon/conf/2016/sf/graph-capabilities-in-the-elastic-stack
  • 38. But word2vec, LDA, etc
      - Focused on content, not users: focused on discovering topics/synonyms in content; we often need search query to content vernacular mappings
      - Traditional topic modeling flat
      - Hierarchies extracted from content don't reflect user's hierarchies & how they map to content
      - Don't confuse co-occurrences with synonyms without extensive data modeling/munging to get your content here
  • 39. Questions?
      Further Reading:
      - Relevant Search! Discount code: relsearch http://manning.com
      - Blog articles:
        - Building Entity-focused search w/ Keyphrases: http://opensourceconnections.com/blog/2016/12/02/solr-elasticsearch-synonyms-better-patterns-keyphrases/
        - Synonym best practices: http://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/
        - Match Query Parser: http://opensourceconnections.com/blog/2017/01/23/our-solution-to-solr-multiterm-synonyms/
  • 40. - <shoutout BLOOOMBERG!!> - We built a learning to rank plugin for that other search engine... Shameless plug