Ted Sullivan
Building Smarter Search Applications
Using Built-In Knowledge Graphs and
Query Introspection
lucidworks.com
Senior Solutions Architect
Ted Sullivan
Building Smarter Search Applications
Using Built-In Knowledge Graphs (aka
Solr!) and Query Introspection
lucidworks.com
Senior Solutions Architect
Relevance - Precision - Recall
Do we put the cart before the horse?
Precision/Recall determine what matches and what doesn’t.
Relevance then computes the “best” matches from what is left.
Without focusing more on precision/recall first, we tend to have
Garbage In/Garbage Out
This is especially true in faceted search - relevance tuning can fix
the first few pages but the facets cannot be fixed!!!
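As a quick illustration of the two measures, here is a minimal sketch that computes precision and recall for a single query. The document ids and relevance judgments are hypothetical and only serve to make the arithmetic concrete.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = matching relevant docs / all docs returned
    recall    = matching relevant docs / all relevant docs
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]   # what the engine matched
relevant = ["d1", "d2", "d5"]          # what the user actually wanted
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Facet counts are computed over everything in `retrieved`, which is why noise hits distort the facets even when ranking hides them from the first page.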
Improving Precision starts …
with better phrase detection
Embarrassing “noise” hits are often due to phrase cross matches.
Synonyms can improve recall tremendously but they need some help in
Solr when they are multi-term
Stop words can be important for disambiguation within phrases:
“The Lady Is A Tramp” vs. “The Lady And The Tramp”
“To Be Or Not To Be”
Better Search: Detecting Noun Phrases
The basic technique is called “autophrasing” – recognizing when more
than one word represents just one thing.
Autophrasing – uses an extra knowledge-base file “autophrases.txt”
Query Autofiltering – uses the phrases that are stored as metadata
values in the index.
A Novel Approach to Natural Language Processing:
Mapping Noun and Verb phrases to metadata fields
“Who’s in The Who?”
Multi-term Synonym Demo
autophrases.txt
new york
new york state
empire state
new york city
new york new york
big apple
ny ny
city of new york
state of new york
ny state
synonyms.txt
new_york => new_york_state,new_york_city,big_apple,new_york_new_york,ny_ny,nyc,empire_state,ny_state,state_of_new_york
new_york_state,empire_state,ny_state,state_of_new_york
new_york_city,big_apple,new_york_new_york,ny_ny,nyc,city_of_new_york
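The autophrasing step can be sketched roughly as follows. This is a plain-Python stand-in, not the Solr token filter itself: it assumes a greedy longest-match over whitespace/word tokens and joins each recognized phrase with underscores so that synonyms.txt can treat it as one term. The function name `autophrase` is hypothetical.

```python
import re

# Phrases taken from the autophrases.txt listing on the slide.
AUTOPHRASES = [
    "new york", "new york state", "empire state", "new york city",
    "new york new york", "big apple", "ny ny", "city of new york",
    "state of new york", "ny state",
]


def autophrase(text, phrases=AUTOPHRASES):
    """Join known multi-word phrases into single underscore tokens.

    At each position the longest matching phrase wins, so
    "new york state" is not split into "new_york" + "state".
    """
    tokens = re.findall(r"\w+", text.lower())
    phrase_set = {tuple(p.split()) for p in phrases}
    max_len = max(len(p) for p in phrase_set)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrase_set:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            out.append(tokens[i])  # no phrase starts here
            i += 1
    return out


print(autophrase("This document is about new york state."))
print(autophrase("I am a native of the great state of New York."))
```

Once the phrase is a single token such as `new_york_state`, a single-term synonym rule can match it without cross-matching the individual words.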
Multi-term Synonym Demo
This document is about new york state.
This document is about new york city.
There is a lot going on in NYC.
I heart the big apple.
The empire state is a great state.
New York, New York is a hellova town.
I am a native of the great state of New York.
Demo queries: New York | New York City | New York State
Request handlers compared: /select vs. /autophrase
Query Autofiltering Implementation
Use Lucene FieldCache to build a map of field values to field names
(string fields)
Add synonym mappings from synonyms.txt and stemming to this
value(s) -> field(s) map
Use this map to discover noun phrases in the query that correspond to
field values in the index – longest contiguous phrase wins
Build filter or boost queries based on these discovered mappings
SOLR-7539 created 5/30/2015 - One comment so far “+1” - Bill Bell - Thanks Bill
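The steps above can be sketched in a few lines of Python. This is an illustration only: the real component reads field values from the Lucene FieldCache, whereas here a hypothetical list of documents (`DOCS`) stands in for the index, and the synonym and stemming steps are left out. The function names are assumptions.

```python
from collections import defaultdict

# Hypothetical documents standing in for string fields in the index.
DOCS = [
    {"color": "red", "product_type": "socks", "brand": "Red Lion"},
    {"color": "white", "product_type": "dress shirt", "brand": "Acme"},
]


def build_value_map(docs):
    """Map each lower-cased field value to the set of fields it occurs in."""
    value_map = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            value_map[value.lower()].add(field)
    return value_map


def find_field_phrases(query, value_map):
    """Scan the query left to right; the longest contiguous phrase wins."""
    tokens = query.lower().split()
    found, i = [], 0
    while i < len(tokens):
        for n in range(len(tokens) - i, 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in value_map:
                found.append((phrase, sorted(value_map[phrase])))
                i += n
                break
        else:
            i += 1  # token matches no field value; leave it as free text
    return found


value_map = build_value_map(DOCS)
print(find_field_phrases("red socks", value_map))
print(find_field_phrases("red lion socks", value_map))
```

Because "red lion" is tried before "red", the second query maps to the brand rather than the color.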
Query Autofiltering – Basic Behavior
q = red socks -> fq=color:red&fq=product_type:socks
or bq=(color:red AND product_type:socks)^20
q = red lion socks -> fq=brand:"Red Lion"&fq=product_type:socks
q = scarlet chaise lounge -> color:red AND product_type:"Lounge Chair"
q = white dress shirts -> color:white AND product_type:"dress shirt"
q = white linen shirts -> ((brand:"White Linen" OR (color:white AND material:linen)) AND product_category:shirts)
q = white and grey dress shirts ->
((product_type:"dress shirt" OR ((product_type:dress OR product_category:dress) AND
(product_type:shirt OR product_category:shirt))) AND (color:(white OR grey) OR colors:(white AND grey)))
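For the simple, unambiguous cases, turning discovered field/value pairs into filter or boost queries might look like the sketch below. The helper names are hypothetical and the ambiguous cases (such as "white linen shirts") are not handled; the boost value of 20 is the one shown in the first example.

```python
def to_filter_queries(mappings):
    """Turn (field, value) pairs into fq clauses, quoting multi-word values."""
    clauses = []
    for field, value in mappings:
        if " " in value:
            clauses.append(f'{field}:"{value}"')
        else:
            clauses.append(f"{field}:{value}")
    return clauses


def to_boost_query(mappings, boost=20):
    """Combine the same clauses into a single boosted bq expression."""
    joined = " AND ".join(to_filter_queries(mappings))
    return f"({joined})^{boost}"


red_socks = [("color", "red"), ("product_type", "socks")]
print(to_filter_queries(red_socks))
print(to_boost_query(red_socks))
print(to_filter_queries([("brand", "Red Lion"), ("product_type", "socks")]))
```

Filter queries favor precision by excluding everything else; the boost-query form keeps recall and only moves the mapped documents to the top.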
Query Autofiltering – Language And Logic
Logical or “Boolean” operators (named after mathematician George Boole) have
precise meaning in Set Theory and Computer Science
Search is about returning a set of records that match a set of terms
AND - Intersection ( && )
OR - Union ( || )
NOT - Exclusion ( ! )
In language - the meaning of “and” and “or” is contextual - sometimes they are
synonyms and sometimes they are antonyms!
- depends on the cardinality (single or multi-value) of a noun property or
attribute!
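One way to read the cardinality rule is sketched below. It assumes that a single-valued field can never hold two values at once, so a spoken "and" has to become a Boolean OR, while a multi-valued field can satisfy a true AND. The function name and the `multi_valued` flag are hypothetical; the field names `color` and `colors` come from the "white and grey dress shirts" example.

```python
def combine_values(field, values, multi_valued, conjunction="and"):
    """Build one field clause for several values joined by 'and'/'or'.

    A single-valued field can only ever match one value, so a spoken
    "and" is treated as OR; a multi-valued field can honor AND.
    """
    if len(values) == 1:
        return f"{field}:{values[0]}"
    op = "AND" if (conjunction == "and" and multi_valued) else "OR"
    joined = f" {op} ".join(values)
    return f"{field}:({joined})"


# "white and grey" against a single-valued and a multi-valued field.
print(combine_values("color", ["white", "grey"], multi_valued=False))
print(combine_values("colors", ["white", "grey"], multi_valued=True))
```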
A Music Ontology
Song
Songwriter
Genre
Performer
Recording
Guitarist
Pianist
Vocalist
Producer
Record Label
Band
Album
Natural Language Processing - Lite
(Front-end NLP)
Precise Free-text searching of Structured Metadata
Query Autofilter can take Natural Language Queries and turn
them into structured Boolean Queries.
Now processes both noun and verb/adjective phrases
Verb phrase mapping enables better selection of field names
Beatles Songs written by George Harrison
Willie Dixon Songs covered by Led Zeppelin
==> Look Mom - No SQL!!
Natural Language Processing - Lite
Noun phrases that map to fieldName/fieldValue pairs
Bob Dylan Songs ==> composer:"Bob Dylan" OR performer:"Bob Dylan"
Verb phrase patterns that map to field names:
Songs written by Bob Dylan ==> composer:"Bob Dylan"
“Who’s in The Who?” ==> memberOfGroup:"The Who"
Songs Bob Dylan covered vs covers of Bob Dylan Songs
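A rough sketch of verb-phrase mapping follows, using regular expressions as a stand-in for whatever pattern configuration the actual component uses. The pattern table is hypothetical; only the two mappings shown on the slide (`composer` and `memberOfGroup`) are covered.

```python
import re

# Hypothetical pattern table: verb phrase pattern -> field name.
VERB_PATTERNS = [
    (re.compile(r"^(?:songs\s+)?written by (?P<value>.+)$", re.I), "composer"),
    (re.compile(r"^who'?s in (?P<value>.+?)\??$", re.I), "memberOfGroup"),
]


def verb_phrase_to_query(text):
    """Return a field:"value" clause if a verb phrase pattern matches."""
    # Normalize typographic apostrophes so the patterns stay simple.
    text = text.strip().replace("\u2019", "'")
    for pattern, field in VERB_PATTERNS:
        match = pattern.match(text)
        if match:
            value = match.group("value")
            return f'{field}:"{value}"'
    return None  # no verb phrase recognized; fall back to noun phrases


print(verb_phrase_to_query("Songs written by Bob Dylan"))
print(verb_phrase_to_query("Who\u2019s in The Who?"))
```

Without the verb phrase, "Bob Dylan Songs" has to fall back to both candidate fields; with it, the field choice becomes unambiguous.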
Short Demo of Query Autofiltering NLP-Lite
A Suggester for Query Autofiltering
Create multi-field suggestions using “Pivot Facet Patterns” that can be
processed by Query Autofiltering
Use facets - at index time - to extract suggestion meta phrases and
context.
Steps in building the Suggester
Processing - Denormalizing the Graph
Create searchable metadata from object links
Process graph relationships, apply business rules
Creating the Pivot Patterns
${name_s} ${recording_type}s ==> Bob Dylan Songs
${genre} ${recording_type}s ==> Progressive Rock Albums
${genre} ${musician_type}s ==> Rock Drummers, Heavy Metal Bands
Calculating which recordings are covers
Finding Related entities (e.g. John Lennon <=> Paul McCartney)
${members_ss} ${musician_type}s ==> Paul McCartney Bands
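The pivot patterns use the same `${field}` placeholder syntax as Python's `string.Template`, so expanding them can be sketched as below. The rows are hard-coded, hypothetical field combinations; in Solr they would presumably come from a pivot facet request over the same fields, which is what restricts the output to combinations that actually occur in the index.

```python
from string import Template


def expand_pattern(pattern, rows):
    """Fill a pivot pattern with each field combination; dedupe and sort."""
    template = Template(pattern)
    return sorted({template.substitute(row) for row in rows})


# Hypothetical field-value combinations that co-occur in the index.
rows = [
    {"name_s": "Bob Dylan", "recording_type": "Song"},
    {"name_s": "Bob Dylan", "recording_type": "Album"},
    {"name_s": "Bob Dylan", "recording_type": "Song"},  # duplicate pivot row
]

suggestions = expand_pattern("${name_s} ${recording_type}s", rows)
print(suggestions)
```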
Repurposing Facets for relationship mining
Using pivot facets to generate multi-field phrases
Only get linguistically sensible permutations!
Facets provide Specification and Context
Traditionally used for visualization and navigation.
We can repurpose this to make a smarter suggester!
Security Trimming of suggestions
Only show suggestions that can return results given the current
user’s entitlements
A Suggester that learns what the
user is looking for
Suggester now brings back metadata with the suggestion
Front end can cache this metadata and use it to boost
subsequent typeahead queries based on what the user
selected.
Searching for Beatles Songs - “Baby’s In Black” and
“Baby You’re A Rich Man” are now boosted over all of
the other song titles that start with “Baby”
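The boosting idea can be illustrated with a small client-side sketch. It assumes the front end has cached the metadata of the suggestion the user selected (here `performer: The Beatles`) and scores later typeahead candidates by how many cached fields they share. The song list, field name and function are hypothetical, and in practice the boost would more likely be sent to Solr as a boost query than computed in the browser.

```python
def rank_suggestions(prefix, candidates, selected_context):
    """Rank prefix matches, boosting those that share cached metadata."""
    prefix = prefix.lower()
    matches = [c for c in candidates if c["title"].lower().startswith(prefix)]

    def score(candidate):
        # One point per cached field whose value the candidate shares.
        return sum(1 for field, value in selected_context.items()
                   if candidate.get(field) == value)

    ranked = sorted(matches, key=lambda c: (-score(c), c["title"]))
    return [c["title"] for c in ranked]


# Hypothetical typeahead candidates with their metadata.
SONGS = [
    {"title": "Baby Love", "performer": "The Supremes"},
    {"title": "Baby's In Black", "performer": "The Beatles"},
    {"title": "Baby I Love You", "performer": "The Ronettes"},
    {"title": "Baby You're A Rich Man", "performer": "The Beatles"},
]

# Metadata cached after the user picked the "Beatles Songs" suggestion.
context = {"performer": "The Beatles"}
print(rank_suggestions("baby", SONGS, context))
```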
Thank you!
