Webinar: Natural Language Search with Solr

Natural Language Search with Solr
Ted Sullivan
Senior Solutions Architect
lucidworks.com

The take-home word for this talk is:
CONTEXT

What I will talk about …
Why does context matter?
Phrase and contextual ambiguities in search
• Recent advances in Query Autofiltering that attack the context
problem by adding “verb/preposition” disambiguation *
Traditional ways of visualizing context in search - forging search “loops”
• Facets
• Typeahead
* https://lucidworks.com/blog/2015/11/19/query-autofiltering-chapter-4-a-novel-approach-to-natural-language-processing/

Adding metadata context to Suggestions using Facets
Using Pivot Facets to create semantically rich suggestions
Facets to bring user-centric context to suggestions
• Entitlements: Security trimming of suggestions
• User session context: Dynamic On-The-Fly Predictive Analytics!

Why Does Context Matter?
Relevance is contextual - relevant to whom under what circumstances?
Language / User Intent / Social and business factors
Ambiguities in search are often due to a failure or inability to detect context.
So, what can we do about this - or is this talk just some obvious hand-waving
BS that we’ve heard a thousand times? Hopefully, not.
But that said - maybe just a little theory first …

Contextual Relationships
Semantic Context - Language, Lexicon
User Context - Intent, Agendas, Permissions, Demographics, Location
Social Context - Popularity, Common Behaviors => Recommendations
Business Context - Rules, Organization, Domain, Security
Context == Relationships
Within and between metadata “objects”
Search is an exchange of one metadata object - the query - for others -
the results.

Things are related to other Things
Relationships provide context
• Static or known Relationships - defined by a knowledge graph
such as an Ontology
• Discovered Relationships - computed by data mining
Knowledge Graphs - connected-ness
Usage Logs (query logs, other captured events or signals) -
behavioral contexts
Clustering - unsupervised learning algorithms
Natural Language Processing - semantic contexts - noun phrases -
statements
Machine Learning - supervised learning => Feature extraction

Feature Sets
Apple
Tim Cook: iPhone, Macintosh, Computer, Tablet, Steve Jobs, Lisa, iTunes
Times Square: Broadway, Wall Street, Empire State Building, Bronx Zoo
Granny Smith: Pie, Fritters, Season, Sauce, Cider, Picking, Tree, McIntosh
White Album: Records, Beatles, George Martin, Capitol, White Album
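
One way to picture how feature sets resolve an ambiguous term like "Apple" is a simple overlap count between the words around the query and each candidate feature set. This is only a sketch: the grouping of terms under each label is my reading of the slide, and the function name and scoring are hypothetical, not something from the talk.

```python
# Feature sets for the ambiguous term "Apple", keyed by a label from the slide.
# The grouping of terms under each label is an assumption about the slide layout.
FEATURE_SETS = {
    "Tim Cook": {"iphone", "macintosh", "computer", "tablet", "steve jobs", "lisa", "itunes"},
    "Times Square": {"broadway", "wall street", "empire state building", "bronx zoo"},
    "Granny Smith": {"pie", "fritters", "season", "sauce", "cider", "picking", "tree", "mcintosh"},
    "White Album": {"records", "beatles", "george martin", "capitol", "white album"},
}


def best_context(context_terms):
    """Return the label whose feature set shares the most terms with the context.

    Returns None when no feature set overlaps the context at all.
    """
    terms = {t.lower() for t in context_terms}
    scores = {label: len(terms & features) for label, features in FEATURE_SETS.items()}
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score > 0 else None
```

A real system would weight terms rather than count them, but the idea is the same: the surrounding terms select which "Apple" is meant.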

Resolving Ambiguities
Phrase or syntactic ambiguities - detecting nouns
Autophrasing - unstructured data
Query Autofiltering - structured data
Contextual or semantic ambiguities (subject-verb-object) - detecting intent
Traditional NLP - POS detection, Machine Learning
Query Autofiltering with verb/preposition disambiguation

Enough abstractions - give me some examples!
Music Ontology
Song
Songwriter
Genre
Performer
Recording
Guitarist
Pianist
Vocalist
Producer
Record Label
Band
Album

Discovery and Focus
Enough abstractions - give me some examples!
Medical Ontology
Disease
Condition Symptom
Drug
Treatment

Query Autofiltering
“songs Eric Clapton wrote” vs. “songs Eric Clapton performed”
Without verb support we get:
(performer_ss:"Eric Clapton" OR composer_ss:"Eric Clapton") AND composition_type:Song
for either.
With verb support
Now we get:
songs Eric Clapton wrote => composer_ss:"Eric Clapton" AND composition_type:Song
songs Eric Clapton performed => performer_ss:"Eric Clapton" AND composition_type:Song
Verb/Preposition context rules
written,wrote,composed => composer_ss
performed,played,sang,recorded => performer_ss
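
The verb rules on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Query Autofiltering implementation: the rule table, `AMBIGUOUS_FIELDS`, and both function names are hypothetical, and the entity ("Eric Clapton") is assumed to have been detected already.

```python
import re

# Hypothetical in-memory form of the verb/preposition rules shown on the slide.
VERB_RULES = {
    ("written", "wrote", "composed"): "composer_ss",
    ("performed", "played", "sang", "recorded"): "performer_ss",
}

# Fields a person name could match when no verb narrows the choice.
AMBIGUOUS_FIELDS = ["performer_ss", "composer_ss"]


def person_clause(query, person):
    """Build the clause for the person, using a verb in the query if one matches."""
    tokens = set(re.findall(r"[a-z']+", query.lower()))
    for verbs, field in VERB_RULES.items():
        if tokens & set(verbs):
            return f'{field}:"{person}"'
    # No verb matched: fall back to OR-ing every field the name could belong to.
    ors = " OR ".join(f'{field}:"{person}"' for field in AMBIGUOUS_FIELDS)
    return f"({ors})"


def build_query(query, person):
    """Combine the person clause with the fixed composition_type:Song filter."""
    return f"{person_clause(query, person)} AND composition_type:Song"
```

With no verb in the query the function returns the ambiguous OR form; with "wrote" or "performed" it returns the single-field form from the slide.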

Query Autofiltering
“Bands that Eric Clapton was in”
No context rules (raw autofiltering):
((name_s:Band OR musician_type_ss:Band) AND (name_s:"Eric Clapton" OR
original_performer_s:"Eric Clapton" OR composer_ss:"Eric Clapton" OR
performer_ss:"Eric Clapton" OR groupMembers_ss:"Eric Clapton"))
Add context rule
members,member,was in,is in,who's in,who's in the,is in the,was in the =>
memberOfGroup_ss,groupMembers_ss
((name_s:Band OR musician_type_ss:Band) AND groupMembers_ss:"Eric Clapton")

Query Autofiltering Verb/Preposition context rules
Who’s in The Who
raw autofiltering
((name_s:"The Who" OR original_performer_s:"The Who" OR performer_ss:
"The Who" OR memberOfGroup_ss:"The Who"))
with context rule
members,member,was in,is in,who's in,who's in the,is in the,was in the =>
memberOfGroup_ss,groupMembers_ss
query is now:
(memberOfGroup_ss:"The Who")
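
The narrowing step here amounts to intersecting the fields that raw autofiltering matched with the fields named by the context rule. A minimal sketch, assuming the candidate fields have already been found; the rule table and function names are hypothetical:

```python
# Hypothetical in-memory form of the context rule shown on the slide.
CONTEXT_RULES = {
    ("members", "member", "was in", "is in", "who's in", "who's in the",
     "is in the", "was in the"): ["memberOfGroup_ss", "groupMembers_ss"],
}


def narrow_fields(query, candidate_fields):
    """Keep only the candidate fields named by a rule whose trigger is in the query.

    Falls back to the full candidate list when no rule applies or when the
    rule's fields do not overlap the candidates.
    """
    # Normalize curly apostrophes and pad with spaces for whole-phrase matching.
    text = " " + query.lower().replace("\u2019", "'") + " "
    for triggers, rule_fields in CONTEXT_RULES.items():
        if any(f" {trigger} " in text for trigger in triggers):
            kept = [f for f in candidate_fields if f in rule_fields]
            if kept:
                return kept
    return candidate_fields


def to_query(fields, value):
    """Render the surviving fields as an OR query on one value."""
    return "(" + " OR ".join(f'{f}:"{value}"' for f in fields) + ")"
```

The rule lists two fields, but only `memberOfGroup_ss` survives for "The Who" because that is the only rule field the raw pass matched.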

Query Autofiltering
Drugs that treat abdominal pain
treatment_type:Drug AND has_indication:"abdominal pain"
Drugs that cause abdominal pain
treatment_type:Drug AND has_side_effect:"abdominal pain"
vs.
treatment_type:Drug AND (has_indication:"abdominal pain" OR
has_side_effect:"abdominal pain")
treat,for,indicated => has_indication
cause,produce => has_side_effect

Query Autofiltering
Beatles Songs covered vs Songs Beatles covered
covers by other artists of songs written by the Beatles
vs covers by Beatles of songs by other songwriters
Robert Johnson Songs that Eric Clapton covered
works the same as:
Eric Clapton covers of Robert Johnson Songs
Noun-Noun Phrases
Robert Johnson Songs
Beatles Songs
Insomnia Drugs - are just indicated drugs
covered,covers:performer_ss | version_s:Cover |
original_performer_s:_ENTITY_,recording_type_ss:Song=>original_performer_s:_ENTITY_

Facets provide Context
Visualization and the search “conversation”: Discovery and Focus
• Post-query visualization - facet display - aggregated attributes of found things
• Pre-query visualization - query suggestion or typeahead - can use facets too
(stay tuned).
• The Good, The Bad and The Ugly aspects of Facets
New and Improved: Statistics, Analytics and APIs - Oh My!
• Dashboards and Dynamic Business Intelligence
• Heatmap Faceting
• Pivot Facets and Ad-Hoc Object Hierarchies - now with stats!
• JSON Facet API

How can we use facets to improve typeahead?
Put more precision and more context into a suggester.
=> Using metadata - guide the user to more precise queries
that we can be really GOOD at!
To do this, we can build a specialized suggester collection - then
we can use facet contexts to build semantic and behavioral
relationships within and between searches.
And now for something completely different! *
* Shameless Monty Python’s Flying Circus reference

Suggester Buildware
Query Collectors or Fetchers
Gather sets of query suggestions - Interface with multiple
implementations possible
Suggester Builder
• Validates suggestions
• Adds context to suggestions using faceting
• Submits suggestion and metadata to Solr Index
Query Logs
Terms Component
Curated Lists
Pivot Facet Collector
Databases - SQL or Not

Pivot Facet Query Collector
Uses “Field Pattern Templates” to generate semantically rich suggestions
Structured data - metadata fields contain object attributes
Can combine these attributes into phrases - semantically (or not)
Machine doesn’t know semantics.
Example
first_name last_name occupation city state
Bob Jones Accountant Cincinnati Ohio - makes sense
Ohio Accountant Jones Cincinnati Bob - doesn’t

Pivot Facet Query Collector
If we create Pivot Template Patterns like this:
${musician_type} ${recording_type}s
${genre} ${musician_type}s
${performer} ${recording_type}s
${original_performer} ${recording_type}s covered by ${performer} (plus context)
${name}
We get suggestions like this:
Rolling Stones Albums
New Wave Songs
Classical Pianists
Beatles Songs covered by Joe Cocker
Stuck Inside of Mobile With The Memphis Blues Again
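
To make the template idea concrete, here is a small sketch that fills `${field}` patterns from rows of field values. Python's `string.Template` happens to use the same `${name}` placeholder syntax; nothing in the talk says the collector is built this way, and the rows below are hypothetical stand-ins for flattened pivot facet results.

```python
from string import Template


def expand(pattern, rows):
    """Fill a ${field} pattern from each row, skipping rows that lack a needed field."""
    template = Template(pattern)
    suggestions = []
    for row in rows:
        try:
            text = template.substitute(row)
        except KeyError:
            continue  # this row has no value for one of the pattern's fields
        if text not in suggestions:  # keep first occurrence, drop repeats
            suggestions.append(text)
    return suggestions


# Hypothetical rows, one per path through a pivot facet result.
ROWS = [
    {"performer": "Rolling Stones", "recording_type": "Album"},
    {"original_performer": "Beatles", "recording_type": "Song", "performer": "Joe Cocker"},
]
```

For example, `expand("${performer} ${recording_type}s", ROWS)` yields one suggestion per row, while the "covered by" pattern only fires for rows that carry an `original_performer` value.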

Suggester Builder - validate and contextualize
• Validate - make sure that the query works
• Contextualize - use facets to acquire “aboutness” stuff
Tests the query against the content collection
“Stuck Inside of Mobile With The Memphis Blues Again”
composition_type_ss: [
"Song"
]
composer_ss: [
"Bob Dylan"
]
genre_ss: [
"Blues Rock"
"Folk Rock"
]
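
The validate-and-contextualize step could look roughly like the sketch below: build a facet request for the candidate suggestion, then read the hit count and facet values out of the response. The request parameters (`rows`, `facet`, `facet.field`, `facet.mincount`) are standard Solr faceting parameters; the function names are hypothetical, and the parser assumes Solr's default JSON layout where each facet field is a flat list alternating value and count.

```python
def context_params(suggestion, facet_fields):
    """Request parameters that test a suggestion and facet on the context fields."""
    return {
        "q": f'"{suggestion}"',
        "rows": 0,                      # only counts and facets are needed
        "facet": "true",
        "facet.field": list(facet_fields),
        "facet.mincount": 1,
        "wt": "json",
    }


def extract_context(response):
    """Return (is_valid, context) from a parsed Solr JSON response.

    is_valid is True when the suggestion finds at least one document;
    context maps each facet field to the values that occurred.
    """
    num_found = response["response"]["numFound"]
    flat = response.get("facet_counts", {}).get("facet_fields", {})
    context = {}
    for field, pairs in flat.items():
        # pairs alternates value, count, value, count, ...
        values = [v for v, count in zip(pairs[0::2], pairs[1::2]) if count > 0]
        if values:
            context[field] = values
    return num_found > 0, context
```

A suggestion with zero hits is dropped; one that passes is indexed into the suggester collection together with its context values.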

Use Cases - User Context sensitive typeahead
User Permissions: Security Trimming of Suggestions
Faceting on ACL lists of content collection - copy set of ACL values for
suggestion result set to suggester collection
=> Don’t suggest queries that return 0 results for a given user
User Behavior: Dynamic On-The-Fly Predictive Analytics
Cache context facets returned by Suggester - use as boost queries for
subsequent queries in a user session
=> System learns “what” user is looking for
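
Both use cases can be sketched as request parameters. In the sketch below the cached context facets become `bq` boost queries (a standard dismax/edismax parameter) and the user's groups become an `fq` filter. The ACL field name `acl_ss`, the boost weight, and the function names are all assumptions for illustration.

```python
def session_boosts(context, weight=2.0):
    """Turn cached context facets into boost-query strings, one per field value."""
    return [
        f'{field}:"{value}"^{weight}'
        for field, values in sorted(context.items())
        for value in values
    ]


def acl_filter(user_groups, acl_field="acl_ss"):
    """Filter query keeping only suggestions visible to one of the user's groups.

    acl_field is a hypothetical field holding the ACL values copied from
    the content collection.
    """
    if not user_groups:
        raise ValueError("user has no groups; nothing can be suggested")
    quoted = " OR ".join(f'"{group}"' for group in user_groups)
    return f"{acl_field}:({quoted})"


def suggest_params(prefix, context, user_groups):
    """Parameters for a typeahead request that is boosted and security-trimmed."""
    return {
        "q": prefix,
        "defType": "edismax",
        "bq": session_boosts(context),
        "fq": acl_filter(user_groups),
    }
```

Boosting rather than filtering on the session context keeps the user free to change direction: earlier choices only rank later suggestions higher.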

Data Quality - Text - Metadata
Data design and curation - solve garbage in - garbage out at the
source.
More fields with more precise values - combine for
expressiveness
The Ole Structured vs. Unstructured bugaboo
Use Machine Learning / Knowledge Base Classification to add
metadata

Detecting and Consuming Context
Model Building
[Diagram labels: “MODEL”; Machine Learning; Subject Matter Experts; Training Set - “Seed Crystal”; Query; Documents; Feature Sets; Yes / No]
Model: Mapping of Text => Feature Sets

Query Autofiltering can be used as a “normalization” layer for classification
[Diagram labels: Query; Query Autofiltering; Solr / Lucene; Result Set; Document Classification Stages (Manual, ML, Ontology, Hybrid); Metadata Enrichment; (more) Structured Document Collection - The Model!]
=> Can think of the Solr/Lucene index itself as the “Model”

Thank you!
lucidworks.com
Ted Sullivan
Senior Solutions Architect
