Ted Talk
Ted Sullivan
(Well Before Back in the Day - 2018)
- Ted Sullivan, PhD
“(old Phuddy Duddy)”
“Senior (very much so I’m afraid) Solutions (I hope) Architect (and sometime plumber)”
- Ted Sullivan
When is my search app done?

“How do you get there grasshopper? Add semantic
intelligence to the engine!”
In his own words...
For the past 15 or so years now I have been building search applications, first with Verity K2 for a project
with a publishing company H.W. Wilson, then with most of the vendor products in the search space,
Ultraseek, Fast, Autonomy, Endeca, Vivisimo, MarkLogic and Exalead. I watched Lucene grow and
develop from an interesting little search engine to a major force in the search technology business. Before
that, I was building collaborative battlefield planning applications for the U.S. Army and before that I was
working on Internet stuff back in the dawn of the Web (well almost - 1994). I have been programming in
Java since 1995 and professionally since 1996 or so. I was learning JavaScript when Netscape was still
developing it, but only recently have begun to truly understand its power! John Resig and Bear Bibeault's
book "Secrets of the JavaScript Ninja" is a must read for anyone that wants to follow this path. Currently, I
am struggling up the AngularJS learning curve.
Before my work in the web with my friend Jim Spatz at Spatz Computer Graphics, I published some Math
games for kids on the original Mac OS, and before that, I did science - Auditory Neuroscience to be more
precise. I studied the auditory system of 'fly-by-night' critters, bats and owls first at Washington University in
St Louis, then at Caltech and Princeton. I was pretty good at Science but didn't like the writing part as
much as I should have. I had much more fun writing code (C, FORTRAN and PDP 8/11 assembler).
Currently, I am enjoying becoming part of the Open Source Revolution working at Lucidworks. Back in
1995 when Linux came out, I had a bet with my boss Jim Spatz about its future - I'm happy to say now that
I lost that bet. I would aspire to be an Open Source evangelist but there are enough of those already. I'll
settle for Solr Evangelist.
The Search Curmudgeon
• Learned
• Wise
• Pragmatic
• Caring
Random Rants from the
Search Curmudgeon
• https://lucidworks.com/2015/03/09/random-rants-search-curmudgeon/
• Search vs. Information Access
Data Science for
Dummies
• https://lucidworks.com/2016/09/06/data-science-for-dummies/
• "A conditional probability is like the probability
that you are a moron if you text while driving
(pretty high it turns out – and would be a good
source of Darwin awards except for the innocent
people that also suffer from this lunacy.)"
The Twilight of the Vengine Gods
(Die Göttervenginedämmerung) or
Die Hard with A Vengines!!!
• https://lucidworks.com/2016/10/18/the-twilight-of-the-vengine-gods-die-gottervenginedammerung/
• "The Curmudgeon doesn’t dispense news, he just
tells you what information, new or old sucks or
what pisses him off and then rants about it. "
Where did all the
Librarians go?
• https://lucidworks.com/2017/11/21/where-did-all-the-librarians-go/
• "You’ve probably gotten tired of me by now, that’s
OK because I’m tired of me too."
Search Legacy
• Blogs: as Search Curmudgeon and himself
• Lucidworks: heavy duty implementations
• Techniques: autophrasing and query autofiltering
• Presentations: Revolutions and inaugural Haystack
Automatic Phrase Tokenization:
Improving Lucene Search Precision
by More Precise Linguistic Analysis
• https://lucidworks.com/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
• Takeaway: moving from bag of words towards bag
of things
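A minimal sketch of the "bag of things" idea, not the actual auto-phrase-tokenfilter code: a greedy longest-match pass that collapses known multi-word phrases into single tokens. The function name `auto_phrase`, the underscore separator and the phrase list are illustrative assumptions.

```python
from typing import Iterable, List, Set, Tuple


def auto_phrase(tokens: List[str], phrases: Iterable[str], sep: str = "_") -> List[str]:
    """Greedily replace the longest known phrase at each position with one token."""
    # Store each known phrase as a tuple of lowercased words for exact window lookup.
    phrase_set: Set[Tuple[str, ...]] = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest window first so "new york city" wins over "new york".
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in phrase_set:
                out.append(sep.join(window))
                i += n
                matched = True
                break
        if not matched:
            # No phrase starts here: emit the single word unchanged (lowercased).
            out.append(tokens[i].lower())
            i += 1
    return out


if __name__ == "__main__":
    known = ["red sofa", "new york", "new york city"]  # hypothetical phrase list
    print(auto_phrase("cheap red sofa in New York City".split(), known))
```

With the phrase list above, "red sofa" and "new york city" each survive as one token, so a query for the thing no longer matches documents that merely contain the separate words.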
Solution for Multi-term Synonyms in
Lucene/Solr Using the Auto
Phrasing TokenFilter
• https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
• LUCENE-2605 & Friends resolved over two years
later
• split on whitespace = false
The Well Tempered Search
Application – Prelude
• https://lucidworks.com/2015/01/27/well-tempered-search-application-prelude/
• Semantic Search, linguistics, context
• Best Bets (landing pages / rules)
• Synonyms, stemming, lemmatization, taxonomy, ontology, machine learning
/ classification, NLP/AI
The Well Tempered Search
Application – Fugue
• https://lucidworks.com/2015/02/03/well-tempered-search-application-fugue/
• autophrasing
• "red sofa" problem
• Takeaway: ahead of its time (evolving into Solr Text Tagger and query
rewriting)
• "seed crystals of knowledge": SME tagging
Introducing Query
Autofiltering
• https://lucidworks.com/2015/02/17/introducing-query-autofiltering/
• "autotagging of the incoming query where the knowledge source is the
search index itself"
• we already have the information that we need to “do the right thing”
we just don’t use it
• "Another approach that was suggested by Erik Hatcher, is to have a
separate collection that is specialized as a knowledge store and query it to
get the categories with which to autofilter on the content collection."
• The key is that in both cases, we are using the search index itself as a
knowledge source that we can use for intelligent query introspection
and thus powerful inferential search!!
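The idea of using the index as its own knowledge source can be sketched roughly as follows. This is an illustration under assumptions, not the query-autofiltering-component itself: an in-memory dictionary stands in for the search index, only single-word field values are handled, and the choice to OR values of the same field is made here purely for the example. The field names and documents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_value_index(docs: List[Dict[str, str]], fields: List[str]) -> Dict[str, str]:
    """Map each lowercased field value to the field it was seen in."""
    value_to_field: Dict[str, str] = {}
    for doc in docs:
        for field in fields:
            if field in doc:
                value_to_field[doc[field].lower()] = field
    return value_to_field


def autofilter(query: str, value_to_field: Dict[str, str]) -> Tuple[List[str], str]:
    """Split a query into fielded filter clauses and leftover free text."""
    by_field: Dict[str, List[str]] = defaultdict(list)
    leftover: List[str] = []
    for term in query.lower().split():
        field = value_to_field.get(term)
        if field:
            by_field[field].append(term)  # term is a known value of this field
        else:
            leftover.append(term)         # unknown term stays in the text query
    # Assumption for this sketch: values of the same field are OR'ed together.
    filters = [
        f"{field}:(" + " OR ".join(values) + ")"
        for field, values in sorted(by_field.items())
    ]
    return filters, " ".join(leftover)


if __name__ == "__main__":
    docs = [  # hypothetical catalog records
        {"color": "red", "type": "socks"},
        {"color": "blue", "type": "shirt"},
    ]
    index = build_value_index(docs, ["color", "type"])
    print(autofilter("cheap red blue socks", index))
```

The terms the index recognizes become filters on the fields they belong to, and only the unrecognized remainder is left for ordinary text matching.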
Thoughts on “Search vs. Discovery”
• https://lucidworks.com/2015/03/02/thoughts-search-vs-discovery/
• "findability", facets, aboutness, relatedness
• "However if a document is not appropriately tagged, it
may become invisible..."; Data quality really matters here!
• Auto classification and manual subject matter expert
tagging
• Visualization, search driven analytics
Query Autofiltering Revisited
– Lets be more precise!!!
• https://lucidworks.com/2015/05/13/query-autofiltering-revisited-can-precise/
• "blue red lion socks"
Query Autofiltering Extended –
On Language and Logic in Search
• https://lucidworks.com/2015/06/06/query-autofiltering-extended-language-logic-search/
• If you've got metadata, use (autofilter) it. If you've
got known multi-word phrases, use them.
• Language usage understanding of AND vs. OR
Focusing on Search Quality at
Lucene/Solr Revolution 2015
• https://lucidworks.com/2015/10/19/focusing-on-search-quality-at-lucenesolr-revolution-2015/
• "Again, the “knowledge base” ... can be the Solr/
Lucene index itself!"
• “On-The-Fly Predictive Analytics” – as we say in
the search quality biz – its ALL about context!
Query Autofiltering IV:
A Novel Approach to NLP
• https://lucidworks.com/2015/11/19/query-autofiltering-chapter-4-a-novel-approach-to-natural-language-processing/
• Verbs
• Bob Dylan cover tunes
• Query Introspection: inferring user intent
• POS mapped to query fields
Pivoting to the Query: Using Pivot
Facets to build a Multi-Field
Suggester
• https://lucidworks.com/2016/08/12/pivoting-to-the-query-using-pivot-facets-to-build-a-multi-field-suggester/
• Pivot facets: "Think of it as a way of generating a facet
value “taxonomy” – on the fly."
• Facet Phrases
• Once we commit to building a special Solr collection (also
known as a ‘sidecar’ collection) just for typeahead, there
are other powerful search features that we now have to
work with. One of them is contextual metadata. [!!!]
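As a rough illustration of turning pivot facets into "facet phrases" for a suggester, the sketch below walks a nested pivot tree and emits one phrase per root-to-leaf path. The nested `field` / `value` / `count` / `pivot` shape mirrors the way Solr returns pivot facets in JSON; the function name, the sample fields and the counts are assumptions made for the example, and this is not the multifield_suggester_code.

```python
from typing import Any, Dict, Iterator, List, Tuple

Pivot = Dict[str, Any]


def facet_phrases(pivots: List[Pivot], prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, int]]:
    """Yield (phrase, count) for every root-to-leaf path in a pivot tree."""
    for node in pivots:
        path = prefix + (str(node["value"]),)
        children = node.get("pivot", [])
        if children:
            # Descend: each child value extends the phrase by one word group.
            yield from facet_phrases(children, path)
        else:
            # Leaf: the joined path is a phrase that really occurs in the data.
            yield " ".join(path), node["count"]


if __name__ == "__main__":
    # Hypothetical result of pivoting on "color,type".
    sample = [
        {"field": "color", "value": "red", "count": 5, "pivot": [
            {"field": "type", "value": "sofa", "count": 3},
            {"field": "type", "value": "chair", "count": 2},
        ]},
        {"field": "color", "value": "blue", "count": 1, "pivot": [
            {"field": "type", "value": "sofa", "count": 1},
        ]},
    ]
    for phrase, count in facet_phrases(sample):
        print(phrase, count)
```

Because every phrase comes from a combination of values that co-occur in at least one document, suggestions built this way cannot lead to an empty result set. The phrases could then be indexed into the sidecar collection alongside whatever contextual metadata is available.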
Building a Subject Classifier using
Automatically Discovered Keyword
Clusters, Part I
• https://lucidworks.com/2017/02/28/building-a-subject-classifier-using-automatically-discovered-keyword-clusters-part-i/
• subject classifier that uses automatically discovered
key term “clusters” that can then be used to classify
documents
• autophrasing + /terms....
• blah blah relatedness(...) blah blah
Why Facets are Even More
Fascinating than you Might Have
Thought
• https://lucidworks.com/2017/09/22/why-facets-are-even-more-fascinating-than-you-might-have-thought/
• Context matters!
• Spatial metaphor: N-Dimensional hyperspace
• "Paul McCartney" => "John Lennon"
• contextual usage of first result to boost second
• Facets and UI
• This is “surfin’ the meta-informational universe” that is your Solr collection.
• The Facet Theorem
When Worlds Collide – Artificial
Intelligence Meets Search
• https://lucidworks.com/2018/04/30/when-worlds-collide-artificial-intelligence-meets-search/
• The Search Loop: questions, answers, then more questions
• Inferring User Intent: NLP, POS, head-tail analysis, directed pattern-based
• Information Spaces: conceptually near
• Knowledge Spaces and Semantic Reference Frames
• Word Embedded Vectors
• Knowledge Graphs: taxonomies and ontologies
-Ted says
“Sh*t...”
“the Curmudgeon doesn’t dispense
news, he just tells you what
information, new or old sucks or
what pisses him off and then rants
about it. ”
“You may be thinking – "Who’s this
Search Curmudgeon guy? He’s a real
jerk". No argument there.”
“hey IT guys – Buy More Memory for
chrissake! Thanks to Moore’s Law it’s
pretty cheap now so don’t be such a
tight-ass”
“And the role of DBA will likely be
staffed by curmudgeons like me – so
be nice to them – they can save your
ass. We’ve seen our share of techno
cliff jumpers – it doesn’t end well.”
“what we old guys know is that some
of the hot things that you whiz kids
are doing now were done before, i.e.,
`back in the day`. ”
“You are not as smart as you think
you are kiddies – dual quad core, 3
GHz CPUs and 512 GB of RAM can
hide lots of coding sins. ”
“When I was your age sonny, we had
to walk three miles through snow to
submit our box of punch cards … talk
about crappy BAUD rates!)”
“....because in my opinion (notice that
I didn’t say ‘humble’ because that is
one thing that the Curmudgeon is
definitely NOT)...”
“I’m a humanist believe it or not – I
like humans even if they don’t like
me sometimes – I EARNED my
nickname of ‘curmudgeon’ you
know.”
“proper care and feeding of these
"analysis chains" can make you
some serious money – especially you
eCommerce guys”
“You’ve probably gotten tired of me
by now, that’s OK because I’m tired
of me too. Believe me, you don’t have
to live with me – I do.”
Ted on...
• IDOL: "should really be spelled IDLE"
• Fast vs. Solr: "One is named Fast, the other actually is fast"
• Endeca: "what took several hours in Endeca indexed in
about 10 minutes in Solr"
• elidedsearch: "The name of the company is like the material
that is used to hold up my Jockey Shorts (hint, hint)", Fruit-
of-the-Loom Finders, Tightie Whitie Quest, RubberBand
Finders, Brain Splitters, BungeeSeek
-Search Curmudgeon
Big Data: “50 foot tall Brent Spiner”
Ted's Big Adventure
• Semantics: bag of things, not bag of words
• synonyms, autophrasing, lemmatization
• "in text search – semantics matter"
• Linguistics: noun phrases, POS, NLP
• Facets
• autofiltering
• The Facet Theorem
• Relatedness
• Knowledge Space, Semantic Reference Frames
• Context matters
The Facet Theorem
• Lemma 1: Similar things tend to occur in similar
contexts
• Lemma 2: Facets are a tool for exploring meta-
informational contexts
• It therefore follows that:
• Theorem: Facets can be used to find similar things.
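One way to read the theorem in code, offered as a sketch rather than Ted's own method: describe each thing by the facet counts of the documents it appears in, then compare those context vectors with cosine similarity. The cosine measure, the function names and the counts below are all assumptions invented for illustration.

```python
import math
from typing import Dict

FacetCounts = Dict[str, int]


def cosine(a: FacetCounts, b: FacetCounts) -> float:
    """Cosine similarity between two sparse facet-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target: str, contexts: Dict[str, FacetCounts]) -> str:
    """Return the other item whose facet context is closest to the target's."""
    scores = {
        name: cosine(contexts[target], ctx)
        for name, ctx in contexts.items()
        if name != target
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up facet counts for documents mentioning each name.
    contexts = {
        "Paul McCartney": {"genre:rock": 40, "band:The Beatles": 30, "role:songwriter": 25},
        "John Lennon": {"genre:rock": 35, "band:The Beatles": 28, "role:songwriter": 20},
        "Miles Davis": {"genre:jazz": 50, "role:trumpeter": 30},
    }
    print(most_similar("Paul McCartney", contexts))
```

In a real collection the count vectors would come from faceting on the result set of a query for each name, which is Lemma 2 put to work on Lemma 1.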
PubTed
• https://github.com/lucidworks/
• auto-phrase-tokenfilter
• query-autofiltering-component (also SOLR-7539)
• https://github.com/detnavillus/
• multifield_suggester_code
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

Ted Talk

  • 2. Ted Sullivan (Well Before Back in the Day - 2018)
  • 3. - Ted Sullivan, PhD “(old Phuddy Duddy)” “Senior (very much so I’m afraid) Solutions (I hope) Architect (and sometime plumber)”
  • 4. - Ted Sullivan When is my search app done? “How do you get there grasshopper? Add semantic intelligence to the engine!”
  • 5. In his own words... For the past 15 or so years now I have been building search applications, first with Verity K2 for a project with a publishing company H.W. Wilson, then with most of the vendor products in the search space, Ultraseek, Fast, Autonomy, Endeca, Vivisimo, MarkLogic and Exalead. I watched Lucene grow and develop from an interesting little search engine to a major force in the search technology business. Before that, I was building collaborative battlefield planning applications for the U.S. Army and before that I was working on Internet stuff back in the dawn of the Web (well almost - 1994). I have been programming in Java since 1995 and professionally since 1996 or so. I was learning JavaScript when Netscape was still developing it, but only recently have begun to truly understand its power! John Resig and Bear Bibeault's book "Secrets of the JavaScript Ninja" is a must read for anyone that wants to follow this path. Currently, I am struggling up the AngularJS learning curve. Before my work in the web with my friend Jim Spatz at Spatz Computer Graphics, I published some Math games for kids on the original Mac OS, and before that, I did science - Auditory Neuroscience to be more precise. I studied the auditory system of 'fly-by-night' critters, bats and owls first at Washington University in St Louis, then at Caltech and Princeton. I was pretty good at Science but didn't like the writing part as much as I should have. I had much more fun writing code (C, FORTRAN and PDP 8/11 assembler). Currently, I am enjoying becoming part of the Open Source Revolution working at Lucidworks. Back in 1995 when Linux came out, I had a bet with my boss Jim Spatz about its future - I'm happy to say now that I lost that bet. I would aspire to be an Open Source evangelist but there are enough of those already. I'll settle for Solr Evangelist.
  • 6. The Search Curmudgeon • Learned • Wise • Pragmatic • Caring
  • 7. Random Rants from the Search Curmudgeon • https://lucidworks.com/2015/03/09/random-rants-search-curmudgeon/ • Search vs. Information Access
  • 8. Data Science for Dummies • https://lucidworks.com/2016/09/06/data-science-for-dummies/ • "A conditional probability is like the probability that you are a moron if you text while driving (pretty high it turns out – and would be a good source of Darwin awards except for the innocent people that also suffer from this lunacy.)"
  • 9. The Twilight of the Vengine Gods (Die Göttervenginedämmerung) or Die Hard with A Vengines!!! • https://lucidworks.com/2016/10/18/the-twilight-of-the-vengine-gods-die-gottervenginedammerung/ • "The Curmudgeon doesn’t dispense news, he just tells you what information, new or old sucks or what pisses him off and then rants about it."
  • 10. Where did all the Librarians go? • https://lucidworks.com/2017/11/21/where-did-all-the-librarians-go/ • "You’ve probably gotten tired of me by now, that’s OK because I’m tired of me too."
  • 11. Search Legacy • Blogs: as Search Curmudgeon and himself • Lucidworks: heavy duty implementations • Techniques: autophrasing and query autofiltering • Presentations: Revolutions and inaugural Haystack
  • 12. Automatic Phrase Tokenization: Improving Lucene Search Precision by More Precise Linguistic Analysis • https://lucidworks.com/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/ • Takeaway: moving from bag of words towards bag of things
  • 13. Solution for Multi-term Synonyms in Lucene/Solr Using the Auto Phrasing TokenFilter • https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/ • LUCENE-2605 & Friends resolved over two years later • split on whitespace = false
  • 14. The Well Tempered Search Application – Prelude • https://lucidworks.com/2015/01/27/well-tempered-search-application-prelude/ • Semantic Search, linguistics, context • Best Bets (landing pages / rules) • Synonyms, stemming, lemmatization, taxonomy, ontology, machine learning / classification, NLP/AI
  • 15. The Well Tempered Search Application – Fugue • https://lucidworks.com/2015/02/03/well-tempered-search-application-fugue/ • autophrasing • "red sofa" problem • Takeaway: ahead of its time (evolving into Solr Text Tagger and query rewriting) • "seed crystals of knowledge": SME tagging
  • 16. Introducing Query Autofiltering • https://lucidworks.com/2015/02/17/introducing-query-autofiltering/ • "autotagging of the incoming query where the knowledge source is the search index itself" • we already have the information that we need to “do the right thing” we just don’t use it • "Another approach that was suggested by Erik Hatcher, is to have a separate collection that is specialized as a knowledge store and query it to get the categories with which to autofilter on the content collection." • The key is that in both cases, we are using the search index itself as a knowledge source that we can use for intelligent query introspection and thus powerful inferential search!!
  • 17. Thoughts on “Search vs. Discovery” • https://lucidworks.com/2015/03/02/thoughts-search-vs-discovery/ • "findability", facets, aboutness, relatedness • "However if a document is not appropriately tagged, it may become invisible..."; Data quality really matters here! • Auto classification and manual subject matter expert tagging • Visualization, search driven analytics
  • 18. Query Autofiltering Revisited – Lets be more precise!!! • https://lucidworks.com/2015/05/13/query-autofiltering-revisited-can-precise/ • "blue red lion socks"
  • 19. Query Autofiltering Extended – On Language and Logic in Search • https://lucidworks.com/2015/06/06/query-autofiltering-extended-language-logic-search/ • If you've got metadata, use (autofilter) it. If you've got known multi-word phrases, use them. • Language usage understanding of AND vs. OR
  • 20. Focusing on Search Quality at Lucene/Solr Revolution 2015 • https://lucidworks.com/2015/10/19/focusing-on-search-quality-at-lucenesolr-revolution-2015/ • "Again, the “knowledge base” ... can be the Solr/Lucene index itself!" • “On-The-Fly Predictive Analytics” – as we say in the search quality biz – its ALL about context!
  • 21. Query Autofiltering IV: A Novel Approach to NLP • https://lucidworks.com/2015/11/19/query-autofiltering-chapter-4-a-novel-approach-to-natural-language-processing/ • Verbs • Bob Dylan cover tunes • Query Introspection: inferring user intent • POS mapped to query fields
  • 22. Pivoting to the Query: Using Pivot Facets to build a Multi-Field Suggester • https://lucidworks.com/2016/08/12/pivoting-to-the-query-using-pivot-facets-to-build-a-multi-field-suggester/ • Pivot facets: "Think of it as a way of generating a facet value “taxonomy” – on the fly." • Facet Phrases • Once we commit to building a special Solr collection (also known as a ‘sidecar’ collection) just for typeahead, there are other powerful search features that we now have to work with. One of them is contextual metadata. [!!!]
  • 23. Building a Subject Classifier using Automatically Discovered Keyword Clusters, Part I • https://lucidworks.com/2017/02/28/building-a-subject-classifier-using-automatically-discovered-keyword-clusters-part-i/ • subject classifier that uses automatically discovered key term “clusters” that can then be used to classify documents • autophrasing + /terms.... • blah blah relatedness(...) blah blah
  • 24. Why Facets are Even More Fascinating than you Might Have Thought • https://lucidworks.com/2017/09/22/why-facets-are-even-more-fascinating-than-you-might-have-thought/ • Context matters! • Spatial metaphor: N-Dimensional hyperspace • "Paul McCartney" => "John Lennon" • contextual usage of first result to boost second • Facets and UI • This is “surfin’ the meta-informational universe” that is your Solr collection. • The Facet Theorem
  • 25. When Worlds Collide – Artificial Intelligence Meets Search • https://lucidworks.com/2018/04/30/when-worlds-collide-artificial-intelligence-meets-search/ • The Search Loop: questions, answers, then more questions • Inferring User Intent: NLP, POS, head-tail analysis, directed pattern-based • Information Spaces: conceptually near • Knowledge Spaces and Semantic Reference Frames • Word Embedded Vectors • Knowledge Graphs: taxonomies and ontologies
  • 27. “the Curmudgeon doesn’t dispense news, he just tells you what information, new or old sucks or what pisses him off and then rants about it.”
  • 28. “You may be thinking – "Who’s this Search Curmudgeon guy? He’s a real jerk". No argument there.”
  • 29. “hey IT guys – Buy More Memory for chrissake! Thanks to Moore’s Law it’s pretty cheap now so don’t be such a tight-ass”
  • 30. “And the role of DBA will likely be staffed by curmudgeons like me – so be nice to them – they can save your ass. We’ve seen our share of techno cliff jumpers – it doesn’t end well.”
  • 31. “what we old guys know is that some of the hot things that you whiz kids are doing now were done before, i.e., `back in the day`.”
  • 32. “You are not as smart as you think you are kiddies – dual quad core, 3 GHz CPUs and 512 GB of RAM can hide lots of coding sins.”
  • 33. “When I was your age sonny, we had to walk three miles through snow to submit our box of punch cards … talk about crappy BAUD rates!)”
  • 34. “....because in my opinion (notice that I didn’t say ‘humble’ because that is one thing that the Curmudgeon is definitely NOT)...”
  • 35. “I’m a humanist believe it or not – I like humans even if they don’t like me sometimes – I EARNED my nickname of ‘curmudgeon’ you know.”
  • 36. “proper care and feeding of these "analysis chains" can make you some serious money – especially you eCommerce guys”
  • 37. “You’ve probably gotten tired of me by now, that’s OK because I’m tired of me too. Believe me, you don’t have to live with me – I do.”
  • 38. Ted on... • IDOL: "should really be spelled IDLE" • Fast vs. Solr: "One is named Fast, the other actually is fast" • Endeca: "what took several hours in Endeca indexed in about 10 minutes in Solr" • elidedsearch: "The name of the company is like the material that is used to hold up my Jockey Shorts (hint, hint)", Fruit-of-the-Loom Finders, Tightie Whitie Quest, RubberBand Finders, Brain Splitters, BungeeSeek
  • 39. - Search Curmudgeon Big Data: “50 foot tall Brent Spiner”
  • 40. Ted's Big Adventure • Semantics: bag of things, not bag of words • synonyms, autophrasing, lemmatization • "in text search – semantics matter" • Linguistics: noun phrases, POS, NLP • Facets • autofiltering • The Facet Theorem • Relatedness • Knowledge Space, Semantic Reference Frames • Context matters
  • 41. The Facet Theorem • Lemma 1: Similar things tend to occur in similar contexts • Lemma 2: Facets are a tool for exploring meta-informational contexts • It therefore follows that: • Theorem: Facets can be used to find similar things.
  • 42. PubTed • https://github.com/lucidworks/ • auto-phrase-tokenfilter • query-autofiltering-component (also SOLR-7539) • https://github.com/detnavillus/ • multifield_suggester_code
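
The query autofiltering idea from slides 16 and 18 (using the index's own field values to tag the incoming query) can be sketched in a few lines. This is a toy illustration only, not the code in query-autofiltering-component: the in-memory DOCS list stands in for a Solr collection, and the field names, the helper functions build_value_map, autofilter and search, and the longest-phrase-first matching rule are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy "index"; a real deployment would consult Solr instead.
DOCS = [
    {"id": 1, "color": "red", "product_type": "sofa", "brand": "acme"},
    {"id": 2, "color": "blue", "product_type": "sofa", "brand": "acme"},
    {"id": 3, "color": "blue", "product_type": "socks", "brand": "red lion"},
    {"id": 4, "color": "red", "product_type": "socks", "brand": "acme"},
]
FIELDS = ("color", "product_type", "brand")


def build_value_map(docs, fields):
    """Map each indexed field value to the set of fields it occurs in."""
    value_map = defaultdict(set)
    for doc in docs:
        for field in fields:
            value = doc.get(field)
            if value:
                value_map[value.lower()].add(field)
    return value_map


def autofilter(query, value_map, max_phrase_len=3):
    """Tag query terms with the fields whose values they match.

    Longer phrases are tried first, so "red lion" is recognised as a brand
    before "red" can be taken as a color. Returns (filters, leftover), where
    filters is a list of (phrase, [fields]) and leftover holds unmatched terms.
    """
    tokens = query.lower().split()
    filters, leftover = [], []
    i = 0
    while i < len(tokens):
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in value_map:
                filters.append((phrase, sorted(value_map[phrase])))
                i += n
                break
        else:
            # No known field value starts at this token.
            leftover.append(tokens[i])
            i += 1
    return filters, leftover


def search(docs, filters):
    """Keep documents that satisfy every filter (any listed field may match)."""
    return [
        doc for doc in docs
        if all(any(doc.get(f, "").lower() == phrase for f in fields)
               for phrase, fields in filters)
    ]


if __name__ == "__main__":
    vm = build_value_map(DOCS, FIELDS)
    filters, leftover = autofilter("blue red lion socks", vm)
    print(filters, leftover, [d["id"] for d in search(DOCS, filters)])
```

With this toy data, "blue red lion socks" is tagged as color, brand and product type, and only document 3 survives the filters. A plain bag-of-words match on "red" would also have pulled in the red socks and the red sofa.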