Retrieving
Information from Solr
JOSA Data Science Bootcamp
● Head of Technology @
OpenSooq.com
● Technical Reviewer for
“Scaling Apache Solr” and
“Apache Solr Search Patterns”
(Books)
● Contributor to Apache Solr
● Built 10 search engines in the
last 2 years
Ramzi Alqrainy
Topics to be covered
● Exploring Solr’s Query Form
● Basic Queries and Parameters
● Matching Multiple Terms
● Fuzzy Matching
● Range Searches
● Sorting
● Pseudo Fields
● Geospatial Searches
● Filter Queries
● Faceting and Stats
● Tuning Relevance
Detailed Architectural Diagram
Basic Queries and Parameters
Exploring Solr’s Query Form
Basic Queries and Parameters
Matching Multiple Terms
Boolean Queries
● Search for two different terms, new and house, requiring both to
match
● Search for two different terms, new and house, requiring only one
to match
● Default operator is OR, can be changed using the q.op query
parameter.
Negation
● Exclude documents containing specific terms
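The Boolean and negation forms above can be sketched as request parameters. This is a minimal sketch assuming Solr's standard query parser; the helper `solr_params` and the term `rental` are hypothetical, and nothing is sent to a server.

```python
from urllib.parse import urlencode

def solr_params(q, **extra):
    """Return a URL-encoded query string for a hypothetical /select handler."""
    params = {"q": q}
    params.update(extra)
    return urlencode(params)

# Both terms are required.
both = solr_params("new AND house")
# Either term is enough (OR is the default operator).
either = solr_params("new OR house")
# Same terms, but the default operator is switched to AND through q.op.
both_via_qop = solr_params("new house", **{"q.op": "AND"})
# Negation: match "house" but exclude documents containing "rental".
negated = solr_params("house -rental")

print(both)          # q=new+AND+house
print(both_via_qop)  # q=new+house&q.op=AND
print(negated)       # q=house+-rental
```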
Inverted Index—Revisited
● All terms in the index map to 1 or more documents.
● Terms in inverted index are stored in ascending
lexicographical order
● When searching for multiple terms/ expressions, Solr (and
Lucene) returns multiple document result sets corresponding
to the various terms in the query and then does the specified
binary operations on these result sets in order to generate the
final result set.
● Scoring is performed on the result set to generate the final ranked result
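The set-based evaluation described above can be illustrated with a toy inverted index. This is a simplified sketch over a hypothetical three-document corpus, not how Lucene stores its index internally.

```python
from collections import defaultdict

# Hypothetical three-document corpus.
docs = {
    1: "new house for sale",
    2: "old house with new roof",
    3: "new car",
}

# Build the inverted index: term -> set of document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# Terms listed in ascending lexicographic order.
sorted_terms = sorted(index)

# Boolean operators become set operations on the per-term result sets.
new_and_house = index["new"] & index["house"]
new_or_house = index["new"] | index["house"]
new_not_house = index["new"] - index["house"]

print(sorted(new_and_house))  # [1, 2]
print(sorted(new_or_house))   # [1, 2, 3]
print(sorted(new_not_house))  # [3]
```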
Grouped Expressions
● Represent arbitrarily complex queries
Exact Phrase Queries
● Search for exact phrase “new house”
● Can Combine with Boolean Queries
Proximity Searches
● Search for terms that occur within a given distance of each other
● Solr/Lucene not only stores the documents that contain the
terms, but also their positions within a document (term
positions), which is used to provide phrase and proximity
search functionality
● The number after the “~” is called the slop factor: the maximum
number of term-position moves allowed for a document to still
match. The hard limit of 2 applies to fuzzy edit distances, not to
slop; larger slop values match more loosely and can be slower
to evaluate
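To show why stored term positions make phrase and proximity matching possible, here is a toy positional index. The corpus and the `within` helper are hypothetical, and the gap check is a deliberate simplification: it is not Lucene's actual slop computation.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def within(index, first, second, max_gap):
    """Doc ids where `second` occurs at most `max_gap` positions after `first`."""
    hits = set()
    for doc_id, first_positions in index.get(first, {}).items():
        second_positions = index.get(second, {}).get(doc_id, [])
        if any(0 < b - a <= max_gap
               for a in first_positions for b in second_positions):
            hits.add(doc_id)
    return hits

docs = {
    1: "new house for sale",
    2: "new two bedroom house",
    3: "house with a new roof",
}
index = build_positional_index(docs)

print(sorted(within(index, "new", "house", 1)))  # [1]     adjacent, like a phrase
print(sorted(within(index, "new", "house", 3)))  # [1, 2]  looser proximity
```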
Fuzzy Matching
Fuzzy Edit-Distance Searching
● Flexibility to handle misspellings and different spellings of a
word
● Character variations based on Damerau-Levenshtein
distances, with a maximum edit distance of 2
● An edit distance of 1 accounts for roughly 80% of human misspellings
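The edit distance behind fuzzy matching can be sketched as follows. This is the optimal string alignment variant of Damerau-Levenshtein (insert, delete, substitute, transpose adjacent characters); it is an illustration only, and the function name `osa_distance` is hypothetical.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute, transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

print(osa_distance("house", "huose"))   # 1 (swapped letters)
print(osa_distance("house", "horse"))   # 1 (one substitution)
print(osa_distance("house", "mouses"))  # 2 (substitution plus insertion)
```

With a maximum distance of 2, a fuzzy query for "house" could therefore match all three variants.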
Wildcard Matching
● Robust functionality, but can be expensive if not properly
used.
○ First all terms that match parts of the term before wildcard expression are
extracted
○ Then all those terms are inspected to see if they match the entire wildcard
expression
○ Expensive if your expression matches a large number of terms (for
example the query e*)
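The two steps above can be sketched over a sorted term list. This is a toy illustration with a hypothetical `expand_wildcard` helper and term list; it assumes patterns contain only `*` and `?` as special characters and does not reflect Lucene's internal implementation.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase

def expand_wildcard(sorted_terms, pattern):
    """Return the indexed terms that match a pattern containing * or ?."""
    # Step 1: the literal prefix before the first wildcard narrows the scan.
    cut = min((pattern.index(c) for c in "*?" if c in pattern),
              default=len(pattern))
    prefix = pattern[:cut]
    start = bisect_left(sorted_terms, prefix)
    candidates = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        candidates.append(term)
    # Step 2: test every candidate against the whole pattern.
    return [t for t in candidates if fnmatchcase(t, pattern)]

terms = sorted(["each", "east", "engine", "enter", "entry", "house", "new"])

print(expand_wildcard(terms, "ent*y"))  # ['entry']
print(expand_wildcard(terms, "e*"))     # ['each', 'east', 'engine', 'enter', 'entry']
```

The second call shows the cost problem: a short prefix such as `e` leaves most of the term list as candidates.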
Range Searches
Query on a Range
● Solr Date Time uses a format that is a restricted form of the canonical
representation of dateTime in the XML Schema specification (inspired by ISO
8601). All times are assumed to be UTC (no timezone specification)
● Based on a lexicographically sorted order for the field being queried
● Solr has Trie field types (tint, tdate, etc.) that should be used when you are
doing a large number of range queries (Solr 7 and later replace them with Point
field types such as pint and pdate)
● Various field types will be covered later in the course
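Because ranges follow the sort order of the field, the field type matters. The sketch below uses hypothetical prices to show how a range such as `price:[10 TO 25]` behaves when the values are compared as strings rather than as numbers; it runs locally and does not query Solr.

```python
# Hypothetical prices stored as plain strings and as integers.
prices_as_text = ["9", "10", "25", "100", "250"]
prices_as_int = [int(p) for p in prices_as_text]

# A string field compares character by character, so "100" < "25" < "9".
text_order = sorted(prices_as_text)
text_hits = [p for p in text_order if "10" <= p <= "25"]

# A numeric field compares by value.
int_hits = [p for p in sorted(prices_as_int) if 10 <= p <= 25]

print(text_order)  # ['10', '100', '25', '250', '9']
print(text_hits)   # ['10', '100', '25']
print(int_hits)    # [10, 25]
```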
Solr Date Syntax
● Uses UTC and Restricted DateTime format
● Allows rounding down by YEAR, MONTH, DAY, HOUR,
MINUTE, SECOND
● NOW represents current time and using DateMath, we can
specify yesterday, tomorrow, last year, etc.
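A few date ranges written with DateMath might look like the following. This is a sketch only: `created` is a hypothetical date field and `date_range` is a hypothetical helper that formats the range clause without contacting Solr.

```python
def date_range(field, start, end, inclusive=True):
    """Format a Solr-style range clause; square brackets include the bounds."""
    open_b, close_b = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_b}{start} TO {end}{close_b}"

# From midnight (UTC) seven days ago until now.
last_week = date_range("created", "NOW-7DAYS/DAY", "NOW")
# From the start of the current month until now.
this_month = date_range("created", "NOW/MONTH", "NOW")
# Explicit UTC timestamps in the restricted dateTime format.
year_2016 = date_range("created", "2016-01-01T00:00:00Z", "2016-12-31T23:59:59Z")

print(last_week)   # created:[NOW-7DAYS/DAY TO NOW]
print(this_month)  # created:[NOW/MONTH TO NOW]
print(year_2016)   # created:[2016-01-01T00:00:00Z TO 2016-12-31T23:59:59Z]
```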
Sorting
Sorting
● Sort by score
● Values of Fields
● Ascending or Descending
● Multiple Fields
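The sort options above combine into a single `sort` parameter. A minimal sketch, assuming a hypothetical `price` field; the helper `sort_clause` is illustrative and only builds the parameter value.

```python
from urllib.parse import urlencode

def sort_clause(*keys):
    """Join (field, direction) pairs into a Solr sort parameter value."""
    for _, direction in keys:
        if direction not in ("asc", "desc"):
            raise ValueError(f"direction must be asc or desc, got {direction!r}")
    return ", ".join(f"{field} {direction}" for field, direction in keys)

# Best score first, ties broken by the lower price.
sort_value = sort_clause(("score", "desc"), ("price", "asc"))
query_string = urlencode({"q": "house", "sort": sort_value})

print(sort_value)    # score desc, price asc
print(query_string)  # q=house&sort=score+desc%2C+price+asc
```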
Pseudo Fields
● Dynamically added at query time and calculated from fields in
the schema using built-in functions
● Through functions, you can manipulate the values of any field
before it is returned
● Can also be used to modify the order of documents by sorting
on the pseudo field
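A pseudo field is requested through the `fl` parameter and can be reused in `sort`. The sketch below assumes hypothetical schema fields `id`, `name`, and `price`, and uses Solr's `product` function to compute a 10% discount; it only builds the request parameters.

```python
from urllib.parse import urlencode

params = {
    "q": "house",
    # Return a computed value under the alias `discounted` next to stored fields.
    "fl": "id,name,discounted:product(price,0.9)",
    # Order the results by the same function instead of a stored field.
    "sort": "product(price,0.9) asc",
}
query_string = urlencode(params)
print(query_string)
```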
Geospatial Searches
Geospatial Searches
● Solr provides location-based search
● Define a “location” field that contains latitude and longitude
● You can use a Query parser called “geofilt” to search on this
field, specifying the point and radius around it
● Another query parser bbox uses a square around the point to
do faster but approximate calculations
● Other types of searches (grids, polygons, etc.) are possible
and are covered in the advanced course
Returning Calculated Distances
● You can use a pseudo field (a field that is calculated at query
time) to achieve this
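A radius search that also returns and sorts by the computed distance could be assembled as below. This is a sketch under assumptions: `location` is a hypothetical lat/lon field, the center point and radius are made up, and `nearby_params` is an illustrative helper that only builds parameters.

```python
from urllib.parse import urlencode

def nearby_params(lat, lon, radius_km, field="location"):
    """Parameters for a radius search that also returns and sorts by distance."""
    return {
        "q": "*:*",
        "fq": "{!geofilt}",         # swap in "{!bbox}" for the bounding-box filter
        "sfield": field,            # spatial field holding "lat,lon"
        "pt": f"{lat},{lon}",       # center point
        "d": str(radius_km),        # radius, in kilometers for lat/lon field types
        "fl": "id,dist:geodist()",  # pseudo field with the computed distance
        "sort": "geodist() asc",    # nearest first
    }

# Hypothetical center point and a 5 km radius.
params = nearby_params(31.9539, 35.9106, 5)
print(urlencode(params))
```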
Filter Queries
The fq and q Parameters
● Indistinguishable at first glance: the same query passed to either
parameter returns the same documents.
● But,
○ fq serves a single purpose, to limit what is returned
○ q limits what is returned AND supplies the relevancy algorithm with a set
of terms used for scoring
● fq results are cached and can be reused between searches
● Using fq we can avoid unnecessary relevancy calculations
● You can use multiple fq’s in a request (each individually cached), but only one q
parameter
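The split between one scored `q` and several cached `fq` filters might look like this in a request. The field names `city` and `price` are hypothetical, and the snippet only encodes the parameters.

```python
from urllib.parse import urlencode

params = {
    # Scored: these terms feed the relevancy calculation.
    "q": "new house",
    # Not scored: each filter only restricts the result set and is cached separately.
    "fq": ["city:amman", "price:[50000 TO 100000]"],
}
# doseq=True repeats the fq key once per filter.
query_string = urlencode(params, doseq=True)
print(query_string)
# q=new+house&fq=city%3Aamman&fq=price%3A%5B50000+TO+100000%5D
```

Keeping each restriction in its own `fq` lets a later search with a different `q` or a different price range still reuse the cached `city:amman` filter.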
Faceting and Stats
Faceted Search
● High-level breakdown of search
results based on one or more
aspects (facets) of their
documents
● Allows users to filter by (drill
down into) specific components
● Can facet on values of fields, or
facet by queries
Types of Facet
● Field Facets
● Range Facets
● Pivot Facets
Field Faceting
● Request back the unique values
found in a particular field
● Most commonly used
● Works for single- and multi-valued
fields
● Values are based on the indexed
values of the field
● Common practice is to facet on a
String field and search on a text field
(to be discussed later). So, some
schema preparation is required for
faceting
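A field-facet request following that practice could be built as below. The field names are hypothetical: `description` stands for an analyzed text field used for searching, while `city_s` and `category_s` stand for string fields used for faceting.

```python
from urllib.parse import urlencode

params = {
    "q": "description:house",
    "rows": 0,                # only the facet counts are wanted
    "facet": "true",
    "facet.field": ["city_s", "category_s"],
    "facet.mincount": 1,      # hide values with no matching documents
    "facet.limit": 10,        # top ten values per field
}
query_string = urlencode(params, doseq=True)
print(query_string)
```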
Range Facet
● Divide a range into equal size buckets
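Equal-size buckets are described by a start, an end, and a gap. The sketch below builds those parameters for a hypothetical `price` field and lists the bucket lower bounds they imply; both helper functions are illustrative.

```python
def range_facet_params(field, start, end, gap):
    """Build range-facet parameters for equal-sized buckets on one field."""
    prefix = f"f.{field}.facet.range"
    return {
        "facet": "true",
        "facet.range": field,
        f"{prefix}.start": start,
        f"{prefix}.end": end,
        f"{prefix}.gap": gap,
    }

def bucket_starts(start, end, gap):
    """Lower bounds of the buckets the parameters above describe."""
    return list(range(start, end, gap))

params = range_facet_params("price", 0, 500, 100)
print(params["f.price.facet.range.gap"])  # 100
print(bucket_starts(0, 500, 100))         # [0, 100, 200, 300, 400]
```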
Range Faceting
Date Range Facets
● Recall Solr Date Syntax covered earlier in class
● Uses UTC and Restricted DateTime format
● Allows rounding down by YEAR, MONTH, DAY, HOUR,
MINUTE, SECOND
● NOW represents current time and using DateMath, we can
specify yesterday, tomorrow, last year, etc.
Stats and Facets
● Can get aggregations on various fields
● From Solr 5.x onwards, stats on pivot facets are also available
● See https://lucidworks.com/blog/you-got-stats-in-my-facets/ for
a great explanation of faceting
Pivot Facets
● Functions like pivot tables in spreadsheet apps
● Aggregate calculations that pivot on values from multiple fields
● Example: give me a count of 3-, 4- and 5-star hotels in the top
three cities
● Solr 5.x also allows you to run stats calculations on pivots
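The hotel example can be sketched in two parts: the request parameters, and a local emulation of the nested counts a pivot produces. The fields `city` and `stars` and the hotel data are hypothetical, and `facet.limit` keeps the largest values at each level rather than selecting specific star ratings.

```python
from collections import Counter, defaultdict
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.pivot": "city,stars",  # count star ratings within each city
    "facet.limit": 3,             # keep the three largest values per level
}
query_string = urlencode(params)

# Local emulation of the pivot: city -> stars -> count.
hotels = [("Amman", 5), ("Amman", 4), ("Amman", 5),
          ("Aqaba", 3), ("Aqaba", 5), ("Irbid", 4)]
pivot = defaultdict(Counter)
for city, stars in hotels:
    pivot[city][stars] += 1

print(query_string)
print(dict(pivot["Amman"]))  # {5: 2, 4: 1}
```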
Facet by Query
● Sometimes, you need unequal ranges
● You can use the facet.query parameter
● Provides counts for subqueries
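Unequal ranges can be requested with one `facet.query` per band. The price bands below are hypothetical; each uses an inclusive lower bound and an exclusive upper bound so that the bands do not overlap.

```python
from urllib.parse import urlencode

# Hypothetical unequal price bands on a `price` field.
bands = ["price:[* TO 100}", "price:[100 TO 1000}", "price:[1000 TO *]"]
params = {"q": "*:*", "rows": 0, "facet": "true", "facet.query": bands}
query_string = urlencode(params, doseq=True)

print(query_string.count("facet.query="))  # 3
```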
Tuning Relevance
Precision and recall
Precision
Are the top results we show to users relevant?
Recall
Of the full set of documents found, have we found all of the
relevant content in the index?
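Both measures can be computed from a ranked result list and a set of relevance judgments. This is a minimal sketch with hypothetical document ids and judgments.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / len(top_k)

def recall(returned_ids, relevant_ids):
    """Fraction of all relevant documents that were returned."""
    if not relevant_ids:
        return 0.0
    return len(set(returned_ids) & set(relevant_ids)) / len(relevant_ids)

ranked = [7, 3, 9, 1, 4]  # ids in the order the engine returned them
relevant = {3, 4, 8, 9}   # ids a human judged relevant

print(precision_at_k(ranked, relevant, 3))  # 0.6666666666666666
print(recall(ranked, relevant))             # 0.75
```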
Relevancy
Our goal is to give users relevant results
Relevance is a soft or fuzzy thing
● Depends upon the judgment of users
Scoring is our attempt to predict relevance
Similarity classes hold the implementations
● DefaultSimilarity (TF-IDF)
● BM25Similarity
● DFRSimilarity
● IBSimilarity
● LMDirichletSimilarity
● LMJelinekMercerSimilarity
Lucene Scoring
Similarity scoring formula
● Used to rank results by measuring the similarity between a query
and the documents that match the query
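To make the idea of similarity scoring concrete, here is a simplified TF-IDF scorer. It is an illustration under stated assumptions, not Lucene's exact formula: it omits length normalization, query normalization, and boosts, and the corpus is hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, docs):
    """Score each document as the sum of tf * idf over the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            df = sum(1 for toks in tokenized.values() if term in toks)
            if df == 0 or counts[term] == 0:
                continue
            tf = math.sqrt(counts[term])       # dampened term frequency
            idf = 1.0 + math.log(n_docs / df)  # rarer terms weigh more
            score += tf * idf
        scores[doc_id] = score
    return scores

docs = {1: "new house new roof", 2: "old house", 3: "old car"}
scores = tf_idf_scores(["new", "house"], docs)
ranking = sorted(scores, key=scores.get, reverse=True)

print(ranking)  # [1, 2, 3]
```

Document 1 ranks first because it contains both query terms and repeats the rarer one; document 3 matches neither term and scores zero.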
Domain knowledge
Examples
● Cheaper
● Newer or more recent
● More popular or higher user clicks
● Higher average user ratings
Interesting combinations
● Value = average user ratings ÷ price
● Staying power = recent popularity ÷ age
Boosting and biasing
Lucene uses a standardized scoring approach
Lucene does not know:
● Your data
● Your users
● Their queries
● Their preferences
Domain knowledge
What do you know about your data?
● Any specific rules about your data that wouldn't be suitable in
a generic IR scoring algorithm
● In many data domains, there are fundamental numeric
properties that make some objects generally "better" than
others
Domain knowledge
More subtle examples
● Novelty factor
○ Quantity of user ratings × stdDev of ratings
● Profit margin
○ Retail price ‒ factory cost
● Scarcity
○ Quantity remaining
● Popularity by association or categorization
○ Sweaters sell better than swimsuits in November
● Manual ranking
○ New York Times bestseller list
Request parameters
We are going to make substantial use of request parameters, so
let's recap:
How can you improve search results?
Using a sledgehammer
● Ignore score, sort on X
● Filter by X, retry if 0 results
How can you improve search results?
● Boost functions and queries
● Apply domain knowledge based on numeric properties by
multiplying functions directly into the score
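Multiplying a function into the score could be expressed with the eDisMax parser's `boost` parameter. The sketch below reuses the "value = average user ratings ÷ price" idea; the fields `title`, `description`, `avg_rating`, and `price` are hypothetical, and `boosted` is a local illustration of the multiplication rather than Solr's own scoring code.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "new house",
    "qf": "title^2 description",
    # Multiplied into the relevancy score: value = average rating / price.
    "boost": "div(avg_rating,price)",
}
query_string = urlencode(params)

def boosted(score, avg_rating, price):
    """Local illustration of a multiplicative boost on a relevancy score."""
    return score * (avg_rating / price)

print(query_string)
# Same base score, different value for money.
print(boosted(2.0, 4.5, 90.0) > boosted(2.0, 4.5, 180.0))  # True
```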

 
MemSQL
MemSQLMemSQL
MemSQL
 
Arabic Content with Apache Solr
Arabic Content with Apache SolrArabic Content with Apache Solr
Arabic Content with Apache Solr
 
Recommender Systems, Part 1 - Introduction to approaches and algorithms
Recommender Systems, Part 1 - Introduction to approaches and algorithmsRecommender Systems, Part 1 - Introduction to approaches and algorithms
Recommender Systems, Part 1 - Introduction to approaches and algorithms
 
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
 
Evaluating Search Engines
Evaluating Search EnginesEvaluating Search Engines
Evaluating Search Engines
 
Starting From Zero - Winning Strategies for Zero Results Page
Starting From Zero - Winning Strategies for Zero Results PageStarting From Zero - Winning Strategies for Zero Results Page
Starting From Zero - Winning Strategies for Zero Results Page
 
Search Behavior Patterns
Search Behavior PatternsSearch Behavior Patterns
Search Behavior Patterns
 
Intel microprocessor history
Intel microprocessor historyIntel microprocessor history
Intel microprocessor history
 
How to prevent the cache problem in AJAX
How to prevent the cache problem in AJAXHow to prevent the cache problem in AJAX
How to prevent the cache problem in AJAX
 
Linked stacks and queues
Linked stacks and queuesLinked stacks and queues
Linked stacks and queues
 
Advance Data Structure
Advance Data StructureAdvance Data Structure
Advance Data Structure
 
Hashing
HashingHashing
Hashing
 
Markov Matrix
Markov MatrixMarkov Matrix
Markov Matrix
 
STACK
STACKSTACK
STACK
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Retrieving Information From Solr

  • 2. ● Head of Technology @ OpenSooq.com ● Technical Reviewer for “Scaling Apache Solr” and “Apache Solr Search Patterns” (Books) ● Contributor in Apache Solr ● Built 10 search engines in the last 2 years Ramzi Alqrainy
  • 3. Topics to be covered ● Exploring Solr’s Query Form ● Basic Queries and Parameters ● Matching Multiple Terms ● Fuzzy Matching ● Range Searches ● Sorting ● Pseudo Fields ● Geospatial Searches ● Filter Queries ● Faceting and Stats ● Tuning Relevance
  • 5. Basic Queries and Parameters
  • 7. Basic Queries and Parameters
  • 9. Boolean Queries ● Search for two different terms, new and house, requiring both to match ● Search for two different terms, new and house, requiring only one to match ● The default operator is OR; it can be changed using the q.op query parameter.
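The three cases above can be written as request parameters. This is a minimal sketch that only builds URLs with Python's standard library and sends nothing; the host, port and the `listings` collection name are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and the "listings" collection are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/listings/select"

def select_url(params):
    """Return a /select URL for the given request parameters."""
    return SOLR_SELECT + "?" + urlencode(params)

# Both terms must match.
both_terms = select_url({"q": "new AND house"})
# Either term may match (OR is the default operator).
either_term = select_url({"q": "new OR house"})
# Same effect as the first query, obtained by switching the default operator.
default_and = select_url({"q": "new house", "q.op": "AND"})

for url in (both_terms, either_term, default_and):
    print(url)
```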
  • 10. Negation ● Exclude documents containing specific terms
  • 11. Inverted Index: Revisited ● All terms in the index map to 1 or more documents. ● Terms in the inverted index are stored in ascending lexicographical order ● When searching for multiple terms/expressions, Solr (and Lucene) returns multiple document result sets corresponding to the various terms in the query and then does the specified binary operations on these result sets in order to generate the final result set. ● Scoring is performed on the result set to generate the final result
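The idea above can be illustrated with a toy inverted index. This sketch uses plain Python sets rather than Lucene's actual data structures, and the three documents are made up.

```python
from collections import defaultdict

# Made-up documents, keyed by document id.
docs = {
    1: "new house for sale",
    2: "old house near the new park",
    3: "new car",
}

def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Keep terms in ascending lexicographical order, as the slide describes.
    return dict(sorted(index.items()))

index = build_inverted_index(docs)

# One result set per term, combined with the requested set operation.
new_and_house = index["new"] & index["house"]
new_or_house = index["new"] | index["house"]
new_not_house = index["new"] - index["house"]

print(new_and_house, new_or_house, new_not_house)
```

Scoring would then run only over the documents that survive these set operations.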
  • 12. Grouped Expressions ● Represent arbitrarily complex queries
  • 13. Exact Phrase Queries ● Search for exact phrase “new house” ● Can Combine with Boolean Queries
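Negation, grouping and phrase queries from the last few slides look like this in the `q` parameter. This is a sketch only; the terms `rent`, `apartment` and `sold` are illustrative and not from the slides.

```python
from urllib.parse import urlencode

queries = {
    # Exclude documents containing "rent".
    "negation": "house -rent",
    # Parentheses group sub-expressions.
    "grouped": "new AND (house OR apartment)",
    # Double quotes require the exact phrase.
    "phrase": '"new house"',
    # A phrase combined with a Boolean operator.
    "phrase_and_boolean": '"new house" AND NOT sold',
}

# URL-encode each query the way it would appear in a request.
encoded = {name: urlencode({"q": q}) for name, q in queries.items()}

for name, query_string in encoded.items():
    print(f"{name:20} {query_string}")
```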
  • 14. Proximity Searches ● Represent arbitrarily complex queries ● Solr/Lucene not only stores the documents that contain the terms, but also their positions within a document (term positions), which is used to provide phrase and proximity search functionality ● The number after the “~” is called the slop factor: the maximum number of position moves allowed between the terms of the phrase. Larger slop values match more loosely and cost more to evaluate. The hard limit of 2 applies to fuzzy edit distances (covered next), not to phrase slop
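Term positions are what make this possible. The sketch below is a simplification: it only handles two terms appearing in order, whereas Lucene's slop also allows reordering at a higher cost. The sample texts are made up.

```python
def term_positions(text):
    """Map each term to the list of positions where it occurs."""
    positions = {}
    for pos, term in enumerate(text.lower().split()):
        positions.setdefault(term, []).append(pos)
    return positions

def in_order_within_slop(text, first, second, slop):
    """True if `second` follows `first` with at most `slop` terms in between."""
    positions = term_positions(text)
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )

exact = in_order_within_slop("a new house", "new", "house", 0)
loose = in_order_within_slop("new red brick house", "new", "house", 2)
too_far = in_order_within_slop("new red brick house", "new", "house", 1)

# The equivalent Solr query string for a slop of 2:
proximity_query = '"new house"~2'
print(exact, loose, too_far, proximity_query)
```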
  • 16. Fuzzy Edit-Distance Searching ● Flexibility to handle misspellings and different spellings of a word ● Character variations based on Damerau-Levenshtein distances ● Accounts for 80% of human misspellings
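To make the distance concrete, here is a sketch of the optimal string alignment variant of Damerau-Levenshtein distance. In this variant an insertion, deletion, substitution or transposition of adjacent characters each counts as one edit. It illustrates the measure and is not Lucene's automaton-based implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

# In query syntax, a fuzzy search within one edit would be written: huose~1
for typo in ("huose", "hose", "mouse", "horses"):
    print(typo, osa_distance("house", typo))
```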
  • 17. Wildcard Matching ● Robust functionality, but can be expensive if not properly used. ○ First, all terms that match the part of the term before the wildcard are extracted ○ Then all those terms are inspected to see if they match the entire wildcard expression ○ Expensive if your expression matches a large number of terms (for example the query e*)
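The two steps above can be sketched over a sorted term list. The term dictionary is made up, and `fnmatchcase` stands in for Lucene's own matcher, so treat this as an illustration of the cost model only.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase

# Made-up term dictionary, kept sorted like the inverted index.
terms = sorted(["east", "engine", "engineer", "english", "enter", "house"])

def wildcard_terms(sorted_terms, pattern):
    """Return (candidates sharing the literal prefix, terms matching the pattern)."""
    # Literal prefix = everything before the first wildcard character.
    cut = min((pattern.index(c) for c in "*?" if c in pattern), default=len(pattern))
    prefix = pattern[:cut]
    # Step 1: jump to the first term >= prefix and collect terms with that prefix.
    candidates = []
    for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
        if not term.startswith(prefix):
            break
        candidates.append(term)
    # Step 2: test every candidate against the whole expression.
    matches = [t for t in candidates if fnmatchcase(t, pattern)]
    return candidates, matches

narrow_candidates, narrow_matches = wildcard_terms(terms, "eng*r")
broad_candidates, broad_matches = wildcard_terms(terms, "e*")
print(narrow_candidates, narrow_matches)
print(broad_candidates, broad_matches)
```

A short prefix such as `e` leaves almost the whole dictionary as candidates, which is why such queries are expensive.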
  • 19. Query on a Range ● Solr Date Time uses a format that is a restricted form of the canonical representation of dateTime in the XML Schema specification (inspired by ISO 8601). All times are assumed to be UTC (no timezone specification) ● Based on a lexicographically sorted order for the field being queried ● Solr has Trie field types (tint, tdate, etc.) that should be used when you are doing a large number of range queries ● Various field types will be covered later in the course
  • 20. Solr Date Syntax ● Uses UTC and Restricted DateTime format ● Allows rounding down by YEAR, MONTH, DAY, HOUR, MINUTE, SECOND ● NOW represents current time and using DateMath, we can specify yesterday, tomorrow, last year, etc.
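A few range queries using this syntax, as a sketch. The `price` and `listed_at` field names are assumptions, and `solr_date` is a hypothetical helper that converts a Python datetime to the UTC form described above.

```python
from datetime import datetime, timedelta, timezone

def solr_date(dt):
    """Format an aware datetime in the UTC form Solr expects."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# 03:00 at UTC+3 is midnight UTC; Solr stores and compares dates in UTC.
local_time = datetime(2016, 1, 1, 3, 0, tzinfo=timezone(timedelta(hours=3)))
start = solr_date(local_time)

range_queries = {
    # Inclusive numeric range.
    "numeric": "price:[100 TO 500]",
    # Open-ended range from a fixed instant.
    "since_date": f"listed_at:[{start} TO *]",
    # DateMath: from midnight seven days ago until now.
    "last_7_days": "listed_at:[NOW/DAY-7DAYS TO NOW]",
    # DateMath: from yesterday's midnight to today's midnight.
    "yesterday": "listed_at:[NOW/DAY-1DAY TO NOW/DAY]",
}

for name, q in range_queries.items():
    print(f"{name:12} {q}")
```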
  • 22. Sorting ● Sort by score ● Values of Fields ● Ascending or Descending ● Multiple Fields
  • 23. Pseudo Fields ● Dynamically added at query time and calculated from fields in the schema using built-in functions ● Through functions, you can manipulate the values of any field before it is returned ● Can also be used to modify the order of documents by sorting on the pseudo field
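Sorting and pseudo fields are both set through request parameters. In this sketch the `price` field and the 10% discount are assumptions; `product` is one of Solr's built-in functions.

```python
from urllib.parse import urlencode

params = {
    "q": "house",
    # Return stored fields plus a pseudo field computed at query time.
    "fl": "id,title,price,discounted:product(price,0.9)",
    # Sort on the computed value first, then break ties by relevance score.
    "sort": "product(price,0.9) asc, score desc",
}

query_string = urlencode(params)
print(query_string)
```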
  • 25. Geospatial Searches ● Solr provides location-based search ● Define a “location” field that contains latitude and longitude ● You can use a query parser called “geofilt” to search on this field, specifying the point and radius around it ● Another query parser, bbox, uses a square around the point to do faster but approximate calculations ● Other types of searches (grids, polygons, etc.) are possible and covered in the advanced course
  • 26. Returning Calculated Distances ● You can use a pseudo field (a field that is calculated at query time) to achieve this
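A sketch of a radius search that also returns and sorts by distance. The field name `location`, the coordinates and the 5 km radius are assumptions chosen for illustration.

```python
from urllib.parse import urlencode

geo_params = {
    "q": "*:*",
    # Keep only documents within d kilometres of pt.
    "fq": "{!geofilt}",
    "sfield": "location",   # assumed name of the lat/lon field
    "pt": "31.95,35.93",    # latitude,longitude of the search centre
    "d": "5",               # radius in kilometres
    # Pseudo field holding the calculated distance.
    "fl": "id,title,dist:geodist()",
    "sort": "geodist() asc",
}

# The faster, approximate variant swaps the circle for a bounding box.
bbox_params = {**geo_params, "fq": "{!bbox}"}

print(urlencode(geo_params))
print(urlencode(bbox_params))
```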
  • 28. The fq and q Parameters ● Indistinguishable at first glance: the same query passed to either parameter returns the same documents. ● But, ○ fq serves a single purpose, to limit what is returned ○ q limits what is returned AND supplies the relevancy algorithm with a set of terms used for scoring ● fq results are cached and can be reused between searches ● Using fq we can avoid unnecessary relevancy calculations ● You can use multiple fq’s in a request (each individually cached), but only one q parameter
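A request with one `q` and several `fq` parameters, as a sketch. The field names are assumptions; a list of pairs is used because a Python dict cannot repeat the `fq` key.

```python
from urllib.parse import urlencode

params = [
    # Scored: the terms here feed the relevancy calculation.
    ("q", "title:house"),
    # Not scored: each filter only restricts the result set and is cached separately.
    ("fq", "city:amman"),
    ("fq", "price:[0 TO 500]"),
    ("fl", "id,title,score"),
]

query_string = urlencode(params)
print(query_string)
```

Keeping each restriction in its own `fq` lets a later search with the same city but a different price range reuse the cached city filter.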
  • 30. Faceted Search ● High-level breakdown of search results based on one or more aspects (facets) of their documents ● Allows users to filter by (drill down into) specific components ● Can facet on values of fields, or facet by queries
  • 31. Types of Facet ● Field Facets ● Range Facets ● Pivot Facets
  • 32. Field Faceting ● Request back the unique values found in a particular field ● Most commonly used ● Works for single- and multi-valued fields ● Values are based on the indexed values of the field ● Common practice is to facet on a String field and search on a text field (to be discussed later). So, some schema preparation is required for faceting
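A field facet request and a helper for reading the counts, as a sketch. The `city` field and the sample counts are made up. The response shape shown, a flat list alternating value and count, is what Solr's JSON writer returns by default.

```python
from urllib.parse import urlencode

params = {
    "q": "house",
    "rows": 0,              # only the facet counts are needed
    "facet": "true",
    "facet.field": "city",  # assumed string field
    "facet.mincount": 1,
}
query_string = urlencode(params)

# Made-up response fragment in Solr's default flat [value, count, ...] layout.
sample_response = {
    "facet_counts": {
        "facet_fields": {"city": ["amman", 12, "irbid", 5, "zarqa", 3]}
    }
}

def facet_counts(response, field):
    """Turn the flat value/count list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))

print(query_string)
print(facet_counts(sample_response, "city"))
```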
  • 33. Range Facet ● Divide a range into equal size buckets
  • 35. Date Range Facets ● Recall Solr Date Syntax covered earlier in class ● Uses UTC and Restricted DateTime format ● Allows rounding down by YEAR, MONTH, DAY, HOUR, MINUTE, SECOND ● NOW represents current time and using DateMath, we can specify yesterday, tomorrow, last year, etc.
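Numeric and date range facets in one request, as a sketch. The `price` and `listed_at` fields and the bucket sizes are assumptions; `bucket_bounds` is a hypothetical helper that shows the equal-size buckets Solr would count into.

```python
from urllib.parse import urlencode

params = [
    ("q", "house"),
    ("rows", 0),
    ("facet", "true"),
    # Numeric range: ten buckets of 100 between 0 and 1000.
    ("facet.range", "price"),
    ("f.price.facet.range.start", 0),
    ("f.price.facet.range.end", 1000),
    ("f.price.facet.range.gap", 100),
    # Date range: one bucket per day for the last seven days.
    ("facet.range", "listed_at"),
    ("f.listed_at.facet.range.start", "NOW/DAY-7DAYS"),
    ("f.listed_at.facet.range.end", "NOW/DAY"),
    ("f.listed_at.facet.range.gap", "+1DAY"),
]
query_string = urlencode(params)

def bucket_bounds(start, end, gap):
    """List the (lower, upper) bounds of each equal-size bucket."""
    return [(low, min(low + gap, end)) for low in range(start, end, gap)]

buckets = bucket_bounds(0, 1000, 100)
print(query_string)
print(buckets)
```

The `+` in the date gap has to be URL-encoded as `%2B`, which `urlencode` does here; a raw `+` would be decoded as a space.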
  • 36. Stats and Facets ● Can get aggregations on various fields ● From Solr 5.x onwards, stats on pivot facets are also available ● See https://lucidworks.com/blog/you-got-stats-in-my-facets/ for a great explanation of faceting
  • 37. Pivot Facets ● Functions like pivot tables in spreadsheet apps ● Aggregate calculations that pivot on values from multiple fields ● Example: give me a count of 3, 4 and 5 star hotels in the top three cities ● Solr 5.x also allows you to run stats calculations on pivots
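The hotel example could be requested roughly as follows. This is a sketch: the `city`, `stars` and `price` fields are assumptions, and the tag name `piv` is arbitrary. The `{!tag=...}` and `{!stats=...}` local parameters link the stats to the pivot.

```python
from urllib.parse import urlencode

params = [
    ("q", "*:*"),
    ("rows", 0),
    ("facet", "true"),
    # Count documents per city, then per star rating within each city.
    ("facet.pivot", "{!stats=piv}city,stars"),
    ("facet.limit", 3),  # keep the top three values at each level
    # Stats to compute inside every pivot bucket, linked by the tag.
    ("stats", "true"),
    ("stats.field", "{!tag=piv}price"),
]

query_string = urlencode(params)
print(query_string)
```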
  • 38. Facet by Query ● Sometimes, you need unequal ranges ● You can use the facet.query parameter ● Provides counts for subqueries
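Unequal ranges can be sketched with one `facet.query` per bucket. The `price` field and the bucket boundaries are assumptions, and integer prices are assumed so that the inclusive bounds do not overlap.

```python
from urllib.parse import urlencode

# Each subquery gets its own count in the response.
price_buckets = [
    "price:[0 TO 99]",
    "price:[100 TO 999]",
    "price:[1000 TO *]",
]

params = [("q", "house"), ("rows", 0), ("facet", "true")]
params += [("facet.query", bucket) for bucket in price_buckets]

query_string = urlencode(params)
print(query_string)
```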
  • 40. Precision and recall ● Precision: Are the top results we show to users relevant? ● Recall: Of the full set of documents found, have we found all of the relevant content in the index?
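Both measures reduce to simple set arithmetic once you have relevance judgments. A minimal sketch with made-up document ids:

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a result list against known relevant ids."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Precision: share of returned documents that are relevant.
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall: share of relevant documents that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents returned, five relevant in the index, two in common.
precision, recall = precision_recall(returned=[1, 2, 3, 4], relevant=[2, 4, 6, 8, 10])
print(precision, recall)
```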
  • 41. Relevancy Our goal is to give users relevant results Relevance is a soft or fuzzy thing ● Depends upon the judgment of users Scoring is our attempt to predict relevance Similarity classes hold the implementations ● DefaultSimilarity ( TF-IDF ) ● BM25Similarity ● DFRSimilarity ● IBSimilarity ● LMDirichletSimilarity ● LMJelinekMercerSimilarity
  • 42. Lucene Scoring Similarity scoring formula • Used to rank results by measuring the similarity between a query and the documents that match the query
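To give a feel for TF-IDF style scoring, here is a heavily simplified sketch. It is not Lucene's exact formula, which also applies length norms, boosts and other factors; the square-root term frequency and logarithmic inverse document frequency are modelled on the classic similarity. The documents are made up.

```python
import math

# Made-up documents, keyed by document id.
docs = {1: "new house new garden", 2: "old house", 3: "new car"}

def simple_tfidf_score(query, doc_id, documents):
    """Sum tf * idf over the query terms present in the document."""
    doc_terms = documents[doc_id].split()
    score = 0.0
    for term in query.split():
        freq = doc_terms.count(term)
        if freq == 0:
            continue
        doc_freq = sum(1 for text in documents.values() if term in text.split())
        tf = math.sqrt(freq)                                    # more occurrences score higher
        idf = 1.0 + math.log(len(documents) / (doc_freq + 1))   # rarer terms score higher
        score += tf * idf
    return score

scores = {doc_id: simple_tfidf_score("new house", doc_id, docs) for doc_id in docs}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```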
  • 43. Domain knowledge Examples ● Cheaper ● Newer or more recent ● More popular or higher user clicks ● Higher average user ratings Interesting combinations ● Value = average user ratings ÷ price ● Staying power = recent popularity ÷ age
  • 44. Boosting and biasing Lucene uses a standardized scoring approach Lucene does not know: ● Your data ● Your users ● Their queries ● Their preferences
  • 45. Domain knowledge What do you know about your data? ● Any specific rules about your data that wouldn't be suitable in a generic IR scoring algorithm ● In many data domains, there are fundamental numeric properties that make some objects generally "better" than others
  • 46. Domain knowledge More subtle examples ● Novelty factor ○ Quantity of user ratings × stdDev of ratings ● Profit margin ○ Retail price - factory cost ● Scarcity ○ Quantity remaining ● Popularity by association or categorization ○ Sweaters sell better than swimsuits in November ● Manual ranking ○ New York Times bestseller list
  • 47. Request parameters We are going to make substantial use of request parameters, so let's recap:
  • 48. How can you improve search results? Using a sledge hammer ● Ignore score, sort on X ● Filter by X, retry if 0 results
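Both blunt approaches can be sketched in a few lines. Here `search` is a hypothetical callable standing in for whatever client sends the request, `fake_search` is a stub so the example runs without a server, and the field names are assumptions.

```python
def search_with_fallback(search, q, strict_filter):
    """Filter by X first; if nothing comes back, retry without the filter."""
    params = {"q": q, "fq": strict_filter}
    result = search(params)
    if result["response"]["numFound"] == 0:
        params = {"q": q}
        result = search(params)
    return params, result

def fake_search(params):
    """Stub: pretends the filtered query finds nothing and the plain one finds 7."""
    found = 0 if "fq" in params else 7
    return {"response": {"numFound": found, "docs": []}}

used_params, result = search_with_fallback(fake_search, "house", "city:aqaba")

# The other approach: ignore the score entirely and sort on a field.
sort_only_params = {"q": "house", "sort": "listed_at desc"}

print(used_params, result["response"]["numFound"])
```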
  • 49. How can you improve search results? ● Boost functions and queries ● Apply domain knowledge based on numeric properties by multiplying functions directly into the score
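A sketch of how such boosts are commonly passed with the eDisMax query parser. The field names (`title`, `description`, `avg_rating`, `price`, `listed_at`, `featured`) are assumptions, and the constants in the recency function are illustrative rather than tuned values.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "house",
    "qf": "title description",
    # Multiplied into the score: the "value = ratings / price" idea from earlier.
    "boost": "div(avg_rating,price)",
    # Added to the score: a recency function that decays with document age.
    "bf": "recip(ms(NOW,listed_at),3.16e-11,1,1)",
    # Boost query: documents matching it get extra score.
    "bq": "featured:true^2",
}

query_string = urlencode(params)
print(query_string)
```

A multiplicative boost scales with the base relevance score, while additive boosts can swamp it when the base score is small, so the two usually need different tuning. A `price` of zero would need guarding in a real function.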
  • 50. Retrieving Information from Solr JOSA Data Science Bootcamp