Web Information Management (Gestione delle Informazioni su Web)
Solr is an open source enterprise search platform from the
Apache Lucene project.
Solr is written in Java and runs as a standalone full-text
search server within a servlet container such as Apache
Tomcat or Jetty.
Solr uses the Lucene Java search library at its core for
full-text indexing and search, and has REST-like HTTP/XML and
JSON APIs that make it usable from most popular
programming languages.
Solr's powerful external configuration allows it to be tailored
to many types of application without Java coding, and it has
a plugin architecture to support more advanced
customization.
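As an illustration of the REST-like HTTP API, here is a minimal Python sketch that builds a query URL for Solr's select handler and asks for a JSON response. The host, port, and the core name "pages" are assumptions chosen for the example; they do not come from the project described here.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10):
    """Return a URL for Solr's /select handler asking for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
url = build_select_url("http://localhost:8983/solr", "pages", "title:solr")
print(url)
```

Any language with an HTTP client can send the same request, which is what makes Solr usable without Java code on the client side.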
Implementation of additional Features
Web Application for Search
“Search engine indexing collects, parses, and stores data to
facilitate fast and accurate information retrieval.”
What needs to be indexed
What does not
Queries are submitted through Request Handlers, which are
responsible for answering each request
Two SearchHandlers have been implemented:
The first handler manages the search of terms
The second handler manages the search of images
The handler allows you to assign different weights to the
terms in the index in order to sort the search results
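The effect of such weights on result ordering can be sketched with a toy scorer in Python. This is not Lucene's actual scoring formula, and the weight values are hypothetical. In Solr itself, per-field weights are commonly passed through the qf parameter of the DisMax or eDisMax query parsers (for example "title^3 description^2 body"), although the slides do not say which mechanism was used.

```python
def weighted_score(doc, term, weights):
    """Sum over fields of (field weight x occurrences of term in that field)."""
    term = term.lower()
    return sum(
        weight * doc.get(field, "").lower().split().count(term)
        for field, weight in weights.items()
    )


def rank(docs, term, weights):
    """Sort documents by descending weighted score."""
    return sorted(docs, key=lambda d: weighted_score(d, term, weights), reverse=True)


# Hypothetical weights: a match in the title counts more than one in the body.
weights = {"title": 3.0, "description": 2.0, "body": 1.0}
docs = [
    {"id": "a", "title": "Cooking tips", "body": "solr solr"},
    {"id": "b", "title": "Solr tutorial", "body": "introduction"},
]
ranked = rank(docs, "solr", weights)
print([d["id"] for d in ranked])
```

Document "b" comes first because one title match (weight 3) outweighs two body matches (weight 1 each).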
Solr provides a collection of highlighting utilities which can be called by
Request Handlers to include "highlighted" matches in field values.
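To show what a "highlighted" match looks like, here is a small Python sketch that wraps each match in <em> tags, the markers Solr's highlighter uses by default. The function is a simplified stand-in written for this example, not Solr's highlighter: it matches a single literal term and ignores analysis and fragmenting.

```python
import re


def highlight(text, term, pre="<em>", post="</em>"):
    """Wrap each whole-word, case-insensitive match of term in pre/post markers."""
    pattern = re.compile(rf"\b({re.escape(term)})\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre}{m.group(1)}{post}", text)


snippet = highlight("Solr is built on Lucene. Try solr today.", "solr")
print(snippet)
```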
Describes the structure of the data index, which consists of:
body, title, description, keywords, alt, src, text_autocomplete
type definitions (tokenizer)
text_html………………………… (body, alt)
text_general…………………….. (title, description, keywords)
copy of body, title, description in text_autocomplete
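The copy step can be pictured with a short Python sketch that gathers the values of body, title and description into text_autocomplete before indexing. It only mimics the idea of a copy field. The helper name and the sample document are hypothetical.

```python
COPY_SOURCES = ("body", "title", "description")


def apply_copy_fields(doc, target="text_autocomplete", sources=COPY_SOURCES):
    """Return a copy of doc with the source field values gathered into target."""
    enriched = dict(doc)
    enriched[target] = [doc[name] for name in sources if name in doc]
    return enriched


# Hypothetical page; "alt" is not a copy source, so it is left out of the target.
page = {"title": "Solr intro", "body": "<p>Hello</p>", "alt": "logo"}
indexed = apply_copy_fields(page)
print(indexed["text_autocomplete"])
```

The original fields stay untouched, so each one can still be analyzed with its own field type.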
Configuration file for search components and request
handlers
Each document (HTML page) consists of searchable fields.
The rules for searching each field are defined using field
types.
When a document is added/updated, its fields are analyzed
and tokenized, and those tokens are stored in the index.
The analysis process in Solr consists of the following
steps:
Pre-tokenization analysis (CharFilter classes)
Tokenization (Tokenizer class)
Post-tokenization analysis (Filter classes)
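The three steps above can be sketched as a chain of plain Python functions. This is an illustration of the order of operations, not Solr code: the regular expression is a crude replacement for a real HTML-stripping char filter, and the lowercasing filter is an assumption added to have a post-tokenization step to show.

```python
import re
from html import unescape


def char_filter(text):
    """Pre-tokenization: drop HTML tags and decode entities."""
    return unescape(re.sub(r"<[^>]+>", " ", text))


def tokenize(text):
    """Tokenization: split on runs of whitespace."""
    return text.split()


def token_filter(tokens):
    """Post-tokenization: lowercase every token (assumed filter)."""
    return [t.lower() for t in tokens]


def analyze(text):
    """Run the three steps in order and return the tokens to be indexed."""
    return token_filter(tokenize(char_filter(text)))


tokens = analyze("<p>Full-Text <b>Search</b> &amp; Indexing</p>")
print(tokens)
```

The same chain runs on the query text, so that query tokens and indexed tokens can match.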
Tokenizers and Filters
(strip out HTML elements from an analyzed text)
(divides text at whitespace)
(split on intra-word delimiters, ex. “Wi-Fi” → “Wi”, “Fi”)
solr.SynonymFilterFactory (query time only, e.g. “I-pod” → “Ipod”)
solr.SnowballPorterFilterFactory (both query time and index time)
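Two of the behaviours listed above, splitting on intra-word delimiters and synonym replacement, can be imitated in a few lines of Python. Both functions are simplified stand-ins: Solr's word-delimiter filter has further options (for example splitting on case changes), and the synonym table here holds only the one mapping from the example. Stemming is left out.

```python
import re

# Single hypothetical entry, taken from the "I-pod -> Ipod" example.
SYNONYMS = {"i-pod": "Ipod"}


def split_on_delimiters(tokens):
    """Split each token on non-alphanumeric characters, dropping empty parts."""
    parts = []
    for token in tokens:
        parts.extend(p for p in re.split(r"[^A-Za-z0-9]+", token) if p)
    return parts


def apply_synonyms(tokens, synonyms=SYNONYMS):
    """Replace tokens found in the synonym table (case-insensitive lookup)."""
    return [synonyms.get(t.lower(), t) for t in tokens]


index_tokens = split_on_delimiters("Wi-Fi router".split())
query_tokens = apply_synonyms("I-pod case".split())
print(index_tokens, query_tokens)
```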
An index-based spell checker has been implemented to handle
misspellings.
Solr uses one of the configured fields in the indexed
documents as dictionary input and uses it for spell
checking.
The field used in the indexed document as Dictionary input is
The spellcheck distance measure used is the Levenshtein
distance.
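For reference, the Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a standard dynamic-programming version in Python, with a helper that picks the closest dictionary word. The dictionary contents are invented for the example, and Solr's spell checker does more than this (for instance, it can take term frequency into account).

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionary):
    """Return the dictionary entry with the smallest distance to word."""
    return min(dictionary, key=lambda candidate: levenshtein(word, candidate))


dictionary = ["search", "server", "index"]  # hypothetical indexed terms
best = suggest("serach", dictionary)
print(best)
```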
The aim of Autocomplete is to suggest individual words that
begin with the letters typed by the user.
To configure the suggester, we had to prepare the
appropriate field on which the hints are built. In our case, we
use the field “name_autocomplete”.
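Prefix-based suggestion can be sketched in Python with a sorted word list and a binary search. This only illustrates the idea of completing a prefix from the terms of one field. It is not how Solr's suggester is implemented, and the word list is invented for the example.

```python
from bisect import bisect_left


def complete(prefix, sorted_words, limit=5):
    """Return up to limit words from sorted_words that start with prefix."""
    prefix = prefix.lower()
    start = bisect_left(sorted_words, prefix)  # first word >= prefix
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches


# Hypothetical terms taken from the autocomplete field.
words = sorted({"search", "server", "servlet", "solr", "schema"})
hints = complete("ser", words)
print(hints)
```

Because the list is sorted, all words sharing a prefix sit next to each other, so the scan can stop at the first word that does not match.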