Solr Presentation

Published in: Technology

Transcript

  • 1. TorreSaracena Group. Web Information Management (Gestione delle Informazioni su Web), IR project. Francesco Maglia, Ilario Maiolo, Gianluca Porcino, Matteo Cannaviccio
  • 2. Apache SOLR
      • Solr is an open source enterprise search platform from the Apache Lucene project.
      • Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty.
      • Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages.
      • Solr's powerful external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.
  • 3. Summary
      • Dataset Indexing
      • Dataset Querying
      • Implementation of Additional Features
      • Web Application for Search
  • 4. Dataset Indexing
      “Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval.” (Wikipedia)
      • What needs to be indexed
          • Meta tags
          • Relevant fields
      • What does not
          • Useless fields
  • 5. Terms Indexing
      • Meta tags
      • Relevant fields
  • 6. Image Indexing
      • Meta tags (where present)
      • Image fields
      • Check the image
  • 7. Dataset Querying
      • Queries are made through request handlers, which are responsible for answering each request.
      • Two SearchHandler instances have been implemented:
          • The first handler manages the search of terms: <requestHandler name="/lighthouse" class="solr.SearchHandler">.../>
          • The second handler manages the search of images: <requestHandler name="/lighthouseimg" class="solr.SearchHandler">.../>
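As a rough illustration of how a client might address the two handlers, the sketch below builds GET URLs with the Python standard library. The base URL (host, port, core name) and the helper `build_search_url` are assumptions made for this example; the slides do not give them. Only the handler paths `/lighthouse` and `/lighthouseimg` come from the presentation, and `q`, `rows` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port and core name are not stated in the slides.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def build_search_url(handler, query, rows=10):
    """Return a GET URL for a request handler path such as "/lighthouse"."""
    # q = user query, rows = page size, wt=json asks Solr for a JSON response.
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}{handler}?{urlencode(params)}"


# One URL per handler: terms search and image search.
terms_url = build_search_url("/lighthouse", "open source search")
images_url = build_search_url("/lighthouseimg", "lighthouse")
```

The resulting URLs could then be fetched by any HTTP client; the response format depends on how each handler is configured in solrconfig.xml.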
  • 8. Querying settings
      • The handler lets you assign different weights to the terms in the index in order to sort the search results.
      • Highlights: Solr provides a collection of highlighting utilities which can be called by request handlers to include "highlighted" matches in field values.
  • 9. Configuration Files
      • schema.xml: describes the structure of the index. It consists of several parts:
          • Field definitions: body, title, description, keywords, alt, src, text_autocomplete
          • Type definitions (tokenizer):
              • text_html: body, alt
              • text_general: title, description, keywords
              • text_auto: text_autocomplete
          • copyField section: copies body, title, description into text_autocomplete
      • solrconfig.xml: configuration file for search components and request handlers
  • 10. Analysis Process
      • Each document (HTML page) consists of searchable fields. The rules for searching each field are defined using field type definitions.
      • When a document is added or updated, its fields are analyzed and tokenized, and those tokens are stored in the index.
      • The analysis process in Solr consists of the following phases:
          • Pre-tokenization analysis (CharFilter classes)
          • Tokenization (Tokenizer class)
          • Post-tokenization analysis (Filter classes)
  • 11. Tokenizers and Filters
      • text_html
          • solr.HTMLStripCharFilterFactory (strips HTML elements from the analyzed text)
          • solr.StandardTokenizerFactory
          • solr.LowerCaseFilterFactory
          • solr.StopFilterFactory (stopwords.txt)
      • text_auto
          • solr.WhitespaceTokenizerFactory (divides text at whitespace)
          • solr.WordDelimiterFilterFactory (splits on intra-word delimiters, e.g. “Wi-Fi” → “Wi”, “Fi”)
          • solr.LowerCaseFilterFactory
      • text_general
          • Items 2, 3 and 4 of the text_html list (solr.StandardTokenizerFactory, solr.LowerCaseFilterFactory, solr.StopFilterFactory)
          • solr.SynonymFilterFactory (query time only, e.g. I-pod → Ipod)
          • solr.SnowballPorterFilterFactory (both query and index time)
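To make the three analysis phases concrete, here is a minimal pure-Python sketch that imitates the text_html chain: strip HTML, tokenize, lowercase, remove stop words. It is not Solr code. The regular expressions are crude stand-ins for the real factories, and the `STOPWORDS` set is a made-up substitute for the contents of stopwords.txt, which the slides do not show.

```python
import re

# Hypothetical stop word list; the real stopwords.txt is not shown in the slides.
STOPWORDS = {"the", "a", "an", "of", "and"}


def strip_html(text):
    # Pre-tokenization phase: crude stand-in for HTMLStripCharFilterFactory.
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    # Tokenization phase: crude stand-in for StandardTokenizerFactory.
    return re.findall(r"\w+", text)


def analyze_text_html(text):
    # Post-tokenization phase: lowercase filter, then stop word filter.
    tokens = [token.lower() for token in tokenize(strip_html(text))]
    return [token for token in tokens if token not in STOPWORDS]


tokens = analyze_text_html("<p>The <b>Lighthouse</b> of Alexandria</p>")
# tokens == ["lighthouse", "alexandria"]
```

The order matters: the character filter runs on the raw text before any token exists, while the lowercase and stop word filters operate on the token stream.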
  • 12. Misspelling
      • An index-based spell checker has been implemented for the misspelling feature.
      • Solr uses one of the configured fields of the indexed documents as dictionary input and draws its spelling suggestions from it.
      • The field used as dictionary input is “name_autocomplete”.
      • The spellcheck distance measure used is the Levenshtein distance.
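The Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a plain dynamic-programming version with a toy `suggest` helper; both are illustrative only and do not reproduce Solr's spell checker, which works on the indexed field and applies its own scoring. The dictionary terms are invented for the example.

```python
def levenshtein(a, b):
    """Edit distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word, dictionary):
    """Return the dictionary term closest to word (hypothetical helper)."""
    return min(dictionary, key=lambda term: levenshtein(word, term))


# Invented dictionary terms, standing in for the indexed field's content.
best = suggest("lighthuose", ["lighthouse", "light", "house"])
```

Here `best` is "lighthouse", since the misspelled word is two substitutions away from it and at least five edits away from the other terms.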
  • 13. Autocomplete
      • The aim of autocomplete is to suggest individual words that begin with the letters typed by the user.
      • To configure the suggester, we had to prepare the field on which the suggestions are built. In our case, we use the field “name_autocomplete”.
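The idea of prefix-based suggestion can be sketched without Solr: keep the candidate terms sorted and use binary search to find where the matches for a prefix begin. The function `suggest_prefix` and the sample terms are assumptions for illustration; Solr's suggester builds its own lookup structure from the configured field.

```python
from bisect import bisect_left


def suggest_prefix(prefix, sorted_terms, limit=5):
    """Return up to `limit` terms from a sorted list that start with prefix."""
    prefix = prefix.lower()
    # All terms sharing the prefix are contiguous in a sorted list,
    # starting at the first term that is >= prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


# Invented sample terms, lowercased as the text_auto chain would produce.
terms = sorted({"light", "lighthouse", "lighting", "search", "solr"})
hints = suggest_prefix("li", terms)
```

With these sample terms, `hints` contains "light", "lighthouse" and "lighting", in that order.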
  • 14. Web Application
      [Diagram: "Search terms" → /lighthouse → response → $.parseJSON(); "Search images" → /lighthouseimg → response → $.parseJSON()]
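  • 15. Screenshots [screenshot labels: MENU, TEXTBOX]
  • 16. Search with Autocomplete [screenshot labels: SUGGESTS, TYPING…]
  • 17. Misspelling [screenshot label: MISSPELLING]
  • 18. Snippet Results [screenshot label: NUMBER RESULTS]
  • 19. Snippet Results [screenshot labels: PAGE HYPERLINK, Sorted Results]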
  • 20. Images Search
  • 21. Images Search
