Solr features
A 5-minute description of Solr features

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Presentation Transcript

  • Search Platform Features & Use Cases
  • SOLR
    ● SOLR is a standalone search server that can scale separately from the application that uses it
      ● e.g. avoid the case where an e-commerce server is slowed down by users searching its product catalog
    ● SOLR is accessed through REST-like HTTP APIs using XML or JSON
      ● Multi-platform, multi-language and client-independent
      ● Results in XML, CSV or JSON (with custom variants for Ruby, Python and PHP)
    ● 100% open source, written in Java, runs in the JVM
    ● Apache Software Foundation project (part of Apache Lucene until it became a top-level project in 2021)
    ● Among the most widely used search servers in industry
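Because Solr is reached over plain HTTP, a language's standard library is enough to query it. The following is a minimal Python sketch, assuming a local Solr with the `collection1` core used in this deck's examples; `select_url`, `doc_field` and `fetch_json` are hypothetical helper names, while `wt=json` is the Solr parameter that requests a JSON response instead of XML.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Solr with a core named "collection1".
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def select_url(q, **params):
    """Build a select URL; wt=json asks Solr for JSON instead of XML."""
    query = {"q": q, "wt": "json"}
    query.update(params)  # e.g. sort="price desc", fl="name,price"
    return SOLR_SELECT + "?" + urlencode(query)


def doc_field(raw_json, field):
    """Return one field from every document in a JSON select response."""
    body = json.loads(raw_json)
    return [doc.get(field) for doc in body["response"]["docs"]]


def fetch_json(url):
    """Send the request; this needs a running Solr server."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

With a server running, `doc_field(fetch_json(select_url("video", sort="price desc")), "name")` would list the matching product names. No Solr-specific client library is involved, which is the point of the "client-independent" bullet.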
  • SOLR: A Lucene server
    ● Solr is a search platform that provides all the features of the Lucene search engine*
      ● High-performance indexing
      ● Incremental and batch indexing
      ● Small footprint (RAM and disk)
    ● It also exposes all of Lucene's search features
      ● Ranked searching
      ● Many query types (phrase, wildcard, regexp, range, geospatial proximity)
      ● Many field types, meaningful sorting
      ● Multi-index search with merged results
      ● Faceting
      ● Language-aware text analysis (e.g. stemming)
      ● Suggestions
    * Lucene and Solr development merged in March 2010; Solr 3.1 (March 2011) was the first joint release.
  • Simple SOLR Example
    ● Index a product catalog (e.g. an iPod Video)
    ● Data in XML format
        <doc>
          <field name="id">MA147LL/A</field>
          <field name="name">Apple 60 GB iPod with Video Playback Black</field>
          <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
          <field name="features">Up to 20 hours of battery life</field>
          <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
          <field name="price">399.00</field>
          <field name="inStock">true</field>
          <field name="store">37.7752,-100.0232</field> <!-- Dodge City store -->
        </doc>
    ● Schema configuration
        <field name="id" type="string" indexed="true" stored="true"/>
        <field name="name" type="text" indexed="true" stored="true"/>
        <field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
        <field name="price" type="float" indexed="true" stored="true"/>
        <field name="inStock" type="boolean" indexed="true" stored="true"/>
        <field name="store" type="location" indexed="true" stored="true"/>
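To actually index the iPod document, Solr's XML update format wraps one or more `<doc>` elements in an `<add>` element and posts them to the core's update handler. The sketch below builds that message with Python's standard library; it assumes the `collection1` core and `commit=true` on the URL, and `add_message`/`post_update` are hypothetical helper names, not part of any Solr client.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

# Assumption: a local core named "collection1"; commit=true makes the
# new documents searchable as soon as the request completes.
UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def add_message(docs):
    """Wrap documents in an <add> update message.

    A list value becomes repeated <field> elements, which is how a
    multiValued field such as "features" is sent.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                # Solr booleans are "true"/"false", not Python's "True"/"False".
                if isinstance(v, bool):
                    field.text = "true" if v else "false"
                else:
                    field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def post_update(xml_body, url=UPDATE_URL):
    """POST the message to Solr; this needs a running server."""
    req = Request(url, data=xml_body.encode("utf-8"),
                  headers={"Content-Type": "text/xml"})
    with urlopen(req) as resp:
        return resp.status
```

Keeping message construction separate from the HTTP call makes the payload easy to inspect or test without a server.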
  • Simple SOLR Example
    ● Query
      ● Return all products with "video" in any field, sorted by descending price, showing only name, price and inStock
        curl "http://localhost:8983/solr/collection1/select?q=video&sort=price+desc&fl=name,price,inStock&indent=true"

        <?xml version="1.0" encoding="UTF-8"?>
        <response>
        <lst name="responseHeader">
          <int name="status">0</int>
          <int name="QTime">1</int>
          <lst name="params">
            <str name="fl">name,price,inStock</str>
            <str name="sort">price desc</str>
            <str name="indent">true</str>
            <str name="q">video</str>
          </lst>
        </lst>
        <result name="response" numFound="3" start="0">
          <doc>
            <str name="name">ATI Radeon X1900 XTX 512 MB PCIE Video Card</str>
            <float name="price">649.99</float>
            <bool name="inStock">false</bool>
          </doc>
          <doc>
            <str name="name">ASUS Extreme N7800GTX/2DHTV (256 MB)</str>
            <float name="price">479.95</float>
            <bool name="inStock">false</bool>
          </doc>
          <doc>
            <str name="name">Apple 60 GB iPod with Video Playback Black</str>
            <float name="price">399.0</float>
            <bool name="inStock">true</bool>
          </doc>
        </result>
        </response>
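The XML response is self-describing: each stored field is an element named after its type (`str`, `float`, `bool`, and `arr` for multi-valued fields) with a `name` attribute. A minimal sketch that turns such a response into Python dictionaries, assuming only the element types listed in `CONVERTERS` (a hypothetical mapping; other types fall back to strings):

```python
import xml.etree.ElementTree as ET

# Map Solr's XML value elements to Python types (unknown tags stay strings).
CONVERTERS = {
    "str": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "bool": lambda s: s == "true",
}


def _value(el):
    """Convert one value element; <arr> holds the values of a multiValued field."""
    if el.tag == "arr":
        return [_value(child) for child in el]
    return CONVERTERS.get(el.tag, str)(el.text or "")


def parse_select_xml(xml_text):
    """Return (numFound, list of document dicts) from a select response."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    docs = [{child.get("name"): _value(child) for child in doc}
            for doc in result.findall("doc")]
    return int(result.get("numFound")), docs
```

Applied to the response above, this yields three dictionaries in descending price order, with `inStock` as a real boolean.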
  • Simple SOLR Example
    ● Query facets
      ● Add faceting options and the desired category
      ● Facet: inStock
        q=video&sort=price+desc&facet=true&facet.field=inStock
        <lst name="facet_counts">
          <lst name="facet_queries"/>
          <lst name="facet_fields">
            <lst name="inStock">
              <int name="false">2</int>
              <int name="true">1</int>
            </lst>
          </lst>
          <lst name="facet_dates"/>
          <lst name="facet_ranges"/>
        </lst>
      ● Facet: price, from $0 to $1000, in $100 gaps
        q=video&sort=price+desc&facet=true&facet.range=price&facet.range.gap=100&facet.range.start=0.0&facet.range.end=1000
        <lst name="counts">
          <int name="0.0">0</int>
          <int name="100.0">0</int>
          <int name="200.0">0</int>
          <int name="300.0">1</int> <!-- Apple iPod, $399 -->
          <int name="400.0">1</int> <!-- ASUS Extreme, $479 -->
          <int name="500.0">0</int>
          <int name="600.0">1</int> <!-- ATI Radeon, $649 -->
          <int name="700.0">0</int>
          <int name="800.0">0</int>
          <int name="900.0">0</int>
        </lst>
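What the two facet requests compute can be reproduced locally on the three matching products. This is a toy model, not Solr's implementation: it assumes Solr's default range behaviour (each bucket includes its lower bound and excludes its upper bound, and the last bucket is not clipped at `facet.range.end`), and `field_facet`/`range_facet` are hypothetical names.

```python
import math
from collections import Counter


def field_facet(values):
    """Count documents per distinct value, like facet.field."""
    return dict(Counter(("true" if v else "false") if isinstance(v, bool) else str(v)
                        for v in values))


def range_facet(values, start, end, gap):
    """Count values per [lower, lower + gap) bucket, like facet.range."""
    n = math.ceil((end - start) / gap)      # number of buckets
    counts = [0] * n
    for v in values:
        if start <= v < start + n * gap:
            counts[int((v - start) // gap)] += 1
    # Key each bucket by its lower bound, as in the <lst name="counts"> output.
    return {start + i * gap: c for i, c in enumerate(counts)}
```

For the prices 649.99, 479.95 and 399.0 with start 0, end 1000 and gap 100, the 300, 400 and 600 buckets each get one document, matching the counts shown on the slide.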
  • Simple SOLR Example
    ● Filter query
      ● Filter results are kept in their own cache (the filterCache), separate from the main query result cache, and do not affect scoring; useful for restrictions that match many documents and are reused across searches
      ● Filter query: all products priced from 300 to 499 USD
        q=*:*&fl=name,price&fq=price:[300 TO 499]
        <result name="response" numFound="4" start="0">
          <doc>
            <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</str>
            <float name="price">350.0</float>
          </doc>
          <doc>
            <str name="name">Apple 60 GB iPod with Video Playback Black</str>
            <float name="price">399.0</float>
          </doc>
          <doc>
            <str name="name">Canon PowerShot SD500</str>
            <float name="price">329.95</float>
          </doc>
          <doc>
            <str name="name">ASUS Extreme N7800GTX/2DHTV (256 MB)</str>
            <float name="price">479.95</float>
          </doc>
        </result>
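The benefit of a separate filter cache is that the document set for a filter is computed once and reused by every later query carrying the same `fq`. The sketch below shows both halves under stated assumptions: `filter_query` encodes one `fq` parameter per filter (Solr accepts repeated `fq` parameters), and `ToyFilterCache` is a hypothetical, greatly simplified model of the caching idea, not Solr's filterCache code.

```python
from urllib.parse import urlencode


def filter_query(q, filters, fl=None):
    """Encode a query with one fq parameter per filter."""
    params = [("q", q)] + [("fq", f) for f in filters]
    if fl:
        params.append(("fl", fl))
    return urlencode(params)


class ToyFilterCache:
    """Hypothetical model: each price-range filter's doc-id set is computed
    once, then reused by any later query that carries the same filter."""

    def __init__(self, prices):
        self.prices = prices    # doc id -> price
        self.cache = {}
        self.misses = 0

    def price_range(self, lo, hi):
        key = ("price", lo, hi)
        if key not in self.cache:
            self.misses += 1
            # [lo TO hi] in Solr syntax is inclusive at both ends.
            self.cache[key] = frozenset(
                doc_id for doc_id, p in self.prices.items() if lo <= p <= hi)
        return self.cache[key]
```

A main query would then only need to intersect its own matches with the cached set, which is why a filter reused across many searches is cheap after its first use.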
  • Simple SOLR Example
    ● Spatial query
      ● Store data:
        – <field name="store">45.17614,-93.87341</field> <!-- Buffalo store -->
        – <field name="store">40.7143,-74.006</field> <!-- NYC store -->
        – <field name="store">37.7752,-122.4232</field> <!-- San Francisco store -->
      ● We are at 45.15,-93.85 (3.437 km from the Buffalo store)
      ● Find all products in a store within 5 km of our position:
        Query: fl=name,store&q=*:*&fq={!geofilt%20pt=45.15,-93.85%20sfield=store%20d=5}
        Result:
        "response":{"numFound":3,"start":0,"docs":[
          { "name":"Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300",
            "store":"45.17614,-93.87341"},
          { "name":"Belkin Mobile Power Cord for iPod w/ Dock",
            "store":"45.18014,-93.87741"},
          { "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM",
            "store":"45.18414,-93.88141"}]
        }
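The 3.437 km figure and the three results can be checked with the great-circle (haversine) distance, which is the kind of distance a circular `geofilt` filter tests against `d`. The sketch assumes a mean Earth radius of about 6371 km; Solr's exact implementation may differ in details, and `haversine_km`/`within_km` are hypothetical names.

```python
import math

# Assumption: mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0087714


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_km(docs, pt, d):
    """Names of docs whose "lat,lon" store field lies within d km of pt."""
    names = []
    for doc in docs:
        lat, lon = (float(x) for x in doc["store"].split(","))
        if haversine_km(pt, (lat, lon)) <= d:
            names.append(doc["name"])
    return names
```

From 45.15,-93.85 the three Buffalo-area stores come out at roughly 3.4, 4.0 and 4.5 km, so all pass the 5 km filter, while the NYC and San Francisco stores are excluded.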
  • SOLR Features
    ● SOLR Cloud
      ● Cluster configuration using ZooKeeper
      ● Easy sharding and failover management
      ● Self-healing, no single point of failure
    ● SOLR Cell (aka ExtractingRequestHandler)
      ● TIKA integration for binary document parsing
      ● Parses DOC, PDF, XLS, MIME, etc.
    ● DataImportHandler
      ● Automatically fetches and indexes SQL databases, e-mails, RSS feeds, files in a folder, etc.
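In SolrCloud, sharding and replication are chosen when a collection is created through the Collections API, and the configuration comes from ZooKeeper. A minimal sketch that only builds the request URL, assuming a node on localhost:8983 and a config set already uploaded to ZooKeeper under the hypothetical name `products_conf`; the helper name is also hypothetical.

```python
from urllib.parse import urlencode

# Assumption: a SolrCloud node on localhost:8983.
COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"


def create_collection_url(name, num_shards, replication_factor, config_name):
    """Build a Collections API CREATE request (sending it needs a running cluster)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                  # how many pieces the index is split into
        "replicationFactor": replication_factor,  # copies of each shard, used for failover
        "collection.configName": config_name,     # config set stored in ZooKeeper
    }
    return COLLECTIONS_API + "?" + urlencode(params)
```

With two shards and a replication factor of two, losing any single node still leaves a copy of every shard, which is what the "no single point of failure" bullet relies on.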
  • SOLR Features
    ● Multiple Solr cores
      ● Many index collections in the same server
      ● Different schema definition for each collection
      ● Different configuration for storage, replication, etc.
    ● Caching
      ● Recurrent searches are cached, which speeds them up
      ● Advanced cache warming techniques
      ● After new content is committed, the new searcher's caches can be autowarmed from the previous ones instead of starting empty
    ● Advanced
      ● Language detection
      ● Natural Language Processing
      ● Clustering of search results and documents
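The caching and warming bullets can be illustrated with a least-recently-used cache that, when a new searcher opens, replays the most recent queries of the old cache. This is a hypothetical toy (`ToyQueryCache` is not Solr code); it only mirrors the idea behind Solr's LRU-style caches and their `autowarmCount` setting.

```python
from collections import OrderedDict


class ToyQueryCache:
    """Hypothetical LRU cache mapping a query string to its result ids."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, query, run_query):
        if query in self.entries:
            self.entries.move_to_end(query)      # mark as most recently used
            return self.entries[query]
        self.misses += 1
        result = run_query(query)
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict the least recently used
        return result

    def autowarm(self, old, count, run_query):
        """Re-run the `count` most recently used queries of an older cache
        against the new index, so a fresh searcher does not start cold."""
        recent = list(old.entries)[-count:] if count > 0 else []
        for query in recent:
            self.get(query, run_query)
```

Replaying only the most recent queries bounds warming time, which explains why it is a tunable count and not a full copy of the old cache.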
  • SOLR Cloud
  • SOLR TIKA integration
    ● SOLR Cell embeds TIKA for binary file parsing
    ● TIKA parses DOC, PDF, XLSX, HTML... and represents the content using XHTML, JSON or CSV
      ● Full list of supported formats: http://tika.apache.org/1.3/formats.html
      ● For some file types (MP3, JPG, AVI), only metadata can be indexed
    ● SOLR Cell takes the TIKA output and indexes it so it can be searched
    ● SOLR does not store the original binary file
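Sending a binary file through Solr Cell is one HTTP POST of the raw bytes; `literal.id` supplies the document's unique key, and `extractOnly=true` returns Tika's output without indexing it. The sketch assumes the example configuration that maps the ExtractingRequestHandler to `/update/extract` on `collection1`, and it sends a generic content type so Tika detects the format itself; `extract_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: ExtractingRequestHandler mapped to /update/extract.
EXTRACT_URL = "http://localhost:8983/solr/collection1/update/extract"


def extract_request(data, doc_id, extract_only=False,
                    content_type="application/octet-stream"):
    """Build a POST request that hands raw file bytes to Solr Cell."""
    params = {"literal.id": doc_id}      # unique key for the new document
    if extract_only:
        params["extractOnly"] = "true"   # return Tika's output, do not index
    else:
        params["commit"] = "true"        # make the document searchable now
    return Request(EXTRACT_URL + "?" + urlencode(params), data=data,
                   headers={"Content-Type": content_type})
```

With a running server, `urlopen(extract_request(pdf_bytes, "manual-1"))` would index the file's extracted text under the id `manual-1`; only that text and metadata end up in the index, not the PDF itself.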
  • SOLR Add-ons
    ● Admin interface
  • SOLR Add-ons
    ● Web interface (Solritas)
  • SOLR Use Cases
    ● Liferay search
      ● As Liferay already uses Lucene, it can be connected to a SOLR server
      ● Offloads the Liferay server and lets the SOLR cluster handle all user searches in the portal
    ● Magento e-commerce
      ● Avoids using MySQL for searching
      ● Better search results
      ● Better overall performance
    ● Alfresco search
      ● Alfresco currently recommends setting up SOLR from the beginning
      ● By default, Lucene+Tika is used internally