Solr features

5 minutes description of Solr f


  • 1. Search Platform Features & Use Cases
  • 2. SOLR
      ● SOLR is a standalone search server that can scale separately from the application that uses it
          ● e.g. it avoids the case where an e-commerce server is slowed down by users searching the product catalog
      ● SOLR is accessed over HTTP through REST-like XML and JSON APIs
          ● Multi-platform, multi-language and client-independent
          ● Results in XML, CSV, or JSON (with custom variations for Ruby, Python, PHP)
      ● 100% open source, written in Java, runs on the JVM
      ● Apache Software Foundation top-level project
      ● One of the most widely used search servers in industry
  • 3. SOLR: A Lucene server
      ● Solr is a search platform that provides all the features of the Lucene search engine *
          ● High-performance indexing
          ● Incremental and batch indexing
          ● Small footprint (RAM and disk)
      ● And has all of Lucene's features
          ● Ranked searching
          ● Many query types (phrase, wildcard, regexp, range, geospatial proximity)
          ● Many field types, meaningful sorting
          ● Multi-index search and merging of results
          ● Faceting
          ● Language analysis (stemming)
          ● Suggestions
      * (the two projects merged in March 2010; SOLR 3.1, released in March 2011, was the first release from the merged code base)
  • 4. Simple SOLR Example
      ● Index a product catalog (e.g. iPod Video)
      ● Data in XML format

            <doc>
              <field name="id">MA147LL/A</field>
              <field name="name">Apple 60 GB iPod with Video Playback Black</field>
              <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
              <field name="features">Up to 20 hours of battery life</field>
              <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
              <field name="price">399.00</field>
              <field name="inStock">true</field>
              <field name="store">37.7752,-100.0232</field> <!-- Dodge City store -->
            </doc>

      ● Schema configuration

            <field name="id" type="string" indexed="true" stored="true"/>
            <field name="name" type="text" indexed="true" stored="true"/>
            <field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
            <field name="price" type="float" indexed="true" stored="true"/>
            <field name="inStock" type="boolean" indexed="true" stored="true"/>
            <field name="store" type="location" indexed="true" stored="true"/>
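
The XML document on this slide can also be generated from application data instead of being written by hand. The following is a minimal Python sketch, assuming Solr's XML update format wraps documents in an `<add>` element; the helper name `build_add_xml` is hypothetical, and sending the payload to an update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Wrap one document (field name -> value, or list of values) in <add><doc>."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A multiValued field becomes one <field> element per value.
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Field values copied from the slide (features list shortened to two entries).
ipod = {
    "id": "MA147LL/A",
    "name": "Apple 60 GB iPod with Video Playback Black",
    "features": [
        "Up to 20 hours of battery life",
        "Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video",
    ],
    "price": "399.00",
    "inStock": "true",
    "store": "37.7752,-100.0232",
}

payload = build_add_xml(ipod)
print(payload)
```

Values are passed as strings so that they reach the index exactly as written; the schema's field types decide how each one is interpreted.
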
  • 5. Simple SOLR Example
      ● Query
          ● Return all products with « video » in any field, sorted by descending price, showing just the name, price and inStock fields

            curl "http://localhost:8983/solr/collection1/select?q=video&sort=price+desc&fl=name,price,inStock&indent=true"

            <?xml version="1.0" encoding="UTF-8"?>
            <response>
              <lst name="responseHeader">
                <int name="status">0</int>
                <int name="QTime">1</int>
                <lst name="params">
                  <str name="fl">name,price,inStock</str>
                  <str name="sort">price desc</str>
                  <str name="indent">true</str>
                  <str name="q">video</str>
                </lst>
              </lst>
              <result name="response" numFound="3" start="0">
                <doc>
                  <str name="name">ATI Radeon X1900 XTX 512 MB PCIE Video Card</str>
                  <float name="price">649.99</float>
                  <bool name="inStock">false</bool>
                </doc>
                <doc>
                  <str name="name">ASUS Extreme N7800GTX/2DHTV (256 MB)</str>
                  <float name="price">479.95</float>
                  <bool name="inStock">false</bool>
                </doc>
                <doc>
                  <str name="name">Apple 60 GB iPod with Video Playback Black</str>
                  <float name="price">399.0</float>
                  <bool name="inStock">true</bool>
                </doc>
              </result>
            </response>
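
As a client-side illustration of this query, the sketch below builds the same select URL and reads the documents out of an XML response. It is only a sketch: the base URL is the one from the slide, the helper names `select_url` and `parse_docs` are hypothetical, and the response is a shortened copy of the slide's output held in a string so that no running server is needed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/collection1/select"  # URL taken from the slide


def select_url(q, sort, fields):
    """Build a select URL; urlencode takes care of spaces and commas."""
    params = {"q": q, "sort": sort, "fl": ",".join(fields), "indent": "true"}
    return BASE + "?" + urlencode(params)


def parse_docs(xml_text):
    """Return (numFound, list of {field name: text}) from an XML response."""
    result = ET.fromstring(xml_text).find("result")
    docs = [{child.get("name"): child.text for child in doc}
            for doc in result.findall("doc")]
    return int(result.get("numFound")), docs


# Shortened copy of the response shown on the slide.
SAMPLE = """
<response>
  <result name="response" numFound="3" start="0">
    <doc><str name="name">ATI Radeon X1900 XTX 512 MB PCIE Video Card</str>
         <float name="price">649.99</float><bool name="inStock">false</bool></doc>
    <doc><str name="name">ASUS Extreme N7800GTX/2DHTV (256 MB)</str>
         <float name="price">479.95</float><bool name="inStock">false</bool></doc>
    <doc><str name="name">Apple 60 GB iPod with Video Playback Black</str>
         <float name="price">399.0</float><bool name="inStock">true</bool></doc>
  </result>
</response>
""".strip()

url = select_url("video", "price desc", ["name", "price", "inStock"])
num_found, docs = parse_docs(SAMPLE)
print(url)
print(num_found, [d["name"] for d in docs])
```

All values come back as text here; a real client would convert `price` and `inStock` according to the element name (`float`, `bool`).
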
  • 6. Simple SOLR Example
      ● Query Facets
          ● Add facet options and the desired category

        Facet: inStock

            q=video&sort=price+desc&facet=true&facet.field=inStock

            <lst name="facet_counts">
              <lst name="facet_queries"/>
              <lst name="facet_fields">
                <lst name="inStock">
                  <int name="false">2</int>
                  <int name="true">1</int>
                </lst>
              </lst>
              <lst name="facet_dates"/>
              <lst name="facet_ranges"/>
            </lst>

        Facet: price, from 0 to 1000$, in 100$ gaps

            q=video&sort=price+desc&facet=true&facet.range=price&…facet.range.end=1000

            <lst name="counts">
              <int name="0.0">0</int>
              <int name="100.0">0</int>
              <int name="200.0">0</int>
              <int name="300.0">1</int> (Apple iPod 399$)
              <int name="400.0">1</int> (Asus Extreme 479$)
              <int name="500.0">0</int>
              <int name="600.0">1</int> (ATI Radeon 649$)
              <int name="700.0">0</int>
              <int name="800.0">0</int>
              <int name="900.0">0</int>
            </lst>
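
The range facet query on this slide is cut off in the transcript. The sketch below shows one way such a request could be assembled, assuming the standard `facet.range.start` and `facet.range.gap` parameters carried the "0" and "100" described in the slide text. It also recomputes the bucket counts locally from the three prices in the earlier result, treating each bucket as lower bound inclusive and upper bound exclusive. The function names are hypothetical.

```python
from urllib.parse import urlencode


def range_facet_params(field, start, end, gap):
    """Request parameters for a numeric range facet on one field."""
    return {
        "facet": "true",
        "facet.range": field,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
    }


def bucket_counts(values, start, end, gap):
    """Count values per [lower, lower + gap) bucket, keyed by the lower bound."""
    counts = {float(lower): 0 for lower in range(start, end, gap)}
    for v in values:
        if start <= v < end:
            lower = start + int((v - start) // gap) * gap
            counts[float(lower)] += 1
    return counts


prices = [649.99, 479.95, 399.0]  # the three "video" hits from the query slide
query = urlencode({"q": "video", "sort": "price desc",
                   **range_facet_params("price", 0, 1000, 100)})
counts = bucket_counts(prices, 0, 1000, 100)

print(query)
for lower, n in counts.items():
    print(lower, n)
```

The non-empty buckets are the ones starting at 300, 400 and 600, which agrees with the counts on the slide.
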
  • 7. Simple SOLR Example
      ● Filter Query
          ● Filter results are cached separately from the main query results (useful for big result sets)

        Filter Query: all products priced from 300 to 499 USD

            q=*&fl=name,price&fq=price:[300 TO 499]

            <result name="response" numFound="4" start="0">
              <doc>
                <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</str>
                <float name="price">350.0</float>
              </doc>
              <doc>
                <str name="name">Apple 60 GB iPod with Video Playback Black</str>
                <float name="price">399.0</float>
              </doc>
              <doc>
                <str name="name">Canon PowerShot SD500</str>
                <float name="price">329.95</float>
              </doc>
              <doc>
                <str name="name">ASUS Extreme N7800GTX/2DHTV (256 MB)</str>
                <float name="price">479.95</float>
              </doc>
            </result>
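
To make the caching point concrete, here is a toy Python model of why a filter that is cached on its own can be reused across different main queries. It is an illustration of the idea only and does not reflect how SOLR implements its caches; the class `ToySearcher` and its methods are hypothetical, and the documents are the ones listed on the slides.

```python
class ToySearcher:
    """Toy model: each filter's matching ids are cached apart from the main query."""

    def __init__(self, docs):
        self.docs = docs              # id -> document dict
        self.filter_cache = {}        # filter string -> set of matching ids
        self.filter_evaluations = 0   # times a filter was actually computed

    def _filter_ids(self, key, predicate):
        if key not in self.filter_cache:
            self.filter_evaluations += 1
            self.filter_cache[key] = {i for i, d in self.docs.items() if predicate(d)}
        return self.filter_cache[key]

    def search(self, term, fq_key, fq_predicate):
        """Match `term` against names, then keep only ids allowed by the filter."""
        allowed = self._filter_ids(fq_key, fq_predicate)
        hits = [i for i, d in self.docs.items()
                if term == "*" or term.lower() in d["name"].lower()]
        return [self.docs[i]["name"] for i in hits if i in allowed]


docs = {
    1: {"name": "Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300", "price": 350.0},
    2: {"name": "Apple 60 GB iPod with Video Playback Black", "price": 399.0},
    3: {"name": "Canon PowerShot SD500", "price": 329.95},
    4: {"name": "ASUS Extreme N7800GTX/2DHTV (256 MB)", "price": 479.95},
    5: {"name": "ATI Radeon X1900 XTX 512 MB PCIE Video Card", "price": 649.99},
}


def price_filter(d):
    return 300 <= d["price"] <= 499


searcher = ToySearcher(docs)
everything = searcher.search("*", "price:[300 TO 499]", price_filter)
ipods = searcher.search("ipod", "price:[300 TO 499]", price_filter)
print(len(everything), ipods, searcher.filter_evaluations)
```

Two different main queries share one filter, so the filter is evaluated once and then served from the cache.
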
  • 8. Simple SOLR Example
      ● Spatial Query
          ● Store data:
              – <field name="store">45.17614,-93.87341</field> <!-- Buffalo store -->
              – <field name="store">40.7143,-74.006</field> <!-- NYC store -->
              – <field name="store">37.7752,-122.4232</field> <!-- San Francisco store -->
          ● We are at 45.15,-93.85 (3.437 km from the Buffalo store)
          ● Find all products in a store within 5 km of our position:

            QUERY: &fl=name,store&q=*:*&fq={!geofilt%20pt=45.15,-93.85%20sfield=store%20d=5}

            "response":{"numFound":3,"start":0,"docs":[
              { "name":"Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300",
                "store":"45.17614,-93.87341"},
              { "name":"Belkin Mobile Power Cord for iPod w/ Dock",
                "store":"45.18014,-93.87741"},
              { "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM",
                "store":"45.18414,-93.88141"}]
            }
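
The distances on this slide can be sanity-checked outside SOLR. The sketch below uses the haversine great-circle formula with an assumed mean Earth radius of 6371 km; this is one common way to compute such distances and is not claimed to be exactly what the `geofilt` filter does internally. The function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def parse_point(text):
    """Turn a "lat,lon" string, as stored in the index, into a float pair."""
    lat, lon = text.split(",")
    return float(lat), float(lon)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


here = (45.15, -93.85)
stores = {
    "Buffalo": "45.17614,-93.87341",
    "NYC": "40.7143,-74.006",
    "San Francisco": "37.7752,-122.4232",
}

distances = {name: haversine_km(here, parse_point(p)) for name, p in stores.items()}
nearby = [name for name, d in distances.items() if d <= 5]

for name, d in distances.items():
    print(f"{name}: {d:.3f} km")
print(nearby)
```

Only the Buffalo store falls inside the 5 km radius, at a distance of roughly 3.4 km, in line with the figure quoted on the slide.
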
  • 9. SOLR Features
      ● SOLR Cloud
          ● Cluster configuration using ZooKeeper
          ● Easy sharding and failover management
          ● Self-healing, no single point of failure
      ● SOLR Cell (aka ExtractingRequestHandler)
          ● Tika integration for binary document parsing
          ● Parses DOC, PDF, XLS, MIME, etc.
      ● DataImportHandlers
          ● Automatically fetch and index SQL databases, e-mails, RSS feeds, files in a folder, etc.
  • 10. SOLR Features
      ● Multiple Solr cores
          ● Many index collections in the same server
          ● Different schema definitions for each collection
          ● Different configurations for storage, replication, etc.
      ● Caching
          ● Recurrent searches are cached, which improves speed
          ● Advanced warming techniques
          ● Adding content triggers just a partial cache update
      ● Advanced
          ● Language detection
          ● Natural Language Processing
          ● Clustering to scale both search and document retrieval
  • 11. SOLR Cloud
  • 12. SOLR Tika integration
      ● SOLR Cell embeds Tika for binary file parsing
      ● Tika parses DOC, PDF, XLSX, HTML... and represents the content as XHTML, JSON or CSV
          ● Full list of accepted formats :
          ● For some files, it can index only the metadata (MP3, JPG, AVI)
      ● SOLR Cell internally retrieves the Tika output and stores it so we can search it
      ● SOLR does not store the original binary file
  • 13. SOLR Addons
      ● Admin Interface
  • 14. SOLR Addons
      ● Web Interface (Solritas)
  • 15. SOLR Use Cases
      ● Liferay Search
          ● As Liferay already uses Lucene, we can connect it to a SOLR server
          ● Offloads the Liferay server and lets the SOLR cluster handle all the user searches in the portal
      ● Magento E-Commerce
          ● Avoids using MySQL for searching
          ● Better search results
          ● Better overall performance
      ● Alfresco Search
          ● Currently, Alfresco recommends setting up SOLR from the beginning
          ● By default, Lucene+Tika is used internally