Coffee at DBG - Solr introduction


An event conducted at DBG about Apache Solr as part of the Coffee at DBG program.

  1. 1. Prepared by Nithin S, Sajin TM, Digital Brand Group
  2. 2. Apache Solr is a search server written in Java, using the Java search library “Lucene”.
     - Open source
     - Get results through a web service as JSON/XML
     - UTF-8 support
  3. 3.  Ebay Hp Guardian Cisco At&t Intoit Ford
  4. 4. Text-based search library in Java
     - Fast and feature-rich, with an active Apache development community
     - Inverted index mechanism: content is indexed by the terms/words it contains, so looking up a term returns the documents that contain it
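The inverted index idea on this slide can be illustrated with a toy example. This is only a minimal sketch: it splits on whitespace and lowercases, whereas a real Lucene index also applies analyzers and stores positions and scoring data. The sample documents and the function names `build_inverted_index` and `search` are invented for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Return the ids of documents containing the term (empty list if none)."""
    return index.get(term.lower(), [])


# Hypothetical sample documents, invented for this illustration.
docs = {
    1: "Solr is a search server",
    2: "Lucene is a search library",
    3: "Jetty is a Java server",
}
index = build_inverted_index(docs)
print(search(index, "search"))  # [1, 2]
print(search(index, "server"))  # [1, 3]
```

The lookup is a single dictionary access per term, which is why this structure suits full-text search better than scanning every document.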
  5. 5. Server Solr 4.3.0 Java server containers ( Tomcat/Jetty Servers ) Java 1.6 and aboveClient Any system which can post and get data throughhttp
  6. 6. Terms
     - Schema - can be thought of as a DB table
     - Core - schema container
     - Collection - handles multiple cores
     - DIH - Data Import Handler
     - Request handler - StandardRequestHandler, DisMaxRequestHandler (multiple fields), IndexInfoRequestHandler
     - Response handler - XML, JSON, Python, Ruby
  7. 7. Start Solr: java -jar start.jar
     This will start up the Jetty application server on port 8983, and use your terminal to display the logging information from Solr.
     Index your data: java -jar post.jar *.xml
     Interface: http://localhost:8983/solr
  8. 8. The Solr Home directory typically contains the following sub-directories:
     - conf/ - This directory is mandatory and must contain your solrconfig.xml and schema.xml. Any other optional configuration files would also be kept here.
     - data/ - This directory is the default location where Solr will keep your index, and is used by the replication scripts for dealing with snapshots. You can override this location in conf/solrconfig.xml. Solr will create this directory if it does not already exist.
     - lib/ - This directory is optional. If it exists, Solr will load any Jars found in this directory and use them to resolve any "plugins" specified in your solrconfig.xml or schema.xml (e.g. Analyzers, Request Handlers, etc.). Alternatively you can use the <lib> syntax in conf/solrconfig.xml to direct Solr to your plugins. See the example conf/solrconfig.xml file for details.
  9. 9. PHP clients
     - solr-php-client
     - PECL extension for Solr
  10. 10. Field options: indexed, stored, multiValued, compressed
  11. 11. Index operations
     - add/update - allows you to add or update a document in Solr. Additions and updates are not available for searching until a commit takes place.
     - commit - tells Solr that all changes made since the last commit should be made available for searching.
     - optimize - restructures Lucene's files to improve performance for searching. Optimization is generally good to do when indexing has completed. If there are frequent updates, you should schedule optimization for low-usage times. An index does not need to be optimized to work properly. Optimization can be a time-consuming process.
     - delete - can be specified by id or by query. Delete by id deletes the document with the specified id; delete by query deletes all documents returned by a query.
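Each of these operations corresponds to a small XML message posted to Solr's update handler. The sketch below only builds the message strings and does not contact a server. The helper names and the sample document are hypothetical; the element names (`add`, `doc`, `field`, `delete`, `id`, `query`, `commit`, `optimize`) are those of Solr's XML update format.

```python
from xml.etree import ElementTree as ET


def add_message(doc):
    """Build an <add> message for one document given as a dict of field -> value(s)."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list or tuple becomes several <field> elements (a multiValued field).
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


def delete_by_id(doc_id):
    """Build a <delete> message that removes the document with the given id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message that removes every document matching the query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


COMMIT = "<commit/>"      # make pending changes searchable
OPTIMIZE = "<optimize/>"  # restructure index files; can be slow

# Hypothetical document, invented for this illustration.
print(add_message({"id": "book-1", "title": "Solr basics"}))
print(delete_by_id("book-1"))
print(delete_by_query("*:*"))
```

Building the XML with ElementTree rather than string concatenation means characters such as `&` and `<` in field values are escaped automatically.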
  12. 12. Supported formats: XML, JSON, CSV, or javabin. Supported document types are Microsoft Office docs and PDFs.
     Index a CSV file:
     curl http://localhost:8983/solr/collection1/update/csv -H "Content-type:text/csv; charset=utf-8" --data-binary @D:/Projects/solr-4.3.0/example/exampledocs/books.csv
     Commit:
     http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3E
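The same CSV upload can be prepared from code. This is a sketch using only Python's standard library; it constructs the HTTP request but does not send it, so it runs without a Solr server. The URL assumes the `collection1` core from the slide, and the CSV content and the function name `build_csv_update_request` are invented for illustration.

```python
import urllib.request

# Assumed endpoint, matching the collection1 core used in the slide.
SOLR_UPDATE_CSV = "http://localhost:8983/solr/collection1/update/csv"


def build_csv_update_request(csv_text, url=SOLR_UPDATE_CSV):
    """Build (but do not send) a POST request that uploads CSV data to Solr."""
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )


# Hypothetical CSV content, invented for this illustration.
request = build_csv_update_request("id,name\nbook-1,Solr basics\n")
print(request.get_method(), request.full_url)

# To actually send it (requires a running Solr instance):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

As with the curl version, the uploaded documents would only become searchable after a commit.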
  13. 13. Query parameters
     - q - The query to search with in Solr. See "Lucene QueryParser Syntax" in Resources for a full description of the syntax. Sorting information can be included by appending a semicolon and the name of an indexed, non-tokenized field (explained below). The default sort is score desc, which means sort by descending score.
       Example: q=myField:Java AND otherField:developerWorks; date asc
       This query searches the two fields specified and sorts the results based on a date field.
     - start - Specifies the starting offset into the result set. Useful for paging through results. The default value is 0.
       Example: start=15
       Skips the first fifteen results and returns results starting with the sixteenth ranked result.
     - rows - The maximum number of documents to return. The default value is 10.
       Example: rows=25
     - fq - Provides an optional filter query. Results of the query are restricted to those documents returned by the filter query. Filter queries are cached by Solr. They are very useful for improving the speed of complex queries.
       Example: any valid query that could be passed in the q parameter, not including sort information.
     - hl - When hl=true, highlight snippets in the query response. Default is false. See the Solr Wiki section on highlighting parameters for more options (in Resources).
       Example: hl=true
     - fl - Specifies, as a comma-separated list, the set of fields that should be returned in the document results. "*" is the default and means all fields. "score" indicates the score should be returned as well.
       Example: *,score
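The parameters above are ordinary URL query parameters, so a request URL can be assembled with standard URL encoding. The following is a minimal sketch, assuming the `/solr/select` endpoint used elsewhere in these slides; the function name `build_select_url` and its defaults are illustrative, not part of Solr.

```python
from urllib.parse import urlencode

# Assumed endpoint, taken from the select URLs used in these slides.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(q, start=0, rows=10, fq=None, fl=None, hl=False, base=SOLR_SELECT):
    """Assemble a select URL from the q, start, rows, fq, fl and hl parameters."""
    params = [("q", q), ("start", start), ("rows", rows)]
    if fq:
        params.append(("fq", fq))
    if fl:
        params.append(("fl", ",".join(fl)))  # fl is a comma-separated field list
    if hl:
        params.append(("hl", "true"))
    # urlencode percent-escapes reserved characters such as ':' and spaces.
    return base + "?" + urlencode(params)


url = build_select_url(
    "myField:Java",
    start=15,
    rows=25,
    fq="otherField:developerWorks",
    fl=["*", "score"],
    hl=True,
)
print(url)
```

Putting the restriction in `fq` rather than in `q` lets Solr cache the filter separately, as described for the fq parameter.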
  14. 14. Query examples
     - Full text search: http://localhost:8983/solr/select?q=Searchtext
     - Search only within a field: http://localhost:8983/solr/select?q=fieldname:searchtext
     - Control which fields are displayed in the result: http://localhost:8983/solr/select?q=video&fl=id,category
     - Provide ranges to fields: http://localhost:8983/solr/select?q=price:[0%20TO%20400]&fl=id,name,price
     - More like this (MLT): http://localhost:8983/solr/select?q=Searchtext&mlt=true&mlt.fl=headline&mlt.mindf=1&mlt.mintf=1&fl=id,score&rows=100
     More information on how this works and the options available can be found at
  15. 15. http://localhost:8983/solr/query?q=camera&facet=true&facet.field=manu
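The URL above asks Solr for facet counts on the `manu` field. As a sketch of how a client might read the result, the helper below assumes the response is JSON and that facet counts arrive in Solr's flat list layout (value, count, value, count, ...). The sample response is invented for illustration and is not real Solr output.

```python
import json


def facet_counts(response, field):
    """Turn a flat [value, count, value, count, ...] facet list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical response body, invented for this illustration.
sample = json.loads(
    '{"facet_counts": {"facet_fields": {"manu": ["canon", 2, "sony", 1]}}}'
)
print(facet_counts(sample, "manu"))  # {'canon': 2, 'sony': 1}
```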
  16. 16. Hit highlighting, auto-suggest, spell suggestion, spatial search
  17. 17. Removing data from the index:
     curl http://localhost:8983/solr/collection1/update -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"