Tutorial on developing a Solr search component plugin



In this set of slides we give a step-by-step tutorial on how to develop a fully functional Solr search component plugin. Additionally, we provide links to the full source code, which can be used as a template to rapidly start creating your own search components.

Published in: Technology, Education


  1. andrew.janowczyk@searchbox.com
  2. Solr is
     ◦ Blazing fast open source enterprise search platform
     ◦ Lucene-based search server
     ◦ Written in Java
     ◦ Has REST-like HTTP/XML and JSON APIs
     ◦ Extensive plugin architecture
     http://lucene.apache.org/solr/
  3. • Solr allows for the development of plugins which provide advanced operations
     • Types of plugins:
       ◦ RequestHandlers: use URL parameters and return their own response
       ◦ SearchComponents: responses are embedded in other responses (such as /select)
       ◦ ProcessFactory (UpdateRequestProcessorFactory in Solr’s API): the response is stored into a field along with the document at index time
  4. • A quick tutorial on how to program a SearchComponent to
       ◦ Be initialized
       ◦ Parse configuration file arguments
       ◦ Do something useful on a search request (count some words in the indexed documents)
       ◦ Format and return a response
     • We’ll name our plugin “DemoSearchComponent” and show how to add it to solrconfig.xml for loading
  5. • In the next slide, we’ll specify a list called “words”; each entry in the list is a string named “word”
     • We want to load these specific words and then count them in every document of a query’s result set
     • Ex: the config file has “body”, “fish”, “dog”
       ◦ Indexed document has: dog body body body fish fish fish fish orange
       ◦ Result should be: body=3.0 fish=4.0 dog=1.0
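The counting described on this slide can be sketched in plain Java, without any Solr classes. This is only an illustration under stated assumptions: the class name `WordCountSketch` and the method `countWords` are hypothetical and do not appear in the plugin, and words that never occur are reported as 0.0, which is a choice made here rather than something the slides specify.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original plugin source.
final class WordCountSketch {

    // Counts how often each configured word occurs in a whitespace-separated text.
    // Every configured word gets an entry; words that never occur stay at 0.0.
    static Map<String, Double> countWords(List<String> words, String text) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.put(word, 0.0);
        }
        for (String token : text.split("\\s+")) {
            if (counts.containsKey(token)) {
                counts.merge(token, 1.0, Double::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("body", "fish", "dog");
        // Prints {body=3.0, fish=4.0, dog=1.0}
        System.out.println(countWords(words, "dog body body body fish fish fish fish orange"));
    }
}
```

A `LinkedHashMap` keeps the words in configuration order, which matches the order shown in the expected result on the slide.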
  6. <searchComponent class="com.searchbox.DemoSearchComponent" name="democomponent">
       <str name="field">myfield</str>
       <lst name="words">
         <str name="word">body</str>
         <str name="word">fish</str>
         <str name="word">dog</str>
       </lst>
     </searchComponent>

     • We tell Solr the name of the class which has our component
     • Variables will be loaded from this section during the init method
     • We set a default field for analyzing the documents
     • We specify a list of words we’d like to have counts of
  7. • We can see that we’re asking Solr to load com.searchbox.DemoSearchComponent
     • This class will be the output of our project, packaged as a .jar file
     • Copy the .jar file to the lib directory in the Solr installation so that Solr can find it
     • That’s it!
  8. package com.searchbox;

     import java.io.IOException;
     import java.util.Date;
     import java.util.HashMap;
     import java.util.HashSet;
     import java.util.List;
     import java.util.Set;
     import java.util.logging.Level;
     import org.apache.lucene.document.Document;
     import org.apache.lucene.index.IndexableField;
     import org.apache.solr.common.SolrException;
     import org.apache.solr.common.params.SolrParams;
     import org.apache.solr.common.util.NamedList;
     import org.apache.solr.common.util.SimpleOrderedMap;
     import org.apache.solr.core.SolrCore;
     import org.apache.solr.core.SolrEventListener;
     import org.apache.solr.handler.component.ResponseBuilder;
     import org.apache.solr.handler.component.SearchComponent;
     import org.apache.solr.schema.SchemaField;
     import org.apache.solr.search.DocIterator;
     import org.apache.solr.search.DocList;
     import org.apache.solr.search.SolrIndexSearcher;
     import org.apache.solr.util.plugin.SolrCoreAware;
     import org.slf4j.Logger;
     import org.slf4j.LoggerFactory;

     Just some of the common packages we’ll need to import to get things rolling!
  9. public class DemoSearchComponent extends SearchComponent {

         private static final Logger LOGGER = LoggerFactory.getLogger(DemoSearchComponent.class);

         // Basic statistics, reported later through getStatistics()
         volatile long numRequests;
         volatile long numErrors;
         volatile long totalRequestsTime;
         volatile String lastnewSearcher;
         volatile String lastOptimizeEvent;

         // Values loaded from solrconfig.xml in init()
         protected String defaultField;
         private List<String> words;

     • We specify that our class extends SearchComponent, so we know we’re in business!
     • We decide that we’ll keep track of some basic statistics for future usage
       ◦ Number of requests/errors
       ◦ Total time
     • Make a variable to store our defaultField and our words
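One caveat about counters like these: `++` on a `volatile long` is a read-modify-write and is not atomic, so increments from concurrent requests can be lost. A minimal sketch of an alternative using `java.util.concurrent.atomic.AtomicLong` follows. The class `RequestStatsSketch` and its method names are hypothetical and are not part of the plugin's source; the slides themselves use plain volatile fields.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for the statistics; the original component uses volatile longs.
final class RequestStatsSketch {
    private final AtomicLong numRequests = new AtomicLong();
    private final AtomicLong numErrors = new AtomicLong();
    private final AtomicLong totalRequestsTime = new AtomicLong();

    // Records one finished request and the time it took.
    void recordRequest(long elapsedMillis) {
        numRequests.incrementAndGet();
        totalRequestsTime.addAndGet(elapsedMillis);
    }

    void recordError() {
        numErrors.incrementAndGet();
    }

    long requests() { return numRequests.get(); }
    long errors() { return numErrors.get(); }
    long totalTimeMillis() { return totalRequestsTime.get(); }

    public static void main(String[] args) throws InterruptedException {
        RequestStatsSketch stats = new RequestStatsSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    stats.recordRequest(1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(stats.requests()); // 4000: no increments are lost
    }
}
```

For statistics that are only displayed in an admin panel, an occasional lost update may be acceptable; the atomic variant simply removes that source of drift.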
  10. • Initialization is called when the plugin is first loaded
      • This most commonly occurs when Solr is started up
      • At this point we can load things from file (models, serialized objects, etc.)
      • We have access to the variables set in solrconfig.xml
  11. • We have chosen to pass a list called “words” containing the words “body”, “fish”, “dog” that we’d like to count
      • During initialization we need to load this list from solrconfig.xml and store it locally
  12. @Override
      public void init(NamedList args) {
          super.init(args);
          defaultField = (String) args.get("field");
          if (defaultField == null) {
              throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                      "Need to specify the default field for analysis");
          }
          // Guard against a missing <lst name="words"> before reading its entries
          NamedList wordList = (NamedList) args.get("words");
          words = (wordList == null) ? null : wordList.getAll("word");
          if (words == null || words.isEmpty()) {
              throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                      "Need to specify at least one word in searchComponent config!");
          }
      }

      Notice that we’ve loaded the list “words” and then all of its entries called “word” and put them into the class-level variable words. We’ve also identified our defaultField.
  13. • There are 2 phases in a SearchComponent
        ◦ Prepare
        ◦ Process
      • During a query the prepare method is called on all components before any work is done
      • This allows modifying, adding or subtracting variables or components in the stack
      • Afterwards, the process methods are called for the components in the exact order specified by solrconfig.xml
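The ordering of the two phases can be illustrated with a small simulation in plain Java. This is a sketch of the call order only, not Solr's actual handler code: the `Component` interface and the `run` method below are hypothetical stand-ins, and the component names are just labels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of the two-phase call order; not Solr's implementation.
final class TwoPhaseSketch {

    // Stand-in for a search component with the two phases described above.
    interface Component {
        void prepare(List<String> log);
        void process(List<String> log);
    }

    // Creates a component that only records when each of its phases is called.
    static Component named(String name) {
        return new Component() {
            @Override
            public void prepare(List<String> log) {
                log.add("prepare:" + name);
            }

            @Override
            public void process(List<String> log) {
                log.add("process:" + name);
            }
        };
    }

    // Calls prepare() on every component first, then process() on every
    // component, both times in the configured order.
    static List<String> run(List<Component> components) {
        List<String> log = new ArrayList<>();
        for (Component c : components) {
            c.prepare(log);
        }
        for (Component c : components) {
            c.process(log);
        }
        return log;
    }

    public static void main(String[] args) {
        // Prints [prepare:query, prepare:democomponent, process:query, process:democomponent]
        System.out.println(run(Arrays.asList(named("query"), named("democomponent"))));
    }
}
```

Because every prepare call finishes before the first process call, a later component can still adjust request state that an earlier component will read while processing.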
  14. @Override
      public void prepare(ResponseBuilder rb) throws IOException {
          // none necessary
      }

      Nothing going on here, but we need to override it, otherwise we can’t extend SearchComponent.
  15. @Override
      public void process(ResponseBuilder rb) throws IOException {
          numRequests++;
          SolrParams params = rb.req.getParams();
          long lstartTime = System.currentTimeMillis();
          SolrIndexSearcher searcher = rb.req.getSearcher();
          NamedList response = new SimpleOrderedMap();
          String queryField = params.get("field");
          String field = null;
          // Start from the configured default...
          if (defaultField != null) {
              field = defaultField;
          }
          // ...and let a "field" URL parameter override it
          if (queryField != null) {
              field = queryField;
          }
          if (field == null) {
              LOGGER.error("Fields aren't defined, not performing counting.");
              return;
          }

      • We start off by keeping track, in a volatile variable, of the number of requests we’ve seen (for use later in statistics), and we’d like to know how long the process takes, so we note the time
      • We create a new NamedList which will hold this component’s response
      • We look at the URL parameters to see if there is a “field” variable present. We have set this up to override the default we loaded from the config file
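The precedence between the URL parameter and the configured default can be written as a single expression. A minimal sketch, assuming a hypothetical helper `resolveField` that is not part of the plugin source:

```java
// Hypothetical helper illustrating the override rule; not part of the plugin source.
final class FieldResolutionSketch {

    // Returns the request-level field if one was given, otherwise the configured
    // default; returns null when neither is set, in which case the caller skips counting.
    static String resolveField(String queryField, String defaultField) {
        return queryField != null ? queryField : defaultField;
    }

    public static void main(String[] args) {
        System.out.println(resolveField("title", "myfield")); // title
        System.out.println(resolveField(null, "myfield"));    // myfield
        System.out.println(resolveField(null, null));         // null
    }
}
```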
  16. DocList docs = rb.getResults().docList;
      if (docs == null || docs.size() == 0) {
          LOGGER.debug("No results");
          return; // nothing to count, and avoids dereferencing a null list below
      }
      LOGGER.debug("Doing This many docs:\t" + docs.size());
      Set<String> fieldSet = new HashSet<String>();
      SchemaField keyField = rb.req.getCore().getSchema().getUniqueKeyField();
      if (null != keyField) {
          fieldSet.add(keyField.getName());
      }
      fieldSet.add(field);

      • Since the search has already been completed, we get a list of documents which will be returned
      • We also need to pull from the schema the field which contains the unique id. This will let us correlate our results with the rest of the response
  17. DocIterator iterator = docs.iterator();
      while (iterator.hasNext()) {
          try {
              int docId = iterator.nextDoc();
              HashMap<String, Double> counts = new HashMap<String, Double>();
              Document doc = searcher.doc(docId, fieldSet);
              // getFields returns every value of a multi-valued field
              IndexableField[] multifield = doc.getFields(field);
              for (IndexableField singlefield : multifield) {
                  for (String string : singlefield.stringValue().split(" ")) {
                      if (words.contains(string)) {
                          Double oldcount = counts.containsKey(string) ? counts.get(string) : 0.0;
                          counts.put(string, oldcount + 1);
                      }
                  }
              }
              // Fall back to the internal Lucene doc id if the schema has no unique key
              String id = (keyField != null)
                      ? doc.getField(keyField.getName()).stringValue()
                      : String.valueOf(docId);
              NamedList<Double> docresults = new NamedList<Double>();
              for (String word : words) {
                  // Report 0.0 rather than null for words absent from this document
                  docresults.add(word, counts.containsKey(word) ? counts.get(word) : 0.0);
              }
              response.add(id, docresults);
          } catch (IOException ex) {
              numErrors++;
              LOGGER.error("Failed to load document for word counting", ex);
          }
      }

      • Get a document iterator to look through all docs
      • Set up the count variable for this doc
      • Load the document through the searcher
      • Get the value of the field
      • BEWARE: if it is a multifield, using getField will only return the first instance, not ALL instances
      • Do our basic word counting
      • Get the document’s unique id from the keyfield
      • Add each word to the results for the doc
      • Add the doc result to the overall response, using its id value
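Splitting on a single space treats “body.” and “body” as different tokens, so a word followed by punctuation is not counted by the loop above. One possible variation, which is not part of the original component, is to normalize tokens before comparing them. The sketch below assumes a hypothetical helper `tokenize`; it lower-cases the text and splits on anything that is not a letter or digit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical tokenizer variation; the original component uses split(" ").
final class TokenizeSketch {

    // Lower-cases the text and splits on any run of characters that are
    // neither letters nor digits, so "body." and "Body" both yield "body".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("the fish had a small body. the dog likes to eat fish");
        System.out.println(Collections.frequency(tokens, "body")); // 1
        System.out.println(Collections.frequency(tokens, "fish")); // 2
        System.out.println(Collections.frequency(tokens, "dog"));  // 1
    }
}
```

In a real deployment the field's own Solr analyzer is usually the better source of tokens; this sketch only shows the effect of stripping punctuation.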
  18.     rb.rsp.add("demoSearchComponent", response);
          totalRequestsTime += System.currentTimeMillis() - lstartTime;
      }

      • Add all results to the final response
      • The name we pick here will show up in the Solr output
      • Note down how long it took for the entire process
  19. @Override
      public String getDescription() {
          return "Searchbox DemoSearchComponent";
      }

      @Override
      public String getVersion() {
          return "1.0";
      }

      @Override
      public String getSource() {
          return "http://www.searchbox.com";
      }

      @Override
      public NamedList<Object> getStatistics() {
          NamedList<Object> all = new SimpleOrderedMap<Object>();
          all.add("requests", "" + numRequests);
          all.add("errors", "" + numErrors);
          // totalRequestsTime is the field accumulated in process()
          all.add("totalTime(ms)", "" + totalRequestsTime);
          return all;
      }
      } // end of DemoSearchComponent

      • In order to have a production-grade plugin, users expect to see certain pieces of information available in their Solr admin panel
      • Description, version and source are just Strings
      • We see getStatistics() actually uses the volatile variables we were keeping track of before, sticks them into another named list and returns them. These appear under the statistics panel in Solr.
      That’s it!
  20. <requestHandler name="/demoendpoint" class="solr.SearchHandler">
        <arr name="last-components">
          <str>democomponent</str>
        </arr>
      </requestHandler>

      We need some way to run our SearchComponent, so we’ll add a quick requestHandler to test it. This is done simply by overriding the normal SearchHandler and telling it to run the component we defined on an earlier slide. Of course you could use your component directly in the select handler and/or add it to a chain of other components! Solr is super versatile!
  21. Query: …*%3A*&wt=xml&rows=2&fl=id,myfield

      <response>
        <lst name="responseHeader">
          <int name="status">0</int>
          <int name="QTime">79</int>
        </lst>
        <result name="response" numFound="13262" start="0">
          <doc>
            <str name="id">f73ca075-3826-45d5-85df-64b33c760efc</str>
            <arr name="myfield">
              <str>dog body body body fish fish fish fish orange</str>
            </arr>
          </doc>
          <doc>
            <str name="id">bc72dbef-87d1-4c39-b388-ec67babe6f05</str>
            <arr name="myfield">
              <str>the fish had a small body. the dog likes to eat fish</str>
            </arr>
          </doc>
        </result>
        <lst name="demoSearchComponent">
          <lst name="f73ca075-3826-45d5-85df-64b33c760efc">
            <double name="body">3.0</double>
            <double name="fish">4.0</double>
            <double name="dog">1.0</double>
          </lst>
          <lst name="bc72dbef-87d1-4c39-b388-ec67babe6f05">
            <double name="body">1.0</double>
            <double name="fish">2.0</double>
            <double name="dog">1.0</double>
          </lst>
        </lst>
      </response>

      • Query results: the <result name="response"> element
      • Our results: the <lst name="demoSearchComponent"> element
      • Same order + ids for correlation
  22. • Because we’ve overridden the getStatistics() method, we can get real-time stats from the admin panel!
      • In this case, since it’s a component of the SearchHandler, our fields are concatenated with the other statistics
  23. Happy Developing!
      Full source code available at: http://www.searchbox.com/developing-a-solr-plugin/