Bringing reusability to enterprise search


Using & working with Solr - the reusable enterprise search engine.

Published in: Technology

  1. Bringing Reusability to Enterprise Search

     Using Solr to build a reusable enterprise search engine.

     A Collabor Labs Technology paper, May 2011

     This whitepaper discusses the high-level technical aspects of using Solr to bring reusability to enterprise search implementations.

     Brahmaji Pusuluri, Sr. Software
  2. Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search.

     Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. It uses HTTP request processing for indexing and querying documents, so an application anywhere can index and query documents over the Internet via XML over HTTP, using the URL of your Solr search server. It is also a highly optimized search server, with caching and replication to other Solr search servers, and it can index rich-text documents (e.g. Word, PDF).

     Once Solr is installed successfully, modify the following files to match the project requirements.

     solrconfig.xml: contains most of the parameters for configuring Solr itself.

     schema.xml: contains all of the details about which fields your documents can contain, and how those fields should be dealt with when adding documents to the index or when querying those fields.

     Once the settings are done, you can send an XML file to Solr to index the data by using curl:

         curl http://localhost:8983/solr/update -F stream.file=/tmp/example.xml

     The example.xml file contains documents whose fields follow the format defined in schema.xml.

     Research by: Collabor Labs. May 2011. All trademarks belong to their respective owners.
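The paper does not show what example.xml looks like. As a minimal sketch, assuming Solr's standard XML update message format (an `add` element wrapping `doc` elements, each made of named `field` elements), such a file could be generated as follows. The helper name `build_add_xml` and the field names `id` and `name` are hypothetical; real field names must match those declared in schema.xml.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> update message from a list of {field: value} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Each key becomes a <field name="..."> element; the field name
            # is assumed to be declared in schema.xml.
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical document used only for illustration.
    print(build_add_xml([{"id": "1", "name": "First document"}]))
```

Saving the printed text as /tmp/example.xml would give a file of the kind the curl command above sends to the update handler. Special characters such as `&` are escaped by the XML library, which avoids hand-built strings producing malformed update messages.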
  3. Most applications store data in relational databases or XML files, and searching over such data is a common use case. The DataImportHandler is a Solr contrib that provides a configuration-driven way to import this data into Solr, both as "full builds" and as incremental delta imports.

     Edit your solrconfig.xml to add the request handler:

         <requestHandler name="/dataimport"
                         class="org.apache.solr.handler.dataimport.DataImportHandler">
           <lst name="defaults">
             <str name="config">data-config.xml</str>
           </lst>
         </requestHandler>

     The data-config.xml file contains the following:

         <dataConfig>
           <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                       url="jdbc:mysql://localhost/dbname"
                       user="user-name" password="password"/>
           <document>
             <!-- desc is a reserved word in MySQL, so the column name is quoted with backticks -->
             <entity name="name" query="select id, name, `desc` from mytable">
               <field column="id" name="solr_id"/>
               <field column="name" name="solr_name"/>
               <field column="desc" name="solr_desc"/>
               <entity name="inner" query="select details from another_table where id =${}">
                 <field column="details" name="solr_details"/>
               </entity>
             </entity>
           </document>
         </dataConfig>

     Run the full-import command to index the entire database:

         http://localhost:8983/solr/dataimport?command=full-import

     Run the delta-import command to index only the incremental changes:

         http://localhost:8983/solr/dataimport?command=delta-import

  4. Multiple cores let a single Solr instance host separate indexes, each with its own configuration and schema, for very different applications, while keeping the convenience of unified administration. Individual indexes remain fairly isolated, but you can manage them as a single application, create new indexes on the fly by spinning up new SolrCores, and even make one SolrCore replace another without ever restarting your servlet container.

     Edit solr.xml to declare the cores, for example:

         <solr persistent="true" sharedLib="lib">
           <cores adminPath="/admin/cores">
             <core name="application1" instanceDir="app1">
               <property name="dataDir" value="/app1/data" />
               <property name="configName" value="/app1/config.xml" />
               <property name="schemaName" value="/app1/schema.xml" />
             </core>
             <core name="application2" instanceDir="app2" />
           </cores>
         </solr>

     Run the full-import command to index the entire database in application1:

         http://localhost:8983/solr/application1/dataimport?command=full-import

     Run the delta-import command to index only the incremental changes in application1:

         http://localhost:8983/solr/application1/dataimport?command=delta-import

     Run the full-import command to index the entire database in application2:

         http://localhost:8983/solr/application2/dataimport?command=full-import

     Run the delta-import command to index only the incremental changes in application2:

         http://localhost:8983/solr/application2/dataimport?command=delta-import

     To search an index, request

         http://localhost:8983/solr/application1/select/?q=searchterm

     which returns an XML document with the results.

     In this way, a single Solr installation can be reused for multiple enterprise search implementations.

     For more information, contact: info@collabor.com
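The per-core import and search URLs in the paper all follow one pattern: the base URL, then the core name, then the handler path. A minimal sketch of that pattern is shown here, assuming the base URL and core names used in the paper; the helper names `dataimport_url` and `select_url` are hypothetical and are not part of Solr.

```python
from urllib.parse import urlencode

# Base URL taken from the examples in the paper.
SOLR_BASE = "http://localhost:8983/solr"


def dataimport_url(core, command, base=SOLR_BASE):
    """Return the URL that triggers a DataImportHandler command on one core."""
    if command not in ("full-import", "delta-import"):
        raise ValueError("unsupported command: " + command)
    return f"{base}/{core}/dataimport?" + urlencode({"command": command})


def select_url(core, term, base=SOLR_BASE):
    """Return the URL that searches one core's index for a term."""
    # urlencode escapes spaces and special characters in the search term.
    return f"{base}/{core}/select/?" + urlencode({"q": term})


if __name__ == "__main__":
    for core in ("application1", "application2"):
        print(dataimport_url(core, "full-import"))
        print(dataimport_url(core, "delta-import"))
    print(select_url("application1", "searchterm"))
```

Adding a new search application then only means declaring another core in solr.xml and passing its name to the same helpers, which is the reuse the paper argues for.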