Apache solr liferay

  • 804 views
Uploaded on

A 2009 presentation which I just found in archives

A 2009 presentation which I just found in archives

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
804
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
42
Comments
0
Likes
1

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Apache Solr Enterprise search platform from the Apache Lucene projectRivet Logic Corporation1800 Alexander Bell DriveSuite 400Reston, VA 20191Ph: 703.955.3480 Fax: 703.234.7711
  • 2. What is Solr? ● Search Server ● Built upon Apache Lucene ● Fast, very ● Scalable, query load and collection size ● Interoperable ● Extensible ● Lucene power exposed over HTTP ● Spell checking, highlighting, faceting and etc. ● Caching ● Replication ● Distributed search
  • 3. How stuff works?
  • 4. schema.xml● Field types ○ <fieldType name="text" class="solr.TextField" indexed="true" />● Fields ○ <field name="technologies" type="text" indexed="true" stored="true" multiValued="true"/>● Unique key (optional) ○ <uniqueKey>id</uniqueKey>● copy fields ○ <copyField source="developers" dest="df"/>● dynamic fields ○ <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>● similarity configuration ○ Similarity is the scoring routine for each document vs. a query
  • 5. solrconfig.xml● Lucene indexing parameters ○ <mergeFactor>10</mergeFactor> ○ <ramBufferSizeMB>32</ramBufferSizeMB>● Cache settings ○ <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount=" 32"/>● Request handler configuration ○ <requestHandler name="dismax" class="solr.SearchHandler" >● HTTP cache settings ○ <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">● Search components, response writers, query parsers ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> ○ <queryResponseWriter name="velocity" class="org.apache.solr.request. VelocityResponseWriter"/> ○ <queryParser name="lucene" class="org.apache.solr.search.LuceneQParserPlugin"/>
  • 6. Request Handler<requestHandler name="/itas" class="solr.SearchHandler"> <lst name="defaults"> <str name="v.template">browse</str> <str name="v.properties">velocity.properties</str> <str name="title">Solritas</str> <str name="wt">velocity</str> <str name="defType">dismax</str> <str name="q.alt">*:*</str> <str name="rows">10</str> <str name="fl">*,score</str> <str name="facet">on</str> <str name="facet.field">df</str> <str name="facet.mincount">1</str> <str name="hl">true</str> <str name="hl.fl">developers</str> <str name="qf"> text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 </str> </lst> </requestHandler>
  • 7. Response Writer● A Response Writer generates the formatted response of a search.● The wt parameter selects the Response Writer to be used● json, php, phps, python, ruby, xml, xslt, velocity <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter"> <int name="xsltCacheLifetimeSeconds">5</int> </queryResponseWriter>
  • 8. Analyzers, Tokenizers, Filters● The Analyzer class is a native Lucene concept that determines how tokens are produced from a piece of text <fieldType name="nametext" class="solr.TextField"> <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/> </fieldType>● The job of a tokenizer is to break up a stream of text into tokens● A token looks at each Token in the stream sequentially and decides whether to pass it along, replace it or discard it <fieldType name="text" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> </analyzer> </fieldType>
  • 9. Other features● Highlighting ○ &hl=true&hl.fl=developers● Synonyms ○ <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>● Spell check ○ The spell check component can return a list of alternative spelling suggestions. ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">● Content Streams ○ Allows Solr server to fetch local or remote data itself. Must enable remote streaming in solrconfig.xml● Solr Cell ○ leveraging Tika, extracts and indexes rich documents such as Word, PDF, HTML, and many other types● More like this ○ http://wiki.apache.org/solr/MoreLikeThis
  • 10. Indexing with solrJSolrServer solr = new CommonsHttpSolrServer( new URL("http://localhost:8983/solr"));SolrInputDocument doc = new SolrInputDocument();doc.addField("id", "EXAMPLEDOC01");doc.addField("title", "NOVAJUG SolrJ Example");solr.add(doc);solr.commit(); // after a batch, not per documentsolr.optimize(); // periodically, if/when needed
  • 11. Data Import Handler● Indexes relational database, XML data, and e-mail sources● Supports full and incremental/delta indexing● Highly extensible with custom data sources, transformers, etc● http://wiki.apache.org/solr/DataImportHandler
  • 12. Replication● Master is polled● Replicant pulls Lucene index and optionally also Solr configuration files● Query throughput scaling: replicate and load balance● http://wiki.apache.org/solr/SolrReplication
  • 13. Demo● Download solr ○ http://mirrors.ibiblio.org/pub/mirrors/apache/lucene/solr/1.4.0/● Start solr ○ cd <solr_home>/example ○ java -jar start.jar● Post documents ○ cd <solr_home>/example/exampledocs ○ java -jar post.jar *.xml ○ java -jar post.jar cw.xml● Access Solr ○ http://localhost:8983/solr/admin/● Querying solr ○ http://localhost:8983/solr/select/?q=binesh ○ http://localhost:8983/solr/select/?q=binny ○ http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1 ○ http://localhost:8983/solr/itas/● Luke ○ http://www.getopt.org/luke/
  • 14. Liferay + Solr: Motivation● Centralizing search index in clustered Liferay environment● Performance improvement ○ Re-indexing costs too much for large DBs ○ Often time indexes of Liferay deployments in a cluster are not synchronized
  • 15. Liferay + Solr: Configuration 1Install Solr (http://lucene.apache.org/solr)Setting up environment variables ● SOLR_HOME = /${solr installed folder} ● JAVA_OPTS = "$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME/example/solr/data"solr.xml ● Place the file under ${tomcat}/conf/Catalina/localhost/ with following content <?xml version="1.0" encoding="utf-8"> <Context docBase="$SOLR_HOME/apache-solr-1.4.0.war" debug="0" crossContext="true"> <Environment name="solr/home" type="java.lang.String" value="$SOLR_HOME" override="true" /> </Context>
  • 16. Liferay + Solr: Configuration 2schema.xml ● This file tells Solr how to index the data coming from Liferay, and can be customized for your installation. ● Copy this file from solr-web plugin to $SOLR_HOME/conf (you may have to create the conf directory) in your Solr home folder.... <fields><field name="comments" type="text" indexed="true" stored="true" /><field name="content" type="text" indexed="true" stored="true" /><field name="description" type="text" indexed="true" stored="true" /><field name="name" type="text" indexed="true" stored="true" /><field name="properties" type="text" indexed="true" stored="true" /><field name="title" type="text" indexed="true" stored="true" /><field name="uid" type="string" indexed="true" stored="true" /><field name="url" type="text" indexed="true" stored="true" /><field name="userName" type="text" indexed="true" stored="true" /><field name="version" type="text" indexed="true" stored="true" /><dynamicField name="*" type="string" indexed="true" stored="true" /></fields><uniqueKey>uid</uniqueKey><defaultSearchField>content</defaultSearchField> ... <copyField source="comments" dest="content"/> ... ...
  • 17. Liferay + Solr: Configuration 3Copy WAR file ● Copy the WAR file $SOLR_HOME/dist/apache-solr-${solr.version}.war into $SOLR_HOME/example; where ${solr.version} represents Solr version number, i.e., 1.4.0.Start Liferay/tomcat ● Solr will be picked up and "solr" will be deployed automatically under ${tomcat}/webapps folderInstall solr-web Liferay plugin ● Latest Liferay plugin can be checked out from the following locationhttp://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web ● Build the checked out plugin and deploy it
  • 18. Liferay + Solr: Configuration 4Final Step ● We need to rebuild Liferay search indexes ● Control Panel > Server Administration
  • 19. Liferay + Solr: How it works solr-spring.xml (from solr-web plugin) ... <bean id="solrServer" class="com.liferay.portal.search.solr.server.BasicAuthSolrServer"> <constructor-arg type="java.lang.String" value="http://localhost:8080/solr" /> </bean> <bean id="indexSearcher.solr" class="com.liferay.portal.search.solr.SolrIndexSearcherImpl"><property name="solrServer" ref="solrServer" /> </bean> <bean id="indexWriter.solr" class="com.liferay.portal.search.solr.SolrIndexWriterImpl"><property name="commit" value="true" /><property name="solrServer" ref="solrServer" /> </bean> ...
  • 20. Liferay + Solr: Back to the default?● Simply undeploy solr-web plugin● Rebuild search indexes using the control panel described in the previous step