Apache solr liferay
A 2009 presentation which I just found in archives

    Apache solr liferay: Presentation Transcript

    • Apache Solr: Enterprise search platform from the Apache Lucene project
      Rivet Logic Corporation
      1800 Alexander Bell Drive, Suite 400
      Reston, VA 20191
      Ph: 703.955.3480 Fax: 703.234.7711
    • What is Solr?
      ● Search server
      ● Built upon Apache Lucene
      ● Very fast
      ● Scalable in both query load and collection size
      ● Interoperable
      ● Extensible
      ● Lucene power exposed over HTTP
      ● Spell checking, highlighting, faceting, etc.
      ● Caching
      ● Replication
      ● Distributed search
    • How stuff works?
    • schema.xml
      ● Field types
        ○ <fieldType name="text" class="solr.TextField" indexed="true" />
      ● Fields
        ○ <field name="technologies" type="text" indexed="true" stored="true" multiValued="true"/>
      ● Unique key (optional)
        ○ <uniqueKey>id</uniqueKey>
      ● Copy fields
        ○ <copyField source="developers" dest="df"/>
      ● Dynamic fields
        ○ <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
      ● Similarity configuration
        ○ Similarity is the scoring routine for each document vs. a query
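The dynamic field entry above relies on a wildcard in the field name. As a rough illustration of that matching idea only, here is a minimal Java sketch. It assumes the pattern carries a single `*` at the start or at the end; the class and method names are hypothetical and are not part of Solr.

```java
/** Hypothetical helper: checks a field name against a dynamicField-style pattern. */
final class DynamicFieldSketch {

    /** Returns true if fieldName matches a pattern with one leading or trailing '*'. */
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            // "*_dt" matches any name ending in "_dt"; a bare "*" matches everything
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            // "attr_*" matches any name starting with "attr_"
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        // No wildcard: the names must be identical
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        System.out.println(matches("*_dt", "created_dt"));
        System.out.println(matches("*_dt", "title"));
    }
}
```

With a rule like this, a document field such as `created_dt` would be handled as a date without being declared individually in schema.xml.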
    • solrconfig.xml
      ● Lucene indexing parameters
        ○ <mergeFactor>10</mergeFactor>
        ○ <ramBufferSizeMB>32</ramBufferSizeMB>
      ● Cache settings
        ○ <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
      ● Request handler configuration
        ○ <requestHandler name="dismax" class="solr.SearchHandler" >
      ● HTTP cache settings
        ○ <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
      ● Search components, response writers, query parsers
        ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
        ○ <queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>
        ○ <queryParser name="lucene" class="org.apache.solr.search.LuceneQParserPlugin"/>
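The cache setting above names an LRU (least recently used) cache with a fixed size. The following is a minimal sketch of that eviction policy in plain Java, built on `LinkedHashMap` in access order. It is an illustration of the policy, not Solr's own cache class; the class name and the string keys are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: evicts the least recently used entry once maxSize is exceeded. */
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCacheSketch(int initialSize, int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(initialSize, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true drops the least recently used entry
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2, 2);
        cache.put("q=a", 1);
        cache.put("q=b", 2);
        cache.get("q=a");      // touch "q=a" so "q=b" becomes the eldest entry
        cache.put("q=c", 3);   // exceeds maxSize, so the eldest entry is evicted
        System.out.println(cache.keySet());
    }
}
```

In this sketch `size` and `initialSize` play the same roles as the attributes of the same names in the configuration line; autowarming is not modeled.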
    • Request Handler
      <requestHandler name="/itas" class="solr.SearchHandler">
        <lst name="defaults">
          <str name="v.template">browse</str>
          <str name="v.properties">velocity.properties</str>
          <str name="title">Solritas</str>
          <str name="wt">velocity</str>
          <str name="defType">dismax</str>
          <str name="q.alt">*:*</str>
          <str name="rows">10</str>
          <str name="fl">*,score</str>
          <str name="facet">on</str>
          <str name="facet.field">df</str>
          <str name="facet.mincount">1</str>
          <str name="hl">true</str>
          <str name="hl.fl">developers</str>
          <str name="qf">
            text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
          </str>
        </lst>
      </requestHandler>
    • Response Writer
      ● A Response Writer generates the formatted response of a search.
      ● The wt parameter selects the Response Writer to be used.
      ● json, php, phps, python, ruby, xml, xslt, velocity
      <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
        <int name="xsltCacheLifetimeSeconds">5</int>
      </queryResponseWriter>
    • Analyzers, Tokenizers, Filters
      ● The Analyzer class is a native Lucene concept that determines how tokens are produced from a piece of text
        <fieldType name="nametext" class="solr.TextField">
          <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
        </fieldType>
      ● The job of a tokenizer is to break up a stream of text into tokens
      ● A filter looks at each token in the stream sequentially and decides whether to pass it along, replace it or discard it
        <fieldType name="text" class="solr.TextField">
          <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StandardFilterFactory"/>
          </analyzer>
        </fieldType>
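To make the tokenizer and filter roles concrete, here is a minimal plain-Java sketch of an analysis chain: a whitespace tokenizer followed by one filter that lowercases each token (replace) and drops stop words (discard). It does not use the Lucene or Solr classes; all names and the stop-word set are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical analysis chain: tokenizer first, then a token filter. */
final class AnalysisChainSketch {

    /** Tokenizer: breaks the text into tokens on runs of whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Filter: visits tokens in order, replacing each with its lowercase form and discarding stop words. */
    static List<String> filter(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT); // replace
            if (!stopWords.contains(lower)) {              // pass along, or discard
                kept.add(lower);
            }
        }
        return kept;
    }

    /** Analyzer: the tokenizer and the filter applied in sequence. */
    static List<String> analyze(String text, Set<String> stopWords) {
        return filter(tokenize(text), stopWords);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick brown Fox", Set.of("the", "a")));
    }
}
```

The same text must go through the same chain at index time and at query time for terms to match, which is why the chain is attached to the field type.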
    • Other features
      ● Highlighting
        ○ &hl=true&hl.fl=developers
      ● Synonyms
        ○ <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      ● Spell check
        ○ The spell check component can return a list of alternative spelling suggestions.
        ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      ● Content Streams
        ○ Allows the Solr server to fetch local or remote data itself. Remote streaming must be enabled in solrconfig.xml
      ● Solr Cell
        ○ Leveraging Tika, extracts and indexes rich documents such as Word, PDF, HTML, and many other types
      ● More like this
        ○ http://wiki.apache.org/solr/MoreLikeThis
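    • Indexing with SolrJ
      import java.net.URL;
      import org.apache.solr.client.solrj.SolrServer;
      import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
      import org.apache.solr.common.SolrInputDocument;

      SolrServer solr = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "EXAMPLEDOC01");
      doc.addField("title", "NOVAJUG SolrJ Example");
      solr.add(doc);
      solr.commit();   // after a batch, not per document
      solr.optimize(); // periodically, if/when needed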
    • Data Import Handler
      ● Indexes relational databases, XML data, and e-mail sources
      ● Supports full and incremental/delta indexing
      ● Highly extensible with custom data sources, transformers, etc.
      ● http://wiki.apache.org/solr/DataImportHandler
    • Replication
      ● Master is polled
      ● Replicant pulls Lucene index and optionally also Solr configuration files
      ● Query throughput scaling: replicate and load balance
      ● http://wiki.apache.org/solr/SolrReplication
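The "replicate and load balance" point can be pictured as a client that spreads queries across replicas in turn. The sketch below shows one simple way to do that, a round-robin picker in plain Java. The class name and the replica URLs are hypothetical; in practice a dedicated load balancer in front of the replicas would often take this role.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin picker over a fixed list of replica URLs. */
final class RoundRobinSketch {
    private final List<String> replicas;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(List<String> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("at least one replica is required");
        }
        this.replicas = List.copyOf(replicas); // defensive, immutable copy
    }

    /** Returns the next replica in rotation; safe to call from several threads. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }

    public static void main(String[] args) {
        // Host names are placeholders, not taken from the presentation
        RoundRobinSketch picker = new RoundRobinSketch(
                List.of("http://solr1:8983/solr", "http://solr2:8983/solr"));
        for (int i = 0; i < 3; i++) {
            System.out.println(picker.pick());
        }
    }
}
```

Updates would still go to the master only; the rotation applies to queries.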
    • Demo
      ● Download Solr
        ○ http://mirrors.ibiblio.org/pub/mirrors/apache/lucene/solr/1.4.0/
      ● Start Solr
        ○ cd <solr_home>/example
        ○ java -jar start.jar
      ● Post documents
        ○ cd <solr_home>/example/exampledocs
        ○ java -jar post.jar *.xml
        ○ java -jar post.jar cw.xml
      ● Access Solr
        ○ http://localhost:8983/solr/admin/
      ● Querying Solr
        ○ http://localhost:8983/solr/select/?q=binesh
        ○ http://localhost:8983/solr/select/?q=binny
        ○ http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
        ○ http://localhost:8983/solr/itas/
      ● Luke
        ○ http://www.getopt.org/luke/
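The query URLs in the demo are plain HTTP GET requests with parameters such as q, facet, and facet.field. As a small sketch of how such a URL could be assembled safely from code, the following Java helper URL-encodes each key and value. The class and method names are hypothetical, and it assumes Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that assembles a Solr select URL from key/value pairs. */
final class SolrQueryUrlSketch {

    /** Appends URL-encoded parameters, given as alternating keys and values, to baseUrl. */
    static String build(String baseUrl, String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected alternating keys and values");
        }
        StringBuilder url = new StringBuilder(baseUrl);
        for (int i = 0; i < keyValues.length; i += 2) {
            url.append(i == 0 ? '?' : '&')
               .append(URLEncoder.encode(keyValues[i], StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(keyValues[i + 1], StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/select/",
                "q", "binesh",
                "facet", "true",
                "facet.field", "df",
                "facet.mincount", "1"));
    }
}
```

Encoding matters once the query contains spaces or characters such as `:` or `&`, which would otherwise change the meaning of the URL.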
    • Liferay + Solr: Motivation
      ● Centralizing the search index in a clustered Liferay environment
      ● Performance improvement
        ○ Re-indexing costs too much for large DBs
        ○ The indexes of Liferay deployments in a cluster are often not synchronized
    • Liferay + Solr: Configuration 1
      Install Solr (http://lucene.apache.org/solr)
      Set up environment variables
      ● SOLR_HOME = /${solr installed folder}
      ● JAVA_OPTS = "$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME/example/solr/data"
      solr.xml
      ● Place the file under ${tomcat}/conf/Catalina/localhost/ with the following content
        <?xml version="1.0" encoding="utf-8"?>
        <Context docBase="$SOLR_HOME/apache-solr-1.4.0.war" debug="0" crossContext="true">
          <Environment name="solr/home" type="java.lang.String" value="$SOLR_HOME" override="true" />
        </Context>
    • Liferay + Solr: Configuration 2
      schema.xml
      ● This file tells Solr how to index the data coming from Liferay, and can be customized for your installation.
      ● Copy this file from the solr-web plugin to $SOLR_HOME/conf (you may have to create the conf directory) in your Solr home folder.
        ...
        <fields>
          <field name="comments" type="text" indexed="true" stored="true" />
          <field name="content" type="text" indexed="true" stored="true" />
          <field name="description" type="text" indexed="true" stored="true" />
          <field name="name" type="text" indexed="true" stored="true" />
          <field name="properties" type="text" indexed="true" stored="true" />
          <field name="title" type="text" indexed="true" stored="true" />
          <field name="uid" type="string" indexed="true" stored="true" />
          <field name="url" type="text" indexed="true" stored="true" />
          <field name="userName" type="text" indexed="true" stored="true" />
          <field name="version" type="text" indexed="true" stored="true" />
          <dynamicField name="*" type="string" indexed="true" stored="true" />
        </fields>
        <uniqueKey>uid</uniqueKey>
        <defaultSearchField>content</defaultSearchField>
        ...
        <copyField source="comments" dest="content"/>
        ...
        ...
    • Liferay + Solr: Configuration 3
      Copy WAR file
      ● Copy the WAR file $SOLR_HOME/dist/apache-solr-${solr.version}.war into $SOLR_HOME/example, where ${solr.version} is the Solr version number, e.g., 1.4.0.
      Start Liferay/Tomcat
      ● Solr will be picked up and deployed automatically as "solr" under the ${tomcat}/webapps folder
      Install the solr-web Liferay plugin
      ● The latest Liferay plugin can be checked out from the following location:
        http://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web
      ● Build the checked-out plugin and deploy it
    • Liferay + Solr: Configuration 4
      Final Step
      ● We need to rebuild the Liferay search indexes
      ● Control Panel > Server Administration
    • Liferay + Solr: How it works
      solr-spring.xml (from solr-web plugin)
        ...
        <bean id="solrServer" class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
          <constructor-arg type="java.lang.String" value="http://localhost:8080/solr" />
        </bean>
        <bean id="indexSearcher.solr" class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
          <property name="solrServer" ref="solrServer" />
        </bean>
        <bean id="indexWriter.solr" class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
          <property name="commit" value="true" />
          <property name="solrServer" ref="solrServer" />
        </bean>
        ...
    • Liferay + Solr: Back to the default?
      ● Simply undeploy the solr-web plugin
      ● Rebuild the search indexes from the Control Panel, as described in the previous step