Apache solr liferay

A 2009 presentation which I just found in archives

Published in: Technology

Transcript

  1. Apache Solr
     Enterprise search platform from the Apache Lucene project
     Rivet Logic Corporation
     1800 Alexander Bell Drive, Suite 400
     Reston, VA 20191
     Ph: 703.955.3480 Fax: 703.234.7711
  2. What is Solr?
     ● Search server
     ● Built upon Apache Lucene
     ● Very fast
     ● Scalable in both query load and collection size
     ● Interoperable
     ● Extensible
     ● Lucene power exposed over HTTP
     ● Spell checking, highlighting, faceting, etc.
     ● Caching
     ● Replication
     ● Distributed search
  3. How stuff works?
  4. schema.xml
     ● Field types
       ○ <fieldType name="text" class="solr.TextField" indexed="true" />
     ● Fields
       ○ <field name="technologies" type="text" indexed="true" stored="true" multiValued="true"/>
     ● Unique key (optional)
       ○ <uniqueKey>id</uniqueKey>
     ● Copy fields
       ○ <copyField source="developers" dest="df"/>
     ● Dynamic fields
       ○ <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
     ● Similarity configuration
       ○ Similarity is the scoring routine for each document vs. a query
  5. solrconfig.xml
     ● Lucene indexing parameters
       ○ <mergeFactor>10</mergeFactor>
       ○ <ramBufferSizeMB>32</ramBufferSizeMB>
     ● Cache settings
       ○ <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
     ● Request handler configuration
       ○ <requestHandler name="dismax" class="solr.SearchHandler" >
     ● HTTP cache settings
       ○ <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
     ● Search components, response writers, query parsers
       ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
       ○ <queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>
       ○ <queryParser name="lucene" class="org.apache.solr.search.LuceneQParserPlugin"/>
  6. Request Handler
     <requestHandler name="/itas" class="solr.SearchHandler">
       <lst name="defaults">
         <str name="v.template">browse</str>
         <str name="v.properties">velocity.properties</str>
         <str name="title">Solritas</str>
         <str name="wt">velocity</str>
         <str name="defType">dismax</str>
         <str name="q.alt">*:*</str>
         <str name="rows">10</str>
         <str name="fl">*,score</str>
         <str name="facet">on</str>
         <str name="facet.field">df</str>
         <str name="facet.mincount">1</str>
         <str name="hl">true</str>
         <str name="hl.fl">developers</str>
         <str name="qf">
           text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
         </str>
       </lst>
     </requestHandler>
  7. Response Writer
     ● A Response Writer generates the formatted response of a search.
     ● The wt parameter selects the Response Writer to be used.
     ● Available writers: json, php, phps, python, ruby, xml, xslt, velocity
     <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
       <int name="xsltCacheLifetimeSeconds">5</int>
     </queryResponseWriter>
  8. Analyzers, Tokenizers, Filters
     ● The Analyzer class is a native Lucene concept that determines how tokens are produced from a piece of text.
       <fieldType name="nametext" class="solr.TextField">
         <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
       </fieldType>
     ● The job of a tokenizer is to break up a stream of text into tokens.
     ● A filter looks at each token in the stream sequentially and decides whether to pass it along, replace it, or discard it.
       <fieldType name="text" class="solr.TextField">
         <analyzer>
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.StandardFilterFactory"/>
         </analyzer>
       </fieldType>
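The tokenizer-then-filters flow described on slide 8 can be illustrated without Lucene at all. The sketch below is plain Java, not the Lucene or Solr API: the class name `AnalysisSketch`, its methods, and the stop-word list are all hypothetical and chosen only for illustration. It splits text on whitespace (the tokenizer step), then applies one filter that replaces tokens (lowercasing) and one that discards them (stop words).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; this is not how Lucene implements analysis.
class AnalysisSketch {

    // Tokenizer step: break the text into tokens on runs of whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Filter step: replace each token with its lowercase form.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Filter step: discard tokens found in the stop-word set, pass the rest along.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // The "analyzer": one tokenizer followed by a fixed chain of filters.
    static List<String> analyze(String text) {
        // Assumed stop words, picked for the example.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        return removeStopWords(lowerCase(tokenize(text)), stopWords);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Power of Lucene")); // prints [power, lucene]
    }
}
```

The order of the filters matters here: lowercasing runs before stop-word removal so that "The" is matched against the lowercase stop word "the".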
  9. Other features
     ● Highlighting
       ○ &hl=true&hl.fl=developers
     ● Synonyms
       ○ <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     ● Spell check
       ○ The spell check component can return a list of alternative spelling suggestions.
       ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
     ● Content Streams
       ○ Allows the Solr server to fetch local or remote data itself. Remote streaming must be enabled in solrconfig.xml.
     ● Solr Cell
       ○ Leveraging Tika, extracts and indexes rich documents such as Word, PDF, HTML, and many other types
     ● More like this
       ○ http://wiki.apache.org/solr/MoreLikeThis
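  10. Indexing with SolrJ
     import java.net.URL;
     import org.apache.solr.client.solrj.SolrServer;
     import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
     import org.apache.solr.common.SolrInputDocument;

     SolrServer solr = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));
     SolrInputDocument doc = new SolrInputDocument();
     doc.addField("id", "EXAMPLEDOC01");
     doc.addField("title", "NOVAJUG SolrJ Example");
     solr.add(doc);
     solr.commit();   // after a batch, not per document
     solr.optimize(); // periodically, if/when needed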
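The comment on slide 10, "after a batch, not per document", describes a commit pattern that is easy to get wrong. The sketch below shows one way to structure it, with the Solr calls replaced by a hypothetical `Index` interface so the example runs without a Solr server. The names `BatchIndexingSketch`, `Index`, `RecordingIndex`, and `indexInBatches` are assumptions made for illustration and are not part of SolrJ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of batched commits; not SolrJ code.
class BatchIndexingSketch {

    // Stand-in for the add() and commit() calls made on the slide's SolrServer.
    interface Index {
        void add(String docId);
        void commit();
    }

    // Adds every document, committing once per full batch and once more
    // for any leftover documents. Returns the number of commits issued.
    static int indexInBatches(List<String> docIds, int batchSize, Index index) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int pending = 0;
        for (String id : docIds) {
            index.add(id);
            pending++;
            if (pending == batchSize) {
                index.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            index.commit();
            commits++;
        }
        return commits;
    }

    // In-memory fake that records what was added and how often commit ran.
    static class RecordingIndex implements Index {
        final List<String> added = new ArrayList<>();
        int commits = 0;

        @Override
        public void add(String docId) {
            added.add(docId);
        }

        @Override
        public void commit() {
            commits++;
        }
    }

    public static void main(String[] args) {
        RecordingIndex index = new RecordingIndex();
        List<String> ids = Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5");
        int commits = indexInBatches(ids, 2, index);
        // Five documents in batches of two: commits after doc2, doc4, and doc5.
        System.out.println(index.added.size() + " documents, " + commits + " commits");
    }
}
```

With a real server, `Index.add` and `Index.commit` would delegate to the corresponding calls shown on the slide. The batch size is a tuning choice that the slides do not specify.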
  11. Data Import Handler
     ● Indexes relational databases, XML data, and e-mail sources
     ● Supports full and incremental/delta indexing
     ● Highly extensible with custom data sources, transformers, etc.
     ● http://wiki.apache.org/solr/DataImportHandler
  12. Replication
     ● Master is polled
     ● Replicant pulls the Lucene index and, optionally, Solr configuration files
     ● Query throughput scaling: replicate and load balance
     ● http://wiki.apache.org/solr/SolrReplication
  13. Demo
     ● Download Solr
       ○ http://mirrors.ibiblio.org/pub/mirrors/apache/lucene/solr/1.4.0/
     ● Start Solr
       ○ cd <solr_home>/example
       ○ java -jar start.jar
     ● Post documents
       ○ cd <solr_home>/example/exampledocs
       ○ java -jar post.jar *.xml
       ○ java -jar post.jar cw.xml
     ● Access Solr
       ○ http://localhost:8983/solr/admin/
     ● Querying Solr
       ○ http://localhost:8983/solr/select/?q=binesh
       ○ http://localhost:8983/solr/select/?q=binny
       ○ http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
       ○ http://localhost:8983/solr/itas/
     ● Luke
       ○ http://www.getopt.org/luke/
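The query URLs on slide 13 are just the select endpoint plus URL-encoded parameters. As a minimal sketch, the helper below assembles such a URL using only the Java standard library. The class and method names (`SolrSelectUrlSketch`, `buildSelectUrl`, `encode`) are hypothetical, and the base URL is the demo address from the slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; SolrJ offers its own query classes.
class SolrSelectUrlSketch {

    // Appends "/select/?" and the encoded parameters, in insertion order.
    static String buildSelectUrl(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl).append("/select/?");
        boolean first = true;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(encode(entry.getKey())).append('=').append(encode(entry.getValue()));
            first = false;
        }
        return url.toString();
    }

    // Form-encodes a single name or value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "binesh");
        params.put("facet", "true");
        params.put("facet.field", "df");
        params.put("facet.mincount", "1");
        // Reproduces the faceted query URL from the demo slide.
        System.out.println(buildSelectUrl("http://localhost:8983/solr", params));
    }
}
```

A `Map` holds one value per key, so this sketch cannot express repeated parameters such as several `facet.field` entries; a list of name/value pairs would be needed for that.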
  14. Liferay + Solr: Motivation
     ● Centralizing the search index in a clustered Liferay environment
     ● Performance improvement
       ○ Re-indexing costs too much for large DBs
       ○ The indexes of Liferay deployments in a cluster are often not synchronized
  15. Liferay + Solr: Configuration 1
     Install Solr (http://lucene.apache.org/solr)
     Set up environment variables
     ● SOLR_HOME = /${solr installed folder}
     ● JAVA_OPTS = "$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME/example/solr/data"
     solr.xml
     ● Place the file under ${tomcat}/conf/Catalina/localhost/ with the following content
       <?xml version="1.0" encoding="utf-8"?>
       <Context docBase="$SOLR_HOME/apache-solr-1.4.0.war" debug="0" crossContext="true">
         <Environment name="solr/home" type="java.lang.String" value="$SOLR_HOME" override="true" />
       </Context>
  16. Liferay + Solr: Configuration 2
     schema.xml
     ● This file tells Solr how to index the data coming from Liferay, and can be customized for your installation.
     ● Copy this file from the solr-web plugin to $SOLR_HOME/conf in your Solr home folder (you may have to create the conf directory).
     ...
     <fields>
       <field name="comments" type="text" indexed="true" stored="true" />
       <field name="content" type="text" indexed="true" stored="true" />
       <field name="description" type="text" indexed="true" stored="true" />
       <field name="name" type="text" indexed="true" stored="true" />
       <field name="properties" type="text" indexed="true" stored="true" />
       <field name="title" type="text" indexed="true" stored="true" />
       <field name="uid" type="string" indexed="true" stored="true" />
       <field name="url" type="text" indexed="true" stored="true" />
       <field name="userName" type="text" indexed="true" stored="true" />
       <field name="version" type="text" indexed="true" stored="true" />
       <dynamicField name="*" type="string" indexed="true" stored="true" />
     </fields>
     <uniqueKey>uid</uniqueKey>
     <defaultSearchField>content</defaultSearchField>
     ...
     <copyField source="comments" dest="content"/>
     ...
     ...
  17. Liferay + Solr: Configuration 3
     Copy WAR file
     ● Copy the WAR file $SOLR_HOME/dist/apache-solr-${solr.version}.war into $SOLR_HOME/example, where ${solr.version} is the Solr version number, e.g., 1.4.0.
     Start Liferay/Tomcat
     ● Solr will be picked up and "solr" will be deployed automatically under the ${tomcat}/webapps folder
     Install the solr-web Liferay plugin
     ● The latest Liferay plugin can be checked out from the following location:
       http://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web
     ● Build the checked-out plugin and deploy it
  18. Liferay + Solr: Configuration 4
     Final step
     ● We need to rebuild the Liferay search indexes
     ● Control Panel > Server Administration
  19. Liferay + Solr: How it works
     solr-spring.xml (from the solr-web plugin)
     ...
     <bean id="solrServer" class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
       <constructor-arg type="java.lang.String" value="http://localhost:8080/solr" />
     </bean>
     <bean id="indexSearcher.solr" class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
       <property name="solrServer" ref="solrServer" />
     </bean>
     <bean id="indexWriter.solr" class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
       <property name="commit" value="true" />
       <property name="solrServer" ref="solrServer" />
     </bean>
     ...
  20. Liferay + Solr: Back to the default?
     ● Simply undeploy the solr-web plugin
     ● Rebuild the search indexes using the Control Panel, as described in the previous step
