Make Plone Search Act Like Google Using Solr
 


Solr is a powerful open-source search engine server that has become a popular choice for extending the search capabilities of Plone sites. The default configuration works well, but how do you answer the client's request to "Make my search just like Google's"?

In this talk we will look at the options available in Solr's schema and configuration files. We will discuss how to set up stop words, spell checking, n-grams, and alternate query handlers, see what effect each setting has on search results, and find out how to debug problems when they arise.

    Presentation Transcript

    • Make Plone Search Act Like Google Using Solr. Clayton Parker | Senior Web Developer | PLONE CONFERENCE 2011
    • Who Am I
    • What will we learn?
        • Intro to Solr
        • Brief overview of Plone integration points
        • Solr configuration
        • Solr schema setup
        • Debugging tips and tricks
    • What is Solr?
    • Version Madness: 1.x (up to 1.4), 1.5 (number abandoned), 3.x (merge of Lucene and Solr)
    • Books
    • Integration
    • alm.solrindex
    • collective.solr
    • Solr Configuration
    • Query Handlers
        • Standard
        • Disjunction Max (DisMax)
        • Extended DisMax (experimental)
    • DisMax
        • Searches several indexes at once
        • Boosting
        • Friendlier to end users
    • DisMax query fields
        qf=SearchableText^1.0 substring^0.2
        Each entry is an index name, followed by ^ and its weight.
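A small Python sketch (Python being the language Plone is written in) can make the qf syntax and the "disjunction max" idea concrete. It is an illustration under simplifying assumptions, not Solr's scoring code: `parse_qf` and `dismax_score` are hypothetical helpers, the per-field scores are invented inputs, and each field's score is simply multiplied by its weight. What it shows is that a disjunction-max query keeps the best weighted field score instead of adding them all up, and that DisMax's `tie` parameter (0.0 by default) adds back a fraction of the other fields' scores.

```python
def parse_qf(qf):
    """Parse a DisMax qf string such as "SearchableText^1.0 substring^0.2"."""
    weights = {}
    for entry in qf.split():
        field, _, boost = entry.partition("^")
        # A field listed without ^boost gets a weight of 1.0.
        weights[field] = float(boost) if boost else 1.0
    return weights


def dismax_score(field_scores, weights, tie=0.0):
    """Best weighted field score, plus `tie` times the other weighted scores.

    `field_scores` are hypothetical per-field scores for one document.
    """
    weighted = [score * weights.get(field, 1.0) for field, score in field_scores.items()]
    if not weighted:
        return 0.0
    best = max(weighted)
    return best + tie * (sum(weighted) - best)


weights = parse_qf("SearchableText^1.0 substring^0.2")
print(dismax_score({"SearchableText": 0.5, "substring": 2.0}, weights))  # 0.5
```

Taking the maximum stops a term that happens to appear in several fields from being counted several times, while a small nonzero `tie` still lets documents that match in more than one field edge ahead.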
    • MinShouldMatch (mm)
        • mm=100%: all terms required
        • mm=50%: half of the terms required
        • mm=-2: all but two terms required
    • MinShouldMatch, conditional: mm=2<-25% 9<-3
        • 2 or fewer terms: all are required
        • 3 to 9 terms: all but 25% are required
        • more than 9 terms: all but three are required
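The mm rules above fit in one short Python function. This is a sketch based on how the DisMax documentation describes the syntax (plain and negative integers, percentages truncated toward zero, and space-separated `n<spec` conditions), not Solr's own source. `min_should_match` is a hypothetical name, and edge cases may differ between Solr versions.

```python
import math
import re


def min_should_match(total, spec):
    """Return how many of `total` optional query terms must match for an mm spec."""
    spec = spec.strip()
    if "<" in spec:
        # Conditional spec such as "2<-25% 9<-3": each "n<sub" applies once total > n.
        spec = re.sub(r"\s*<\s*", "<", spec)
        result = total  # at or below the first bound, every term is required
        for condition in spec.split():
            bound, sub_spec = condition.split("<", 1)
            if total <= int(bound):
                return result
            result = min_should_match(total, sub_spec)
        return result
    if spec.endswith("%"):
        # Percentages are truncated toward zero; a negative one says how many may be missing.
        calc = math.trunc(total * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    result = total + calc if calc < 0 else calc
    # Never require fewer than zero or more than all of the terms.
    return max(0, min(result, total))


print(min_should_match(8, "2<-25% 9<-3"))  # 6
```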
    • Spelling Component
        <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
          <lst name="spellchecker">
            <str name="name">default</str>
            <str name="classname">solr.IndexBasedSpellChecker</str>
            <str name="buildOnCommit">true</str>
            <str name="spellcheckIndexDir">path/to/spellcheck</str>
            <!-- The field that will contain the dynamic spelling data -->
            <str name="field">spell</str>
            <str name="accuracy">0.5</str>
          </lst>
          <!-- Control indexing and query of spelling data -->
          <str name="queryAnalyzerFieldType">spell-text</str>
        </searchComponent>
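The `accuracy` setting is a minimum similarity for suggestions. As far as I know, `IndexBasedSpellChecker` compares the misspelled word with terms from the `spell` field using, by default, a Levenshtein edit distance scaled to a 0..1 similarity. The Python sketch below imitates only that filtering step; the real checker also uses an n-gram index to find candidates and can weigh term frequency, which is left out here. The word list is a hypothetical stand-in for terms copied into the `spell` field, and the function names are hypothetical too.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute if different
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Edit distance scaled to 0..1, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def suggest(word, dictionary, accuracy=0.5, count=5):
    """Return up to `count` terms at least `accuracy` similar to `word`, best first."""
    scored = [(similarity(word, term), term) for term in set(dictionary) if term != word]
    kept = sorted((pair for pair in scored if pair[0] >= accuracy), key=lambda p: (-p[0], p[1]))
    return [term for _, term in kept[:count]]


spell_terms = ["search", "research", "plone", "solr"]  # hypothetical spell field terms
print(suggest("serach", spell_terms))  # ['search', 'research']
```

Raising `accuracy` trims weak suggestions such as `research` (similarity exactly 0.5 here) at the cost of missing corrections for badly mangled words.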
    • Spelling Schema
        <fieldType name="spell-text" class="solr.TextField">
          <analyzer>
            <tokenizer class="solr.ICUTokenizerFactory"/>
            <filter class="solr.ICUFoldingFilterFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          </analyzer>
        </fieldType>
    • Solr Schema
    • Index vs Query (diagram: http://www.cominvent.com/2011/04/04/solr-architecture-diagram/)
    • Character Filters → Tokenizer → Filters
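Every analyzer has the same three-stage shape, which can be modeled as plain function composition. In this Python sketch the three components are hypothetical stand-ins for Solr classes, chosen only to show the order in which the stages run.

```python
import re


def analyze(text, char_filters, tokenizer, token_filters):
    """Run text through char filters, then a tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


# Hypothetical stand-ins for real Solr components.
def strip_punctuation(text):
    return re.sub(r"[^\w\s-]", "", text)


def whitespace_tokenizer(text):
    return text.split()


def lowercase_filter(tokens):
    return [token.lower() for token in tokens]


index_chain = ([strip_punctuation], whitespace_tokenizer, [lowercase_filter])
print(analyze('"That WAS a narrow escape!" said Alice', *index_chain))
# ['that', 'was', 'a', 'narrow', 'escape', 'said', 'alice']
```

A field type declares one such chain for indexing and another for querying; the two should normally produce compatible tokens, or queries will miss terms that were indexed.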
    • Complete Field
        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
            <tokenizer class="solr.ICUTokenizerFactory"/>
            <filter class="solr.ICUFoldingFilterFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1"
                    stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1"
                    catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
          </analyzer>
          <analyzer type="query">
            <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
            <tokenizer class="solr.ICUTokenizerFactory"/>
            <filter class="solr.ICUFoldingFilterFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1"
                    stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1"
                    catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
            <filter class="solr.PositionFilterFactory"/>
          </analyzer>
        </fieldType>
    • Copy Field
        <copyField source="SearchableText" dest="spell"/>
        <copyField source="SearchableText" dest="substring"/>
    • Character Filters
        • Process text before tokenizing
        • Remove irrelevant characters
    • Pattern Replace
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9_\s-]" replacement="" replace="all"/>
        Input:  That WAS a narrow escape! said Alice, a good deal frightened
        Output: That WAS a narrow escape said Alice a good deal frightened
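Python's `re` module uses the same character-class syntax as Java for this pattern, so it can be checked outside Solr. Note that the class needs `A-Z` (the range `A-z` would also keep `[`, `\`, `]`, `^` and the backtick) and `\s`, because a character filter sees the whole text and would otherwise delete the spaces between words. One caveat: Python's `\s` also matches non-ASCII whitespace, which Java's does not by default. The helper name is hypothetical.

```python
import re

# Keep ASCII letters, digits, underscore, whitespace and hyphen; drop everything else.
NOT_ALLOWED = re.compile(r"[^a-zA-Z0-9_\s-]")


def pattern_replace(text, replacement=""):
    """Apply the character-filter pattern to the whole input before tokenizing."""
    return NOT_ALLOWED.sub(replacement, text)


print(pattern_replace("That WAS a narrow escape! said Alice, a good deal frightened"))
# That WAS a narrow escape said Alice a good deal frightened
```

The same class also removes non-ASCII letters (`idée` becomes `ide`), so for accented content the folding filters are a safer way to normalize.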
    • Mapping
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
        # mapping.txt
        # œ => oe
        "\u0153" => "oe"
        # ß => ss
        "\u00DF" => "ss"
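The mapping character filter replaces each source string from mapping.txt with its target before tokenizing. This Python sketch parses rules in the format shown above and applies them, preferring the longest match; it understands only `\uXXXX` escapes (I believe the real file format accepts others too), and `load_mapping` / `apply_mapping` are hypothetical names.

```python
import re

RULE = re.compile(r'^"(.*?)"\s*=>\s*"(.*)"$')
UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape(value):
    """Turn \\uXXXX escapes into the characters they name."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


def load_mapping(lines):
    """Parse '"source" => "target"' lines, skipping blanks and # comments."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = RULE.match(line)
        if match:
            mapping[unescape(match.group(1))] = unescape(match.group(2))
    return mapping


def apply_mapping(text, mapping):
    """Replace every mapped source string, trying longer sources first."""
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


rules = load_mapping(['# œ => oe', r'"\u0153" => "oe"', '# ß => ss', r'"\u00DF" => "ss"'])
print(apply_mapping("cœur Straße", rules))  # coeur Strasse
```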
    • HTML Strip
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
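HTML stripping keeps tag names and attributes from ever becoming search terms. A rough equivalent can be built on Python's standard `html.parser`; unlike the real filter, this sketch (with the hypothetical helper `strip_html`) does not track the character offsets that Solr keeps for highlighting.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text content only; tags and attributes are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any buffered text
    return "".join(collector.parts)


print(strip_html("<p>Hello <b>Plone</b> &amp; Solr</p>"))  # Hello Plone & Solr
```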
    • Tokenizers
        • Split raw text into tokens / terms
        • The first step of an analyzer, after any character filters
    • Whitespace Tokenizer
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        Input:  That WAS a narrow escape! said Alice
        Tokens: That | WAS | a | narrow | escape! | said | Alice
    • ICU Tokenizer
        <tokenizer class="solr.ICUTokenizerFactory"/>
        Input:  That WAS a narrow escape! said Alice
        Tokens: That | WAS | a | narrow | escape | said | Alice
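The difference between the two tokenizers shows up on "escape!". In the Python sketch below, `str.split` behaves like the whitespace tokenizer, while a `\w+` regex is only a rough stand-in for the ICU tokenizer, which follows the Unicode word-break rules of UAX #29. Both helper names are hypothetical.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace only; punctuation stays attached to words."""
    return text.split()


def word_tokenize(text):
    """Rough stand-in for Unicode word segmentation: runs of word characters,
    allowing an inner apostrophe as in "don't"."""
    return re.findall(r"\w+(?:'\w+)*", text)


sample = "That WAS a narrow escape! said Alice"
print(whitespace_tokenize(sample))
print(word_tokenize(sample))
```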
    • Pattern Tokenizer
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";\s*"/>
        Input:  one; two; three
        Tokens: one | two | three
    • Path Hierarchy
        <tokenizer class="solr.PathHierarchyTokenizerFactory"/>
        Input:  /usr/local/etc/nginx
        Tokens: /usr | /usr/local | /usr/local/etc | /usr/local/etc/nginx
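The path hierarchy tokenizer emits every ancestor prefix of a path. A Python approximation with the default `/` delimiter follows; `path_hierarchy` is a hypothetical helper, and the real tokenizer has further options, such as a custom delimiter.

```python
def path_hierarchy(path, delimiter="/"):
    """Emit the prefix ending before each delimiter (except a leading one), then the full path."""
    tokens = []
    for i, char in enumerate(path):
        if char == delimiter and i > 0:
            tokens.append(path[:i])
    tokens.append(path)
    return tokens


print(path_hierarchy("/usr/local/etc/nginx"))
# ['/usr', '/usr/local', '/usr/local/etc', '/usr/local/etc/nginx']
```

Indexing a Plone object's physical path this way would let a single term query on a folder path match everything stored beneath that folder.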
    • Token Filters
        • Process after tokenizing
        • Normalization of terms
    • Lower Case
        <filter class="solr.LowerCaseFilterFactory"/>
        Input:  Foo bAr BAZ
        Output: foo bar baz
    • ASCII Folding
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        Input:  idée bête grüßen
        Output: idee bete grussen
    • ICU Folding
        <filter class="solr.ICUFoldingFilterFactory"/>
        Input:  Idée BÊTE GrüßeN
        Output: idee bete grussen
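The three normalizing filters can be imitated with Python's `unicodedata` module: decompose each character (NFKD), drop the combining accent marks, and, for ICU-style folding, case-fold first. This is an approximation, not either filter's actual mapping table: letters such as ø or æ have no decomposition and pass through unchanged here, and ß is mapped by hand. The function names are hypothetical.

```python
import unicodedata


def lowercase(tokens):
    return [token.lower() for token in tokens]


def ascii_fold(token):
    """Strip accents by decomposing characters and dropping combining marks.
    ß has no decomposition, so it is mapped explicitly."""
    decomposed = unicodedata.normalize("NFKD", token.replace("ß", "ss"))
    return "".join(char for char in decomposed if not unicodedata.combining(char))


def icu_fold(token):
    """Case folding plus accent removal, roughly what ICU folding does."""
    return ascii_fold(token.casefold())


print([icu_fold(t) for t in "Idée BÊTE GrüßeN".split()])  # ['idee', 'bete', 'grussen']
```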
    • Pattern Replace
        <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-zA-Z0-9_-]" replacement="" replace="all"/>
        Input:  That | WAS | a | narrow | escape! | said | Alice
        Output: That | WAS | a | narrow | escape | said | Alice
    • Word Delimiter
        <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1"
                stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
        Input:  StudlyCaps1234-5678
        Tokens: StudlyCaps1234-5678 | Studly | Caps | 1234 | 5678
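A much-simplified Python version of the word delimiter settings above splits on punctuation, case changes and letter/digit boundaries, and keeps the original token. It assumes ASCII letters only, ignores the catenate options, and strips a possessive `'s` only at the end of the token; `word_delimiter` is a hypothetical helper, not the filter's real algorithm.

```python
import re

# Capitalized or lowercase word, an all-caps run (not followed by lowercase), or digits.
SUBWORD = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")


def word_delimiter(token, preserve_original=True, stem_possessive=True):
    text = token
    if stem_possessive:
        # Like stemEnglishPossessive="1": drop a trailing 's.
        text = re.sub(r"'[sS]$", "", text)
    parts = SUBWORD.findall(text)
    if preserve_original and parts != [token]:
        return [token] + parts
    return parts


print(word_delimiter("StudlyCaps1234-5678"))
# ['StudlyCaps1234-5678', 'Studly', 'Caps', '1234', '5678']
```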
    • Edge N-Gram
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="100" side="front"/>
        Input:  Conqueror
        Tokens: Conqueror | Conquero | Conquer | Conque | Conqu | Conq
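Front edge n-grams are just the prefixes of a term within the size limits. A Python sketch with the slide's settings (`edge_ngrams` is a hypothetical helper):

```python
def edge_ngrams(term, min_gram=4, max_gram=100):
    """Prefixes of `term` from min_gram up to max_gram characters long."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("Conqueror"))
# ['Conq', 'Conqu', 'Conque', 'Conquer', 'Conquero', 'Conqueror']
```

Because the prefixes are generated when documents are indexed, a plain query for `Conq` can match `Conqueror` without a wildcard. In this sketch a term shorter than `min_gram` produces no n-grams at all.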
    • Stop Words
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        Input:  That WAS a narrow escape said Alice a good deal frightened
        Output: narrow escape said Alice good deal frightened
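A stop filter simply drops listed tokens, comparing case-insensitively when `ignoreCase="true"`. In this Python sketch the stop word set is a hypothetical subset of a stopwords.txt, chosen to reproduce the slide's output, and `stop_filter` is a hypothetical helper.

```python
STOPWORDS = {"a", "that", "was"}  # hypothetical subset of stopwords.txt


def stop_filter(tokens, stopwords=STOPWORDS, ignore_case=True):
    """Remove tokens that appear in the stop word list."""
    if ignore_case:
        lowered = {word.lower() for word in stopwords}
        return [token for token in tokens if token.lower() not in lowered]
    return [token for token in tokens if token not in stopwords]


tokens = "That WAS a narrow escape said Alice a good deal frightened".split()
print(stop_filter(tokens))
# ['narrow', 'escape', 'said', 'Alice', 'good', 'deal', 'frightened']
```

The trade-off: removing stop words shrinks the index and reduces noise, but phrases made mostly of stop words (such as "to be or not to be") become hard to find.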
    • Synonyms
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        # synonyms.txt
        # add multiple terms
        foozball, foosball, baby-foot
        # merge into one
        tv, t.v., tele => television
        Examples: foozball → foozball | foosball | baby-foot; tv, t.v., tele → television
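The two kinds of synonyms.txt line behave differently: a comma-separated group treats its terms as equivalent (with `expand="true"` each term maps to the whole group), while `=>` replaces the left-hand terms with the right-hand ones. This Python sketch handles single-word entries only (the real filter also matches multi-word phrases), and `load_synonyms` / `synonym_filter` are hypothetical names.

```python
def load_synonyms(lines, expand=True):
    """Parse synonyms.txt-style lines into {term: [replacement terms]}, lowercased
    to mimic ignoreCase="true"."""
    rules = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: each left-hand term is replaced by the right-hand terms.
            left, right = line.split("=>", 1)
            targets = [t.strip().lower() for t in right.split(",") if t.strip()]
            for source in left.split(","):
                if source.strip():
                    rules[source.strip().lower()] = targets
        else:
            # Equivalent terms: expand=True maps each to the whole group,
            # expand=False maps each to the first term only.
            group = [t.strip().lower() for t in line.split(",") if t.strip()]
            for term in group:
                rules[term] = group if expand else group[:1]
    return rules


def synonym_filter(tokens, rules):
    """Replace each token with its synonyms; unknown tokens pass through unchanged."""
    output = []
    for token in tokens:
        output.extend(rules.get(token.lower(), [token]))
    return output


rules = load_synonyms(["foozball, foosball, baby-foot", "tv, t.v., tele => television"])
print(synonym_filter(["Tele", "foozball"], rules))
# ['television', 'foozball', 'foosball', 'baby-foot']
```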
    • Language Stemming
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        dry → dri; drying → dri; dried → dri
    • Language Stemming (French)
        <filter class="solr.ElisionFilterFactory" articles="stopwordarticles.txt"/>
        qu'il → il; ne → ne; comprend → comprend; pas → pas; l'anglais → anglais
        <filter class="solr.SnowballPorterFilterFactory" language="French"/>
        considere → consider; consideres → consider; considerent → consider
    • Solr Debugging
    • Schema Browser
    • Analysis
    • Search Interface
    • Crafting a URL
        http://localhost:8983/solr/select?qf=SearchableText^1.0&rows=10&fl=*,score&debugQuery=on&explainOther=True&indent=true&defType=dismax&q=test
        • q=test
        • qf=SearchableText^1.0
        • defType=dismax
        • debugQuery=on
        • explainOther=on
        • indent=on
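The same debug URL can be built programmatically, which avoids escaping characters such as `^` by hand. This Python sketch uses only the standard library; `solr_debug_url` is a hypothetical helper, and the base URL assumes the example Solr setup on port 8983, as on the slide. `explainOther` is left out on purpose: Solr reads its value as a query selecting additional documents to explain, not as an on/off switch.

```python
from urllib.parse import urlencode


def solr_debug_url(query, qf="SearchableText^1.0",
                   base="http://localhost:8983/solr/select", rows=10):
    """Build a DisMax query URL with debugging output switched on."""
    params = {
        "q": query,
        "defType": "dismax",
        "qf": qf,
        "rows": rows,
        "fl": "*,score",
        "debugQuery": "on",
        "indent": "on",
    }
    return base + "?" + urlencode(params)  # percent-encodes ^, *, and ,


print(solr_debug_url("test"))
```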
    • Verbose XML*
        <lst name="responseHeader">
          <int name="status">0</int>
          <int name="QTime">2</int>
          <lst name="params">
            <str name="explainOther">True</str>
            <str name="fl">*,score</str>
            <str name="debugQuery">on</str>
            <str name="indent">true</str>
            <str name="q">test</str>
            <str name="qf">SearchableText^1.0</str>
            <str name="rows">10</str>
            <str name="defType">dismax</str>
          </lst>
        </lst>
        * like there is any other kind
    • Verbose XML (continued)
        <result name="response" numFound="2" start="0" maxScore="0.70710677">
          <doc>
            <float name="score">0.70710677</float>
            <int name="docid">-643919099</int>
          </doc>
          <doc>
            <float name="score">0.3788861</float>
            <int name="docid">-643919097</int>
          </doc>
        </result>
    • Verbose XML (continued)
        <lst name="debug">
          <str name="rawquerystring">test</str>
          <str name="querystring">test</str>
          <str name="parsedquery">+DisjunctionMaxQuery((SearchableText:test)) ()</str>
          <str name="parsedquery_toString">+(SearchableText:test) ()</str>
          <lst name="explain">
            <str name="-643919099">
        0.70710677 = (MATCH) sum of:
          0.70710677 = (MATCH) fieldWeight(SearchableText:test in 4), product of:
            1.4142135 = tf(termFreq(SearchableText:test)=2)
            1.0 = idf(docFreq=5, maxDocs=6)
            0.5 = fieldNorm(field=SearchableText, doc=4)
            </str>
            <str name="-643919097">
        0.3788861 = (MATCH) sum of:
          0.3788861 = (MATCH) fieldWeight(SearchableText:test in 0), product of:
            1.7320508 = tf(termFreq(SearchableText:test)=3)
            1.0 = idf(docFreq=5, maxDocs=6)
            0.21875 = fieldNorm(field=SearchableText, doc=0)
            </str>
          </lst>
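The explain numbers above can be reproduced by hand. Solr 3.x scores with Lucene's classic DefaultSimilarity, where tf is the square root of the term frequency, idf is 1 + ln(numDocs / (docFreq + 1)), and fieldWeight is tf × idf × fieldNorm. This Python sketch takes fieldNorm straight from the output, because Lucene stores it lossily in a single byte; the function names are hypothetical.

```python
import math


def classic_tf(freq):
    return math.sqrt(freq)


def classic_idf(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, num_docs, field_norm):
    """tf * idf * fieldNorm, as in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, num_docs) * field_norm


print(field_weight(2, 5, 6, 0.5))      # doc 4, explain says 0.70710677
print(field_weight(3, 5, 6, 0.21875))  # doc 0, explain says 0.3788861
```

The second document contains "test" more often (3 times vs 2) yet scores lower: fieldNorm is roughly 1/sqrt(number of terms in the field), so its much longer SearchableText makes each match count for less.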
    • Links
        • Solr: http://lucene.apache.org/solr
        • Solr Wiki: http://wiki.apache.org/solr
        • Books: http://www.packtpub.com/books/all?keys=solr
        • SolrIndex: http://pypi.python.org/pypi/alm.solrindex/
        • collective.solr: http://pypi.python.org/pypi/collective.solr
    • Flickr Credits, thanks to:
        • http://www.flickr.com/photos/naturegeak/5642083189/ (who)
        • http://www.flickr.com/photos/eklektikos/2541408630/ (schema)
        • http://www.flickr.com/photos/sidelong/13954593/ (char filter)
        • http://www.flickr.com/photos/benimoto/2214240119/ (tokenizers)
        • http://www.flickr.com/photos/chaunceydavis/3264077445/ (filters)
        • http://www.flickr.com/photos/comedynose/3271760209/ (configuration)
        • http://www.flickr.com/photos/nicksart/4821509371/ (debugging)
    • Questions? Check out sixfeetup.com/demos