eZ Find workshop: advanced insights & recipes

Various how-tos and recipes for getting things done with eZ Find: advanced searches, facet navigation, clustering of search results, domain-specific boosting, and more. The workshop is based on the eZ Publish 4 stack, but the knowledge applies beyond specific eZ versions.

  1. © 2013 Paul Borgermans, K-Minds Comm.V. eZ Find Recipes & Insights. September 4, Bol, Croatia. Paul Borgermans
  2. About me •  12+ years in the eZ ecosystem -  eZ Lucene → eZ Solr → eZ Find •  Fancying: -  Apache Lucene family of projects (mainly Solr) -  NoSQL (Not only SQL) and scalable architectures -  eZ Publish & CMS systems in general -  Semantic aspects -  PHPBenelux Community & Conference •  Contact: paul.borgermans@gmail.com, @paulborgermans
  3. Part 1: eZ Find Kitchen Basics •  Get to know the ingredients & tools •  Installation recipes •  Basic configuration options •  Basic indexing •  Basic searching, filtering and facets
  4. Get to know the ingredients & tools. Powered by [logo]
  5. eZ Find main search ingredients •  Tunable relevancy ranking •  Keyword highlighting •  Filtering and facets (drill-down navigation) •  Automatic related content •  Language-dependent optimizations •  Fast •  Adaptive to your domain data models •  Leverages Apache Solr/Lucene
  6. eZ Find with two main additional roles •  eZ Find to replace your (complex) ‘fetch content’ calls -  Speed up template rendering, especially with complex dynamic pages •  eZ Find/Solr as a content and integration engine -  Document-oriented storage system (hello NoSQL) -  Archive use case -  External content
  7. Your tools. Credit: http://commons.wikimedia.org/wiki/File:Werkzeugwand.jpg
  8. Core template level tools •  Dedicated template fetch functions •  Leveraging Solr search (including spell check, highlighting, …) •  More Like This •  Raw access to the Solr index (e.g. for integrating foreign sources) •  JS/AJAX •  Term suggestions
  9. Tuning tools for relevancy •  Index time •  Configuration (ezfind.ini) •  Custom index time plugin •  Search time •  Boost functions •  Elevation of objects •  Apache Solr schema.xml and solrconfig.xml magic
  10. Tools for extending eZ Find •  Custom data type plugins •  Tailor indexing and searching for your data types •  General index time plugins •  Even more tailoring and exotic dishes •  Custom suggesters •  Add your own vocabularies
  11. The Solr administration interface •  http://localhost:8983/solr/<core>/admin •  Statistics and health monitor •  Search index •  Java VM (devops) •  Advanced use •  Learning •  Debugging (understanding search results) •  Tuning tool
  13. Installation and configuration recipes
  14. Installation and configuration recipes •  Requirements •  Installing the extension •  Basic installation/activation of Solr
  15. Solr backend requirements •  Java VM -  JRE 6 or 7 (OpenJDK, Oracle/Sun) •  Servlet container -  Jetty shipped by default; Tomcat, ... also possible -  Security must be configured (open by default) -  See also http://wiki.apache.org/solr/SolrInstall •  For larger sites/indexes: enough RAM -  Yet leave enough for the OS file cache
  16. Extension installation and activation •  Activate the eZ Find extension the usual way -  ActiveExtensions[]=ezfind -  (!) Regenerate autoloads when activating by editing ini settings directly •  Execute the DB upgrade script -  Used for elevation -  See extension/ezfind/sql/<db>
  17. Putting the backend somewhere •  Inside the eZ Find extension •  Single installation •  Quick testing •  Dedicated locations •  Production setups •  Multi-tenant setups •  Multiple instances (development) •  Separate the binaries and data/conf, for example: /opt/solr for binaries, /srv/solr for data/conf
  18. Multiple ways and operating modes for starting the Solr backend •  Single core •  Deprecated •  Multiple cores •  Multilingual •  Multi-tenant •  Multiple instances on your dev installation •  Setup instructions: see the online docs or last year's presentation
  19. Multi-core setup advantages •  Every language / tenant has its own •  Index •  Tunable analyzer options •  Spell checker dictionary •  Synonyms, stop word list •  Elevate configuration •  Additional bonuses: •  Slight increase in performance •  Core admin features
  20. How to configure multicore setups •  Create a new Solr home directory under the java subdirectory •  Put a solr.xml config file in it that specifies the cores •  Copy the conf and data directories •  Specify the Solr home when starting the servlet container (system properties go before -jar):
      sudo java -Dsolr.solr.home=solr.multicore -jar start.jar
  21. Configuration of multiple cores •  solr.xml as the master entry point •  lib for all shared jars (extensions) •  In each subdirectory, dedicated: -  index (“data”) -  configuration files (“conf”) -  (optional) “lib” with core-specific jars
  22. Multicore master config file: solr.xml
      <?xml version="1.0" encoding="UTF-8" ?>
      <solr persistent="true" sharedLib="lib">
        <cores adminPath="/admin/cores">
          <core name="project1-eng-GB" instanceDir="pro1-eng" />
          <core name="project1-ger-DE" instanceDir="pro1-ger" />
          <core name="develop" instanceDir="inventory" />
        </cores>
      </solr>
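Writing solr.xml by hand gets tedious once there are many tenants or languages. The Python sketch below generates a file with the same structure as the slide's example, using the standard library's ElementTree. The function name `build_solr_xml` is hypothetical. The layout assumes the legacy (Solr 3.x style) multicore format shown above, not the core discovery mechanism of later Solr versions.

```python
import xml.etree.ElementTree as ET


def build_solr_xml(cores, shared_lib="lib", admin_path="/admin/cores"):
    """Build a legacy multicore solr.xml (Solr 3.x style) as a string.

    `cores` maps each core name to its instanceDir, as in the slide.
    """
    solr = ET.Element("solr", persistent="true", sharedLib=shared_lib)
    cores_el = ET.SubElement(solr, "cores", adminPath=admin_path)
    for name, instance_dir in cores.items():
        # One <core> element per language / tenant.
        ET.SubElement(cores_el, "core", name=name, instanceDir=instance_dir)
    body = ET.tostring(solr, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8" ?>\n' + body


if __name__ == "__main__":
    print(build_solr_xml({
        "project1-eng-GB": "pro1-eng",
        "project1-ger-DE": "pro1-ger",
        "develop": "inventory",
    }))
```

Generating the file keeps core names and instance directories in one place, for example next to the siteaccess configuration that points eZ Find at each core.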
  23. Performance configuration options •  Enable delayed indexing of objects (site.ini) •  Editors will be happier (“faster publishing”) •  Can be done globally or per class (recommended for binary file indexing) •  Downside: objects only appear in search results after the configured cronjob has run
  24. Performance configuration options (...) •  Disable optimize on commit -  Configure a cronjob to optimize once per day/week -  Optimizing makes the index files compact -  If many delete operations happen, adjust the optimize frequency accordingly
  25. Performance configuration options (...) •  Enable commitWithin (ezfind.ini) -  Use case: large sites, where commits can also take some time -  Specified in milliseconds -  No cronjobs needed •  Only in special cases: disable direct commits -  Indexing -  Delete operations
  26. Search handler configuration •  Defaults to “ezpublish” now (Apache Solr 3.6.1), based on “eDismax” •  Supports Lucene syntax (wildcards) •  Does partial language analysis in the presence of wildcards •  If upgrading from older versions, check the value in ezfind.ini:
      [SearchHandler]
      DefaultSearchHandler=ezpublish
  27. Devops side dish for large-scale installations (Linux) •  Goal: avoid crashes and slowness •  Environment: •  Many Solr index cores •  Many facet queries and filters used •  Heavy traffic •  Linux process limits (set before starting Solr): •  Memory limit setting:
      ulimit -v unlimited
    •  File descriptors (open files):
      ulimit -n 30000
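A small preflight check can catch the process limits from this slide before Solr is started. The sketch below uses Python's `resource` module, which is Unix only. On Linux, `RLIMIT_NOFILE` is the limit `ulimit -n` changes and `RLIMIT_AS` is the one `ulimit -v` changes. The 30000 threshold is simply the slide's example value, and `check_limits` is a hypothetical helper, not part of eZ Find.

```python
import resource

# Threshold taken from the slide's example ("ulimit -n 30000").
MIN_OPEN_FILES = 30000


def check_limits(min_open_files=MIN_OPEN_FILES):
    """Return warnings for process limits that may hurt a busy Solr instance."""
    warnings = []
    # RLIMIT_NOFILE is what `ulimit -n` changes (open file descriptors).
    soft_files, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_files != resource.RLIM_INFINITY and soft_files < min_open_files:
        warnings.append(
            f"open files limit is {soft_files}, below {min_open_files} (ulimit -n)")
    # On Linux, RLIMIT_AS is what `ulimit -v` changes (virtual memory).
    soft_mem, _ = resource.getrlimit(resource.RLIMIT_AS)
    if soft_mem != resource.RLIM_INFINITY:
        warnings.append(f"virtual memory limited to {soft_mem} bytes (ulimit -v)")
    return warnings


if __name__ == "__main__":
    for warning in check_limits():
        print("WARNING:", warning)
```

Such a script would run in the same shell or init script that launches the servlet container, because limits are inherited by child processes.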
  28. Basic indexing and re-indexing •  Initial indexing: use the dedicated script provided by eZ Find
      php extension/ezfind/bin/php/updatesearchindexsolr.php -s <admin siteaccess> --php-exec=php --conc=2
    -  Typical speed: 5-25 objects/sec •  Further indexing: automatic
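The quoted speed of 5-25 objects/sec is enough to estimate how long an initial indexing run may take. The minimal sketch below assumes throughput stays within that range for the whole run. Real speed depends on the content, the hardware and the `--conc` setting. The helper name `indexing_hours` is hypothetical.

```python
# Throughput range quoted on the slide, in objects per second.
TYPICAL_RATE = (5, 25)


def indexing_hours(object_count, rate=TYPICAL_RATE):
    """Return (best case, worst case) duration in hours for an initial indexing run."""
    slowest, fastest = rate
    return object_count / fastest / 3600, object_count / slowest / 3600


if __name__ == "__main__":
    best, worst = indexing_hours(90_000)
    print(f"90,000 objects: {best:.1f} to {worst:.1f} hours")  # 1.0 to 5.0 hours
```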
  29. Basic indexing and re-indexing (…) •  Full re-indexing is needed after important changes -  Schema changes in the Solr backend -  ezfind.ini changes related to field mapping -  Switching from single to multi-core setups -  Upgrades of eZ Find and/or Solr
  30. eZ Tika: indexing binary files •  http://projects.ez.no/eztika •  Based on Apache Tika -  Text and metadata extraction for a large variety of file types •  The extension provides -  A standalone binary (yet another Java .jar) -  Configuration settings -  A stub binary file handler -  A wrapper shell script
  31. Basic searching, filtering and facets recipes
  32. Terminology 101 •  Searching •  What you expect :) •  Includes relevancy calculations •  Filtering •  Narrows down the set of documents to search in •  Does NOT influence relevancy calculations •  Full search syntax and more at your disposal. [Diagram: Index → Filter → Search result]
  33. Terminology 101 •  Facets •  Provide counts on potential filters to use •  Tool to create navigation interfaces
  34. Solr/Lucene search syntax 101 •  Query using the “ezpublish” (eDismax-based) handler •  One or more keywords •  + or - prefix to denote required or excluded terms, for example: +cocktail -workshop •  Multiple terms: “minimum should match” rules. Default: 1-2 keywords: at least one must match; 3-5 keywords: at least 2-4 must match; 6-7 keywords: at least 4-5 must match; above 7 keywords: 60% of them must match
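The minimum-should-match table is easier to check as a function. The sketch below encodes the slide's table literally and rounds the 60% case down, which is how Solr treats percentages. One way to express the same table in Solr's conditional `mm` syntax would be `1<-1 5<-2 7<60%`, but the slide does not say that this exact string is what eZ Find ships. The function name `min_should_match` is hypothetical.

```python
def min_should_match(n_terms):
    """Minimum number of optional keywords that must match, per the slide's table."""
    if n_terms <= 0:
        return 0
    if n_terms <= 2:   # 1-2 keywords: at least one
        return 1
    if n_terms <= 5:   # 3-5 keywords: all but one
        return n_terms - 1
    if n_terms <= 7:   # 6-7 keywords: all but two
        return n_terms - 2
    # Above 7 keywords: 60%, rounded down as Solr does for percentages.
    return n_terms * 60 // 100


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, "keywords ->", min_should_match(n), "must match")
```

Read literally, the table requires 5 matches for 7 keywords but only 4 for 8, so long queries become more forgiving than medium-sized ones.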
  35. Solr/Lucene search syntax 101 •  Terms and phrases •  Term: cocktail •  Phrase: "Elaphusa hotel" •  Wildcards •  Using '*': pro* •  Using '?': ma?ch •  Allowing a certain “edit distance”: fuzzy searches •  march~0.7 •  Proximity •  "john doe"~10
  36. Solr/Lucene search syntax 101 (...) •  Ranges -  Inclusive/exclusive -  One end may be left open using '*' •  Inclusive -  [1 TO 5] •  Exclusive -  {0 TO 6} •  Open-ended -  [NOW/DAY-1YEARS TO *]
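When filter strings are assembled in code instead of being typed into a template, a tiny helper avoids bracket mistakes. This is a minimal sketch with a hypothetical function `solr_range`. It follows the range syntax on the slide: square brackets for inclusive bounds, curly braces for exclusive bounds, and `*` for an open end.

```python
def solr_range(field, low=None, high=None, inclusive=True):
    """Build a Lucene range clause; None stands for an open end ('*')."""
    lower = "*" if low is None else str(low)
    upper = "*" if high is None else str(high)
    # [ ] are inclusive bounds, { } exclusive bounds.
    open_b, close_b = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_b}{lower} TO {upper}{close_b}"


if __name__ == "__main__":
    print(solr_range("attr_length_si", 1, 5))
    print(solr_range("attr_length_si", 0, 6, inclusive=False))
    print(solr_range("meta_published_dt", "NOW/DAY-1YEARS"))
```

Date math expressions such as `NOW/DAY-1YEARS` pass through unchanged, so Solr still resolves them.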
  37. Date handling •  No real limits like those of Unix timestamps •  Date values in ISO 8601 format yyyy-mm-ddThh:mm:ssZ (in UTC) •  Macro-like syntax -  “NOW” -  “NOW/DAY-1YEAR” -  “NOW+3DAYS” •  Templates: format datetime values with the 'solr' operator
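Outside templates, for example in a script that builds filters or feeds external content, dates must also be converted to UTC and written in the ISO 8601 form above. A minimal sketch with a hypothetical helper `to_solr_date` follows. It deliberately rejects naive datetimes so that the timezone conversion is never guessed.

```python
from datetime import datetime, timedelta, timezone


def to_solr_date(dt):
    """Format an aware datetime as a Solr date string (ISO 8601, UTC, trailing 'Z')."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime so the UTC conversion is explicit")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    cest = timezone(timedelta(hours=2))  # e.g. Central European Summer Time
    print(to_solr_date(datetime(2013, 9, 4, 12, 30, tzinfo=cest)))  # 2013-09-04T10:30:00Z
```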
  38. Searching in templates •  You can use the standard content/search templates and parameters •  But much better: dedicated eZ Find fetch functions -  fetch( ezfind, search, hash( query, 'eZ Systems' ) ) -  fetch( ezfind, moreLikeThis, …) -  fetch( ezfind, rawSolrRequest, …)
  39. Dedicated search fetch parameters •  Basic query parameters •  query: query string •  offset: result offset •  limit: max number of results •  class_id: class IDs/identifiers (string or array) •  section_id: section identifier •  query_handler: string (default “ezpublish”). See doc.ez.no for the full list of parameters
  40. Dedicated search fetch parameters •  Advanced query parameters •  spellcheck: array(true/false, 'default') •  filter: mixed filter expression •  facet: mixed facet expression •  sort_by: array of hashes •  criterium: score (default), name, class_name, published, modified, … •  order: “asc” or “desc”. See doc.ez.no for the full list of parameters
  41. Filtering •  Array elements are combined with AND logic; each element uses standard Lucene syntax •  Within an element, 'OR' logic can be applied •  Attribute identifiers are mapped to Solr fields •  Example:
      fetch( ezfind, search, hash( query, 'cocktails', filter, array( 'article/tags:Bol' ) ) )
  42. eZ Find and field names •  The normal case for filtering: 3 ways -  array('article/title:a*') // will generate 2 filters -  array('title:a*') // cross-class attribute filtering -  array('attr_title_s:a*') // using raw field names
  43. eZ Find raw field names •  Main principle: <source>_<identifier>_<type> -  <source>: meta, attr, as -  <identifier>: eZ Publish native identifier -  <type>: Solr field type mapping (schema.xml) •  Extra -  timestamp: time when the object was indexed -  ezf_df_text: aggregator for all text -  ezf_sp_words: spellcheck source •  Subattributes: another separator of 3x '_': <source>_<identifier>___<sub_attr_id>_<type>
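The naming rule can be captured in a few lines, which helps when raw field names are built dynamically. In the minimal sketch below, the helper `raw_field_name` is hypothetical and does not look up the type suffix in schema.xml. The caller must pass the suffix that the attribute's datatype actually maps to. The `author`/`name` pair in the usage is an invented example of a sub-attribute.

```python
# Source prefixes listed on the slide.
SOURCES = {"meta", "attr", "as"}


def raw_field_name(source, identifier, solr_type, sub_attr=None):
    """Compose <source>_<identifier>_<type>, or the sub-attribute variant with '___'."""
    if source not in SOURCES:
        raise ValueError(f"unknown source prefix: {source!r}")
    if sub_attr is None:
        return f"{source}_{identifier}_{solr_type}"
    return f"{source}_{identifier}___{sub_attr}_{solr_type}"


if __name__ == "__main__":
    print(raw_field_name("attr", "title", "s"))        # attr_title_s
    print(raw_field_name("meta", "published", "dt"))   # meta_published_dt
    # Hypothetical related-object attribute 'author' with sub-attribute 'name':
    print(raw_field_name("attr", "author", "s", sub_attr="name"))  # attr_author___name_s
```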
  44. Filter recipes •  A specific age: last 2 weeks
      .. filter, array( 'meta_published_dt:[NOW/DAY-2WEEKS TO NOW/DAY+1DAYS]' ) ..
    Solr filter query cache friendly: •  Lower bound: rounds down to the day and subtracts 2 weeks •  Upper bound: rounds down to the current day and adds 1 day, so that items published after 00:00 ‘today’ are also included
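The cache-friendliness comes from rounding: every request during the same day resolves to identical bounds, so Solr can reuse the cached filter. The sketch below mimics that rounding for a given "now", to make the effect visible. It assumes date math is evaluated in UTC, and `last_two_weeks_window` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def last_two_weeks_window(now):
    """Compute the bounds of [NOW/DAY-2WEEKS TO NOW/DAY+1DAYS] for a given 'now'."""
    # NOW/DAY: round down to midnight (date math here assumes UTC).
    day = now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    return day - timedelta(weeks=2), day + timedelta(days=1)


if __name__ == "__main__":
    morning = datetime(2013, 9, 4, 0, 5, tzinfo=timezone.utc)
    evening = datetime(2013, 9, 4, 23, 55, tzinfo=timezone.utc)
    # Same bounds all day long, so the cached filter can be reused.
    print(last_two_weeks_window(morning) == last_two_weeks_window(evening))  # True
```

An unrounded `[NOW-14DAYS TO NOW]` would change every millisecond and defeat the filter cache.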
  45. Filter recipes (…) •  'or' conditions within fields
      .. filter, array( 'attr_tags_lk:(ezfind ezsummercamp netgen)' ) ..
    •  'or' conditions across fields
      .. filter, array( '(attr_length_si:[2 TO 5]) (attr_color_s:red)' ) ..
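The slide's filters rely on OR being the default operator between clauses. The sketch below builds the same kinds of filters with an explicit `OR`, so they do not depend on that default. The helpers `any_of` and `any_clause` are hypothetical. Quoting every value is a simplifying assumption: on analyzed text fields, a quoted value becomes a phrase query.

```python
def _quote(value):
    """Quote a value as a Lucene phrase, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def any_of(field, values):
    """'or' within one field: field:("v1" OR "v2" ...)."""
    return f"{field}:(" + " OR ".join(_quote(v) for v in values) + ")"


def any_clause(*clauses):
    """'or' across fields: (clause1) OR (clause2) ..."""
    return " OR ".join(f"({c})" for c in clauses)


if __name__ == "__main__":
    print(any_of("attr_tags_lk", ["ezfind", "ezsummercamp", "netgen"]))
    print(any_clause("attr_length_si:[2 TO 5]", "attr_color_s:red"))
```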
  46. Facets •  Facet types •  field: enumeration •  function: Solr functions •  prefix: prefix/wildcard •  range, date •  Main facet parameters: •  sort: count or alphanumerical •  limit, offset •  mincount •  missing
  47. Basic facet types •  Field facets •  Enumerate over contents •  Can give large results, use wisely •  Typical: keywords, object metadata •  Functions •  The sky is the limit •  Gives back 1 count result •  Prefix •  Shortcut for a simple function facet
48. Range facets
• For numerical and date ranges
• Emits multiple counts, depending on the parameters provided
• Example
fetch( ezfind, search, hash( 'query', $queryString, 'facet', array( hash( 'range', hash( 'field', 'published', 'start', 'NOW/YEAR-3YEARS', 'end', 'NOW/YEAR+1YEAR', 'gap', '+1YEAR' ) ) ) ) )
49. Range facets: parameters
• Mandatory
  - 'field' (can also be a custom Solr field)
  - 'start' (numeric/date)
  - 'end' (numeric/date)
• Optional
  - 'hardend'
  - 'include'
  - 'other'
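To see which counts a range facet emits, the bucket boundaries can be computed by hand. This sketch assumes Solr's behaviour for `hardend` (when false, the last bucket may run past `end`), ignores `include` and `other`, and writes the gap in PHP's relative date format (`'+1 year'`) rather than Solr's `+1YEAR`; `rangeBuckets()` is hypothetical.

```php
<?php
// Sketch: split [start, end) into buckets of size $gap, as a range facet does.
function rangeBuckets(DateTimeImmutable $start, DateTimeImmutable $end, string $gap, bool $hardEnd = false): array
{
    $buckets = [];
    $lower = $start;
    while ($lower < $end) {
        $upper = $lower->modify($gap);
        if ($hardEnd && $upper > $end) {
            $upper = $end; // hardend: truncate the last bucket at 'end'
        }
        $buckets[] = [$lower->format('Y-m-d'), $upper->format('Y-m-d')];
        $lower = $upper;
    }
    return $buckets;
}

// start = NOW/YEAR-3YEARS, end = NOW/YEAR+1YEAR, gap = +1YEAR, evaluated in 2013
$utc = new DateTimeZone('UTC');
$buckets = rangeBuckets(
    new DateTimeImmutable('2010-01-01', $utc),
    new DateTimeImmutable('2014-01-01', $utc),
    '+1 year'
);
foreach ($buckets as [$from, $to]) {
    echo "$from .. $to\n"; // one count per bucket
}
```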
50. Recipes with facets and filters
• Analytics on publishing activity in the previous month
fetch( ezfind, search, hash( query, '', filter, array( 'meta_published_dt:[NOW/MONTH-1MONTHS TO NOW/MONTH]' ), facet, array( hash( 'field', 'meta_contentclass_id_si' ), hash( 'field', 'meta_owner_id_si' ) ) ) )
• Results in counts per content type and per author
51. Recipes with facets and filters (…)
• Analytics on publishing activity per month over the previous 12 months for a certain content type, using range facets
fetch( ezfind, search, hash( query, '', filter, array( 'meta_class_identifier_ms:article' ), facet, array( hash( 'range', hash( 'field', 'published', 'start', 'NOW/MONTH-12MONTHS', 'end', 'NOW/MONTH', 'gap', '+1MONTHS' ) ) ) ) )
52. Part 2: Advanced recipes & insights
• Tuning search result relevancy
• Create your own datatype plugin
• eZ Find / Solr lower-level API
• General index time plugins
Appendix
• Devops: replication and load balancing/failover
• A deeper dive into Solr analysis
53. Tuning search result relevancy
• Index time boosting
  - "Permanent boosting"
  - Best used after some real-life measurements (logs, user feedback, dedicated tests)
  - ezfind.ini
• Query time boosting
  - For the ezpublish/eDismax request handlers
  - Fields (also metadata)
  - Function queries
  - Multiplicative and additive boosting
54. Index time boosting
• Available for:
  - Classes
  - Attributes
  - Datatypes
• Boost factor ranges
  - [0 … 1]: suppression
  - [1 … ]: boosting
• ezfind.ini
55. Index time boosting: ezfind.ini example
[IndexBoost]
#ClassBoost: set boost factors on document (object) level
#format Class[<class identifier>]=<boost factor as int or float>
Class[]
Class[article]=4
Class[folder]=0.1
#AttributeBoost: set boost factors on attributes at field level
#you can specify the class identifier as optional (!) element for greatest flexibility
#If more than one matching attribute identifier is used, the last one has precedence
Attribute[]
Attribute[product/name]=8.0
Attribute[bio]=1.5
#AttributeBoost: set boost factors on attributes at field level based on their datatype
Datatype[]
Datatype[ezkeyword]=3.0
#ReverseRelatedScale: scale factor to use in $boost = $boost + <scalefactor> * <number of reverse relations>
ReverseRelatedScale=0
ReverseRelatedScale=0.8
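The `ReverseRelatedScale` comment describes a simple linear formula. A PHP sketch of it with a hypothetical function name and made-up inputs, assuming the starting `$boost` is the object's class boost:

```php
<?php
// Sketch of: $boost = $boost + <scalefactor> * <number of reverse relations>
function reverseRelatedBoost(float $boost, float $scale, int $reverseRelations): float
{
    return $boost + $scale * $reverseRelations;
}

// An article (Class[article]=4) that 5 other objects relate to:
echo reverseRelatedBoost(4.0, 0.8, 5), "\n"; // ReverseRelatedScale=0.8
echo reverseRelatedBoost(4.0, 0.0, 5), "\n"; // ReverseRelatedScale=0: unchanged
```

Since the term grows linearly with the number of reverse relations, a heavily linked object can quickly outweigh the class boost; a small scale factor keeps that effect moderate.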
56. Query time boosting
• Boosting types and corresponding sub-parameters
  - 'fields'
  - 'mfunctions'
  - 'queries'
  - 'functions'
• Properly supported only since eZ Publish 5 (eZ Find master)
57. Query time boosting: 'fields'
• Example
.. 'boost_functions', hash( 'fields', array( 'article/tags:3' ) ) ..
• or with a raw Solr field identifier
.. 'boost_functions', hash( 'fields', array( 'attr_tags_lk:3' ) ) ..
58. Query time boosting: 'mfunctions'
• Multiplicative
• No need to know raw relevancy numbers
• Multiplies the individual score by the specified function(s)
• Preferred over the other query boost functions in most cases!
59. Recipe: promote more recent content
• Parameter snippet
... 'boost_functions', hash( 'mfunctions', array( 'recip(ms(NOW/DAY,meta_published_dt),1.58e-11,2.0,0.5)' ) ) ...
• Scaling parameters for the reciprocal function
  - recip(x,m,a,b) = a / (m*x + b)
  - x = age in milliseconds
  - m = 1.58e-11 ≈ 1 / (number of milliseconds in 2 years)
  - a, b: scaling factors (a: "amplitude", b: "speed of age decline")
60. Recipe: promote more recent content (…)
Implementing 1 + a / (m*x + b) with
• a = 2
• b = 0.5
• m = 1.58e-11
• x = age in milliseconds
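The recency curve can be tabulated directly. This PHP sketch is a local re-implementation of `recip(x,m,a,b)` with the parameters above (not a call into Solr); it also prints 1/m in days, which shows the time scale that m encodes.

```php
<?php
// Local re-implementation of Solr's recip(x,m,a,b) = a / (m*x + b).
function recip(float $x, float $m, float $a, float $b): float
{
    return $a / ($m * $x + $b);
}

const MS_PER_DAY = 86400000;
$m = 1.58e-11;
$a = 2.0;
$b = 0.5;

// 1/m expressed in days: roughly two years.
printf("1/m = %.0f days\n", 1 / $m / MS_PER_DAY);

foreach ([0, 30, 182, 365, 730] as $days) {
    printf("age %4d days -> factor %.2f\n", $days, recip($days * MS_PER_DAY, $m, $a, $b));
}
```

Used as an 'mfunctions' boost, a just-published object has its score multiplied by a/b = 4, while at an age of 1/m (about two years) the factor has dropped to a/(1+b) ≈ 1.33.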
61. Query time boosting: 'queries'
• These are added to the main query; they need to follow the Solr/Lucene query format and specify their boost factor explicitly
• Example
..'boost_functions', hash( 'queries', array( 'meta_class_identifier_ms:article^10' ) )..
• Also available in the ini settings (then always applied)
[QueryBoost]
#RawBoostQueries[]
RawBoostQueries[]=meta_class_identifier_ms:summary^4
62. Query time boosting: 'functions'
• These are like mfunctions, but add their value to the relevancy score
• Usually 'mfunctions' are the easier choice
• Example
..'boost_functions', hash( 'functions', array( 'sum(product(attr_importance_si,0.1),1)' ) )..
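Why 'mfunctions' are usually the easier choice: an additive boost has a different relative effect depending on the raw score scale. A PHP sketch with made-up scores that evaluates `sum(product(attr_importance_si,0.1),1)` by hand and applies it both ways:

```php
<?php
// sum(product(importance, 0.1), 1), evaluated locally for illustration.
function importanceBoost(int $importance): float
{
    return $importance * 0.1 + 1;
}

// Two hits with the same importance but very different raw scores (made up).
$hits = [
    'A' => ['score' => 0.40, 'importance' => 5],
    'B' => ['score' => 4.00, 'importance' => 5],
];

foreach ($hits as $id => $hit) {
    $boost = importanceBoost($hit['importance']);
    printf(
        "%s: additive %.2f, multiplicative %.2f\n",
        $id,
        $hit['score'] + $boost,  // 'functions'
        $hit['score'] * $boost   // 'mfunctions'
    );
}
```

Both hits get the same +1.5 under additive boosting, which nearly quintuples A's score but barely moves B's; multiplying by 1.5 keeps their ratio, so no knowledge of raw relevancy numbers is needed.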
63. Solr has many functions to use
• Strings
• Numbers and mapping
• Date math
• Geospatial
http://wiki.apache.org/solr/FunctionQuery/
64. Absolute boosting: elevation
• If a query term matches, one or more objects are pushed to the top
• The query term has to be part of the object
• Dedicated admin interface
65. Custom datatype handlers
• Usually for "complex" datatypes
  - Subfields (!)
• Can optionally be context aware
  - Facets/Sort
  - Search
  - Filter
66. Create your own datatype handler
• Derive from a base class:
  - ezfSolrDocumentFieldBase
  - Naming convention
• Provide at least two methods
  - "schema" data: (sub)field names
  - Data to index
• Starting point
  - extension/ezfind/classes: ezfsolrdocumentfielddummyexample.php
• Add in ezfind.ini, [Indexoptions]
67. Overview of the eZ Find / Solr lower-level API
68. Base classes to know
• extension/ezfind/classes
  - ezsolrbase.php handles communication with Solr backends
  - ezsolrdoc.php creates proper XML structures for indexing
  - ezfsolrutils.php easy-to-use higher-level functions
• Let's have a look ...
69. Index Time Plugin Mechanism
• Write your own functions to:
  - Expand the Solr fields per object
  - Modify existing fields
  - Change per object and per field boosting dynamically
• Use cases
  - Complex custom data, partially external
  - Boost documents based on page views, user score, ….
70. Index time plugins (...)
• Implement the following interface
• docList is the array of eZSolrDocs to be sent to Solr, one per language for the given contentObject
interface ezfIndexPlugin
{
    /**
     * @param eZContentObject $contentObject
     * @param array $docList
     */
    public function modify( eZContentObject $contentObject, &$docList );
}
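A sketch of a plugin implementing this interface. The `eZContentObject` and `eZSolrDoc` classes below are minimal stand-ins (assumptions) so the file runs on its own; in a real extension they come from eZ Publish and eZ Find, and only the interface is taken from the slide. The assumed `addField( $name, $content, $boost = null )` signature should be checked against ezsolrdoc.php. The plugin class, the page-view source and the `extra_pageviews_i` field name are hypothetical.

```php
<?php
// --- Stand-ins so the sketch runs outside eZ Publish (assumptions) ---
class eZContentObject
{
    private $id;
    public function __construct(int $id) { $this->id = $id; }
    public function attribute(string $name) { return $name === 'id' ? $this->id : null; }
}

class eZSolrDoc
{
    public $fields = [];
    public function addField($name, $content, $boost = null) { $this->fields[$name][] = $content; }
}

// Interface as shown on the slide (provided by eZ Find in a real install).
interface ezfIndexPlugin
{
    public function modify(eZContentObject $contentObject, &$docList);
}

// Hypothetical plugin: adds a page view count from an external source
// (faked here with an array) to every language version of the object.
class myPageViewsIndexPlugin implements ezfIndexPlugin
{
    private $pageViews;

    public function __construct(array $pageViews = [42 => 1337])
    {
        $this->pageViews = $pageViews;
    }

    public function modify(eZContentObject $contentObject, &$docList)
    {
        $views = $this->pageViews[$contentObject->attribute('id')] ?? 0;
        foreach ($docList as $doc) {
            // One eZSolrDoc per language: add the same value to each.
            $doc->addField('extra_pageviews_i', $views);
        }
    }
}
```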
71. Index time plugins (...)
• Activate your plugin in ezfind.ini
  - Global
  - Per content class
[IndexPlugins]
# Allow injection of custom fields and manipulation of fields/boost parameters
# at index time
# This can be defined at the class level or general
General[]
#General[]=ezfIndexParentName
#Classhooks will only be called for objects of the specified class
Class[]
Class[myspecialclass]=ezfIndexParentName
72. Customizing autocomplete
• Tweaking schema.xml
• Goal: decrease "noise"
• Use copyField directives to take only selected input fields and aggregate them into a custom autocomplete source field
• Adapt ezfind.ini settings
73. Customizing autocomplete: schema.xml
<fields>
..
<field name="my_autocomplete_field" type="textgen" indexed="true" stored="true" multiValued="true"/>
..
</fields>
..
<copyField source="*_lk" dest="my_autocomplete_field"/>
Example source: only lowercased tags (copyField directives belong outside the <fields> section)
74. Customizing autocomplete: ezfind.ini
[AutoCompleteSettings]
AutoComplete=enabled
# The maximum number of suggestions to return from search engine.
Limit=10
# Facet field used by autocomplete.
FacetField=my_autocomplete_field
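Autocomplete on a facet field is typically built on Solr's `facet.prefix`: ask for the facet values of the custom field that start with what the user typed. This PHP sketch only builds such a request URL from standard Solr facet parameters; it illustrates the technique and is not eZ Find's own autocomplete code, and the Solr URL is an assumption.

```php
<?php
// Sketch: a facet.prefix request against the custom autocomplete field.
function autocompleteUrl(string $solrSelectUrl, string $prefix, string $field = 'my_autocomplete_field', int $limit = 10): string
{
    $params = [
        'q' => '*:*',
        'rows' => 0,                              // only facet counts are needed
        'wt' => 'json',
        'facet' => 'true',
        'facet.field' => $field,
        'facet.prefix' => mb_strtolower($prefix), // indexed terms are lowercased
        'facet.limit' => $limit,                  // cf. Limit=10 in ezfind.ini
        'facet.mincount' => 1,
    ];
    return $solrSelectUrl . '?' . http_build_query($params);
}

echo autocompleteUrl('http://localhost:8983/solr/select', 'eZSu'), "\n";
```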
75. Suggested exercises
76. Warm up exercise
• Make sure you are on the latest code base
• Play with the Lucene syntax supported by the new ezpublish/eDismax handler:
  - Proximity searches
  - Fuzzy searches
  - Wildcards
  - Ranges
• And see what happens
77. Exercise: boosting
• Use the new 'mfunctions' parameter to boost more recent content
• Tweak your content with ratings and boost higher rated articles
78. Exercise: Facets & attribute filtering
• Adapt the previous examples/recipes
• Try faceting and filtering on class names
  - As a field facet (enumerate all classes)
  - As a set of several query facets (enumerate only a selection)
• Range facets
  - Date ranges
79. Exercise: sub-attribute filtering on a related object
• Create an override template for a dummy node
• In the template, add code that fetches with eZ Find: search with an empty query string, but use a filter with a subattribute clause
{def $searchResults = fetch( 'ezfind', 'search', hash( 'query', '', 'filter', array( 'article/testrelation/caption:specialvalue1' ) ) )}
80. A last plug: You are invited to our 5th anniversary!
conference.phpbenelux.eu/2014/
81. Appendix A: Replication and load balancing
82. Replication / Distribution
• Solr 3.x (current stable eZ Find)
  - Master/slave model (pull)
  - Easy to set up
• Solr 4.x (future eZ Find?)
  - "SolrCloud", distributed capabilities (push)
  - Apache ZooKeeper based
  - A bit more complicated setup
  - Automatic failover, monitoring
83. Master/Slave replication
• solrconfig.xml
  - Activate handlers
  - Allow parameters (the slave must know the master)
  - Define replication trigger points (commit/optimize/manual)
  - Define config files to replicate if needed
• HTTP REST API
• Status monitoring in the admin interface
84. Replication: example config
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">elevate.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://${master.core.url:localhost:8983}/${solr.core.name}/replication</str>
    <str name="pollInterval">${poll.time:'00:00:10'}</str>
  </lst>
</requestHandler>
Startup parameters come from the command line or system properties
85. Replication: starting master and slave
Slave
java -Denable.slave=true -Dmaster.core.url=master:8983/solr -Dsolr.solr.home=/var/solr -jar start.jar
Master
java -Denable.master=true -Dsolr.solr.home=/var/solr -jar start.jar &
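Once both instances run, the ReplicationHandler's HTTP API can be used for monitoring or to trigger a pull. A PHP sketch that builds such URLs; the `slave` host name is hypothetical and the command list is only a subset of what the handler accepts.

```php
<?php
// Subset of ReplicationHandler commands used in this sketch.
const REPLICATION_COMMANDS = ['indexversion', 'details', 'fetchindex', 'enablepoll', 'disablepoll'];

function replicationUrl(string $coreBaseUrl, string $command): string
{
    if (!in_array($command, REPLICATION_COMMANDS, true)) {
        throw new InvalidArgumentException("Unsupported replication command: $command");
    }
    return rtrim($coreBaseUrl, '/') . '/replication?' . http_build_query(['command' => $command]);
}

// Inspect replication status on the slave, or make it pull from the master now.
echo replicationUrl('http://slave:8983/solr', 'details'), "\n";
echo replicationUrl('http://slave:8983/solr', 'fetchindex'), "\n";
```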
86. Replication and load balancing
• Reverse proxy and rewrite rules
• Point the eZ Find Solr URIs to the load balancer URI
• Direct reads to the slaves
• Direct everything else to the master
87. Replication and load balancing (…)
Listen 8988
<VirtualHost *:8988>
  # Need: mod_proxy mod_proxy_http mod_proxy_balancer mod_rewrite active

  <Proxy balancer://solrread>
    # just two, localhost and the Solr master server as a hot stand-by: may also add the second webserver
    BalancerMember http://localhost:8983
    BalancerMember http://master-solr:8983 status=+H
  </Proxy>

  <Proxy balancer://solrwrite>
    # just the Solr master server
    BalancerMember http://master-solr:8983
  </Proxy>

  RewriteEngine On

  # Send select to the solrread balancer
  RewriteCond %{REQUEST_URI} ^/(.*)select/$
  RewriteRule ^/(.*)$ balancer://solrread/$1 [P]

  # Send all others to the write balancer
  RewriteRule ^/(.*)$ balancer://solrwrite/$1 [P]

  ProxyPassReverse / balancer://solrwrite
  ProxyPassReverse / balancer://solrread
</VirtualHost>
Apache mod_proxy example
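The routing decision of these rewrite rules can be reproduced with the same regular expression. This PHP sketch is only a way to test URIs against the pattern; it is not part of the proxy setup, and `targetBalancer()` is a hypothetical name.

```php
<?php
// Mirrors: RewriteCond %{REQUEST_URI} ^/(.*)select/$ -> solrread, else solrwrite
function targetBalancer(string $requestUri): string
{
    return preg_match('#^/(.*)select/$#', $requestUri) ? 'solrread' : 'solrwrite';
}

foreach (['/solr/select/', '/solr/update/', '/solr/select'] as $uri) {
    echo $uri, ' -> ', targetBalancer($uri), "\n";
}
```

`%{REQUEST_URI}` does not include the query string and the pattern requires a trailing slash, so it is worth checking that the select URIs eZ Find actually sends match it; otherwise all reads end up on the master.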
88. Appendix B: Inside Solr analysis
89. A deeper dive into Apache Solr
• From index → document → field
• schema.xml
• What happens under the hood
90. The Solr/Lucene index
• Inverted index
• Holds a collection of "documents" (hello NoSQL)
• Document
  - Collection of fields
  - Flexible schema!
  - Unique ID (user defined)
• Solr uses an XML-based config file: schema.xml
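A toy PHP sketch of what "inverted" means: the index maps each term to the documents that contain it, rather than each document to its terms. Real Lucene indexes also store term frequencies, positions and norms, which this sketch leaves out.

```php
<?php
// Toy inverted index: term => list of document IDs containing it.
function buildInvertedIndex(array $docs): array
{
    $index = [];
    foreach ($docs as $id => $text) {
        // Naive analysis: lowercase and split on anything that is not a letter or digit.
        $terms = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($terms) as $term) {
            $index[$term][] = $id;
        }
    }
    ksort($index);
    return $index;
}

$index = buildInvertedIndex([
    1 => 'eZ Find uses Solr',
    2 => 'Solr is built on Lucene',
]);
print_r($index['solr']); // a search for "solr" is a single lookup
```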
91. Field types and fields
• Various field types, derived from base classes
• Indexed (optional)
  - usually analyzed & tokenized
  - makes it searchable and sortable
• Stored (optional)
  - also keeps the originally submitted content
  - the content can be part of the request response
• Can be multi-valued!
  - opens possibilities beyond full text search
92. Field definitions: schema.xml
• Field types
  - text
  - numerical
  - dates
  - location
  - … (about 30 in total)
• Actual fields (name, definition, properties)
• Dynamic fields
• Copy fields (as aggregators)
93. schema.xml: simple field type examples
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
<!-- A Trie based date field for faster date range queries and date faceting. -->
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
<!-- A text field that only splits on whitespace for exact matching of words -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
94. schema.xml: more complex field type
<!-- A general unstemmed text field - good if one does not know the language of the field -->
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
95. Analysis
• Solr does not really search your text, but rather the terms that result from the analysis of the text
• Typically a chain of
  - Character filter(s)
  - Tokenisation
  - Filter A
  - Filter B
  - …
96. Solr comes with many tokenizers and filters
• Some are language specific
• Others are very specialised
• It is very important to get this right; otherwise, you may not get what you expect!
97. Text analysis examples
Input phrase: Ivo Lukač presents a geek-interview on the eZSummerCamp.
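A rough PHP imitation of what an index-time chain does to this phrase: a MappingCharFilter-style accent fold (which `textgen` itself does not include), followed by stages modelled loosely on the `textgen` index analyzer from schema.xml (whitespace tokenizer, stop filter, word delimiter, lowercase). It is a simplified illustration, not Solr's implementation; the stop word list is made up and the word-delimiter step ignores most of its options.

```php
<?php
// Simplified, illustrative analysis chain; NOT Solr's implementation.
function analyze(string $text, array $stopwords): array
{
    // 1. Character filter (MappingCharFilter-like): fold an accented character.
    $text = strtr($text, ['č' => 'c', 'Č' => 'C']);

    // 2. Tokenizer (WhitespaceTokenizer-like): split on white space.
    $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // 3. Stop filter with ignoreCase="true".
    $tokens = array_values(array_filter($tokens, function (string $t) use ($stopwords): bool {
        return !in_array(mb_strtolower($t), $stopwords, true);
    }));

    // 4. Word delimiter (very simplified): split on non-alphanumerics, keep the
    //    parts and, like catenateWords="1", also their concatenation.
    $terms = [];
    foreach ($tokens as $t) {
        $parts = preg_split('/[^\p{L}\p{N}]+/u', $t, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $p) {
            $terms[] = $p;
        }
        if (count($parts) > 1) {
            $terms[] = implode('', $parts);
        }
    }

    // 5. Lowercase filter.
    return array_map('mb_strtolower', $terms);
}

print_r(analyze('Ivo Lukač presents a geek-interview on the eZSummerCamp.', ['a', 'on', 'the']));
```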
98. Character filters
• Used to clean up text before tokenizing
  - HTMLStripCharFilter (strips HTML, XML, JS, CSS)
  - MappingCharFilter (normalisation of characters, removing accents)
  - Regular expression filter
99. Tokenizers
• Convert text to tokens (terms)
• You can define only one per field/analyzer
• Examples
  - WhitespaceTokenizer (splits on white space)
  - StandardTokenizer
  - CJK variants
100. Additional filters
• Many possible per field/analyzer
• Many delivered with Solr out of the box
• If not enough, write a tiny bit of Java or look for contributions
• Examples ...
101. Phonetic filters
• PhoneticFilterFactory
• "Sounds like" transformations and matching
• Algorithms:
  - Metaphone
  - Double Metaphone
  - Soundex
  - Refined Soundex
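PHP happens to ship `soundex()` and `metaphone()`, which make the idea of "sounds like" keys easy to try. These are PHP's own implementations, not Solr's PhoneticFilterFactory, so the codes may differ in edge cases.

```php
<?php
// Pairs of names that should "sound alike".
$pairs = [['Euler', 'Ellery'], ['Knuth', 'Kant']];

foreach ($pairs as [$x, $y]) {
    // soundex(): a letter plus 3 digits; metaphone(): a variable-length key.
    printf(
        "%s / %s -> soundex %s / %s, metaphone %s / %s\n",
        $x, $y, soundex($x), soundex($y), metaphone($x), metaphone($y)
    );
}
```

Indexing such keys next to the original terms is what lets a query for one spelling match the other.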
102. Reversing filter
• Reverses the order of characters in each term
• Use: allow "leading wildcards"
• *thing => gniht*
• A lot faster: the leading wildcard query becomes a prefix query on the reversed terms
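In Solr this trick is provided by ReversedWildcardFilterFactory, which indexes reversed tokens (with a marker) so the query parser can rewrite leading wildcards. The PHP sketch below only illustrates the rewrite itself, with hypothetical helper names.

```php
<?php
// Multibyte-safe reversal of the characters of a term.
function reverseTerm(string $term): string
{
    $chars = preg_split('//u', $term, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_reverse($chars));
}

// "*thing" -> "gniht*"; anything else is returned unchanged.
function rewriteLeadingWildcard(string $query): string
{
    if (strlen($query) > 1 && $query[0] === '*' && strpos($query, '*', 1) === false) {
        return reverseTerm(substr($query, 1)) . '*';
    }
    return $query;
}

echo rewriteLeadingWildcard('*thing'), "\n";
```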
103. Synonyms
• Inject synonyms for certain terms
• Language specific
• Best used in query time analysis; at index time they
  - may inflate the search index too much
  - decrease relevancy
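A PHP sketch of query-time expansion in the spirit of `expand="true"`: each query term that has synonyms becomes an OR group, while the index stays untouched. The synonym map and helper name are made up for illustration; Solr's SynonymFilter works on token streams, not on query strings.

```php
<?php
// Expand each query term that has synonyms into an OR group.
function expandQuery(string $query, array $synonyms): string
{
    $out = [];
    foreach (preg_split('/\s+/', trim($query)) as $term) {
        $key = strtolower($term);
        if (isset($synonyms[$key])) {
            $out[] = '(' . implode(' OR ', array_merge([$key], $synonyms[$key])) . ')';
        } else {
            $out[] = $term;
        }
    }
    return implode(' ', $out);
}

echo expandQuery('cheap tv', ['tv' => ['television', 'telly']]), "\n";
```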
104. Stemming
• Reduce terms to their root form
  - Plural forms
  - Conjugations
• Language specific (or not relevant, e.g. CJK)
• Many specialised stemmers available
  - Most European languages
  - Some exotic ones through contributions outside the ASF
105. Copy fields
• Analysis is done differently for
  - searching/filtering
  - faceting/sorting
• Keeping stemmed and unstemmed variants in different fields can increase the relevance of results
• Use copy fields in schema.xml or do it client side
106. Geospatial fields
• Solr dedicated fields
  - Latitude/longitude type (trunk)
• Special geospatial functions in filtering & boosting
  - Haversine distance (geosphere)
  - Simple ranges (squares in 2-D)
  - Special query constructs (upcoming)
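The haversine formula used for distances on a sphere is short enough to write out. This PHP sketch computes great-circle distance in kilometres with a mean Earth radius of 6371 km; Solr exposes the same idea through functions such as `hsin` and `geodist`, whose exact parameters are not modelled here. The coordinates are approximate.

```php
<?php
// Haversine great-circle distance in kilometres.
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $r = 6371.0; // mean Earth radius in km
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $h = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    // min() guards against rounding pushing the argument of asin() above 1.
    return 2 * $r * asin(min(1.0, sqrt($h)));
}

// Approximate coordinates of Bol and Split (Croatia), for illustration.
printf("%.1f km\n", haversineKm(43.26, 16.65, 43.51, 16.44));
```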
107. Dedicated fields for every context in eZ Find, if configured
• Context
  - Search
  - Facets
  - Filtering (usually the same as search)
  - Sorting
• ezfind.ini
• Also for custom handlers if needed (see part 6)