© 2013 Paul Borgermans, K-Minds Comm.V.
eZ Find Recipes & Insights
September 4
Bol, Croatia
Paul Borgermans
© 2013 Paul Borgermans, K-Minds Comm.V.
Index time plugins (...)
l  Implement the following interface
l  docList is the ...
© 2013 Paul Borgermans, K-Minds Comm.V.
Index time plugins (...)
l  Activate your plugin in ezfind.ini
-  Global
-  Per c...
© 2013 Paul Borgermans, K-Minds Comm.V.
Customizing autocomplete
•  Tweaking schema.xml
•  Goal: decrease "noise"
•  Use c...
© 2013 Paul Borgermans, K-Minds Comm.V.
Customizing autocomplete:
schema.xml
<fields>
..
<field name="my_autocomplete_fiel...
© 2013 Paul Borgermans, K-Minds Comm.V.
Customizing autocomplete:
ezfind.ini
[AutoCompleteSettings]
AutoComplete=enabled
#...
© 2013 Paul Borgermans, K-Minds Comm.V.
Suggested exercises
© 2013 Paul Borgermans, K-Minds Comm.V.
Warm up exercise
l  Make sure you are on the latest code base
l  Play with the L...
© 2013 Paul Borgermans, K-Minds Comm.V.
Exercise: boosting
l  Use the new 'mfunctions' parameter to boost
more recent val...
© 2013 Paul Borgermans, K-Minds Comm.V.
Exercise: Facets & attribute filtering
l  Adapt the previous examples/recipes
l ...
© 2013 Paul Borgermans, K-Minds Comm.V.
Exercise: sub-attribute filtering on
a related object
l  Create an override templ...
© 2013 Paul Borgermans, K-Minds Comm.V.
A last plug: You are invited to our
5th anniversary!
conference.phpbenelux.eu/2014/
© 2013 Paul Borgermans, K-Minds Comm.V.
Appendix A
Replication and loadbalancing
© 2013 Paul Borgermans, K-Minds Comm.V.
Replication / Distribution
l  Solr 3.x (current stable eZ Find)
-  Master/slave m...
© 2013 Paul Borgermans, K-Minds Comm.V.
Master/Slave replication
l  solrconfig.xml
-  Activate handlers
-  Allow paramete...
© 2013 Paul Borgermans, K-Minds Comm.V.
Replication: example config
<requestHandler name="/replication" class="solr.Replic...
© 2013 Paul Borgermans, K-Minds Comm.V.
Replication: starting master and slave
Slave!
!
java -Denable.slave=true -Dmaster....
© 2013 Paul Borgermans, K-Minds Comm.V.
Replication and load balancing
•  Reverse proxy and rewrite rules
•  Point eZ Find...
© 2013 Paul Borgermans, K-Minds Comm.V.
Replication and load balancing (…)
Listen 8988!
<VirtualHost *:8988>!
# Need: mod_...
© 2013 Paul Borgermans, K-Minds Comm.V.
Appendix B
Inside Solr analysis
© 2013 Paul Borgermans, K-Minds Comm.V.
A deeper dive into
Apache Solr
l  From index → document → field
l  Schema.xml
l...
© 2013 Paul Borgermans, K-Minds Comm.V.
The Solr/Lucene index
l  Inverted index
l  Holds a collection of “documents” (he...
© 2013 Paul Borgermans, K-Minds Comm.V.
Field types and fields
l  Various field types, derived from base classes
l  Inde...
© 2013 Paul Borgermans, K-Minds Comm.V.
Field definitions: schema.xml
l  Field types
-  text
-  numerical
-  dates
-  loc...
© 2013 Paul Borgermans, K-Minds Comm.V.
schema.xml: simple field type examples
<fieldType name="string" class="solr.StrFie...
© 2013 Paul Borgermans, K-Minds Comm.V.
schema.xml: more complex field type
<!-- A general unstemmed text field - good if ...
© 2013 Paul Borgermans, K-Minds Comm.V.
Analysis
l  Solr does not really search your text, but rather
the terms that resu...
© 2013 Paul Borgermans, K-Minds Comm.V.
Solr comes with many tokenizers and
filters
l  Some are language specific
l  Oth...
© 2013 Paul Borgermans, K-Minds Comm.V.
Text analysis examples
Input phrase:
Ivo Lukač presents a geek-interview on the eZ...
© 2013 Paul Borgermans, K-Minds Comm.V.
Character filters
l  Used to cleanup text before tokenizing
-  HTMLStripCharFilte...
© 2013 Paul Borgermans, K-Minds Comm.V.
Tokenizers
l  Convert text to tokens (terms)
l  You can define only one per fiel...
© 2013 Paul Borgermans, K-Minds Comm.V.
Additional filters
l  Many possible per field/analyzer
l  Many delivered with So...
© 2013 Paul Borgermans, K-Minds Comm.V.
Phonetic filters
l  PhoneticFilterFactory
l  “sounds like” transformations and m...
© 2013 Paul Borgermans, K-Minds Comm.V.
Reversing Filter
l  Reverses the order of characters
l  Use: allow “leading wild...
© 2013 Paul Borgermans, K-Minds Comm.V.
Synonyms
l  Inject synonyms for certain terms
l  Language specific
l  Best used...
© 2013 Paul Borgermans, K-Minds Comm.V.
Stemming
l  Reduce terms to their root form
-  Plural forms
-  Conjugations
l  L...
© 2013 Paul Borgermans, K-Minds Comm.V.
Copy fields
l  Analysis is done differently for
-  searching/filtering
-  facetin...
© 2013 Paul Borgermans, K-Minds Comm.V.
Geospatial fields
l  Solr dedicated fields
-  Latitude Longitude type (trunk)
l ...
© 2013 Paul Borgermans, K-Minds Comm.V.
Dedicated fields for every context in
eZ Find if configured
l  Context
-  Search
...
© 2013 Paul Borgermans, K-Minds Comm.V.
eZ Find workshop: advanced insights & recipes

Various how-to's and recipes to get things done with eZ Find, advanced searches, facet navigation, clustering of search results, domain specific boosting, etc. This workshop is based on eZ version 4 stack but the knowledge provided reaches beyond eZ versions.

Transcript of "eZ Find workshop: advanced insights & recipes"

  1. eZ Find Recipes & Insights
      September 4, Bol, Croatia
      Paul Borgermans
  2. About me
      • 12+ years in the eZ ecosystem
        - eZ Lucene → eZ Solr → eZ Find
      • Fancying:
        - Apache Lucene family of projects (mainly Solr)
        - NoSQL (Not only SQL) and scalable architectures
        - eZ Publish & CMS systems in general
        - Semantic aspects
        - PHPBenelux Community & Conference
      • Contact: paul.borgermans@gmail.com, @paulborgermans
  3. Part 1: eZ Find Kitchen Basics
      • Get to know the ingredients & tools
      • Installation recipes
      • Basic configuration options
      • Basic indexing
      • Basic searching, filtering and facets
  4. Get to know the ingredients & tools
  5. eZ Find main search ingredients
      • Tunable relevancy ranking
      • Keyword highlighting
      • Filtering and facets (drill-down navigation)
      • Automatic related content
      • Language-dependent optimizations
      • Fast
      • Adaptive to your domain data models
      • Leverages Apache Solr/Lucene
  6. eZ Find with two main additional roles
      • eZ Find to replace your (complex) 'fetch content' calls
        - Speed up template rendering, especially with complex dynamic pages
      • eZ Find/Solr as a content and integration engine
        - Document-oriented storage system (hello NoSQL)
        - Archive use case
        - External content
  7. Your tools
      Credit: http://commons.wikimedia.org/wiki/File:Werkzeugwand.jpg
  8. Core template level tools
      • Dedicated template fetch functions
      • Leveraging Solr search (including spell check, highlighting, …)
      • More Like This
      • Raw access to Solr index (e.g. integrating foreign sources)
      • JS/AJAX
      • Term suggestions
  9. Tuning tools for relevancy
      • Index time
      • Configuration (ezfind.ini)
      • Custom index time plugin
      • Search time
      • Boost functions
      • Elevation of objects
      • Apache Solr schema.xml and solrconfig.xml magic
  10. Tools for extending eZ Find
      • Custom data type plugins
      • Tailor indexing and searching for your data types
      • General index time plugins
      • Even more tailoring and exotic dishes
      • Custom suggesters
      • Add your own vocabularies
  11. The Solr administration interface
      • http://localhost:8983/solr/<core>/admin
      • Statistics and health monitor
      • Search index
      • Java VM (devops)
      • Advanced use
      • Learning
      • Debugging (understanding search results)
      • Tuning tool
  12. (no text on this slide)
  13. Installation and configuration recipes
  14. Installation and configuration recipes
      • Requirements
      • Installing the extension
      • Basic installation/activation of Solr
  15. Solr backend requirements
      • Java VM
        - JRE 6 or 7 (OpenJDK, Oracle/Sun)
      • Servlet container
        - Jetty shipped by default, Tomcat, ...
        - Security to be configured (by default: open)
        - See also http://wiki.apache.org/solr/SolrInstall
      • For larger sites/indexes: enough RAM
        - Yet leave enough for the OS/file caching
  16. Extension installation and activation
      • eZ Find extension activated the usual way
        - ActiveExtensions[]=ezfind
        - (!) Regenerate autoloads if you edit the ini settings directly
      • Execute the DB upgrade script
        - Used for elevation
        - See extension/ezfind/sql/<db>
  17. Putting the backend somewhere
      • Inside eZ Find extension
      • Single installation
      • Quick testing
      • Dedicated locations
      • Production setups
      • Multi-tenant setups
      • Multiple instances (development)
      • Separate the binaries and data/conf, example: /opt/solr for binaries, /srv/solr for data/conf
  18. Multiple ways and operating modes for starting the Solr backend
      • Single core
      • Deprecated
      • Multiple cores
      • Multi-lingual
      • Multi-tenant
      • Multiple instances on your dev installation
      • Setup instructions: see online docs or last year's presentation
  19. Multi-core setup advantages
      • Every language / tenant has its own
      • Index
      • Tunable analyzer options
      • Spell checker dictionary
      • Synonyms, stop word list
      • Elevate configuration
      • Additional bonuses:
      • Slight increase in performance
      • Core admin features
  20. How to configure multicore setups ...
      • Create a new Solr home directory under the java subdir
      • Put a config file solr.xml there, which specifies the cores
      • Copy the conf and data directories
      • Specify the Solr home when starting the servlet container:
        sudo java -Dsolr.solr.home=solr.multicore -jar start.jar
  21. Configuration of multiple cores ...
      • solr.xml as the master entry point
      • lib for all shared jars (extensions)
      • In each subdir, dedicated:
        - Index ("data")
        - Configuration files ("conf")
        - (Optional) "lib" with core-specific jars
  22. Multicore master config file: solr.xml
      <?xml version="1.0" encoding="UTF-8" ?>
      <solr persistent="true" sharedLib="lib">
        <cores adminPath="/admin/cores">
          <core name="project1-eng-GB" instanceDir="pro1-eng" />
          <core name="project1-ger-DE" instanceDir="pro1-ger" />
          <core name="develop" instanceDir="inventory" />
        </cores>
      </solr>
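
To double-check which cores a solr.xml like the one on slide 22 declares, a small script can parse it. This is a minimal sketch, not part of eZ Find: the helper names `list_cores` and `admin_url` are made up for illustration, and the admin URL pattern is the one quoted on slide 11 with the default localhost port.

```python
import xml.etree.ElementTree as ET

# The solr.xml content from slide 22, embedded so the sketch is self-contained.
SOLR_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="project1-eng-GB" instanceDir="pro1-eng" />
    <core name="project1-ger-DE" instanceDir="pro1-ger" />
    <core name="develop" instanceDir="inventory" />
  </cores>
</solr>
"""


def list_cores(xml_text):
    """Return (core name, instance directory) pairs declared in a solr.xml string."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [(core.get("name"), core.get("instanceDir"))
            for core in root.findall("./cores/core")]


def admin_url(core_name, base="http://localhost:8983/solr"):
    """Build the per-core admin URL (hypothetical helper, pattern from slide 11)."""
    return f"{base}/{core_name}/admin"


if __name__ == "__main__":
    for name, instance_dir in list_cores(SOLR_XML):
        print(f"{name:20s} {instance_dir:12s} {admin_url(name)}")
```

Each core name maps to its own instance directory, which is where the dedicated conf and data directories from slide 21 live.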
  23. Performance configuration options
      • Enable delayed indexing of objects (site.ini)
      • Editors will be happier ("faster publishing")
      • Can be done globally or per class (recommended for binary file indexing)
      • Downside: objects only appear in search results after the configured cronjob has run
  24. Performance configuration options (...)
      • Disable optimize on commit
        - Configure a cronjob to do it once per day/week
        - Makes files compact
        - If many delete operations happen, optimize accordingly
  25. Performance configuration options (...)
      • Enable commitWithin (ezfind.ini)
        - Use case: large sites, where commits can also take some time
        - Specified in milliseconds
        - No cronjobs needed
      • Only in special cases: disable direct commits
        - Indexing
        - Delete operations
  26. Search handler configuration
      • Defaults to "ezpublish" now (Apache Solr 3.6.1), based on "eDismax"
      • Supports Lucene syntax (wildcards)
      • Does partial language analysis in the presence of wildcards
      • If upgrading from older versions: check the value in ezfind.ini
        [SearchHandler]
        DefaultSearchHandler=ezpublish
  27. Devops side-dish for large scale installations (Linux)
      • Goal: avoid crashes, slowness
      • Environment
      • Many Solr index cores
      • Many facet queries and filters used
      • Heavy traffic
      • Linux process limits (Solr startup)
      • Memory limit setting: ulimit -v unlimited
      • File descriptors (open files): ulimit -n 30000
  28. Basic indexing and re-indexing
      • Initial indexing: use the dedicated script provided by eZ Find
        - php extension/ezfind/bin/php/updatesearchindexsolr.php -s <admin siteaccess> --php-exec=php --conc=2
        - Typical speed: 5-25 objects/sec
      • Further indexing: automatic
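
The "5-25 objects/sec" figure on slide 28 translates directly into a planning estimate for a full (re-)index. The sketch below only does that arithmetic; the function name and the example object count are assumptions for illustration, and real throughput depends on content, hardware and the concurrency setting.

```python
def reindex_time_range(n_objects, slow_rate=5.0, fast_rate=25.0):
    """Return (best case, worst case) indexing time in seconds.

    slow_rate and fast_rate are the objects/sec bounds quoted on the slide.
    """
    return (n_objects / fast_rate, n_objects / slow_rate)


if __name__ == "__main__":
    # Hypothetical site with 100,000 content objects.
    best, worst = reindex_time_range(100_000)
    print(f"best case:  {best / 3600:.1f} h")
    print(f"worst case: {worst / 3600:.1f} h")
```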
  29. Basic indexing and re-indexing (…)
      • Full re-indexing is needed after important changes
        - Schema changes in the Solr backend
        - ezfind.ini changes related to field mapping
        - Switching from single to multi-core setups
        - Upgrades of eZ Find and/or Solr
  30. eZ Tika: indexing binary files
      • http://projects.ez.no/eztika
      • Based on Apache Tika
        - Text and meta-data extraction for a large variety of file types
      • Extension provides
        - Standalone binary (yet another Java .jar)
        - Configuration settings
        - A stub binary file handler
        - A wrapper shell script
  31. Basic searching, filtering and facets recipes
  32. Terminology 101
      • Searching
      • What you expect :-)
      • Includes relevancy calculations
      • Filtering
      • Narrows down the set of documents to search in
      • Does NOT influence relevancy calculations
      • Full search syntax and more for you to use
      (Diagram labels: Index, Filter, Search result)
  33. Terminology 101
      • Facets
      • Provide counts on potential filters to use
      • Tool to create navigation interfaces
  34. Solr/Lucene search syntax 101
      • Query using the "eZ Publish/eDismax" handler
      • One or more keywords
      • + or - prefix to denote required or excluded terms, example: +cocktail -workshop
      • Multiple terms: "minimum must match" rules (default):
        - 1 or 2 keywords: at least one must match
        - 3-5 keywords: at least 2-4 must match
        - 6-7 keywords: at least 4-5 must match
        - Above 7 keywords: 60% of them must match
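
The "minimum must match" table on slide 34 can be written down as a small function. This is a sketch of the rules exactly as the slide states them, not a reimplementation of Solr's own mm parser; rounding the 60% rule down is an assumption, and `required_matches` is a made-up name.

```python
def required_matches(n_keywords):
    """Number of keywords that must match, following the table on slide 34."""
    if n_keywords <= 0:
        return 0
    if n_keywords <= 2:
        return 1                      # 1 or 2 keywords: at least one
    if n_keywords <= 5:
        return n_keywords - 1         # 3-5 keywords: all but one (2-4)
    if n_keywords <= 7:
        return n_keywords - 2         # 6-7 keywords: all but two (4-5)
    return (n_keywords * 60) // 100   # above 7: 60%, rounded down (assumption)


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, "keywords ->", required_matches(n), "must match")
```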
  35. Solr/Lucene search syntax 101
      • Terms and phrases
      • Term: cocktail
      • Phrase: "Elaphusa hotel"
      • Wildcards
      • Using '*': pro*
      • Using '?': ma?ch
      • Allowing a certain "edit distance": fuzzy searches
      • march~0.7
      • Proximity
      • "john doe"~10
  36. Solr/Lucene search syntax 101 (..)
      • Ranges
        - Inclusive/exclusive
        - One part may be open ended using '*'
      • Inclusive
        - [1 TO 5]
      • Exclusive
        - {0 TO 6}
      • Open ended
        - [NOW/DAY-1YEARS TO *]
  37. Date handling
      • No real limits like unix timestamps
      • Date values in ISO 8601 format: yyyy-mm-ddThh:mm:ssZ (in UTC)
      • Macro-like syntax
        - "NOW"
        - "NOW/DAY-1YEAR"
        - "NOW+3DAYS"
      • Templates: format datetime with the 'solr' operator
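
Outside of templates, the date format from slide 37 is easy to produce by hand. The following is a minimal sketch of the conversion to `yyyy-mm-ddThh:mm:ssZ` in UTC; `to_solr_date` is a hypothetical helper name and is not the eZ Find 'solr' template operator.

```python
from datetime import datetime, timedelta, timezone


def to_solr_date(dt):
    """Format a timezone-aware datetime as yyyy-mm-ddThh:mm:ssZ in UTC."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # 14:30 local time in a UTC+2 zone is 12:30 UTC.
    local = datetime(2013, 9, 4, 14, 30, 0, tzinfo=timezone(timedelta(hours=2)))
    print(to_solr_date(local))
```

Requiring an aware datetime avoids silently treating local time as UTC, which would shift range filters by the UTC offset.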
  38. Searching in templates
      • You can use the standard content/search templates and parameters
      • But much better: dedicated eZ Find fetch functions
        - fetch( ezfind, search, hash( query, 'eZ Systems' ) )
        - fetch( ezfind, moreLikeThis, … )
        - fetch( ezfind, rawSolrRequest, … )
  39. Dedicated search fetch parameters
      • Basic query parameters
      • query: query string
      • offset: result offset
      • limit: max number of results
      • class_id: class ids/identifiers (string or array)
      • section_id: section identifier
      • query_handler: string (default "ezpublish")
      See doc.ez.no for the full list of parameters
  40. Dedicated search fetch parameters
      • Advanced query parameters
      • spellcheck: array(true/false, 'default')
      • filter: mixed filter expression
      • facet: mixed facet expression
      • sort_by: array of hashes
      • criterium: score (default), name, class_name, published, modified, …
      • order: "asc" or "desc"
      See doc.ez.no for the full list of parameters
  41. Filtering
      • AND logic connects array elements, using standard Lucene syntax
      • Within an element, 'OR' logic can be applied
      • Attribute identifiers are mapped to Solr fields
      • Example:
        fetch( ezfind, search, hash( query, 'cocktails', filter, array( 'article/tags:Bol' ) ) )
  42. eZ Find and field names
      • The normal case for filtering: 3 ways
        - array('article/title:a*') //will generate 2 filters
        - array('title:a*') //cross class attribute filtering
        - array('attr_title_s:a*') //using raw field names
  43. eZ Find raw field names
      • Main principle: <source>_<identifier>_<type>
        - <source>: meta, attr, as
        - <identifier>: eZ Publish native identifier
        - <type>: Solr field type mapping (schema.xml)
      • Extra
        - timestamp: time when the object was indexed
        - ezf_df_text: aggregator for all text
        - ezf_sp_words: spellcheck source
      • Subattributes: another separator with 3x '_'
        <source>_<identifier>___<sub_attr_id>_<type>
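
The naming principle on slide 43 is mechanical enough to express as a helper. This is an illustrative sketch only: `raw_field` is not an eZ Find function, and the `author`/`name` subattribute in the usage lines is a made-up example. The type suffixes (`s`, `dt`) are the ones that appear in the slides.

```python
VALID_SOURCES = ("meta", "attr", "as")


def raw_field(source, identifier, type_suffix, sub_attribute=None):
    """Build a raw Solr field name as <source>_<identifier>_<type>.

    With a subattribute the pattern is
    <source>_<identifier>___<sub_attr_id>_<type> (three underscores).
    """
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if sub_attribute is None:
        return f"{source}_{identifier}_{type_suffix}"
    return f"{source}_{identifier}___{sub_attribute}_{type_suffix}"


if __name__ == "__main__":
    print(raw_field("attr", "title", "s"))
    print(raw_field("meta", "published", "dt"))
    # Hypothetical subattribute, for illustration only.
    print(raw_field("attr", "author", "s", sub_attribute="name"))
```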
  44. Filter recipes
      • A specific age: last 2 weeks
        .. filter, array( 'meta_published_dt:[NOW/DAY-14DAYS TO NOW/DAY+1DAYS]' ) ..
      • Solr filter query cache friendly:
      • Lower bound: rounds down to the start of the day and subtracts 14 days (Solr date math has no week unit)
      • Upper bound: rounds down to the start of the current day and adds 1 day, so that items published after 00:00 'today' are also included
  45. Filter recipes (…)
      • 'or' conditions within fields
        .. filter, array( 'attr_tags_lk:(ezfind ezsummercamp netgen)' ) ..
      • 'or' conditions across fields
        .. filter, array( '(attr_length_si:[2 TO 5]) (attr_color_s:red)' ) ..
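
When filter strings like the ones on slide 45 are generated from user input or configuration, a couple of tiny helpers keep the parentheses straight. This is a sketch in plain string handling; the helper names are invented, no escaping of special characters is attempted, and the space-separated clauses behave as 'or' only as long as the default operator is OR, as the slide implies.

```python
def or_within_field(field, values):
    """One field, several alternative values: field:(v1 v2 v3)."""
    return f"{field}:({' '.join(values)})"


def or_across_fields(clauses):
    """Several field clauses, each wrapped in parentheses and space-separated."""
    return " ".join(f"({clause})" for clause in clauses)


if __name__ == "__main__":
    print(or_within_field("attr_tags_lk", ["ezfind", "ezsummercamp", "netgen"]))
    print(or_across_fields(["attr_length_si:[2 TO 5]", "attr_color_s:red"]))
```

Each returned string is one element of the `filter` array; separate array elements are still combined with AND, as described on slide 41.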
  46. Facets
      • Facet types
      • field: enumeration
      • function: Solr functions
      • prefix: prefix/wildcard
      • range, date
      • Main facet parameters:
      • sort: count or alphanumerical
      • limit, offset
      • mincount
      • missing
  47. Basic facet types
      • Field facets
      • Enumerate over contents
      • Can give large results, use wisely
      • Typical: keywords, object metadata
      • Functions
      • The sky is the limit
      • Gives back 1 count result
      • Prefix
      • Shortcut for a simple function facet
  48. Range facets
      • For numerical and date ranges
      • Emits multiple counts, depending on the parameters provided
      • Example:
        fetch( ezfind, search, hash( 'query', $queryString, 'facet', array( hash( 'range', hash( 'field', 'published', 'start', 'NOW/YEAR-3YEARS', 'end', 'NOW/YEAR+1YEAR', 'gap', '+1YEAR' ) ) ) ) )
  49. Range facets: parameters
      • Mandatory
        - 'field' (can also be custom Solr fields)
        - 'start' (numeric/date)
        - 'end' (numeric/date)
      • Optional
        - 'hardend'
        - 'include'
        - 'other'
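
How 'start', 'end', 'gap' and 'hardend' interact is easiest to see with numbers. The sketch below mimics the bucket boundaries of a range facet for plain numeric values; it is an illustration of the Solr semantics as I understand them (without 'hardend' the last bucket may run past 'end'), not code from eZ Find or Solr, and `range_buckets` is an invented name.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return the (lower, upper) bounds of each range facet bucket."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    buckets = []
    lower = start
    while lower < end:
        upper = lower + gap
        if hardend and upper > end:
            upper = end          # hardend: clip the last bucket at 'end'
        buckets.append((lower, upper))
        lower = upper
    return buckets


if __name__ == "__main__":
    # Yearly buckets, analogous to NOW/YEAR-3YEARS .. NOW/YEAR+1YEAR with gap +1YEAR.
    print(range_buckets(2010, 2014, 1))
    # A gap that does not divide the range evenly.
    print(range_buckets(0, 10, 4))
    print(range_buckets(0, 10, 4, hardend=True))
```

The yearly call yields four buckets, which is the number of counts the example on slide 48 would return.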
  50. Recipes with facets and filters
      • Analytics on publishing activities in the previous month
        fetch( ezfind, search, hash( query, '', filter, array( 'meta_published_dt:[NOW/MONTH-1MONTHS TO NOW/MONTH]' ), facet, array( hash( 'field', 'meta_contentclass_id_si' ), hash( 'field', 'meta_owner_id_si' ) ) ) )
      • Results in counts on content types and authors
  51. Recipes with facets and filters (..)
      • Analytics on publishing activities in the previous months for a certain content type, using range facets
        fetch( ezfind, search, hash( query, '', filter, array( 'meta_class_identifier_ms:article' ), facet, array( hash( 'range', hash( 'field', 'published', 'start', 'NOW/MONTH-12MONTHS', 'end', 'NOW/MONTH', 'gap', '+1MONTHS' ) ) ) ) )
  52. Part 2: Advanced recipes & insights
      • Tuning search result relevancy
      • Create your own data-type plugin
      • eZ Find / Solr lower-level API
      • General index time plugins
      Appendix
      • Devops: replication and load balancing/failover
      • A deeper dive into Solr analysis
  53. Tuning search result relevancy
      • Index time boosting
        - "Permanent boosting"
        - Best used after some real-life measurements (logs, user feedback, dedicated tests)
        - ezfind.ini
      • Query time boosting
        - For ezpublish/eDismax request handlers
        - Fields (also meta-data)
        - Function queries
        - Multiplicative and additive boosting
  54. Index time boosting
      • Available for:
        - Classes
        - Attributes
        - Datatypes
      • Boost factor ranges
        - [0 … 1] suppression
        - [1 … ] boosting
      • ezfind.ini
  55. Index time boosting: ezfind.ini example
      [IndexBoost]
      #ClassBoost: set boost factors on document (object) level
      #format Class[<class identifier>]=<boost factor as int or float>
      Class[]
      Class[article]=4
      Class[folder]=0.1
      #AttributeBoost: set boost factors on attributes at field level
      #you can specify the class identifier as optional (!) element for greatest flexibility
      #If more than one attribute identifier is used, the last one has precedence
      Attribute[]
      Attribute[product/name]=8.0
      Attribute[bio]=1.5
      #DatatypeBoost: set boost factors on attributes at field level based on their datatype
      Datatype[]
      Datatype[ezkeyword]=3.0
      #ReverseRelatedScale: scale factor to use in $boost = $boost + <scalefactor> * <number of reverse relations>
      ReverseRelatedScale=0
      ReverseRelatedScale=0.8
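
The ReverseRelatedScale comment in the ini example gives a formula that is worth trying with numbers. The sketch below applies only that formula; treating the class boost as the starting `$boost` is an assumption made for the example, and how eZ Find combines class, attribute and datatype boosts internally is not shown here.

```python
def boost_with_reverse_relations(base_boost, scale_factor, reverse_relation_count):
    """$boost = $boost + <scalefactor> * <number of reverse relations>."""
    return base_boost + scale_factor * reverse_relation_count


if __name__ == "__main__":
    # Assumption: an article (Class[article]=4) with ReverseRelatedScale=0.8.
    for count in (0, 1, 5, 20):
        print(count, "reverse relations ->",
              boost_with_reverse_relations(4.0, 0.8, count))
```

With a scale of 0 the number of reverse relations has no effect, so the setting can be switched off without touching the class boosts.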
  56. Query time boosting
      • Boosting types and corresponding sub-parameters
        - 'fields'
        - 'mfunctions'
        - 'queries'
        - 'functions'
      • Properly supported only since eZ Publish 5, eZ Find master
  57. Query time boosting: 'fields'
      • Example
        .. 'boost_functions', hash( 'fields', array( 'article/tags:3' ) ) ..
        or with a raw Solr field identifier
        .. 'boost_functions', hash( 'fields', array( 'attr_tags_lk:3' ) ) ..
  58. Query time boosting: 'mfunctions'
      • Multiplicative
      • No need to know raw relevancy numbers
      • Multiplies the individual score with the specified function(s)
      • Preferred over other query boost functions in most cases!
  59. Recipe: promote more recent content
      • Parameter snippet
        ... 'boost_functions', hash( 'mfunctions', array( 'recip(ms(NOW/DAY,meta_published_dt),1.58e-11,2.0,0.5)' ) ) …
      • Scaling parameters for the reciprocal function
      • recip(x,m,a,b) = a / (m*x + b)
      • x = age in milliseconds
      • m = 1.58e-11 = 1/(reference age in milliseconds); this value corresponds to about 2 years, whereas a 6-month reference age would need m ≈ 6.3e-11
      • a, b: scaling factors (a "amplitude", b "speed of age decline")
  60. Recipe: promote more recent content (…)
      Implementing 1 + a/(m*x + b) with
      a = 2
      b = 0.5
      m = 1.58e-11
      x = age in milliseconds
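
To get a feeling for the recency boost, the reciprocal function can be evaluated for a few ages. This sketch reimplements recip(x,m,a,b) = a/(m*x+b) in plain Python with the parameter values from the recipe; the year length of 365.25 days and the sample ages are assumptions for illustration.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # milliseconds in an average year


def recip(x, m, a, b):
    """Reciprocal function as used in the recipe: a / (m*x + b)."""
    return a / (m * x + b)


if __name__ == "__main__":
    m, a, b = 1.58e-11, 2.0, 0.5
    # The reference age 1/m, expressed in years (about two years).
    print(f"1/m = {1 / m / MS_PER_YEAR:.2f} years")
    for years in (0, 0.5, 1, 2, 5, 10):
        boost = recip(years * MS_PER_YEAR, m, a, b)
        print(f"age {years:>4} y -> multiplier {boost:.2f}")
```

Brand-new content gets a multiplier of a/b = 4, and the multiplier falls off smoothly with age instead of cutting old content out of the results.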
  61. 61. © 2013 Paul Borgermans, K-Minds Comm.V. Query time boosting: 'queries' l  These are added to the main query and need to follow the Solr/Lucene query format ans specify the boost factor explicitely for it l  Example ..'boost_functions', hash('queries', array( 'meta_class_identifier_ms:article^10')).. l  Also available in ini settings (applies always) [QueryBoost] #RawBoostQueries[] RawBoostQueries[]=meta_class_identifier_ms:summary^4
62. Query time boosting: 'functions'
• These are like 'mfunctions', but add their value to the relevancy score
• Usually 'mfunctions' are the easier choice
• Example
    .. 'boost_functions', hash( 'functions', array( 'sum(product(attr_importance_si,0.1),1)' ) ) ..
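The difference between additive 'functions' and multiplicative 'mfunctions' can be illustrated with a few made-up numbers. The sketch below is a simplification: the base scores are invented, and Solr's actual scoring (query normalization and so on) is ignored. Only the boost expression sum(product(attr_importance_si,0.1),1) is taken from the slide.

```python
def importance_boost(importance):
    """Mirrors sum(product(attr_importance_si, 0.1), 1)."""
    return importance * 0.1 + 1


def additive(score, boost):
    """'functions' style: the boost value is added to the score."""
    return score + boost


def multiplicative(score, boost):
    """'mfunctions' style: the score is multiplied by the boost value."""
    return score * boost


if __name__ == "__main__":
    # (label, invented text relevancy score, importance attribute)
    docs = [("strong text match", 0.8, 0), ("weak text match", 0.2, 10)]
    for label, score, importance in docs:
        boost = importance_boost(importance)
        print(label,
              "additive:", round(additive(score, boost), 2),
              "multiplicative:", round(multiplicative(score, boost), 2))
```

With scores this small, the additive boost dominates and the weak match overtakes the strong one, while the multiplicative boost keeps the text relevancy in play. This is one way to read the "no need to know raw relevancy numbers" remark on the 'mfunctions' slide.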
63. Solr has many functions to use
• Strings
• Numbers and mapping
• Date math
• Geospatial
http://wiki.apache.org/solr/FunctionQuery/
64. Absolute boosting: elevation
• If a query term matches, one or more objects are pushed to the top
• Query term has to be part of the object
• Dedicated admin interface
65. Custom datatype handlers
• Usually for “complex” datatypes
  - Subfields (!)
• Can optionally be context aware
  - Facets/Sort
  - Search
  - Filter
66. Create your own datatype handler
• Derive from a base class:
  - ezfSolrDocumentFieldBase
  - Naming convention
• Provide at least two methods
  - “schema” data: (sub)field names
  - Data to index
• Starting point
  - extension/ezfind/classes: ezfsolrdocumentfielddummyexample.php
• Add in ezfind.ini, [Indexoptions]
67. Overview of eZ Find / Solr lower level API
68. Base classes to know
• extension/ezfind/classes
  - ezsolrbase.php: handles communication with Solr backends
  - ezsolrdoc.php: creates proper XML structures for indexing
  - ezfsolrutils.php: easy to use higher level functions
• Let's have a look ...
69. Index Time Plugin Mechanism
• Write your own functions to:
  - Expand the Solr fields per object
  - Modify existing fields
  - Change per object and per field boosting dynamically
• Use cases
  - Complex custom data, partially external
  - Boost documents based on page views, user score, …
70. Index time plugins (...)
• Implement the following interface
• docList is the array of eZSolrDocs to be sent to Solr, one per language for the given contentObject
    interface ezfIndexPlugin
    {
        /**
         * @var eZContentObject $contentObject
         * @var array $docList
         */
        public function modify(eZContentObject $contentObject, &$docList);
    }
71. Index time plugins (...)
• Activate your plugin in ezfind.ini
  - Global
  - Per content class
    [IndexPlugins]
    # Allow injection of custom fields and manipulation of fields/boost parameters
    # at index time
    # This can be defined at the class level or general
    General[]
    #General[]=ezfIndexParentName
    #Classhooks will only be called for objects of the specified class
    Class[]
    Class[myspecialclass]=ezfIndexParentName
72. Customizing autocomplete
• Tweaking schema.xml
• Goal: decrease "noise"
• Use copyField directives to use only selected input fields and aggregate them into a custom autocomplete source field
• Adapt ezfind.ini settings
73. Customizing autocomplete: schema.xml
    <fields>
      ..
      <field name="my_autocomplete_field" type="textgen" indexed="true" stored="true" multiValued="true"/>
      ..
      <copyField source="*_lk" dest="my_autocomplete_field"/>
      ..
    </fields>
Example source: only lowercased tags
74. Customizing autocomplete: ezfind.ini
    [AutoCompleteSettings]
    AutoComplete=enabled
    # The maximum number of suggestions to return from search engine.
    Limit=10
    # Facet field used by autocomplete.
    FacetField=my_autocomplete_field
75. Suggested exercises
76. Warm up exercise
• Make sure you are on the latest code base
• Play with the Lucene syntax supported by the new ezpublish/eDismax handler:
  - Proximity searches
  - Fuzzy searches
  - Wildcards
  - Ranges
  And see what happens
77. Exercise: boosting
• Use the new 'mfunctions' parameter to boost more recent values
• Tweak your content with ratings and boost higher rated articles
78. Exercise: Facets & attribute filtering
• Adapt the previous examples/recipes
• Try to facet and filter on class names
  - As a field facet (enumerate all classes)
  - As a set of several query facets (enumerate only a selection)
• Range facets
  - Date ranges
79. Exercise: sub-attribute filtering on a related object
• Create an override template for a dummy node
• In the template, add code for fetching with eZ Find: search with an empty query string, but use a filter with a subattribute clause
    {def $searchResults = fetch( 'ezfind', 'search',
                                 hash( 'query', '',
                                       'filter', array( 'article/testrelation/caption:specialvalue1' ) ) )}
80. A last plug: You are invited to our 5th anniversary!
conference.phpbenelux.eu/2014/
81. Appendix A: Replication and load balancing
82. Replication / Distribution
• Solr 3.x (current stable eZ Find)
  - Master/slave model (pull)
  - Easy to set up
• Solr 4.x (future eZ Find?)
  - “SolrCloud”, distributed capabilities (push)
  - Apache Zookeeper based
  - A bit more complicated setup
  - Automatic failover, monitoring
83. Master/Slave replication
• solrconfig.xml
  - Activate handlers
  - Allow parameters (slave must know master)
  - Define replication trigger points (commit/optimize/manual)
  - Define config files to replicate if needed
• HTTP REST API
• Status monitoring in admin interface
84. Replication: example config
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="enable">${enable.master:false}</str>
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">optimize</str>
        <str name="confFiles">elevate.xml</str>
      </lst>
      <lst name="slave">
        <str name="enable">${enable.slave:false}</str>
        <str name="masterUrl">http://${master.core.url:localhost:8983}/${solr.core.name}/replication</str>
        <str name="pollInterval">${poll.time:'00:00:10'}</str>
      </lst>
    </requestHandler>
Startup parameters from command line or system
85. Replication: starting master and slave
Slave
    java -Denable.slave=true -Dmaster.core.url=master:8983/solr -Dsolr.solr.home=/var/solr -jar start.jar
Master
    java -Denable.master=true -Dsolr.solr.home=/var/solr -jar start.jar &
86. Replication and load balancing
• Reverse proxy and rewrite rules
• Point eZ Find Solr URIs to the load balancer URI
• Direct reads to slaves
• Direct everything else to master
87. Replication and load balancing (…)
    Listen 8988
    <VirtualHost *:8988>
      # Need: mod_proxy mod_proxy_http mod_proxy_balancer active

      <Proxy balancer://solrread>
        # just two, localhost and the Solr master server as a hot stand-by: may also add the second webserver
        BalancerMember http://localhost:8983
        BalancerMember http://master-solr:8983 status=+H
      </Proxy>

      <Proxy balancer://solrwrite>
        # just the Solr master server
        BalancerMember http://master-solr:8983
      </Proxy>

      RewriteEngine On

      # Send select to the solrread balancer
      RewriteCond %{REQUEST_URI} ^/(.*)select/$
      RewriteRule ^/(.*)$ balancer://solrread/$1 [P]

      # Send all others to the write balancer
      RewriteRule ^/(.*)$ balancer://solrwrite/$1 [P]

      ProxyPassReverse / balancer://solrwrite
      ProxyPassReverse / balancer://solrread
    </VirtualHost>
Apache mod_proxy example
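The routing decision made by the rewrite rules can be tried out with a small simulation. The sketch below is not Apache: it only applies the same regular expression from the RewriteCond to a request path (without query string, which is what %{REQUEST_URI} holds) and reports which balancer would be chosen. The function name and the sample paths are made up.

```python
import re

# Same pattern as in: RewriteCond %{REQUEST_URI} ^/(.*)select/$
SELECT_PATTERN = re.compile(r"^/(.*)select/$")


def choose_balancer(request_path):
    """Return the balancer name the rewrite rules above would pick."""
    if SELECT_PATTERN.match(request_path):
        return "solrread"
    return "solrwrite"


if __name__ == "__main__":
    for path in ("/solr/select/", "/solr/select", "/solr/update", "/solr/admin/ping"):
        print(f"{path:20s} -> {choose_balancer(path)}")
```

Note that with this exact pattern a path ending in select without a trailing slash is sent to the write balancer, so it is worth checking which form of the URI your setup actually requests.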
88. Appendix B: Inside Solr analysis
89. A deeper dive into Apache Solr
• From index → document → field
• schema.xml
• What happens under the hood
90. The Solr/Lucene index
• Inverted index
• Holds a collection of “documents” (hello NoSQL)
• Document
  - Collection of fields
  - Flexible schema!
  - Unique ID (user defined)
• Solr uses an XML-based config file: schema.xml
91. Field types and fields
• Various field types, derived from base classes
• Indexed (optional)
  - usually analyzed & tokenized
  - makes it searchable and sortable
• Stored (optional)
  - also contains the original submitted content
  - content can be part of the request response
• Can be multi-valued!
  - opens possibilities beyond full text search
92. Field definitions: schema.xml
• Field types
  - text
  - numerical
  - dates
  - location
  - … (about 30 in total)
• Actual fields (name, definition, properties)
• Dynamic fields
• Copy fields (as aggregators)
93. schema.xml: simple field type examples
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>

    <!-- A Trie based date field for faster date range queries and date faceting. -->
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>

    <!-- A text field that only splits on whitespace for exact matching of words -->
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
94. schema.xml: more complex field type
    <!-- A general unstemmed text field - good if one does not know the language of the field -->
    <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
95. Analysis
• Solr does not really search your text, but rather the terms that result from the analysis of text
• Typically a chain of
  - Character filter(s)
  - Tokenisation
  - Filter A
  - Filter B
  - …
96. Solr comes with many tokenizers and filters
• Some are language specific
• Others are very specialised
• It is very important to get this right; otherwise you may not get what you expect!
97. Text analysis examples
Input phrase: Ivo Lukač presents a geek-interview on the eZSummerCamp.
98. Character filters
• Used to clean up text before tokenizing
  - HTMLStripCharFilter (strips html, xml, js, css)
  - MappingCharFilter (normalisation of characters, removing accents)
  - Regular expression filter
99. Tokenizers
• Convert text to tokens (terms)
• You can define only one per field/analyzer
• Examples
  - WhitespaceTokenizer (splits on white space)
  - StandardTokenizer
  - CJK variants
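The idea of an analysis chain can be mimicked in a few lines of Python, applied to the input phrase from the text analysis slide. This is only a conceptual sketch, not how Solr implements its analyzers: the three functions are simplified stand-ins for a mapping character filter, a whitespace tokenizer and a lowercase filter, and their names are invented.

```python
import unicodedata


def mapping_char_filter(text):
    """Character filter stand-in: strip accents by dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def whitespace_tokenizer(text):
    """Tokenizer stand-in: split on white space only."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter stand-in: lowercase every token."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Chain: character filter -> tokenizer -> token filter."""
    return lowercase_filter(whitespace_tokenizer(mapping_char_filter(text)))


if __name__ == "__main__":
    print(analyze("Ivo Lukač presents a geek-interview on the eZSummerCamp."))
```

The trailing period stays attached to the last token and "geek-interview" remains one term, which is the kind of detail that further filters or a different tokenizer would need to handle.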
100. Additional filters
• Many possible per field/analyzer
• Many delivered with Solr out of the box
• If not enough, write a tiny bit of Java or look for contributions
• Examples ...
101. Phonetic filters
• PhoneticFilterFactory
• “sounds like” transformations and matching
• Algorithms:
  - Metaphone
  - Double Metaphone
  - Soundex
  - Refined Soundex
102. Reversing Filter
• Reverses the order of characters
• Use: allow “leading wildcards”
• *thing => gniht*
• A lot faster (prefixes)
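Why reversing helps can be shown with a toy index. In the sketch below (an illustration only, with invented function names and sample terms, not Solr's implementation), terms are stored reversed in a sorted list, so a leading-wildcard pattern such as *thing becomes the prefix gniht and can be answered with a binary search plus a short scan.

```python
from bisect import bisect_left


def reverse_term(term):
    return term[::-1]


def build_reversed_index(terms):
    """Store every term reversed, in sorted order."""
    return sorted(reverse_term(term) for term in terms)


def leading_wildcard_search(reversed_index, pattern):
    """Answer a pattern like '*thing' as a prefix lookup on reversed terms."""
    if not pattern.startswith("*") or "*" in pattern[1:]:
        raise ValueError("only a single leading wildcard is supported here")
    prefix = reverse_term(pattern[1:])  # '*thing' -> 'gniht'
    matches = []
    # Jump to the first candidate, then scan while the prefix still matches.
    for reversed_term in reversed_index[bisect_left(reversed_index, prefix):]:
        if not reversed_term.startswith(prefix):
            break
        matches.append(reverse_term(reversed_term))
    return matches


if __name__ == "__main__":
    index = build_reversed_index(["something", "nothing", "thinking", "anything"])
    print(leading_wildcard_search(index, "*thing"))
```

Without the reversed terms, a leading wildcard would require checking every term in the index, which is what makes the prefix form a lot faster.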
103. Synonyms
• Inject synonyms for certain terms
• Language specific
• Best used for query time analysis
  - may inflate the search index too much
  - decreases relevancy
104. Stemming
• Reduce terms to their root form
  - Plural forms
  - Conjugations
• Language specific (or not relevant, CJK)
• Many specialised stemmers available
  - Most European languages
  - Some exotic ones through contributions outside ASF
105. Copy fields
• Analysis is done differently for
  - searching/filtering
  - faceting/sorting
• Stemming and not stemming in different fields can increase relevance of results
• Use copy fields in schema.xml or do it client side
106. Geospatial fields
• Solr dedicated fields
  - Latitude Longitude type (trunk)
• Special geospatial functions in filtering & boosting
  - Haversine distance (geosphere)
  - Simple ranges (squares in 2-D)
  - Special query constructs (upcoming)
107. Dedicated fields for every context in eZ Find if configured
• Context
  - Search
  - Facets
  - Filtering (usually the same as search)
  - Sorting
• ezfind.ini
• Also for custom handlers if needed (see part 6)