SlideShare a Scribd company logo
1 of 37
Download to read offline
Sphinx: Leveraging Scalable Search in Drupal



    Mike Cantelon
    http://straight.com
    http://mikecantelon.com




                                    Drupal Camp
                                    Vancouver,
                                    May 16 ­ 17, 2008
This Talk


 Aims to orient you to the Sphinx realm
 Code provided on mikecantelon.com to give
  you basic Drupal Sphinx-driven search
What Is Sphinx?


 Open source search engine
 Developed by Andrew Aksyonoff
 No Java required
    (written in C++)
   Mostly cross-platform
    (Windows version not yet production ready)
 Fast
 Scalable
 Flexible
 Straightforward
Who Uses It?


   NowPublic
   MySQL
   The Pirate Bay
   Joomla
   MiniNova
   Etc...
    http://www.sphinxsearch.com/powered.html
How Fast is It?


              http://peter-zaitsev.livejournal.com/12809.html
450


400


350


300                                                                                    4
                                                                                      3.5
250                                                                                    3
                                                                                      2.5
                                                                            Seconds
200                                                                         Cached     2                         Se c o n d s
                                                                                                                 Ca c h e d




                                                                                      1.5
150                                                                                    1
                                                                                      0.5
100
                                                                                       0
                                                                                            Mnogosearch Sphinx
 50


  0
                  MySQL Boolean Fulltext             Mnogosearch
      MySQL Fulltext                    MySQL LIKE                 Sphinx
How Does Sphinx Scale?


 Distributed, parallel searching/indexing across
  multiple agents
 Agents can be machines (“horizontal scaling”)
  or processors/cores (“vertical”)
 Searches get first sent to remote agents
 Distribution is transparent to the application:
  any duplicate results are removed
One of Many Potential Sphinx Architectures



                              Combined
                              Index
Sphinx
Layer




                   Index of              Index of
                   1&2                   3&4
MySQL
Layer




         MySQL     MySQL                 MySQL      MySQL
         Shard 1   Shard 2               Shard 3    Shard 4
Sphinx Components and Installation


   Installation is (usually) straightforward
    (requires make and C++ compiler)
    • ./configure
    • make
    • make install
   Sphinx consists of three components:
    • Search daemon (searchd command)
    • Indexer (indexer command)
    • Search client (search command)
Sphinx Index Fundamentals


   Can have one or more data source
   Sources include MySQL, MySQL Sphinx
    storage engine, Postgres, xmlpipe2
   Each source must share schema
   Each document must have global ID
   Indexes can be rotated seamlessly
   Indexes end up taking much more disk space
    than source text: plan accordingly!
Sphinx Index Options


   UTF-8 encoding
   Prefix and infix indexing for wildcard searches
   HTML stripping
   N-gram support for Chinese, Japanese, etc.
   Multiple stemming schemes
   Stop words
   Word forms
Sphinx Stemming


   Good stemming support
   “Stemming” is the breaking down of words
    into their base form
   In addition to stemming, Sphinx supports
    indexing via Soundex and Metatone
   Multiple stemmers can be applied
   Optionally support for Snowball project for
    international stemming algorithms
    (http://snowball.tartarus.org)
Configuration
Sphinx Configuration Overview


 /usr/local/etc/sphinx.conf is where Sphinx
  configuration normally lives
 Defines sources, indexes, and agents
 Configuration file is in custom format
Sphinx Source Configuration


   Source specifies:
    • Type
    • Connection info
    • Queries
    • Attributes
   Contains three kinds of queries:
    • Main query (what you want to index; first column
      must be document global ID)
    • Detail query (to fetch document details when
      using search command)
    • Optional “range” query
Source Range Query


 The range query is for pulling batches of rows,
  instead off all rows at once
 MyISAM has table-wide locking, so big queries
  can bog things down
 An example range query:
    sql_query_range = SELECT MIN(nid),MAX(nid) FROM node
    sql_range_step = 512
   sql_range_step defines how many rows to
    fetch on each pass
Sphinx Index And searchd Configuration


 The index will specify a source, path, and
  stemming/indexing options
 Make sure the path you specify for your index
  exists (for example: /var/opt/local/sphinx)
 The searchd section will require a directory to
  exist for logs, process ID file, etc.
 This directory will usually be /var/log/searchd
Starting The Sphinx Search Daemon


 searchd can be started by simply typing “sudo
  searchd” into the command line
 Once searchd is started, index a source by
  entering “sudo indexer <name of source>”
 Once your index is created, enter “search -p
  <some search term>” to test
 “Seamless” indexing can be done by adding
  the “--rotate” option when using indexer
Implementation
Sphinx: Leveraging Scalable Search in Drupal
Example of a Sphinx Implementation


   The “my_search” module, which will be
    available for download on mikecantelon.com,
    shows a simple use of Sphinx in Drupal
Sphinx Implementation Overview


 Sphinx data is accessible through multiple
  APIs: PHP, Python, Perl, Ruby, and Java
 APIs provides optional search snippet
  extraction, keyword highlighting, and result
  paging
 Results return only document IDs
 Hence, after searching returns results, you'll
  still likely need to fetch details of documents
  in result page
Sphinx Search Modes


   SPH_MATCH_ALL: match all keywords
   SPH_MATCH_ANY: match any keywords
   SPH_MATCH_BOOLEAN: no relevance, implicit
    boolean AND between keywords if not
    otherwise specified
   SPH_MATCH_PHRASE: treats query as a
    phrase and requires a perfect match
   SPH_MATCH_EXTENDED: allows for
    specification of phrases and/or keywords and
    allows boolean operators
Example of PHP Client Connect


   This example uses SPH_MATCH_EXTENDED:

 // Note: searchd needs to be running
 $cl = new SphinxClient();
 $cl­>SetServer('localhost', 3312 );
 $cl­>SetLimits($start_item, $items_per_page);
 $cl­>SetMatchMode(SPH_MATCH_EXTENDED);

   Note the use of SetLimits to allow us to
    implement paging/limiting
Sphinx Sort Modes


   Sphinx has a number of limited, specific
    search modes :
    • SPH_SORT_RELEVANCE
    • SPH_SORT_ATTR_DESC
    • SPH_SORT_ATTR_ASC
    • SPH_SORT_TIME_SEGMENTS
 SPH_SORT_EXTENDED provides SQL-like
  sorting, listing “columns” and specifying
  ascending/descending order
 SPH_SORT_EXPR allows sorting using a
  mathematical equation involving “columns”
Example of PHP Client Sort and Query


   This example sorts by relevance (most to
    least), then by creation date (reverse
    chronological order):
 $cl­>SetSortMode (SPH_SORT_EXTENDED,
   "@relevance DESC, created DESC");

 // Do Sphinx query
 $search_result = $cl­>Query(
   $query,
   'stories'
 );
Getting Fancier
Sphinx: Leveraging Scalable Search in Drupal
Using the Sphinx PHP API Highlighter


 Once you've received search results from
  Sphinx, you can use API's highlighter to
  generate excerpts and highlight matches
  within the excerpts
 To do this you must first retrieve the full text
  of each of the documents shown on the
  current results page
 This will need to be done by database queries
  or some kind of cache lookup
PHP API Highlighter Example

   In this example $content is an array
    containing document IDs for keys and
    document text for values:
    $highlighting_options = array(
      'before_match'    =>
      "<span style='background­color: yellow'>",
      'after_match'     => "</span>",
      'chunk_separator' => " ... ",
      'limit'           => 256,
      'around'          => 3,
    );
    $excerpts = $cl­>BuildExcerpts($content, 
    'stories', $query, $highlighting_options);
Humanizing SPH_MATCH_EXTENDED


 SPH_MATCH_EXTENDED can be “humanized”
  by parsing a user's query and breaking it into
  phrases and keywords separated by a chosen
  operator
 For example: given the search query “uwe
  boll” postal and the chosen search type
  “match any”, we'd use boolean operator OR
  between the extracted phrase and keyword
 This would end up, internally, as “uwe boll” |
  postal (note the OR pipe character)
Implementing Humanization in PHP


   One way to humanize via clumsy regular
    expression brutality:
    1)Get keywords by using preg_split to split into
      array using phrases (designated by double-
      quotes) as splitting points
    2)Get phrases by using preg_match_all, matching
      anything in double-quotes
    3)Merge arrays
    4)Implode using search operator with spaces on
      either side (i.e. “ & “ or “ | “)
What About Paging?


 As Sphinx queries don't go through Drupal's
  database abstraction functions, built in paging
  won't work automagically, as when using
  Drupal's pager_query function
 Writing paging logic that looks/acts identical
  to Drupal's is tedious
 You can, however, leverage Drupal's paging
  functionality so theme_pager will do your
  bidding...
How to Leverage Drupal's Paging


   Manually set globals to leverage paging:
$element = 0;
$GLOBALS['pager_page_array'] =
  explode(',', $_GET['page']);
$GLOBALS['pager_total_items'][$element] =
  $found_count;
$GLOBALS['pager_total'][$element] =
  ceil($found_count / $items_per_page);
$GLOBALS['pager_page_array'][$element] =
max(0, min((int)$GLOBALS['pager_page_array']
[$element], ((int)$GLOBALS['pager_total']
[$element]) ­ 1));
Multi-Value Attributes (MVAs)


 Allow to you add Drupal taxonomy data to
  Sphinx index
 Implementation requires two things:
    • Addition of sql_attr_multi specification in the
      source section of your Sphinx configuration file
    • sql_attr_multi tells how to query for taxonomy
    • Addition of SetFilter call in PHP API before issuing
      query
   SetFilter example filtering by two taxonomy
    terms:
    • $cl->SetFilter('tag', array(114288, 4567));
MVA sql_attr_multi Example


sql_attr_multi = uint tag from ranged­query; 
  SELECT n.nid, td.tid FROM term_data td INNER 
JOIN term_node tn 
    USING(tid) INNER JOIN node n ON 
tn.nid=n.nid 
    WHERE td.vid=9 AND n.nid >= $start AND 
n.nid <= $end; 
  SELECT MIN(n.nid), MAX(n.nid) FROM term_data 
td INNER JOIN term_node tn 
    USING(tid) INNER JOIN node n ON 
tn.nid=n.nid 
    WHERE td.vid=9
We're done!
Online Resources


 Official Sphinx site:
  http://www.sphinxsearch.com/
 These slides and example module:
  http://mikecantelon.com/talks/sphinx

More Related Content

What's hot

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Introduction Apache Solr & PHP
Introduction Apache Solr & PHPIntroduction Apache Solr & PHP
Introduction Apache Solr & PHPHiraq Citra M
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solrpittaya
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.ashish0x90
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0Erik Hatcher
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conferenceErik Hatcher
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
От Rails-way к модульной архитектуре
От Rails-way к модульной архитектуреОт Rails-way к модульной архитектуре
От Rails-way к модульной архитектуреIvan Nemytchenko
 
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)Erik Hatcher
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 

What's hot (20)

Solr Presentation
Solr PresentationSolr Presentation
Solr Presentation
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Introduction Apache Solr & PHP
Introduction Apache Solr & PHPIntroduction Apache Solr & PHP
Introduction Apache Solr & PHP
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conference
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
От Rails-way к модульной архитектуре
От Rails-way к модульной архитектуреОт Rails-way к модульной архитектуре
От Rails-way к модульной архитектуре
 
Solr Flair
Solr FlairSolr Flair
Solr Flair
 
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 

Viewers also liked

Not Only Drupal
Not Only DrupalNot Only Drupal
Not Only Drupalmcantelon
 
Basic Crud In Django
Basic Crud In DjangoBasic Crud In Django
Basic Crud In Djangomcantelon
 
CodeFest 2014. Егоров В. — Что за… Dart?
CodeFest 2014. Егоров В. — Что за… Dart?CodeFest 2014. Егоров В. — Что за… Dart?
CodeFest 2014. Егоров В. — Что за… Dart?CodeFest
 
High Performance Web Pages - 20 new best practices
High Performance Web Pages - 20 new best practicesHigh Performance Web Pages - 20 new best practices
High Performance Web Pages - 20 new best practicesStoyan Stefanov
 
MetaZeta Clusters Overview
MetaZeta Clusters OverviewMetaZeta Clusters Overview
MetaZeta Clusters OverviewPaul Baclace
 
Computational genomics approaches to precision medicine
Computational genomics approaches to precision medicineComputational genomics approaches to precision medicine
Computational genomics approaches to precision medicineAltuna Akalin
 
Computational genomics course poster 2015 (BIMSB/MDC-Berlin)
Computational genomics course poster 2015 (BIMSB/MDC-Berlin)Computational genomics course poster 2015 (BIMSB/MDC-Berlin)
Computational genomics course poster 2015 (BIMSB/MDC-Berlin)Altuna Akalin
 
Collaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsCollaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsNavisro Analytics
 
The Physics of Fast Image Compression
The Physics of Fast Image CompressionThe Physics of Fast Image Compression
The Physics of Fast Image CompressionCloudinary
 
Apache Mesos at Twitter (Texas LinuxFest 2014)
Apache Mesos at Twitter (Texas LinuxFest 2014)Apache Mesos at Twitter (Texas LinuxFest 2014)
Apache Mesos at Twitter (Texas LinuxFest 2014)Chris Aniszczyk
 
Keynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! ScaleKeynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! ScaleHBaseCon
 
An Abusive Relationship with AngularJS
An Abusive Relationship with AngularJSAn Abusive Relationship with AngularJS
An Abusive Relationship with AngularJSMario Heiderich
 
Hour of Code: Best Practices for Successful Educators
Hour of Code: Best Practices for Successful EducatorsHour of Code: Best Practices for Successful Educators
Hour of Code: Best Practices for Successful EducatorsCode.org Teacher Community
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicasenissoz
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base InstallCloudera, Inc.
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
Fostering a Startup and Innovation Ecosystem
Fostering a Startup and Innovation EcosystemFostering a Startup and Innovation Ecosystem
Fostering a Startup and Innovation EcosystemTechstars
 

Viewers also liked (18)

Not Only Drupal
Not Only DrupalNot Only Drupal
Not Only Drupal
 
Basic Crud In Django
Basic Crud In DjangoBasic Crud In Django
Basic Crud In Django
 
CodeFest 2014. Егоров В. — Что за… Dart?
CodeFest 2014. Егоров В. — Что за… Dart?CodeFest 2014. Егоров В. — Что за… Dart?
CodeFest 2014. Егоров В. — Что за… Dart?
 
High Performance Web Pages - 20 new best practices
High Performance Web Pages - 20 new best practicesHigh Performance Web Pages - 20 new best practices
High Performance Web Pages - 20 new best practices
 
MetaZeta Clusters Overview
MetaZeta Clusters OverviewMetaZeta Clusters Overview
MetaZeta Clusters Overview
 
Computational genomics approaches to precision medicine
Computational genomics approaches to precision medicineComputational genomics approaches to precision medicine
Computational genomics approaches to precision medicine
 
Computational genomics course poster 2015 (BIMSB/MDC-Berlin)
Computational genomics course poster 2015 (BIMSB/MDC-Berlin)Computational genomics course poster 2015 (BIMSB/MDC-Berlin)
Computational genomics course poster 2015 (BIMSB/MDC-Berlin)
 
Danger Of Free
Danger Of FreeDanger Of Free
Danger Of Free
 
Collaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsCollaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro Analytics
 
The Physics of Fast Image Compression
The Physics of Fast Image CompressionThe Physics of Fast Image Compression
The Physics of Fast Image Compression
 
Apache Mesos at Twitter (Texas LinuxFest 2014)
Apache Mesos at Twitter (Texas LinuxFest 2014)Apache Mesos at Twitter (Texas LinuxFest 2014)
Apache Mesos at Twitter (Texas LinuxFest 2014)
 
Keynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! ScaleKeynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! Scale
 
An Abusive Relationship with AngularJS
An Abusive Relationship with AngularJSAn Abusive Relationship with AngularJS
An Abusive Relationship with AngularJS
 
Hour of Code: Best Practices for Successful Educators
Hour of Code: Best Practices for Successful EducatorsHour of Code: Best Practices for Successful Educators
Hour of Code: Best Practices for Successful Educators
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicas
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base Install
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
Fostering a Startup and Innovation Ecosystem
Fostering a Startup and Innovation EcosystemFostering a Startup and Innovation Ecosystem
Fostering a Startup and Innovation Ecosystem
 

Similar to Sphinx: Leveraging Scalable Search in Drupal

Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialSourcesense
 
Vorontsov, golovko ssrf attacks and sockets. smorgasbord of vulnerabilities
Vorontsov, golovko   ssrf attacks and sockets. smorgasbord of vulnerabilitiesVorontsov, golovko   ssrf attacks and sockets. smorgasbord of vulnerabilities
Vorontsov, golovko ssrf attacks and sockets. smorgasbord of vulnerabilitiesDefconRussia
 
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...J V
 
[Worskhop Summits] CSS3 Workshop
[Worskhop Summits] CSS3 Workshop[Worskhop Summits] CSS3 Workshop
[Worskhop Summits] CSS3 WorkshopChristopher Schmitt
 
Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Websolutions Agency
 
Site Performance - From Pinto to Ferrari
Site Performance - From Pinto to FerrariSite Performance - From Pinto to Ferrari
Site Performance - From Pinto to FerrariJoseph Scott
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Rapid API Development ArangoDB Foxx
Rapid API Development ArangoDB FoxxRapid API Development ArangoDB Foxx
Rapid API Development ArangoDB FoxxMichael Hackstein
 
Sumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced AnalyticsSumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced AnalyticsSumo Logic
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and SparkLucidworks
 
PHP language presentation
PHP language presentationPHP language presentation
PHP language presentationAnnujj Agrawaal
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
DSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/ExportDSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/ExportDuraSpace
 

Similar to Sphinx: Leveraging Scalable Search in Drupal (20)

Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
 
Vorontsov, golovko ssrf attacks and sockets. smorgasbord of vulnerabilities
Vorontsov, golovko   ssrf attacks and sockets. smorgasbord of vulnerabilitiesVorontsov, golovko   ssrf attacks and sockets. smorgasbord of vulnerabilities
Vorontsov, golovko ssrf attacks and sockets. smorgasbord of vulnerabilities
 
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
[Worskhop Summits] CSS3 Workshop
[Worskhop Summits] CSS3 Workshop[Worskhop Summits] CSS3 Workshop
[Worskhop Summits] CSS3 Workshop
 
Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8
 
Codeigniter
CodeigniterCodeigniter
Codeigniter
 
Site Performance - From Pinto to Ferrari
Site Performance - From Pinto to FerrariSite Performance - From Pinto to Ferrari
Site Performance - From Pinto to Ferrari
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Rapid API Development ArangoDB Foxx
Rapid API Development ArangoDB FoxxRapid API Development ArangoDB Foxx
Rapid API Development ArangoDB Foxx
 
Sumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced AnalyticsSumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced Analytics
 
Apache solr
Apache solrApache solr
Apache solr
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
 
SphinxSE with MySQL
SphinxSE with MySQLSphinxSE with MySQL
SphinxSE with MySQL
 
PHP language presentation
PHP language presentationPHP language presentation
PHP language presentation
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
DSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/ExportDSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/Export
 
Symfony Performance
Symfony PerformanceSymfony Performance
Symfony Performance
 

More from elliando dias

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slideselliando dias
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScriptelliando dias
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structureselliando dias
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Librarieselliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makeselliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 

More from elliando dias (20)

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScript
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
Ragel talk
Ragel talkRagel talk
Ragel talk
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 

Recently uploaded

EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarThousandEyes
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch TuesdayIvanti
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingFrancesco Corti
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Libraryshyamraj55
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationKnoldus Inc.
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTopCSSGallery
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfInfopole1
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kitJamie (Taka) Wang
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxSatishbabu Gunukula
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 
The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)codyslingerland1
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIVijayananda Mohire
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2DianaGray10
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 

Recently uploaded (20)

EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? Webinar
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch Tuesday
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
SheDev 2024
SheDev 2024SheDev 2024
SheDev 2024
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is going
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Library
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its application
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development Companies
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdf
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kit
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptx
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAI
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 

Sphinx: Leveraging Scalable Search in Drupal

  • 1. Sphinx: Leveraging Scalable Search in Drupal Mike Cantelon http://straight.com http://mikecantelon.com Drupal Camp Vancouver, May 16 ­ 17, 2008
  • 2. This Talk  Aims to orient you to the Sphinx realm  Code provided on mikecantelon.com to give you basic Drupal Sphinx-driven search
  • 3. What Is Sphinx?  Open source search engine  Developed by Andrew Aksyonoff  No Java required (written in C++)  Mostly cross-platform (Windows version not yet production ready)  Fast  Scalable  Flexible  Straightforward
  • 4. Who Uses It?  NowPublic  MySQL  The Pirate Bay  Joomla  MiniNova  Etc... http://www.sphinxsearch.com/powered.html
  • 5. How Fast is It? http://peter-zaitsev.livejournal.com/12809.html 450 400 350 300 4 3.5 250 3 2.5 Seconds 200 Cached 2 Se c o n d s Ca c h e d 1.5 150 1 0.5 100 0 Mnogosearch Sphinx 50 0 MySQL Boolean Fulltext Mnogosearch MySQL Fulltext MySQL LIKE Sphinx
  • 6. How Does Sphinx Scale?  Distributed, parallel searching/indexing across multiple agents  Agents can be machines (“horizontal scaling”) or processors/cores (“vertical”)  Searches get first sent to remote agents  Distribution is transparent to the application: any duplicate results are removed
  • 7. One of Many Potential Sphinx Architectures Combined Index Sphinx Layer Index of Index of 1&2 3&4 MySQL Layer MySQL MySQL MySQL MySQL Shard 1 Shard 2 Shard 3 Shard 4
  • 8. Sphinx Components and Installation  Installation is (usually) straightforward (requires make and C++ compiler) • ./configure • make • make install  Sphinx consists of three components: • Search daemon (searchd command) • Indexer (indexer command) • Search client (search command)
  • 9. Sphinx Index Fundamentals  Can have one or more data source  Sources include MySQL, MySQL Sphinx storage engine, Postgres, xmlpipe2  Each source must share schema  Each document must have global ID  Indexes can be rotated seamlessly  Indexes end up taking much more disk space than source text: plan accordingly!
  • 10. Sphinx Index Options  UTF-8 encoding  Prefix and infix indexing for wildcard searches  HTML stripping  N-gram support for Chinese, Japanese, etc.  Multiple stemming schemes  Stop words  Word forms
  • 11. Sphinx Stemming  Good stemming support  “Stemming” is the breaking down of words into their base form  In addition to stemming, Sphinx supports indexing via Soundex and Metatone  Multiple stemmers can be applied  Optionally support for Snowball project for international stemming algorithms (http://snowball.tartarus.org)
  • 13. Sphinx Configuration Overview  /usr/local/etc/sphinx.conf is where Sphinx configuration normally lives  Defines sources, indexes, and agents  Configuration file is in custom format
  • 14. Sphinx Source Configuration  Source specifies: • Type • Connection info • Queries • Attributes  Contains three kinds of queries: • Main query (what you want to index; first column must be document global ID) • Detail query (to fetch document details when using search command) • Optional “range” query
  • 15. Source Range Query  The range query is for pulling batches of rows, instead off all rows at once  MyISAM has table-wide locking, so big queries can bog things down  An example range query: sql_query_range = SELECT MIN(nid),MAX(nid) FROM node sql_range_step = 512  sql_range_step defines how many rows to fetch on each pass
  • 16. Sphinx Index And searchd Configuration  The index will specify a source, path, and stemming/indexing options  Make sure the path you specify for your index exists (for example: /var/opt/local/sphinx)  The searchd section will require a directory to exist for logs, process ID file, etc.  This directory will usually be /var/log/searchd
  • 17. Starting The Sphinx Search Daemon  searchd can be started by simply typing “sudo searchd” into the command line  Once searchd is started, index a source by entering “sudo indexer <name of source>”  Once your index is created, enter “search -p <some search term>” to test  “Seamless” indexing can be done by adding the “--rotate” option when using indexer
  • 20. Example of a Sphinx Implementation  The “my_search” module, which will be available for download on mikecantelon.com, shows a simple use of Sphinx in Drupal
  • 21. Sphinx Implementation Overview  Sphinx data is accessible through multiple APIs: PHP, Python, Perl, Ruby, and Java  APIs provides optional search snippet extraction, keyword highlighting, and result paging  Results return only document IDs  Hence, after searching returns results, you'll still likely need to fetch details of documents in result page
  • 22. Sphinx Search Modes  SPH_MATCH_ALL: match all keywords  SPH_MATCH_ANY: match any keywords  SPH_MATCH_BOOLEAN: no relevance, implicit boolean AND between keywords if not otherwise specified  SPH_MATCH_PHRASE: treats query as a phrase and requires a perfect match  SPH_MATCH_EXTENDED: allows for specification of phrases and/or keywords and allows boolean operators
  • 23. Example of PHP Client Connect  This example uses SPH_MATCH_EXTENDED:  // Note: searchd needs to be running  $cl = new SphinxClient();  $cl­>SetServer('localhost', 3312 );  $cl­>SetLimits($start_item, $items_per_page);  $cl­>SetMatchMode(SPH_MATCH_EXTENDED);  Note the use of SetLimits to allow us to implement paging/limiting
  • 24. Sphinx Sort Modes  Sphinx has a number of limited, specific search modes : • SPH_SORT_RELEVANCE • SPH_SORT_ATTR_DESC • SPH_SORT_ATTR_ASC • SPH_SORT_TIME_SEGMENTS  SPH_SORT_EXTENDED provides SQL-like sorting, listing “columns” and specifying ascending/descending order  SPH_SORT_EXPR allows sorting using a mathematical equation involving “columns”
  • 25. Example of PHP Client Sort and Query  This example sorts by relevance (most to least), then by creation date (reverse chronological order):  $cl­>SetSortMode (SPH_SORT_EXTENDED,    "@relevance DESC, created DESC");  // Do Sphinx query  $search_result = $cl­>Query(    $query,    'stories'  );
  • 28. Using the Sphinx PHP API Highlighter  Once you've received search results from Sphinx, you can use API's highlighter to generate excerpts and highlight matches within the excerpts  To do this you must first retrieve the full text of each of the documents shown on the current results page  This will need to be done by database queries or some kind of cache lookup
  • 29. PHP API Highlighter Example  In this example $content is an array containing document IDs for keys and document text for values: $highlighting_options = array(   'before_match'    =>   "<span style='background­color: yellow'>",   'after_match'     => "</span>",   'chunk_separator' => " ... ",   'limit'           => 256,   'around'          => 3, ); $excerpts = $cl­>BuildExcerpts($content,  'stories', $query, $highlighting_options);
  • 30. Humanizing SPH_MATCH_EXTENDED  SPH_MATCH_EXTENDED can be “humanized” by parsing a user's query and breaking it into phrases and keywords separated by a chosen operator  For example: given the search query “uwe boll” postal and the chosen search type “match any”, we'd use boolean operator OR between the extracted phrase and keyword  This would end up, internally, as “uwe boll” | postal (note the OR pipe character)
  • 31. Implementing Humanization in PHP  One way to humanize via clumsy regular expression brutality: 1)Get keywords by using preg_split to split into array using phrases (designated by double- quotes) as splitting points 2)Get phrases by using preg_match_all, matching anything in double-quotes 3)Merge arrays 4)Implode using search operator with spaces on either side (i.e. “ & “ or “ | “)
  • 32. What About Paging?  As Sphinx queries don't go through Drupal's database abstraction functions, built in paging won't work automagically, as when using Drupal's pager_query function  Writing paging logic that looks/acts identical to Drupal's is tedious  You can, however, leverage Drupal's paging functionality so theme_pager will do your bidding...
  • 33. How to Leverage Drupal's Paging  Manually set globals to leverage paging: $element = 0; $GLOBALS['pager_page_array'] =   explode(',', $_GET['page']); $GLOBALS['pager_total_items'][$element] =   $found_count; $GLOBALS['pager_total'][$element] =   ceil($found_count / $items_per_page); $GLOBALS['pager_page_array'][$element] = max(0, min((int)$GLOBALS['pager_page_array'] [$element], ((int)$GLOBALS['pager_total'] [$element]) ­ 1));
  • 34. Multi-Value Attributes (MVAs)  Allow to you add Drupal taxonomy data to Sphinx index  Implementation requires two things: • Addition of sql_attr_multi specification in the source section of your Sphinx configuration file • sql_attr_multi tells how to query for taxonomy • Addition of SetFilter call in PHP API before issuing query  SetFilter example filtering by two taxonomy terms: • $cl->SetFilter('tag', array(114288, 4567));
  • 37. Online Resources  Official Sphinx site: http://www.sphinxsearch.com/  These slides and example module: http://mikecantelon.com/talks/sphinx