SphinxSE with MySQL

  • Show a dummy config file after this slide, before moving on to the config options.

Transcript

  • 1.  
  • 2.
    • Introduction to Sphinx.
    • Sphinx Searching and Sorting Features.
    • Sphinx Implementation.
    • Demo.
  • 3.
  • 4.  
  • 5.
    • Open Source Search Engine.
    • Developed by Andrew Aksyonoff
    • Integrates well with MySQL.
    • Provides greatly improved full-text search.
    • Specially designed for indexing databases.
  • 6.  
  • 7.  
  • 8.
    • Search over 500 MB of documents.
    • 3,000,000 documents in total.
    • Query: "internet web design" (match any).
    • Returns 134,000 documents.
  • 9.  
  • 10.  
  • 11.
    • It has two standalone programs:
    • indexer: pulls data from the DB and builds indexes.
    • searchd: uses the indexes and answers queries.
    • Clients interact with searchd in two ways:
    • Via native APIs: PHP, Python, Perl, Ruby, and Java.
    • Via SphinxSE.
    • The indexer periodically rebuilds the indexes:
    • Typically using cron jobs.
    • Searching keeps working during rebuilds (Live Updates).
  • 12.
    • Sphinx documents = records in the DB.
    • A document is just like a row in the DB, and it has its own unique ID.
    • Each document consists of fields and attributes.
    • Fields are the columns on which we want to search.
    • Attributes may be used for filtering, sorting, and grouping.
  • 13.
    • The Sphinx search engine returns only unique document IDs.
    • This means that if we search for "Dominos", we get the unique IDs of the rows that contain it.
    • Hence, after the search returns its results, you will still likely need to fetch the document details for your final result page.
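
The second lookup described above can be sketched as follows. This is a minimal illustration, not the presenter's code: it assumes the ranked IDs have already come back from Sphinx, and the `restaurants` table, its columns and its rows are hypothetical stand-ins (an in-memory SQLite database is used so the sketch runs on its own).

```python
import sqlite3


def fetch_details(conn, ranked_ids):
    """Fetch full rows for the given IDs, keeping the search engine's order."""
    if not ranked_ids:
        return []
    placeholders = ",".join("?" for _ in ranked_ids)
    rows = conn.execute(
        f"SELECT id, name, city FROM restaurants WHERE id IN ({placeholders})",
        ranked_ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) does not preserve order, so re-apply the ranking here.
    # IDs that no longer exist in the table are silently skipped.
    return [by_id[i] for i in ranked_ids if i in by_id]


# Hypothetical data standing in for the real application table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [(1, "Dominos", "Pune"), (2, "Pizza Hut", "Mumbai"), (3, "Dominos", "Delhi")],
)

ranked_ids = [3, 1]  # pretend these came back from Sphinx, best match first
results = fetch_details(conn, ranked_ids)
print(results)
```

Re-ordering in the application keeps the relevance ranking intact, because the database is free to return the `IN (...)` rows in any order.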
  • 14.
  • 15.
    • SELECT id
    • FROM sphinx_table
    • WHERE query = 'dominos;mode=ext2;weights=1000,100,10;sort=attr_asc:group_id';
    • dominos : the text you want to search for.
    • mode=ext2 : searching mode.
    • weights=1000,100,10 : weight distribution.
    • sort=attr_asc:group_id : sorting type.
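
Because all SphinxSE options travel inside one semicolon-separated string, it can help to assemble that string in one place. The helper below is a hypothetical sketch (the function name and its parameters are assumptions, not part of Sphinx); it only covers the four options shown on this slide.

```python
def build_sphinxse_query(text, mode=None, weights=None, sort=None):
    """Build the value compared against the `query` column of a SphinxSE table."""
    if ";" in text:
        # Assumption: refuse semicolons instead of guessing at an escaping
        # rule, since ';' is the option separator.
        raise ValueError("search text must not contain ';'")
    parts = [text]
    if mode:
        parts.append(f"mode={mode}")
    if weights:
        parts.append("weights=" + ",".join(str(w) for w in weights))
    if sort:
        parts.append(f"sort={sort}")
    return ";".join(parts)


query = build_sphinxse_query(
    "dominos", mode="ext2", weights=[1000, 100, 10], sort="attr_asc:group_id"
)
# Pass the string as a bound parameter rather than pasting it into the SQL.
sql = "SELECT id FROM sphinx_table WHERE query = %s"
print(sql, (query,))
```

Binding the string as a parameter leaves the quoting to the MySQL driver.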
  • 16.
    • SPH_MATCH_ALL : matches all keywords.
    • SPH_MATCH_ANY : matches any of the keywords.
    • SPH_MATCH_BOOLEAN : matches the query as a Boolean expression; no relevance ranking, with an implicit Boolean AND between keywords unless specified otherwise.
    • 1. hello & world
    • 2. hello | world
    • 3. hello -world
    • SPH_MATCH_PHRASE : treats the query as a phrase and requires an exact match.
    • SPH_MATCH_EXTENDED : superseded by SPH_MATCH_EXTENDED2.
    • SPH_MATCH_EXTENDED2 : matches the query using the extended query syntax (field, quorum, proximity and strict order operators).
  • 17.
    • FIELD SEARCH OPERATOR : @title hello @body world
    • QUORUM MATCHING OPERATOR : "world is wonderful place"/3
    • PROXIMITY SEARCH OPERATOR : "hello world"~10
    • STRICT ORDER OPERATOR : black << cat
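
The four operators above are plain text conventions, so they can be composed with small string helpers. The function names below are hypothetical and exist only for illustration; each one reproduces the example shown on the slide.

```python
def field(name, text):
    """Field search operator: restrict `text` to one field."""
    return f"@{name} {text}"


def quorum(words, threshold):
    """Quorum matching: at least `threshold` of the words must match."""
    phrase = " ".join(words)
    return f'"{phrase}"/{threshold}'


def proximity(phrase, distance):
    """Proximity search: the words must occur within `distance` words."""
    return f'"{phrase}"~{distance}'


def strict_order(*terms):
    """Strict order: the terms must appear in the given order."""
    return " << ".join(terms)


print(field("title", "hello") + " " + field("body", "world"))
print(quorum(["world", "is", "wonderful", "place"], 3))
print(proximity("hello world", 10))
print(strict_order("black", "cat"))
```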
  • 18.
    • Phrase Ranking : gives higher preference to documents that contain the matching phrase, like "hello world".
    • Statistical Ranking : gives more preference to word frequency, i.e. a document containing more occurrences of "hello" and/or "world" is given more weight.
  • 19.
    • SPH_MATCH_BOOLEAN : no weighting is performed.
    • SPH_MATCH_ALL and SPH_MATCH_PHRASE : use phrase ranking.
    • SPH_MATCH_ANY : phrase rank * big value + statistical rank
    • (the phrase rank is multiplied by a big value so that a higher phrase rank always wins, even if its field weight is low).
    • SPH_MATCH_EXTENDED : (phrase rank + BM25) * 1000.
    • Personalized weighting : set the "weights" option in your Sphinx query to give some of the searched columns more preference than others.
    • E.g. weights=1,2,3; this is possible in mode=ext2.
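
The "multiply by a big value" idea for SPH_MATCH_ANY can be shown with toy numbers. This sketch is not Sphinx's implementation: the constant `BIG` and the sample ranks are made up, and the only point is that the statistical part can break ties but never outweigh a difference in phrase rank.

```python
BIG = 1_000_000  # illustrative constant, assumed larger than any statistical rank


def match_any_weight(phrase_rank, statistical_rank):
    """Combine the two ranks so that phrase rank always dominates."""
    if not 0 <= statistical_rank < BIG:
        raise ValueError("statistical rank must be in [0, BIG)")
    return phrase_rank * BIG + statistical_rank


# Document A has the better phrase match, document B the higher word frequency.
doc_a = match_any_weight(phrase_rank=2, statistical_rank=5)
doc_b = match_any_weight(phrase_rank=1, statistical_rank=900)
print(doc_a > doc_b)
```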
  • 20.
    • SPH_SORT_RELEVANCE : sorts by relevance in DESC order.
    • SPH_SORT_ATTR_DESC : sorts by an attribute in DESC order.
    • SPH_SORT_ATTR_ASC : sorts by an attribute in ASC order.
    • SPH_SORT_TIME_SEGMENTS : sorts by time segment (last hour/day/week/month) in DESC order, then by relevance in DESC order.
    • SPH_SORT_EXTENDED : sorts by an SQL-like combination of columns, each in ASC or DESC order.
    • SPH_SORT_EXPR : sorts by an arithmetic expression involving columns.
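
To make the difference between the sort modes concrete, the sketch below re-creates two of them in plain Python over a handful of made-up matches. It only illustrates the resulting order; in practice searchd does the sorting, and the sample IDs, weights and `group_id` values are assumptions.

```python
# Hypothetical matches: document ID, match weight, and one attribute.
matches = [
    {"id": 1, "weight": 10, "group_id": 2},
    {"id": 2, "weight": 30, "group_id": 1},
    {"id": 3, "weight": 20, "group_id": 1},
    {"id": 4, "weight": 40, "group_id": 2},
]

# Like SPH_SORT_RELEVANCE: best weight first.
by_relevance = sorted(matches, key=lambda m: -m["weight"])

# Like SPH_SORT_EXTENDED with the clause "group_id ASC, @weight DESC".
by_extended = sorted(matches, key=lambda m: (m["group_id"], -m["weight"]))

relevance_ids = [m["id"] for m in by_relevance]
extended_ids = [m["id"] for m in by_extended]
print(relevance_ids, extended_ids)
```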
  • 21.
  • 22.
    • Installation is usually straightforward:
    • REQUIREMENTS:
    • A working C++ compiler.
    • A working make program.
    • STEPS:
    • $ ./configure --prefix=/path --with-mysql --with-pgsql
    • $ make
    • $ make install
  • 23. Checking SphinxSE Installation
  • 24.
    • There are 2 components that we need to set up before Sphinx is ready for searching:
    • Sphinx Table
    • Configuration File (e.g.: file_name.conf )
  • 25.
    • Requirements:
    • The first 3 columns must have the data types INT, INT, VARCHAR;
    • they are mapped to the document ID, the match weight and the search query.
    • The query column must be indexed, and no other column may be indexed.
    • All other attributes in the source appear as additional columns.
    • CREATE TABLE sphinx_table
    • (
    • id INT NOT NULL,
    • weight INT NOT NULL,
    • query VARCHAR(255) NOT NULL,
    • KEY (query)
    • ) ENGINE=SPHINX CONNECTION='sphinx://localhost:3313/city_search_cust_mess';
  • 26.
    • A configuration file has 4 sections to configure:
    • Source (multiple)
    • Index (multiple)
    • Indexer
    • Searchd
  • 27.
  • 28.
    • Following are some of the options available in the source section of the configuration file:
    • TYPE:
    • type : data source type.
    • Possible options: mysql, pgsql, xmlpipe, xmlpipe2.
    • Connection Info:
    • sql_host : SQL server host to connect to (mandatory).
    • sql_port : SQL server port to connect to (default 3306).
    • sql_user : SQL user to use when connecting to sql_host (mandatory).
    • sql_pass : SQL user password to use when connecting to sql_host (mandatory).
    • sql_db : SQL DB to be used.
    • sql_sock : socket name to connect to for local SQL servers.
  • 29.
    • Queries Info:
    • sql_query_pre : pre-fetch query, or pre-query.
    • e.g.: sql_query_pre = SET NAMES utf8
    • sql_query : main document fetch query.
    • sql_query_post : post-fetch query.
    • e.g.: sql_query_post = DROP TABLE my_tmp_table
    • sql_query_info : document info query, used by the command-line search utility to display document details.
    • Attributes Info:
    • sql_attr_xxx : attribute declaration (xxx : uint, bigint, float, str2ordinal, timestamp).
  • 30.
  • 31.
    • Following are some of the options available in the index section of the configuration file:
    • type : index type. Optional (possible options: local, distributed).
    • source : adds a document source to a local index. Multi-value.
    • path : index file path and file name (without extension).
    • docinfo : storage mode for document attribute values (inline, extern).
    • mlock : memory locking for cached data (optional, default 0).
    • min_word_len : minimum indexed word length (optional, default 1).
    • charset_type : character set encoding type.
  • 32.
    • Stemming Options:
    • morphology : a list of morphology preprocessors to apply.
    • e.g.: cars = car; running = run.
    • stopwords : stopword file list (space separated).
    • e.g.: the, is, are, an, a, etc.
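
The combined effect of min_word_len, stopwords and morphology on the text that reaches the index can be sketched with a toy preprocessor. This is not Sphinx's tokenizer or stemmer: the stopword set and the two-entry stem table are taken from the examples on these slides, and a lookup table stands in for a real stemming algorithm.

```python
STOPWORDS = {"the", "is", "are", "an", "a"}   # sample list from the slide
STEMS = {"cars": "car", "running": "run"}     # stand-in for a real stemmer


def preprocess(text, min_word_len=1):
    """Return the terms that would be indexed under these toy rules."""
    terms = []
    for token in text.lower().split():
        if len(token) < min_word_len:
            continue  # too short to index
        if token in STOPWORDS:
            continue  # stopwords are dropped
        terms.append(STEMS.get(token, token))  # map the word to its stem
    return terms


terms = preprocess("The cars are running")
print(terms)
```

The same preprocessing applies to both the indexed text and the query, which is why a search for "car" can match a document that says "cars".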
  • 33.
  • 34.
    Setting Configuration File: Indexer Section
    • mem_limit : indexing RAM usage limit. Optional, default is 32 MB.
    • max_iops : maximum I/O operations per second.
    • max_iosize : maximum allowed I/O operation size.
  • 35.
  • 36.
    Setting Configuration File: Searchd Section
    • address : IP address to bind on. The default, 0.0.0.0, listens on all interfaces.
    • port : searchd TCP port number (mandatory, default is 3312).
    • log : log file name (optional, default is empty).
    • query_log : query log file name (optional, default is empty).
    • pid_file : searchd process ID file name (mandatory).
    • max_matches : maximum number of matches that the daemon keeps in RAM for each index and can return to the client (optional, default 1000).
    • preopen_indexes : whether to forcibly preopen all indexes on startup (optional, default 0, i.e. don't preopen).
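
Putting the four sections together, a dummy configuration file might look like the one rendered below. This is a sketch under assumptions: the source name `src_messages`, the `messages` table and its columns, the credentials and the directory paths are all hypothetical; only the option names come from the slides. The index name and port are chosen to match the CONNECTION string of the SphinxSE table shown earlier.

```python
from string import Template

# Option names follow the slides; every value below is a placeholder.
CONFIG_TEMPLATE = Template("""\
source src_messages
{
    type          = mysql
    sql_host      = $host
    sql_user      = $user
    sql_pass      = $password
    sql_db        = $database
    sql_port      = 3306
    sql_query_pre = SET NAMES utf8
    sql_query     = SELECT id, group_id, title, body FROM messages
    sql_attr_uint = group_id
}

index city_search_cust_mess
{
    source       = src_messages
    path         = $data_dir/city_search_cust_mess
    docinfo      = extern
    min_word_len = 1
    charset_type = utf-8
    morphology   = stem_en
}

indexer
{
    mem_limit = 32M
}

searchd
{
    port        = $port
    log         = $log_dir/searchd.log
    query_log   = $log_dir/query.log
    pid_file    = $log_dir/searchd.pid
    max_matches = 1000
}
""")

config_text = CONFIG_TEMPLATE.substitute(
    host="localhost",
    user="sphinx_user",        # hypothetical credentials
    password="secret",
    database="city_search",    # hypothetical database name
    data_dir="/var/data/sphinx",
    log_dir="/var/log/sphinx",
    port=3313,                 # must match the port in the SphinxSE CONNECTION string
)
print(config_text)
```

Here the first column returned by sql_query is the document ID, `group_id` is declared as an attribute, and the remaining columns (`title`, `body`) become searchable fields.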
  • 37.  
  • 38.  
  • 39.
  • 40.  
  • 41.