SphinxSE with MySQL
Presentation Transcript

  •  
    • Introduction to Sphinx.
    • Sphinx Searching and Sorting Features.
    • Sphinx Implementation.
    • Demo.
  •  
    • Open-source search engine.
    • Developed by Andrew Aksyonoff.
    • Integrates well with MySQL.
    • Provides greatly improved full-text search.
    • Designed specifically for indexing databases.
  •  
  •  
    • Searching 500 MB of documents.
    • 3,000,000 documents in total.
    • Query: "internet web design" (match any).
    • Returns 134,000 documents.
  •  
  •  
    • Sphinx has two standalone programs:
      • indexer: pulls data from the database and builds indexes.
      • searchd: uses the indexes to answer queries.
    • Clients interact with searchd through:
      • Native APIs: PHP, Python, Perl, Ruby, and Java.
      • SphinxSE.
    • The indexer periodically rebuilds the indexes:
      • Typically via cron jobs.
      • Searching keeps working during rebuilds (live updates).
    • Sphinx documents correspond to records in the database.
    • A document is like a row in a table and has its own unique ID.
    • Each document consists of fields and attributes.
    • Fields are the columns we want to search.
    • Attributes can be used for filtering, sorting, and grouping.
    • The Sphinx search engine returns only unique document IDs.
    • So a search for "Dominos" returns the unique IDs of the matching rows.
    • Hence, after the search returns its results, you will likely still need to fetch the document details for your final result page.
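
The two-step pattern described above (get matching IDs from Sphinx, then fetch the details from the database) can be sketched as follows. This is a minimal illustration, not part of the original slides: an in-memory SQLite table stands in for the MySQL table, and the `restaurants` table, its columns, and the `matched_ids` list are hypothetical.

```python
import sqlite3


def fetch_details(conn, doc_ids):
    """Fetch full rows for the matched IDs, keeping the search engine's order."""
    if not doc_ids:
        return []
    placeholders = ",".join("?" for _ in doc_ids)
    rows = conn.execute(
        f"SELECT id, name, city FROM restaurants WHERE id IN ({placeholders})",
        list(doc_ids),
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) does not guarantee any order, so re-apply the ranking order.
    return [by_id[i] for i in doc_ids if i in by_id]


# Hypothetical data standing in for the real database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [(1, "Dominos Pizza", "Pune"), (2, "Pizza Hut", "Mumbai"), (3, "Dominos Express", "Delhi")],
)

# Assumed output of the search step: document IDs in ranked order.
matched_ids = [3, 1]
details = fetch_details(conn, matched_ids)
```

Re-sorting by the ID list matters because the database returns the rows in its own order, not in the order of relevance.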
    Sphinx Searching and Sorting Features
    • SELECT id
    • FROM sphinx_table
    • WHERE query = 'dominos;mode=ext2;weights=1000,100,10;sort=attr_asc:group_id';
    • dominos: the text you want to search for.
    • mode=ext2: the matching mode.
    • weights=1000,100,10: the weight distribution.
    • sort=attr_asc:group_id: the sorting type.
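
Since the whole search request travels inside one string, it can help to assemble that string in application code. The helper below is a minimal sketch; `build_sphinxse_query` is a hypothetical name, and rejecting semicolons in the keywords is a simplification rather than SphinxSE's own escaping rule.

```python
def build_sphinxse_query(keywords, **options):
    """Join the search text and its options into a SphinxSE query string."""
    if ";" in keywords:
        # Simplification: a semicolon would be read as an option separator.
        raise ValueError("keywords must not contain ';'")
    parts = [keywords] + [f"{name}={value}" for name, value in options.items()]
    return ";".join(parts)


query = build_sphinxse_query(
    "dominos",
    mode="ext2",
    weights="1000,100,10",
    sort="attr_asc:group_id",
)

# Pass the string as a bound parameter instead of pasting it into the SQL text.
# The %s placeholder style is the one used by common MySQL drivers for Python.
sql = "SELECT id FROM sphinx_table WHERE query = %s"
params = (query,)
```

Here `query` holds the same string as the slide's example, and `sql` with `params` would be handed to the database driver's `execute` call.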
    • SPH_MATCH_ALL: matches all keywords.
    • SPH_MATCH_ANY: matches any of the keywords.
    • SPH_MATCH_BOOLEAN: no relevance ranking; keywords are joined by an implicit Boolean AND unless specified otherwise.
      • 1. hello & world
      • 2. hello | world
      • 3. hello -world
    • SPH_MATCH_PHRASE: treats the query as a phrase and requires an exact match.
    • SPH_MATCH_EXTENDED: superseded by SPH_MATCH_EXTENDED2.
    • SPH_MATCH_EXTENDED2: supports a richer query syntax:
      • Field search operator: @title hello @body world
      • Quorum matching operator: "world is wonderful place"/3
      • Proximity search operator: "hello world"~10
      • Strict order operator: black << cat
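
To make the quorum operator concrete: `"world is wonderful place"/3` matches a document when at least 3 of the 4 quoted words occur in it. The sketch below illustrates only that rule; `quorum_match` is a hypothetical helper, and the whitespace splitting is a stand-in for Sphinx's real tokenizer.

```python
def quorum_match(document, phrase, threshold):
    """Return True if at least `threshold` distinct phrase words occur in the document."""
    doc_words = set(document.lower().split())
    keywords = set(phrase.lower().split())
    return len(keywords & doc_words) >= threshold


# "world", "is" and "place" are present, so the quorum of 3 is reached.
hit = quorum_match("the world is a big place", "world is wonderful place", 3)

# Only "world" is present, so the quorum is not reached.
miss = quorum_match("hello world", "world is wonderful place", 3)
```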
    • Phrase ranking: gives higher preference to documents containing the matching phrase, such as "hello world".
    • Statistical ranking: gives more preference to word frequency, i.e. a document containing more occurrences of "hello" and/or "world" gets more weight.
    • SPH_MATCH_BOOLEAN: no weighting is performed.
    • SPH_MATCH_ALL and SPH_MATCH_PHRASE: use phrase ranking.
    • SPH_MATCH_ANY: phrase rank * big value + statistical rank (multiplying by a big value guarantees that a higher phrase rank wins even if its field weight is low).
    • SPH_MATCH_EXTENDED: (phrase rank + BM25) * 1000.
    • Personalized weighting: set with the "weights" keyword in the Sphinx query. It is generally used when some of the searched columns should count for more than others.
    • E.g. weights=1,2,3; (possible in mode=ext2).
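
The weighting formulas above can be worked through with small numbers. This is only an arithmetic sketch of the formulas as the slides state them: the value 1000 for the "big value" and all the input ranks are assumptions chosen for illustration, not values taken from Sphinx.

```python
BIG_VALUE = 1000  # assumed; the slides only say "big value"


def match_any_weight(phrase_rank, statistical_rank, big_value=BIG_VALUE):
    """SPH_MATCH_ANY as stated on the slide: phrase rank * big value + statistical rank."""
    return phrase_rank * big_value + statistical_rank


def match_extended_weight(phrase_rank, bm25):
    """SPH_MATCH_EXTENDED as stated on the slide: (phrase rank + BM25) * 1000."""
    return int(round((phrase_rank + bm25) * 1000))


# Document A has the better phrase match, document B has more keyword occurrences.
weight_a = match_any_weight(phrase_rank=2, statistical_rank=5)   # 2 * 1000 + 5
weight_b = match_any_weight(phrase_rank=1, statistical_rank=40)  # 1 * 1000 + 40

extended = match_extended_weight(phrase_rank=3, bm25=0.5)
```

Document A outranks document B despite its lower statistical rank, which is the effect the multiplication is meant to guarantee as long as the statistical rank stays below the big value.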
    • SPH_SORT_RELEVANCE: sorts by relevance in descending order.
    • SPH_SORT_ATTR_DESC: sorts by an attribute in descending order.
    • SPH_SORT_ATTR_ASC: sorts by an attribute in ascending order.
    • SPH_SORT_TIME_SEGMENTS: sorts by time segment (hour/day/week/month) in descending order.
    • SPH_SORT_EXTENDED: sorts by an SQL-like combination of columns, each in ascending or descending order.
    • SPH_SORT_EXPR: sorts by an arithmetic expression involving columns.
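
As an illustration of what an extended sort produces, the sketch below orders a few matches by `group_id` ascending and then by weight descending, roughly what a clause like `group_id ASC, @weight DESC` describes. The match rows are made-up sample data and `sort_extended` is a hypothetical helper; the real sorting happens inside searchd.

```python
# Hypothetical matches as they might come back from a search.
matches = [
    {"id": 11, "weight": 2040, "group_id": 2},
    {"id": 12, "weight": 1010, "group_id": 1},
    {"id": 13, "weight": 3005, "group_id": 2},
    {"id": 14, "weight": 1500, "group_id": 1},
]


def sort_extended(rows):
    """Order rows by group_id ascending, then by weight descending."""
    return sorted(rows, key=lambda row: (row["group_id"], -row["weight"]))


ordered_ids = [row["id"] for row in sort_extended(matches)]
```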
    Sphinx Implementation
    • Installation is usually straightforward.
    • Requirements:
      • A working C++ compiler.
      • A good make program.
    • Steps:
      • $ ./configure --prefix=/path --with-mysql --with-pgsql
      • $ make
      • $ make install
  • Checking SphinxSE Installation
    • Two components must be set up before Sphinx is ready for searching:
      • Sphinx table
      • Configuration file (e.g. file_name.conf)
    • Requirements for the Sphinx table:
      • The first three columns must be of types INT, INT, and VARCHAR; they map to the document ID, the match weight, and the search query.
      • The query column must be indexed, and no other column may be indexed.
      • All other attributes from the source appear as additional columns.
    • CREATE TABLE sphinx_table
    • (
    • id INT NOT NULL,
    • weight INT NOT NULL,
    • query VARCHAR(255) NOT NULL,
    • KEY (query)
    • ) ENGINE=SPHINX CONNECTION='sphinx://localhost:3313/city_search_cust_mess';
    • The configuration file has four sections to configure:
      • Source (multiple)
      • Index (multiple)
      • Indexer
      • Searchd
    • Some of the options available in the source section of the configuration file:
    • Type:
      • type: data source type. Possible values: mysql, pgsql, xmlpipe, xmlpipe2.
    • Connection info:
      • sql_host: SQL server host to connect to (mandatory).
      • sql_port: SQL server port to connect to (default 3306).
      • sql_user: SQL user to use when connecting to sql_host (mandatory).
      • sql_pass: SQL user password to use when connecting to sql_host (mandatory).
      • sql_db: SQL database to use.
      • sql_sock: socket name to connect to for local SQL servers.
    • Query info:
      • sql_query_pre: pre-fetch query, or pre-query. E.g. sql_query_pre = SET NAMES utf8
      • sql_query: main document fetch query.
      • sql_query_post: post-fetch query. E.g. sql_query_post = DROP TABLE my_tmp_table
      • sql_query_info: document info query, used by the command-line search utility to display document details.
    • Attribute info:
      • sql_attr_xxx: attribute declaration (xxx is one of uint, bigint, float, str2ordinal, timestamp).
    Setting Configuration File: Index Section
    • type: index type. Optional; possible values: local, distributed.
    • source: adds a document source to a local index. Multi-value.
    • path: index file path and file name (without extension).
    • docinfo: storage mode for document attribute values (inline, extern).
    • mlock: memory locking for cached data (optional, default 0).
    • min_word_len: minimum indexed word length (optional, default 1).
    • charset_type: character set encoding type.
    • Stemming options:
      • morphology: a list of morphology preprocessors to apply. E.g. stemming maps "cars" to "car" and "running" to "run".
      • stopwords: stopword file list (space separated). Typical stopwords: the, is, are, an, a, etc.
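
The combined effect of stopwords, min_word_len, and morphology on indexed text can be pictured with a toy pipeline. Everything here is illustrative: the stopword set and the two-entry `STEMS` table simply reuse the examples from the bullets, and a lookup table is only a stand-in for a real stemmer.

```python
STOPWORDS = {"the", "is", "are", "an", "a"}
STEMS = {"cars": "car", "running": "run"}  # toy stand-in for a real stemmer


def index_terms(text, min_word_len=1):
    """Return the terms that would be indexed after filtering and stemming."""
    terms = []
    for word in text.lower().split():
        if word in STOPWORDS or len(word) < min_word_len:
            continue  # stopwords and too-short words are not indexed
        terms.append(STEMS.get(word, word))
    return terms


stemmed = index_terms("The cars are running")
filtered = index_terms("a go kart is fun", min_word_len=3)
```

With these inputs, `stemmed` keeps only the stems of "cars" and "running", and `filtered` drops "go" because it is shorter than three characters.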
    Setting Configuration File: Indexer Section
    • mem_limit: indexing RAM usage limit (optional, default 32 MB).
    • max_iops: maximum I/O operations per second.
    • max_iosize: maximum allowed I/O operation size.
    Setting Configuration File: Searchd Section
    • address: IP address to bind to. The default, 0.0.0.0, listens on all interfaces.
    • port: searchd TCP port number (mandatory, default 3312).
    • log: log file name (optional, default searchd.log).
    • query_log: query log file name (optional, default is empty).
    • pid_file: searchd process ID file name (mandatory).
    • max_matches: maximum number of matches the daemon keeps in RAM for each index and can return to the client (optional, default 1000).
    • preopen_indexes: whether to forcibly preopen all indexes on startup (optional, default 0, i.e. do not preopen).
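
To see how the four sections fit together, the sketch below renders a dummy configuration file from the options covered above. It is an assumption-laden example, not the presenter's file: the index name and port 3313 are taken from the CONNECTION string of the Sphinx table, while the source name, credentials, paths, and the SQL query are hypothetical placeholders.

```python
def render_section(kind, name, options):
    """Render one configuration block, e.g. `source city_src { ... }`."""
    header = f"{kind} {name}".strip()
    body = "\n".join(f"    {key} = {value}" for key, value in options)
    return f"{header}\n{{\n{body}\n}}\n"


# All values below are placeholders for illustration.
sections = [
    ("source", "city_src", [
        ("type", "mysql"),
        ("sql_host", "localhost"),
        ("sql_user", "sphinx_user"),
        ("sql_pass", "secret"),
        ("sql_db", "city_db"),
        ("sql_port", "3306"),
        ("sql_query_pre", "SET NAMES utf8"),
        ("sql_query", "SELECT id, group_id, name, description FROM customers"),
        ("sql_attr_uint", "group_id"),
    ]),
    ("index", "city_search_cust_mess", [
        ("source", "city_src"),
        ("path", "/var/data/sphinx/city_search_cust_mess"),
        ("docinfo", "extern"),
        ("min_word_len", "1"),
        ("charset_type", "utf-8"),
    ]),
    ("indexer", "", [
        ("mem_limit", "32M"),
    ]),
    ("searchd", "", [
        ("port", "3313"),
        ("log", "/var/log/sphinx/searchd.log"),
        ("query_log", "/var/log/sphinx/query.log"),
        ("pid_file", "/var/run/searchd.pid"),
        ("max_matches", "1000"),
    ]),
]

config_text = "\n".join(render_section(*section) for section in sections)
print(config_text)
```

The first column of `sql_query` plays the role of the document ID, `group_id` is declared as an attribute, and the remaining columns become searchable fields.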
  •  
  •  
    Demo
  •  
  •