SphinxSE with MySQL

Published in: Technology, Design
  • Show a dummy config file after this slide, before moving on to the config options.
  • Transcript of "SphinxSE with MySQL"

    1. 2.
        • Introduction to Sphinx.
        • Sphinx searching and sorting features.
        • Sphinx implementation.
        • Demo.
    3. 5.
        • Open-source search engine.
        • Developed by Andrew Aksyonoff.
        • Integrates well with MySQL.
        • Provides greatly improved full-text search.
        • Specially designed for indexing databases.
    4. 8.
        • Search over 500 MB of documents.
        • 3,000,000 documents in total.
        • Query: "internet web design" (match any).
        • Returns 134,000 documents.
    5. 11. Sphinx has two standalone programs:
        • indexer: pulls data from the DB and builds indexes.
        • searchd: uses the indexes and answers queries.
        Clients interact with searchd:
        • via native APIs: PHP, Python, Perl, Ruby, and Java;
        • via SphinxSE.
        The indexer periodically rebuilds the indexes:
        • typically using cron jobs;
        • searching keeps working during rebuilds (live updates).
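The periodic rebuild described above could be driven by a small script that cron calls. The following is a minimal sketch, assuming the `indexer` binary is on the PATH; the config path is hypothetical. It relies on the `--config`, `--rotate`, and `--all` options of `indexer`; `--rotate` is what lets searchd keep answering queries while the new index files are built.

```python
import subprocess


def build_reindex_command(config_path, indexes=None, rotate=True):
    """Return the indexer command line as a list of arguments."""
    cmd = ["indexer", "--config", config_path]
    if rotate:
        # --rotate builds new index files and signals searchd to swap them in.
        cmd.append("--rotate")
    # Rebuild only the named indexes, or every index with --all.
    cmd.extend(indexes if indexes else ["--all"])
    return cmd


def reindex(config_path, indexes=None):
    """Run indexer; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_reindex_command(config_path, indexes), check=True)


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever the configuration file lives.
    reindex("/etc/sphinx/sphinx.conf")
```

A crontab entry would then call this script at whatever interval suits the data.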
    6. 12.
        • Sphinx documents = records in the DB.
        • A document is just like a row in the DB and has its own unique ID.
        • Each document consists of fields and attributes.
        • Fields are the columns we want to search on.
        • Attributes may be used for filtering, sorting, and grouping.
    7. 13.
        • The Sphinx search engine returns only unique document IDs.
        • This means that if we search for "Dominos", we get the unique IDs of the matching rows.
        • Hence, after the search returns results, you will still likely need to fetch the document details for your final result page.
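The two-step pattern (search for IDs, then fetch details) can be sketched as follows. This is only an illustration: an in-memory SQLite table stands in for the real MySQL table, the `restaurants` table and its rows are made up, and the list of matched IDs is hard-coded where a real application would call Sphinx.

```python
import sqlite3


def fetch_details(conn, doc_ids):
    """Fetch rows for doc_ids and return them in the same order as doc_ids."""
    if not doc_ids:
        return []
    placeholders = ",".join("?" for _ in doc_ids)
    rows = conn.execute(
        f"SELECT id, name, city FROM restaurants WHERE id IN ({placeholders})",
        list(doc_ids),
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) does not preserve order, so re-apply the search engine's ranking.
    return [by_id[i] for i in doc_ids if i in by_id]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO restaurants VALUES (?, ?, ?)",
        [(1, "Dominos", "Pune"), (2, "Pizza Hut", "Mumbai"), (3, "Dominos", "Delhi")],
    )
    # Stand-in for the IDs a search for "dominos" might return, best match first.
    matched_ids = [3, 1]
    return fetch_details(conn, matched_ids)
```

Re-ordering the rows by the ID list matters because the database returns them in its own order, not in relevance order.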
    9. 15. A SphinxSE query passes the search text and all options in a single query string, separated by semicolons:
        SELECT id
        FROM sphinx_table
        WHERE query = 'dominos;mode=ext2;weights=1000,100,10;sort=attr_asc:group_id';
        • dominos: the text you want to search for.
        • mode=ext2: the matching mode.
        • weights=1000,100,10: the per-field weight distribution.
        • sort=attr_asc:group_id: the sorting type.
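Since the options travel inside one string, it can help to assemble that string in one place. The helper below is a hypothetical sketch, not part of Sphinx: it joins the search text and options with semicolons and returns a parameterized SELECT using the `%s` placeholder style common to Python MySQL drivers. Rejecting semicolons in the search text is a deliberately conservative assumption, made here instead of relying on any escaping rule.

```python
def build_sphinxse_query(text, mode=None, weights=None, sort=None):
    """Build the string compared against the SphinxSE `query` column."""
    if ";" in text:
        # ';' separates options, so refuse it rather than guess at escaping.
        raise ValueError("search text must not contain ';'")
    parts = [text]
    if mode:
        parts.append(f"mode={mode}")
    if weights:
        parts.append("weights=" + ",".join(str(w) for w in weights))
    if sort:
        parts.append(f"sort={sort}")
    return ";".join(parts)


def build_select(table, text, **options):
    """Return (sql, params) for a DB-API cursor.execute() call.

    `table` must be a trusted identifier; only the query string is bound
    as a parameter.
    """
    sql = f"SELECT id FROM {table} WHERE query = %s"
    return sql, (build_sphinxse_query(text, **options),)


sql, params = build_select(
    "sphinx_table",
    "dominos",
    mode="ext2",
    weights=[1000, 100, 10],
    sort="attr_asc:group_id",
)
```

Binding the query string as a parameter keeps user input out of the SQL text itself.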
    10. 16.
        • SPH_MATCH_ALL: match all keywords.
        • SPH_MATCH_ANY: match any of the keywords.
        • SPH_MATCH_BOOLEAN: no relevance; implicit Boolean AND between keywords unless specified otherwise. Examples:
            1. hello & world
            2. hello | world
            3. hello -world
        • SPH_MATCH_PHRASE: treats the query as a phrase and requires a perfect match.
        • SPH_MATCH_EXTENDED: superseded by SPH_MATCH_EXTENDED2.
        • SPH_MATCH_EXTENDED2: matches the query using Sphinx's extended query syntax, which provides a variety of operators.
    11. 17.
        • Field search operator: @title hello @body world
        • Quorum matching operator: "world is wonderful place"/3
        • Proximity search operator: "hello world"~10
        • Strict order operator: black << cat
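The operators above are plain text conventions inside the query, so an application can compose them with ordinary string helpers. The functions below are hypothetical convenience wrappers, not part of any Sphinx API; they only reproduce the syntax shown on the slide.

```python
def field(name, terms):
    """Field search: restrict `terms` to the named field."""
    return f"@{name} {terms}"


def quorum(words, threshold):
    """Quorum matching: at least `threshold` of `words` must match."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    phrase = " ".join(words)
    return f'"{phrase}"/{threshold}'


def proximity(words, distance):
    """Proximity search: `words` must occur within `distance` words."""
    phrase = " ".join(words)
    return f'"{phrase}"~{distance}'


def strict_order(*terms):
    """Strict order: terms must appear in the given order."""
    return " << ".join(terms)


query = " ".join([field("title", "hello"), field("body", "world")])
```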
    12. 18.
        • Phrase ranking: gives higher preference to documents containing the matching phrase, such as "hello world".
        • Statistical ranking: gives more preference to word frequency, i.e. a document containing more occurrences of "hello" and/or "world" is given more weight.
    13. 19.
        • SPH_MATCH_BOOLEAN: no weighting performed.
        • SPH_MATCH_ALL and SPH_MATCH_PHRASE: use phrase ranking.
        • SPH_MATCH_ANY: phrase rank * big value + statistical rank (multiplying by a big value guarantees a higher phrase rank even if its field weight is low).
        • SPH_MATCH_EXTENDED: (phrase rank + BM25) * 1000.
        • Personalized weighting: set with the "weights" keyword in the Sphinx query. It is generally used when some of the searched columns should count for more than others.
            E.g. weights=1,2,3; (possible in mode=ext2).
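The "phrase rank * big value + statistical rank" idea can be illustrated with a toy calculation. The numbers and the constant below are hypothetical and are not Sphinx's internal values; the sketch only shows why the multiplication makes the phrase rank dominate as long as the statistical rank stays below the big value.

```python
BIG_VALUE = 1000  # hypothetical constant, chosen only for this illustration


def match_any_weight(phrase_rank, statistical_rank, big=BIG_VALUE):
    """Combine the two ranks so that phrase rank always dominates."""
    return phrase_rank * big + statistical_rank


def rank(docs, big=BIG_VALUE):
    """Sort documents by combined weight, best match first."""
    return sorted(
        docs,
        key=lambda d: match_any_weight(d["phrase_rank"], d["stat_rank"], big),
        reverse=True,
    )


docs = [
    # Many keyword occurrences, but a weaker phrase match.
    {"name": "frequent_words", "phrase_rank": 1, "stat_rank": 900},
    # Few occurrences, but a stronger phrase match.
    {"name": "exact_phrase", "phrase_rank": 2, "stat_rank": 5},
]
ordered = [d["name"] for d in rank(docs)]
```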
    14. 20.
        • SPH_SORT_RELEVANCE: sorts by relevance in descending order.
        • SPH_SORT_ATTR_DESC: sorts by an attribute in descending order.
        • SPH_SORT_ATTR_ASC: sorts by an attribute in ascending order.
        • SPH_SORT_TIME_SEGMENTS: sorts by time segment (last hour/day/week/month) in descending order.
        • SPH_SORT_EXTENDED: sorts by an SQL-like combination of columns, each in ascending or descending order.
        • SPH_SORT_EXPR: sorts by a mathematical expression involving columns.
    16. 22. Installation is usually straightforward.
        Requirements:
        • A working C++ compiler.
        • A working make program.
        Steps:
            $ ./configure --prefix /path --with-mysql --with-pgsql
            $ make
            $ make install
    17. 23. Checking SphinxSE Installation
    18. 24. Two components need to be set up before Sphinx is ready for searching:
        • Sphinx table
        • Configuration file (e.g. file_name.conf)
    19. 25. Requirements:
        • The data types of the first 3 columns must be INT, INT, VARCHAR; they are mapped to the document ID, match weight, and search query respectively.
        • The query column must be indexed, and no other column may be indexed.
        • All other attributes in the source come as additional columns.
        CREATE TABLE sphinx_table
        (
            id INT NOT NULL,
            weight INT NOT NULL,
            query VARCHAR(255) NOT NULL,
            KEY (query)
        ) ENGINE=SPHINX CONNECTION='sphinx://localhost:3313/city_search_cust_mess';
    20. 26. The configuration file has 4 sections to configure:
        • source (multiple)
        • index (multiple)
        • indexer
        • searchd
    22. 28. Some of the options available in the source section of the configuration file:
        Type:
        • type: data source type. Possible values: mysql, pgsql, xmlpipe, xmlpipe2.
        Connection info:
        • sql_host: SQL server host to connect to (mandatory).
        • sql_port: SQL server port to connect to (default 3306).
        • sql_user: SQL user to use when connecting to sql_host (mandatory).
        • sql_pass: SQL user password to use when connecting to sql_host (mandatory).
        • sql_db: SQL database to use.
        • sql_sock: socket name to connect to for local SQL servers.
    23. 29. Query info:
        • sql_query_pre: pre-fetch query, or pre-query.
            E.g. sql_query_pre = SET NAMES utf8
        • sql_query: main document fetch query.
        • sql_query_post: post-fetch query.
            E.g. sql_query_post = DROP TABLE my_tmp_table
        • sql_query_info: document info query (similar to a comment in MySQL).
        Attribute info:
        • sql_attr_xxx: attribute declaration (xxx: uint, bigint, float, str2ordinal, timestamp).
    25. 31.
        • type: index type; optional (possible values: local, distributed).
        • source: adds a document source to a local index; multi-value.
        • path: index files path and file name (without extension).
        • docinfo: document attribute values storage mode (inline, extern).
        • mlock: memory locking for cached data (optional, default 0).
        • min_word_len: minimum indexed word length (optional, default 1).
        • charset_type: character set encoding type.
    26. 32. Stemming options:
        • morphology: a list of morphology preprocessors to apply.
            E.g. cars = car; running = run.
        • stopwords: stopword files list (space-separated).
            E.g. the, is, are, an, a, etc.
    28. 34. Setting Configuration File: Indexer Section
        • mem_limit: indexing RAM usage limit (optional, default 32 MB).
        • max_iops: maximum I/O operations per second.
        • max_iosize: maximum allowed I/O operation size.
    30. 36. Setting Configuration File: Searchd Section
        • address: IP address to bind on; the default, 0.0.0.0, listens on all interfaces.
        • port: searchd TCP port number (mandatory, default 3312).
        • log: log file name (optional, default is empty).
        • query_log: query log file name (optional, default is empty).
        • pid_file: searchd process ID file name (mandatory).
        • max_matches: maximum number of matches that the daemon keeps in RAM for each index and can return to the client (optional, default 1000).
        • preopen_indexes: whether to forcibly preopen all indexes on startup (optional, default 0, i.e. don't preopen).
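To see how the four sections fit together, the sketch below renders a dummy configuration file from Python data. The option names are the ones listed on the slides; the index name and port come from the CONNECTION string shown earlier. Everything else (source name, credentials, paths, the SQL query) is a hypothetical placeholder, and the option set reflects the Sphinx version the slides describe.

```python
def render_section(kind, name, options):
    """Render one block such as `source city_src { ... }`."""
    header = f"{kind} {name}" if name else kind
    body = [f"    {key} = {value}" for key, value in options]
    return "\n".join([header, "{"] + body + ["}"])


def render_config():
    """Return a dummy sphinx.conf covering all four sections."""
    sections = [
        ("source", "city_src", [               # hypothetical source name
            ("type", "mysql"),
            ("sql_host", "localhost"),
            ("sql_user", "sphinx_user"),        # placeholder credentials
            ("sql_pass", "secret"),
            ("sql_db", "city_db"),
            ("sql_port", "3306"),
            ("sql_query_pre", "SET NAMES utf8"),
            ("sql_query", "SELECT id, group_id, title, body FROM documents"),
            ("sql_attr_uint", "group_id"),
        ]),
        ("index", "city_search_cust_mess", [
            ("source", "city_src"),
            ("path", "/var/data/sphinx/city_search_cust_mess"),
            ("docinfo", "extern"),
            ("min_word_len", "1"),
            ("charset_type", "utf-8"),
            ("morphology", "stem_en"),
        ]),
        ("indexer", None, [
            ("mem_limit", "32M"),
        ]),
        ("searchd", None, [
            ("port", "3313"),
            ("log", "/var/log/sphinx/searchd.log"),
            ("query_log", "/var/log/sphinx/query.log"),
            ("pid_file", "/var/run/sphinx/searchd.pid"),
            ("max_matches", "1000"),
        ]),
    ]
    return "\n\n".join(render_section(*section) for section in sections)


if __name__ == "__main__":
    print(render_config())
```

Running the script prints a file with one `source`, one `index`, one `indexer`, and one `searchd` block, which can serve as a starting point before tuning individual options.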