
SphinxSE with MySQL

Published in: Technology, Design

  1. 2. <ul><li>Introduction to Sphinx. </li></ul><ul><li>Sphinx Searching and Sorting Features. </li></ul><ul><li>Sphinx Implementation. </li></ul><ul><li>Demo. </li></ul>
  3. 5. <ul><li>Open source search engine. </li></ul><ul><li>Developed by Andrew Aksyonoff. </li></ul><ul><li>Integrates well with MySQL. </li></ul><ul><li>Provides greatly improved full-text search. </li></ul><ul><li>Specially designed for indexing databases. </li></ul>
  4. 8. <ul><li>Search over 500 MB of documents. </li></ul><ul><li>3,000,000 documents in total. </li></ul><ul><li>Query: "internet web design" (match any). </li></ul><ul><li>Returns 134,000 documents. </li></ul>
  5. 11. <ul><li>Sphinx has two standalone programs: </li></ul><ul><li>indexer: pulls data from the DB and builds indexes. </li></ul><ul><li>searchd: uses the indexes and answers queries. </li></ul><ul><li>Clients interact with searchd: </li></ul><ul><li>Via native APIs: PHP, Python, Perl, Ruby, and Java. </li></ul><ul><li>Via SphinxSE. </li></ul><ul><li>The indexer periodically rebuilds the indexes: </li></ul><ul><li>Typically using cron jobs. </li></ul><ul><li>Searching keeps working during rebuilds (Live Updates). </li></ul>
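
The periodic rebuild described on this slide can be sketched as a small helper that a cron-driven script might call. This is a minimal sketch, not the deck's own setup: the config path is a placeholder, and the function only assembles the `indexer` command line (using its `--config`, `--rotate` and `--all` options) without running it.

```python
import shlex


def indexer_rebuild_command(config_path, indexes=None, rotate=True):
    """Build the argv for one index rebuild, as a cron job might run it."""
    cmd = ["indexer", "--config", config_path]
    # --rotate asks a running searchd to switch over to the fresh index
    # files, which is what keeps searching available during a rebuild.
    if rotate:
        cmd.append("--rotate")
    # Rebuild only the named indexes, or every index in the config.
    cmd.extend(indexes if indexes else ["--all"])
    return cmd


if __name__ == "__main__":
    # "/path/to/sphinx.conf" is a hypothetical location.
    print(shlex.join(indexer_rebuild_command("/path/to/sphinx.conf")))
```

Passing the resulting list to `subprocess.run` from a scheduled script is one way to wire this into cron; the schedule itself depends on how fresh the index needs to be.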
  6. 12. <ul><li>Sphinx documents = records in the DB. </li></ul><ul><li>A document is just like a row in the DB and has its own unique ID. </li></ul><ul><li>Each document comprises fields and attributes. </li></ul><ul><li>Fields are the columns on which we want to search. </li></ul><ul><li>Attributes may be used for filtering, sorting, and grouping. </li></ul>
  7. 13. <ul><li>The Sphinx search engine returns only unique document IDs. </li></ul><ul><li>This means that if we search for Dominos, we get the unique IDs of the rows containing it. </li></ul><ul><li>Hence, after the search returns its results, you will still likely need to fetch the details of those documents for your final result page. </li></ul>
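
The two-step pattern above (search returns IDs, then the application fetches details) can be illustrated as follows. This is a sketch under stated assumptions: an in-memory SQLite database stands in for the real MySQL table, the `restaurants` table and its rows are invented for the example, and the ID list is hard-coded where a real application would use the IDs returned by Sphinx.

```python
import sqlite3


def fetch_details(conn, doc_ids):
    """Fetch full rows for the IDs a search returned, keeping search order."""
    if not doc_ids:
        return []
    placeholders = ",".join("?" for _ in doc_ids)
    rows = conn.execute(
        f"SELECT id, name, city FROM restaurants WHERE id IN ({placeholders})",
        list(doc_ids),
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) does not preserve order, so re-order by the search ranking.
    return [by_id[i] for i in doc_ids if i in by_id]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO restaurants VALUES (?, ?, ?)",
        [(1, "Dominos", "Pune"), (2, "Pizza Hut", "Mumbai"), (3, "Dominos", "Delhi")],
    )
    # Pretend the search engine returned these IDs, best match first.
    return fetch_details(conn, [3, 1])
```

Re-ordering by the ID list matters because the relevance ranking lives in the search result, not in the detail table.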
  9. 15. <ul><li>SELECT id </li></ul><ul><li>FROM sphinx_table </li></ul><ul><li>WHERE query = 'dominos;mode=ext2;weights=1000,100,10;sort=attr_asc:group_id'; </li></ul><ul><li>dominos: the text you want to search for. </li></ul><ul><li>mode=ext2: the matching mode. </li></ul><ul><li>weights=1000,100,10: the per-field weight distribution. </li></ul><ul><li>sort=attr_asc:group_id: the sorting mode and attribute. </li></ul>
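
The query string on this slide is a list of semicolon-separated parts, which an application can assemble programmatically. The helper below is a hypothetical sketch: its name and defaults are taken from the slide's example, and it rejects semicolons in the search text rather than guessing at escaping rules.

```python
def build_sphinxse_query(text, mode="ext2", weights=(1000, 100, 10),
                         sort="attr_asc:group_id"):
    """Assemble the value compared against the query column of a SphinxSE table."""
    if ";" in text:
        # Semicolons separate the options, so refuse them in the search text.
        raise ValueError("search text must not contain ';'")
    parts = [
        text,
        f"mode={mode}",
        "weights=" + ",".join(str(w) for w in weights),
        f"sort={sort}",
    ]
    return ";".join(parts)


# Bind the string as a parameter instead of pasting it into the SQL text.
# The placeholder style (%s here) depends on the database driver in use.
SQL = "SELECT id FROM sphinx_table WHERE query = %s"

if __name__ == "__main__":
    print(SQL, (build_sphinxse_query("dominos"),))
```

Binding the assembled string as a parameter keeps user input out of the SQL statement itself.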
  10. 16. <ul><li>SPH_MATCH_ALL: matches all keywords. </li></ul><ul><li>SPH_MATCH_ANY: matches any of the keywords. </li></ul><ul><li>SPH_MATCH_BOOLEAN: no relevance ranking; implicit Boolean AND between keywords if not specified otherwise. </li></ul><ul><li>1. hello & world </li></ul><ul><li>2. hello | world </li></ul><ul><li>3. hello -world </li></ul><ul><li>SPH_MATCH_PHRASE: treats the query as a phrase and requires a perfect match. </li></ul><ul><li>SPH_MATCH_EXTENDED: superseded by SPH_MATCH_EXTENDED2. </li></ul><ul><li>SPH_MATCH_EXTENDED2: supports the extended query syntax and its operators. </li></ul>
  11. 17. <ul><li>Field search operator: @title hello @body world. </li></ul><ul><li>Quorum matching operator: "world is wonderful place"/3. </li></ul><ul><li>Proximity search operator: "hello world"~10. </li></ul><ul><li>Strict order operator: black << cat </li></ul>
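
The extended-syntax operators listed above are plain text patterns, so they can be generated with a few string helpers. The function names below are invented for illustration; only the operator syntax itself comes from the slide.

```python
def field(name, words):
    """Field search operator: restrict the words to one field."""
    return f"@{name} {words}"


def quorum(words, threshold):
    """Quorum operator: at least `threshold` of the quoted words must match."""
    return f'"{words}"/{threshold}'


def proximity(words, distance):
    """Proximity operator: the quoted words must occur within `distance` words."""
    return f'"{words}"~{distance}'


def strict_order(*terms):
    """Strict order operator: the terms must appear in the given order."""
    return " << ".join(terms)


if __name__ == "__main__":
    print(field("title", "hello"), field("body", "world"))
    print(quorum("world is wonderful place", 3))
    print(proximity("hello world", 10))
    print(strict_order("black", "cat"))
```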
  12. 18. <ul><li>Phrase ranking: higher preference goes to documents containing the matching phrase, such as "hello world". </li></ul><ul><li>Statistical ranking: preference is based on word frequency, i.e. a document containing more occurrences of "hello" and/or "world" is given more weight. </li></ul>
  13. 19. <ul><li>SPH_MATCH_BOOLEAN: no weighting performed. </li></ul><ul><li>SPH_MATCH_ALL and SPH_MATCH_PHRASE: use phrase ranking. </li></ul><ul><li>SPH_MATCH_ANY: phrase rank * big value + statistical rank </li></ul><ul><li>(the phrase rank is multiplied by a big value to guarantee that a phrase match ranks higher even if its field weight is low). </li></ul><ul><li>SPH_MATCH_EXTENDED: (phrase rank + BM25) * 1000. </li></ul><ul><li>Personalized weighting: done with the "weights" keyword in your Sphinx query. It is generally used when some of the searched columns should count for more than others. </li></ul><ul><li>E.g. weights = 1,2,3; -- this is possible in mode=ext2. </li></ul>
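
To make the SPH_MATCH_EXTENDED formula and the effect of field weights concrete, here is a toy calculation. It is only a model of the formula as stated on the slide: the per-field phrase ranks and the BM25 value are made-up inputs, and how Sphinx actually derives them is not modeled.

```python
def extended_weight(phrase_ranks, field_weights, bm25):
    """Toy model of (phrase rank + BM25) * 1000 with per-field weights.

    phrase_ranks  -- one illustrative phrase rank per searched field
    field_weights -- the user-supplied weights, in the same field order
    bm25          -- an illustrative statistical score between 0 and 1
    """
    if len(phrase_ranks) != len(field_weights):
        raise ValueError("need one weight per field")
    weighted_phrase = sum(r * w for r, w in zip(phrase_ranks, field_weights))
    return int(round((weighted_phrase + bm25) * 1000))


if __name__ == "__main__":
    weights = [10, 1]  # hypothetical: first field counts ten times more
    # Document A matches the phrase better in the heavily weighted field.
    print(extended_weight([2, 1], weights, 0.5))
    # Document B matches better in the lightly weighted field.
    print(extended_weight([1, 2], weights, 0.5))
```

With these inputs the document whose phrase match sits in the heavily weighted field scores higher, which is the point of personalized weighting.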
  14. 20. <ul><li>SPH_SORT_RELEVANCE: sorts by relevance in DESC order. </li></ul><ul><li>SPH_SORT_ATTR_DESC: sorts by an attribute in DESC order. </li></ul><ul><li>SPH_SORT_ATTR_ASC: sorts by an attribute in ASC order. </li></ul><ul><li>SPH_SORT_TIME_SEGMENTS: sorts by time segment (last hour/day/week/month) in DESC order, then by relevance. </li></ul><ul><li>SPH_SORT_EXTENDED: sorts by an SQL-like combination of columns, each in ASC or DESC order. </li></ul><ul><li>SPH_SORT_EXPR: sorts by an arithmetic expression involving attributes. </li></ul>
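
In a SphinxSE query string the sort mode appears as a `sort=` option, as in the earlier `sort=attr_asc:group_id` example. The sketch below maps mode names to that option. Only `attr_asc` appears on the slides; the other keywords are assumptions made by analogy with the mode names, and SPH_SORT_EXPR is left out because its SphinxSE spelling is not shown in the deck.

```python
# Assumed keyword for each mode; only "attr_asc" is confirmed by the slides.
SORT_KEYWORDS = {
    "SPH_SORT_RELEVANCE": "relevance",
    "SPH_SORT_ATTR_DESC": "attr_desc",
    "SPH_SORT_ATTR_ASC": "attr_asc",
    "SPH_SORT_TIME_SEGMENTS": "time_segments",
    "SPH_SORT_EXTENDED": "extended",
}


def sort_clause(mode, argument=None):
    """Return the sort= option for a SphinxSE query string."""
    keyword = SORT_KEYWORDS[mode]
    if keyword == "relevance":
        # Relevance sorting needs no attribute.
        return "sort=relevance"
    if not argument:
        raise ValueError(f"{mode} needs an attribute or sorting clause")
    return f"sort={keyword}:{argument}"


if __name__ == "__main__":
    print(sort_clause("SPH_SORT_ATTR_ASC", "group_id"))
    print(sort_clause("SPH_SORT_EXTENDED", "@weight desc, group_id asc"))
```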
  16. 22. <ul><li>Installation is usually straightforward: </li></ul><ul><li>REQUIREMENTS: </li></ul><ul><li>A working C++ compiler. </li></ul><ul><li>A working make program. </li></ul><ul><li>STEPS: </li></ul><ul><li>$ ./configure --prefix=/path --with-mysql --with-pgsql </li></ul><ul><li>$ make </li></ul><ul><li>$ make install </li></ul>
  17. 23. Checking SphinxSE Installation
  18. 24. <ul><li>There are 2 components that we need to set up before Sphinx is ready for searching: </li></ul><ul><li>Sphinx table </li></ul><ul><li>Configuration file (e.g. file_name.conf) </li></ul>
  19. 25. <ul><li>Requirements: </li></ul><ul><li>The data types of the first 3 columns must be INT, INT, VARCHAR; they are mapped to the document ID, the match weight, and the search query. </li></ul><ul><li>The query column must be indexed, and no other column may be indexed. </li></ul><ul><li>Any other attributes from the source are declared as additional columns. </li></ul><ul><li>CREATE TABLE sphinx_table </li></ul><ul><li>( </li></ul><ul><li>id INT NOT NULL, </li></ul><ul><li>weight INT NOT NULL, </li></ul><ul><li>query VARCHAR(255) NOT NULL, </li></ul><ul><li>KEY (query) </li></ul><ul><li>) ENGINE=SPHINX CONNECTION='sphinx://localhost:3313/city_search_cust_mess'; </li></ul>
  20. 26. <ul><li>The configuration file has 4 sections to configure: </li></ul><ul><li>Source (multiple) </li></ul><ul><li>Index (multiple) </li></ul><ul><li>Indexer </li></ul><ul><li>Searchd </li></ul>
  22. 28. <ul><li>Following are some of the options available in the source section of the configuration file: </li></ul><ul><li>Type: </li></ul><ul><li>type: data source type. </li></ul><ul><li>Possible options: mysql, pgsql, xmlpipe, xmlpipe2. </li></ul><ul><li>Connection info: </li></ul><ul><li>sql_host: SQL server host to connect to (mandatory). </li></ul><ul><li>sql_port: SQL server port to connect to (default 3306). </li></ul><ul><li>sql_user: SQL user to use when connecting to sql_host (mandatory). </li></ul><ul><li>sql_pass: SQL user password to use when connecting to sql_host (mandatory). </li></ul><ul><li>sql_db: SQL database to use. </li></ul><ul><li>sql_sock: socket name to connect to for local SQL servers. </li></ul>
  23. 29. <ul><li>Queries info: </li></ul><ul><li>sql_query_pre: pre-fetch query, or pre-query. </li></ul><ul><li>e.g. sql_query_pre = SET NAMES utf8 </li></ul><ul><li>sql_query: main document fetch query. </li></ul><ul><li>sql_query_post: post-fetch query. </li></ul><ul><li>e.g. sql_query_post = DROP TABLE my_tmp_table </li></ul><ul><li>sql_query_info: document info query, used only by the command-line search utility to display document details. </li></ul><ul><li>Attributes info: </li></ul><ul><li>sql_attr_xxx: attribute declaration (xxx: uint, bigint, float, str2ordinal, timestamp). </li></ul>
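
Putting the source options together, a source section can be rendered from a list of key/value pairs. This is an illustrative sketch: the section name `city_src`, the credentials, and the `documents` table with its columns are all invented, and a list of pairs is used so that keys such as sql_query_pre can repeat.

```python
def render_section(kind, name, options):
    """Render one configuration section in the `kind name { key = value }` layout."""
    lines = [f"{kind} {name}", "{"]
    for key, value in options:
        lines.append(f"    {key} = {value}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical source definition; every value here is a placeholder.
EXAMPLE_SOURCE = [
    ("type", "mysql"),
    ("sql_host", "localhost"),
    ("sql_user", "sphinx_user"),
    ("sql_pass", "secret"),
    ("sql_db", "city_search"),
    ("sql_port", 3306),
    ("sql_query_pre", "SET NAMES utf8"),
    ("sql_query", "SELECT id, group_id, title, body FROM documents"),
    ("sql_attr_uint", "group_id"),
]

if __name__ == "__main__":
    print(render_section("source", "city_src", EXAMPLE_SOURCE))
```

In the example query, `id` becomes the document ID, `group_id` is declared as an attribute, and the remaining columns are treated as searchable fields.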
  25. 31. <ul><li>Following are some of the options available in the index section of the configuration file: </li></ul><ul><li>type: index type, optional (possible options: local, distributed). </li></ul><ul><li>source: adds a document source to a local index. Multi-value. </li></ul><ul><li>path: index files path and file name (without extension). </li></ul><ul><li>docinfo: storage mode for document attribute values (inline, extern). </li></ul><ul><li>mlock: memory locking for cached data (optional, default 0). </li></ul><ul><li>min_word_len: minimum indexed word length (optional, default 1). </li></ul><ul><li>charset_type: character set encoding type. </li></ul>
  26. 32. <ul><li>Stemming options: </li></ul><ul><li>morphology: a list of morphology preprocessors to apply. </li></ul><ul><li>e.g. cars = car; running = run. </li></ul><ul><li>stopwords: stopword file list (space separated). </li></ul><ul><li>e.g. the, is, are, an, a, etc. </li></ul>
  28. 34. Setting Configuration File: Indexer Section <ul><li>mem_limit: indexing RAM usage limit (optional, default is 32 MB). </li></ul><ul><li>max_iops: maximum I/O operations per second. </li></ul><ul><li>max_iosize: maximum allowed I/O operation size. </li></ul>
  30. 36. Setting Configuration File: Searchd Section <ul><li>address: IP address to bind on; the default, 0.0.0.0, listens on all interfaces. </li></ul><ul><li>port: searchd TCP port number (mandatory, default is 3312). </li></ul><ul><li>log: log file name (optional, default is empty). </li></ul><ul><li>query_log: query log file name (optional, default is empty). </li></ul><ul><li>pid_file: searchd process ID file name (mandatory). </li></ul><ul><li>max_matches: maximum number of matches that the daemon keeps in RAM for each index and can return to the client (optional, default 1000). </li></ul><ul><li>preopen_indexes: whether to forcibly preopen all indexes on startup (optional, default 0, i.e. don't preopen). </li></ul>
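
The defaults and mandatory settings listed for the searchd section can be captured in a small checker. This is a sketch that encodes the values exactly as the slide states them; they may differ between Sphinx versions, and since the slide gives port both as mandatory and with a default, the sketch treats it as defaulted. The function name and the pid file path are hypothetical.

```python
# Defaults as stated on the slide; check them against your Sphinx version.
SEARCHD_DEFAULTS = {
    "address": "0.0.0.0",
    "port": 3312,
    "log": "",
    "query_log": "",
    "max_matches": 1000,
    "preopen_indexes": 0,
}
# Settings the slide marks as mandatory and gives no default for.
SEARCHD_MANDATORY = ("pid_file",)


def resolve_searchd(settings):
    """Fill in defaults and reject missing or unknown searchd settings."""
    missing = [key for key in SEARCHD_MANDATORY if key not in settings]
    if missing:
        raise ValueError(f"missing mandatory settings: {missing}")
    unknown = set(settings) - set(SEARCHD_DEFAULTS) - set(SEARCHD_MANDATORY)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    resolved = dict(SEARCHD_DEFAULTS)
    resolved.update(settings)
    return resolved


if __name__ == "__main__":
    # Port 3313 matches the CONNECTION string used for the Sphinx table.
    print(resolve_searchd({"pid_file": "/var/run/searchd.pid", "port": 3313}))
```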
