  1. Full Text Search in MySQL 5.1: New Features and HowTo
     Alexander Rubin, Senior Consultant, MySQL AB
     Copyright 2006 MySQL AB. The World’s Most Popular Open Source Database
  2. Full Text search
     • Natural and popular way to search for information
     • Easy to use: enter key words and get what you need
  3. In this presentation
     • Improvements in FT Search in MySQL 5.1
     • How to speed up MySQL FT Search
     • How to search with error corrections
     • Benchmark results
  4. Types of FT Search: Relevance
     – MySQL FT: sorted by relevance by default!
  5. Types of FT Search: Boolean Search
     – MySQL FT: no default sorting!
  6. Types of FT Search: Phrase Search
  7. Full Text Solutions
     Type                        Solution
     MySQL Built-in              Full Text Index (MyISAM only)
     MySQL Integrated/External   Sphinx
     External                    Lucene, MnogoSearch
     “Hardware boxes”            Google box, “Fast” box
  8. MySQL Full Text Index Features
     • Available only for MyISAM tables
     • Natural Language Search and Boolean search
     • ft_min_word_len: 4 characters per word by default
     • Stop word list by default
     • Frequency-based ranking
        – Distance between words is not counted
  9. MySQL Full Text: Creating Full Text Index
     mysql> CREATE TABLE articles (
         ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
         ->   title VARCHAR(200),
         ->   body TEXT,
         ->   FULLTEXT (title,body)
         -> ) ENGINE=MyISAM;
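A full text index can also be added to a table that already exists. The following is a minimal sketch, assuming an articles table like the one on slide 9 but created without its FULLTEXT clause. The index name ft_title_body is made up for illustration, and the sample rows follow the example in the MySQL Reference Manual.

```sql
-- Assumes `articles` exists as a MyISAM table without a FULLTEXT index.
-- The index name ft_title_body is hypothetical.
ALTER TABLE articles
  ADD FULLTEXT INDEX ft_title_body (title, body);

-- Sample rows for trying out the queries on the following slides.
INSERT INTO articles (title, body) VALUES
  ('MySQL Tutorial',        'DBMS stands for DataBase ...'),
  ('How To Use MySQL Well', 'After you went through a ...'),
  ('Optimizing MySQL',      'In this tutorial we will show ...'),
  ('1001 MySQL Tricks',     '1. Never run mysqld as root. 2. ...'),
  ('MySQL vs. YourSQL',     'In the following database comparison ...'),
  ('MySQL Security',        'When configured properly, MySQL ...');

-- Lists the indexes on the table; the new one shows Index_type = FULLTEXT.
SHOW INDEX FROM articles;
```

For large tables it is usually faster to load the data first and add the FULLTEXT index afterwards than to insert into a table that already has one.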
  10. MySQL Full Text: Natural Language mode
      mysql> SELECT * FROM articles
          -> WHERE MATCH (title,body)
          -> AGAINST ('database' IN NATURAL LANGUAGE MODE);
      +-------------------+------------------------------------------+
      | title             | body                                     |
      +-------------------+------------------------------------------+
      | MySQL vs. YourSQL | In the following database comparison ... |
      | MySQL Tutorial    | DBMS stands for DataBase ...             |
      +-------------------+------------------------------------------+
      In Natural Language Mode: default sorting by relevance!
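To see the relevance value that drives the default ordering, the same MATCH ... AGAINST expression can be repeated in the select list. This is a small sketch against the articles table from slide 9; the column alias score is just an illustrative name.

```sql
-- MATCH ... AGAINST in the select list returns the relevance as a
-- non-negative floating-point number; 0 means no similarity.
SELECT id, title,
       MATCH (title, body) AGAINST ('database') AS score
FROM articles
WHERE MATCH (title, body) AGAINST ('database')
ORDER BY score DESC;   -- explicit here, although it is already the default order
```

On a very small test table this query may return nothing: in natural language mode, a word that occurs in half or more of the rows is treated like a stopword.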
  11. MySQL Full Text: Boolean mode
      mysql> SELECT * FROM articles
          -> WHERE MATCH (title,body)
          -> AGAINST ('+cat +dog' IN BOOLEAN MODE);
      No default sorting in Boolean Mode!
      Note: Boolean mode has no AND keyword; a leading + marks a word as required. “cat” and “dog” have only 3 characters, so they are indexed only if ft_min_word_len is 3 or less.
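The most common Boolean mode operators can be sketched as follows, again using the articles table from slide 9. The search words are only examples; they assume rows like those in the MySQL Reference Manual sample data.

```sql
-- Both words must be present.
SELECT id, title FROM articles
WHERE MATCH (title, body) AGAINST ('+mysql +tutorial' IN BOOLEAN MODE);

-- "mysql" must be present, "yoursql" must not.
SELECT id, title FROM articles
WHERE MATCH (title, body) AGAINST ('+mysql -yoursql' IN BOOLEAN MODE);

-- Exact phrase.
SELECT id, title FROM articles
WHERE MATCH (title, body) AGAINST ('"database comparison"' IN BOOLEAN MODE);

-- Prefix match with the * wildcard.
SELECT id, title FROM articles
WHERE MATCH (title, body) AGAINST ('optimiz*' IN BOOLEAN MODE);

-- Boolean mode does not sort, so order by the score explicitly if needed.
SELECT id, title,
       MATCH (title, body) AGAINST ('+mysql tutorial' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH (title, body) AGAINST ('+mysql tutorial' IN BOOLEAN MODE)
ORDER BY score DESC;
```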
  12. New MySQL 5.1 Full Text Features
  13. New MySQL 5.1 Full Text Features
      • Faster Boolean search in MySQL 5.1
         – New “smart” index merge is implemented (forge.mysql.com/worklog/task.php?id=2535)
      • Custom plug-ins
         – Replacing the default parser
      • Better Unicode support
         – The full text index works more accurately with Unicode space and punctuation characters (forge.mysql.com/worklog/task.php?id=1386)
  14. MySQL 5.0 vs 5.1 Benchmark
      • MySQL 5.1 Full Text search: 500-1000% improvement in Boolean mode
         – Relevance-based search and phrase search were not improved in MySQL 5.1
      • Tested with (data and index):
         – CDDB (music database)
         – Author and title, 2 million CDs, varchar(255)
         – CDDB ~= amazon.com’s CD/books inventory
  15. MySQL 5.1: 500-1000% improvement in Boolean mode
  16. MySQL 5.1 Full Text: Custom “Plugins”
      • Replace the default parser to do the following:
         – Apply special rules, such as stemming or a different way of splitting words
         – Pre-parsing: processing PDF / HTML files
      • The same may be done for the query string
         – If stemming is used, the search words also need to be stemmed for the search to work
  17. Available Plugins Examples
      • Different plugins: search for “FullText” at forge.mysql.com/projects
      • Stemming
         – MnogoSearch has a stemming plugin (www.mnogosearch.com)
         – Porter stemming fulltext plugin
      • N-Gram parsers (Japanese language)
         – Simple n-gram fulltext plugin
  18. Example: MnogoSearch Stemming
      • MnogoSearch includes a stemming plugin (www.mnogosearch.org/doc/msearch-udmstemmer.html)
      • Configure and install:
         – Configure (follow instructions)
         – mysql> INSTALL PLUGIN stemming SONAME 'libmnogosearch.so';
         – CREATE TABLE my_table ( my_column TEXT, FULLTEXT(my_column) WITH PARSER stemming );
         – SELECT * FROM my_table WHERE MATCH (my_column) AGAINST ('test' IN BOOLEAN MODE);
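Once a parser plugin is installed, it can be checked, attached to an existing table, and removed again with standard MySQL 5.1 statements. The sketch below assumes the plugin name stemming from the slide above and a my_table that does not yet have a full text index; the index name ft_stem is hypothetical.

```sql
-- List installed plugins; a full text parser appears with Type = FTPARSER.
SHOW PLUGINS;

-- Add a stemming index to an existing table.
ALTER TABLE my_table
  ADD FULLTEXT INDEX ft_stem (my_column) WITH PARSER stemming;

-- To remove the plugin, first drop the indexes that depend on it.
ALTER TABLE my_table DROP INDEX ft_stem;
UNINSTALL PLUGIN stemming;
```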
  19. Example: MnogoSearch Stemming
      • Configuration: stemming.conf
         – MinWordLength 2
         – Spell en latin1 american.xlg
         – Affix en latin1 english.aff
      • Grab Ispell (not Aspell) dictionaries from http://lasr.cs.ucla.edu/geoff/ispell-dictionaries.html#English-dicts
      • Any change in stemming.conf requires a MySQL restart
  20. Example: MnogoSearch Stemming
      Stemming adds overhead on insert/update
      mysql> insert into searchindex_stemmer
          -> select * from enwiki.searchindex limit 10000;
      Query OK, 10000 rows affected (44.03 sec)
      mysql> insert into searchindex
          -> select * from enwiki.searchindex limit 10000;
      Query OK, 10000 rows affected (21.80 sec)
  21. Example: MnogoSearch Stemming
      mysql> SELECT count(*) FROM searchindex
          -> WHERE MATCH (si_text) AGAINST ('color' IN BOOLEAN MODE);
      count(*): 861
      mysql> SELECT count(*) FROM searchindex_stemmer
          -> WHERE MATCH (si_text) AGAINST ('color' IN BOOLEAN MODE);
      count(*): 1017
  22. Other Planned Full Text Features
      • Search for “FullText” at forge.mysql.com
      • CTYPE table for Unicode character sets (WL#1386), complete
      • Enable fulltext search for non-MyISAM engines (WL#2559), assigned
      • Stemming for fulltext (WL#2423), assigned
      • Combined BTREE/FULLTEXT indexes (WL#828)
      • Many other features, YOU can vote for some
  23. MySQL Full Text HowTo: Tips and Tricks
  24. DRBD and FullText Search (diagram)
      • Active DRBD server (InnoDB tables) and passive DRBD server (InnoDB tables), linked by synchronous block replication
      • Normal replication to the slaves
      • Slave: InnoDB tables
      • “FullText” slaves: MyISAM tables with FT indexes
      • Web/App server: full text requests
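One possible way to arrive at the layout in this diagram is to leave the master on InnoDB and convert only the replicated copy on each full text slave. The following is a sketch under stated assumptions: statement-based replication, the articles table from slide 9 stored as InnoDB on the master, no foreign keys on the table, and a hypothetical index name.

```sql
-- Run on a full text slave only, never on the master.
STOP SLAVE;

-- Convert the replicated copy to MyISAM (the master keeps InnoDB).
ALTER TABLE articles ENGINE = MyISAM;

-- Add the full text index that only this slave needs.
ALTER TABLE articles ADD FULLTEXT INDEX ft_title_body (title, body);

START SLAVE;
```

A later ALTER TABLE ... ENGINE = InnoDB issued on the master would be replicated and would undo the conversion, so schema changes need some care with this setup.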
  25. Using MySQL 5.1 as a Slave (diagram)
      • Master (MySQL 5.0)
      • Normal slave (MySQL 5.0): Web/App server, normal requests
      • Full text slave (MySQL 5.1): Web/App server, full text requests
  26. How To: Speed up MySQL FT Search
      Fit the index into memory!
         – Increase the amount of RAM
         – Set key_buffer = <total size of full text index>. Max 4GB!
         – Preload FT indexes into the buffer
      • Use additional key caches for FT indexes (to work around the 4GB limit)
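To judge whether the index fits into the key buffer, the index sizes can be read from information_schema. This is a rough sketch: mydb is a placeholder schema name, and index_length covers every index on a table, not only the FULLTEXT ones, so the figures are an upper bound.

```sql
-- Index size per MyISAM table in one schema, largest first, in MB.
SELECT table_name,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'mydb'        -- placeholder schema name
  AND engine = 'MyISAM'
ORDER BY index_length DESC;

-- Current size of the default key cache, in bytes.
SHOW VARIABLES LIKE 'key_buffer_size';
```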
  27. SpeedUp FullText: Preload FT indexes into buffer
      mysql> set global ft_key.key_buffer_size = 4*1024*1024*1024;
      mysql> CACHE INDEX S1, S2, <some other tables here> IN ft_key;
      mysql> LOAD INDEX INTO CACHE S1, S2 …;
  28. How To: Speed up MySQL FT Search
      • Manual partitioning
         – Partitioning will decrease index and table size
            • Search and updates will be faster
         – Need to change the application / no auto partitioning
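Manual partitioning could look like the sketch below. The split by region and the table names articles_east and articles_west are hypothetical; any key the application can compute before it runs the query would work the same way.

```sql
-- One table per partition, each with its own, smaller FULLTEXT index.
CREATE TABLE articles_east (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body TEXT,
  FULLTEXT (title, body)
) ENGINE=MyISAM;

-- LIKE copies the column definitions and the indexes.
CREATE TABLE articles_west LIKE articles_east;

-- The application chooses the table; MySQL does not route the query.
SELECT id, title
FROM articles_east
WHERE MATCH (title, body) AGAINST ('database');
```

The built-in partitioning of MySQL 5.1 cannot be used for this, because partitioned tables do not support FULLTEXT indexes.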
  29. How To: Speed up MySQL FT Search
      • Set up a number of slaves for search
         – Decreases the number of queries for each box
         – Decreases CPU usage (sorting is expensive)
         – Each slave can have its own data
      • Example: searches for the east coast go to slaves 1-5, searches for the west coast go to slaves 6-10
  30. FT Scale-Out with MySQL Replication (diagram)
      • Web/App server: write requests go to the master
      • Master replicates to Slave 1, Slave 2, Slave 3
      • Web/App server: full text requests go through a load balancer to the slaves
  31. Which queries are performance killers
      – Order by / Group by
         • Natural language mode: order by relevance
         • Boolean mode: no default sorting!
         • Order by date is much slower than no “order by”
      – SQL_CALC_FOUND_ROWS
         • Select SQL_CALC_FOUND_ROWS … from T1 … limit 10
         • Requires computing the whole result set
      – Other conditions in the where clause
         • MySQL can use either the FT index or another index, not both (in natural language mode, only the FT index)
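The patterns listed above can be sketched against the articles table from slide 9. The published column is hypothetical and is added only for this illustration, and the "fetch one extra row" alternative is one common workaround, not something the slides prescribe.

```sql
-- Hypothetical date column, for this sketch only.
ALTER TABLE articles ADD COLUMN published DATE;

-- Slow pattern 1: SQL_CALC_FOUND_ROWS makes MySQL find every match
-- even though only 10 rows are returned.
SELECT SQL_CALC_FOUND_ROWS id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('+mysql' IN BOOLEAN MODE)
LIMIT 10;
SELECT FOUND_ROWS();

-- Possible alternative: fetch one row more than the page size and use
-- the extra row only to decide whether a "next page" link is needed.
SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('+mysql' IN BOOLEAN MODE)
LIMIT 11;

-- Slow pattern 2: sorting by another column means all matches have to
-- be collected and sorted before the first 10 can be returned.
SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('+mysql' IN BOOLEAN MODE)
ORDER BY published DESC
LIMIT 10;
```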
  32. Real World Example: Performance Killer
      Why is it so slow?
      SELECT … FROM `ft`
      WHERE MATCH (`album`) AGAINST ('the way i am')
  33. Real World Example: Performance Killer
      Note the stopword list and ft_min_word_len!
      my.cnf: ft_min_word_len = 1
         The - stopword
         Way - stopword
         I   - not a stopword
         Am  - stopword
      With the standard stopword list and ft_min_word_len = 1 in my.cnf, the query “the way i am” has every word filtered out except “i”.
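The settings involved can be inspected from the client, and the index has to be rebuilt after they change. This is a sketch that reuses the table name ft from the example above; whether to change ft_min_word_len or the stopword list at all depends on the data and is not something the slides decide.

```sql
-- Show the current full text settings
-- (ft_min_word_len, ft_max_word_len, ft_stopword_file, ...).
SHOW VARIABLES LIKE 'ft%';

-- ft_min_word_len and ft_stopword_file are read at server startup, so
-- they are changed in the [mysqld] section of my.cnf, for example:
--   ft_min_word_len  = 1
--   ft_stopword_file = ''
-- An empty ft_stopword_file disables stopword filtering, which makes
-- all four words of "the way i am" searchable but enlarges the index.

-- After restarting the server, rebuild the FULLTEXT indexes so they
-- reflect the new settings.
REPAIR TABLE ft QUICK;
```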
  34. How To: Search with error correction
      • Example: Music Search Engine
         – Search for music titles/actors
         – Need to correct users’ typos
      • Bob Dilane (user made typos) -> Bob Dylan (corrected)
      • Solution:
         – Use the soundex() MySQL function
      • Soundex = sounds similar
      mysql> select soundex('Dilane');
      D450
      mysql> select soundex('Dylan');
      D450
  35. Search with error corrections
      • Implementation
         1. Alter table artists add art_name_sndex varchar(80)
         2. Update artists set art_name_sndex = soundex(art_name)
         3. Select art_name from artists where art_name_sndex = soundex('Bob Dilane') limit 10
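The three steps above leave two practical gaps: the new column is not indexed, and it goes stale when rows are added or renamed. One way to close both is sketched below; the index and trigger names are hypothetical, and triggers require MySQL 5.0 or later.

```sql
-- Index the Soundex column so the lookup does not scan the whole table.
ALTER TABLE artists ADD INDEX idx_art_name_sndex (art_name_sndex);

-- Keep the column in sync for new and changed rows.
CREATE TRIGGER artists_sndex_bi BEFORE INSERT ON artists
FOR EACH ROW SET NEW.art_name_sndex = SOUNDEX(NEW.art_name);

CREATE TRIGGER artists_sndex_bu BEFORE UPDATE ON artists
FOR EACH ROW SET NEW.art_name_sndex = SOUNDEX(NEW.art_name);

-- Both spellings produce the same code, so the typo still finds the
-- artist: same_code is 1.
SELECT SOUNDEX('Bob Dilane') = SOUNDEX('Bob Dylan') AS same_code;
```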
  36. Search with error corrections: Sorting
      • Popularity of the artist
         – Select art_name from artists where art_name_sndex = soundex('Dilane') order by popularity limit 10
      • Most similar matches first
         – Order by Levenshtein distance
         – The Levenshtein distance between two strings = minimum number of operations needed to transform one string into the other
         – Levenshtein can be done by a stored function or a UDF
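A stored-function version of the Levenshtein distance might look like the sketch below. It is not the presenter's implementation: it is a plain two-row dynamic-programming variant that keeps each row as a string of 3-digit, zero-padded cells, because stored functions have no arrays. The function name and the 100-character limit are assumptions, and DELIMITER is a mysql command-line client command rather than SQL.

```sql
DELIMITER $$
CREATE FUNCTION levenshtein(s1 VARCHAR(100), s2 VARCHAR(100))
RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE len1 INT DEFAULT CHAR_LENGTH(s1);
  DECLARE len2 INT DEFAULT CHAR_LENGTH(s2);
  DECLARE i INT DEFAULT 1;
  DECLARE j INT DEFAULT 0;
  DECLARE prev_row TEXT DEFAULT '';  -- row i-1, cell j at position j*3+1
  DECLARE cur_row TEXT;              -- row i, built left to right
  DECLARE up_cost INT;
  DECLARE left_cost INT;
  DECLARE diag_cost INT;

  -- Row 0: turning the empty string into the first j characters of s2.
  WHILE j <= len2 DO
    SET prev_row = CONCAT(prev_row, LPAD(j, 3, '0'));
    SET j = j + 1;
  END WHILE;

  WHILE i <= len1 DO
    SET cur_row = LPAD(i, 3, '0');   -- column 0: delete i characters
    SET j = 1;
    WHILE j <= len2 DO
      -- deletion: cell (i-1, j) + 1
      SET up_cost = CAST(SUBSTRING(prev_row, j * 3 + 1, 3) AS UNSIGNED) + 1;
      -- insertion: cell (i, j-1) + 1
      SET left_cost = CAST(SUBSTRING(cur_row, (j - 1) * 3 + 1, 3) AS UNSIGNED) + 1;
      -- substitution: cell (i-1, j-1), plus 1 if the characters differ
      SET diag_cost = CAST(SUBSTRING(prev_row, (j - 1) * 3 + 1, 3) AS UNSIGNED)
                      + IF(SUBSTRING(s1, i, 1) = SUBSTRING(s2, j, 1), 0, 1);
      SET cur_row = CONCAT(cur_row, LPAD(LEAST(up_cost, left_cost, diag_cost), 3, '0'));
      SET j = j + 1;
    END WHILE;
    SET prev_row = cur_row;
    SET i = i + 1;
  END WHILE;

  -- The last cell of the last row is the distance.
  RETURN CAST(SUBSTRING(prev_row, len2 * 3 + 1, 3) AS UNSIGNED);
END$$
DELIMITER ;

-- One substitution (i -> y) and one deletion (e): returns 2.
SELECT levenshtein('Dilane', 'Dylan');

-- Soundex narrows the candidates, Levenshtein ranks them.
SELECT art_name
FROM artists
WHERE art_name_sndex = SOUNDEX('Bob Dilane')
ORDER BY levenshtein(art_name, 'Bob Dilane')
LIMIT 10;
```

The character comparison follows the collation of the arguments, so with a default case-insensitive collation 'dylan' and 'Dylan' have distance 0. Because the function runs once per candidate row, it is best applied after the Soundex filter has reduced the set.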
  37. Sphinx Search
      • Features
         – Open Source, http://www.sphinxsearch.com
         – Fast searches over large data sets
         – Designed for indexing database content
         – Supports multi-node clustering out of the box, multiple attributes (date, price, etc)
         – Different sort modes (relevance, date, etc)
         – Client available as MySQL Storage Engine plugin
         – Fast index creation (up to 10 MB/sec)
      • Disadvantages:
         – External solution: needs to be integrated
         – No online index updates (have to rebuild the whole index)
  38. How to configure Sphinx with MySQL
      • The Sphinx engine/plugin is not a full engine:
         – Still need to run the “searcher” daemon
      • Need to compile the MySQL source with Sphinx to integrate it
         – MySQL 5.0: need to patch the source code
         – MySQL 5.1: no need to patch, copy the Sphinx plugin to the plugin dir
  39. How to Integrate Sphinx with MySQL
      • Sphinx can be MySQL’s storage engine
      CREATE TABLE t1 (
        id INTEGER NOT NULL,
        weight INTEGER NOT NULL,
        query VARCHAR(3072) NOT NULL,
        group_id INTEGER,
        INDEX(query)
      ) ENGINE=SPHINX CONNECTION="sphinx://localhost:3312/enwiki";
      SELECT * FROM enwiki.searchindex docs
      JOIN test.t1 ON (docs.si_page=t1.id)
      WHERE query="one document;mode=any" limit 1;
  40. Time for questions
      Questions?
      www.mysqlfulltextsearch.com