Full Text Search     Throwdown  Bill Karwin, Percona Inc.
In a full text                                   search, the search                                   engine examines     ...
StackOverflow Test Data• Data dump, exported December 2011• 7.4 million Posts = 8.18 GB                                   ...
StackOverflow ER diagramsearchable   text                      www.percona.com
The Baseline:Naive Search Predicates
Some people, when confronted with a problem, think	 “I know, I’ll use regular expressions.”	 Now they have two problems.	 ...
Accuracy issue• Irrelevant or false matching words ‘one’, ‘money’,   ‘prone’, etc.:     SELECT * FROM Posts      WHERE Bod...
Performance issue• LIKE with wildcards:  ! SELECT * FROM Posts                                        49 sec    WHERE titl...
Why so slow?CREATE TABLE TelephoneBook (  ! FullName VARCHAR(50));CREATE INDEX name_idx ON TelephoneBook  ! (FullName);INS...
Why so slow?• Search for all with last name “Thomas”                                                uses index    SELECT *...
Because:    B-Tree indexes can’t☞   search for substrings                      www.percona.com
• FULLTEXT in MyISAM• FULLTEXT in InnoDB• Apache Solr• Sphinx Search• Trigraphs
FULLTEXTin MyISAM
FULLTEXT Index with MyISAM• Special index type for MyISAM• Integrated with SQL queries• Indexes always in sync with data• ...
Insert Data into Index (MyISAM)mysql> INSERT INTO Posts SELECT * FROM PostsSource;          time: 33 min, 34 sec          ...
Build Index on Data (MyISAM)mysql> CREATE FULLTEXT INDEX PostText ! ON Posts(title, body, tags);          time: 31 min, 18...
QueryingSELECT * FROM Posts WHERE MATCH(column(s)) AGAINST(query pattern);                      must include all columns  ...
Natural Language Mode (MyISAM)• Searches concepts with free text queries:  ! SELECT * FROM Posts    WHERE MATCH(title, bod...
Query Profile:     Natural Language Mode (MyISAM)+-------------------------+----------+| Status                  | Duratio...
Boolean Mode (MyISAM)• Searches words using mini-language:  ! SELECT * FROM Posts    WHERE MATCH(title, body, tags)    AGA...
Query Profile:            Boolean Mode (MyISAM)+-------------------------+----------+| Status                  | Duration ...
FULLTEXTin InnoDB
FULLTEXT Index with InnoDB• Under development in MySQL 5.6  • I’m testing 5.6.6 m1• Usage very similar to FULLTEXT in MyIS...
Insert Data into Index (InnoDB)mysql> INSERT INTO Posts SELECT * FROM PostsSource;          time: 55 min 46 sec           ...
Build Index on Data (InnoDB)• Still under development; you might see problems:mysql> CREATE FULLTEXT INDEX PostText ! ON P...
Build Index on Data (InnoDB)• Solution: make sure you define a primary key   column `FTS_DOC_ID` explicitly:mysql> ALTER T...
Natural Language Mode (InnoDB)• Searches concepts with free text queries:  ! SELECT * FROM Posts    WHERE MATCH(title, bod...
Query Profile:     Natural Language Mode (InnoDB)+-------------------------+----------+| Status                  | Duratio...
Boolean Mode (InnoDB)• Searches words using mini-language:  ! SELECT * FROM Posts    WHERE MATCH(title, body, tags)    AGA...
Query Profile:             Boolean Mode (InnoDB)+-------------------------+----------+| Status                  | Duration...
Apache Solr
Apache Solr• http://lucene.apache.org/solr/• Formerly known as Lucene, started 2001• Apache License• Java implementation• ...
DataImportHandler• conf/solrconfig.xml:. . .<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.D...
DataImportHandler• conf/data-config.xml: <dataConfig>   <dataSource type="JdbcDataSource"                driver="com.mysql...
DataImportHandler•conf/schema.xml:. . .<fields>  <field name="PostId" type="string" indexed="true" stored="true" required=...
Insert Data into Index (Solr)• http://localhost:8983/solr/dataimport?   command=full-import              time: 14 min 28 s...
Searching Solr• http://localhost:8983/solr/select/?q=mysql+AND   +performance              time: 79ms           Query resu...
Sphinx Search
Sphinx Search• http://sphinxsearch.com/• Started in 2001• GPLv2 license• C++ implementation• SphinxSE storage engine for M...
sphinx.confsource src1{! type = mysql! sql_host = localhost! sql_user = xxxx! sql_pass = xxxx! sql_db = testpattern! sql_q...
sphinx.conf! index test1  {  ! source = src1  ! path = C:Sphinxdata  }                             www.percona.com
Insert Data into Index (Sphinx)!   C:Sphinx> indexer.exe -c sphinx.conf.in --verbose test1    Sphinx 2.0.5-release (r3309)...
Querying index$ mysql --port 9306Server version: 2.0.5-release (r3309)mysql> SELECT * FROM test1 WHERE MATCH(mysql perform...
Querying indexmysql> SHOW META;+---------------+-------------+| Variable_name | Value       |+---------------+------------...
Trigraphs
Trigraphs Overview• Not very fast, but still better than LIKE / RLIKE• Generic, portable SQL solution• No dependency on ve...
Three-Letter Sequences! CREATE TABLE AtoZ (  !   c! ! CHAR(1),  !   PRIMARY KEY (c));! INSERT INTO AtoZ (c)  VALUES (a), (...
Insert Data Into Indexmy $sth = $dbh1->prepare("SELECT * FROM Posts") or die $dbh1->errstr;$sth->execute() or die $dbh1->e...
Indexed Lookups! SELECT p.*                        time: 46 sec  FROM Posts p  JOIN PostsTrigraph t1 ON  ! t1.PostId = p.P...
Search Among Fewer Matches! SELECT p.*                        time: 19 sec  FROM Posts p  JOIN PostsTrigraph t1 ON  ! t1.P...
Search Among Fewer Matches! SELECT p.*                        time: 22 sec  FROM Posts p  JOIN PostsTrigraph t1 ON  ! t1.P...
Search Among Fewer Matches! SELECT p.*                          time: 13.6 sec  FROM Posts p  JOIN PostsTrigraph t1 ON  ! ...
Narrow Down Further! SELECT p.*                                    time: 13.8 sec  FROM Posts p  JOIN PostsTrigraph t1 ON ...
And the winner is...     Jarrett Campbell     http://www.flickr.com/people/77744839@N00
Time to Insert Data into IndexLIKE expression   n/aFULLTEXT MyISAM 33 min, 34 secFULLTEXT InnoDB   55 min, 46 secApache So...
Insert Data into Index (sec)8000600040002000   0       LIKE   MyISAM InnoDB   Solr   Sphinx   Trigraph                    ...
Time to Build Index on DataLIKE expression   n/aFULLTEXT MyISAM 31 min, 18 secFULLTEXT InnoDB   25 min, 27 secApache Solr ...
Build Index on Data (sec)4000300020001000                               n/a     n/a         n/a   0        LIKE   MyISAM I...
Index StorageLIKE expression     n/aFULLTEXT MyISAM 2382 MiBFULLTEXT InnoDB     ? MiBApache Solr         2766 MiBSphinx Se...
Index Storage (MiB)200001500010000 5000    0        LIKE   MyISAM InnoDB   Solr   Sphinx   Trigraph                       ...
Query SpeedLIKE expression    49,000ms - 399,000msFULLTEXT MyISAM 16-200msFULLTEXT InnoDB    350-740msApache Solr        7...
Query Speed (ms)400000350000300000250000200000150000100000 50000     0         LIKE   MyISAM InnoDB   Solr   Sphinx   Trig...
1000          Query Speed (ms) 750 500 250   0       LIKE   MyISAM InnoDB   Solr   Sphinx   Trigraph                      ...
Bottom Line                    build   insert   storage     query      solution                                           ...
Final Thoughts• Third-party search engines are complex to keep in   sync with data, and adding another type of   server ad...
New York, October 1-2, 2012London, December 3-4, 2012Santa Clara, April 22-25, 2013                  www.percona.com/live
Expert instructors        In-person training      Custom onsite training       Live virtual traininghttp://www.percona.com...
http://www.pragprog.com/titles/bksqla/                                www.percona.com
Copyright 2012 Bill Karwin           www.slideshare.net/billkarwin          Released under a Creative Commons 3.0 License:...
Upcoming SlideShare
Loading in...5
×

Full Text Search Throwdown

52,592

Published on

Odds are if you develop database applications you have been asked to make a large table of textual data searchable. How can we do this most reliably and efficiently? Bill compares a range of techniques to search text data with MySQL, with a focus on performance.
- Ad hoc searches with the LIKE predicate or regular expressions.
- MySQL FULLTEXT index.
- InnoDB FULLTEXT index in MySQL 5.6.
- Apache Solr search engine
- Sphinx search engine.
- Trigraphs.

Published in: Technology
9 Comments
78 Likes
Statistics
Notes
  • i see so much of contrast when i compared the above document results with the results at https://blogs.oracle.com/mysqlinnodb/entry/performance_enhancement_in_full_text
    @billkarwin, please take a look.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • I can't find the definition of PostsTrigraph in the slideshow. The columns are obvious of course, but I'm interested on the indexing you used.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Best one
    Hope you are in good health. My name is AMANDA . I am a single girl, Am looking for reliable and honest person. please have a little time for me. Please reach me back amanda_n14144@yahoo.com so that i can explain all about myself .
    Best regards AMANDA.
    amanda_n14144@yahoo.com
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Not Work with Arabic (utf8_general_ci ) Text ??!!
    Help Please
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • thank you guys
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
52,592
On Slideshare
0
From Embeds
0
Number of Embeds
25
Actions
Shares
0
Downloads
1,074
Comments
9
Likes
78
Embeds 0
No embeds

No notes for slide

Full Text Search Throwdown

  1. 1. Full Text Search Throwdown Bill Karwin, Percona Inc.
  2. 2. In a full text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user.http://www.flickr.com/photos/tryingyouth/
  3. 3. StackOverflow Test Data• Data dump, exported December 2011• 7.4 million Posts = 8.18 GB www.percona.com
  4. 4. StackOverflow ER diagramsearchable text www.percona.com
  5. 5. The Baseline:Naive Search Predicates
  6. 6. Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. — Jamie Zawinsky www.percona.com
  7. 7. Accuracy issue• Irrelevant or false matching words ‘one’, ‘money’, ‘prone’, etc.: SELECT * FROM Posts WHERE Body LIKE %one%• Regular expressions in MySQL support escapes for word boundaries: SELECT * FROM Posts WHERE Body RLIKE [[:<:]]one[[:>:]] www.percona.com
  8. 8. Performance issue• LIKE with wildcards: ! SELECT * FROM Posts 49 sec WHERE title LIKE %performance% ! OR body LIKE %performance% ! OR tags LIKE %performance%;• POSIX regular expressions: ! SELECT * FROM Posts 7 min 57 sec WHERE title RLIKE [[:<:]]performance[[:>:]] ! OR body RLIKE [[:<:]]performance[[:>:]] ! OR tags RLIKE [[:<:]]performance[[:>:]]; www.percona.com
  9. 9. Why so slow?CREATE TABLE TelephoneBook ( ! FullName VARCHAR(50));CREATE INDEX name_idx ON TelephoneBook ! (FullName);INSERT INTO TelephoneBook VALUES ! (Riddle, Thomas), ! (Thomas, Dean); www.percona.com
  10. 10. Why so slow?• Search for all with last name “Thomas” uses index SELECT * FROM telephone_book WHERE full_name LIKE Thomas%• Search for all with first name “Thomas” SELECT * FROM telephone_book WHERE full_name LIKE %Thomas can’t use index www.percona.com
  11. 11. Because: B-Tree indexes can’t☞ search for substrings www.percona.com
  12. 12. • FULLTEXT in MyISAM• FULLTEXT in InnoDB• Apache Solr• Sphinx Search• Trigraphs
  13. 13. FULLTEXTin MyISAM
  14. 14. FULLTEXT Index with MyISAM• Special index type for MyISAM• Integrated with SQL queries• Indexes always in sync with data• Balances features vs. speed vs. space www.percona.com
  15. 15. Insert Data into Index (MyISAM)mysql> INSERT INTO Posts SELECT * FROM PostsSource; time: 33 min, 34 sec www.percona.com
  16. 16. Build Index on Data (MyISAM)mysql> CREATE FULLTEXT INDEX PostText ! ON Posts(title, body, tags); time: 31 min, 18 sec www.percona.com
  17. 17. QueryingSELECT * FROM Posts WHERE MATCH(column(s)) AGAINST(query pattern); must include all columns of your index, in the order you defined www.percona.com
  18. 18. Natural Language Mode (MyISAM)• Searches concepts with free text queries: ! SELECT * FROM Posts WHERE MATCH(title, body, tags ) AGAINST(mysql performance IN NATURAL LANGUAGE MODE) LIMIT 100; time with index: 200 milliseconds www.percona.com
  19. 19. Query Profile: Natural Language Mode (MyISAM)+-------------------------+----------+| Status | Duration |+-------------------------+----------+| starting | 0.000068 || checking permissions | 0.000006 || Opening tables | 0.000017 || init | 0.000032 || System lock | 0.000007 || optimizing | 0.000007 || statistics | 0.000018 || preparing | 0.000006 || FULLTEXT initialization | 0.198358 || executing | 0.000012 || Sending data | 0.001921 || end | 0.000005 || query end | 0.000003 || closing tables | 0.000018 || freeing items | 0.000341 || cleaning up | 0.000012 |+-------------------------+----------+ www.percona.com
  20. 20. Boolean Mode (MyISAM)• Searches words using mini-language: ! SELECT * FROM Posts WHERE MATCH(title, body, tags) AGAINST(+mysql +performance IN BOOLEAN MODE) LIMIT 100; time with index: 16 milliseconds www.percona.com
  21. 21. Query Profile: Boolean Mode (MyISAM)+-------------------------+----------+| Status | Duration |+-------------------------+----------+| starting | 0.000031 || checking permissions | 0.000003 || Opening tables | 0.000008 || init | 0.000017 || System lock | 0.000004 || optimizing | 0.000004 || statistics | 0.000008 || preparing | 0.000003 || FULLTEXT initialization | 0.000008 || executing | 0.000001 || Sending data | 0.015703 || end | 0.000004 || query end | 0.000002 || closing tables | 0.000007 || freeing items | 0.000381 || cleaning up | 0.000007 |+-------------------------+----------+ www.percona.com
  22. 22. FULLTEXTin InnoDB
  23. 23. FULLTEXT Index with InnoDB• Under development in MySQL 5.6 • I’m testing 5.6.6 m1• Usage very similar to FULLTEXT in MyISAM• Integrated with SQL queries• Indexes always* in sync with data• Read the blogs for more details: • http://blogs.innodb.com/wp/2011/07/overview-and-getting-started-with-innodb-fts/ • http://blogs.innodb.com/wp/2011/07/innodb-full-text-search-tutorial/ • http://blogs.innodb.com/wp/2011/07/innodb-fts-performance/ • http://blogs.innodb.com/wp/2011/07/difference-between-innodb-fts-and-myisam-fts/ www.percona.com
  24. 24. Insert Data into Index (InnoDB)mysql> INSERT INTO Posts SELECT * FROM PostsSource; time: 55 min 46 sec www.percona.com
  25. 25. Build Index on Data (InnoDB)• Still under development; you might see problems:mysql> CREATE FULLTEXT INDEX PostText ! ON Posts(title, body, tags);ERROR 2013 (HY000): Lost connection to MySQL server during query www.percona.com
  26. 26. Build Index on Data (InnoDB)• Solution: make sure you define a primary key column `FTS_DOC_ID` explicitly:mysql> ALTER TABLE Posts CHANGE COLUMN PostId `FTS_DOC_ID` BIGINT UNSIGNED;mysql> CREATE FULLTEXT INDEX PostText ! ON Posts(title, body, tags); time: 25 min 27 sec www.percona.com
  27. 27. Natural Language Mode (InnoDB)• Searches concepts with free text queries: ! SELECT * FROM Posts WHERE MATCH(title, body, tags) AGAINST(mysql performance IN NATURAL LANGUAGE MODE) LIMIT 100; time with index: 740 milliseconds www.percona.com
  28. 28. Query Profile: Natural Language Mode (InnoDB)+-------------------------+----------+| Status | Duration |+-------------------------+----------+| starting | 0.000074 || checking permissions | 0.000007 || Opening tables | 0.000020 || init | 0.000034 || System lock | 0.000007 || optimizing | 0.000009 || statistics | 0.000020 || preparing | 0.000008 || FULLTEXT initialization | 0.577257 || executing | 0.000013 || Sending data | 0.106279 || end | 0.000018 || query end | 0.000012 || closing tables | 0.000018 || freeing items | 0.055584 || cleaning up | 0.000039 |+-------------------------+----------+ www.percona.com
  29. 29. Boolean Mode (InnoDB)• Searches words using mini-language: ! SELECT * FROM Posts WHERE MATCH(title, body, tags) AGAINST(+mysql +performance IN BOOLEAN MODE) LIMIT 100; time with index: 350 milliseconds www.percona.com
  30. 30. Query Profile: Boolean Mode (InnoDB)+-------------------------+----------+| Status | Duration |+-------------------------+----------+| starting | 0.000064 || checking permissions | 0.000005 || Opening tables | 0.000017 || init | 0.000047 || System lock | 0.000007 || optimizing | 0.000009 || statistics | 0.000019 || preparing | 0.000008 || FULLTEXT initialization | 0.347172 || executing | 0.000014 || Sending data | 0.008089 || end | 0.000011 || query end | 0.000012 || closing tables | 0.000015 || freeing items | 0.001570 || cleaning up | 0.000023 |+-------------------------+----------+ www.percona.com
  31. 31. Apache Solr
  32. 32. Apache Solr• http://lucene.apache.org/solr/• Formerly known as Lucene, started 2001• Apache License• Java implementation• Web service architecture• Many sophisticated search features www.percona.com
  33. 33. DataImportHandler• conf/solrconfig.xml:. . .<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">data-config.xml</str> </lst></requestHandler>. . . www.percona.com
  34. 34. DataImportHandler• conf/data-config.xml: <dataConfig> <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/testpattern?useUnicode=true" batchSize="-1" user="xxxx" password="xxxx"/> <document> <entity name="id" query="SELECT PostId, ParentId, Title, Body, Tags FROM Posts"> </entity> </document> </dataConfig> extremely important to avoid buffering the whole query result! www.percona.com
  35. 35. DataImportHandler•conf/schema.xml:. . .<fields> <field name="PostId" type="string" indexed="true" stored="true" required="true" /> <field name="ParentId" type="string" indexed="true" stored="true" required="false" /> <field name="Title" type="text_general" indexed="false" stored="false" required="false" /> <field name="Body" type="text_general" indexed="false" stored="false" required="false" / > <field name="Tags" type="text_general" indexed="false" stored="false" required="false" / > <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/ ><fields><uniqueKey>PostId</uniqueKey><defaultSearchField>text</defaultSearchField><copyField source="Title" dest="text"/><copyField source="Body" dest="text"/><copyField source="Tags" dest="text"/>. . . www.percona.com
  36. 36. Insert Data into Index (Solr)• http://localhost:8983/solr/dataimport? command=full-import time: 14 min 28 sec www.percona.com
  37. 37. Searching Solr• http://localhost:8983/solr/select/?q=mysql+AND +performance time: 79ms Query results are cached (like MySQL Query Cache), so they return much faster on subsequent execution www.percona.com
  38. 38. Sphinx Search
  39. 39. Sphinx Search• http://sphinxsearch.com/• Started in 2001• GPLv2 license• C++ implementation• SphinxSE storage engine for MySQL• Supports MySQL protocol, SQL-like queries• Many sophisticated search features www.percona.com
  40. 40. sphinx.confsource src1{! type = mysql! sql_host = localhost! sql_user = xxxx! sql_pass = xxxx! sql_db = testpattern! sql_query = SELECT PostId, ParentId, Title,! ! Body, Tags FROM Posts! sql_query_info = SELECT * FROM Posts ! ! WHERE PostId=$id} www.percona.com
  41. 41. sphinx.conf! index test1 { ! source = src1 ! path = C:Sphinxdata } www.percona.com
  42. 42. Insert Data into Index (Sphinx)! C:Sphinx> indexer.exe -c sphinx.conf.in --verbose test1 Sphinx 2.0.5-release (r3309) using config file sphinx.conf... indexing index test1... collected 7397507 docs, 5731.8 MB time: 8 min 20 sec sorted 920.3 Mhits, 100.0% done total 7397507 docs, 5731776959 bytes total 500.149 sec, 11460138 bytes/sec, 14790.60 docs/sec total 11 reads, 15.898 sec, 314584.8 kb/call avg, 1445.3 msec/call avg total 542 writes, 3.129 sec, 12723.3 kb/call avg, 5.7 msec/ call avg Execution time: 500.196 s www.percona.com
  43. 43. Querying index$ mysql --port 9306Server version: 2.0.5-release (r3309)mysql> SELECT * FROM test1 WHERE MATCH(mysql performance);+---------+--------+| id | weight |+---------+--------+| 6016856 | 6600 || 4207641 | 6595 || 2656325 | 6593 || 7192928 | 5605 || 8118235 | 5598 |. . .20 rows in set (0.02 sec) www.percona.com
  44. 44. Querying indexmysql> SHOW META;+---------------+-------------+| Variable_name | Value |+---------------+-------------+| total | 1000 || total_found | 7672 || time | 0.013 || keyword[0] | mysql || docs[0] | 162287 | time: 13ms| hits[0] | 363694 || keyword[1] | performance || docs[1] | 147249 || hits[1] | 210895 |+---------------+-------------+ www.percona.com
  45. 45. Trigraphs
  46. 46. Trigraphs Overview• Not very fast, but still better than LIKE / RLIKE• Generic, portable SQL solution• No dependency on version, storage engine, third- party technology www.percona.com
  47. 47. Three-Letter Sequences! CREATE TABLE AtoZ ( ! c! ! CHAR(1), ! PRIMARY KEY (c));! INSERT INTO AtoZ (c) VALUES (a), (b), (c), ...! CREATE TABLE Trigraphs ( ! Tri! ! CHAR(3), ! PRIMARY KEY (Tri));! INSERT INTO Trigraphs (Tri) SELECT CONCAT(t1.c, t2.c, t3.c) FROM AtoZ t1 JOIN AtoZ t2 JOIN AtoZ t3; www.percona.com
  48. 48. Insert Data Into Indexmy $sth = $dbh1->prepare("SELECT * FROM Posts") or die $dbh1->errstr;$sth->execute() or die $dbh1->errstr;$dbh2->begin_work;my $i = 0;while (my $row = $sth->fetchrow_hashref ) { my $text = lc(join(|, ($row->{title}, $row->{body}, $row->{tags}))); my %tri; map($tri{$_}=1, ( $text =~ m/[[:alpha:]]{3}/g )); next unless %tri; my $tuple_list = join(",", map("($_,$row->{postid})", keys %tri)); my $sql = "INSERT IGNORE INTO PostsTrigraph (tri, PostId) VALUES $tuple_list"; $dbh2->do($sql) or die "SQL = $sql, ".$dbh2->errstr; if (++$i % 1000 == 0) { print "."; $dbh2->commit; $dbh2->begin_work; time: 116 min 50 sec} } space: 16.2GiBprint ".n";$dbh2->commit; rows: 519 million www.percona.com
  49. 49. Indexed Lookups! SELECT p.* time: 46 sec FROM Posts p JOIN PostsTrigraph t1 ON ! t1.PostId = p.PostId AND t1.Tri = mys ! www.percona.com
  50. 50. Search Among Fewer Matches! SELECT p.* time: 19 sec FROM Posts p JOIN PostsTrigraph t1 ON ! t1.PostId = p.PostId AND t1.Tri = mys JOIN PostsTrigraph t2 ON ! t2.PostId = p.PostId AND t2.Tri = per www.percona.com
  51. 51. Search Among Fewer Matches! SELECT p.* time: 22 sec FROM Posts p JOIN PostsTrigraph t1 ON ! t1.PostId = p.PostId AND t1.Tri = mys JOIN PostsTrigraph t2 ON ! t2.PostId = p.PostId AND t2.Tri = per JOIN PostsTrigraph t3 ON ! t3.PostId = p.PostId AND t3.Tri = for www.percona.com
  52. 52. Search Among Fewer Matches! SELECT p.* time: 13.6 sec FROM Posts p JOIN PostsTrigraph t1 ON ! t1.PostId = p.PostId AND t1.Tri = mys JOIN PostsTrigraph t2 ON ! t2.PostId = p.PostId AND t2.Tri = per JOIN PostsTrigraph t3 ON ! t3.PostId = p.PostId AND t3.Tri = for JOIN PostsTrigraph t4 ON ! t4.PostId = p.PostId AND t4.Tri = man www.percona.com
  53. 53. Narrow Down Further! SELECT p.* time: 13.8 sec FROM Posts p JOIN PostsTrigraph t1 ON ! t1.PostId = p.PostId AND t1.Tri = mys JOIN PostsTrigraph t2 ON ! t2.PostId = p.PostId AND t2.Tri = per JOIN PostsTrigraph t3 ON ! t3.PostId = p.PostId AND t3.Tri = for JOIN PostsTrigraph t4 ON ! t4.PostId = p.PostId AND t4.Tri = man WHERE CONCAT(p.title,p.body,p.tags) LIKE %mysql% ! AND CONCAT(p.title,p.body,p.tags) LIKE %performance%; www.percona.com
  54. 54. And the winner is... Jarrett Campbell http://www.flickr.com/people/77744839@N00
  55. 55. Time to Insert Data into IndexLIKE expression n/aFULLTEXT MyISAM 33 min, 34 secFULLTEXT InnoDB 55 min, 46 secApache Solr 14 min, 28 secSphinx Search 8 min, 20 secTrigraphs 116 min, 50 sec www.percona.com
  56. 56. Insert Data into Index (sec)8000600040002000 0 LIKE MyISAM InnoDB Solr Sphinx Trigraph www.percona.com
  57. 57. Time to Build Index on DataLIKE expression n/aFULLTEXT MyISAM 31 min, 18 secFULLTEXT InnoDB 25 min, 27 secApache Solr n/aSphinx Search n/aTrigraphs n/a www.percona.com
  58. 58. Build Index on Data (sec)4000300020001000 n/a n/a n/a 0 LIKE MyISAM InnoDB Solr Sphinx Trigraph www.percona.com
  59. 59. Index StorageLIKE expression n/aFULLTEXT MyISAM 2382 MiBFULLTEXT InnoDB ? MiBApache Solr 2766 MiBSphinx Search 3355 MiBTrigraphs 16589 MiB www.percona.com
  60. 60. Index Storage (MiB)200001500010000 5000 0 LIKE MyISAM InnoDB Solr Sphinx Trigraph www.percona.com
  61. 61. Query SpeedLIKE expression 49,000ms - 399,000msFULLTEXT MyISAM 16-200msFULLTEXT InnoDB 350-740msApache Solr 79msSphinx Search 13msTrigraphs 13800ms www.percona.com
  62. 62. Query Speed (ms)400000350000300000250000200000150000100000 50000 0 LIKE MyISAM InnoDB Solr Sphinx Trigraph www.percona.com
  63. 63. 1000 Query Speed (ms) 750 500 250 0 LIKE MyISAM InnoDB Solr Sphinx Trigraph www.percona.com
  64. 64. Bottom Line build insert storage query solution 49k-399kLIKE expression 0 0 0 ms SQLFULLTEXT MyISAM 31:18 33:28 2382MiB 16-200ms MySQLFULLTEXT InnoDB 25:27 55:46 ? 350-740ms MySQL 5.6Apache Solr n/a 14:28 2766MiB 79ms JavaSphinx Search n/a 8:20 3487MiB 13ms C++Trigraphs n/a 116:50 16.2 GiB 13,800ms SQL www.percona.com
  65. 65. Final Thoughts• Third-party search engines are complex to keep in sync with data, and adding another type of server adds more operations work for you.• Built-in FULLTEXT indexes are therefore useful even if they are not absolutely the fastest.• Different search implementations may return different results, so you should evaluate what works best for your project.• Any indexed search solution is orders of magnitude better than LIKE! www.percona.com
  66. 66. New York, October 1-2, 2012London, December 3-4, 2012Santa Clara, April 22-25, 2013 www.percona.com/live
  67. 67. Expert instructors In-person training Custom onsite training Live virtual traininghttp://www.percona.com/training www.percona.com
  68. 68. http://www.pragprog.com/titles/bksqla/ www.percona.com
  69. 69. Copyright 2012 Bill Karwin www.slideshare.net/billkarwin Released under a Creative Commons 3.0 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ You are free to share - to copy, distribute and transmit this work, under the following conditions: Attribution. Noncommercial. No Derivative Works.You must attribute this You may not use this work You may not alter, work to Bill Karwin. for commercial purposes. transform, or build upon this work. www.percona.com
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×