Full-Text Search Implementation in Databases
24 November 2020
prepared by Dony Riyanto
Text Searching
• Searching text in a database poses its own particular problems
• A better text-search solution is needed
• Full-text search is one such solution
Sample data
• StackOverflow.com Posts
• Data dump exported September 2009
• 1.2 million tuples
• ~850 MB
StackOverflow ER diagram
Performance issue
• LIKE with wildcards (time: 91 sec):
SELECT * FROM Posts
WHERE body LIKE '%postgresql%';
• POSIX regular expressions (time: 105 sec):
SELECT * FROM Posts
WHERE body ~ 'postgresql';
Why so slow?
CREATE TABLE telephone_book (
  full_name VARCHAR(50)
);
CREATE INDEX name_idx ON telephone_book (full_name);
INSERT INTO telephone_book VALUES
  ('Riddle, Thomas'),
  ('Thomas, Dean');
Why so slow?
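• Search for all with last name "Thomas" (the pattern is anchored at the start, so a sorted index can narrow the search):
SELECT * FROM telephone_book
WHERE full_name LIKE 'Thomas%';
• Search for all with first name "Thomas" (the leading wildcard defeats the index order, so every row must be checked):
SELECT * FROM telephone_book
WHERE full_name LIKE '%Thomas';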
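One way to observe the difference between the two patterns is to ask PostgreSQL for its query plan with EXPLAIN. The sketch below is an illustration under stated assumptions, not part of the original slides: the index name name_pattern_idx is hypothetical, and the text_pattern_ops operator class is used because, in a database that does not use the C locale, a plain B-tree index may not be considered for LIKE prefix matches. On a two-row table the planner will probably choose a sequential scan either way, so the contrast only becomes visible on a larger table.

```sql
-- Hypothetical extra index (not in the slides): text_pattern_ops lets a
-- B-tree index serve left-anchored LIKE patterns regardless of locale.
CREATE INDEX name_pattern_idx
  ON telephone_book (full_name text_pattern_ops);

-- Left-anchored pattern: the sorted index can narrow the search to the
-- range of entries that start with 'Thomas'.
EXPLAIN SELECT * FROM telephone_book
WHERE full_name LIKE 'Thomas%';

-- Leading wildcard: matching rows can sit anywhere in the index order,
-- so the B-tree index cannot narrow the search.
EXPLAIN SELECT * FROM telephone_book
WHERE full_name LIKE '%Thomas';
```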
Accuracy issue
• Irrelevant or false matching words
'one', 'money', 'prone', etc.:
body LIKE '%one%'
• Regular expressions in PostgreSQL
support escapes for word boundaries:
body ~ '\yone\y'
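The two predicates can be compared on a few literal strings without touching the Posts table. This is a small sketch: the sample values and column aliases are made up for illustration, and it assumes the default setting standard_conforming_strings = on, so that the backslashes reach the regex engine unchanged.

```sql
SELECT word,
       word LIKE '%one%' AS like_match,   -- substring match
       word ~ '\yone\y'  AS regex_match   -- whole-word match (\y = word boundary)
FROM (VALUES ('one'), ('money'), ('prone'), ('the one ring')) AS t(word);
```

LIKE should report a match for all four rows, while the word-boundary regex should match only 'one' and 'the one ring'.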
Full-Text Search
PostgreSQL Text-Search
• Since PostgreSQL 8.3
• TSVECTOR to represent text data
• TSQUERY to represent search predicates
• Special indexes
PostgreSQL Text-Search: Basic Querying
SELECT * FROM Posts
WHERE to_tsvector(title || ' ' || body || ' ' || tags)
  @@ to_tsquery('postgresql & performance');
(@@ is the text-search matching operator)
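To get a feel for the two data types, the conversion functions can be called on literal strings. A minimal sketch, assuming the built-in 'english' text-search configuration; the sample sentence is invented for illustration.

```sql
-- Document side: normalised lexemes with their positions.
SELECT to_tsvector('english', 'Tuning the performance of PostgreSQL queries');

-- Query side: a parsed search predicate; & means both terms must be present.
SELECT to_tsquery('english', 'postgresql & performance');

-- The @@ operator compares the two and returns a boolean.
SELECT to_tsvector('english', 'Tuning the performance of PostgreSQL queries')
       @@ to_tsquery('english', 'postgresql & performance') AS matches;
```

Because both sides are normalised with the same configuration, the last query should return true.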
PostgreSQL Text-Search: Basic Querying
SELECT * FROM Posts
WHERE title || ' ' || body || ' ' || tags
  @@ 'postgresql & performance';
time with no index: 8 min 2 sec
PostgreSQL Text-Search: Add TSVECTOR column
ALTER TABLE Posts ADD COLUMN
  PostText TSVECTOR;
UPDATE Posts SET PostText = to_tsvector('english',
  title || ' ' || body || ' ' || tags);
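Two follow-up steps are commonly paired with a stored TSVECTOR column. Neither appears in the slides up to this point, so treat the following as a sketch under assumptions. The index name posts_text_idx is hypothetical. The coalesce() calls assume that title or tags may be NULL for some rows; in that case plain || concatenation yields NULL and those rows would end up with a NULL PostText.

```sql
-- Variant of the UPDATE that tolerates NULL columns (assumption about the data).
UPDATE Posts SET PostText = to_tsvector('english',
  coalesce(title, '') || ' ' || coalesce(body, '') || ' ' || coalesce(tags, ''));

-- GIN is one of the index types PostgreSQL provides for TSVECTOR columns.
CREATE INDEX posts_text_idx ON Posts USING GIN (PostText);

-- Query the stored column instead of recomputing to_tsvector for every row.
SELECT * FROM Posts
WHERE PostText @@ to_tsquery('english', 'postgresql & performance');
```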
