SQLite

  1. SQLite full-text search extension
  2. FTS3 and FTS4
     • Let you create special virtual tables with a built-in full-text index
     • Consume more disk space than ordinary tables
  3. Table of 500 000 rows: FTS3 table vs. ordinary table
     • FTS3 table
       CREATE VIRTUAL TABLE table1 USING fts3(content TEXT); /* FTS3 table */
       SELECT count(*) FROM table1 WHERE content MATCH 'linux'; /* 0.03 seconds */
     • Ordinary table
       CREATE TABLE table2(content TEXT); /* Ordinary table */
       SELECT count(*) FROM table2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
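The comparison on this slide can be tried on a small scale. The following is a minimal sketch using Python's standard `sqlite3` module, assuming the bundled SQLite was built with FTS3 enabled; the three sample rows are made up for illustration. With so little data there is no measurable timing difference, so the sketch only shows that both queries return a count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE table1 USING fts3(content TEXT)")  # FTS3 table
conn.execute("CREATE TABLE table2(content TEXT)")                     # ordinary table

# Hypothetical sample rows, not the 500 000-row data set from the slide.
docs = [("linux kernel notes",), ("windows tips",), ("embedded linux boards",)]
conn.executemany("INSERT INTO table1(content) VALUES (?)", docs)
conn.executemany("INSERT INTO table2(content) VALUES (?)", docs)

# MATCH consults the full-text index; LIKE scans every row.
fts_count = conn.execute(
    "SELECT count(*) FROM table1 WHERE content MATCH 'linux'"
).fetchone()[0]
like_count = conn.execute(
    "SELECT count(*) FROM table2 WHERE content LIKE '%linux%'"
).fetchone()[0]
print(fts_count, like_count)
```

Note that the two predicates are not strictly equivalent: `LIKE '%linux%'` also matches substrings inside longer words, while `MATCH 'linux'` matches whole terms.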
  4. Creating tables
     CREATE VIRTUAL TABLE users USING fts3(
       USER_ID INTEGER PRIMARY KEY AUTOINCREMENT,
       NAME TEXT NOT NULL,
       PHONE INTEGER NOT NULL,
       UNIQUE (USER_ID) ON CONFLICT REPLACE,
       tokenize=porter
     )
     • Note: FTS3 does not use the declared column types or enforce the column constraints
  5. Deleting tables
     • CREATE VIRTUAL TABLE data USING fts3();
     • DROP TABLE data;
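As a rough illustration of what is created and dropped here, the sketch below lists the entries in `sqlite_master` before and after `DROP TABLE`. It assumes Python's `sqlite3` module with FTS3 available. The backing tables it prints (such as `data_content`) are how FTS3 stores its content and index, which is also why an FTS table takes more space than an ordinary one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE data USING fts3()")  # one default column, "content"

def names():
    # List schema entries whose name starts with "data".
    sql = "SELECT name FROM sqlite_master WHERE name LIKE 'data%' ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

before = names()   # the virtual table plus its backing tables
conn.execute("DROP TABLE data")
after = names()    # dropping the virtual table removes the backing tables too

print(before)
print(after)
```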
  6. Populating FTS Tables
     • FTS tables are populated with regular INSERT, UPDATE and DELETE statements
     • Every FTS table contains a hidden 'rowid' column (also available under the alias 'docid')
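A minimal sketch of the statements named on this slide, again assuming Python's `sqlite3` module with FTS3 available. The table name `data` and the sample text are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE data USING fts3()")  # default column is "content"

# Regular INSERT statements; rowids are assigned automatically (1, then 2).
conn.execute("INSERT INTO data(content) VALUES ('first note')")
conn.execute("INSERT INTO data(content) VALUES ('second note')")

# Regular UPDATE and DELETE, addressing rows through the hidden rowid column.
conn.execute("UPDATE data SET content = 'second note, edited' WHERE rowid = 2")
conn.execute("DELETE FROM data WHERE rowid = 1")

# rowid is hidden from SELECT *, so it has to be requested by name.
rows = conn.execute("SELECT rowid, content FROM data").fetchall()
print(rows)
```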
  7. Triggers
     • CREATE TRIGGER TRIGGER_INSERT_USER AFTER INSERT ON USER BEGIN INSERT INTO USER_SEARCH_TABLE VALUES(new.user_id, new.user_name); END;
     • CREATE TRIGGER TRIGGER_UPDATE_USER AFTER UPDATE ON USER BEGIN UPDATE USER_SEARCH_TABLE SET user_name = new.user_name WHERE user_id = old.user_id; END;
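The trigger approach can be sketched end to end as follows. This is one possible arrangement, not necessarily the schema behind the slide: as an assumption of this sketch, the FTS table holds only `user_name` and is keyed by `docid` set to the user's id, so the update trigger can find the row directly. The table, trigger, and helper names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user(user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE VIRTUAL TABLE user_search_table USING fts3(user_name);

-- Mirror every new user into the FTS table, reusing user_id as docid.
CREATE TRIGGER trigger_insert_user AFTER INSERT ON user BEGIN
  INSERT INTO user_search_table(docid, user_name)
  VALUES (new.user_id, new.user_name);
END;

-- Keep the indexed name in sync when the user row changes.
CREATE TRIGGER trigger_update_user AFTER UPDATE ON user BEGIN
  UPDATE user_search_table SET user_name = new.user_name
  WHERE docid = old.user_id;
END;
""")

conn.execute("INSERT INTO user(user_id, user_name) VALUES (1, 'alice')")
conn.execute("UPDATE user SET user_name = 'bob' WHERE user_id = 1")

def search(term):
    # Hypothetical helper: return the ids of users whose name matches the term.
    sql = "SELECT docid FROM user_search_table WHERE user_search_table MATCH ?"
    return [row[0] for row in conn.execute(sql, (term,))]

print(search("bob"), search("alice"))
```

A delete trigger would be needed as well in a real schema; it is left out here because the slide does not show one.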
  8. Queries
     • Query by rowid
       SELECT * FROM user WHERE rowid = 15;
     • Full-text query
       SELECT * FROM SEARCH_USER_DATA WHERE SEARCH_USER_DATA MATCH 'starcraft';
  9. Full-text Index Queries
     • Token or token prefix queries
       SELECT * FROM docs WHERE docs MATCH 'linux';
       SELECT * FROM docs WHERE docs MATCH 'lin*';
     • Phrase queries
       SELECT * FROM docs WHERE docs MATCH '"linux applications"';
       SELECT * FROM docs WHERE docs MATCH '"lin* app*"';
     • NEAR queries
       SELECT * FROM users WHERE users MATCH 'android NEAR starcraft';
       SELECT * FROM users WHERE users MATCH 'android NEAR/5 starcraft';
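The query forms above can be compared side by side on a few rows. This is a small sketch assuming Python's `sqlite3` module with FTS3 available; the three documents and the `match` helper are made up for illustration. `NEAR` without a number allows up to 10 intervening terms, while `NEAR/2` allows at most 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts3(content)")
conn.executemany(
    "INSERT INTO docs(content) VALUES (?)",
    [
        ("linux applications for embedded devices",),     # rowid 1
        ("applications that run on linux and windows",),  # rowid 2
        ("linear algebra basics",),                       # rowid 3
    ],
)

def match(query):
    # Hypothetical helper: rowids of the documents matching a full-text query.
    sql = "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, (query,))]

token = match("linux")                      # whole-term query
prefix = match("lin*")                      # prefix query: linux, linear
phrase = match('"linux applications"')      # terms must be adjacent and in order
prefix_phrase = match('"lin* app*"')        # phrase built from prefixes
near_default = match("linux NEAR applications")
near_2 = match("linux NEAR/2 applications")

print(token, prefix, phrase, prefix_phrase, near_default, near_2)
```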
  10. Tokenizers
     • A tokenizer is a set of rules for extracting terms from a document
     • The default tokenizer is 'simple'
     • Simple: converts text to lower case and splits it into terms of alphanumeric characters
     • Porter: same as simple, then reduces each term to its common English root (Porter stemming)
     • ICU: locale-specific (for example, tokenize=icu th_TH for Thai)
     • Custom implementation
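The difference between the 'simple' and 'porter' tokenizers can be seen with a single row. The sketch below assumes Python's `sqlite3` module with FTS3 available; the table names, the sample sentence, and the `count` helper are illustrative. The ICU tokenizer is not shown because it is only present in SQLite builds compiled with ICU support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE simple_docs USING fts3(content, tokenize=simple)")
conn.execute("CREATE VIRTUAL TABLE porter_docs USING fts3(content, tokenize=porter)")

text = "Frustrated users keep searching"
conn.execute("INSERT INTO simple_docs(content) VALUES (?)", (text,))
conn.execute("INSERT INTO porter_docs(content) VALUES (?)", (text,))

def count(table, query):
    # Hypothetical helper; table is one of the two fixed names above.
    sql = f"SELECT count(*) FROM {table} WHERE {table} MATCH ?"
    return conn.execute(sql, (query,)).fetchone()[0]

# 'simple' indexes the term "searching" as written, so "search" does not match.
simple_stem = count("simple_docs", "search")
# 'porter' stems both "searching" and the query term to "search".
porter_stem = count("porter_docs", "search")
# Both tokenizers fold case, so an upper-case query still matches.
simple_case = count("simple_docs", "FRUSTRATED")

print(simple_stem, porter_stem, simple_case)
```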
