
MCE^3 - Lasse Koskela - Full-Text Search on iOS and Android


"Search" is one of those things that our users take for granted but is surprisingly difficult to get right. In this talk we'll dig into the essential concepts in implementing a proper full-text search, highlighting the challenges such as sorting by relevancy, term and column boosting, stemming and lemmatisation, and the brutality of compound words. We'll look at implementing full-text search with an embedded search engine as well as with the true workhorse of both iOS and Android developers alike – SQLite and its FTS tables.


  1. Full-Text Search on iOS and Android (Lasse Koskela)
  2. BUSINESS DESIGN TECHNOLOGY; HELSINKI, TOKYO, NEW YORK
  6. Lasse Koskela
  7. Teevee: Common directory for Finland’s largest commercial and public television broadcasters, aggregating some 200,000 movies and video episodes.
  8. Supla: Audio podcast application for Finland’s largest commercial radio broadcasting company, producing both original content and on-demand episodes of radio programming.
  9. FULL-TEXT SEARCH
  10. FULL-TEXT SEARCH?
 "A search that compares every word in a document, as opposed to searching an abstract or a set of keywords associated with the document."
 I type some words and the app gives me relevant content despite minor spelling differences, sorted in a way that makes sense.
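The definition above means indexing every word of every document, not just a keyword list. A minimal sketch of such an inverted index in Python, assuming a naive regex tokenizer and AND semantics for multi-word queries (so that adding terms narrows the results); `InvertedIndex` and its methods are illustrative names, not from the talk:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words (a naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Maps every word of every document to the ids of the documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Full-text: index every word, not just a curated set of keywords.
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing all query words."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            # Each extra word can only narrow the result set.
            result &= self.postings.get(word, set())
        return result


index = InvertedIndex()
index.add(1, "Surfing in Hawaii")
index.add(2, "Surfing lessons in Helsinki")
print(index.search("surfing"))           # {1, 2}
print(index.search("surfing Helsinki"))  # {2}
```

This handles exact words only; tolerance for spelling differences and relevance ordering need further steps.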
  11. SEARCH: Content, UI, Index (diagram)
  12. SEARCH: UI
  13. I type some words and the app gives me relevant content despite minor spelling differences, sorted in a way that makes sense.
  14. Don’t make me scroll
  15. Ranking defines how good your search is:
 • Number of occurrences
 • Length of match vs. length of text
 • Starting position of first match
 • Full-word match vs. prefix match
 • Match in title vs. match in body
 • Date of matching document
 • Popularity amongst other users
 • Behavioural conditioning
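Several of these factors can be folded into one score per document. The following Python sketch is hypothetical: the weights, the `Doc` type and the scoring formula are illustrative assumptions, not the talk's ranking. It covers number of occurrences, full-word vs. prefix match, match length vs. text length, starting position (approximated by word position) and title vs. body:

```python
import re
from dataclasses import dataclass

# Hypothetical weights, chosen for illustration only.
TITLE_WEIGHT = 10.0
BODY_WEIGHT = 1.0
FULL_WORD_BONUS = 2.0


@dataclass
class Doc:
    title: str
    body: str


def field_score(term, text, weight):
    """Score one field of a document for one (lowercase) query term."""
    score = 0.0
    for position, word in enumerate(re.findall(r"\w+", text.lower())):
        if not word.startswith(term):
            continue
        exactness = FULL_WORD_BONUS if word == term else 1.0  # full word beats prefix match
        coverage = len(term) / len(text)                      # match length vs. text length
        earliness = 1.0 / (1 + position)                      # earlier matches count more
        score += exactness * (1 + coverage) * (1 + earliness) # each occurrence adds up
    return weight * score


def rank(query, docs):
    """Return the matching docs, best first; docs with no match are dropped."""
    terms = re.findall(r"\w+", query.lower())

    def score(doc):
        return sum(field_score(t, doc.title, TITLE_WEIGHT) + field_score(t, doc.body, BODY_WEIGHT)
                   for t in terms)

    scored = [(score(doc), doc) for doc in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

Date, popularity and behavioural signals would enter the same way, as extra multiplicative or additive terms, but need data this sketch does not model.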
  16. Search Interface Guidelines:
 • Don’t block the UI thread with searches.
 • Don’t block the database with updates.
 • Don’t block new searches with old ones.
 • Don’t delay showing results any longer than you must.
 • Indicate activity while searching.
 • Differentiate “no matches” from “didn’t do anything”.
 • Adding terms should narrow the results.
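Two of these guidelines (keep searches off the UI thread, and don't let old searches win over new ones) can be combined with a generation counter. A minimal Python sketch, assuming a thread pool stands in for the platform's background queue; `SearchCoordinator`, `search_fn` and `on_results` are hypothetical names:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SearchCoordinator:
    """Runs searches off the caller's thread and drops results of superseded queries."""

    def __init__(self, search_fn, on_results, workers=2):
        self._search_fn = search_fn      # the possibly slow search, e.g. an FTS query
        self._on_results = on_results    # receives (query, results) for the newest query only
        self._executor = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.RLock()
        self._generation = 0

    def submit(self, query):
        """Start a search and return immediately (never block the UI thread)."""
        with self._lock:
            self._generation += 1
            generation = self._generation
        return self._executor.submit(self._run, query, generation)

    def _run(self, query, generation):
        results = self._search_fn(query)
        with self._lock:
            if generation != self._generation:
                return None              # a newer query arrived: discard these results
            # In an app, hand the results over to the UI thread here.
            self._on_results(query, results)
        return results

    def shutdown(self):
        self._executor.shutdown(wait=True)
```

A stale search still occupies a worker until it finishes; a fuller version could also abort the underlying query, for example with `sqlite3.Connection.interrupt()` when searching SQLite.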
  17. Content, UI, Index (diagram)
  18. Content, Index
  19. Index
  20. Implementation options: CLucene (via BRFullTextSearch), MobileLucene, SQLite FTS4 tables
  21. CLucene + BRFullTextSearch: indexing documents
 // Objective-C headers (e.g. in the Swift bridging header):
 #import <BRFullTextSearch/BRFullTextSearch.h>
 #import <BRFullTextSearch/CLuceneSearchService.h>
 …
 let lucene = CLuceneSearchService(indexPath: "Stuff.idx")
 lucene.defaultAnalyzerLanguage = "fi"   // analyse text as Finnish
 lucene.stemmingDisabled = false
 lucene.supportStemmedPrefixSearches = true
 …
 let fields = [kBRSearchFieldNameTitle: "MCE^3 2016",
               kBRSearchFieldNameValue: "Pure awesomeness"]
 let doc = BRSimpleIndexable(identifier: "doc1", data: fields)
 lucene.addObjectToIndex(doc, queue: nil, finished: nil)
  22. CLucene + BRFullTextSearch: searching documents
 // 't' ~ kBRSearchFieldNameTitle, 'v' ~ kBRSearchFieldNameValue
 let query = "t:(surf* OR board*) OR v:(surf* OR board*)"
 // boost matches of board* in the value field tenfold
 let boostedQuery = "t:(surf*) OR v:(board*)^10"
 let results = lucene.search(query)
 results.iterateWithBlock({ (i, result, stop) in
     NSLog("Match: \(result.identifier): \(result.dictionaryRepresentation())")
 })
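What such a fielded prefix query with boosts does can be shown without Lucene. A simplified Python sketch, assuming a per-field inverted index and a score that just adds the field's boost for each matching prefix (Lucene's real scoring is considerably richer); `FieldIndex` and its methods are hypothetical:

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class FieldIndex:
    """Inverted index per field: field name -> term -> ids of matching documents."""

    def __init__(self):
        self.fields = defaultdict(lambda: defaultdict(set))

    def add(self, identifier, data):
        for field, text in data.items():
            for term in tokenize(text):
                self.fields[field][term].add(identifier)

    def search(self, prefixes, boosts):
        """OR-combine prefix queries (like surf* OR board*) across fields,
        adding the field's boost for every prefix that matches there."""
        scores = defaultdict(float)
        for field, boost in boosts.items():
            postings = self.fields.get(field, {})
            for prefix in prefixes:
                matched = set()
                for term, ids in postings.items():
                    if term.startswith(prefix):
                        matched |= ids
                for identifier in matched:
                    scores[identifier] += boost
        # Highest score first; ties broken by identifier for a stable order.
        return sorted(scores, key=lambda identifier: (-scores[identifier], identifier))


index = FieldIndex()
index.add("doc1", {"t": "Surfboard shop", "v": "Boards and wax"})
index.add("doc2", {"t": "Travel", "v": "Surfing and boards in Portugal"})
print(index.search(["surf", "board"], {"t": 10, "v": 1}))  # ['doc1', 'doc2']
print(index.search(["surf", "board"], {"t": 1, "v": 10}))  # ['doc2', 'doc1']
```

Moving the boost from the title field to the value field flips the order, which is the effect the boosted query is after.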
  23. MobileLucene: searching documents
 // Using wrapper classes from GitHub user 'lukhnos':
 import org.lukhnos.lucenestudy.SearchResult;
 import org.lukhnos.lucenestudy.Searcher;
 import org.lukhnos.lucenestudy.Document;

 Searcher searcher = new Searcher("Stuff.idx");
 SearchResult result = searcher.search(query, 100);
 for (Document doc : result.documents) {
     Log.d("SEARCH", "Found: " + doc.title);
 }
 searcher.close();
  24. SQLite FTS4 tables: indexing documents
 -- create an FTS table for the index
 -- (tokenize=icu requires SQLite built with ICU support)
 CREATE VIRTUAL TABLE pages
 USING fts4(title, body, tokenize=icu fi_FI);
 -- add a document to the index
 INSERT INTO pages (docid, title, body)
 VALUES (42, 'MCE^3 2016', 'Still Pure awesomeness');
 -- optimise the index when the app is idle
 INSERT INTO pages(pages) VALUES('optimize');
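These statements can be tried out from Python's standard `sqlite3` module before wiring them into an app. A minimal sketch, assuming the bundled SQLite was compiled with FTS4 (most builds are) and substituting the default "simple" tokenizer for ICU; the `search` helper is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Default "simple" tokenizer instead of tokenize=icu fi_FI (needs an ICU-enabled build).
conn.execute("CREATE VIRTUAL TABLE pages USING fts4(title, body)")
conn.execute(
    "INSERT INTO pages (docid, title, body) VALUES (?, ?, ?)",
    (42, "MCE^3 2016", "Still Pure awesomeness"),
)
conn.execute("INSERT INTO pages(pages) VALUES('optimize')")  # merge the index b-trees
conn.commit()


def search(query):
    """Return (docid, title) rows matching an FTS query string."""
    return conn.execute(
        "SELECT docid, title FROM pages WHERE pages MATCH ? ORDER BY docid", (query,)
    ).fetchall()


print(search("awesome*"))    # [(42, 'MCE^3 2016')]  prefix query
print(search("body:pure"))   # [(42, 'MCE^3 2016')]  restricted to one column
print(search("title:pure"))  # []
```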
  25. SQLite FTS4 tables: searching documents
 -- search across all columns, order by "matchinfo"
 -- (matchinfo() returns a binary blob, so this sorts by raw bytes, not by a relevance score)
 SELECT * FROM pages
 WHERE pages MATCH 'surf* OR board*'
 ORDER BY matchinfo(pages) DESC;
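Decoding the blob shows what matchinfo() actually provides. According to the SQLite FTS documentation, the default 'pcx' format yields the number of phrases, the number of columns, then three counters per phrase/column pair: hits in this row, hits in all rows, and rows with at least one hit. A minimal Python sketch, assuming an FTS4-enabled SQLite and the default tokenizer; `decode_matchinfo` is a hypothetical helper:

```python
import sqlite3
import struct


def decode_matchinfo(blob):
    """matchinfo() packs unsigned 32-bit integers in native byte order into a blob."""
    return list(struct.unpack(f"={len(blob) // 4}I", blob))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts4(title, body)")
conn.executemany(
    "INSERT INTO pages (docid, title, body) VALUES (?, ?, ?)",
    [(1, "Surfboard shop", "Boards and wax"), (2, "Travel", "Surfing in Portugal")],
)

rows = conn.execute(
    "SELECT docid, matchinfo(pages) FROM pages "
    "WHERE pages MATCH 'surf* OR board*' ORDER BY docid"
).fetchall()
info = {docid: decode_matchinfo(blob) for docid, blob in rows}
# Layout for 'pcx': [phrases, columns, then per (phrase, column) pair:
#  hits in this row, hits in all rows, rows with at least one hit]
for docid, values in info.items():
    print(docid, values[0], values[1], values[2:])
```

Turning these counters into a single number is the job of a ranking function.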
  26. SQLite FTS4 tables: searching documents
 SELECT title, docid FROM pages
 JOIN (
     SELECT docid,
            bm25f(matchinfo(pages, 'pcxnal'), 10, 1) AS rank
            -- 'bm25f' is a custom SQL function!
     FROM pages WHERE pages MATCH 'surf* OR board*'
     ORDER BY rank DESC LIMIT 1000 OFFSET 0
 ) AS ranktable USING(docid)
 WHERE pages MATCH 'surf* OR board*'
 ORDER BY ranktable.rank DESC;
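The slide does not show bm25f itself. A hypothetical Python version, registered with `sqlite3.Connection.create_function`, could read the 'pcxnal' layout (phrases, columns, three counters per phrase/column, row count, average column lengths, this row's column lengths) and sum a per-column BM25 score weighted by the extra arguments. This is a column-weighted BM25 sketch with assumed parameters k1 = 1.2 and b = 0.75, not necessarily the function used in the talk, and it assumes an FTS4-enabled SQLite:

```python
import math
import sqlite3
import struct

K1, B = 1.2, 0.75  # common BM25 defaults (an assumption, not from the talk)


def bm25f(blob, *weights):
    """Column-weighted BM25 over matchinfo(..., 'pcxnal'). Missing weights default to 1."""
    ints = struct.unpack(f"={len(blob) // 4}I", blob)
    p, c = ints[0], ints[1]
    x_end = 2 + 3 * p * c
    x = ints[2:x_end]                                 # 3 counters per (phrase, column)
    n = ints[x_end]                                   # rows in the table
    avg_len = ints[x_end + 1:x_end + 1 + c]           # average tokens per column
    row_len = ints[x_end + 1 + c:x_end + 1 + 2 * c]   # tokens per column in this row
    score = 0.0
    for i in range(p):
        for j in range(c):
            tf, _, df = x[3 * (i * c + j):3 * (i * c + j) + 3]
            if tf == 0:
                continue
            weight = weights[j] if j < len(weights) else 1.0
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            norm = 1 - B + B * row_len[j] / max(avg_len[j], 1)
            score += weight * idf * tf * (K1 + 1) / (tf + K1 * norm)
    return score


conn = sqlite3.connect(":memory:")
conn.create_function("bm25f", -1, bm25f)  # -1: accept any number of arguments
conn.execute("CREATE VIRTUAL TABLE pages USING fts4(title, body)")
conn.executemany(
    "INSERT INTO pages (docid, title, body) VALUES (?, ?, ?)",
    [(1, "Surfboard shop", "Boards and wax"),
     (2, "Travel", "Surfing and boards in Portugal")],
)


def search(query, title_weight=10, body_weight=1):
    """Return docids ordered by the bm25f score, best first."""
    rows = conn.execute(
        "SELECT docid, bm25f(matchinfo(pages, 'pcxnal'), ?, ?) AS rank "
        "FROM pages WHERE pages MATCH ? ORDER BY rank DESC",
        (title_weight, body_weight, query),
    )
    return [docid for docid, _ in rows]


print(search("surf* OR board*"))                                # [1, 2]
print(search("surf* OR board*", title_weight=1, body_weight=10))  # [2, 1]
```

The weights 10 and 1 mirror the slide's call, favouring title matches over body matches.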
  27. Content, UI, Index (diagram)
  28. Content
  29-41. (Animated word diagram.) Compound words contain shorter dictionary words: surfboard (surf, board, boa, boar), massacre (mass, acre, ass), slingshot (sling, slings, shot, hot) and craftsmanship (manship, ship). From slide 36 on, the words shown are shot, slingshot, hot, board, surfboard, acre, massacre, craftsmanship, manship and ship.
  42. Splitting a compound word into dictionary words:
 def split(word, dictionary):
     """Return the ways `word` splits into dictionary words of at least 3 letters."""
     splits, min_length = [], 3
     if len(word) >= min_length * 2:
         # "+ 1" lets the tail be exactly min_length long (e.g. "slings" + "hot")
         for i in range(min_length, len(word) - min_length + 1):
             head, tail = word[:i], word[i:]
             if head in dictionary:
                 if tail in dictionary:
                     splits.append(head + " " + tail)
                 # keep the head in front of any further splits of the tail
                 splits.extend(head + " " + rest for rest in split(tail, dictionary))
     return splits
  43. Trying every split point of "windsurfing":
 windsurfing → w + indsurfing
 windsurfing → wi + ndsurfing
 windsurfing → win + dsurfing ✗ only HEAD matches
 windsurfing → wind + surfing ✓ HEAD and TAIL match!
 windsurfing → winds + urfing ✗ only HEAD matches
 windsurfing → windsu + rfing
 windsurfing → windsur + fing
 windsurfing → windsurf + ing ✗ only HEAD matches
 windsurfing → windsurfi + ng
 windsurfing → windsurfin + g
 windsurfing → windsurf + ing
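The walk-through above can be reproduced mechanically: test every split point and record which halves are dictionary words. A small Python sketch; `explain_splits` and the dictionary are hypothetical, with the dictionary chosen only so that the marks match the slide:

```python
def explain_splits(word, dictionary):
    """List every split point of `word` with whether head and tail are dictionary words."""
    rows = []
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        head_ok, tail_ok = head in dictionary, tail in dictionary
        if head_ok and tail_ok:
            mark = "✓ HEAD and TAIL match!"
        elif head_ok:
            mark = "✗ only HEAD matches"
        elif tail_ok:
            mark = "✗ only TAIL matches"
        else:
            mark = ""
        rows.append((head, tail, mark))
    return rows


# Hypothetical dictionary chosen to reproduce the marks on the slide.
DICTIONARY = {"win", "wind", "winds", "windsurf", "surfing"}

for head, tail, mark in explain_splits("windsurfing", DICTIONARY):
    print(f"windsurfing → {head} + {tail} {mark}".rstrip())
```

Only "wind + surfing" survives as a real split; "windsurf + ing" fails because "ing" is a suffix, not a word, which is where stemming comes in.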
  44. windsurfing ~ windsurf (windsurfing → windsurf + ing): stemming reduces a word to its base (a stem)
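Stemming only helps if the same analysis runs at index time and at query time, so that "windsurfing" in a query meets "windsurf" in a document. A deliberately naive suffix-stripping sketch in Python; the suffix list and function names are assumptions, and this is not the Snowball/Porter algorithm:

```python
import re

# A deliberately naive suffix list (NOT the Snowball/Porter rules), longest first.
SUFFIXES = ("ing", "ers", "er", "ed", "es", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


def analyze(text):
    """Tokenize and stem; use the same function for documents and for queries."""
    return [naive_stem(w) for w in re.findall(r"\w+", text)]


def matches(query, document):
    doc_terms = set(analyze(document))
    return all(term in doc_terms for term in analyze(query))


print(naive_stem("windsurfing"))                   # windsurf
print(matches("windsurfing", "Windsurf rental"))  # True
```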
  45. Stemming reduces a word to its base (a stem).
 // The Snowball Project - http://snowballstem.org
 import org.tartarus.snowball.ext.EnglishStemmer;

 EnglishStemmer stemmer = new EnglishStemmer();
 stemmer.setCurrent("estimate");
 stemmer.stem();
 stemmer.getCurrent(); // => "estim"
  46. Lemmatisation turns a word into its basic form (a lemma).
 // pod 'Parsimmon' - wraps NSLinguisticTagger
 import Parsimmon

 let phrase = "I'm eating chocolate bunnies."
 let lemmas = Lemmatizer().lemmatizeWordsInText(phrase)
 print(lemmas) // => ["I", "eat", "chocolate", "bunny"]
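Unlike suffix stripping, lemmatisation maps irregular forms such as "ate" or "mice" to their dictionary form. A toy Python sketch using a hypothetical lookup table; real lemmatisers, such as the NSLinguisticTagger-based one Parsimmon wraps, rely on full lexicons and linguistic analysis rather than a handful of entries:

```python
import re

# Hypothetical, tiny lemma table for illustration only.
LEMMAS = {
    "eating": "eat", "ate": "eat", "eaten": "eat",
    "bunnies": "bunny", "mice": "mouse", "better": "good",
}


def lemmatize(text):
    """Map each word to its lemma, leaving unknown words unchanged."""
    return [LEMMAS.get(word.lower(), word) for word in re.findall(r"[\w']+", text)]


print(lemmatize("I ate chocolate bunnies"))  # ['I', 'eat', 'chocolate', 'bunny']
```

No suffix rule could turn "ate" into "eat", which is why lemmatisation needs a lexicon.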
  47. I type some words and the app gives me relevant content despite minor spelling differences, sorted in a way that makes sense.
  48. lasse.koskela@reaktor.com, lassekoskela
