
Full text search adventures

Talk on Full Text Search, RailsConf 2010

Published in: Technology

  1. 1. Adventures in Full Text Search
      Sarah Allen
      @ultrasaurus
  2. 2. class Article < ActiveRecord::Base
          acts_as_solr
        end
  3. 3. 3
  4. 4. Tokyo Dystopia
  5. 5. Language, Relevance, Accuracy, Speed
  6. 6. Text as Language
  7. 7. stemming, synonyms, stop words, word boundaries
  8. 8. SELECT text FROM phrases WHERE text LIKE '%run%';
        Can you run this to the post office for me?
        I'm going for a run, want to come along?
        Cross country running
        I'm too drunk to drive.
        I am running out of battery power.
        Work is not like wolf - it won't run away.
  9. 9. SELECT text FROM phrases WHERE vectors @@ 'run'::tsquery;
        Can you run this to the post office for me?
        Sorry I am running really late.
        I'm going for a run, want to come along?
        Cross country running
        I am running out of battery power.
        Work is not like wolf - it won't run away.
  10. 10. Tokenization and Stemming
          Google App Engine / JRuby / Lucene
          http://full-text-search.appspot.com
          http://github.com/ultrasaurus/full-text-search-appengine
  11. 11. http://full-text-search.appspot.com/
  12. 12. http://full-text-search.appspot.com/
  13. 13. http://full-text-search.appspot.com/
  14. 14. http://localhost:8080/_ah/admin/datastore?kind=Notes
  15. 15. ./script/generate scaffold note content:string index:List -f --skip-migration
          ./script/generate dd_model note content:string index:List -f
  16. 16. class Note
            include DataMapper::Resource
            property :id, Serial
            property :content, String, :required => true, :length => 500
            property :index, List, :required => true
            timestamps :at
          end
  17. 17. java_import org.apache.lucene.analysis.snowball.SnowballAnalyzer
          java_import java.io.StringReader
  18. 18. before :valid?, :update_index
          def update_index
            analyzer = SnowballAnalyzer.new("English")
            s = StringReader.new(content)
            token_stream = analyzer.tokenStream(nil, s)
            terms = []
            while (token = token_stream.next) do
              terms << token.term
            end
            self.index = terms
          end
  20. 20. http://full-text-search.appspot.com/
  21. 21. a about above after again against all am an and any are aren't as at be because been before being below between both but by can't cannot could couldn't did didn't do does doesn't doing don't down during each few for from further had hadn't has hasn't have haven't having he he'd he'll he's her here here's hers herself him himself his how how's i i'd i'll i'm i've if in into is isn't it it's its itself let's me more most mustn't my myself no nor not of off on once only or other ought our ours ourselves out over own same shan't she she'd she'll she's should shouldn't so some such than that that's the their theirs them themselves then there there's these they they'd they'll they're they've this those through to too under until up very was wasn't we we'd we'll we're we've were weren't what what's when when's where where's which while who who's whom why why's with won't would wouldn't you you'd you'll you're you've your yours yourself yourselves
          http://www.ranks.nl/resources/stopwords.html
  22. 22. Word Boundaries
  36. 36. I love horses
          Horses are beautiful
          deer in the forest
          deer live in the woods
          You are an idiot.
  37. 37. Relevance
  38. 38. Accuracy
  39. 39. Speed
  40. 40. Write Hosted Database Search Rails
  41. 41. Read Hosted Database Search Rails
  42. 42. Target Target Source Text Language Language
          We’re running out of daylight  en  ja
          Could you run this?  en  ja
          Cross-country running  en  ja
          I’m going for a run, want to come along?  en  ja
  49. 49. I’m going for a run, want to come along?  en  ja
          ha shi ri ni iku ke do ittsho ni ki ma su ka?
          Ikuko Kobayashi
          2009-11-29 20:36:47 UTC
          http://….16ec695a-8fce-4277-bdd4.flv
          http://….Japanese_ikuko_kobayashi.jpg
  51. 51. class Page < ActiveRecord::Base
            acts_as_tsearch :fields => [ ... ]
          end
  52. 52. Page.send :acts_as_tsearch, :fields => [:title]
          PagePart.send :acts_as_tsearch, :fields => [:content]
          ProgramPropertyList.send :acts_as_tsearch, :fields => [:instructor, :program_desc, :program_detail, :resource]
  53. 53. @pages = Page.find_by_tsearch(@query)
  58. 58. class Phrase < ActiveRecord::Base
            acts_as_tsearch :fields => [:text]
          end
  59. 59. Phrase.find_by_tsearch(term, :conditions => {:language_id => target_language.id})
  60. 60. When you think about search...
  61. 61. Questions?
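
Slides 8 and 9 contrast a substring match (LIKE '%run%') with a stemmed full-text match (@@ 'run'::tsquery). The Ruby sketch below is one rough way to see the difference. It assumes a toy suffix-stripping stemmer: `toy_stem`, `tokens`, `substring_match?`, and `stemmed_match?` are hypothetical helpers written for illustration, not the Snowball stemmer and not what PostgreSQL does internally.

```ruby
# Toy stemmer (assumption for illustration only): strips "ing" and a
# trailing "s". Real stemmers such as Snowball handle far more cases.
def toy_stem(word)
  w = word.downcase
  if w.length > 5 && w.end_with?("ing")
    w = w[0...-3]
    # Collapse a doubled final consonant: "runn" -> "run".
    w = w[0...-1] if w.length > 2 && w[-1] == w[-2]
  elsif w.length > 3 && w.end_with?("s") && !w.end_with?("ss")
    w = w[0...-1]
  end
  w
end

# Split space-delimited English text into lowercase word tokens.
def tokens(text)
  text.downcase.scan(/[a-z']+/)
end

# Roughly what LIKE '%term%' does: match anywhere inside the string.
def substring_match?(text, term)
  text.downcase.include?(term.downcase)
end

# Match whole words after reducing both sides to a stem.
def stemmed_match?(text, term)
  tokens(text).map { |t| toy_stem(t) }.include?(toy_stem(term))
end

PHRASES = [
  "Can you run this to the post office for me?",
  "Sorry I am running really late.",
  "I'm too drunk to drive.",
  "Cross country running"
].freeze

PHRASES.each do |phrase|
  puts format("%-45s substring=%-5s stemmed=%s",
              phrase,
              substring_match?(phrase, "run"),
              stemmed_match?(phrase, "run"))
end
```

In this sketch "I'm too drunk to drive." matches as a substring because "drunk" contains "run", but it does not match on stems, since no word in the phrase reduces to "run".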

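
Slide 18 stores a list of analyzed terms on each note, and slide 21 lists stop words. As a rough pure-Ruby illustration of how stop-word filtering and a term index could fit together, the sketch below skips JRuby, Lucene, and stemming entirely. It assumes a small sample of the slide 21 list, and the names `STOP_WORDS`, `index_terms`, and `build_inverted_index` are hypothetical.

```ruby
require "set"

# A small sample of the stop words on slide 21 (assumption: the real
# list is much longer and would normally be loaded from a file).
STOP_WORDS = Set.new(%w[a about am an and are for i in is not of out the to you])

# Tokenize space-delimited English text, then drop stop words and duplicates.
def index_terms(content)
  content.downcase.scan(/[a-z']+/).reject { |t| STOP_WORDS.include?(t) }.uniq
end

# Build an inverted index: term => ids of the notes containing that term.
# `notes` is a hypothetical Hash of id => content.
def build_inverted_index(notes)
  inverted = Hash.new { |hash, term| hash[term] = [] }
  notes.each do |id, content|
    index_terms(content).each { |term| inverted[term] << id }
  end
  inverted
end

notes = {
  1 => "I am running out of battery power.",
  2 => "Work is not like wolf"
}

index = build_inverted_index(notes)
# fetch with a default avoids creating empty entries for unknown terms.
puts index.fetch("battery", []).inspect
puts index.fetch("the", []).inspect
```

Dropping stop words before indexing keeps the stored term list short, at the cost of not being able to search for those words at all.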