ELASTICSEARCH FEATURES: Ruby

    class Article
      inclu...

Try ElasticSearch in a Ruby on Rails application with a one-line command:

    $ rails new tired -m "https://gist.github.co...

Thanks!
Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
Talk at the Webexpo 2011 conference in Prague (http://webexpo.net/)

1. ElasticSearch: Beyond Ordinary Fulltext Search. Karel Minařík
2. http://karmi.cz
3. AUDIENCE POLL: Does your application have a search feature?
4. AUDIENCE POLL: What do you use for search?
    1. SELECT ... LIKE '%foo%'
    2. Sphinx
    3. Apache Solr
    4. ElasticSearch
5. Search is the primary interface for getting information today.
6. http://www.apple.com/macosx/what-is-macosx/spotlight.html
7. http://www.apple.com/iphone/features/search.html
8. ???
9. ???
10. #uxfail???
11. Y U NO ALIGN???
12. ???
13. ???
14. Search is hard. Let's go write SQL queries!
15. WHY SEARCH SUCKS? How do you implement search?

    def search
      @results = MyModel.search params[:q]
      respond_with @results
    end
16. WHY SEARCH SUCKS? How do you implement search? The same code, with a diagram labelled Query, MAGIC, Results, Result.
17. WHY SEARCH SUCKS? How do you implement search? The same slide, with the annotation "MAGIC + /".
18. A personal story... (image annotated 23px and 670px)
19. WHY SEARCH SUCKS? Compare your search library with your ORM library:

    MyModel.search "(this OR that) AND NOT whatever"

    Arel::Table.new(:articles).
      where(articles[:title].eq('On Search')).
      where(["published_on => ?", Time.now]).
      join(comments).
      on(article[:id].eq(comments[:article_id])).
      take(5).
      skip(4).
      to_sql
20. How does search work?
21. HOW DOES SEARCH WORK? A collection of documents:

    file_1.txt: The ruby is a pink to blood-red colored gemstone ...
    file_2.txt: Ruby is a dynamic, reflective, general-purpose object-oriented programming language ...
    file_3.txt: "Ruby" is a song by English rock band Kaiser Chiefs ...

22. HOW DOES SEARCH WORK? How do you search documents?

    File.read('file_1.txt').include?('ruby')
    File.read('file_2.txt').include?('ruby')
    ...
23. HOW DOES SEARCH WORK? The inverted index:

    TOKENS        POSTINGS
    ruby          file_1.txt  file_2.txt  file_3.txt
    pink          file_1.txt
    gemstone      file_1.txt
    dynamic       file_2.txt
    reflective    file_2.txt
    programming   file_2.txt
    song          file_3.txt
    english       file_3.txt
    rock          file_3.txt

    http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
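
The inverted index above takes only a few lines of Ruby to build. This is a minimal sketch under two assumptions: a crude tokenizer (downcase, split on non-word characters) and only the visible beginnings of the three documents from slide 21. `DOCUMENTS`, `build_inverted_index` and `INVERTED_INDEX` are hypothetical names used for illustration.

```ruby
# Visible beginnings of the three documents from slide 21.
DOCUMENTS = {
  "file_1.txt" => "The ruby is a pink to blood-red colored gemstone",
  "file_2.txt" => "Ruby is a dynamic, reflective, general-purpose object-oriented programming language",
  "file_3.txt" => '"Ruby" is a song by English rock band Kaiser Chiefs'
}

# Map every token to the list of documents that contain it (its "postings").
def build_inverted_index(documents)
  index = Hash.new { |hash, token| hash[token] = [] }
  documents.each do |name, text|
    # Downcase, split on non-word characters, drop empty strings and repeats.
    tokens = text.downcase.split(/\W+/).reject(&:empty?).uniq
    tokens.each { |token| index[token] << name }
  end
  index
end

INVERTED_INDEX = build_inverted_index(DOCUMENTS)

# A lookup is now a single hash access instead of reading every file.
p INVERTED_INDEX["ruby"]  # prints ["file_1.txt", "file_2.txt", "file_3.txt"]
p INVERTED_INDEX["song"]  # prints ["file_3.txt"]
```

The expensive work moves to indexing time, so a query no longer has to scan every file as slide 22 does.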
24. HOW DOES SEARCH WORK? The inverted index: MySearchLib.search "ruby" looks up the ruby row of the index from slide 23 (file_1.txt, file_2.txt, file_3.txt).
25. HOW DOES SEARCH WORK? The inverted index: MySearchLib.search "song" looks up the song row (file_3.txt).
26. HOW DOES SEARCH WORK? The inverted index: MySearchLib.search "ruby AND song" looks up both rows; only file_3.txt appears in both.
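
The "ruby AND song" lookup comes down to set operations on posting lists. This is a minimal sketch that assumes the slide 23 index as a plain Ruby hash. `postings`, `search_and`, `search_or` and `search_not` are hypothetical helper names, not the API of any real library.

```ruby
# The inverted index from slide 23, as token => postings.
INDEX = {
  "ruby"        => ["file_1.txt", "file_2.txt", "file_3.txt"],
  "pink"        => ["file_1.txt"],
  "gemstone"    => ["file_1.txt"],
  "dynamic"     => ["file_2.txt"],
  "reflective"  => ["file_2.txt"],
  "programming" => ["file_2.txt"],
  "song"        => ["file_3.txt"],
  "english"     => ["file_3.txt"],
  "rock"        => ["file_3.txt"]
}
ALL_DOCUMENTS = INDEX.values.flatten.uniq.sort

# Unknown tokens have an empty posting list.
def postings(token)
  INDEX.fetch(token, [])
end

# AND: documents present in every posting list (intersection).
def search_and(*tokens)
  tokens.map { |token| postings(token) }.reduce(:&) || []
end

# OR: documents present in any posting list (union).
def search_or(*tokens)
  tokens.map { |token| postings(token) }.reduce(:|) || []
end

# NOT: every document except those containing the token.
def search_not(token)
  ALL_DOCUMENTS - postings(token)
end

p search_and("ruby", "song")  # prints ["file_3.txt"]
p search_or("pink", "song")   # prints ["file_1.txt", "file_3.txt"]
```

Real engines keep posting lists sorted so they can intersect them by walking both lists at once. Ruby's `Array#&` hides that detail.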
27. A naïve Ruby implementation:

    module SimpleSearch

      def index document, content
        tokens = analyze content
        store document, tokens
        puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
      end

      def analyze content
        # >>> Split content by words into "tokens"
        content.split(/\W/).
        # >>> Downcase every word
        map    { |word| word.downcase }.
        # >>> Reject stop words, digits and whitespace
        reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
      end

      def store document_id, tokens
        tokens.each do |token|
          # >>> Save the "posting"
          ( (INDEX[token] ||= []) << document_id ).uniq!
        end
      end

      def search token
        puts "Results for token #{token}:"
        # >>> Print documents stored in index for this token
        #     (`puts` was missing on the slide, so nothing was printed;
        #     `|| []` avoids a NoMethodError for tokens not in the index)
        (INDEX[token] || []).each { |document| puts "  * #{document}" }
      end

      INDEX     = {}
      # The stop word list is cut off on the slide after "there"
      STOPWORDS = %w|a an and are as at but by for if in is it no not of on or that the then there|

      extend self
    end
28. HOW DOES SEARCH WORK? Indexing documents:

    SimpleSearch.index "file1", "Ruby is a language. Java is also a language."
    SimpleSearch.index "file2", "Ruby is a song."
    SimpleSearch.index "file3", "Ruby is a stone."
    SimpleSearch.index "file4", "Java is a language."

    Indexed document file1 with tokens:
    ["ruby", "language", "java", "also", "language"]

    Indexed document file2 with tokens:
    ["ruby", "song"]

    Indexed document file3 with tokens:
    ["ruby", "stone"]

    Indexed document file4 with tokens:
    ["java", "language"]

    (Words downcased, stopwords removed.)

29. HOW DOES SEARCH WORK? The index:

    puts "What's in our index?"
    p SimpleSearch::INDEX

    {
      "ruby"     => ["file1", "file2", "file3"],
      "language" => ["file1", "file4"],
      "java"     => ["file1", "file4"],
      "also"     => ["file1"],
      "stone"    => ["file3"],
      "song"     => ["file2"]
    }

30. HOW DOES SEARCH WORK? Search the index:

    SimpleSearch.search "ruby"

    Results for token ruby:
    * file1
    * file2
    * file3
31. HOW DOES SEARCH WORK? The inverted index, with posting counts:

    TOKENS        COUNT   POSTINGS
    ruby          3       file_1.txt  file_2.txt  file_3.txt
    pink          1       file_1.txt
    gemstone              file_1.txt
    dynamic               file_2.txt
    reflective            file_2.txt
    programming           file_2.txt
    song                  file_3.txt
    english               file_3.txt
    rock                  file_3.txt

    http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices
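
The counts next to ruby (3) and pink (1) are the lengths of their posting lists, also called document frequencies. Search engines use them to rank rare terms above common ones. The sketch below uses the textbook inverse document frequency, idf = log(N / df). That formula is an assumption chosen for illustration: Lucene's own scoring uses a smoothed variant. `document_frequency` and `idf` are hypothetical names.

```ruby
INDEX = {
  "ruby" => ["file_1.txt", "file_2.txt", "file_3.txt"],
  "pink" => ["file_1.txt"],
  "song" => ["file_3.txt"]
}
TOTAL_DOCUMENTS = 3

# Document frequency: in how many documents does the token appear?
def document_frequency(token)
  INDEX.fetch(token, []).size
end

# Textbook inverse document frequency, log(N / df).
# Returns 0.0 for tokens that appear nowhere, to avoid dividing by zero.
def idf(token)
  df = document_frequency(token)
  return 0.0 if df.zero?
  Math.log(TOTAL_DOCUMENTS.to_f / df)
end

puts document_frequency("ruby")  # prints 3
puts idf("ruby")                 # prints 0.0: it is in every document, so it does not discriminate
puts idf("pink").round(4)        # prints 1.0986
```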
32. It is very practical to know how search works. For instance, now you know that the analysis step is very important. It's more important than the “search” step.
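
The analysis step matters because the query must go through the same analysis as the documents. Otherwise its tokens never match the indexed ones. This is a toy sketch: the "stemmer" only strips a trailing s from longer words. It stands in for a real stemmer such as the Snowball analyzer used later in the talk. All names are hypothetical.

```ruby
STOPWORDS = %w[a an and is the]

# Toy analyzer: split on non-word characters, downcase, drop stop words,
# then strip a plural "s" from longer words (a crude stand-in for stemming).
def analyze(text)
  text.split(/\W+/)
      .map(&:downcase)
      .reject { |word| word.empty? || STOPWORDS.include?(word) }
      .map { |word| word.length > 3 && word.end_with?("s") ? word.chomp("s") : word }
end

DOCUMENTS = {
  "file2" => "Ruby is a song.",
  "file3" => "Ruby songs and gems."
}

INDEX = Hash.new { |hash, token| hash[token] = [] }
DOCUMENTS.each do |id, text|
  analyze(text).uniq.each { |token| INDEX[token] << id }
end

# Looking up the raw query string misses: the index only holds analyzed tokens.
def search_raw(query)
  INDEX.fetch(query, [])
end

# Analyzing the query first lets "Songs" match both "song" and "songs".
def search_analyzed(query)
  analyze(query).map { |token| INDEX.fetch(token, []) }.reduce(:&) || []
end

p search_raw("Songs")       # prints []
p search_analyzed("Songs")  # prints ["file2", "file3"]
```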
33. A naïve Ruby implementation: the SimpleSearch module again (same code as slide 27). The part that matters here is `analyze`:

    def analyze content
      # >>> Split content by words into "tokens"
      content.split(/\W/).
      # >>> Downcase every word
      map    { |word| word.downcase }.
      # >>> Reject stop words, digits and whitespace
      reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
    end
34. HOW DOES SEARCH WORK? The search engine textbook: Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler and Trevor Strohman. Addison Wesley, 2009. http://search-engines-book.com
35. SEARCH IMPLEMENTATIONS: The baseline information retrieval implementation: Lucene in Action. Michael McCandless, Erik Hatcher and Otis Gospodnetic. July 2010. http://manning.com/hatcher3
36. http://elasticsearch.org
37. ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene.
38. { } HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby
39. ELASTICSEARCH FEATURES: HTTP

    # Add a document
    curl -X POST "http://localhost:9200/articles/article/1" -d '{ "title" : "One" }'

    (Annotated on the slide: articles is the INDEX, article the TYPE, 1 the ID, and the JSON body the DOCUMENT.)
40. ELASTICSEARCH FEATURES: HTTP

    # Add a document
    curl -X POST "http://localhost:9200/articles/article/1" -d '{ "title" : "One" }'

    # Perform query
    curl -X GET  "http://localhost:9200/articles/_search?q=One"
    curl -X POST "http://localhost:9200/articles/_search" -d '{
      "query" : { "terms" : { "tags" : ["ruby", "python"], "minimum_match" : 2 } }
    }'

    # Delete index
    curl -X DELETE "http://localhost:9200/articles"

    # Create index with settings and mapping
    curl -X PUT "http://localhost:9200/articles" -d '{
      "settings" : { "index" : { "number_of_shards" : 3, "number_of_replicas" : 2 } },
      "mappings" : {
        "document" : {
          "properties" : {
            "body" : { "type" : "string", "analyzer" : "snowball" }
          }
        }
      }
    }'
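
The same calls can also be made from Ruby's standard library. This sketch only builds the requests; nothing is sent unless you uncomment the last lines. It assumes ElasticSearch runs on localhost:9200, as in the slides. `BASE` and the request variable names are illustrative.

```ruby
require "net/http"
require "json"
require "uri"

BASE = URI("http://localhost:9200")

# Add a document: POST /articles/article/1 with a JSON body.
index_request = Net::HTTP::Post.new("/articles/article/1")
index_request["Content-Type"] = "application/json"
index_request.body = JSON.generate("title" => "One")

# URI search: GET /articles/_search?q=One (the query string is URL-encoded).
search_uri = URI.join(BASE.to_s, "/articles/_search")
search_uri.query = URI.encode_www_form("q" => "One")
search_request = Net::HTTP::Get.new(search_uri)

# Delete the whole index: DELETE /articles.
delete_request = Net::HTTP::Delete.new("/articles")

puts index_request.method  # prints POST
puts search_request.path   # prints /articles/_search?q=One

# To actually send the requests against a running node:
# Net::HTTP.start(BASE.host, BASE.port) do |http|
#   [index_request, search_request, delete_request].each do |request|
#     puts http.request(request).body
#   end
# end
```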
41. ELASTICSEARCH FEATURES: HTTP

    GET http://user:password@localhost:8080/_search?q=*  =>  http://localhost:9200/user/_search?q=*

    http {
      server {
        listen       8080;
        server_name  search.example.com;

        error_log   elasticsearch-errors.log;
        access_log  elasticsearch.log;

        location / {

          # Deny access to Cluster API
          if ($request_filename ~ "_cluster") {
            return 403;
            break;
          }

          # Pass requests to ElasticSearch
          proxy_pass http://localhost:9200;
          proxy_redirect off;

          proxy_set_header  X-Real-IP        $remote_addr;
          proxy_set_header  X-Forwarded-For  $proxy_add_x_forwarded_for;
          proxy_set_header  Host             $http_host;

          # Authorize access
          auth_basic           "ElasticSearch";
          auth_basic_user_file passwords;

          # Route all requests to authorized user's own index
          rewrite  ^(.*)$  /$remote_user$1  break;
          rewrite_log on;

          return 403;
        }
      }
    }

    (Slide callout: ElasticSearch issue #664, "Add HTTPS and basic authentication support", answered "NO.")

    https://gist.github.com/986390
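
Restating the proxy's two rules as a small Ruby function makes them easy to test. The first rule refuses anything that touches the Cluster API. The second prefixes every path with the authenticated user's name, so each request lands in that user's own index. `route_request` is a hypothetical helper that mirrors the nginx rules above; it is not part of ElasticSearch or nginx.

```ruby
# Mirrors the two nginx rules: deny "_cluster" requests (403),
# otherwise rewrite "/anything" to "/<user>/anything".
def route_request(user, path)
  return :forbidden if path.include?("_cluster")
  "/#{user}#{path}"
end

p route_request("user", "/_search?q=*")      # prints "/user/_search?q=*"
p route_request("user", "/_cluster/health")  # prints :forbidden
```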
42. ELASTICSEARCH FEATURES: JSON

    {
      "id"    : "abc123",
      "title" : "ElasticSearch Understands JSON!",
      "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
      "published_on" : "2011/05/27 10:00:00",
      "tags"  : ["search", "json"],
      "author" : {
        "first_name" : "Clara",
        "last_name"  : "Rice",
        "email"      : "clara@rice.org"
      }
    }
43. ELASTICSEARCH FEATURES: JSON

    curl -X DELETE "http://localhost:9200/articles"; sleep 1

    curl -X POST "http://localhost:9200/articles/article" -d '
    {
      "id"    : "abc123",
      "title" : "ElasticSearch Understands JSON!",
      "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
      "published_on" : "2011/05/27 10:00:00",
      "tags"  : ["search", "json"],
      "author" : {
        "first_name" : "Clara",
        "last_name"  : "Rice",
        "email"      : "clara@rice.org"
      }
    }'

    curl -X POST "http://localhost:9200/articles/_refresh"

    curl -X GET "http://localhost:9200/articles/article/_search?q=author.first_name:clara"
44. ELASTICSEARCH FEATURES: JSON

    curl -X POST "http://localhost:9200/articles/article" -d '{ ... "published_on" : "2011/05/27 10:00:00", ... }'

    curl -X GET "http://localhost:9200/articles/_mapping?pretty=true"

    {
      "articles" : {
        "article" : {
          "properties" : {
            "title" : {
              "type" : "string"
            },
            // ...
            "author" : {
              "dynamic" : "true",
              "properties" : {
                "first_name" : {
                  "type" : "string"
                },
                // ...
              }
            },
            "published_on" : {
              "format" : "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd",
              "type" : "date"
            }
          }
        }
      }
    }
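
The mapping above was never declared. ElasticSearch inferred it from the first document, including recognizing the published_on string as a date. Below is a simplified Ruby sketch of that kind of type guessing. It assumes only the two date formats listed in the mapping output. `guess_field` and `guess_mapping` are hypothetical, and the real detection rules cover more types and formats.

```ruby
DATE_PATTERN = %r{\A\d{4}/\d{2}/\d{2}( \d{2}:\d{2}:\d{2})?\z}
DATE_FORMAT  = "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd"

# Guess a field type from a JSON value; arrays take the type of their elements,
# nested objects get their own "properties".
def guess_field(value)
  case value
  when Hash         then { "properties" => guess_mapping(value) }
  when Array        then guess_field(value.first)
  when Integer      then { "type" => "long" }
  when Float        then { "type" => "double" }
  when true, false  then { "type" => "boolean" }
  when DATE_PATTERN then { "type" => "date", "format" => DATE_FORMAT }
  else                   { "type" => "string" }
  end
end

def guess_mapping(document)
  document.each_with_object({}) do |(field, value), properties|
    properties[field] = guess_field(value)
  end
end

article = {
  "title"        => "ElasticSearch Understands JSON!",
  "published_on" => "2011/05/27 10:00:00",
  "tags"         => ["search", "json"],
  "author"       => { "first_name" => "Clara" }
}
p guess_mapping(article)
```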
45. ELASTICSEARCH FEATURES: Schema Free

    curl -X POST "http://localhost:9200/articles/comment" -d '
    {
      "body"         : "Wow! Really nice JSON support.",
      "published_on" : "2011/05/27 10:05:00",
      "author" : {
        "first_name" : "John",
        "last_name"  : "Pear",
        "email"      : "john@pear.org"
      }
    }'

    (Annotated on the slide: DIFFERENT TYPE, comment instead of article, in the same index.)

    curl -X POST "http://localhost:9200/articles/_refresh"

    curl -X GET "http://localhost:9200/articles/comment/_search?q=author.first_name:john"
46. ELASTICSEARCH FEATURES: Index as Resource

    # Search single type
    curl -X GET "http://localhost:9200/articles/comment/_search?q=body:json"

    # Search whole index
    curl -X GET "http://localhost:9200/articles/_search?q=body:json"

    # Search multiple indices
    curl -X GET "http://localhost:9200/articles,users/_search?q=body:json"

    # Search all indices
    curl -X GET "http://localhost:9200/_search?q=body:json"
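
The four scopes above follow a single URL pattern: optional comma-separated indices, optional comma-separated types, then `_search`. A sketch of a URL builder for that pattern, assuming the localhost:9200 node from the slides. `search_url` is a hypothetical helper. `URI.encode_www_form` escapes the colon in `body:json` as `%3A`, which is ordinary URL encoding.

```ruby
require "uri"

# Build a URI-search URL: no index means all indices,
# several indices or types are joined with commas.
def search_url(query, indices: [], types: [], host: "http://localhost:9200")
  path = [indices.join(","), types.join(",")].reject(&:empty?).join("/")
  path = "/#{path}" unless path.empty?
  "#{host}#{path}/_search?#{URI.encode_www_form('q' => query)}"
end

puts search_url("body:json", indices: ["articles"], types: ["comment"])
puts search_url("body:json", indices: ["articles", "users"])
puts search_url("body:json")
```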
47. ELASTICSEARCH FEATURES: Index as Resource

    curl -X DELETE "http://localhost:9200/articles"; sleep 1

    # Store the document under the ID abc123, so the GET below can fetch it by that ID
    curl -X POST "http://localhost:9200/articles/article/abc123" -d '
    {
      "id"    : "abc123",
      "title" : "ElasticSearch Understands JSON!",
      "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
      "published_on" : "2011/05/27 10:00:00",
      "tags"  : ["search", "json"],
      "author" : {
        "first_name" : "Clara",
        "last_name"  : "Rice",
        "email"      : "clara@rice.org"
      }
    }'

    curl -X POST "http://localhost:9200/articles/_refresh"

    curl -X GET "http://localhost:9200/articles/article/abc123"
48. ELASTICSEARCH FEATURES: Index as Resource. “The Index Is Your Database”

    {"_index":"articles","_type":"article","_id":"1","_version":1, "_source" : {
      "id"    : "1",
      "title" : "ElasticSearch Understands JSON!",
      "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s ...",
      "published_on" : "2011/05/27 10:00:00",
      "tags"  : ["search", "json"],
      "author" : {
        "first_name" : "Clara",
        "last_name"  : "Rice",
        "email"      : "clara@rice.org"
      }
    }}
49. ELASTICSEARCH FEATURES: Index as Resource. Index Aliases

    curl -X POST http://localhost:9200/_aliases -d '
    {
      "actions" : [
        { "add" : {
            "index" : "index_1",
            "alias" : "myalias"
          }
        },
        { "add" : {
            "index" : "index_2",
            "alias" : "myalias"
          }
        }
      ]
    }'

    (Diagram: index_A and index_B both reachable through my_alias.)

    http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
50. ELASTICSEARCH FEATURES: Index as Resource. The “Sliding Window” problem

    curl -X DELETE http://localhost:9200/logs_2010_01

    (Diagram: the alias logs spans logs_2010_02, logs_2010_03 and logs_2010_04; logs_2010_01 is deleted.)

    “We can really store only three months' worth of data.”
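
Aliases make the sliding window cheap. Each month, one `_aliases` call points the alias at the new index and detaches indices that have left the window, and those indices can then be deleted. This sketch assumes monthly index names that sort chronologically (logs_YYYY_MM), the alias name logs, and the three-month window from the quote. `slide_window` is a hypothetical helper. The `add` and `remove` actions are the ones the aliases API accepts.

```ruby
require "json"

# Compute the _aliases actions for a sliding window over monthly indices,
# and report which indices fell out of the window and can be deleted.
def slide_window(existing, new_index, alias_name: "logs", size: 3)
  all_indices = (existing + [new_index]).uniq.sort
  expired     = all_indices.first([all_indices.size - size, 0].max)

  actions  = [{ "add" => { "index" => new_index, "alias" => alias_name } }]
  actions += expired.map { |index| { "remove" => { "index" => index, "alias" => alias_name } } }

  { "actions" => actions, "delete" => expired }
end

plan = slide_window(%w[logs_2010_01 logs_2010_02 logs_2010_03], "logs_2010_04")
puts JSON.pretty_generate("actions" => plan["actions"])  # body for POST /_aliases
p plan["delete"]                                          # prints ["logs_2010_01"]
```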
51. ELASTICSEARCH FEATURES: Index as Resource. Index Templates

    # Apply this configuration for every matching index being created
    curl -X PUT localhost:9200/_template/bookmarks_template -d '
    {
      "template" : "users_*",

      "settings" : {
        "index" : {
          "number_of_shards"   : 1,
          "number_of_replicas" : 3
        }
      },

      "mappings": {
        "url": {
          "properties": {
            "url": {
              "type": "string", "analyzer": "url_ngram", "boost": 10
            },
            "title": {
              "type": "string", "analyzer": "snowball", "boost": 5
            }
            // ...
          }
        }
      }
    }'

    http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html
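
A template applies whenever the name of an index being created matches its wildcard pattern. The sketch below uses Ruby's `File.fnmatch` for the matching. That is an assumption for illustration, since ElasticSearch does its own simple wildcard matching. `TEMPLATES` and `templates_for` are hypothetical names.

```ruby
# Template name => the "template" pattern from its definition.
TEMPLATES = {
  "bookmarks_template" => "users_*"
}

# Which templates would be applied when an index with this name is created?
def templates_for(index_name, templates = TEMPLATES)
  templates.select { |_name, pattern| File.fnmatch(pattern, index_name) }.keys
end

p templates_for("users_karmi")  # prints ["bookmarks_template"]
p templates_for("articles")     # prints []
```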
52. ELASTICSEARCH FEATURES: Distributed

    $ cat elasticsearch.yml
    cluster:
      name: <YOUR APPLICATION>

    (Diagram: Automatic Discovery Protocol joining Node 1 to Node 4 into one cluster, one of them the MASTER.)

    http://www.elasticsearch.org/guide/reference/modules/discovery/
53. ELASTICSEARCH FEATURES: Distributed

    Index A is split into 3 shards, and duplicated in 2 replicas.

    curl -XPUT http://localhost:9200/A/ -d '{
      "settings" : {
        "index" : {
          "number_of_shards"   : 3,
          "number_of_replicas" : 2
        }
      }
    }'

    (Diagram: shards A1, A2 and A3, each present three times: the primary plus 2 replicas.)
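
Here is the arithmetic behind the diagram, plus a sketch of how a document finds its shard. Every shard exists once as a primary and once per replica, so 3 shards with 2 replicas give 9 shard copies. A document goes to one primary shard by hashing its ID modulo the number of primary shards. The code uses `Zlib.crc32` only as a stand-in for ElasticSearch's own hash function. The modulo explains why, in the ElasticSearch versions of that time, number_of_shards was fixed when the index was created while number_of_replicas could change later.

```ruby
require "zlib"

# Every shard exists once as a primary plus once per replica.
def total_shard_copies(number_of_shards, number_of_replicas)
  number_of_shards * (1 + number_of_replicas)
end

# Route a document to a primary shard: hash(id) modulo the shard count.
# Zlib.crc32 is a stand-in; ElasticSearch uses its own hash function.
def shard_for(document_id, number_of_shards)
  Zlib.crc32(document_id.to_s) % number_of_shards
end

puts total_shard_copies(3, 2)  # prints 9, the nine boxes in the diagram
puts shard_for("abc123", 3)    # some shard number between 0 and 2
```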
54. ELASTICSEARCH FEATURES: Distributed. SHARDS: improve indexing performance. REPLICAS: improve search performance.
55. ELASTICSEARCH FEATURES: Distributed. Y U NO ASK FIRST???
56. ELASTICSEARCH FEATURES: Distributed

    Indexing 100 000 documents (~ 56MB), one shard, no replicas, MacBookAir SSD 2GB

    # Index all at once
    time curl -s -X POST "http://localhost:9200/_bulk" \
      --data-binary @data/bulk_all.json > /dev/null

    real  2m1.142s

    # Index in batches of 1000
    for file in data/bulk_*.json; do
      time curl -s -X POST "http://localhost:9200/_bulk" \
        --data-binary @$file > /dev/null
    done

    real  1m36.697s  (-25sec, 80%)

    # Do not refresh during indexing in batches
    "settings" : { "refresh_interval" : "-1" }

    for file in data/bulk_*.json; do
    ...

    real  0m38.859s  (-82sec, 32%)
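
The percentages are relative to the all-at-once run: 96.7 s / 121.1 s is about 80%, and 38.9 s / 121.1 s is about 32%. The `_bulk` body is newline-delimited JSON. Each document takes an action line followed by the document source, and every line, including the last, ends with a newline. Below is a sketch of producing the batches of 1000 in Ruby. The document fields, the use of an `id` field as `_id`, and the helper names `bulk_body` and `bulk_batches` are assumptions. `_type` follows the ElasticSearch versions used in the slides.

```ruby
require "json"

# Build one _bulk request body: an action line, then the document, each + "\n".
def bulk_body(documents, index: "articles", type: "article")
  documents.map do |doc|
    action = { "index" => { "_index" => index, "_type" => type, "_id" => doc["id"] } }
    JSON.generate(action) + "\n" + JSON.generate(doc) + "\n"
  end.join
end

# Split the documents into batches, one _bulk request body per batch.
def bulk_batches(documents, batch_size: 1000)
  documents.each_slice(batch_size).map { |batch| bulk_body(batch) }
end

docs    = (1..2500).map { |i| { "id" => i.to_s, "title" => "Document #{i}" } }
batches = bulk_batches(docs)

puts batches.size               # prints 3 (1000 + 1000 + 500 documents)
puts batches.first.lines.first  # the first action line
```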
57. ELASTICSEARCH FEATURES: Queries

    $ curl -X GET "http://localhost:9200/_search?q=<YOUR QUERY>"

    Terms       apple
                apple iphone
    Phrases     "apple iphone"
    Proximity   "apple safari"~5
    Fuzzy       apple~0.8
    Wildcards   app*
                *pp*
    Boosting    apple^10 safari
    Range       [2011/05/01 TO 2011/05/31]
                [java TO json]
    Boolean     apple AND NOT iphone
                +apple -iphone
                (apple OR iphone) AND NOT review
    Fields      title:iphone^15 OR body:iphone
                published_on:[2011/05/01 TO "2011/05/27 10:00:00"]

    http://lucene.apache.org/java/3_1_0/queryparsersyntax.html
58. ELASTICSEARCH FEATURES: Queries. Query DSL (JSON)

    curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
    {
      "query" : {
        "terms" : {
          "tags" : [ "ruby", "python" ],
          "minimum_match" : 2
        }
      }
    }'

    http://www.elasticsearch.org/guide/reference/query-dsl/
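
The `terms` query above, with `"minimum_match" : 2`, matches documents whose tags contain at least two of the listed terms. This sketch restates that rule in Ruby for a single document's field values. `terms_match?` is a hypothetical helper, not part of any client library.

```ruby
# A document matches when at least `minimum_match` of the listed terms
# appear among the field's values.
def terms_match?(field_values, terms, minimum_match: 1)
  (field_values & terms).size >= minimum_match
end

p terms_match?(["ruby", "python", "java"], ["ruby", "python"], minimum_match: 2)  # prints true
p terms_match?(["ruby", "php"], ["ruby", "python"], minimum_match: 2)             # prints false
p terms_match?(["ruby", "php"], ["ruby", "python"])                               # prints true
```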
59. ELASTICSEARCH FEATURES: Queries. Geo Search

    curl -X POST "http://localhost:9200/venues/venue" -d '
    {
      "name": "Pizzeria",
      "pin": {
        "location": {
          "lat": 50.071712,
          "lon": 14.386832
        }
      }
    }'

    Accepted formats for Geo:
      [lon, lat]      # Array
      "lat,lon"       # String
      drm3btev3e86    # Geohash

    curl -X POST "http://localhost:9200/venues/_search?pretty=true" -d '
    {
      "query" : {
        "filtered" : {
          "query" : { "query_string" : { "query" : "pizzeria" } },
          "filter" : {
            "geo_distance" : {
              "distance" : "0.5km",
              "pin.location" : { "lat" : 50.071481, "lon" : 14.387284 }
            }
          }
        }
      }
    }'

    http://www.elasticsearch.org/guide/reference/query-dsl/geo-distance-filter.html
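
The `geo_distance` filter keeps documents whose point lies within the given distance of the origin. The sketch below computes the great-circle distance with the haversine formula and a mean Earth radius of 6371 km, a common choice assumed here. It then checks the slide's pizzeria against the 0.5 km limit. `distance_km` is a hypothetical helper. Note that in ElasticSearch the `pin.location` field normally has to be mapped as type `geo_point` before geo filters can use it; the slide does not show that mapping.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle distance between two points given in degrees (haversine formula).
def distance_km(lat1, lon1, lat2, lon2)
  to_rad = ->(degrees) { degrees * Math::PI / 180 }
  dlat = to_rad.(lat2 - lat1)
  dlon = to_rad.(lon2 - lon1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad.(lat1)) * Math.cos(to_rad.(lat2)) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

pizzeria = [50.071712, 14.386832]  # the indexed venue
origin   = [50.071481, 14.387284]  # the point in the filter

d = distance_km(*pizzeria, *origin)
puts d.round(3)  # about 0.041 km
puts d <= 0.5    # prints true, so the filter keeps the pizzeria
```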
60. ELASTICSEARCH FEATURES: Facets. (Screenshot of LinkedIn's faceted search, labelled Query and Facets.) http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/
61. ELASTICSEARCH FEATURES: Facets

    curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
    {
      "query" : {
        "query_string" : { "query" : "title:T*" }
      },
      "filter" : {
        "terms" : { "tags" : ["ruby"] }
      },
      "facets" : {
        "tags" : {
          "terms" : {
            "field" : "tags",
            "size"  : 10
          }
        }
      }
    }'

    (Annotated on the slide: "query" is the User query, "filter" the “Checkboxes”, "facets" the Facets.)

    # "facets" : {
    #   "tags" : {
    #     "terms" : [ {
    #       "term" : "ruby",
    #       "count" : 2
    #     }, {
    #       "term" : "python",
    #       "count" : 1
    #     }, {
    #       "term" : "java",
    #       "count" : 1
    #     } ]
    #   }
    # }

    http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
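
A terms facet counts how often each tag occurs across a set of documents. This sketch assumes the behaviour of a top-level filter in ElasticSearch of that era as we understand it: the filter narrows only the returned hits, and facets are counted over the query results. That is what lets the "checkboxes" keep showing counts for tags that are not checked. The articles, `terms_facet`, and the tie-breaking order are all hypothetical.

```ruby
# Hypothetical articles with titles and tags.
ARTICLES = [
  { "title" => "Two",   "tags" => ["ruby", "python"] },
  { "title" => "Three", "tags" => ["java"] },
  { "title" => "Ten",   "tags" => ["ruby"] },
  { "title" => "One",   "tags" => ["ruby"] }
]

# Count term occurrences over the given documents, most frequent first
# (ties broken alphabetically), and keep the top `size` terms.
def terms_facet(documents, field, size: 10)
  counts = Hash.new(0)
  documents.each { |doc| doc[field].each { |term| counts[term] += 1 } }
  counts.sort_by { |term, count| [-count, term] }.first(size)
        .map { |term, count| { "term" => term, "count" => count } }
end

query_hits  = ARTICLES.select { |doc| doc["title"].start_with?("T") }   # title:T*
filter_hits = query_hits.select { |doc| doc["tags"].include?("ruby") }  # the "checkbox"

p filter_hits.map { |doc| doc["title"] }  # prints ["Two", "Ten"]
p terms_facet(query_hits, "tags")         # counts over the query results, not the filter
```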
62. ELASTICSEARCH FEATURES: Facets

    curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
    {
      "facets" : {
        "published_on" : {
          "date_histogram" : {
            "field"    : "published_on",
            "interval" : "day"
          }
        }
      }
    }'
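
A `date_histogram` facet with `"interval" : "day"` puts documents into per-day buckets and counts each bucket. This sketch assumes timestamps in the published_on format from the earlier slides and ignores time zones entirely. `date_histogram` here is a hypothetical Ruby helper, not the facet implementation.

```ruby
require "time"

# Group timestamps into per-day buckets and count them, ordered by day.
def date_histogram(timestamps, format: "%Y/%m/%d %H:%M:%S")
  buckets = Hash.new(0)
  timestamps.each do |timestamp|
    day = Time.strptime(timestamp, format).strftime("%Y-%m-%d")
    buckets[day] += 1
  end
  buckets.sort.to_h
end

p date_histogram(["2011/05/27 10:00:00", "2011/05/27 10:05:00", "2011/05/28 09:00:00"])
```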
63. ELASTICSEARCH FEATURES: Facets. Geo Facets

    curl -X POST "http://localhost:9200/venues/_search?pretty=true" -d '
    {
      "query" : { "query_string" : { "query" : "pizzeria" } },
      "facets" : {
        "distance_count" : {
          "geo_distance" : {
            "pin.location" : {
              "lat" : 50.071712,
              "lon" : 14.386832
            },
            "ranges" : [
              { "to" : 1 },
              { "from" : 1, "to" : 5 },
              { "from" : 5, "to" : 10 }
            ]
          }
        }
      }
    }'

    http://www.elasticsearch.org/guide/reference/api/search/facets/geo-distance-facet.html
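
The geo distance facet counts how many matching venues fall into each distance range. This sketch starts from distances that are already computed, in kilometres. It assumes "from" is inclusive, "to" is exclusive, and a missing bound means open-ended. Those semantics are our assumption for illustration. `RANGES` and `range_counts` are hypothetical names.

```ruby
# The ranges from the request above; a missing bound is open-ended.
RANGES = [{ "to" => 1 }, { "from" => 1, "to" => 5 }, { "from" => 5, "to" => 10 }]

# Count distances per range: from <= distance < to.
def range_counts(distances, ranges = RANGES)
  ranges.map do |range|
    from = range.fetch("from", -Float::INFINITY)
    to   = range.fetch("to", Float::INFINITY)
    range.merge("count" => distances.count { |d| d >= from && d < to })
  end
end

p range_counts([0.04, 0.9, 1.0, 4.2, 7.5, 12.0]).map { |r| r["count"] }  # prints [2, 2, 1]
```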
64. ELASTICSEARCH FEATURES: Mapping

    curl -X DELETE "http://localhost:9200/articles"

    curl -X PUT "http://localhost:9200/articles" -d '
    {
      "mappings": {
        "article": {
          "properties": {
            "tags": {
              "type": "string",
              "analyzer": "keyword"
            },
            "content": {
              "type": "string",
              "analyzer": "snowball"
            },
            "title": {
              "type": "string",
              "analyzer": "snowball",
              "boost": 10.0
            }
          }
        }
      }
    }'

    curl -X GET "http://localhost:9200/articles/_mapping?pretty=true"

    Remember?

    def analyze content
      # >>> Split content by words into "tokens"
      content.split(/\W/).
      # >>> Downcase every word
      map { |word| word.downcase }.
      # ...
    end

    http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index.html
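
The mapping picks a different analyzer per field. The `keyword` analyzer keeps the whole value as a single token, so a tag like "Ruby on Rails" remains one facet term. A full-text analyzer splits it into words. Below is a rough Ruby imitation of the two. `simple_analyzer` is a simplified stand-in and does no Snowball-style stemming. Both method names are hypothetical.

```ruby
# "keyword": the whole value becomes one token, unchanged.
def keyword_analyzer(text)
  [text]
end

# Rough stand-in for a full-text analyzer: lowercase and split on non-letters.
def simple_analyzer(text)
  text.downcase.scan(/[[:alpha:]]+/)
end

p keyword_analyzer("Ruby on Rails")  # prints ["Ruby on Rails"]
p simple_analyzer("Ruby on Rails")   # prints ["ruby", "on", "rails"]
```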
65. ELASTICSEARCH FEATURES

    curl -X DELETE "http://localhost:9200/urls"
    curl -X POST   "http://localhost:9200/urls" -d '
    {
      "settings" : {
        "index" : {
          "analysis" : {
            "analyzer" : {
              "url_analyzer" : {
                "type" : "custom",
                "tokenizer" : "lowercase",
                "filter" : ["stop", "url_stop", "url_ngram"]
              }
            },
            "filter" : {
              "url_stop" : {
                "type" : "stop",
                "stopwords" : ["http", "https", "www"]
              },
              "url_ngram" : {
                "type" : "nGram",
                "min_gram" : 3,
                "max_gram" : 5
              }
            }
          }
        }
      }
    }'

https://gist.github.com/988923
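The url_analyzer chain can be emulated step by step to see which tokens would end up in the index for a URL. In this sketch the English stopword list is a deliberately short subset (the real stop filter uses Lucene's default English set), and the n-grams are emitted in an order chosen here, which may differ from ElasticSearch's.

```ruby
# Short subset of English stopwords (assumption; the real default list is longer).
ENGLISH_STOP = %w[a an and are as at be but by for if in into is it no not of on or the to with].freeze
# The custom "url_stop" filter from the settings above.
URL_STOP = %w[http https www].freeze

# "lowercase" tokenizer: split on non-letters and lowercase each token.
def lowercase_tokenize(text)
  text.split(/[^[:alpha:]]+/).reject(&:empty?).map(&:downcase)
end

# "nGram" filter: every substring of length min_gram..max_gram;
# tokens shorter than min_gram produce nothing.
def ngrams(token, min_gram, max_gram)
  (min_gram..max_gram).flat_map do |n|
    (0..token.length - n).map { |i| token[i, n] }
  end
end

# tokenizer -> stop -> url_stop -> url_ngram, in the order declared above.
def url_analyze(text)
  tokens = lowercase_tokenize(text)
  tokens -= ENGLISH_STOP
  tokens -= URL_STOP
  tokens.flat_map { |token| ngrams(token, 3, 5) }
end

p url_analyze("http://www.Example.com")
```

Because "http" and "www" are dropped before n-gramming, a partial query such as "ampl" has fragments to match against, while the boilerplate parts of every URL contribute nothing.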
66. ELASTICSEARCH FEATURES

    Tire.index 'articles' do
      delete
      create

      store :title => 'One',   :tags => ['ruby'],           :published_on => '2011-01-01'
      store :title => 'Two',   :tags => ['ruby', 'python'], :published_on => '2011-01-02'
      store :title => 'Three', :tags => ['java'],           :published_on => '2011-01-02'
      store :title => 'Four',  :tags => ['ruby', 'php'],    :published_on => '2011-01-03'

      refresh
    end

    s = Tire.search 'articles' do
      query { string 'title:T*' }

      filter :terms, :tags => ['ruby']

      sort { title 'desc' }

      facet('global-tags')  { terms :tags, :global => true }
      facet('current-tags') { terms :tags }
    end

http://github.com/karmi/tire
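The interesting part of that search is how the filter interacts with the facets. Assuming the top-level filter narrows the hits but not the facets (as the top-level filter did in ElasticSearch 0.x), the outcome for the four stored articles can be reproduced in plain Ruby. The case-insensitive prefix check is a simplification of the query_string T* wildcard, not ElasticSearch's query parser.

```ruby
ARTICLES = [
  { title: "One",   tags: %w[ruby],        published_on: "2011-01-01" },
  { title: "Two",   tags: %w[ruby python], published_on: "2011-01-02" },
  { title: "Three", tags: %w[java],        published_on: "2011-01-02" },
  { title: "Four",  tags: %w[ruby php],    published_on: "2011-01-03" }
].freeze

# A terms facet, reduced to counting tag occurrences in a set of documents.
def tag_counts(docs)
  docs.flat_map { |doc| doc[:tags] }.each_with_object(Hash.new(0)) { |tag, counts| counts[tag] += 1 }
end

# query { string 'title:T*' }: approximated as a case-insensitive prefix match.
matching = ARTICLES.select { |doc| doc[:title].downcase.start_with?("t") }

# filter :terms, :tags => ['ruby']: applied to the hits only (assumption above).
filtered = matching.select { |doc| doc[:tags].include?("ruby") }
hits = filtered.sort_by { |doc| doc[:title] }.reverse   # sort { title 'desc' }

global_tags  = tag_counts(ARTICLES)   # :global => true ignores the query
current_tags = tag_counts(matching)   # follows the query, not the filter

p hits.map { |doc| doc[:title] }
p global_tags
p current_tags
```

Only "Two" comes back as a hit, yet current-tags still counts java: "Three" matched the query, and the filter did not restrict the facet. That is what makes this combination useful for facet navigation.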
67. ELASTICSEARCH FEATURES

    class Article < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks
    end

    $ rake environment tire:import CLASS=Article

    Article.search do
      query { string 'love' }
      facet('timeline') { date :published_on, :interval => 'month' }
      sort  { published_on 'desc' }
    end

http://github.com/karmi/tire
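The timeline facet asks for a histogram of published_on by month. The bucketing itself is easy to sketch in plain Ruby; the dates below are hypothetical, and the output shape ("YYYY-MM" labels with counts) is chosen for readability, not copied from ElasticSearch's facet response.

```ruby
require "date"

# Hypothetical publication dates, for illustration only.
raw_dates = %w[2011-01-01 2011-01-02 2011-01-02 2011-01-03 2011-02-14 2011-04-30]
published = raw_dates.map { |s| Date.parse(s) }

# Bucket each date by the first day of its month, then count per bucket,
# oldest month first. Months without any date do not appear.
def monthly_histogram(dates)
  dates.group_by { |d| Date.new(d.year, d.month, 1) }
       .sort_by { |month, _| month }
       .map { |month, ds| [month.strftime("%Y-%m"), ds.size] }
end

p monthly_histogram(published)
```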
68. ELASTICSEARCH FEATURES

    class Article
      include Whatever::ORM
      include Tire::Model::Search
      include Tire::Model::Callbacks
    end

    $ rake environment tire:import CLASS=Article

    Article.search do
      query { string 'love' }
      facet('timeline') { date :published_on, :interval => 'month' }
      sort  { published_on 'desc' }
    end

http://github.com/karmi/tire
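The point of this slide is that the integration is not tied to ActiveRecord: a search mixin mostly needs the host class to offer save callbacks. The sketch below shows that pattern with hypothetical names throughout (ToyORM, ToySearchCallbacks, INDEX). It is not how Tire::Model::Callbacks is written, only the general idea of registering an index update on after_save, with an in-memory hash standing in for ElasticSearch.

```ruby
# Hypothetical stand-in for "Whatever::ORM": it persists nothing, but it
# offers an after_save hook that a search mixin can latch onto.
module ToyORM
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def after_save(&block)
      after_save_hooks << block
    end

    def after_save_hooks
      @after_save_hooks ||= []
    end
  end

  def save
    # A real ORM would write the record to its storage here.
    self.class.after_save_hooks.each { |hook| instance_exec(&hook) }
    true
  end
end

# Hypothetical search callbacks mixin: on every save it (re)indexes the
# record into an in-memory hash instead of an ElasticSearch index.
module ToySearchCallbacks
  INDEX = {}

  def self.included(base)
    return unless base.respond_to?(:after_save)
    base.after_save { ToySearchCallbacks::INDEX[id] = to_indexed_hash }
  end
end

class Article
  include ToyORM              # must come first, so after_save already exists
  include ToySearchCallbacks

  attr_reader :id, :title

  def initialize(id, title)
    @id = id
    @title = title
  end

  def to_indexed_hash
    { title: title }
  end
end

Article.new(1, "Love").save
p ToySearchCallbacks::INDEX
```

The include order matters in this sketch: the search mixin checks for after_save when it is included, so the ORM module has to be mixed in first.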
69. Try ElasticSearch in a Ruby on Rails application with a one-line command:

    $ rails new tired -m "https://gist.github.com/raw/951343/tired.rb"

A “batteries included” installation: it downloads and launches ElasticSearch, sets up a Rails application and launches it. When you're tired of it, just delete the folder.
70. Thanks!