ElasticSearch with Tire
                            @AbookYun, Polydice Inc.




Wednesday, February 6, 13                              1
It’s all about Search
                    • How does search work?
                    • ElasticSearch
                    • Tire




How does search work?


                            A collection of articles

                    • Article.find(1).to_json
                            { title: "One", content: "The ruby is a pink to blood-red colored gemstone." }


                    • Article.find(2).to_json
                            { title: "Two", content: "Ruby is a dynamic, reflective, general-purpose object-oriented programming language." }


                    • Article.find(3).to_json
                            { title: "Three", content: "Ruby is a song by English rock band." }




How does search work?


                            How do you search?



                       Article.where("content like ?", "%ruby%")
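
A LIKE pattern with a leading wildcard generally has to test the text of every row. The sketch below emulates that scan in plain Ruby; the `ARTICLE_ROWS` data and the `like_search` helper are illustrative names, and a case-insensitive comparison is assumed.

```ruby
# Sample rows mirroring the three articles above (illustrative data).
ARTICLE_ROWS = [
  { id: 1, content: "The ruby is a pink to blood-red colored gemstone." },
  { id: 2, content: "Ruby is a dynamic, reflective, general-purpose object-oriented programming language." },
  { id: 3, content: "Ruby is a song by English rock band." }
]

# Emulate `content LIKE '%needle%'`: check every row's text for the substring.
# Assumption: the comparison is case-insensitive.
def like_search(rows, needle)
  rows.select { |row| row[:content].downcase.include?(needle.downcase) }
end

p like_search(ARTICLE_ROWS, "ruby").map { |row| row[:id] }   # => [1, 2, 3]
```

The cost grows with the total amount of text, which is the problem an inverted index addresses.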




How does search work?


                            The inverted index
                T0 = “it is what it is”
                T1 = “what is it”
                T2 = “it is a banana”

                “a”: {2}
                “banana”: {2}
                “is”: {0, 1, 2}
                “it”: {0, 1, 2}
                “what”: {0, 1}

                A term search for the terms “what”, “is” and “it”
                {0, 1} ∩ {0, 1, 2} ∩ {0, 1, 2} = {0, 1}
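
The same index and intersection can be reproduced in a few lines of Ruby. This is a minimal sketch: `DOCS`, `build_index` and `search_all` are illustrative names, and tokens are simply split on whitespace.

```ruby
require "set"

# The three texts from the slide, keyed by document id.
DOCS = {
  0 => "it is what it is",
  1 => "what is it",
  2 => "it is a banana"
}

# Build the inverted index: token => set of ids of the documents containing it.
def build_index(docs)
  index = Hash.new { |hash, token| hash[token] = Set.new }
  docs.each do |id, text|
    text.split.each { |token| index[token] << id }
  end
  index
end

# Return the ids of the documents that contain every one of the given terms.
def search_all(index, terms)
  terms.map { |term| index.fetch(term, Set.new) }.reduce(:&).to_a.sort
end

index = build_index(DOCS)
p search_all(index, %w[what is it])   # => [0, 1]
```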



How does search work?


                                 The inverted index
                            TOKEN                     ARTICLES
                               ruby       article_1    article_2   article_3
                               pink       article_1
                             gemstone     article_1
                              dynamic                  article_2
                             reflective                 article_2
                            programming                article_2
                               song                                article_3
                              english                              article_3
                               rock                                article_3




How does search work?


                                 The inverted index
                                          Article.search("ruby")
                               ruby       article_1     article_2   article_3
                               pink       article_1
                             gemstone     article_1
                              dynamic                   article_2
                             reflective                  article_2
                            programming                 article_2
                               song                                 article_3
                              english                               article_3
                               rock                                 article_3




How does search work?


                                 The inverted index
                                          Article.search("song")
                               ruby       article_1     article_2   article_3
                               pink       article_1
                             gemstone     article_1
                              dynamic                   article_2
                             reflective                  article_2
                            programming                 article_2
                               song                                 article_3
                              english                               article_3
                               rock                                 article_3




module SimpleSearch

  # Analyze the content and add the document to the index
  def index document, content
    tokens = analyze content
    store document, tokens
    puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
  end

  def analyze content
    # Split content by words into "tokens"
    content.split(/\W/).
    # Downcase every word
    map { |word| word.downcase }.
    # Reject stop words, digits and whitespace
    reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
  end

  # Record the document id under each of its tokens
  def store document_id, tokens
    tokens.each do |token|
      ((INDEX[token] ||= []) << document_id).uniq!
    end
  end

  # Print the documents stored under a single token
  def search token
    puts "Results for token '#{token}':"
    INDEX.fetch(token, []).each { |document| puts " * #{document}" }
  end

  INDEX = {}

  STOPWORDS = %w(a an and are as at but by for if in is it no not of on or that the then there)

  extend self
end


How does search work?


                            Indexing documents
                SimpleSearch.index "article1", "Ruby is a language. Java is also a language."
                SimpleSearch.index "article2", "Ruby is a song."
                SimpleSearch.index "article3", "Ruby is a stone."
                SimpleSearch.index "article4", "Java is a language."

                Indexed document article1 with tokens:
                ["ruby", "language", "java", "also", "language"]
                Indexed document article2 with tokens:
                ["ruby", "song"]
                Indexed document article3 with tokens:
                ["ruby", "stone"]
                Indexed document article4 with tokens:
                ["java", "language"]




How does search work?


                                                   Index
                print SimpleSearch::INDEX

                {
                    "ruby"       => ["article1", "article2", "article3"],
                    "language"   => ["article1", "article4"],
                    "java"       => ["article1", "article4"],
                    "also"       => ["article1"],
                    "stone"      => ["article3"],
                    "song"       => ["article2"]
                }




How does search work?


                              Search the index

                SimpleSearch.search "ruby"

                Results for token 'ruby':
                * article1
                * article2
                * article3




How does search work?


                                        Search is ...
                                           Inverted Index
                                        { "ruby": [1,2,3], "language": [1,4] }

                                                         +
                                        Relevance Scoring
                      • How many matching terms does this document contain?
                      • How frequently does each term appear in all your documents?
                      • ... other complicated algorithms.
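
The two questions above correspond roughly to term frequency and inverse document frequency. The following is a minimal sketch of classic TF-IDF ranking for a single term, not the scoring Lucene actually applies, which is more involved. `DOCUMENTS`, `tf`, `idf` and `rank` are illustrative names, and the token lists are the ones SimpleSearch produced earlier.

```ruby
# Tokenized documents, as an analyzer such as SimpleSearch.analyze would produce.
DOCUMENTS = {
  "article1" => %w[ruby language java also language],
  "article2" => %w[ruby song],
  "article3" => %w[ruby stone],
  "article4" => %w[java language]
}

# Term frequency: share of a document's tokens that equal the term.
def tf(term, tokens)
  tokens.count(term).to_f / tokens.size
end

# Inverse document frequency: terms found in fewer documents weigh more.
def idf(term, documents)
  containing = documents.values.count { |tokens| tokens.include?(term) }
  return 0.0 if containing.zero?
  Math.log(documents.size.to_f / containing)
end

# Rank the documents containing the term by tf * idf, best match first.
def rank(term, documents)
  weight = idf(term, documents)
  documents
    .select { |_id, tokens| tokens.include?(term) }
    .map { |id, tokens| [id, tf(term, tokens) * weight] }
    .sort_by { |_id, score| -score }
end

p rank("language", DOCUMENTS).map(&:first)   # => ["article4", "article1"]
```

In this sketch article4 outranks article1 because "language" makes up half of its tokens, even though article1 mentions the term twice.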



ElasticSearch
                       ElasticSearch is an Open Source (Apache 2),
                       Distributed, RESTful, Search Engine built on
                       top of Apache Lucene.
                       http://github.com/elasticsearch/elasticsearch




ElasticSearch


                                    Terminology
                            Relational DB   ElasticSearch
                                Database       Index
                                 Table          Type
                                 Row         Document
                                Column          Field
                                Schema        Mapping
                                 Index      *Everything
                                  SQL        query DSL


ElasticSearch


                                                RESTful
                       # Add document

                       curl -XPUT 'http://localhost:9200/articles/article/1' -d '{ "title": "One" }'

                       # Delete document

                       curl -XDELETE 'http://localhost:9200/articles/article/1'

                       # Search

                       curl -XGET 'http://localhost:9200/articles/_search?q=One'




ElasticSearch


                                 JSON in / JSON out
                       # Query (term queries are not analyzed, so match the lowercased token)
                       curl -XGET 'http://localhost:9200/articles/article/_search' -d '{
                         "query": {
                           "term": { "title": "one" }
                         }
                       }'
                       # Results
                       {
                         "_shards": {
                           "total": 5,
                           "successful": 5,
                           "failed": 0
                         },
                         "hits": {
                           "total": 1,
                           "hits": [{
                             "_index": "articles",
                             "_type": "article",
                             "_id": "1",
                             "_source": { "title": "One", "content": "Ruby is a pink to blood-red colored gemstone." }
                           }]
                         }
                       }
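
On the client side, a response of this shape can be read with Ruby's standard `json` library. This is a minimal sketch: `SAMPLE_RESPONSE` is a trimmed, hand-written stand-in for a real response and `titles_from` is an illustrative helper, not part of any client library.

```ruby
require "json"

# A trimmed response body shaped like the one above (hand-written sample).
SAMPLE_RESPONSE = <<~BODY
  {
    "hits": {
      "total": 1,
      "hits": [
        { "_index": "articles", "_type": "article", "_id": "1",
          "_source": { "title": "One" } }
      ]
    }
  }
BODY

# Extract [id, title] pairs from a search response body.
def titles_from(response_body)
  JSON.parse(response_body)
      .fetch("hits").fetch("hits")
      .map { |hit| [hit["_id"], hit["_source"]["title"]] }
end

p titles_from(SAMPLE_RESPONSE)   # => [["1", "One"]]
```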
ElasticSearch


                                          Distributed
                       The discovery module is responsible for discovering nodes within a
                       cluster, as well as electing a master node.

                       The master node maintains the global cluster state and reacts when
                       nodes join or leave the cluster by reassigning shards.


                                        Automatic Discovery Protocol




                      Node 1               Node 2              Node 3               Node 4
                                                                                     Master

ElasticSearch


                                          Distributed
                       By default, every index is split into 5 shards, and each shard has 1 replica.



                                                     Index A



                            A1           A2             A3             A4            A5          Shards


                            A1’          A2’            A3’           A4’            A5’        Replicas
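
A document is assigned to a shard by hashing its routing value (the document id by default) modulo the number of primary shards. The sketch below illustrates the idea only: CRC32 is a stand-in for ElasticSearch's own hash function, and `shard_for` and `distribute` are illustrative names.

```ruby
require "zlib"

NUMBER_OF_SHARDS = 5  # the default mentioned above

# Pick a primary shard for a document id.
# Assumption: CRC32 stands in for the hash ElasticSearch really uses.
def shard_for(document_id, number_of_shards = NUMBER_OF_SHARDS)
  Zlib.crc32(document_id.to_s) % number_of_shards
end

# Group document ids by the shard they would land on.
def distribute(document_ids, number_of_shards = NUMBER_OF_SHARDS)
  document_ids.group_by { |id| shard_for(id, number_of_shards) }
end

distribute(1..20).sort.each do |shard, ids|
  puts "A#{shard + 1}: #{ids.inspect}"
end
```

Because the shard number depends on the shard count, the number of primary shards is fixed when the index is created, while the number of replicas can be changed later.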




ElasticSearch


                                       Query DSL
                    Queries                 Filters
                      - query_string            - term
                      - term                    - query
                      - wildcard                - range
                      - boosting                - bool
                      - bool                    - and
                      - filtered                 - or
                      - fuzzy                   - not
                      - range                   - limit
                      - geo_shape               - match_all
                      - ...                     - ...


ElasticSearch


                                Query DSL
                    Queries                      Filters
                    (with relevance,             (without relevance,
                     without cache)               with cache)
                      - query_string               - term
                      - term                       - query
                      - wildcard                   - range
                      - boosting                   - bool
                      - bool                       - and
                      - filtered                   - or
                      - fuzzy                      - not
                      - range                      - limit
                      - geo_shape                  - match_all
                      - ...                        - ...
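
A query and a filter from the two lists can be combined with the `filtered` query, so that only the query part contributes to relevance. The sketch below builds such a request body as a Ruby hash and prints it as JSON. `filtered_query` is an illustrative helper, the field name and values are hypothetical, and the syntax is that of the ElasticSearch versions this deck targets.

```ruby
require "json"

# Build a "filtered" query body: the query part is scored, the filter part
# only includes or excludes documents.
# Assumption: a "tags" field exists; the values are sample data.
def filtered_query(text, tag)
  {
    "query" => {
      "filtered" => {
        "query"  => { "query_string" => { "query" => text } },
        "filter" => { "term" => { "tags" => tag } }
      }
    }
  }
end

puts JSON.pretty_generate(filtered_query("ruby", "foo"))
```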


ElasticSearch


                                                           Facets
                 curl -X DELETE "http://localhost:9200/articles"
                 curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One", "tags" : ["foo"]}'
                 curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two", "tags" : ["foo", "bar"]}'
                 curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'

                 curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
                  {
                     "query" : { "query_string" : {"query" : "T*"} },
                     "facets" : {
                       "tags" : { "terms" : {"field" : "tags"} }
                     }
                  }'




ElasticSearch


                                          Facets
                 "facets" : {
                   "tags" : {
                     "_type" : "terms",
                     "missing" : 0,
                     "total": 5,
                     "other": 0,
                     "terms" : [ {
                       "term" : "foo",
                       "count" : 2
                     }, {
                       "term" : "bar",
                       "count" : 2
                     }, {
                       "term" : "baz",
                       "count" : 1
                     }]
                   }
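
The counts in this result can be reproduced by hand: the "T*" query matches "Two" and "Three", and the terms facet counts the tags of only those documents. The following is a minimal sketch of that counting in plain Ruby; `ARTICLES` and `terms_facet` are illustrative names, and the prefix check is a stand-in for the real query.

```ruby
# The three documents indexed in the facets example.
ARTICLES = [
  { "title" => "One",   "tags" => ["foo"] },
  { "title" => "Two",   "tags" => ["foo", "bar"] },
  { "title" => "Three", "tags" => ["foo", "bar", "baz"] }
]

# Count how often each value of the field occurs in the given documents,
# most frequent first; ties keep the order in which terms were first seen.
def terms_facet(documents, field)
  counts = Hash.new(0)
  documents.each { |doc| Array(doc[field]).each { |term| counts[term] += 1 } }
  counts.each_with_index
        .sort_by { |(_term, count), position| [-count, position] }
        .map { |(term, count), _position| { "term" => term, "count" => count } }
end

# Stand-in for the "T*" query: keep the titles starting with "T".
matching = ARTICLES.select { |doc| doc["title"].start_with?("T") }
terms_facet(matching, "tags").each do |entry|
  puts "#{entry['term']}: #{entry['count']}"
end
```

This yields foo 2, bar 2 and baz 1, which add up to the total of 5 shown in the response.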



ElasticSearch


                                                    Mapping
                 curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '
                 {
                   "article": {
                     "properties": {
                       "tags": {
                         "type": "string",
                         "analyzer": "keyword"
                       },
                       "title": {
                         "type": "string",
                         "analyzer": "snowball",
                         "boost": 10.0
                       },
                       "content": {
                         "type": "string",
                         "analyzer": "snowball"
                       }
                     }
                   }
                 }'
                 curl -XGET 'http://localhost:9200/articles/article/_mapping'
ElasticSearch


                                                      Analyzer
                 # Assumes a custom "trigrams" analyzer is defined in the index settings
                 curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '
                 {
                   "article": {
                     "properties": { "title": { "type": "string", "analyzer": "trigrams" } }
                   }
                 }'
                 curl -XPOST 'localhost:9200/articles/article' -d '{ "title": "cupertino" }'

                  C         u         p         e         r          t         i         n      o

                  C         u         p

                            u         p         e

                                      p         e         r

                                                 .         .         .
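
The sliding window in the diagram can be written in one line of Ruby. This is a minimal sketch of what a trigram tokenizer emits for a single word; `ngrams` is an illustrative helper, not an ElasticSearch or Tire API.

```ruby
# Slide a window of `size` characters over the text and collect each window.
def ngrams(text, size = 3)
  text.each_char.each_cons(size).map(&:join)
end

p ngrams("cupertino")
# => ["cup", "upe", "per", "ert", "rti", "tin", "ino"]
```

Indexing these fragments is what lets a partial input such as "pert" match the full word.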
Tire
                       A rich Ruby API and DSL for the
                       ElasticSearch search engine.
                       http://github.com/karmi/tire/




Tire


                       ActiveRecord Integration
                       # New rails application
                       $ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb

                       # Callback
                       class Article < ActiveRecord::Base
                         include Tire::Model::Search
                         include Tire::Model::Callbacks
                       end

                       # Create an article
                       Article.create :title => "I Love Elasticsearch",
                                  :content => "...",
                                  :author => "Captain Nemo",
                                  :published_on => Time.now

                       # Search
                       Article.search do
                        query             { string 'love' }
                        facet('timeline') { date :published_on, :interval => 'month' }
                        sort              { by :published_on, 'desc' }
                       end


Tire


                       ActiveRecord Integration
                       class Article < ActiveRecord::Base
                         include Tire::Model::Search
                         include Tire::Model::Callbacks
                         # Setting
                         settings :number_of_shards => 3,
                                  :number_of_replicas => 2,
                                  :analysis => {
                                    :analyzer => {
                                      :url_analyzer => {
                                        'tokenizer' => 'lowercase',
                                        'filter' => ['stop', 'url_ngram']
                                      }
                                    }
                                  }

                         # Mapping
                         mapping do
                           indexes :title, :index => :not_analyzed, :boost => 100
                           indexes :content, :analyzer => 'snowball'
                         end
                       end


Reference
                 # github

                 http://github.com/elasticsearch/elasticsearch

                 http://github.com/karmi/tire/

                 # Slides

                 https://speakerdeck.com/kimchy/the-road-to-a-distributed-search-engine

                 https://speakerdeck.com/karmi/elasticsearch-your-data-your-search-euruko-2011

                 https://speakerdeck.com/clintongormley/to-infinity-and-beyond




Thanks


