ElasticSearch with Tire. @AbookYun, Polydice Inc. Wednesday, February 6, 13
ElasticSearch with Tire

An introduction to how a search engine works, using ElasticSearch and Tire.

Transcript of "ElasticSearch with Tire"

  1. ElasticSearch with Tire. @AbookYun, Polydice Inc. Wednesday, February 6, 13
  2. It’s all about Search
     • How does search work?
     • ElasticSearch
     • Tire
  3. How does search work? A collection of articles
     • Article.find(1).to_json
         { title: "One", content: "The ruby is a pink to blood-red colored gemstone." }
     • Article.find(2).to_json
         { title: "Two", content: "Ruby is a dynamic, reflective, general-purpose object-oriented programming language." }
     • Article.find(3).to_json
         { title: "Three", content: "Ruby is a song by English rock band." }
  4. How does search work? How do you search?
       Article.where("content like ?", "%ruby%")
  5. How does search work? The inverted index
       T0 = "it is what it is"
       T1 = "what is it"
       T2 = "it is a banana"

       "a":      {2}
       "banana": {2}
       "is":     {0, 1, 2}
       "it":     {0, 1, 2}
       "what":   {0, 1}

     A term search for the terms "what", "is" and "it":
       {0, 1} ∩ {0, 1, 2} ∩ {0, 1, 2} = {0, 1}
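The inverted index on this slide can be sketched in a few lines of Ruby. This is only an illustration of the idea, not how Lucene stores its index: the helper names `build_index` and `term_search` are made up for this example, and splitting on whitespace is an assumption.

```ruby
require 'set'

# The three example texts from the slide, keyed by document number.
TEXTS = {
  0 => "it is what it is",
  1 => "what is it",
  2 => "it is a banana"
}

# Map every token to the set of documents that contain it.
# (Hypothetical helper; tokens are split on whitespace only.)
def build_index(texts)
  index = Hash.new { |hash, token| hash[token] = Set.new }
  texts.each do |doc_id, text|
    text.split.each { |token| index[token] << doc_id }
  end
  index
end

# A term search intersects the posting sets of all query terms.
def term_search(index, *terms)
  terms.map { |term| index.fetch(term, Set.new) }.reduce(:&)
end

index = build_index(TEXTS)
p term_search(index, "what", "is", "it").to_a.sort  # => [0, 1]
```

Looking up each term is a single hash access, so the cost of a search depends on the number of query terms and the size of their posting sets rather than on scanning every document, which is what the `like` query has to do.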
  6. How does search work? The inverted index
       TOKEN         ARTICLES
       ruby          article_1, article_2, article_3
       pink          article_1
       gemstone      article_1
       dynamic       article_2
       reflective    article_2
       programming   article_2
       song          article_3
       english       article_3
       rock          article_3
  7. How does search work? The inverted index: Article.search("ruby")
       ruby          article_1, article_2, article_3
       pink          article_1
       gemstone      article_1
       dynamic       article_2
       reflective    article_2
       programming   article_2
       song          article_3
       english       article_3
       rock          article_3
  8. How does search work? The inverted index: Article.search("song")
       ruby          article_1, article_2, article_3
       pink          article_1
       gemstone      article_1
       dynamic       article_2
       reflective    article_2
       programming   article_2
       song          article_3
       english       article_3
       rock          article_3
  9. module SimpleSearch

       # Analyze the content and store the resulting tokens for the document.
       def index document, content
         tokens = analyze content
         store document, tokens
         puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
       end

       def analyze content
         # Split content by words into "tokens"
         content.split(/\W/).
         # Downcase every word
         map { |word| word.downcase }.
         # Reject stop words, digits and whitespace
         reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
       end

       # Add the document to the list of documents for each token.
       def store document_id, tokens
         tokens.each do |token|
           ((INDEX[token] ||= []) << document_id).uniq!
         end
       end

       # Print every document indexed under the token (nothing if it is unknown).
       def search token
         puts "Results for token #{token}:"
         (INDEX[token] || []).each { |document| puts " * #{document}" }
       end

       INDEX = {}

       STOPWORDS = %w(a an and are as at but by for if in is it no not of on or that the then there)

       extend self

     end
  10. How does search work? Indexing documents
        SimpleSearch.index "article1", "Ruby is a language. Java is also a language."
        SimpleSearch.index "article2", "Ruby is a song."
        SimpleSearch.index "article3", "Ruby is a stone."
        SimpleSearch.index "article4", "Java is a language."
  11. How does search work? Indexing documents
        SimpleSearch.index "article1", "Ruby is a language. Java is also a language."
        SimpleSearch.index "article2", "Ruby is a song."
        SimpleSearch.index "article3", "Ruby is a stone."
        SimpleSearch.index "article4", "Java is a language."

        Indexed document article1 with tokens:
        ["ruby", "language", "java", "also", "language"]
        Indexed document article2 with tokens:
        ["ruby", "song"]
        Indexed document article3 with tokens:
        ["ruby", "stone"]
        Indexed document article4 with tokens:
        ["java", "language"]
  12. How does search work? Index
        print SimpleSearch::INDEX

        {
          "ruby"     => ["article1", "article2", "article3"],
          "language" => ["article1", "article4"],
          "java"     => ["article1", "article4"],
          "also"     => ["article1"],
          "stone"    => ["article3"],
          "song"     => ["article2"]
        }
  13. How does search work? Search the index
        SimpleSearch.search "ruby"

        Results for token 'ruby':
         * article1
         * article2
         * article3
  14. How does search work? Search is ...
      Inverted Index
        { "ruby": [1,2,3], "language": [1,4] }
      +
      Relevance Scoring
      • How many matching terms does this document contain?
      • How frequently does each term appear in all your documents?
      • ... other complicated algorithms.
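One classic way to turn the two questions on this slide into a number is TF-IDF. The sketch below uses the textbook formula only; it is not the scoring that ElasticSearch or Lucene actually applies. The documents reuse the tokens from the SimpleSearch example, and the helpers `tf`, `idf`, `tf_idf` and `rank` are hypothetical names for this illustration.

```ruby
# Tokens per document, as produced by the SimpleSearch example.
DOCS = {
  "article1" => %w[ruby language java also language],
  "article2" => %w[ruby song],
  "article3" => %w[ruby stone],
  "article4" => %w[java language]
}

# Term frequency: the share of this document's tokens that are the term.
def tf(term, tokens)
  tokens.count(term).to_f / tokens.size
end

# Inverse document frequency: terms found in fewer documents weigh more.
def idf(term, docs)
  containing = docs.values.count { |tokens| tokens.include?(term) }
  return 0.0 if containing.zero?
  Math.log(docs.size.to_f / containing)
end

def tf_idf(term, doc_id, docs)
  tf(term, docs[doc_id]) * idf(term, docs)
end

# Rank the documents for a single term, best match first.
def rank(term, docs)
  docs.keys
      .map { |id| [id, tf_idf(term, id, docs)] }
      .select { |_id, score| score > 0 }
      .sort_by { |_id, score| -score }
end

rank("language", DOCS).each { |id, score| puts "#{id}: #{score.round(3)}" }
```

With this formula "article4" ranks above "article1" for "language": the term makes up half of the shorter document but only two fifths of the longer one.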
  15. ElasticSearch
      ElasticSearch is an Open Source (Apache 2), Distributed, RESTful Search Engine built on top of Apache Lucene.
      http://github.com/elasticsearch/elasticsearch
  16. ElasticSearch: Terminology
        Relational DB   ElasticSearch
        Database        Index
        Table           Type
        Row             Document
        Column          Field
        Schema          Mapping
        Index           *Everything
        SQL             query DSL
  17. ElasticSearch: RESTful
        # Add document
        curl -XPUT 'http://localhost:9200/articles/article/1' -d '{ "title": "One" }'

        # Delete document
        curl -XDELETE 'http://localhost:9200/articles/article/1'

        # Search
        curl -XGET 'http://localhost:9200/articles/_search?q=One'
  18. ElasticSearch: JSON in / JSON out
        # Query
        curl -XGET 'http://localhost:9200/articles/article/_search' -d '{
          "query": {
            "term": { "title": "One" }
          }
        }'

        # Results
        {
          "_shards": { "total": 5, "successful": 5, "failed": 0 },
          "hits": {
            "total": 1,
            "hits": [{
              "_index": "articles",
              "_type": "article",
              "_id": "1",
              "_source": { "title": "One", "content": "Ruby is a pink to blood-red colored gemstone." }
            }]
          }
        }
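The same "JSON in / JSON out" exchange can be sketched from Ruby with only the standard library. This is a rough illustration, not part of the talk: the helper names `build_search_request` and `run_search` are made up, the host and path are copied from the curl example, and POST is used instead of GET as an assumption (the `_search` endpoint is also called with POST on the Facets slide).

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) a search request carrying a term query.
# The default URL mirrors the curl example from the slide.
def build_search_request(field, value, base = 'http://localhost:9200/articles/article/_search')
  uri = URI.parse(base)
  request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
  request.body = JSON.generate(query: { term: { field => value } })
  [uri, request]
end

# Send the request and parse the JSON response into a Hash.
# This needs a running ElasticSearch node, so it is not called here.
def run_search(uri, request)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end

uri, request = build_search_request('title', 'One')
puts request.body  # the JSON document that goes "in"
```

Against a live node, `run_search(uri, request)["hits"]["hits"]` would be the array of matching documents shown under "Results".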
  19. ElasticSearch: Distributed
      The discovery module is responsible for discovering nodes within a cluster, as well as electing a master node.
      The responsibility of the master node is to maintain the global cluster state, and to act if nodes join or leave the cluster by reassigning shards.
      [Diagram: Automatic Discovery Protocol; Node 1, Node 2, Node 3, Node 4; Master]
  20. ElasticSearch: Distributed
      By default, every index is split into 5 shards, and each shard is duplicated in 1 replica.
        Index A
        Shards:    A1   A2   A3   A4   A5
        Replicas:  A1'  A2'  A3'  A4'  A5'
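The arithmetic behind these defaults, and the general idea of spreading documents over shards, can be sketched as follows. The constants come from the slide; everything else is an assumption made for illustration. In particular, `shard_for` uses CRC32 as a stand-in hash, which is not the function ElasticSearch uses internally.

```ruby
require 'zlib'

NUMBER_OF_SHARDS   = 5  # default from the slide
NUMBER_OF_REPLICAS = 1  # default from the slide

# Every primary shard plus all of its replicas.
def total_shard_copies(shards, replicas)
  shards * (1 + replicas)
end

# Pick a primary shard (0-based) for a document id by hashing the id
# and taking it modulo the number of shards.
# CRC32 is a stand-in chosen for this sketch only.
def shard_for(document_id, shards = NUMBER_OF_SHARDS)
  Zlib.crc32(document_id.to_s) % shards
end

puts total_shard_copies(NUMBER_OF_SHARDS, NUMBER_OF_REPLICAS)  # => 10
%w[1 2 3].each { |id| puts "document #{id} -> shard A#{shard_for(id) + 1}" }
```

Because the shard is derived from the id modulo the shard count, the same id always lands on the same shard as long as the number of shards stays fixed.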
  21. ElasticSearch: Query DSL
        Queries           Filters
        - query_string    - term
        - term            - query
        - wildcard        - range
        - boosting        - bool
        - bool            - and
        - filtered        - or
        - fuzzy           - not
        - range           - limit
        - geo_shape       - match_all
        - ...             - ...
  22. ElasticSearch: Query DSL
        Queries           Filters
        - query_string    - term
        - term            - query
        - wildcard        - range
        - boosting        - bool
        - bool            - and
        - filtered        - or
        - fuzzy           - not
        - range           - limit
        - geo_shape       - match_all
        - ...             - ...

      Queries: With Relevance, Without Cache
      Filters: With Cache, Without Relevance
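Queries and filters can be combined with the `filtered` query from the list above, so that the scored full-text part and the cacheable yes/no part each do what they are good at. The sketch below only builds the request body as a Ruby Hash. The helper name `filtered_search`, the search text and the `tags` field are assumptions for this example, and the body shape follows the Query DSL of the ElasticSearch versions current at the time of the talk.

```ruby
require 'json'

# Build a search body that scores documents with a query_string query
# and narrows them down with an unscored term filter.
def filtered_search(text, filter_field, filter_value)
  {
    query: {
      filtered: {
        query:  { query_string: { query: text } },           # with relevance
        filter: { term: { filter_field => filter_value } }   # without relevance
      }
    }
  }
end

puts JSON.pretty_generate(filtered_search('ruby', 'tags', 'foo'))
```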
  23. ElasticSearch: Facets
        curl -X DELETE "http://localhost:9200/articles"
        curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One", "tags" : ["foo"]}'
        curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two", "tags" : ["foo", "bar"]}'
        curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'

        curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
        {
          "query" : { "query_string" : {"query" : "T*"} },
          "facets" : {
            "tags" : { "terms" : {"field" : "tags"} }
          }
        }'
  24. ElasticSearch: Facets
        "facets" : {
          "tags" : {
            "_type" : "terms",
            "missing" : 0,
            "total": 5,
            "other": 0,
            "terms" : [
              { "term" : "foo", "count" : 2 },
              { "term" : "bar", "count" : 2 },
              { "term" : "baz", "count" : 1 }
            ]
          }
        }
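What the terms facet counts can be reproduced in plain Ruby on the three example articles. This is a rough stand-in to show where the numbers come from, not how ElasticSearch computes facets. The helper `terms_facet` is a made-up name, and the query `T*` is approximated by a simple prefix check on the title.

```ruby
ARTICLES = [
  { title: 'One',   tags: %w[foo] },
  { title: 'Two',   tags: %w[foo bar] },
  { title: 'Three', tags: %w[foo bar baz] }
]

# Count how often each term occurs in the given field of the documents,
# most frequent first; ties keep the order in which terms were first seen.
def terms_facet(documents, field)
  counts = Hash.new(0)
  documents.each { |doc| doc[field].each { |term| counts[term] += 1 } }
  counts.sort_by.with_index { |(_term, count), position| [-count, position] }
        .map { |term, count| { term: term, count: count } }
end

# Only "Two" and "Three" match the query "T*".
matching = ARTICLES.select { |doc| doc[:title].start_with?('T') }
facet = terms_facet(matching, :tags)

p facet
puts "total: #{facet.sum { |entry| entry[:count] }}"
```

The facet is computed over the documents that match the query, which is why "foo" is counted twice rather than three times.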
  25. ElasticSearch: Mapping
        curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '
        {
          "article": {
            "properties": {
              "tags": {
                "type": "string",
                "analyzer": "keyword"
              },
              "title": {
                "type": "string",
                "analyzer": "snowball",
                "boost": 10.0
              },
              "content": {
                "type": "string",
                "analyzer": "snowball"
              }
            }
          }
        }'

        curl -XGET 'http://localhost:9200/articles/article/_mapping'
  26. ElasticSearch: Analyzer
        curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '
        {
          "article": {
            "properties": {
              "title": {
                "type": "string",
                "analyzer": "trigrams"
              }
            }
          }
        }'

        curl -XPOST 'localhost:9200/articles/article' -d '{ "title": "cupertino" }'

      [Diagram: "Cupertino" split into trigrams: Cup, upe, per, ...]
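The trigram split shown in the diagram can be sketched in Ruby by sliding a three-character window over the word. This only illustrates the idea of an n-gram analyzer; the helper name `ngrams` and the lowercasing step are assumptions of this example.

```ruby
# Slide a window of n characters over the text and collect every n-gram.
def ngrams(text, n = 3)
  text.downcase.chars.each_cons(n).map(&:join)
end

p ngrams('cupertino')
# => ["cup", "upe", "per", "ert", "rti", "tin", "ino"]
```

Indexing these fragments instead of whole words is what lets a search for part of a word, such as "pert", still find the document.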
  27. Tire
      A rich Ruby API and DSL for the ElasticSearch search engine.
      http://github.com/karmi/tire/
  28. Tire: ActiveRecord Integration
        # New Rails application
        $ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb

        # Callbacks
        class Article < ActiveRecord::Base
          include Tire::Model::Search
          include Tire::Model::Callbacks
        end

        # Create an article
        Article.create :title => "I Love Elasticsearch",
                       :content => "...",
                       :author => "Captain Nemo",
                       :published_on => Time.now

        # Search
        Article.search do
          query { string 'love' }
          facet('timeline') { date :published_on, :interval => 'month' }
          sort { by :published_on, 'desc' }
        end
  29. Tire: ActiveRecord Integration
        class Article < ActiveRecord::Base
          include Tire::Model::Search
          include Tire::Model::Callbacks

          # Setting
          settings :number_of_shards => 3,
                   :number_of_replicas => 2,
                   :analysis => {
                     :analyzer => {
                       :url_analyzer => {
                         'tokenizer' => 'lowercase',
                         'filter' => ['stop', 'url_ngram']
                       }
                     }
                   }

          # Mapping
          mapping do
            indexes :title, :analyzer => :not_analyzer, :boost => 100
            indexes :content, :analyzer => 'snowball'
          end
        end
  30. Reference
        # github
        http://github.com/elasticsearch/elasticsearch
        http://github.com/karmi/tire/

        # Slides
        https://speakerdeck.com/kimchy/the-road-to-a-distributed-search-engine
        https://speakerdeck.com/karmi/elasticsearch-your-data-your-search-euruko-2011
        https://speakerdeck.com/clintongormley/to-infinity-and-beyond
  31. Thanks