Using Elasticsearch to Build Site Search

Dmitry Zhlobo, Ruby and Rails Developer at Twinslash

"Using Elasticsearch to Build Site Search"

Building high-quality site search is a hard, nontrivial problem. In his talk, Dmitry explains how to solve it with Elasticsearch.

The talk covers how Elasticsearch works with text and other data, from analyzing and indexing documents to search and aggregation. Step by step and with examples, it shows how to configure search that accounts for, among other things, the morphology and phonetics of Russian. Dmitry also explains how to use all of this in Ruby applications, how to organize adding documents to the index, and more.

Using Elasticsearch to Build Site Search: Presentation Transcript

  • elasticsearch
  • me ● Dmitry Zhlobo ● Developer at Twinslash ● email: dima.zhlobo@gmail.com ● skype: dima.zhlobo ● github: proghat ● twitter: @proghat
  • search is hard ● speed vs. relevancy ● real time ● different kinds of data
  • usual approach ● SELECT * FROM posts WHERE `body` LIKE '%query%' ● gem 'thinking-sphinx' … Article.search(params[:q])
  • how search works? ● document 1: flexible and powerful open source, distributed real-time search and analytics engine for the cloud... ● document 2: Apache Mahout has implementations of a wide range of machine learning and data mining... ● document 3: Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the MapReduce...
  • how search works? ● [diagram: an inverted index in which each term (data, mapreduce, learning, classification, recommenders, analysis) points to the numbers of the documents (1, 2, 3) that contain it]
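The term-to-documents diagram can be sketched in a few lines of Python. This is only an illustration of the inverted-index idea, not how Elasticsearch or Lucene implement it; the three miniature documents and the function names (`tokenize`, `build_inverted_index`, `search`) are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted IDs of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return the IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical miniature documents, not the full texts from the slides.
docs = {
    1: "distributed real-time search and analytics engine",
    2: "machine learning and data mining",
    3: "clustering and classification on top of MapReduce",
}
index = build_inverted_index(docs)
print(index["and"])                    # IDs of all documents containing "and"
print(search(index, "data mining"))    # documents containing both terms
```

A lookup touches only the posting lists of the query terms, which is why this structure scales better than scanning every row with `LIKE '%query%'`.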
  • elasticsearch
  • elasticsearch ● full-text search ● real-time data ● open source ● RESTful API ● distributed ● schema-free and document-oriented
  • analysis ● input: Flexible and powerful <strong>search</strong> engine
  • analysis: char filters ● available: Mapping, HTML Strip, Pattern Replace ● output: Flexible and powerful search engine
  • analysis: tokenizer ● available: Standard, Whitespace, Letter, Lowercase, Keyword, Pattern, NGram, Edge NGram, Path Hierarchy ● output: Flexible, and, powerful, search, engine
  • analysis: token filters ● available: Stop, Lowercase, Snowball, Synonym, Trim, Unique, Normalization, Stemmer, Shingle, Truncate, Reverse ● output: flexible, powerful, search, engine
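The three analysis stages can be mimicked with plain functions. The sketch below is a simplified stand-in for the HTML Strip char filter, the Whitespace tokenizer, and the Lowercase and Stop token filters; the function names and the tiny stop-word list are assumptions made for the example, not Elasticsearch code.

```python
import re

# Assumed tiny stop-word list; real stop filters ship much longer ones.
STOP_WORDS = {"a", "an", "and", "or", "the"}


def html_strip(text):
    """Char filter: remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the text on runs of whitespace."""
    return text.split()


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def stop(tokens):
    """Token filter: drop stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run char filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "Flexible and powerful <strong>search</strong> engine",
    char_filters=[html_strip],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase, stop],
)
print(tokens)  # ['flexible', 'powerful', 'search', 'engine']
```

Order matters here: `stop` runs after `lowercase`, so a capitalized "And" would also be removed.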
  • russian morphology ● text: “Он отлично рассказал о лучшем поисковом движке” (“He gave an excellent talk about the best search engine”) ● lemmas: отлично → отлично, отличный ● рассказал → рассказать ● лучшем → хороший ● поисковом → поисковый ● движке → движок
  • russian morphology ● query: “хороший поисковой движок” (“good search engine”) ● lemmas: хороший, поисковый, движок ● every query lemma also appears among the lemmas of the text, so the text matches
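The matching logic of this example can be sketched with a hand-written lemma table. The `LEMMAS` dictionary below is hypothetical and covers only the words from the slide; a real morphology plugin derives lemmas from a full dictionary.

```python
# Hypothetical word-form -> lemmas table covering only the slide's example.
LEMMAS = {
    "отлично": {"отлично", "отличный"},
    "рассказал": {"рассказать"},
    "лучшем": {"хороший"},
    "поисковом": {"поисковый"},
    "поисковой": {"поисковый"},
    "движке": {"движок"},
}


def lemmatize(text):
    """Return the set of lemmas for all words; unknown words map to themselves."""
    lemmas = set()
    for word in text.lower().split():
        lemmas |= LEMMAS.get(word, {word})
    return lemmas


document = "Он отлично рассказал о лучшем поисковом движке"
query = "хороший поисковой движок"

# Without morphology, the raw query words are not all present in the text.
raw_match = set(query.split()) <= set(document.lower().split())

# With morphology, both sides are reduced to lemmas before comparison.
lemma_match = lemmatize(query) <= lemmatize(document)

print(raw_match, lemma_match)  # False True
```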
  • phonetic analysis ● Eyjafjallajökull, Eiyafyalayokul → iofiolDkul
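To show the idea of phonetic matching, here is a deliberately crude key function. It is a toy invented for this example and is not the algorithm that produced the code on the slide: it strips diacritics, treats "j" as "y", drops vowels, and collapses repeated consonants.

```python
import unicodedata

VOWELS = set("aeiou")


def phonetic_key(word):
    """Toy phonetic key: consonant skeleton with j read as y."""
    # NFKD splits "ö" into "o" plus a combining mark, which isalpha() rejects.
    decomposed = unicodedata.normalize("NFKD", word.lower())
    letters = [c for c in decomposed if c.isalpha()]
    key = []
    for c in letters:
        if c == "j":
            c = "y"
        if c in VOWELS:
            continue
        # Collapse repeats in the consonant skeleton.
        if not key or key[-1] != c:
            key.append(c)
    return "".join(key)


print(phonetic_key("Eyjafjallajökull"))  # yfylykl
print(phonetic_key("Eiyafyalayokul"))    # yfylykl
```

Both spellings reduce to the same key, so indexing the key instead of the raw word lets a misspelled query find the document.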
  • analysis ● analyzer = char filters + tokenizer + token filters ● has name ● reusable
  • analysis ● a custom analyzer that combines char filters, a tokenizer, and token filters:
    analysis:
      analyzer:
        rus_morphology:
          type: "custom"
          char_filter: ["html_strip"]
          tokenizer: "standard"
          filter: ["lowercase", "russian_morphology", "stopwords"]
  • getting started
    # Add document to index
    curl -XPOST "localhost:9200/posts/post/1" -d '{ "title": "Search" }'
    curl -XPOST "localhost:9200/posts/post/whatever" -d '{ "title": "ES" }'
    # Search documents
    curl -XGET "localhost:9200/posts/_search?q=data"
    curl -XGET "localhost:9200/posts/_search?q=title:elasticsearch"
    # Update and delete documents
    curl -XPUT "localhost:9200/posts/post/1" -d '{ "title": "Data" }'
    curl -XDELETE "localhost:9200/posts/post/whatever"
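The same calls can be issued from a program. The sketch below builds the requests with Python's standard `urllib`; the helper names are invented for the example, the base URL assumes a local node as on the slide, and the URL layout follows the index/type/id scheme used by the curl commands. Nothing is sent until `send` is called.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9200"  # assumed local node, as on the slide


def index_request(index, doc_type, doc_id, doc):
    """Build a request that adds (or replaces) a document under the given ID."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_request(index, query):
    """Build a URI search request; the query string is percent-encoded."""
    return Request(f"{BASE_URL}/{index}/_search?q={quote(query)}", method="GET")


def delete_request(index, doc_type, doc_id):
    """Build a request that deletes a document by ID."""
    return Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="DELETE")


def send(request):
    """Send a request and decode the JSON response (needs a running node)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = index_request("posts", "post", 1, {"title": "Search"})
print(request.get_method(), request.full_url)
```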
  • mapping ● sample document:
    {
      "repository": "elasticsearch/elasticsearch",
      "description": "Open Source, Distributed, RESTful Search Engine",
      "maintainer": { "name": "Shay Banon", "email": "kimchy@gmail.com" },
      "languages": ["Java", "Shell"],
      "created_at": "2010-02-08"
    }
  • mapping: repository ● a string field, boosted, with its own analyzer
    "repository": { "type": "string", "boost": 5, "analyzer": "repo_name" }
    "repo_name": { "tokenizer": "letter", "filter": ["lowercase", "phonetic"] }
  • mapping: description ● a string field analyzed as English text
    "description": { "type": "string", "analyzer": "english_text" }
    "english_text": { "tokenizer": "standard", "filter": ["lowercase", "stopwords", "snowball"] }
  • mapping: maintainer.name ● a nested string field with a phonetic analyzer
    "maintainer": { "properties": { "name": { "type": "string", "analyzer": "phonetic" } } }
    "phonetic": { "tokenizer": "standard", "filter": ["lowercase", "stopwords", "beidermorse"] }
  • mapping: maintainer.email ● a nested string field indexed as a single, unanalyzed term
    "maintainer": { "properties": { "email": { "type": "string", "index": "not_analyzed" } } }
  • mapping: languages ● each language name kept as one lowercased token
    "languages": { "type": "string", "analyzer": "programming_lang" }
    "programming_lang": { "tokenizer": "keyword", "filter": ["lowercase"] }
  • mapping: created_at ● a date field with an explicit format
    "created_at": { "type": "date", "format": "yyyy-MM-dd" }
  • mapping ● create the index with its analysis settings and mappings, then add a document
    curl -XPOST "localhost:9200/repositories" -d '{
      "settings": { "analysis": { "analyzer": { ... }, "filter": { ... } } },
      "mappings": { "repository": { "properties": { ... } } }
    }'
    curl -XPOST "localhost:9200/repositories/repository" -d '{
      "repository": "elasticsearch/elasticsearch",
      "description": "Open Source, Distributed, RESTful Search Engine",
      "maintainer": { "name": "Shay Banon", "email": "kimchy@gmail.com" },
      "languages": ["Java", "Shell"],
      "created_at": "2010-02-08"
    }'
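As a rough sketch, the analyzer and field fragments from the mapping slides can be assembled into one index-creation body and checked before sending. The variable names are invented for the example. The custom token filter definitions are elided on the slide, so they are left empty here; the field name `languages` follows the sample document, and the date pattern is written as year, month, day of month.

```python
import json

# Analyzer definitions as shown on the mapping slides.
analyzers = {
    "repo_name": {"tokenizer": "letter", "filter": ["lowercase", "phonetic"]},
    "english_text": {"tokenizer": "standard",
                     "filter": ["lowercase", "stopwords", "snowball"]},
    "phonetic": {"tokenizer": "standard",
                 "filter": ["lowercase", "stopwords", "beidermorse"]},
    "programming_lang": {"tokenizer": "keyword", "filter": ["lowercase"]},
}

# Field mappings for the "repository" document type.
properties = {
    "repository": {"type": "string", "boost": 5, "analyzer": "repo_name"},
    "description": {"type": "string", "analyzer": "english_text"},
    "maintainer": {"properties": {
        "name": {"type": "string", "analyzer": "phonetic"},
        "email": {"type": "string", "index": "not_analyzed"},
    }},
    "languages": {"type": "string", "analyzer": "programming_lang"},
    "created_at": {"type": "date", "format": "yyyy-MM-dd"},
}

index_body = {
    "settings": {"analysis": {
        "analyzer": analyzers,
        "filter": {},  # elided on the slide; custom filters would be defined here
    }},
    "mappings": {"repository": {"properties": properties}},
}


def referenced_analyzers(props):
    """Collect every analyzer name used by the (possibly nested) field mappings."""
    names = set()
    for field in props.values():
        if "analyzer" in field:
            names.add(field["analyzer"])
        names |= referenced_analyzers(field.get("properties", {}))
    return names


# Sanity check: every analyzer a field refers to must be defined in settings.
missing = referenced_analyzers(properties) - set(analyzers)
payload = json.dumps(index_body, indent=2)
print(sorted(missing))  # []
```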
  • search ● URI search
    curl -XGET "localhost:9200/repositories/repository/_search?q=engine"
  • search ● match query on one field
    curl -XPOST "localhost:9200/repositories/repository/_search" -d '{
      "query": { "match": { "description": "search" } }
    }'
  • search ● response
    "hits": {
      "total": 3,
      "hits": [
        { "_score": 0.22295055, "_source": { "repository": "elasticsearch/elasticsearch" } },
        { "_score": 0.22295055, "_source": { "repository": "ankane/searchkick" } },
        { "_score": 0.095891505, "_source": { "repository": "karmi/tire" } }
      ]
    }
  • search ● match query on all fields
    curl -XPOST "localhost:9200/repositories/repository/_search" -d '{
      "query": { "match": { "_all": "elasticsearch" } }
    }'
  • search ● response
    "hits": {
      "total": 2,
      "hits": [
        { "_score": 5.46875, "_source": { "repository": "elasticsearch/elasticsearch" } },
        { "_score": 0.04746387, "_source": { "repository": "karmi/tire" } }
      ]
    }
  • facets ● terms facet on the languages field
    curl -XPOST "localhost:9200/repositories/repository/_search" -d '{
      "query": { "match": { "_all": "search" } },
      "facets": { "language": { "terms": { "field": "languages" } } }
    }'
  • facets ● response
    "hits": { "total": 2, "hits": [ ... ] },
    "facets": {
      "language": {
        "terms": [
          { "term": "ruby", "count": 2 },
          { "term": "shell", "count": 1 },
          { "term": "java", "count": 1 }
        ]
      }
    }
  • filters ● filtered query: match on all fields, restricted to Java repositories
    curl -XPOST "localhost:9200/repositories/repository/_search" -d '{
      "query": {
        "filtered": {
          "query": { "match": { "_all": "search" } },
          "filter": { "term": { "languages": "java" } }
        }
      }
    }'
  • filters ● response
    "hits": {
      "total": 1,
      "hits": [
        { "_score": 5.46875, "_source": { "repository": "elasticsearch/elasticsearch" } }
      ]
    }
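What a terms facet and a term filter compute can be shown on a small in-memory list. This is a sketch of the semantics only, with no scoring or indexing; the language lists for `ankane/searchkick` and `karmi/tire` are assumptions made for the example, and the helper names are invented.

```python
from collections import Counter

repos = [
    {"repository": "elasticsearch/elasticsearch", "languages": ["Java", "Shell"]},
    {"repository": "ankane/searchkick", "languages": ["Ruby"]},  # assumed
    {"repository": "karmi/tire", "languages": ["Ruby"]},         # assumed
]


def terms_facet(docs, field):
    """Count how many documents carry each (lowercased) term in the field."""
    counts = Counter(term.lower() for doc in docs for term in set(doc[field]))
    return [{"term": term, "count": count} for term, count in counts.most_common()]


def term_filter(docs, field, value):
    """Keep only documents whose field contains the exact (lowercased) term."""
    return [doc for doc in docs if value in {term.lower() for term in doc[field]}]


facet = terms_facet(repos, "languages")
java_only = term_filter(repos, "languages", "java")
print(facet[0])                                   # most frequent term first
print([doc["repository"] for doc in java_only])
```

The filter is a yes/no test per document and does not affect relevance, which is what makes filter results cacheable.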
  • performance and scaling elasticsearch is web scale
  • random facts ● bulk operations ● real time ● highlights ● geo types and geo distance facets ● attachments ● “did you mean?” and completions ● common terms ● filters and caching ● river
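For the bulk operations item, the request body of the bulk API is newline-delimited JSON: an action line followed by the document source, with a trailing newline. The helper below is a minimal sketch that only builds such a payload; the function name and the sample documents are made up, and the `_type` entry matches the typed index/type/id URLs used in this talk.

```python
import json


def bulk_index_payload(index, doc_type, docs):
    """Build a newline-delimited bulk body that indexes each document by ID."""
    lines = []
    for doc_id, doc in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents for illustration.
payload = bulk_index_payload("posts", "post", {
    "1": {"title": "Search"},
    "2": {"title": "ES"},
})
print(payload)
```

Sending one such body to the `_bulk` endpoint replaces many single-document requests with one round trip.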
  • You know. For search.