08. ElasticSearch : Sorting and Relevance

Published in: Data & Analytics

  1. ElasticSearch: Sorting and Relevance (http://elastic.openthinklabs.com/)
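  2. Sorting

     A filter-only query does not compute relevance, so every matching document gets the same _score: 0 with a bool filter, 1 with constant_score.

     GET /_search
     {
       "query": {
         "bool": {
           "filter": { "term": { "user_id": 1 } }
         }
       }
     }

     GET /_search
     {
       "query": {
         "constant_score": {
           "filter": { "term": { "user_id": 1 } }
         }
       }
     }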
  3. Sorting: Sorting by Field Values

     GET /_search
     {
       "query": {
         "bool": {
           "filter": { "term": { "user_id": 1 } }
         }
       },
       "sort": { "date": { "order": "desc" } }
     }

     Response (excerpt):

     "hits": {
       "total": 6,
       "max_score": null,
       "hits": [ {
         "_index": "us",
         "_type": "tweet",
         "_id": "14",
         "_score": null,
         "_source": {
           "date": "2014-09-24",
           ...
         },
         "sort": [ 1411516800000 ]
       },
       ...
     }
  4. Sorting: Multilevel Sorting

     GET /_search
     {
       "query": {
         "bool": {
           "must": { "match": { "tweet": "manage text search" } },
           "filter": { "term": { "user_id": 2 } }
         }
       },
       "sort": [
         { "date": { "order": "desc" } },
         { "_score": { "order": "desc" } }
       ]
     }
  6. Sorting: Sorting on Multivalue Fields

     "sort": {
       "dates": {
         "order": "asc",
         "mode": "min"
       }
     }
  7. String Sorting and Multifields

     "tweet": {
       "type": "string",
       "analyzer": "english"
     }

     "tweet": {
       "type": "string",
       "analyzer": "english",
       "fields": {
         "raw": {
           "type": "string",
           "index": "not_analyzed"
         }
       }
     }

     GET /_search
     {
       "query": { "match": { "tweet": "elasticsearch" } },
       "sort": "tweet.raw"
     }
  8. What Is Relevance?
     ● The standard similarity algorithm described here is term frequency/inverse document frequency (TF/IDF). It takes the following factors into account:
       ● Term frequency: How often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
       ● Inverse document frequency: How often does each term appear in the index? The more often, the less relevant. Terms that appear in many documents have a lower weight than more-uncommon terms.
       ● Field-length norm: How long is the field? The longer it is, the less likely it is that words in the field will be relevant. A term appearing in a short title field carries more weight than the same term appearing in a long content field.
  9. What Is Relevance? Understanding the Score

     GET /_search?explain
     {
       "query": { "match": { "tweet": "honeymoon" } }
     }
  10. What Is Relevance? Understanding Why a Document Matched

      GET /us/tweet/12/_explain
      {
        "query": {
          "bool": {
            "filter": { "term": { "user_id": 2 } },
            "must": { "match": { "tweet": "honeymoon" } }
          }
        }
      }

      Response (excerpt):

      "failure to match filter: cache(user_id:[2 TO 2])"
  11. Doc Values Intro
      ● Doc values are used in several places in Elasticsearch:
        ● Sorting on a field
        ● Aggregations on a field
        ● Certain filters (for example, geolocation filters)
        ● Scripts that refer to fields
  12. References
      ● Elasticsearch: The Definitive Guide. A Distributed Real-Time Search and Analytics Engine, Clinton Gormley & Zachary Tong, O'Reilly
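
The ordering rules on the Multilevel Sorting and Sorting on Multivalue Fields slides can be mimicked in plain Python. This is only a sketch of the semantics, not how Elasticsearch implements sorting. The sample documents, the `_id` values, and the helper names `multilevel_sort` and `sort_by_multivalue` are made up for illustration.

```python
from datetime import date

# Hypothetical in-memory hits; field names mirror the slides, values are invented.
hits = [
    {"_id": "a", "date": date(2014, 9, 22), "_score": 0.30},
    {"_id": "b", "date": date(2014, 9, 24), "_score": 0.12},
    {"_id": "c", "date": date(2014, 9, 24), "_score": 0.45},
]

# Hypothetical documents with a multivalue "dates" field.
events = [
    {"_id": "x", "dates": [date(2014, 9, 20), date(2014, 9, 30)]},
    {"_id": "y", "dates": [date(2014, 9, 25)]},
]


def multilevel_sort(docs):
    """Order by date descending; ties on date are broken by _score descending."""
    return sorted(docs, key=lambda d: (d["date"], d["_score"]), reverse=True)


def sort_by_multivalue(docs, field, mode=min):
    """Reduce each document's list of values to a single key, then sort ascending.

    `mode` plays the role of the "mode" sort option (min or max here).
    """
    return sorted(docs, key=lambda d: mode(d[field]))


if __name__ == "__main__":
    print([d["_id"] for d in multilevel_sort(hits)])
    print([d["_id"] for d in sort_by_multivalue(events, "dates", mode=min)])
    print([d["_id"] for d in sort_by_multivalue(events, "dates", mode=max)])
```

The second sort level only matters when the first level ties, which is why documents `b` and `c` (same date) are the only pair whose order depends on `_score`.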

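
The three relevance factors from the What Is Relevance? slide can be made concrete with a small Python sketch. The slides give no formulas, so the ones below are an assumption: they follow the classic Lucene TF/IDF definitions as presented in the referenced guide. Real scores combine further factors and will not match these numbers exactly. The function names are illustrative only.

```python
import math


def term_frequency(freq):
    """Assumed classic formula: square root of the term's count in the field."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    """Assumed classic formula: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """Assumed classic formula: inverse square root of the field's term count."""
    return 1 / math.sqrt(num_terms)


if __name__ == "__main__":
    # More mentions of a term in a field give a higher term frequency factor.
    print(term_frequency(5) > term_frequency(1))
    # A term found in fewer documents gets a higher weight than a common one.
    print(inverse_document_frequency(1000, 9) > inverse_document_frequency(1000, 99))
    # A short title field outweighs a long content field for the same term.
    print(field_length_norm(4) > field_length_norm(400))
```

Each comparison mirrors one bullet on the slide: more mentions raise the weight, more widespread terms lower it, and longer fields lower it.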