What Is Relevance?
● The standard similarity algorithm used in Elasticsearch is
term frequency/inverse document frequency (TF/IDF), which
takes the following factors into account:
● Term frequency: How often does the term appear in the
field? The more often, the more relevant. A field containing
five mentions of the same term is more likely to be relevant
than a field containing just one mention.
● Inverse document frequency: How often does each term
appear in the index? The more often, the less relevant.
Terms that appear in many documents have a lower weight
than less common terms.
● Field-length norm: How long is the field? The longer it is,
the less likely it is that words in the field will be relevant. A
term appearing in a short title field carries more weight than
the same term appearing in a long content field.
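The three factors above can be sketched in a few lines of Python. This is a minimal illustration that assumes the classic Lucene-style TF/IDF formulas (square-root term frequency, logarithmic inverse document frequency, inverse square-root field length); the function names are hypothetical, and newer Elasticsearch versions default to BM25, which scores differently.

```python
import math


def term_frequency(freq: int) -> float:
    # Assumed classic formula: square root of the number of times
    # the term appears in the field.
    return math.sqrt(freq)


def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    # Assumed classic formula: terms found in many documents get a
    # lower weight than terms found in few documents.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    # Assumed classic formula: the longer the field, the lower the weight.
    return 1.0 / math.sqrt(num_terms)


def field_weight(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    # Hypothetical helper: combine the three factors into the weight of
    # one term in one field of one document.
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_length_norm(num_terms)
    )


# Same term, same index statistics: a 4-term title vs. a 400-term body.
title_weight = field_weight(freq=1, num_docs=100, doc_freq=5, num_terms=4)
body_weight = field_weight(freq=1, num_docs=100, doc_freq=5, num_terms=400)
print(title_weight > body_weight)
```

The comparison at the end mirrors the title-versus-content example: with identical term and index statistics, the shorter field yields the larger weight.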
Understanding Why a Document Matched
GET /us/tweet/12/_explain
{
  "query" : {
    "bool" : {
      "filter" : { "term" : { "user_id" : 2 }},
      "must" : { "match" : { "tweet" : "honeymoon" }}
    }
  }
}
● When the document does not match, the response includes a
description of the reason:
"failure to match filter: cache(user_id:[2 TO 2])"
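An explain response nests its reasoning as a tree of nodes, each carrying a value, a description, and optional details. The sketch below flattens such a tree into indented lines so the reason for a match or a miss is easier to read. The `sample` dictionary is a hypothetical, hand-written fragment (only the top-level description comes from the slide), and `format_explanation` is an illustrative helper, not part of any client library.

```python
from typing import Any, Dict, List


def format_explanation(node: Dict[str, Any], depth: int = 0) -> List[str]:
    # Render this node, then recurse into its child explanations,
    # indenting two spaces per nesting level.
    lines = ["{}{} : {}".format("  " * depth, node["value"], node["description"])]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# Hypothetical response fragment; the child node is invented for illustration.
sample = {
    "matched": False,
    "explanation": {
        "value": 0.0,
        "description": "failure to match filter: cache(user_id:[2 TO 2])",
        "details": [
            {"value": 0.5, "description": "hypothetical child clause", "details": []},
        ],
    },
}

lines = format_explanation(sample["explanation"])
print("\n".join(lines))
```

In practice the dictionary would come from parsing the JSON body returned by the `_explain` request.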
Doc Values Intro
● Doc values are used in several places in
Elasticsearch:
● Sorting on a field
● Aggregations on a field
● Certain filters (for example, geolocation filters)
● Scripts that refer to fields
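To see why sorting and aggregations rely on doc values, it helps to contrast the two lookup directions. The toy sketch below assumes a simplified model: an inverted index maps a value to the documents containing it, while doc values map a document to its value. The documents, field names, and helper functions are all hypothetical; real doc values are an on-disk columnar structure, not Python dictionaries.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical documents keyed by document ID.
docs = {
    1: {"color": "red", "price": 30},
    2: {"color": "blue", "price": 10},
    3: {"color": "red", "price": 20},
}


def build_inverted_index(documents: Dict[int, Dict[str, Any]], field: str) -> Dict[Any, List[int]]:
    # value -> document IDs: answers "which documents contain this value?"
    index: Dict[Any, List[int]] = {}
    for doc_id, source in documents.items():
        index.setdefault(source[field], []).append(doc_id)
    return index


def build_doc_values(documents: Dict[int, Dict[str, Any]], field: str) -> Dict[int, Any]:
    # document ID -> value: answers "what value does this document have?"
    return {doc_id: source[field] for doc_id, source in documents.items()}


def sort_hits(hits: List[int], values: Dict[int, Any]) -> List[int]:
    # Sorting needs the value of each matching document.
    return sorted(hits, key=lambda doc_id: values[doc_id])


def terms_aggregation(hits: List[int], values: Dict[int, Any]) -> Dict[Any, int]:
    # A terms-style aggregation counts the values of the matching documents.
    return dict(Counter(values[doc_id] for doc_id in hits))


color_index = build_inverted_index(docs, "color")
price_values = build_doc_values(docs, "price")
color_values = build_doc_values(docs, "color")

red_hits = color_index["red"]                 # search: value -> documents
print(sort_hits(red_hits, price_values))      # sort: document -> value
print(terms_aggregation([1, 2, 3], color_values))
```

The search step uses the inverted index, but every later step starts from a document ID and needs its field value, which is the access pattern doc values serve.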
References
● Elasticsearch: The Definitive Guide. A Distributed
Real-Time Search and Analytics Engine, Clinton
Gormley & Zachary Tong, O’Reilly