Explain explained 05/11/14 11:56 
Summary 
1: Getting the right results 
Jettro Coenradie 
This presentation is about getting the right results from elasticsearch. There
are a lot of things you can do to improve the results you get back from
elasticsearch. You will get an introduction to the different kinds of queries
that you can use and the impact of analyzers on results, and we take a deep
dive into the explain functionality. Using the explain functionality you can
find out why one document matches better than another.
http://localhost:9200/_plugin/preso/#/print Page 1 of 42
Returning the right results 
@jettroCoenradie 
2: About me 
how to contact me 
My name is Jettro Coenradie, I am a fellow at Luminis Amsterdam. My
specialty is search solutions and specifically elasticsearch. You can follow
me on Twitter and LinkedIn, and my code is on GitHub.
email jettro.coenradie@luminis.eu 
twitter @jettroCoenradie 
linkedin https://www.linkedin.com/in/jettro 
Github https://github.com/jettro 
Blog http://www.gridshore.nl 
3: The right results 
what are they? 
How would you define the right results? Of course a lot depends on the
context of the question asked. As always, you look up the term on
Wikipedia to get a first explanation.
How would you explain The 
right results? 
4: The right results 
according to wikipedia 
Every presentation has to start with Wikipedia. Too bad there is no page for
the right results, but there is an interesting link to be found. This shows an
excerpt from The Toyota Way: the right process will produce the right results.
This is also true for returning the right results using elasticsearch. During this 
presentation the right process will become clear. 
5: What is elasticsearch? 
more than search 
Before we can start explaining why elasticsearch returns the results that it
does, you first need to know more about what elasticsearch is, what it can
do for you, and some terminology used throughout the remainder of the
presentation. You will learn about structured and unstructured data, data
sources and how we use the data.
6: Lucene 
what we need it for 
Introduce lucene: analyzers create terms, the terms are stored in an
inverted index, and the inverted index is used to search for terms.
Create terms, 
Store terms, 
Search terms. 
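The three steps above can be sketched in a few lines of Python. This is a toy illustration, not how lucene is implemented: `analyze` is a stand-in analyzer that only lowercases and splits on whitespace, and the names `build_index` and `search` are made up for this example.

```python
from collections import defaultdict


def analyze(text):
    """Create terms. Toy analyzer (assumption): lowercase, split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Store terms: map every term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Search terms: ids of documents containing at least one query term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return sorted(hits)


docs = {1: "one two three", 2: "two three", 3: "three"}
index = build_index(docs)
print(search(index, "Two"))
```

Because the query goes through the same analyzer as the documents, searching for "Two" finds documents that contain "two".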
7: Elasticsearch and lucene 
cluster, index, shards, lucene 
Here I explain the different components of an elasticsearch cluster.
I am showing images containing the structure of these components. A
cluster contains multiple nodes. Each node contains shards of multiple
indices. Each shard is a lucene index.
8: Executing a query 
calling all shards 
In this slide I explain what happens when you execute a query. You
will learn that we first execute a query that is sent to all shards by the client.
The results are gathered and merged, and once the right set of documents is
determined, the actual required documents are fetched.
9: Executing a query 
basic concepts 
This slide shows both APIs that elasticsearch provides. You can
execute queries using the Java API or the REST API through one of the available
drivers. No matter what mechanism you choose, you can use a lot of
different queries.
10: Example with curl 
find all docs 
In this slide we present you the most basic match all docs query using curl. 
11: Other query tools 
there are a lot 
Some examples of other query tools that are available. 
12: Execute query 
basic match query 
This is the most basic variant of executing a query. 
GET /slides/_search
{
  "query": {
    "match": {
      "description": "What you type!"
    }
  }
}
Results shown only in the live presentation
13: The calculated score 
use the explain api 
Here we are going to discuss the most basic explain you can get. 
GET /slides/_search?explain
{
  "query": {
    "match": {
      "description": "What you type!"
    }
  }
}
Results shown only in the live presentation
14: Explain query explained 
the basics 
In this slide I show details about the explain basics. It is
important to notice the pattern that every explain output follows for each
term that is matched.
15: Calculating score 
the theory 
In this slide we are going to explain the theory behind calculating the score
using similarity algorithms.
Score is calculated for matching documents (Boolean Model), 
Score represents similarity between search and document terms, 
Lucene uses enhanced TF/IDF (coordination factor and field length), 
Other algorithms can be used: Okapi BM25 
16: Lucene similarity 
formula 
Shows the formula used by lucene to calculate the score. 
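The slide image is not part of these notes. As a reference point, the practical scoring function documented for Lucene's TF/IDF similarity can be written as follows; the notation is adapted for readability, and boost(t) stands for the query-time boost of a term.

```latex
\[
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
\sum_{t \in q}\Big(\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot
\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)
\]
\[
\mathrm{tf}(t,d)=\sqrt{\mathrm{freq}(t,d)},\qquad
\mathrm{idf}(t)=1+\ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1},\qquad
\mathrm{queryNorm}(q)=\frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}}
\]
```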
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/
17: Calculating score
the terms 
This slide gives an overview of the most important definitions for calculating 
the score. 
queryNorm                   Attempt to make different queries comparable
coord                       Factor for the total score based on the number of queried and found terms
Term frequency              Number of times a term is matched in the field
Inverse document frequency  Number of documents that have the term
fieldNorm                   Length of the field the term was found in
boost                       Boost a field score
18: An explain example 
using match query 
In this slide we are going to use a very simple match query with an index
containing only three documents. The goal is to show the effect on term
frequency, inverse document frequency and the fieldNorm with very few
documents in the index.
Show tf/idf/fieldNorm and score
         Doc 1            Doc 2            Doc 3
         one two three    two three        three
one      1 / 1 / 0.5
         0.702
two      1 / 2 / 0.5      1 / 2 / 0.625
         0.5              0.625
three    1 / 3 / 0.5      1 / 3 / 0.625    1 / 3 / 1
         0.356            0.445            0.712
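The numbers in this table can be reproduced with a short calculation. The sketch below assumes lucene's classic TF/IDF definitions (tf is the square root of the frequency, idf is 1 + ln(numDocs / (docFreq + 1))) and reads the middle value in each cell as the document frequency of the term. The fieldNorm values are copied from the table; lucene stores 1/sqrt(number of terms) with reduced precision, which is presumably why three terms give 0.5 and two terms give 0.625. For a single-term query the queryNorm cancels one idf factor, so the score reduces to tf * idf * fieldNorm.

```python
import math

NUM_DOCS = 3
# Document frequency per term for the documents
# "one two three", "two three" and "three".
DOC_FREQ = {"one": 1, "two": 2, "three": 3}
# fieldNorm per document, copied from the table (stored, rounded values).
FIELD_NORM = {"doc1": 0.5, "doc2": 0.625, "doc3": 1.0}


def tf(freq):
    return math.sqrt(freq)


def idf(doc_freq, num_docs=NUM_DOCS):
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def single_term_score(term, doc, freq=1):
    # Single-term query: queryNorm cancels one idf factor.
    return tf(freq) * idf(DOC_FREQ[term]) * FIELD_NORM[doc]


for term, doc in [("one", "doc1"), ("two", "doc2"), ("three", "doc1"), ("three", "doc3")]:
    print(term, doc, round(single_term_score(term, doc), 4))
```

The computed values agree with the table up to rounding in the third decimal. Note how "three", which occurs in every document, gets the lowest idf, and how the shortest document gets the highest fieldNorm.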
19: Explain multiple terms 
with a trick 
Here we are going to show what happens to the results when using capital
letters and multiple terms, and introduce the camel case analyzer.
GET /onetwothree/_search?explain
Results shown only in the live presentation
20: What is an analyzer 
the parts 
Explain what the different components of an analyzer are. 
Character filters Tidy up the string before tokenising. 
Tokeniser Splits the string into a number of tokens 
Token filters Do something with the tokens 
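The three parts can be mimicked in plain Python to see how they chain together. This is only a sketch of the idea: the real analyzer runs inside elasticsearch, and the camel case splitting rule used here (a regular expression) is an assumption for illustration, not the configuration used in the onetwothree index.

```python
import re


def char_filter(text):
    """Character filter: tidy up the string before tokenising (here: trim)."""
    return text.strip()


def tokenizer(text):
    """Tokeniser: split into words, also breaking camel case (OneTwo -> One, Two)."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+|[0-9]+", text)


def token_filter(tokens):
    """Token filter: do something with the tokens (here: lowercase them)."""
    return [token.lower() for token in tokens]


def analyze(text):
    return token_filter(tokenizer(char_filter(text)))


print(analyze(" OneTwoThree "))
```

With this chain, "OneTwoThree" ends up as the same three terms as "one two three", so a query for "two" can match a document that only contains the camel case form.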
21: One Two Three Analyzer 
settings 
Show the settings part of the analyzer as used in the onetwothree sample 
with the camel case. 
GET /onetwothree/_settings 
Results shown only in the live presentation
22: One Two Three Analyzer 
mappings 
Show the mappings part of the analyzer as used in the onetwothree sample 
with the camel case. 
GET /onetwothree/_mappings 
Results shown only in the live presentation
23: One Two Three Analyzer 
analyze api 
Test the analyzer using the analyze api. 
GET /onetwothree/_analyze?analyzer=camel&text=OneTwoThree 
Results shown only in the live presentation
24: Back to explain 
recap single term 
In this slide we return to the explain API. From the image with the real
explain output we derive a short notation.
GET /slides/_search?explain
{
  "query": {
    "match": {
      "description": "basic"
    }
  }
}
structure of the score calculation
description:basic
  [*]
    tf / idf / fieldNorm
25: Validate query 
using validate api 
In this slide we are going to demonstrate the validate api for a query using 
multiple terms. 
POST /slides/_validate/query?explain 
26: Validate query 
using validate api 
In this slide we are going to demonstrate the validate api for a query using 
multiple terms using the and operator. 
POST /slides/_validate/query?explain 
27: Bool query 
the base for all queries 
Introduce the bool query as the base query to all other queries. In the end all 
queries can be written as a bool query. Explain the difference between 
operator AND/OR. 
{
  "query": {
    "bool": {
      "must": [
        {}
      ],
      "must_not": [
        {}
      ],
      "should": [
        {}
      ]
    }
  }
}
28: Boolean model 
must, must_not and should 
In this slide we are going to explain the transformation of all queries into a 
bool query. 
{
  "query": {
    "match": {
      "description": "basic search elasticsearch"
    }
  }
}
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "description": {
              "value": "basic"
            }
          }
        },
        {
          "term": {
            "description": {
              "value": "search"
            }
          }
        },
        {
          "term": {
            "description": {
              "value": "elasticsearch"
            }
          }
        }
      ]
    }
  }
}
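The rewrite from a match query to a bool query can be expressed as a small function. This is a sketch that only builds the request body; the function name `match_to_bool` is made up, and the analyzer is assumed to do nothing more than lowercase and split on whitespace. With the default or operator every term becomes a should clause; with the and operator every term becomes a must clause.

```python
import json


def match_to_bool(field, text, operator="or"):
    """Rewrite a match query as a bool query of term queries.

    Assumption: the analyzer only lowercases and splits on whitespace.
    """
    terms = text.lower().split()
    clauses = [{"term": {field: {"value": term}}} for term in terms]
    occur = "must" if operator == "and" else "should"
    return {"query": {"bool": {occur: clauses}}}


body = match_to_bool("description", "basic search elasticsearch")
print(json.dumps(body, indent=2))
```

Calling `match_to_bool("description", "basic search", operator="and")` produces the same term queries under `must`, which is why the and operator requires all terms to be present.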
29: Explain 2 terms 
match with 2 
Now we are going to show the short notation for the explanation of a query
that the standard analyzer splits into two terms.
GET /slides/_search?explain
{
  "query": {
    "match": {
      "description": "basic search"
    }
  }
}
structure of the score calculation
[*]
  [+]
    description:basic
  coord (1/2)
30: Explain 3 terms 
match with 3 
Now we are going to show the short notation for the explanation of a query
that the standard analyzer splits into three terms.
GET /slides/_search?explain
{
  "query": {
    "match": {
      "description": "basic search elasticsearch"
    }
  }
}
structure of the score calculation
[*]
  [+]
    description:basic
    description:elasticsearch
  coord (2/3)
31: Validate query 
using validate api 
In this slide we are going to demonstrate the validate api for a query using 
multiple terms and multiple fields with the default best_fields type query. 
POST /slides/_validate/query?explain 
32: Validate query 
using validate api 
In this slide we are going to demonstrate the validate api for a query using 
multiple terms and multiple fields with the most_fields type query. 
POST /slides/_validate/query?explain 
33: Validate query 
using validate api 
In this slide we are going to demonstrate the validate api for a query using 
multiple terms and multiple fields with the cross_fields type query. 
POST /slides/_validate/query?explain 
34: Explain multi_match
best_fields
Show the effect of a multi_match query using the default best_fields type.
GET /slides/_search?explain
{
  "query": {
    "multi_match": {
      "query": "basic query",
      "fields": [
        "title",
        "description"
      ],
      "type": "best_fields"
    }
  }
}
structure of the score calculation
[max_of]
  [+]
    description:basic
    description:query
  [*]
    [+]
      title:query
    coord (1/2)
35: Explain multi_match
most_fields
Show the effect of a multi_match query using the most_fields type.
GET /slides/_search?explain
{
  "query": {
    "multi_match": {
      "query": "basic query",
      "fields": [
        "title",
        "description"
      ],
      "type": "most_fields"
    }
  }
}
structure of the score calculation
[sum_of]
  [+]
    description:basic
    description:query
  [*]
    [+]
      title:query
    coord (1/2)
36: Explain multi_match
cross_fields
Show the effect of a multi_match query using the cross_fields type.
GET /slides/_search?explain
{
  "query": {
    "multi_match": {
      "query": "basic query",
      "fields": [
        "title",
        "description"
      ],
      "type": "cross_fields"
    }
  }
}
structure of the score calculation
[sum_of]
  [max_of]
    description:basic
  [max_of]
    description:query
    title:query
37: Explain dis_max query 
use tie breaker 
Show the effect of a dis_max query which is a balance between cross_fields 
and best matching field. 
GET /slides/_search?explain
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "boost": 1.2,
      "queries": [
        {
          "match": {
            "description": "basic query"
          }
        },
        {
          "match": {
            "title": "basic query"
          }
        }
      ]
    }
  }
}
structure of the score calculation
[max_of + 0.7 [*] others]
  [+]
    description:basic
    description:query
  [*]
    [+]
      title:query
    coord (1/2)
38: Multiple terms and fields
summary 
Explain the different options we have for multi field queries and explain the
differences when calculating the score.
Best fields returns the score of the field with the highest score,
Most fields adds the scores for the different fields,
Cross fields treats all fields as one big field and adds the maximum score for each term,
Dis max takes the best field and adds a part of the score of the other fields
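The four strategies differ only in how per-field scores are combined, which a few lines of Python can make concrete. This is a simplified sketch: it ignores coord and norms, the function names are made up, and the scores in the example are invented numbers rather than real explain output.

```python
def best_fields(field_scores):
    """Score of the best matching field."""
    return max(field_scores.values())


def most_fields(field_scores):
    """Sum of the scores of all matching fields."""
    return sum(field_scores.values())


def dis_max(field_scores, tie_breaker):
    """Best field plus tie_breaker times the scores of the other fields."""
    best = max(field_scores.values())
    others = sum(field_scores.values()) - best
    return best + tie_breaker * others


def cross_fields(term_scores):
    """Per term the best field score; the per-term maxima are added up."""
    return sum(max(per_field.values()) for per_field in term_scores.values())


# Invented per-field scores for one document.
field_scores = {"description": 0.7, "title": 0.3}
# Invented per-term, per-field scores for the same idea with cross_fields.
term_scores = {
    "basic": {"description": 0.5},
    "query": {"description": 0.2, "title": 0.6},
}

print(best_fields(field_scores))
print(most_fields(field_scores))
print(dis_max(field_scores, tie_breaker=0.7))
print(cross_fields(term_scores))
```

With a tie breaker of 0 the dis max combination equals best fields, and with a tie breaker of 1 it equals the sum used by most fields.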
39: Boosting 
the basics 
In match queries you can apply a boost to a certain field. Note that the
structure of the explain output does not change with this kind of boost.
Only the score changes; the boost is reflected in the query norm of the
explain output.
{
  "query": {
    "multi_match": {
      "query": "basic query",
      "fields": [
        "title^5",
        "description"
      ]
    }
  }
}
                         No boost   Title boost
_score                   0.729      0.312
description:basic        0.533      0.107
description:query        0.195      0.0391
description query norm   0.197      0.0394
title:query              0.624      0.624
coord (1/2)              0.5        0.5
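The two _score values can be traced back to the other rows of the table. The sketch below is an interpretation, not something stated on the slide: it assumes the best_fields structure from slide 34, where the description score is the sum of its term scores, the title score is its term score times coord, and the final score is the larger of the two.

```python
def best_fields_score(description_terms, title_terms, title_coord):
    """max(sum of description term scores, sum of title term scores * coord)."""
    description = sum(description_terms)
    title = sum(title_terms) * title_coord
    return max(description, title)


# Values copied from the table above.
no_boost = best_fields_score([0.533, 0.195], [0.624], 0.5)
title_boost = best_fields_score([0.107, 0.0391], [0.624], 0.5)

print(round(no_boost, 3), round(title_boost, 3))
```

Under this reading the description field wins without a boost. With title^5 the query norm drops by about a factor of five, which shrinks the description term scores, so the title field becomes the best field.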
40: Boosting query 
match with negative impact 
The most basic form of boosting is boosting per field. Sometimes you have
other boosting requirements, for example giving a negative boost to a
term. You could use must_not in a bool query, but that is different: in
that case the document does not match at all, whereas here we want a match,
just with a lower score when a certain term is present. Here we show that the
negative term query adds no score but does apply a penalty to the complete
score.
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "description": {
            "value": "basic"
          }
        }
      },
      "negative": {
        "term": {
          "description": {
            "value": "query"
          }
        }
      },
      "negative_boost": 0.2
    }
  }
}
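The effect of the boosting query on the score can be summarised in one function. This is a sketch of the documented behaviour, with a made-up function name and an invented positive score: a document that matches the positive query keeps its score, and if it also matches the negative query that score is multiplied by negative_boost.

```python
def boosting_score(positive_score, matches_negative, negative_boost=0.2):
    """Score of a document that matches the positive query."""
    if matches_negative:
        # The negative query adds no score of its own, it only scales the result.
        return positive_score * negative_boost
    return positive_score


# Invented positive score of 0.5 for two documents containing "basic".
print(boosting_score(0.5, matches_negative=False))
print(boosting_score(0.5, matches_negative=True))
```

Unlike must_not, the document containing "query" is still returned, only further down the list.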
41: Sorting results 
by score and ... 
In this slide I want to discuss the options you have for sorting results. 
Sort by score (the default), 
Sort by date, 
Sort by analyzed fields, 
GET /slides/_search
{
  "query": {
    "match": {
      "description": "What you type!"
    }
  },
  "sort": [
    {
      "title.raw": {
        "order": "asc"
      }
    }
  ]
}
Results shown only in the live presentation
42: Fuzzy query 
taking care of typos 
Here we demonstrate the effect of fuzzy searching on the
score. We use the term basik, which is a misspelling on every slide
except this one. Show what happens with the boost factor for documents
that match due to the fuzzy matching.
43: Fuzzy query 
explain score for match 
Explain why the score for the document with the fuzzy match is higher than 
the score for the exact match. 
Total score is a product of field and query weight. 
found term               query weight   field weight
description:basic^0.8    0.36849        0.992109
description:basik        0.59356        0.51138
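Multiplying the two columns gives the total score per found term. The values in this sketch are copied from the table; only the variable names are made up.

```python
# (query weight, field weight) per found term, copied from the table above.
weights = {
    "description:basic^0.8": (0.36849, 0.992109),
    "description:basik": (0.59356, 0.51138),
}

scores = {
    term: query_weight * field_weight
    for term, (query_weight, field_weight) in weights.items()
}

for term, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(term, round(score, 4))
```

Although the exact term basik has the higher query weight, the much higher field weight of the document matching basic results in the higher product, so the fuzzy match ends up on top.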
44: Fuzzy query 
enhance result with a signal 
Since we got the wrong document on top with the previous fuzzy query, we
now want to improve the results with a signal. A signal can help to
change the score in a way you prefer. In this case we make the score higher
if there is an exact match.
45: Fuzzy query 
explain score for match 
Explain why adding a signal query as a should query with a match query 
does change the order of the results. 
No match means a coord penalty. 
found term               must (fuzzy)                should (match)
description:basik        0.26101                     0.26101
description:basic^0.8    0.31438 * 0.5 (coord 1/2)
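The change in ordering follows from adding the clause scores and applying the coord factor. In this sketch the scores are copied from the table, `None` marks a clause that did not match, and the function name is made up.

```python
def bool_score(clause_scores):
    """Sum of the matching clause scores times coord (matched / total clauses)."""
    matched = [score for score in clause_scores if score is not None]
    coord = len(matched) / len(clause_scores)
    return sum(matched) * coord


# Document containing "basik": the fuzzy must clause and the should clause match.
exact = bool_score([0.26101, 0.26101])
# Document containing "basic": only the fuzzy must clause matches.
fuzzy = bool_score([0.31438, None])

print(round(exact, 5), round(fuzzy, 5))
```

The document with the exact term now collects the score of both clauses, while the fuzzy-only document is halved by the coord penalty, so the exact match moves to the top.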
46: Function score query 
using popularity 
One query that is used a lot on news sites is the function_score query. With 
this query you can change the score based on another field like the 
popularity or recency. In this slide we discuss the effect on the explain 
output for such a query. 
GET /blogging/_search?explain
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "description": "elasticsearch"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.2,
            "modifier": "ln"
          }
        }
      ]
    }
  }
}
structure of the score calculation
function score
  [*]
    description:elasticsearch
    Math.min
      ln(doc[popularity].value * factor=1.2)
      maxBoost
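The structure above translates directly into a small calculation. This is a sketch with made-up function names and invented input values: the query score is multiplied by the function value, which is the natural logarithm of factor times the popularity field, capped at a maximum boost. The sketch applies no cap unless `max_boost` is passed in.

```python
import math


def field_value_factor(popularity, factor=1.2):
    """modifier "ln": natural logarithm of factor * field value."""
    return math.log(factor * popularity)


def function_score(query_score, popularity, max_boost=float("inf")):
    """Query score multiplied by the capped function value."""
    return query_score * min(field_value_factor(popularity), max_boost)


# Invented query score of 0.5 and popularity of 10.
print(round(function_score(0.5, 10), 4))
print(function_score(0.5, 10, max_boost=2.0))
```

Because of the logarithm, popularity values for which factor times popularity is below 1 give a negative function value, so in practice you want to make sure the field never gets that small.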
47: Summarizing 
the take away 
Explain the right process to produce the right results. 
The right process to produce the right results. 
Use the correct analyzer, 
Construct the right query, 
Analyze the results with your users, 
Explain the results using explain/validate and improve. 
48: Questions 
I am here the whole day 
Placeholder slide that can be used during the question round.
jettro.coenradie@luminis.eu 
@jettroCoenradie 
https://github.com/jettro/preso-explain 

Returning the right results - Jettro Coenradie

  • 1.
    Explain explained 05/11/1411:56 Summary 1: Getting the right results Jettro Coenradie This presentation is about getting the right results from elasticsearch. There are a lot of things that you can do to improve the results you get back from elasticsearch. You will get an introduction into different kind of queries that you can use, the impact of analysers on results and we take a deep dive into the explain functionality. Using the explain functionality you can find out why one document is matching better than another. http://localhost:9200/_plugin/preso/#/print Page 1 of 42
  • 2.
    Explain explained 05/11/1411:56 Returning the right results @jettroCoenradie http://localhost:9200/_plugin/preso/#/print Page 2 of 42
  • 3.
    Explain explained 05/11/1411:56 2: About me how to contact me My name is Jettro Coenradie, I am the follow of Luminis Amsterdam. My specialty is search solutions and specifically elasticsearch. You can follow me on twitter, linkedin and my code is on github. email jettro.coenradie@luminis.eu twitter @jettroCoenradie linkedin https://www.linkedin.com/in/jettro Github https://github.com/jettro Blog http://www.gridshore.nl 3: The right results what are they? How would you define the right results. Of course a lot depends on the context of the asked question. Like always you need to find the term on wikipedia to get your first explanation. How would you explain The right results? 4: The right results according to wikipedia Every presentation has to start with wikipedia. To bad there is no page for the right results, but there is an interesting link to be found. This shows an excerpt from the toyota way. The right process will produce the right results. http://localhost:9200/_plugin/preso/#/print Page 3 of 42
  • 4.
    Explain explained 05/11/1411:56 This is also true for returning the right results using elasticsearch. During this presentation the right process will become clear. 5: What is elasticsearch? more than search Before we can start explaining why elasticsearch returns the results that it does, you first need to know more about what elasticsearch is, what it can do for you and some terminology used though out the remainder of the presentation. You will learn about structured and unstructured data, data sources and how we use the data. http://localhost:9200/_plugin/preso/#/print Page 4 of 42
  • 5.
    Explain explained 05/11/1411:56 http://localhost:9200/_plugin/preso/#/print Page 5 of 42
  • 6.
    Explain explained 05/11/1411:56 http://localhost:9200/_plugin/preso/#/print Page 6 of 42
  • 7.
    Explain explained 05/11/1411:56 6: Lucene what we need it for http://localhost:9200/_plugin/preso/#/print Page 7 of 42
  • 8.
    Explain explained 05/11/1411:56 Introduce lucene, explain we use analyzers to create terms, the terms are stored in an inverted index and the inverted index is used to search the terms. Create terms, Store terms, Search terms. 7: Elasticsearch and lucene cluster, index, shards, lucene In here I want to explain the different components of an elasticsearch cluster. I am showing images containing the structure of these components. A cluster contains multiple nodes. Each nodes contains shards of multiple indices. Each shard is a lucene index. http://localhost:9200/_plugin/preso/#/print Page 8 of 42
  • 9.
    Explain explained 05/11/1411:56 http://localhost:9200/_plugin/preso/#/print Page 9 of 42
  • 10.
    Explain explained 05/11/1411:56 8: Executing a query calling all shards In this slide I am explaining what happens when you execute a query. You will learn that we first execute a query that is send to all shards by the client. The results are gathered and merged and if the right set of documents is created the actual required documents are fetched. http://localhost:9200/_plugin/preso/#/print Page 10 of 42
  • 11.
    Explain explained 05/11/1411:56 http://localhost:9200/_plugin/preso/#/print Page 11 of 42
  • 12.
    Explain explained 05/11/1411:56 http://localhost:9200/_plugin/preso/#/print Page 12 of 42
  • 13.
    Explain explained 05/11/1411:56 9: Executing a query basic concepts http://localhost:9200/_plugin/preso/#/print Page 13 of 42
  • 14.
    Explain explained 05/11/1411:56 This slide shows the both the apis that elasticsearch provideds. You can execute queries using the java api or the rest api through one of the available drivers. No matter what mechanism you choose you can use a lot of different queries. http://localhost:9200/_plugin/preso/#/print Page 14 of 42
  • 15.
    Explain explained 05/11/1411:56 10: Example with curl find all docs In this slide we present you the most basic match all docs query using curl. http://localhost:9200/_plugin/preso/#/print Page 15 of 42
  • 16.
    Explain explained 05/11/1411:56 11: Other query tools there are a lot Some examples of other query tools that are available. http://localhost:9200/_plugin/preso/#/print Page 16 of 42
  • 17.
    Explain explained 05/11/1411:56 http://localhost:9200/_plugin/preso/#/print Page 17 of 42
  • 18.
    Explain explained 05/11/1411:56 http://localhost:9200/_plugin/preso/#/print Page 18 of 42
  • 19.
    Explain explained 05/11/1411:56 12: Execute query basic match query This is the most basic variant of executing a query. GET /slides/_search { "query": { "match": { "description": "What you type!" } } } Results only in life presentation 13: The calculated score use the explain api Here we are going to discuss the most basic explain you can get. GET /slides/_search?explain http://localhost:9200/_plugin/preso/#/print Page 19 of 42
  • 20.
    Explain explained 05/11/1411:56 { "query": { "match": { "description": "What you type!" } } } Results only in life presentation 14: Explain query explained the basics In this slide I am going to show details about the explain basics. This is important to notice the pattern that all explain queries will have for every term that is matched. http://localhost:9200/_plugin/preso/#/print Page 20 of 42
  • 21.
    Explain explained 05/11/1411:56 http://localhost:9200/_plugin/preso/#/print Page 21 of 42
  • 22.
    Explain explained 05/11/1411:56 http://localhost:9200/_plugin/preso/#/print Page 22 of 42
  • 23.
    Explain explained 05/11/1411:56 15: Calculating score the theory In this slide we are going to explain the theory behind creating score using simulariry algorithms. Score is calculated for matching documents (Boolean Model), Score represents similarity between search and document terms, Lucene uses enhanced TF/IDF (coordination factor and field length), Other algorithms can be used: Okapi BM25 16: Lucene similarity formula Shows the formula used by lucene to calculate the score. http://localhost:9200/_plugin/preso/#/print Page 23 of 42
  • 24.
    Explain explained 05/11/1411:56 https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/17: Calculating score the terms This slide gives an overview of the most important definitions for calculating the score. queryNorm Attempt to make different queries comparable. coord Factor for total score based on amount of queried and found terms Term frequency Amount of times a term is matched in the field Inverse document amount of documents that have the term frequency fieldNorm Length of the field the terms was found in boost Boost a field score 18: An explain example using match query In this slide we are going to use a very simple match query with an index containing only three documents. The goal is to show the effect on term frequency, inverse document frequency and the fieldnorm with very little documents in the index. Show tf/idf/fieldNorm and score Doc 1 Doc 2 Doc 3 http://localhost:9200/_plugin/preso/#/print Page 24 of 42
  • 25.
    Explain explained 05/11/1411:56 one two three two three three one 1 / 1 / 0.5 0.702 two 1 / 2 / 0.5 1 / 2 / 0.625 0.5 0.625 three 1 / 3 / 0.5 1 / 3 / 0.625 1 / 3 / 1 0.356 0.445 0.712 19: Explain multiple terms with a trick Here we are going to shows what happens to the results when using capital letters, multipe terms and introduce the camel case analyzer. GET /onetwothree/_search?explain Results only in life presentation 20: What is an analyzer the parts Explain what the different components of an analyzer are. Character filters Tidy up the string before tokenising. Tokeniser Splits the string into a number of tokens Token filters Do something with the tokens 21: One Two Three Analyzer settings http://localhost:9200/_plugin/preso/#/print Page 25 of 42
  • 26.
    Explain explained 05/11/1411:56 Show the settings part of the analyzer as used in the onetwothree sample with the camel case. GET /onetwothree/_settings Results only in life presentation 22: One Two Three Analyzer mappings Show the mappings part of the analyzer as used in the onetwothree sample with the camel case. GET /onetwothree/_mappings Results only in life presentation 23: One Two Three Analyzer analyze api Test the analyzer using the analyze api. GET /onetwothree/_analyze?analyzer=camel&text=OneTwoThree Results only in life presentation 24: Back to explain recap single term In this slide we get back to the explain api, from the image with the real explain output we introduce a short notation. GET /slides/_search?explain http://localhost:9200/_plugin/preso/#/print Page 26 of 42
  • 27.
    Explain explained 05/11/1411:56 { "query": { "match": { "description": "basic" } } } structure of the score calculation description:basic [*] tf / idf / fieldNorm 25: Validate query http://localhost:9200/_plugin/preso/#/print Page 27 of 42
  • 28.
    Explain explained 05/11/1411:56 using validate api In this slide we are going to demonstrate the validate api for a query using multiple terms. POST /slides/_validate/query?explain 26: Validate query using validate api In this slide we are going to demonstrate the validate api for a query using multiple terms using the and operator. POST /slides/_validate/query?explain 27: Bool query the base for all queries Introduce the bool query as the base query to all other queries. In the end all queries can be written as a bool query. Explain the difference between operator AND/OR. { "query": { "bool": { "must": [ {} ], "must_not": [ {} ], "should": [ {} ] } } } 28: Boolean model http://localhost:9200/_plugin/preso/#/print Page 28 of 42
  • 29.
    Explain explained 05/11/1411:56 must, must_not and should In this slide we are going to explain the transformation of all queries into a bool query. { "query": { "match": { "description": "basic search elasticsearch" } } } { "query": { "bool": { "should": [ { "term": { "description": { "value": "basic" } } }, { "term": { "description": { "value": "search" } } }, { "term": { "description": { "value": "elasticsearch" } } } ] } } } http://localhost:9200/_plugin/preso/#/print Page 29 of 42
  • 30.
    Explain explained 05/11/1411:56 29: Explain 2 terms match with 2 Now we are going to show the short notation for the explaination of a query with two terms due to a standard analyzer. GET /slides/_search?explain { "query": { "match": { "description": "basic search" } } } structure of the score calculation [*] [+] description:basic coord (1/2) 30: Explain 3 terms match with 3 Now we are going to show the short notation for the explaination of a query with three terms due to a standard analyzer. GET /slides/_search?explain http://localhost:9200/_plugin/preso/#/print Page 30 of 42
  • 31.
    Explain explained 05/11/1411:56 { "query": { "match": { "description": "basic search elasticsearch" } } } structure of the score calculation [*] [+] description:basic description:elasticsearch coord (2/3) 31: Validate query using validate api In this slide we are going to demonstrate the validate api for a query using multiple terms and multiple fields with the default best_fields type query. POST /slides/_validate/query?explain 32: Validate query using validate api In this slide we are going to demonstrate the validate api for a query using multiple terms and multiple fields with the most_fields type query. POST /slides/_validate/query?explain 33: Validate query http://localhost:9200/_plugin/preso/#/print Page 31 of 42
  • 32.
    Explain explained 05/11/1411:56 using validate api In this slide we are going to demonstrate the validate api for a query using multiple terms and multiple fields with the cross_fields type query. POST /slides/_validate/query?explain 34: Explain multi_field best_fields Show the effect of a multi_field query using the default best_fields type. GET /slides/_search?explain { "query": { "multi_match": { "query": "basic query", "fields": [ "title", "description" ], "type": "best_fields" } } } structure of the score calculation [max_of] [+] description:basic description:query [*] [+] title:query http://localhost:9200/_plugin/preso/#/print Page 32 of 42
  • 33.
        coord (1/2)
    35: Explain multi_field
    most_fields
    Show the effect of a multi_match query using the most_fields type.
    GET /slides/_search?explain
    {
      "query": {
        "multi_match": {
          "query": "basic query",
          "fields": [ "title", "description" ],
          "type": "most_fields"
        }
      }
    }
    structure of the score calculation
    [sum_of]
      [+]
        description:basic
        description:query
      [*]
        [+]
          title:query
        coord (1/2)
    36: Explain multi_field
  • 34.
    cross_fields
    Show the effect of a multi_match query using the cross_fields type.
    GET /slides/_search?explain
    {
      "query": {
        "multi_match": {
          "query": "basic query",
          "fields": [ "title", "description" ],
          "type": "cross_fields"
        }
      }
    }
    structure of the score calculation
    [sum_of]
      [max_of]
        description:basic
      [max_of]
        description:query
        title:query
    37: Explain dis_max query
    use tie breaker
    Show the effect of a dis_max query, which is a balance between cross_fields and the best matching field.
    GET /slides/_search?explain
  • 35.
    {
      "query": {
        "dis_max": {
          "tie_breaker": 0.7,
          "boost": 1.2,
          "queries": [
            { "match": { "description": "basic query" } },
            { "match": { "title": "basic query" } }
          ]
        }
      }
    }
    structure of the score calculation
    [max_of + 0.7 [*] others]
      [+]
        description:basic
        description:query
      [*]
        [+]
          title:query
        coord (1/2)
    38: Multiple terms and fields
    summary
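The best_fields, most_fields and dis_max structures shown so far differ only in how the per-field scores are combined. The following is a minimal Python sketch of that combination step, following the short notation of the slides. It is not how elasticsearch computes scores internally: it ignores the query norm and the `boost` parameter, and the term scores are hypothetical numbers chosen for illustration.

```python
def field_score(term_scores, total_terms):
    """Sum the scores of the matching terms and apply coord (matched / total)."""
    coord = len(term_scores) / total_terms
    return sum(term_scores) * coord


def best_fields(field_scores):
    """[max_of]: only the best scoring field counts."""
    return max(field_scores)


def most_fields(field_scores):
    """[sum_of]: the scores of all fields are added."""
    return sum(field_scores)


def dis_max(field_scores, tie_breaker):
    """[max_of + tie_breaker * others]: best field plus a share of the rest."""
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


# Hypothetical term scores for the two-term query "basic query".
description = field_score([0.5, 0.2], total_terms=2)  # both terms match, coord 2/2
title = field_score([0.6], total_terms=2)             # one term matches, coord 1/2
scores = [description, title]

print("best_fields:", best_fields(scores))
print("most_fields:", most_fields(scores))
print("dis_max    :", dis_max(scores, tie_breaker=0.7))
```

In this sketch a tie_breaker of 0 gives the same result as best_fields and a tie_breaker of 1 gives the same result as most_fields, which is why dis_max can be seen as a dial between the two.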
  • 36.
    Explain the different options we have for multi-field queries and how they differ when calculating the score.
    Best fields returns the score of the field with the highest score
    Most fields adds up the scores of the different fields
    Cross fields treats all fields as one big field and adds the maximum score per term
    Dis max takes the best field and adds a part of the score of the other fields
    39: Boosting
    the basics
    In match queries you can apply a boost to a certain field. Note that the structure of the explain output does not change with this kind of boost. Only the score changes; the boost shows up in the query norm of the explain output.
    {
      "query": {
        "multi_match": {
          "query": "basic query",
          "fields": [ "title^5", "description" ]
        }
      }
    }
                             No boost   Title boost
    _score                   0.729      0.312
    description:basic        0.533      0.107
    description:query        0.195      0.0391
    description query norm   0.197      0.0394
    title:query              0.624      0.624
  • 37.
    coord (1/2)              0.5        0.5
    40: Boosting query
    match with negative impact
    The most basic form of boosting is boosting per field. Sometimes you have other boosting requirements, for example giving a negative boost to a certain term. You could use must_not in a bool query, but that is different: then the document does not match at all. Here we still want a match, only with a lower score when a certain term is present. The slide shows that the negative term query adds no score of its own but applies a penalty to the complete score.
    {
      "query": {
        "boosting": {
          "positive": { "term": { "description": { "value": "basic" } } },
          "negative": { "term": { "description": { "value": "query" } } },
          "negative_boost": 0.2
        }
      }
    }
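The effect of the boosting query above can be sketched in a few lines of Python. This is an illustration of the behaviour described on the slide, not elasticsearch code: the function name and the positive score of 0.5 are assumptions made for the example, and only the `negative_boost` of 0.2 comes from the query.

```python
def boosting_score(positive_score, matches_negative, negative_boost=0.2):
    """Score of a document that matches the positive query.

    The negative query contributes no score of its own; when it matches,
    the whole score is multiplied by negative_boost.
    """
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Hypothetical positive score for the term "basic".
only_basic = boosting_score(0.5, matches_negative=False)
basic_and_query = boosting_score(0.5, matches_negative=True)

# The document that also contains "query" still matches, but ranks lower.
print(only_basic, basic_and_query)
```

Unlike must_not, the document containing the negative term stays in the result list; it only moves down.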
  • 38.
    41: Sorting results
    by score and ...
    This slide discusses the options you have for sorting results.
    Sort by score (the default)
    Sort by date
    Sort by analyzed fields
    GET /slides/_search
  • 39.
    {
      "query": {
        "match": { "description": "What you type!" }
      },
      "sort": [
        { "title.raw": { "order": "asc" } }
      ]
    }
    Results only in the live presentation
    42: Fuzzy query
    taking care of typos
    Here we demonstrate the effect of fuzzy searching on the score. We use the term basik, which is a typo for every slide except this one. Show what happens to the boost factor for documents that match only through fuzzy matching.
    43: Fuzzy query
    explain score for match
    Explain why the score for the document with the fuzzy match is higher than the score for the exact match. The total score is the product of the field weight and the query weight.
    found term              query weight   field weight
    description:basic^0.8   0.36849        0.992109
    description:basik       0.59356        0.51138
    44: Fuzzy query
  • 40.
    enhance result with a signal
    Since the previous fuzzy query put the wrong document on top, we now want to improve the results with a signal. A signal changes the score in a direction you prefer. In this case we raise the score if there is an exact match.
    45: Fuzzy query
    explain score for match
    Explain why adding a signal, a match query in a should clause, changes the order of the results. No match on the should clause means a coord penalty.
    found term              must (fuzzy)   should (match)
    description:basik       0.26101        0.26101
    description:basic^0.8   0.31438 * 0.5 (coord 1/2)
    46: Function score query
    using popularity
    One query that is used a lot on news sites is the function_score query. With this query you can change the score based on another field, such as popularity or recency. This slide discusses the effect of such a query on the explain output.
    GET /blogging/_search?explain
  • 41.
    {
      "query": {
        "function_score": {
          "query": { "match": { "description": "elasticsearch" } },
          "functions": [
            {
              "field_value_factor": {
                "field": "popularity",
                "factor": 1.2,
                "modifier": "ln"
              }
            }
          ]
        }
      }
    }
    structure of the score calculation
    function score
    [*]
      description:elasticsearch
      Math.min
        ln(doc[popularity].value * factor=1.2)
        maxBoost
    47: Summarizing
    the take away
    Explain the right process to produce the right results.
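The score structure of the function_score query on slide 46 can be written out as a small Python sketch. It follows the short notation of that slide (query score times the minimum of the ln function and maxBoost) and is not elasticsearch code. The function names and the example popularity values are assumptions; `float("inf")` stands in for an unset maxBoost.

```python
import math


def field_value_factor_ln(field_value, factor):
    """The "ln" modifier: natural log of the field value times the factor."""
    # field_value * factor must be positive, otherwise math.log raises ValueError.
    return math.log(field_value * factor)


def function_score(query_score, popularity, factor=1.2, max_boost=float("inf")):
    """Multiply the query score by the function value, capped at max_boost."""
    boost = min(field_value_factor_ln(popularity, factor), max_boost)
    return query_score * boost


# Two hypothetical blog posts with the same text score but different popularity.
print(function_score(1.0, popularity=10))
print(function_score(1.0, popularity=100))
```

Because the modifier is a logarithm, a tenfold increase in popularity adds a fixed amount to the multiplier rather than multiplying the score by ten. Values of `popularity * factor` at or below 1 give a multiplier of zero or less, which is something to check for in real data.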
  • 42.
    The right process to produce the right results.
    Use the correct analyzer
    Construct the right query
    Analyze the results with your users
    Explain the results using explain/validate and improve
    48: Questions
    I am here the whole day
    Placeholder sheet that can be used during the question round.
    jettro.coenradie@luminis.eu
    @jettroCoenradie
    https://github.com/jettro/preso-explain