5. Index
An index is like a ‘table’ in a relational database.
It has a mapping which defines multiple types.
An index is a logical namespace:
◦ Maps to one or more primary shards
◦ Can have zero or more replica shards
RDBMS
ES
Database
?
Table
Index
Columns/Rows
Document
CURRENT VERSION 7.6
12. Index Operations – list all indexes
GET _cat/indices
GET /_cat/indices/twi*?v
GET /_cat/indices/?v&health=green|yellow|red&h=col1,col2
CURRENT VERSION 7.6
13. Index Operations – read index details
GET big-index
GET big-index?format=yaml|json
CURRENT VERSION 7.6
14. Index Operations – create document
POST big-index/_doc/1
{
"name": "Ismail Anjrini",
"age": 27
}
POST big-index/_doc/2
{
"name": "Fadi Abdul Wahab",
"age": 45,
"country": "Saudi Arabia"
}
CURRENT VERSION 7.6
15. Index Operations – POST vs PUT
POST big-index/_doc/
{
"name": "Kasem",
"age": 46
}
PUT big-index/_doc/
{
"name": "Riyadh",
"age": 33
}
CURRENT VERSION 7.6
16. Index Operations – read document
GET big-index/_doc/2
CURRENT VERSION 7.6
17. Index Operations – update document
POST big-index/_update/1
{
"doc":
{
"name":"Ismail Hassan Anjrini" ,
"country": "Syria"
}
}
CURRENT VERSION 7.6
18. Index Operations – delete document
DELETE big-index/_doc/1 PUT big-index/_doc/1
{
"name":"Ismail Anjrini",
"age": 27
}
CURRENT VERSION 7.6
19. Index Operations - Index aliases
An index alias is a secondary name used to refer to one or more existing indices
POST index-1/_alias/index-alias
POST index-2/_alias/index-alias
POST index-3/_alias/index-alias
CURRENT VERSION 7.6
20. Index Operations - Index aliases
filter: If specified, the index alias only applies to documents returned by the filter.
POST index-*/_alias/index-Egypt
{
"filter":
{
"term":
{
"nationality": "egypt"
}
}
}
CURRENT VERSION 7.6
21. Index Operations - Index aliases
DELETE index-1/_alias/index-alias
DELETE index-*/_alias/index-alias
GET index-alias/_search
GET index-alias/_search
CURRENT VERSION 7.6
22. Index Template
Index templates define settings and mappings that you can automatically apply when creating
new indices
Elasticsearch applies templates to new indices based on an index pattern that matches the index
name
Changes to index templates do not affect existing indices
Settings and mappings specified in create index API requests override any settings or mappings
specified in an index template
CURRENT VERSION 7.6
24. Index Template - Order
Multiple index templates can potentially match an index
Both the settings and mappings are merged into the final configuration of the index
The order of the merging can be controlled using the order parameter
With lower order being applied first, and higher orders overriding them
CURRENT VERSION 7.6
26. Index Operations - Reindex
Reindex the current data in old-index to new-index
It does not copy the settings/fields settings from the source index to destination
CURRENT VERSION 7.6
27. Index Operations - Reindex
version_type: internal or empty:
◦ Update any document that have the same _id regardless the version number in the target index
◦ Increase the version number for the documents with the same _id
CURRENT VERSION 7.6
29. Index Operations - Reindex
version_type: external
◦ Elasticsearch to preserve the version from the source
◦ Create any documents that are missing
◦ The _id value is not matched
◦ Update any documents that have an older version in the destination index than they do in the source
index
◦ The document with older version will get the same version number from the source index
CURRENT VERSION 7.6
30. Index Operations - Reindex
Created index-1
Add data to index-1
Delete new-index-1
CURRENT VERSION 7.6
31. Index Operations - Reindex
Add document to index-1
Do reindex
CURRENT VERSION 7.6
32. Index Operations - Reindex
op_type: create
◦ _reindex to only create missing documents in the target index
◦ All existing documents will cause a version conflict
max_docs
◦ To limit the number of processed documents from source to dest
CURRENT VERSION 7.6
refresh_interval: How often to perform a refresh operation, which makes recent changes to the index visible to search. Defaults to 1s
number_of_shards
The number of primary shards that an index should have. Defaults to 1. This setting can only be set at index creation time
Health
values: green|yellow|red
(Optional, string) Health status used to limit returned indices
h:
(Optional, string) Comma-separated list of column names to display.
s:
(Optional, string) Comma-separated list of column names or column aliases used to sort the response.
Script 1:
Where is the Nationality field? It is not here because we didn’t pass it during the document creation
Script 2:
Note the country column in the mappings section
PUT
1 - updates a full document, not only the field you're sending.
2 - can not create document without id
POST
1 - will do a partial update and only update the fields you're sending, and not touch the other ones already present in the document.
2 - creates document with/without id
1 - Note that we didn’t touch the field age and still appears
2 – You can add new field to the document
Check _version: 6
Versioning:
Each document indexed is versioned. When deleting a document, the version can be specified to make sure the relevant document we are trying to delete is actually being deleted and it has not changed in the meantime. Every write operation executed on a document, deletes included, causes its version to be incremented. The version number of a deleted document remains available for a short time after deletion to allow for control of concurrent operations. The length of time for which a deleted document’s version remains available is determined by the index.gc_deletes index setting and defaults to 60 seconds.
Great articles
https://developers.soundcloud.com/blog/how-to-reindex-1-billion-documents-in-1-hour-at-soundcloud
https://engineering.carsguide.com.au/elasticsearch-zero-downtime-reindexing-e3a53000f0ac
Full reference
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
1 – Reindex documents already exists in the dest index
2 – The version will be increased with the updated data