ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup

Rome, 8 February 2017
presented by Alberto Paro, Seacom
ElasticSearch 5.x
New Tricks

Alberto Paro
- Degree in Computer Engineering (POLIMI)
- Author of 3 books on ElasticSearch, from version 1 to 5.x, plus 6 technical reviews
- I work mainly in Scala and on Big Data technologies (Akka, Spray.io, Playframework, Apache Spark) and NoSQL (Accumulo, Cassandra, ElasticSearch and MongoDB)
- Scala and Scala.JS evangelist

Tip 1: Shrink - 1/5
Why?
- The number of shards was set wrongly during the initial sizing. Sizing shards without knowing the real data/text distribution tends to overestimate how many are needed
- Fewer shards reduce memory and resource usage
- Fewer shards can speed up searches

Tip 1: Shrink - 2/5 - Where is your data?
We can find the node names via the _nodes API:
curl -XGET 'http://localhost:9200/_nodes?pretty'
The response contains a section like this:
....
"nodes" : {
  "5Sei9ip8Qhee3J0o9dTV4g" : {
    "name" : "Gin Genie",
    "transport_address" : "127.0.0.1:9300",
    "host" : "127.0.0.1",
    "ip" : "127.0.0.1",
    "version" : "5.1.1",
    ....
The name of my node is Gin Genie

Tip 1: Shrink - 3/5 - Relocate your data
We change the index settings to force the allocation of our index to a single node and to block writes to it:
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '
{
  "settings": {
    "index.routing.allocation.require._name": "Gin Genie",
    "index.blocks.write": true
  }
}'
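Then we wait until the relocation has finished (if the index has replicas, expect yellow rather than green, because a replica cannot be allocated on the same node as its primary):
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_no_relocating_shards=true&pretty'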
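_shrink requires a copy of every shard of the source index to sit on the same node. A minimal sketch that checks this directly, assuming a POSIX shell with awk; `shards_on_node` is a hypothetical helper that reads the `shard` and `node` columns of `_cat/shards` from stdin (node names may contain spaces, as "Gin Genie" does).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if every shard listed on stdin has at
# least one copy on the node named in $1.
# Expected input: output of `_cat/shards/<index>?h=shard,node`.
shards_on_node() {
  target="$1"
  awk -v target="$target" '
    NF == 0 { next }                  # ignore blank lines
    {
      count++
      shard = $1
      sub(/^[^ ]+ +/, "")             # drop the shard column; the rest is the node name
      sub(/ +$/, "")                  # trim padding added by _cat
      seen[shard] = 1
      if ($0 == target) ok[shard] = 1 # relocating or unassigned copies never match
    }
    END {
      if (count == 0) exit 1          # no shards listed: treat as not ready
      for (s in seen) if (!(s in ok)) exit 1
      exit 0
    }'
}
```

Usage: `curl -s 'http://localhost:9200/_cat/shards/myindex?h=shard,node' | shards_on_node 'Gin Genie' && echo 'ready to shrink'`. Unassigned replicas do not make the check fail, since one copy per shard on the node is enough.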

Tip 1: Shrink - 4/5 – Shrink our shards
Writes to the source index must be blocked before shrinking (the previous step already does this); on its own, the setting is:
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{"index.blocks.write": true}'
The shrink call that creates reduced_index is:
curl -XPOST 'http://localhost:9200/myindex/_shrink/reduced_index' -d '{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  },
  "aliases": {"my_search_indices": {}}
}'
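The target number of primary shards cannot be arbitrary: it must be a factor of the source index's number of primary shards. A small sketch (hypothetical helper name `shrink_targets`) that lists the valid, strictly smaller shard counts for a given source count:

```shell
#!/bin/sh
# Hypothetical helper: print every shard count smaller than $1 that divides
# $1 evenly, i.e. the values accepted by _shrink for index.number_of_shards.
shrink_targets() {
  n="$1"
  out=""
  i=1
  while [ "$i" -lt "$n" ]; do
    if [ $((n % i)) -eq 0 ]; then
      out="${out:+$out }$i"   # append with a separating space
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}
```

For example, an index with 8 primaries can be shrunk to 4, 2 or 1 shards, while one with 5 primaries can only go down to 1.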

Tip 1: Shrink - 5/5 – Post Shrinking
We can wait for at least a yellow status, which means the new index is ready to use:
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow'
Now we can remove the write block by changing the index settings:
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{"index.blocks.write": false}'
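The source index still carries both settings applied in step 3/5: the write block and the allocation filter that pins it to one node. A minimal sketch of a settings body that undoes both, assuming that setting a value to null resets it to its default (supported by the update settings API in 5.x); `post_shrink_settings` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: settings body that reverts the pre-shrink preparation.
# null resets index.routing.allocation.require._name to its default (unset).
post_shrink_settings() {
  cat <<'EOF'
{
  "index.blocks.write": false,
  "index.routing.allocation.require._name": null
}
EOF
}

# Usage (index name as in the slides; the Content-Type header is harmless
# on 5.x and required from 6.0):
#   curl -XPUT 'http://localhost:9200/myindex/_settings' \
#        -H 'Content-Type: application/json' -d "$(post_shrink_settings)"
```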

Tip 2: Reindex - 1/2
Why?
- Changing the analyzer of a field in a mapping
- Adding a new subfield to a mapping, which requires reprocessing all the records so that the new subfield becomes searchable
- Removing an unused mapping
- Changing the record structure in a way that requires a new mapping

Tip 2: Reindex - 2/2
curl -XPOST 'http://localhost:9200/_reindex?pretty=true' -d '{
  "source": {
    "index": "myindex",
    "type": "mytype",
    "query": { … }
  },
  "dest": {
    "index": "myindex2"
  },
  "script": "…"
}'
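As an illustration of "changing a record structure", here is a sketch of a _reindex body that copies an index and renames a field on the way. The helper name `reindex_rename_body` and the field names are hypothetical; note that `script` sits at the top level of the body, next to `source` and `dest`, and uses the 5.x `inline` key.

```shell
#!/bin/sh
# Hypothetical helper: _reindex body copying $1 into $2 and renaming field
# $3 to $4 with a Painless script (documents without $3 are copied as-is).
reindex_rename_body() {
  src="$1"; dst="$2"; old="$3"; new="$4"
  cat <<EOF
{
  "source": { "index": "$src" },
  "dest":   { "index": "$dst" },
  "script": {
    "lang": "painless",
    "inline": "if (ctx._source.containsKey('$old')) { ctx._source['$new'] = ctx._source.remove('$old'); }"
  }
}
EOF
}

# Usage (hypothetical field names):
#   curl -XPOST 'http://localhost:9200/_reindex?pretty=true' \
#        -H 'Content-Type: application/json' \
#        -d "$(reindex_rename_body myindex myindex2 title name)"
```

The destination index should be created beforehand with the new mapping, otherwise the renamed field is mapped dynamically.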

Tip 3: Update By Query with Painless
Adding a new field
1. Add the field to your mapping (e.g. modified: date)
2. Call update by query to set a value on the documents that lack the field:
curl -XPOST "http://$server/$index/$type/_update_by_query" -d '{
  "script": {
    "inline": "ctx._source.modified = \"2015-10-06T00:00:00.000+00:00\"",
    "lang": "painless"
  },
  "query": {
    "bool": {"must_not": [{"exists": {"field": "modified"}}]}
  }
}'
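A variant of the same backfill that passes the value through `params` instead of splicing it into the script source. The script text then stays identical between calls, so Elasticsearch can reuse the compiled script rather than compiling a new one for every value. `backfill_body` is a hypothetical helper name; field and value are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: update-by-query body that sets field $1 to value $2
# on every document where the field is missing.
backfill_body() {
  field="$1"; value="$2"
  cat <<EOF
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source['$field'] = params.value",
    "params": { "value": "$value" }
  },
  "query": {
    "bool": { "must_not": [ { "exists": { "field": "$field" } } ] }
  }
}
EOF
}

# Usage; conflicts=proceed keeps going when a document changed meanwhile:
#   curl -XPOST "http://$server/$index/_update_by_query?conflicts=proceed" \
#        -H 'Content-Type: application/json' \
#        -d "$(backfill_body modified 2015-10-06T00:00:00.000+00:00)"
```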

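Tip 4: Use search_after
search_after pages deep into the results without the cost of large from/size values and without keeping a scroll context open.
Step 1: sort on a unique field, so that every hit has a stable position:
curl -XGET "http://$server/$index/$type/_search" -d '{
  "size": 100,
  "query": { "match_all" : {} },
  "sort": [{"_uid": "desc"}]
}'
Step n, n>1: pass in search_after the sort values of the last hit of the previous page (here its _uid, of the form type#id):
curl -XGET "http://$server/$index/$type/_search" -d '{
  "size": 100,
  "query": { "match_all" : {} },
  "search_after": ["mytype#100"],
  "sort": [{"_uid": "desc"}]
}'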
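The two steps can be combined into a loop. A minimal sketch, assuming curl and jq are installed and that `server`, `index` and `type` are set as in the slides; `search_after_body` and `dump_all` are hypothetical helper names.

```shell
#!/bin/sh
# Build the search body; without a second argument it is the first page.
search_after_body() {
  size="$1"; last="$2"
  if [ -n "$last" ]; then
    after=", \"search_after\": [\"$last\"]"
  else
    after=""
  fi
  printf '{ "size": %s, "query": { "match_all": {} }, "sort": [ { "_uid": "desc" } ]%s }\n' "$size" "$after"
}

# Fetch pages until an empty (or error) page, printing each hit as one JSON line.
dump_all() {
  last=""
  while :; do
    page="$(curl -s -H 'Content-Type: application/json' \
      "http://$server/$index/$type/_search" -d "$(search_after_body 100 "$last")")"
    count="$(printf '%s' "$page" | jq '.hits.hits | length')"
    [ "${count:-0}" -gt 0 ] || break
    printf '%s' "$page" | jq -c '.hits.hits[]'
    # The sort value of the last hit becomes the search_after of the next page.
    last="$(printf '%s' "$page" | jq -r '.hits.hits[-1].sort[0]')"
  done
}
```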

Tip 5: Reindex from a remote cluster – 1/2
Why?
- A snapshot (backup) is a copy of the Lucene index files, so it depends on the Elasticsearch version that created it. Indices created in a version older than the previous major release (e.g. 1.x indices on a 5.x cluster) cannot be restored.
- Backups taken with a newer Elasticsearch version cannot be restored into an older version: restore only works forward.
- A backup cannot be restored partially, e.g. only the documents matching a query.

Tip 5: Reindex from a remote cluster – 2/2
In config/elasticsearch.yml add:
reindex.remote.whitelist: ["192.168.1.227:9200"]
Then:
curl -XPOST "http://$server/_reindex" -d '{
  "source": {
    "remote": { "host": "http://192.168.1.227:9200" },
    "index": "test-source"
  },
  "dest": {
    "index": "test-dest"
  }
}'
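Unlike a snapshot restore, reindex from remote accepts a query in `source`, so only part of the data can be copied. A sketch under these assumptions: the remote host is whitelisted as above, and `created` is a hypothetical date field; `remote_reindex_body` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reindex-from-remote body copying only the documents
# of index $2 on host $1 whose "created" date is >= $4, into local index $3.
remote_reindex_body() {
  host="$1"; src="$2"; dst="$3"; since="$4"
  cat <<EOF
{
  "source": {
    "remote": { "host": "$host" },
    "index": "$src",
    "query": { "range": { "created": { "gte": "$since" } } }
  },
  "dest": { "index": "$dst" }
}
EOF
}

# Usage:
#   curl -XPOST "http://$server/_reindex" -H 'Content-Type: application/json' \
#        -d "$(remote_reindex_body http://192.168.1.227:9200 test-source test-dest 2016-01-01)"
```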

Tip 6: Ingest Pipeline – 1/2
Why?
- Adding/removing fields without changing your code
- Manipulating records before they are indexed
- Computed fields
- Scripting support

Tip 6: Ingest Pipeline – 2/2
curl -XPUT 'http://127.0.0.1:9200/_ingest/pipeline/add-user-john' -d '{
  "description" : "Add user john field",
  "processors" : [
    {
      "set" : {
        "field": "user",
        "value": "john"
      }
    }
  ],
  "version": 1
}'
Then index a document through the pipeline:
curl -XPUT "http://$server/$index/$type/$id?pipeline=add-user-john" -d '{}'
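Before indexing real data, a pipeline can be tried with the ingest `_simulate` API, which runs it against sample documents without storing them. A sketch with made-up sample documents; `simulate_body` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: _simulate body with two sample documents.
simulate_body() {
  cat <<'EOF'
{
  "docs": [
    { "_source": { "title": "first" } },
    { "_source": { "title": "second", "user": "mary" } }
  ]
}
EOF
}

# Usage:
#   curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/add-user-john/_simulate?pretty' \
#        -H 'Content-Type: application/json' -d "$(simulate_body)"
```

The response lists each document as it would look after the pipeline, which makes it easy to check what the set processor does to a document that already has a user field.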

Thank you for your attention
Alberto Paro
