ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup

My presentation at the Elasticsearch Meetup in Rome on 8 February 2017 about new tricks in Elasticsearch 5.x

  1. Rome, 8 February 2017. Presented by Alberto Paro, Seacom: ElasticSearch 5.x New Tricks
  2. Alberto Paro
       • Degree in Computer Engineering (POLIMI)
       • Author of 3 books on ElasticSearch, covering versions 1 to 5.x, plus 6 tech reviews
       • Works mainly in Scala and on big data technologies (Akka, Spray.io, Playframework, Apache Spark) and NoSQL (Accumulo, Cassandra, ElasticSearch and MongoDB)
       • Evangelist for the Scala and Scala.JS languages
  3. Tip 1: Shrink - 1/5 - Why?
       • The number of shards was chosen wrongly during the initial sizing. Sizing without knowing the real data/text distribution tends to produce too many shards.
       • Fewer shards reduce memory and resource usage.
       • Fewer shards speed up searching.
  4. Tip 1: Shrink - 2/5 - Where is your data?
     We can look up the nodes via the _nodes API:
       curl -XGET 'http://localhost:9200/_nodes?pretty'
     The result will contain a section similar to:
       .... "nodes" : { "5Sei9ip8Qhee3J0o9dTV4g" : { "name" : "Gin Genie", "transport_address" : "127.0.0.1:9300", "host" : "127.0.0.1", "ip" : "127.0.0.1", "version" : "5.1.1",....
     The name of my node is Gin Genie.
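Picking the node name out of the _nodes response by eye gets tedious on larger clusters. A minimal sketch, assuming the response has already been parsed from JSON into a dictionary; the helper name `node_names` is made up for this illustration, and the sample data is the excerpt shown on the slide:

```python
def node_names(nodes_response):
    """Map each node id to its node name in a parsed _nodes response."""
    nodes = nodes_response.get("nodes", {})
    return {node_id: info["name"] for node_id, info in nodes.items()}


# Excerpt of the _nodes output from the slide, as a Python dictionary.
sample_nodes = {
    "nodes": {
        "5Sei9ip8Qhee3J0o9dTV4g": {
            "name": "Gin Genie",
            "transport_address": "127.0.0.1:9300",
            "host": "127.0.0.1",
            "ip": "127.0.0.1",
            "version": "5.1.1",
        }
    }
}

print(node_names(sample_nodes))
```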
  5. Tip 1: Shrink - 3/5 - Relocate your data
     We change the index settings to force allocation of our index onto a single node and to block writes to it:
       curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
         "settings": {
           "index.routing.allocation.require._name": "Gin Genie",
           "index.blocks.write": true
         }
       }'
     Then we check for the green status:
       curl -XGET 'http://localhost:9200/_cluster/health?pretty'
  6. Tip 1: Shrink - 4/5 - Shrink our shards
     Writes to the index must be disabled, which can be done via:
       curl -XPUT 'http://localhost:9200/myindex/_settings?index.blocks.write=true'
     The shrink call that creates reduced_index is:
       curl -XPOST 'http://localhost:9200/myindex/_shrink/reduced_index' -d '{
         "settings": {
           "index.number_of_replicas": 1,
           "index.number_of_shards": 1,
           "index.codec": "best_compression"
         },
         "aliases": {"my_search_indices": {}}
       }'
  7. Tip 1: Shrink - 5/5 - Post shrinking
     We can wait for a yellow status to know when the index is ready to work:
       curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow'
     Now we can remove the read-only block by changing the index settings:
       curl -XPUT 'http://localhost:9200/myindex/_settings?index.blocks.write=false'
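The shrink steps are order-sensitive, so it can help to see them as one sequence. The sketch below only builds the requests as (method, path, body) tuples and sends nothing; `shrink_plan` and the `source_shards` parameter are hypothetical names introduced here. The one check it adds relies on a documented constraint of the shrink API: the target shard count must be a factor of the source shard count.

```python
def shrink_plan(index, target, node_name, source_shards, target_shards=1, replicas=1):
    """Return the ordered (method, path, body) requests of the shrink workflow."""
    # The shrink API only accepts a target shard count that divides the source count.
    if target_shards < 1 or source_shards % target_shards != 0:
        raise ValueError(
            f"{target_shards} is not a factor of {source_shards} source shards"
        )
    return [
        # 1. Move every shard to one node and block writes.
        ("PUT", f"/{index}/_settings", {"settings": {
            "index.routing.allocation.require._name": node_name,
            "index.blocks.write": True,
        }}),
        # 2. Wait until relocation has finished.
        ("GET", "/_cluster/health?wait_for_status=green", None),
        # 3. Create the shrunken index.
        ("POST", f"/{index}/_shrink/{target}", {"settings": {
            "index.number_of_replicas": replicas,
            "index.number_of_shards": target_shards,
        }}),
        # 4. Wait until the new index is usable.
        ("GET", "/_cluster/health?wait_for_status=yellow", None),
        # 5. Lift the write block on the source index again.
        ("PUT", f"/{index}/_settings", {"settings": {"index.blocks.write": False}}),
    ]


# source_shards=5 is an assumed value for illustration only.
for method, path, body in shrink_plan("myindex", "reduced_index", "Gin Genie", source_shards=5):
    print(method, path, body)
```

Each tuple maps directly onto one of the curl calls in slides 5 to 7, so the list can be fed to whatever HTTP client is already in use.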
  8. Tip 2: Reindex - 1/2 - Why?
       • Changing an analyzer for a mapping
       • Adding a new subfield to a mapping, which requires reprocessing all the records so that the new subfield can be searched
       • Removing an unused mapping
       • Changing a record structure in a way that requires a new mapping
  9. Tip 2: Reindex - 2/2
       curl -XPOST 'http://localhost:9200/_reindex?pretty=true' -d '{
         "source": {
           "index": "myindex",
           "type": "mytype",
           "query": "…"
         },
         "dest": {
           "index": "myindex2"
         },
         "script": "…"
       }'
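Building the reindex body programmatically avoids the quoting and comma slips that are easy to make in a hand-written curl payload. This is a minimal sketch: `reindex_body` is a hypothetical helper, the query and script are optional because the slide leaves them elided, and the script is placed at the top level of the body, which is where the Reindex API documentation puts it.

```python
import json


def reindex_body(source_index, dest_index, source_type=None, query=None, script=None):
    """Build a _reindex request body; optional parts are added only when given."""
    source = {"index": source_index}
    if source_type is not None:
        source["type"] = source_type
    if query is not None:
        source["query"] = query
    body = {"source": source, "dest": {"index": dest_index}}
    if script is not None:
        # The script is a sibling of "source" and "dest", not part of "dest".
        body["script"] = script
    return body


# Index and type names are the ones used on the slide.
print(json.dumps(reindex_body("myindex", "myindex2", source_type="mytype"), indent=2))
```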
  10. Tip 3: Update By Query with painless - Add a new field
      1. Create your mapping (e.g. modified: date)
      2. Call an update by query:
        curl -XPOST "http://$server/$index/$mapping/_update_by_query" -d '{
          "script": {
            "inline": "ctx._source.modified=\"2015-10-06T00:00:00.000+00:00\"",
            "lang": "painless"
          },
          "query": {
            "bool": {"must_not": [{"exists": {"field": "modified"}}]}
          }
        }'
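The nested quotes in the inline script are the fragile part of this request. One way around them, sketched below, is to pass the value through the script's `params` instead of embedding it in the script source; this is a variation on the slide's call, not the slide's exact script, and `add_field_if_missing` is a hypothetical helper name.

```python
import json


def add_field_if_missing(field, value):
    """Build an _update_by_query body that sets `field` on documents lacking it."""
    # The field name is spliced into the script, so restrict it to a plain identifier.
    if not field.isidentifier():
        raise ValueError(f"unsupported field name: {field!r}")
    return {
        "script": {
            "inline": f"ctx._source.{field} = params.value",
            "lang": "painless",
            "params": {"value": value},
        },
        # Only touch documents where the field does not exist yet.
        "query": {"bool": {"must_not": [{"exists": {"field": field}}]}},
    }


print(json.dumps(add_field_if_missing("modified", "2015-10-06T00:00:00.000+00:00"), indent=2))
```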
  11. Tip 4: Use search_after
      Step 1:
        curl -XGET "http://$server/$index/$type/_search" -d '{
          "size": 100,
          "query": { "match_all": {} },
          "sort": [{"_uid": "desc"}]
        }'
      Step n, n>1 (search_after takes the sort values of the last hit of the previous page):
        curl -XGET "http://$server/$index/$type/_search" -d '{
          "size": 100,
          "query": { "match_all": {} },
          "search_after": ["$type#100"],
          "sort": [{"_uid": "desc"}]
        }'
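The link between one page and the next is the `sort` array that Elasticsearch returns with every hit when a sort is specified. A minimal sketch of that hand-off, assuming responses are already parsed into dictionaries; `first_page` and `next_page` are hypothetical helper names, and the sample response in the usage lines is invented for illustration:

```python
def first_page(size=100):
    """Body of the first search request: a plain sorted query."""
    return {"size": size, "query": {"match_all": {}}, "sort": [{"_uid": "desc"}]}


def next_page(previous_response, size=100):
    """Body of the following request, or None when the previous page was empty."""
    hits = previous_response["hits"]["hits"]
    if not hits:
        return None  # nothing left to page through
    body = first_page(size)
    # Continue right after the sort values of the last hit we received.
    body["search_after"] = hits[-1]["sort"]
    return body


# Invented two-hit response; only the "sort" values matter here.
previous = {"hits": {"hits": [
    {"_id": "101", "sort": ["mytype#101"]},
    {"_id": "100", "sort": ["mytype#100"]},
]}}
print(next_page(previous))
```

A paging loop would call `next_page` on each response until it returns None.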
  12. Tip 5: Reindex from a remote node - 1/2 - Why?
      Restoring a backup is not always an option:
       • The backup is a safe Lucene index copy, so it depends on the Elasticsearch version used. If you are switching from a version of Elasticsearch that is prior to version 5.x, it's not possible to restore old indices.
       • It's not possible to restore backups of a newer Elasticsearch version in an older version. The restore is only forward-compatible.
       • It's not possible to restore partial data from a backup.
  13. Tip 5: Reindex from a remote node - 2/2
      In config/elasticsearch.yml add:
        reindex.remote.whitelist: ["192.168.1.227:9200"]
      Then:
        curl -XPOST "http://$server/_reindex" -d '{
          "source": {
            "remote": { "host": "http://192.168.1.227:9200" },
            "index": "test-source"
          },
          "dest": {
            "index": "test-dest"
          }
        }'
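A remote reindex fails if the remote host is missing from `reindex.remote.whitelist`, so a quick local check before sending the request can save a round trip. The sketch below is only a client-side convenience with exact host:port matching, which is a simplification of whatever matching the server applies; `remote_reindex_body` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def remote_reindex_body(remote_host, source_index, dest_index, whitelist):
    """Build a remote _reindex body after checking the host against a local whitelist copy."""
    parsed = urlparse(remote_host)
    host_port = f"{parsed.hostname}:{parsed.port}"
    # Exact-match check only; the server remains the authority on what is allowed.
    if host_port not in whitelist:
        raise ValueError(f"{host_port} is not listed in reindex.remote.whitelist")
    return {
        "source": {"remote": {"host": remote_host}, "index": source_index},
        "dest": {"index": dest_index},
    }


# Values taken from the slide.
print(remote_reindex_body(
    "http://192.168.1.227:9200", "test-source", "test-dest", ["192.168.1.227:9200"]
))
```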
  14. Tip 6: Ingest Pipeline - 1/2 - Why?
       • Adding/removing fields without changing your code
       • Manipulating your records before ingesting them
       • Computed fields
       • Scripting is also supported
  15. Tip 6: Ingest Pipeline - 2/2
      Define the pipeline:
        curl -XPUT 'http://127.0.0.1:9200/_ingest/pipeline/add-user-john' -d '{
          "description": "Add user john field",
          "processors": [
            { "set": { "field": "user", "value": "john" } }
          ],
          "version": 1
        }'
      Index a document through it:
        curl -XPUT "http://$server/$index/$type/$id?pipeline=add-user-john" -d '{}'
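To see why indexing an empty document through this pipeline still yields a `user` field, the effect of the `set` processor can be imitated locally. This is purely an illustration of the outcome, not how Elasticsearch implements ingest nodes; `apply_set_processors` is a hypothetical helper that handles only top-level field names and ignores every other processor type.

```python
# The pipeline definition from the slide, as a Python dictionary.
pipeline = {
    "description": "Add user john field",
    "processors": [{"set": {"field": "user", "value": "john"}}],
    "version": 1,
}


def apply_set_processors(pipeline_definition, document):
    """Return a copy of `document` with every `set` processor applied in order."""
    result = dict(document)
    for processor in pipeline_definition["processors"]:
        if "set" in processor:
            settings = processor["set"]
            result[settings["field"]] = settings["value"]
    return result


# The slide indexes '{}', so start from an empty document.
print(apply_set_processors(pipeline, {}))
```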
  16. Thank you for your attention. Alberto Paro
  17. Q&A
