Bulking
- Bulk for indexing, creating, updating and deleting
- Measure bulk size in bytes, not in number of documents
- If in doubt, use smaller batch sizes
- Parallelize multiple bulks
- Use asynchronous calls
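The points above can be sketched as a small helper that assembles newline-delimited `_bulk` bodies capped by byte size rather than document count. This is an illustrative sketch, not part of any client library; the index and type names are made up.

```python
import json

def make_bulk_chunks(docs, index, doc_type, max_bytes=5 * 1024 * 1024):
    """Split documents into _bulk request bodies, capped by bytes, not doc count."""
    chunk, size, chunks = [], 0, []
    for doc in docs:
        action = json.dumps({"index": {"_index": index, "_type": doc_type}})
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_bytes = len(entry.encode("utf-8"))
        if chunk and size + entry_bytes > max_bytes:
            chunks.append("".join(chunk))  # flush: adding this doc would exceed the cap
            chunk, size = [], 0
        chunk.append(entry)
        size += entry_bytes
    if chunk:
        chunks.append("".join(chunk))
    return chunks  # each chunk can be POSTed to /_bulk, ideally from parallel workers

docs = [{"title": "doc %d" % i} for i in range(1000)]
chunks = make_bulk_chunks(docs, "myindex", "mytype", max_bytes=20000)
```

Each chunk is an independent `_bulk` body, so several can be sent concurrently (asynchronously) to spread the load.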
- Turn off refresh while indexing
- Delay flushes
- Throttle merging
- Maybe increase indices.memory.index_buffer_size
- Set replicas to zero (only DURING indexing!)
- Disable warmup
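A minimal sketch of the settings bodies this implies, using ES 1.x-era setting names (verify against the version you run): one body to apply before a large indexing run, one to restore afterwards.

```python
# Apply before a large indexing run (PUT /myindex/_settings).
bulk_settings = {
    "index": {
        "refresh_interval": "-1",   # turn off refresh while indexing
        "number_of_replicas": 0,    # no replicas DURING indexing only
    }
}

# Restore once indexing is done (assumed defaults: adjust to your setup).
restore_settings = {
    "index": {
        "refresh_interval": "1s",
        "number_of_replicas": 1,
    }
}
```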
- Do you really need your _all field?
- Do you need the _source field and stored fields?
- Reduce analysis: field norms, term frequencies & positions
- not_analyzed is your friend
- Dynamic mapping is for playtime, not production
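A hypothetical explicit mapping (ES 1.x syntax) illustrating these points: `_all` disabled, dynamic mapping locked down, and per-field analysis overhead trimmed where scoring is not needed. Field names are made up.

```python
# Illustrative mapping body for PUT /myindex/_mapping/mytype (ES 1.x syntax).
mapping = {
    "mytype": {
        "_all": {"enabled": False},           # drop _all if you never search it
        "dynamic": "strict",                  # dynamic mapping is for playtime
        "properties": {
            "status": {
                "type": "string",
                "index": "not_analyzed",      # exact-match field: skip analysis
                "norms": {"enabled": False},  # no field norms needed
            },
            "body": {
                "type": "string",
                "index_options": "docs",      # no term frequencies / positions
            },
        },
    }
}
```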
- Filters do no scoring
- Filter results can be cached
- Most simple filters are cached, but not all (e.g. geo)
- Compound filters are not cached
- Explicitly control caching with _cache
- Bool filters query the cache for their sub-filters, but and/or/not filters don't
- Caching is a moving target: consider the scope, probably a filtered query
- A top-level filter is applied after the query, but not in a "filtered query"!
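A period-appropriate (ES 1.x) filtered query as a sketch: the term filter does no scoring and its result bitset can be cached; `_cache` makes that choice explicit. Field names are assumptions.

```python
# Illustrative filtered query body (ES 1.x). The filter runs before the
# query and its bitset is reusable across requests.
query = {
    "query": {
        "filtered": {
            "query": {"match": {"title": "elasticsearch"}},
            "filter": {
                "term": {"status": "published", "_cache": True}  # explicit caching
            },
        }
    }
}
```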
- Regular queries query first and filter afterwards
- A filtered query filters first
- The elements of bool filters are executed sequentially
- Place the most restrictive filter first
- Accelerator filter: an additional filter on general terms, better for caching, reduces work for heavyweight filters
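A sketch of the ordering advice: in an ES 1.x bool filter the clauses run sequentially, so a cheap, restrictive, cacheable "accelerator" filter goes first and the heavyweight geo filter runs last, over a much smaller candidate set. Field names and values are made up.

```python
# Illustrative bool filter: most restrictive and cheapest clause first,
# expensive geo_distance clause last.
filters = {
    "bool": {
        "must": [
            {"term": {"country": "de"}},                   # cheap, cached, restrictive
            {"range": {"created": {"gte": "2014-01-01"}}}, # narrows further
            {"geo_distance": {                             # heavyweight: run last
                "distance": "10km",
                "location": {"lat": 50.94, "lon": 6.96},
            }},
        ]
    }
}
```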
Pagination
- Don't load too many results at once
- Avoid deep pagination
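Why deep pagination hurts, as back-of-the-envelope arithmetic: with from/size, every shard must collect and sort `from + size` hits, and the coordinating node merges all of them. Shard and page counts below are assumptions for illustration.

```python
# Cost sketch for fetching page 1000 with size 10 on a 5-shard index.
size = 10
page = 1000
per_shard_hits = (page - 1) * size + size   # each shard sorts from + size hits
shards = 5
coordinator_hits = shards * per_shard_hits  # merged on the coordinating node
```

So one "small" request for 10 documents makes each shard sort 10,000 hits and the coordinator merge 50,000, which is why scan/scroll is preferred for walking a large result set.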
Index-time vs. query-time optimizations:
- Try to do prework at index time, e.g. prefix query vs. edge ngram
- Warm up "common queries"
- Turn on the slow log
- Use multi-search if applicable
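A sketch of moving prefix matching to index time: an `edge_ngram` token filter (ES 1.x settings syntax) indexes the prefixes up front, so queries become cheap term matches instead of prefix queries. Analyzer and filter names are made up.

```python
# Illustrative index settings: prefixes are generated at index time.
settings = {
    "analysis": {
        "filter": {
            "my_edge_ngram": {
                "type": "edge_ngram",
                "min_gram": 2,    # shortest indexed prefix
                "max_gram": 15,   # longest indexed prefix
            }
        },
        "analyzer": {
            "prefix_at_index_time": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "my_edge_ngram"],
            }
        },
    }
}
```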
- Load lazily as much as possible
- Hide the less needed ones
- Load only once during pagination
- Field data (e.g. for sorting) is stored in RAM
- Expensive for the JVM: garbage collection issues
- The OS file system cache can take care of it on disk instead
- Slightly slower, so test it!
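A sketch of what "on disk instead of on the heap" means in mapping terms: doc values (ES 1.x syntax) keep the sortable column on disk, where the OS file system cache serves it, instead of loading field data into the JVM heap. The field name is an assumption.

```python
# Illustrative mapping: doc values move field data off the heap onto disk.
mapping = {
    "mytype": {
        "properties": {
            "timestamp": {
                "type": "date",
                "doc_values": True,   # sort/aggregate from disk, not JVM heap
            }
        }
    }
}
```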
- An update is a delete + add
- Partial updates still read the whole document
- Even "small" updates can be expensive
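For concreteness, a partial-update body for the `_update` endpoint. Even this one-field patch makes Elasticsearch fetch the full document, merge the change, and re-index the whole thing as a delete + add. The field name is made up.

```python
# Illustrative body for POST /myindex/mytype/1/_update.
partial_update = {
    "doc": {"views": 42},     # only one field changes...
    "doc_as_upsert": True,    # ...but the whole document is re-indexed
}
```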
- Sequential IDs allow optimized (binary) storage
- Java's UUID is truly random
- Internally, Elasticsearch uses Flake IDs
- Multiple shards allow parallel writes
- Multiple replicas allow parallel reads and add safety, but make indexing more expensive
- Sharding makes reads slower:
  - a round trip for accurate scoring
  - a second round trip for the search, followed by the reduce step
  - a third round trip to retrieve the final set of documents
Two rules of distributed search:
- Distributed search is expensive!
- Searching multiple indexes is the same as searching multiple shards
- Routing only works for isolated "chunks" of data in the same index, maybe "users"
- The routing key overrides the shard key; a popular example is the user ID
- Multiple users will share a shard, so shards will differ in size
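The mechanism can be sketched as follows: the routing value, passed on both indexing and search requests, determines the shard, so all of one user's documents land together and a search touches a single shard. The hash below is purely illustrative; real Elasticsearch uses its own hash function on the `_routing` value.

```python
# Illustrative only: NOT Elasticsearch's actual routing hash.
def shard_for(routing_key, number_of_shards):
    """All requests with the same routing key hit the same shard."""
    return hash(routing_key) % number_of_shards

# Pass the same routing value when indexing and when searching,
# e.g. as ?routing=user-4711 on the request URL.
index_params = {"routing": "user-4711"}
search_params = {"routing": "user-4711"}
```

Because the hash lumps several users onto each shard, shard sizes will drift apart, which is the trade-off named above.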
Alternative: aliases
- Move large users out to a new index
- Have an alias point to all indexes
- Drawback: the cluster state becomes big, with high network impact
- Use existing client libraries
- If on Java, prefer the NodeClient; the TransportClient is the alternative
HTTP:
- Use long-lived connections
- Check HTTP chunking
- Raise the maximum number of file descriptors
- Avoid swapping
- Set ES_HEAP_SIZE (Xms = Xmx)
- Leave enough memory to the OS: half the memory to Elasticsearch, but not more than 32 GB
- If using doc values, a few GB should be enough
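The heap rule of thumb above as arithmetic, assuming (purely for illustration) a machine with 64 GB of RAM:

```python
# Heap sizing rule of thumb: half the RAM, capped below ~32 GB so the JVM
# can keep using compressed object pointers. The rest stays with the OS
# for its file system cache.
total_ram_gb = 64                        # assumed machine size
heap_gb = min(total_ram_gb // 2, 32)     # ES_HEAP_SIZE, with Xms = Xmx
os_cache_gb = total_ram_gb - heap_gb     # left to the OS file system cache
```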
- Use a concurrent GC
- The default is CMS; maybe try G1
- Check your Java version
- Avoid virtualisation (noisy neighbours)
- Storage: use local disks, use SSDs, RAID 0
Elasticsearch performance tips
codecentric AG 17.12.2014 Seite 1
Why doesn't anyone bulk?!?
Change your configuration during important events
Y'all need to think more about your mappings
FILTERS AND CACHES
Filters instead of Queries as often as possible
What comes first?
The Chicken or the Egg?
The Query or the Filter?
So much room for optimizations!
Aggregations are expensive!
Store field data on disk instead of on the heap
There's no such thing as an update
Use friendly IDs!
Choose a sharding strategy!
Avoid distributed searches by routing
Use the right client!
There are plenty of essential configurations
Tuning: Measure, don't guess!
One change at a time!