A 10-minute lightning talk about how to avoid hotspots in Elasticsearch. It covers how Elasticsearch decides which node will host your data, as well as how to force it to store the data on the nodes you want.
5. Split vs vnode/shards
Splitting an existing index means re-indexing everything.
● On-demand
● Very expensive in resources
● Usually needed when the cluster is under stress
● Challenging to support ongoing reads and writes
Sharding in advance:
● Static (the number of shards is fixed when the index is created)
● Moving a shard is low cost (just transferring bytes)
● No impact on current reads and writes
10. Murmur3Hash function
Hash function for table lookup
● Fast
● Very few collisions
● Uniform distribution across all possible values
● Not suitable for signing documents (not a cryptographic hash)
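In its simplest form, Elasticsearch picks the target shard as hash(routing) % number_of_primary_shards, where the routing value defaults to the document _id. The Python sketch below illustrates that formula only. As an assumption made to stay within the standard library, it uses zlib.crc32 as a stand-in for Murmur3, so the shard numbers will not match what a real cluster computes. The name shard_for and the synthetic document ids are hypothetical.

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard: hash(routing) % num_primary_shards.

    zlib.crc32 is a stand-in for Murmur3 (assumption for this sketch).
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Route 10,000 synthetic document ids across 5 primary shards
# and count how many documents each shard receives.
counts = Counter(shard_for(f"doc-{i}", 5) for i in range(10_000))
for shard in sorted(counts):
    print(shard, counts[shard])
```

Because the shard count is the modulus, changing it would remap most documents to a different shard, which is why the number of primary shards is static.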
11. Default field vs Custom field
Default (_id) is good when:
● No specific access pattern
● Uniform distribution of documents
● Less risk of hotspots
● Search needs to ask all shards
Custom field is good when:
● Searches usually filter by this field
● Documents belonging to the same group are stored together
● May need a dedicated index in case of a hotspot
● Search targets only a single shard
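The trade-off between the two routing choices can be illustrated with a small simulation. This is a sketch in plain Python, not Elasticsearch code: zlib.crc32 stands in for Murmur3, and the user names, document ids, and counts are made up. It counts documents per shard when routing by document id versus routing by user.

```python
import zlib
from collections import Counter

NUM_SHARDS = 5


def shard_for(routing: str) -> int:
    # Stand-in hash (assumption): Elasticsearch uses Murmur3, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


# Made-up workload: one heavy user (5,000 docs) and 50 light users (20 docs each).
docs = [("heavy_user", f"h-{i}") for i in range(5_000)]
docs += [(f"user_{u}", f"u{u}-{i}") for u in range(50) for i in range(20)]

# Default routing: the shard depends on the document id.
by_id = Counter(shard_for(doc_id) for _, doc_id in docs)
# Custom routing: the shard depends on the user, so a user's docs stay together.
by_user = Counter(shard_for(user) for user, _ in docs)

print("routing by _id :", sorted(by_id.items()))
print("routing by user:", sorted(by_user.items()))
```

With routing by user, every document of heavy_user lands on one shard, so that shard holds at least 5,000 documents while the others hold only light users. That shard is the hotspot.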
14. Dealing with hotspot
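Create a dedicated index for the heavy user. Transparent for the application (alias).
[Diagram: Index 1 has three primary shards (P1, P2, P3), each with two replicas (R1, R2, R3), spread across the nodes. Index 2, the dedicated index, has one primary shard (P1) with two replicas (R1).]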
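One way to keep the move transparent is to give each user an alias, so the application always addresses the alias and never the index behind it. The sketch below builds a request body for the `_aliases` API under that assumption. The alias naming scheme, the `user_id` field, and the users `alice` and `bob` are hypothetical. A regular user gets a filtered, routed alias on the shared index, while the heavy user gets a plain alias on the dedicated index.

```python
import json


def alias_action(user_id: str, index: str, dedicated: bool) -> dict:
    """Build one 'add' action for the _aliases API (hypothetical naming scheme)."""
    action = {"index": index, "alias": f"user_{user_id}"}
    if not dedicated:
        # Shared index: limit the alias to this user's documents
        # and route requests to the single shard that holds them.
        action["filter"] = {"term": {"user_id": user_id}}
        action["routing"] = user_id
    return {"add": action}


body = {
    "actions": [
        alias_action("alice", "index_1", dedicated=False),  # regular user
        alias_action("bob", "index_2", dedicated=True),     # heavy user
    ]
}

# This JSON would be sent as the body of POST /_aliases.
print(json.dumps(body, indent=2))
```

Moving a user to a dedicated index then only requires re-pointing that user's alias once the data has been copied. The application keeps using the same alias name.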
15. Putting heavy load on bigger servers
./bin/elasticsearch --node.box_type strong
PUT /index_2
{
  "settings": {
    "index.routing.allocation.include.box_type": "strong"
  }
}
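The effect of the `include` setting can be sketched as a simple filter: a node may host the index if its attribute matches at least one of the comma-separated values. The Python below models that rule for a single attribute only. It is not Elasticsearch code, and the node names and the `weak` value are hypothetical.

```python
def eligible_nodes(nodes: dict, attribute: str, include_values: str) -> list:
    """Return the names of nodes whose attribute matches one of the included values.

    nodes:          node name -> attributes, e.g. {"node_a": {"box_type": "strong"}}
    include_values: comma-separated accepted values, e.g. "strong"
    """
    accepted = {value.strip() for value in include_values.split(",")}
    return sorted(
        name for name, attrs in nodes.items() if attrs.get(attribute) in accepted
    )


# Hypothetical cluster: two big servers and one small one.
nodes = {
    "node_a": {"box_type": "strong"},
    "node_b": {"box_type": "weak"},
    "node_c": {"box_type": "strong"},
}

print(eligible_nodes(nodes, "box_type", "strong"))  # ['node_a', 'node_c']
```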