This talk was given during Lucene Revolution 2017 and has two goals: first, to discuss the tradeoffs of running Solr on Docker. For example, you get dynamic allocation of operating system caches, but you also pay some CPU overhead. We'll keep in mind that Solr nodes tend to be different from your average container: Solr is usually long-running, has a large resident set (RSS) and a lot of virtual memory. This implies, for example, that it makes more sense to run Docker on big physical boxes than on configurable-size VMs (like Amazon EC2).
The second goal is to discuss issues with deploying Solr on Docker and how to work around them. For example, many older (and some newer) combinations of Docker, the Linux kernel and the JVM leak memory. We'll go over Docker operations best practices, such as using container limits to cap memory usage and prevent the host's OOM killer from terminating a memory-hungry process (usually a Solr node), or running Docker in Swarm mode over multiple smaller boxes to limit the blast radius of a single issue.
Solr on Docker - the Good, the Bad and the Ugly
Sematext Group, Inc.
The Good (well, arguably). Why containers? Orchestration, configuration drift...
The Bad (actually, not so bad). How to do it? Hardware, heap size, shards...
The Ugly (and exciting). Why is it slow/crashing? Container limits, GC&OS settings
dev=test=prod; infrastructure as code. Sound familiar? But:
○ light images
○ faster start&stop
○ hype ⇒ community
Efficiency (overhead vs isolation): (processes + VMs)/2 = containers
More on “the Good” of containerization
Zookeeper on separate hosts
Equal nodes per host
Equal shards per node
podAntiAffinity on k8s
Moving on to “how”
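The podAntiAffinity bullet above could look roughly like this on Kubernetes (a sketch only: the names, replica count, image tag and API version are placeholders and depend on your cluster):

```shell
# Hypothetical StatefulSet keeping Solr pods on different hosts.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: solr
spec:
  serviceName: solr-headless
  replicas: 3
  selector:
    matchLabels:
      app: solr
  template:
    metadata:
      labels:
        app: solr
    spec:
      affinity:
        podAntiAffinity:
          # hard rule: never co-locate two Solr pods on the same host
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: solr
            topologyKey: kubernetes.io/hostname
      containers:
      - name: solr
        image: solr:7
EOF
```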
Overshard*. A bit.
*Moving shards creates load ⇒ be aware of spikes
Time series? Size-based indices
volumes/StatefulSet for persistence
local > network (esp. for full-text search)
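With plain Docker (no orchestrator), the persistence bullet can be a named volume mounted at the image's data directory. A sketch, assuming the official `solr` image (the data directory path varies between image versions):

```shell
# Sketch: keep index data in a named volume so the container is disposable.
# The mount path below depends on the Solr image version.
docker volume create solr-data
docker run -d --name solr1 \
  -p 8983:8983 \
  -v solr-data:/opt/solr/server/solr \
  solr:6
```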
latency (mostly to Zookeeper)
AWS → enhanced networking
network storage on different interface
AWS → EBS-optimized
Not too small
OS caches are shared between containers
>1 Solr nodes per host?
Co-locate with less IO-intensive apps?
Not too big
Host failure will be really bad
Overhead (e.g. memory allocation)
Big vs small hosts
Many small Solr nodes ⇒ bigger cluster state, # of shards
Full text search is usually bound by IO latency
Facets are usually parallelized between shards/collections
Size usually limited by heap (can’t be too big due to GC)
or by recovery time
bigger = better
Big vs small containers/nodes
More data → more heap (terms, docValues, norms…)
Caches (generally, fieldValueCache is evil, use docValues)
Transient memory (serving requests)
→ add 50-100% headroom
Make sure to leave enough room for OS caches
How much heap?
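As a back-of-the-envelope version of the bullets above, a hypothetical sizing helper (the 8GB heap and 75% headroom figures are assumptions for illustration, not recommendations):

```shell
# Sketch: container memory limit = heap + 50-100% headroom for caches
# and transient memory, leaving the rest of the host's RAM to OS caches.
heap_gb=8
headroom_pct=75                       # pick something in the 50-100% range
limit_gb=$(( heap_gb * (100 + headroom_pct) / 100 ))
echo "-Xms${heap_gb}g -Xmx${heap_gb}g --memory=${limit_gb}g"
```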
@32GB → no more compressed object pointers
Depending on OS, >30GB → still compressed, but not 0-based → more CPU
Uncompressed pointers’ overhead varies by use case; 5-10% is a good estimate
Larger heaps → GC is a bigger problem
The 32GB heap problem
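You can check where your heap lands with standard HotSpot flags (a sketch against a stock JDK):

```shell
# Ask HotSpot whether compressed oops are on at a given heap size.
java -Xmx31g -XX:+PrintFlagsFinal -version | grep -i UseCompressedOops
# Diagnostic mode also reports whether the heap base is zero-based:
java -Xmx31g -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version
```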
Defaults → should be good up to 30GB
Larger heaps need tuning for latency
100GB+ per node is doable.
CMS: NewRatio, SurvivorRatio, CMSInitiatingOccupancyFraction
G1 trades heap for latency and throughput:
■ Adaptive sizing depending on MaxGCPauseMillis
■ Compacts old gen (check G1HeapRegionSize)
More useful info: https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr
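In `solr.in.sh` these settings go into the `GC_TUNE` variable. A sketch with illustrative values only (tune against your own latency measurements):

```shell
# Sketch of GC_TUNE in solr.in.sh - values are examples, not recommendations.
# G1 variant:
GC_TUNE="-XX:+UseG1GC \
  -XX:MaxGCPauseMillis=250 \
  -XX:G1HeapRegionSize=16m"
# CMS variant with the knobs from the slide:
# GC_TUNE="-XX:+UseConcMarkSweepGC -XX:NewRatio=3 \
#   -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=70"
```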
Long GC pauses: a typical cluster killer (timeouts)
old gen GC: ConcGCThreads + G1ConcRefinementThreads
merges*: maxThreadCount & maxMergeCount
* also account for IO throughput&latency
< Java 9: thread-count defaults depend on the host’s #CPUs, not the container’s limits
N nodes per host ⇒ threads
Memory: more than heap, but won’t include OS caches
Single NUMA node? --cpu-shares
Multiple NUMA nodes? --cpuset*
vm.zone_reclaim_mode to store caches only on local node?
* Docker isn’t NUMA aware: https://github.com/moby/moby/issues/9777
But the kernel automatically balances threads across NUMA nodes by default
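Pinning containers to NUMA nodes could look like this (a sketch: the CPU ranges and node IDs are examples; check your topology with `numactl --hardware` first):

```shell
# Sketch: one Solr container per NUMA node, with memory kept local.
docker run -d --name solr-numa0 \
  --cpuset-cpus=0-15 --cpuset-mems=0 \
  solr:6
docker run -d --name solr-numa1 \
  --cpuset-cpus=16-31 --cpuset-mems=1 \
  solr:6
```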
Memory leak → OOM killer with a wide range of Java versions*
Similar leaks (growing RSS) → NativeMemoryTracking
Don’t overbook memory + leave room for OS caches
Allocate on startup via AlwaysPreTouch
JVM+Docker+Linux = love. Or not.
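To chase a growing RSS, Native Memory Tracking can be enabled at startup and queried with `jcmd`. A sketch (`<solr-pid>` is a placeholder; NMT adds a small overhead):

```shell
# Sketch: enable NMT and pre-touch the heap on startup via SOLR_OPTS.
SOLR_OPTS="$SOLR_OPTS -XX:NativeMemoryTracking=summary -XX:+AlwaysPreTouch"
# Later, inside the container, diff native memory usage over time:
jcmd <solr-pid> VM.native_memory baseline
jcmd <solr-pid> VM.native_memory summary.diff
```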
Newer kernels and Dockers are usually better
Open files and locked memory limits
Check dmesg and kswapd* CPU usage
Dare I say it:
Try smaller hosts
Try niofs? (if you thrash the cache - and TLB - too much)
A bit of swap? (swappiness is configurable per container, too)
Play with mmap arenas and THP
* kernel’s (single-threaded) GC: https://linux-mm.org/PageOutKswapd
e.g. 4.4+ and 1.13+
More on that love
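The container limits discussed above map to `docker run` flags. A sketch with example numbers (size them for your own heap and index):

```shell
# Sketch: cap memory, tame swapping, and raise file/memlock limits.
docker run -d --name solr1 \
  --memory=14g \
  --memory-swappiness=10 \
  --ulimit nofile=65000:65000 \
  --ulimit memlock=-1:-1 \
  solr:6
```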
Dynamic allocation of resources (works well for bigger boxes)
Might actually deliver the promise of dev=test=prod, because
Pets → cattle requires good sizing, config, scaling practices
Ecosystem is still young → exciting bugs
Docker is the future!