Solr on Docker - the Good, the Bad and the Ugly


This talk was given at Lucene Revolution 2017 and has two goals. The first is to discuss the tradeoffs of running Solr on Docker. For example, you get dynamic allocation of operating system caches, but you also get some CPU overhead. We'll keep in mind that Solr nodes tend to differ from your average container: Solr is usually long-running and uses a fair amount of RSS and a lot of virtual memory. This implies, for example, that it makes more sense to use Docker on big physical boxes than on configurable-size VMs (like Amazon EC2).
The second goal is to discuss issues with deploying Solr on Docker and how to work around them. For example, many older (and some of the newer) combinations of Docker, Linux kernel and JVM have memory leaks. We'll go over Docker operations best practices, such as using container limits to cap memory usage and prevent the host OOM killer from terminating a memory-consuming process (usually a Solr node), or running Docker in Swarm mode over multiple smaller boxes to limit the spread of a single issue.
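
As a rough illustration of capping memory with container limits, here is a minimal Python sketch that derives a per-container limit from the Solr heap and checks that the host still has room left for OS caches. The function names, the 25% non-heap allowance and the 8GB cache floor are assumptions made for this example, not numbers from the talk; only Docker's `--memory` flag is a real option.

```python
def plan_host_memory(host_gb, heap_gb, nodes,
                     non_heap_fraction=0.25, min_cache_gb=8.0):
    """Return (per-container limit in GB, GB left on the host for OS caches).

    non_heap_fraction and min_cache_gb are illustrative assumptions:
    the limit has to be larger than the heap, and the host should not
    be overbooked, but the right values depend on the workload.
    """
    limit_gb = heap_gb * (1 + non_heap_fraction)
    cache_gb = host_gb - limit_gb * nodes
    if cache_gb < min_cache_gb:
        raise ValueError(
            f"only {cache_gb:.1f}GB left for OS caches; "
            "use fewer nodes, smaller heaps or a bigger host"
        )
    return limit_gb, cache_gb


def docker_memory_flag(limit_gb):
    """Format the limit as a value for Docker's --memory option."""
    return f"--memory={int(limit_gb * 1024)}m"


if __name__ == "__main__":
    limit, cache = plan_host_memory(host_gb=128, heap_gb=16, nodes=2)
    print(docker_memory_flag(limit), f"({cache:.0f}GB left for OS caches)")
```

The check fails loudly instead of silently overbooking the host, since an overbooked host is where the OOM killer tends to show up.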

Published in: Technology

  1. Solr on Docker - the Good, the Bad and the Ugly. Radu Gheorghe, Sematext Group, Inc.
  2. Agenda. The Good (well, arguably): Why containers? Orchestration, configuration drift... The Bad (actually, not so bad): How to do it? Hardware, heap size, shards... The Ugly (and exciting): Why is it slow/crashing? Container limits, GC & OS settings.
  3. Our own dockerizing (dockerization?): clients, Sematext Cloud, logs, metrics, ...
  4. Because Docker is the future!
  5. Orchestration: you’re not tied to the provider’s autoscaling; you may get better deals with huge VMs.
  6. Demo: Kubernetes
  7. More on “the Good” of containerization. dev=test=prod; infrastructure as code. Sounds familiar? But: light images; faster start & stop; hype ⇒ community. Efficiency (overhead vs isolation): (processes + VMs)/2 = containers.
  8. Moving on to “how”. Zookeeper on separate hosts/nodes. Avoid hotspots: equal nodes per host; equal shards per node (per collection); podAntiAffinity on k8s.
  9. On scaling. Overshard*. A bit. Time series? Size-based indices (logs1, logs2, logs3 over time). * Moving shards creates load ⇒ be aware of spikes.
  10. Volumes/StatefulSet for persistence: local > network (esp. for full-text search); permissions. Latency (mostly to Zookeeper): AWS → enhanced networking. Network storage on different interface: AWS → EBS-optimized.
  11. Big vs small hosts. Not too small: OS caches are shared between containers ⇒ >1 Solr nodes per host? Co-locate with less IO-intensive apps? Not too big: host failure will be really bad; overhead (e.g. memory allocation).
  12. Big vs small containers/nodes. Many small Solr nodes ⇒ bigger cluster state, # of shards. Multithreaded indexing. Full text search is usually bound by IO latency. Facets are usually parallelized between shards/collections. Size usually limited by heap (can’t be too big due to GC) or by recovery time. Bigger = better.
  13. How much heap? More data → more heap (terms, docValues, norms…). Caches (generally, fieldValueCache is evil, use docValues). Transient memory (serving requests) → add 50-100% headroom. Make sure to leave enough room for OS caches.
  14. The 32GB heap problem. @32GB → no more compressed object pointers. Depending on OS, >30GB → still compressed, but not 0-based → more CPU. Uncompressed pointers’ overhead varies on use-case, 5-10% is a good; larger heaps → GC is a bigger problem.
  15. GC Settings. Defaults → should be good up to 30GB. Larger heaps need tuning for latency. 100GB+ per node is doable. CMS: NewRatio, SurvivorRatio, CMSInitiatingOccupancyFraction. G1 trades heap for latency and throughput: adaptive sizing depending on MaxGCPauseMillis; compacts old gen (check G1HeapRegionSize). More useful info: usually jump to 45GB+; typical cluster killer (timeouts).
  16. N nodes per host ⇒ threads. GC-related: young: ParallelGCThreads; old: ConcGCThreads + G1ConcRefinementThreads. facet.threads. Merges*: maxThreadCount & maxMergeCount. * Also account for IO throughput & latency. <Java 9: defaults depend on host’s #CPUs.
  17. Container limits. Memory: more than heap, but won’t include OS caches. CPU: Single NUMA node? --cpu-shares. Multiple NUMA nodes? --cpuset*. vm.zone_reclaim_mode to store caches only on local node? * Docker isn’t NUMA aware: But kernel automatically balances threads by default.
  18. JVM+Docker+Linux = love. Or not. Memory leak → OOM killer with a wide range of Java versions. What helps: similar leaks (growing RSS) → NativeMemoryTracking; don’t overbook memory + leave room for OS caches; allocate on startup via AlwaysPreTouch; increase vm.min_free_kbytes?
  19. More on that love. Newer kernels and Dockers are usually better (e.g. 4.4+ and 1.13+). Open files and locked memory limits. Check dmesg and kswapd* CPU usage. Dare I say it: try smaller hosts. Try niofs? (if you thrash the cache - and TLB - too much). A bit of swap? (swappiness is configurable per container, too). Play with mmap arenas and THP. * kernel’s (single-threaded) GC.
  20. Summary. The Good: orchestration; dynamic allocation of resources (works well for bigger boxes); might actually deliver the promise of dev=testing=prod, because Docker is the future! The Bad: Pets → cattle requires good sizing, config, scaling practices. The Ugly: ecosystem is still young → exciting bugs.
  21. Thank You! And please check out: Solr & Kubernetes cheatsheets: Openings: @sematext @radu0gheorghe Our booth :)
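
The heap and thread sizing advice from the slides can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recommended formula: the function names are made up, the "static" heap figure is something you would have to measure yourself, and the ConcGCThreads value (a quarter of the parallel threads) is an assumed rule of thumb. The 50-100% headroom, the 30GB and 32GB thresholds, and the ParallelGCThreads and ConcGCThreads flag names come from the slides.

```python
def suggest_heap_gb(static_gb, headroom=0.5):
    """Add 50-100% headroom for transient memory on top of static usage."""
    if not 0.5 <= headroom <= 1.0:
        raise ValueError("the slides suggest 50-100% headroom")
    return static_gb * (1 + headroom)


def heap_zone(heap_gb):
    """Classify a heap size against the thresholds quoted in the slides."""
    if heap_gb >= 32:
        return "uncompressed pointers"
    if heap_gb > 30:
        return "compressed, possibly not zero-based"
    return "compressed"


def gc_thread_flags(host_cpus, nodes_per_host):
    """Split the host's CPUs between Solr nodes instead of relying on
    defaults derived from the host's total CPU count."""
    parallel = max(1, host_cpus // nodes_per_host)
    concurrent = max(1, parallel // 4)  # assumed ratio, tune for your load
    return [
        f"-XX:ParallelGCThreads={parallel}",
        f"-XX:ConcGCThreads={concurrent}",
    ]


if __name__ == "__main__":
    heap = suggest_heap_gb(static_gb=12)
    print(f"{heap:.0f}GB heap: {heap_zone(heap)}")
    print(" ".join(gc_thread_flags(host_cpus=32, nodes_per_host=4)))
```

Setting the thread counts explicitly matters most when several Solr nodes share one host, because each JVM would otherwise size its GC threads as if it owned all the CPUs.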