Scaling Search Clusters with Apache Solr & Kubernetes

Scaling Search Clusters
with Apache Solr &
Kubernetes
Amrit Sarkar
Engineer, Cloud Operations
Lucidworks Inc

3
SOLR
AUTOSCALING
LUCENE
HIGHLIGHTING
TEXT
SEARCH
CDCR
DISTRIBUTED
GROUPING &
FACETS
LEARNING
TO RANK
STREAMING
PLENTY MORE..

4
referred from: https://cloud.google.com/kubernetes-engine/kubernetes-comic/

5
Kubernetes (K8s)
System for running and coordinating containerized applications across a cluster of machines
referred from: https://cloud.google.com/kubernetes-engine/kubernetes-comic/

6
Run Solr Cloud within Kubernetes
Load Balancer / Ingress
(external IPs / custom route names)
Headless Service
for internal
communication
Service
IP is exposed
solr-0 solr-1 solr-2
…more solr pods
Service
IP is exposed
Headless Service
for internal
communication
zk-0 zk-1 zk-2
pv-0 pv-1 pv-2
pv-0 pv-1 pv-2
Stateful Set Stateful Set
HTTP Traffic
pv
persistent volume

7
Solr Cloud Clusters Within Kubernetes
Namespace A Namespace B Namespace C
Namespace D Namespace E
Single
Kubernetes
Cluster
Solr Cloud
Cluster
Solr Cloud
Cluster
Solr Cloud
Cluster
Solr Cloud
Cluster
Solr Cloud
Cluster

10
K8s Horizontal Pod Autoscaling
N1 N2 …...
User Traffic
Metrics Monitoring System
Custom Metric API
( Queries Per Sec )
External Layer
Solr Autoscaling
R1 R2 …...
Scaling Solr Clusters w/o Manual Intervention

11
Monitoring with Prometheus & Grafana
Prometheus helm chart: https://github.com/helm/charts/tree/master/stable/prometheus
apiVersion:
monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: solr
spec:
selector:
matchLabels:
app: solr
● Prometheus Operator:
Uses Custom Resource Definition (CRD), named ServiceMonitor,
to abstract configuration to target.
K8s Services
Monitor
Prometheus AlertManager
Manage
K8s Deployment
Grafana

12
Kubernetes State Metrics

13
Solr Metrics

14
Custom Metrics via Prometheus Adapter
apiVersion: v1
kind: ConfigMap
metadata:
name: adapter-config
namespace: monitoring
data:
config.yaml: |
rules:
- seriesQuery: solr_requests
resources:
overrides:
kubernetes_namespace:
resource: namespace
kubernetes_pod_name:
resource: pod
name:
matches: "^(.*)_requests"
as: "${1}_requests"
metricsQuery: (<<.Series>>{<<.LabelMatchers>>})
● Prometheus Adapter:
Pull custom metrics from Prometheus installation.
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: custom-metrics-apiserver
name: custom-metrics-apiserver
spec:
replicas: 1
selector:
matchLabels:
template:
metadata:
labels:
……..
……..

15
Custom Metrics via Prometheus Adapter
● Register custom metrics as external to K8s.
apiVersion: v1
kind: Service
metadata:
spec:
ports:
- port: 443
targetPort: 6443
selector:
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
name: v1beta1.custom.metrics.k8s.io
spec:
service:
root$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
| jq .
{
"kind": "APIResourceList",
"apiVersion": "v1",
"groupVersion": "custom.metrics.k8s.io/v1beta1",
"resources": [
{
"name": "pods/queries_requests",
"singularName": "",
"namespaced": true,
"kind": "MetricValueList",
"verbs": [
"get"
]
},
{
"name": "namespaces/queries_requests,
"singularName": "",
"namespaced": false,

16
Custom Metrics via Commercial Monitoring Applications

17
Stateful Set
Horizontal Pod Autoscaling
(HPA) on metric ‘M’
If metric ‘M’ value > X,
Add more pods
avg metric ‘M’ value < X
Stateful Set
solr-3
until avg metric ‘M’ value < X

18
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: solr-hpa
namespace: solr
spec:
scaleTargetRef:
apiVersion: extensions/v1beta1
kind: Statefulset
name: solr-statefulset
minReplicas: 3
maxReplicas: 10
metrics:
- type: Pods
pods:
metricName: pods/query_requests
targetAverageValue: 2000
root$ kubectl describe hpa
Name: solr-hpa
Namespace: solr
Labels: <none>
Annotations: autoscaling.alpha.kubernetes.io/metrics:
[{"type":"Pods","pods":{"metricName":"pods/query_requests","targetAverageValue":"500"}}]
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"nam
e":"solr-hpa","namespace":"s...
CreationTimestamp: Thu, 14 Aug 2020 11:11:05 -0800
Reference: Statefulset/solr-statefulset
Min replicas: 3
Max replicas: 10
Deployment pods: 0 current / 0 desired
Events: <none>

19
Solr Autoscaling
• Autoscaling in Solr provide balanced and stable cluster for various cluster change events.
• Achieved by satisfying a set of rules and sorting preferences
to select the target of cluster management operations.
S - A
S - B
R - 1
R - 1
R - 2
C
R - 2
Set solr-autoscaling policy and trigger
such that -
Add one replica to each shard of
collection
● if new Node is added
C - collection
S - shard
R - replica

20
Solr Autoscaling
Set solr-autoscaling policy and trigger such that -
Add one replica to each shard of collection
● if new Node is added "node_added_trigger":{
"event":"nodeAdded",
"waitFor":60,
"preferredOperation":"ADDREPLICA",
"actions":[
{
"name":"compute_plan",
"class":"solr.ComputePlanAction"},
{
"name":"execute_plan",
"class":"solr.ExecutePlanAction"}],
}
Trigger Policy
{
"cluster-policy":[
{
"node":"#ANY",
"replica":"<2",
"shard":"#EACH"}
]
}
Cluster Policy

21
Handling Special Cases
● What happens when nodes are removed due to scaling?
Is there a potential of losing data?
○ All replicas of collection gets hosted on scaled nodes.
● Do we need to scale on each collection on the entire cluster?
How to configure scaling on specific collection or set of collections with
common attribute?
● How does the Node Added Trigger behaves when a node gets restarted?
Replicas will be added to each shard of each collection where it was
previously not.

23
References
● Lucidworks Managed Search - https://lucidworks.com/products/managed-search/
● Docker - https://www.docker.com/
● Kubernetes - https://kubernetes.io/
● Run Solr on Kubernetes - https://lucidworks.com/post/running-solr-on-kubernetes-part-1/
● Monitoring Apache Solr Ecosystem on K8s -
https://www.youtube.com/watch?v=UI9mh4gSIsw
● Custom Metrics via Prometheus on K8s -
https://towardsdatascience.com/kubernetes-hpa-with-custom-metrics-from-prometheus-9f
fc201991e
● K8s HPA -https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
● Solr Autoscaling -
https://lucene.apache.org/solr/guide/8_5/solrcloud-autoscaling-policy-preferences.html

Scaling Search Clusters with Apache Solr & Kubernetes

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Scaling Search Clusters with Apache Solr & Kubernetes

Similar to Scaling Search Clusters with Apache Solr & Kubernetes (20)

Recently uploaded

Recently uploaded (20)

Scaling Search Clusters with Apache Solr & Kubernetes