Elasticsearch
On Kubernetes
Elasticsearch [is] a distributed, multitenant-capable full-text search engine with
an HTTP web interface and schema-free JSON documents [based on Lucene]
(https://en.wikipedia.org/wiki/Elasticsearch)
Elasticsearch at Honestbee
● Used as backend for product search function on Honestbee.com
● Mission critical part of production setup
● Downtime will cause major service disruption
● Stats:
○ Product index: ~3,300,000 documents
○ Query latency: ~30ms
○ Queries per hr: 15-20k
● ES v2.3, 5.3
● Kubernetes v1.5, v1.7
Concepts
● Cluster
○ Collection of nodes that holds entire dataset
● Node
○ Instance of Elasticsearch taking part in indexing and search
○ Will join a cluster by name
○ Single node clusters are possible
● Index, Alias
○ Collection of documents that are somewhat
similar (much like NoSQL collections)
● Document:
○ Piece of data, expressed as JSON
● Shard, Replica
○ Subdivision of an index
○ Scalability, HA
○ Each shard is a Lucene index in itself
[Diagram: one cluster containing two nodes, each node holding three shards]
Index, Alias, Shard
products_201801
16123456
products
0
1
2
● Horizontal scalability
● # primary shards cannot be changed later!
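Why the primary shard count is fixed can be sketched in a few lines. Elasticsearch routes each document to a shard from a hash of its routing value (the document ID by default) modulo the number of primary shards. The snippet below is an illustration only: `zlib.crc32` is a stand-in for the real hash function, and the `product-N` IDs are made up.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a document ID.

    zlib.crc32 is a stand-in hash used for illustration only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs.
doc_ids = [f"product-{i}" for i in range(1000)]

with_3 = {d: shard_for(d, 3) for d in doc_ids}
with_4 = {d: shard_for(d, 4) for d in doc_ids}

# Documents that would be looked up on the wrong shard after the change.
moved = sum(1 for d in doc_ids if with_3[d] != with_4[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Changing the modulus sends most IDs to a different shard, so existing documents could no longer be found without a full reindex.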
Nodes, Shards
0 1 2 3
Oops...
0 1 2 3
Replication
0 1
3
2
0 1
3
2
1 index, 3 primary shards x 1 replica each = 6 shards in total
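The arithmetic on this slide, as a small sketch. The function names are illustrative. The second helper relies on the fact that Elasticsearch never allocates a replica on the same node as its primary.

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Total shard copies in an index: each primary plus its replicas."""
    return primaries * (1 + replicas_per_primary)


def min_data_nodes(replicas_per_primary: int) -> int:
    """Fewest data nodes on which every copy of a shard can be allocated.

    A replica is never placed on the node that holds its primary (or
    another copy of the same shard), so each copy needs its own node.
    """
    return replicas_per_primary + 1


print(total_shards(3, 1))   # the example from the slide
print(min_data_nodes(1))
```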
Node Roles
● Master (eligible) Node
○ Discovery, shard allocation, etc.
○ Only one active at a time (election)
● Data Node
○ Holds the actual shards
○ Does CRUD, search
● Client Node
○ REST API
○ Aggregation
● Controlled in elasticsearch.yml
● A node can have multiple roles
[Diagram: load balancer (LB), three client nodes, three data nodes, three master-eligible nodes; * marks the active master]
# elasticsearch.yml
node.master: false
node.data: true
node.ingest: false
search.remote.connect: false
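A minimal sketch of how the per-role `elasticsearch.yml` fragments could be generated. Only the data-node values come from the slide. The master and client rows are assumptions that follow the same pattern, with a client node modelled as a node with all three roles switched off.

```python
# Role flags per node type. Only the "data" row is taken from the slide;
# the "master" and "client" rows are assumptions.
ROLE_SETTINGS = {
    "master": {"node.master": True, "node.data": False, "node.ingest": False},
    "data": {"node.master": False, "node.data": True, "node.ingest": False},
    "client": {"node.master": False, "node.data": False, "node.ingest": False},
}


def render_role_config(role: str) -> str:
    """Render the role-related lines of elasticsearch.yml for one node type."""
    settings = dict(ROLE_SETTINGS[role])
    # Cross-cluster search is disabled on every node type, as on the slide.
    settings["search.remote.connect"] = False
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


for role in ROLE_SETTINGS:
    print(f"# elasticsearch.yml ({role})")
    print(render_role_config(role))
    print()
```

Keeping the roles in one table makes it easy to template one config per Kubernetes workload.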
es-master es-data es-clients
Kubernetes
Client
Client
Client
Data
Data
Data
*Master
Master
Master
api
(svc)
ing
disc.
(svc)
https://github.com/kubernetes/charts/tree/master/incubator/elasticsearch
Kubernetes
● One deployment per node role
○ Scaling
○ Resources
○ Config
● E.g. 3 masters, >= 3 data nodes, clients as needed
● Discovery plugin* (needs access to kube API, RBAC)
● Services:
○ Discovery
○ API
○ STS (later)
● Optional: Ingress, ConfigMap (CM), CronJob, ServiceAccount (SA)
*https://github.com/fabric8io/elasticsearch-cloud-kubernetes
Stateless
0 1 2 3
3 0 1 2
3
2
● No persistent state
● Multiple node failures?
● Cluster upgrades?
Safety Net - Snapshots
● Repository - metadata defining snapshot
storage
● Supported: FS, S3, HDFS, Azure, GCS
● Can be used to restore or replicate cluster
(beware version compat*)
● Works well with CronJobs (batch/v1beta1)
● Snapper: honestbee/snapper
● Window of data loss when indexing in real
time → RPO
● Helm hooks: can cause timeout issues
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
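A rough sketch of what a snapshot CronJob has to do, written as request builders so that nothing is actually sent. The `_snapshot` endpoints and the `s3` repository type exist in Elasticsearch (the S3 type needs the repository plugin). The URL, repository name, bucket name and snapshot naming scheme are hypothetical, and this is not how Snapper itself is implemented.

```python
import json
from datetime import datetime, timedelta


def register_repository_request(es_url: str, repo: str, bucket: str):
    """Build (method, url, body) for registering an S3 snapshot repository."""
    body = {"type": "s3", "settings": {"bucket": bucket}}
    return "PUT", f"{es_url}/_snapshot/{repo}", json.dumps(body)


def create_snapshot_request(es_url: str, repo: str, now: datetime):
    """Build (method, url) for taking a snapshot named after the current time."""
    name = now.strftime("snapshot-%Y%m%d%H%M%S")
    return "PUT", f"{es_url}/_snapshot/{repo}/{name}?wait_for_completion=true"


def data_loss_window(last_snapshot_at: datetime, failure_at: datetime) -> timedelta:
    """Everything indexed after the last snapshot is lost on a full restore."""
    return failure_at - last_snapshot_at


# Hypothetical values for illustration.
print(register_repository_request("http://es:9200", "backups", "my-es-snapshots"))
print(create_snapshot_request("http://es:9200", "backups", datetime(2018, 1, 16, 12, 0, 0)))
print(data_loss_window(datetime(2018, 1, 16, 12, 0), datetime(2018, 1, 16, 12, 45)))
```

The last function is the RPO point from the slide: the snapshot interval bounds how much real-time indexing can be lost.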
Manual Upgrade
0 1
3
2
0 1
3
2
0 1
3
disc.
(svc)
Production
Rollover
StatefulSet (STS)
● Kubernetes approach to stateful applications (e.g. databases)
● Very similar to a deployment
● But some extra properties:
○ Pods have a defined order
○ Different naming pattern
○ Will be launched and terminated in sequence
○ Etc. (check reference docs)
○ Support for PVC template
es-master
(deploy)
es-data
(sts)
es-clients
(deploy)
Stateful
Client
Client
Client
Data
Data
*Master
Master
Master
api
(svc)
ing
disc.
(svc)
Data
pv
pv
pv
headless
StatefulSet and PVCs
Deployment:
● Pods in a deployment are
unrelated to each other
● Identity not maintained across
restarts
● Indiv. Pods can have PVC
● Multiple pods - how to?
● Association PVC to pod when
rescheduled?
StatefulSet:
● Pods are ordered, maintain
identity across restart
● PVCs are ordered
● STS pods ‘remember’ PVs
● volumeClaimTemplates
● Even survives `helm delete
--purge` (by design?)
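How STS pods "remember" their volumes comes down to naming. Pods are called `<statefulset>-<ordinal>`, and each claim created from a volumeClaimTemplate is called `<template>-<pod name>`. The sketch below shows that convention, assuming a StatefulSet named `es-data` and a claim template named `es-staging-pvc`.

```python
from typing import List


def pod_names(statefulset: str, replicas: int) -> List[str]:
    """StatefulSet pods get stable, ordered names: <statefulset>-<ordinal>."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def pvc_names(claim_template: str, statefulset: str, replicas: int) -> List[str]:
    """Each pod gets its own claim: <claim template>-<pod name>."""
    return [f"{claim_template}-{pod}" for pod in pod_names(statefulset, replicas)]


print(pod_names("es-data", 3))
print(pvc_names("es-staging-pvc", "es-data", 3))
```

Because the claim name is derived from the pod name, a rescheduled `es-data-1` binds to the same claim again, which a Deployment with random pod names cannot do.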
StatefulSet vs. Deployment
apiVersion: apps/v1beta1
kind: StatefulSet
# ...
spec:
  serviceName: {{ template "elasticsearch.data-service" . }}
  # ...
  podManagementPolicy: Parallel # quicker
  updateStrategy:
    type: RollingUpdate # default: OnDelete
  template:
    # Pod spec, like deployment
  # ...
  volumeClaimTemplates:
  - metadata:
      name: "es-staging-pvc"
      labels:
        # ...
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: "gp2"
      resources:
        requests:
          storage: "35Gi"
Resource Limits
● Follow ES docs, discussions online, monitoring
● JVM does not regard cgroups properly!*
○ Sees ALL memory of the host, ignores container limits
○ Adjust JVM limits (Xmx, Xms) according to limits for container
○ Otherwise: OOMKilled
● Data nodes:
○ 50% of available memory as Heap
○ The rest for OS and Lucene caches
● Master/client nodes:
○ No Lucene caches
○ ~75% mem as heap, rest for OS
● CPU: track actual usage, set limits so scheduler can make decisions
*https://banzaicloud.com/blog/java-resource-limits/
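The heap rules of thumb above can be turned into a small helper that derives the JVM flags from the container memory limit. This is a sketch of the sizing rule only. The fractions are the ones from the slide, and passing the result through the `ES_JAVA_OPTS` environment variable is an assumption about how the image is configured.

```python
# Fraction of the container memory limit to give to the JVM heap,
# following the rules of thumb on the slide.
HEAP_FRACTION = {"data": 0.50, "master": 0.75, "client": 0.75}


def jvm_heap_opts(role: str, container_limit_mib: int) -> str:
    """Return -Xms/-Xmx flags sized from the container memory limit.

    Xms and Xmx are set to the same value so the heap is never resized.
    """
    heap_mib = int(container_limit_mib * HEAP_FRACTION[role])
    return f"-Xms{heap_mib}m -Xmx{heap_mib}m"


# Hypothetical limits: 4 GiB for data pods, 2 GiB for master pods.
print("ES_JAVA_OPTS=" + jvm_heap_opts("data", 4096))
print("ES_JAVA_OPTS=" + jvm_heap_opts("master", 2048))
```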
Host Downtime?
[Diagram: data pods data-0, data-1 and data-2 placed on hosts 10.20.0.1, 10.20.0.2 and 10.20.0.3]
Anti Affinity
[Diagram: data pods data-0, data-1 and data-2 placed on hosts 10.20.0.1, 10.20.0.2 and 10.20.0.3]
Anti Affinity
# ...
metadata:
labels:
app: es-demo-elasticsearch
role: data
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- topologyKey: kubernetes.io/hostname
labelSelector:
matchLabels:
app: es-demo-elasticsearch
role: data
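What the required anti-affinity rule enforces can be checked with a few lines: no two pods carrying the data labels may share a `kubernetes.io/hostname`. The placements below are hypothetical and only illustrate the rule; the scheduler itself does this check at scheduling time.

```python
from collections import Counter
from typing import Dict, List


def hosts_with_multiple_data_pods(placement: Dict[str, str]) -> List[str]:
    """Return hosts that run more than one data pod (anti-affinity violations)."""
    pods_per_host = Counter(placement.values())
    return sorted(host for host, count in pods_per_host.items() if count > 1)


# Hypothetical placement without anti-affinity: two data pods share a host,
# so losing 10.20.0.1 takes out two of three data nodes at once.
without_rule = {"data-0": "10.20.0.1", "data-1": "10.20.0.1", "data-2": "10.20.0.2"}

# Hypothetical placement that satisfies the rule: one data pod per host.
with_rule = {"data-0": "10.20.0.1", "data-1": "10.20.0.2", "data-2": "10.20.0.3"}

print(hosts_with_multiple_data_pods(without_rule))
print(hosts_with_multiple_data_pods(with_rule))
```

With the `required...` form, a pod that cannot be placed on a fresh host stays Pending, so the cluster needs at least as many schedulable hosts as data pods.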
Config Tweaks
What | Where | Why
Cluster name | elasticsearch.yml | Discovery is done via service, but important for monitoring
JVM | env | Important. Utilize memory properly and avoid OOMKill
Node name = $HOSTNAME | elasticsearch.yml | Random Marvel characters or UUIDs are tricky to troubleshoot at 3 am
Node counts, recovery delay | elasticsearch.yml | Avoid triggering recovery when cluster isn’t ready or for temp. downtime
Monitoring
● We’re using Datadog (no endorsement)
● Pod annotations, kube state metrics
● There are a lot of metrics...
● Kubernetes metrics:
○ Memory usage per pod
○ Memory usage per k8s host
○ CPU usage per pod
○ Healthy k8s hosts (via ELB)
● ES Metrics
○ Cluster state
○ JVM metrics
○ Search queue size
○ Storage size
● ES will test your memory reserves and cluster
autoscaler!
Troubleshooting
● Introspection via API
● _cat APIs
○ Human readable, watchable
○ Health state, index health
○ Shard allocation
○ Recovery jobs
○ Thread pool (search queue size!)
● _cluster/_node APIs
○ Consumed by e.g. Datadog
○ Node stats: JVM state, resource usage
○ Cluster stats
Example: Shard Allocation
$ curl "$ES_URL/_cat/shards?v"
index shard prirep state docs store ip node
products_20171010034124200 2 r STARTED 100000 1gb 172.23.6.72 es-data-2
products_20171010034124200 2 p STARTED 100000 1gb 172.23.5.110 es-data-1
products_20171010034124200 3 p STARTED 100000 1gb 172.23.6.72 es-data-2
products_20171010034124200 3 r STARTED 100000 1gb 172.23.5.110 es-data-1
products_20171010034124200 4 p STARTED 100000 1gb 172.23.6.72 es-data-2
products_20171010034124200 4 r STARTED 100000 1gb 172.23.8.183 es-data-0
products_20171010034124200 1 p STARTED 100000 1gb 172.23.5.110 es-data-1
products_20171010034124200 1 r STARTED 100000 1gb 172.23.8.183 es-data-0
products_20171010034124200 0 p STARTED 100000 1gb 172.23.5.110 es-data-1
products_20171010034124200 0 r STARTED 100000 1gb 172.23.8.183 es-data-0
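The `_cat` output is easy to post-process. The sketch below parses the sample output from this slide, counts shards per node, and checks that no node holds two copies of the same shard. It assumes every row has all eight columns, which is not true for unassigned shards.

```python
from collections import Counter, defaultdict

CAT_SHARDS = """\
index shard prirep state docs store ip node
products_20171010034124200 2 r STARTED 100000 1gb 172.23.6.72 es-data-2
products_20171010034124200 2 p STARTED 100000 1gb 172.23.5.110 es-data-1
products_20171010034124200 3 p STARTED 100000 1gb 172.23.6.72 es-data-2
products_20171010034124200 3 r STARTED 100000 1gb 172.23.5.110 es-data-1
products_20171010034124200 4 p STARTED 100000 1gb 172.23.6.72 es-data-2
products_20171010034124200 4 r STARTED 100000 1gb 172.23.8.183 es-data-0
products_20171010034124200 1 p STARTED 100000 1gb 172.23.5.110 es-data-1
products_20171010034124200 1 r STARTED 100000 1gb 172.23.8.183 es-data-0
products_20171010034124200 0 p STARTED 100000 1gb 172.23.5.110 es-data-1
products_20171010034124200 0 r STARTED 100000 1gb 172.23.8.183 es-data-0
"""


def parse_cat_shards(text):
    """Turn `_cat/shards?v` output into a list of dicts keyed by column name."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_cat_shards(CAT_SHARDS)
shards_per_node = Counter(row["node"] for row in rows)
primaries_per_node = Counter(row["node"] for row in rows if row["prirep"] == "p")

# Group the nodes holding each (index, shard) and flag duplicates.
nodes_by_shard = defaultdict(list)
for row in rows:
    nodes_by_shard[(row["index"], row["shard"])].append(row["node"])
colocated = [key for key, nodes in nodes_by_shard.items() if len(set(nodes)) < len(nodes)]

print(dict(shards_per_node))
print(dict(primaries_per_node))
print(colocated)
```

For this sample, es-data-1 holds four shards and the other two nodes hold three each, es-data-0 holds replicas only, and no shard has both copies on one node.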
Example: JVM heap usage
curl "$ES_URL/_nodes/<node_name>" | jq '.nodes[].jvm.mem'
{
"heap_init_in_bytes": 1073741824, # 1 GB
"heap_max_in_bytes": 1038876672, # ~1 GB
"non_heap_init_in_bytes": 2555904,
"non_heap_max_in_bytes": 0,
"direct_max_in_bytes": 1038876672
}
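The byte counts are easier to compare against the configured `Xms`/`Xmx` once converted. A small sketch using the values from this slide (the inline `#` annotations are dropped because they are not valid JSON):

```python
import json

JVM_MEM = json.loads("""
{
  "heap_init_in_bytes": 1073741824,
  "heap_max_in_bytes": 1038876672,
  "non_heap_init_in_bytes": 2555904,
  "non_heap_max_in_bytes": 0,
  "direct_max_in_bytes": 1038876672
}
""")


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)


heap_init_mib = to_mib(JVM_MEM["heap_init_in_bytes"])
heap_max_mib = to_mib(JVM_MEM["heap_max_in_bytes"])
print(f"heap init: {heap_init_mib:.2f} MiB, heap max: {heap_max_mib:.2f} MiB")
```

The reported maximum (990.75 MiB) sits slightly below the 1024 MiB the node was started with, which is worth knowing before alerting on heap usage percentages.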
Dynamic Settings
● Set cluster-wide settings at runtime
● Endpoints:
○ curl $ES_URL/_cluster/settings
○ curl -XPUT "$ES_URL/_cluster/settings" -d '{"persistent": {"discovery.zen.minimum_master_nodes": 2}}'
● Transient vs. persistent (not sure that matters in k8s)
● E.g.:
○ Cluster level shard allocation: disable allocation before restarts (lifecycle hooks, helm hooks?)
○ Shard allocation filtering: “cordon off” nodes
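The request bodies for these use cases can be sketched as plain dictionaries to be sent with `PUT _cluster/settings`. The setting names are real Elasticsearch settings. Whether to use `transient` or `persistent` for each, and the node names, are assumptions made for illustration.

```python
import json
from typing import List


def quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes, used to avoid split brain."""
    return master_eligible_nodes // 2 + 1


def minimum_master_nodes_body(master_eligible_nodes: int) -> dict:
    return {"persistent": {"discovery.zen.minimum_master_nodes": quorum(master_eligible_nodes)}}


def shard_allocation_body(enabled: bool) -> dict:
    """Disable allocation before a restart, re-enable it afterwards."""
    value = "all" if enabled else "none"
    return {"transient": {"cluster.routing.allocation.enable": value}}


def cordon_nodes_body(node_names: List[str]) -> dict:
    """Move shards away from the named nodes (allocation filtering)."""
    return {"transient": {"cluster.routing.allocation.exclude._name": ",".join(node_names)}}


print(json.dumps(minimum_master_nodes_body(3)))
print(json.dumps(shard_allocation_body(False)))
print(json.dumps(cordon_nodes_body(["es-data-2"])))
```

With three master-eligible nodes the quorum is 2, which matches the value in the curl example.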
Advanced (TODO)
● Shard allocation awareness (host, rack, AZ, …)
● Shard allocation filtering (cordoning off nodes, ...)
Pitfalls: Scripting
● Scripting:
○ Disabled by default
○ Scripts run with same permissions as the
ES cluster
● If you really have to:
○ Prefer sandboxed (mustache, expressions)
○ Use parameterised scripts!
○ Test impact on your cluster carefully, mem,
cpu usage
○ Sanitise input, ensure cluster is not public,
don’t run as root
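The point about parameterised scripts can be illustrated as follows. When a value is interpolated into the script source, every distinct value produces a new script that has to be compiled. With `params`, the source stays constant and only the parameters change. The request shape (`lang`, `inline`, `params`) is an assumption based on the 5.x script syntax, and the `price` field is hypothetical.

```python
def parameterised_script(factor: float) -> dict:
    """Script source is constant; the value travels separately in params."""
    return {
        "script": {
            "lang": "expression",
            "inline": "doc['price'].value * factor",
            "params": {"factor": factor},
        }
    }


def interpolated_script(factor: float) -> dict:
    """Anti-pattern: the value is baked into the source text."""
    return {
        "script": {
            "lang": "expression",
            "inline": f"doc['price'].value * {factor}",
        }
    }


factors = [1.1, 1.2, 1.3]
param_sources = {parameterised_script(f)["script"]["inline"] for f in factors}
interp_sources = {interpolated_script(f)["script"]["inline"] for f in factors}

print(len(param_sources), "distinct source(s) with params")
print(len(interp_sources), "distinct source(s) with interpolation")
```

Keeping user input out of the script source also removes one path for script injection.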
Elasticsearch Operator
● https://github.com/upmc-enterprises/elasticsearch-operator
● CustomResourceDefinition, higher level abstraction
○ Domain specific configuration
○ Snapshots
○ Certificates
● https://raw.githubusercontent.com/upmc-enterprises/elasticsearch-operator/master/example/example-es-cluster-minikube.yaml
● Demo: https://www.youtube.com/watch?v=3HnV7NfgP6A
