SlideShare a Scribd company logo
1 of 45
Download to read offline
ElasticSearch On
Compute Engine
Best practices for
Elasticsearch in GCE
12+ Year Cloud Journey with Google
Who am I?
Searce – Bangalore
Linkedin.com/rbhuvanesh
Twitter.com/@BhuviTheDataGuy
Medium.com/@BhuviTheDataGuy
https://TheDataGuy.in
Bhuvanesh
Database Architect
Searce
Agenda
Short into about GCE
ElasticSearch Terms
Capacity Planning & Architecture
Best Practices for Production Grade ES Cluster
Compute Engine
Compute Engine delivers configurable virtual machines running in Google’s data centers with access to high-performance
networking infrastructure and block storage.
Live migration for VMs
Compute Engine virtual machines can live-migrate between host systems without
rebooting, which keeps your applications running even when host systems require
maintenance.
Preemptible VMs
Run batch jobs and fault-tolerant workloads on preemptible VMs to reduce your vCPU
and memory costs by up to 80% while still getting the same performance and capabilities
as regular VMs.
Sole-tenant nodes
Sole-tenant nodes are physical Compute Engine servers dedicated exclusively for your
use. Sole-tenant nodes simplify deployment for bring your own license (BYOL)
applications. Sole-tenant nodes give you access to the same machine types and VM
configuration options as regular compute instances.
What is Elastic Search?
• First release 2010
• Open Source search and analytical engine
• Elasticsearch is the central component of the Elastic Stack
• Distributed processing
• Works with all types of data (textual, numerical, geospatial, structured, and unstructured)
• Powerful REST API
• And everything is indexed
Where can we use
Elastic Search?
Where can we use ES?
Use cases
• Logging and log analytics
• Infrastructure metrics and container monitoring
• Application performance monitoring
• Geospatial data analysis and visualization
• Security analytics
• Enterprise search
• Website search
• And more….
Elastic Stack
ES Terms
Master Node:
• Master Node controls the Cluster.
• Responsible for maintaining the metadata about the cluster.
• Decide where to move the data and relocating the data.
• We can have multiple nodes for Master role.
• But Elasticsearch will select any one of the node as an elastic master.
• In the event of failure, a new elastic master will be selected from the available nodes.
ES Terms
Data Node
• All of your is stored here.
• Responsible for managing the stored data.
• Perform the operations when it queried.
Ingest Node
• Pre-process’s documents before the actual document indexing.
• The ingest node intercepts bulk and index requests, applies transformations, and it then passes the
documents back to the index or bulk APIs.
ES Index
ES Index
Design For Failure
Capacity Planning
Memory
Elastic Search will use the memory in 2 ways.
1. Java Heap
2. Other processes
“More memory – More time on
Garbage collection”
CPU
Don’t choose the CPU
core based on some
random calculations.
DISK
• Standard Persistent Disk
• SSD Persistent Disk
• Local SSD
1 GB SSD disk = 30iops
Disk Cont…
Disk Cont…
Disk Cont…
Disk Cont…
Disk Cont…
Network
From GCP Docs,
The egress traffic from a given VM instance is subject to maximum network egress throughput caps. These
caps are dependent on the number of vCPUs that the VM instance has. Each vCPU is subject to a 2 Gbps
cap for peak performance. Each additional vCPU increases the network cap, up to a theoretical maximum of
32 Gbps for each instance. The actual performance you experience will vary depending on your workload.
All caps are meant as maximum possible performance, and not sustained performance.
How to identify the right VM size?
1. Simulate your workload and do the load test.
2. Or use Rally(https://github.com/elastic/rally)
Swapping
• Memory based operations are super fast. But we can’t give a tons of memory to the server.
• The OS will swap out the unused applications memory.
• That’s bad for the performance.
Prevent Swapping
1. From OS Level(temporarily) - sudo swapoff –a
2. Configure swappiness from the Kernal - vm.swappiness=1
3. Enable bootstrap-memory_lock - bootstrap.memory_lock: true
JVM Heap
• By default, Elasticsearch tells the JVM to use a heap with a minimum and maximum size of 1 GB.
• When moving to production, it is important to configure heap size to ensure that Elasticsearch has
enough heap available.
• Set the Heap size <50% of your total Memory
“The more heap available to Elasticsearch, the more memory it can use for its
internal caches, larger heaps can cause longer garbage collection pauses” –
From Elastic
ulimit
Ulimit is the number of open file descriptors per process.
vi /etc/security/limits.conf
elasticsearch - nofile 65535
--For Ubuntu
vi /etc/pam.d/su
session required pam_limits.so
--For systemd
vi /usr/lib/systemd/system/elasticsearch.service
LimitMEMLOCK=infinity
sudo systemctl daemon-reload
MMAP
Elasticsearch uses a mmapfs directory by default to store its indices
sysctl -w vm.max_map_count=262144
/etc/sysctl.conf
vm.max_map_count = 262144
Some common questions while
setting up the Elastic
Search Cluster
CPU Platform
Operating System & File System
• Windows
• Debian
• Ubuntu
• CentOS
• RedHat
• Windows - NTFS
• Linux – Ext4 (if you have less than 1TB Data), XFS for >1TB data
Some parameters for a generic workload
indices.memory.index_buffer_size: 40%
indices.query.cache.enabled: false
thread_pool.bulk.queue_size: 3000
thread_pool.index.queue_size: 3000
store.throttle.type: 'none'
index.refresh_interval: "1m"
SSD vs Local SSD
Persistent SSD Local SSD
Local SSD
• Max size of one Local SSD disk = 375 GB
• You can add up to 8 Local SSD/Instance (3TB)
• You can’t reboot/stop the VM
• In case of the maintenance – Replace the node
How many nodes
• Master – 3 nodes
• Ingest – 2 nodes
• Data – 2-3 nodes (for a fresh setup)
Rally for the benchmark tests
What is Rally?
You want to benchmark Elasticsearch? Then Rally is for you. It can help you with the following tasks:
• Setup and teardown of an Elasticsearch cluster for benchmarking
• Management of benchmark data and specifications even across Elasticsearch versions
• Running benchmarks and recording results
• Finding performance problems by attaching so-called telemetry devices
• Comparing performance results
pip3 install esrally
How to run the esrally
esrally --track=nyc_taxis 
--target-hosts=10.20.4.157:9200 
--pipeline=benchmark-only 
--challenge=append-no-conflicts-index-only 
--on-error=continue 
--report-format=markdown 
--report-file=/opt/report.md
Esrally cont…
Some use cases with ES
Use cases cont…
Thank YouThank You

More Related Content

What's hot

DockerCon14 Cluster Management and Containerization
DockerCon14 Cluster Management and ContainerizationDockerCon14 Cluster Management and Containerization
DockerCon14 Cluster Management and ContainerizationDocker, Inc.
 
Micro-batching: High-performance writes
Micro-batching: High-performance writesMicro-batching: High-performance writes
Micro-batching: High-performance writesInstaclustr
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...Equnix Business Solutions
 
Processing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkProcessing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkBen Slater
 
Running Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, LucidworksRunning Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, LucidworksLucidworks
 
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Spark Summit
 
Configuring MongoDB HA Replica Set on AWS EC2
Configuring MongoDB HA Replica Set on AWS EC2Configuring MongoDB HA Replica Set on AWS EC2
Configuring MongoDB HA Replica Set on AWS EC2ShepHertz
 
Cross-Cluster and Cross-Datacenter Elasticsearch Replication at sahibinden.com
Cross-Cluster and Cross-Datacenter Elasticsearch Replication at sahibinden.comCross-Cluster and Cross-Datacenter Elasticsearch Replication at sahibinden.com
Cross-Cluster and Cross-Datacenter Elasticsearch Replication at sahibinden.comErtuğ Karamatlı
 
Bcache and Aerospike
Bcache and AerospikeBcache and Aerospike
Bcache and AerospikeAnshu Prateek
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedJ On The Beach
 
Azure Recovery Services
Azure Recovery ServicesAzure Recovery Services
Azure Recovery ServicesPavel Revenkov
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesJonathan Katz
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedEqunix Business Solutions
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesJonathan Katz
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster CassandraTzach Livyatan
 
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...DevOpsDays Tel Aviv
 
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...DataStax
 
Adventures in RDS Load Testing
Adventures in RDS Load TestingAdventures in RDS Load Testing
Adventures in RDS Load TestingMike Harnish
 

What's hot (19)

DockerCon14 Cluster Management and Containerization
DockerCon14 Cluster Management and ContainerizationDockerCon14 Cluster Management and Containerization
DockerCon14 Cluster Management and Containerization
 
re:dash is awesome
re:dash is awesomere:dash is awesome
re:dash is awesome
 
Micro-batching: High-performance writes
Micro-batching: High-performance writesMicro-batching: High-performance writes
Micro-batching: High-performance writes
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
 
Processing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkProcessing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and Spark
 
Running Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, LucidworksRunning Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
 
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
 
Configuring MongoDB HA Replica Set on AWS EC2
Configuring MongoDB HA Replica Set on AWS EC2Configuring MongoDB HA Replica Set on AWS EC2
Configuring MongoDB HA Replica Set on AWS EC2
 
Cross-Cluster and Cross-Datacenter Elasticsearch Replication at sahibinden.com
Cross-Cluster and Cross-Datacenter Elasticsearch Replication at sahibinden.comCross-Cluster and Cross-Datacenter Elasticsearch Replication at sahibinden.com
Cross-Cluster and Cross-Datacenter Elasticsearch Replication at sahibinden.com
 
Bcache and Aerospike
Bcache and AerospikeBcache and Aerospike
Bcache and Aerospike
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous Speed
 
Azure Recovery Services
Azure Recovery ServicesAzure Recovery Services
Azure Recovery Services
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & Kubernetes
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
 
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
 
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...
 
Adventures in RDS Load Testing
Adventures in RDS Load TestingAdventures in RDS Load Testing
Adventures in RDS Load Testing
 

Similar to Optimizing elastic search on google compute engine

Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesAmazon Web Services
 
AWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAmazon Web Services
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Amazon Web Services
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalVigyan Jain
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 
CMP301_Deep Dive on Amazon EC2 Instances
CMP301_Deep Dive on Amazon EC2 InstancesCMP301_Deep Dive on Amazon EC2 Instances
CMP301_Deep Dive on Amazon EC2 InstancesAmazon Web Services
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Community
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...Fred de Villamil
 
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Amazon Web Services
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...Amazon Web Services
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Amazon Web Services
 

Similar to Optimizing elastic search on google compute engine (20)

Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
AWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWS
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
Deep Dive on Amazon EC2
Deep Dive on Amazon EC2Deep Dive on Amazon EC2
Deep Dive on Amazon EC2
 
CMP301_Deep Dive on Amazon EC2 Instances
CMP301_Deep Dive on Amazon EC2 InstancesCMP301_Deep Dive on Amazon EC2 Instances
CMP301_Deep Dive on Amazon EC2 Instances
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
 
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
 

Recently uploaded

Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 

Recently uploaded (20)

Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 

Optimizing elastic search on google compute engine

  • 3. 12+ Year Cloud Journey with Google
  • 4.
  • 5. Who am I? Searce – Bangalore Linkedin.com/rbhuvanesh Twitter.com/@BhuviTheDataGuy Medium.com/@BhuviTheDataGuy https://TheDataGuy.in Bhuvanesh Database Architect Searce
  • 6. Agenda Short into about GCE ElasticSearch Terms Capacity Planning & Architecture Best Practices for Production Grade ES Cluster
  • 7. Compute Engine Compute Engine delivers configurable virtual machines running in Google’s data centers with access to high-performance networking infrastructure and block storage. Live migration for VMs Compute Engine virtual machines can live-migrate between host systems without rebooting, which keeps your applications running even when host systems require maintenance. Preemptible VMs Run batch jobs and fault-tolerant workloads on preemptible VMs to reduce your vCPU and memory costs by up to 80% while still getting the same performance and capabilities as regular VMs. Sole-tenant nodes Sole-tenant nodes are physical Compute Engine servers dedicated exclusively for your use. Sole-tenant nodes simplify deployment for bring your own license (BYOL) applications. Sole-tenant nodes give you access to the same machine types and VM configuration options as regular compute instances.
  • 8. What is Elastic Search? • First release 2010 • Open Source search and analytical engine • Elasticsearch is the central component of the Elastic Stack • Distributed processing • Works with all types of data (textual, numerical, geospatial, structured, and unstructured) • Powerful REST API • And everything is indexed
  • 9. Where can we use Elastic Search?
  • 10. Where can we use ES?
  • 11. Use cases • Logging and log analytics • Infrastructure metrics and container monitoring • Application performance monitoring • Geospatial data analysis and visualization • Security analytics • Enterprise search • Website search • And more….
  • 13. ES Terms Master Node: • Master Node controls the Cluster. • Responsible for maintaining the metadata about the cluster. • Decide where to move the data and relocating the data. • We can have multiple nodes for Master role. • But Elasticsearch will select any one of the node as an elastic master. • In the event of failure, a new elastic master will be selected from the available nodes.
  • 14. ES Terms Data Node • All of your is stored here. • Responsible for managing the stored data. • Perform the operations when it queried. Ingest Node • Pre-process’s documents before the actual document indexing. • The ingest node intercepts bulk and index requests, applies transformations, and it then passes the documents back to the index or bulk APIs.
  • 19. Memory Elastic Search will use the memory in 2 ways. 1. Java Heap 2. Other processes “More memory – More time on Garbage collection”
  • 20. CPU Don’t choose the CPU core based on some random calculations.
  • 21. DISK • Standard Persistent Disk • SSD Persistent Disk • Local SSD 1 GB SSD disk = 30iops
  • 27. Network From GCP Docs, The egress traffic from a given VM instance is subject to maximum network egress throughput caps. These caps are dependent on the number of vCPUs that the VM instance has. Each vCPU is subject to a 2 Gbps cap for peak performance. Each additional vCPU increases the network cap, up to a theoretical maximum of 32 Gbps for each instance. The actual performance you experience will vary depending on your workload. All caps are meant as maximum possible performance, and not sustained performance.
  • 28. How to identify the right VM size? 1. Simulate your workload and do the load test. 2. Or use Rally(https://github.com/elastic/rally)
  • 29. Swapping • Memory based operations are super fast. But we can’t give a tons of memory to the server. • The OS will swap out the unused applications memory. • That’s bad for the performance. Prevent Swapping 1. From OS Level(temporarily) - sudo swapoff –a 2. Configure swappiness from the Kernal - vm.swappiness=1 3. Enable bootstrap-memory_lock - bootstrap.memory_lock: true
  • 30. JVM Heap • By default, Elasticsearch tells the JVM to use a heap with a minimum and maximum size of 1 GB. • When moving to production, it is important to configure heap size to ensure that Elasticsearch has enough heap available. • Set the Heap size <50% of your total Memory “The more heap available to Elasticsearch, the more memory it can use for its internal caches, larger heaps can cause longer garbage collection pauses” – From Elastic
  • 31. ulimit Ulimit is the number of open file descriptors per process. vi /etc/security/limits.conf elasticsearch - nofile 65535 --For Ubuntu vi /etc/pam.d/su session required pam_limits.so --For systemd vi /usr/lib/systemd/system/elasticsearch.service LimitMEMLOCK=infinity sudo systemctl daemon-reload
  • 32. MMAP Elasticsearch uses a mmapfs directory by default to store its indices sysctl -w vm.max_map_count=262144 /etc/sysctl.conf vm.max_map_count = 262144
  • 33. Some common questions while setting up the Elastic Search Cluster
  • 35. Operating System & File System • Windows • Debian • Ubuntu • CentOS • RedHat • Windows - NTFS • Linux – Ext4 (if you have less than 1TB Data), XFS for >1TB data
  • 36. Some parameters for a generic workload indices.memory.index_buffer_size: 40% indices.query.cache.enabled: false thread_pool.bulk.queue_size: 3000 thread_pool.index.queue_size: 3000 store.throttle.type: 'none' index.refresh_interval: "1m"
  • 37. SSD vs Local SSD Persistent SSD Local SSD
  • 38. Local SSD • Max size of one Local SSD disk = 375 GB • You can add up to 8 Local SSD/Instance (3TB) • You can’t reboot/stop the VM • In case of the maintenance – Replace the node
  • 39. How many nodes • Master – 3 nodes • Ingest – 2 nodes • Data – 2-3 nodes (for a fresh setup)
  • 40. Rally for the benchmark tests What is Rally? You want to benchmark Elasticsearch? Then Rally is for you. It can help you with the following tasks: • Setup and teardown of an Elasticsearch cluster for benchmarking • Management of benchmark data and specifications even across Elasticsearch versions • Running benchmarks and recording results • Finding performance problems by attaching so-called telemetry devices • Comparing performance results pip3 install esrally
  • 41. How to run the esrally esrally --track=nyc_taxis --target-hosts=10.20.4.157:9200 --pipeline=benchmark-only --challenge=append-no-conflicts-index-only --on-error=continue --report-format=markdown --report-file=/opt/report.md
  • 43. Some use cases with ES