Elasticsearch

Knowing Elasticsearch
By Shagun Rathore

So, what is the ELK Stack?
● "ELK" is the acronym for three open source projects:
Elasticsearch, Logstash, and Kibana.
● Elasticsearch is a search and analytics engine.
● Logstash is a server‑side data processing pipeline that
ingests data from multiple sources simultaneously,
transforms it, and then sends it to a "stash" like
Elasticsearch.
● Kibana lets users visualize Elasticsearch data with charts
and graphs.

ELK vs Elastic stack
The ELK stack refers specifically to Elasticsearch, Logstash,
and Kibana. The Elastic Stack is the broader name: it also
includes other components such as Beats and X-Pack.

Elasticsearch
● Elasticsearch is a distributed, open-source, RESTful,
highly scalable search and analytics engine built on the
Apache Lucene library. It works with all types of data,
including textual, numerical, geospatial, structured, and
unstructured.
● It is the heart of the Elastic Stack.
● Elasticsearch is written in Java and used by many large
organizations around the world.
● Versions up to 7.10 were released under the Apache License
2.0; later versions are distributed under other licenses.

What is Elasticsearch used for?
● Application search
● Website search
● Enterprise search
● Logging and log analytics
● Infrastructure metrics and container monitoring
● Application performance monitoring
● Geospatial data analysis and visualization
● Security analytics
● Business analytics

How does Elasticsearch work?
Raw data flows into Elasticsearch from a variety of sources,
including logs, system metrics, and web applications.
Data ingestion is the process by which this raw data is
parsed, normalized, and enriched before it is indexed in
Elasticsearch.
Once indexed in Elasticsearch, users can run complex queries
against their data and use aggregations to retrieve complex
summaries of their data.
From Kibana, users can create powerful visualizations of
their data, share dashboards, and manage the Elastic Stack.
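
The ingestion step (parse, normalize, enrich) can be sketched
in plain Python. This is only an illustration under stated
assumptions, not how Elasticsearch or Logstash implement it:
the log format, the field names, and the ingest helper are
all hypothetical.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical log format: "<ISO timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(?P<ts>\S+)\s+(?P<level>\w+)\s+(?P<message>.*)$")


def ingest(raw_line, source):
    """Parse, normalize, and enrich one raw log line into a JSON-ready dict."""
    match = LOG_PATTERN.match(raw_line.strip())
    if match is None:
        raise ValueError(f"unparseable line: {raw_line!r}")
    # Parse: split the raw text into named fields.
    doc = match.groupdict()
    # Normalize: lowercase the level and convert the timestamp to UTC.
    doc["level"] = doc["level"].lower()
    parsed = datetime.fromisoformat(doc.pop("ts"))
    doc["@timestamp"] = parsed.astimezone(timezone.utc).isoformat()
    # Enrich: attach metadata that was not present in the raw line.
    doc["source"] = source
    return doc


doc = ingest("2024-01-15T10:30:00+02:00 ERROR Disk quota exceeded", source="web-01")
print(json.dumps(doc, indent=2))
```

The resulting dictionary is the kind of JSON document that
would then be sent to Elasticsearch for indexing.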

Document
A document is a collection of fields expressed in JSON
format.
Every document resides inside an index. (Older versions also
assigned each document a mapping type; types have been
removed in recent releases.)
Every document has a unique identifier, stored in its _id
field.
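
As a rough illustration, a document is just a JSON object.
The sketch below builds one in Python; every field name and
value is invented for the example, and the identifier is kept
next to the body rather than inside it, which mirrors how
Elasticsearch exposes it as the _id metadata field.

```python
import json

# Hypothetical identifier and fields, chosen only for illustration.
doc_id = "product-42"
document = {
    "name": "Trail running shoes",
    "price": 89.5,
    "in_stock": True,
    "tags": ["shoes", "outdoor"],
    "location": {"lat": 52.52, "lon": 13.40},
}

# Serialize to the JSON text that would be sent over the REST API,
# then parse it back to show that nothing is lost in the round trip.
serialized = json.dumps(document)
restored = json.loads(serialized)

print(doc_id, serialized)
```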

What is an Elasticsearch index?
An Elasticsearch index is a collection of documents that are related to each
other. Elasticsearch stores data as JSON documents. Each document correlates a
set of keys (names of fields or properties) with their corresponding values
(strings, numbers, Booleans, dates, arrays of values, geolocations, or other
types of data).
Elasticsearch uses a data structure called an inverted index, which is
designed to allow very fast full-text searches. An inverted index lists every
unique word that appears in any document and identifies all of the documents
each word occurs in.
During the indexing process, Elasticsearch stores documents and builds an
inverted index to make the document data searchable in near real-time.
Indexing is initiated with the index API, through which you can add or update
a JSON document in a specific index.
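
The inverted index idea can be sketched in a few lines of Python. This is a
toy version under simplifying assumptions: the tokenizer only lowercases and
splits on non-alphanumeric characters, and the search helper requires every
query term to match, which is a choice made for this example and not
necessarily the default behavior of an Elasticsearch query.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map every unique term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "Elasticsearch is a search engine",
    2: "Kibana visualizes data stored in Elasticsearch",
    3: "Logstash is a data pipeline",
}
index = build_inverted_index(docs)
print(sorted(index["elasticsearch"]))               # [1, 2]
print(sorted(search(index, "data elasticsearch")))  # [2]
```

A lookup touches only the posting sets of the query terms, never the full
text of every document, which is what makes full-text search fast.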

Shards
Indexes are horizontally subdivided into shards.
This means each shard has all the properties of the index
but holds only a subset of its JSON documents.
This horizontal separation makes each shard an independent
unit that can be stored on any node. A primary shard is an
original horizontal partition of an index, and primary
shards are copied into replica shards.

Shards
● A shard is a partition (piece) of an index.
● Sharding splits the index horizontally.
● You define the number of primary shards in an index when the index is
created.
● The shard that handles writes is called the primary shard.
● Replication in Elasticsearch is done with replica shards.
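
How a document ends up on a particular shard can be sketched as hash-modulo
routing. This is a simplified model: Elasticsearch uses its own hash function
and a somewhat more involved formula, so zlib.crc32 and the route_to_shard
helper below are stand-ins chosen for illustration.

```python
import zlib


def route_to_shard(doc_id, num_primary_shards):
    """Pick a primary shard for a document id with hash-modulo routing."""
    # crc32 is a stand-in hash, used here because it is stable across runs.
    digest = zlib.crc32(doc_id.encode("utf-8"))
    return digest % num_primary_shards


NUM_PRIMARY_SHARDS = 3  # hypothetical value, fixed when the index is created

placement = {}
for doc_id in ["user-1", "user-2", "user-3", "user-4", "user-5", "user-6"]:
    shard = route_to_shard(doc_id, NUM_PRIMARY_SHARDS)
    placement.setdefault(shard, []).append(doc_id)

for shard, ids in sorted(placement.items()):
    print(f"shard {shard}: {ids}")
```

The sketch also hints at why the primary shard count is set at index creation:
changing the modulus would send existing ids to different shards.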

Replicas
● Elasticsearch lets a user create replicas of an index's shards.
Replication not only increases the availability of data in case of
failure, but also improves search performance by running searches in
parallel on the replicas.
● A replica contains the same data as its primary shard.
● A replica is never allocated to the same node as its primary shard.
● Replicas allow for fault tolerance.
● Replicas scale search throughput.
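
The rule that a replica never shares a node with its primary can be sketched
as a toy allocator. This is not Elasticsearch's real allocation algorithm; the
allocate helper, the copy labels (P0, R0.0), and the least-loaded tie-break
are assumptions made for the example.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Assign shard copies so that no node holds two copies of one shard.

    Returns (assigned, unassigned): assigned maps node -> list of copy labels,
    unassigned lists the copies for which no eligible node was left.
    """
    assigned = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        used_nodes = set()
        labels = [f"P{shard}"] + [f"R{shard}.{r}" for r in range(num_replicas)]
        for label in labels:
            # Eligible nodes do not yet hold a copy of this shard.
            candidates = [n for n in nodes if n not in used_nodes]
            if not candidates:
                unassigned.append(label)
                continue
            # Prefer the least-loaded eligible node to keep things balanced.
            target = min(candidates, key=lambda n: len(assigned[n]))
            assigned[target].append(label)
            used_nodes.add(target)
    return assigned, unassigned


assigned, unassigned = allocate(num_primaries=2, num_replicas=1,
                                nodes=["node-a", "node-b"])
print(assigned)    # {'node-a': ['P0', 'P1'], 'node-b': ['R0.0', 'R1.0']}
print(unassigned)  # []

# With a single node, the replicas have nowhere to go.
_, left_over = allocate(num_primaries=2, num_replicas=1, nodes=["node-a"])
print(left_over)   # ['R0.0', 'R1.0']
```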

Node
● A single server in a cluster is called a node.
● A node has a unique name in the cluster.

Cluster
● A cluster is a collection of one or more nodes (servers).
● It allows searching and indexing across all nodes in
the cluster.
● Each shard held by a node is a self-contained Lucene index.
● Every cluster is identified by a unique name. (This
is important for multi-cluster setups.)

Cluster Status
A cluster is in one of three states, depending on how its primary and
replica shards are allocated.
● Green, when all primary and replica shards are allocated.
● Yellow, when all primary shards are allocated but one or more
replica shards are unallocated.
● Red, when one or more primary shards are unallocated.
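
The three rules above translate directly into a small function. This is a
sketch of the logic only; the cluster_status helper and its input format are
hypothetical, and a real cluster reports its status through the cluster health
API (GET _cluster/health).

```python
def cluster_status(shards):
    """Derive green/yellow/red from a list of (kind, allocated) shard copies.

    kind is "primary" or "replica"; allocated is a bool.
    """
    # Any unallocated primary means some data is unavailable: red.
    if any(kind == "primary" and not allocated for kind, allocated in shards):
        return "red"
    # All primaries are fine, but at least one replica is missing: yellow.
    if any(kind == "replica" and not allocated for kind, allocated in shards):
        return "yellow"
    return "green"


print(cluster_status([("primary", True), ("replica", True)]))    # green
print(cluster_status([("primary", True), ("replica", False)]))   # yellow
print(cluster_status([("primary", False), ("replica", True)]))   # red
```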

Comparison between Elasticsearch and RDBMS

Node Types
Master-Eligible Node (Default: true)
The elected master is responsible for cluster-wide management: operations such as creating and
deleting indexes, tracking the nodes in the cluster, and allocating shards.
Data Node (Default: true)
Data nodes hold the shards. Index, delete, search, and other document operations are performed on
data nodes.
Ingest Node (Default: true)
Ingest nodes preprocess documents before they are indexed, similar to what Logstash does.

Node Types
Coordinating-Only Node (Default: false)
A coordinating-only node acts as a smart load balancer that routes requests to the
appropriate nodes.
It also handles the search reduce phase, merging the results returned by each shard.
It distributes bulk indexing requests.
Machine Learning Node
Machine learning is an X-Pack feature that is not free.
On this node, you can run machine learning jobs and serve machine learning API requests.
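
The search reduction done by a coordinating node can be sketched as merging
per-shard result lists into one global top-N list. This is a simplified model
of the idea, not Elasticsearch's implementation; merge_shard_results, the
scores, and the document ids are invented for the example.

```python
import heapq
from itertools import islice


def merge_shard_results(per_shard_hits, size):
    """Reduce per-shard hit lists into one global top-`size` list.

    Each per-shard list holds (score, doc_id) tuples sorted by score,
    highest first.
    """
    merged = heapq.merge(*per_shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, size))


# Each shard returns its own locally sorted hits.
shard_0 = [(0.92, "doc-7"), (0.40, "doc-2")]
shard_1 = [(0.88, "doc-5"), (0.75, "doc-9")]
shard_2 = [(0.10, "doc-1")]

top = merge_shard_results([shard_0, shard_1, shard_2], size=3)
print(top)  # [(0.92, 'doc-7'), (0.88, 'doc-5'), (0.75, 'doc-9')]
```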

What programming languages does Elasticsearch support?
Elasticsearch supports a variety of languages and official
clients are available for:
● Java
● JavaScript (Node.js)
● Go
● .NET (C#)
● PHP
● Perl
● Python
● Ruby
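
Because the API is RESTful and JSON based, a request can also
be built without any client library. The sketch below prepares
a search request with a match query and a terms aggregation
using only the Python standard library. The address
localhost:9200, the index name "logs", and the fields
"message" and "level" are assumptions; the request is built
but not sent, so no running cluster is needed to try it.

```python
import json
import urllib.request

# Assumptions: a node listening on localhost:9200 and an index named "logs".
BASE_URL = "http://localhost:9200"
INDEX = "logs"

search_body = {
    # Full-text match on a hypothetical "message" field.
    "query": {"match": {"message": "timeout"}},
    # Count hits per value of a hypothetical keyword field "level".
    "aggs": {"by_level": {"terms": {"field": "level"}}},
    "size": 5,
}

request = urllib.request.Request(
    url=f"{BASE_URL}/{INDEX}/_search",
    data=json.dumps(search_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

The official clients wrap this same request and response
cycle and add conveniences such as connection handling.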

Amazon Elasticsearch
Amazon Elasticsearch Service (since renamed Amazon OpenSearch Service)
is a fully managed service that makes it easy for you to deploy, secure,
and run Elasticsearch cost effectively at scale.
You can build, monitor, and troubleshoot your applications using the
tools you love, at the scale you need.
The service provides support for open source Elasticsearch APIs,
managed Kibana, integration with Logstash and other AWS services, and
built-in alerting and SQL querying.
Amazon Elasticsearch Service lets you pay only for what you use – there
are no upfront costs or usage requirements. With Amazon Elasticsearch
Service, you get the ELK stack you need, without the operational
overhead.

Benefits
Easy to deploy and manage
With Amazon Elasticsearch Service you can deploy your
Elasticsearch cluster in minutes. The service simplifies
management tasks such as hardware provisioning, software
installation and patching, failure recovery, backups, and
monitoring.
To monitor your clusters, Amazon Elasticsearch Service includes
built-in event monitoring and alerting, so you are notified of
changes to your data and can proactively address any issues.

Benefits
Highly scalable and available
Amazon Elasticsearch Service lets you store up to 3 PB of data in
a single cluster, enabling you to run large log analytics
workloads via a single Kibana interface.
You can easily scale your cluster up or down via a single API
call or a few clicks in the AWS console.
Amazon Elasticsearch Service is designed to be highly available
using multi-AZ deployments, which allows you to replicate data
between three Availability Zones in the same region.

Benefits
Highly secure
For your data in Elasticsearch Service, you can achieve
network isolation with Amazon VPC, encrypt data at-rest and
in-transit using keys you create and control through AWS
KMS, and manage authentication and access control with
Amazon Cognito and AWS IAM policies.
Amazon Elasticsearch Service is also HIPAA eligible, and
compliant with PCI DSS, SOC, ISO, and FedRAMP standards to
help you meet industry-specific or regulatory requirements.

Benefits
Cost-effective
With Amazon Elasticsearch Service, you pay only for the resources you
consume.
You can select on-demand pricing with no upfront costs or long-term
commitments, or achieve significant cost savings via Reserved
Instance pricing.
As a fully managed service, Amazon Elasticsearch Service further lowers
your total cost of operations by eliminating the need for a dedicated
team of Elasticsearch experts to monitor and manage your clusters.

Use Cases
Application monitoring
Store, analyze, and correlate application and infrastructure log data to find
and fix issues faster and improve application performance.
Enable trace data analysis for your distributed applications to quickly
identify performance issues. You can receive automated alerts if your
application is underperforming, enabling you to proactively address any
issues.
An online travel company, for example, can use Amazon Elasticsearch Service to
analyze logs from its applications to identify and resolve performance
bottlenecks or availability issues, ensuring a streamlined booking experience.

Use Cases
Security information and event management (SIEM)
Centralize and analyze logs from disparate applications and
systems across your network for real-time threat detection
and incident management.
A telecom company, for example, can use Amazon Elasticsearch
Service with Kibana to quickly index, search, and visualize
logs from its routers, applications, and other devices to
find and prevent security threats such as data breaches,
unauthorized login attempts, DoS attacks, and fraud.

Use Cases
Search
Provide a fast, personalized search experience for your applications,
websites, and data lake catalogs, allowing your users to quickly find
relevant data.
For example, a real estate business can use Amazon Elasticsearch
Service to help its customers find homes in their desired location
and price range from among millions of real-estate properties.
You get access to all of Elasticsearch’s search APIs, supporting
natural language search, auto-completion, faceted search, and
location-aware search.
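
A search such as the real estate example could be expressed as a bool
query with a price range filter and a geo_distance filter. The sketch
below only builds the request body; the field names "price" and
"location" and the home_search_query helper are hypothetical, and
"location" is assumed to be mapped as a geo_point.

```python
import json


def home_search_query(lat, lon, radius_km, min_price, max_price):
    """Build a bool query that filters listings by price range and distance."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Keep listings whose price falls inside the range.
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                    # Keep listings within radius_km of the given point.
                    {
                        "geo_distance": {
                            "distance": f"{radius_km}km",
                            "location": {"lat": lat, "lon": lon},
                        }
                    },
                ]
            }
        }
    }


body = home_search_query(lat=40.71, lon=-74.0, radius_km=10,
                         min_price=300000, max_price=500000)
print(json.dumps(body, indent=2))
```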

Use Cases
Infrastructure monitoring
Collect logs and metrics from your servers, routers, switches,
and virtualized machines to get comprehensive visibility into
your infrastructure, reducing mean time to detect (MTTD) and
mean time to resolve (MTTR) issues and lowering system downtime.
A gaming company, for example, can use Amazon Elasticsearch
Service to monitor and analyze server logs to identify any server
performance issues that could lead to application downtime.

Advantages
● Elasticsearch is developed in Java, which makes it compatible with almost every
platform.
● Elasticsearch is near real time: by default, a newly added document becomes
searchable within about one second.
● Elasticsearch is distributed, which makes it easy to scale and integrate in any
large organization.
● Creating full backups is easy using the gateway concept present in
Elasticsearch.
● Handling multi-tenancy is very easy in Elasticsearch compared to Apache Solr.
● Elasticsearch returns JSON responses, which makes it possible to call the
Elasticsearch server from a large number of programming languages.
● Elasticsearch supports almost every document type except those that do not support
text rendering.
