Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It is the distributed search and analytics engine of the Elastic Stack. It indexes and analyzes data in real time, providing powerful and scalable search capabilities for diverse applications.
2. Lack of etiquette and manners is a huge turn-off.
KnolX Etiquettes
Punctuality
Join the session 5 minutes prior to the session start time. We start on
time and conclude on time!
Feedback
Make sure to submit constructive feedback for all sessions, as it is very
helpful for the presenter.
Silent Mode
Keep your mobile devices in silent mode. Feel free to step out of the session
in case you need to attend an urgent call.
Avoid Disturbance
Avoid unwanted chit chat during the session.
3. 1. Introduction
2. Elastic Stack (ELK)
3. Basic Architecture
4. Use Cases
5. Data Ingestion
6. Query DSL
7. Performance and Scalability
8. Demo
5. Introduction
Elasticsearch is a powerful, open-source search and analytics engine built
on top of Apache Lucene. It is designed to handle large amounts of data and is
particularly well suited to real-time search and analysis. Elasticsearch is part
of the Elastic Stack, which also includes Logstash, Kibana, and Beats.
Elasticsearch is primarily used for full-text search but can also handle
various types of structured and unstructured data.
Query and analyze structured data.
Analyze application logs and system metrics, e.g. errors and CPU/memory usage.
Send events to Elasticsearch, e.g. sales, website clicks, and phone calls.
Elasticsearch is well suited to analyzing large volumes of data.
Anomaly detection.
8. Elastic Stack (ELK)
The Elastic Stack, formerly known as the ELK Stack, is a collection of
open-source tools designed for searching, analyzing, and visualizing
data. It is widely used for log and event data analysis, monitoring, and
various other data analytics scenarios.
The Elastic Stack consists of four main components:
9. Elasticsearch: Elasticsearch is a distributed search and analytics engine that stores
data as JSON documents. It provides powerful search and analytics capabilities, making it
the core component of the Elastic Stack.
Logstash: Logstash is a data processing pipeline that ingests, processes, and
transforms data before sending it to Elasticsearch. It is often used for collecting and
parsing log data from various sources, such as application logs, system logs, and
network logs.
Kibana: Kibana is a web-based visualization tool that allows users to interact with and
explore data stored in Elasticsearch. It provides a user-friendly interface for creating
dashboards and visualizations and for running ad-hoc queries. Kibana is an essential
component for monitoring and visualizing data in real time.
Beats: Beats are lightweight data shippers that send data from different sources to
either Elasticsearch or Logstash.
11. Basic Architecture
Cluster: A cluster is a collection of nodes that work together and share the same
cluster name. Nodes within a cluster communicate and cooperate to distribute data,
manage cluster state, and handle queries.
Node: A node is a single instance of Elasticsearch running on a physical or virtual
machine. Nodes are the basic building blocks of an Elasticsearch cluster.
Index: An index is a logical collection of documents with similar characteristics or
data types. For example, you might have one index for storing user data and another
for storing product information.
Shard: A shard is a single, self-contained unit of an index that holds a subset of the
index's data. Elasticsearch distributes shards across nodes to enable horizontal
scaling and efficient data retrieval.
Document: A document is a JSON object representing a piece of data stored in an index.
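To make the relationship between documents and shards more concrete, here is a minimal Python sketch of how a document ID could be mapped to one of an index's primary shards. It is a simplification under stated assumptions: Elasticsearch uses its own Murmur3-based routing hash, while this sketch substitutes CRC32 from the standard library, and the function names `pick_shard` and `distribute` are hypothetical.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a primary shard number."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Stand-in hash (assumption): Elasticsearch uses a Murmur3-based hash, not CRC32.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard they would land on."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    placement = distribute([f"user-{i}" for i in range(10)], num_primary_shards=3)
    for shard, ids in placement.items():
        print(shard, ids)
```

Because the shard is derived from the routing key and the number of primary shards, the same document always maps to the same shard, which is also why the primary shard count of an index is normally fixed when the index is created.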
13. Use Cases
Website and Application Search:
Implementing fast and efficient search functionality on websites or applications.
Elasticsearch enables full-text search, autocomplete suggestions, and faceted search.
Log and Event Data Analysis:
Analyzing and searching through log files generated by applications, servers, and
network devices. Elasticsearch excels at handling large volumes of log and event data,
providing real-time insights and troubleshooting capabilities.
Business Intelligence and Analytics:
Storing and querying large datasets for business intelligence and analytics purposes.
Elasticsearch can handle structured and unstructured data, making it suitable for
complex querying and analysis.
Monitoring and Alerting:
Monitoring the health and performance of systems, applications, and infrastructure in
real time. Elasticsearch, when used in conjunction with tools like Beats and Kibana,
forms a powerful monitoring and alerting solution.
Security Information and Event Management (SIEM):
Centralizing and analyzing security-related data, including logs, alerts, and events.
Elasticsearch is a key component in building SIEM solutions for threat detection and
response.
Content and Document Management:
Managing and searching through large repositories of documents or content.
Elasticsearch supports full-text search, document retrieval, and complex queries,
making it valuable for content management systems.
14. Use Cases
E-commerce Search and Recommendations:
Enhancing the search experience on e-commerce platforms by providing relevant and fast
search results. Elasticsearch can also be used for building recommendation engines
based on user behavior and preferences.
Geospatial Data Analysis:
Analyzing and searching through geospatial data, such as locations and coordinates.
Elasticsearch supports geospatial queries, making it suitable for applications
involving maps, geolocation-based services, and spatial analysis.
Healthcare Data Search:
Indexing and searching through vast amounts of healthcare data, including electronic
health records (EHRs) and medical documents. Elasticsearch enables quick retrieval of
relevant medical information.
Social Media Monitoring:
Tracking and analyzing social media data for sentiment analysis, trend identification,
and brand monitoring. Elasticsearch can process and index large amounts of social
media data in real time.
Data Exploration and Visualization:
Building interactive dashboards and visualizations for exploring and understanding
large datasets. Kibana, when integrated with Elasticsearch, facilitates data
exploration and visualization.
Data Integration and Enrichment:
Integrating data from multiple sources, enriching it, and making it searchable.
Logstash, a component of the Elastic Stack, is often used for data integration and
transformation.
16. Data Ingestion
Elasticsearch APIs:
Elasticsearch provides a RESTful API that allows you to interact with the cluster
using HTTP requests. You can use the Index API to index individual documents manually
or the Bulk API to index multiple documents in a single request. This method is
suitable for small-scale data ingestion or for data that is generated on demand.
Logstash:
Logstash is a powerful data processing pipeline that can ingest data from multiple
sources, transform it, and send it to Elasticsearch. It supports a wide range of input
plugins (e.g., file input, beats input, JDBC input), filter plugins (e.g., grok,
mutate), and output plugins (e.g., Elasticsearch, Kafka).
Beats:
Beats are lightweight data shippers designed to send data from various sources to
Elasticsearch or Logstash. Different Beats are available for specific use cases, such
as Filebeat for log files, Metricbeat for system metrics, Packetbeat for network data,
etc.
Elasticsearch Hadoop:
Elasticsearch Hadoop is a connector that allows you to integrate Elasticsearch with
Apache Hadoop and other big data processing frameworks. This is useful for ingesting
and analyzing large datasets stored in Hadoop.
Native Integrations:
Some applications and systems offer native integrations with Elasticsearch. For
example, databases like MongoDB and MySQL can use connectors or plugins to push data
directly to Elasticsearch.
Third-Party Tools:
Several third-party tools and connectors are available that facilitate data ingestion
into Elasticsearch. These tools might offer additional features or specialized
functionality for specific use cases.
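As an illustration of the Bulk API mentioned above, the following Python sketch builds the newline-delimited JSON body that a bulk request carries: one action line followed by one source line per document. It only constructs the request body and does not contact a cluster. The index name `events`, the sample documents, and the helper name `build_bulk_body` are assumptions made for the example.

```python
import json


def build_bulk_body(index_name, docs):
    """Serialize (doc_id, document) pairs into a newline-delimited bulk body."""
    lines = []
    for doc_id, doc in docs:
        # Action line: index the document on the next line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # A bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical events, echoing the sales and website-click examples.
    events = [
        ("1", {"type": "sale", "amount": 42.5}),
        ("2", {"type": "click", "page": "/home"}),
    ]
    print(build_bulk_body("events", events), end="")
```

A body like this would typically be sent with an HTTP POST to the cluster's `_bulk` endpoint using the `application/x-ndjson` content type. Batching documents this way avoids one HTTP round trip per document.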
18. Query DSL
Query DSL (Domain-Specific Language) in Elasticsearch is a powerful and
expressive language used to construct queries for searching and retrieving data
from an Elasticsearch cluster. It allows users to define complex queries and filters,
making it possible to retrieve specific documents that match certain criteria. The
Query DSL is a JSON-based syntax that enables a wide range of search
functionality. Common query types in the Query DSL include:
Match Query
Term Query
Bool Query
Match Phrase Query
Range Query
Wildcard Query
Nested Query
Fuzzy Query
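Since the Query DSL is plain JSON, a few of the query types listed above can be sketched as Python dictionaries and serialized with the standard library. This is a minimal sketch, not a full client: the field names (`title`, `status`, `price`) and the helper functions are hypothetical, and only the match, term, range, and bool queries are shown.

```python
import json


def match(field, text):
    """Full-text match: the text is analyzed before searching."""
    return {"match": {field: text}}


def term(field, value):
    """Exact-value match, typically on keyword, numeric, or boolean fields."""
    return {"term": {field: value}}


def value_range(field, gte=None, lte=None):
    """Range query with optional inclusive lower and upper bounds."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {field: bounds}}


def bool_query(must=(), filters=(), must_not=()):
    """Combine clauses; empty clause lists are left out of the body."""
    clauses = {"must": list(must), "filter": list(filters), "must_not": list(must_not)}
    return {"bool": {name: items for name, items in clauses.items() if items}}


def search_body():
    """Build an example search body using hypothetical field names."""
    return {
        "query": bool_query(
            must=[match("title", "wireless headphones")],
            filters=[term("status", "published"), value_range("price", gte=10, lte=100)],
        )
    }


if __name__ == "__main__":
    print(json.dumps(search_body(), indent=2))
```

The resulting JSON would be sent as the body of a request to an index's `_search` endpoint. Clauses under `must` contribute to the relevance score, while clauses under `filter` only include or exclude documents.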
20. Performance and Scalability
Performance and scalability are critical considerations when designing an
Elasticsearch cluster, especially for handling large volumes of data and
supporting high query loads. Key factors and strategies for optimizing
performance and scalability in Elasticsearch include:
Hardware and Infrastructure
Cluster Configuration
Sharding
Indexing Performance
Query Performance
Caching
Monitoring and Logging
JVM Settings Tuning
Data Archiving and Lifecycle Policies
Horizontal Scaling
Network and Security
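For the sharding item above, a rough back-of-the-envelope calculation can help when planning an index. The Python sketch below estimates how many primary shards an index of a given size would need for a chosen target shard size, and how many shard copies the cluster would then hold once replicas are included. The 30 GB default target and the function names are illustrative assumptions, not figures from this session. Appropriate shard sizes depend on the workload and should be validated by testing.

```python
import math


def primary_shards_needed(index_size_gb: float, target_shard_size_gb: float = 30.0) -> int:
    """Estimate the primary shard count for an index of the given size."""
    if index_size_gb < 0 or target_shard_size_gb <= 0:
        raise ValueError("sizes must be positive")
    # Always keep at least one primary shard, even for a tiny index.
    return max(1, math.ceil(index_size_gb / target_shard_size_gb))


def total_shard_copies(primaries: int, replicas_per_primary: int) -> int:
    """Count primaries plus all of their replica copies."""
    return primaries * (1 + replicas_per_primary)


if __name__ == "__main__":
    primaries = primary_shards_needed(200)       # 200 GB index, 30 GB target -> 7
    print(primaries, total_shard_copies(primaries, replicas_per_primary=1))  # 7 14
```

Replicas add redundancy and can serve search requests, but every copy consumes disk and memory, so the total shard count is worth tracking alongside the primary count.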