A brief presentation outlining the basics of Elasticsearch for beginners. It can be used to deliver a seminar on Elasticsearch (P.S. I used it for one). I would recommend that the presenter experiment with Elasticsearch beforehand.
2. What is Elasticsearch?
A JSON-oriented full-text search engine based on the Apache Lucene search
library.
It is distributed and horizontally scalable: more Elasticsearch nodes can be
added to an Elasticsearch cluster as needed.
It is RESTful and API-centric, which makes it more usable.
3. That’s fine, but where do I use it?
You want to use Elasticsearch when you want to do text search instead of just
matching data against fields - full-text searches.
You want to use Elasticsearch for purposes such as logging and analysis - to
analyze logs from various sources, user tracking, behavioural data and search
terms used, and to make sense of them.
In addition, tools such as Kibana and Sense can be used to query and analyse
the data and build visualizations on it in real time. A good example
of that is Loggly - built using Elasticsearch and Kibana.
4. That’s fine, but where do I use it? (continued)
In the past few years, the ELK (Elasticsearch-Logstash-Kibana) stack has
become the de facto standard among infrastructure monitoring tools, driven
by CI/CD pipelines and the increasing volume of metrics from
various sources.
A lot of tools are available for getting data into Elasticsearch and analysing it, such as:
a. Logstash - Parses logs from multiple sources into Elasticsearch documents.
b. Beats - Log shippers that ship logs to Elasticsearch/Logstash.
c. Kibana - Data visualization tool.
Elasticsearch can thus be used for operations such as fuzzy searching, data
5. Why not Relational Databases?
Relational databases are, by design, hard to scale horizontally. They were
not designed to handle the large volumes of data that are now generated
online.
Relational databases are designed to work on a single system and are mostly
not distributed. It takes a huge overhead to add distributed behaviour
to relational databases, especially maintaining the integrity of data and table
mappings across servers.
NoSQL data stores such as (indirectly) Elasticsearch are designed for scale, and
so replace relational databases in applications like those mentioned above.
6. Who uses Elasticsearch?
GitHub: To query 130 billion lines of code.
Wikimedia: To provide search-as-you-type and ‘Did-you-mean’ type results.
Stack Overflow: The programmers’ community combines full-text search with
geo-location queries and uses more-like-this to find related questions and
answers.
Apart from the uses above, it is used at numerous other places such as:
Facebook, Netflix, Foursquare, Quora, Lichess, Mozilla etc.
I have also been using Elasticsearch for various purposes during my internship.
7. Who built Elasticsearch?
Shay Banon created the precursor to Elasticsearch, called Compass, in 2004.
While thinking about the third version of Compass, he realized that it would be
necessary to rewrite big parts of Compass to "create a scalable search
solution".
So he created "a solution built from the ground up to be distributed" and used a
common interface, JSON over HTTP, suitable for programming languages
other than Java as well.
Shay Banon released the first version of Elasticsearch in February 2010.
8. Elasticsearch Concepts
Elasticsearch works on a concept known as inverted indexing.
Let’s say there are three documents: "Winter is coming.", "Ours is the fury." and
"The choice is yours.".
After some simple text processing (lowercasing, removing punctuation and
splitting words), we can construct the "inverted index" as shown:
9. [Figure: the inverted index built from the three documents]
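The text processing and index construction described on the previous slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lucene implements it; the function names `tokenize` and `build_inverted_index` and the document ids 1 to 3 are assumptions made for the example.

```python
import string
from collections import defaultdict


def tokenize(text):
    """Lowercase the text, remove punctuation, and split it into words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Keep the terms sorted, as the dictionary of an inverted index would be.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


# The three example documents, with ids assumed for illustration.
docs = {
    1: "Winter is coming.",
    2: "Ours is the fury.",
    3: "The choice is yours.",
}

index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(term, doc_ids)
```

Each printed line is one entry of the index: a term followed by the ids of the documents in which it occurs.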
10. Storage Model
The inverted index maps terms to the documents (and possibly the positions in the
documents) containing the term. Since the terms in the dictionary are sorted, we
can quickly find a term, and subsequently its occurrences in the postings
structure. This is contrary to a "forward index", which lists the terms related to a
specific document.
A simple search with multiple terms is then done by looking up all the terms and their
occurrences, and taking the intersection (for AND searches) or the union (for OR
searches) of the sets of occurrences to get the resulting list of documents. More
complex types of queries are obviously more elaborate, but the approach is the
same: first, operate on the dictionary to find candidate terms, then on the
corresponding occurrences, positions, etc.
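The AND/OR lookup described above can be sketched as set operations over the postings. The `INDEX` dictionary below is written out by hand for the three example documents (ids 1 to 3 are assumed), and `search` is a hypothetical helper, not an Elasticsearch API.

```python
# Postings for the three example documents: term -> set of document ids.
INDEX = {
    "choice": {3},
    "coming": {1},
    "fury": {2},
    "is": {1, 2, 3},
    "ours": {2},
    "the": {2, 3},
    "winter": {1},
    "yours": {3},
}


def search(index, terms, mode="AND"):
    """Return the sorted ids of documents matching all (AND) or any (OR) terms."""
    # A term missing from the dictionary has an empty postings set.
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    if mode == "AND":
        result = set.intersection(*postings)
    elif mode == "OR":
        result = set.union(*postings)
    else:
        raise ValueError("mode must be 'AND' or 'OR'")
    return sorted(result)


print(search(INDEX, ["the", "is"], mode="AND"))
print(search(INDEX, ["winter", "fury"], mode="OR"))
```

A real engine also ranks the matching documents; this sketch only decides which documents match.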
11. The Elastic Terminology
A few terms associated with Elasticsearch:
a. Cluster: A group of Elasticsearch nodes.
b. Node: A JVM process running an Elasticsearch instance (an independently
accessible server/machine/container running Elasticsearch).
c. Index: Analogous to a relational database; it holds the mapping types and their
definitions. An index may contain data across many shards.
d. Mapping Type: Analogous to a table; its mapping describes the fields of the documents stored under it.
e. Document: A JSON document. In relational terms, this would represent a single row.
f. Shard: An independent scalable unit that independently processes primary and replica
12. Types of Nodes
There are three types of nodes in Elasticsearch:
a. Master Node
b. Data Node
c. Client Node
13. Master Node
It controls the Elasticsearch cluster and is responsible for all cluster-wide operations
such as creating/deleting an index, keeping track of which nodes are part of the
cluster and assigning shards to nodes.
The master node processes one cluster state at a time and broadcasts the state to all
the other nodes, which respond with a confirmation to the master node.
A node can be made eligible to become a master node by setting the
node.master property to true (the default) in elasticsearch.yml.
For large production clusters, it’s recommended to have a dedicated master node that
only controls the cluster and does not serve any user requests.
14. Data Node
It holds the data and the inverted index.
By default, every node is configured to be a data node: the property
node.data is set to true in elasticsearch.yml.
To make a node a dedicated master node, set its node.data property
to false.
15. Client Node
If both node.master and node.data are set to false, then the node gets
configured as a client node and acts as a load balancer routing incoming
requests to different nodes in the cluster.
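The way the two settings combine into the node types above can be summarised as a small decision function. This is only a sketch of the rules stated on these slides; `node_role` and the wording of the returned labels are made up for illustration and are not part of Elasticsearch.

```python
def node_role(master=True, data=True):
    """Describe the role implied by the node.master and node.data settings.

    Both settings default to true, as stated on the slides.
    """
    if master and data:
        return "master-eligible data node (default)"
    if master:
        return "dedicated master-eligible node"
    if data:
        return "data node"
    return "client node"


# Print the role for every combination of the two settings.
for master in (True, False):
    for data in (True, False):
        print("node.master={}, node.data={} -> {}".format(
            master, data, node_role(master, data)))
```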
16. Elastic Operations
The following CRUD operations can be performed:
a. Create Data
b. Read Data
c. Update Data
d. Delete Data
These operations can be performed on Elasticsearch through its extensive
API, which comprises:
a. Document APIs - To create new documents and to read, update and delete existing ones.
b. Search APIs - To perform searches on the data present and draw information based on search
terms.
17. Interacting with Elasticsearch
Once Elasticsearch is set up and running on a local machine, basic cluster
information can be viewed at the URL
127.0.0.1:9200/ which returns a response such as the following:
{
  "name" : "m0ggZFF",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "KG0nPhXxQnaDECIq71kKIw",
  "version" : {
    "number" : "5.2.1",
    "build_hash" : "db0d481",
    "build_date" : "2017-02-09T22:05:32.386Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.1"
  },
  "tagline" : "You Know, for Search"
}
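The same information can be fetched from a script. The sketch below uses only the Python standard library and assumes a node is listening on the default 127.0.0.1:9200; `get_cluster_info` and `describe` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node on the default HTTP port.
BASE_URL = "http://127.0.0.1:9200"


def get_cluster_info(base_url=BASE_URL):
    """GET the root endpoint and return the parsed JSON body."""
    with urllib.request.urlopen(base_url + "/") as response:
        return json.load(response)


def describe(info):
    """Summarise the fields of the root response shown on the slide."""
    version = info["version"]
    return "node {} in cluster {}: Elasticsearch {}, Lucene {}".format(
        info["name"],
        info["cluster_name"],
        version["number"],
        version["lucene_version"],
    )


# With a node running, the two helpers can be combined:
#     print(describe(get_cluster_info()))
```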
18. Creating a Document
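To create a document, a POST operation may be performed on the following URL -
127.0.0.1:9200/seminar_index/books
(POST lets Elasticsearch generate the document id; a PUT needs an explicit id at the end of the URL.)
{
  "title": "Java 8 Optional In Depth",
  "category": "Java",
  "published_date": "23-FEB-2017",
  "author": "Rambabu Posa"
}
The above creates an index in Elasticsearch by the name ‘seminar_index’ and within it
creates a document with the mapping type books.
Elasticsearch supports dynamic mapping, so the field mappings are inferred from the document and do not have to be defined beforehand.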
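As a sketch, the create request can be built with Python's standard library. It uses POST so that Elasticsearch generates the document id (a PUT would need an explicit id in the URL). The helper name `build_create_request` is hypothetical, a local node on 127.0.0.1:9200 is assumed, and the URL follows the index/type layout of the 5.x version shown in these slides.

```python
import json
import urllib.request


def build_create_request(base_url, index, doc_type, document):
    """Build a POST request that indexes `document` under index/doc_type."""
    url = "{}/{}/{}".format(base_url, index, doc_type)
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


book = {
    "title": "Java 8 Optional In Depth",
    "category": "Java",
    "published_date": "23-FEB-2017",
    "author": "Rambabu Posa",
}
request = build_create_request(
    "http://127.0.0.1:9200", "seminar_index", "books", book)
print(request.get_method(), request.full_url)

# With a node running, send the request and print the response:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```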
19. Read/ Search a Document
To read/search a document, append ‘_search’ to the REST API URL:
127.0.0.1:9200/seminar_index/books/_search
A result such as the following will be returned:
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "seminar_index",
        "_type": "books",
        "_id": "AVtmTCD8NesM_0w1wBo3",
        "_score": 1.0,
        "_source": {
          "title": "Java 8 Optional In Depth",
          "category": "Java",
          "published_date": "23-FEB-2017",
          "author": "Rambabu Posa"
        }
      }
    ]
  }
}
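A small sketch of working with the search endpoint from Python: one helper builds the `_search` URL, optionally with a `q` query-string parameter to narrow the search, and another pulls the stored documents out of a result shaped like the one above. Both helper names are hypothetical and the base URL assumes a local node.

```python
import urllib.parse


def search_url(base_url, index, doc_type, query=None):
    """Build the _search URL, optionally with a q=... query string."""
    url = "{}/{}/{}/_search".format(base_url, index, doc_type)
    if query:
        url += "?" + urllib.parse.urlencode({"q": query})
    return url


def hit_sources(result):
    """Return the stored documents (_source) from a search result."""
    return [hit["_source"] for hit in result["hits"]["hits"]]


print(search_url("http://127.0.0.1:9200", "seminar_index", "books"))
print(search_url("http://127.0.0.1:9200", "seminar_index", "books",
                 query="category:Java"))
```

The result dictionary passed to `hit_sources` would come from parsing the JSON body of the response.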
20. Update a Document
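To perform an update operation, send the full updated document as the POST body to
the URL:
127.0.0.1:9200/seminar_index/books/AVtmTCD8NesM_0w1wBo3
(The last part
after books/ is the id of the document.)
This replaces the stored document; to change only some fields, use the _update endpoint with the changed fields wrapped in a "doc" object.
The lines:
"result": "updated"
"created": false
in the response indicate that the document was updated and not created again.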
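A sketch of the update call and of checking the two response fields mentioned above. Posting to the document URL re-indexes the whole document, so the body here carries every field. The helper names are hypothetical, the changed title is made up for the example, and a local node is assumed.

```python
import json
import urllib.request


def build_update_request(base_url, index, doc_type, doc_id, document):
    """Build a POST request that re-indexes `document` under an existing id."""
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def was_updated(response_body):
    """True if the response reports an update rather than a fresh creation."""
    return (response_body.get("result") == "updated"
            and response_body.get("created") is False)


updated_book = {
    "title": "Java 8 Optional In Depth, 2nd Edition",  # hypothetical new value
    "category": "Java",
    "published_date": "23-FEB-2017",
    "author": "Rambabu Posa",
}
request = build_update_request(
    "http://127.0.0.1:9200", "seminar_index", "books",
    "AVtmTCD8NesM_0w1wBo3", updated_book)
print(request.get_method(), request.full_url)
```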
21. Delete a Document
To remove a document from the index, simply send a DELETE request to a URL
as follows:
127.0.0.1:9200/seminar_index/books/AVtmTCD8NesM_0w1wBo3
This will remove the document
in the index: seminar_index
with the mapping type: books
with the id: AVtmTCD8NesM_0w1wBo3
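The delete call can be sketched the same way; it has no body, only the HTTP method and the document URL. `build_delete_request` is a hypothetical helper and the base URL assumes a local node.

```python
import urllib.request


def build_delete_request(base_url, index, doc_type, doc_id):
    """Build a DELETE request for a single document."""
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    return urllib.request.Request(url, method="DELETE")


request = build_delete_request(
    "http://127.0.0.1:9200", "seminar_index", "books", "AVtmTCD8NesM_0w1wBo3")
print(request.get_method(), request.full_url)

# With a node running, send it with urllib.request.urlopen(request).
```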
22. The Takeaway
The use of Elasticsearch can be extended by creating custom plugins that
extend its functionality for custom use cases.
Beats can be used for effective infrastructure monitoring.
Logstash can be used to parse logs received from various sources,
using Grok patterns, to monitor activity across servers.
Kibana can be used as an aggregator as well as an interface to execute queries on
the data present in Elasticsearch and return responses via the Timelion tool. It
can also be used for real-time analysis of data using graphs.