2. What is Elasticsearch
Elasticsearch (ES) is a document-oriented database designed to
store, retrieve, and manage document-oriented or semi-structured
data. When you use Elasticsearch, you store data as JSON
documents and then query those documents to retrieve them.
3. Elasticsearch Story: Past and Present
Many years ago, a newly married unemployed developer called Shay Banon followed his wife to London,
where she was studying to be a chef.
While looking for gainful employment, he started playing with an early version of Lucene, with the intent of
building his wife a recipe search engine.
Working directly with Lucene can be tricky, so Shay started work on an abstraction layer to make it easier
for Java programmers to add search to their applications. He released this as his first open source project,
called Compass.
Later Shay took a job at Gigaspaces. The need for a high-performance, real-time, distributed search engine
was obvious, and he decided to rewrite the Compass libraries as a standalone server called Elasticsearch.
The first public release came out in February 2010.
A company has formed around Elasticsearch to provide commercial support and to develop new features,
but Elasticsearch is, and forever will be, open source and available to all.
He is a great example of how one person's perseverance over a long time can create a world-class product.
But Shay’s wife is still waiting for the recipe search.
9. The Basic Concepts of Elasticsearch
Let's take a look at the basic concepts of
Elasticsearch:
➔ clusters, near real-time search,
indexes, nodes, shards, mapping
types, and more.
11. Indexing
● Elasticsearch achieves fast search responses because,
instead of searching the text directly, it searches an
index.
● This is like retrieving pages in a book related to a
keyword by scanning the index at the back of a book,
as opposed to searching every word of every page of
the book.
● This type of index is called an inverted index,
because it inverts a page-centric data structure
(page->words) to a keyword-centric data structure
(word->pages).
Elasticsearch uses Apache Lucene to create and manage this inverted index.
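The book analogy above can be sketched in a few lines of Python. This is only a toy illustration of the page->words to word->pages inversion, not how Lucene stores its index; the `pages` data and the `build_inverted_index` helper are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the sorted list of page numbers that contain it."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        # Naive tokenization: lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


# Hypothetical "book": page number -> text on that page (page-centric).
pages = {
    1: "search engines build an index",
    2: "an index maps words to pages",
    3: "lucene manages the index",
}

# Keyword-centric view: word -> pages.
inverted = build_inverted_index(pages)
print(inverted["index"])   # pages that mention "index"
print(inverted["lucene"])  # pages that mention "lucene"
```

Looking up a word is now a single dictionary access instead of a scan over every page, which is the point of the inversion.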
14. Cluster
A cluster is a collection of one or more servers (nodes) that together
hold your entire data and provide federated indexing and search
capabilities across all of them. In relational database terms, a
node corresponds to a DB instance. Any number of nodes can share the
same cluster name, which is how they join the same cluster.
Near-Real-Time (NRT)
Elasticsearch is a near-real-time search platform. There is a
slight latency (normally about one second) from the time you index
a document until the time it becomes searchable.
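The delay between indexing and searchability can be pictured with a toy model. The sketch below assumes a simplified picture in which new documents sit in a buffer until a refresh makes them visible; the `ToyIndex` class is hypothetical and only mimics the behavior, it is not Elasticsearch code.

```python
class ToyIndex:
    """Toy model: writes land in a buffer and become searchable only after refresh()."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to search
        self._searchable = []  # visible to search

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Elasticsearch refreshes periodically; here it is triggered by hand.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        return [doc for doc in self._searchable if word in doc["text"].split()]


toy = ToyIndex()
toy.index({"id": 1, "text": "tomato soup recipe"})
before = toy.search("soup")  # document is still in the buffer
toy.refresh()
after = toy.search("soup")   # document is now visible
print(len(before), len(after))
```

The document exists from the moment it is indexed, but a search only finds it once the refresh has happened.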
15. Index
An index is a collection of documents that have similar characteristics. For example,
we can have one index for customer data and another for product information.
An index is identified by a unique name, which is used to refer to the index when performing
indexing, search, update, and delete operations. In a single cluster, we can define as
many indexes as we want. An index is similar to a database or schema in an RDBMS (relational
database management system): consider it a set of tables
with some logical grouping. In Elasticsearch terms: index = database; type = table;
document = row.
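As a rough illustration of how the index name is used to address a document, the sketch below builds (but does not send) an HTTP request that would index one JSON document. The host `http://localhost:9200`, the index name `customer`, the document contents, and the `build_index_request` helper are assumptions for this example; the `_doc` path segment follows recent Elasticsearch versions, while older versions put a type name in that position.

```python
import json
from urllib.request import Request


def build_index_request(base_url, index_name, doc_id, document):
    """Build (but do not send) a PUT request that would index one JSON document."""
    url = f"{base_url}/{index_name}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical cluster address, index name, and document.
request = build_index_request(
    "http://localhost:9200", "customer", 1, {"name": "Jane Doe"}
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running cluster; that step is left out so the sketch runs without one.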
16. Node
A node is a single server that holds some data and participates in the cluster’s
indexing and querying. A node can be configured to join a specific cluster by the
particular cluster name. A single cluster can have as many nodes as we want. A node
is simply one Elasticsearch instance. Consider this a running instance of MySQL.
Multiple MySQL instances can run on one machine on different ports, while in
Elasticsearch, generally, one Elasticsearch instance runs per machine. Elasticsearch
uses distributed computing, so having separate machines helps, as there are
more hardware resources.
17. Shards
A shard is a subset of documents of an index. An index can be divided into many shards.
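One way to picture how documents end up in different shards is a hash of a routing key (by default the document ID) taken modulo the number of shards. The sketch below is a simplification under stated assumptions: `zlib.crc32` stands in for the hash function Elasticsearch actually uses, and the shard count and document IDs are made up.

```python
import zlib


def pick_shard(routing_key, number_of_shards):
    """Return the shard number that a routing key maps to."""
    # crc32 is a stand-in for illustration; Elasticsearch uses its own hash.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


number_of_shards = 3  # hypothetical index setting

# Each document ID deterministically maps to one shard of the index.
placement = {
    doc_id: pick_shard(doc_id, number_of_shards)
    for doc_id in ["1", "2", "3", "4", "5"]
}
print(placement)
```

Because the mapping depends on the shard count, the same ID would land elsewhere if the number of shards changed.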
Mapping Type
Mapping type = database table in an RDBMS.
Elasticsearch uses document definitions that act as tables. If you PUT (“index”) a document in Elasticsearch,
you will notice that it automatically tries to determine the property types. This is like inserting a JSON blob in
MySQL, and then MySQL determining the number of columns and column types as it creates the database
table.
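The automatic type detection described above can be approximated with a small function. This is a simplified sketch, not Elasticsearch's implementation: real dynamic mapping also detects dates, handles arrays, and adds sub-fields to strings. The `infer_field_type` and `infer_mapping` helpers and the sample document are hypothetical.

```python
def infer_field_type(value):
    """Guess a field type from a JSON value, loosely following dynamic mapping defaults."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def infer_mapping(document):
    """Return a field name -> guessed type mapping for one document."""
    return {field: infer_field_type(value) for field, value in document.items()}


# Hypothetical recipe document.
doc = {"title": "Tomato soup", "servings": 4, "rating": 4.5, "vegetarian": True}
mapping = infer_mapping(doc)
print(mapping)
```

The resulting dictionary plays the role of the column definitions that a relational database would ask you to declare up front.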
Elasticsearch users have delightfully diverse use cases, ranging from appending tiny log-line documents to
indexing web-scale collections of large documents and maximizing indexing throughput.