What is Elasticsearch?
- Powerful search server
- Stores data, although that is not its strength
- Quick, flexible and scalable
- Allows complex search queries through easily
understandable JSON structures
- Accessible and manageable through a RESTful API
How does it work?
Data is stored as JSON documents, which are sent to the
server through an API. These documents are processed during
indexing, and an inverted index is created, containing the
results (tokens) of the document pre-processing. The
documents can then be searched through the API. The search
query is also processed and reduced to tokens, which
are used to look up the stored documents.
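The flow above can be sketched in a few lines of plain Python. This is only a toy model, not how Elasticsearch is implemented: the `TinyIndex` class and the `analyze` function are hypothetical names, the analysis step is a simple lowercase word split, and the search requires every query token to match (a simplification chosen for the example).

```python
import re
from collections import defaultdict


def analyze(text):
    """Reduce text to lowercase word tokens (stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    def __init__(self):
        self.docs = {}                    # doc id -> original JSON-like document
        self.inverted = defaultdict(set)  # token -> ids of documents containing it

    def index(self, doc_id, doc):
        """Store the document and register each of its tokens."""
        self.docs[doc_id] = doc
        for value in doc.values():
            for token in analyze(str(value)):
                self.inverted[token].add(doc_id)

    def search(self, query):
        """Return the documents that contain every token of the query."""
        tokens = analyze(query)
        if not tokens:
            return []
        ids = set.intersection(*(self.inverted.get(t, set()) for t in tokens))
        return [self.docs[i] for i in sorted(ids)]


idx = TinyIndex()
idx.index(1, {"title": "Cheap flights to Lisbon"})
idx.index(2, {"title": "Lisbon hotel deals"})
print(idx.search("LISBON"))        # both documents match
print(idx.search("lisbon hotel"))  # only the second document matches
```

The query goes through the same `analyze` step as the documents, which is why "LISBON" still finds documents indexed with "Lisbon".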
Documents, Types & Indexes
- Index: Equivalent to a “database” in MySQL. Each
index contains various…
- Types: Equivalent to a MySQL “table”. Every type has
its own mapping, which can be created automatically when
inserting documents (not recommended) or defined
explicitly. Each type contains various…
- Documents: Equivalent to a MySQL “row”. This is where
data is actually stored. Documents can be deleted,
“modified” and inserted through the API.
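As a rough illustration of how index, type and document appear in the REST API, the sketch below builds the HTTP method and path for each document operation. It assumes the `/{index}/{type}/{id}` path layout of Elasticsearch versions that still support types; `document_request`, "shop" and "deal" are hypothetical names used only for the example.

```python
def document_request(action, index, doc_type, doc_id):
    """Return (HTTP method, path) for a document operation."""
    base = f"/{index}/{doc_type}/{doc_id}"
    routes = {
        "insert": ("PUT", base),               # create or replace the document
        "modify": ("POST", base + "/_update"),  # partial update
        "delete": ("DELETE", base),
    }
    return routes[action]


print(document_request("insert", "shop", "deal", 42))
print(document_request("modify", "shop", "deal", 42))
print(document_request("delete", "shop", "deal", 42))
```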
In Elasticsearch, each ‘searchable’ document field is
processed according to the rules defined in the Analyzer.
From it, Tokens are extracted, which are simply ‘shards’
of words. These shards work as indexes (like an array's key)
that can be used to locate a document.
The Tokens become the indexes to a certain document.
Unlike MySQL, which finds a row with a fast lookup
through a column defined as an ‘index’,
Elasticsearch takes a different approach: all tokens are
indexes, and each can return different documents of different Types.
- An Analyzer is a group of rules for tokenizing and filtering
information in Elasticsearch. There are many different
kinds of tokenizers and filters, and countless ways of
combining them.
- It’s potentially the difference between a poor query and
the perfect search
- Each analyzer must be planned in advance, with emphasis on
the language and the app specifications.
Tokenizers are responsible for “breaking the
phrases into important shards”.
Edge NGram Tokenizer
UAX Email URL Tokenizer
Path Hierarchy Tokenizer
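To give an idea of what two of these tokenizers produce, here is a minimal Python sketch of the idea behind the Edge NGram and Path Hierarchy tokenizers. It is an approximation for illustration only: the function names and default parameter values are assumptions, and the real tokenizers have more options.

```python
def edge_ngrams(word, min_gram=1, max_gram=5):
    """Prefixes of `word` with a length between min_gram and max_gram."""
    upper = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, upper + 1)]


def path_hierarchy(path, delimiter="/"):
    """Every ancestor of `path`, from the top level down to the full path."""
    parts = path.split(delimiter)
    tokens = []
    for i in range(1, len(parts) + 1):
        token = delimiter.join(parts[:i])
        if token:  # skip the empty token produced by a leading delimiter
            tokens.append(token)
    return tokens


print(edge_ngrams("search", min_gram=2, max_gram=4))  # ['se', 'sea', 'sear']
print(path_hierarchy("/one/two/three"))  # ['/one', '/one/two', '/one/two/three']
```

Edge n-grams are a common basis for search-as-you-type, since a partially typed word matches one of the stored prefixes.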
Token filters polish the tokens,
making them more precise
without turning them
Standard Token Filter
ASCII Folding Token Filter
Length Token Filter
Lowercase Token Filter
Uppercase Token Filter
NGram Token Filter
Edge NGram Token Filter
Porter Stem Token Filter
Shingle Token Filter
Stop Token Filter
Word Delimiter Token Filter
Stemmer Token Filter
Stemmer Override Token Filter
Keyword Marker Token Filter
Keyword Repeat Token Filter
KStem Token Filter
Snowball Token Filter
Phonetic Token Filter
Synonym Token Filter
Compound Word Token Filter
Reverse Token Filter
Elision Token Filter
Truncate Token Filter
Unique Token Filter
Pattern Capture Token Filter
Pattern Replace Token Filter
Trim Token Filter
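A filter chain can be pictured as a list of small functions applied one after another to the token list. The sketch below imitates four of the filters listed above (Lowercase, ASCII Folding, Stop, Unique) in plain Python. The function names, the tiny stop-word list and the whitespace tokenizer are assumptions made for the example, not the real implementations.

```python
import unicodedata

STOP_WORDS = {"a", "an", "the", "of", "to"}  # tiny illustrative list


def lowercase(tokens):
    return [t.lower() for t in tokens]


def ascii_folding(tokens):
    # Decompose accented characters, then drop the combining marks.
    return [
        "".join(c for c in unicodedata.normalize("NFKD", t)
                if not unicodedata.combining(c))
        for t in tokens
    ]


def stop(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def unique(tokens):
    # dict.fromkeys keeps the first occurrence of each token, in order.
    return list(dict.fromkeys(tokens))


def analyze(text, filters):
    tokens = text.split()  # whitespace tokenizer, for simplicity
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Café of the café", [lowercase, ascii_folding, stop, unique]))
# ['cafe']
```

The order matters: running `stop` before `lowercase` would let "The" through, because the stop list only contains lowercase words.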
Do I have to know all of this?
Analyzers are configured only once, during index creation.
Creating an Index: MAPPING
- Before creating your index, it’s important to take
some time to build the mapping.
- The mapping defines all the fields of all of your types,
as well as the analyzer that will be used for indexing
and/or searching each field.
- The number of SHARDs the index will have is defined
along with it, in the index settings.
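As an illustration, a request body for creating an index could look like the Python dictionary below, serialized to JSON. This is a sketch under assumptions: the analyzer name `folded_text`, the type `deal` and its fields are hypothetical, and the field type names (`text`, `keyword`) and the typed `mappings` layout depend on the Elasticsearch version (older releases used `string`, newer ones drop types).

```python
import json

index_body = {
    "settings": {
        "number_of_shards": 3,    # how many shards the index is split into
        "number_of_replicas": 1,  # copies kept of each shard
        "analysis": {
            "analyzer": {
                "folded_text": {  # hypothetical custom analyzer
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        },
    },
    "mappings": {
        "deal": {  # hypothetical type
            "properties": {
                "title": {"type": "text", "analyzer": "folded_text"},
                "price": {"type": "float"},
                "category": {"type": "keyword"},
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```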
Shards are LITERALLY ‘shards’ of your Index, which can be
allocated to different Nodes and have replicas on other
Nodes.
Shards are important for data safety and are responsible for
the distributed nature of Elasticsearch. Each shard
operates independently of the others.
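One way to picture how documents end up in different shards is a hash of a routing key (by default the document id) taken modulo the number of shards. The sketch below shows that idea only: `shard_for` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen for the example, not the hash function Elasticsearch itself uses.

```python
import zlib
from collections import defaultdict


def shard_for(routing_key, number_of_shards):
    """Pick a shard deterministically from the routing key."""
    digest = zlib.crc32(routing_key.encode("utf-8"))  # stand-in hash
    return digest % number_of_shards


number_of_shards = 3
allocation = defaultdict(list)
for doc_id in ["deal-1", "deal-2", "deal-3", "deal-4", "deal-5", "deal-6"]:
    allocation[shard_for(doc_id, number_of_shards)].append(doc_id)

for shard, doc_ids in sorted(allocation.items()):
    print(f"shard {shard}: {doc_ids}")
```

Because the shard is derived from the key and the shard count, the same document always lands on the same shard, which is one reason the number of shards is fixed when the index is created.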
Nodes are machines running the Elasticsearch
server. Nodes can be configured to recognize and interact
with one another. By rule, one of the Nodes becomes the “Master”,
while the remaining ones become “Slaves”. The Master is
responsible for coordinating both indexing and search
An aggregation is extra information about the query’s results.
For example, an aggregation can return the average price of
all deals found in the result of a query for Deals filtered by
It could also collect all categories contained in the result
of a query for all Listings.
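The two examples above can be mimicked in plain Python over a list of search hits. This only illustrates what an aggregation computes, not how Elasticsearch runs it; the `hits` data is invented for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical documents returned by a query.
hits = [
    {"title": "Spa day", "price": 40.0, "category": "wellness"},
    {"title": "Sushi dinner", "price": 25.0, "category": "food"},
    {"title": "Pizza night", "price": 10.0, "category": "food"},
]

# Average price of all deals in the result.
avg_price = mean(hit["price"] for hit in hits)

# Categories contained in the result, with the number of hits in each.
categories = Counter(hit["category"] for hit in hits)

print(avg_price)         # 25.0
print(dict(categories))  # {'wellness': 1, 'food': 2}
```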
How to build an Index
There are many ways. It is possible to create the index and then
add the mapping, to create both together, or to create each
separately and then link them, and so on.
The most recommended way, though, is to create everything at once.
This is done by sending a PUT request to the