Elasticsearch vs. Relational Databases
Agenda
● Basic Differences Between Elasticsearch and Relational Databases
● Use Cases Where Relational Databases Are Not Suitable
● Basic Terminology Of Elasticsearch
● Elasticsearch – CRUD operations
Basic Difference
● Elasticsearch is a NoSQL data store.
● It has no relations, no constraints, no joins, and no transactional
behaviour.
● It is easier to scale than a relational database.
Relational DB    Elasticsearch
Database         Index
Table            Type
Row/Record       Document
Column Name      Field
Use Cases Where Relational Databases Are Not Suitable
● Relevance based searching
● Searching when the search term is misspelled
● Full text search
● Synonym search
● Phonetic search
● Log analysis
Relevance-Based Searching
● By default, results are returned sorted by relevance, with the most
relevant docs first.
● The relevance score of each document is represented by a positive
floating-point number called the _score. The higher the _score, the
more relevant the document.
● A query clause generates a _score for each document. How that
score is calculated depends on the type of query clause.
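To make the idea of a per-document score concrete, here is a minimal sketch of a simplified BM25-style formula (BM25 is the default similarity in Elasticsearch 5.x). The function name, the sample documents, and the whitespace tokenizer are assumptions for illustration. Real Elasticsearch scores also depend on per-shard statistics and on the query type, so the numbers will not match a live cluster.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a simplified BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]   # toy whitespace analyzer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized if term in other)
            if df == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = counts[term]
            # Length normalization: shorter fields score higher for the same tf.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = ["Red Shirt", "Shirt", "Blue Jeans"]   # hypothetical product names
scores = bm25_scores("shirt", docs)
# Sort by score, highest first, as Elasticsearch does by default.
ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{score:.4f}  {name}")
```

In this toy corpus the shorter field "Shirt" ranks above "Red Shirt", and "Blue Jeans" scores zero because it does not contain the term.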
Relevance Representation in ES
{
  "_index": "test",
  "_type": "product",
  "_id": "AV0iKK_ZJJfvpLB9dSHl",
  "_score": 0.51623213,    <== relevance score calculated by ES
  "_source": {
    "id": 2,
    "name": "Red Shirt"
  }
}
Searching with Misspelled Terms
● Query
{
  "query": {
    "match": {
      "name": {
        "query": "shrt",
        "fuzziness": 2,
        "prefix_length": 0
      }
    }
  }
}
● Result
{
  "_index": "test",
  "_type": "product",
  "_id": "AV0iKKplJJfvpLB9dSHk",
  "_score": 0.21576157,
  "_source": {
    "id": 1,
    "name": "Shirt"
  }
}
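The fuzziness setting is a maximum edit distance between the query term and an indexed term. The sketch below uses plain Levenshtein distance to show why "shrt" matches "shirt". Elasticsearch itself uses Damerau-Levenshtein by default (a transposition counts as one edit), so treat this as an approximation. The helper names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def within_fuzziness(query_term, indexed_term, fuzziness=2):
    """True if the indexed term is close enough to the query term."""
    return edit_distance(query_term, indexed_term) <= fuzziness

# "Shirt" is lowercased to "shirt" at index time by the default analyzer.
print(edit_distance("shrt", "shirt"))      # one missing letter
print(within_fuzziness("shrt", "shirt"))
print(within_fuzziness("shrt", "jeans"))
```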
Full Text Search
● Whenever full text is indexed into Elasticsearch, analyzers are
applied to simplify it and make it searchable.
● The text is not indexed exactly as written: the original text is
transformed by the analyzer's rules (for example, tokenizing and
lowercasing) before its terms are stored in the inverted index.
● This process is called the “analysis phase,” and it is applied to all
full-text fields.
Full Text: Analysis Phase
Full Text: Inverted Index
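The two steps named above can be sketched in a few lines. This is a toy model, not Lucene's implementation: the analyzer only lowercases, splits on non-alphanumeric characters, and drops a small stop-word list. The stop-word list and the sample documents are assumptions made for illustration.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "is", "and"}   # assumed toy stop-word list

def analyze(text):
    """Analysis phase: lowercase, tokenize, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

def build_inverted_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every analyzed query term."""
    postings = [index.get(token, set()) for token in analyze(query)]
    return set.intersection(*postings) if postings else set()

docs = {1: "Shirt", 2: "Red Shirt", 3: "The red hat"}   # hypothetical documents
index = build_inverted_index(docs)

print(analyze("The Red Hat!"))
print(sorted(index["shirt"]))
print(sorted(search(index, "RED shirt")))
```

Because the query goes through the same analyzer as the documents, "RED shirt" finds the document indexed as "Red Shirt" even though the casing differs.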
Synonym Search
● Synonyms are used to broaden the scope of what is considered a
matching document.
● Perhaps no documents match a query for “Top Doctor's College,” but
documents that contain “Top Medical Institutions” would probably be
considered a good match.
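One way to picture synonym matching is to map every word in a synonym group to a single canonical token during analysis. The sketch below does that with a hand-written synonym list and a very naive plural stripper. Both are assumptions for illustration and are much cruder than the synonym token filter in Elasticsearch.

```python
import re

# Hypothetical synonym groups; a real deployment would configure these
# in a synonym token filter.
SYNONYM_GROUPS = [{"doctor", "medical"}, {"college", "institution"}]
CANONICAL = {word: sorted(group)[0]
             for group in SYNONYM_GROUPS for word in group}

def analyze(text):
    """Lowercase, drop possessives and plural 's', map synonyms to one token."""
    cleaned = text.lower().replace("'s", "")
    result = []
    for token in re.findall(r"[a-z]+", cleaned):
        if token.endswith("s") and len(token) > 3:
            token = token[:-1]                 # naive plural stripping
        result.append(CANONICAL.get(token, token))
    return result

query_tokens = analyze("Top Doctor's College")
doc_tokens = analyze("Top Medical Institutions")
print(query_tokens)
print(doc_tokens)
print(query_tokens == doc_tokens)
```

After analysis both phrases produce the same token list, so a query for one matches a document containing the other.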
Phonetic Searching
● Elasticsearch can search for words that sound similar, even if their
spelling differs.
● The Phonetic Analysis plugin provides token filters which convert
tokens to their phonetic representation using Soundex, Metaphone,
and a variety of other algorithms.
● Generally used when searching for names that sound similar.
Consider 'Smith' and 'Smythe': an analyzer with a phonetic token filter
produces the same token for both.
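As an illustration of the 'Smith' / 'Smythe' case, here is a minimal sketch of the classic American Soundex encoding. It is written from the published algorithm, not taken from the Phonetic Analysis plugin, and the function name is an assumption.

```python
# Letter-to-digit table for American Soundex; vowels, h, w, y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(name):
    """Return the 4-character Soundex code of a name (empty for no letters)."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                  # h and w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        prev = code                   # a vowel resets prev, allowing a repeat
    return (first.upper() + "".join(encoded) + "000")[:4]

print(soundex("Smith"), soundex("Smythe"))
print(soundex("Robert"), soundex("Rupert"))
```

Both spellings reduce to the same code, which is what lets a phonetic filter treat them as one term.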
Log Analysis Using Elasticsearch
● Elasticsearch is widely used as a centralized location for storing logs.
● For indexing and searching logs, Elastic offers a bundled solution,
the ELK stack, which stands for Elasticsearch, Logstash, and
Kibana.
Elasticsearch Terminology
● Elasticsearch: A horizontally scalable, distributed data store, search server,
and aggregation engine based on the Lucene library. It is written in Java.
Elasticsearch 5.5 is the latest version at the time of writing.
● Cluster: A cluster consists of one or more nodes which share the same cluster
name. Each cluster has a single master node which can be replaced if the
current master node fails.
● Node: A node is a running instance of Elasticsearch which belongs to a cluster.
Multiple nodes can be started on a single server. At startup, a node will use
unicast to discover an existing cluster with the same cluster name and will try
to join that cluster.
● Primary Shard: Each document is stored in a single primary shard. When you
index a document, it is indexed first on the primary shard, then on all replicas
of the primary shard. By default, an index has 5 primary shards.
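A document lands on exactly one primary shard because the shard is chosen by hashing a routing value (the document ID by default) modulo the number of primary shards. The sketch below illustrates that idea only: Elasticsearch uses a Murmur3 hash, while this example substitutes `zlib.crc32` as a stand-in, so the shard numbers will not match a real cluster. The function name is an assumption.

```python
import zlib

def pick_shard(doc_id, num_primary_shards=5):
    """Pick a primary shard for a document id (CRC32 used as a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# The same id always maps to the same shard, which is how a later
# GET or DELETE by id knows where to look.
for doc_id in ("1", "2", "AV0iKK_ZJJfvpLB9dSHl"):
    print(doc_id, "-> shard", pick_shard(doc_id))
```

This also shows why the number of primary shards is fixed once an index is created: changing the modulus would send existing IDs to different shards.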
Elasticsearch Terminology (Contd.)
● Replica Shard: Each primary shard can have zero or more replicas. A replica
is a copy of the primary shard. By default, each primary shard has 1
replica.
● Document: A document is a JSON document which is stored in Elasticsearch.
It is like a row in a table in a relational database. Each document is stored in
an index and has a type and an id. A document is a JSON object which
contains zero or more fields, or key-value pairs.
● ID: The ID of a document identifies a document. The index/type/id of a
document must be unique. If no ID is provided, then it will be auto-generated.
● Mapping: A mapping is like a schema definition in a relational database. Each
index has a mapping, which defines each type within the index, plus a number
of index-wide settings.
Create Index/Document
● Index Creation:
PUT employee
● Document Creation:
POST employee/employee/1
{
  "name": "John"
}
Delete Document
● Delete By Id:
DELETE employee/employee/1
● Delete By Query:
POST employee/employee/_delete_by_query
{
  "query": {
    "match": {
      "name": "John"
    }
  }
}
Update Document
● Update By Id:
POST employee/employee/1/_update
{
  "doc": {
    "name": "Johny"
  }
}
● Update By Query (assumes the matching documents have a numeric age field):
POST employee/_update_by_query
{
  "script": {
    "inline": "ctx._source.age++",
    "lang": "painless"
  },
  "query": {
    "match": {
      "name": "john"
    }
  }
}
Read/Query Document
● Read By Id:
GET employee/employee/1
● Read By Query:
GET employee/_search
{
  "query": {
    "match": {
      "name": "John"
    }
  }
}
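The console-style requests in the CRUD slides are plain HTTP calls, so they can be issued from any language. Below is a minimal sketch that builds (but does not send) the same requests with Python's standard library. The base URL `http://localhost:9200` and the helper name `build_request` are assumptions, and the paths follow the 5.x index/type/id layout used in these slides.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"   # assumption: a local single-node cluster

def build_request(method, path, body=None):
    """Build an HTTP request for an Elasticsearch endpoint without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(f"{BASE_URL}/{path}", data=data,
                                  headers=headers, method=method)

create = build_request("POST", "employee/employee/1", {"name": "John"})
read = build_request("GET", "employee/employee/1")
update = build_request("POST", "employee/employee/1/_update",
                       {"doc": {"name": "Johny"}})
delete = build_request("DELETE", "employee/employee/1")

for request in (create, read, update, delete):
    print(request.get_method(), request.full_url)

# To execute one against a running node:
#     with urllib.request.urlopen(create) as response:
#         print(json.load(response))
```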
Thank You :)
