Basics on Elasticsearch
Ruby Shrestha
Overview Session
Elasticsearch: An Introduction
 Written in Java, open source, based on Apache Lucene
 https://github.com/elastic/elasticsearch
 Document storage
 Format: JSON
 Full-text search engine
 Full-text search?
 Every doc, every word
 Searches large datasets within a few seconds
 How?
 Via Inverted Index, Distributed Nature
 Analytics Platform
 Aggregations and analysis
Use Cases Where ES Overshadows DB
 Full-text search is more efficient in ES due to flexible indexing.
 Relevance based searching
 Searching even when the entered spelling is wrong
 Synonym-based search
 Phonetic-based search
 Use of a distributed architecture
 Works well with unstructured data
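As a rough illustration of how a search can tolerate a misspelled query, the sketch below computes the Levenshtein edit distance between a query term and a few indexed terms and keeps the terms within a small number of edits. This is a standalone toy, not Elasticsearch's implementation; the names `edit_distance` and `fuzzy_match`, the `max_edits` threshold and the sample terms are assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, terms: list, max_edits: int = 2) -> list:
    """Return the terms that are at most max_edits edits away from the query."""
    return [t for t in terms if edit_distance(query.lower(), t.lower()) <= max_edits]


# Hypothetical indexed terms; the query is missing one letter.
terms = ["elasticsearch", "lucene", "kibana"]
matches = fuzzy_match("elastcsearch", terms)
```

Here the misspelled query still finds `elasticsearch` because only one edit separates the two strings.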
How does Elasticsearch Work?
 Data stored as document
 Format: JSON
 Querying Document
 Via JSON Based REST API
HTTP request methods: GET, PUT, POST, DELETE
[Diagram: a REST client (e.g. Insomnia) exchanges JSON requests and JSON responses with Elasticsearch through the REST API]
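The request/response flow described above can be sketched with Python's standard library. The block only builds a JSON search request; sending it needs a running node. The index name `courses`, the field `name`, the search text and the helper name `build_search_request` are assumptions for illustration.

```python
import json
import urllib.request


def build_search_request(base_url: str, index: str, field: str, text: str):
    """Build (but do not send) a full-text search request with a JSON body."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index and field names.
request = build_search_request("http://localhost:9200", "courses", "name", "elastic stack")

# With a node running locally, the JSON response could be read like this:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```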
All in All
 Easy to get started with
 Complex technology if its full potential is to be used
 By far the hottest search engine on the market, used by a huge community
Elastic Stack
When Not To Use ES: Use Cases
 Data storage
 No/rare/simple analysis
 Analysis on single-value text fields (usernames, zip codes), value lookups
 Huge computations (extensive preprocessing and transformations)
Conceptual Details
Types of Scaling
Vertical Scaling | Horizontal Scaling
Scaling up | Scaling out
Increasing the size of a machine | Having multiple machines
Has limits | The real power of a distributed system comes from here
Architecture of Elasticsearch
 Cluster
 Nodes
 Can carry out indexing and searching
 Every node is aware of every other node in the cluster.
 Every node can forward a request to any other node in the cluster.
 Every node can accept HTTP requests from REST clients.
 Every node has its own unique name (UUID).
 The first seven characters are used as the node ID, which persists even after a restart.
 A node is a running instance of Elasticsearch.
 Categories of dedicated nodes:
   Master node
   Data node
   Ingest node
   Coordinating node
 By default, a node is a master-eligible, data, and ingest node.
 Indices and Types
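Parallel concepts between databases and Elasticsearch
(changed as of ES 6.x; 6.5 was the latest version at the time)
Database concept | Elasticsearch concept
Database | Index (before 6.x)
Table | Type (before 6.x); Index (from 6.x)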
Index name, type name and field name rules
 Lowercase only
 Cannot include \ , / , * , ? , " , < , > , | , space (the character, not the word), comma ( , ), #
 Indices prior to 7.0 could contain a colon ( : ), but that has been deprecated and is not supported in 7.0+
 Cannot start with - , _ , +
 Cannot be . or ..
 Cannot be longer than 255 bytes (multi-byte characters count toward the limit faster)
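The naming rules above can be expressed as a small checker. This is a minimal sketch that mirrors the listed rules rather than Elasticsearch's own validation code. The function name `index_name_problems` and the wording of the messages are assumptions, and the length limit is measured in UTF-8 bytes, which equals the character count for plain ASCII names.

```python
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')   # backslash, slash, *, ?, ", <, >, |, space, comma, #
FORBIDDEN_PREFIXES = ("-", "_", "+")
MAX_NAME_BYTES = 255


def index_name_problems(name: str) -> list:
    """Return a list of rule violations; an empty list means the name passes these checks."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(ch for ch in set(name) if ch in FORBIDDEN_CHARS)
    if bad:
        problems.append(f"contains forbidden characters: {bad}")
    if ":" in name:
        problems.append("colon is not supported in 7.0+")
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("cannot start with -, _ or +")
    if name in (".", ".."):
        problems.append("cannot be . or ..")
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        problems.append("longer than 255 bytes")
    return problems


# Hypothetical names used only to exercise the rules.
ok = index_name_problems("courses")
bad = index_name_problems("_My Courses")
```

Returning all violations at once, instead of stopping at the first, makes it easier to fix a bad name in one pass.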
Sharding
 The size of a single index can exceed the physical capacity of the available nodes
 Example:
 Each node: 512 MB
 Size of index: 1 TB
 Sharding comes to the rescue in such bottleneck cases: the index is split into smaller pieces (shards) that can be spread across nodes.
 Advantages:
 Lets an index keep up with a growing amount of data
 Better throughput when shards are distributed across multiple nodes
 Queries can be executed in parallel across nodes
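To make the sizing example from the sharding slides concrete (512 MB per node, a 1 TB index), the arithmetic can be sketched as follows. It assumes binary units (1 TB = 1024 × 1024 MB) and ignores replicas and per-shard overhead; the function name `min_shards_needed` is hypothetical.

```python
import math


def min_shards_needed(index_size_mb: int, node_capacity_mb: int) -> int:
    """Smallest number of equal shards so that each shard fits on one node."""
    return math.ceil(index_size_mb / node_capacity_mb)


index_size_mb = 1024 * 1024   # 1 TB expressed in MB (binary units, an assumption)
node_capacity_mb = 512        # capacity of each node from the example

shards = min_shards_needed(index_size_mb, node_capacity_mb)
```

Under these assumptions the index has to be split into at least 2048 shards, each held by a different 512 MB node.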
Replication
 What if a node fails?
 Is there any fault tolerance mechanism in ES?
 YES, via Replication
 Replication means duplicating the available shards
 For high availability/fault tolerance
 For better throughput (provided hardware is available)
 Shard that is replicated -> primary shard
 Replicated copy of a shard -> replica shard
 Replication group = primary shard + its replicas
Defaults
 Cluster Name: elasticsearch
 Number of shards per index: 5 (the default up to ES 6.x; 7.0 and later default to 1)
 Number of replicas: 1 for each shard
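With the defaults above, the number of shard copies in a cluster follows from simple arithmetic, sketched below. The function names are hypothetical. The node count relies on the fact that Elasticsearch does not place a replica on the same node as its primary.

```python
def total_shard_copies(primaries: int, replicas_per_primary: int) -> int:
    """Every primary shard plus each of its replicas."""
    return primaries * (1 + replicas_per_primary)


def min_nodes_for_full_allocation(replicas_per_primary: int) -> int:
    """Copies of the same shard must live on different nodes."""
    return 1 + replicas_per_primary


# Defaults from the slide: 5 primary shards, 1 replica each.
default_copies = total_shard_copies(5, 1)
default_nodes = min_nodes_for_full_allocation(1)
```

This is why a single-node cluster with the default settings cannot assign its replicas: it needs at least two nodes to hold all 10 shard copies.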
Keeping Replicas in Sync
Complete Architecture
Characteristics of ES
 Near real-time searching
 Indexing
 Distributed Nature
 Multi-Tenancy
Indexing in Elasticsearch
{
"statement": "Winter is coming"
}
{
"statement": "Ours is the fury"
}
{
"statement": "The choice is yours"
}
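The three documents above can be used to show what an inverted index looks like. The sketch below lowercases each statement, splits it on whitespace, and maps every term to the documents containing it. Real analyzers do considerably more; the document ids 1 to 3 and the function name `build_inverted_index` are assumptions for the example.

```python
from collections import defaultdict

# The three statements from the slide, keyed by hypothetical document ids.
docs = {
    1: "Winter is coming",
    2: "Ours is the fury",
    3: "The choice is yours",
}


def build_inverted_index(documents: dict) -> dict:
    """Map each lowercased term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


inverted = build_inverted_index(docs)
# A term lookup is now a dictionary access instead of a scan over every document.
docs_with_the = inverted["the"]
```

Looking up `is` returns all three documents, while `winter` returns only the first, without reading any document text at query time.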
Let’s get started practically!
Monitoring Cluster Health
 localhost:9200/_cluster/health
At shard level:
Status | Reason
Green | All the shards are properly assigned/allocated to nodes.
Yellow | Some or all of the shards' replicas are unassigned.
Red | A specific primary shard is unassigned/unallocated.
Index health: worst shard status
Cluster health: worst index status
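The "worst status wins" roll-up from shards to index to cluster can be sketched in a few lines. This models only the rule stated above, not how Elasticsearch computes health internally; the index names and function names are hypothetical.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def worst(statuses) -> str:
    """Return the most severe status in a non-empty collection."""
    return max(statuses, key=lambda status: SEVERITY[status])


def index_health(shard_statuses: list) -> str:
    """Index health is the worst status among its shards."""
    return worst(shard_statuses)


def cluster_health(indices: dict) -> str:
    """Cluster health is the worst health among its indices."""
    return worst(index_health(shards) for shards in indices.values())


# Hypothetical cluster: one healthy index, one with an unassigned replica.
cluster = {
    "courses": ["green", "green", "green"],
    "logs": ["green", "yellow", "green"],
}
status = cluster_health(cluster)
```

A single yellow shard is enough to turn both its index and the whole cluster yellow.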
Cluster State
 localhost:9200/_cluster/state
Document Management
 Simple Index Creation
 PUT /<index-name>
 Similar to creating a table in a database (from ES 6.x onward)
 Creating an index with settings
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
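A minimal sketch of the index-creation call using only Python's standard library. It builds the PUT request with the settings body shown above but does not send it. The index name `courses` and the helper name `build_create_index_request` are assumptions.

```python
import json
import urllib.request


def build_create_index_request(base_url: str, index: str, shards: int, replicas: int):
    """Build (but do not send) a PUT /<index-name> request with explicit settings."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical index name; 3 shards and 2 replicas as in the slide.
create_request = build_create_index_request("http://localhost:9200", "courses", 3, 2)

# With a node running locally, urllib.request.urlopen(create_request) would send it.
```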
File Directory Structure
 The first time you install ES and run it, you are running an instance of ES, i.e., a node.
 data
   Elasticsearch
     Nodes
       0
         _state
           global-<version>.st (contains node/cluster settings)
         node.lock (so that only one ES instance writes to the directory at a time)
Index Creation Leads To
 Inside the node folder, a new indices folder appears.
 indices
   <index-name>/<uuid> (you can find this uuid in localhost:9200/_cluster/state -> metadata key -> indices key)
     0 … 4 (one folder per shard; 5 shards by default)
     _state
       state-<version>.st (that index's metadata/settings)
Document Management
 Creating/Indexing/Inserting a new document
 PUT /<index-name>/_doc/1
{"name": "Basics of Elastic Stack",
"course": "Searching and Analytics",
"price": 500}
 POST /<index-name>/_doc
{
"name": "Umagi",
"course": "Fiction",
"price": 2000
}
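The two calls above differ in who chooses the document id: PUT names it in the URL, while POST leaves it to Elasticsearch. A small sketch of that choice follows; the helper name `build_index_document_request` and the index name `courses` are assumptions, and the requests are only built, not sent.

```python
import json
import urllib.request


def build_index_document_request(base_url: str, index: str, document: dict, doc_id=None):
    """PUT to /<index>/_doc/<id> when an id is given, else POST to /<index>/_doc."""
    if doc_id is None:
        method, url = "POST", f"{base_url}/{index}/_doc"
    else:
        method, url = "PUT", f"{base_url}/{index}/_doc/{doc_id}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


base = "http://localhost:9200"
with_id = build_index_document_request(
    base, "courses",
    {"name": "Basics of Elastic Stack", "course": "Searching and Analytics", "price": 500},
    doc_id=1,
)
auto_id = build_index_document_request(
    base, "courses", {"name": "Umagi", "course": "Fiction", "price": 2000}
)
```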
What actually happens when we create a
new document?
[Diagram labels: In-Memory Indexing Buffer, Transaction Log, File System Cache, Disk]
• Refresh rate (default 1 sec); it can be changed, e.g. {"settings": {"refresh_interval": "30s"}}
• File system cache: segment creation
• Disk: segments flushed into a commit point
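The write path named on this slide can be mimicked with a toy model: a new document goes into the in-memory buffer and the transaction log, a refresh turns the buffer into a searchable segment, and a flush commits segments and clears the log. This is a deliberately simplified sketch, not how Lucene or Elasticsearch are implemented; the class `ToyShard` and its attribute names are invented for the example.

```python
class ToyShard:
    """Toy model of the buffer -> segment -> commit life cycle of a document."""

    def __init__(self):
        self.buffer = []              # in-memory indexing buffer (not searchable)
        self.translog = []            # transaction log of uncommitted operations
        self.cached_segments = []     # segments in the file system cache (searchable)
        self.committed_segments = []  # segments that are part of a commit point on disk

    def index(self, doc: dict) -> None:
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self) -> None:
        """Turn the buffer into a new searchable segment."""
        if self.buffer:
            self.cached_segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self) -> None:
        """Commit all segments and clear the transaction log."""
        self.refresh()
        self.committed_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        self.translog.clear()

    def search(self, field: str, value) -> list:
        segments = self.committed_segments + self.cached_segments
        return [doc for seg in segments for doc in seg if doc.get(field) == value]


shard = ToyShard()
shard.index({"name": "Umagi", "course": "Fiction", "price": 2000})
before_refresh = shard.search("name", "Umagi")   # still only in the buffer
shard.refresh()
after_refresh = shard.search("name", "Umagi")    # now in a searchable segment
shard.flush()
```

The gap between `index` and `refresh` in this model is what the refresh interval controls, and it is why search is near real-time rather than immediate.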
