Elasticsearch
Under the hood
August 2018
Disclaimer
● Elasticsearch 6.3
● Lucene 7.3
What is it? What is it good for?
● Full text search
● Scalable and robust full text search
What is Lucene? What does it have to do with Elasticsearch?
● Full text search library
● Elasticsearch is a wrapper around it that adds scalability, durability, and a REST API
API level
Index document
POST doc/default
{
"class": "ConfigurationParser",
"method": "build",
"description": "<p class='paragraph'>Creates an instance of <link>ObjectGenerator</link> based on provided configuration. Resulting <link>ObjectGenerator</link> will try to convert configured output to specified <code>objectType</code>.</p>"
}
API level
Update document
POST doc/default/1/_update
{
"doc": { "description": "New description value" }
}
API level
Delete document
DELETE doc/default/1
API level
Search
GET doc/default/_search
{
"query": {
"match": {
"description": {
"query": "ObjectGenerator"
}
}
}
}
Cluster level
What does that look like?
[Diagram: an ES cluster named “Elasticsearch” with four ES nodes. Each node is identified by a UUID and a name and contains a REST layer, a transaction log, and Lucene.]
Are all nodes of the same type?
● Master node
● Data node
● Ingest node
● Tribe node
● Coordinator node
● Default case: a node is master-eligible, data, and ingest at the same time
Shards
[Diagram: an index of 12 documents (Doc1 to Doc12) split into five shards that are spread across nodes]
● Index - S1: Doc1, Doc4, Doc8
● Index - S2: Doc3, Doc5, Doc6
● Index - S3: Doc2, Doc11
● Index - S4: Doc7, Doc10
● Index - S5: Doc9, Doc12
Replicas
[Diagram: five nodes, each holding one copy of shard S1 (Doc1, Doc4, Doc8): the primary (Index - S1 P) and four replicas (Index - S1 R1 to Index - S1 R4)]
Scaling
● Single node cluster with 3 shards and 1 replica
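● Unassigned shards problem: a replica is never placed on the node that holds its primary, so all three replicas stay unassigned
Single node cluster state:
Node 1: S1 P, S2 P, S3 P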
Scaling out
● Two node cluster with 3 shards and 1 replica
● To prevent unassigned shards: number of nodes ≥ number of replicas + 1
Two node cluster state:
Node 1: S1 P, S2 P, S3 P
Node 2: S1 R, S2 R, S3 R
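The rule above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the only constraint is that two copies of the same shard never share a node (disk limits, allocation filters, and similar rules are ignored); `unassigned_copies` is a hypothetical helper, not an Elasticsearch API.

```python
def unassigned_copies(nodes, shards, replicas):
    """Count shard copies that cannot be placed (hypothetical helper).

    Assumes the only allocation rule is that two copies of the same
    shard never live on the same node.
    """
    copies_per_shard = 1 + replicas           # primary + replicas
    placeable = min(copies_per_shard, nodes)  # at most one copy per node
    return shards * (copies_per_shard - placeable)


if __name__ == "__main__":
    # Single node, 3 shards, 1 replica: the three replicas stay unassigned.
    print(unassigned_copies(nodes=1, shards=3, replicas=1))
    # Two nodes, 3 shards, 1 replica: everything is assigned.
    print(unassigned_copies(nodes=2, shards=3, replicas=1))
```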
Scaling out
● Three node cluster with 3 shards and 1 replica
● Load spread across all nodes
Three node cluster state:
Node 1: S1 P, S2 P
Node 2: S2 R and one copy of S3
Node 3: S1 R and the other copy of S3
Scaling out
● Seven node cluster with 3 shards and 1 replica, one node unused
● Increase number of shards (not possible in place on an existing index)
● Increase replication factor (possible on running cluster)
Seven node cluster state:
Node 1: S1 P
Node 2: S2 P
Node 3: S3 P
Node 4: S1 R
Node 5: S2 R
Node 6: S3 R
Node 7: (no shards)
Scaling in
● Make sure enough nodes remain to support the replication factor after a node is killed
● Wait for green status
● If necessary, lower the replication factor
Replacing the node
● Same as scale out / scale in
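What does a write look like?
[Diagram: an ES cluster of six nodes holding primaries (P), in-sync replicas (ISR), and replicas (R) of shards S1 to S4; the write request arrives at the coordinator node]
Node 1: S2 ISR, S3 P
Node 2: S2 P, S3 R
Node 3: S1 ISR, S2 ISR
Node 4: S1 R, S4 P
Node 5: S1 ISR, S4 ISR
Node 6: S1 P, S3 ISR, S4 ISR
Coordinator node:
● Generate doc ID if not present
● Hash ID to determine replication group (routing param)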
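The two coordinator steps can be sketched as follows. This is an illustration only: `zlib.crc32` and `uuid.uuid4` are stand-ins chosen for the sketch (Elasticsearch uses its own hash function and ID generator), and `route` is a hypothetical helper. The part that matters is the shape of the rule: shard = hash(routing value) mod number of primary shards, where the routing value defaults to the document ID.

```python
import uuid
import zlib


def route(doc_id=None, routing=None, num_primary_shards=4):
    """Return (doc_id, shard_number) for a write (hypothetical helper)."""
    if doc_id is None:
        # Step 1: generate a doc ID if the client did not supply one.
        doc_id = uuid.uuid4().hex
    # Step 2: hash the routing value (the doc ID unless overridden).
    key = routing if routing is not None else doc_id
    shard = zlib.crc32(key.encode("utf-8")) % num_primary_shards
    return doc_id, shard


if __name__ == "__main__":
    print(route("1"))                    # same ID always maps to the same shard
    print(route())                       # generated ID
    print(route("42", routing="user1"))  # custom routing value
```

Because the shard number depends on the number of primary shards, changing that number would send existing IDs to different shards.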
What does a read look like?
[Diagram: the same six-node ES cluster as in the write path; the search request arrives at the coordinator node]
Coordinator node:
● Resolve search request to relevant shards
● Combine the results
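The "combine the results" step can be sketched as a merge of per-shard top hits. This is a minimal sketch, assuming each shard returns `(score, doc_id)` pairs already sorted by descending score; `combine` is a hypothetical name and the scores are made up for illustration.

```python
import heapq
import itertools


def combine(shard_results, size):
    """Merge per-shard hit lists (each sorted by descending score)
    and keep the global top `size` hits."""
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(itertools.islice(merged, size))


if __name__ == "__main__":
    shards = [
        [(0.90, "a"), (0.20, "b")],  # hits from shard 1
        [(0.80, "c")],               # hits from shard 2
        [(0.95, "d"), (0.10, "e")],  # hits from shard 3
    ]
    print(combine(shards, size=3))
```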
What if something goes wrong?
● Network partition (one master)
● Network partition (two masters)
● Network partition (three masters)
● Primary shard node failure (in sync replicas available)
● Primary shard node failure (no in sync replicas available)
● Write replication (replica write failure)
● Node failure (read)
What if something goes wrong?
Network partition (one master)
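● discovery.zen.no_master_block: controls which operations are rejected while the cluster has no active master
[Diagram: ES cluster with one master node and copies of shards 1 and 2 (primaries and replicas), split by a network partition]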
What if something goes wrong?
Network partition (two masters)
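● discovery.zen.minimum_master_nodes: set it to a quorum of master-eligible nodes, (N / 2) + 1, so that two sides of a partition cannot both elect a master
[Diagram: ES cluster with two master nodes and copies of shards 1 and 2, split by a network partition]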
What if something goes wrong?
Network partition (three masters)
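● discovery.zen.minimum_master_nodes: with three master-eligible nodes the quorum is 2, so only the side of the partition that still sees two masters can elect one
[Diagram: ES cluster with three master nodes and copies of shards 1 and 2, split by a network partition]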
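The quorum behind discovery.zen.minimum_master_nodes is simple integer arithmetic. A minimal sketch, with hypothetical helper names, showing why two master-eligible nodes cannot survive a split while three can:

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (N / 2) + 1 with integer division."""
    return master_eligible // 2 + 1


def can_elect_master(partition_size, master_eligible):
    """True if one side of a network partition holds a quorum."""
    return partition_size >= minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    # Two master-eligible nodes split 1/1: neither side has a quorum of 2.
    print(can_elect_master(1, 2))
    # Three master-eligible nodes split 2/1: only the side with two nodes wins.
    print(can_elect_master(2, 3), can_elect_master(1, 3))
```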
What if something goes wrong?
Primary shard node failure (in sync replicas available)
[Diagram: ES cluster in which the node holding Shard 1 P fails; an in-sync replica (Shard 1 ISR) becomes the new Shard 1 P]
What if something goes wrong?
Primary shard node failure (no in sync replicas available)
[Diagram: ES cluster in which the node holding Shard 1 P fails and only stale replicas (Shard 1 R) remain; a stale copy is promoted manually with the allocate_stale_primary command]
What if something goes wrong?
Write replication (replica write failure)
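[Diagram: the coordinator node sends a write to Shard 1 P, which replicates it to the in-sync replicas (Shard 1 ISR); a replica whose write fails drops out of the in-sync set and is shown as Shard 1 R]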
What if something goes wrong?
Node failure (read)
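[Diagram: the coordinator node reads from one copy of each shard (Shard 1 P or Shard 1 R, Shard 2 P or Shard 2 R); when the node holding one copy fails, the other copy of that shard serves the read]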
Lucene level
How is a write processed?
[Diagram: Elasticsearch hands the document to Lucene, where it passes through: Character filters → Tokenizer → Token filters → File storage]
How is a write processed?
● Mapping character filter
● HTML strip character filter
● Pattern replace character filter
Character filters
Input:
<p class='paragraph'>Creates an instance of <link>ObjectGenerator</link> based on provided configuration. Resulting <link>ObjectGenerator</link> will try to convert configured output to specified <code>objectType</code>.</p>
Output:
Creates an instance of ObjectGenerator based on provided configuration. Resulting ObjectGenerator will try to convert configured output to specified objectType.
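The effect of an HTML strip character filter on the example above can be approximated in a few lines. This is a rough sketch using a regular expression and `html.unescape`; the real filter handles more cases, and `strip_html` is a hypothetical name.

```python
import html
import re

# Matches anything that looks like an opening or closing tag.
TAG = re.compile(r"<[^>]+>")


def strip_html(text):
    """Remove tags and decode HTML entities (rough approximation)."""
    return html.unescape(TAG.sub("", text))


if __name__ == "__main__":
    source = (
        "<p class='paragraph'>Creates an instance of "
        "<link>ObjectGenerator</link> based on provided configuration.</p>"
    )
    print(strip_html(source))
```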
How is a write processed?
● Standard tokenizer
● Keyword tokenizer
● Letter tokenizer
● Lowercase tokenizer
● N gram tokenizer
● Edge n gram tokenizer
● Regular expression pattern tokenizer
Tokenizer
Input:
Creates an instance of ObjectGenerator based on provided configuration. Resulting ObjectGenerator will try to convert configured output to specified objectType.
Output:
[Creates, an, instance, of, ObjectGenerator, based, on, provided, configuration, Resulting, ObjectGenerator, will, try, to, convert, configured, output, to, specified, objectType]
How is a write processed?
● Lower case filter
● English possessive filter
● Stop filter
● Synonym filter
● Reversed wildcard filter
● English minimal stem filter
Token filters
Input:
[Creates, an, instance, of, ObjectGenerator, based, on, provided, configuration, Resulting, ObjectGenerator, will, try, to, convert, configured, output, to, specified, objectType]
Output (lower case filter, then stop filter):
[creates, instance, objectgenerator, based, provided, configuration, resulting, objectgenerator, try, convert, configured, output, specified, objecttype]
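The tokenizer and token filter stages can be sketched together. This is a simplified illustration, not Lucene's implementation: the tokenizer is a plain `\w+` regular expression, only the lower case and stop filters are applied, and `STOP_WORDS` is an assumed English stop list.

```python
import re

# Assumed English stop list for this sketch.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def tokenize(text):
    """Split text into word tokens (stand-in for a standard tokenizer)."""
    return re.findall(r"\w+", text)


def analyze(text):
    """Tokenizer -> lower case filter -> stop filter."""
    tokens = tokenize(text)
    tokens = [token.lower() for token in tokens]
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(analyze("Resulting ObjectGenerator will try to convert configured output"))
```

Filters run in order, so the stop filter sees tokens that are already lowercased.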
Inverted index
File storage - Logical view
● Difference between forward and inverted index
● Doc1: “Peter has a brown dog and a white cat”
● Doc2: “Mike has a black dog”
● Doc3: “Rachel has a brown cat”
Forward index
Document   Terms
Doc1       brown, cat, dog, peter, white
Doc2       black, dog, mike
Doc3       brown, cat, rachel
Segment
Inverted index
Term ordinal   Term (terms dict)   Postings list (doc ids)
0              black               1
1              brown               0, 2
2              cat                 0, 2
3              dog                 0, 1
4              mike                1
5              peter               0
6              rachel              2
7              white               0
Documents
Doc id   Document
0        Doc1 (Peter has a brown dog and a white cat.)
1        Doc2 (Mike has a black dog.)
2        Doc3 (Rachel has a brown cat.)
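Building the inverted index for the three example documents can be sketched as below. The stop list (`a`, `and`, `has`) is an assumption chosen so that the result matches the table on the slide; `build_inverted_index` is a hypothetical helper.

```python
import re

DOCS = [
    "Peter has a brown dog and a white cat",  # doc id 0
    "Mike has a black dog",                   # doc id 1
    "Rachel has a brown cat",                 # doc id 2
]
STOP = {"a", "and", "has"}  # assumed stop list for this example


def build_inverted_index(docs):
    """Map each term to the sorted list of doc ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        terms = set(re.findall(r"\w+", text.lower())) - STOP
        for term in terms:
            index.setdefault(term, []).append(doc_id)
    # Sort terms so that a term's position doubles as its ordinal.
    return dict(sorted(index.items()))


if __name__ == "__main__":
    for ordinal, (term, postings) in enumerate(build_inverted_index(DOCS).items()):
        print(ordinal, term, postings)
```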
Inverted index
File storage - Logical view (deleting documents)
[Diagram: the same segment as in the logical view above (inverted index and documents 0 to 2), with an added live documents list: 0, 1, 2]
Deleting a document only removes its doc id from the live documents list; the inverted index of the segment is left unchanged.
Inverted index
File storage - Logical view (merge segments)
Segment 1
Inverted index
Term ordinal   Term     Postings list
0              black    1
1              brown    0, 2
2              cat      0, 2
3              dog      0, 1
4              mike     1
5              peter    0
6              rachel   2
7              white    0
Documents
0 Doc1
1 Doc2
2 Doc3
Segment 2
Inverted index
Term ordinal   Term      Postings list
0              balloon   3
1              boy       3, 4
2              brown     4
3              cat       4
4              little    3, 4
5              red       3
Documents
3 Doc4
4 Doc5
Merged segment (doc id 0 has been deleted, so Doc1 and the terms that only it contains are dropped)
Inverted index
Term ordinal   Term      Postings list
0              balloon   3
1              black     1
2              boy       3, 4
3              brown     2, 4
4              cat       2, 4
5              dog       1
6              little    3, 4
7              mike      1
8              rachel    2
9              red       3
Documents
1 Doc2
2 Doc3
3 Doc4
4 Doc5
Live documents
1
2
3
4
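The merge on this slide can be sketched as a union of postings lists that skips deleted documents. This is a logical-view sketch only: it keeps the slide's global doc ids, and `merge_segments` is a hypothetical helper.

```python
SEGMENT_1 = {
    "black": [1], "brown": [0, 2], "cat": [0, 2], "dog": [0, 1],
    "mike": [1], "peter": [0], "rachel": [2], "white": [0],
}
SEGMENT_2 = {
    "balloon": [3], "boy": [3, 4], "brown": [4], "cat": [4],
    "little": [3, 4], "red": [3],
}


def merge_segments(segments, live_docs):
    """Merge term -> postings maps, dropping doc ids that are not live."""
    merged = {}
    for postings_by_term in segments:
        for term, postings in postings_by_term.items():
            kept = [doc_id for doc_id in postings if doc_id in live_docs]
            if kept:  # terms left with no live documents disappear
                merged.setdefault(term, []).extend(kept)
    return {term: sorted(ids) for term, ids in sorted(merged.items())}


if __name__ == "__main__":
    # Doc id 0 was deleted, so only ids 1 to 4 are live.
    for term, postings in merge_segments([SEGMENT_1, SEGMENT_2], {1, 2, 3, 4}).items():
        print(term, postings)
```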
What does a write look like?
Logical view (write path, refresh & commit)
[Diagram: Elasticsearch writes each operation both to the Lucene in-memory buffer and to the transaction log. Segments (Seg 1 to Seg 5) are shown in memory and on disk; a flush moves segments to disk and records a commit point.]
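The write path can be modeled with a toy class. This is a deliberately simplified sketch of the logical view, not of the real implementation: `ToyShard` and its attributes are hypothetical names, segments are plain Python lists, and durability is reduced to a counter.

```python
class ToyShard:
    """Toy model of the write path: buffer, transaction log, refresh, flush."""

    def __init__(self):
        self.buffer = []             # in-memory buffer, not yet searchable
        self.translog = []           # transaction log, replayed after a crash
        self.segments = []           # searchable segments
        self.committed_segments = 0  # segments covered by the last commit point

    def index(self, doc):
        # Every write goes to the in-memory buffer and the transaction log.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Refresh turns the buffer into a new searchable segment.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Flush commits all segments and lets the transaction log be trimmed.
        self.refresh()
        self.committed_segments = len(self.segments)
        self.translog.clear()

    def search(self, predicate):
        return [doc for segment in self.segments for doc in segment if predicate(doc)]


if __name__ == "__main__":
    shard = ToyShard()
    shard.index("doc-a")
    print(shard.search(lambda doc: True))  # nothing visible before refresh
    shard.refresh()
    print(shard.search(lambda doc: True))  # visible after refresh
    shard.flush()
    print(shard.committed_segments, shard.translog)
```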
What does a read look like?
Logical view (read path)
[Diagram: a read in Lucene queries every segment (Seg1 to Seg4, in memory and on disk up to the commit point) and merges the per-segment results into one response]
Compaction
[Diagram: in Lucene, the on-disk segments Seg1, Seg2, Seg3, and Seg4 are compacted into a single segment Seg5, recorded by a commit point]
Lucene low level
Lucene codecs
● Abstraction over data format within files
● Keeps low level details away from Lucene
● File formats are codec-dependent
File formats
● Each index lives in a separate UUID-named directory (done by Elasticsearch, to prevent index corruption when an index is recreated)
● Segments file (segments_1, segments_2, …)
● Lock file (write.lock)
Per-index files
File formats
● Segment info (.si) - Lucene version, number of docs, OS, OS version, Java version, files included
● Term index (.tip) - Index into the Term Dictionary
● Term Dictionary (.tim) - Stores term info
● Postings (.doc, .pos) - Doc ids and term frequencies (.doc) and the positions of each term within documents (.pos)
● Field index (.fdx) - Index into Field Data
● Field Data (.fdt) - Stored fields for documents (real values)
● ...
Per-segment files
What does a read look like?
Term query
Term = tomato
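What does a read look like?
File view - Term Index
● FST (Finite State Transducer)
● Stores term prefixes
● Map String -> Object
[Diagram: FST with arcs labeled character / output: t / 1, b / 7, o / 1, r / 2, h / 3, e / 6. The value of a prefix is the sum of the outputs along its path.]
to = 2
thr = 6
br = 9
the = 10
be = 13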
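The prefix values on this slide can be reproduced with a toy transducer. This sketch stores the arcs as nested dictionaries and sums the outputs along the path; the state names are made up, and Lucene's actual FST is a compact byte-level structure that this does not attempt to model.

```python
# state -> {character: (output, next_state)}; state names are hypothetical.
FST = {
    "start": {"t": (1, "t"), "b": (7, "b")},
    "t": {"o": (1, "end"), "h": (3, "th")},
    "th": {"r": (2, "end"), "e": (6, "end")},
    "b": {"r": (2, "end"), "e": (6, "end")},
    "end": {},
}


def lookup(prefix):
    """Return the summed output for a stored prefix, or None if absent."""
    state, total = "start", 0
    for char in prefix:
        arc = FST[state].get(char)
        if arc is None:
            return None
        output, state = arc
        total += output
    return total if state == "end" else None


if __name__ == "__main__":
    for prefix in ["to", "thr", "br", "the", "be", "tx"]:
        print(prefix, lookup(prefix))
```

Shared arcs such as `t / 1` are stored once and reused by every prefix that starts with that character.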
What does a read look like?
File view - Term Dictionary
● Jump to the given block (offset)
● Number of items within the block (25 - 48)
... ... ...
[Prefix = to] (jump here, then scan the suffixes for "mato")
Suffix   Frequency   Offset   Scan result
ad       1           102      Not found
ast      1           135      Not found
aster    2           167      Not found
day      7           211      Not found
e        2           233      Not found
ilette   3           251      Not found
mato     8           287      Found
nic      5           309
oth      3           355
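The scan inside a term dictionary block can be sketched as a linear search over suffixes. The block contents are taken from the slide; `find_in_block` is a hypothetical helper and the tuple layout is an assumption made for the example.

```python
# (suffix, frequency, postings offset) entries of the block with prefix "to".
BLOCK_TO = [
    ("ad", 1, 102), ("ast", 1, 135), ("aster", 2, 167), ("day", 7, 211),
    ("e", 2, 233), ("ilette", 3, 251), ("mato", 8, 287), ("nic", 5, 309),
    ("oth", 3, 355),
]


def find_in_block(prefix, block, term):
    """Scan a block for `term`; return (frequency, offset) or None."""
    if not term.startswith(prefix):
        return None
    suffix = term[len(prefix):]
    for entry_suffix, frequency, offset in block:
        if entry_suffix == suffix:
            return frequency, offset
    return None


if __name__ == "__main__":
    print(find_in_block("to", BLOCK_TO, "tomato"))
    print(find_in_block("to", BLOCK_TO, "tofu"))
```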
What does a read look like?
File view - Postings lists
● Jump to the given offset in the postings list
● Encoded using modified FOR (Frame of Reference) delta
○ delta-encode
○ split into blocks of N = 128 values
○ bit packing per block
○ if remaining, encode with vInt
Example with N = 4
Offset 287, document ids: 1, 3, 4, 6, 8, 20, 22, 26, 30, 31
Delta encode: 1, 2, 1, 2, 2, 12, 2, 4, 4, 1
Split to blocks: [1, 2, 1, 2] [2, 12, 2, 4] 4, 1
● [1, 2, 1, 2]: 2 bits per value, total: 1 byte
● [2, 12, 2, 4]: 4 bits per value, total: 2 bytes
● 4, 1: vInt encoded, 1 byte per value, total: 2 bytes
Uncompressed: 40 (10 * 4) bytes
Compressed: 5 (1 + 2 + 2) bytes
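The arithmetic of the example can be reproduced with a short size calculator. This sketch only counts bytes and ignores any per-block header, matching the slide's numbers; the function names are hypothetical.

```python
def delta_encode(doc_ids):
    """Replace each doc id with its difference from the previous one."""
    deltas, previous = [], 0
    for doc_id in doc_ids:
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas


def vint_size(value):
    """Bytes needed for a variable-length int (7 payload bits per byte)."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def compressed_size(doc_ids, block_size=4):
    """Bytes used by bit-packed full blocks plus a vInt-encoded remainder."""
    deltas = delta_encode(doc_ids)
    full = len(deltas) - len(deltas) % block_size
    total = 0
    for start in range(0, full, block_size):
        block = deltas[start:start + block_size]
        bits = max(1, max(block).bit_length())  # bits per value in this block
        total += (bits * block_size + 7) // 8   # round up to whole bytes
    total += sum(vint_size(value) for value in deltas[full:])
    return total


if __name__ == "__main__":
    ids = [1, 3, 4, 6, 8, 20, 22, 26, 30, 31]
    print(delta_encode(ids))
    print(compressed_size(ids))
```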
What does a read look like?
File view - Field Index & Field Data
● Stored sequentially
● Compressed using LZ4 in 16+KB blocks
Field Index
Starting doc id   Offset
0                 33
4                 188
5                 312
7                 423
12                605
13                811
20                934
25                1084
Field Data (blocks of 16KB)
[Offset = 33] Doc 0, Doc 1, Doc 2, Doc 3
[Offset = 188] Doc 4
[Offset = 312] Doc 5, Doc 6
[Offset = 423] Doc 7, ...
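Finding the block that holds a given document is a binary search over the starting doc ids. A minimal sketch using the table above; `block_offset` is a hypothetical helper, and decompressing the block is left out.

```python
import bisect

# (starting doc id, offset into the field data file), from the slide.
FIELD_INDEX = [
    (0, 33), (4, 188), (5, 312), (7, 423),
    (12, 605), (13, 811), (20, 934), (25, 1084),
]
STARTS = [start for start, _ in FIELD_INDEX]


def block_offset(doc_id):
    """Return the offset of the block whose doc id range contains doc_id."""
    position = bisect.bisect_right(STARTS, doc_id) - 1
    if position < 0:
        raise KeyError(doc_id)
    return FIELD_INDEX[position][1]


if __name__ == "__main__":
    for doc_id in (0, 3, 4, 6, 11):
        print(doc_id, block_offset(doc_id))
```

After the jump, the block is decompressed and scanned sequentially for the requested document.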
References
● https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index.html
● https://lucene.apache.org/core/7_3_1/index.html
● https://www.elastic.co/blog/tracking-in-sync-shard-copies
● https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
● https://www.youtube.com/watch?v=T5RmMNDR5XI
● https://www.youtube.com/watch?v=c9O5_a50aOQ
● https://www.elastic.co/guide/en/elasticsearch/guide/current/distributed-cluster.html
● http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html
Thank you

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Elasticsearch Under the Hood

  • 4. What is it? What is it good for?
      ● Full text search
      ● Scalable and robust full text search
  • 5. What is Lucene? What does it have to do with Elasticsearch?
      ● Full text search library
      ● Elasticsearch is a wrapper around it that provides scalability, durability and a REST API
  • 6. API level: Index document
      POST doc/default
      {
        "class": "ConfigurationParser",
        "method": "build",
        "description": "<p class='paragraph'>Creates an instance of <link>ObjectGenerator</link> based on provided configuration. Resulting <link>ObjectGenerator</link> will try to convert configured output to specified <code>objectType</code>.</p>"
      }
  • 7. API level: Update document
      POST doc/default/1/_update
      {
        "doc": { "description": "New description value" }
      }
  • 9. API level: Search
      GET doc/default/_search
      {
        "query": {
          "match": {
            "title": {
              "query": "ObjectGenerator"
            }
          }
        }
      }
  • 11. What does it look like?
      [Diagram: an ES cluster named “Elasticsearch” made up of four ES nodes; each node has a UUID and a name and contains a REST layer, Lucene and a transaction log]
  • 12. Are all nodes of the same type?
      ● Master node
      ● Data node
      ● Ingest node
      ● Tribe node
      ● Coordinator node
      ● Default case
  • 14. Replicas
      [Diagram: five nodes, each holding one copy of shard S1 of the index (the primary S1 P and the replicas S1 R1 to S1 R4); every copy contains Doc1, Doc4 and Doc8]
  • 15. Scaling
      ● Single node cluster with 3 shards and 1 replica
      ● Unassigned shards problem: a replica is never allocated on the same node as its primary, so all replicas stay unassigned
      Single node cluster state:
        Node 1: S1 P, S2 P, S3 P
  • 16. Scaling out
      ● Two node cluster with 3 shards and 1 replica
      ● To prevent unassigned shards: number of nodes ≥ number of replicas + 1
      Two node cluster state:
        Node 1: S1 P, S2 P, S3 P
        Node 2: S1 R, S2 R, S3 R
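
The node-count rule can be checked with a small calculation. This is only a sketch: it assumes the single constraint that two copies of the same shard never share a node, and the function name `unassigned_copies` is made up for illustration. It ignores any other allocation settings.

```python
def unassigned_copies(nodes: int, primary_shards: int, replicas: int) -> int:
    """Count shard copies that cannot be placed when copies of one shard
    must live on different nodes (simplified model, not the real allocator)."""
    copies_per_shard = replicas + 1  # one primary plus its replicas
    missing_per_shard = max(0, copies_per_shard - nodes)
    return primary_shards * missing_per_shard


# Single node, 3 shards, 1 replica: all three replicas stay unassigned.
print(unassigned_copies(nodes=1, primary_shards=3, replicas=1))
# Two nodes, 3 shards, 1 replica: every copy finds a node.
print(unassigned_copies(nodes=2, primary_shards=3, replicas=1))
```

In this model the two-node cluster with one replica already has nothing unassigned, which is why the condition reads as "at least replicas + 1 nodes".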
  • 17. Scaling out
      ● Three node cluster with 3 shards and 1 replica
      ● Load spread across all nodes
      Three node cluster state:
        Node 1: S1 P, S2 P
        Node 2: S2 R, S3 P
        Node 3: S1 R, S3 R
  • 18. Scaling out
      ● Seven node cluster with 3 shards and 1 replica, one node unused
      ● Increase number of shards (not possible)
      ● Increase replication factor (possible on a running cluster)
      Seven node cluster state:
        Node 1: S1 P
        Node 2: S2 P
        Node 3: S3 P
        Node 4: S1 R
        Node 5: S2 R
        Node 6: S3 R
        Node 7: (no shards)
  • 19. Scaling in
      ● Make sure you have enough nodes to support the replication factor when a node is killed
      ● Wait for green status
      ● If necessary, lower the replication factor
  • 20. Replacing a node
      ● Same as scale out / scale in
  • 21. What does a write look like?
      ● Coordinator node: generate doc ID if not present
      ● Coordinator node: hash ID to determine replication group (routing param)
      ES cluster state:
        Node 1: S2 ISR, S3 P
        Node 2: S2 P, S3 R
        Node 3: S1 ISR, S2 ISR
        Node 4: S1 R, S4 P
        Node 5: S1 ISR, S4 ISR
        Node 6: S1 P, S3 ISR, S4 ISR
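
The two coordinator steps on this slide can be sketched as follows. This is a simplified illustration, not Elasticsearch's implementation: `zlib.crc32` stands in for the real hash function, and the helper names `ensure_doc_id` and `pick_shard` are hypothetical.

```python
import uuid
import zlib
from typing import Optional


def ensure_doc_id(doc_id: Optional[str]) -> str:
    """Generate a document ID when the client did not supply one."""
    return doc_id if doc_id is not None else uuid.uuid4().hex


def pick_shard(doc_id: str, num_primary_shards: int,
               routing: Optional[str] = None) -> int:
    """Map a document to a shard number (its replication group).

    The routing value defaults to the document ID. crc32 is a stand-in
    hash chosen for this sketch only.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


doc_id = ensure_doc_id(None)
print(doc_id, "->", "S%d" % (pick_shard(doc_id, 4) + 1))
```

Because the shard is derived from a hash modulo the number of primary shards, changing that number would send existing IDs to different shards, which fits the earlier remark that the shard count cannot simply be increased.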
  • 22. What does a read look like?
      ● Coordinator node: resolve search request to relevant shards
      ● Coordinator node: combine the results
      ES cluster state:
        Node 1: S2 ISR, S3 P
        Node 2: S2 P, S3 R
        Node 3: S1 ISR, S2 ISR
        Node 4: S1 R, S4 P
        Node 5: S1 ISR, S4 ISR
        Node 6: S1 P, S3 ISR, S4 ISR
  • 23. What if something goes wrong?
      ● Network partition (one master)
      ● Network partition (two masters)
      ● Network partition (three masters)
      ● Primary shard node failure (in sync replicas available)
      ● Primary shard node failure (no in sync replicas available)
      ● Write replication (replica write failure)
      ● Node failure (read)
  • 24. What if something goes wrong? Network partition (one master)
      ● discovery.zen.no_master_block
      [Diagram: ES cluster with one master node; shard copies Shard 1 P, Shard 1 R, Shard 2 P, Shard 2 R, Shard 2 R]
  • 25. What if something goes wrong? Network partition (two masters)
      ● discovery.zen.minimum_master_nodes
      [Diagram: ES cluster with two master nodes; shard copies Shard 1 P, Shard 1 R, Shard 2 P, Shard 2 R]
  • 26. What if something goes wrong? Network partition (three masters)
      ● discovery.zen.minimum_master_nodes
      [Diagram: ES cluster with three master nodes; shard copies Shard 1 P, Shard 1 R, Shard 2 P]
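
A short sketch of why the number of master-eligible nodes matters during a partition. It assumes the usual quorum formula for `discovery.zen.minimum_master_nodes`, (master-eligible nodes / 2) + 1; the function names are illustrative only.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum size: more than half of the master-eligible nodes."""
    return master_eligible // 2 + 1


def can_elect_master(visible: int, master_eligible: int) -> bool:
    """True if one side of a partition sees enough master-eligible nodes."""
    return visible >= minimum_master_nodes(master_eligible)


# Two master-eligible nodes split 1 / 1: neither side reaches the quorum of 2.
print(can_elect_master(1, 2), can_elect_master(1, 2))
# Three master-eligible nodes split 2 / 1: only the larger side keeps a master.
print(can_elect_master(2, 3), can_elect_master(1, 3))
```

With a quorum, at most one side of a partition can hold a master, so two masters cannot accept conflicting writes at the same time.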
  • 27. What if something goes wrong? Primary shard node failure (in sync replicas available)
      [Diagram: ES cluster with a master node; labels Shard 1 P, Shard 1 ISR, Shard 1 R, Shard 2 P, Shard 2 ISR, Shard 1 P (new)]
  • 28. What if something goes wrong? Primary shard node failure (no in sync replicas available)
      ● allocate_stale_primary cmd
      [Diagram: ES cluster with a master node; labels Shard 1 P, Shard 1 R, Shard 1 R, Shard 2 P, Shard 2 ISR, Shard 1 ISR]
  • 29. What if something goes wrong? Write replication (replica write failure)
      [Diagram: ES cluster with a master node and a coordinator node; labels Shard 1 P, Shard 1 ISR, Shard 1 ISR, Shard 1 R, Shard 1 R]
  • 30. What if something goes wrong? Node failure (read)
      [Diagram: ES cluster with a master node and a coordinator node; labels Shard 1 P, Shard 1 R, Shard 2 P, Shard 2 R]
  • 32. How is a write processed?
      [Diagram: pipeline across Elasticsearch and Lucene: Character filters → Tokenizer → Token filters → File storage]
  • 33. How is a write processed? Character filters
      ● Mapping character filter
      ● HTML strip character filter
      ● Pattern replace character filter
      Input: <p class='paragraph'>Creates an instance of <link>ObjectGenerator</link> based on provided configuration. Resulting <link>ObjectGenerator</link> will try to convert configured output to specified <code>objectType</code>.</p>
      Output: Creates an instance of ObjectGenerator based on provided configuration. Resulting ObjectGenerator will try to convert configured output to specified objectType.
  • 34. How is a write processed? Tokenizer
      ● Standard tokenizer
      ● Keyword tokenizer
      ● Letter tokenizer
      ● Lowercase tokenizer
      ● N gram tokenizer
      ● Edge n gram tokenizer
      ● Regular expression pattern tokenizer
      Input: Creates an instance of ObjectGenerator based on provided configuration. Resulting ObjectGenerator will try to convert configured output to specified objectType.
      Output: [Creates, an, instance, of, ObjectGenerator, based, on, provided, configuration, Resulting, ObjectGenerator, will, try, to, convert, configured, output, to, specified, objectType]
  • 35. How is a write processed? Token filters
      ● Lower case filter
      ● English possessive filter
      ● Stop filter
      ● Synonym filter
      ● Reversed wildcard filter
      ● English minimal stem filter
      Input: [Creates, an, instance, of, ObjectGenerator, based, on, provided, configuration, Resulting, ObjectGenerator, will, try, to, convert, configured, output, to, specified, objectType]
      Output: [creates, instance, ObjectGenerator, based, provided, configuration, resulting, try, convert, configured, output, specified, objectType]
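
The three analysis stages can be imitated in a few lines of Python. This is a rough sketch and not how Elasticsearch or Lucene implement them: a regular expression stands in for the HTML strip character filter, `\w+` matching stands in for the standard tokenizer, and only the lower case and stop filters are modeled. The stop word set is an assumption of this sketch. Every token is lowercased here, so `ObjectGenerator` comes out as `objectgenerator`.

```python
import re

# Assumed English stop word list for this sketch.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def strip_html(text: str) -> str:
    """Character filter stage: drop anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", text)


def tokenize(text: str) -> list:
    """Tokenizer stage: split into runs of word characters."""
    return re.findall(r"\w+", text)


def apply_token_filters(tokens: list) -> list:
    """Token filter stage: lowercase, then remove stop words."""
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOP_WORDS]


def analyze(text: str) -> list:
    return apply_token_filters(tokenize(strip_html(text)))


print(analyze("<p>Creates an instance of <link>ObjectGenerator</link>.</p>"))
```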
  • 36. Inverted index. File storage - Logical view
      ● Difference between forward and inverted index
      ● Doc1: “Peter has a brown dog and a white cat”
      ● Doc2: “Mike has a black dog”
      ● Doc3: “Rachel has a brown cat”
      Forward index
        Doc1   brown, cat, dog, peter, white
        Doc2   black, dog, mike
        Doc3   brown, cat, rachel
      Segment: inverted index
        term ordinal   terms dict   postings list
        0              black        1
        1              brown        0, 2
        2              cat          0, 2
        3              dog          0, 1
        4              mike         1
        5              peter        0
        6              rachel       2
        7              white        0
      Segment: documents
        doc id   document
        0        Doc1 (Peter has a brown dog and a white cat.)
        1        Doc2 (Mike has a black dog.)
        2        Doc3 (Rachel has a brown cat.)
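
The inverted index on this slide can be rebuilt with a short script. This is a sketch only: the stop word set `{"has", "a", "and"}` is an assumption chosen so that the result matches the terms shown on the slide, and whitespace splitting stands in for a real analyzer.

```python
from collections import defaultdict

# Assumed stop words, chosen to reproduce the terms listed on the slide.
STOP_WORDS = {"has", "a", "and"}

DOCS = [
    "Peter has a brown dog and a white cat",  # doc id 0 (Doc1)
    "Mike has a black dog",                   # doc id 1 (Doc2)
    "Rachel has a brown cat",                 # doc id 2 (Doc3)
]


def build_inverted_index(docs):
    """Map each term to the sorted list of doc ids that contain it."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs):
        terms = {t for t in text.lower().split() if t not in STOP_WORDS}
        for term in terms:
            postings[term].append(doc_id)  # doc ids arrive in increasing order
    return dict(sorted(postings.items()))  # terms dictionary is kept sorted


index = build_inverted_index(DOCS)
for ordinal, (term, doc_ids) in enumerate(index.items()):
    print(ordinal, term, doc_ids)
```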
  • 37. Inverted index. File storage - Logical view (deleting documents)
      Segment: inverted index
        term ordinal   terms dict   postings list
        0              black        1
        1              brown        0, 2
        2              cat          0, 2
        3              dog          0, 1
        4              mike         1
        5              peter        0
        6              rachel       2
        7              white        0
      Segment: documents
        doc id   document
        0        Doc1 (Peter has a brown dog and a white cat.)
        1        Doc2 (Mike has a black dog.)
        2        Doc3 (Rachel has a brown cat.)
      Live documents: 0 1 2
  • 38. Inverted index. File storage - Logical view (merge segments)
      Segment 1: inverted index
        0   black    1
        1   brown    0, 2
        2   cat      0, 2
        3   dog      0, 1
        4   mike     1
        5   peter    0
        6   rachel   2
        7   white    0
      Segment 1: documents
        0   Doc1
        1   Doc2
        2   Doc3
      Segment 2: inverted index
        0   balloon   3
        1   boy       3, 4
        2   brown     4
        3   cat       4
        4   little    3, 4
        5   red       3
      Segment 2: documents
        3   Doc4
        4   Doc5
      Merged segment: inverted index
        0   balloon   3
        1   black     1
        2   boy       3, 4
        3   brown     2, 4
        4   cat       2, 4
        5   dog       1
        6   little    3, 4
        7   mike      1
        8   rachel    2
        9   red       3
      Merged segment: documents
        1   Doc2
        2   Doc3
        3   Doc4
        4   Doc5
      Live documents labels on the slide: 1 2 3 4; 0 1 2; 1 2 3 4
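
A merge in this logical view keeps only the postings of live documents. The sketch below assumes that doc id 0 (Doc1) was deleted in segment 1, which matches the merged documents list on the slide, and it keeps the slide's doc ids instead of renumbering them. The `merge_segments` helper is hypothetical.

```python
SEGMENT_1 = {
    "black": [1], "brown": [0, 2], "cat": [0, 2], "dog": [0, 1],
    "mike": [1], "peter": [0], "rachel": [2], "white": [0],
}
SEGMENT_2 = {
    "balloon": [3], "boy": [3, 4], "brown": [4], "cat": [4],
    "little": [3, 4], "red": [3],
}


def merge_segments(segments):
    """Merge (postings, live_doc_ids) pairs, dropping deleted documents."""
    merged = {}
    for postings_by_term, live_docs in segments:
        for term, doc_ids in postings_by_term.items():
            kept = [d for d in doc_ids if d in live_docs]
            if kept:  # a term with no live document disappears entirely
                merged.setdefault(term, []).extend(kept)
    return {term: sorted(ids) for term, ids in sorted(merged.items())}


# Assumption: doc id 0 is marked as deleted in segment 1.
merged = merge_segments([(SEGMENT_1, {1, 2}), (SEGMENT_2, {3, 4})])
for ordinal, (term, doc_ids) in enumerate(merged.items()):
    print(ordinal, term, doc_ids)
```

Under these assumptions `peter` and `white` vanish because only the deleted document contained them, while `dog` survives with the single posting 1 because Doc2 is still live.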
  • 39. What does a write look like? Logical view (write path, refresh & commit)
      [Diagram: Elasticsearch writes to Lucene's in-memory buffer and to the transaction log; labels Memory, Disk, Seg 1 to Seg 5, flush, Commit point]
  • 40. What does a read look like? Logical view (read path)
      [Diagram: Lucene reads Seg1 to Seg4 from memory and disk and merges the per-segment results into the response; labels Memory, Disk, Commit point]
  • 41. Compaction
      [Diagram: Lucene compacts Seg1, Seg2, Seg3 and Seg4 on disk into Seg5; label Commit point]
  • 43. Lucene codecs
      ● Abstraction over data format within files
      ● Keeps low level details away from Lucene
      ● File formats are codec-dependent
  • 44. File formats: Per-index files
      ● Each index in a separate UUID-named dir (done by Elasticsearch to prevent index corruption when an index is recreated)
      ● Segments file (segments_1, segments_2, …)
      ● Lock file (write.lock)
  • 45. File formats: Per-segment files
      ● Segment info (.si) - Lucene version, number of docs, OS, OS version, Java version, files included
      ● Term index (.tip) - Index into the Term Dictionary
      ● Term Dictionary (.tim) - Stores term info
      ● Positions (.pos) - Stores position information about where a term occurs in the index
      ● Field index (.fdx) - Index into Field Data
      ● Field Data (.fdt) - Stored fields for documents (real values)
      ● ...
  • 46. What does a read look like? Term query
      Term = tomato
  • 47. What does a read look like? File view - Term Index
      ● FST (Finite State Transducer)
      ● Stores term prefixes
      ● Map String -> Object
      Arcs (label / weight): t / 1, b / 7, o / 1, r / 2, h / 3, e / 6
      Outputs: to = 2, thr = 6, br = 9, the = 10, be = 13
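
The outputs on this slide are the sums of the arc weights along each path (for example t/1 + h/3 + r/2 gives thr = 6). The sketch below models that with nested dictionaries and one shared suffix state. It is a toy illustration of the idea under that reading of the diagram, not Lucene's FST implementation, and the names `FST` and `lookup` are made up here.

```python
# Each state maps an input character to (arc weight, next state).
END = {}                                  # shared final state, no outgoing arcs
R_OR_E = {"r": (2, END), "e": (6, END)}   # suffix state shared by "th" and "b"
FST = {
    "t": (1, {"o": (1, END), "h": (3, R_OR_E)}),
    "b": (7, R_OR_E),
}


def lookup(prefix: str):
    """Return the summed arc weights for a stored prefix, else None."""
    state, total = FST, 0
    for char in prefix:
        if char not in state:
            return None
        weight, state = state[char]
        total += weight
    return total if state is END else None


for prefix in ["to", "thr", "br", "the", "be", "tx"]:
    print(prefix, lookup(prefix))
```

Sharing the `r`/`e` state between the `th` and `b` branches is what keeps such a structure small: common suffixes are stored once, and the differing outputs are carried by the earlier arcs.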
  • 48. What does a read look like? File view - Term Dictionary
      ● Jump to the given block (offset)
      ● Number of items within the block (25 - 48)
      Block [Prefix = to] (jump here, then scan)
        Suffix   Frequency   Offset   Result
        ad       1           102      Not found
        ast      1           135      Not found
        aster    2           167      Not found
        day      7           211      Not found
        e        2           233      Not found
        ilette   3           251      Not found
        mato     8           287      Found
        nic      5           309
        oth      3           355
  • 49. What does a read look like? File view - Postings lists
      ● Jump to the given offset in the postings list
      ● Encoded using modified FOR (Frame of Reference) delta
          ○ delta-encode
          ○ split into blocks of N = 128 values
          ○ bit packing per block
          ○ if remaining, encode with vInt
      Example with N = 4
        Offset   Document ids
        ...      ...
        287      1, 3, 4, 6, 8, 20, 22, 26, 30, 31
        ...      ...
        Delta encode: 1, 2, 1, 2, 2, 12, 2, 4, 4, 1
        Split to blocks:
          [1, 2, 1, 2]    2 bits per value, total: 1 byte
          [2, 12, 2, 4]   4 bits per value, total: 2 bytes
          4, 1            vInt encoded, 1 byte per value, total: 2 bytes
        Uncompressed: 40 (10 * 4) bytes
        Compressed: 5 (1 + 2 + 2) bytes
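
The arithmetic of the N = 4 example can be reproduced with a few helper functions. This sketch only computes sizes under the scheme described on the slide; it ignores any per-block header (such as the stored bit width) and does not write Lucene's actual file format. The function names are illustrative.

```python
import math


def delta_encode(doc_ids):
    """Replace each doc id by its distance from the previous one."""
    deltas, previous = [], 0
    for doc_id in doc_ids:
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas


def vint_size(value: int) -> int:
    """Bytes needed for a variable-length int (7 payload bits per byte)."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def compressed_size(doc_ids, block_size: int = 128) -> int:
    """Bytes used by bit-packed full blocks plus vInt-encoded leftovers."""
    deltas = delta_encode(doc_ids)
    full = len(deltas) // block_size * block_size
    total = 0
    for start in range(0, full, block_size):
        block = deltas[start:start + block_size]
        bits_per_value = max(block).bit_length()
        total += math.ceil(block_size * bits_per_value / 8)
    total += sum(vint_size(value) for value in deltas[full:])
    return total


doc_ids = [1, 3, 4, 6, 8, 20, 22, 26, 30, 31]
print(delta_encode(doc_ids))
print(compressed_size(doc_ids, block_size=4), "bytes")
```

Delta encoding is what makes the bit packing effective: the gaps between sorted doc ids are much smaller than the ids themselves, so each block needs only as many bits per value as its largest gap.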
  • 50. What does a read look like? File view - Field Index & Field Data
      ● Stored sequentially
      ● Compressed using LZ4 in 16+KB blocks
      Field Index
        Starting doc id   Offset
        0                 33
        4                 188
        5                 312
        7                 423
        12                605
        13                811
        20                934
        25                1084
      Field Data (16KB blocks)
        [Offset = 33]    Doc 0, Doc 1, Doc 2, Doc 3
        [Offset = 188]   Doc 4
        [Offset = 312]   Doc 5, Doc 6
        [Offset = 423]   Doc 7, ...
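
Finding the block that holds a document is a search for the last entry whose starting doc id is not greater than the requested id. The sketch below uses the field index values from this slide and Python's `bisect` module; the function name `block_offset` is made up for illustration, and decompressing the block is left out.

```python
from bisect import bisect_right

# Field index from the slide: first doc id of each block and its offset.
STARTING_DOC_IDS = [0, 4, 5, 7, 12, 13, 20, 25]
OFFSETS = [33, 188, 312, 423, 605, 811, 934, 1084]


def block_offset(doc_id: int) -> int:
    """Return the offset of the field data block that contains doc_id."""
    if doc_id < 0:
        raise ValueError("doc id must be non-negative")
    position = bisect_right(STARTING_DOC_IDS, doc_id) - 1
    return OFFSETS[position]


# Doc 6 lives in the block that starts with doc 5, at offset 312.
print(block_offset(6))
```

After jumping to the offset, the reader would still have to decompress the whole block and skip to the requested document inside it, which is the cost of compressing stored fields in blocks.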