Introducing ElasticSearch
Minsoo Jun
Agenda
What is ElasticSearch
ElasticSearch Composition
Understanding ElasticSearch Performance
RDB with ElasticSearch
End
What is ElasticSearch
• Lucene-based open source search engine.
• Inverted Index
• Fast full-text searches.
• Distributed & highly available search engine.
• RESTful search
• Real time search & Analytics
Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java.
It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
How does ElasticSearch work?
Compare With RDB
RDB                            ElasticSearch
Database                       Indices
Tables                         Types
Rows                           Documents
Columns                        Fields
Index                          Analyze
Primary key                    _id
Schema                         Mapping
Physical Partition             Shard
Logical Partition              Route
Relational                     Parent/Child, Nested
SQL                            Query DSL
B*Tree index (Default Index)   Inverted Index
How does ElasticSearch work?
Index (inverted index)
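Row#  Name     Address           color
1     minsoo   Tokyo nerima-ku   brown, blue
2     elastic  Saitama           red, brown
3     search   busan             blue, yellow

[Figure: B*Tree index on the color column. Root key range b : y, branch nodes blue : brown and red : yellow, leaves pointing to row numbers.]

Inverted index on the color column:
term    Row 1  Row 2  Row 3
brown   ◉      ◉
blue    ◉             ◉
red            ◉
yellow                ◉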
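The term-to-row table above can be sketched in a few lines of Python. This is only an illustration of the idea using the three rows from the slide; the function name build_inverted_index is made up for this example, and Lucene's real on-disk structures are far more elaborate.

```python
from collections import defaultdict

# Rows from the slide: row number -> values of the "color" column.
rows = {
    1: ["brown", "blue"],
    2: ["red", "brown"],
    3: ["blue", "yellow"],
}

def build_inverted_index(rows):
    """Map each term to the sorted list of row numbers that contain it."""
    index = defaultdict(set)
    for row_id, terms in rows.items():
        for term in terms:
            index[term].add(row_id)
    return {term: sorted(ids) for term, ids in index.items()}

inverted = build_inverted_index(rows)
print(inverted["brown"])  # [1, 2]
```

A lookup by term is a single dictionary access, which is why full-text searches on an inverted index are fast.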
How does ElasticSearch work?
Composition
Physical composition
Cluster
  Node
    Index
      Shard
      Shard
      Shard
  Node
    Index
      Shard
      Shard
      Shard
  Node
    Index
      Shard
      Shard
      Shard

Logical composition
Index
  Type
    Document
      field:value
      field:value
      field:value
  Type
    Document
      field:value
      field:value
      field:value
  Type
    Document
      field:value
      field:value
      field:value
How does ElasticSearch work?
Nodes
node.master : true    Node: Master-eligible
    Cluster-wide actions: creating or deleting an index, deciding which shards to allocate to which nodes.
node.data : true      Node: Data
    Handles data-related operations such as CRUD, search, and aggregations.
    These operations are I/O-, memory-, and CPU-intensive.
node.ingest : true    Node: Ingest
    Executes pre-processing pipelines.
tribe : *             Node: Tribe
    Acts as a client across multiple clusters.
* ElasticSearch 5.X
How does ElasticSearch work?
Nodes Composition Example
[Figure: example layout of two clusters, Cluster A and Cluster B, built from master-eligible nodes (node.master : true), data nodes (node.data : true), ingest nodes (node.ingest : true) and a tribe node (tribe : *). A node with every node.xxxx setting set to false acts as a coordinating node.]
How does ElasticSearch work?
Shard replication
PUT /my_index/_settings
{
  "number_of_replicas": 1
}
PUT /my_index/_settings
{
  "number_of_replicas": 2
}
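The same settings call can be prepared from Python with only the standard library. This is a minimal sketch: the host http://localhost:9200 and the helper name replica_update_request are assumptions for illustration, and the request uses PUT, the method the index update-settings API documents. The request is built but not sent.

```python
import json
import urllib.request

def replica_update_request(base_url, index, replicas):
    """Build (but do not send) a request that changes number_of_replicas."""
    body = json.dumps({"number_of_replicas": replicas}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Assumed local cluster address; adjust to your environment.
req = replica_update_request("http://localhost:9200", "my_index", 2)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it to a running cluster. The number of replicas can be changed on a live index, unlike the number of primary shards.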
How does ElasticSearch work?
Creating, indexing and deleting a document
1. The client sends a create, index, or delete request to Node 1.
2. The node uses the document’s _id to determine that the document belongs to shard 0. It forwards the request to Node 3, where the primary copy of shard 0 is currently allocated.
3. Node 3 executes the request on the primary shard. If it is successful, it forwards the request in parallel to the replica shards on Node 1 and Node 2. Once all of the replica shards report success, Node 3 reports success to the coordinating node, which reports success to the client.
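Step 2 relies on the routing rule shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document's _id. The sketch below only illustrates that rule: zlib.crc32 is a stand-in hash chosen for this example (ElasticSearch uses its own hash function), and route_to_shard is a hypothetical name.

```python
import zlib

def route_to_shard(doc_id, number_of_primary_shards):
    """Pick a primary shard for a document from its _id.

    crc32 is a stand-in hash for illustration only, not the one
    ElasticSearch uses internally.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

shard = route_to_shard("doc-1", 3)
print(shard)  # always the same shard number for the same _id
```

Because the result depends on the number of primary shards, that number cannot be changed after the index is created without reindexing.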
How does ElasticSearch work?
Retrieving a Document
1. The client sends a get request to Node 1.
2. The node uses the document’s _id to determine that the document belongs to shard 0. Copies of shard 0 exist on all three nodes. On this occasion, it forwards the request to Node 2.
3. Node 2 returns the document to Node 1, which returns the document to the client.
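"On this occasion" in step 2 reflects that the coordinating node balances read requests across all copies of a shard. A rough sketch of round-robin selection, assuming the three node names from the slide:

```python
from itertools import cycle

# All copies (primary and replicas) of shard 0, as in the slide.
copies_of_shard_0 = ["Node 1", "Node 2", "Node 3"]

# cycle() yields the copies in order, forever, wrapping around at the end.
next_copy = cycle(copies_of_shard_0)

picked = [next(next_copy) for _ in range(4)]
print(picked)  # ['Node 1', 'Node 2', 'Node 3', 'Node 1']
```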
How does ElasticSearch work?
Query Phase
1. The client sends a search request to Node 3, which creates an empty priority queue of size from + size.
2. Node 3 forwards the search request to a primary or replica copy of every shard in the index. Each shard executes the query locally and adds the results into a local sorted priority queue of size from + size.
3. Each shard returns the doc IDs and sort values of all the docs in its priority queue to the coordinating node, Node 3, which merges these values into its own priority queue to produce a globally sorted list of results.
GET /_search
{
  "from": 90,
  "size": 10
}
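The query phase can be sketched as a two-level top-N merge. This is a simplified model, not ElasticSearch code: hits are (score, doc_id) pairs, and the function names and sample scores are invented for the example.

```python
import heapq

def shard_top_hits(hits, frm, size):
    """Each shard keeps only its best from + size hits."""
    return heapq.nlargest(frm + size, hits)

def coordinate(shard_results, frm, size):
    """Merge the per-shard queues, then cut out the requested page."""
    all_hits = (hit for shard in shard_results for hit in shard)
    merged = heapq.nlargest(frm + size, all_hits)
    return merged[frm:frm + size]

# Three shards with made-up (score, doc_id) hits.
shards = [
    [(0.9, "a"), (0.5, "b"), (0.1, "c")],
    [(0.8, "d"), (0.7, "e")],
    [(0.6, "f"), (0.4, "g"), (0.3, "h"), (0.2, "i"), (0.05, "j")],
]
frm, size = 2, 2
per_shard = [shard_top_hits(hits, frm, size) for hits in shards]
page = coordinate(per_shard, frm, size)
print(page)  # [(0.7, 'e'), (0.6, 'f')]
```

With "from": 90 and "size": 10, every shard has to return 100 entries and the coordinating node sorts all of them just to keep 10, which is why deep paging gets expensive.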
How does ElasticSearch work?
Fetch Phase
1. The coordinating node identifies which documents need to be fetched and issues a multi GET request to the relevant shards.
2. Each shard loads the documents and enriches them, if required, and then returns the documents to the coordinating node.
3. Once all documents have been fetched, the coordinating node returns the results to the client.
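A rough model of the fetch phase: the coordinating node groups the winning doc IDs by shard (one multi-get per shard), then reassembles the documents in the globally sorted order. The data layout and function names below are assumptions made for this sketch.

```python
from collections import defaultdict

def group_by_shard(page):
    """page: list of (shard_id, doc_id) pairs chosen in the query phase."""
    requests = defaultdict(list)
    for shard_id, doc_id in page:
        requests[shard_id].append(doc_id)
    return dict(requests)

def fetch(page, shards):
    """shards: shard_id -> {doc_id: document}. Returns docs in page order."""
    fetched = {}
    for shard_id, doc_ids in group_by_shard(page).items():
        for doc_id in doc_ids:  # stands in for one multi-get per shard
            fetched[doc_id] = shards[shard_id][doc_id]
    return [fetched[doc_id] for _, doc_id in page]

shards = {
    1: {"e": {"title": "E"}},
    2: {"f": {"title": "F"}, "g": {"title": "G"}},
}
page = [(1, "e"), (2, "f")]
results = fetch(page, shards)
print(results)  # [{'title': 'E'}, {'title': 'F'}]
```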
How does ElasticSearch work?
Composition & Shard tips
Shard design
  number_of_shards >= number_of_data_nodes
  number_of_replicas <= number_of_data_nodes - 1
Shard sizing
  Max number of shards per index : >= 200
  Max shard size : 20 ~ 50 GB
  Min shard size : ~ 3 GB
System settings
  File descriptors
    ulimit -n 65536
    permanently: /etc/security/limits.conf
  Virtual memory
    sysctl -w vm.max_map_count=262144
    permanently: /etc/sysctl.conf
  Disable swapping
    bootstrap.memory_lock: true
    in config/elasticsearch.yml
  Number of threads
    ulimit -u 2048
    permanently: /etc/security/limits.conf
  jvm.options
    ES_JAVA_OPTS="-Xms2g -Xmx2g"
    The maximum heap size must not exceed half of the OS memory.
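The two shard design rules and the heap rule are simple enough to check in code. The helper names below are hypothetical, and the heap helper takes the slide's half-of-memory rule as an upper bound rather than as a tuned recommendation.

```python
def check_shard_design(shards, replicas, data_nodes):
    """Return the shard design rules that the given settings break."""
    problems = []
    if shards < data_nodes:
        problems.append("number_of_shards should be >= number_of_data_nodes")
    if replicas > data_nodes - 1:
        problems.append("number_of_replicas should be <= number_of_data_nodes - 1")
    return problems

def heap_opts(os_memory_gb):
    """Build ES_JAVA_OPTS with min and max heap set to half of the OS memory."""
    heap_gb = os_memory_gb // 2
    if heap_gb < 1:
        raise ValueError("need at least 2 GB of OS memory for a 1 GB heap")
    return f'ES_JAVA_OPTS="-Xms{heap_gb}g -Xmx{heap_gb}g"'

print(check_shard_design(shards=3, replicas=1, data_nodes=3))  # []
print(heap_opts(4))  # ES_JAVA_OPTS="-Xms2g -Xmx2g"
```

A replica is never allocated on the same node as its primary, so replicas beyond number_of_data_nodes - 1 would stay unassigned.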
Understanding ElasticSearch Performance
Performance keys
Equipment perspective Document (data) perspective Service perspective
Network Bandwidth ?
Disk I/O ?
RAM ?
CPU cores ?
Document size ?
Total Index data size ?
Data size increase ?
Store period ?
Analyzer ?
Analyze fields ?
Indexed field size ?
Boosting ?
Realtime or batch ?
Queries ?
How to connect to RDB
Logstash
input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "mysql"
    parameters => { "favorite_artist" => "Beethoven" }
    schedule => "* * * * *"
    statement => "SELECT * from songs where artist = :favorite_artist"
  }
}
Timing
* 5 * 1-3 *
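The schedule option takes a cron-like expression with five fields: minute, hour, day of month, month, day of week. "* * * * *" runs every minute, and "* 5 * 1-3 *" runs every minute of the 5 AM hour, every day from January through March. The matcher below is a simplified sketch that handles only *, single numbers and a-b ranges; it is not the scheduler Logstash uses, and the function names are made up.

```python
from datetime import datetime

def field_matches(spec, value):
    """Support only '*', a single number, or an inclusive 'a-b' range."""
    if spec == "*":
        return True
    if "-" in spec:
        low, high = spec.split("-")
        return int(low) <= value <= int(high)
    return int(spec) == value

def schedule_matches(expr, when):
    """True if the datetime falls on a minute selected by the expression."""
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7  # cron counts Sunday as 0
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(day, when.day)
            and field_matches(month, when.month)
            and field_matches(weekday, cron_weekday))

print(schedule_matches("* 5 * 1-3 *", datetime(2017, 2, 10, 5, 30)))  # True
print(schedule_matches("* 5 * 1-3 *", datetime(2017, 4, 10, 5, 30)))  # False
```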
Analysis
Analysis & Analyzer
"The QUICK brown foxes jumped over the lazy dog!"
  -> Analysis ->
[ quick, brown, fox, jump, over, lazy, dog ]
Analyzer
  Character filters: [٠١٢٣٤٥٦٧٨٩] -> [0123456789]
  Tokenizer (n-gram): [ qu, ui, ic, ck ]
  Token filter: [ QU, ui, ic ]
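The three analyzer stages can be imitated with plain Python functions: a character filter that maps Arabic-Indic digits to ASCII digits, a 2-gram tokenizer, and a lowercase token filter. This is only a toy model of the pipeline with invented function names; it does not reproduce the stemming and stop-word removal that produce the full sentence result on the slide.

```python
# Character filter: map Arabic-Indic digits (U+0660..U+0669) to 0-9.
ARABIC_INDIC_DIGITS = {0x0660 + i: str(i) for i in range(10)}

def char_filter(text):
    return text.translate(ARABIC_INDIC_DIGITS)

def ngram_tokenizer(word, n=2):
    """Slide a window of n characters over the word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def lowercase_filter(tokens):
    return [token.lower() for token in tokens]

def analyze(text, n=2):
    """Character filter -> tokenizer -> token filter, in that order."""
    return lowercase_filter(ngram_tokenizer(char_filter(text), n))

print(char_filter("٠١٢٣٤٥٦٧٨٩"))  # 0123456789
print(analyze("QUICK"))            # ['qu', 'ui', 'ic', 'ck']
```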
Analysis
Analyzer & Plugin for Japanese
Tokenizer
  Standard Tokenizer: divides text into terms on word boundaries.
  NGram Tokenizer: can break up text into words when it encounters any of a list of specified characters, then returns n-grams of each word.
  Keyword Tokenizer: a “noop” tokenizer that accepts whatever text it is given and outputs the exact same text as a single term.
  Pattern Tokenizer: uses a regular expression to either split text into terms whenever it matches a word separator, or to capture matching text as terms.
Plugin
  Kuromoji Plugin: the Japanese (kuromoji) Analysis plugin integrates the Lucene kuromoji analysis module into elasticsearch.
  Kuromoji analyzer: kuromoji_tokenizer
  Kuromoji token filter: kuromoji_baseform, kuromoji_part_of_speech, cjk_width, ja_stop, kuromoji_stemmer, lowercase
END
{
  "name" : "minsoo.jun",
  "email" : "minsoo.jun@rakuten.com",
  "department" : "TRVDD",
  "group" : "Search Platform",
  "language" : ["java", "ansible", "SQL", "korean"],
  "database" : ["oracle", "elasticsearch", "mongodb"]
}

Editor's Notes

  • #10 https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
  • #11 https://www.elastic.co/guide/en/elasticsearch/guide/current/replica-shards.html
  • #12 https://www.elastic.co/guide/en/elasticsearch/guide/current/distrib-write.html
  • #13 https://www.elastic.co/guide/en/elasticsearch/guide/current/distrib-read.html
  • #14 https://www.elastic.co/guide/en/elasticsearch/guide/current/_query_phase.html
  • #15 https://www.elastic.co/guide/en/elasticsearch/guide/current/_fetch_phase.html
  • #16 https://www.elastic.co/guide/en/elasticsearch/reference/master/setting-system-settings.html
  • #18 https://www.elastic.co/guide/en/elasticsearch/reference/master/setting-system-settings.html
  • #20 https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
  • #21 https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html
  • #22 https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html