Elasticsearch
Elasticsearch Intro
By Alaa Elhadba
Table of Contents
Why Elasticsearch
Elasticsearch at scale
Index / Type
- An index is a collection of documents that should be grouped together for a common reason.
- A type is a collection of documents that all share an identical (or very similar) schema.
Sharding
PUT http://localhost:9200/reviews
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}
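In its basic form, Elasticsearch picks the primary shard for a document with the routing rule `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document's `_id`. The Python sketch below imitates that rule. `zlib.crc32` is an assumed stand-in for Elasticsearch's internal Murmur3 hash, so the shard numbers will not match a real cluster. The sketch also shows why `number_of_shards` is fixed when the index is created: changing the modulus sends most existing IDs to a different shard.

```python
import zlib

def route(routing: str, number_of_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in (assumption) for Elasticsearch's Murmur3 hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards

def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Each primary shard gets number_of_replicas additional copies."""
    return number_of_shards * (1 + number_of_replicas)

# Settings from the PUT /reviews request: 5 primaries, 1 replica each.
shards, replicas = 5, 1
ids = [f"review-{i}" for i in range(1000)]  # hypothetical document IDs

placement = {doc_id: route(doc_id, shards) for doc_id in ids}
print(total_shard_copies(shards, replicas))  # 10 shard copies in total

# Re-routing the same IDs with 6 primaries moves most documents,
# which is why the primary shard count cannot simply be changed later.
moved = sum(1 for doc_id in ids if route(doc_id, 6) != placement[doc_id])
print(f"{moved} of {len(ids)} documents would land on a different shard")
```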
Talking to data
Distribution
Elasticsearch node
Cluster health: yellow
Scaling
Cluster
Cluster health: yellow
Replication
Cluster
Cluster health: green
Replication
Cluster
Cluster health: red
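These colours follow Elasticsearch's cluster health rules:

- **Green:** every primary and replica shard is allocated.
- **Yellow:** all primaries are allocated but at least one replica is not.
- **Red:** at least one primary shard is unallocated.

A single node stays yellow because Elasticsearch never places a replica on the same node as its primary. The sketch below models those rules with a hypothetical, naive allocator. It is not how Elasticsearch allocates shards or computes health internally.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ShardCopy:
    shard: int           # shard number within the index
    primary: bool        # True for the primary copy, False for a replica
    node: Optional[str]  # node holding this copy, or None if unassigned

def allocate(nodes: List[str], number_of_shards: int, number_of_replicas: int) -> List[ShardCopy]:
    """Hypothetical allocator: round-robin the primaries, then put each replica
    on a node that does not already hold a copy of the same shard."""
    copies = []
    for shard in range(number_of_shards):
        primary_node = nodes[shard % len(nodes)]
        copies.append(ShardCopy(shard, True, primary_node))
        used = {primary_node}
        for _ in range(number_of_replicas):
            free = [n for n in nodes if n not in used]
            node = free[0] if free else None
            if node is not None:
                used.add(node)
            copies.append(ShardCopy(shard, False, node))
    return copies

def health(copies: List[ShardCopy]) -> str:
    if any(c.primary and c.node is None for c in copies):
        return "red"     # at least one primary is unassigned
    if any(not c.primary and c.node is None for c in copies):
        return "yellow"  # all primaries assigned, some replica is not
    return "green"       # every copy is assigned

print(health(allocate(["node-1"], 5, 1)))            # yellow
print(health(allocate(["node-1", "node-2"], 5, 1)))  # green

# Without replicas, losing node-2 leaves its primaries unassigned.
copies = allocate(["node-1", "node-2"], 5, 0)
for c in copies:
    if c.node == "node-2":
        c.node = None
print(health(copies))  # red
```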
Data Modeling
Schema
Type:
Index:
Doc_values:
PUT /reviews_v3/reviews/_mapping
Relationships
● Application Side Joins
● Parent-Child
● Nested objects
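An application-side join stores related entities as separate documents and runs two searches. The first search finds the parent IDs. The second filters the children on those IDs, typically with a `terms` query. The sketch below simulates both steps over in-memory lists. The `articles` and `reviews` records and their fields are hypothetical stand-ins for two Elasticsearch indices.

```python
# Hypothetical documents standing in for two separate indices.
articles = [
    {"id": "a1", "brand": "Acme"},
    {"id": "a2", "brand": "Other"},
    {"id": "a3", "brand": "Acme"},
]
reviews = [
    {"id": "r1", "article_id": "a1", "rating": 5},
    {"id": "r2", "article_id": "a2", "rating": 2},
    {"id": "r3", "article_id": "a3", "rating": 4},
    {"id": "r4", "article_id": "a1", "rating": 3},
]

def search(docs, predicate):
    """Stand-in for a filtered search request against one index."""
    return [d for d in docs if predicate(d)]

# Query 1: find the parents (articles of one brand) and collect their IDs.
article_ids = {a["id"] for a in search(articles, lambda a: a["brand"] == "Acme")}

# Query 2: a terms-style filter on the foreign key, using the IDs from query 1.
acme_reviews = search(reviews, lambda r: r["article_id"] in article_ids)
print([r["id"] for r in acme_reviews])  # ['r1', 'r3', 'r4']
```

The cost is an extra round trip. The ID list from the first query must also stay small enough to send back in the second.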
Parent-child queries can be 5 to 10 times slower than the equivalent nested query!
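Nested objects exist because Elasticsearch flattens an ordinary array of objects into multi-valued fields. That flattening loses the link between fields of the same object. The sketch below mimics it in plain Python: a search for a review written by `alice` with rating `1` wrongly matches the flattened document. Checking both conditions inside each object separately, as a `nested` query does, avoids the false match. The document layout and field names are hypothetical.

```python
# One article document with an array of review objects (hypothetical fields).
article = {
    "title": "Sneaker",
    "reviews": [
        {"author": "alice", "rating": 5},
        {"author": "bob", "rating": 1},
    ],
}

def flatten(doc, path):
    """Mimic object-array flattening: reviews.author -> [...], reviews.rating -> [...]."""
    flat = {}
    for obj in doc[path]:
        for field, value in obj.items():
            flat.setdefault(f"{path}.{field}", []).append(value)
    return flat

def flat_match(doc, author, rating):
    # Each condition is checked against a whole multi-valued field independently.
    flat = flatten(doc, "reviews")
    return author in flat["reviews.author"] and rating in flat["reviews.rating"]

def nested_match(doc, author, rating):
    # Both conditions must hold inside the same review object.
    return any(r["author"] == author and r["rating"] == rating for r in doc["reviews"])

print(flat_match(article, "alice", 1))    # True  (false positive)
print(nested_match(article, "alice", 1))  # False
```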
Catwalk Data Model
Searching
STRUCTURED SEARCH
A filter asks a yes/no question of every document and is used for fields that contain exact values:
- Is the date within the range 2012 to 2015?
- Is the status “Approved”?
- Is the language code “DE”?
UNSTRUCTURED SEARCH
A query calculates how relevant each document is to the query and assigns it a relevance _score, which is later used to sort matching documents by relevance:
- Documents containing the word “run”, but perhaps also matching “runs”, “running”, “jog”, or “sprint”
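The difference can be sketched in a few lines of Python. A filter keeps or drops each document, while a query assigns each one a score and sorts by it. Several parts of the sketch are hypothetical simplifications, not Elasticsearch's relevance formula:

- the documents themselves,
- the expansion of "run" into related terms,
- the scoring rule, which just counts matching words.

```python
from datetime import date

docs = [  # hypothetical documents
    {"id": 1, "date": date(2013, 5, 1), "status": "Approved", "lang": "DE", "text": "I run every morning"},
    {"id": 2, "date": date(2011, 1, 9), "status": "Approved", "lang": "DE", "text": "running shoes for a jog"},
    {"id": 3, "date": date(2014, 7, 3), "status": "Pending", "lang": "EN", "text": "sprint and run drills"},
]

def structured_filter(doc):
    """Yes/no question on exact values; no score is produced."""
    return (date(2012, 1, 1) <= doc["date"] <= date(2015, 12, 31)
            and doc["status"] == "Approved"
            and doc["lang"] == "DE")

# Hypothetical expansion of "run" to related forms and synonyms.
RUN_TERMS = {"run", "runs", "running", "jog", "sprint"}

def score(doc):
    """Toy relevance: number of words in the text that match the expanded query."""
    return sum(1 for word in doc["text"].lower().split() if word in RUN_TERMS)

print([d["id"] for d in docs if structured_filter(d)])  # [1]

# Keep only matching documents and sort them by score, highest first (stable for ties).
ranked = sorted((d for d in docs if score(d) > 0), key=score, reverse=True)
print([(d["id"], score(d)) for d in ranked])  # [(2, 2), (3, 2), (1, 1)]
```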
Terms Query Example
POST /{index}/{type}/_search
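A `terms` query, sent to the `_search` endpoint above, matches documents whose field contains any of the listed exact terms, and the listed terms are not analyzed. The sketch below simulates that behaviour for a hypothetical `status` field indexed through a lowercasing analyzer. Under that assumption, the exact term `Approved` finds nothing while `approved` matches. This is why exact-value fields are usually mapped so that they are not analyzed.

```python
def analyze(text):
    """Hypothetical analyzer applied at index time: lowercase, split on whitespace."""
    return text.lower().split()

# Inverted index for an analyzed 'status' field: term -> set of doc IDs.
statuses = {1: "Approved", 2: "Rejected", 3: "Approved"}
index = {}
for doc_id, value in statuses.items():
    for term in analyze(value):
        index.setdefault(term, set()).add(doc_id)

def terms_query(index, terms):
    """Match docs containing ANY of the exact terms; the terms are not analyzed."""
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits

print(terms_query(index, ["Approved"]))              # set()
print(terms_query(index, ["approved", "rejected"]))  # {1, 2, 3}
```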
Unstructured Search (Full Text)
Quick brown foxes leap over lazy dogs in summer
Tokenize: Quick, brown, foxes, leap, over, lazy, dogs, in, summer
Remove stop words: Quick, brown, foxes, leap, lazy, dogs, summer
Stem: Quick, brown, fox, leap, lazy, dog, summer
Apply synonyms: fast, brown, fox, jump, lazy, dog, summer
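This chain can be reproduced with a toy analyzer. The stop-word list, stem table and synonym table below are hypothetical, chosen only to match this example. A real Elasticsearch analyzer would combine a tokenizer with token filters such as `stop`, a stemmer and `synonym`, and usually a `lowercase` filter as well.

```python
STOP_WORDS = {"over", "in"}                   # hypothetical stop list for this example
STEMS = {"foxes": "fox", "dogs": "dog"}       # hypothetical stemmer lookup
SYNONYMS = {"Quick": "fast", "leap": "jump"}  # hypothetical synonym rules

def tokenize(text):
    return text.split()

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def stem(tokens):
    return [STEMS.get(t, t) for t in tokens]

def apply_synonyms(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]

def analyze(text):
    """Tokenize, then run each token filter in order."""
    tokens = tokenize(text)
    for token_filter in (remove_stop_words, stem, apply_synonyms):
        tokens = token_filter(tokens)
    return tokens

print(analyze("Quick brown foxes leap over lazy dogs in summer"))
# ['fast', 'brown', 'fox', 'jump', 'lazy', 'dog', 'summer']
```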
tsar -> star
TERM | DOC1 | DOC2
-----------------------
Quick | | X
The | X |
brown | X | X
dog | X |
dogs | | X
fox | X |
foxes | | X
in | | X
jumped | X |
lazy | X | X
leap | | X
over | X | X
quick | X |
summer | | X
the | X |
-----------------------
Inverted Index
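The table can be rebuilt by splitting each document into terms and recording which documents contain each term. DOC2 is the sentence from the analysis example. DOC1 is assumed to be "The quick brown fox jumped over the lazy dog", inferred from the terms in its column. Splitting on whitespace without lowercasing keeps `Quick`/`quick` and `The`/`the` as separate terms, as in the table.

```python
from collections import defaultdict

docs = {
    "DOC1": "The quick brown fox jumped over the lazy dog",  # assumed text
    "DOC2": "Quick brown foxes leap over lazy dogs in summer",
}

# term -> set of document IDs containing it
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted[term].add(doc_id)

# Print the table; Python sorts uppercase terms before lowercase ones.
for term in sorted(inverted):
    marks = ["X" if d in inverted[term] else " " for d in docs]
    print(f"{term:<7}| " + " | ".join(marks))

# A search for "quick brown" looks up each term and merges the postings.
matches = defaultdict(int)
for term in ["quick", "brown"]:
    for doc_id in inverted.get(term, ()):
        matches[doc_id] += 1
print(dict(matches))  # DOC1 matches both terms, DOC2 only "brown"
```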
Relevance
Scoring & Relevance in Full-Text Search
Relevance scoring is the algorithm that calculates how similar the contents of a field are to a query.
TF/IDF
Term Frequency
How often does the term appear in the field?
Inverse Document Frequency
How often does each term appear in the index?
Field Length Norm
How long is the field?
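*Elasticsearch: The Definitive Guide* gives the three factors of Lucene's classic TF/IDF similarity as follows:

- tf = √frequency
- idf = 1 + ln(numDocs / (docFreq + 1))
- norm = 1 / √numTerms

Classic TF/IDF was the Elasticsearch default before 5.0, which switched to BM25. The sketch below multiplies the three factors into one per-term weight. It deliberately omits the query normalization, coordination factor and boosts that the full practical scoring function also applies. The index statistics in the usage lines are hypothetical.

```python
import math

def tf(frequency: int) -> float:
    """Term frequency: more occurrences in the field -> higher weight."""
    return math.sqrt(frequency)

def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: terms that are rarer across the index weigh more."""
    return 1 + math.log(num_docs / (doc_freq + 1))

def field_norm(num_terms: int) -> float:
    """Field-length norm: a match in a short field counts for more."""
    return 1 / math.sqrt(num_terms)

def term_weight(frequency, num_docs, doc_freq, num_terms):
    return tf(frequency) * idf(num_docs, doc_freq) * field_norm(num_terms)

# Same term, same index statistics: a short title beats a long body.
title = term_weight(frequency=1, num_docs=1000, doc_freq=9, num_terms=4)
body = term_weight(frequency=1, num_docs=1000, doc_freq=9, num_terms=100)
print(round(title, 3), round(body, 3))  # 2.803 0.561
```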
Vector Space Model
The vector space model provides a way of
comparing a multiterm query against a document.
- The model represents both the document and the
query as vectors.
Vector Space Model
1. I am happy in summer.
2. After Christmas I’m a hippopotamus.
3. The happy hippopotamus helped Harry.
- By measuring the angle between the query vector
and the document vector, it is possible to assign a
relevance score to each document.
- The larger the angle between a document and the query, the lower the document's relevance.
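In practice the angle is compared through its cosine. The sketch below scores the three example documents against the query "happy hippopotamus". The weights are illustrative assumptions rather than values computed by Elasticsearch: 2 for happy and 5 for hippopotamus, reflecting that the rarer term gets a higher IDF.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Dimensions: [happy, hippopotamus]; the weights are illustrative assumptions.
query = [2, 5]
documents = {
    "1. I am happy in summer.": [2, 0],
    "2. After Christmas I'm a hippopotamus.": [0, 5],
    "3. The happy hippopotamus helped Harry.": [2, 5],
}

ranked = sorted(documents.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
for text, vector in ranked:
    cos = cosine(query, vector)
    angle = math.degrees(math.acos(min(1.0, cos)))  # clamp float noise above 1.0
    print(f"{cos:.2f}  angle={angle:5.1f} deg  {text}")
```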
Constant Score
Field Value Factor
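In a `function_score` query, `field_value_factor` multiplies the query score by a value read from a numeric field. That value can be scaled by `factor` and passed through a `modifier` such as `log1p`, which in Elasticsearch uses the base-10 logarithm. The sketch below reproduces that arithmetic for the default `boost_mode` of `multiply`, with only a subset of the modifiers. The `votes` field is hypothetical.

```python
import math

# Subset of the modifiers accepted by field_value_factor.
MODIFIERS = {
    "none": lambda v: v,
    "log1p": lambda v: math.log10(1 + v),  # common (base-10) logarithm
    "ln1p": lambda v: math.log(1 + v),     # natural logarithm
    "sqrt": math.sqrt,
    "square": lambda v: v * v,
}

def field_value_factor(query_score, field_value, factor=1.0, modifier="none"):
    """new_score = query_score * modifier(factor * field_value)  (boost_mode: multiply)."""
    return query_score * MODIFIERS[modifier](factor * field_value)

# Hypothetical 'votes' field: log1p damps the influence of very popular documents.
for votes in (0, 9, 99, 999):
    print(votes, field_value_factor(2.0, votes, modifier="log1p"))
```

With `log1p`, a document with 0 votes ends up with a score of 0. A modifier such as `log2p` avoids that.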
Script Scoring
Catwalk Custom Scoring
Catwalk Scoring Function
Aggregations
Aggregation
                     | Search                                               | Analytics
-----------------------
Business requirement | “Help me find the best documents”                    | “What do these documents tell me about my business?”
Enablers             | Matching, Relevance, Filtering, Auto-completion, ... | Summaries, Patterns, Trends, Outliers, Predictions, Visualization
- Aggregations help build complex summaries & analytics of the indexed data.
Aggregation
Terms
Significant Terms
Bucket Aggregations
Nested Aggregations
Metrics Aggregations
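A `terms` bucket aggregation groups documents by the values of a field. By default it returns the top 10 buckets ordered by document count, and metrics can be nested inside each bucket as sub-aggregations. The sketch below mimics a terms aggregation on a hypothetical `brand` field with an average `rating` nested in each bucket. It returns a structure loosely shaped like an Elasticsearch response.

```python
from collections import defaultdict

reviews = [  # hypothetical review documents
    {"brand": "Acme", "rating": 5},
    {"brand": "Acme", "rating": 3},
    {"brand": "Other", "rating": 4},
    {"brand": "Acme", "rating": 4},
    {"brand": "Third", "rating": 2},
    {"brand": "Other", "rating": 2},
]

def terms_agg(docs, field, size=10, sub_metric=None):
    """Bucket docs by field value, order by doc_count descending, keep the top `size`."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    ordered = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)[:size]
    buckets = []
    for key, members in ordered:
        bucket = {"key": key, "doc_count": len(members)}
        if sub_metric:  # nested metric computed per bucket
            name, fn = sub_metric
            bucket[name] = {"value": fn(members)}
        buckets.append(bucket)
    return {"buckets": buckets}

def avg_rating(docs):
    return sum(d["rating"] for d in docs) / len(docs)

result = terms_agg(reviews, "brand", size=2, sub_metric=("avg_rating", avg_rating))
print(result)
```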
● Extended Stats Aggregation
● Geo Bounds Aggregation
● Geo Centroid Aggregation
● Percentiles Aggregation
● Stats Aggregation
● Value Count Aggregation
● Avg, Sum, Min, Max Aggregations
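`stats` returns count, min, max, avg and sum together. `extended_stats` adds sum_of_squares, variance and std_deviation, plus standard-deviation bounds, which are omitted here. The sketch below computes these over a hypothetical list of review ratings. `variance` is the population variance, which is what Elasticsearch reports in its `variance` field. `percentiles` is left out because Elasticsearch approximates it with the TDigest algorithm rather than computing it exactly.

```python
import math

def stats(values):
    """count, min, max, avg, sum (assumes a non-empty list)."""
    n = len(values)
    total = sum(values)
    return {"count": n, "min": min(values), "max": max(values),
            "avg": total / n, "sum": total}

def extended_stats(values):
    result = stats(values)
    sum_of_squares = sum(v * v for v in values)
    # Population variance: E[x^2] - (E[x])^2
    variance = sum_of_squares / result["count"] - result["avg"] ** 2
    result.update(sum_of_squares=sum_of_squares,
                  variance=variance,
                  std_deviation=math.sqrt(variance))
    return result

ratings = [1, 2, 3, 4, 5]  # hypothetical review ratings
print(extended_stats(ratings))
```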
Article Reviews
Significant Terms
What’s uncommonly common about this sub-group?
- The significant_terms aggregation analyzes your data and finds terms that appear with a frequency that is statistically anomalous compared to the background data.
- It can uncover surprisingly sophisticated trends and correlations in your data.
- It is useful for discovering anomalies.
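By default, `significant_terms` ranks candidate terms with the JLH score. JLH compares a term's frequency in the foreground set (the documents matched by the query) with its frequency in the background set (by default, the whole index). The sketch below implements that formula. The counts in the usage lines are hypothetical.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """JLH: absolute change in probability times relative change.

    Returns 0 when the term is not more common in the foreground than in the background.
    """
    if subset_freq == 0 or superset_freq == 0:
        return 0.0
    fg = subset_freq / subset_size       # foreground probability
    bg = superset_freq / superset_size   # background probability
    if fg - bg <= 0:
        return 0.0
    return (fg - bg) * (fg / bg)

# Hypothetical counts: 10 documents in the sub-group, 1000 in the whole index.
candidates = {
    "sneakers": (4, 10, 10, 1000),   # rare overall, common in the sub-group
    "jacket": (2, 10, 100, 1000),    # somewhat over-represented
    "shirt": (3, 10, 300, 1000),     # equally common everywhere
}
for term, counts in candidates.items():
    print(term, round(jlh_score(*counts), 2))
```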
Summarise how their style differs from everyone else’s
Find all people who like these products
Kibana: Data Visualization
Kibana
Elasticsearch in PDP (Vegas)
The New PDP
Reviews Catwalk
Architecture
Links

Vegas ES