Elasticsearch
Elasticsearch Intro
By Alaa Elhadba
Table of Contents
Why Elasticsearch
Elasticsearch at scale
Index / Type
- An index is a collection of documents that should be grouped together for a common reason.
- A type is a collection of documents within an index that all share an identical (or very similar) schema.
Sharding
PUT http://localhost:9200/reviews
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}
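Elasticsearch decides which primary shard holds a document with a formula of the form shard = hash(_routing) % number_of_primary_shards. By default, _routing is the document _id. For this reason number_of_shards is fixed when the index is created, while number_of_replicas can be changed later.

The sketch below illustrates the idea. It uses zlib.crc32 as a stand-in hash (Elasticsearch itself uses Murmur3), so its shard numbers will not match a real cluster. The helper name `shard_for` is hypothetical.

```python
import zlib

NUMBER_OF_SHARDS = 5  # matches "number_of_shards": 5 in the settings above


def shard_for(routing: str, number_of_shards: int = NUMBER_OF_SHARDS) -> int:
    """Pick a primary shard for a document (hypothetical helper).

    Hash the routing value (by default the document _id) and take it
    modulo the number of primary shards. crc32 stands in for the
    Murmur3 hash Elasticsearch actually uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    for doc_id in ["review-1", "review-2", "review-3"]:
        print(doc_id, "-> shard", shard_for(doc_id))
    # With a different shard count the same _id can land on a different
    # shard, which is why the count is fixed at index creation time.
    print("review-1 with 6 shards -> shard", shard_for("review-1", 6))
```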
Talking to data
Distribution
[Diagram: a single Elasticsearch node. Cluster state: yellow]
Scaling
[Diagram: cluster. Cluster state: yellow]
Replication
[Diagrams: cluster with replicas. Cluster state: green (three slides), then red]
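The cluster-state colours follow a simple rule:

- Green: every primary and replica shard is allocated.
- Yellow: all primaries are allocated, but at least one replica is not.
- Red: at least one primary is unallocated.

A single node with number_of_replicas: 1 is yellow because a replica is never placed on the same node as its primary.

The toy allocator below illustrates the rule; the names `allocate`, `fail_node` and `health` are hypothetical. It places copies round-robin. Unlike Elasticsearch, it does not promote a replica to primary when a node fails.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ShardCopy:
    shard: int
    primary: bool
    node: Optional[str]  # None means the copy is unassigned


def allocate(num_shards: int, num_replicas: int, nodes: List[str]) -> List[ShardCopy]:
    """Toy allocator: copies of the same shard always go to different nodes."""
    copies = []
    for s in range(num_shards):
        # Rotate the node list so primaries spread across the cluster.
        k = s % len(nodes) if nodes else 0
        order = nodes[k:] + nodes[:k]
        for c in range(1 + num_replicas):
            # Copy 0 is the primary; copies beyond the node count stay unassigned.
            node = order[c] if c < len(order) else None
            copies.append(ShardCopy(shard=s, primary=(c == 0), node=node))
    return copies


def fail_node(copies: List[ShardCopy], node: str) -> List[ShardCopy]:
    """Unassign every copy on a failed node (replica promotion not modelled)."""
    return [replace(c, node=None) if c.node == node else c for c in copies]


def health(copies: List[ShardCopy]) -> str:
    if any(c.primary and c.node is None for c in copies):
        return "red"      # at least one primary is missing
    if any(c.node is None for c in copies):
        return "yellow"   # all primaries assigned, some replica is not
    return "green"        # everything assigned


if __name__ == "__main__":
    print(health(allocate(5, 1, ["node-1"])))            # yellow
    print(health(allocate(5, 1, ["node-1", "node-2"])))  # green
    print(health(fail_node(allocate(5, 0, ["node-1", "node-2"]), "node-1")))  # red
```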
Data Modeling
Schema
Type:
Index:
Doc_values:
PUT /reviews_v3/reviews/_mapping
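Below is a minimal sketch of a mapping body that sets the three parameters named above. It assumes Elasticsearch 5.x/6.x syntax (mapping types were removed in 8.0) and uses hypothetical review fields.

- `type` picks the field datatype.
- `index: false` keeps a field out of the inverted index, so it cannot be searched.
- `doc_values: false` drops the on-disk columnar copy used for sorting and aggregations.

The body is built as a Python dict so it can be checked and serialized.

```python
import json

# Hypothetical field names for a product-review document.
reviews_mapping = {
    "properties": {
        "title": {"type": "text"},        # analyzed full text, searchable
        "status": {"type": "keyword"},    # exact value, for filters and aggregations
        "rating": {"type": "integer"},    # numeric: ranges, sorting, stats
        # Kept in _source and returned with hits, but not searchable.
        "image_url": {"type": "keyword", "index": False},
        # Searchable, but cannot be sorted on or aggregated.
        "session_id": {"type": "keyword", "doc_values": False},
    }
}

if __name__ == "__main__":
    # This JSON would be the body of PUT /reviews_v3/reviews/_mapping
    print(json.dumps(reviews_mapping, indent=2))
```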
Relationships
● Application Side Joins
● Parent-Child
● Nested objects
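An application-side join keeps related entities in separate indices and lets the client stitch them together. One query finds the parent documents, and a second query (typically a terms filter on the collected IDs) fetches the related ones.

The sketch below simulates the two round trips with in-memory lists. The product and review data and the helper names are hypothetical.

```python
from typing import Dict, List

# Two "indices" held in memory (hypothetical data).
products = [
    {"id": "p1", "brand": "Acme", "name": "Runner"},
    {"id": "p2", "brand": "Acme", "name": "Trail"},
    {"id": "p3", "brand": "Zeta", "name": "Sprint"},
]
reviews = [
    {"product_id": "p1", "rating": 5},
    {"product_id": "p1", "rating": 4},
    {"product_id": "p3", "rating": 2},
]


def search_products(brand: str) -> List[dict]:
    """Round trip 1: like a term filter on products.brand."""
    return [p for p in products if p["brand"] == brand]


def search_reviews(product_ids: List[str]) -> List[dict]:
    """Round trip 2: like a terms filter on reviews.product_id."""
    wanted = set(product_ids)
    return [r for r in reviews if r["product_id"] in wanted]


def reviews_by_product(brand: str) -> Dict[str, List[int]]:
    """Join the two result sets in the application."""
    joined = {p["id"]: [] for p in search_products(brand)}
    for r in search_reviews(list(joined)):
        joined[r["product_id"]].append(r["rating"])
    return joined


if __name__ == "__main__":
    print(reviews_by_product("Acme"))  # {'p1': [5, 4], 'p2': []}
```

The cost is an extra round trip, plus a long terms list if the first query matches many parents. In exchange, each index stays flat and simple.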
Parent-child queries can be 5 to 10 times slower than the equivalent nested query!
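Nested objects exist because a plain array of objects is flattened into parallel multi-value fields. Flattening loses the link between values that came from the same object.

The sketch below mimics that flattening on a hypothetical product with two reviews. It shows the false match that a `nested` mapping avoids by indexing each review as its own hidden document. It models the behaviour, not Elasticsearch internals, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

doc = {
    "name": "Runner",
    "reviews": [
        {"author": "alice", "rating": 5},
        {"author": "bob", "rating": 1},
    ],
}


def flatten(objects: List[dict], prefix: str) -> Dict[str, list]:
    """How an object array is indexed without a nested mapping."""
    fields = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            fields[f"{prefix}.{key}"].append(value)
    return dict(fields)


def object_match(doc: dict, author: str, rating: int) -> bool:
    # Each condition is checked against a whole flattened field,
    # so the two conditions may be satisfied by different reviews.
    f = flatten(doc["reviews"], "reviews")
    return author in f["reviews.author"] and rating in f["reviews.rating"]


def nested_match(doc: dict, author: str, rating: int) -> bool:
    # A nested query requires both conditions on the same review.
    return any(r["author"] == author and r["rating"] == rating for r in doc["reviews"])


if __name__ == "__main__":
    print(flatten(doc["reviews"], "reviews"))
    # {'reviews.author': ['alice', 'bob'], 'reviews.rating': [5, 1]}
    print(object_match(doc, "alice", 1))  # True  (false positive)
    print(nested_match(doc, "alice", 1))  # False
```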
Catwalk Data Model
Searching
Searching
STRUCTURED SEARCH
A filter asks a yes/no question of every document and is used for fields that contain exact values:
- Is the date within the range 2012 to 2015?
- Is the status “Approved”?
- Is the language code “DE”?
UNSTRUCTURED SEARCH
A query calculates how relevant each document is to the query and assigns it a relevance _score, which is later used to sort matching documents by relevance:
- Does the document contain the word “run”, or perhaps “runs”, “running”, “jog”, or “sprint”?
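Both kinds of search are usually combined in one bool query:

- Clauses under `filter` answer the yes/no questions and do not contribute to _score.
- Clauses under `must` are scored for relevance.

The sketch below builds such a body as a Python dict. The field names (`date`, `status`, `language`, `text`) are hypothetical. The syntax assumes Elasticsearch 2.x or later, where bool accepts a `filter` clause.

```python
import json

search_body = {
    "query": {
        "bool": {
            # Unstructured: a scored full-text match. With a stemming
            # analyzer it can also match "runs"/"running"; "jog" and
            # "sprint" would need a synonym filter.
            "must": [{"match": {"text": "run"}}],
            # Structured: exact yes/no conditions, not scored.
            # term clauses assume exact-value (keyword) fields.
            "filter": [
                {"range": {"date": {"gte": "2012-01-01", "lte": "2015-12-31"}}},
                {"term": {"status": "Approved"}},
                {"term": {"language": "DE"}},
            ],
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(search_body, indent=2))
```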
Terms Query Example
POST /{index}/{type}/_search
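A terms query matches documents whose field contains any of the listed exact values.

The sketch below builds the path and body for POST /{index}/{type}/_search. It uses a hypothetical `language` keyword field and a hypothetical helper name. It follows the typed URL form shown above; later Elasticsearch versions use /{index}/_search instead.

```python
import json
from typing import List, Tuple


def terms_search(index: str, doc_type: str, field: str, values: List[str]) -> Tuple[str, str]:
    """Return (path, JSON body) for a terms query in filter context."""
    path = f"/{index}/{doc_type}/_search"
    body = {
        "query": {
            # constant_score: every matching document gets the same score,
            # since exact-value matching needs no relevance ranking.
            "constant_score": {
                "filter": {"terms": {field: values}}
            }
        }
    }
    return path, json.dumps(body)


if __name__ == "__main__":
    path, body = terms_search("reviews", "reviews", "language", ["DE", "AT"])
    print("POST", path)
    print(body)
```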
Unstructured Search (Full Text)
Quick brown foxes leap over lazy dogs in summer
Tokenize: Quick, brown, foxes, leap, over, lazy, dogs, in, summer
Remove stop words: Quick, brown, foxes, leap, lazy, dogs, summer
Stem: Quick, brown, fox, leap, lazy, dog, summer
Synonyms: fast, brown, fox, jump, lazy, dog, summer
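The analysis chain above can be reproduced with a toy analyzer. Real analyzers use a tokenizer plus token filters (stop words, a stemmer, synonyms).

Here each stage is a hypothetical lookup table containing only the words from this example. The sketch therefore reproduces the slide, not general English.

```python
from typing import Dict, List

STOP_WORDS = {"over", "in"}                # hypothetical: just the words dropped above
STEMS = {"foxes": "fox", "dogs": "dog"}    # toy table standing in for a real stemmer
SYNONYMS = {"quick": "fast", "leap": "jump"}


def analyze(text: str) -> Dict[str, List[str]]:
    """Return the token list after each stage of the toy analysis chain."""
    tokens = text.split()                                          # whitespace tokenizer
    no_stop = [t for t in tokens if t.lower() not in STOP_WORDS]   # stop-word filter
    stemmed = [STEMS.get(t.lower(), t) for t in no_stop]           # stemmer
    synonyms = [SYNONYMS.get(t.lower(), t) for t in stemmed]       # synonym filter
    return {"tokens": tokens, "stop": no_stop, "stem": stemmed, "synonyms": synonyms}


if __name__ == "__main__":
    for stage, toks in analyze("Quick brown foxes leap over lazy dogs in summer").items():
        print(f"{stage:9} {', '.join(toks)}")
```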
tsar -> star
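The tsar -> star pair appears to illustrate fuzzy matching: the two words differ by one transposition of adjacent letters. Plain Levenshtein distance counts that as two edits, while the Damerau variant counts it as one. Elasticsearch fuzzy queries can also count a transposition as a single edit (the fuzzy query's `transpositions` option).

The function below computes the optimal-string-alignment form of Damerau-Levenshtein distance. The name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Levenshtein distance, optionally counting an adjacent swap as one edit
    (optimal string alignment variant of Damerau-Levenshtein)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


if __name__ == "__main__":
    print(edit_distance("tsar", "star"))                        # 1
    print(edit_distance("tsar", "star", transpositions=False))  # 2
```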
Inverted Index
TERM    | DOC1 | DOC2
----------------------
Quick   |      |  X
The     |  X   |
brown   |  X   |  X
dog     |  X   |
dogs    |      |  X
fox     |  X   |
foxes   |      |  X
in      |      |  X
jumped  |  X   |
lazy    |  X   |  X
leap    |      |  X
over    |  X   |  X
quick   |  X   |
summer  |      |  X
the     |  X   |
----------------------
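An inverted index like the table above maps each term to the set of documents that contain it.

The two document texts below are reconstructed from the table's terms; this is an assumption, since only the terms are shown. They are tokenized on whitespace with case preserved, just as the table keeps Quick and quick separate.

```python
from collections import defaultdict
from typing import Dict, Set

# Reconstructed from the table above (assumption).
docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # no lowercasing, no stemming
            index[term].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[int]], query: str) -> Dict[int, int]:
    """Count how many query terms each document matches."""
    hits = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, set()):
            hits[doc_id] += 1
    return dict(hits)


if __name__ == "__main__":
    index = build_inverted_index(docs)
    for term in sorted(index):
        print(f"{term:7} {sorted(index[term])}")
    print(search(index, "quick brown"))  # {1: 2, 2: 1}
```

Without analysis, a query for Quick misses DOC1, and a query for foxes misses DOC1's fox. That mismatch is what the analysis chain in the previous slide exists to fix.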
Relevance
Scoring & Relevance in Full-Text Search
Relevance scoring is the algorithm that calculates how similar the contents of a field are to a query.
TF/IDF
Term Frequency
How often does the term appear in the field?
Inverse Document Frequency
How often does each term appear in the index?
Field Length Norm
How long is the field?
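The three factors correspond to the classic Lucene TF/IDF similarity. Elasticsearch used it by default before version 5.0, which switched to BM25. Each factor has a simple formula:

- tf = sqrt(frequency)
- idf = 1 + ln(numDocs / (docFreq + 1))
- norm = 1 / sqrt(numTerms)

For one term, the practical scoring function combines them as tf · idf² · norm. The sketch below leaves out query normalization, the coordination factor and boosts.

```python
import math


def tf(freq: int) -> float:
    """Term frequency: more occurrences in the field -> higher weight."""
    return math.sqrt(freq)


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: rarer terms in the index -> higher weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms: int) -> float:
    """Field-length norm: shorter fields -> higher weight."""
    return 1.0 / math.sqrt(num_terms)


def term_score(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    # Per-term core of Lucene's classic practical scoring function
    # (queryNorm, coord and boosts left out).
    return tf(freq) * idf(num_docs, doc_freq) ** 2 * field_norm(num_terms)


if __name__ == "__main__":
    # Same term frequency and field length; the term is rare vs common
    # in a 1000-document index.
    print(term_score(freq=1, num_docs=1000, doc_freq=9, num_terms=4))
    print(term_score(freq=1, num_docs=1000, doc_freq=499, num_terms=4))
```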
Vector Space Model
The vector space model provides a way of
comparing a multiterm query against a document.
- The model represents both the document and the
query as vectors.
Vector Space Model
1. I am happy in summer.
2. After Christmas I’m a hippopotamus.
3. The happy hippopotamus helped Harry.
- By measuring the angle between the query vector
and the document vector, it is possible to assign a
relevance score to each document.
- If the angle between a document and the query is large, the document is of low relevance.
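The angle can be computed with cosine similarity. This simplified sketch rests on two assumptions:

- The query is happy hippopotamus (an assumed query).
- Term weights are plain 0/1 instead of the TF/IDF weights Elasticsearch would use.

```python
import math
import re
from typing import List

docs = [
    "I am happy in summer.",
    "After Christmas I’m a hippopotamus.",
    "The happy hippopotamus helped Harry.",
]
query_terms = ["happy", "hippopotamus"]  # assumed query


def vector(text: str, terms: List[str]) -> List[float]:
    """0/1 weight per query term (a simplification of TF/IDF weights)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1.0 if t in words else 0.0 for t in terms]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def angle_degrees(u: List[float], v: List[float]) -> float:
    # Clamp to guard against tiny floating-point overshoot before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine(u, v)))))


if __name__ == "__main__":
    q = [1.0] * len(query_terms)
    for text in docs:
        d = vector(text, query_terms)
        print(f"{angle_degrees(q, d):5.1f} deg  cos={cosine(q, d):.3f}  {text}")
```

Document 3 points in the same direction as the query, at an angle near 0. Documents 1 and 2 each contain only one query term and sit at 45 degrees.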
Constant Score
Field Value Factor
Script Scoring
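As a rough sketch of the arithmetic behind the first two techniques:

- A `constant_score` query gives every matching document the same score: its boost, 1.0 by default.
- A `field_value_factor` function inside `function_score` computes modifier(factor × field value). The default `boost_mode: multiply` then multiplies that into the query _score.

The `log1p` modifier is log10(1 + x). The field (votes) and the helper names are hypothetical.

```python
import math

# A subset of the field_value_factor modifiers, as functions of x.
MODIFIERS = {
    "none": lambda x: x,
    "log1p": lambda x: math.log10(1 + x),  # common log, not natural log
    "ln1p": lambda x: math.log(1 + x),
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
}


def constant_score(matches: bool, boost: float = 1.0) -> float:
    """Every matching document gets the same score."""
    return boost if matches else 0.0


def field_value_factor(query_score: float, field_value: float,
                       factor: float = 1.0, modifier: str = "none") -> float:
    """function_score with one field_value_factor and boost_mode 'multiply'."""
    return query_score * MODIFIERS[modifier](factor * field_value)


if __name__ == "__main__":
    # Popularity boost: a document with 9 votes vs one with 99 votes.
    print(field_value_factor(2.0, 9, modifier="log1p"))   # 2.0
    print(field_value_factor(2.0, 99, modifier="log1p"))  # 4.0
```

The log1p modifier keeps heavily voted documents from swamping text relevance: here, ten times more votes only doubles the score.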
Catwalk Custom Scoring
Catwalk Scoring Function
Aggregations
Aggregation
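                     | Search                               | Analytics
--------------------------------------------------------------------------------------------------
Business requirement | “Help me find the best documents?”   | “What do these documents tell me about my business?”
Enablers             | Matching, relevance, filtering,      | Summaries, patterns, trends, outliers,
                     | auto-completion, ...                 | predictions, visualization
--------------------------------------------------------------------------------------------------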
- Aggregations help build complex summaries & analytics of the indexed data.
Aggregation
Terms
Significant Terms
Bucket Aggregations
Nested Aggregations
Metrics Aggregations
● Extended Stats Aggregation
● Geo Bounds Aggregation
● Geo Centroid Aggregation
● Percentiles Aggregation
● Stats Aggregation
● Value Count Aggregation
● Avg, Sum, Min, Max Aggregations
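A bucket aggregation groups documents: a terms aggregation makes one bucket per distinct value, each with a doc_count. Metrics aggregations nested under it compute numbers per bucket.

The sketch below reproduces that shape in memory for hypothetical review data: a terms bucket on `brand` with a nested stats metric on `rating`. In Elasticsearch, the equivalent request would nest a `stats` aggregation under a `terms` aggregation.

```python
from collections import defaultdict
from typing import Dict, List

reviews = [
    {"brand": "Acme", "rating": 5},
    {"brand": "Acme", "rating": 3},
    {"brand": "Zeta", "rating": 4},
    {"brand": "Acme", "rating": 4},
]


def stats(values: List[float]) -> Dict[str, float]:
    """Like the stats aggregation: count, min, max, avg, sum."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def terms_with_stats(docs: List[dict], bucket_field: str, metric_field: str) -> List[dict]:
    """One bucket per distinct value, with a nested stats metric."""
    groups = defaultdict(list)
    for d in docs:
        groups[d[bucket_field]].append(d[metric_field])
    buckets = [
        {"key": key, "doc_count": len(vals), f"{metric_field}_stats": stats(vals)}
        for key, vals in groups.items()
    ]
    # Ordered by doc_count, highest first (ties broken by key here).
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


if __name__ == "__main__":
    for bucket in terms_with_stats(reviews, "brand", "rating"):
        print(bucket)
```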
Article Reviews
Significant Terms
What’s uncommonly common about this sub-group?
Significant Terms
- significant_terms analyzes your data and finds terms that appear with a frequency that is statistically anomalous compared to the background data.
- It can uncover surprisingly sophisticated trends and correlations in your data.
- It is useful for discovering anomalies.
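One way to quantify "uncommonly common" is the JLH score, the default significance heuristic of significant_terms. It multiplies the absolute change in a term's frequency (foreground share minus background share) by the relative change (foreground share divided by background share).

The sketch below applies that formula to hypothetical counts. It returns 0 when a term is not more frequent in the foreground, and it leaves out the other heuristics Elasticsearch offers. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def jlh(fg_count: int, fg_total: int, bg_count: int, bg_total: int) -> float:
    """JLH significance: absolute change x relative change in frequency."""
    if fg_count == 0 or bg_count == 0:
        return 0.0
    fg_pct = fg_count / fg_total  # share within the sub-group (foreground)
    bg_pct = bg_count / bg_total  # share within all data (background)
    if fg_pct <= bg_pct:
        return 0.0                # not more common than usual in the sub-group
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


def significant_terms(fg: Dict[str, int], fg_total: int,
                      bg: Dict[str, int], bg_total: int) -> List[Tuple[str, float]]:
    scores = {t: jlh(c, fg_total, bg.get(t, 0), bg_total) for t, c in fg.items()}
    return sorted(((t, s) for t, s in scores.items() if s > 0),
                  key=lambda ts: ts[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical: 100 people liked the products, out of 10,000 overall.
    fg = {"sneakers": 60, "black": 50, "shirt": 30}
    bg = {"sneakers": 800, "black": 5000, "shirt": 3000}
    print(significant_terms(fg, 100, bg, 10_000))
```

In this data "black" is common in the sub-group but just as common everywhere, so it scores 0. "sneakers" is uncommonly common.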
Significant Terms
Find all people who like these products
Summarise how their style differs from everyone else
Significant Terms
Kibana: Data Visualization
Kibana
ElasticSearch in Pdp (Vegas)
The New PDP
Reviews Catwalk
Architecture
Links

Vegas ES