Presented at Elastic's worldwide "Virtual Meetup" on 5/15/2024
https://www.youtube.com/watch?v=ayQap5pH_0w
https://john.soban.ski/aggregations-the-elasticsearch-group-by.html
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A... (Altinity Ltd)
The document discusses how OpsVerse migrated their Jaeger distributed tracing storage from Cassandra to ClickHouse for improved performance monitoring. Jaeger is an open source distributed tracing system that was originally designed to use Elasticsearch or Cassandra for storage. While Cassandra worked well for basic functionality, it lacked capabilities for advanced analytics. ClickHouse supports richer query functions and better handles large datasets. The document outlines the steps OpsVerse took to implement the ClickHouse storage plugin for Jaeger and deploy ClickHouse on Kubernetes using the ClickHouse Operator. This migration enabled more insightful performance monitoring and analytics.
This presentation summarizes how we use Elasticsearch for analytics at Wingify for our product Visual Website Optimizer (http://vwo.com). This presentation was prepared for my poster session at The Fifth Elephant (https://funnel.hasgeek.com/fifthel2014/1143-using-elasticsearch-for-analytics).
The slides for the first ever SnappyData webinar. Covers SnappyData core concepts, programming models, benchmarks and more.
SnappyData is open sourced here: https://github.com/SnappyDataInc/snappydata
We also have a deep technical paper here: http://www.snappydata.io/snappy-industrial
We can be easily contacted on Slack, Gitter and more: http://www.snappydata.io/about#contactus
This document discusses using Elasticsearch for dashboard data management applications. It describes uploading data from WLCG Transfers into Elasticsearch, performing matrix and plot queries to retrieve aggregated statistics, and different methods for grouping the results. Matrix and plot queries on Elasticsearch clusters hosted on virtual and physical machines were tested, with load times generally faster than Oracle but slower on the shared physical cluster due to other application load.
AWS October Webinar Series - Introducing Amazon Elasticsearch Service (Amazon Web Services)
Running Elasticsearch often requires specialized expertise and significant resources to operate and manage infrastructure and Elasticsearch software.
Amazon Elasticsearch Service makes it easy to deploy, operate, and scale Elasticsearch in AWS.
In this webinar, we will walk through how to launch a fully functional Amazon Elasticsearch domain, load your data, and analyze it using the built-in Kibana integration. We will also cover the CloudWatch Logs integration, which enables you to have your log data, such as VPC logs, automatically loaded into your Amazon Elasticsearch domain for analysis and exploration.
This document provides an overview of using Elasticsearch for data analytics. It discusses various aggregation techniques in Elasticsearch like terms, min/max/avg/sum, cardinality, histogram, date_histogram, and nested aggregations. It also covers mappings, dynamic templates, and general tips for working with aggregations. The main takeaways are that aggregations in Elasticsearch provide insights into data distributions and relationships similarly to GROUP BY in SQL, and that mappings and templates can optimize how data is indexed for aggregation purposes.
Visualize some of Austin's open source data using Elasticsearch with Kibana. ObjectRocket's Steve Croce presented this talk on 10/13/17 at the DBaaS event in Austin, TX.
Don't optimize my queries, organize my data! (Julian Hyde)
Your queries won't run fast if your data is not organized right. Apache Calcite optimizes queries, but can we make it optimize data? We had to solve several challenges. Users are too busy to tell us the structure of their database, and the query load changes daily, so Calcite has to learn and adapt. We talk about new algorithms we developed for gathering statistics on massive databases, and how we infer and evolve the data model based on the queries.
Visualizing Austin's data with Elasticsearch and Kibana (ObjectRocket)
This document provides an introduction to Elasticsearch and Kibana. It describes what Elasticsearch is and how it can scale to handle large amounts of data and queries. It also describes Kibana and how it is used for data visualization. The document then demonstrates how to use Elasticsearch and Kibana together to visualize and analyze Austin transportation and restaurant inspection data.
This document provides guidance on sizing Elastic Stack deployments for security use cases. It discusses Elasticsearch internals and computing resources needed for different node roles. It recommends preparing by ingesting sample data and monitoring size and ingestion rates to calculate storage needs. The document also discusses optimizing performance by understanding hardware capabilities, balancing cluster size and costs, and aiming for optimal shard sizes. It suggests using techniques like cross-cluster search, data tiering, and transforms. Guidance is provided on scaling Kibana and the detection engine. Examples are given for calculating storage needs and determining necessary data nodes for small and large deployments.
WSO2 Product Release Webinar: WSO2 Data Analytics Server 3.0 (WSO2)
To view a recording of this webinar, please use the URL below:
WSO2 Data Analytics Server (WSO2 DAS) version 3.0 is the successor of WSO2 Business Activity Monitor 2.5. It is based on the latest technologies and is an evolutionary upgrade to the current system. WSO2 DAS comes with a comprehensive set of new features including support for pluggable data sources, support for batch processing with Apache Spark, support for distributed data indexing, a new dashboard and support for unified data querying with analytics REST APIs.
The WSO2 DAS combines real-time, batch, interactive, and predictive (via machine learning) analysis of data into a single integrated platform. This webinar will present and demonstrate the following key features and capabilities in detail:
Pluggable data sources support with its new data abstraction layer
Batch analytics using the Apache Spark analytics engine
Interactive analysis powered by Apache Lucene
An analytics dashboard to visualize results
Activity monitoring capabilities for tracking related events in a system
This document summarizes a presentation about near real-time analytics platforms at Uber and LinkedIn. It discusses use cases for streaming analytics, challenges with scalability and operations, and new platforms developed using Apache Samza and SQL. Key points include how Samza is used to build streaming applications with SQL queries, operators, and support for multi-stage workflows. The platforms aim to simplify deployment and management of streaming jobs through interfaces like AthenaX.
See webinar recording of this presentation at: https://resource.alibabacloud.com/webinar/live.htm?&webinarId=67
In this presentation, you will learn all you need to know about Elasticsearch, one of the most widely used open source search platforms in the world. We will walk you through what Elasticsearch is, why you need it, and show common use cases. First, we will introduce Elasticsearch and the best practices for deploying it, as well as show what some of the salient features of the platform are. In the second part of the webinar, we delve into the various use cases for Elasticsearch and show why it is an excellent platform to query a large dataset. This includes a demo on querying a cluster. Finally, we will show how you can launch an Elasticsearch cluster on Alibaba Cloud and how to use Elasticsearch to query a large dataset for an autocomplete use case.
Learn more about Alibaba Cloud’s Elasticsearch offering:
https://www.alibabacloud.com/product/elasticsearch
1. Kusto (Azure Data Explorer) is a fast and flexible data exploration service for analyzing security and application logs, performance counters, and other streaming data.
2. A Data Engineer's role is evolving to focus more on real-time analysis using Kusto as opposed to traditional SQL. Understanding how to use Kusto's query engine and data ingestion capabilities is key.
3. Techniques like using materialized views, partitioning data, and leader-follower databases can help distribute workloads and improve query performance at scale in Kusto. However, Kusto has limitations around concurrency, memory usage, and result set sizes that need to be considered.
Solr Power FTW: Powering NoSQL the World Over (Alex Pinkin)
Solr is an open source, Lucene based search platform originally developed by CNET and used by the likes of Netflix, Yelp, and StubHub which has been rapidly growing in popularity and features during the last few years. Learn how Solr can be used as a Not Only SQL (NoSQL) database along the lines of Cassandra, Memcached, and Redis. NoSQL data stores are regularly described as non-relational, distributed, internet-scalable and are used at both Facebook and Digg. This presentation will quickly cover the fundamentals of NoSQL data stores, the basics of Lucene, and what Solr brings to the table. Following that we will dive into the technical details of making Solr your primary query engine on large scale web applications, thus relegating your traditional relational database to little more than a simple key store. Real solutions to problems like handling four billion requests per month will be presented. We'll talk about sizing and configuring the Solr instances to maintain rapid response times under heavy load. We'll show you how to change the schema on a live system with tens of millions of documents indexed while supporting real-time results. And finally, we'll answer your questions about ways to work around the lack of transactions in Solr and how you can do all of this in a highly available solution.
Run your queries 14X faster without any investment! (Knoldus Inc.)
This document discusses techniques for improving query performance on large datasets through pre-aggregation. It explains that pre-aggregating data into summary tables can speed up queries by orders of magnitude. Partitioning, bucketing, and creating aggregate tables are recommended to optimize storage. Dimensional modeling principles for designing aggregate tables are also covered, along with best practices for determining which aggregates to create and query optimization.
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks (Grega Kespret)
Celtra provides a platform for streamlined ad creation and campaign management used by customers including Porsche, Taco Bell, and Fox to create, track, and analyze their digital display advertising. Celtra’s platform processes billions of ad events daily to give analysts fast and easy access to reports and ad hoc analytics. Celtra’s Grega Kešpret leads a technical dive into Celtra’s data-pipeline challenges and explains how it solved them by combining Snowflake’s cloud data warehouse with Spark to get the best of both.
Topics include:
- Why Celtra changed its pipeline, materializing session representations to eliminate the need to rerun its pipeline
- How and why it decided to use Snowflake rather than an alternative data warehouse or a home-grown custom solution
- How Snowflake complemented the existing Spark environment with the ability to store and analyze deeply nested data with full consistency
- How Snowflake + Spark enables production and ad hoc analytics on a single repository of data
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ... (Amazon Web Services)
Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift Customizing the customer experience based on user behavior is a constant challenge for today’s consumer apps. Business intelligence helps analyze and model large amounts of data. Looker offers a modern approach to BI leveraging AWS that’s fast, agile, and easy to manage. Join this webinar to learn how MessageMe, which provides emotionally engaging messaging apps to consumers, leverages Looker business intelligence software and the Amazon Redshift data warehouse service to analyze billions of rows of customer data in seconds.
Webinar topics include:
• How MessageMe turns billions of rows of customer data stored in Amazon Redshift into actionable insights
• How Looker connects directly to Amazon Redshift in just a few clicks, enabling MessageMe to build modern, big data analytics in the cloud.
Who should attend:
• Information or Solution Architects, Data Analysts, BI Directors, DBAs, Development Leads, Developers, or Technical IT Leaders.
Presenters:
• Justin Rosenthal, CTO, MessageMe
• Keenan Rice, VP, Marketing & Alliances, Looker
• Tina Adams, Senior Product Manager, AWS
Kai Sasaki from Treasure Data discusses their efforts to implement auto scaling for their distributed Presto and Hive query engines. They decoupled the storage layer from the processing engines to allow dynamic scaling. They migrated infrastructure to AWS CodeDeploy and Auto Scaling Groups to automate deployments and scaling. They implemented target tracking auto scaling based on CPU usage but found it did not work well due to conservative scale-in behavior and long-running queries blocking instance termination. Future work includes real auto scaling without target tracking and auto query migration during outages.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics (Amazon Web Services)
Organizations are collecting an ever-increasing amount of data from numerous sources such as log systems, click streams, and connected devices. Launched in 2009, Elasticsearch —an open-source analytics and search engine— has emerged as a popular tool for real-time analytics and visualization of data. Some of the most common use cases include risk assessment, error detection, and sentiment analysis. However, as data volumes and applications grow, managing Elasticsearch clusters can consume significant IT resources while adding little or no differentiated value to the organization. Amazon Elasticsearch Service (Amazon ES) is a managed service that makes it easy to deploy, operate, and scale Elasticsearch clusters in the AWS Cloud. Amazon ES offers the benefits of a managed service, including cluster provisioning, easy configuration, replication for high availability, scaling options, data durability, security, and node monitoring. This session presents a technical deep dive on Amazon ES. Attendees learn: Common challenges with real-time data analytics and visualization and how to address them; the benefits, reference architecture, and best practices for using Amazon ES; and data ingestion options with Amazon DynamoDB, AWS Lambda, and Amazon Kinesis.
The document discusses MongoDB and how it compares to relational database management systems (RDBMS). It provides examples of how data can be modeled and stored differently in MongoDB compared to SQL databases. Specifically, it discusses how MongoDB allows for flexible, dynamic schemas as each document can have a different structure. This enables complex data like product catalogs with varying attributes for different items to be stored easily in a single collection. The document also provides examples of common operations like insert, update and delete in MongoDB compared to SQL.
This document discusses how Amazon SageMaker can be used to train machine learning models on large datasets using hosted Jupyter notebooks. It notes that DigitalGlobe plans to use SageMaker to train models on petabytes of Earth observation imagery so that users can create and deploy models within one scalable environment. The document also quotes the CTO of Maxar Technologies saying they will use SageMaker to build and deploy novel AI algorithms at scale to solve complex problems.
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra... (Databricks)
This talk presents how we accelerated deep learning processing from preprocessing to inference and training on Apache Spark in SK Telecom. In SK Telecom, we have half of the Korean population as our customers. To support them, we have 400,000 cell towers, which generate logs with geospatial tags.
Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent... (Amazon Web Services)
Join us for the first-ever Amazon DynamoDB practical hands-on workshop. This session is designed for developers, engineers, and database administrators who are involved in designing and maintaining DynamoDB applications. We begin with a walkthrough of proven NoSQL design patterns for at-scale applications. Next, we use step-by-step instructions to apply lessons learned to design DynamoDB tables and indexes that are optimized for performance and cost. Expect to leave this session with the knowledge to build and monitor DynamoDB applications that can grow to any size and scale. Attendees should have a basic understanding of DynamoDB. To attend this workshop, bring your laptop.
Real-Time Forecasting at Scale using Delta Lake and Delta Caching (Databricks)
GumGum receives around 30 billion programmatic inventory impressions amounting to 25 TB of data each day. Inventory impression is the real estate to show potential ads on a publisher page. By generating near-real-time inventory forecast based on campaign-specific targeting rules, GumGum enables the account managers to set up successful future campaigns.
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ... (NETWAYS)
At Uber we use high cardinality monitoring to observe and detect issues with our 4,000 microservices running on Mesos and across our infrastructure systems and servers. We’ll cover how we put the resulting 6 billion plus time series to work in a variety of different ways, auto-discovering services and their usage of other systems at Uber, setting up and tearing down alerts automatically for services, sending smart alert notifications that rollup different failures into individual high level contextual alerts, and more. We’ll also talk about how we accomplish all this with a global view of our systems with M3, our open source metrics platform. We’ll take a deep dive look at how we use M3DB, now available as an open source Prometheus long term storage backend, to horizontally scale our metrics platform in a cost efficient manner with a system that’s still sane to operate with petabytes of metrics data.
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
2. About Me
John Sobanski
○VP AI/ML at Pyramid Systems Inc.
○Elastic Certified Engineer (2020, 2022)
○https://soban.ski
3. Agenda
●Background - Relational Database Management Systems
●Aggregations – The Elasticsearch GROUP BY
○Why use the Elasticsearch Aggregation API
○1st Demo - Execute GROUP BY operations via Elasticsearch aggregations
●Elasticsearch Aggregations Drive Time Series Data Visualization
○BUCKETS in Elasticsearch
○2nd Demo - Time Series Data Visualization with Kibana
6. RDBMS Series Operations
●In the traditional Relational Database Management System (RDBMS) world, SQL databases use GROUP BY syntax to group rows with similar values into summary rows.
●The query, "find the number of web page hits per country," for example, represents a typical GROUP BY operation.
This table records the number of hits to my site [https://soban.ski], broken down by time zone.
7. ●Further summarize the table to record "hits per country" via a GROUP BY operation
○Collapse the Time zones into parent countries.
SQL Syntax:
SELECT COUNTRY, SUM(HITS) FROM timezone_hits GROUP BY COUNTRY;
8. RDBMS vs. Elasticsearch
●Even though Elasticsearch does not use the row construct to identify a unit of data (Elastic calls their rows Documents), we can still perform GROUP BY queries in Elasticsearch.
●Elasticsearch names their GROUP BY queries Aggregations.
●Elasticsearch provides an expressive REST API to execute Aggregations
○Kibana also provides a Graphical User Interface (GUI) to execute Aggregations
ElasticSearch : Relational Database
Index : Database
Type : Table
Documents : Rows
Fields : Columns
10. Why use the Elasticsearch Aggregation API? - 1
●Easy Approach: Query Elasticsearch, convert the returned JSON to a Pandas Dataframe, and then apply a Pandas GROUP BY to the Dataframe to retrieve summary stats.
●Modern laptops include 32GB of memory, and you may have had no issues with this method. If you use Elasticsearch for non-time-series data, e.g. static data for blogs, you may not need to worry about running out of memory.
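A minimal sketch of this easy approach, assuming a local cluster at http://localhost:9200, an index named timezone_hits, and documents with country and hits fields (the endpoint and names are illustrative assumptions, not part of the original deck):

import pandas as pd
import requests

# Pull the raw documents out of Elasticsearch; every matching document
# travels over the wire and lands in local memory.
resp = requests.post(
    "http://localhost:9200/timezone_hits/_search",
    json={"size": 10000, "query": {"match_all": {}}},
)
docs = [hit["_source"] for hit in resp.json()["hits"]["hits"]]

# Client-side GROUP BY with Pandas.
df = pd.DataFrame(docs)
print(df.groupby("country")["hits"].sum())

This works as long as the full result set fits on the laptop, which is exactly the assumption the next two slides challenge.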
11. Why use the Elasticsearch Aggregation API? - 2
●In the future, you may deal with Big Data. If you collect time series data, such as access logs or security logs, you might scale to Big Data. In that case, the Elasticsearch database size will exceed the memory of your laptop.
12. Why use the Elasticsearch Aggregation API? - 3
●The Aggs API allows you to command Elasticsearch to execute the GROUP BY analogue in situ (a best practice), and then also apply the summary stats in place.
●Elasticsearch will then return the summary stats as JSON, and you will not run out of memory.
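A minimal sketch of the in-situ approach, using the same illustrative index and field names as the Pandas example above (timezone_hits, country, hits); the endpoint and names are assumptions, not part of the original deck:

import requests

# A terms bucket per country with a sum sub-aggregation: the Elasticsearch
# analogue of SELECT COUNTRY, SUM(HITS) ... GROUP BY COUNTRY.
query = {
    "size": 0,  # return only the aggregation results, no documents
    "aggs": {
        "by_country": {
            "terms": {"field": "country"},
            "aggs": {"total_hits": {"sum": {"field": "hits"}}},
        }
    },
}

resp = requests.post("http://localhost:9200/timezone_hits/_search", json=query)

# Only the summary stats come back as JSON, one small bucket per country.
for bucket in resp.json()["aggregations"]["by_country"]["buckets"]:
    print(bucket["key"], bucket["total_hits"]["value"])

The grouping and summing run inside the cluster, so the client never holds more than the per-country summaries.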
13. Time Series
●GROUP BY (RDBMS) and Aggregation (Elasticsearch) operations lend themselves well to Time Series data, since these operations allow you to GROUP BY or Aggregate results over a given time bucket (e.g. Hour, Day, Week, Month, etc.).
14. 1st Demo
●Execute GROUP BY operations via Elasticsearch aggregations
●Demonstrate how to generate tables via both Kibana and the Elasticsearch API.
16. Elasticsearch BUCKETS
●In Elasticsearch parlance, we put the rows into BUCKETS, with one BUCKET for each country.
●Consider a query for total hits. The Elasticsearch API reports that I received ~100k hits in the month of June.
●Add the hits to BUCKETS!
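A minimal sketch of that "total hits" query, assuming an illustrative @timestamp date field alongside the hits field used earlier (the field names and the date range are assumptions for illustration only):

import requests

# A date range filter plus a single sum metric: one number for the whole month.
query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "2024-06-01", "lt": "2024-07-01"}}},
    "aggs": {"total_hits": {"sum": {"field": "hits"}}},
}
resp = requests.post("http://localhost:9200/timezone_hits/_search", json=query)
print(resp.json()["aggregations"]["total_hits"]["value"])

The following slides split this single number into per-bucket totals instead.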
19. Time Buckets
●Time BUCKETS
○Analyze, summarize and visualize time series data.
○BUCKETS return a smaller amount of data, for example, hits per hour
○We need BUCKETS because we cannot plot every datum from a big data document store
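A minimal sketch of a time BUCKET aggregation that returns hits per hour, again assuming the illustrative @timestamp and hits fields on the timezone_hits index:

import requests

query = {
    "size": 0,
    "aggs": {
        "hits_per_hour": {
            "date_histogram": {"field": "@timestamp", "calendar_interval": "hour"},
            "aggs": {"total_hits": {"sum": {"field": "hits"}}},
        }
    },
}
resp = requests.post("http://localhost:9200/timezone_hits/_search", json=query)

# One bucket per hour, each small enough to plot.
for bucket in resp.json()["aggregations"]["hits_per_hour"]["buckets"]:
    print(bucket["key_as_string"], bucket["total_hits"]["value"])

Kibana issues the same kind of date_histogram aggregation under the hood when it draws a time series chart.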
23. Conclusion
●Summarize Series
○Sum, Median, etc.
○RDBMS Uses GROUP BY
●Elasticsearch Uses AGGS
○Separate data by tag via BUCKETS
○Country Buckets, Time Buckets, Month Buckets