SlideShare a Scribd company logo
v
Aggregations
The Elasticsearch GROUP BY
John Sobanski – VP of AI/ML
About Me
John Sobanski
○VP AI/ML at Pyramid Systems Inc.
○Elastic Certified Engineer (2020, 2022)
○https://soban.ski
Agenda
●Background - Relational Database Management Systems
●Aggregations – The Elasticsearch GROUP BY
○Why use the Elasticsearch Aggregation API
○1st Demo - Execute GROUP BY operations via Elasticsearch
aggregations
●Elasticsearch Aggregations Drive Time Series Data Visualization
○BUCKETS in Elasticsearch
○2nd Demo - Time Series Data Visualization with Kibana
Relational Database Management System
Series Operations
2 Sum 57
5 Product 40000
20 Min 2
20 Max 20
10 Median 10
Mean 11.4
RDBMS Series Operations
●In the traditional Relational Database Management System (RDBMS) world, SQL databases
use GROUP BY syntax to group rows with similar values into summary rows.
●The query, "find the number of web page hits per country," for example, represents a
typical GROUP BY operation.
This table records the number of hits to my
site [https://soban.ski], broken down by time
zone.
●Further summarize the table to record "hits per country" via
a GROUP BY operation
○Collapse the Time zones into parent countries.
SELECT COUNTRY, SUM(HITS) FROM timezone_hits GROUP BY COUNTRY;
SQL Syntax:
RDBMS vs. Elasticsearch
●Even though Elasticsearch does not use the row construct to identify a unit of data (Elastic calls their
rows Documents), we can still perform GROUP BY queries in Elasticsearch
●Elasticsearch names their GROUP BY queries Aggregations.
●The Elasticsearch API provides an expressive REST API to execute Aggregations
○Kibana also provides a Graphical User Interface (GUI) to execute Aggregations
ElasticSearch
Relational Database
Index
Type
Documents
Fields
Database
Table
Rows
Columns
Aggregations – The Elasticsearch GROUP BY
Why use the Elasticsearch Aggregation API? - 1
●Easy Approach: Query Elasticsearch, convert the returned JSON to a Pandas Dataframe, and then
apply a Pandas GROUP BY to the Dataframe to retrieve summary stats.
●Modern laptops include 32GB of memory and you have had no issues with this method. If you use
Elasticsearch for non time series data, e.g. static data for blogs, you may not need to worry about
running out of memory.
Why use the Elasticsearch Aggregation API? - 2
●In the future, you may deal with Big Data. If you collect time series data, such as
access logs, or security logs, you might scale to Big Data. In that case, the
Elasticsearch database size will exceed the memory of your laptop.
Why use the Elasticsearch Aggregation API? - 3
●Aggs API allows you to command Elasticsearch to execute the GROUP
BY analogue in-stu (a best practice), and then also apply the summary stats in
place.
●Elasticsearch will then return the summary stats as JSON, and you will not run out of
memory.
Time Series
●GROUP BY(RDMMS) and Aggregation(Elasticsearch) operations
lend themselves well to Time Series data, since these operations
allow you to GROUP BY or Aggregate results over a given time
bucket (e.g. Hour, Day, Week, Month, etc.).
1st Demo
●Execute GROUP BY operations via Elasticsearch aggregations
●Demonstrate how to generate tables via both Kibana and the
Elasticsearch API.
Elasticsearch Aggregations Drive Time Series
Data Visualization
Elasticsearch BUCKETS
●In Elasticsearch parlance, we put the rows into BUCKETS, with one BUCKET for each country.
●Consider a query for total hits. The Elasticsearch API reports that I received ~100k hits in the
month of June.
●Add the hits to BUCKETS !
“Country” Buckets
●Use the AGGS API to count the hits by country.
“Day” Buckets
●Use the API to count the hits by day.
Time Buckets
T
●Time BUCKETS
○Analyze, summarize and visualize time series data.
○BUCKETS return a smaller amount of data, for example, hits per
hour
○We need BUCKETS because we cannot plot every datum from
a big data document store
Why Use Time Buckets?
Why Use Time Buckets?
Demo – Time Series Data Viz with Kibana
Conclusion
●Summarize Series
○Sum, Median, etc.
○RDBMS Uses GROUP BY
●Elasticsearch Uses AGGS
○Separate data by tag via BUCKETS
○Country Buckets, Time Buckets, Month Buckets
Thank You

More Related Content

Similar to Aggregations - The Elasticsearch "GROUP BY"

Visualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and KibanaVisualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and Kibana
ObjectRocket
 
Security sizing meetup
Security sizing meetupSecurity sizing meetup
Security sizing meetup
Daliya Spasova
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)
Ido Green
 
WSO2 Product Release Webinar: WSO2 Data Analytics Server 3.0
WSO2 Product Release Webinar: WSO2 Data Analytics Server 3.0WSO2 Product Release Webinar: WSO2 Data Analytics Server 3.0
WSO2 Product Release Webinar: WSO2 Data Analytics Server 3.0
WSO2
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentation
Yi Pan
 
Getting Started with Elasticsearch
Getting Started with ElasticsearchGetting Started with Elasticsearch
Getting Started with Elasticsearch
Alibaba Cloud
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTO
Riccardo Zamana
 
Solr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World OverSolr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World Over
Alex Pinkin
 
Run your queries 14X faster without any investment!
Run your queries 14X faster without any investment!Run your queries 14X faster without any investment!
Run your queries 14X faster without any investment!
Knoldus Inc.
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Grega Kespret
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
Amazon Web Services
 
20180522 infra autoscaling_system
20180522 infra autoscaling_system20180522 infra autoscaling_system
20180522 infra autoscaling_system
Kai Sasaki
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
DataStax
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
Amazon Web Services
 
No SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability MeetupNo SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability Meetup
Hyderabad Scalability Meetup
 
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
Amazon Web Services Korea
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Databricks
 
Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...
Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...
Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...
Amazon Web Services
 
Real-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta CachingReal-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta Caching
Databricks
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
NETWAYS
 

Similar to Aggregations - The Elasticsearch "GROUP BY" (20)

Visualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and KibanaVisualizing Austin's data with Elasticsearch and Kibana
Visualizing Austin's data with Elasticsearch and Kibana
 
Security sizing meetup
Security sizing meetupSecurity sizing meetup
Security sizing meetup
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)
 
WSO2 Product Release Webinar: WSO2 Data Analytics Server 3.0
WSO2 Product Release Webinar: WSO2 Data Analytics Server 3.0WSO2 Product Release Webinar: WSO2 Data Analytics Server 3.0
WSO2 Product Release Webinar: WSO2 Data Analytics Server 3.0
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentation
 
Getting Started with Elasticsearch
Getting Started with ElasticsearchGetting Started with Elasticsearch
Getting Started with Elasticsearch
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTO
 
Solr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World OverSolr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World Over
 
Run your queries 14X faster without any investment!
Run your queries 14X faster without any investment!Run your queries 14X faster without any investment!
Run your queries 14X faster without any investment!
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
 
20180522 infra autoscaling_system
20180522 infra autoscaling_system20180522 infra autoscaling_system
20180522 infra autoscaling_system
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
 
No SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability MeetupNo SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability Meetup
 
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
AI 클라우드로 완전 정복하기 - 데이터 분석부터 딥러닝까지 (윤석찬, AWS테크에반젤리스트)
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 
Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...
Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...
Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...
 
Real-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta CachingReal-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta Caching
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 

Recently uploaded

一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
9gr6pty
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
Rebecca Bilbro
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
Timothy Spann
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
nhutnguyen355078
 
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
yuvarajkumar334
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
Bangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts ServiceBangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts Service
nhero3888
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
perranet1
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
Vineet
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
Timothy Spann
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
dataschool1
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
22ad0301
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 

Recently uploaded (20)

一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
 
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
Bangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts ServiceBangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts Service
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 

Aggregations - The Elasticsearch "GROUP BY"

  • 1. v Aggregations The Elasticsearch GROUP BY John Sobanski – VP of AI/ML
  • 2. About Me John Sobanski ○VP AI/ML at Pyramid Systems Inc. ○Elastic Certified Engineer (2020, 2022) ○https://soban.ski
  • 3. Agenda ●Background - Relational Database Management Systems ●Aggregations – The Elasticsearch GROUP BY ○Why use the Elasticsearch Aggregation API ○1st Demo - Execute GROUP BY operations via Elasticsearch aggregations ●Elasticsearch Aggregations Drive Time Series Data Visualization ○BUCKETS in Elasticsearch ○2nd Demo - Time Series Data Visualization with Kibana
  • 5. Series Operations 2 Sum 57 5 Product 40000 20 Min 2 20 Max 20 10 Median 10 Mean 11.4
  • 6. RDBMS Series Operations ●In the traditional Relational Database Management System (RDBMS) world, SQL databases use GROUP BY syntax to group rows with similar values into summary rows. ●The query, "find the number of web page hits per country," for example, represents a typical GROUP BY operation. This table records the number of hits to my site [https://soban.ski], broken down by time zone.
  • 7. ●Further summarize the table to record "hits per country" via a GROUP BY operation ○Collapse the Time zones into parent countries. SELECT COUNTRY, SUM(HITS) FROM timezone_hits GROUP BY COUNTRY; SQL Syntax:
  • 8. RDBMS vs. Elasticsearch ●Even though Elasticsearch does not use the row construct to identify a unit of data (Elastic calls their rows Documents), we can still perform GROUP BY queries in Elasticsearch ●Elasticsearch names their GROUP BY queries Aggregations. ●The Elasticsearch API provides an expressive REST API to execute Aggregations ○Kibana also provides a Graphical User Interface (GUI) to execute Aggregations ElasticSearch Relational Database Index Type Documents Fields Database Table Rows Columns
  • 9. Aggregations – The Elasticsearch GROUP BY
  • 10. Why use the Elasticsearch Aggregation API? - 1 ●Easy Approach: Query Elasticsearch, convert the returned JSON to a Pandas Dataframe, and then apply a Pandas GROUP BY to the Dataframe to retrieve summary stats. ●Modern laptops include 32GB of memory and you have had no issues with this method. If you use Elasticsearch for non time series data, e.g. static data for blogs, you may not need to worry about running out of memory.
  • 11. Why use the Elasticsearch Aggregation API? - 2 ●In the future, you may deal with Big Data. If you collect time series data, such as access logs, or security logs, you might scale to Big Data. In that case, the Elasticsearch database size will exceed the memory of your laptop.
  • 12. Why use the Elasticsearch Aggregation API? - 3 ●Aggs API allows you to command Elasticsearch to execute the GROUP BY analogue in-stu (a best practice), and then also apply the summary stats in place. ●Elasticsearch will then return the summary stats as JSON, and you will not run out of memory.
  • 13. Time Series ●GROUP BY(RDMMS) and Aggregation(Elasticsearch) operations lend themselves well to Time Series data, since these operations allow you to GROUP BY or Aggregate results over a given time bucket (e.g. Hour, Day, Week, Month, etc.).
  • 14. 1st Demo ●Execute GROUP BY operations via Elasticsearch aggregations ●Demonstrate how to generate tables via both Kibana and the Elasticsearch API.
  • 15. Elasticsearch Aggregations Drive Time Series Data Visualization
  • 16. Elasticsearch BUCKETS ●In Elasticsearch parlance, we put the rows into BUCKETS, with one BUCKET for each country. ●Consider a query for total hits. The Elasticsearch API reports that I received ~100k hits in the month of June. ●Add the hits to BUCKETS !
  • 17. “Country” Buckets ●Use the AGGS API to count the hits by country.
  • 18. “Day” Buckets ●Use the API to count the hits by day.
  • 19. Time Buckets T ●Time BUCKETS ○Analyze, summarize and visualize time series data. ○BUCKETS return a smaller amount of data, for example, hits per hour ○We need BUCKETS because we cannot plot every datum from a big data document store
  • 20. Why Use Time Buckets?
  • 21. Why Use Time Buckets?
  • 22. Demo – Time Series Data Viz with Kibana
  • 23. Conclusion ●Summarize Series ○Sum, Median, etc. ○RDBMS Uses GROUP BY ●Elasticsearch Uses AGGS ○Separate data by tag via BUCKETS ○Country Buckets, Time Buckets, Month Buckets