SlideShare a Scribd company logo
1 of 16
Download to read offline
Druid,
or There and Back Again
Is it about woods creature?
Nope. It’s about…
a data store designed for OLAP queries on event
(time-series) data.
2
Boring facts
• Open-source, community-driven project
• ~3400 stars, ~7300 commits on Github ATM
• Written in Java
• Very modular / extendable thanks to Guice
3
What it does?
• Ingests stream / batch time-series data and
splits it into segments representing configured
time interval
• During ingestion, it performs aggregation using
provided algorithms to create metrics
• Allows to perform several types of queries over
served segments via nice HTTP API (here you
can do simple aggregations over metrics)
4
How it works?
5
How it works?
Druid consists of several types of services (hereinafter,
nodes) which together make it complete working system:
• Broker
• Historical
• Real-time
• Coordinator
• Overlord, Middle Manager, Peons, etc. (aka boring
nodes)
6
How it works?
It also has three external dependencies:
• Apache ZooKeeper for configuration management,
leader election and data flow organisation
(coordinator - historical communication)
• Metadata Storage such as MySQL, PostgreSQL used
to store (guess what?) various metadata about the
system
• Deep Storage where all the compressed data is stored
(S3, HDFS, FS)
7
Nodes: Real-time
It is responsible for ingestion of streaming data.
Also it exposes HTTP API for querying, so data is
available for querying right after processing /
aggregation.
That is how Druid allows querying of real-time
data.
*: by real-time here we mean something which happened ~0-15 seconds ago.
8
Nodes: Historical
This guy is responsible for loading data
(hereinafter, segments) from Deep Storage and
serving this data via HTTP API (same as Real-time
dude).
Historical Node uses ZooKeeper to understand
what segments to load.
It uncompresses segment data and caches it
locally on FS.
9
Nodes: Coordinator
Node which manages segments and coordinates
their distribution on historical nodes. It uses
ZooKeeper for:
• getting current cluster state
• assigning segments to Historical Nodes
Loading and dropping of segments is managed
via Rules stored in Metadata Storage.
10
Nodes: Broker
This is the only Node which is usually touched by
client applications, because it exposes HTTP API
for querying segments data.
It splits incoming query onto smaller ones based
on segments data stored ZooKeeper and queries
corresponding Historical and Real-time Nodes.
11
Nodes: Overlord
Indexing service powers Druid’s batch data
ingestion. It consists of three node types: Overlord,
Middle Manager and Peons.
Overlord accepts indexing tasks and distributes
those to Middle Manager Nodes. It communicates
the latter ones through ZooKeeper.
It provides simple HTTP API to create, shutdown
and view status of indexing tasks.
12
Nodes: Middle-Manager
Middle-manager executes submitted indexing
tasks. It creates separate JVMs i.e. Peon Node
processes for this. Each Peon runs one task at a
time.
MM retrieves indexing tasks from ZooKeeper.
Then it stores indexing task as JSON file and runs
Peon providing it with path to the indexing task file.
After processing Peon stores segments data in
Deep Storage.
13
Live example
• Run all nodes locally
• Local FS is used for Deep Storage
• Derby is used for Metadata Storage
• ZooKeeper … well, it’s ZooKeeper, gotta run it
• Kafka 0.8.7.6.5.4.3.2.1 for streaming data <3
• Various Bash and Node scripts for loading data
• Imply Pivot for visualisation of queried results
14
Problems?
Everybody can tell good sides of Druid. Issues
time!
• DevOps needed to spin up all the Nodes and
dependencies (Did we clone Dmytro already?)
• Limited number of aggregations
• Druid loves cookies… no, space, it needs space,
more space, and it’s not enough anyways.
15
Thank you
Questions?
Fun time?
Beer for me?
16

More Related Content

What's hot

Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Globus
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaSpringPeople
 
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with GlobusGateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with GlobusGlobus
 
What you need to know for postgresql operation
What you need to know for postgresql operationWhat you need to know for postgresql operation
What you need to know for postgresql operationAnton Bushmelev
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELKGeert Pante
 
High Available Task Scheduling Design using Kafka and Kafka Streams | Naveen ...
High Available Task Scheduling Design using Kafka and Kafka Streams | Naveen ...High Available Task Scheduling Design using Kafka and Kafka Streams | Naveen ...
High Available Task Scheduling Design using Kafka and Kafka Streams | Naveen ...HostedbyConfluent
 
Gateways 2020 Tutorial - Instrument Data Distribution with Globus
Gateways 2020 Tutorial - Instrument Data Distribution with GlobusGateways 2020 Tutorial - Instrument Data Distribution with Globus
Gateways 2020 Tutorial - Instrument Data Distribution with GlobusGlobus
 
Big Data and Machine Learning with FIWARE
Big Data and Machine Learning with FIWAREBig Data and Machine Learning with FIWARE
Big Data and Machine Learning with FIWAREFernando Lopez Aguilar
 
Gateways 2020 Tutorial - Introduction to Globus
Gateways 2020 Tutorial - Introduction to GlobusGateways 2020 Tutorial - Introduction to Globus
Gateways 2020 Tutorial - Introduction to GlobusGlobus
 
Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)Globus
 
from source to solution - building a system for event-oriented data
from source to solution - building a system for event-oriented datafrom source to solution - building a system for event-oriented data
from source to solution - building a system for event-oriented dataEric Sammer
 

What's hot (12)

Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
 
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with GlobusGateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
 
What you need to know for postgresql operation
What you need to know for postgresql operationWhat you need to know for postgresql operation
What you need to know for postgresql operation
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELK
 
High Available Task Scheduling Design using Kafka and Kafka Streams | Naveen ...
High Available Task Scheduling Design using Kafka and Kafka Streams | Naveen ...High Available Task Scheduling Design using Kafka and Kafka Streams | Naveen ...
High Available Task Scheduling Design using Kafka and Kafka Streams | Naveen ...
 
Gateways 2020 Tutorial - Instrument Data Distribution with Globus
Gateways 2020 Tutorial - Instrument Data Distribution with GlobusGateways 2020 Tutorial - Instrument Data Distribution with Globus
Gateways 2020 Tutorial - Instrument Data Distribution with Globus
 
Big Data and Machine Learning with FIWARE
Big Data and Machine Learning with FIWAREBig Data and Machine Learning with FIWARE
Big Data and Machine Learning with FIWARE
 
Gateways 2020 Tutorial - Introduction to Globus
Gateways 2020 Tutorial - Introduction to GlobusGateways 2020 Tutorial - Introduction to Globus
Gateways 2020 Tutorial - Introduction to Globus
 
Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)
 
from source to solution - building a system for event-oriented data
from source to solution - building a system for event-oriented datafrom source to solution - building a system for event-oriented data
from source to solution - building a system for event-oriented data
 
Zookeeper
ZookeeperZookeeper
Zookeeper
 

Similar to Druid Time-Series Data Store Explained

Web Basics
Web BasicsWeb Basics
Web BasicsHui Xie
 
ThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptxThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptxGrace Jansen
 
Integrate Ruby on Rails with Avectra's NetFORUM xWeb API.
Integrate Ruby on Rails with Avectra's NetFORUM xWeb API.Integrate Ruby on Rails with Avectra's NetFORUM xWeb API.
Integrate Ruby on Rails with Avectra's NetFORUM xWeb API.Thomas Vendetta
 
Lessons Learned from Real-World Deployments of Java EE 7 at JavaOne 2014
Lessons Learned from Real-World Deployments of Java EE 7 at JavaOne 2014Lessons Learned from Real-World Deployments of Java EE 7 at JavaOne 2014
Lessons Learned from Real-World Deployments of Java EE 7 at JavaOne 2014Arun Gupta
 
Elements for an iOS Backend
Elements for an iOS BackendElements for an iOS Backend
Elements for an iOS BackendLaurent Cerveau
 
Distributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.ioDistributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.ioMax Klyga
 
Oracle WebLogic Diagnostics & Perfomance tuning
Oracle WebLogic Diagnostics & Perfomance tuningOracle WebLogic Diagnostics & Perfomance tuning
Oracle WebLogic Diagnostics & Perfomance tuningMichel Schildmeijer
 
NetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksNetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksRuslan Meshenberg
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
Meetup on Apache Zookeeper
Meetup on Apache ZookeeperMeetup on Apache Zookeeper
Meetup on Apache ZookeeperAnshul Patel
 
Solr at zvents 6 years later & still going strong
Solr at zvents   6 years later & still going strongSolr at zvents   6 years later & still going strong
Solr at zvents 6 years later & still going stronglucenerevolution
 
Real time monitoring of hadoop and spark workflows
Real time monitoring of hadoop and spark workflowsReal time monitoring of hadoop and spark workflows
Real time monitoring of hadoop and spark workflowsShankar Manian
 
Exploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservicesExploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservices💡 Tomasz Kogut
 
Сергей Радзыняк ".NET Microservices in Real Life"
Сергей Радзыняк ".NET Microservices in Real Life"Сергей Радзыняк ".NET Microservices in Real Life"
Сергей Радзыняк ".NET Microservices in Real Life"Fwdays
 
Oracle Fuson Middleware Diagnostics, Performance and Troubleshoot
Oracle Fuson Middleware Diagnostics, Performance and TroubleshootOracle Fuson Middleware Diagnostics, Performance and Troubleshoot
Oracle Fuson Middleware Diagnostics, Performance and TroubleshootMichel Schildmeijer
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Codemotion
 
AvalancheProject2012
AvalancheProject2012AvalancheProject2012
AvalancheProject2012fishetra
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Lucidworks
 

Similar to Druid Time-Series Data Store Explained (20)

Web Basics
Web BasicsWeb Basics
Web Basics
 
ThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptxThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptx
 
Integrate Ruby on Rails with Avectra's NetFORUM xWeb API.
Integrate Ruby on Rails with Avectra's NetFORUM xWeb API.Integrate Ruby on Rails with Avectra's NetFORUM xWeb API.
Integrate Ruby on Rails with Avectra's NetFORUM xWeb API.
 
Lessons Learned from Real-World Deployments of Java EE 7 at JavaOne 2014
Lessons Learned from Real-World Deployments of Java EE 7 at JavaOne 2014Lessons Learned from Real-World Deployments of Java EE 7 at JavaOne 2014
Lessons Learned from Real-World Deployments of Java EE 7 at JavaOne 2014
 
Elements for an iOS Backend
Elements for an iOS BackendElements for an iOS Backend
Elements for an iOS Backend
 
Distributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.ioDistributed tracing with OpenTracing and Jaeger @ getstream.io
Distributed tracing with OpenTracing and Jaeger @ getstream.io
 
Oracle WebLogic Diagnostics & Perfomance tuning
Oracle WebLogic Diagnostics & Perfomance tuningOracle WebLogic Diagnostics & Perfomance tuning
Oracle WebLogic Diagnostics & Perfomance tuning
 
NetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksNetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talks
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Meetup on Apache Zookeeper
Meetup on Apache ZookeeperMeetup on Apache Zookeeper
Meetup on Apache Zookeeper
 
Solr at zvents 6 years later & still going strong
Solr at zvents   6 years later & still going strongSolr at zvents   6 years later & still going strong
Solr at zvents 6 years later & still going strong
 
Real time monitoring of hadoop and spark workflows
Real time monitoring of hadoop and spark workflowsReal time monitoring of hadoop and spark workflows
Real time monitoring of hadoop and spark workflows
 
Exploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservicesExploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservices
 
Сергей Радзыняк ".NET Microservices in Real Life"
Сергей Радзыняк ".NET Microservices in Real Life"Сергей Радзыняк ".NET Microservices in Real Life"
Сергей Радзыняк ".NET Microservices in Real Life"
 
Oracle Fuson Middleware Diagnostics, Performance and Troubleshoot
Oracle Fuson Middleware Diagnostics, Performance and TroubleshootOracle Fuson Middleware Diagnostics, Performance and Troubleshoot
Oracle Fuson Middleware Diagnostics, Performance and Troubleshoot
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
 
AvalancheProject2012
AvalancheProject2012AvalancheProject2012
AvalancheProject2012
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 

Recently uploaded

DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 

Recently uploaded (20)

DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 

Druid Time-Series Data Store Explained

  • 1. Druid, or There and Back Again
  • 2. Is it about woods creature? Nope. It’s about… a data store designed for OLAP queries on event (time-series) data. 2
  • 3. Boring facts • Open-source, community-driven project • ~3400 stars, ~7300 commits on Github ATM • Written in Java • Very modular / extendable thanks to Guice 3
  • 4. What it does? • Ingests stream / batch time-series data and splits it into segments representing configured time interval • During ingestion, it performs aggregation using provided algorithms to create metrics • Allows to perform several types of queries over served segments via nice HTTP API (here you can do simple aggregations over metrics) 4
  • 6. How it works? Druid consists of several types of services (hereinafter, nodes) which together make it complete working system: • Broker • Historical • Real-time • Coordinator • Overlord, Middle Manager, Peons, etc. (aka boring nodes) 6
  • 7. How it works? It also has three external dependencies: • Apache ZooKeeper for configuration management, leader election and data flow organisation (coordinator - historical communication) • Metadata Storage such as MySQL, PostgreSQL used to store (guess what?) various metadata about the system • Deep Storage where all the compressed data is stored (S3, HDFS, FS) 7
  • 8. Nodes: Real-time It is responsible for ingestion of streaming data. Also it exposes HTTP API for querying, so data is available for querying right after processing / aggregation. That is how Druid allows querying of real-time data. *: by real-time here we mean something which happened ~0-15 seconds ago. 8
  • 9. Nodes: Historical This guy is responsible for loading data (hereinafter, segments) from Deep Storage and serving this data via HTTP API (same as Real-time dude). Historical Node uses ZooKeeper to understand what segments to load. It uncompresses segment data and caches it locally on FS. 9
  • 10. Nodes: Coordinator Node which manages segments and coordinates their distribution on historical nodes. It uses ZooKeeper for: • getting current cluster state • assigning segments to Historical Nodes Loading and dropping of segments is managed via Rules stored in Metadata Storage. 10
  • 11. Nodes: Broker This is the only Node which is usually touched by client applications, because it exposes HTTP API for querying segments data. It splits incoming query onto smaller ones based on segments data stored ZooKeeper and queries corresponding Historical and Real-time Nodes. 11
  • 12. Nodes: Overlord Indexing service powers Druid’s batch data ingestion. It consists of three node types: Overlord, Middle Manager and Peons. Overlord accepts indexing tasks and distributes those to Middle Manager Nodes. It communicates the latter ones through ZooKeeper. It provides simple HTTP API to create, shutdown and view status of indexing tasks. 12
  • 13. Nodes: Middle-Manager Middle-manager executes submitted indexing tasks. It creates separate JVMs i.e. Peon Node processes for this. Each Peon runs one task at a time. MM retrieves indexing tasks from ZooKeeper. Then it stores indexing task as JSON file and runs Peon providing it with path to the indexing task file. After processing Peon stores segments data in Deep Storage. 13
  • 14. Live example • Run all nodes locally • Local FS is used for Deep Storage • Derby is used for Metadata Storage • ZooKeeper … well, it’s ZooKeeper, gotta run it • Kafka 0.8.7.6.5.4.3.2.1 for streaming data <3 • Various Bash and Node scripts for loading data • Imply Pivot for visualisation of queried results 14
  • 15. Problems? Everybody can tell good sides of Druid. Issues time! • DevOps needed to spin up all the Nodes and dependencies (Did we clone Dmytro already?) • Limited number of aggregations • Druid loves cookies… no, space, it needs space, more space, and it’s not enough anyways. 15