This document summarizes the key components of Apache Druid and how they collaborate. It describes ZooKeeper's role in coordination, the Overlord's role in task management, the Broker's role in query routing, and the Middle Manager's role in ingestion and indexing. Its diagrams illustrate how these components work together to ingest and store distributed data and to answer queries in a scalable way.
1. Apache®, Apache Druid®, Druid®, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
2. peter.marshall@imply.io
20 years in Enterprise Architecture
CRM, EDRM, ERP, EIP, Digital Services,
Security, BI, RI, and MDM
BA Theology (!) and Computer Studies
TOGAF certified
Book collector & A/V buyer
Prime Timeline = proper timeline
#werk
petermarshall.io
4. Query
Distributed execution of SQL / Druid Native queries on the cluster
Ingestion
Ingestion tasks that bring data into Druid from storage and delivery services
Distribution
Replication and distribution of the ingested data according to rules
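A minimal sketch of how the three workloads above might look as HTTP requests. It only builds the requests and does not send them. The endpoint paths are Druid's SQL, task, and retention-rule APIs. The base URL (a local quickstart Router on port 8888), the helper names, and the `wikipedia` datasource are assumptions made for illustration.

```python
import json

# Assumption: a local quickstart cluster with the Router on port 8888.
BASE_URL = "http://localhost:8888"


def sql_query_request(sql):
    """Query: a SQL statement for the Druid SQL API."""
    return {"url": BASE_URL + "/druid/v2/sql", "body": {"query": sql}}


def ingestion_task_request(task_spec):
    """Ingestion: a caller-supplied task spec for the Overlord task API."""
    return {"url": BASE_URL + "/druid/indexer/v1/task", "body": task_spec}


def load_rules_request(datasource, replicas):
    """Distribution: a retention rule list for the Coordinator rules API."""
    rules = [{"type": "loadForever", "tieredReplicants": {"_default_tier": replicas}}]
    return {"url": BASE_URL + "/druid/coordinator/v1/rules/" + datasource, "body": rules}


if __name__ == "__main__":
    # Hypothetical datasource name, used only to show the shape of the request.
    example = load_rules_request("wikipedia", 2)
    print(example["url"])
    print(json.dumps(example["body"], indent=2))
```

Each helper returns a plain dict, so the body can be inspected before it is POSTed as JSON with whichever HTTP client is preferred.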
6. ● A job to do
● Compute to do the ingestion
● A place to store optimised data
● A question to answer
● Data to process!
● Compute to answer queries
● Somewhere to put the data that’s near to the query process
● Some rules to follow
29. ZooKeeper
Coordinator
Overlord
Broker
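One way to keep the four components above straight is a small lookup from each to the workload it mainly serves. This is a sketch only. The grouping into workload areas and the one-line role descriptions are a paraphrase, and `COMPONENT_ROLES` and `components_for` are hypothetical names, not part of Druid.

```python
# Hypothetical lookup: component -> (workload area, paraphrased role).
COMPONENT_ROLES = {
    "ZooKeeper": ("coordination", "shared cluster state and leader election"),
    "Coordinator": ("distribution", "assigns segments to data servers according to rules"),
    "Overlord": ("ingestion", "accepts ingestion tasks and assigns them to workers"),
    "Broker": ("query", "fans a query out across the cluster and merges the results"),
}


def components_for(workload):
    """Return the component names whose main area matches the given workload."""
    return sorted(name for name, (area, _) in COMPONENT_ROLES.items() if area == workload)


if __name__ == "__main__":
    for area in ("query", "ingestion", "distribution", "coordination"):
        print(area, components_for(area))
```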
30. ★ A job to do
★ Compute to do the ingestion
★ A place to store optimised data
★ A question to answer
★ Data to process!
★ Compute to answer queries
★ Somewhere to put the data that’s near to the query process
★ Some rules to follow