13. Druid Concepts
What is Druid?
• OLAP
• Columnar
• Real-Time
• Distributed
• Optimized for Aggregations
• Optimized for query performance
• Horizontally scalable
14. Druid Concepts
What it’s not:
• Relational - (no joins!)
– lookup feature as partial solution
• Good at ad-hoc row retrieval
• Fully ACID
– no transactions
– eventually consistent
• Easy to change/delete data
• Simple
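The "lookup as a partial solution" point can be illustrated with a toy sketch. This is plain Python, not Druid's lookup API: the table `COUNTRY_LOOKUP`, the sample rows, and the function name are all hypothetical. It only shows the idea of replacing a dimension value through a key-value map at query time instead of joining against a second table.

```python
# Toy illustration only: hypothetical names, not Druid's API.
from collections import Counter

# Hypothetical lookup table: country code -> display name.
COUNTRY_LOOKUP = {"US": "United States", "IL": "Israel"}

# Hypothetical fact rows, as they might sit in a data source.
rows = [
    {"country": "US", "clicks": 3},
    {"country": "IL", "clicks": 5},
    {"country": "US", "clicks": 2},
    {"country": "FR", "clicks": 1},  # no entry in the lookup table
]


def clicks_by_country_name(rows, lookup, missing="unknown"):
    """Sum clicks per looked-up country name, without joining a table."""
    totals = Counter()
    for row in rows:
        # Map the raw dimension value through the key-value lookup.
        name = lookup.get(row["country"], missing)
        totals[name] += row["clicks"]
    return dict(totals)


result = clicks_by_country_name(rows, COUNTRY_LOOKUP)
print(result)
```

A key-value map like this covers simple dimension enrichment, which is why it is only a partial substitute for a relational join.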
15. Data Model
• Cluster - Instance of Druid system
• Data Source - Collection of data, equivalent to a table
• Segment - Immutable, timestamped part of a data source (stored as a file)
• Shard / Partition - For big segments, enabling partitioning splits the segment into several parts
Example:
• Data Source - “Tracker_Stats_Zynga”
– Segment May-17: Partitions P0, P1
– Segment June-17: Partitions P0, P1
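The data source / segment / partition hierarchy can be sketched with a few Python dataclasses. This is a minimal model for illustration, not how Druid stores metadata: the class names, the `partition_ids` helper, and the identifier format are assumptions. The example values come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Partition:
    """One shard of a segment (P0, P1, ...)."""
    number: int


@dataclass(frozen=True)
class Segment:
    """Immutable, time-bounded part of a data source."""
    interval: str
    partitions: Tuple[Partition, ...]


@dataclass
class DataSource:
    """Collection of segments, roughly comparable to a table."""
    name: str
    segments: List[Segment] = field(default_factory=list)

    def partition_ids(self) -> List[str]:
        # Hypothetical identifier format, chosen for readability only.
        return [
            f"{self.name}/{seg.interval}/P{part.number}"
            for seg in self.segments
            for part in seg.partitions
        ]


tracker = DataSource(
    "Tracker_Stats_Zynga",
    [
        Segment("May-17", (Partition(0), Partition(1))),
        Segment("June-17", (Partition(0), Partition(1))),
    ],
)
print(tracker.partition_ids())
```

Segments and partitions are frozen here to mirror the immutability noted on the slide. New data would be modeled as a new segment, not as a change to an existing one.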
16. Druid Concepts
• Deep Storage (segment store) - S3 / HDFS
• Indexing Service (batch / real-time) - Overlord, MiddleManagers
• Query Layer - Brokers, Historicals
• Management - Coordinator, ZooKeeper, Metadata-DB
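The split between Brokers and Historicals in the query layer can be sketched as a scatter/gather. This is a toy model in plain Python, not Druid code: the node names, the in-memory "segments", and both functions are hypothetical. It assumes the usual division of labor, where a broker fans a query out to the historicals holding the relevant segments and merges their partial results.

```python
# Toy model, not Druid code: each "historical" holds some segments in memory,
# keyed by interval, with a list of metric values per segment.
historicals = {
    "historical-1": {"May-17": [3, 4]},
    "historical-2": {"June-17": [10, 1]},
}


def historical_sum(node_segments, intervals):
    """Partial aggregate computed locally on one historical node."""
    return sum(
        sum(values)
        for interval, values in node_segments.items()
        if interval in intervals
    )


def broker_sum(historicals, intervals):
    """Scatter the query to every node, then merge the partial sums."""
    partials = [
        historical_sum(segments, intervals)
        for segments in historicals.values()
    ]
    return sum(partials)


total = broker_sum(historicals, {"May-17", "June-17"})
may_only = broker_sum(historicals, {"May-17"})
print(total, may_only)
```

Each node aggregates only its own segments, so adding historicals spreads the work. This is the intuition behind the "horizontally scalable" claim.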