Druid Ingestion: From 3 hr to 5 min
Shivji Kumar Jha, Staff Engineer, Nutanix
Sachidananda Maharana, MTS 4, Nutanix
Challenges, Mitigations & Learnings
About Us
Shivji Kumar Jha
Staff Engineer,
CPaaS Data Platform,
Nutanix
Sachidananda Maharana
Sr Engineer / OLAP Ninja
CPaaS Team, Nutanix
● Software Engineer & Regular Speaker at Meetups
● Excited about:
○ Distributed Databases & Streaming
○ Open-Source Software & Communities
○ MySQL, Postgres, Pulsar/NATS, Druid/ClickHouse
● Regular Platform Engineer
● Excited about:
○ Distributed OLAP Databases
○ Open-Source Software & Communities
Contents
Druid 101
How we use Druid
Re-architecture: What & Why
Impact on Druid components
How we fixed the issues
State of bugs we filed / fixed
Druid 101
• Open-source, Apache 2.0 licensed, under the Apache Software Foundation
• Columnar data store designed for high performance
• Supports real-time and batch ingestion
• Segment-oriented storage
• Distributed, modular architecture; horizontally scalable for most components
• Supports data tiering – keep cold data on cheaper storage!
What we love about Druid!
Modularity – Separation of concerns
Modularity – Simplicity*: easy to deploy, upgrade, migrate, manage
Modularity – Flexibility: scale only what you need; retain data based on retention rules per tier
Modularity – Built for the cloud
Durability – Object store (S3 or Nutanix Objects, for instance) for deep storage
Durability – SQL database for metadata
Admin dashboard – easier debugging and monitoring
Druid 101 (architecture diagram: write and read paths)
Ingestion & Query Patterns
● IPFIX log files are collected from clouds.
○ IPFIX: IP Flow Information Export
○ Summarizes network data packets to track IP actions
● We enrich the data and store it in an S3 bucket.
● S3 data is ingested into Druid (see the sketch below).
● Serves analytics dashboards in a slice-and-dice manner.
● Also feeds an ML engine.
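For illustration, a minimal sketch of the kind of native batch (index_parallel) spec such a pipeline can submit to the overlord. The bucket, datasource, column names, and overlord URL are made up, not our production values.

```python
import json
import requests

OVERLORD = "http://overlord:8090"          # hypothetical Druid overlord URL
DATASOURCE = "customer_1_ipfix"            # hypothetical datasource name

# Minimal index_parallel spec reading enriched IPFIX files from S3.
ingestion_spec = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {
                "type": "s3",
                "uris": ["s3://my-bucket/enriched/2023-10-01T00-05/part-0.json"],
            },
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": DATASOURCE,
            "timestampSpec": {"column": "flow_ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["src_ip", "dst_ip", "action"]},
            "granularitySpec": {
                "segmentGranularity": "hour",
                "queryGranularity": "minute",
            },
        },
        "tuningConfig": {"type": "index_parallel"},
    },
}

# Submit the task to the overlord's task API.
resp = requests.post(
    f"{OVERLORD}/druid/indexer/v1/task",
    data=json.dumps(ingestion_spec),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
print("submitted task:", resp.json()["task"])
```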
Druid in Numbers: 3+ years in prod (last 24 hrs; cluster size)
Data Model for our Apps
● Analytics apps as part of the Nutanix dashboard
● Customers can slice and dice data given some filters
● Multi-tenant use case
● One Druid datasource per customer per use case (see the sketch below)
● Enable features only for some datasources:
○ Phased rollout of new Druid features
○ Druid version upgrades
○ App redesigns requiring changes in Druid ingestion or queries
● Workflow engine (Temporal) for the pipeline
● Java-based workers backed by Postgres for state
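A toy illustration (all names are made up) of the "one Druid datasource per customer per use case" convention and per-datasource feature flags used for phased rollouts. In our setup the real state lives in Postgres behind the Temporal workers; this only sketches the idea.

```python
from dataclasses import dataclass, field

@dataclass
class DatasourceConfig:
    customer_id: str
    use_case: str                         # e.g. "ipfix_analytics"
    features: set = field(default_factory=set)

    @property
    def datasource(self) -> str:
        # One Druid datasource per customer per use case.
        return f"{self.use_case}_{self.customer_id}"

    def is_enabled(self, feature: str) -> bool:
        return feature in self.features

cfg = DatasourceConfig("cust_42", "ipfix_analytics", {"new_5min_pipeline"})
print(cfg.datasource)                        # ipfix_analytics_cust_42
print(cfg.is_enabled("new_5min_pipeline"))   # True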
Change in Requirements
● Change in requirement: batch (3 hours) to 5 minutes
● Earlier:
○ Agent collects data, dumps to S3.
○ Cron runs every 3 hours, ingests from S3 to Druid.
○ SLA: 3 hours
● New design (see the sketch after this list):
○ SLA: 15 minutes
○ Agent collects data, dumps to S3 every 5 minutes.
○ Ingestion pipeline ingests into Druid at a pace Druid can keep up with.
○ Ingestion pipeline absorbs backpressure.
● Release plan:
○ Datasources onboarded to the cluster in a phased manner
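A simplified sketch of the new 5-minute loop. Our production pipeline is a Temporal workflow with Java workers; this compresses it into a few lines of Python, and the overlord URL, helper names, and headroom check are illustrative, not our exact code.

```python
import json
import time
import requests

OVERLORD = "http://overlord:8090"      # hypothetical overlord URL
QUEUE_HEADROOM = 5000                  # mirrors druid.indexer.queue.maxSize

def discover_new_batches() -> dict:
    """Placeholder: map datasource -> list of newly landed S3 URIs."""
    return {}

def build_spec(datasource: str, uris: list) -> dict:
    """Placeholder: build an index_parallel spec (see the earlier sketch)."""
    return {}

def queued_tasks() -> int:
    # Waiting + pending tasks give a rough picture of how loaded the overlord is.
    waiting = requests.get(f"{OVERLORD}/druid/indexer/v1/waitingTasks").json()
    pending = requests.get(f"{OVERLORD}/druid/indexer/v1/pendingTasks").json()
    return len(waiting) + len(pending)

while True:
    for datasource, uris in discover_new_batches().items():
        if queued_tasks() >= QUEUE_HEADROOM:
            break                      # absorb backpressure; retry next cycle
        requests.post(
            f"{OVERLORD}/druid/indexer/v1/task",
            data=json.dumps(build_spec(datasource, uris)),
            headers={"Content-Type": "application/json"},
        ).raise_for_status()
    time.sleep(300)                    # 5-minute cadence
```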
Before: old batch system (cron every 3 hrs)
Change: batch to near-real-time system (cron every 5 mins; nudge the state machine, absorb backpressure)
Batch to near-real-time system: cron every 5 mins feeding Druid ingestion tasks
(Diagrams: datasources 1 through N onboarded to the Druid database in phases)
Proof of the Pudding! (parts 1–3)
Summary: When Druid was struggling (Overlord on fire)
● Ingested smaller but far more numerous tasks.
● Onboarded a few large datasources; fine for a day.
● More confidence!
● Onboarded all datasources at once:
○ Task queue kept growing (up to 25K). Overlord overwhelmed after 5K.
○ Soon, the overlord machine's CPU usage hit 100%.
● All tasks were stuck in the pending state.
● Task count was 12x higher than before, but each task was smaller.
● Middle managers were sitting idle with no incoming tasks.
● Task states were not updating properly as the overlord was overwhelmed.
Getting the Overlord Alive
(Diagrams: Overlord process with the Druid database; first a bigger VM for the overlord, then a bigger DB instance as well)
Handling the Overlord…
● Vertically scaled the overlord. Didn't help! No support for horizontal scaling.
● Changed configs:
○ druid.indexer.runner.type: httpRemote (no ZooKeeper for task assignment)
○ druid.indexer.queue.maxSize: 5000 (throttle, don't give up)
● Set max pending tasks per datasource for an interval to 1:
GET /druid/indexer/v1/pendingTasks?datasource=ds1
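A minimal sketch of that per-datasource throttle: before submitting another task for a datasource, check its pending tasks on the overlord and allow at most one. The URL and datasource name are illustrative.

```python
import requests

OVERLORD = "http://overlord:8090"   # hypothetical overlord URL

def can_submit(datasource: str, max_pending: int = 1) -> bool:
    resp = requests.get(
        f"{OVERLORD}/druid/indexer/v1/pendingTasks",
        params={"datasource": datasource},
    )
    resp.raise_for_status()
    return len(resp.json()) < max_pending

if can_submit("ds1"):
    print("ds1 has headroom; submit the next ingestion task")
else:
    print("ds1 already has a pending task; wait for the next cycle")
```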
Filed GitHub issues so you don’t hit these…
Fixed Issues (PR)
Making the DB functional…
● Queries from the overlord to Postgres for task metadata were taking a long time.
● Added more CPU to the DB server.
● Improvements:
○ Lower overlord CPU utilization
○ Fewer pending tasks
○ Task slot utilization graph looks stable
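Not from the deck, but a hedged diagnostic sketch of how one can confirm that the overlord's task-metadata queries are the slow ones, assuming pg_stat_statements is enabled on the metadata DB (PostgreSQL 13+ column names) and the default druid_ table prefix; connection details are made up.

```python
import psycopg2

conn = psycopg2.connect("dbname=druid user=druid host=metadata-db")
with conn, conn.cursor() as cur:
    # Slowest statements touching the task metadata table.
    cur.execute(
        """
        SELECT mean_exec_time, calls, left(query, 80)
        FROM pg_stat_statements
        WHERE query ILIKE '%druid_tasks%'
        ORDER BY mean_exec_time DESC
        LIMIT 10;
        """
    )
    for mean_ms, calls, query in cur.fetchall():
        print(f"{mean_ms:10.1f} ms  {calls:8d} calls  {query}")
```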
Scaling Middle Managers
(Diagrams: middle managers running peon processes against the Druid database; first more VMs, then bigger compaction tasks, then a separate tier with more slots per middle manager, and finally right-sized with fewer VMs)
Summary: Scaling Middle Managers
● Increased the number of middle managers so that more task slots were available for the overlord to assign tasks.
● Then increased the number of slots per middle manager, as the new tasks were small, i.e. had fewer files to ingest.
● Created a separate tier for compaction, as those tasks took more resources than the regular index tasks.
● Then right-sized the middle manager count in each tier by reducing it.
12 MMs * 5 slots => 24 MMs * 5 slots
24 MMs * 5 slots => 12 MMs * 10 slots
12 MMs * 10 slots => 10 MMs * 10 slots + 2 MMs * 5 slots (tiering)
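A hedged sketch of how a dedicated compaction tier can be wired up: the compaction-tier middle managers run with druid.worker.category (and druid.worker.capacity for their slot count) set in runtime.properties, and the overlord's worker select strategy routes compact tasks to that category. The endpoint and field names follow the Druid docs as I understand them; the URL and category name are illustrative, not our exact setup.

```python
import json
import requests

OVERLORD = "http://overlord:8090"       # hypothetical overlord URL

# Overlord dynamic worker config: route "compact" tasks to middle managers
# started with druid.worker.category=compaction_tier; everything else keeps
# using the default category.
worker_config = {
    "selectStrategy": {
        "type": "equalDistributionWithCategorySpec",
        "workerCategorySpec": {
            "categoryMap": {
                "compact": {"defaultCategory": "compaction_tier"}
            },
            "strong": True,
        },
    }
}

resp = requests.post(
    f"{OVERLORD}/druid/indexer/v1/worker",
    data=json.dumps(worker_config),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
```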
Coordinator Issues
Summary of the Coordinator crisis…
● Happy Overlord.
● But now issues in the Coordinator:
○ Huge number of small segments
○ Unavailable segment count increasing
○ Coordinator CPU usage increasing
○ Coordinator run cycle taking too long to complete
Fixing the Coordinator
(Diagrams: Coordinator process with the Druid database; moved to a bigger VM, keeping the same big DB instance)
Handling the Coordinator…
● Increased the Coordinator instance type, as it is not horizontally scalable.
● Tried a few coordinator dynamic configs:
○ maxSegmentsToMove: 1000
○ percentOfSegmentsToConsiderPerMove: 25 (reduce the number of segments considered per coordinator cycle)
○ useRoundRobinSegmentAssignment: true (assign segments in round-robin fashion first; lazily reassign with the chosen balancer strategy later)
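A minimal sketch of applying those coordinator dynamic configs over the coordinator API: read the current config, tweak only these knobs, and write it back. The coordinator URL is illustrative.

```python
import json
import requests

COORDINATOR = "http://coordinator:8081"   # hypothetical coordinator URL

# Fetch the current dynamic config, adjust only the knobs we care about,
# and write the whole config back.
config = requests.get(f"{COORDINATOR}/druid/coordinator/v1/config").json()
config.update({
    "maxSegmentsToMove": 1000,
    "percentOfSegmentsToConsiderPerMove": 25,
    "useRoundRobinSegmentAssignment": True,
})

resp = requests.post(
    f"{COORDINATOR}/druid/coordinator/v1/config",
    data=json.dumps(config),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
```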
Handling the Coordinator…
● We saw this error in the coordinator logs during auto compaction for many datasources:
“is larger than inputSegmentSize[2147483648]”
● Removing this setting from the auto compaction config resolved the issue.
● This is no longer an issue from Druid 25 onwards (inputSegmentSizeBytes: 100TB).
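A hedged sketch of (re)submitting a datasource's auto compaction config without the inputSegmentSizeBytes field, per the fix above; the datasource name and the other values are illustrative.

```python
import json
import requests

COORDINATOR = "http://coordinator:8081"   # hypothetical coordinator URL

compaction_config = {
    "dataSource": "customer_1_ipfix",     # hypothetical datasource
    "skipOffsetFromLatest": "PT1H",       # leave the hottest hour alone
    "maxRowsPerSegment": 5000000,
    # No inputSegmentSizeBytes: let compaction pick up large intervals too.
}

resp = requests.post(
    f"{COORDINATOR}/druid/coordinator/v1/config/compaction",
    data=json.dumps(compaction_config),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
```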
Handling the Historicals
● Until auto compaction is done:
○ More segments to scan per query
○ More processing power needed on historicals
● Cold data has HIGHER (coarser) segment granularity
○ Compaction done!
● Hot data has LOWER (finer) segment granularity
○ Compaction NOT done yet!
(Diagram: queries for recent data hit current historicals holding smaller, not-yet-compacted segments, while older historicals hold larger, compacted segments for the same datasources)
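Related to the current-vs-older-historicals split above, a hedged sketch of tier-aware retention (load) rules: keep the most recent data replicated on a hot tier and everything else on the default tier. Tier names, datasource, periods, and replica counts are illustrative, not our production rules.

```python
import json
import requests

COORDINATOR = "http://coordinator:8081"   # hypothetical coordinator URL
DATASOURCE = "customer_1_ipfix"           # hypothetical datasource

rules = [
    {   # last 7 days on the hot tier, plus one replica on the default tier
        "type": "loadByPeriod",
        "period": "P7D",
        "includeFuture": True,
        "tieredReplicants": {"hot": 2, "_default_tier": 1},
    },
    {   # everything else only on the default tier
        "type": "loadForever",
        "tieredReplicants": {"_default_tier": 1},
    },
]

resp = requests.post(
    f"{COORDINATOR}/druid/coordinator/v1/rules/{DATASOURCE}",
    data=json.dumps(rules),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
```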
Happy State!!!
Summary
● Once we stabilized both the Druid ingestion and query pipelines, we onboarded all customers in a phased manner.
● Set the optimal task queue size.
● Increased the middle manager count to absorb the initial burst of tasks.
● Right-sized the Overlord and Coordinator once onboarding was complete.
● Know the Overlord and Coordinator settings well.
Thank You
Questions?
Shivji Kumar Jha
linkedin.com/in/shivjijha/
slideshare.net/shiv4289/presentations/
youtube.com/@shivjikumarjha
Sachidananda Maharana
https://www.linkedin.com/in/sachidanandamaharana/