Building a Cron Scheduler
By Himanshu
What is it?
1. Schedules jobs periodically.
2. Granularity in minutes.
How do I retry asynchronous API requests after a few seconds?
E-commerce companies need to change prices on demand.
Need to send a confirmation message after 10 seconds?
An event-based scheduler is the answer.
3. Some jobs may take 30 minutes to execute.
Events take less than a second to execute.
Functional Requirements
1. Job Dependency
a. Execute a job only if the most recent execution of each of its dependent jobs in the last 24 hrs was successful.
2. Job Retry
a. If it fails, retry it after a given time interval.
3. Job Timeout
a. If execution takes more than 1 hr, time out the job.
b. Developers must break up any job that takes more than 1 hr.
4. Job Criticality
a. A job can give a list of machines where it can be executed.
And so we are designing our own scheduler!!
Non Functional Requirements
1. 5000 applications need the scheduler.
2. On average, an application needs to execute 5 jobs every minute.
25000 jobs every minute!!
Ideal Job?
1. Idempotent
a. f(f(x)) = f(x)
b. Re-running a job shouldn’t redo things. Payment done once shouldn’t be done again.
c. A job may execute well but the cron may not get the status.
d. A job may fail in between.
2. Transactional
a. Commit changes together.
b. A job may fail in between.
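The two properties above can be sketched together. This is a minimal illustration, assuming a hypothetical payment job that is keyed by an `execution_key`; the function names `make_ledger` and `charge_once`, the table layout, and the use of SQLite are all assumptions made for the example, not part of the original design.

```python
import sqlite3


def make_ledger() -> sqlite3.Connection:
    """Create an in-memory ledger. The schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments ("
        "execution_key TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE balance (id INTEGER PRIMARY KEY, total INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO balance (id, total) VALUES (1, 0)")
    conn.commit()
    return conn


def charge_once(conn: sqlite3.Connection, execution_key: str, amount: int) -> bool:
    """Apply a payment at most once per execution_key.

    Returns True if the payment was applied, False if it had already been
    applied by an earlier run of the same job.
    """
    try:
        # The connection context manager commits both statements together,
        # or rolls both back if either one raises (transactional).
        with conn:
            conn.execute(
                "INSERT INTO payments (execution_key, amount) VALUES (?, ?)",
                (execution_key, amount),
            )
            conn.execute(
                "UPDATE balance SET total = total + ? WHERE id = 1", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate key: this execution already happened (idempotent re-run).
        return False
```

If the scheduler never receives the status of a successful run and triggers the job again, the second call returns False and the balance is left unchanged.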
Implementation
1. Application will provide machines for job execution.
a. Execution time will not be a bottleneck in scaling the cron scheduler.
2. To schedule a cron job, the application provides the following parameters:
a. Unique identifier (uuid)
b. Schedule time (*/10 * * * *)
c. Machines where the job can be executed ([{ip1, user1}, {ip2, user2}])
d. Command to execute the job (node <path> <arguments>)
e. Retry (time interval)
f. Dependent jobs ([uuid1, uuid2])
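The parameters above could be modelled as a small record type. The sketch below is only one possible shape: the class names `Machine` and `CronJobSpec`, the field names, the unit of the retry interval (seconds), and the validation rules are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Machine:
    ip: str
    user: str


@dataclass(frozen=True)
class CronJobSpec:
    uuid: str                      # a. unique identifier
    schedule: str                  # b. five-field cron expression
    machines: List[Machine]        # c. machines where the job may run
    command: str                   # d. command that executes the job
    retry_interval_s: int          # e. retry interval (seconds is an assumption)
    dependent_jobs: List[str] = field(default_factory=list)  # f. uuids

    def validate(self) -> None:
        """Reject obviously malformed specs (hypothetical rules)."""
        if len(self.schedule.split()) != 5:
            raise ValueError("schedule must have five cron fields")
        if not self.machines:
            raise ValueError("at least one machine is required")
        if self.retry_interval_s <= 0:
            raise ValueError("retry interval must be positive")
        if self.uuid in self.dependent_jobs:
            raise ValueError("a job cannot depend on itself")
```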
Implementation - Architecture
Implementation - Kafka
1. Topics that will get jobs to be executed
a. Cron scheduler will produce into it.
b. Each machine will have a kafka consumer to consume the jobs it needs to execute.
c. 1-1 partition-machine mapping, so that each partition gets the jobs for the same machine in order.
d. 5000 applications will have at most 50000 machines and so 5 kafka brokers are needed.
e. Expiry of 1 hr as we have timeout of 1 hr.
2. Topics that will get metaData of each job execution
a. Each machine will produce into it.
b. One partition needed as order is not important while consuming.
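The 1-1 partition-machine mapping for the jobs topic can be illustrated without a Kafka client. In this sketch the helper names `assign_partitions` and `route` are hypothetical, and plain dictionaries stand in for the topic; a real producer would instead send each record to the explicit partition number computed here.

```python
from typing import Dict, List, Tuple


def assign_partitions(machine_ips: List[str]) -> Dict[str, int]:
    """Give every machine its own partition number (1-1 mapping)."""
    return {ip: partition for partition, ip in enumerate(sorted(set(machine_ips)))}


def route(
    jobs: List[Tuple[str, str]], partitions: Dict[str, int]
) -> Dict[int, List[str]]:
    """Group (job_uuid, machine_ip) pairs by partition.

    Input order is preserved inside each partition, which mirrors the
    per-partition ordering that the design relies on.
    """
    routed: Dict[int, List[str]] = {}
    for job_uuid, ip in jobs:
        routed.setdefault(partitions[ip], []).append(job_uuid)
    return routed
```

Because each machine consumes only its own partition, jobs assigned to one machine are read in the order the scheduler produced them.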
Implementation - RDBMS
1. Table cron_jobs
a. Store cron jobs with their metaData.
2. Table cron_jobs_scheduled
a. Contains cron jobs to be executed in the next 10 minutes, with trigger timestamp in minutes.
3. Table cron_jobs_executed
a. Each attempt of cron job execution with trigger timestamp and status (~1 hr data)
4. Table machines
a. Contains machine metaData with healthStatus, numOfJobsBeingExecuted and the corresponding kafka topic/partition.
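One possible layout for the four tables is sketched below with SQLite, purely so the example is runnable. The table names come from the slides; every column name and type is an assumption derived from the descriptions above, and a production RDBMS would need its own types and indexes.

```python
import sqlite3

# Hypothetical columns; only the table names are taken from the design.
SCHEMA = """
CREATE TABLE cron_jobs (
    uuid             TEXT PRIMARY KEY,
    schedule         TEXT NOT NULL,
    command          TEXT NOT NULL,
    retry_interval_s INTEGER NOT NULL
);
CREATE TABLE cron_jobs_scheduled (
    job_uuid       TEXT NOT NULL,
    trigger_minute INTEGER NOT NULL   -- trigger timestamp in minutes
);
CREATE TABLE cron_jobs_executed (
    job_uuid       TEXT NOT NULL,
    trigger_minute INTEGER NOT NULL,
    status         TEXT NOT NULL      -- e.g. PENDING, TIMEOUT
);
CREATE TABLE machines (
    ip                      TEXT PRIMARY KEY,
    health_status           TEXT NOT NULL,
    num_jobs_being_executed INTEGER NOT NULL DEFAULT 0,
    kafka_topic             TEXT NOT NULL,
    kafka_partition         INTEGER NOT NULL
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all four tables in one script."""
    conn.executescript(SCHEMA)
```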
Implementation - Cache
1. Helps us implement the Job Dependency requirement.
2. Whenever a job fails, store its uuid with a 24 hr TTL.
3. When a job succeeds, remove it from the cache.
4. In the worst case it will hold all unique cron jobs.
a. 5000 applications * 100 jobs per application = 500000 entries; 500000 * 5 bytes = 2.5 MB
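The cache rules above can be sketched as a small in-memory class. The name `FailureCache`, its methods, and the injectable clock are assumptions for illustration; a store with native TTL support would normally replace the dictionary.

```python
import time
from typing import Callable, Dict, Iterable


class FailureCache:
    """Remembers failed job uuids for 24 hours (hypothetical helper)."""

    TTL_SECONDS = 24 * 60 * 60

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def record_failure(self, uuid: str) -> None:
        # Rule 2: store the uuid with a 24 hr TTL.
        self._expires_at[uuid] = self._clock() + self.TTL_SECONDS

    def record_success(self, uuid: str) -> None:
        # Rule 3: a success clears any earlier failure.
        self._expires_at.pop(uuid, None)

    def has_failed(self, uuid: str) -> bool:
        expires = self._expires_at.get(uuid)
        if expires is None:
            return False
        if expires <= self._clock():
            # The entry outlived its TTL, so drop it lazily.
            del self._expires_at[uuid]
            return False
        return True

    def can_run(self, dependent_jobs: Iterable[str]) -> bool:
        """A job may run only if none of its dependent jobs is in the cache."""
        return not any(self.has_failed(uuid) for uuid in dependent_jobs)
```

Storing only failures keeps the cache small: a missing key means either the last run succeeded or nothing failed in the last 24 hours.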
Implementation - Scheduler Jobs
1. A job that runs every 10th minute:
a. Generates SQL files containing one row per cron job to be executed in the next 10 minutes.
b. At 13:10 it generates an SQL file containing the jobs to be executed at [13:11, 13:12, …, 13:20].
c. After generating the file, it loads it into cron_jobs_scheduled (2.5 lakh, i.e. 250,000, entries take < 1 sec).
2. A job that runs every 5th minute:
a. Get health of every machine.
b. Two bulk updates in the machines table: one for healthy machines and one for unhealthy machines.
3. A job that runs every minute:
a. Mark cron jobs still in PENDING state with trigger time = curMinute - 60 as TIMEOUT.
b. For every such job that needs to be retried, insert a row in the corresponding SQL file.
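The 10-minute job in step 1 has to expand each cron expression into the trigger minutes that fall inside the next window. The sketch below shows one way to do that; the function names are hypothetical, and the matcher is deliberately simplified: it supports only `*`, `*/n` and comma-separated integers, and it requires every field to match (standard cron treats day-of-month and day-of-week differently when both are restricted).

```python
from datetime import datetime, timedelta
from typing import List

# Lowest legal value per field: minute, hour, day of month, month, day of week.
FIELD_MINIMUMS = (0, 0, 1, 1, 0)


def field_matches(expr: str, value: int, low: int) -> bool:
    """Match one cron field against a value (simplified syntax only)."""
    for part in expr.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - low) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def matches(schedule: str, when: datetime) -> bool:
    """Return True if the five-field schedule fires at the given minute."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("schedule must have five cron fields")
    # Cron counts Sunday as 0, while datetime.weekday() counts Monday as 0.
    values = (when.minute, when.hour, when.day, when.month, (when.weekday() + 1) % 7)
    return all(
        field_matches(f, v, low) for f, v, low in zip(fields, values, FIELD_MINIMUMS)
    )


def triggers_in_next_window(
    schedule: str, now: datetime, window_minutes: int = 10
) -> List[datetime]:
    """List trigger times in (now, now + window], at minute granularity."""
    base = now.replace(second=0, microsecond=0)
    candidates = (base + timedelta(minutes=i) for i in range(1, window_minutes + 1))
    return [t for t in candidates if matches(schedule, t)]
```

Run at 13:10, the window covers 13:11 through 13:20, so a `*/10 * * * *` job yields a single row for 13:20.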
Implementation - Scheduler Processes
1. A process that runs at the 55th second of every minute:
a. Fetches and then deletes, from cron_jobs_scheduled, the jobs to be executed at the 60th second [< 1s]
b. Filters out jobs that have any dependent job in the cache [fetch from cache + build map + filter] [< 3s]
c. Fetches machines and, for each job, picks the least busy healthy machine (else an unhealthy one) [< 1s]
d. Bulk-updates the machines table to increment numOfJobsBeingExecuted [< 100 ms]
e. Bulk-inserts into cron_jobs_executed [< 1s]
f. Pushes each job to the kafka partition corresponding to the machine it was assigned [< 100 ms]
2. A process that consumes from Kafka topic that gets execution metaData:
a. Asynchronous updates on the cache and bulk updates in cron_jobs_executed for successful and failed executions.
b. For every failed execution, insert a row in the SQL file using the retry interval.
c. MetaData for each execution, such as time taken and logs, can be consumed into Elasticsearch.
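Steps 1b to 1d of the per-minute process (dependency filter, machine selection, load increment) can be sketched in memory as follows. The types `MachineState` and `ScheduledJob`, the function names, and the `failed_uuids` set that stands in for the cache are all assumptions for illustration; the database and Kafka writes are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class MachineState:
    ip: str
    healthy: bool
    num_jobs_being_executed: int


@dataclass(frozen=True)
class ScheduledJob:
    uuid: str
    machine_ips: Tuple[str, ...]
    dependent_jobs: Tuple[str, ...] = ()


def pick_machine(
    job: ScheduledJob, machines: Dict[str, MachineState]
) -> Optional[MachineState]:
    """Pick the least busy healthy machine, else the least busy unhealthy one."""
    candidates = [machines[ip] for ip in job.machine_ips if ip in machines]
    if not candidates:
        return None
    # Healthy machines sort first (False < True), then lower load wins.
    return min(candidates, key=lambda m: (not m.healthy, m.num_jobs_being_executed))


def dispatch(
    jobs: List[ScheduledJob],
    machines: Dict[str, MachineState],
    failed_uuids: Set[str],
) -> List[Tuple[str, str]]:
    """Return (job_uuid, machine_ip) assignments for the jobs that may run."""
    assignments: List[Tuple[str, str]] = []
    for job in jobs:
        # Step b: skip jobs with a dependent job recorded as failed.
        if any(dep in failed_uuids for dep in job.dependent_jobs):
            continue
        machine = pick_machine(job, machines)
        if machine is None:
            continue
        # Step d: count the job against the chosen machine right away, so
        # later jobs in the same batch see the updated load.
        machine.num_jobs_being_executed += 1
        assignments.append((job.uuid, machine.ip))
    return assignments
```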
Scaling Further
1. We used just one instance of DB, processes, jobs, cache.
2. Horizontal scaling can be used.
3. NoSQL can be evaluated as writes don’t lock the table.

More Related Content

What's hot

A brief history of system calls
A brief history of system callsA brief history of system calls
A brief history of system callsSysdig
 
Processes And Job Control
Processes And Job ControlProcesses And Job Control
Processes And Job Controlahmad bassiouny
 
OpenShift4 Installation by UPI on kvm
OpenShift4 Installation by UPI on kvmOpenShift4 Installation by UPI on kvm
OpenShift4 Installation by UPI on kvmJooho Lee
 
Linux Cluster Job Management Systems (SGE)
Linux Cluster Job Management Systems (SGE)Linux Cluster Job Management Systems (SGE)
Linux Cluster Job Management Systems (SGE)anandvaidya
 
Spying on the Linux kernel for fun and profit
Spying on the Linux kernel for fun and profitSpying on the Linux kernel for fun and profit
Spying on the Linux kernel for fun and profitAndrea Righi
 
Quay 3.3 installation
Quay 3.3 installationQuay 3.3 installation
Quay 3.3 installationJooho Lee
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
 
Measuring directly from cpu hardware performance counters
Measuring directly from cpu  hardware performance countersMeasuring directly from cpu  hardware performance counters
Measuring directly from cpu hardware performance countersJean-Philippe BEMPEL
 
Git why how when and more
Git   why how when and moreGit   why how when and more
Git why how when and moreGastón Acosta
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFBrendan Gregg
 
FOSDEM2015: Live migration for containers is around the corner
FOSDEM2015: Live migration for containers is around the cornerFOSDEM2015: Live migration for containers is around the corner
FOSDEM2015: Live migration for containers is around the cornerAndrey Vagin
 
Is ruby logger thread(process)-safe? at RubyConf 2013
Is ruby logger thread(process)-safe? at RubyConf 2013Is ruby logger thread(process)-safe? at RubyConf 2013
Is ruby logger thread(process)-safe? at RubyConf 2013Naotoshi Seo
 
Data Structures for High Resolution, Real-time Telemetry at Scale
Data Structures for High Resolution, Real-time Telemetry at ScaleData Structures for High Resolution, Real-time Telemetry at Scale
Data Structures for High Resolution, Real-time Telemetry at ScaleScyllaDB
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPFAlex Maestretti
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance AnalysisBrendan Gregg
 
RxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsRxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsBrendan Gregg
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing ToolsBrendan Gregg
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Brendan Gregg
 

What's hot (19)

A brief history of system calls
A brief history of system callsA brief history of system calls
A brief history of system calls
 
Processes And Job Control
Processes And Job ControlProcesses And Job Control
Processes And Job Control
 
OpenShift4 Installation by UPI on kvm
OpenShift4 Installation by UPI on kvmOpenShift4 Installation by UPI on kvm
OpenShift4 Installation by UPI on kvm
 
Linux Cluster Job Management Systems (SGE)
Linux Cluster Job Management Systems (SGE)Linux Cluster Job Management Systems (SGE)
Linux Cluster Job Management Systems (SGE)
 
Spying on the Linux kernel for fun and profit
Spying on the Linux kernel for fun and profitSpying on the Linux kernel for fun and profit
Spying on the Linux kernel for fun and profit
 
Context switching
Context switchingContext switching
Context switching
 
Quay 3.3 installation
Quay 3.3 installationQuay 3.3 installation
Quay 3.3 installation
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
Measuring directly from cpu hardware performance counters
Measuring directly from cpu  hardware performance countersMeasuring directly from cpu  hardware performance counters
Measuring directly from cpu hardware performance counters
 
Git why how when and more
Git   why how when and moreGit   why how when and more
Git why how when and more
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
 
FOSDEM2015: Live migration for containers is around the corner
FOSDEM2015: Live migration for containers is around the cornerFOSDEM2015: Live migration for containers is around the corner
FOSDEM2015: Live migration for containers is around the corner
 
Is ruby logger thread(process)-safe? at RubyConf 2013
Is ruby logger thread(process)-safe? at RubyConf 2013Is ruby logger thread(process)-safe? at RubyConf 2013
Is ruby logger thread(process)-safe? at RubyConf 2013
 
Data Structures for High Resolution, Real-time Telemetry at Scale
Data Structures for High Resolution, Real-time Telemetry at ScaleData Structures for High Resolution, Real-time Telemetry at Scale
Data Structures for High Resolution, Real-time Telemetry at Scale
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPF
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
RxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance ResultsRxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance Results
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing Tools
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)
 

Similar to Building a cron scheduler

Angular - Improve Runtime performance 2019
Angular - Improve Runtime performance 2019Angular - Improve Runtime performance 2019
Angular - Improve Runtime performance 2019Eliran Eliassy
 
Sequential Models - Meaning, assumptions, Types and Problems
Sequential Models - Meaning, assumptions, Types and ProblemsSequential Models - Meaning, assumptions, Types and Problems
Sequential Models - Meaning, assumptions, Types and ProblemsSundar B N
 
11.minimizing rental cost under specified rental policy in two stage flowshop...
11.minimizing rental cost under specified rental policy in two stage flowshop...11.minimizing rental cost under specified rental policy in two stage flowshop...
11.minimizing rental cost under specified rental policy in two stage flowshop...Alexander Decker
 
Minimizing rental cost under specified rental policy in two stage flowshop se...
Minimizing rental cost under specified rental policy in two stage flowshop se...Minimizing rental cost under specified rental policy in two stage flowshop se...
Minimizing rental cost under specified rental policy in two stage flowshop se...Alexander Decker
 
Operations Management : Line Balancing
Operations Management : Line BalancingOperations Management : Line Balancing
Operations Management : Line BalancingRohan Bharaj
 
ITB_2023_Human-Friendly_Scheduled_Tasks_Giancarlo_Gomez.pdf
ITB_2023_Human-Friendly_Scheduled_Tasks_Giancarlo_Gomez.pdfITB_2023_Human-Friendly_Scheduled_Tasks_Giancarlo_Gomez.pdf
ITB_2023_Human-Friendly_Scheduled_Tasks_Giancarlo_Gomez.pdfOrtus Solutions, Corp
 
Document 14 (6).pdf
Document 14 (6).pdfDocument 14 (6).pdf
Document 14 (6).pdfRajMantry
 
Angular performance improvments
Angular performance improvmentsAngular performance improvments
Angular performance improvmentsEliran Eliassy
 
Ch5 process analysis
Ch5 process analysisCh5 process analysis
Ch5 process analysisvideoaakash15
 
Ch5 process+analysis
Ch5 process+analysisCh5 process+analysis
Ch5 process+analysisvideoaakash15
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward
 
C++ C++ C++ In Chapter 1- the class clockType was designed to implem.docx
C++ C++ C++   In Chapter 1- the class clockType was designed to implem.docxC++ C++ C++   In Chapter 1- the class clockType was designed to implem.docx
C++ C++ C++ In Chapter 1- the class clockType was designed to implem.docxCharlesCSZWhitei
 

Similar to Building a cron scheduler (20)

Angular - Improve Runtime performance 2019
Angular - Improve Runtime performance 2019Angular - Improve Runtime performance 2019
Angular - Improve Runtime performance 2019
 
Sequential Models - Meaning, assumptions, Types and Problems
Sequential Models - Meaning, assumptions, Types and ProblemsSequential Models - Meaning, assumptions, Types and Problems
Sequential Models - Meaning, assumptions, Types and Problems
 
Test
TestTest
Test
 
Tn6 facility layout
Tn6 facility layoutTn6 facility layout
Tn6 facility layout
 
Tn6 facility+layout
Tn6 facility+layoutTn6 facility+layout
Tn6 facility+layout
 
Runtime performance
Runtime performanceRuntime performance
Runtime performance
 
11.minimizing rental cost under specified rental policy in two stage flowshop...
11.minimizing rental cost under specified rental policy in two stage flowshop...11.minimizing rental cost under specified rental policy in two stage flowshop...
11.minimizing rental cost under specified rental policy in two stage flowshop...
 
Minimizing rental cost under specified rental policy in two stage flowshop se...
Minimizing rental cost under specified rental policy in two stage flowshop se...Minimizing rental cost under specified rental policy in two stage flowshop se...
Minimizing rental cost under specified rental policy in two stage flowshop se...
 
Operations Management : Line Balancing
Operations Management : Line BalancingOperations Management : Line Balancing
Operations Management : Line Balancing
 
ITB_2023_Human-Friendly_Scheduled_Tasks_Giancarlo_Gomez.pdf
ITB_2023_Human-Friendly_Scheduled_Tasks_Giancarlo_Gomez.pdfITB_2023_Human-Friendly_Scheduled_Tasks_Giancarlo_Gomez.pdf
ITB_2023_Human-Friendly_Scheduled_Tasks_Giancarlo_Gomez.pdf
 
Document 14 (6).pdf
Document 14 (6).pdfDocument 14 (6).pdf
Document 14 (6).pdf
 
Angular performance improvments
Angular performance improvmentsAngular performance improvments
Angular performance improvments
 
Ch5 process analysis
Ch5 process analysisCh5 process analysis
Ch5 process analysis
 
Ch5 process+analysis
Ch5 process+analysisCh5 process+analysis
Ch5 process+analysis
 
Scheduling
SchedulingScheduling
Scheduling
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
 
C++ C++ C++ In Chapter 1- the class clockType was designed to implem.docx
C++ C++ C++   In Chapter 1- the class clockType was designed to implem.docxC++ C++ C++   In Chapter 1- the class clockType was designed to implem.docx
C++ C++ C++ In Chapter 1- the class clockType was designed to implem.docx
 
DonnerCompany
DonnerCompanyDonnerCompany
DonnerCompany
 
SECON'2014 - Филипп Торчинский - Трансформация баг-трекера под любой проект: ...
SECON'2014 - Филипп Торчинский - Трансформация баг-трекера под любой проект: ...SECON'2014 - Филипп Торчинский - Трансформация баг-трекера под любой проект: ...
SECON'2014 - Филипп Торчинский - Трансформация баг-трекера под любой проект: ...
 
Salesforce asynchronous apex
Salesforce asynchronous apexSalesforce asynchronous apex
Salesforce asynchronous apex
 

Recently uploaded

HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 

Recently uploaded (20)

HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 

Building a cron scheduler

  • 1. Building a Cron Scheduler By Himanshu
  • 2. What is it? 1. Schedules job periodically.
  • 3. What is it? 1. Schedules job periodically. 2. Granularity in minutes.
  • 4. What is it? 1. Schedules job periodically. 2. Granularity in minutes. How do I retry asynchronous API requests after few seconds?
  • 5. What is it? 1. Schedules job periodically. 2. Granularity in minutes. E-commerce companies need to change prices on demand.
  • 6. What is it? 1. Schedules job periodically. 2. Granularity in minutes. Need to send confirmation message after 10 seconds?
  • 7. What is it? 1. Schedules job periodically. 2. Granularity in minutes. Event based Scheduler is the answer.
  • 8. What is it? 1. Schedules job periodically. 2. Granularity in minutes. 3. Some jobs may take 30 minutes to execute. Events take less than a second to execute.
  • 10. Functional Requirements 1. Job Dependency a. Execute a job only if the most recent execution of its dependent jobs in the last 24 hrs is successful.
  • 11. Functional Requirements 1. Job Dependency 2. Job Retry
  • 12. Functional Requirements 1. Job Dependency 2. Job Retry a. If it fails, retry it after a given time interval.
  • 13. Functional Requirements 1. Job Dependency 2. Job Retry 3. Job Timeout
  • 14. Functional Requirements 1. Job Dependency 2. Job Retry 3. Job Timeout a. If execution takes more than 1 hr, timeout the job. b. Developers must break their jobs if it takes more than 1 hr.
  • 15. Functional Requirements 1. Job Dependency 2. Job Retry 3. Job Timeout 4. Job Criticality
  • 16. Functional Requirements 1. Job Dependency 2. Job Retry 3. Job Timeout 4. Job Criticality a. It can give a list of machines where it can be executed.
  • 17. Functional Requirements 1. Job Dependency 2. Job Retry 3. Job Timeout 4. Job Criticality a. It can give a list of machines where it can be executed. And so we are designing our own scheduler!!
  • 20. Non Functional Requirements 1. 5000 applications need the scheduler. 2. On average, an application needs to execute 5 jobs every minute. 25000 jobs every minute!!
  • 22. Ideal Job? 1. Idempotent a. f(f(x)) = f(x) b. Re-running a job shouldn’t redo things. A payment made once shouldn’t be made again. c. A job may execute successfully but the cron scheduler may not get the status. d. A job may fail midway. 2. Transactional a. Commit changes together. b. A job may fail midway.
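The idempotency property on the slide above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `PaymentLedger` and an idempotency key built from the order and trigger time; neither name comes from the slides, and a real job would keep this state in a durable, transactional store.

```python
class PaymentLedger:
    """Toy in-memory store (hypothetical); stands in for a durable database."""

    def __init__(self):
        self.completed = {}  # idempotency key -> amount already paid

    def pay(self, idempotency_key, amount):
        # Re-running with the same key is a no-op, so f(f(x)) == f(x).
        if idempotency_key in self.completed:
            return False  # already paid, nothing is redone
        self.completed[idempotency_key] = amount
        return True


ledger = PaymentLedger()
first = ledger.pay("order-42:13:10", 100)
# The scheduler never received the status, so it retries the same execution.
second = ledger.pay("order-42:13:10", 100)
```

The retry leaves the ledger unchanged, which is what makes it safe for the scheduler to re-run a job whose status was lost.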
  • 24. Implementation 1. The application will provide machines for job execution. a. Execution time will not be a bottleneck in scaling the cron scheduler. 2. To schedule a cron job, the application will provide the following parameters: a. Unique identifier (uuid) b. Schedule time (*/10 * * * *) c. Machines where the job can be executed ([{ip1, user1}, {ip2, user2}]) d. Command to execute the job (node <path> <arguments>) e. Retry (time interval) f. Dependent jobs ([uuid1, uuid2])
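The job parameters above could be modelled roughly as follows. This is a sketch under assumptions: the `CronJobSpec` field names, the `field_matches` and `is_due` helpers, and the sample values are illustrative, and the matcher supports only `*`, `*/n`, single numbers, and comma lists (no ranges, and day-of-month and day-of-week are simply ANDed).

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CronJobSpec:
    uuid: str
    schedule: str              # five-field cron expression, e.g. "*/10 * * * *"
    machines: list             # e.g. [{"ip": "ip1", "user": "user1"}]
    command: str
    retry_interval_s: int = 0
    dependent_jobs: list = field(default_factory=list)


def field_matches(expr, value, lowest=0):
    """Match one cron field; `lowest` is the first value of the field's range."""
    for part in expr.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - lowest) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def is_due(schedule, when):
    minute, hour, dom, month, dow = schedule.split()
    # cron counts Sunday as 0; isoweekday() counts it as 7.
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day, lowest=1)
            and field_matches(month, when.month, lowest=1)
            and field_matches(dow, when.isoweekday() % 7))


job = CronJobSpec(
    uuid="uuid1",
    schedule="*/10 * * * *",
    machines=[{"ip": "ip1", "user": "user1"}],
    command="node <path> <arguments>",
    retry_interval_s=30,
)
due_at_1310 = is_due(job.schedule, datetime(2024, 1, 1, 13, 10))
due_at_1311 = is_due(job.schedule, datetime(2024, 1, 1, 13, 11))
```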
  • 27. Implementation - Kafka 1. Topics that will get jobs to be executed a. The cron scheduler will produce into them. b. Each machine will have a Kafka consumer to consume the jobs it needs to execute. c. 1-1 partition-machine mapping, so that each partition gets the jobs for one machine in order. d. 5000 applications will have at most 50000 machines and so 5 Kafka brokers are needed. e. Expiry of 1 hr as we have a timeout of 1 hr. 2. Topics that will get metaData of each job execution a. Each machine will produce into them. b. One partition is needed as order is not important while consuming.
  • 28. Implementation - RDBMS 1. Table cron_jobs a. Stores cron jobs with their metaData. 2. Table cron_jobs_scheduled a. Contains cron jobs to be executed in the next 10 minutes with trigger timestamp in minutes. 3. Table cron_jobs_executed a. Contains each attempt of cron job execution with trigger timestamp and status (~1 hr of data). 4. Table machines a. Contains machine metaData with healthStatus, numOfJobsBeingExecuted and the corresponding Kafka topic/partition.
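One possible shape for these four tables is sketched below using SQLite. Only the table names and the columns healthStatus and numOfJobsBeingExecuted come from the slide; every other column name and type is an assumption made for illustration.

```python
import sqlite3

# Hypothetical schema; column names other than healthStatus and
# numOfJobsBeingExecuted are illustrative guesses.
SCHEMA = """
CREATE TABLE cron_jobs (
    uuid             TEXT PRIMARY KEY,
    schedule         TEXT NOT NULL,
    command          TEXT NOT NULL,
    retry_interval_s INTEGER,
    dependent_jobs   TEXT
);
CREATE TABLE cron_jobs_scheduled (
    job_uuid       TEXT NOT NULL,
    trigger_minute INTEGER NOT NULL
);
CREATE INDEX idx_scheduled_trigger ON cron_jobs_scheduled (trigger_minute);
CREATE TABLE cron_jobs_executed (
    job_uuid       TEXT NOT NULL,
    trigger_minute INTEGER NOT NULL,
    machine_ip     TEXT,
    status         TEXT NOT NULL
);
CREATE TABLE machines (
    ip                     TEXT PRIMARY KEY,
    healthStatus           TEXT NOT NULL,
    numOfJobsBeingExecuted INTEGER NOT NULL DEFAULT 0,
    kafka_topic            TEXT,
    kafka_partition        INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

The index on trigger_minute reflects the access pattern described later in the deck, where all rows for one minute are fetched and then deleted together.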
  • 29. Implementation - Cache 1. Helps implement the Job Dependency requirement. 2. Whenever a job fails, store its uuid with a 24-hr TTL. 3. When a job succeeds, remove it from the cache. 4. In the worst case it will hold all unique cron jobs a. 5000 applications * 100 jobs per application = 500000 * 5 bytes (2.5 MB)
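The cache rules above can be sketched as a small TTL map. This is an in-process stand-in for whatever cache the deployment uses; the class name `FailedJobCache`, its methods, and the injectable clock are assumptions for illustration.

```python
import time

TTL_SECONDS = 24 * 60 * 60  # failed jobs are remembered for 24 hrs


class FailedJobCache:
    def __init__(self, clock=time.time):
        self._expiry = {}    # job uuid -> time at which the entry expires
        self._clock = clock  # injectable so the TTL can be exercised in tests

    def mark_failed(self, uuid):
        self._expiry[uuid] = self._clock() + TTL_SECONDS

    def mark_succeeded(self, uuid):
        self._expiry.pop(uuid, None)

    def is_failed(self, uuid):
        expires_at = self._expiry.get(uuid)
        if expires_at is None:
            return False
        if expires_at <= self._clock():
            del self._expiry[uuid]  # lazily drop expired entries
            return False
        return True

    def can_run(self, dependent_jobs):
        # A job is runnable only if none of its dependent jobs is in the cache.
        return not any(self.is_failed(uuid) for uuid in dependent_jobs)


worst_case_bytes = 5000 * 100 * 5  # 500000 entries * 5 bytes = 2.5 MB
```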
  • 32. Implementation - Scheduler Jobs 1. A job that runs every 10th minute: a. Generates SQL files containing one row per cron job to be executed in the next 10 minutes. b. At 13:10 it generates a SQL file containing the jobs to be executed at [13:11, 13:12, …, 13:20]. c. After generating the file, it loads it into cron_jobs_scheduled (2.5 lakh entries take < 1 sec). 2. A job that runs every 5th minute: a. Gets the health of every machine. b. Runs two bulk updates on the machines table, one for healthy and one for unhealthy machines. 3. A job that runs every minute: a. Marks cron jobs in PENDING state with trigger time = curMinute - 60 as TIMEOUT. b. For every such job that needs to be retried, inserts a row in the corresponding SQL file.
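The 10-minute window and the timeout sweep could look roughly like this. It is a sketch with plain Python structures in place of SQL files and tables; the function names, the row layout, and the toy `every_n_minutes` predicate (which stands in for real cron-expression matching) are all assumptions.

```python
from datetime import datetime, timedelta


def next_window(run_at, size=10):
    """Minutes covered by one run: at 13:10 this yields 13:11 .. 13:20."""
    base = run_at.replace(second=0, microsecond=0)
    return [base + timedelta(minutes=i) for i in range(1, size + 1)]


def rows_for_window(jobs, run_at, is_due):
    """One (uuid, trigger_time) row per job for each minute in which it is due."""
    return [(uuid, minute)
            for minute in next_window(run_at)
            for uuid, schedule in jobs
            if is_due(schedule, minute)]


def mark_timeouts(executions, cur_minute):
    """Mark PENDING executions triggered at cur_minute - 60 as TIMEOUT."""
    timed_out = []
    for row in executions:
        if row["status"] == "PENDING" and row["trigger_minute"] == cur_minute - 60:
            row["status"] = "TIMEOUT"
            timed_out.append(row["uuid"])
    return timed_out


def every_n_minutes(period, when):
    # Toy schedule: "due whenever the minute is a multiple of `period`".
    return when.minute % period == 0


rows = rows_for_window([("job-a", 10), ("job-b", 5)],
                       datetime(2024, 1, 1, 13, 10), every_n_minutes)
```

Here `rows` holds job-b at 13:15 and both jobs at 13:20; the jobs returned by `mark_timeouts` are the candidates for the retry rows mentioned in step 3b.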
  • 34. Implementation - Scheduler Processes 1. A process that runs at the 55th second of every minute: a. Fetches and then deletes, from cron_jobs_scheduled, the jobs to be executed at the 60th second [< 1s] b. Filters out jobs that have any dependent job in the cache [fetch from cache + build map + filter] [< 3s] c. Fetches machines and, for each job, picks the least busy healthy machine (else an unhealthy one) [< 1s] d. Bulk updates the machines table to increment numOfJobsBeingExecuted [< 100 ms] e. Bulk inserts into cron_jobs_executed [< 1s] f. Pushes the jobs to the Kafka partition corresponding to the machine they were assigned [< 100 ms] 2. A process that consumes from the Kafka topic that gets execution metaData: a. Asynchronous updates on the cache and bulk updates in cron_jobs_executed for successful and failed executions. b. For every failed execution, inserts a row in the SQL file using the retry interval. c. MetaData for each execution, such as time taken and logs, can be consumed into Elasticsearch.
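Step 1c above (pick the least busy healthy machine, else an unhealthy one) can be sketched as a single sort key. The `assign_machines` function, the dictionary layout, and the machine names are hypothetical; in the design described here the data would come from the machines table.

```python
def assign_machines(jobs, machines):
    """Assign each job to one of its allowed machines.

    jobs:     list of (job_uuid, allowed_machine_ids)
    machines: dict machine_id -> {"healthy": bool, "load": int}
    """
    assignments = {}
    for uuid, allowed in jobs:
        candidates = [m for m in allowed if m in machines]
        if not candidates:
            continue  # no known machine for this job; left unassigned
        # Healthy machines sort first (False < True), then lowest load wins.
        best = min(candidates,
                   key=lambda m: (not machines[m]["healthy"], machines[m]["load"]))
        machines[best]["load"] += 1  # mirrors numOfJobsBeingExecuted + 1
        assignments[uuid] = best
    return assignments


machines = {
    "m1": {"healthy": True, "load": 2},
    "m2": {"healthy": True, "load": 0},
    "m3": {"healthy": False, "load": 0},
}
assignments = assign_machines(
    [("job-a", ["m1", "m2", "m3"]), ("job-b", ["m1", "m2", "m3"]), ("job-c", ["m3"])],
    machines,
)
```

Incrementing the load inside the loop keeps one minute's batch from piling onto a single machine; the slides apply the same increment as a bulk update in step 1d.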
  • 35. Scaling Further 1. We used just one instance of DB, processes, jobs, cache. 2. Horizontal scaling can be used. 3. NoSQL can be evaluated as writes don’t lock the table.