7 WAYS TO EXECUTE
SCHEDULED JOBS
WITH PYTHON
GUEST POST: TIM MUGAYI
JOB SCHEDULING
Job scheduling is a common programming challenge that most
organizations and developers must tackle at some point in order to
solve critical problems. The challenge is further compounded by the
proliferation of big data and the training of machine learning
models.
Having the ability to crunch petabytes of data on a predictable
and automated basis in order to derive insights is a key
driving factor to set you apart from the competition.
There are various open-source solutions, such as Hadoop
and Apache Spark, and proprietary vendor solutions, such
as AWS Glue, used to handle these large data sets. One
key component required by all of these technologies is a
"job scheduler": something that can trigger events at
predefined time intervals in a fault-tolerant and scalable
way.
SCHEDULED
JOBS
ARE AUTOMATED PIECES OF WORK THAT CAN BE PERFORMED AT A SPECIFIC TIME OR ON A RECURRING BASIS,
PREDOMINANTLY DEFINED WITH UNIX-STYLE EXPRESSIONS CALLED CRON. THESE ARE TIME-BASED EVENT
TRIGGERS THAT ENABLE APPLICATIONS TO SCHEDULE WORK TO BE PERFORMED AT A CERTAIN DATE OR
TIME, BASED ON CRON EXPRESSIONS.
APPLICATIONS

Many applications need to schedule routine tasks like system
maintenance, administration, taking a daily backup of data, or
sending emails. If you code often, there will always be a need to run
some event or task at a predefined time. A scheduled job can be
synchronous or asynchronous, spanning any arbitrary time frame. The
infrastructure it was scheduled on might be entirely different from
the infrastructure on which it runs.

PREREQUISITES

The objective of this slide deck is to outline the options you have
at your disposal when crafting your next job scheduler in Python, so
you can immediately start automating your Python and data science
solutions.

To follow along, ensure you have Python ≥ 3.5 installed, with an
Anaconda environment or a Python virtual environment configured, so
you can run the sample code and see how the libraries work.
OPTIONS

1. APScheduler
2. CronTab
3. AWS Cron Jobs
4. Celery Periodic Tasks
5. Timeloop
6. Queue-Based Decoupled Scheduling
7. Apache Airflow
1. APScheduler

This is probably one of the easiest ways to add a cron-like
scheduler to your web-based or standalone Python applications. The
library is easy to get started with and offers multiple backends,
also known as job stores, such as:

•	 Memory (host machine in-memory scheduler)
•	 SQLAlchemy (any RDBMS supported by SQLAlchemy)
•	 MongoDB (NoSQL database)
•	 Redis (in-memory key-value data structure store)
•	 RethinkDB
•	 ZooKeeper
•	 ...
Backends provide a storage location where
you can persist your triggers. For example, if
you set your Python script to execute every
day at 5 pm, you have created a trigger.
If you shut down your program, or if it
terminates unexpectedly, this data can be read
from your persistence store when the script
resumes, and it will keep firing your Python
script on the configured schedule.

Trigger stores also make sense in situations
where you do not wish to hard-code triggers
or go through redeployment cycles. They give
you the option to change triggers dynamically
through the backend, or to let users change
triggers via a user interface. Your choice of
backend depends entirely on your stack.
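As a rough sketch of this idea (not from the original deck), the snippet below persists a daily 5 pm trigger in a SQLite-backed SQLAlchemy job store; the job id, function body, and database URL are illustrative:

```python
# A minimal sketch: APScheduler with a SQLAlchemy job store so the
# 5 pm trigger survives restarts.
# pip install apscheduler sqlalchemy
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

def send_report():
    print("running the 5 pm job...")

# Persist triggers in SQLite; any RDBMS supported by SQLAlchemy works here.
jobstores = {"default": SQLAlchemyJobStore(url="sqlite:///jobs.sqlite")}

scheduler = BlockingScheduler(jobstores=jobstores)
# Cron-style trigger: every day at 17:00. replace_existing avoids adding a
# duplicate job each time the script restarts.
scheduler.add_job(send_report, "cron", hour=17, id="daily_report",
                  replace_existing=True)
scheduler.start()  # blocks and keeps firing jobs on schedule
```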
APScheduler offers three basic scheduling systems that should meet
most of your job scheduler needs:

•	 Cron-style Scheduling (with optional start/end times)
•	 Interval-based Execution (run jobs on even intervals, with optional start/end times)
•	 One-off Delayed Execution (run jobs once, on a set date/time)
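A short sketch of the three trigger kinds side by side (the function names and schedules are made up for illustration):

```python
import time
from datetime import datetime
from apscheduler.schedulers.background import BackgroundScheduler

def cleanup():
    print("cron-style job")

def poll_api():
    print("interval job")

def send_once():
    print("one-off job")

scheduler = BackgroundScheduler()
scheduler.add_job(cleanup, "cron", day_of_week="mon-fri", hour=17)         # cron-style
scheduler.add_job(poll_api, "interval", minutes=15)                        # interval-based
scheduler.add_job(send_once, "date", run_date=datetime(2030, 1, 1, 9, 0))  # one-off delayed
scheduler.start()

try:
    while True:        # BackgroundScheduler runs in a daemon thread,
        time.sleep(1)  # so keep the main thread alive.
except (KeyboardInterrupt, SystemExit):
    scheduler.shutdown()
```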
2. CronTab

Cron is a utility that allows us to schedule tasks on Unix-based
systems using cron expressions. The tasks in cron are defined in a
crontab, which is a text file containing the commands to be
executed. Cron uses a specific syntax to define its time schedules:
an expression consists of five fields separated by white space.

Python offers the crontab module to manage scheduled jobs via cron.
The functions available in it allow you to access cron, create
jobs, set restrictions, remove jobs, and more, without having to
write crontab files manually. Using the Python interface makes life
easier than creating crontabs by hand. For more details on
everything you can do with crontab, read up on the API
documentation.
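A minimal sketch (not part of the original deck) of managing a cron entry with the python-crontab package; the script path and comment tag are illustrative:

```python
# pip install python-crontab
from crontab import CronTab

# Open the current user's crontab.
cron = CronTab(user=True)

# Add a job that runs a (hypothetical) backup script every day at 5 pm.
job = cron.new(command="/usr/bin/python3 /home/me/backup.py",
               comment="daily-backup")
job.setall("0 17 * * *")   # minute hour day-of-month month day-of-week

# Write the change back to the user's crontab.
cron.write()

# List what is currently scheduled.
for entry in cron:
    print(entry)
```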
3. AWS Cron Jobs

If AWS is your primary development environment and you're not
concerned with vendor lock-in, you have a couple of options at your
disposal to get your Python code running in a cron-like fashion.

CloudWatch Events with Lambda
This is the conventional approach on AWS to create cron triggers.
It leverages AWS Lambda, which is triggered by a CloudWatch cron
event rule. Since Lambda is being leveraged here, your Python code
is bound to the limitations of Lambda.

ECS Scheduled Tasks
Amazon Elastic Container Service (Amazon ECS) is a fully managed
container orchestration service. If you choose this approach, you
have to be comfortable using Docker. The idea behind ECS scheduled
tasks is similar to using Lambda, but Lambda is only used to
trigger the execution of your ECS task definition, which
corresponds to a Docker image hosting your Python code. CloudWatch
Events are still used as the cron trigger.

CloudWatch Events with Lambda and EC2
If you take the concept of CloudWatch event triggers and combine it
with Lambda and EC2 instances, you get this approach. When you need
your Python code to do more, perhaps a CPU-intensive task that
requires more resources, but you don't need the benefits of
serverless, this approach might make sense.

AWS Batch
This option suits more complicated tasks that go beyond the
limitations of serverless, or tasks that can take hours or days to
complete. AWS Batch manages the scheduling and provisioning of the
work. You can define multi-stage pipelines where each stage depends
on the completion of the previous one. AWS Batch works off a queue
and initializes EC2 instances on an as-needed basis. The benefits
of this approach are that you only pay for what you use and the
management of instances is done for you.
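For the first option, here is a rough boto3 sketch (not from the deck) of wiring a CloudWatch Events cron rule to an existing Lambda function; the rule name, function name, and ARN are placeholders:

```python
# pip install boto3
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:my-scheduled-job"

# Fire every day at 17:00 UTC (CloudWatch/EventBridge cron syntax has six fields).
rule = events.put_rule(
    Name="daily-5pm-job",
    ScheduleExpression="cron(0 17 * * ? *)",
    State="ENABLED",
)

# Allow CloudWatch Events to invoke the function, then attach it as the target.
lambda_client.add_permission(
    FunctionName="my-scheduled-job",
    StatementId="allow-cloudwatch-cron",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)
events.put_targets(Rule="daily-5pm-job", Targets=[{"Id": "1", "Arn": LAMBDA_ARN}])
```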
4. Celery Periodic Tasks

Celery is a Python framework that allows distributed processing
of tasks in an asynchronous fashion via message brokers such
as RabbitMQ, SQS, and Redis. Celery is built around the
producer-consumer FIFO queue design pattern. Though mainly
used for such use cases, it has a built-in scheduler, named beat,
that you can take advantage of.

Beat, as the name implies, is a scheduler that places
messages on a message broker queue when a predefined
time interval is reached, defined either as a basic interval or as
a cron expression. Once beat places messages on the message
broker, they become available for consumption by the next
available Celery worker.

Something to take note of is that jobs may overlap if
a job does not complete before the next one is triggered. This
is something to keep in mind whenever you craft your job
scheduler. Things like semaphores and Redis locks can be used
to mitigate this behavior if it's not desired.
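A minimal beat sketch, assuming the file is saved as celery_app.py and a local Redis broker is available; the task body and schedule are illustrative:

```python
# celery_app.py
# pip install celery redis
from celery import Celery
from celery.schedules import crontab

app = Celery("scheduler_demo", broker="redis://localhost:6379/0")

@app.task
def nightly_etl():
    print("running the nightly ETL job")

# Beat reads this schedule and enqueues the task on the broker at each
# trigger; any running worker then picks it up. Plain numbers (seconds)
# also work in place of crontab() for simple intervals.
app.conf.beat_schedule = {
    "weekdays-at-5pm": {
        "task": "celery_app.nightly_etl",
        "schedule": crontab(hour=17, minute=0, day_of_week="mon-fri"),
    },
}
```

For local experiments you can run the worker and beat together with `celery -A celery_app worker --beat --loglevel=info`; in production, beat is usually run as its own process.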
5. Timeloop

Timeloop is a library that can be used to run multiple periodic tasks. It is a simple
library that uses a decorator pattern to run tagged functions in threads. If you are
looking to take advantage of multiple cores, this might not be the library to use, but
it is sufficient for simple use cases where you don't need a full-blown framework and
want something easy to incorporate into your web or standalone Python applications.
To get started with the library, install it via pip.
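A small sketch of the decorator pattern Timeloop uses; the job body and interval are illustrative:

```python
# pip install timeloop
from datetime import timedelta
from timeloop import Timeloop

tl = Timeloop()

@tl.job(interval=timedelta(seconds=5))
def heartbeat():
    # Runs in its own thread every 5 seconds once the loop starts.
    print("5-second job ran")

if __name__ == "__main__":
    tl.start(block=True)  # block=True keeps the main thread alive
```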
6. Queue-Based Decoupled Scheduling

If you need to add predictability and redundancy to your scheduled
jobs, you can opt to build out a distributed job scheduler using
AMQP, RabbitMQ, or any queue stack of your choosing. This gives you
the ability to scale your job scheduler. The queue delegates work
across consumers via an exchange, which determines which messages a
queue should receive, depending on the exchange type and the queue
parameters.

An exchange is effectively a safe place to publish messages,
decoupled from the producer, that holds the information needed to
route to available consumers. The exchange takes a message and
forwards it along.

The core piece in this design is the producer, which handles the
cron events and publishes them to an exchange. Scheduled workers
simply bind to a shared queue. This design can be implemented with
an AMQP system such as RabbitMQ, is vendor-neutral, and can be
multi-cloud. Python has RabbitMQ clients, such as pika, that are
easy to work with and get started.

The queue-based job scheduler design is a clean way to decouple
producers and consumers. It also addresses one of the issues with
many traditional schedulers: the lack of a replay mechanism.
Queue-based job schedulers are useful in the event you outgrow
simple cron-based schedulers and require a little more out of your
Python scheduled jobs, such as:

•	 Job schedules that need to handle complex relationships between
jobs (e.g., one job triggers another), such as a state machine, or
workflow-based scheduled jobs such as big data ETL scheduling.

•	 Complex retry mechanisms for failed scheduled jobs, with
reporting and alerting.
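As a rough producer sketch with pika (exchange, queue, and routing-key names are made up for illustration), publishing one scheduled "tick" that workers bound to the shared queue would consume:

```python
# pip install pika
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# One exchange for schedule events and one shared, durable queue bound to it.
channel.exchange_declare(exchange="scheduled_jobs", exchange_type="direct", durable=True)
channel.queue_declare(queue="job_queue", durable=True)
channel.queue_bind(queue="job_queue", exchange="scheduled_jobs", routing_key="etl")

# Publish one scheduled event; a real producer would emit this from its own
# cron/interval loop (APScheduler, celery beat, a system cron entry, etc.).
message = json.dumps({"job": "nightly_etl", "fire_at": "17:00"})
channel.basic_publish(
    exchange="scheduled_jobs",
    routing_key="etl",
    body=message,
    properties=pika.BasicProperties(delivery_mode=2),  # mark the message persistent
)
connection.close()
```

Workers would call basic_consume on job_queue; RabbitMQ then spreads the messages across however many consumers are attached, which is where the scaling and redundancy come from.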
7. Apache Airflow

Airflow's Directed Acyclic Graph, DAG for short, offers a
way to build and schedule complex and dynamic data
workflows using Python. Airflow is mostly known for
its ability to build out workflows that tap into external
resources, such as RDBMS databases or custom scripts,
to perform ETL-related data transformation and cleanup.
That's not the only thing it's good for, though: with a few
configurations in your Python code, you can build out a
pretty robust job scheduler that integrates easily
with Dask and other machine learning frameworks.

Its key building blocks are DAGs, Operators, and Executors.
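A minimal Airflow 2.x-style sketch (not from the original deck) of a DAG that runs a Python callable every day at 5 pm; the dag_id, schedule, and task body are illustrative:

```python
# Drop this file into your Airflow dags/ folder.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_and_load():
    print("pulling data and loading it somewhere useful")

with DAG(
    dag_id="nightly_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 17 * * *",  # cron expression: every day at 17:00
    catchup=False,                   # don't backfill missed runs on first start
) as dag:
    run_etl = PythonOperator(task_id="extract_and_load",
                             python_callable=extract_and_load)
```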
Stay up to date with Saturn Cloud on LinkedIn and Twitter.
You may also be interested in: Linear Models in Python.
With the tools and options presented in this article, managing your scheduled
jobs does not have to be tedious. You can quickly start building out job schedulers
that automate most, if not all, of your data science pipelines, and build out ETL job
schedulers that perform data extraction. They can help you pipe extracted data into
services such as Dask, offered by Saturn Cloud, which makes scaling your data
science, deep learning, and machine learning models a lot easier.
Original blog post here.
THANK YOU!
SATURN CLOUD
33 IRVING PL
NEW YORK, NY 10003
SUPPORT@SATURNCLOUD.IO
(831) 228-8739