What is Batch Computing?
Run jobs asynchronously and automatically across one or more
computers. Jobs may have dependencies, making the sequencing
and scheduling of multiple jobs complex and challenging.
Batch Computing Today
• In-house compute clusters powered by open source or
commercial job schedulers.
• Often composed of a large array of identical,
undifferentiated processors, all of the same vintage and
built to the same specifications.
AWS Batch
• Fully managed batch processing
• Enables developers, scientists, and engineers to easily
and efficiently run hundreds of thousands of batch
computing jobs on AWS
• Jobs executed as containerized applications
• Dynamically provisions the optimal compute resources
• Allows you to focus on analyzing results and solving
problems
Job queues are mapped to one or more compute environments
• instance types – or choose “optimal”
• min/max/desired vCPUs
• Spot or On-Demand provisioning
Managed compute environments: AWS Batch launches and scales
resources on your behalf.
Unmanaged compute environments: you control the instances and
scaling; instances must include the ECS agent and run supported
operating systems.
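A managed compute environment with these settings could be created with the `aws batch create-compute-environment` CLI command. The JSON below is an illustrative request payload; the environment name, subnet, security group, and role ARNs are placeholders, not values from this deck:

```json
{
  "computeEnvironmentName": "demo-managed-ce",
  "type": "MANAGED",
  "state": "ENABLED",
  "computeResources": {
    "type": "SPOT",
    "minvCpus": 0,
    "maxvCpus": 256,
    "desiredvCpus": 0,
    "instanceTypes": ["optimal"],
    "bidPercentage": 60,
    "subnets": ["subnet-aaaa1111"],
    "securityGroupIds": ["sg-bbbb2222"],
    "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
    "spotIamFleetRole": "arn:aws:iam::123456789012:role/demo-spot-fleet-role"
  },
  "serviceRole": "arn:aws:iam::123456789012:role/service-role/AWSBatchServiceRole"
}
```

With `"minvCpus": 0`, the environment scales down to zero instances when the mapped queues are empty, so you pay only while jobs run.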
Customer Provided AMIs
Customer Provided AMIs let you set the AMI that is
launched as part of a managed compute environment.
Makes it possible to configure Docker settings, mount
EBS/EFS volumes, and configure drivers for GPU jobs.
AMIs must be Linux-based, HVM, and have a working ECS
container agent installed.
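A customer-provided AMI is selected via the `imageId` field inside the compute environment's `computeResources`. A hypothetical fragment (AMI ID and instance type are placeholders), such as might be used for a GPU compute environment:

```json
{
  "computeResources": {
    "type": "EC2",
    "imageId": "ami-0123456789abcdef0",
    "instanceTypes": ["p3.2xlarge"],
    "minvCpus": 0,
    "maxvCpus": 128
  }
}
```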
Jobs are submitted to a job queue, where they reside until they are
able to be scheduled to a compute resource. Information related to
completed jobs persists in the queue for 24 hours.
Job queues support priorities and multiple queues can schedule work
to the same compute environment.
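The mapping from a queue to its compute environments, along with the queue's priority, is set when the queue is created (for example with `aws batch create-job-queue`). An illustrative payload, with placeholder names; the `order` values control which environment is tried first:

```json
{
  "jobQueueName": "high-priority",
  "state": "ENABLED",
  "priority": 10,
  "computeEnvironmentOrder": [
    { "order": 1, "computeEnvironment": "demo-spot-ce" },
    { "order": 2, "computeEnvironment": "demo-ondemand-ce" }
  ]
}
```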
AWS Batch job definitions specify how jobs are to be run.
Some of the attributes specified in a job definition:
• IAM role associated with the job
• vCPU and memory requirements
• Mount points
• Container properties
• Environment variables
• Retry strategy
• While each job must reference a job definition, many of these
parameters can be overridden at submission time.
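A job definition combining these attributes might be registered with `aws batch register-job-definition`. The payload below is a sketch; the image URI, role ARN, paths, and `Ref::inputKey` parameter name are all illustrative placeholders:

```json
{
  "jobDefinitionName": "demo-job-def",
  "type": "container",
  "containerProperties": {
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-app:latest",
    "vcpus": 2,
    "memory": 2048,
    "command": ["python", "run.py", "Ref::inputKey"],
    "jobRoleArn": "arn:aws:iam::123456789012:role/demo-batch-job-role",
    "environment": [
      { "name": "STAGE", "value": "prod" }
    ],
    "mountPoints": [
      { "sourceVolume": "scratch", "containerPath": "/scratch" }
    ],
    "volumes": [
      { "name": "scratch", "host": { "sourcePath": "/mnt/scratch" } }
    ]
  },
  "retryStrategy": { "attempts": 3 }
}
```

`Ref::inputKey` is a parameter placeholder that can be filled in (or overridden) per job at submission time.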
Jobs are the unit of work executed by AWS Batch as containerized
applications running on Amazon EC2.
Containerized jobs can reference a container image, command, and
parameters. Or, users can fetch a .zip containing their
application and run it on an Amazon Linux container.
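An individual job is then submitted against a queue and a job definition (for example with `aws batch submit-job`), optionally overriding container properties from the definition. An illustrative payload with placeholder names:

```json
{
  "jobName": "demo-run-42",
  "jobQueue": "high-priority",
  "jobDefinition": "demo-job-def",
  "containerOverrides": {
    "command": ["python", "run.py", "--input", "s3://demo-bucket/input.csv"],
    "environment": [
      { "name": "STAGE", "value": "test" }
    ]
  }
}
```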
Array Jobs
Collection of jobs (between 2 and 10,000) that share common
parameters, such as the job definition, vCPUs, and memory.
Distributed across multiple hosts and may run concurrently.
• SEQUENTIAL dependency: each child job must succeed before
the next child job starts.
• N_TO_N dependency: each index child of this job must wait
for the corresponding index child of each dependency to
complete before it can begin.
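An array job with an N_TO_N dependency might be submitted with a payload like the sketch below (the queue, definition, and dependency job ID are placeholders). Here child index i waits only for index i of the dependency; a SEQUENTIAL self-dependency is instead expressed as `"dependsOn": [{ "type": "SEQUENTIAL" }]` with no `jobId`, running the children one at a time:

```json
{
  "jobName": "demo-array-step2",
  "jobQueue": "high-priority",
  "jobDefinition": "demo-job-def",
  "arrayProperties": { "size": 1000 },
  "dependsOn": [
    { "jobId": "11111111-2222-3333-4444-555555555555", "type": "N_TO_N" }
  ]
}
```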
Jobs submitted to a queue can have the following states:
SUBMITTED: Accepted into the queue, but not yet evaluated for execution
PENDING: Your job has dependencies on other jobs which have not yet completed
RUNNABLE: Your job has been evaluated by the scheduler and is ready to run
STARTING: Your job is in the process of being scheduled to a compute resource
RUNNING: Your job is currently running
SUCCEEDED: Your job has finished with exit code 0
FAILED: Your job finished with a non-zero exit code, or was cancelled or terminated.
The scheduler evaluates when, where, and how to run jobs
that have been submitted to a job queue.
Jobs run in approximately the order in which they are
submitted, as long as all dependencies on other jobs have
been met.
AWS Batch and CloudWatch Events
Event stream – near real-time notifications regarding the current state
of your jobs
• Monitor the progress of jobs
• Build custom workflows with complex dependencies
• Generate usage reports or metrics around job execution
• Build your own custom dashboards
Jobs are available as CloudWatch Events targets:
• Match events and submit AWS Batch jobs in response to them
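As a sketch of the monitoring side, an event rule that fires whenever a Batch job fails can use a pattern like the following (the `FAILED` filter is one illustrative choice; drop the `detail` block to match every state change):

```json
{
  "source": ["aws.batch"],
  "detail-type": ["Batch Job State Change"],
  "detail": {
    "status": ["FAILED"]
  }
}
```

A rule with this pattern could then target an SNS topic or a Lambda function to alert on failures, or target another Batch job queue to drive a custom workflow.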