5. Agenda
• Batch processing – overview and challenges
• Why run batch workloads in the cloud
• Overview of AWS batch solutions
• Deep dive into AWS Batch and Amazon ECS
• Best practices review
6. Challenges of Running Batch Workloads
• Typically resource intensive
• Time constraint for completion
• Potential impact to concurrent batch jobs
• Scaling infrastructure resources
• Ensuring effective resource utilization and cost savings
• Fragile and unreliable
7. What Batch Workloads Need
• Reliability
• Easy Development
• Easy Deployment
• High Efficiency
• Low Ops Load
• Cost Effective
8. Why the cloud makes sense for batch workloads
• Reliable
• Scalable
• Pay as you go
• Infrastructure as code
9. Why containers make sense for batch workloads
• Simple to model
• Polyglot
• Image is the version
• Do one thing well
• You build it, you run it
• Black box
11. Introducing AWS Batch
• Fully managed batch primitives
• Focus on your applications (shell scripts,
Linux executables, Docker images) and
their resource requirements
• We take care of the rest!
12. Typical AWS Batch Job Architecture
Diagram labels:
• Input files
• S3 events trigger a Lambda function that submits an AWS Batch job
• Job definition: job resource requirements and other parameters
• Application image
• IAM role for the AWS Batch job
• Queue of runnable jobs
• AWS Batch Scheduler
• AWS Batch compute environments
• AWS Batch execution
• AWS Batch job output
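The "S3 events trigger a Lambda function that submits an AWS Batch job" step could look roughly like the Python sketch below. It is a minimal illustration, not the presenters' code: the job queue and job definition names, the `INPUT_BUCKET` and `INPUT_KEY` environment variables, and the injectable `batch_client` parameter are all assumptions made for the example.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical names: replace with your own job queue and job definition.
JOB_QUEUE = "example-job-queue"
JOB_DEFINITION = "example-job-definition"


def job_name_for(key):
    """Derive a valid AWS Batch job name from an S3 object key."""
    # Job names may contain letters, numbers, hyphens, and underscores,
    # must start with a letter or number, and are limited to 128 characters.
    name = re.sub(r"[^A-Za-z0-9_-]", "-", key).lstrip("-_")
    return (name or "job")[:128]


def build_submit_requests(event):
    """Turn an S3 event notification into submit_job keyword arguments."""
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        requests.append({
            "jobName": job_name_for(key),
            "jobQueue": JOB_QUEUE,
            "jobDefinition": JOB_DEFINITION,
            "containerOverrides": {
                "environment": [
                    {"name": "INPUT_BUCKET", "value": bucket},
                    {"name": "INPUT_KEY", "value": key},
                ]
            },
        })
    return requests


def handler(event, context, batch_client=None):
    """Lambda entry point; batch_client is injectable for local testing."""
    if batch_client is None:
        import boto3  # assumed to be available in the Lambda Python runtime
        batch_client = boto3.client("batch")
    return [batch_client.submit_job(**request)["jobId"]
            for request in build_submit_requests(event)]
```

Passing the object location through environment variables keeps the job definition generic; the container reads them at startup and writes its result to the output bucket.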
14. Basic batch workflow with ECS
Diagram labels:
• File put into S3 bucket
• Amazon Simple Queue Service
• Batch worker task polls SQS for new jobs
• Queue load is communicated to ECS
• Amazon ECS provisions compute clusters and schedules tasks based on demand
• Containerized batch worker processes file
• Output to S3 bucket
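The "batch worker task polls SQS for new jobs" step can be sketched as a small polling function. This is an illustrative sketch only: it assumes the queue receives standard S3 event notifications, and the `process` callback and the injected `sqs_client` (normally `boto3.client("sqs")`) are names chosen for the example.

```python
import json
from urllib.parse import unquote_plus


def objects_in(message_body):
    """Extract (bucket, key) pairs from an S3 event notification body."""
    # Bodies without "Records" (for example S3 test events) yield nothing.
    records = json.loads(message_body).get("Records", [])
    return [(r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
            for r in records]


def poll_once(sqs_client, queue_url, process):
    """Receive one batch of messages, process each file, then delete it."""
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling avoids busy-waiting
    )
    handled = 0
    for message in response.get("Messages", []):
        for bucket, key in objects_in(message["Body"]):
            process(bucket, key)
        # Delete only after processing succeeds, so that a message held by a
        # crashed worker becomes visible again and is retried.
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        handled += 1
    return handled
```

Inside the worker container this would typically run in a loop, for example `while True: poll_once(boto3.client("sqs"), queue_url, process_file)`, where `process_file` is your own function that reads the input object and writes the result to the output bucket.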
15. Trigger Batch Processing with Lambda
Diagram labels:
• Amazon S3 bucket (source)
• Amazon S3 bucket object
• AWS Lambda
• ecs:RunTask
• Amazon ECS: Task A on container instances in an Auto Scaling group across two Availability Zones
• Amazon S3 bucket (target)
• Amazon CloudWatch
• AWS CloudTrail
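The `ecs:RunTask` arrow in this diagram corresponds to a Lambda function calling the ECS `RunTask` API once per uploaded object. The sketch below shows one way this might be written; the cluster, task definition, and container names and the `SOURCE_BUCKET` and `SOURCE_KEY` variables are hypothetical, and the `ecs_client` parameter exists only so the function can be exercised without AWS access.

```python
from urllib.parse import unquote_plus

# Hypothetical names: replace with your own cluster and task definition.
CLUSTER = "example-batch-cluster"
TASK_DEFINITION = "example-batch-task"
CONTAINER_NAME = "worker"  # must match a container in the task definition


def build_run_task_request(bucket, key):
    """Build run_task keyword arguments for one source object."""
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_DEFINITION,
        "count": 1,
        "overrides": {
            "containerOverrides": [{
                "name": CONTAINER_NAME,
                "environment": [
                    {"name": "SOURCE_BUCKET", "value": bucket},
                    {"name": "SOURCE_KEY", "value": key},
                ],
            }]
        },
    }


def handler(event, context, ecs_client=None):
    """Lambda entry point: start one ECS task per object in the S3 event."""
    if ecs_client is None:
        import boto3  # assumed to be available in the Lambda Python runtime
        ecs_client = boto3.client("ecs")
    task_arns = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        response = ecs_client.run_task(**build_run_task_request(bucket, key))
        # RunTask reports placement problems in "failures" instead of raising.
        if response.get("failures"):
            raise RuntimeError("RunTask failed: %r" % response["failures"])
        task_arns.extend(task["taskArn"] for task in response.get("tasks", []))
    return task_arns
```

The Lambda execution role needs permission for `ecs:RunTask` on the task definition, which is the permission named in the diagram.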
16. Fleet of workers with ECS and SQS
Diagram labels:
• Amazon S3, DynamoDB, Amazon Kinesis
• AWS Lambda
• ecs:RunTask
• SQS queue
• Amazon ECS: Task A on container instances in an Auto Scaling group across two Availability Zones
• Amazon CloudWatch
• AWS CloudTrail
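One way to size a fleet of workers from the SQS backlog is sketched below. It assumes, beyond what the slide states, that the workers run as an ECS service whose desired count can be adjusted; the function names, the ten-messages-per-worker default, and the worker limits are illustrative choices rather than recommendations.

```python
import math


def desired_worker_count(queue_depth, messages_per_worker, min_workers, max_workers):
    """Workers needed for the current backlog, clamped to the allowed range."""
    if messages_per_worker <= 0:
        raise ValueError("messages_per_worker must be positive")
    wanted = math.ceil(queue_depth / messages_per_worker)
    return max(min_workers, min(max_workers, wanted))


def scale_workers(sqs_client, ecs_client, queue_url, cluster, service,
                  messages_per_worker=10, min_workers=0, max_workers=20):
    """Read the queue depth and set the ECS service's desired count."""
    attributes = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )["Attributes"]
    # SQS returns attribute values as strings.
    depth = int(attributes["ApproximateNumberOfMessages"])
    count = desired_worker_count(depth, messages_per_worker,
                                 min_workers, max_workers)
    ecs_client.update_service(cluster=cluster, service=service,
                              desiredCount=count)
    return count
```

For example, a backlog of 25 messages at 10 messages per worker yields 3 workers, while a backlog of 1,000 is capped at `max_workers`.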
17. Long-running Batch Jobs
• Utilize Spot Instances
• EC2 Spot Blocks for Defined-Duration Workloads
• ECS event stream for CloudWatch Events
• Service Scaling and Monitoring
Diagram labels:
• Amazon ECS: Tasks A, B, and C on container instances in an Auto Scaling group across two Availability Zones
• Amazon CloudWatch
• AWS CloudTrail
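The ECS event stream delivers task state changes to CloudWatch Events, which makes it possible to react when a long-running job stops. The following sketch shows an event pattern for stopped tasks and a small function that picks out containers with a nonzero exit code; the function name and the shape of the returned dictionaries are assumptions made for this example.

```python
import json

# Event pattern matching ECS tasks that have reached the STOPPED state.
EVENT_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {"lastStatus": ["STOPPED"]},
}


def failed_containers(event):
    """List containers in a stopped task that exited with a nonzero code."""
    detail = event.get("detail", {})
    if detail.get("lastStatus") != "STOPPED":
        return []
    failures = []
    for container in detail.get("containers", []):
        exit_code = container.get("exitCode")
        # Containers with no recorded exit code are skipped in this sketch.
        if isinstance(exit_code, int) and exit_code != 0:
            failures.append({
                "task": detail.get("taskArn"),
                "container": container.get("name"),
                "exit_code": exit_code,
                "reason": container.get("reason") or detail.get("stoppedReason"),
            })
    return failures


# The pattern is passed to CloudWatch Events as a JSON string.
pattern_json = json.dumps(EVENT_PATTERN)
```

A rule using this pattern could be created with `boto3.client("events").put_rule(Name=..., EventPattern=pattern_json)`, with a Lambda function that calls `failed_containers` as its target so that failed jobs can be resubmitted or reported.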
18. Get the Best Value for EC2 Capacity – Spot Instances
• Since Spot instances typically cost 50-90% less than
On-Demand, you can increase your compute capacity
by 2-10x within the same budget
• Or you could save 50-90% on your existing workload
• Either way, you should try it!
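The 2-10x figure follows directly from the 50-90% discount: paying a fraction (100 - d)% of the On-Demand price lets the same budget buy 100 / (100 - d) times as much capacity. A short worked calculation, with a function name chosen for illustration:

```python
def capacity_multiplier(discount_percent):
    """How much more capacity the same budget buys at a given Spot discount."""
    if not 0 <= discount_percent < 100:
        raise ValueError("discount_percent must be in the range [0, 100)")
    # Paying (100 - d)% of the On-Demand price buys 100 / (100 - d) times as much.
    return 100 / (100 - discount_percent)


for pct in (50, 90):
    print(f"{pct}% cheaper -> {capacity_multiplier(pct):g}x capacity")
# prints:
# 50% cheaper -> 2x capacity
# 90% cheaper -> 10x capacity
```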
19. Best Practices
• Store state, inputs, and outputs in S3 or another datastore
• Minimize dependencies between task definitions (they should be independent of each other)
• Use Spot Instances and Spot Fleets for long-running batch jobs
• Monitor cluster state with ECS APIs
• Share pools of resources
• Auto Scaling, VPC, IAM, scheduled Reserved Instances
21. Serving Maps at Scale on AWS
Powers over 5,000 apps in categories ranging from social to mobility
Reaches more than 200 million users each month and growing
22. [Diagram: Map Service, Search Service, and Directions Service, with instances labeled C4, M4, and R3]
30. Time and Event-Based Task Scheduling
• Schedule on fixed time intervals (e.g., a number of minutes, hours, or days), or use cron expressions
• Set Amazon ECS as a CloudWatch Events target
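A scheduled ECS task combines a CloudWatch Events rule carrying a schedule expression with a target that points at the ECS cluster. The sketch below builds both request payloads; the helper names, the target `Id`, and the example cron schedule are illustrative assumptions, and the ARNs must come from your own account.

```python
def rate_expression(value, unit):
    """Build a fixed-interval expression such as 'rate(5 minutes)'."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour', or 'day'")
    if value < 1:
        raise ValueError("value must be at least 1")
    # The unit is singular for a value of 1 and plural otherwise.
    return f"rate({value} {unit if value == 1 else unit + 's'})"


# Cron expressions have six fields:
# minutes, hours, day-of-month, month, day-of-week, year.
NIGHTLY_AT_0200_UTC = "cron(0 2 * * ? *)"


def build_schedule(rule_name, schedule_expression, cluster_arn, role_arn,
                   task_definition_arn):
    """Return (put_rule kwargs, put_targets kwargs) for a scheduled ECS task."""
    rule = {"Name": rule_name, "ScheduleExpression": schedule_expression}
    targets = {
        "Rule": rule_name,
        "Targets": [{
            "Id": "run-batch-task",  # hypothetical target identifier
            "Arn": cluster_arn,      # the ECS cluster is the target
            "RoleArn": role_arn,     # role allowed to call ecs:RunTask
            "EcsParameters": {
                "TaskDefinitionArn": task_definition_arn,
                "TaskCount": 1,
            },
        }],
    }
    return rule, targets
```

With boto3 the two payloads would be applied as `events.put_rule(**rule)` followed by `events.put_targets(**targets)`, where `events = boto3.client("events")`.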
32. Summary
• Cloud and containers are a great way to run batch
workloads
• Two options on AWS: Batch and ECS
• Why AWS Batch: managed batch processing environment
• Why ECS: DIY batch processing, with very flexible time- and event-based task scheduling