© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:Invent
AWS Batch: Easy & Efficient Batch
Computing on Amazon Web Services
CMP323
AdRoll: Mikko Juola & Oleg Avdeev
Base2 Genomics: Ryan Layer, Aaron Quinlan, & Brent Pedersen
Autodesk: Dinu Bunduchi
AWS: Jamie Kinney
November 28, 2017
Introductions
Mikko Juola: Sr. Software Engineer, AdRoll
Oleg Avdeev: Staff Engineer, AdRoll
Ryan Layer: Co-Founder, Base2 Genomics
Aaron Quinlan: Co-Founder, Base2 Genomics
Brent Pedersen: Co-Founder, Base2 Genomics
Dinu Bunduchi: Cloud Infrastructure Architect, Autodesk
Jamie Kinney: Principal Product Manager, AWS Batch and HPC
Agenda
- Summary of Recent AWS Batch Launches
- Glimpse Into Our Roadmap
- Real-World Examples:
• AdRoll
• Base2 Genomics
• Autodesk
- Q&A
AWS Batch
Fully Managed: No software to install or servers to manage. AWS Batch provisions, manages, and scales your infrastructure.
Integrated with AWS: Natively integrated with the AWS platform, AWS Batch jobs can easily and securely interact with services such as Amazon S3, Amazon DynamoDB, and Amazon Rekognition.
Cost-Optimized Resource Provisioning: AWS Batch automatically provisions compute resources tailored to the needs of your jobs, using Amazon EC2 and EC2 Spot.
2017 Launches
AWS Batch Regional Expansion
AWS Batch is available in the following regions:
• us-east-1 (N. Virginia)
• us-east-2 (Ohio)
• us-west-2 (Oregon)
• eu-west-1 (Ireland)
• eu-west-2 (London)
• eu-central-1 (Frankfurt)
• ap-northeast-1 (Tokyo)
• ap-southeast-2 (Sydney)
• ap-southeast-1 (Singapore)
Improved Managed Compute Environments
Custom AMIs:
• Auto-mount EFS and other shared filesystems
• Larger/faster/encrypted EBS
• GPU/FPGA drivers
New instance families
• G3
• F1
• P3
• C5
Faster scale-up/scale-down and per-second billing
Automated tagging of EC2 Spot instances
Manageability & Performance Improvements
• Automated Job Retries: Easily recover from application
errors, hardware failures, or EC2 Spot terminations
• Scheduling throughput: run jobs as short as 5 seconds with
~90% scheduling efficiency
• Native AWS CloudFormation Support for AWS Batch
Resources
• Amazon CloudWatch Events for Job State Transitions
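Those job state transition events can be matched with an event pattern and routed to a target. A minimal sketch in Python — the rule name is a hypothetical example, and the boto3 call is left commented out so only the pattern itself is shown:

```python
import json

# Event pattern matching AWS Batch job state changes that reach FAILED.
# "aws.batch" / "Batch Job State Change" are the source and detail-type
# CloudWatch Events uses for these transitions.
pattern = {
    "source": ["aws.batch"],
    "detail-type": ["Batch Job State Change"],
    "detail": {"status": ["FAILED"]},
}

# Hypothetical wiring:
# import boto3
# events = boto3.client("events")
# events.put_rule(Name="batch-job-failed", EventPattern=json.dumps(pattern))

print(json.dumps(pattern))
```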
HIPAA Eligibility
https://aws.amazon.com/compliance/hipaa-eligible-services-reference/
Workflows, Pipelines, and Job Dependencies
Jobs can express a dependency on the successful completion of other jobs or on specific elements of an array job.
You can also use AWS Step Functions or other workflow systems to submit jobs. Flow-based systems submit jobs serially, while DAG-based systems submit many jobs at once, identifying inter-job dependencies.
$ aws batch submit-job --depends-on 606b3ad1-aa31-48d8-92ec-f154bfc8215f ...
[Diagram: Job A → Job B → Job C]
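The same dependency can be expressed through the SubmitJob API. A sketch (not from the talk) that builds the request parameters — the job, queue, and definition names are hypothetical, and the actual boto3 call is commented out:

```python
def build_dependent_job(name, queue, definition, parent_job_id):
    """SubmitJob parameters for a job that runs only after another job succeeds."""
    return {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
        # The new job stays PENDING until the parent job completes successfully.
        "dependsOn": [{"jobId": parent_job_id}],
    }

params = build_dependent_job(
    "job-b", "ProdQueue", "my-job-def:1",
    "606b3ad1-aa31-48d8-92ec-f154bfc8215f")

# import boto3
# boto3.client("batch").submit_job(**params)
```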
Workflows and Job Dependencies
https://github.com/awslabs/aws-batch-genomics
New! Easily Run Massively Parallel Jobs
Array jobs allow you to run up to 10,000 copies of an application. AWS Batch
creates child jobs for each element in the array.
Array jobs are an efficient way to run:
• Parametric sweeps
• Monte Carlo simulations
• Processing a large collection of objects
Also includes:
• Extended Dependency Model
• Enhancements to Job APIs
[Diagram: Job A → Job B:0 … Job B:n (array job) → Job C]
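An array job is the same SubmitJob call plus an `arrayProperties` block; Batch then creates one child per index, and each child can read its index from the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable. A sketch with hypothetical names (the size bounds mirror the 10,000-copy limit above; 2 is the documented minimum):

```python
def build_array_job(name, queue, definition, size):
    """SubmitJob parameters for an array job of `size` child jobs."""
    if not 2 <= size <= 10000:
        raise ValueError("array size must be between 2 and 10,000")
    return {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
        # Each child receives AWS_BATCH_JOB_ARRAY_INDEX in its environment.
        "arrayProperties": {"size": size},
    }

params = build_array_job("BigBatch", "ProdQueue", "monte-carlo:8", 10000)
# import boto3
# boto3.client("batch").submit_job(**params)
```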
Example Array Job Submission
$ aws batch submit-job --job-name BigBatch --job-queue ProdQueue \
    --job-definition monte-carlo:8 --array-properties "size=10000" ...
{
    "jobName": "BigBatch", "jobId": "350f4655-5d61-40f0-aa0b-03ad787db329"
}
Job Name: BigBatch
Job ID: 350f4655-5d61-40f0-aa0b-03ad787db329
Job Name: BigBatch
Job ID: 350f4655-5d61-40f0-aa0b-03ad787db329:0
Job Name: BigBatch
Job ID: 350f4655-5d61-40f0-aa0b-03ad787db329:1
…
Job Name: BigBatch
Job ID: 350f4655-5d61-40f0-aa0b-03ad787db329:9999
Array Job Dependency Models
Job Depends on Array Job
[Diagram: array job "Job-A" (A:0 … A:99) → "Job-B"]
$ aws batch submit-job --cli-input-json file://./Job-A.json
<Job-A.json>
{
    "jobName": "Job-A",
    "jobQueue": "ProdQueue",
    "jobDefinition": "Job-A-Definition:1",
    "arrayProperties": {
        "size": 100
    }
}
$ aws batch submit-job --cli-input-json file://./Job-B.json
<Job-B.json>
{
    "jobName": "Job-B",
    "jobQueue": "ProdQueue",
    "jobDefinition": "Job-B-Definition:1",
    "dependsOn": [
        {"jobId": "<job ID for Job A>"}
    ]
}
Array Job Dependency Models
Array Job depends on Job
[Diagram: "Job-A" → array job "Job-B" (B:0 … B:99)]
$ aws batch submit-job --job-name Job-A --job-queue ProdQueue \
    --job-definition Job-A-Definition:1
{
    "jobName": "Job-A",
    "jobId": "7a6225f0-a16e-4241-9103-192c0c68124c"
}
<Job-B.json>
{
    "jobName": "Job-B",
    "jobQueue": "ProdQueue",
    "jobDefinition": "Job-B-Definition:1",
    "arrayProperties": {
        "size": 100
    },
    "dependsOn": [
        {"jobId": "7a6225f0-a16e-4241-9103-192c0c68124c"}
    ]
}
Array Job Dependency Models
Two Equally-Sized Array Jobs, a.k.a. “N-to-N”
[Diagram: array job "Job-A" (A:0 … A:9999) → array job "Job-B" (B:0 … B:n), with child B:i depending on child A:i]
$ aws batch submit-job --job-name Job-A --job-queue ProdQueue \
    --job-definition job-A-Definition:1 --array-properties size=10000
{
    "jobName": "Job-A",
    "jobId": "7a6225f0-a16e-4241-9103-192c0c68124c"
}
$ aws batch submit-job --job-name Job-B --job-queue ProdQueue \
    --job-definition job-B-Definition:1 --array-properties size=10000 \
    --depends-on jobId=7a6225f0-a16e-4241-9103-192c0c68124c,type=N_TO_N
{
    "jobName": "Job-B",
    "jobId": "7f2b6bfb-75e8-4655-89a5-1e5b233f5c08"
}
Array Job Dependency Models
Array Job Depends on Self, a.k.a. Sequential Job
[Diagram: array job "Job-A" (A:0 … A:9), each child depending on the previous index]
$ aws batch submit-job --job-name Job-A --job-queue ProdQueue \
    --job-definition job-A-Definition:1 --array-properties size=10 \
    --depends-on type=SEQUENTIAL
{
    "jobName": "Job-A",
    "jobId": "7a6225f0-a16e-4241-9103-192c0c68124c"
}
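The two array-specific dependency types map onto the `dependsOn` list in SubmitJob. A sketch (not from the talk) with a hypothetical job ID:

```python
def n_to_n_dependency(parent_job_id):
    """Child i of the new array job waits on child i of the parent array job."""
    return [{"jobId": parent_job_id, "type": "N_TO_N"}]

def sequential_dependency():
    """Each child of the array job waits on the previous index (A:1 on A:0, ...);
    no jobId is needed because the job depends on itself."""
    return [{"type": "SEQUENTIAL"}]

deps = n_to_n_dependency("7a6225f0-a16e-4241-9103-192c0c68124c")
```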
Another Example
C is dependent on A and B
D has an N_TO_N dependency on C
[Diagram: "Job-A" and "Job-B" → array job "Job-C" (C:0 … C:9999) → array job "Job-D" (D:0 … D:9999), with child D:i depending on child C:i]
These Models Can be Combined
[Diagram: combined pipeline — "Job-A" (setup) feeds array job "Job-B" (B:0 … B:9) and array job "Job-C" (C:0 … C:9999); "Job-D" (D:0 … D:9999) has an N_TO_N dependency on Job-C; "Job-E" (cleanup) runs last. Stages are sized for heavy network I/O, CPU-intensive, and large-memory work.]
Roadmap
What Can You Expect in 2018?
• Significant improvements to the AWS Batch console
• Automatically submit AWS Batch jobs in response to
CloudWatch Events
• CloudTrail auditing of AWS Batch API calls
• Consumable resources
• Additional job types
• Further regional expansion
Mikko Juola: Senior Software Engineer, AdRoll
Oleg Avdeev: Staff Engineer, AdRoll
Adoption of AWS Batch at AdRoll
AdRoll started using AWS Batch in June 2017. These numbers were collected between June 2017 and November 2017:
• 1.2 million jobs submitted
• 300K instances churned
• 600 CPU-years spent
• 2 teams using it in production
Why does AdRoll like AWS Batch?
Docker support is first class. Very flexible!
Workflow is conceptually simple:
• Put code in Docker image
• Submit Docker image + command line arguments
Lots of control over the environment your jobs run in (e.g., huge instances, custom monitoring software).
Cost-effective (especially with per-second billing and spot instances). AdRoll uses spot instances
exclusively in AWS Batch applications.
How is AWS Batch being used by AdRoll?
On the next few slides, I'll go through some mechanisms we've built on top of AWS Batch. These can give you some ideas for structuring your own batch workflows on top of the service.
Monitoring Batch Jobs
When you are running thousands of jobs per day, being able to monitor them in bulk becomes important.
At AdRoll, we built a monitoring tool tailored to our way of using AWS Batch.
The monitoring in the AWS Management Console for AWS Batch is not bad, but it can be difficult to sift quickly through tens of thousands of jobs to find something specific.
Monitoring Batch Jobs
[Screenshots: AdRoll's monitoring tool, with jobs filterable by search term, status, and job queue]
Submitting Jobs
We wrote a Python library to make submitting jobs as simple as possible:

from pybatch import run_on_awsbatch

run_on_awsbatch(image='ubuntu:16.04',
                name='Hello Job',
                jobqueue='attribution-managed-spot-staging',
                command_line=['echo', 'hello'],
                timeout=3600)
Timeouts
AWS Batch has no built-in support for timeouts, and occasionally a job gets stuck.
When we submit a job, we set an environment variable BATCH_TIMEOUT to the timestamp at which the job should time out.
A periodically running script inspects all active jobs and kills those whose timeout has expired.
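A minimal sketch of that sweep logic, assuming BATCH_TIMEOUT holds an epoch timestamp. Only the pure selection step is shown; the describe/terminate wiring below it is hypothetical and follows the boto3 Batch client's `describe_jobs` response shape:

```python
import time

def expired_job_ids(jobs, now=None):
    """Given describe_jobs-style job entries, return the IDs of jobs whose
    BATCH_TIMEOUT environment variable (epoch seconds) lies in the past."""
    now = time.time() if now is None else now
    expired = []
    for job in jobs:
        env = {e["name"]: e["value"]
               for e in job.get("container", {}).get("environment", [])}
        deadline = env.get("BATCH_TIMEOUT")
        if deadline is not None and float(deadline) < now:
            expired.append(job["jobId"])
    return expired

# Hypothetical wiring for the periodic script:
# import boto3
# batch = boto3.client("batch")
# jobs = batch.describe_jobs(jobs=active_job_ids)["jobs"]
# for jid in expired_job_ids(jobs):
#     batch.terminate_job(jobId=jid, reason="BATCH_TIMEOUT exceeded")
```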
Other quirks
We use a custom AMI that sets up instance store for batch jobs. To give an
example why this is useful: some of our jobs are specifically designed for i3
instances that make use of their highly efficient NVMe SSD storage.
Production and staging pipelines have entirely separate AWS Batch
infrastructure.
Our job queues are organized by instance type. If I submit a job to a certain job queue, I can be certain it will run on a specific kind of instance.
General tips
Start simple. You don’t have to build anything complex on top of AWS Batch
to get started.
If you decide to scale up your AWS Batch use, think about how you are going
to monitor and debug your jobs. The system will not have a problem running
tens of thousands of containers. But you might have a problem finding
specific jobs and their logs after they have run.
If you can use spot instances, use spot instances. Can save you lots of $$$!
Ryan Layer: Co-Founder, Base2 Genomics
base2genomics.com
@Base2G
S3 + AWS Batch:
• Large files
• Heterogeneous pipeline
• Task and data parallelism
Ewing Sarcoma
Raw sequence: 1,049 samples × 50 GB each = 52.5 TB
Sequence Alignment: Sample-Parallel
Each sample's raw sequence is aligned independently, producing a ~50 GB alignment per sample; the per-sample jobs are submitted with batchit.
batchit
# alignment.sh
bwa mem -t ${cpus} \
    ${reference}.fa \
    ${sample}.fq.gz \
    > ${sample}.sam
github.com/base2genomics/batchit
batchit
# alignment.sh
bwa mem -t ${cpus} \
    ${reference}.fa \
    ${sample}.fq.gz \
    > ${sample}.sam

batchit submit \
    --image $docker_image \
    --role $container_role \
    --queue $queue \
    --envvars "cpus=16" \
              "sample=SS-12345" \
              "reference=GRCh37" \
    --ebs "/mnt/ebs:100:gp2:ext4" \
    --cpus 16 \
    --mem 30000 \
    --jobname align-SS-12345 \
    alignment.sh
github.com/base2genomics/batchit
The submit flags, annotated:
• --image: the Docker image to run
• --role: the IAM role for the container
• --queue: the Batch job queue
• --envvars: fill the variables referenced in the script (cpus, sample, reference)
• --ebs: create, format, and attach an EBS volume inside Docker, with automatic cleanup
github.com/base2genomics/batchit
batchit
POST /v1/submitjob HTTP/1.1
Content-Type: application/json

{
  "Parameters": null,
  "DependsOn": null,
  "JobQueue": "i3-16xl",
  "JobName": "i3-test",
  "RetryStrategy": null,
  "ContainerOverrides": {
    "Environment": [
      { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
      { "Name": "cpus", "Value": "16" },
      { "Name": "sample", "Value": "SS-12345" },
      { "Name": "reference", "Value": "GRCh37" },
      { "Name": "TMPDIR", "Value": "/mnt/ebs" }
    ],
    "Vcpus": 16,
    "Command": [
      "/bin/bash", "-c",
      "for i in \"$@\"; do eval \"$i\"; done",
      "export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
      "trap \"set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;\" EXIT",
      "export BATCH_SCRIPT=$(mktemp)",
      "echo \"$B64GZ\" | base64 -d | gzip -dc > $BATCH_SCRIPT",
      "chmod +x $BATCH_SCRIPT",
      "$BATCH_SCRIPT"
    ],
    "Memory": 300000
  },
  "JobDefinition": "i3-test"
}
github.com/base2genomics/batchit
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
github.com/base2genomics/batchit
{
"Parameters": null,
"DependsOn": null,
"JobQueue": "i3-16xl",
"JobName": "i3-test",
"RetryStrategy": null,
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "2" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
"Vcpus": 16,
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;" EXIT",
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
"Memory": 300000
},
"JobDefinition": "i3-test"
}
batchit
# alignment.sh
bwa mem –t ${cpus}
${reference}.fa 
${sample}.fq.gz 
> ${sample}.sam
batchit submit 
--image $docker_image 
--role $container_role 
--queue $queue 
--envvars "cpus=16" 
"sample=SS-12345" 
"reference=GRCh37" 
--ebs "/mnt/ebs:100:gp2:ext4" 
--cpus 16 
--mem 30000 
--jobname align-SS-12345 
alignment.sh
POST /v1/submitjob
HTTP/1.1 Content-type: application/json
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "16" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "16" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
github.com/base2genomics/batchit
{
"Parameters": null,
"DependsOn": null,
"JobQueue": "i3-16xl",
"JobName": "i3-test",
"RetryStrategy": null,
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "2" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
"Vcpus": 16,
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;" EXIT",
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
"Memory": 300000
},
"JobDefinition": "i3-test"
}
batchit
# alignment.sh
bwa mem –t ${cpus}
${reference}.fa 
${sample}.fq.gz 
> ${sample}.sam
batchit submit 
--image $docker_image 
--role $container_role 
--queue $queue 
--envvars "cpus=16" 
"sample=SS-12345" 
"reference=GRCh37" 
--ebs "/mnt/ebs:100:gp2:ext4" 
--cpus 16 
--mem 30000 
--jobname align-SS-12345 
alignment.sh
POST /v1/submitjob
HTTP/1.1 Content-type: application/json
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "2" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "2" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
github.com/base2genomics/batchit
{
"Parameters": null,
"DependsOn": null,
"JobQueue": "i3-16xl",
"JobName": "i3-test",
"RetryStrategy": null,
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "2" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
"Vcpus": 16,
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;" EXIT",
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
"Memory": 300000
},
"JobDefinition": "i3-test"
}
batchit
# alignment.sh
bwa mem –t ${cpus}
${reference}.fa 
${sample}.fq.gz 
> ${sample}.sam
batchit submit 
--image $docker_image 
--role $container_role 
--queue $queue 
--envvars "cpus=16" 
"sample=SS-12345" 
"reference=GRCh37" 
--ebs "/mnt/ebs:100:gp2:ext4" 
--cpus 16 
--mem 30000 
--jobname align-SS-12345 
alignment.sh
POST /v1/submitjob
HTTP/1.1 Content-type: application/json
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "2" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
github.com/base2genomics/batchit
{
"Parameters": null,
"DependsOn": null,
"JobQueue": "i3-16xl",
"JobName": "i3-test",
"RetryStrategy": null,
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "2" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
"Vcpus": 16,
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;" EXIT",
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
"Memory": 300000
},
"JobDefinition": "i3-test"
}
batchit
# alignment.sh
bwa mem –t ${cpus}
${reference}.fa 
${sample}.fq.gz 
> ${sample}.sam
batchit submit 
--image $docker_image 
--role $container_role 
--queue $queue 
--envvars "cpus=16" 
"sample=SS-12345" 
"reference=GRCh37" 
--ebs "/mnt/ebs:100:gp2:ext4" 
--cpus 16 
--mem 30000 
--jobname align-SS-12345 
alignment.sh
POST /v1/submitjob
HTTP/1.1 Content-type: application/json
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;" EXIT",
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;" EXIT",
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
github.com/base2genomics/batchit
batchit
# alignment.sh
bwa mem –t ${cpus}
${reference}.fa 
${sample}.fq.gz 
> ${sample}.sam
batchit submit 
--image $docker_image 
--role $container_role 
--queue $queue 
--envvars "cpus=2"
"sample=SS-12345" 
"reference=GRCh37" 
--ebs "/mnt/ebs:100:gp2:ext4" 
--cpus 16 
--mem 30000 
--jobname align-SS-12345 
alignment.sh
{
"Parameters": null,
"DependsOn": null,
"JobQueue": "i3-16xl",
"JobName": "i3-test",
"RetryStrategy": null,
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "2" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
"Vcpus": 16,
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid; if [[ $si
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
"Memory": 300000
},
"JobDefinition": "i3-test"
}
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;" EXIT",
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
/bin/bash -c
for i in "$@"; do eval "$i"; done
export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)
trap "set +e;
umount /mnt/ebs || umount -l /mnt/ebs;
batchit ddv $vid;" EXIT
export BATCH_SCRIPT=$(mktemp)",
echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
chmod +x $BATCH_SCRIPT
$BATCH_SCRIPT
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
github.com/base2genomics/batchit
batchit
# alignment.sh
bwa mem –t ${cpus}
${reference}.fa 
${sample}.fq.gz 
> ${sample}.sam
batchit submit 
--image $docker_image 
--role $container_role 
--queue $queue 
--envvars "cpus=2"
"sample=SS-12345" 
"reference=GRCh37" 
--ebs "/mnt/ebs:100:gp2:ext4" 
--cpus 16 
--mem 30000 
--jobname align-SS-12345 
alignment.sh
{
"Parameters": null,
"DependsOn": null,
"JobQueue": "i3-16xl",
"JobName": "i3-test",
"RetryStrategy": null,
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "2" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
"Vcpus": 16,
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid; if [[ $si
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
"Memory": 300000
},
"JobDefinition": "i3-test"
}
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;;" EXIT",
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
/bin/bash -c
for i in "$@"; do eval "$i"; done
export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)
trap "set +e;
umount /mnt/ebs || umount -l /mnt/ebs;
batchit ddv $vid;" EXIT
export BATCH_SCRIPT=$(mktemp)",
echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
chmod +x $BATCH_SCRIPT
$BATCH_SCRIPT
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
github.com/base2genomics/batchit
batchit
# alignment.sh
bwa mem –t ${cpus}
${reference}.fa 
${sample}.fq.gz 
> ${sample}.sam
batchit submit 
--image $docker_image 
--role $container_role 
--queue $queue 
--envvars "cpus=2"
"sample=SS-12345" 
"reference=GRCh37" 
--ebs "/mnt/ebs:100:gp2:ext4" 
--cpus 16 
--mem 30000 
--jobname align-SS-12345 
alignment.sh
{
"Parameters": null,
"DependsOn": null,
"JobQueue": "i3-16xl",
"JobName": "i3-test",
"RetryStrategy": null,
"ContainerOverrides": {
"Environment": [ { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
{ "Name": "cpus", "Value": "2" },
{ "Name": "sample", "Value": "SS-12345" },
{ "Name": "reference", "Value": "GRCh37" },
{ "Name": "TMPDIR", "Value": "/mnt/ebs" } ],
"Vcpus": 16,
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid; if [[ $si
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
"Memory": 300000
},
"JobDefinition": "i3-test"
}
"Command": [
"/bin/bash", "-c",
"for i in "$@"; do eval "$i"; done",
"export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
"trap "set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;" EXIT",
"export BATCH_SCRIPT=$(mktemp)",
"echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
"chmod +x $BATCH_SCRIPT",
"$BATCH_SCRIPT" ],
/bin/bash -c
for i in "$@"; do eval "$i"; done
export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)
trap "set +e;
umount /mnt/ebs || umount -l /mnt/ebs;
batchit ddv $vid;" EXIT
export BATCH_SCRIPT=$(mktemp)",
echo "$B64GZ" | base64 -d | gzip -dc > $BATCH_SCRIPT",
chmod +x $BATCH_SCRIPT
$BATCH_SCRIPT
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
github.com/base2genomics/batchit
batchit
# alignment.sh
bwa mem –t ${cpus}
${reference}.fa 
${sample}.fq.gz 
> ${sample}.sam
batchit submit 
--image $docker_image 
--role $container_role 
--queue $queue 
--envvars "cpus=2"
"sample=SS-12345" 
"reference=GRCh37" 
--ebs "/mnt/ebs:100:gp2:ext4" 
--cpus 16 
--mem 30000 
--jobname align-SS-12345 
alignment.sh
Andrey Kislyuk https://github.com/kislyuk/aegea
{
  "ContainerProperties": {
    "MountPoints": [ { "SourceVolume": "vol00",
                       "ContainerPath": "/dev" } ],
    "Volumes": [ { "Host": { "SourcePath": "/dev" },
                   "Name": "vol00" } ],
    "Privileged": true,
    "Ulimits": [ { "SoftLimit": 40000,
                   "HardLimit": 40000 } ]
  }
}
Sequence Alignment: Sample-Parallel
(diagram: 50 GB raw sequence -> alignment -> 50 GB aligned genome)
alignment: 32 vCPUs, 160 CPU hours per sample
Instance types: m4.16xlarge, r4.8xlarge, c4.8xlarge, r4.16xlarge
1,049 ✕ 50GB+50GB = 104.9 TB
1,049 ✕ 160 CPU hours = 19.2 CPU years
Running total: 104.9 TB / 19.2 CPU y
Variant Calling: Sample-Parallel
(diagram: 50 GB aligned genome -> small_variant + large_variant -> sample variants)
small_variant: 32 vCPUs, 128 CPU hours per sample
  Instance types: m4.16xlarge, r4.8xlarge, c4.8xlarge, r4.16xlarge
large_variant: 2 vCPUs, 4-8 CPU hours per sample
  Instance types: m4.large, m4.xlarge, m4.2xlarge, c4.large, c4.xlarge, c4.2xlarge, r4.large, r4.xlarge, r4.2xlarge
Per-sample outputs: 16 GB and 8 GB of variant calls
Running total: 287.4 TB / 35.2 CPU y
Joint Calling: Region-Parallel
(diagram: sample variants from 1,049 genomes, split into ~100 MB regions)
joint_calling: m4.16xlarge, 64 vCPUs, 100 GB EBS, 4 CPU hours per region
Per region (1 of 80): 105 GB in, 50 GB out
80 ✕ 105GB+50GB = 12.4 TB
80 ✕ 4 CPU hours = 13.3 CPU days
Final total: 299.8 TB / 35.2 CPU y
Wall time: 3.1 days
Academic Software and Exit Codes
$ bioinformatics-software \
    $input \
    > $output
# catastrophic silent ERROR resulting in truncated output
$ echo $?
0
Validate output at each step, create a sentinel file on S3:
$ bioinformatics-software \
    $input \
    > $output
$ verify-output $output && touch $output.sentinel
$ aws s3 cp $output.sentinel s3://base2-sentinel/
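Because the tool's exit code cannot be trusted, success is signaled only by the sentinel, written after an independent check of the output. A minimal local sketch of the pattern (here `gzip -t` stands in for a real verify-output step, and local files stand in for S3):

```shell
# Sentinel pattern: mark an output as valid only after verification.
# 'gzip -t' stands in for verify-output; local files stand in for S3.
output=sample.vcf.gz
printf 'chr1\t100\tA\tT\n' | gzip > "$output"

# Write the sentinel only if the integrity check passes.
gzip -t "$output" && touch "$output.sentinel"

# Downstream steps consume the output only if its sentinel exists.
if [ -e "$output.sentinel" ]; then
  echo "verified: $output"
else
  echo "not ready: $output" >&2
fi
```

The sentinel, not the exit code, becomes the unit of truth for every downstream step.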
base2mon
Input files: sample1.fq, sample2.fq, …, sampleN.fq
Operations: alignment, small_variant, large_variant, joint_calling
base2mon tracks which operations have completed for each input via sentinel files:
  sample1.fq -> alignment -> sample1.bam, sample1.bam.sentinel
    -> small_variant -> sample1.s.vcf, sample1.s.vcf.sentinel
    -> large_variant -> sample1.l.vcf, sample1.l.vcf.sentinel
  sample2.fq -> alignment -> sample2.bam, sample2.bam.sentinel
  ...
  sampleN.fq -> alignment -> sampleN.bam, sampleN.bam.sentinel
    -> small_variant -> sampleN.s.vcf, sampleN.s.vcf.sentinel
    -> large_variant -> sampleN.l.vcf, sampleN.l.vcf.sentinel
  all samples -> joint_calling -> samples1_N.vcf
base2mon runs on a single t2.micro, submits jobs to Batch, and keeps state in S3
- Data-dependent job planning, recovery, restart
- Modularize and isolate 3rd party software
- Reactive resubmit (e.g., more memory)
- Per-instance time/cost monitoring
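Data-dependent planning reduces to a diff: a job is pending when its input exists but its output sentinel does not. A toy sketch of that check (local files stand in for S3 listings; filenames are illustrative, not base2mon's own):

```shell
# A job is pending when its input exists but its output sentinel does not.
touch sample1.fq sample2.fq sample3.fq
touch sample1.bam.sentinel          # sample1 already aligned

pending=""
for fq in sample1.fq sample2.fq sample3.fq; do
  bam="${fq%.fq}.bam"
  # No sentinel for this sample's BAM -> alignment still needs to run.
  [ -e "$bam.sentinel" ] || pending="$pending $fq"
done

echo "to align:$pending"
```

Re-running the same diff after a crash is what makes recovery and restart free: completed work is simply skipped.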
base2genomics.com
@Base2G
Dinu Bunduchi: Cloud Infrastructure Architect, Autodesk
© 2017 Autodesk
Autodesk Generative Design
Software mimics Nature’s approach to design
Engineering Outcome
The ultimate goal for any engineering activity is to strike the right balance between performance and cost to produce for a given design challenge.
Price Performance Curve
(chart: performance vs. cost to produce)
Design Factors
Requirements, Process, Materials, Costs
Generative Design
Designers input design goals into generative design software, along with parameters such as materials, manufacturing methods, and cost constraints. Then, using cloud computing, the software explores all the possible permutations of a solution, quickly generating design alternatives.
Autodesk Generative Design Workflow
Define → Generate → Explore → Refine → Validate → Make
Main AWS Services Used
ECS, SWF, Batch
Simple Workflow Service
 It's a state machine
 Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. You can think of Amazon SWF as a fully-managed state tracker and task coordinator in the cloud.
Architecture
(diagram)
API Server (ECS): starts the SWF workflow
Job Manager (ECS): polls for decisions and activities
  Generate Variants: poll for activity -> submit Batch job -> task completed
  Solve Variants: poll for activity -> submit Batch job -> task completed
AWS Batch
 Two managed Batch compute environments
   CPU cluster for generating variants (Instance Type: optimal)
   GPU cluster for variant solvers (Instance Type: p2) – custom AMI
 Two job queues, one per compute environment
AWS Batch - Solver Job Definition
{
  "containerProperties": {
    "command": [ "solver" ],
    "image": "_ECR_IMAGE_TAG_",
    "vcpus": 4,
    "memory": 48000,
    "environment": [ { "name": "SOLVER_TIMEOUT", "value": "120" } … ],
    "mountPoints": [ {
      "containerPath": "/usr/local/nvidia",
      "readOnly": false,
      "sourceVolume": "nvidia"
    } ],
    "privileged": true,
    "ulimits": [ { "hardLimit": 65535, "name": "nofile", "softLimit": 65535 } ],
    "volumes": [ {
      "host": { "sourcePath": "/var/lib/nvidia-docker/volumes/nvidia_driver/latest" },
      "name": "nvidia"
    } ]
  },
  "jobDefinitionName": "prod-solver-job",
  "type": "container"
}
AWS Batch - SubmitJob()
import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/awserr"
    "github.com/aws/aws-sdk-go/service/batch"
)
input := &batch.SubmitJobInput {
    JobDefinition: aws.String( jobDef ),
    JobName:       aws.String( "variantGen" ),
    JobQueue:      aws.String( cfg.VariantGenBatchJobQueue ),
    ContainerOverrides: &batch.ContainerOverrides {
        Command:     cmdParams,
        Environment: environment,
    },
}
b.BatchConn.SubmitJob( input )
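The same submission maps directly onto the AWS CLI's `aws batch submit-job` command. A dry-run sketch that only assembles and prints the equivalent call (the queue name is a placeholder; nothing is sent to AWS here):

```shell
# Dry run: build the CLI equivalent of the Go SubmitJobInput above.
# The queue name is a placeholder; this never contacts AWS.
job_queue="solver-gpu-queue"
job_def="prod-solver-job"
overrides='{"command":["solver"],"environment":[{"name":"SOLVER_TIMEOUT","value":"120"}]}'

cmd="aws batch submit-job \
  --job-name variantGen \
  --job-queue $job_queue \
  --job-definition $job_def \
  --container-overrides '$overrides'"

echo "$cmd"
```

Either path returns a job ID that the SWF activity can track until the task completes.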
Autodesk Generative Design Demo
Autodesk Generative Design Customers
 2,200 Design Studies Created
 28,000 Design Options Computed
Autodesk Generative Design Customers
Autodesk's new Toronto office is the first example of a generatively designed office space.
Autodesk and the Autodesk logo are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders.
Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document.
© 2017 Autodesk. All rights reserved.
Fully Managed · Integrated with AWS · Cost-optimized Resource Provisioning
Important Links
Product Details:
https://aws.amazon.com/batch/details/
Getting Started:
https://aws.amazon.com/batch/getting-started/
Sample code for AWS Batch + Step Functions Integration:
https://github.com/awslabs/aws-batch-genomics
Compute Blog Post Describing How to Use Batch + FPGAs:
https://aws.amazon.com/blogs/compute/accelerating-precision-medicine-at-scale/
Deep Learning on AWS Batch:
https://aws.amazon.com/pt/blogs/compute/deep-learning-on-aws-batch/
Thank you!
CMP209_Getting started with Docker on AWSCMP209_Getting started with Docker on AWS
CMP209_Getting started with Docker on AWS
 
Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算
 
The Best Practices and Hard Lessons Learned of Serverless Applications - AWS ...
The Best Practices and Hard Lessons Learned of Serverless Applications - AWS ...The Best Practices and Hard Lessons Learned of Serverless Applications - AWS ...
The Best Practices and Hard Lessons Learned of Serverless Applications - AWS ...
 
AWS Startup Day - Boston 2018 - The Best Practices and Hard Lessons Learned o...
AWS Startup Day - Boston 2018 - The Best Practices and Hard Lessons Learned o...AWS Startup Day - Boston 2018 - The Best Practices and Hard Lessons Learned o...
AWS Startup Day - Boston 2018 - The Best Practices and Hard Lessons Learned o...
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
CON319_Interstella GTC CICD for Containers on AWS
CON319_Interstella GTC CICD for Containers on AWSCON319_Interstella GTC CICD for Containers on AWS
CON319_Interstella GTC CICD for Containers on AWS
 
Interstella 8888: CICD for Containers on AWS - CON319 - re:Invent 2017
Interstella 8888: CICD for Containers on AWS - CON319 - re:Invent 2017Interstella 8888: CICD for Containers on AWS - CON319 - re:Invent 2017
Interstella 8888: CICD for Containers on AWS - CON319 - re:Invent 2017
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services

1. AWS re:Invent, November 28, 2017. CMP323: AWS Batch: Easy & Efficient Batch Computing on Amazon Web Services. AdRoll: Mikko Juola & Oleg Avdeev; Base2 Genomics: Ryan Layer, Aaron Quinlan, & Brent Pedersen; Autodesk: Dinu Bunduchi; AWS: Jamie Kinney. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

2. Introductions
• Mikko Juola: Sr. Software Engineer, AdRoll
• Oleg Avdeev: Staff Engineer, AdRoll
• Ryan Layer: Co-Founder, Base2 Genomics
• Aaron Quinlan: Co-Founder, Base2 Genomics
• Brent Pedersen: Co-Founder, Base2 Genomics
• Dinu Bunduchi: Cloud Infrastructure Architect, Autodesk
• Jamie Kinney: Principal Product Manager, AWS Batch and HPC

3. Agenda
• Summary of recent AWS Batch launches
• Glimpse into our roadmap
• Real-world examples: AdRoll, Base2 Genomics, Autodesk
• Q&A

4. AWS Batch
• Fully managed: no software to install or servers to manage. AWS Batch provisions, manages, and scales your infrastructure.
• Integrated with AWS: natively integrated with the AWS platform, AWS Batch jobs can easily and securely interact with services such as Amazon S3, Amazon DynamoDB, and Amazon Rekognition.
• Cost-optimized resource provisioning: AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and EC2 Spot.

5. 2017 Launches

6. AWS Batch Regional Expansion
AWS Batch is available in the following regions: us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-2 (Oregon), eu-west-1 (Ireland), eu-west-2 (London), eu-central-1 (Frankfurt), ap-northeast-1 (Tokyo), ap-southeast-2 (Sydney), and ap-southeast-1 (Singapore).
7. Improved Managed Compute Environments
Custom AMIs:
• Auto-mount EFS and other shared filesystems
• Larger/faster/encrypted EBS
• GPU/FPGA drivers
New instance families: G3, F1, P3, C5
Faster scale-up/scale-down and per-second billing
Automated tagging of EC2 Spot instances
8. Manageability & Performance Improvements
• Automated job retries: easily recover from application errors, hardware failures, or EC2 Spot terminations
• Scheduling throughput: run jobs as short as 5 seconds with ~90% scheduling efficiency
• Native AWS CloudFormation support for AWS Batch resources
• Amazon CloudWatch Events for job state transitions
9. HIPAA Eligibility: https://aws.amazon.com/compliance/hipaa-eligible-services-reference/
10. Workflows, Pipelines, and Job Dependencies
Jobs can express a dependency on the successful completion of other jobs, or on specific elements of an array job. You can also use AWS Step Functions or other workflow systems to submit jobs. Flow-based systems submit jobs serially, while DAG-based systems submit many jobs at once, identifying inter-job dependencies.
    $ aws batch submit-job --depends-on 606b3ad1-aa31-48d8-92ec-f154bfc8215f ...
(Diagram: Job A and Job B feed into Job C.)
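The same dependency flow can be driven from code. Below is a minimal sketch (the queue, job definition names, and job IDs are hypothetical placeholders) of building the request dict that would be passed to boto3's `batch.submit_job`:

```python
def submit_job_request(name, queue, definition, depends_on=None, array_size=None):
    """Build the kwargs dict for a Batch SubmitJob call (boto3: client.submit_job(**req))."""
    req = {"jobName": name, "jobQueue": queue, "jobDefinition": definition}
    if depends_on:
        # Each dependency is {"jobId": ...}; a "type" such as N_TO_N can be added for array jobs.
        req["dependsOn"] = [{"jobId": jid} for jid in depends_on]
    if array_size:
        req["arrayProperties"] = {"size": array_size}
    return req

# Job C depends on Job A and Job B, as in the diagram: submit A and B,
# capture each returned jobId, then submit C referencing both.
job_c = submit_job_request("Job-C", "ProdQueue", "job-C-Definition:1",
                           depends_on=["<jobId-A>", "<jobId-B>"])
```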
11. Workflows and Job Dependencies: https://github.com/awslabs/aws-batch-genomics
12. (image-only slide)
13. New! Easily Run Massively Parallel Jobs
Array jobs allow you to run up to 10,000 copies of an application. AWS Batch creates child jobs for each element in the array. Array jobs are an efficient way to run:
• Parametric sweeps
• Monte Carlo simulations
• Processing of a large collection of objects
Also includes:
• Extended dependency model
• Enhancements to job APIs
(Diagram: Job A feeds array job B:0 … B:n, which feeds Job C.)
14. (image-only slide)
15. Example Array Job Submission
    $ aws batch submit-job --job-name BigBatch --job-queue ProdQueue \
        --job-definition monte-carlo:8 --array-properties "size=10000"
    {
        "jobName": "BigBatch",
        "jobId": "350f4655-5d61-40f0-aa0b-03ad787db329"
    }
The parent job (ID 350f4655-5d61-40f0-aa0b-03ad787db329) expands into child jobs with IDs 350f4655-5d61-40f0-aa0b-03ad787db329:0 through 350f4655-5d61-40f0-aa0b-03ad787db329:9999.
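Inside each child job, AWS Batch exposes the child's index through the AWS_BATCH_JOB_ARRAY_INDEX environment variable. A common pattern (the manifest below is a made-up example, not from the slides) is to use that index to pick one work item:

```python
import os

# Each child of an array job receives its index (0 .. size-1) in the
# AWS_BATCH_JOB_ARRAY_INDEX environment variable.
def pick_work_item(items, index=None):
    if index is None:
        index = int(os.environ["AWS_BATCH_JOB_ARRAY_INDEX"])
    return items[index]

# Hypothetical manifest: one input object per child job.
samples = ["s3://bucket/in-0", "s3://bucket/in-1", "s3://bucket/in-2"]
```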
16. Array Job Dependency Models: Job Depends on Array Job
Job-B depends on array job Job-A (children A:0 … A:99).
    $ aws batch submit-job --cli-input-json file://./Job-A.json
    # Job-A.json
    {
        "jobName": "Job-A",
        "jobQueue": "ProdQueue",
        "jobDefinition": "Job-A-Definition:1",
        "arrayProperties": { "size": 100 }
    }
    $ aws batch submit-job --cli-input-json file://./Job-B.json
    # Job-B.json
    {
        "jobName": "Job-B",
        "jobQueue": "ProdQueue",
        "jobDefinition": "Job-B-Definition:1",
        "dependsOn": [ { "jobId": "<job ID for Job A>" } ]
    }
17. Array Job Dependency Models: Array Job Depends on Job
Array job Job-B (children B:0 … B:99) depends on Job-A.
    $ aws batch submit-job --job-name Job-A --job-queue ProdQueue \
        --job-definition job-A-Definition:1
    {
        "jobName": "Job-A",
        "jobId": "7a6225f0-a16e-4241-9103-192c0c68124c"
    }
    # Job-B.json
    {
        "jobName": "Job-B",
        "jobQueue": "ProdQueue",
        "jobDefinition": "Job-B-Definition:1",
        "arrayProperties": { "size": 100 },
        "dependsOn": [ { "jobId": "7a6225f0-a16e-4241-9103-192c0c68124c" } ]
    }
18. Array Job Dependency Models: Two Equally-Sized Array Jobs, a.k.a. "N-to-N"
Each child B:i depends only on the corresponding child A:i.
    $ aws batch submit-job --job-name Job-A --job-queue ProdQueue \
        --job-definition job-A-Definition:1 --array-properties size=10000
    { "jobName": "Job-A", "jobId": "7a6225f0-a16e-4241-9103-192c0c68124c" }
    $ aws batch submit-job --job-name Job-B --job-queue ProdQueue \
        --job-definition job-B-Definition:1 --array-properties size=10000 \
        --depends-on jobId=7a6225f0-a16e-4241-9103-192c0c68124c,type=N_TO_N
    { "jobName": "Job-B", "jobId": "7f2b6bfb-75e8-4655-89a5-1e5b233f5c08" }
19. Array Job Dependency Models: Array Job Depends on Self, a.k.a. Sequential Job
Children A:0 … A:9 run one after another.
    $ aws batch submit-job --job-name Job-A --job-queue ProdQueue \
        --job-definition job-A-Definition:1 --array-properties size=10 \
        --depends-on type=SEQUENTIAL
    { "jobName": "Job-A", "jobId": "7a6225f0-a16e-4241-9103-192c0c68124c" }
20. Another Example
Job-C is dependent on Job-A and Job-B; Job-D has an N_TO_N dependency on Job-C (each child C:0 … C:9999 feeds the corresponding D:0 … D:9999).
21. These Models Can Be Combined
A single pipeline can mix stages with different resource profiles: a setup job (Job-A, heavy network I/O) and Job-B feed array job Job-C (CPU intensive), which feeds N-to-N into array job Job-D (large memory), followed by a cleanup job (Job-E).
22. Roadmap
23. What Can You Expect in 2018?
• Significant improvements to the AWS Batch console
• Automatically submit AWS Batch jobs in response to CloudWatch Events
• CloudTrail auditing of AWS Batch API calls
• Consumable resources
• Additional job types
• Further regional expansion
24. Mikko Juola: Senior Software Engineer, AdRoll; Oleg Avdeev: Staff Engineer, AdRoll
25. Adoption of AWS Batch at AdRoll
AdRoll started using AWS Batch in June 2017. These numbers were collected between June 2017 and November 2017:
• 1.2 million jobs submitted
• 300K instances churned
• 600 CPU years spent
• 2 teams use it in production
26. Why does AdRoll like AWS Batch?
Docker support is first class. Very flexible! The workflow is conceptually simple:
• Put your code in a Docker image
• Submit the Docker image + command-line arguments
Lots of control over the environment your job runs in (e.g. huge instances, custom monitoring software).
Cost-effective (especially with per-second billing and Spot Instances). AdRoll uses Spot Instances exclusively in its AWS Batch applications.
27. How is AWS Batch being used by AdRoll?
On the next few slides, I'll go through some mechanisms we've built on top of AWS Batch. These can give you some ideas for how to structure your own batch job workflow on top of this service.
28. Monitoring Batch Jobs
When you are running thousands of jobs per day, being able to monitor them in bulk becomes important. At AdRoll, we built a monitoring tool tailored to our way of using AWS Batch. The AWS Batch monitoring in the AWS Management Console is not bad, but it can be difficult to sift through tens of thousands of jobs to find something specific, and to do it quickly.
29–31. Monitoring Batch Jobs (screenshots of AdRoll's monitoring tool, with filters for search, status, and job queue)
32. Submitting Jobs
We wrote a Python library to make submitting jobs as simple as possible:
    from pybatch import run_on_awsbatch

    run_on_awsbatch(image='ubuntu:16.04',
                    name='Hello Job',
                    jobqueue='attribution-managed-spot-staging',
                    command_line=['echo', 'hello'],
                    timeout=3600)
33. Timeouts
AWS Batch has no built-in support for timeouts, and occasionally there will be jobs that get stuck. When we submit a job, we set an environment variable BATCH_TIMEOUT to the timestamp at which the job should time out. A periodically running script inspects all active jobs and kills those whose timeout has expired.
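As a rough sketch of the approach described above (AdRoll's actual script is internal, so the names here are illustrative), the periodic killer's core logic reduces to filtering active jobs by their BATCH_TIMEOUT deadline:

```python
import time

# Each active job carries the BATCH_TIMEOUT epoch timestamp that was set
# as an environment variable at submission time.
def expired_jobs(active_jobs, now=None):
    """active_jobs: iterable of (job_id, batch_timeout_epoch) pairs."""
    if now is None:
        now = time.time()
    return [job_id for job_id, deadline in active_jobs if deadline < now]

# The real script would then call batch.terminate_job(jobId=..., reason=...)
# for every ID this returns.
```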
34. Other quirks
• We use a custom AMI that sets up instance store for batch jobs. As an example of why this is useful: some of our jobs are specifically designed for i3 instances and make use of their highly efficient NVMe SSD storage.
• Production and staging pipelines have entirely separate AWS Batch infrastructure.
• Our job queues are organized by instance type. If I submit a job to a certain job queue, I can be certain it will get a specific kind of instance.
35. General tips
• Start simple. You don't have to build anything complex on top of AWS Batch to get started.
• If you decide to scale up your AWS Batch use, think about how you are going to monitor and debug your jobs. The system will have no problem running tens of thousands of containers, but you might have a problem finding specific jobs and their logs after they have run.
• If you can use Spot Instances, use Spot Instances. It can save you lots of $$$!
36. Ryan Layer: Co-Founder, Base2 Genomics
38–45. (image-only slides)
46. S3 + Batch: large files, heterogeneous pipeline, task and data parallelism
47. Ewing Sarcoma
Raw sequence: 1,049 samples, 50 GB each (52.5 TB total). Running tally: 0 GB / 0 CPU h.
48–52. Sequence Alignment: Sample-Parallel (build slides)
Each 50 GB raw-sequence sample is aligned independently, producing a 50 GB alignment per sample; the jobs are submitted with batchit.
53. batchit: github.com/base2genomics/batchit
54. batchit (github.com/base2genomics/batchit)
    # alignment.sh
    bwa mem -t ${cpus} ${reference}.fa ${sample}.fq.gz > ${sample}.sam
55. batchit (github.com/base2genomics/batchit)
    # alignment.sh
    bwa mem -t ${cpus} ${reference}.fa ${sample}.fq.gz > ${sample}.sam

    batchit submit \
        --image $docker_image \
        --role $container_role \
        --queue $queue \
        --envvars "cpus=16" "sample=SS-12345" "reference=GRCh37" \
        --ebs "/mnt/ebs:100:gp2:ext4" \
        --cpus 16 \
        --mem 30000 \
        --jobname align-SS-12345 \
        alignment.sh
56–60. The batchit submit flags, annotated (build slides repeating the command above):
• --image: set the Docker image
• --role: set the IAM role
• --queue: set the Batch queue
• --envvars: fill variables in the script
• --ebs: create, format, and attach an EBS volume inside Docker, with automatic cleanup
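For illustration only (this is not batchit's actual implementation), the --ebs flag packs mountpoint, size, volume type, and filesystem into one colon-separated string, which could be parsed along these lines:

```python
def parse_ebs_spec(spec):
    """Split an --ebs spec such as "/mnt/ebs:100:gp2:ext4" into its four fields."""
    mountpoint, size_gb, volume_type, fstype = spec.split(":")
    return {"mountpoint": mountpoint,
            "size_gb": int(size_gb),
            "volume_type": volume_type,
            "fstype": fstype}
```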
61. batchit (github.com/base2genomics/batchit)
The batchit submit call above translates into a single Batch API request:
    POST /v1/submitjob HTTP/1.1
    Content-type: application/json
    {
        "JobName": "i3-test",
        "JobQueue": "i3-16xl",
        "JobDefinition": "i3-test",
        "Parameters": null,
        "DependsOn": null,
        "RetryStrategy": null,
        "ContainerOverrides": {
            "Environment": [
                { "Name": "B64GZ", "Value": "H4sIAAAAAAAA/wEAAP//AAAAAAAAAAA=" },
                { "Name": "cpus", "Value": "16" },
                { "Name": "sample", "Value": "SS-12345" },
                { "Name": "reference", "Value": "GRCh37" },
                { "Name": "TMPDIR", "Value": "/mnt/ebs" }
            ],
            "Vcpus": 16,
            "Memory": 30000,
            "Command": [
                "/bin/bash", "-c",
                "for i in \"$@\"; do eval \"$i\"; done",
                "export vid=$(batchit ebsmount -n 1 -m /mnt/ebs -s 100 -v gp2 -t ext4)",
                "trap \"set +e; umount /mnt/ebs || umount -l /mnt/ebs; batchit ddv $vid;\" EXIT",
                "export BATCH_SCRIPT=$(mktemp)",
                "echo \"$B64GZ\" | base64 -d | gzip -dc > $BATCH_SCRIPT",
                "chmod +x $BATCH_SCRIPT",
                "$BATCH_SCRIPT"
            ]
        }
    }
  • 71. github.com/base2genomics/batchit

Job definition granting the container the privileged /dev access that batchit ebsmount needs (via Andrey Kislyuk, https://github.com/kislyuk/aegea):

{
  "ContainerProperties": {
    "MountPoints": [
      { "SourceVolume": "vol00", "ContainerPath": "/dev" }
    ],
    "Volumes": [
      { "Host": { "SourcePath": "/dev" }, "Name": "vol00" }
    ],
    "Privileged": true,
    "Ulimits": [
      { "SoftLimit": 40000, "HardLimit": 40000 }
    ]
  }
}
  • 72. github.com/base2genomics/batchit

# alignment.sh
bwa mem -t ${cpus} ${reference}.fa ${sample}.fq.gz > ${sample}.sam

batchit submit \
    --image $docker_image \
    --role $container_role \
    --queue $queue \
    --envvars "cpus=16" "sample=SS-12345" "reference=GRCh37" \
    --ebs "/mnt/ebs:100:gp2:ext4" \
    --cpus 16 --mem 30000 \
    --jobname align-SS-12345 \
    alignment.sh

batchit returns the job ID.

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
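Because batchit submit returns the job ID, a pipeline driver can wire the next stage to it through Batch's native job dependencies. A minimal sketch (the helper name, job ID, and job names here are illustrative, not part of batchit); the resulting dict is shaped for boto3's batch client submit_job(**req):

```python
def dependent_job_request(name, queue, job_definition, parent_job_ids, command):
    """Build a Batch SubmitJob payload that waits on parent jobs.

    Batch holds this job in PENDING until every job listed in
    dependsOn reaches SUCCEEDED; if a parent fails, the child is
    cancelled rather than run on incomplete input.
    """
    return {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": job_definition,
        "dependsOn": [{"jobId": jid} for jid in parent_job_ids],
        "containerOverrides": {"command": command},
    }

# Hypothetical job ID, as printed by `batchit submit` for the alignment job
align_id = "11f815f8-ceb1-4d4e-92d1-442e1f418862"

# Run variant calling only after alignment succeeds
req = dependent_job_request(
    "small-variant-SS-12345", "i3-16xl", "i3-test",
    [align_id], ["small_variant.sh"],
)
```

The same dict works for fan-in: pass all 1,049 per-sample job IDs as parent_job_ids and the joint-calling job starts only once every sample has finished.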
  • 73. Sequence Alignment: Sample-Parallel. Raw sequence 50GB → alignment: 32 vCPUs, 160 CPU hours (m4.16xlarge, r4.8xlarge, c4.8xlarge, r4.16xlarge). 0 GB / 0 CPU h. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 74. i3.16xlarge © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 75. r4.16xlarge © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 76. Sequence Alignment: Sample-Parallel. Raw sequence 50GB → alignment → 50GB Aligned Genome. Alignment: 32 vCPUs, 160 CPU hours (m4.16xlarge, r4.8xlarge, c4.8xlarge, r4.16xlarge). 100 GB / 6.67 CPU d. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 77. Sequence Alignment: Sample-Parallel. Raw sequence 50GB → alignment → 50GB aligned genome. 1,049 ✕ (50GB + 50GB) = 104.9 TB; 1,049 ✕ 160 CPU hours = 19.2 CPU years. Alignment: 32 vCPUs, 160 CPU hours (m4.16xlarge, r4.8xlarge, c4.8xlarge, r4.16xlarge). 104.9 TB / 19.2 CPU y. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 78. Variant Calling: Sample-Parallel. Aligned Genome: 104.9 TB / 19.2 CPU y. Raw sequence 50GB → alignment → 50GB; 1,049 ✕ (50GB + 50GB) = 104.9 TB; 1,049 ✕ 160 CPU hours = 19.2 CPU years. Alignment: 32 vCPUs, 160 CPU hours (m4.16xlarge, r4.8xlarge, c4.8xlarge, r4.16xlarge). © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 79. Variant Calling: Sample-Parallel. Aligned Genome (104.9 TB / 19.2 CPU y): each 50GB genome feeds small_variant and large_variant. One operation needs 32 vCPUs / 128 CPU hours (m4.16xlarge, r4.8xlarge, c4.8xlarge, r4.16xlarge); the other 2 vCPUs / 4-8 CPU hours (m4.large, r4.2xlarge, r4.xlarge, c4.large, c4.xlarge, c4.2xlarge, r4.large, m4.xlarge, m4.2xlarge). © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 80. Variant Calling: Sample-Parallel. Per sample: 50GB aligned genome through small_variant and large_variant yields 16GB and 8GB of variant calls; with the 50GB input, 74GB per sample. Sample Variants: 287.4 TB / 35.2 CPU y. Resources: 32 vCPUs / 128 CPU hours (m4.16xlarge, r4.8xlarge, c4.8xlarge, r4.16xlarge) and 2 vCPUs / 4-8 CPU hours (m4.large, r4.2xlarge, r4.xlarge, c4.large, c4.xlarge, c4.2xlarge, r4.large, m4.xlarge, m4.2xlarge). © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 81. Joint Calling: Region-Parallel. Sample Variants: 287.4 TB / 35.2 CPU y. Per sample: 50GB aligned genome through small_variant and large_variant yields 16GB and 8GB; with the 50GB input, 74GB per sample. Resources: 32 vCPUs / 128 CPU hours (m4.16xlarge, r4.8xlarge, c4.8xlarge, r4.16xlarge) and 2 vCPUs / 4-8 CPU hours (m4.large, r4.2xlarge, r4.xlarge, c4.large, c4.xlarge, c4.2xlarge, r4.large, m4.xlarge, m4.2xlarge). © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 82. Sample Variants Joint Calling: Region-Parallel. 287.4 TB / 35.2 CPU y. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 85. 1,049 genomes. Sample Variants Joint Calling: Region-Parallel. 287.4 TB / 35.2 CPU y. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 87. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. 100MB 100MB 100MB 100MB 100MB 100MB joint_calling Sample Variants Joint Calling: Region-Parallel 287.4 TB / 35.2 CPU y
  • 88. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. 105GB joint_calling Sample Variants Joint Calling: Region-Parallel 287.4 TB / 35.2 CPU y
  • 90. Sample Variants Joint Calling: Region-Parallel. joint_calling: 105GB in, 50GB out (1 of 80 regions). 287.4 TB / 35.2 CPU y. m4.16xlarge: 64 vCPUs, 100 GB EBS, 4 CPU hours. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 91. Sample Variants Joint Calling: Region-Parallel. joint_calling: 105GB in, 50GB out (1 of 80 regions). 80 ✕ (105GB + 50GB) = 12.4 TB; 80 ✕ 4 CPU hours = 13.3 CPU days. Running total: 299.8 TB / 35.2 CPU y. m4.16xlarge: 64 vCPUs, 100 GB EBS, 4 CPU hours. Wall time: 3.1 days. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 92. Academic Software and Exit Codes

$ bioinformatics-software $input > $output
# catastrophic silent ERROR resulting in truncated output
$ echo $?
0

Validate output at each step, create a sentinel file on S3:

$ bioinformatics-software $input > $output
$ verify-output $output && touch $output.sentinel
$ aws s3 cp $output.sentinel s3://base2-sentinel/

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
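The sentinel scheme lends itself to simple restart planning: a step needs to run if its output's sentinel is missing, or if any upstream step must rerun (its input would become stale). A sketch of that decision logic, where step names follow the pipeline above but the function itself is illustrative, not base2mon code:

```python
def steps_to_run(deps, outputs, sentinels):
    """Decide which pipeline steps to (re)run for one sample.

    deps:      step name -> list of upstream step names (the DAG)
    outputs:   step name -> output file that step produces
    sentinels: set of sentinel object names already present on S3

    A step runs if its output lacks a sentinel, or if any upstream
    step must run (regenerated inputs invalidate downstream outputs).
    """
    memo = {}

    def needs(step):
        if step not in memo:
            memo[step] = (
                outputs[step] + ".sentinel" not in sentinels
                or any(needs(up) for up in deps[step])
            )
        return memo[step]

    return [s for s in deps if needs(s)]


deps = {
    "alignment": [],
    "small_variant": ["alignment"],
    "large_variant": ["alignment"],
}
outputs = {
    "alignment": "sample1.bam",
    "small_variant": "sample1.s.vcf",
    "large_variant": "sample1.l.vcf",
}
# alignment and large_variant verified; small_variant's sentinel is missing
sentinels = {"sample1.bam.sentinel", "sample1.l.vcf.sentinel"}
steps_to_run(deps, outputs, sentinels)  # -> ["small_variant"]
```

Because large_variant branches off alignment rather than depending on small_variant, only the missing branch is resubmitted.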
  • 93. base2mon © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 95. base2mon

Input files: sample1.fq, sample2.fq, …, sampleN.fq
Operations: alignment, small_variant, large_variant, joint_calling

Per sample (sample1 … sampleN):
  alignment: sampleN.fq → sampleN.bam + sampleN.bam.sentinel
  small_variant: sampleN.bam → sampleN.s.vcf + sampleN.s.vcf.sentinel
  large_variant: sampleN.bam → sampleN.l.vcf + sampleN.l.vcf.sentinel
Across samples:
  joint_calling: all sample variants → samples1_N.vcf

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 96. base2mon

Input files: sample1.fq, sample2.fq, …, sampleN.fq
Operations: alignment, small_variant, large_variant, joint_calling
Components: Batch, t2.micro, S3

- Data-dependent job planning, recovery, restart
- Modularize and isolate 3rd party software
- Reactive resubmit (e.g., more memory)
- Per-instance time/cost monitoring

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
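"Reactive resubmit" can be as simple as retrying a failed job with a larger memory reservation. A sketch of the override calculation; the doubling policy and the cap value are illustrative assumptions, not base2mon's actual policy:

```python
def retry_memory_mib(last_memory_mib, cap_mib=244000):
    """Memory (MiB) to request on the next attempt after a failure
    that looks like an out-of-memory kill: double the last reservation,
    clamped to the largest instance in the compute environment.

    Returns None once the cap has been reached, signalling the monitor
    to stop retrying and alert instead of looping forever.
    """
    if last_memory_mib >= cap_mib:
        return None
    return min(last_memory_mib * 2, cap_mib)


retry_memory_mib(30000)   # next attempt asks for 60000 MiB
retry_memory_mib(200000)  # clamped to the 244000 MiB cap
retry_memory_mib(244000)  # None: give up and alert
```

The returned value slots into a submit_job containerOverrides, so the retry reuses the same job definition with only the memory raised.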
  • 98. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Dinu Bunduchi: Cloud Infrastructure Architect, Autodesk
  • 99. © 2017 Autodesk Autodesk Generative Design Software mimics Nature's approach to design
  • 100. © 2017 Autodesk Engineering Outcome The ultimate goal for any engineering activity is to strike the right balance between performance and cost to produce for a given design challenge © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 101. © 2017 Autodesk Price Performance Curve Cost to Produce Performance © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 102. © 2017 Autodesk Design Factors Materials CostsProcessRequirements © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 103. © 2017 Autodesk © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 105. © 2017 Autodesk Generative Design Designers input design goals into generative design software, along with parameters such as materials, manufacturing methods, and cost constraints. Then, using cloud computing, the software explores all the possible permutations of a solution, quickly generating design alternatives. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 106. © 2017 Autodesk © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 107. © 2017 Autodesk Autodesk Generative Design Workflow Define Generate Explore Refine Validate Make © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 108. © 2017 Autodesk Main AWS Services Used ECS SWF Batch © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 109. © 2017 Autodesk Simple Workflow Service
- It's a state machine
- Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. You can think of Amazon SWF as a fully-managed state tracker and task coordinator in the Cloud.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 110. © 2017 Autodesk Architecture: API Server (ECS), SWF Workflow, Job Manager (ECS), Generate Variants, Solve Variants. Flow: start workflow → poll for decision → submit Batch job (Generate Variants) → poll for activity → task completed → submit Batch job (Solve Variants) → poll for activity → task completed. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
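The decider side of that loop reduces to a small state machine: given which activities have already completed, pick the next action. A pure-logic sketch of the two-phase workflow; the names are illustrative, and the real Job Manager would derive the completed set from SWF history events returned by PollForDecisionTask:

```python
def next_action(completed):
    """Decide the next workflow step from the set of completed activities.

    Mirrors the architecture above: generate design variants first,
    then solve them, then complete the workflow.
    """
    if "generate_variants" not in completed:
        return "submit_generate_variants_job"
    if "solve_variants" not in completed:
        return "submit_solve_variants_job"
    return "complete_workflow"


next_action(set())                                    # submit the generator Batch job
next_action({"generate_variants"})                    # submit the solver Batch job
next_action({"generate_variants", "solve_variants"})  # close out the workflow
```

Keeping the decision pure makes it trivially testable; only the thin polling/submitting shell around it touches SWF and Batch.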
  • 111. © 2017 Autodesk AWS Batch
- Two managed Batch compute environments
- CPU cluster for generating variants (Instance Type: optimal)
- GPU cluster for variant solvers (Instance Type: p2) with a custom AMI
- Two job queues, one per compute environment
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 112. © 2017 Autodesk AWS Batch - Solver Job Definition

{
  "containerProperties": {
    "command": [ "solver" ],
    "image": "_ECR_IMAGE_TAG_",
    "vcpus": 4,
    "memory": 48000,
    "environment": [
      { "name": "SOLVER_TIMEOUT", "value": "120" }
      …
    ],
    "mountPoints": [
      { "containerPath": "/usr/local/nvidia", "readOnly": false, "sourceVolume": "nvidia" }
    ],
    "privileged": true,
    "ulimits": [
      { "hardLimit": 65535, "name": "nofile", "softLimit": 65535 }
    ],
    "volumes": [
      { "host": { "sourcePath": "/var/lib/nvidia-docker/volumes/nvidia_driver/latest" }, "name": "nvidia" }
    ]
  },
  "jobDefinitionName": "prod-solver-job",
  "type": "container"
}

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 113. © 2017 Autodesk AWS Batch - SubmitJob()

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/awserr"
    "github.com/aws/aws-sdk-go/service/batch"
)

input := &batch.SubmitJobInput{
    JobDefinition: aws.String(jobDef),
    JobName:       aws.String("variantGen"),
    JobQueue:      aws.String(cfg.VariantGenBatchJobQueue),
    ContainerOverrides: &batch.ContainerOverrides{
        Command:     cmdParams,
        Environment: environment,
    },
}
b.BatchConn.SubmitJob(input)

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 114. © 2017 Autodesk Autodesk Generative Design Demo © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 115. © 2017 Autodesk Autodesk Generative Design Customers
- 2,200 Design Studies Created
- 28,000 Design Options Computed
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 116. © 2017 Autodesk Autodesk Generative Design Customers Autodesk’s new Toronto office is the first example of a generatively designed office space. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 117. Autodesk and the Autodesk logo are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document. © 2017 Autodesk. All rights reserved.
  • 118. Fully Managed Integrated with AWS Cost-optimized Resource Provisioning © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 119. Important Links Product Details: https://aws.amazon.com/batch/details/ Getting Started: https://aws.amazon.com/batch/getting-started/ Sample code for AWS Batch + Step Functions Integration: https://github.com/awslabs/aws-batch-genomics Compute Blog Post Describing How to Use Batch + FPGAs: https://aws.amazon.com/blogs/compute/accelerating-precision-medicine-at-scale/ Deep Learning on AWS Batch: https://aws.amazon.com/pt/blogs/compute/deep-learning-on-aws-batch/ © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 120. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 121. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you!