High-Throughput Genomics on AWS - LFS309 - re:Invent 2017
1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
High-Throughput Genomics on AWS
Aaron Friedman
Angel Pizarro
LFS309
November 27, 2017
2.
Agenda
• Presentation - Introduction and AWS Batch Deep Dive (20 minutes)
• Hands-on Lab - Packaging applications as Docker containers and integrating them into AWS Batch to align genome sequences (1 hour)
• Presentation - AWS Lambda and AWS Step Functions (20 minutes)
• Hands-on Lab - Defining an end-to-end genomic data analysis workflow using Step Functions, Lambda, and Batch (40 minutes)
Prerequisites and materials
amzn.to/reinvent17-lfs309
3.
The problem
4.
Genomics data processing
Typical workflow in genomics analysis
5.
Genomics data processing
Typical workflow in genomics analysis
Serial steps
6.
Genomics data processing
Typical workflow in genomics analysis
Parallel steps
7.
Genomics data processing
Typical workflow in genomics analysis
Retry logic
8.
A reference architecture for genomics
Amazon ECR
Amazon S3
Applications
Data
Job Layer
9.
A reference architecture for genomics
Amazon ECR
Amazon S3
AWS Batch
Job Layer, Batch Layer
Job Execution
10.
A reference architecture for genomics
Lambda functions
Amazon ECR
Amazon S3
AWS Batch, AWS Step Functions, AWS Lambda
Job Layer, Batch Layer, Workflow Layer
Orchestration
11.
A reference architecture for genomics
Lambda functions
Amazon ECR
Amazon S3
AWS Batch, AWS Step Functions, AWS Lambda
Job Layer, Batch Layer, Workflow Layer
12.
The job layer: Application packaging using Docker
13.
The reference architecture
Lambda functions
Amazon ECR
Amazon S3
AWS Batch, AWS Step Functions, AWS Lambda
Job Layer, Batch Layer, Workflow Layer
14.
Bioinformatics application stacks
* Image courtesy of The Broad Institute - https://www.broadinstitute.org/gatk/img/BP_workflow.png
16.
Virtual machines vs. containers
Virtual machines
Pros:
• Easy application publishing
• Clean dependency bundling
Cons:
• Large OS images
• Duplication of basic services
• Long start time
Containers
Pros:
• Easy application publishing
• Clean dependency bundling
• Shared dependencies
• Shared OS services
• Small images
Cons:
• Some cross-container networking issues
[Diagram: stacks of Application, Bins/Libs, and OS layers]
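17.
Docker: Dockerfile and the build process
FROM ubuntu:16.04
# Refresh the package index first; a fresh base image ships without package lists
RUN apt-get update && apt-get install -y python-pip python-dev
# PIL itself is not installable from PyPI; Pillow is its maintained fork
RUN pip install Pillow

# A second, separate Dockerfile that starts from the official Python image
FROM python:2.7
RUN pip install numpy pandas

[Diagram: image layers 961f9d3583, c6d01316e4, a408d3cfe23, with tags python27 and ubuntu:precise]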
18.
Docker container sources
Community containers
• https://dockstore.org/
• http://biocontainers.pro/
• http://bioshadock.genouest.org/
Custom developed
• Support for S3 download and checkpointing
• Scratch space management
• Container metadata management
• Full control on the software stack
• Licensing
• Monitoring
• Security and compliance adherence
19.
The batch layer: AWS Batch
20.
The reference architecture
Lambda functions
Amazon ECR
Amazon S3
AWS Batch, AWS Step Functions, AWS Lambda
Job Layer, Batch Layer, Workflow Layer
21.
Introducing AWS Batch
Fully Managed
Task Execution
No software to install or servers to manage. AWS Batch provisions and scales your infrastructure.
Integrated with AWS
AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition.
Cost-Efficient
AWS Batch launches compute resources tailored to your jobs and can provision Amazon EC2 and EC2 Spot instances.
22.
AWS Batch concepts
• Jobs
• Job definitions
• Job queue
• Compute environments
• Scheduler
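As a rough sketch of how these concepts relate, the snippet below builds the request parameters for a job definition and a job queue as plain Python dictionaries. The field names follow the boto3 Batch client calls `register_job_definition` and `create_job_queue`; the names `bwa-align`, `genomics-queue`, `genomics-compute-env`, the image URI, the command, and the resource sizes are made-up placeholders, not values from the lab.

```python
def job_definition_params(name, image, vcpus, memory_mib, command):
    """Build keyword arguments for the Batch RegisterJobDefinition call."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,        # Docker image, for example one pushed to Amazon ECR
            "vcpus": vcpus,
            "memory": memory_mib,  # memory limit in MiB
            "command": command,
        },
    }


def job_queue_params(name, compute_environment, priority=1):
    """Build keyword arguments for the Batch CreateJobQueue call."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": priority,
        # A queue is mapped onto one or more compute environments, tried in order.
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": compute_environment},
        ],
    }


# Hypothetical example values, for illustration only.
definition = job_definition_params(
    name="bwa-align",
    image="ACCOUNT.dkr.ecr.REGION.amazonaws.com/bwa:latest",
    vcpus=4,
    memory_mib=8000,
    command=["bwa", "mem", "reference.fa", "reads.fastq"],
)
queue = job_queue_params("genomics-queue", "genomics-compute-env")

# With AWS credentials configured, these dictionaries could be passed on, e.g.:
#   boto3.client("batch").register_job_definition(**definition)
#   boto3.client("batch").create_job_queue(**queue)
```

Jobs are then submitted against a job definition and a job queue, and the scheduler places them on the queue's compute environments.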
23.
Example: AWS Batch job architecture
[Diagram components]
• Amazon S3 input files
• S3 events trigger a Lambda function that submits the Batch job
• Queue of runnable jobs
• AWS Batch scheduler
• AWS Batch compute environments
• AWS Batch execution
• Job definition: job resource requirements and other parameters
• Application image
• IAM role for Batch job
• AWS Batch job output
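The trigger path in this architecture (an S3 event invokes a Lambda function, which submits a Batch job) might look roughly like the sketch below. The event layout is the standard S3 notification format and `submit_job` is the boto3 Batch call; the queue name `genomics-queue`, the job definition `bwa-align`, and the `INPUT_S3_URI` environment variable are assumptions made for illustration.

```python
import re
from urllib.parse import unquote_plus


def job_name_from_key(key):
    """Derive a Batch-safe job name (letters, digits, '-', '_'; max 128 chars)."""
    return re.sub(r"[^A-Za-z0-9_-]", "-", key)[:128]


def build_submit_args(event, job_queue, job_definition):
    """Translate the first record of an S3 event into submit_job arguments."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # keys arrive URL-encoded
    return {
        "jobName": job_name_from_key(key),
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Hypothetical variable the container would read to find its input.
            "environment": [
                {"name": "INPUT_S3_URI", "value": f"s3://{bucket}/{key}"},
            ],
        },
    }


def lambda_handler(event, context, batch_client=None):
    """Submit one Batch job per S3 event; the client is injectable for testing."""
    if batch_client is None:
        import boto3  # imported lazily so the module loads without boto3
        batch_client = boto3.client("batch")
    args = build_submit_args(event, "genomics-queue", "bwa-align")
    response = batch_client.submit_job(**args)
    return {"jobId": response["jobId"]}
```

Passing the input location through `containerOverrides` keeps the job definition generic, so the same definition can serve every uploaded file.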
24.
A Visual Representation of AWS Batch
25.
Executing Job(s)
Specify Docker run parameters as container overrides
Specify Job Queue
Submit Dependencies
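import boto3

batch_client = boto3.client('batch')

def lambda_handler(event, context):
    # Assumed context: a Lambda handler whose event carries the job parameters
    response = batch_client.submit_job(
        dependsOn=event['dependsOn'],
        containerOverrides=event['containerOverrides'],
        jobDefinition=event['jobDefinition'],
        jobName=event['jobName'],
        jobQueue=event['jobQueue'],
    )
    return {'jobId': response['jobId']}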
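Once a job is submitted, something has to check on it. A minimal sketch of a status-checking function, such as the one a `GetJobStatus` workflow step could call, is shown here. It relies on the boto3 Batch call `describe_jobs`; the handler shape and the returned dictionary are assumptions, not the lab's actual code.

```python
def get_job_status(job_id, batch_client=None):
    """Return the Batch status string (e.g. RUNNING, SUCCEEDED, FAILED) for a job."""
    if batch_client is None:
        import boto3  # imported lazily so the module loads without boto3
        batch_client = boto3.client("batch")
    response = batch_client.describe_jobs(jobs=[job_id])
    jobs = response["jobs"]
    if not jobs:
        raise ValueError(f"No such job: {job_id}")
    return jobs[0]["status"]


def lambda_handler(event, context, batch_client=None):
    """Pass the job ID through and attach its current status."""
    status = get_job_status(event["jobId"], batch_client)
    return {"jobId": event["jobId"], "status": status}
```

Returning the job ID alongside the status lets a workflow call the same function repeatedly until the job reaches a terminal state.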
26.
Considerations for the batch layer for genomics
• Data staging: use Amazon S3 to store reference data, input data, and results
• Multi-tenancy: have processes work in temporary directories
• Storage cost/efficiency: each job cleans up after itself before returning
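One way to follow all three considerations inside a container is a small wrapper that stages data into a private scratch directory and removes it on exit. The sketch below is an illustration under assumptions: `download`, `process`, and `upload` are hypothetical callables supplied by the caller (with boto3 they could wrap the S3 client's `download_file` and `upload_file`).

```python
import os
import tempfile


def run_job(download, process, upload, scratch_root=None):
    """Stage input, run the tool, stage output, then clean up scratch space.

    download(path)        writes the input file to `path` (e.g. from Amazon S3)
    process(src, dst)     runs the analysis, reading `src` and writing `dst`
    upload(path)          stores the result file at `path` (e.g. to Amazon S3)
    """
    # A unique temporary directory per job keeps jobs that share an instance
    # from overwriting each other's files.
    with tempfile.TemporaryDirectory(dir=scratch_root) as workdir:
        input_path = os.path.join(workdir, "input")
        output_path = os.path.join(workdir, "output")
        download(input_path)
        process(input_path, output_path)
        upload(output_path)
        return workdir
    # Leaving the `with` block deletes the directory, even if a step raised.
```

The returned path is only useful for logging, since the directory no longer exists once the function returns.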
27.
Lab 1: Creating the job and batch layers
28.
The workflow layer: AWS Lambda and AWS Step Functions
29.
The reference architecture
Lambda functions
Amazon ECR
Amazon S3
AWS Batch, AWS Step Functions, AWS Lambda
Job Layer, Batch Layer, Workflow Layer
30.
AWS Lambda
31.
Owning servers means dealing with...
Scaling
Availability and fault tolerance
Operations and management
Provisioning and utilization
32.
Serverless compute: AWS Lambda
COMPUTE SERVICE
Run arbitrary code without managing servers
EVENT-DRIVEN
Code only runs when it needs to run
33.
AWS Lambda: Run code in response to events
Lambda functions: Stateless, trigger-based code execution
Triggered by events:
• Direct sync and async API calls
• AWS service integrations
• Third-party triggers
• Many more…
Makes it easy to:
• Perform data-driven auditing, analysis, and notification
• Build back-end services that perform at scale
34.
AWS Step Functions
35.
AWS Step Functions makes it easy to coordinate the components of distributed applications using visual workflows
36.
Application lifecycle in AWS Step Functions
Define in JSON
Visualize in the console
Monitor executions
37.
Seven state types
Task: A single unit of work
Choice: Adds branching logic
Parallel: Forks and joins the data across tasks
Wait: Delays for a specified time
Fail: Stops an execution and marks it as a failure
Succeed: Stops an execution successfully
Pass: Passes its input to its output
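To make the state types concrete, the sketch below assembles a small Amazon States Language definition in Python that uses Task, Wait, Choice, Succeed, and Fail to submit a job and poll until it finishes. The state names other than `SubmitJob` and `GetJobStatus`, the function names in the ARNs, and the `$.status` field are assumptions chosen for illustration.

```python
import json


def poll_loop_definition(submit_arn, status_arn, wait_seconds=30):
    """Build a submit-then-poll state machine as a plain dictionary."""
    return {
        "Comment": "Submit a Batch job and poll until it succeeds or fails",
        "StartAt": "SubmitJob",
        "States": {
            # Task: a single unit of work, here a Lambda function.
            "SubmitJob": {"Type": "Task", "Resource": submit_arn, "Next": "Wait"},
            # Wait: pause before checking the job again.
            "Wait": {"Type": "Wait", "Seconds": wait_seconds, "Next": "GetJobStatus"},
            "GetJobStatus": {"Type": "Task", "Resource": status_arn, "Next": "CheckStatus"},
            # Choice: branch on the status returned by the previous task.
            "CheckStatus": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "Done"},
                    {"Variable": "$.status", "StringEquals": "FAILED", "Next": "JobFailed"},
                ],
                "Default": "Wait",
            },
            "Done": {"Type": "Succeed"},
            "JobFailed": {
                "Type": "Fail",
                "Error": "BatchJobFailed",
                "Cause": "The AWS Batch job reported FAILED",
            },
        },
    }


# Placeholder ARNs; REGION and ACCOUNT are left unfilled on purpose.
definition = poll_loop_definition(
    "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1",
    "arn:aws:lambda:REGION:ACCOUNT:function:batchGetJobStatus",
)
print(json.dumps(definition, indent=2))
```

The printed JSON is the form that Step Functions accepts as a state machine definition.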
38.
Build Visual Workflows Using State Types
[Diagram: workflow built from Task, Choice, Fail, and Parallel states; other labels: Mountains, People, Snow, Amazon Rekognition]
39.
Benefits of incorporating AWS Step Functions
40.
Deployment with AWS Step Functions
41.
A flexible workflow deployment model
• Decouple batch engine and workflow orchestration
• Workflow creation now done as JSON
• Easier to deploy
• Easier to automate
• Easier to test
• Can integrate non-batch applications as well
42.
Change one line to change workflow
{
  ...
  "SubmitJob": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1",
    "Next": "GetJobStatus"
  },
  ...
}

{
  ...
  "SubmitJob": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob2",
    "Next": "GetJobStatus"
  },
  ...
}
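Because the workflow is plain JSON, the same one-line change can also be applied programmatically. The helper below is a hypothetical sketch: `with_resource` is not an AWS API, and the `batchGetJobStatus` function name is a placeholder.

```python
import copy


def with_resource(definition, state_name, new_arn):
    """Return a copy of the definition with one Task state's Resource swapped."""
    updated = copy.deepcopy(definition)  # leave the caller's definition untouched
    updated["States"][state_name]["Resource"] = new_arn
    return updated


original = {
    "StartAt": "SubmitJob",
    "States": {
        "SubmitJob": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1",
            "Next": "GetJobStatus",
        },
        "GetJobStatus": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchGetJobStatus",
            "End": True,
        },
    },
}

changed = with_resource(
    original,
    "SubmitJob",
    "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob2",
)
```

Only the `Resource` of `SubmitJob` differs between `original` and `changed`; every other state is carried over as-is.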
43.
Deployment with AWS Step Functions
44.
A Genomics Workflow
Alignment
Variant Calling
Annotation
QC
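As an illustration, the four steps could be chained into a state machine like the sketch below. It assumes the steps run one after another in the listed order, which the slide text alone does not confirm, and it adds a retry policy to each step. The Lambda function names in the ARNs are placeholders.

```python
import json


def chain_definition(steps, max_attempts=3):
    """Chain (state_name, resource_arn) pairs into a sequential state machine."""
    states = {}
    for index, (name, arn) in enumerate(steps):
        state = {
            "Type": "Task",
            "Resource": arn,
            # Retry any error with exponential backoff before failing the run.
            "Retry": [{
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 30,
                "MaxAttempts": max_attempts,
                "BackoffRate": 2.0,
            }],
        }
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]  # hand over to the following step
        else:
            state["End"] = True                  # last step ends the execution
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


# Placeholder ARN prefix; REGION and ACCOUNT are left unfilled on purpose.
ARN_PREFIX = "arn:aws:lambda:REGION:ACCOUNT:function:"
workflow = chain_definition([
    ("Alignment", ARN_PREFIX + "alignment"),
    ("VariantCalling", ARN_PREFIX + "variantCalling"),
    ("Annotation", ARN_PREFIX + "annotation"),
    ("QC", ARN_PREFIX + "qc"),
])
print(json.dumps(workflow, indent=2))
```

Steps that are independent of each other could instead be grouped under a Parallel state.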
45.
Put it together
$ aws stepfunctions start-execution \
    --state-machine-arn <your-state-machine-arn> \
    --input file://input.states.json
AWS Command Line Interface
AWS Batch console
AWS Step Functions console
S3 object listing
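The same execution can be started from Python rather than the CLI. This is a minimal sketch using the boto3 Step Functions call `start_execution`; the wrapper `start_workflow` and its injectable client argument are assumptions added so the function can be exercised without AWS access.

```python
import json


def start_workflow(state_machine_arn, workflow_input, sfn_client=None):
    """Start one execution and return its ARN.

    `workflow_input` is a dictionary; Step Functions expects it as a JSON string.
    """
    if sfn_client is None:
        import boto3  # imported lazily so the module loads without boto3
        sfn_client = boto3.client("stepfunctions")
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(workflow_input),
    )
    return response["executionArn"]
```

To mirror the CLI call, the contents of `input.states.json` could be loaded with `json.load` and passed as `workflow_input`.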
46.
Lab 2: Creating the workflow layer
47.
Thank you!