© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Angel Pizarro
AWS Research & Technical Computing
April 24, 2018
High Throughput Genomics on AWS
Containers and serverless technology for science
Agenda
• Background and introduction
• Deep dive on application packaging and AWS Batch
• Demo - packaging samtools using Docker and submitting a Job
• Encoding and executing full scientific workflows with AWS Lambda and AWS Step Functions
• Demo - running a full workflow
The problem
Genomics data processing
Typical workflow in genomics analysis
Serial steps
Parallel steps
Retry logic
Problem 1: Application packaging
Need to package an application with its dependencies
Problem 2: Application execution
Need to provide inputs, runtime arguments, and collect output
input
output
Problem 3: Orchestration of execution
Need to define a dependency graph of applications and data
input
output output
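A dependency graph of applications can be sketched as a mapping from each step to the steps it depends on, with an ordering routine that only releases a step once its inputs exist. The following is a minimal illustration, not the method used in the talk; the step names and the `execution_order` helper are hypothetical.

```python
from collections import deque


def execution_order(dependencies):
    """Return steps in an order that respects the dependency graph.

    `dependencies` maps each step name to the list of steps it depends on.
    """
    # Collect every step mentioned, either as a key or as a dependency.
    steps = set(dependencies)
    for deps in dependencies.values():
        steps.update(deps)

    remaining = {step: set(dependencies.get(step, ())) for step in steps}
    ready = deque(sorted(s for s, deps in remaining.items() if not deps))
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        # Release any step whose last outstanding dependency just finished.
        for other in sorted(remaining):
            if step in remaining[other]:
                remaining[other].discard(step)
                if not remaining[other]:
                    ready.append(other)
    if len(order) != len(steps):
        raise ValueError("dependency graph contains a cycle")
    return order


# Hypothetical workflow: step names are illustrative only.
workflow = {
    "align": [],
    "call_variants": ["align"],
    "qc": ["align"],
    "annotate": ["call_variants"],
}
```

Steps that become ready at the same time (here `call_variants` and `qc`) have no ordering constraint between them, which is what makes them candidates for parallel execution.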
A reference architecture for genomics workflows
Application Layer: Amazon ECR (applications) and Amazon S3 (data)
Execution Layer: AWS Batch (job execution)
Orchestration Layer: AWS Step Functions and AWS Lambda (Lambda functions)
The Application Layer
Bioinformatics application stacks
* Image courtesy of The Broad Institute - https://www.broadinstitute.org/gatk/img/BP_workflow.png
Virtualization of whole pipelines
Pros:
• Easy application publishing
• Clean dependency bundling
Cons:
• Large OS images
• Duplication of basic services
• Long start time
Diagram: two virtual machines, each with a full stack of its own (GATK v4.0 / Bins/Libs / OS and GATK v4.0.1 / Bins/Libs / OS)
Packaging applications using Docker containers
Diagram: GATK v4.0 and GATK v4.0.1 containers on top of one set of Bins/Libs and one OS
Pros:
• Easy application publishing
• Clean dependency bundling
• Shared dependencies
• Shared OS services
• Small images
Cons:
• Some cross-container networking issues
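Docker Dockerfile and the build process
# Dockerfile 1: starts from a base OS image
FROM ubuntu:16.04
# Refresh the package index first; a fresh base image has no package lists
RUN apt-get update && apt-get install -y python-pip python-dev
# PIL is published on PyPI under the name Pillow
RUN pip install Pillow
# Dockerfile 2: starts from a language runtime image
FROM python:2.7
RUN pip install numpy pandas
Diagram: image layers stacked on the base images ubuntu:precise and python:2.7 (layer IDs shown: 961f9d3583, c6d01316e4, a408d3cfe23, e3fc50a88d0; layers 961f9d3583 and c6d01316e4 appear in both stacks)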
Docker container source repositories
Community containers
• https://dockstore.org/
• http://biocontainers.pro/
Custom developed
• Control specific version and build features
• Support for S3 download and checkpointing data
• Scratch space management
• Container metadata management
• Full control on the software stack
• Licensing
• Monitoring
• Security and compliance adherence
Considerations for genomics applications on AWS
Data Staging
Use Amazon S3 to store reference and input data, and to store results
Multi-tenancy
Have processes work with temporary directories
Storage cost/efficiency
Each Job cleans up after itself before returning
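The three considerations above can be combined in a small job wrapper: stage inputs into a private temporary directory, run the tool there, upload the results, and always remove the scratch space. This is a sketch under stated assumptions, not the wrapper used in the demo: `run_staged_job` is a hypothetical helper, and `download` and `upload` are caller-supplied callables (for example thin wrappers around an S3 client) rather than part of any AWS API.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_staged_job(command, inputs, outputs, download, upload):
    """Stage inputs, run `command` in a private scratch directory, upload results.

    inputs:  maps a source URI to the local file name the tool expects.
    outputs: maps a local file name produced by the tool to a destination URI.
    download(uri, path) and upload(path, uri) are supplied by the caller.
    """
    # A unique directory per job keeps concurrent jobs on one host apart.
    scratch = Path(tempfile.mkdtemp(prefix="job-"))
    try:
        for uri, name in inputs.items():
            download(uri, scratch / name)
        # check=True raises if the tool exits with a non-zero status.
        subprocess.run(command, cwd=scratch, check=True)
        for name, uri in outputs.items():
            upload(scratch / name, uri)
    finally:
        # Always release scratch space, even if the tool failed.
        shutil.rmtree(scratch, ignore_errors=True)
    # Returned only so callers can confirm the directory is gone.
    return scratch
```

Passing the transfer functions in as arguments keeps the wrapper independent of any particular storage client and makes it easy to exercise locally with stand-ins.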
Demo 1 - Application packaging
using Docker
The Execution Layer
Introducing AWS Batch
Fully Managed
Task Execution
No software to install or servers to manage. AWS Batch provisions and scales your infrastructure
Integrated with AWS
AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition
Cost-Efficient
AWS Batch launches compute resources tailored to your jobs and can provision Amazon EC2 and EC2 Spot instances
AWS Batch Concepts
Compute Environments
• The EC2 resources that do the work
Scheduler
• The resource scheduler: looks for submitted jobs and their dependencies
Job Queue
• The resource to submit jobs to
Job Definition
• Defines the application, the minimal resources (CPUs, RAM), and application arguments
Jobs
• The runtime instance of a Job Definition
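To make the Job Definition concept concrete, the sketch below builds a dictionary shaped like an AWS Batch `RegisterJobDefinition` request. The `build_job_definition` helper, the definition name, and the image URI are hypothetical; the `vcpus` and `memory` fields follow the style used in this talk, and newer versions of the API are understood to prefer `resourceRequirements`, so treat the field choice as an assumption to verify.

```python
def build_job_definition(name, image, vcpus, memory_mib, command):
    """Build keyword arguments shaped like a RegisterJobDefinition request."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,        # container image, e.g. one stored in Amazon ECR
            "vcpus": vcpus,        # minimal CPUs reserved for the job
            "memory": memory_mib,  # minimal RAM, in MiB
            "command": command,    # default application arguments
        },
    }


# Hypothetical values for illustration only.
job_definition = build_job_definition(
    name="samtools_stats",
    image="<account-id>.dkr.ecr.<region>.amazonaws.com/samtools:latest",
    vcpus=1,
    memory_mib=1024,
    command=["samtools", "--version"],
)
```

With credentials configured, the dictionary could be passed along as `boto3.client("batch").register_job_definition(**job_definition)`; each Job submitted against it is then a runtime instance of this definition.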
Example AWS Batch Job Architecture
Diagram components:
• Amazon S3: input files
• Events trigger a Lambda function, which submits the Batch job
• Job Definition: application image, job resource requirements, and other parameters
• Queue of runnable jobs
• AWS Batch scheduler
• AWS Batch compute environments (AWS Batch execution)
• IAM role for the Batch job
• AWS Batch job output
A Visual Representation of AWS Batch
Executing Job(s)
Specify Docker run parameters as container overrides
Specify Job Queue
Submit Dependencies
aws batch submit-job --job-name testsamtools_stats \
    --job-queue ${JOB_QUEUE} \
    --job-definition ${JOB_DEFINITION} \
    --container-overrides vcpus=4,memory=6
# The response (printed to STDOUT) should resemble the following
{ "jobName": "testsamtools_stats", "jobId": "f92b20d3-cdcd-4b92-aa0c-6bfd98a65ac6" }
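The same submission, including the "Submit Dependencies" option, can be sketched in Python by building the request that the AWS Batch `SubmitJob` operation expects. `build_submit_job_request` is a hypothetical helper, and the queue and definition names are placeholders; `memory` in the overrides is expressed in MiB.

```python
def build_submit_job_request(name, queue, definition,
                             vcpus=None, memory_mib=None, depends_on=()):
    """Build keyword arguments shaped like an AWS Batch SubmitJob request."""
    request = {"jobName": name, "jobQueue": queue, "jobDefinition": definition}

    # Container overrides replace values from the Job Definition for this run only.
    overrides = {}
    if vcpus is not None:
        overrides["vcpus"] = vcpus
    if memory_mib is not None:
        overrides["memory"] = memory_mib
    if overrides:
        request["containerOverrides"] = overrides

    # The job stays pending until every listed job has completed.
    if depends_on:
        request["dependsOn"] = [{"jobId": job_id} for job_id in depends_on]
    return request


# Hypothetical follow-on job that waits for the job ID returned above.
request = build_submit_job_request(
    name="testsamtools_report",
    queue="<job-queue>",
    definition="<job-definition>",
    vcpus=4,
    memory_mib=4096,
    depends_on=["f92b20d3-cdcd-4b92-aa0c-6bfd98a65ac6"],
)
```

With credentials configured, the request could be sent as `boto3.client("batch").submit_job(**request)`, and the returned `jobId` fed into the `depends_on` list of the next job.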
Demo 2 - Executing samtools stats
with AWS Batch
The Orchestration Layer
Orchestration of workflows
Initiate Actions and Transitions
Workflow orchestration using
Serverless technologies
Owning servers means dealing with ...
Scaling
Availability and fault tolerance
Operations and management
Provisioning and utilization
Serverless means…
No servers to provision or manage
Scales with usage
Never pay for idle
Availability and fault tolerance built in
AWS Lambda provides Functions as a Service
EVENT SOURCE: changes in data state, requests to endpoints, new data available
FUNCTION: Node.js, Python, Java, C#, Go
SERVICES (ANYTHING)
Orchestration of workflows
Initiate Actions and Transitions
input
output
AWS Lambda
Using AWS Lambda
Bring your own code
• Node.js, Java, Python, C#, Go
• Bring your own libraries (even native ones)
Simple resource model
• Select power rating from 128 MB to 3 GB
• CPU and network allocated proportionately
Flexible use
• Synchronous or asynchronous
• Integrated with other AWS services
Flexible authorization
• Securely grant access to resources and VPCs
• Fine-grained control for invoking your functions
Anatomy of a Lambda function
Handler() function: function to be executed upon invocation
Event object: data sent during Lambda function invocation
Context object: methods available to interact with runtime information (request ID, log group, etc.)
import boto3

batch = boto3.client("batch")

def handle_request(job, context):
    # Assumes the slide's submitBatchJob helper wraps the boto3 Batch submit_job call
    batch_job_response = batch.submit_job(jobName=job["name"],
                                          jobQueue=job["queue"],
                                          jobDefinition=job["definition"])
    return batch_job_response
Keep orchestration out of code.
Sequence Choice Parallel
AWS Step Functions
“Serverless” workflow management with zero administration:
• Makes it easy to coordinate the components of distributed applications and microservices using visual workflows
• Automatically triggers and tracks each step, and retries when there are errors, so your application executes in order and as expected
• Logs the state of each step, so when things do go wrong, you can diagnose and debug problems quickly
AWS Step Functions
Orchestration of workflows
Initiate Actions and Transitions
Seven State Types
Task: A single unit of work
Choice: Adds branching logic
Parallel: Fork and join the data across tasks
Wait: Delay for a specified time
Fail: Stops an execution and marks it as a failure
Succeed: Stops an execution successfully
Pass: Passes its input to its output
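Several of these state types can be combined into a small state machine that submits a Batch job and polls until it finishes. The sketch below writes an Amazon States Language definition as a Python dictionary; it is one plausible pattern rather than the state machine shown in the demo. The two Lambda ARNs and all state names are hypothetical placeholders, and the status function is assumed to return the job status as a plain string.

```python
import json

# Placeholder ARNs for two hypothetical Lambda functions.
SUBMIT_ARN = "arn:aws:lambda:<region>:<account-id>:function:submit-batch-job"
STATUS_ARN = "arn:aws:lambda:<region>:<account-id>:function:get-job-status"

state_machine = {
    "Comment": "Submit a Batch job and poll until it finishes",
    "StartAt": "SubmitJob",
    "States": {
        # Task: invoke the Lambda function that submits the job, retrying on error.
        "SubmitJob": {
            "Type": "Task",
            "Resource": SUBMIT_ARN,
            "ResultPath": "$.job",
            "Retry": [{
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 5,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Next": "WaitForJob",
        },
        # Wait: pause before asking for the job status again.
        "WaitForJob": {"Type": "Wait", "Seconds": 30, "Next": "GetJobStatus"},
        "GetJobStatus": {
            "Type": "Task",
            "Resource": STATUS_ARN,
            "InputPath": "$.job",
            "ResultPath": "$.status",
            "Next": "IsJobDone",
        },
        # Choice: branch on the reported status, looping back while still running.
        "IsJobDone": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "Done"},
                {"Variable": "$.status", "StringEquals": "FAILED", "Next": "JobFailed"},
            ],
            "Default": "WaitForJob",
        },
        "Done": {"Type": "Succeed"},
        "JobFailed": {
            "Type": "Fail",
            "Error": "BatchJobFailed",
            "Cause": "The AWS Batch job reported FAILED",
        },
    },
}

# Serialized form, suitable for use as a state machine definition.
definition_json = json.dumps(state_machine, indent=2)
```

Keeping the retry policy and the polling loop in the state machine, rather than inside the Lambda functions, is one way to keep orchestration out of code.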
Deployment with Step Functions and Lambda
A Genomics Workflow
Alignment
Variant Calling
Annotation
QC
Putting it all together
$ aws stepfunctions start-execution \
    --state-machine-arn <your-state-machine-arn> \
    --input file://input.states.json
AWS Command Line Interface
AWS Batch console
Step Functions console
S3 object listing
Demo 3 - Implementing a full
workflow using Lambda and Step
Functions
Alternatives!
AWS Partner Network
Open Source Workflow Orchestration
Cromwell
BioIT World workshop May 15th
Thank you!
We will send a follow up email with more information
on how to get started using AWS Batch for genomics.

 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 

Genomics on aws-webinar-april2018

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Angel Pizarro AWS Research & Technical Computing April 24, 2018 High Throughput Genomics on AWS Containers and serverless technology for science
  • 2. Agenda • Background and introduction • Deep dive on application packaging and AWS Batch • Demo - packaging samtools using Docker and submitting a job • Encoding and executing full scientific workflows with AWS Lambda and AWS Step Functions • Demo - running a full workflow
  • 3. The problem
  • 4. Genomics data processing. Typical workflow in genomics analysis
  • 5. Genomics data processing. Typical workflow in genomics analysis: serial steps
  • 6. Genomics data processing. Typical workflow in genomics analysis: parallel steps
  • 7. Genomics data processing. Typical workflow in genomics analysis: retry logic
  • 8. Genomics data processing. Typical workflow in genomics analysis
  • 9. Problem 1: Application packaging. Need to package an application with its dependencies
  • 10. Problem 2: Application execution. Need to provide inputs and runtime arguments, and collect output. (Diagram labels: input, output)
  • 11. Problem 3: Orchestration of execution. Need to define a dependency graph of applications and data. (Diagram labels: input, output, output)
  • 12. A reference architecture for genomics workflows. Application Layer: Amazon ECR (applications), Amazon S3 (data)
  • 13. A reference architecture for genomics workflows. Application Layer: Amazon ECR, Amazon S3. Execution Layer: AWS Batch (job execution)
  • 14. A reference architecture for genomics workflows. Application Layer: Amazon ECR, Amazon S3. Execution Layer: AWS Batch. Orchestration Layer: AWS Step Functions, AWS Lambda (Lambda functions; orchestration)
  • 15. A reference architecture for genomics workflows. Application Layer: Amazon ECR, Amazon S3. Execution Layer: AWS Batch. Orchestration Layer: AWS Step Functions, AWS Lambda (Lambda functions)
  • 16. The Application Layer. Application Layer: Amazon ECR, Amazon S3. Execution Layer: AWS Batch. Orchestration Layer: AWS Step Functions, AWS Lambda (Lambda functions)
  • 17. Bioinformatics application stacks * Image courtesy of The Broad Institute - https://www.broadinstitute.org/gatk/img/BP_workflow.png
  • 18. Virtualization of whole pipelines. Pros: • Easy application publishing • Clean dependency bundling. Cons: • Large OS images • Duplication of basic services • Long start time. (Diagram: two virtual machines, each with its own full stack: GATK v4.0 / Bins/Libs / OS and GATK v4.0.1 / Bins/Libs / OS)
  • 19. Bioinformatics application stacks * Image courtesy of The Broad Institute - https://www.broadinstitute.org/gatk/img/BP_workflow.png
  • 20. Packaging applications using Docker containers. Pros: • Easy application publishing • Clean dependency bundling • Shared dependencies • Shared OS services • Small images. Cons: • Some cross-container networking issues. (Diagram: GATK v4.0 and GATK v4.0.1 on top of a shared Bins/Libs and OS stack)
  • 21. Docker: Dockerfile and the build process
      Dockerfile 1:
        FROM ubuntu:16.04
        # refresh the package index first; the base image ships without one
        RUN apt-get update && apt-get install -y python-pip python-dev
        # Pillow is the pip-installable distribution of PIL
        RUN pip install Pillow
      Dockerfile 2:
        FROM python:2.7
        RUN pip install numpy pandas
      (Diagram labels: image layers 961f9d3583, c6d01316e4, a408d3cfe23, e3fc50a88d0; base images python:2.7 and ubuntu:precise)
  • 22. Docker container source repositories: community containers and custom developed. • Control specific version and build features • Support for S3 download and checkpointing data • Scratch space management • Container metadata management • Full control of the software stack • Licensing • Monitoring • Security and compliance adherence. Community container sources: https://dockstore.org/ http://biocontainers.pro/
  • 23. Considerations for genomics applications on AWS. Data staging: use Amazon S3 to store reference and input data and to store results. Multi-tenancy: have processes work with temporary directories. Storage cost/efficiency: each job cleans up after itself before returning.
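The three considerations on this slide (stage data through Amazon S3, work in a temporary directory, clean up before returning) can be sketched as a small job wrapper. This is a minimal illustration, not the code used in the demo: the function name `run_staged_job` and the `build_command` callback are hypothetical, and `s3` is assumed to be a boto3 S3 client or any object with compatible `download_file` and `upload_file` methods.

```python
import subprocess
import tempfile
from pathlib import Path


def run_staged_job(s3, bucket, input_key, output_key, build_command):
    """Stage input from S3, run a tool in a private scratch dir, upload its stdout.

    s3            -- assumed boto3 S3 client (download_file / upload_file)
    build_command -- hypothetical callback: input path -> argv list
    """
    # A per-job temporary directory keeps concurrent jobs on one host apart.
    with tempfile.TemporaryDirectory(prefix="job-") as scratch:
        in_path = Path(scratch) / "input"
        out_path = Path(scratch) / "output"

        # Data staging: pull the input object from S3 into scratch space.
        s3.download_file(bucket, input_key, str(in_path))

        # Run the tool and capture its standard output as the result file.
        with open(out_path, "wb") as out:
            subprocess.run(build_command(in_path), stdout=out, check=True)

        # Store the result back in S3.
        s3.upload_file(str(out_path), bucket, output_key)
    # Leaving the with-block deletes the scratch directory, even on failure.
    return output_key
```

With a real client this might be called as `run_staged_job(boto3.client("s3"), "my-bucket", "in/sample.bam", "out/sample.stats", lambda p: ["samtools", "stats", str(p)])`, where the bucket and keys are placeholders.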
  • 24. Demo 1 - Application packaging using Docker
  • 26. The Execution Layer. Application Layer: Amazon ECR, Amazon S3. Execution Layer: AWS Batch. Orchestration Layer: AWS Step Functions, AWS Lambda (Lambda functions)
  • 27. Introducing AWS Batch. Fully managed task execution: no software to install or servers to manage; AWS Batch provisions and scales your infrastructure. Integrated with AWS: AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition. Cost-efficient: AWS Batch launches compute resources tailored to your jobs and can provision Amazon EC2 and EC2 Spot instances.
  • 28. AWS Batch concepts. Compute Environments: • The EC2 resources that do the work. Scheduler: • The resource scheduler; looks for submitted jobs and their dependencies. Job Queue: • The resource to submit jobs to. Job Definition: • Defines the application, the minimal resources (CPUs, RAM), and application arguments. Jobs: • The runtime instance of a Job Definition.
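To make the Job Definition concept concrete, here is a hedged sketch of the request body one might pass to the boto3 call `batch.register_job_definition(**definition)`. The definition name, the `input_path` parameter, and the default resource values are assumptions for illustration. The `vcpus` and `memory` fields follow the API shape of the period of this talk; newer API versions prefer `resourceRequirements`.

```python
def samtools_job_definition(image_uri, vcpus=1, memory_mib=1024):
    """Build keyword arguments for AWS Batch register_job_definition.

    image_uri -- container image, e.g. an Amazon ECR repository URI
    The job name and the "input_path" parameter are hypothetical.
    """
    return {
        "jobDefinitionName": "samtools_stats",
        "type": "container",
        # Default parameter values; a job submission can override them.
        "parameters": {"input_path": "s3://example-bucket/sample.bam"},
        "containerProperties": {
            "image": image_uri,
            "vcpus": vcpus,        # minimal number of vCPUs reserved
            "memory": memory_mib,  # minimal memory reserved, in MiB
            # "Ref::input_path" is replaced by the parameter value at run time.
            "command": ["samtools", "stats", "Ref::input_path"],
        },
    }
```

Each Job submitted against this definition is then a runtime instance of it, optionally with different parameters or resource overrides.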
  • 29. Example AWS Batch job architecture. (Diagram labels: IAM Role for Batch Job; Amazon S3 Input Files; Queue of Runnable Jobs; Events Trigger; Lambda Function Submits Batch Job; AWS Batch Compute Environments; AWS Batch Job Output; Job Definition; Job Resource Requirements and other parameters; AWS Batch Execution; Application Image; AWS Batch Scheduler)
  • 30. A visual representation of AWS Batch
  • 31. Executing job(s). Specify Docker run parameters as container overrides; specify the job queue; submit dependencies.
      aws batch submit-job --job-name testsamtools_stats --job-queue ${JOB_QUEUE} --job-definition ${JOB_DEFINITION} --container-overrides vcpus=4,memory=6
      # STDOUT should resemble the following
      { "jobName": "testsamtools_stats", "jobId": "f92b20d3-cdcd-4b92-aa0c-6bfd98a65ac6" }
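The "submit dependencies" item can be illustrated with the `dependsOn` field of the Batch `submit_job` API. The sketch below chains jobs so that each one waits for the previous one; the function name `submit_chain` is hypothetical, and `batch` is assumed to be a boto3 Batch client (`boto3.client("batch")`) or an object with a compatible `submit_job` method.

```python
def submit_chain(batch, job_queue, steps):
    """Submit jobs in order, each depending on the one before it.

    batch -- assumed boto3 Batch client
    steps -- list of (job_name, job_definition) pairs
    Returns the list of job IDs in submission order.
    """
    job_ids = []
    for job_name, job_definition in steps:
        request = {
            "jobName": job_name,
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
        }
        if job_ids:
            # The scheduler holds this job until the previous job succeeds.
            request["dependsOn"] = [{"jobId": job_ids[-1]}]
        response = batch.submit_job(**request)
        job_ids.append(response["jobId"])
    return job_ids
```

Serial chains like this cover simple pipelines; branching, retries, and waiting on external conditions are where a separate orchestration layer becomes useful.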
  • 32. Demo 2 - Executing samtools stats with AWS Batch
  • 34. The Orchestration Layer. Application Layer: Amazon ECR, Amazon S3. Execution Layer: AWS Batch. Orchestration Layer: AWS Step Functions, AWS Lambda (Lambda functions)
  • 35. Orchestration of workflows. Initiate Actions and Transitions
  • 36. Orchestration of workflows. Initiate Actions and Transitions. (Diagram labels: input, output)
  • 37. Orchestration of workflows. Initiate Actions and Transitions
  • 38. Workflow orchestration using serverless technologies
  • 39. Owning servers means dealing with ... • Scaling • Availability and fault tolerance • Operations and management • Provisioning and utilization
  • 40. Serverless means… • No servers to provision or manage • Scales with usage • Never pay for idle • Availability and fault tolerance built in
  • 41. AWS Lambda provides Functions as a Service. Event source: changes in data state, requests to endpoints, new data available. Function: Node.js, Python, Java, C#, Go. Services: anything.
  • 42. Orchestration of workflows. Initiate Actions and Transitions. (Diagram labels: input, output, AWS Lambda)
  • 43. Using AWS Lambda. Bring your own code: • Node.js, Java, Python, C#, Go • Bring your own libraries (even native ones). Simple resource model: • Select power rating from 128 MB to 3 GB • CPU and network allocated proportionately. Flexible use: • Synchronous or asynchronous • Integrated with other AWS services. Flexible authorization: • Securely grant access to resources and VPCs • Fine-grained control for invoking your functions.
  • 44. Anatomy of a Lambda function. Handler() function: the function to be executed upon invocation. Event object: data sent during Lambda function invocation. Context object: methods available to interact with runtime information (request ID, log group, etc.).
      # submitBatchJob is a helper that is not shown on the slide
      def handle_request(job, context):
          batch_job_response = submitBatchJob(job["name"], job["queue"], job["definition"])
          return batch_job_response
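The slide's handler calls a helper that is not shown. One possible way to fill it in, as a sketch under stated assumptions rather than the presenter's code: `submit_batch_job` is a hypothetical stand-in for `submitBatchJob`, the optional `client` argument exists only so the function can be exercised without AWS credentials, and boto3 is imported lazily for the same reason.

```python
def get_batch_client():
    """Create a Batch client; boto3 is bundled with the Lambda Python runtime."""
    import boto3  # lazy import so this module loads where boto3 is absent
    return boto3.client("batch")


def submit_batch_job(name, queue, definition, client=None):
    """Hypothetical stand-in for the slide's submitBatchJob helper."""
    client = client or get_batch_client()
    response = client.submit_job(
        jobName=name, jobQueue=queue, jobDefinition=definition
    )
    # Keep only the JSON-serializable fields a later workflow step needs.
    return {"jobName": response["jobName"], "jobId": response["jobId"]}


def handle_request(job, context, client=None):
    """Lambda handler: `job` is the event object, `context` the runtime info."""
    # aws_request_id is provided by the Lambda runtime on the context object.
    request_id = getattr(context, "aws_request_id", "unknown")
    print("request %s: submitting %s" % (request_id, job["name"]))
    return submit_batch_job(job["name"], job["queue"], job["definition"], client)
```

Lambda itself invokes the handler with two arguments, so `client` falls back to a real Batch client in production.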
  • 45. Keep orchestration out of code. (Diagram labels: Sequence, Choice, Parallel)
  • 46. AWS Step Functions. “Serverless” workflow management with zero administration: • Makes it easy to coordinate the components of distributed applications and microservices using visual workflows • Automatically triggers and tracks each step, and retries when there are errors, so your application executes in order and as expected • Logs the state of each step, so when things do go wrong, you can diagnose and debug problems quickly
  • 47. AWS Step Functions. Orchestration of workflows. Initiate Actions and Transitions
  • 48. Seven state types. Task: a single unit of work. Choice: adds branching logic. Parallel: fork and join the data across tasks. Wait: delay for a specified time. Fail: stops an execution and marks it as a failure. Succeed: stops an execution successfully. Pass: passes its input to its output.
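Several of these state types combine naturally into a submit-then-poll loop around a Batch job. The sketch below builds such a state machine definition in Amazon States Language as a Python dictionary. The state names, the two Lambda function ARNs, and the assumption that the polling function returns a status string such as `SUCCEEDED` or `FAILED` are all illustrative, not taken from the talk.

```python
import json


def build_state_machine(submit_arn, poll_arn, wait_seconds=30):
    """Return an Amazon States Language definition for one Batch job.

    submit_arn / poll_arn -- ARNs of two hypothetical Lambda functions:
    one submits the job, the other returns its current status string.
    """
    return {
        "Comment": "Submit a Batch job and poll until it finishes",
        "StartAt": "SubmitJob",
        "States": {
            "SubmitJob": {
                "Type": "Task",
                "Resource": submit_arn,
                "ResultPath": "$.job",
                # Retry transient errors with exponential backoff.
                "Retry": [{
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 5,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }],
                "Next": "WaitForJob",
            },
            "WaitForJob": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "PollJob",
            },
            "PollJob": {
                "Type": "Task",
                "Resource": poll_arn,
                "InputPath": "$.job",
                "ResultPath": "$.status",
                "Next": "IsJobDone",
            },
            "IsJobDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "SUCCEEDED",
                     "Next": "JobSucceeded"},
                    {"Variable": "$.status", "StringEquals": "FAILED",
                     "Next": "JobFailed"},
                ],
                # Any other status means the job is still in progress.
                "Default": "WaitForJob",
            },
            "JobSucceeded": {"Type": "Succeed"},
            "JobFailed": {
                "Type": "Fail",
                "Error": "BatchJobFailed",
                "Cause": "The AWS Batch job reported FAILED",
            },
        },
    }


if __name__ == "__main__":
    arn = "arn:aws:lambda:us-east-1:123456789012:function:"  # placeholder
    print(json.dumps(build_state_machine(arn + "submit", arn + "poll"), indent=2))
```

Keeping the loop in the state machine rather than inside a Lambda function is what the "keep orchestration out of code" slide argues for: the functions stay small and the control flow is visible in the Step Functions console.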
  • 49. Deployment with Step Functions and Lambda
  • 50. A genomics workflow. (Diagram labels: Alignment, Variant Calling, Annotation, QC)
  • 51. Putting it all together.
      $ aws stepfunctions start-execution --state-machine-arn <your-state-machine-arn> --input file://input.states.json
      (Screenshots: AWS Command Line Interface, AWS Batch console, Step Functions console, S3 object listing)
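The same execution can be started from Python. This is a small sketch assuming `sfn` is a boto3 Step Functions client (`boto3.client("stepfunctions")`) or a compatible object; the function name `start_workflow` and the shape of the input document are hypothetical, since the contents of `input.states.json` are not shown in the slides.

```python
import json


def start_workflow(sfn, state_machine_arn, workflow_input, name=None):
    """Start one execution of a state machine and return its ARN.

    sfn            -- assumed boto3 Step Functions client
    workflow_input -- JSON-serializable dict passed to the first state
    name           -- optional execution name (must be unique per state machine)
    """
    request = {
        "stateMachineArn": state_machine_arn,
        # The API expects the input as a JSON string, not a dict.
        "input": json.dumps(workflow_input),
    }
    if name is not None:
        request["name"] = name
    response = sfn.start_execution(**request)
    return response["executionArn"]
```

Progress can then be followed in the Step Functions and AWS Batch consoles, with results appearing in the S3 object listing.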
  • 52. Demo 3 - Implementing a full workflow using Lambda and Step Functions
  • 54. Alternatives!
  • 55. AWS Partner Network
  • 56. Open source workflow orchestration: Cromwell
  • 57. BioIT World workshop, May 15th
  • 58. Thank you! We will send a follow-up email with more information on how to get started using AWS Batch for genomics.