Phosphorus
Big Data Genomics NYC
#GenomicsNYC
• We meet quarterly
• We are passionate about Big-Data technologies, the
human genome and personalized medicine
• We have 775 genomies (members) #GenomicsNYC
Big Data Genomics NYC
Past MeetUps include:
• Dipping into Guacamole – a spark-powered Somatic
Variant Caller
• Next Generation Tools and Strategies for Genomic
Analysis
• Leverage ADAM and Spark for Genomic Analysis
Building Genomics Pipelines in the Cloud:
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Workflows
with Angel Pizarro
What we will cover
• Some context
• Service Overview – AWS Batch
• Service Overview – AWS Step Functions
• Architecture Deep Dive
We see similar data analysis patterns
Life Sciences
Financial Services
The Architecture
AWS Batch
Introducing AWS Batch
• Fully Managed: No software to install or servers to manage. AWS Batch provisions and scales your infrastructure.
• Integrated with AWS: AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition.
• Cost-Efficient: AWS Batch launches compute resources tailored to your jobs and can provision Amazon EC2 and EC2 Spot instances.
AWS Batch Concepts
• Jobs
• Job Definitions
• Job Queue
• Compute Environments
• Scheduler
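Example AWS Batch Job Architecture (diagram): input files arrive in Amazon S3, and S3 events trigger a Lambda function that submits an AWS Batch job. The job enters a queue of runnable jobs, AWS Batch runs it in a compute environment under the IAM role for the Batch job, and the job writes its output.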
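The S3-triggered flow in the diagram can be sketched as a Lambda handler. This is a minimal sketch, assuming an S3 "object created" notification is wired to the function; the queue `genomics` and definition `gatk:12` are borrowed from the CLI examples in this deck, while `jobs_from_s3_event`, the `INPUT_URI` variable, and the job-naming rule are hypothetical choices for illustration.

```python
"""Sketch: S3 event notification -> AWS Batch job submission (hypothetical names)."""
from urllib.parse import unquote_plus

# Assumed resource names; substitute your own queue and job definition.
JOB_QUEUE = "genomics"
JOB_DEFINITION = "gatk:12"


def jobs_from_s3_event(event):
    """Build one submit_job request (as a kwargs dict) per S3 record."""
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Batch job names allow ASCII letters, digits, hyphens and underscores.
        safe = "".join(c if (c.isascii() and c.isalnum()) or c in "-_" else "-"
                       for c in key)
        requests.append({
            "jobName": ("process-" + safe)[:128],
            "jobQueue": JOB_QUEUE,
            "jobDefinition": JOB_DEFINITION,
            "containerOverrides": {
                # Hypothetical variable the container reads to find its input.
                "environment": [{"name": "INPUT_URI",
                                 "value": f"s3://{bucket}/{key}"}],
            },
        })
    return requests


def handler(event, context, batch_client=None):
    """Lambda entry point; a client can be injected for local testing."""
    if batch_client is None:
        import boto3  # bundled with the AWS Lambda Python runtimes
        batch_client = boto3.client("batch")
    return [batch_client.submit_job(**req)["jobId"]
            for req in jobs_from_s3_event(event)]
```

Keeping request-building separate from the API call makes the event parsing testable without AWS credentials.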
(Diagram: a job definition supplies the application image plus job resource requirements and other parameters; the AWS Batch scheduler uses it for AWS Batch execution.)
Job Definitions
Similar to ECS Task Definitions, AWS Batch Job Definitions specify how
jobs are to be run. While each job must reference a job definition, many
parameters can be overridden.
Some of the attributes specified in a job definition:
• IAM role associated with the job
• vCPU and memory requirements
• Mount points
• Container properties
• Environment variables
$ aws batch register-job-definition --job-definition-name gatk \
    --type container --container-properties ...
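The same registration can be expressed with boto3's `register_job_definition`. A sketch, assuming you supply the container image and IAM role ARN; the `/scratch` host path, the `JOB_SCRATCH` variable, and the default sizes are illustrative, not taken from the deck.

```python
def gatk_job_definition(image, job_role_arn, vcpus=4, memory_mib=8192):
    """Return kwargs for batch_client.register_job_definition(**kwargs).

    image and job_role_arn are supplied by you; the scratch paths, the
    JOB_SCRATCH variable and the default sizes are illustrative only.
    """
    return {
        "jobDefinitionName": "gatk",
        "type": "container",  # required by RegisterJobDefinition
        "containerProperties": {
            "image": image,
            "jobRoleArn": job_role_arn,  # IAM role the job's containers assume
            # vCPU and memory (MiB) requirements, given as strings.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
            # A host directory exposed to the container as a mount point.
            "volumes": [{"name": "scratch", "host": {"sourcePath": "/scratch"}}],
            "mountPoints": [{"sourceVolume": "scratch",
                             "containerPath": "/scratch",
                             "readOnly": False}],
            "environment": [{"name": "JOB_SCRATCH", "value": "/scratch"}],
        },
    }
```

With credentials configured, the dict would be passed as `boto3.client("batch").register_job_definition(**kwargs)`. Each registration creates a new revision, which is what the `name:revision` form (for example `gatk:12`) refers to.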
Jobs
Jobs are the unit of work executed by AWS Batch as containerized
applications running on Amazon EC2.
When your job starts, AWS Batch creates a container using the command
and parameters specified in your job definition. You can optionally
override properties such as vCPU and memory requirements.
$ aws batch submit-job --job-name variant-calling \
    --job-definition gatk:12 --job-queue genomics
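Per-job overrides travel in `containerOverrides`. The sketch below, assuming the `gatk:12` definition and `genomics` queue from the command above, builds `submit_job` arguments that raise vCPU and memory for one large sample; `variant_calling_request` and `INPUT_URI` are hypothetical names.

```python
def variant_calling_request(sample_uri, vcpus=None, memory_mib=None):
    """Return kwargs for batch_client.submit_job(**kwargs)."""
    overrides = {"environment": [{"name": "INPUT_URI", "value": sample_uri}]}
    requirements = []
    if vcpus is not None:
        requirements.append({"type": "VCPU", "value": str(vcpus)})
    if memory_mib is not None:
        requirements.append({"type": "MEMORY", "value": str(memory_mib)})
    if requirements:
        # Replaces the job definition's values for this one job only.
        overrides["resourceRequirements"] = requirements
    return {
        "jobName": "variant-calling",
        "jobDefinition": "gatk:12",
        "jobQueue": "genomics",
        "containerOverrides": overrides,
    }
```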
Job Queues
Jobs are submitted to a Job Queue, where they reside until they are
able to be scheduled to a compute resource. Information related to
completed jobs persists in the queue for 24 hours.
$ aws batch create-job-queue --job-queue-name genomics \
    --priority 500 --compute-environment-order ...
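In boto3, `create_job_queue` takes a priority and an ordered list of compute environments. A sketch, assuming two hypothetical compute environments (`spot-ce` preferred, `ondemand-ce` as fallback): the scheduler tries environments in ascending `order`, and when several queues share an environment, the queue with the higher `priority` value is evaluated first.

```python
def job_queue_request(name, compute_environments, priority=500):
    """Return kwargs for batch_client.create_job_queue(**kwargs).

    compute_environments: names or ARNs, most preferred first.
    """
    if not compute_environments:
        raise ValueError("a job queue needs at least one compute environment")
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": priority,  # higher values are evaluated first
        "computeEnvironmentOrder": [
            {"order": position, "computeEnvironment": env}
            for position, env in enumerate(compute_environments, start=1)
        ],
    }
```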
Compute Environments
Job queues are mapped to one or more Compute Environments which
contain the EC2 instances used to run your AWS Batch jobs.
Managed compute environments enable you to describe your business
requirements (instance types, min/max/desired vCPUs, and EC2 Spot bid as
a % of On-Demand). AWS Batch will then launch an elastic quantity of
instances from a range of instance types based on your jobs’ requirements.
You can select specific instance types (e.g., c4.8xlarge), instance families
(e.g., C4, M4, R4), or simply choose “optimal”, and AWS Batch will launch
appropriately sized instances from our more modern instance families.
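These business requirements map onto the `computeResources` block of `create_compute_environment`. A sketch of a managed Spot environment, assuming you already have subnets, security groups, an ECS instance profile, and a Spot Fleet IAM role; every ID and ARN passed in is a placeholder, the 60% bid is an arbitrary example, and the range check is this sketch's own.

```python
def spot_compute_environment(name, subnets, security_group_ids, instance_role,
                             service_role, spot_fleet_role,
                             max_vcpus=256, bid_percentage=60):
    """Return kwargs for batch_client.create_compute_environment(**kwargs)."""
    if not 0 < bid_percentage <= 100:
        raise ValueError("bid_percentage is a percentage of the On-Demand price")
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",  # AWS Batch provisions and scales the instances
        "state": "ENABLED",
        "serviceRole": service_role,
        "computeResources": {
            "type": "SPOT",
            "bidPercentage": bid_percentage,  # max Spot price as % of On-Demand
            "spotIamFleetRole": spot_fleet_role,
            "minvCpus": 0,  # scale to zero when no jobs are runnable
            "desiredvCpus": 0,
            "maxvCpus": max_vcpus,
            # "optimal" lets AWS Batch choose; ["c4.8xlarge"] or ["c4", "m4", "r4"] also work.
            "instanceTypes": ["optimal"],
            "subnets": subnets,
            "securityGroupIds": security_group_ids,
            "instanceRole": instance_role,  # ECS instance profile name or ARN
        },
    }
```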
AWS Batch Concepts
The Scheduler evaluates when, where, and
how to run jobs that have been submitted to
a job queue.
Jobs run in approximately the order in which
they are submitted as long as all
dependencies on other jobs have been met.
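Dependencies are declared at submission time through `dependsOn`, and the scheduler holds a job until the jobs it depends on have completed successfully. A sketch, assuming a linear pipeline of hypothetical stages (for example definitions `bwa:1`, `gatk:12`, `snpeff:1`) in which each job depends on the one before it; the client is passed in so the logic can be exercised without AWS.

```python
def submit_pipeline(batch_client, stages, job_queue):
    """Submit stages in order; each job depends on the previous job.

    stages: list of (job_name, job_definition) tuples.
    Returns the submitted job IDs in the same order.
    """
    job_ids = []
    for job_name, job_definition in stages:
        request = {"jobName": job_name, "jobQueue": job_queue,
                   "jobDefinition": job_definition}
        if job_ids:
            # Wait for the previous stage before this job becomes runnable.
            request["dependsOn"] = [{"jobId": job_ids[-1]}]
        job_ids.append(batch_client.submit_job(**request)["jobId"])
    return job_ids
```

Because all stages are submitted up front, the orchestration lives in AWS Batch itself; nothing needs to stay running to launch the next stage.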
AWS Step Functions
AWS Step Functions…
…makes it easy to
coordinate the components
of distributed applications
using visual workflows.
Application Lifecycle in AWS Step Functions
• Define in JSON
• Visualize in the Console
• Monitor Executions
Seven State Types
• Task: A single unit of work
• Choice: Adds branching logic
• Parallel: Fork and join the data across tasks
• Wait: Delay for a specified time
• Fail: Stops an execution and marks it as a failure
• Succeed: Stops an execution successfully
• Pass: Passes its input to its output
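Amazon States Language definitions are plain JSON, so they can be generated and checked in code. A sketch, assuming two hypothetical Lambda ARNs, of a Parallel state that forks two Task branches and joins them before a Succeed state, plus a small helper (not part of any AWS SDK) that reports `Next`/`Default` targets pointing at undefined states.

```python
import json


def fork_join_workflow(qc_arn, stats_arn):
    """Parallel state with two Task branches, joined before a Succeed state."""
    def branch(name, arn):
        return {"StartAt": name,
                "States": {name: {"Type": "Task", "Resource": arn, "End": True}}}

    return {
        "Comment": "Fork/join sketch; Lambda ARNs are placeholders",
        "StartAt": "FanOut",
        "States": {
            "FanOut": {
                "Type": "Parallel",
                "Branches": [branch("QC", qc_arn), branch("Stats", stats_arn)],
                "Next": "Done",  # runs only after every branch finishes
            },
            "Done": {"Type": "Succeed"},
        },
    }


def undefined_targets(machine):
    """Names referenced by StartAt/Next/Default/Choices that are not defined.

    Recurses into Parallel branches, each of which has its own States scope.
    """
    states = machine["States"]
    refs = [machine["StartAt"]]
    missing = set()
    for state in states.values():
        refs += [state[key] for key in ("Next", "Default") if key in state]
        refs += [choice["Next"] for choice in state.get("Choices", [])]
        for sub in state.get("Branches", []):
            missing |= undefined_targets(sub)
    return missing | {ref for ref in refs if ref not in states}


print(json.dumps(fork_join_workflow("arn:qc", "arn:stats"), indent=2))
```

When a Parallel state completes, its output is an array with one element per branch, in branch order.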
Build Visual Workflows Using State Types
(Diagram: an AWS Step Functions workflow combining Task, Choice, Fail, and Parallel states, with states labeled Mountains, People, and Snow.)
Architecture Deep Dive
Example Architecture
Executing Job(s)
• Specify Docker run parameters as container overrides
• Specify the job queue
• Submit dependencies
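import boto3

batch_client = boto3.client('batch')  # reused across warm Lambda invocations

def lambda_handler(event, context):
    # Submit the AWS Batch job described by the incoming event.
    response = batch_client.submit_job(
        jobName=event['jobName'],
        jobQueue=event['jobQueue'],
        jobDefinition=event['jobDefinition'],
        dependsOn=event.get('dependsOn', []),
        containerOverrides=event.get('containerOverrides', {}),
    )
    return {'jobId': response['jobId']}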
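A workflow's `GetJobStatus` step needs a companion function. A sketch, assuming the submit step passes along `{"jobId": ...}`; it calls boto3's `describe_jobs` and adds `status` plus a hypothetical `done` flag to the state, for a downstream Choice state to branch on.

```python
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def get_job_status(event, context=None, batch_client=None):
    """Return the incoming state plus the job's current AWS Batch status."""
    if batch_client is None:
        import boto3  # bundled with the AWS Lambda Python runtimes
        batch_client = boto3.client("batch")
    job_id = event["jobId"]
    jobs = batch_client.describe_jobs(jobs=[job_id])["jobs"]
    if not jobs:
        raise ValueError(f"AWS Batch job {job_id} not found")
    # One of SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING, SUCCEEDED, FAILED.
    status = jobs[0]["status"]
    return {**event, "status": status, "done": status in TERMINAL_STATES}
```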
Considerations for Batch Layer: Data Sharing
Consideration: Jobs are managed at the container level, not the instance
level, so consecutive containers in a workflow are not guaranteed to run
on the same instance.
Solution: Stage all data in Amazon S3, and read and write everything from
there. This is also important for traceability, logging, etc.
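A sketch of the stage-in/stage-out pattern, assuming a boto3 S3 client (which provides `download_file(Bucket, Key, Filename)` and `upload_file(Filename, Bucket, Key)`) is passed in; `parse_s3_uri`, `stage_in`, and `stage_out` are hypothetical helper names.

```python
import os


def parse_s3_uri(uri):
    """Split "s3://bucket/key" into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in {uri!r}")
    return bucket, key


def stage_in(s3_client, uri, workdir):
    """Download one S3 object into workdir and return the local path."""
    bucket, key = parse_s3_uri(uri)
    local_path = os.path.join(workdir, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    return local_path


def stage_out(s3_client, local_path, uri_prefix):
    """Upload a local result under an S3 prefix and return its S3 URI."""
    bucket, prefix = parse_s3_uri(uri_prefix)
    name = os.path.basename(local_path)
    prefix = prefix.strip("/")
    key = f"{prefix}/{name}" if prefix else name
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

Returning the S3 URI of every output gives the next job (and any audit log) a durable reference that does not depend on which instance ran the step.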
Considerations for Batch Layer: Multitenancy
Consideration: Multiple containers may run batch processes on the same
instance, in the same base working directory.
Solution: Within the scratch directory, each batch process creates a
subfolder with a unique ID, and all scratch data is written to that
subdirectory.
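One way to pick the unique ID is to reuse the job ID that AWS Batch exposes to each container in the `AWS_BATCH_JOB_ID` environment variable, plus a random suffix so a retried attempt landing on the same instance cannot collide with leftovers. A sketch; the `/scratch` base path, the `local` fallback, and `make_job_scratch` are assumptions.

```python
import os
import uuid


def make_job_scratch(base="/scratch"):
    """Create and return a per-job scratch subdirectory under `base`."""
    # Array-job children have IDs like "<parent-id>:<index>"; avoid ':' in paths.
    job_id = os.environ.get("AWS_BATCH_JOB_ID", "local").replace(":", "_")
    path = os.path.join(base, f"{job_id}-{uuid.uuid4().hex[:8]}")
    os.makedirs(path)  # raises FileExistsError rather than sharing a directory
    return path
```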
Considerations for Batch Layer: Volume Reuse
Consideration: Scratch data should live only as long as the job using it,
to optimize instance and Amazon EBS storage costs.
Solution: Within the scratch directory, each batch process creates a
subfolder with a unique ID and writes all scratch data there. The
subdirectory is deleted at the end of the job.
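Deleting the subdirectory is easiest to get right when cleanup runs even if the job step raises. A sketch using the standard library's `tempfile.TemporaryDirectory`, which creates a uniquely named directory and removes it, with its contents, on exit; `run_in_scratch` and the `/scratch` default are hypothetical.

```python
import os
import tempfile


def run_in_scratch(step, base="/scratch"):
    """Run step(scratch_dir) in a fresh subdirectory of base, then delete it."""
    os.makedirs(base, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix="job-", dir=base) as scratch:
        # Anything step() writes under `scratch` is removed when this block
        # exits, whether step() returns normally or raises.
        return step(scratch)
```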
Example Architecture
Deployment with AWS Step Functions
A Flexible Workflow Deployment Model
• Decouple batch engine and workflow orchestration
• Workflows are now defined in JSON
• Easier to deploy
• Easier to automate
• Easier to test
• Can integrate non-Batch applications as well
{
  ...
  "SubmitJob": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1",
    "Next": "GetJobStatus"
  },
  ...
}
Change one line to change the workflow
{
  ...
  "SubmitJob": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob2",
    "Next": "GetJobStatus"
  },
  ...
}
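A complete definition around the `SubmitJob` fragment might look like the sketch below: submit, wait, check status, then loop or finish. It assumes the submit function returns `{"jobId": ...}` and the status function adds a `status` field; the ARNs (including the hypothetical `batchGetJobStatus`), the 30-second poll interval, and the state names other than `SubmitJob` and `GetJobStatus` are placeholders.

```python
import json


def batch_polling_workflow(submit_arn, status_arn, poll_seconds=30):
    """Amazon States Language definition: submit, wait, check, repeat."""
    return {
        "Comment": "Submit an AWS Batch job and poll until it finishes (sketch)",
        "StartAt": "SubmitJob",
        "States": {
            # Swapping submit_arn is the one-line workflow change.
            "SubmitJob": {"Type": "Task", "Resource": submit_arn,
                          "Next": "WaitForJob"},
            "WaitForJob": {"Type": "Wait", "Seconds": poll_seconds,
                           "Next": "GetJobStatus"},
            "GetJobStatus": {"Type": "Task", "Resource": status_arn,
                             "Next": "CheckStatus"},
            "CheckStatus": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "SUCCEEDED",
                     "Next": "JobSucceeded"},
                    {"Variable": "$.status", "StringEquals": "FAILED",
                     "Next": "JobFailed"},
                ],
                "Default": "WaitForJob",  # still queued or running: poll again
            },
            "JobSucceeded": {"Type": "Succeed"},
            "JobFailed": {"Type": "Fail", "Error": "BatchJobFailed",
                          "Cause": "AWS Batch reported the job as FAILED"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(batch_polling_workflow(
        "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1",
        "arn:aws:lambda:REGION:ACCOUNT:function:batchGetJobStatus"), indent=2))
```

Step Functions also offers a service integration (`arn:aws:states:::batch:submitJob.sync`) that submits a Batch job and waits for completion without an explicit polling loop; the Lambda-based loop above mirrors the approach shown in this deck.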
A Practical Example: Genomics
Annotation
Variant
Calling
QC
Alignment
Other serverless science
Evented File Processing
Nanocall*
* Matei David (Jared T. Simpson lab)
doi:10.1093/bioinformatics/btw569
Control Plane for other
Infrastructure
Human Microbiome Project
Public Data Set
Targeted 16S sequencing of 300 healthy adults at 18
specific sites (oral cavity, airways, urogenital tract, skin,
and gut)
https://s3-us-west-2.amazonaws.com/human-microbiome-project
CRISPR off-target search
IARPA MICrONS
Intelligence Advanced Research Projects Activity
Machine Intelligence from Cortical Networks
• MICrONS seeks to revolutionize machine learning by
understanding the representations, transformations, and
learning rules employed by the brain
• The program is expressly designed as a dialogue between
computer science, data science, and neuroscience
(Diagram: a neurally plausible machine learning framework connected to behavior experiments, functional imaging, structural imaging, and data analysis.)
Why Is This Different?
• Current neural networks are “neurally inspired” but
not considered biofidelic or neurally plausible
• Previous projects to build algorithms based on the
brain exist, but have been focused on macro and
micro information, or lower-fidelity statistics
• Little is known about the brain at the mesoscale
• A “cortical column” is theorized to be on the order of 1 mm³
• In this program, structure and function co-registration
provides a uniquely rich picture of computing circuits
• Researchers are directly measuring mesoscale
activity and circuits
(Diagram: microscale (1 to 100s of neurons), mesoscale (1k to 1M neurons), and macroscale (brain regions), with the Human Connectome Project at the macroscale and the mesoscale marked “?”.)
Why Is This Different: Functional Imaging
Video Credit: Tianyu Wang (Xu Lab, Cornell University) & Jacob Reimer (Tolias Lab, Baylor College of Medicine)
Why Is This Different: Structural Imaging
• Petascale structural imaging
• 1 mm³ region is large enough to contain meaningful circuits never before observed
  • ~50k-100k neurons
  • ~100,000,000 synapses
  • ~4 × 4 × 30 nm voxels
  • ~2-2.5 PB
• Three different techniques
  • Scanning Electron Microscopy (SEM)
  • Transmission Electron Microscopy (TEM)
  • Fluorescent in situ sequencing (FISSEQ) barcoding
Video Credit: Kasthuri, et al. - Cell 2015
Bobby Kasthuri, Daniel Berger, Jeff Lichtman
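The ~2 to 2.5 PB figure is consistent with simple arithmetic on the numbers above. A sketch, assuming uncompressed 8-bit grayscale (one byte per voxel) and decimal petabytes (10^15 bytes): a 1 mm cube is 10^18 nm³, and each 4 × 4 × 30 nm voxel is 480 nm³, giving about 2.1 × 10^15 voxels.

```python
NM_PER_MM = 1_000_000


def em_dataset_bytes(edge_mm=1.0, voxel_nm=(4, 4, 30), bytes_per_voxel=1):
    """Raw size of a cube-shaped region imaged at the given voxel size."""
    edge_nm = edge_mm * NM_PER_MM
    vx, vy, vz = voxel_nm
    n_voxels = edge_nm ** 3 / (vx * vy * vz)
    return n_voxels * bytes_per_voxel


petabytes = em_dataset_bytes() / 1e15
print(f"{petabytes:.2f} PB")  # 2.08 PB for the default arguments
```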
Why Is This Different: Co-registered Data
• Co-registration links structure to function
• For the first time, researchers will measure in the same sample at scale:
  • Stimulus (“input”)
  • Behavior (“output”)
  • Connectome (“circuit diagram”)
  • Neuronal Activity (“voltages”)
Calcium Imaging Data – Tolias Lab, Baylor College of Medicine
X-ray Tomography and co-registration – Allen Institute for Brain Science
Why Can We Succeed Now?
• New imaging techniques and engineering
capabilities can interrogate mesoscale circuits
• Increased computing power has enabled
automated analysis with machine learning
• Reduced storage costs have made collection
and analysis of many petabytes of data possible
• Use of the cloud has provided the ability to scale
when needed and facilitates sharing and
collaboration
We can directly observe and reconstruct mesoscale
neuronal circuits in vivo for the first time
https://www.karlrupp.net
The Boss
Block and Object Storage Service
The Boss is a multi-dimensional spatial database, provided as a managed service on AWS
The Boss stores annotation data co-registered to image data
• An annotation is a unique 64-bit identifier applied to a set of voxels, representing its spatial distribution
(Figure: example annotations labeled ID 1267, ID 345345, and ID 534534799.)
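The annotation model described above, a 64-bit ID attached to a set of voxels, can be illustrated with a toy in-memory structure. This is not the Boss API; `AnnotationVolume` is hypothetical, and treating ID 0 as "unlabeled" is an assumption borrowed from common segmentation practice.

```python
from collections import defaultdict

MAX_ID = 2**64 - 1  # annotation IDs are unsigned 64-bit integers


class AnnotationVolume:
    """Toy model: each annotation ID maps to the set of (x, y, z) voxels it labels."""

    def __init__(self):
        self._voxels = defaultdict(set)

    def annotate(self, ann_id, voxels):
        # Assumption: 0 is reserved for "unlabeled", as in many segmentation tools.
        if not 0 < ann_id <= MAX_ID:
            raise ValueError(f"annotation ID out of range: {ann_id}")
        self._voxels[ann_id].update(voxels)

    def voxels(self, ann_id):
        return frozenset(self._voxels.get(ann_id, ()))

    def bounding_box(self, ann_id):
        """((min_x, min_y, min_z), (max_x, max_y, max_z)), or None if absent."""
        points = self._voxels.get(ann_id)
        if not points:
            return None
        xs, ys, zs = zip(*points)
        return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))
```

A production store would index voxels spatially (for example in chunked cuboids) rather than as Python sets; the sketch only shows the ID-to-voxel-set relationship.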
High-Level System Architecture
PyWren: Utilizing stateless functions for distributed computing
http://pywren.io
https://arxiv.org/abs/1702.04024
Thank you!
AWS Batch: https://aws.amazon.com/batch/
AWS Step Functions: https://aws.amazon.com/step-functions/
Genomics Reference Architecture: https://github.com/awslabs/aws-batch-genomics
The Boss: https://youtu.be/806a3x2s0CY

The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 

Recently uploaded (20)

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 

Nyc big datagenomics-pizarroa-sept2017

  • 1. Phosphorus Big Data Genomics NYC #GenomicsNYC
  • 2. • We meet quarterly • We are passionate about Big-Data technologies, the human genome and personalized medicine • We have 775 genomies (members) #GenomicsNYC Big Data Genomics NYC
  • 3. Past MeetUps include: • Dipping into Guacamole – a spark-powered Somatic Variant Caller • Next Generation Tools and Strategies for Genomic Analysis • Leverage ADAM and Spark for Genomic Analysis
  • 4. Building Genomics Pipelines in the Cloud: Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Workflows with Angel Pizarro
  • 5. What we will cover • Some context • Service Overview – AWS Batch • Service Overview – AWS Step Functions • Architecture Deep Dive
  • 6. We see similar data analysis patterns: • Life Sciences • Financial Services
  • 9. Introducing AWS Batch • Fully Managed: No software to install or servers to manage. AWS Batch provisions and scales your infrastructure. • Integrated with AWS: AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition. • Cost-Efficient: AWS Batch launches compute resources tailored to your jobs and can provision Amazon EC2 and EC2 Spot instances.
  • 10. AWS Batch Concepts • Jobs • Job Definitions • Job Queue • Compute Environments • Scheduler
  • 11. Example AWS Batch Job Architecture [diagram: Input Files; S3 events trigger a Lambda function that submits a Batch job; Queue of Runnable Jobs; AWS Batch Scheduler; AWS Batch Compute Environments; Job Definition (job resource requirements and other parameters); Application Image; IAM Role for Batch Job; AWS Batch Job Output]
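The S3-to-Lambda-to-Batch flow in this diagram can be sketched as a Python Lambda handler. This is a minimal sketch, not the presenter's code. The queue name `genomics` and the job definition `gatk` are borrowed from the CLI examples on the following slides. The `INPUT_URI` environment variable is a hypothetical convention for telling the container which file to process. The Batch client is injectable, so the event parsing can be exercised without AWS.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical defaults, borrowed from the CLI examples in this deck
JOB_QUEUE = "genomics"
JOB_DEFINITION = "gatk"


def job_name_for(key):
    """Derive a Batch job name: letters, digits, '-', '_', starting alphanumeric, max 128 chars."""
    name = re.sub(r"[^A-Za-z0-9_-]", "-", key)
    if not name[:1].isalnum():
        name = "job-" + name
    return name[:128]


def submit_requests(event, job_queue=JOB_QUEUE, job_definition=JOB_DEFINITION):
    """Turn each S3 event record into keyword arguments for submit_job()."""
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        requests.append({
            "jobName": job_name_for(key),
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            "containerOverrides": {
                # Hypothetical convention: the container reads its input location from INPUT_URI
                "environment": [{"name": "INPUT_URI", "value": f"s3://{bucket}/{key}"}],
            },
        })
    return requests


def lambda_handler(event, context, batch_client=None):
    """Submit one Batch job per uploaded object and return the new job IDs."""
    if batch_client is None:
        import boto3  # lazy import so the parsing logic is testable offline
        batch_client = boto3.client("batch")
    return [batch_client.submit_job(**req)["jobId"] for req in submit_requests(event)]
```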
  • 12. Job Definitions Similar to ECS Task Definitions, AWS Batch Job Definitions specify how jobs are to be run. While each job must reference a job definition, many parameters can be overridden. Some of the attributes specified in a job definition: • IAM role associated with the job • vCPU and memory requirements • Mount points • Container properties • Environment variables $ aws batch register-job-definition --job-definition-name gatk --type container --container-properties ...
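The attributes listed on this slide map onto the `RegisterJobDefinition` API. The sketch below builds the request for boto3's `register_job_definition`. The image URI, role ARN, `/scratch` path, and `SCRATCH_DIR` variable are hypothetical placeholders. vCPU and memory are given through `resourceRequirements`, which supersedes the older top-level `vcpus`/`memory` container fields.

```python
def gatk_job_definition(image, job_role_arn, vcpus=8, memory_mib=32000):
    """Build keyword arguments for batch_client.register_job_definition()."""
    return {
        "jobDefinitionName": "gatk",
        "type": "container",
        "containerProperties": {
            "image": image,
            "jobRoleArn": job_role_arn,  # IAM role assumed by the job's container
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},  # MiB
            ],
            # Host scratch volume mounted into the container (hypothetical path)
            "volumes": [{"name": "scratch", "host": {"sourcePath": "/scratch"}}],
            "mountPoints": [{"sourceVolume": "scratch", "containerPath": "/scratch", "readOnly": False}],
            "environment": [{"name": "SCRATCH_DIR", "value": "/scratch"}],
        },
    }

# Usage sketch (requires AWS credentials; values are placeholders):
# import boto3
# boto3.client("batch").register_job_definition(**gatk_job_definition(
#     "123456789012.dkr.ecr.us-east-1.amazonaws.com/gatk:latest",
#     "arn:aws:iam::123456789012:role/gatk-job-role"))
```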
  • 13. Jobs Jobs are the unit of work executed by AWS Batch as containerized applications running on Amazon EC2. As your job starts, AWS Batch creates a container using the command and parameters specified in your job definition. You can optionally override properties such as CPU and Memory requirements. $ aws batch submit-job --job-name variant-calling --job-definition gatk:12 --job-queue genomics
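In Python, the `submit-job` command above, including the optional CPU and memory override, might look like the sketch below. The job name, definition revision, and queue follow the slide's CLI example. The helper name is hypothetical, and the override values are illustrative.

```python
def variant_calling_request(vcpus=None, memory_mib=None):
    """Build submit_job() kwargs, overriding resources only when asked to."""
    request = {
        "jobName": "variant-calling",
        "jobDefinition": "gatk:12",  # name:revision, as in the CLI example
        "jobQueue": "genomics",
    }
    overrides = []
    if vcpus is not None:
        overrides.append({"type": "VCPU", "value": str(vcpus)})
    if memory_mib is not None:
        overrides.append({"type": "MEMORY", "value": str(memory_mib)})
    if overrides:
        # These values replace the job definition's defaults for this job only
        request["containerOverrides"] = {"resourceRequirements": overrides}
    return request

# Usage sketch: boto3.client("batch").submit_job(**variant_calling_request(vcpus=16, memory_mib=64000))
```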
  • 14. Job Queues Jobs are submitted to a Job Queue, where they reside until they are able to be scheduled to a compute resource. Information related to completed jobs persists in the queue for 24 hours. $ aws batch create-job-queue --job-queue-name genomics --priority 500 --compute-environment-order ...
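Batch tries the compute environments attached to a queue in ascending `order`. One common pattern, assumed here rather than taken from the talk, is to list a Spot environment first and an On-Demand environment second, so that cheaper capacity is tried first. The environment names below are hypothetical.

```python
def genomics_queue(compute_environments, priority=500):
    """Build create_job_queue() kwargs; environments are tried in list order."""
    return {
        "jobQueueName": "genomics",
        "state": "ENABLED",
        "priority": priority,  # higher values are evaluated first for shared environments
        "computeEnvironmentOrder": [
            {"order": i, "computeEnvironment": ce}
            for i, ce in enumerate(compute_environments, start=1)
        ],
    }

# Usage sketch: boto3.client("batch").create_job_queue(
#     **genomics_queue(["genomics-spot", "genomics-ondemand"]))
```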
  • 15. Compute Environments Job queues are mapped to one or more Compute Environments which contain the EC2 instances used to run your AWS Batch jobs. Managed compute environments enable you to describe your business requirements (instance types, min/max/desired vCPUs, and EC2 Spot bid as a % of On-Demand). AWS Batch will then launch an elastic quantity of instances from a range of instance types based on your jobs’ requirements. You can select specific instance types (e.g. c4.8xlarge), instance families (e.g. C4, M4, R4), or simply choose “optimal” and AWS Batch will launch appropriately sized instances from our more-modern instance families.
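A managed Spot environment using the "optimal" instance setting could be requested with boto3's `create_compute_environment`, as sketched below. Every ARN, subnet, and security-group value is a placeholder supplied by the caller. The 60% bid and the vCPU bounds are illustrative defaults, not recommendations from the talk.

```python
def spot_environment(subnets, security_groups, instance_role, service_role,
                     spot_fleet_role, bid_percentage=60, max_vcpus=256):
    """Build create_compute_environment() kwargs for a managed Spot environment."""
    return {
        "computeEnvironmentName": "genomics-spot",  # hypothetical name
        "type": "MANAGED",
        "state": "ENABLED",
        "serviceRole": service_role,
        "computeResources": {
            "type": "SPOT",
            "bidPercentage": bid_percentage,  # max Spot price as a % of On-Demand
            "minvCpus": 0,                    # scale to zero when no jobs are queued
            "desiredvCpus": 0,
            "maxvCpus": max_vcpus,
            "instanceTypes": ["optimal"],     # let Batch pick sizes from current families
            "subnets": subnets,
            "securityGroupIds": security_groups,
            "instanceRole": instance_role,    # ECS instance profile for the hosts
            "spotIamFleetRole": spot_fleet_role,
        },
    }
```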
  • 16. AWS Batch Concepts The Scheduler evaluates when, where, and how to run jobs that have been submitted to a job queue. Jobs run in approximately the order in which they are submitted as long as all dependencies on other jobs have been met.
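The ordering rule on this slide can be illustrated with a toy model. The model walks the submission order and starts the earliest-submitted job whose dependencies have all finished. It is for intuition only: the real scheduler also weighs queue priority and available compute capacity, and this toy treats every job as finishing the moment it starts.

```python
def run_order(jobs):
    """Toy scheduler. jobs: list of (name, [dependency names]) in submission order.

    Returns job names in the order they would start, always picking the
    earliest-submitted job whose dependencies have all completed.
    """
    pending = list(jobs)
    done, order = set(), []
    while pending:
        for i, (name, deps) in enumerate(pending):
            if all(d in done for d in deps):
                order.append(name)
                done.add(name)  # toy simplification: a job completes as soon as it starts
                del pending[i]
                break
        else:
            # Nothing is runnable, so the remaining dependencies can never be met
            raise ValueError(f"unsatisfiable dependencies: {[n for n, _ in pending]}")
    return order
```

For example, if `annotate` is submitted before its upstream jobs, it still runs only after `call`, which in turn waits for `align`.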
  • 19. AWS Step Functions… …makes it easy to coordinate the components of distributed applications using visual workflows.
  • 20. Application Lifecycle in AWS Step Functions • Visualize in the Console • Define in JSON • Monitor Executions
  • 21. Seven State Types • Task: a single unit of work • Choice: adds branching logic • Parallel: fork and join the data across tasks • Wait: delay for a specified time • Fail: stops an execution and marks it as a failure • Succeed: stops an execution successfully • Pass: passes its input to its output
  • 22. Build Visual Workflows Using State Types [diagram: an example AWS Step Functions workflow built from Task, Choice, Fail and Parallel states, with branches labeled Mountains, People and Snow]
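Five of these state types (Task, Wait, Choice, Succeed, Fail) combine into the submit-and-poll loop used on the later slides, where a `SubmitJob` task is followed by `GetJobStatus`. The sketch builds the Amazon States Language definition as a Python dict. The Lambda ARNs keep the slides' `REGION:ACCOUNT` placeholders. The function names and the `$.status` field are hypothetical: the design assumes the status function returns a bare status string.

```python
import json

LAMBDA = "arn:aws:lambda:REGION:ACCOUNT:function:"  # placeholder ARN prefix


def batch_job_state_machine(submit_fn="batchSubmitJob", status_fn="batchGetJobStatus",
                            poll_seconds=30):
    """Amazon States Language: submit a Batch job, then poll until it finishes."""
    definition = {
        "Comment": "Submit an AWS Batch job and wait for it to finish",
        "StartAt": "SubmitJob",
        "States": {
            "SubmitJob": {"Type": "Task", "Resource": LAMBDA + submit_fn,
                          "ResultPath": "$.jobId", "Next": "WaitSeconds"},
            "WaitSeconds": {"Type": "Wait", "Seconds": poll_seconds, "Next": "GetJobStatus"},
            "GetJobStatus": {"Type": "Task", "Resource": LAMBDA + status_fn,
                             "ResultPath": "$.status", "Next": "CheckStatus"},
            "CheckStatus": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "JobSucceeded"},
                    {"Variable": "$.status", "StringEquals": "FAILED", "Next": "JobFailed"},
                ],
                "Default": "WaitSeconds",  # any other status: keep polling
            },
            "JobSucceeded": {"Type": "Succeed"},
            "JobFailed": {"Type": "Fail", "Error": "BatchJobFailed",
                          "Cause": "AWS Batch reported status FAILED"},
        },
    }
    return json.dumps(definition, indent=2)
```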
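  • 26. Executing Job(s) • Specify Docker run parameters as container overrides • Specify job queue • Submit dependencies
    import boto3

    batch_client = boto3.client('batch')

    def lambda_handler(event, context):
        # Every submit_job parameter is taken from the workflow step's input event
        response = batch_client.submit_job(
            dependsOn=event['dependsOn'],                    # e.g. [{'jobId': '<upstream job ID>'}]
            containerOverrides=event['containerOverrides'],  # Docker run parameters
            jobDefinition=event['jobDefinition'],
            jobName=event['jobName'],
            jobQueue=event['jobQueue'],
        )
        return response['jobId']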
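A workflow would call a companion status check after submitting a job. The sketch below implements one with boto3's `describe_jobs`. The handler shape, the `jobId` input field, and returning the bare status string are assumptions for illustration. The client is injectable so the logic can be checked offline.

```python
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}
# Non-terminal AWS Batch statuses: SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING


def job_status(describe_response, job_id):
    """Extract one job's status from a describe_jobs() response."""
    for job in describe_response.get("jobs", []):
        if job["jobId"] == job_id:
            return job["status"]
    raise KeyError(f"job {job_id} not found")


def is_finished(status):
    """True once the job can no longer change state."""
    return status in TERMINAL_STATUSES


def lambda_handler(event, context, batch_client=None):
    """Return the current status string for event['jobId']."""
    if batch_client is None:
        import boto3  # lazy import keeps job_status() testable without AWS
        batch_client = boto3.client("batch")
    job_id = event["jobId"]
    return job_status(batch_client.describe_jobs(jobs=[job_id]), job_id)
```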
  • 27. Considerations for Batch Layer: Data Sharing • Consideration: Jobs are managed at the container level, not the instance level, so there is no guarantee that consecutive containers in a workflow will run on the same instance. • Solution: Stage all data in Amazon S3, and read and write everything from there. This is also important for traceability, logging, etc.
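Staging through Amazon S3 means each container copies its inputs down when it starts and copies its outputs back when it finishes. Below is a minimal sketch using boto3's `download_file` and `upload_file`. The `s3://bucket/key` URI convention and the helper names are assumptions. The S3 client is passed in so the URI handling can be checked without AWS.

```python
import os
from urllib.parse import urlparse


def split_s3_uri(uri):
    """'s3://bucket/path/to/key' -> ('bucket', 'path/to/key')."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not an s3://bucket/key URI: {uri}")
    return parsed.netloc, key


def stage_in(s3_client, uri, workdir):
    """Download one input object into workdir and return its local path."""
    bucket, key = split_s3_uri(uri)
    local_path = os.path.join(workdir, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    return local_path


def stage_out(s3_client, local_path, uri_prefix):
    """Upload a result file under uri_prefix and return its S3 URI."""
    bucket, key = split_s3_uri(uri_prefix.rstrip("/") + "/" + os.path.basename(local_path))
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```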
  • 28. Considerations for Batch Layer: Multitenancy • Consideration: Multiple containers may run batch processes on the same instance, in the same base working directory. • Solution: Within the scratch directory, each batch process creates a subfolder with a unique ID, and all scratch data is written to that subdirectory.
  • 29. Considerations for Batch Layer: Volume Reuse • Consideration: Scratch data should live only as long as the job using it, to optimize instance and Amazon EBS storage costs. • Solution: Within the scratch directory, each batch process creates a subfolder with a unique ID; all scratch data is written to this subdirectory, which is deleted at the end of the job.
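The scratch-directory solutions on the two slides above (one unique subfolder per job, deleted when the job ends) fit naturally into a Python context manager. The sketch makes three choices. `tempfile.mkdtemp` supplies the unique name. The prefix uses `AWS_BATCH_JOB_ID`, which AWS Batch sets inside job containers, and falls back to `local`. The `SCRATCH_DIR` variable and the `/scratch` default are hypothetical. Cleanup runs even if the job raises.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def job_scratch(root=None):
    """Yield a unique per-job subdirectory of the shared scratch root, then delete it."""
    root = root or os.environ.get("SCRATCH_DIR", "/scratch")  # hypothetical convention
    job_id = os.environ.get("AWS_BATCH_JOB_ID", "local")      # set by AWS Batch in containers
    os.makedirs(root, exist_ok=True)
    workdir = tempfile.mkdtemp(prefix=f"{job_id}-", dir=root)  # unique even for concurrent jobs
    try:
        yield workdir
    finally:
        # Scratch data lives only as long as the job, freeing instance and EBS space
        shutil.rmtree(workdir, ignore_errors=True)

# Usage sketch:
# with job_scratch() as workdir:
#     ...stage inputs into workdir, run the tool, stage outputs back to S3...
```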
  • 31. Deployment with AWS Step Functions
  • 32. A Flexible Workflow Deployment Model • Decouple batch engine and workflow orchestration • Workflow creation now done as JSON • Easier to deploy • Easier to automate • Easier to test • Can integrate non-Batch applications as well
  • 33. Change one line to change the workflow:
    { ... "SubmitJob": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1", "Next": "GetJobStatus" }, ... }
    becomes
    { ... "SubmitJob": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob2", "Next": "GetJobStatus" }, ... }
  • 35. A Practical Example: Genomics [diagram with pipeline stages: Alignment, Variant Calling, QC, Annotation]
  • 37. Evented File Processing: Nanocall* (*Matei David, Jared T. Simpson lab; doi:10.1093/bioinformatics/btw569)
  • 38. Control Plane for Other Infrastructure: Human Microbiome Project Public Data Set • Targeted 16S sequencing of 300 healthy adults at 18 specific body sites (oral cavity, airways, urogenital tract, skin, and gut) • https://s3-us-west-2.amazonaws.com/human-microbiome-project
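The "change one line" idea can be automated. Load the definition, point one Task state at a different Lambda function, and push the result with the Step Functions `update_state_machine` call. The helper name is hypothetical, and the ARNs keep the slide's `REGION:ACCOUNT` placeholders.

```python
import json


def swap_task_resource(definition_json, state_name, new_resource):
    """Return a new ASL definition with one Task state's Resource replaced."""
    definition = json.loads(definition_json)
    state = definition["States"][state_name]
    if state.get("Type") != "Task":
        raise ValueError(f"{state_name} is not a Task state")
    state["Resource"] = new_resource  # the only change; every other field is kept
    return json.dumps(definition, indent=2)

# Usage sketch (requires AWS credentials):
# import boto3
# sfn = boto3.client("stepfunctions")
# sfn.update_state_machine(stateMachineArn=..., definition=new_definition)
```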
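The slide names these stages but not how they depend on each other, so the wiring below is an assumption. Alignment runs first. A Parallel state then runs variant calling (followed by annotation) alongside QC. Each stage is a Task calling a hypothetical per-stage Lambda that submits the corresponding Batch job. A fuller version would add a polling loop for each job.

```python
import json

LAMBDA = "arn:aws:lambda:REGION:ACCOUNT:function:"  # placeholder ARN prefix


def task(function_name, next_state=None):
    """A Task state calling a hypothetical Lambda; it ends its branch if no next state."""
    state = {"Type": "Task", "Resource": LAMBDA + function_name}
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def genomics_pipeline():
    definition = {
        "StartAt": "Alignment",
        "States": {
            "Alignment": task("align", "CallAndQC"),
            "CallAndQC": {
                "Type": "Parallel",
                # Both branches receive the alignment output and run concurrently
                "Branches": [
                    {"StartAt": "VariantCalling",
                     "States": {"VariantCalling": task("callVariants", "Annotation"),
                                "Annotation": task("annotate")}},
                    {"StartAt": "QC", "States": {"QC": task("qc")}},
                ],
                "End": True,
            },
        },
    }
    return json.dumps(definition, indent=2)
```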
  • 40. IARPA MICrONS: Intelligence Advanced Research Projects Activity, Machine Intelligence from Cortical Networks • MICrONS seeks to revolutionize machine learning by understanding the representations, transformations, and learning rules employed by the brain • The program is expressly designed as a dialogue between computer science, data science, and neuroscience [diagram: Neurally-plausible Machine Learning Framework; Behavior Experiment; Functional Imaging; Structural Imaging; Data Analysis]
  • 41. Why Is This Different? • Current neural networks are “neurally inspired” but not considered biofidelic or neurally plausible • Previous projects to build algorithms based on the brain exist, but have focused on macro- and micro-scale information, or on lower-fidelity statistics • Little is known about the brain at the mesoscale • A “cortical column” is theorized to be on the order of ~1 mm³ • In this program, co-registration of structure and function provides a uniquely rich picture of computing circuits • Researchers are directly measuring mesoscale activity and circuits [diagram of scales: microscale (1–100s of neurons), mesoscale (1k–1M neurons), macroscale (brain regions); labels: Human Connectome Project, ?]
  • 42. Why Is This Different: Functional Imaging Video Credit: Tianyu Wang (Xu Lab, Cornell University) & Jacob Reimer (Tolias Lab, Baylor College of Medicine)
  • 43. Why Is This Different: Structural Imaging • Peta-scale structural imaging • A 1 mm³ region is large enough to contain meaningful circuits never before observed • ~50k–100k neurons • ~100,000,000 synapses • ~4×4×30 nm voxels • ~2–2.5 PB • Three different techniques: • Scanning Electron Microscopy (SEM) • Transmission Electron Microscopy (TEM) • Fluorescent in situ sequencing (FISSEQ) barcoding [Video credit: Kasthuri et al., Cell 2015 (Bobby Kasthuri, Daniel Berger, Jeff Lichtman)]
  • 44. Why Is This Different: Co-registered Data • Co-registration links structure to function • For the first time, researchers will measure in the same sample at scale: • Stimulus (“input”) • Behavior (“output”) • Connectome (“circuit diagram”) • Neuronal Activity (“voltages”) [Credits: Calcium imaging data – Tolias Lab, Baylor College of Medicine; X-ray tomography and co-registration – Allen Institute for Brain Science]
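The ~2–2.5 PB figure can be checked from the voxel size. The quick calculation below assumes one byte (8-bit grayscale) per voxel, which the slide does not state.

```python
NM_PER_MM = 1_000_000
voxel_nm = (4, 4, 30)  # x, y, z voxel dimensions from the slide

voxels = 1.0
for edge_nm in voxel_nm:
    voxels *= NM_PER_MM / edge_nm  # number of voxels along each 1 mm edge

bytes_per_voxel = 1  # assumption: 8-bit electron-microscopy intensities
petabytes = voxels * bytes_per_voxel / 1e15
print(f"{voxels:.3g} voxels ~ {petabytes:.2f} PB")  # prints: 2.08e+15 voxels ~ 2.08 PB
```

At one byte per voxel, the raw volume falls within the slide's stated range.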
  • 45. Why Can We Succeed Now? • New imaging techniques and engineering capabilities can interrogate mesoscale circuits • Increased computing power has enabled automated analysis with machine learning • Reduced storage costs have made collection and analysis of many petabytes of data possible • Use of the cloud has provided the ability to scale when needed and facilitates sharing and collaboration We can directly observe and reconstruct mesoscale neuronal circuits in vivo for the first time https://www.karlrupp.net
  • 46. The Boss (Block and Object Storage Service) • The Boss is a multi-dimensional spatial database, provided as a managed service on AWS • The Boss stores annotation data co-registered to image data • An annotation is a unique 64-bit identifier applied to a set of voxels, representing its spatial distribution [figure labels: ID 1267, ID 345345, ID 534534799]
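To make the annotation model concrete, here is a toy Python version. A small 3-D label volume stores one 64-bit ID per voxel, with 0 meaning unlabeled, and each annotation's spatial distribution is recovered as its voxel count and bounding box. This illustrates the idea only; it is not the Boss's storage format or API. The IDs are the ones shown on the slide.

```python
from array import array
from collections import defaultdict


def annotation_extents(labels, shape):
    """labels: flat array of 64-bit IDs in z, y, x order; shape: (nz, ny, nx).

    Returns {id: (voxel_count, (min_z, min_y, min_x), (max_z, max_y, max_x))}.
    """
    nz, ny, nx = shape
    if len(labels) != nz * ny * nx:
        raise ValueError("labels length does not match shape")
    count = defaultdict(int)
    lo, hi = {}, {}
    for idx, label in enumerate(labels):
        if label == 0:  # 0 = no annotation
            continue
        z, rem = divmod(idx, ny * nx)
        y, x = divmod(rem, nx)
        count[label] += 1
        lo[label] = tuple(map(min, lo.get(label, (z, y, x)), (z, y, x)))
        hi[label] = tuple(map(max, hi.get(label, (z, y, x)), (z, y, x)))
    return {k: (count[k], lo[k], hi[k]) for k in count}


# A 2 x 2 x 3 volume; typecode "Q" is unsigned long long (8 bytes on common platforms)
volume = array("Q", [1267, 1267, 0,
                     0, 345345, 534534799,
                     1267, 0, 0,
                     0, 345345, 534534799])
print(annotation_extents(volume, (2, 2, 3)))
```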
  • 49. Thank you! AWS Batch: https://aws.amazon.com/batch/ AWS Step Functions: https://aws.amazon.com/step-functions/ Genomics Reference Architecture: https://github.com/awslabs/aws-batch-genomics The Boss: https://youtu.be/806a3x2s0CY