Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Workflows - AWS Online Tech Talks


Learning Objectives:
- How to simply scale out your batch workflows on AWS
- How to think about container/job management within managed, high-throughput workflows
- How to build a scalable orchestration framework within AWS Step Functions


  1. Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Workflows. Jamie Kinney, Principal Product Manager, AWS Batch; Aaron Friedman, Partner Solutions Architect, Healthcare and Life Sciences. September 14, 2017. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. What we will cover:
     • What Are High-Throughput Workflows?
     • Architecture Overview
     • Service Overview – AWS Batch
     • Service Overview – AWS Step Functions
     • Architecture Deep Dive
  3. What are high-throughput workflows?
     [Workflow diagram: Start → Pre-processing (network I/O and CPU) → Long-running operation (disk I/O and large memory) → Post-Processing (GPU-accelerated) → Copy results to S3 (network I/O) → End.]
     Now run this same workflow for thousands of inputs while also:
     • Starting each step at the right time
     • Running each step on appropriate compute resources
     • Managing concurrency
     • Scaling infrastructure up and down
     • Handling errors
     • Providing notifications
     • Accelerating workflow development
  4. High-throughput workflows are everywhere: Media & Entertainment, Transportation & Logistics, Manufacturing & Design, Financial Services, Life Sciences, Earth Sciences & Geospatial Analytics.
  5. The Architecture
  6. AWS Batch
  7. Introducing AWS Batch
     • Fully Managed: No software to install or servers to manage. AWS Batch provisions and scales your infrastructure.
     • Integrated with AWS: AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition.
     • Cost-Efficient: AWS Batch launches compute resources tailored to your jobs and can provision Amazon EC2 and EC2 Spot Instances.
  8. AWS Batch Concepts:
     • Jobs
     • Job Definitions
     • Job Queue
     • Compute Environments
     • Scheduler
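     To make these concepts concrete, here is a minimal boto3 sketch; the job definition name, container image, resource values, and queue name are illustrative assumptions, not part of the talk:

         import boto3

         batch = boto3.client('batch')

         # A job definition captures the container image, default command, and
         # resource requirements (all names and values here are hypothetical).
         batch.register_job_definition(
             jobDefinitionName='my-workflow-step',
             type='container',
             containerProperties={
                 'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-tool:latest',
                 'vcpus': 2,
                 'memory': 4096,
                 'command': ['run-tool'],
             },
         )

         # Jobs are submitted to a job queue; the scheduler then places them
         # onto instances in the queue's compute environments.
         batch.submit_job(
             jobName='my-workflow-step-001',
             jobQueue='my-job-queue',
             jobDefinition='my-workflow-step',
         )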
  9. Example AWS Batch Job Architecture
     [Diagram: input files arrive in Amazon S3; S3 events trigger a Lambda function, which submits an AWS Batch job; the job enters a queue of runnable jobs and the AWS Batch scheduler places it into an AWS Batch compute environment; the job executes the application image under an IAM role, using the job definition's resource requirements and other parameters, and writes the job output.]
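     A hedged sketch of the Lambda function in this diagram, which submits one Batch job per new S3 object; the queue and job definition names are assumptions carried over from the sketch above:

         import re
         import boto3

         batch = boto3.client('batch')

         def handler(event, context):
             # One Batch job per object referenced in the S3 event notification.
             for record in event['Records']:
                 bucket = record['s3']['bucket']['name']
                 key = record['s3']['object']['key']
                 # Batch job names allow only letters, numbers, hyphens, and
                 # underscores, so sanitize the object key.
                 job_name = re.sub(r'[^A-Za-z0-9_-]', '-', key)[:128]
                 batch.submit_job(
                     jobName=job_name,
                     jobQueue='my-job-queue',
                     jobDefinition='my-workflow-step',
                     containerOverrides={
                         'environment': [
                             {'name': 'INPUT_S3_URI', 'value': f's3://{bucket}/{key}'},
                         ]
                     },
                 )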
  10. A Visual Representation of AWS Batch
  11. AWS Step Functions
  12. AWS Step Functions makes it easy to coordinate the components of distributed applications using visual workflows.
  13. Application Lifecycle in AWS Step Functions: define in JSON, visualize in the console, monitor executions.
  14. Seven State Types:
      • Task: a single unit of work
      • Choice: adds branching logic
      • Parallel: forks and joins the data across tasks
      • Wait: delays for a specified time
      • Fail: stops an execution and marks it as a failure
      • Succeed: stops an execution successfully
      • Pass: passes its input to its output
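      Several of these state types compose into the Batch polling loop used later in this deck. A minimal sketch in Python, building the state machine definition as a dict and serializing it to the JSON that Step Functions expects; the Lambda ARNs and the 30-second wait are placeholders:

          import json

          # Task submits the job, Wait pauses, a second Task polls the status,
          # Choice branches, and Succeed/Fail terminate the execution.
          definition = {
              'StartAt': 'SubmitJob',
              'States': {
                  'SubmitJob': {
                      'Type': 'Task',
                      'Resource': 'arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob',
                      'Next': 'Wait30Seconds',
                  },
                  'Wait30Seconds': {'Type': 'Wait', 'Seconds': 30, 'Next': 'GetJobStatus'},
                  'GetJobStatus': {
                      'Type': 'Task',
                      'Resource': 'arn:aws:lambda:REGION:ACCOUNT:function:batchGetJobStatus',
                      'Next': 'CheckJobStatus',
                  },
                  'CheckJobStatus': {
                      'Type': 'Choice',
                      'Choices': [
                          {'Variable': '$.status', 'StringEquals': 'SUCCEEDED', 'Next': 'Done'},
                          {'Variable': '$.status', 'StringEquals': 'FAILED', 'Next': 'JobFailed'},
                      ],
                      'Default': 'Wait30Seconds',  # still running: poll again
                  },
                  'Done': {'Type': 'Succeed'},
                  'JobFailed': {'Type': 'Fail'},
              },
          }

          print(json.dumps(definition, indent=2))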
  15. Build Visual Workflows Using State Types
      [Diagram: an AWS Step Functions workflow combining Task, Choice, Fail, and Parallel states in an image-classification example with Mountains, People, and Snow branches.]
  16. Architecture Deep Dive
  17. Example Architecture
  18. Example Architecture
  19. Executing Job(s). Specify Docker run parameters as container overrides, specify the job queue, and submit dependencies:

          response = batch_client.submit_job(
              dependsOn=event['dependsOn'],
              containerOverrides=event['containerOverrides'],
              jobDefinition=event['jobDefinition'],
              jobName=event['jobName'],
              jobQueue=event['jobQueue'],
          )
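      The GetJobStatus step that appears later in the deck can be a similarly thin Lambda wrapper around describe_jobs; a sketch under the assumption that the state passed between steps carries the jobId returned by submit_job:

          import boto3

          batch_client = boto3.client('batch')

          def handler(event, context):
              # submit_job's response includes 'jobId'; look the job up and
              # surface its status for a Choice state to branch on.
              job = batch_client.describe_jobs(jobs=[event['jobId']])['jobs'][0]
              # Status moves through SUBMITTED, PENDING, RUNNABLE, STARTING,
              # and RUNNING before ending in SUCCEEDED or FAILED.
              event['status'] = job['status']
              return event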
  20. Considerations for the Batch Layer: Data Sharing. Consideration: jobs are managed at the container level, not the instance level, so there is no guarantee that consecutive containers in a workflow will run on the same instance. Solution: stage all data in Amazon S3, and read and write everything from there. This is also important for traceability, logging, etc.
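      In practice this means each job begins by pulling its inputs from S3 and ends by pushing its outputs back; a minimal sketch with illustrative bucket names and paths:

          import boto3

          s3 = boto3.client('s3')

          # At job start: stage inputs from S3 to local scratch (paths are
          # hypothetical).
          s3.download_file('my-data-bucket', 'inputs/sample1.bam', '/scratch/sample1.bam')

          # ... run the processing step against the local copy ...

          # At job end: write results back to S3 so the next container in the
          # workflow can find them, wherever it is scheduled.
          s3.upload_file('/scratch/sample1.out', 'my-data-bucket', 'results/sample1.out')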
  21. Considerations for the Batch Layer: Multitenancy. Consideration: multiple containers running batch processes may share the same instance and the same base working directory. Solution: within the scratch directory, each batch process creates a subfolder with a unique ID and writes all scratch data to that subdirectory.
  22. Considerations for the Batch Layer: Volume Reuse. Consideration: scratch data should live only as long as the job using it, to optimize instance and Amazon EBS storage costs. Solution: within the scratch directory, each batch process creates a subfolder with a unique ID, writes all scratch data to that subdirectory, and deletes the subdirectory at the end of the job, as sketched below.
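      A minimal sketch covering both the multitenancy and volume-reuse considerations, assuming a shared scratch volume mounted at /scratch:

          import shutil
          import tempfile

          # Each batch process works in its own uniquely named subdirectory, so
          # containers sharing the instance (and its scratch volume) never collide.
          scratch = tempfile.mkdtemp(prefix='job-', dir='/scratch')
          try:
              pass  # ... stage inputs, run the tool, upload results to S3 ...
          finally:
              # Reclaim the EBS-backed scratch space as soon as the job finishes.
              shutil.rmtree(scratch, ignore_errors=True)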
  23. Example Architecture
  24. Deployment with AWS Step Functions
  25. A Flexible Workflow Deployment Model:
      • Decouple the batch engine from workflow orchestration
      • Workflow creation is now done as JSON (see the sketch below)
      • Easier to deploy
      • Easier to automate
      • Easier to test
      • Can integrate non-Batch applications as well
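      Because the workflow is plain JSON, deployment can be a single scripted API call; a sketch with boto3, where the definition file name and IAM role ARN are placeholders:

          import boto3

          sfn = boto3.client('stepfunctions')

          # The workflow definition lives in source control as JSON, so
          # deploying a new or updated workflow is one API call.
          with open('workflow.json') as f:
              definition = f.read()

          sfn.create_state_machine(
              name='HighThroughputWorkflow',
              definition=definition,
              roleArn='arn:aws:iam::ACCOUNT:role/StepFunctionsExecutionRole',
          )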
  26. Change one line to change the workflow:

          { ...
            "SubmitJob": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1",
              "Next": "GetJobStatus"
            },
          ... }

      becomes:

          { ...
            "SubmitJob": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob2",
              "Next": "GetJobStatus"
            },
          ... }
  27. A Practical Example: Genomics
  28. A Practical Example: Genomics
      [Diagram: pipeline stages Alignment, Variant Calling, Annotation, and QC.]
  29. Thank you!
      • AWS Batch: https://aws.amazon.com/batch/
      • AWS Step Functions: https://aws.amazon.com/step-functions/
      • Reference Architecture: https://github.com/awslabs/aws-batch-genomics
