AWS Community Day Chicago
Uploading Millions of Images using a
Serverless Environment
Brett Sutter, Steve Miers, Kirtesh Garg
Minnesota User Group
June 2018
Lifetouch
● School Photography Company
● 30+ Million Subjects Photographed Each Fall
Problem Statement
● Objectives
○ New project called “Prism”, made up of 2 components: Prelab and PPS.
○ Make images available for the new PPS fulfillment systems.
○ Capture all image metadata in the image data management service (AIR).
○ Minimal changes to existing applications/infrastructure (Fox Systems), as they
would be in peak season.
○ Extra Credit: Can we do this without additional servers?
Job 1 - Name The Project
Amazon Image Repository
Legacy Image File Transfer
AIRLIFT
AirLift Use Case
AWS Lambda
● Serverless Compute Resource.
● Invokes code in response to events.
● Manages all underlying resources.
● Pay for the actual usage.
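As a minimal sketch of "invokes code in response to events", the handler below reads the bucket and key out of an S3 notification event. The function name `handler` and any sample bucket or key values are hypothetical; the deck does not show AirLift's actual code.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return (bucket, key) pairs for every S3 record in the event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as '+').
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects
```

Decoding the key before use matters for file names that contain spaces or special characters, which is plausible for legacy image file names.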
Why Lambda?
● Well suited to our seasonal compute demands.
○ No Auto Scaling to worry about.
● No servers to maintain.
● Tiny functions and faster cold starts.
● Well under the Request/Response size limits.
Lambda Invocation Types
● On Demand (Using AWS SDK)
○ Synchronous
○ Asynchronous
● Event Sources
○ S3 (Asynchronous)
○ Kinesis Streams & DynamoDB Streams (Lambda polls & invokes synchronously)
○ Many other services (Cognito, IoT, Alexa, etc.)
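The two on-demand invocation types differ only in the `InvocationType` parameter of the Lambda `Invoke` API. The sketch below assumes a client object that behaves like `boto3.client("lambda")`; the helper name `invoke` and its `wait_for_result` flag are hypothetical and chosen for illustration.

```python
import json


def invoke(client, function_name, payload, wait_for_result):
    """Invoke a Lambda function synchronously or asynchronously.

    `client` is expected to behave like boto3.client("lambda").
    """
    return client.invoke(
        FunctionName=function_name,
        # "RequestResponse" blocks until the function returns its result;
        # "Event" queues the request and returns immediately.
        InvocationType="RequestResponse" if wait_for_result else "Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without needing AWS credentials.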
Lambda - 1 (Triggers Image Upload)
Lambda - 2 (Metadata Upload)
AWS Resources
● 7 SQS Queues
● 2 Lambda functions
● 2 S3 buckets
● 3 IAM roles
● 3 IAM policies
CloudFormation in CI/CD pipeline
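To illustrate how resources like these can be declared for a CloudFormation pipeline, here is a small sketch that builds a template with one SQS queue and one Lambda function that uses it as a dead-letter queue. This is not the AirLift template: the logical IDs, runtime, handler name, and concurrency value are all assumptions for illustration.

```python
import json


def build_template(function_name, code_bucket, code_key, role_arn):
    """Build a minimal CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Hypothetical queue that receives events Lambda gave up on.
            "DeadLetterQueue": {"Type": "AWS::SQS::Queue"},
            "UploadFunction": {
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "FunctionName": function_name,
                    "Runtime": "python3.12",  # assumed; the deck names no runtime
                    "Handler": "index.handler",  # assumed module and function
                    "Role": role_arn,
                    "Code": {"S3Bucket": code_bucket, "S3Key": code_key},
                    "DeadLetterConfig": {
                        "TargetArn": {"Fn::GetAtt": ["DeadLetterQueue", "Arn"]}
                    },
                    # Assumed value; caps this function's concurrency.
                    "ReservedConcurrentExecutions": 100,
                },
            },
        },
    }


def render(template):
    """Serialize the template to JSON for use in a deployment step."""
    return json.dumps(template, indent=2)
```

The IAM role passed in as `role_arn` would also need permission to send messages to the queue for the dead-letter configuration to work.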
Learnings
● Declare all initialization code outside of the handler.
● Reduce cold start latency by allocating more memory.
● A Lambda function invoked asynchronously is retried twice on failure. Configuring
a DLQ is helpful.
● If hosting Lambda inside a VPC, make sure your subnet has sufficient ENI capacity.
● For asynchronous invocations, AWS Lambda automatically retries throttled events.
● Function-level concurrent execution limit (AWS re:Invent 2017).
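The first learning can be sketched as follows: anything created at module level runs once per execution environment, and later invocations in that environment reuse it. `create_client` is a hypothetical stand-in for expensive setup such as building an SDK client or opening a connection.

```python
import time

INIT_COUNT = 0


def create_client():
    """Stand-in for expensive setup such as building an SDK client."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"created_at": time.time()}


# Runs once when the module is loaded (cold start), not once per invocation.
CLIENT = create_client()


def handler(event, context):
    # Warm invocations reuse CLIENT instead of rebuilding it.
    return {"client_created_at": CLIENT["created_at"], "init_count": INIT_COUNT}
```

Calling `handler` repeatedly in the same process returns the same `client_created_at` each time, which is the behavior a warm Lambda environment gives you.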
Results
Fall 2017 Season - School Photography
◆ Subjects uploaded
● August: 6,712,193
● September: 13,031,102
● October: 13,973,025
● November: 5,641,892
● Season Total: 39,358,212 - Approx 39M
◆ Objects Moved
● August: 54,033,153
● September: 104,900,371
● October: 112,482,851
● November: 45,417,230
● Season Total: 316,833,605 - Approx 316M
Approx 316M images were uploaded using the AirLift platform. Data: 110 TB
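The season totals can be checked directly from the monthly figures. The sketch below also derives two numbers that are not on the slide: objects per subject and average object size. The size estimate assumes decimal terabytes (1 TB = 10**12 bytes), which the deck does not specify.

```python
subjects = {
    "August": 6_712_193,
    "September": 13_031_102,
    "October": 13_973_025,
    "November": 5_641_892,
}
objects = {
    "August": 54_033_153,
    "September": 104_900_371,
    "October": 112_482_851,
    "November": 45_417_230,
}

subject_total = sum(subjects.values())
object_total = sum(objects.values())

# Derived figures; not stated on the slide.
objects_per_subject = object_total / subject_total
# Assumes 1 TB = 10**12 bytes.
avg_object_bytes = 110 * 10**12 / object_total

print(f"Subjects: {subject_total:,}")
print(f"Objects: {object_total:,}")
print(f"Objects per subject: {objects_per_subject:.2f}")
print(f"Average object size: {avg_object_bytes / 1000:.0f} KB")
```

Both sums match the slide's season totals, and the derived figures suggest roughly 8 objects per subject at about 350 KB each under the stated assumption.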
Thank You!

Server-less solution for moving Millions of Images in Cloud - Brett Sutter, Steve Miers, Kirtesh Garg, Minneapolis
