SRV315: How We Built a Mission-Critical, Serverless File Processing Pipeline for over 100 Million Photos

In this session, principal architect Mike Broadway describes how HomeAway built a high-throughput, scalable pipeline for manipulating, storing, and serving hundreds of image files every second with Lambda, Amazon S3, DynamoDB, and Amazon SNS. He also shares best practices and lessons learned as they scaled their mission-critical On Demand Image Service (ODIS) system into production. Lambda functions form the backbone of ODIS, which handles over 100 million photographs that are uploaded to HomeAway's vacation rental platform. HomeAway is a vacation rental marketplace with more than 2 million rentals in 190 countries and is part of Expedia.

  1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent. How We Built a Mission-Critical, Serverless File Processing Pipeline for over 100 Million Photos. Mike Broadway, Principal Architect, HomeAway. November 27, 2017
  2. © 2017 HOMEAWAY. ALL RIGHTS RESERVED. About Me: Mike Broadway. A recovering start-up junkie. Designing and writing software for ~40 years • Mainframe device drivers, word processors, terminal emulators, financial planners, job schedulers, etc. A Principal Architect at HomeAway, Expedia • Focus on content, media, identity, communications, etc. • Anything that is not UX or payment-related
  3. What I Will Talk About • Why we needed a new photo storage system • How we met that need with AWS Serverless • Amazon Simple Storage Service (S3), AWS Lambda, Amazon DynamoDB, Amazon Kinesis, etc. • The best practices that we learned
  4. Short Shelf Life Warning: AWS is iterating rapidly … this talk will be out of date soon
  5. HomeAway: A Vacation Rental Marketplace. 2 million places to stay in 190 countries
  6. HomeAway Has a Lot of Photos!
  7. How Many Photos? • ~6 million original image uploads/month • ~5 terabytes/month • ~80 million unique originals • ~1.5 billion “treatment” derivatives
  8. How Did HomeAway Manage Images Before? • A farm of Java application servers in Austin • Implemented nine years ago • Design goals defined for acquired brands • Network Attached Storage/NFS
  9. Previous Image System: Expensive & Dated. Pricey top-tier network storage … $5,000/terabyte; $20,000 to $25,000/month. Hard-coded treatment sizes; 4x3 aspect ratio; 1024 largest width. [Figure labels: then, now]
  10. Highly Variable Upload Volume. [Chart: image uploads/hour over one week.] Peak: nine image uploads per second at 12:03 p.m. Saturday. Trough: 25 seconds between image uploads at 12:40 a.m. Monday. That is a 225:1 load ratio (9 per second versus 1 per 25 seconds). Even the smaller peaks are a doubling and tripling of traffic
  11. We Upgraded to an AWS Serverless Solution • AWS Lambda for extreme scalability • Amazon S3 to address the storage costs • DynamoDB for metadata and treatment definitions • Multi-region for high availability
  12. A Very Short Introduction to AWS Lambda
  13. AWS Lambda Overview • Single function code blocks • No threads to manage • No server patches to apply • Fine-grained scaling • Only runs when triggered • Only pay for what runs
  14. Trivial JavaScript Example … logs the content of an SNS message

         'use strict';
         exports.lambda_handler = function(event, context, callback) {
           var message = event.Records[0].Sns.Message;
           console.log('Message received from SNS:', message);
           callback(null, "Success");
         };
  15. An Image Manipulation Lambda

         @Override
         public Response handleRequest(SNSEvent event, Context context) {
             return handleSNSEvent(event, context, Disposition.class, disposition -> {
                 // Skip dispositions that have been deprecated.
                 if (disposition.getStatusFlag().equals(Disposition.Status.DEPRECATED)) {
                     ... logging ...
                     return;
                 }
                 // Reject photos that have been blacklisted.
                 if (isBlacklistedPhoto(disposition)) {
                     throw new ErrorResponseException(... description ...);
                 }
                 ... logging ...
                 // Locate the master image in S3 and render its treatment sets.
                 BucketDescriptor sourceBucketDescriptor = new BucketDescriptor(
                         getBusinessDescriptor(), getEnvironmentDescriptor(), BucketDescriptor.Type.MASTER);
                 ImageFileDescriptor sourceImageFileDescriptor =
                         new ImageFileDescriptor(sourceBucketDescriptor, disposition.getS3Key());
                 getTreatmentProcessor().renderTreatmentSets(sourceImageFileDescriptor, disposition);
             });
         }

  16. ODIS: On Demand Image Service
  17. The ODIS Image Pipeline. [Diagram: 6000 x 4000 original (S3) → Induction λ → 2880 x 1920 master (S3) → Treatment caching λ → client-ready “treatments” (S3)]
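
The slide's only sizing example is a 6000 x 4000 original becoming a 2880 x 1920 master, which keeps the 3:2 aspect ratio. A minimal sketch of that arithmetic follows, assuming the master is capped at a maximum width and never upscaled. The function name `deriveMasterSize` and the cap-by-width rule are assumptions for illustration; the talk does not state the actual rule ODIS applies.

```javascript
'use strict';

// Hypothetical helper: shrink (width, height) so the width does not exceed
// maxWidth, preserving the aspect ratio. Smaller images are left unchanged.
function deriveMasterSize(width, height, maxWidth) {
  if (width <= maxWidth) {
    return { width: width, height: height };
  }
  // Multiply before dividing so exact ratios stay exact.
  return { width: maxWidth, height: Math.round(height * maxWidth / width) };
}

console.log(deriveMasterSize(6000, 4000, 2880)); // { width: 2880, height: 1920 }
console.log(deriveMasterSize(1600, 1200, 2880)); // unchanged: already narrower than the cap
```
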
  18. Three S3 Buckets, Three Stages
  19. Adding Original Images
  20. Deriving the “Master”
  21. Recording the Initial “Disposition”
  22. Reacting to a New Image Disposition
  23. Pre-Caching Selected Image Treatments. NOTE: named “treatment definitions” are stored in a second DynamoDB table (not shown)
  24. Serving Images to the CDN
  25. Responding to Cache Misses
  26. Altering an Image Disposition
  27. Keeping the Client Platform Informed
  28. Analyzing Image Content and Quality
  29. Benefits of AWS Serverless for ODIS • Ultra fine-grained scalability • Event-driven • Rapid development • Naturally flexible/extensible design • Significant cost savings • ~$5,000/month as opposed to > $25,000/month • Simplified operations • No patching, no server monitoring, etc.
  30. Best Practices Learned
  31. Lambdas Are Also Good for Scheduled Tasks • Can schedule Lambdas to run at intervals or at specific times • At specified rate: rate(1 hour) • At specified time: cron(0 17 ? * MON-FRI *) • No need to have a server just to schedule a job • Startup cost is irrelevant for such one-offs • Less suitable for long-running jobs
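
The schedule expression itself (for example `rate(1 hour)`, or `cron(0 17 ? * MON-FRI *)`, which is evaluated in UTC) lives on the rule that triggers the function, not in the function code. What the function sees is a small event. The sketch below is one way a handler might consume it; `runScheduledJob` and the sample event values are hypothetical, and only the `source`, `detail-type`, and `time` fields of the scheduled event are used.

```javascript
'use strict';

// Hypothetical job body; a real job would do the scheduled work here.
function runScheduledJob(firedAt) {
  return 'job ran for schedule tick ' + firedAt;
}

function scheduledHandler(event, context, callback) {
  // Scheduled rules deliver events whose source is "aws.events".
  if (event.source !== 'aws.events') {
    callback(new Error('unexpected event source: ' + event.source));
    return;
  }
  callback(null, runScheduledJob(event.time));
}
exports.handler = scheduledHandler;

// Local invocation with a hand-written scheduled event (values are made up).
var sampleTick = {
  source: 'aws.events',
  'detail-type': 'Scheduled Event',
  time: '2017-11-27T17:00:00Z'
};
var tickResult;
scheduledHandler(sampleTick, {}, function (err, result) {
  tickResult = result;
});
console.log(tickResult);
```
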
  32. Amazon API Gateway: API Gateway interfaces are always public
  33. More Can Be Less • Initially, we minimized Lambda memory • Lower memory = lower cost per 100ms • But CPU is allocated proportionally • Twice the memory = twice the CPU • We saved money by configuring more memory! • More memory = more CPU = shorter duration • Optimize by experimentation
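
The trade-off on this slide can be made concrete with a little arithmetic. The sketch below compares two configurations by billed GB-seconds (configured memory times duration, with duration rounded up to the 100 ms increment the slide mentions). The two durations are invented numbers, not HomeAway measurements, and the helper name `billedGbSeconds` is hypothetical; the point is only that a larger setting wins whenever it cuts duration by more than the memory increase.

```javascript
'use strict';

// Billed GB-seconds for one invocation: memory in GB times duration in
// seconds, with the duration rounded up to the next 100 ms.
function billedGbSeconds(memoryMb, durationMs) {
  var billedMs = Math.ceil(durationMs / 100) * 100;
  return (memoryMb / 1024) * (billedMs / 1000);
}

// Hypothetical measurements: doubling memory more than halves the duration.
var small = billedGbSeconds(512, 3000);
var large = billedGbSeconds(1024, 1400);

console.log('512 MB for 3000 ms:', small, 'GB-seconds');
console.log('1024 MB for 1400 ms:', large, 'GB-seconds');
console.log(large < small ? 'larger setting is cheaper' : 'smaller setting is cheaper');
```

Multiplying the GB-seconds by the current per-GB-second price gives the cost; the price is left out here because it changes over time. As the slide says, the right setting has to be found by experiment.
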
  34. Lambda Spin-Up & Reuse. Duration of the main image-processing ODIS Lambdas • Average: ~2.5 seconds • First invocation: 10 to 12 seconds (GraphicsMagick, etc., setup). Spiky traffic patterns may not allow start-up costs to be amortized
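
The gap between the first invocation and the average comes from one-time setup that later invocations in the same container can reuse. A minimal sketch of that pattern in Node.js follows: expensive state is created once at module scope and cached. The `imageToolkit` object is a hypothetical stand-in for whatever setup is costly (the slide names GraphicsMagick), and `initCount` exists only to make the reuse visible.

```javascript
'use strict';

var initCount = 0;
var imageToolkit = null; // module scope: survives across warm invocations

// Create the expensive resource on first use only.
function getImageToolkit() {
  if (imageToolkit === null) {
    initCount += 1;
    imageToolkit = { name: 'hypothetical-toolkit' }; // stands in for slow setup
  }
  return imageToolkit;
}

function handler(event, context, callback) {
  var toolkit = getImageToolkit();
  callback(null, { toolkit: toolkit.name, initCount: initCount });
}
exports.handler = handler;

// Two invocations in the same process behave like a cold then a warm call.
var results = [];
handler({}, {}, function (err, r) { results.push(r); });
handler({}, {}, function (err, r) { results.push(r); });
console.log(results);
```

A new container starts with fresh module state, so a traffic spike that forces many new containers pays the setup cost in each of them.
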
  35. Only Include What You Need! • Lambda jar/zip files must be less than 50 MB • Java dependencies can add up! • Use the Maven Shade or Gradle Shadow plugin
  36. One Project to Rule Them All • Keep it all together • A single GitHub project • A single jar file/single deployment package • Greatly simplifies management • Simpler testing and version control
  37. Deletion & Amazon S3 Versioning. [Chart: master bucket object count, ramping up during migration from the Austin data center.] 82 million master images. 193 million original images??? Duh! Deletes are soft when S3 versioning is enabled
  38. Lambda Throttling • Account-wide limit on concurrent Lambdas • Limit is per region, can be different across regions • Now defaults to 1,000 (can request more) • HomeAway currently at 1,500
  39. Lambda Throttling Pain. A surge here … starvation there. Worse: it could be another product and another team in the same account
  40. DynamoDB Stream Handling. Stream processing can be a bottleneck • Minimize per-event processing times • Fan out to other, asynchronous, Lambdas
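
One way to read the fan-out advice: the stream handler does almost nothing per record except hand it to another function asynchronously. The sketch below assumes that shape. `makeStreamHandler` and the injected `invokeAsync` are hypothetical names; the injection keeps the example runnable without AWS credentials. In a deployment, `invokeAsync` might wrap the AWS SDK's Lambda `invoke` call with `InvocationType: 'Event'`, which is also an assumption about how ODIS does it.

```javascript
'use strict';

// Build a DynamoDB stream handler that forwards each record to another
// function. invokeAsync(payload) must return a Promise.
function makeStreamHandler(invokeAsync) {
  return function handler(event, context, callback) {
    // Keep per-record work tiny: extract what the worker needs and hand off.
    var pending = event.Records.map(function (record) {
      return invokeAsync({
        eventName: record.eventName, // INSERT, MODIFY, or REMOVE
        keys: record.dynamodb.Keys
      });
    });
    Promise.all(pending).then(
      function () { callback(null, pending.length); },
      function (err) { callback(err); }
    );
  };
}

// Local stub standing in for an asynchronous Lambda invocation.
var forwarded = [];
var streamHandler = makeStreamHandler(function (payload) {
  forwarded.push(payload);
  return Promise.resolve();
});
exports.handler = streamHandler;
```

Reporting an error through the callback lets the stream retry the batch, so the downstream workers should tolerate seeing the same record twice.
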
  41. Testing and Diagnostics • Unit and functional test thoroughly!! • Highest possible line coverage • Utilize AWS SAM Local (a local CLI tool) • Log generously • Separate log from each instance :-( • HomeAway uses Splunk to aggregate • Consider • AWS X-Ray • IOpipe (only for JavaScript?)
  42. Timeouts Will Happen: Plan for Them. Happened during migration/high upload rates. Appeared to never start or never finish. Automatic retries meant no lost data. Never did find the cause
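
The talk relies on automatic retries to recover from timeouts. A complementary way to plan for them, sketched below, is to watch the remaining execution time and stop cleanly before the deadline. This is not something the slides describe. `context.getRemainingTimeInMillis()` is part of the Lambda context object; `processWithinBudget`, `processItem`, and the safety margin are hypothetical names and values for illustration.

```javascript
'use strict';

// Process items until the time left in this invocation drops to the safety
// margin, then return whatever is unprocessed so the caller can requeue it.
function processWithinBudget(items, context, processItem, safetyMarginMs) {
  var done = [];
  var i = 0;
  while (i < items.length && context.getRemainingTimeInMillis() > safetyMarginMs) {
    done.push(processItem(items[i]));
    i += 1;
  }
  return { done: done, remaining: items.slice(i) };
}

// Fake context for local runs: reports 5000, 3000, then 1000 ms remaining.
var fakeRemainingMs = 7000;
var fakeContext = {
  getRemainingTimeInMillis: function () {
    fakeRemainingMs -= 2000;
    return fakeRemainingMs;
  }
};

var budgetResult = processWithinBudget(
  ['a.jpg', 'b.jpg', 'c.jpg'],
  fakeContext,
  function (key) { return key.toUpperCase(); },
  1500
);
console.log(budgetResult);
```

Because retries can deliver the same event more than once, whatever `processItem` does should be safe to repeat.
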
  43. Tip: Use dead letter queues!
  44. Stuttering DynamoDB Event Stream. Erratic stream performance (probably due to exceeding partition capacity during high-traffic migration)
  45. DynamoDB Stream Workaround. Temporarily removed the dependency on DynamoDB streams. DynamoDB capacity is getting easier to manage (automated). ODIS is now using the DynamoDB stream Lambda again
  46. Humans Will Surprise You! Ending on a humorous note … What happens when you require 1920 pixel wide photos from your users?
  47. Summing Up. Lambdas are great for: • Event-driven systems • Highly elastic loads • Scheduled activities • Rapid development • Extensible designs. Does not fit every need … use thoughtfully. Use dead letter queues! Use DynamoDB auto-scaling. Unit test thoroughly. Consider X-Ray
  48. Thank you!
