
Cost Effective Rendering in the Cloud with Spot Instances



Usman Shakeel of Amazon Web Services explains how to use AWS Spot Instances to implement low-cost video rendering applications and workflows.

This presentation was delivered during the AWS Toronto Media and Entertainment Symposium

Published in: Software


  1. 1. Cost Effective Rendering at Scale with EC2 Spot Usman Shakeel | Principal Solutions Architect M&E Amazon Web Services
  2. 2. Agenda: Cost Effective Rendering at Scale with EC2 Spot • VFX/Animation Rendering: computationally intensive batch process, non-deterministic compute usage patterns, customer sizes/types, hybrid/all-in cloud workflows, architectures • AWS’s spare capacity at scale: Spot features that make it super easy; terminations (what is it worth?) • Real world examples: under 2 pennies per core hour; what is the definition of “large” in scale; is it really cost effective?
  3. 3. VFX/Animation Rendering - workflow components: Compositing, Modeling, Rendering, Asset Management, Collaboration and Task Management
  4. 4. Challenges in the VFX/Animation Industry • Increasingly Shrinking Budgets • Cap-ex / Op-ex conundrum and flexible hardware needs • Increasingly Global Workflows • Increased Demand for Computation – High Resolutions (4K, 8K and beyond) – 3D Stitching – VR, AR Stitching • Project based Infrastructure requirements – Budget, Quality, Render Time • A broad and complex Software toolset per project • Security of Crown Jewels
  5. 5. The challenge of making a film
  6. 6. The challenge of making a film On-premise capacity
  7. 7. The challenge of making a film On-premise capacity Rendering in the Cloud
  8. 8. The challenge of making a film On-premise capacity Rendering in the Cloud Cloud provides you the capability to scale fast and get the outputs faster Initial project on-boarding artwork
  9. 9. Rendering Workflow Components Storage Render Farm Pipeline and License Manager Graphics Artist Workstations
  10. 10. Rendering Workflow Components (move to the cloud) Storage Render Farm Pipeline and License Manager Graphics Artist Workstations • Content has gravity • Network Bandwidth • Hybrid/All-in Cloud • IO Performance • Ability to burst at a very short notice • Cost? • Performance • Security • License mobility/Elasticity • Dependency Management (hybrid scenario) • Interactivity • High Performant Storage • Hardware Support
  11. 11. Rendering in the Cloud - Hydrating the Cloud Renderfarm • S3 as the content repository for your content/data • On AWS Marketplace/SaaS (Aspera, Signiant, File Catalyst, Expedat) • S3 Multi-part Upload • AWS import/export Snowball • S3 Transfer Acceleration NEW ! • Direct to Shared File Systems • EFS throughput scales linearly to the storage • Lustre can hydrate from an S3 bucket • Avere can be fronted to S3 or an on-premise NAS • AWS Snowball NEW ! • AWS Direct Connect EFS S3 Multipart AWS Snowball
  12. 12. Rendering in the Cloud - Shared FS Everywhere (some ideas) Shared Storage On-prem Storage AWS Direct Connect Storage Cache Amazon S3 Lustre on EC2 Avere on EC2 EFS AWS Direct Connect Hydrate workers EC2 Spot Shared Storage FXT on-prem
  13. 13. Rendering in the Cloud - Shared FS (Content/Data Share) Everywhere Elastic File System (Amazon EFS) • Designed to support Peta-Byte scale file systems • Throughput scales linearly to storage • Same latency spec across each AZ • Thousands of concurrent NFS connections • Works great for Large I/O sizes • Pay for only what you use not what you provision • Managed with multi-copy durability Amazon EFS
  14. 14. • BYOL • SaaS • AWS Marketplace • Elastic Licensing models Thinkbox Deadline 8 Usage Based Licensing • Render nodes pull metered licenses from Cloud-based license server • Usage is tracked per minute • Bulk minutes will be available via Thinkbox’s online store • Hosts 3rd party licensing (Nuke, VRay, etc) Rendering in the Cloud - Licensing at Cloud Scale
  16. 16. Rendering in the Cloud - Move the Graphic Artist to the Cloud … Rendering is going Global • NVIDIA GPU based EC2 instances • Nice DCV • Teradici PCoIP • Windows and Linux (VNC+VirtualGL) 3D Modeler Modeling Dumb Client Remote Application running on a G2 instance G2
  17. 17. Rendering in the Cloud - Managing your “disposable” infrastructure Launch a CloudFormation stack with all the infrastructure resources for a specific project Autoscale the stack as appropriate AMI CloudFormation Template CloudFormation Terminate Template
  18. 18. Rendering in the Cloud – Securing the Crown Jewels • AWS alignment with the latest MPAA cloud based application guidelines for content security – August 2015 • VPC private endpoint for S3 – enables a true private workflow capability • Encryption & key management capabilities • Glacier Vault for high-value media/originals
  19. 19. Rendering in the Cloud - A Sample Architecture (All in Cloud Pipeline) Shared Storage Renderfarm On-Prem Storage Pipeline and License Manager 3D Modeler Remote App Visualization AWS Direct Connect Modeling Dumb Client Storage Cache Amazon S3 Avere on EC2 Scalable Renderfarm on EC2 Appstream or Teradici running on a G2 instance Pipeline Manager running on EC2 G2 EC2 SPOT EFS Hydrate workers EC2 Spot
  20. 20. Render Farm Rendering in the Cloud - A Sample Architecture (A Hybrid Pipeline) Shared Storage Renderfarm On-Prem Storage AWS Direct Connect Storage Cache Amazon S3 Avere on EC2 Scalable Renderfarm on EC2 EFS Hydrate workers EC2 Spot On-premise Renderfarm Cloud renderfarm as an extension of on-prem renderfarm FXT on-prem Pipeline and License Manager (also manage cloud renderfarm)
  21. 21. Let’s dig deeper into EC2 Spot
  22. 22. AWS EC2 Consumption Models • On-Demand: pay for compute capacity by the hour with no long-term commitments; for spiky workloads, or to define needs • Reserved: make a low, one-time payment and receive a significant discount on the hourly charge; for committed utilization • Spot: bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand; for time-insensitive or transient workloads
  23. 23. Spare capacity at scale • AWS has more than a million active customers in 190 countries. • Amazon EC2 instance usage has increased 93% YoY, comparing Q4 2014 and Q4 2013, not including Amazon use.
  24. 24. With Spot the rules are simple • Markets where the price of compute changes based on supply and demand • You’ll never pay more than your bid • When the market exceeds your bid you get 2 minutes to wrap up your work
  25. 25. Capacity pools [Diagram labels: Total Capacity; SYD; AZ1, AZ2; T2, C4, M4, I2, R3, D2; Shared, Dedicated]
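  26. 26. Show me the markets! Each instance family, each instance size, each Availability Zone, in every region, is a separate Spot Market • C3 Spot price per Availability Zone (1b / 1c / 1a) vs. On-Demand • 8XL: $0.27 / $0.29 / $0.50 (On-Demand $1.76) • 4XL: $0.30 / $0.16 / $0.21 (On-Demand $0.88) • 2XL: $0.07 / $0.08 / $0.08 (On-Demand $0.44) • XL: $0.05 / $0.04 / $0.04 (On-Demand $0.22) • L: $0.01 / $0.04 / $0.01 (On-Demand $0.11)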
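Because every family, size and Availability Zone is its own market, comparing pools on a per-core basis is one way to pick where to bid. The following is a minimal sketch using the C3 prices shown on this slide. It assumes the three price columns correspond to Availability Zones 1b, 1c and 1a in that order, and it uses the standard C3 vCPU counts; the names `SPOT_PRICES`, `VCPUS` and `cheapest_market` are illustrative only.

```python
# C3 Spot prices (USD per instance-hour) as read from the slide.
# Assumption: the three columns are Availability Zones 1b, 1c, 1a in that order.
SPOT_PRICES = {
    "8XL": {"1b": 0.27, "1c": 0.29, "1a": 0.50},
    "4XL": {"1b": 0.30, "1c": 0.16, "1a": 0.21},
    "2XL": {"1b": 0.07, "1c": 0.08, "1a": 0.08},
    "XL": {"1b": 0.05, "1c": 0.04, "1a": 0.04},
    "L": {"1b": 0.01, "1c": 0.04, "1a": 0.01},
}

# On-Demand prices from the same slide, for comparison.
ON_DEMAND = {"8XL": 1.76, "4XL": 0.88, "2XL": 0.44, "XL": 0.22, "L": 0.11}

# vCPU count per C3 size (c3.large = 2 ... c3.8xlarge = 32).
VCPUS = {"8XL": 32, "4XL": 16, "2XL": 8, "XL": 4, "L": 2}


def markets_by_core_price():
    """Return (price_per_core, size, az) tuples, cheapest first."""
    rows = [
        (price / VCPUS[size], size, az)
        for size, by_az in SPOT_PRICES.items()
        for az, price in by_az.items()
    ]
    return sorted(rows)


def cheapest_market():
    """Return the single cheapest (price_per_core, size, az) market."""
    return markets_by_core_price()[0]


def discount(size, az):
    """Fractional saving of a Spot market relative to On-Demand."""
    return 1 - SPOT_PRICES[size][az] / ON_DEMAND[size]


per_core, size, az = cheapest_market()
print(f"cheapest: {size} in {az} at ${per_core:.4f} per core-hour")
```

Sorting all fifteen markets rather than only taking the minimum makes it easy to pick several cheap pools at once, which is what a diversified fleet needs.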
  27. 27. Bid Price vs. Market Price • You pay the market price [Chart labels: 25% Bid, 50% Bid, 75% Bid]
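The bid-versus-market rule can be illustrated with a small simulation. This is a simplified sketch, not AWS's billing logic: it assumes one market price per hour, charges that market price (never the bid) while the price stays at or below the bid, and stops at the first hour in which the market exceeds the bid. The function name and the price series are hypothetical.

```python
from typing import Iterable, Tuple


def run_until_outbid(bid: float, hourly_market_prices: Iterable[float]) -> Tuple[int, float]:
    """Return (hours_run, total_cost) for a bid against an hourly price series.

    Simplifying assumptions: one price per hour, the instance pays the market
    price for each hour it runs, and it is interrupted as soon as the market
    price rises above the bid.
    """
    hours = 0
    cost = 0.0
    for price in hourly_market_prices:
        if price > bid:
            break  # market exceeded the bid: the instance is reclaimed
        hours += 1
        cost += price  # pay the market price, not the bid
    return hours, cost


# Hypothetical price series in USD per hour.
prices = [0.03, 0.04, 0.06, 0.03]
for bid in (0.05, 0.10):
    hours, cost = run_until_outbid(bid, prices)
    print(f"bid ${bid:.2f}: ran {hours} h, paid ${cost:.2f}")
```

A higher bid does not raise the hourly charge in this model; it only lengthens the time the instance survives price spikes.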
  28. 28. What Happened In 2015? • ClusterK acquisition • Spot Bid Advisor • Spot fleet • Spot blocks
  29. 29. Amazon EC2 Spot – in the wild 1) We make this easy using the Spot bid advisor 2) With deliberate pool selection and bidding, you will keep your Spot instance as long as you need to. 3) And with new features like Spot fleet diversified we do the heavy lifting for you...
  30. 30. Spot Bid Advisor – aws-spot-labs
  31. 31. Spot fleet helps you Launch Thousands of Spot Instances with one RequestSpotFleet call. Get Best Price Find the lowest priced horsepower that works for you. or Get Diversified Resources Diversify your fleet. Grow your availability. And Apply Custom Weighting Create your own capacity unit based on your application needs
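Custom weighting can be sketched with a little arithmetic. The example below assumes that a fleet fills its target capacity by dividing the target by an instance type's weight and rounding up, so the fleet never lands below the target; the weights mirror the per-vCPU weights used in the fleet configuration on the next slide, and `instances_needed` is an illustrative helper, not an AWS API.

```python
# Weight per instance type, expressed in "capacity units".
# Here one unit equals one vCPU, as in the deck's fleet configuration.
WEIGHTS = {
    "c3.large": 2,
    "c3.xlarge": 4,
    "c3.2xlarge": 8,
    "c3.4xlarge": 16,
    "c3.8xlarge": 32,
}


def instances_needed(target_capacity: int, weight: int) -> int:
    """Instances required to reach the target, rounding up (integer ceiling)."""
    return -(-target_capacity // weight)


TARGET = 100  # capacity units requested
for instance_type, weight in WEIGHTS.items():
    count = instances_needed(TARGET, weight)
    print(f"{instance_type}: {count} instances = {count * weight} units")
```

With a target of 100 units, the smallest type needs 50 instances while the largest needs 4, which overshoots the target slightly (128 units) because of the rounding.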
  32. 32. It is easy! • aws ec2 request-spot-fleet --spot-fleet-request-config file://config.json
      {
        "IamFleetRole": "arn:aws:iam::781603563322:role/fleet-role",
        "TargetCapacity": 100,
        "SpotPrice": "0.03",
        "ValidFrom": "2015-09-15T00:56:19Z",
        "ValidUntil": "2016-09-14T07:00:00Z",
        "TerminateInstancesWithExpiration": true,
        "LaunchSpecifications": [
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.large", "WeightedCapacity": 2, "SubnetId": "subnet-d0dc51fb" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.large", "WeightedCapacity": 2, "SubnetId": "subnet-64531413" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.large", "WeightedCapacity": 2, "SubnetId": "subnet-0b1b8052" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.xlarge", "WeightedCapacity": 4, "SubnetId": "subnet-d0dc51fb" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.xlarge", "WeightedCapacity": 4, "SubnetId": "subnet-64531413" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.xlarge", "WeightedCapacity": 4, "SubnetId": "subnet-0b1b8052" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.4xlarge", "WeightedCapacity": 16, "SubnetId": "subnet-d0dc51fb" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.4xlarge", "WeightedCapacity": 16, "SubnetId": "subnet-64531413" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.4xlarge", "WeightedCapacity": 16, "SubnetId": "subnet-0b1b8052" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.8xlarge", "WeightedCapacity": 32, "SubnetId": "subnet-d0dc51fb" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.8xlarge", "WeightedCapacity": 32, "SubnetId": "subnet-64531413" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.8xlarge", "WeightedCapacity": 32, "SubnetId": "subnet-0b1b8052" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.2xlarge", "WeightedCapacity": 8, "SubnetId": "subnet-d0dc51fb" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.2xlarge", "WeightedCapacity": 8, "SubnetId": "subnet-64531413" },
          { "ImageId": "ami-0d4cfd66", "InstanceType": "c3.2xlarge", "WeightedCapacity": 8, "SubnetId": "subnet-0b1b8052" }
        ]
      }
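The launch specifications in this request are the cross product of five instance types and three subnets, so the file can be generated rather than written by hand. The following is a minimal sketch that rebuilds an equivalent `config.json` with the standard library only. The field names and values come from the slide; the helper `build_fleet_config` is hypothetical, and `TargetCapacity` is written as an integer on the assumption that the API expects a number there.

```python
import json
from itertools import product

# Values taken from the slide's request.
IMAGE_ID = "ami-0d4cfd66"
FLEET_ROLE = "arn:aws:iam::781603563322:role/fleet-role"
SUBNETS = ["subnet-d0dc51fb", "subnet-64531413", "subnet-0b1b8052"]
WEIGHTS = {
    "c3.large": 2,
    "c3.xlarge": 4,
    "c3.2xlarge": 8,
    "c3.4xlarge": 16,
    "c3.8xlarge": 32,
}


def build_fleet_config(target_capacity: int, spot_price: str) -> dict:
    """Build a Spot fleet request with one launch spec per (type, subnet) pair."""
    launch_specs = [
        {
            "ImageId": IMAGE_ID,
            "InstanceType": instance_type,
            "WeightedCapacity": weight,
            "SubnetId": subnet,
        }
        for (instance_type, weight), subnet in product(WEIGHTS.items(), SUBNETS)
    ]
    return {
        "IamFleetRole": FLEET_ROLE,
        "TargetCapacity": target_capacity,
        "SpotPrice": spot_price,  # the bid is passed as a string
        "ValidFrom": "2015-09-15T00:56:19Z",
        "ValidUntil": "2016-09-14T07:00:00Z",
        "TerminateInstancesWithExpiration": True,
        "LaunchSpecifications": launch_specs,
    }


config = build_fleet_config(target_capacity=100, spot_price="0.03")
config_json = json.dumps(config, indent=2)
print(f"{len(config['LaunchSpecifications'])} launch specifications")
```

Writing `config_json` to `config.json` would give a file that can be passed to the `request-spot-fleet` command shown on the slide.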
  33. 33. An easy to use interface that lets you launch spare EC2 instances in seconds Helps you select and bid on the EC2 instances that meet your applications requirements Simple to use dashboard lets you modify and manage your application’s compute capacity EC2 Spot Console
  34. 34. Spot Fleet – Focus on application needs
  35. 35. EC2 Spot block • Using a single additional parameter • Run continuously for up to 6 hours • Save up to 50% off On-Demand pricing
  36. 36. Capitalizing on two minute warning • When the Spot price exceeds your bid price, the instance will receive a two-minute warning • Check for the 2 minute spot instance termination notification every 5 seconds leveraging a script invoked at instance launch
  37. 37. Sample script – two minutes left! 1) Check for 2 minute warning 2) If YES, run shutdown scripts 3) OTHERWISE, do nothing 4) Then sleep for 5 seconds
      #!/bin/bash
      while true
      do
        if curl -s data/spot/termination-time | grep -q '.*T.*Z'; then
          /env/bin/
        else
          # Spot instance not yet marked for termination.
          sleep 5
        fi
      done
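The same polling loop can be sketched in Python. This is an illustration under stated assumptions rather than the presenter's script: the metadata URL below is the commonly documented EC2 instance metadata address (the URL on the slide is truncated), newer instances may additionally require a session token that is omitted here, and `watch`, `fetch_notice` and `termination_time` are hypothetical helper names.

```python
import re
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Assumption: standard EC2 instance metadata address; the slide's URL is truncated.
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

# A termination notice is a UTC timestamp such as 2015-09-15T00:58:19Z.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def termination_time(body: Optional[str]) -> Optional[str]:
    """Return the termination timestamp in the response body, or None."""
    if not body:
        return None
    match = TIMESTAMP.search(body)
    return match.group(0) if match else None


def fetch_notice(url: str = TERMINATION_URL, timeout: float = 2.0) -> Optional[str]:
    """Fetch the metadata document; any error (including 404) yields None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", "replace")
    except (urllib.error.URLError, OSError):
        return None


def watch(
    fetch: Callable[[], Optional[str]],
    on_terminate: Callable[[str], None],
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> Optional[str]:
    """Poll every `interval` seconds; run `on_terminate` once a notice appears."""
    polls = 0
    while max_polls is None or polls < max_polls:
        when = termination_time(fetch())
        if when is not None:
            on_terminate(when)  # e.g. checkpoint the frame, requeue the task
            return when
        polls += 1
        sleep(interval)
    return None
```

On an instance, calling `watch(fetch_notice, handler)` at launch would poll every 5 seconds and invoke `handler` once, after which the loop exits instead of re-running the shutdown step. Passing `fetch` and `sleep` as parameters keeps the loop testable off EC2.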
  38. 38. Real-Life Examples… Cost, Scale & Performance
  39. 39. A Customer Example – Large Scale, Cheap, High Performant A large scale example for animation rendering on AWS: • Hybrid Environment using Avere • All in Cloud Rendering using EFS • Automated environment leveraging Spot Fleet • Launched 40K cores in 20 min at < $0.02/core/hr for the particular rendering workload Findings: • EFS performance for rendering • Hybrid Rendering Scenarios AmazonWebServices/ cmp404-cloud-rendering-at-walt-disney-animation-studios
  40. 40. Customer Example - Spot Fleet Deployment Core Count ./aws_spot_fleet_request -p reinvent --cpu 8 --ram 64 -m 4.7 -c 1500
  41. 41. EFS Performance in a real rendering scenario - Average Open Latency
  42. 42. EFS Performance in a real rendering scenario - Average Read Latency [Chart: time (µs), 0 to 700, vs. number of render processes, 100 to 4000; series: Mid-Tier A, Mid-Tier B, Mid-Tier C, Archive, EFS]
  43. 43. Customer Example: Rendering in the Cloud vs. On-Premise [Chart: render time (s), 0 to 30,000, by frame number, 1 to 90; series: EC2/EFS, On Prem; lower is better]
  44. 44. Another customer example - Large Scale, Cheap, High Performant: The $9 Billion Experiment • 50,000 physical cores to meet the demand of 1,500 scientific researchers • Over 5 days, less than 1% of instances were terminated, leaving them with a significant margin of safety • Instead of building a 50,000-core data center, they were able to successfully use AWS Spot for 5 days and pay just $45,000
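The figures on this slide can be turned into a per-core-hour rate for comparison with the "under 2 pennies per core hour" claim made earlier in the deck. This is a rough back-of-the-envelope sketch: it assumes all 50,000 cores ran for the full 5 days, which the slide does not state, so the real rate may differ.

```python
def cost_per_core_hour(total_cost: float, cores: int, days: float) -> float:
    """Average cost per core-hour, assuming every core runs for the whole period."""
    core_hours = cores * days * 24
    return total_cost / core_hours


# Figures from the slide: $45,000 for 50,000 cores over 5 days.
rate = cost_per_core_hour(total_cost=45_000, cores=50_000, days=5)
print(f"${rate:.4f} per core-hour")

# Compare with the animation example's ceiling of $0.02 per core-hour.
print("under 2 cents per core-hour:", rate < 0.02)
```

Under that assumption the rate works out to about $0.0075 per core-hour, or 6,000,000 core-hours for $45,000.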
  45. 45. Parting thoughts • VFX/Animation rendering workloads can be streamlined on the cloud: avoid data/content movement; distribute a single job across multiple nodes; manage state often; segregate sub-workflows (within a single pipeline) between in-cloud and on-premises based on dependencies • Rendering in the cloud is possible and can be more performant than a traditional hardware setup: All-in Cloud vs. Hybrid; the technical feature set has come a long way from even a year ago • AWS EC2 has a VERY large capacity @ CHEAP: EC2 Spot (Fleet, Block) and Reserved Instance models
  46. 46. Questions / Comments / Feedback