(CMP404) Cloud Rendering at Walt Disney Animation Studios

"Each year, the technical complexity of making the next great Walt Disney Animation Studios film increases. Animation and Visual FX studios continue to push the bounds of what is possible in computer graphics. This complexity drives rapid technological growth in both computational resources and storage to the point that it exceeds what we can physically provide with our on-premise compute cluster. As a result, we have started to adopt a hybrid approach with the cloud.

This session addresses the hurdles that animation and VFX studios face and focuses on automation of 'disposable' components (specifically infrastructure, licensing, fleet management, data and dependency management in a large-scale batch workload). We apply these general cloud techniques and utilities to an animation/VFX workload and push the limits with a very large scale cloud renderfarm deployment.

The team from Walt Disney Animation Studios walks through how they use cloud technologies to maximize render capacity. Learn how to leverage high-performance storage (like Amazon EFS), Amazon EC2 networking and the latest EC2 Spot features to provide a fully functional renderfarm at production-quality scale."

Published in: Technology

  1. CMP404: Cloud Rendering at Walt Disney Animation Studios. Usman Shakeel, Amazon Web Services; Kevin Constantine, Walt Disney Animation Studios. October 2015. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Who is using AWS for rendering? Visual Effects and Animation; Marketing; Theme Parks; Gaming; Manufacturing; Life Sciences; Engineering and Architecture
  3. Visual Effects and Animation: Let’s make a film in the cloud…
  4. VFX/Animation Rendering - workflow components: Compositing, Modeling, Rendering, Asset management, Collaboration and task management
  5. The challenge of making a film
  6. The challenge of making a film: On-premises capacity
  7. The challenge of making a film: On-premises capacity; Rendering in the cloud
  8. The challenge of making a film: On-premises capacity; Rendering in the cloud. The cloud lets you scale fast and get outputs faster. [Chart label: Initial project on-boarding artwork]
  9. A tale of two customers
                                   | A boutique studio                | Walt Disney Animation Studios
      On-Premises Hardware         | No or very little investment     | A significant investment
      Licenses                     | Limited                          | Unlimited
      Project Structure            | Project based from other studios | Internal customers/projects
      Budget Constraints           | Time and resources               | Time and resources
      Compute Needs                | Large scale                      | Very large scale
      Infrastructure Efficiencies  | No or very little                | On-premises infrastructure optimized for rendering workload
      Cloud Model                  | All-in mostly                    | Hybrid mostly
      Security                     | Mandated by customers            | Required due to high-value assets
  10. They both ask us the same thing…
      •  The ability to spin up thousands of cores on-demand
         …without any upfront investment
         …and leveraging the most up-to-date configurations
      •  A project-based “disposable” infrastructure
         …with flexible, utility-style, by-the-hour licensing
  11. They both tell us the same thing…
      •  ≤ $0.01 per core-hour
      •  Access to thousands of cores whenever needed
      •  No upfront investments in infrastructure
      •  Easier collaboration
      •  Ecosystem of software providers
      •  Access to large memory configs to do 6K/10K renders
      •  Project-based “disposable” infrastructure
  12. …when the rubber meets the road!
      •  Shared FS everywhere
      •  Latency
      •  Large datasets
      •  Lots of instances
      •  {Data/Content}
  13. Rendering in the Cloud
  14. Rendering in the Cloud - State of the Union: Scale at a very cheap price (EC2 Spot)
  15. Leveraging Spot successfully today requires some effort
      •  Build stateless, distributed, scalable applications
      •  Choose which instance types fit your workload the best
      •  Ingest price feed data for AZs and regions
      •  Make run time decisions on which Spot pools to launch in based on price and volatility
      •  Manage interruptions
      •  Monitor and manage market prices across AZs and instance types
      •  Manage the capacity footprint in the fleet
      •  And all of this while you don’t know where the capacity is
      •  Serve your customers
  16. Spot Fleet: instead of writing all that code to manage Spot instances, simply specify:
      •  Target Capacity – The number of EC2 instances that you want in your fleet.
      •  Maximum Bid Price – The maximum bid price that you are willing to pay.
      •  Launch Specifications – Number and types of instances, AMI ID, VPC, subnets or AZs, etc.
      •  IAM Fleet Role – The name of an IAM role. It must allow Amazon EC2 to terminate instances on your behalf.
  17. Spot Fleet Example – Instance Weighting
      Say your workload needs at least 60 GB of memory and you want capacity to complete 20 units of work. Choices:
      •  r3.2xlarge (61.0 GB, 8 vCPUs) = 1 unit of 20
      •  r3.4xlarge (122.0 GB, 16 vCPUs) = 2 units of 20
      •  r3.8xlarge (244.0 GB, 32 vCPUs) = 4 units of 20
      An option to bid for all of these instance types:
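The request that followed the slide's closing colon did not survive in this transcript. A minimal sketch of what such a weighted request might look like is given here, assuming the field names of the EC2 Spot Fleet request configuration (`TargetCapacity`, `SpotPrice`, `IamFleetRole`, `LaunchSpecifications`, `WeightedCapacity`). The AMI ID, role ARN, and price are placeholders, and `build_spot_fleet_config` is a hypothetical helper, not the presenters' code.

```python
import json

# Instance types and unit weights from the slide: one unit of work needs
# at least 60 GB of memory.
INSTANCE_WEIGHTS = {
    "r3.2xlarge": 1,  # 61 GB, 8 vCPUs
    "r3.4xlarge": 2,  # 122 GB, 16 vCPUs
    "r3.8xlarge": 4,  # 244 GB, 32 vCPUs
}


def build_spot_fleet_config(target_units, max_price_per_unit, ami_id, fleet_role_arn):
    """Build a Spot Fleet request config that bids on every weighted type.

    All argument values used below are placeholders for illustration.
    """
    launch_specs = [
        {"ImageId": ami_id, "InstanceType": itype, "WeightedCapacity": weight}
        for itype, weight in INSTANCE_WEIGHTS.items()
    ]
    return {
        "TargetCapacity": target_units,        # counted in units, not instances
        "SpotPrice": str(max_price_per_unit),  # maximum price per unit-hour
        "IamFleetRole": fleet_role_arn,
        "LaunchSpecifications": launch_specs,
    }


config = build_spot_fleet_config(
    target_units=20,
    max_price_per_unit=0.10,                   # placeholder price
    ami_id="ami-EXAMPLE",                      # placeholder AMI ID
    fleet_role_arn="arn:aws:iam::123456789012:role/example-fleet-role",
)
print(json.dumps(config, indent=2))
```

A dictionary of this shape could be passed as `SpotFleetRequestConfig` to the Boto3 EC2 client's `request_spot_fleet` call. With weights in place, the fleet can reach 20 units with, for example, five r3.8xlarge instances or twenty r3.2xlarge instances.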
  18. Rendering in the Cloud - State of the Union: Scale at a very cheap price
      AWS cloud scale is “large”
      •  10s/100s/1000s/10000s of cores on-demand in the cloud
      •  A “large” (Disney Animation Studios) renderfarm: 55,000 cores
      •  In this demo: ~40,000 vCPUs on the EC2 Spot Market
  19. Rendering in the Cloud - State of the Union: Licensing at Cloud Scale
      •  BYOL
      •  SaaS
      •  AWS Marketplace
      •  Elastic licensing models
      Thinkbox Deadline Usage Based Licensing
      •  Render nodes pull metered licenses from a cloud-based license server
      •  Usage is tracked per minute
      •  Bulk minutes will be available via Thinkbox’s online store
      •  Store will eventually host 3rd party licensing (Nuke, V-Ray, etc.)
      Autodesk Maya
  20. Rendering in the Cloud - State of the Union: Hydrating the Cloud Renderfarm
      Amazon S3 as the source of truth for your content/data
      •  On AWS Marketplace/SaaS (Aspera, Signiant, FileCatalyst, Expedat)
      •  Amazon S3 Multipart Upload
      Direct to shared file systems
      •  Amazon EFS throughput scales linearly with storage
      •  Lustre can hydrate from an S3 bucket
      •  Avere can front Amazon S3 or an on-premises NAS + AWS Direct Connect
      [Diagram labels: EFS, S3 Multipart]
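For the S3 Multipart Upload route, part sizing has to respect S3's documented limits: at most 10,000 parts per upload, parts between 5 MiB and 5 GiB (the last part may be smaller). The sketch below shows one way to choose a part size for a large asset. `plan_multipart` and the 64 MiB default are illustrative assumptions, not something from the talk.

```python
import math

MIB = 1024 ** 2
MIN_PART = 5 * MIB            # S3 minimum part size (last part may be smaller)
MAX_PART = 5 * 1024 * MIB     # S3 maximum part size (5 GiB)
MAX_PARTS = 10_000            # S3 maximum number of parts per upload


def plan_multipart(file_size, preferred_part=64 * MIB):
    """Return (part_size, part_count) in bytes for a file of file_size bytes.

    preferred_part is an assumed default; the part size grows only when
    the file would otherwise need more than MAX_PARTS parts.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    part_size = max(preferred_part, MIN_PART, math.ceil(file_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("file is too large for a single multipart upload")
    return part_size, math.ceil(file_size / part_size)


# A 1 GiB file uses the preferred 64 MiB parts; a 1 TiB file needs larger ones.
print(plan_multipart(1024 * MIB))
print(plan_multipart(1024 * 1024 * MIB))
```

Parts can be uploaded in parallel, which is the usual reason to prefer multipart upload when hydrating a bucket with large scene or texture files.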
  21. Rendering in the Cloud - State of the Union: Shared FileSystem Everywhere (some ideas)
      [Diagram labels: Shared Storage, On-prem Storage, FXT on-prem, AWS Direct Connect, Storage Cache, Amazon S3, Lustre on EC2, Avere on EC2, EFS, Hydrate workers, EC2 Spot]
  22. Rendering in the Cloud - State of the Union: NFS/CIFS (Content/Data Share) Everywhere (some ideas)
      Elastic File System (EFS)
      •  Designed to support petabyte-scale file systems
      •  Throughput scales linearly with storage
      •  Same latency spec across each AZ
      •  Thousands of concurrent NFS connections
      •  Works great for large I/O sizes
      •  Pay only for what you use, not what you provision
      •  Managed, with multi-copy durability
  23. Rendering in the Cloud - State of the Union: Move the Graphic Artist to the Cloud…
      •  NVIDIA GPU based EC2 instances
      •  Teradici PCoIP
      •  Frame, Otoy
      •  Windows and Linux (VNC+VirtualGL)
  24. Rendering in the Cloud - State of the Union: Managing your “disposable” infrastructure
      •  Launch a CloudFormation stack with all the infrastructure resources for a specific project
      •  Automatically scale the stack as appropriate
      [Diagram labels: AMI, CloudFormation Template, CloudFormation, Terminate Template]
  25. Rendering in the Cloud - State of the Union: The Crown Jewels
      •  AWS alignment with the latest MPAA cloud-based application guidelines for content security – August 2015
      •  VPC private endpoint for Amazon S3 – enables a true private workflow capability
      •  Encryption & key management capabilities
      •  Amazon Glacier Vault for high-value media/originals
  26. Rendering in the Cloud - A Sample Architecture (All in Cloud Pipeline)
      [Diagram labels: Shared Storage; Renderfarm; On-Prem Storage; Pipeline and License Manager; 3D Modeler; Remote App Visualization; AWS Direct Connect; Modeling; Dumb Client; Storage Cache; Amazon S3; Avere on EC2; Scalable Renderfarm on EC2; Appstream or Teradici running on a G2 instance; Pipeline Manager running on EC2; G2; EC2 Spot; EFS; Hydrate workers]
  27. Rendering in the Cloud - A Sample Architecture (A Hybrid Pipeline)
      Cloud renderfarm as an extension of the on-prem renderfarm
      [Diagram labels: Render Farm; Shared Storage; Renderfarm; On-Prem Storage; AWS Direct Connect; Storage Cache; Amazon S3; Avere on EC2; Scalable Renderfarm on EC2; EFS; Hydrate workers; EC2 Spot; On-premises Renderfarm; FXT on-prem; Pipeline and License Manager (also manages the cloud renderfarm)]
  28. Let’s make a real film in the cloud…
  29. Disney Animation Renderfarm
      [Diagram: Artists; a WDAS Data Center and two Remote Data Centers, each with a Renderfarm and an Avere FXT cluster; Storage; site labels San Francisco, Los Angeles, Burbank; Redundant 10Gb links]
  30. Disney Animation’s Environment
      •  90% Red Hat Enterprise Linux 6, 8% Mac OS X
      •  1 Gb/s Ethernet to clients, 10 Gb/s to most servers
      •  Clients are bursty, not generally bandwidth constrained
      •  Major applications:
         •  Hyperion (GI renderer)
         •  Maya
         •  Houdini
         •  Nuke
         •  Coda (scheduler)
  31. Disney Animation’s Environment
      •  NFS v3 everywhere
         •  5-7 petabytes
         •  500 TB working set
         •  100 TB/week of data churn
         •  Global namespace
         •  Lots of metadata operations
         •  Serve everything out of RAM/SSD
      •  Renderfarm footprint
         •  55,000 core renderfarm
         •  1.1 million render hours per day
         •  200,000-400,000 tasks per day
      •  Typical render
         •  8-16 threads, 64 GB
         •  3-5 hours per task
  32. Disney Animation Renderfarm
      [Diagram: the same three-site layout (Artists; WDAS Data Center and two Remote Data Centers, each with a Renderfarm and an Avere FXT cluster; Storage; San Francisco, Los Angeles, Burbank; Redundant 10Gb links), extended with a virtual private cloud in Oregon containing Avere vFXT, Spot Instances, and EFS, connected by a 10Gb primary link with a 1Gb backup]
  33. Mostly Automated Deployment
      •  Pre-built EBS-backed AMI
         •  Heavily customized RHEL
      •  Python/Boto3
         •  Pass in how many resources and the minimum instance size
         •  Calculates resource weights
         •  Needs to calculate pricing
      •  User-Data
         •  RAIDs ephemeral disks, if available, for scratch space
         •  Integrates with the on-premises environment (DNS, asset inventory, Puppet)
         •  Creates EC2 tags
         •  Runs Puppet to pick up changes since AMI build time
         •  Joins the render queue and asks for work
      •  Scale-up/down is still a manual process
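The "calculates resource weights" step is not shown in the deck. One plausible reading, sketched below, is that a "resource" is a fixed CPU and RAM bundle and an instance type's weight is the number of bundles it can hold. The catalogue, the function name `resource_weights`, and the flooring rule are assumptions for illustration; the r3 sizes are the ones quoted on the instance-weighting slide.

```python
# Hypothetical instance catalogue; vCPU and memory figures for the r3
# family as quoted on the instance-weighting slide.
INSTANCE_TYPES = {
    "r3.2xlarge": {"vcpus": 8, "ram_gb": 61},
    "r3.4xlarge": {"vcpus": 16, "ram_gb": 122},
    "r3.8xlarge": {"vcpus": 32, "ram_gb": 244},
}


def resource_weights(cpu_per_resource, ram_gb_per_resource, catalogue=INSTANCE_TYPES):
    """Map each instance type to the number of whole resources it can hold.

    A type's weight is limited by whichever of CPU or RAM runs out first;
    types that cannot hold a single resource are left out.
    """
    weights = {}
    for name, spec in catalogue.items():
        fit = min(
            spec["vcpus"] // cpu_per_resource,
            spec["ram_gb"] // ram_gb_per_resource,
        )
        if fit > 0:
            weights[name] = fit
    return weights


print(resource_weights(8, 60))  # matches the 1/2/4 weights on the weighting slide
print(resource_weights(8, 64))  # a strict 64 GB bundle drops r3.2xlarge
```

Under this reading, a strict 64 GB requirement excludes r3.2xlarge (61 GB) and lowers the weights of the larger types, so the choice of bundle size directly changes which Spot pools the fleet can draw from. How the presenters' tool treats such near misses is not stated in the deck.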
  34. Spot Fleet Deployment: Core Count
      ./aws_spot_fleet_request -p reinvent --cpu 8 --ram 64 -m 4.7 -c 1500
  35. Spot Fleet Deployment
  36. Spot Fleet Pricing
      •  Target Price 1: $0.47/resource for 40,000 cores
      •  Target Price 2: $0.16/resource for 16,000 cores
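When a Spot Fleet uses instance weighting, the maximum price is expressed per unit-hour, so the ceiling for a given instance type is the unit price times that type's weight. The sketch below applies that arithmetic to the two target prices on this slide. It assumes the prices are per resource-hour and that one resource is 8 cores (the `--cpu 8` value in the deployment command); neither assumption is confirmed by the deck, and the function names are hypothetical.

```python
def per_instance_ceiling(price_per_resource, weight):
    """Maximum hourly price for one instance holding `weight` resources."""
    return round(price_per_resource * weight, 6)


def per_core_price(price_per_resource, cores_per_resource=8):
    """Implied hourly price per core, assuming 8 cores per resource."""
    return round(price_per_resource / cores_per_resource, 6)


for target in (0.47, 0.16):  # target prices from the slide, in USD
    print(
        f"${target:.2f}/resource -> "
        f"up to ${per_instance_ceiling(target, 4):.2f} per hour for a weight-4 instance, "
        f"${per_core_price(target):.5f} per core-hour"
    )
```

Working in per-resource prices keeps a single number comparable across instance sizes, which fits the resource-oriented bidding the talk recommends.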
  37. Cloud Rendering Benchmarks
  38. Benchmarks: On Premises vs. the Cloud
      [Bar chart, higher is better. Benchmarks: stream triad, disk read, disk write. Series: On Prem, r3.4xlarge, r3.8xlarge, m4.4xlarge, m4.10xlarge, cr1.8xlarge. Vertical axis: 0 to 120.]
  39. EFS Hydration
      [Chart labels: Single Node; 50 Clients, multi-threaded file copy]
  40. Average Open Latency
  41. Average Read Latency
      [Chart. Vertical axis: Time (µs), 0 to 700. Horizontal axis: Render Processes (100, 500, 800, 1200, 2400, 4000). Series: Mid-Tier A, Mid-Tier B, Mid-Tier C, Archive, EFS.]
  42. Rendering in the Cloud vs. On-Premises
      [Chart, lower is better. Vertical axis: Render Time (s), 0 to 30,000. Horizontal axis: Frame # (1 to 90). Series: EC2/EFS, On Prem.]
  43. Lessons Learned
      •  Use as many different instance types as you can, especially older generations
      •  Think about ways to modify your workload
      •  Use every Availability Zone
      •  Check your limits, especially your Amazon EBS limit and VPC setup (address space)
      •  Resource-oriented bidding
      •  Diversified allocation
      •  Benchmark your workload and set pricing accordingly
      •  Set ONLY realistic pricing that you are willing to pay
      •  Don’t be afraid to ask AWS for help or to pre-plan your run with them
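The address-space lesson lends itself to a quick pre-flight check. AWS reserves five IP addresses in every subnet (the first four and the last), so a fleet spread across one subnet per Availability Zone needs enough remaining addresses for every instance. The following is a minimal sketch using Python's standard `ipaddress` module; the CIDR blocks and fleet sizes are made-up examples.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def usable_addresses(cidrs):
    """Total addresses available to instances across the given subnets."""
    return sum(
        max(ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET, 0)
        for cidr in cidrs
    )


def fleet_fits(instance_count, cidrs):
    """True if the subnets have room for instance_count instances."""
    return usable_addresses(cidrs) >= instance_count


# Example: one /22 subnet per Availability Zone (hypothetical layout).
subnets = ["10.0.0.0/22", "10.0.4.0/22", "10.0.8.0/22"]
print(usable_addresses(subnets))   # 3 * (1024 - 5)
print(fleet_fits(1500, subnets))
print(fleet_fits(5000, subnets))
```

The check ignores addresses already in use and assumes the fleet could be spread across all listed subnets, so it gives an upper bound rather than a guarantee.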
  44. Conclusion
      •  Cloud rendering on AWS - State of the Union: is getting stronger…
      •  Rendering forecast: partly cloudy with a chance of all in the cloud…
      •  Future research
         •  Storage hydration: distribute across many clients to saturate the EFS throughput
         •  Storage for processing: read freely and lump the writes (for shared FS performance)
         •  Latency is a killer: atomic workflows within a single AZ/region; caching appliances
  45. Relevant talks
  46. Remember to complete your evaluations!
  47. Thank you!
