
ILM - Pipeline in the cloud

An overview of our experiments at Industrial Light & Magic to create a fully cloud-based pipeline, built on Mesos and Docker and automated with Ansible.



  1. Who are we? Jim Vanns & Aaron Carey, Production Engineers at ILM London
  2. VFX Pipeline in the Cloud: Experiments with Mesos and Docker
  3. Nomenclature, glossary and other big words ★ VFX: Visual Effects ★ Pipeline: Data -> Process -> Data, repeat! ★ Show: a film ★ Sequence: a thematically linked series of (continuous) scenes ★ Shot: an uninterrupted portion of a sequence
  4. What is a VFX pipeline?
  5. What is a VFX pipeline? Film Scan → Roto → 3D → FX → Comp → Lighting
  6. What is a VFX pipeline?
  7. What VFX isn’t.
  8. What VFX isn’t ★ Rendering and sims are our ‘Big Data’ ★ We’re not crunching analytics in real time ★ Rendering != MapReduce ★ Apps run on hardware, not in a browser ★ We’re not here to re-write a renderer (not yet...) Where does the cloud meet VFX?
  9. What’s in it for us? ★ Reducing capital expenditure ★ Potentially reducing overheads ★ Flexibility ★ Giving power back to developers
  10. VFX Studio Infrastructure ★ Render Farm ★ Database ★ Storage ★ Workstations
  11. Render Farm
  12. First, what is rendering!? ★ Take a virtual 3D representation of a scene ○ 3D models ○ Textures ○ Light sources ○ Static backgrounds (plates) ★ Place a virtual camera in the scene ★ Compute the 2D image that the camera will see
  13. Rendering in the cloud ★ Low-hanging fruit ★ Already happening ★ Typical farm: 30-50k procs ★ Managed by specialist software (Tractor/Deadline/in-house etc.) ★ VFX has been doing clustered computing for decades. What’s next?
  14. Mesos ★ Open source framework for scheduling ★ Already used at massive scale ★ NOT a job scheduler ★ We can concentrate on the scheduling logic ★ Support for task isolation/containment (eg Docker)
  15. Automating our Mesos cluster with Docker and Ansible ★ Goals: quick, easy, repeatable ★ Didn’t want to spend time fighting our config manager (or each other) ★ Be able to deploy a virtual studio from scratch in under an hour (including provisioning, building software, deploying and configuration) ★ Run multiple versions of the infrastructure at the same time (in the same availability zone/network) ★ If something is typed in the terminal, we want to automate and version it. Docker + Ansible was the answer.
  16. Automating our Mesos cluster with Ansible ★ Heavily using tags and variables in Ansible ★ Cloud agnostic: some modification of GCE inventory and launch modules ★ Example: creating a multi-host dynamic Zookeeper configuration

      - name: Append the zookeeper server entries
        lineinfile: >
          dest=/etc/zookeeper/conf/zoo.cfg
          insertafter=EOF
          line="server.{{ hostvars[item]['zkid'] }}={{ hostvars[item]['ansible_eth0']['ipv4']['address'] }}:2888:3888"
        with_items: "{{ groups['tag_zookeeper_server_' + consul_domain] }}"
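For a hypothetical group of three Zookeeper hosts (the zkid values and addresses below are made up for illustration), the task above would append entries like:

```ini
# Illustrative result in /etc/zookeeper/conf/zoo.cfg (example IPs)
server.1=10.0.0.11:2888:3888
server.2=10.0.0.12:2888:3888
server.3=10.0.0.13:2888:3888
```

Because the zkid and address come from each host's inventory variables, the same play produces a correct ensemble configuration however many Zookeeper nodes are tagged.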
  17. Service Discovery in Mesos ★ No control over where a service or render runs ★ Services may move hosts ★ Can’t guarantee hosts will keep the same IP ★ Options: ○ Mesos-DNS ○ Homegrown (etcd etc.) ○ Consul
  18. Mesos and Consul ★ What is Consul? ★ Every host runs an agent ★ All DNS lookups on a host go to its agent ★ Consul servers sit outside the Mesos cluster ★ Mesos-Consul automates service registration ★ Can be used for services outside the cluster
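With every host resolving DNS through its local agent, any registered service can be found by name. A hypothetical lookup (Consul answers DNS queries on port 8600 by default; the returned address is made up):

```shell
$ dig @127.0.0.1 -p 8600 docker-registry.service.consul +short
10.0.0.21
```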
  19. Example - Static service outside the cluster

      $ ssh -i mykey.pem username@172.100.121.100
      $ docker run -d -p 5000:5000 --restart=always \
          -e REGISTRY_STORAGE_S3_ACCESSKEY \
          -e REGISTRY_STORAGE_S3_SECRETKEY \
          -e REGISTRY_STORAGE_S3_REGION \
          -e REGISTRY_STORAGE=s3 \
          registry:2.1
      $ curl -H "Content-Type: application/json" -X PUT -d '{
          "Name": "docker-registry",
          "Tags": ["docker-registry", "v2"],
          "Port": 5000
        }' http://127.0.0.1:8500/v1/agent/service/register
  20. Example - Static service outside the cluster

      - name: Run docker registry container
        docker:
          name: docker-registry
          image: registry:2.1
          state: started
          ports:
            - "5000:5000"
          restart_policy: always
          env:
            REGISTRY_STORAGE_S3_ACCESSKEY:
            REGISTRY_STORAGE_S3_SECRETKEY:
            REGISTRY_STORAGE_S3_REGION:
            REGISTRY_STORAGE_S3_BUCKET:
            REGISTRY_STORAGE: s3

      - name: Register registry with consul
        uri:
          url: http://127.0.0.1:8500/v1/agent/service/register
          method: PUT
          body: '{ "Name": "docker-registry", "Tags": [ "docker-registry", "v2" ], "Port": 5000 }'
          body_format: json
  21. Example - Launching a service on marathon

      - name: Submit maya container to marathon
        hosts: "tag_build_docker_{{ consul_domain }}"
        gather_facts: False
        tasks:
          - name: Submit maya job to marathon
            uri:
              url: http://marathon:8080/v2/apps
              method: POST
              status_code: 201,409
              body_format: json
              body: '{
                "id": "maya",
                "instances": 1,
                "cpus": 4,
                "mem": 8024,
                "constraints": [ ["gfx", "CLUSTER", "gpu"] ],
                "args": [],
                "container": {
                  "type": "DOCKER",
                  "docker": {
                    "network": "BRIDGE",
                    "portMappings": [
                      { "containerPort": 5901, "hostPort": 0, "protocol": "tcp" }
                    ],
                    "image": "docker-registry:5000/studio-local-base/maya",
                    "forcePullImage": true,
                    "parameters": [
                      { "key": "env",    "value": "DISPLAY" },
                      { "key": "device", "value": "/dev/dri/card0" },
                      { "key": "device", "value": "/dev/nvidia0" },
                      { "key": "device", "value": "/dev/nvidiactl" }
                    ]
                  },
                  "volumes": [
                    { "containerPath": "/tmp/.X11-unix/X0", "hostPath": "/tmp/.X11-unix/X0", "mode": "RW" }
                  ]
                }
              }'
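Because the POST is idempotent for our purposes (status_code accepts both 201 Created and 409 Conflict), the play can be re-run safely; the resulting app can then be inspected through Marathon's REST API. A hypothetical check:

```shell
$ curl http://marathon:8080/v2/apps/maya
```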
  22. Studio Services
  23. Studio Service Structure
  24. Studio Service Deployment
  25. Database
  26. Data modelling: studio relationships ★ Sites (eg. London, San Francisco, Singapore etc.) ★ Departments ★ Shows (films) ★ Sequences ★ Shots ★ Tasks ★ Assets
  27. Challenges ★ New technologies ○ Graph database ○ Query language/APIs ○ Distributed storage engine ★ Complexity (both in the data modelling and in the system) ★ Adoption/Approval
  28. Storage
  29. Cloud Storage Pros and Cons ★ Managed ★ No more tape archives/backups But... ★ Getting data into the cloud is expensive ★ Getting data into the cloud is slooow Is there another way?
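To put "slow" in perspective, a back-of-the-envelope sketch (the 100 TB figure and the sustained 1 Gb/s uplink are illustrative assumptions, not measurements from our pipeline):

```shell
# Rough upload time for 100 TB at a sustained 1 Gb/s (~125 MB/s)
bytes=$((100 * 1000000000000))   # 100 TB in bytes
rate=$((125 * 1000000))          # 1 Gb/s expressed as bytes per second
secs=$((bytes / rate))
days=$((secs / 86400))
echo "~${days} days"             # roughly 9 days of continuous transfer
```

At that rate a show's worth of plates and renders takes on the order of days to move, which is why caching and creating content in the cloud (next slides) matter.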
  30. Work in Progress... ★ Applications need a POSIX filesystem interface ★ Can we cache cloud storage? ○ EFS ○ Avere ○ Homegrown Can we create content entirely in the cloud?
  31. Workstations
  32. Can we create content entirely in the Cloud? ★ Applications require OpenGL ★ OpenGL requires hardware ★ Hardware needs drivers Can we do this in Docker?
  33. Dockerising OpenGL Applications ★ NVIDIA drivers must match the host version exactly ★ The driver inside the container must not install its kernel module ★ The container requires access to the GPU device and the X Server
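A minimal Dockerfile sketch along those lines (the driver version and base image are hypothetical placeholders; the NVIDIA .run installer's -s and --no-kernel-module flags run it silently and skip building the kernel module, which the container inherits from the host):

```dockerfile
FROM centos:7
# The userspace driver version MUST match the host's kernel module exactly
# (346.46 is a placeholder for whatever the host runs)
ADD NVIDIA-Linux-x86_64-346.46.run /tmp/
RUN sh /tmp/NVIDIA-Linux-x86_64-346.46.run -s --no-kernel-module \
    && rm -f /tmp/NVIDIA-Linux-x86_64-346.46.run
```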
  34. Running an OpenGL Docker application

      docker run -it \
        -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
        --device=/dev/dri/card0 \
        --device=/dev/nvidia0 \
        --device=/dev/nvidiactl \
        -e DISPLAY
  35. Scheduling a VFX app on Mesos in the cloud ★ Must use custom Mesos resources/attributes to only schedule on GPU machines ★ Cloud machines have no monitor ★ Remote desktop apps will forward GL calls to the client machine
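The ["gfx", "CLUSTER", "gpu"] constraint used in the Marathon example only matches if the GPU agents advertise a corresponding attribute at startup; a sketch of the agent launch flag (the Zookeeper URL is illustrative):

```shell
mesos-slave --master=zk://zookeeper:2181/mesos --attributes="gfx:gpu"
```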
  36. Using VirtualGL ★ Intercepts GLX calls on the host ★ Rendering calls are forwarded to a 2nd (local) X server backed by the GPU ★ The rendered output is read back from the GPU and forwarded to the 2D (VNC) X server
  37. Using VirtualGL
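In practice VirtualGL wraps the application launch; a typical invocation (assuming the GPU-backed 3D X server runs on display :0, per the xorg.conf on slide 38, and using glxgears as a stand-in for a real app):

```shell
$ vglrun -d :0 glxgears
```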
  38. 3D X server setup

      /etc/X11/xorg.conf:

      Section "Device"
          Identifier  "Device0"
          Driver      "nvidia"
          VendorName  "NVIDIA Corporation"
          BoardName   "GRID K520"
          BusID       "PCI:0:3:0"
      EndSection

      Section "Screen"
          Identifier   "Screen0"
          Device       "Device0"
          Monitor      "Monitor0"
          DefaultDepth 24
          Option       "UseDisplayDevice" "None"
          SubSection "Display"
              Depth 24
          EndSubSection
      EndSection

      $ lspci
      00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
      00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
      00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
      00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
      00:02.0 VGA compatible controller: Cirrus Logic GD 5446
      00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)
      00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
  39. Demo
  40. We’re Hiring
