Automated Media Workflows in the Cloud (MED304) | AWS re:Invent 2013

Ingesting, storing, processing and delivering a large library of content involves massive complexity. This session walks through sample code that leverages AWS Services to perform all these tasks while coordinating the activities with Amazon Simple Workflow Service (SWF). Along the journey you are introduced to best practices for cost optimization, monitoring, reporting, and exception or error handling. In addition to the sample workflow, a guest speaker from Netflix takes the audience on a deep dive into their “digital supply chain” where you learn how they have automated their processes in moving data all the way from the studios to the last mile. Services covered include Amazon SWF, Amazon Simple Storage Service (S3), Amazon Glacier, Amazon Elastic Compute Cloud (EC2), Amazon Elastic Transcoder, Amazon Mechanical Turk, and Amazon CloudFront.


Presentation Transcript

    • MED304 - Automated Media Workflows in the Cloud John Mancuso, Amazon Web Services November 14, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
    • Agenda
      – Why automate
      – Workflow steps
      – Automating the workflow
      – Demo of an end-to-end media workflow
      – How Netflix approaches their digital supply chain
    • Why Automate? [Diagram: format proliferation over time – Analog, VCD, DVD, 720p, 1080p (3D), 2K, 4K – with file sizes, formats, and users all growing]
    • Scenario
      – At any given time, company X produces 10 broadcast-quality shows
      – Each show consists of 200 30-minute episodes per year
      – High-res post-production copies of each show are temporarily stored at company X’s studio in Tokyo
      – The content must be made available for distribution to consumers via web, mobile devices, and media players
      – The high-res content must be archived for future access
    • Media Workflow: Ingest → Processing → Discovery & Delivery
    • Media Workflow: Ingest → Processing → Discovery & Delivery, coordinated by Amazon Simple Workflow Service (SWF) over Amazon storage services (Amazon S3 Standard & RRS, Amazon Glacier)
    • Ingest: Amazon S3 – US East (image courtesy of porbital, FreeDigitalPhotos.net)
    • Ingest – Data Transfer: Amazon S3 parallel multipart uploads, AWS Command Line Interface (CLI), Amazon S3 Server Side
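    A note on the multipart-upload path: the AWS CLI parallelizes multipart uploads on its own, but the same idea can be sketched directly against S3 with boto (the library the rest of this deck uses). The function below is our illustration, not code from the session; it uploads parts sequentially for clarity, where a real uploader would spread parts across threads.

      import math
      import os
      from boto.s3.connection import S3Connection

      def multipart_upload(bucket_name, key_name, local_path, part_mb=50):
          # Split the file into fixed-size parts, upload each one,
          # then ask S3 to stitch them together.
          conn = S3Connection()
          bucket = conn.get_bucket(bucket_name)
          mp = bucket.initiate_multipart_upload(key_name)
          part_size = part_mb * 1024 * 1024
          total = os.path.getsize(local_path)
          part_count = int(math.ceil(total / float(part_size)))
          with open(local_path, 'rb') as fp:
              for i in range(part_count):
                  fp.seek(i * part_size)
                  mp.upload_part_from_file(
                      fp, part_num=i + 1,
                      size=min(part_size, total - i * part_size))
          mp.complete_upload()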
    • Ingest – Data Transfer: Tsunami UDP to Amazon EC2, then to Amazon S3 (image courtesy of porbital, FreeDigitalPhotos.net)
    • Ingest – Timing Comparison (885 MB video file; instance size: CC2.8xlarge; OS: Amazon Linux)
      – Single thread to S3: 13 minutes 25 seconds
      – Multiple threads to S3: 1 minute (93% reduction)
      – Tsunami UDP + multiple threads: 15 seconds + 7 seconds = 22 seconds (63% further reduction)
    • Ingest – Code Snippet

      def doWork_INGEST(remoteIP, remoteFileName, s3Key_HighRes):
          # Transfer using Tsunami UDP
          cmd_s = '/usr/local/bin/tsunami connect {} set rate 500m get {} quit'
          cmd_s = cmd_s.format(remoteIP, remoteFileName)
          execCMD(cmd_s)
          # Upload to S3 using the AWS CLI
          s3Path = 's3://{}/{}'
          s3Path = s3Path.format(s3Bucket_HighRes, s3Key_HighRes)
          cmd_s = 'aws s3 cp {} {} --region us-east-1'
          cmd_s = cmd_s.format(remoteFileName, s3Path)
          execCMD(cmd_s)
          # Delete the local file (the slide referenced an undefined
          # localFilePath; the snippet otherwise uses remoteFileName
          # as the local path, so we do the same here)
          os.remove(remoteFileName)
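    The execCMD helper is referenced throughout the snippets but never shown. A plausible minimal implementation (our assumption, not the presenter's code) is simply a checked shell call:

      import shlex
      import subprocess

      def execCMD(cmd_s):
          # Hypothetical stand-in for the helper used on the slides:
          # run the command and raise CalledProcessError on failure,
          # which lets the activity worker report the task as failed.
          subprocess.check_call(shlex.split(cmd_s))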
    • Media Workflow: Ingest → Processing → Discovery & Delivery, coordinated by Amazon Simple Workflow Service (SWF) over Amazon storage services (Amazon S3 Standard & RRS, Amazon Glacier)
    • Processing
      – Transcoding
      – Thumbnail selection
      – Archiving of high-res videos
    • Processing – Transcoding: Amazon S3 → Amazon Elastic Transcoder → Amazon S3 (RRS)
    • Transcoding – Code Snippet

      def doWork_PROCESS_TRANSCODE(s3Key_HighRes, s3PreFix_TranscodeRoot):
          etc = ElasticTranscoderConnection()
          job_input_name = {"Key": s3Key_HighRes, "FrameRate": "auto",
                            "Resolution": "auto", "AspectRatio": "auto",
                            "Interlaced": "auto", "Container": "auto"}
          job_outputs = [
              {"Key": "MP4.mp4", "ThumbnailPattern": "MP4{count}",
               "Rotate": "auto", "PresetId": ET_PresetId_MP4},
              {"Key": "HLS", "ThumbnailPattern": "HLS{count}",
               "Rotate": "auto", "PresetId": ET_PresetId_HLS}]
          job = etc.create_job(pipeline_id=ET_Pipeline_ID,
                               input_name=job_input_name,
                               outputs=job_outputs,
                               output_key_prefix=s3PreFix_TranscodeRoot)
          jid = job['Job']['Id']
          # Ideally you would leverage the SNS capabilities of Elastic
          # Transcoder to signal SWF on completion instead of polling
          waitForCompletion(etc, jid)
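    waitForCompletion is not shown on the slide. Absent the SNS notification route the comment recommends, a simple polling sketch against the Elastic Transcoder API (our illustration) could be:

      import time

      def waitForCompletion(etc, jid, poll_seconds=10):
          # Poll the job until Elastic Transcoder reports a terminal state.
          while True:
              status = etc.read_job(id=jid)['Job']['Status']
              if status in ('Complete', 'Error', 'Canceled'):
                  return status
              time.sleep(poll_seconds)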
    • Processing – Thumbnail Selection: Amazon S3 (RRS) → Amazon Mechanical Turk → Amazon DynamoDB
    • Thumbnail Selection – Code Snippet

      def getRequest(s3WebPath_Thumbnails):
          request_params = {
              "Title": "Thumbnail Selection",
              "Description": "Please choose a thumbnail",
              "MaxAssignments": "1",
              "HITLayoutId": MTurk_HITLAYOUTID,
              "Reward": {"Amount": "0.10", "CurrencyCode": "USD"},
              "LifetimeInSeconds": "300",
              "AssignmentDurationInSeconds": "300",
              "HITLayoutParameter": [
                  {"Name": "image1", "Value": s3WebPath_Thumbnails + "MP400001.png"},
                  # . . . (image2 through image9 elided on the slide)
                  {"Name": "image10", "Value": s3WebPath_Thumbnails + "MP400010.png"},
              ]
          }
          print request_params
          return request_params  # the slide omits the return the caller below relies on
    • Thumbnail Selection – Code Snippet

      def doWork_PROCESS_THUMBNAIL(s3PreFix_Thumbnails):
          m = mturkcore.MechanicalTurk()
          mtc = MTurkConnection()
          s3WebPath_Thumbnails = 'http://{}.s3-website-us-east-1.amazonaws.com/{}'
          s3WebPath_Thumbnails = s3WebPath_Thumbnails.format(s3Bucket_Thumbs, s3PreFix_Thumbnails)
          request_params = getRequest(s3WebPath_Thumbnails)
          hit = m.create_request("CreateHIT", request_params)
          hid = hit['CreateHITResponse']['HIT']['HITId']
          # Wait for an answer
          answer = getAnswer(mtc, hid)
          # Get the image name from the answer (e.g. 'image7' -> '00007')
          answer = answer[5:]
          answer = answer.zfill(5)
          imagekey = '{}MP4{}.png'
          imagekey = imagekey.format(s3WebPath_Thumbnails, answer)
          return imagekey
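    getAnswer is left out of the deck as well. One way to implement it with boto's MTurkConnection, assuming the HIT layout returns a single answer field such as 'image7' (our sketch), is:

      import time

      def getAnswer(mtc, hid, poll_seconds=15):
          # Poll the HIT until a worker submits an assignment, then
          # return the first answer field of the first assignment.
          while True:
              assignments = mtc.get_assignments(hid)
              if assignments:
                  question_form_answer = assignments[0].answers[0][0]
                  return question_form_answer.fields[0]
              time.sleep(poll_seconds)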
    • Processing – Archiving of High-res Videos: Amazon S3 → Amazon Glacier
    • Archiving – Code Snippet

      def doWork_PROCESS_ARCHIVE(s3Key_HighRes):
          # Move the high-res video to a path in S3 configured to archive
          # to Amazon Glacier with a lifecycle policy
          s3PathA = 's3://{}/{}'
          s3PathA = s3PathA.format(s3Bucket_HighRes, s3Key_HighRes)
          s3PathB = 's3://{}/toArchive/{}'
          s3PathB = s3PathB.format(s3Bucket_HighRes, s3Key_HighRes)
          cmd_s = 'aws s3 mv {} {} --region us-east-1'
          cmd_s = cmd_s.format(s3PathA, s3PathB)
          execCMD(cmd_s)
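    This move only archives anything because a lifecycle policy is assumed to already exist on the toArchive/ prefix. Configuring such a rule with boto might look like the following (the rule id is ours; days=0 transitions objects to Glacier as soon as possible):

      from boto.s3.connection import S3Connection
      from boto.s3.lifecycle import Lifecycle, Transition

      conn = S3Connection()
      bucket = conn.get_bucket(s3Bucket_HighRes)  # bucket used by the snippets above
      lifecycle = Lifecycle()
      lifecycle.add_rule('archive-high-res', 'toArchive/', 'Enabled',
                         transition=Transition(days=0, storage_class='GLACIER'))
      bucket.configure_lifecycle(lifecycle)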
    • Media Workflow: Ingest → Processing → Discovery & Delivery, coordinated by Amazon Simple Workflow Service (SWF) over Amazon storage services (Amazon S3 Standard & RRS, Amazon Glacier)
    • Discovery & Delivery: CMS running on Amazon EC2, Amazon S3 (RRS), Amazon CloudFront
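    For the delivery leg, a CloudFront distribution fronting the RRS bucket could be created with boto roughly as follows; the bucket name and comment are placeholders, not values from the session:

      from boto.cloudfront import CloudFrontConnection
      from boto.cloudfront.origin import S3Origin

      conn = CloudFrontConnection()
      origin = S3Origin('example-transcoded-media.s3.amazonaws.com')  # placeholder bucket
      distribution = conn.create_distribution(origin=origin, enabled=True,
                                              comment='Media workflow delivery')
      print distribution.domain_name  # point players and the CMS at this hostname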
    • Automating the Workflow
    • Media Workflow: Ingest → Processing → Discovery & Delivery, coordinated by Amazon Simple Workflow Service (SWF) over Amazon storage services (Amazon S3 Standard & RRS, Amazon Glacier)
    • Amazon Simple Workflow (SWF)
      – SWF maintains distributed application state, tracks workflow executions, dispatches tasks (activities & deciders), retains history, and provides visibility
      – Activity tasks do the “work” associated with a workflow step
      – Decider tasks determine which activity task should come next
      – Activities & deciders can run anywhere (on premises or in the cloud)
    • Decider Logic (flowchart)
      1. Poll for a decision task; if no task exists, poll again
      2. Build EventList from events of type [‘ActivityTaskCompleted’, ‘WorkflowExecutionStarted’]
      3. If all activities have completed, signal completion of the execution
      4. Otherwise NextActivity = ACTIVITIES[len(EventList)]
      5. If this is the first activity, NextActivity.Input = execution input; otherwise NextActivity.Input = PreviousActivity.Result
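    Translated into boto, the flowchart above comes out roughly as the sketch below. This is our reconstruction, not the session's code: domain, decider_task_list, and ACTIVITIES are assumed configuration, a production decider would page through the full event history, and where the slide indexes ACTIVITIES by len(EventList) we discount the WorkflowExecutionStarted event.

      import json
      from boto.swf.layer1 import Layer1
      from boto.swf.layer1_decisions import Layer1Decisions

      swf_l1 = Layer1()
      while True:
          task = swf_l1.poll_for_decision_task(domain['name'], decider_task_list)
          if 'taskToken' not in task:
              continue  # poll timed out with no decision to make
          events = [e for e in task['events'] if e['eventType'] in
                    ('ActivityTaskCompleted', 'WorkflowExecutionStarted')]
          completed = len(events) - 1  # discount the WorkflowExecutionStarted event
          decisions = Layer1Decisions()
          if completed == len(ACTIVITIES):
              decisions.complete_workflow_execution()
          else:
              if completed == 0:
                  # First activity: pass along the execution input
                  act_input = events[0]['workflowExecutionStartedEventAttributes']['input']
              else:
                  # Otherwise: pass along the previous activity's result
                  act_input = events[-1]['activityTaskCompletedEventAttributes']['result']
              next_act = ACTIVITIES[completed]
              decisions.schedule_activity_task('act-%d' % completed,
                                               next_act['name'],
                                               next_act['version'],
                                               input=act_input)
          swf_l1.respond_decision_task_completed(task['taskToken'],
                                                 decisions=decisions._data)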
    • Activity Worker – Code Snippet

      from mwf_Ingest import *

      swf_l1 = swf.Layer1()
      while True:
          task = swf_l1.poll_for_activity_task(domain['name'], workflow_type['task_list'])
          if 'taskToken' in task:
              task_token = task['taskToken']
              task_input = json.loads(task['input'])
              try:
                  if task['activityType']['name'] == activities[0]['name']:
                      remoteIP = task_input['remoteIP']
                      remoteFileName = task_input['remoteFileName']
                      s3Key_HighRes = get_rand() + remoteFileName[remoteFileName.rindex('.'):]
                      doWork_INGEST(remoteIP, remoteFileName, s3Key_HighRes)
                      dataToPass = {'s3Key_HighRes': s3Key_HighRes}
                      task_status_s = json.dumps(dataToPass)
                      out = swf_l1.respond_activity_task_completed(task_token, task_status_s)
              except:
                  out = swf_l1.respond_activity_task_failed(task_token, '', '')
    • Workflow Steps (a start-execution sketch follows this list)
      – Start workflow execution
      – Ingest (transfer file to Amazon EC2 using Tsunami UDP & upload to Amazon S3)
      – Transcode file (multiple output formats)
      – Select thumbnail
      – Archive high-res file
      – Signal completion of execution
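    Kicking off the pipeline (the first step above) is a single SWF call. A minimal sketch, reusing the domain and workflow_type configuration from the worker snippet, with an invented workflow id and input:

      import json
      from boto.swf.layer1 import Layer1

      swf_l1 = Layer1()
      run = swf_l1.start_workflow_execution(
          domain['name'],
          'episode-0001',                       # caller-chosen workflow id
          workflow_type['name'], workflow_type['version'],
          task_list=workflow_type['task_list'],
          input=json.dumps({'remoteIP': '203.0.113.10',              # example values
                            'remoteFileName': '/media/episode0001.mxf'}))
      print run['runId']  # SWF returns the run id of the new execution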
    • Scalability & Fault Tolerance Analysis (table completed during the session)
      Step                                   | Automation elements | Is Scalable? | Is Fault Tolerant?
      Ingest                                 |                     |              |
      Transcode                              |                     |              |
      Archive to Amazon Glacier              |                     |              |
      Amazon Mechanical Turk for thumbnails  |                     |              |
      Delivery with Amazon CloudFront        |                     |              |
    • Demo – External references: MTurkCore, Boto
    • Netflix’s Transcoding Transformation – Tony Koinov, Director of Engineering, Netflix
    • Netflix Media in AWS
      – Matrix: The Netflix media pipeline
      – MAPLE: New generation media pipeline
      – Concluding thoughts
    • Netflix Media Pipeline [Diagram: FTP → S3 → EC2 media processing → S3 → EC2 → Open Connect]
    • Driving to Hollywood Game
    • Rules of the Game
      – Buy: 200 MPH! Purchase only. Quantities limited. It breaks, you fix it. Pay for parking. Obsolete in 1 year.
      – Rent: 85 MPH. Lease, cancel anytime. Unlimited quantity. It breaks, replace it, no charge. No parking, just walk away. Brand new each year.
    • Industry Heritage: Optimize for Latency
      – Interactive editing
      – Master creation
      – DVD/Blu-ray authoring
      – Edits for television
    • Netflix 2008
      – Custom data center
      – Custom GPU encoders
      – Fixed size
      – New format needed: PC, Mac, Xbox
      – Content library doubled
      – Frequent HW failures
      – Fail! Catalog incomplete
    • Fall 2009 – Launch Netflix PS3 Player
      – First 100% AWS transcode
      – New format, unique to the Netflix PS3 player
      – Encode recipe nailed down late
      – 3 weeks to transcode the entire catalog
    • Netflix 2009 to Present
      – US East AWS
      – Variable-sized EC2 farm
      – S3 for storage
      – Optimized for throughput, not latency
      – No more missed deadlines: devices, catalogs, countries
    • Spring 2010 – Launch Netflix iPad Player
      – Launch April 10th
      – Apple approached us in mid-February
      – Grew EC2 farm to 4,000 instances
      – Entire library transcoded in 2 weeks
      – New format ready for launch
    • Netflix Media Pipeline [Diagram: FTP → S3 → EC2 media processing → S3 → EC2 → Open Connect]
    • For Netflix, Throughput Trumps Latency
      – Think horizontal, not vertical
      – Priuses move more people than Ferraris
      – Frequent re-encodes of growing libraries
      – Netflix is nimble because of AWS
    • More Proof That Horizontal Wins
      – New countries, new content
      – Codec innovation
    • AWS Handles Netflix Scale
      – 6 regional catalogs
      – 4 formats supported today: 1 VC-1, 3 H.264, multiple bit rates per format
      – 10s of 1000s of hours of content
      – Petabytes of S3 storage
    • Netflix Media in AWS
      – Matrix: The Netflix media pipeline
      – MAPLE: New generation media pipeline
      – Concluding thoughts
    • New Generation: Address Faults and Latency
      – More than 1 week for a 4K transcode; 2–3 days for an HD transcode
      – Fault intolerant
      – Maintenance is challenging
      – Often too slow: day-after-broadcast content, redelivery of damaged content
      [Diagram: ~700 Mbps into an EC2 C1 Medium instance, 10–16 Mbps into S3]
    • MAPLE: Massively Parallel Encoding (illustrative sketch below)
      – 5-minute chunks – close to real time
      – Fault tolerant
      – Easy maintenance
      – Addresses low-latency use cases: day-after-broadcast content, redelivery of damaged content
      [Diagram: EC2 → S3]
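    Netflix did not show MAPLE's internals. Purely to illustrate the chunk-parallel idea behind it, here is a toy sketch (all names and ffmpeg settings are ours) that encodes 5-minute windows of a source in parallel, so a failure costs one chunk rather than the whole title:

      import multiprocessing
      import subprocess

      CHUNK_SECONDS = 300  # 5-minute chunks, per the slide

      def encode_chunk(args):
          src, start, idx = args
          out = 'chunk-%05d.mp4' % idx
          # Encode only this window of the source file.
          subprocess.check_call(['ffmpeg', '-ss', str(start),
                                 '-t', str(CHUNK_SECONDS),
                                 '-i', src, '-c:v', 'libx264', out])
          return out

      def encode_title(src, duration_seconds):
          starts = range(0, duration_seconds, CHUNK_SECONDS)
          jobs = [(src, s, i) for i, s in enumerate(starts)]
          pool = multiprocessing.Pool()
          return pool.map(encode_chunk, jobs)  # chunks are stitched afterwards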
    • Netflix Media in AWS
      – Matrix: The Netflix media pipeline
      – MAPLE: New generation media pipeline
      – Concluding thoughts
    • We Would Do It All Over Again
      – Don’t be fooled by IT cost comparisons: we don’t administer the gear
      – 6,000 EC2 instances
      – Petabytes of storage
      – High network traffic: storage is durable, and it is a moving target
      – You cannot put a price on nimble
    • Please give us your feedback on this presentation (MED304). As a thank you, we will select prize winners daily for completed surveys!