Scalable Media Processing in the Cloud (MED302) | AWS re:Invent 2013

The cloud empowers you to process media at scale in ways that were previously not possible, enabling you to make business decisions that are no longer constrained by infrastructure availability. Hear about best practices to architect scalable, highly available, high-performance workflows for digital media processing. In addition, this session covers AWS and partner solutions for transcoding, content encryption (watermarking and DRM), QC, and other processing topics.
Slide 1: Scalable Media Processing
Phil Cluff, British Broadcasting Corporation
David Sayed, Amazon Web Services
November 13, 2013
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Slide 2: Agenda
• Media workflows
• Where AWS fits
• Cloud media processing approaches
• BBC iPlayer in the cloud

Slide 3: Media Workflows
[Diagram: media workflows carrying source materials — archive, featurettes, networks, interviews, 2D movie, 3D movie, archive materials, stills — through to outputs: theatrical, DVD/BD, online, MSOs, mobile apps]

Slide 4: Where AWS Fits Into Media Processing
[Diagram: Amazon Web Services spanning the media pipeline — ingest, index, process, QC, package, protect, auth, track, playback — with analytics and monetization above and media asset management below]

Slide 5: Media Processing Approaches
3 Phases
Slide 6: Cloud Media Processing Approaches
Phase 1: Lift processing from the premises and shift to the cloud

Slide 7: Lift and Shift
[Diagram: on-premises media processing operations (OS + storage) moved unchanged onto EC2 instances with attached storage]

Slide 8: The Problem with Lift and Shift
[Diagram: a monolithic media processing operation on EC2 — ingest, operation, post-processing, export, workflow, and parameters all bundled into one OS image with its storage]

Slide 9: Cloud Media Processing Approaches: Phase 2
Phase 1: Lift processing from the premises and shift to the cloud
Phase 2: Refactor and optimize to leverage cloud resources

Slide 10: Refactor and Optimization Opportunities
"Deconstruct monolithic media processing operations"
– Ingest
– Atomic media processing operation
– Post-processing
– Export
– Workflow
– Parameters

Slide 11: Refactoring and Optimization Example
[Diagram: a source S3 bucket feeding multiple EC2 instances (each with EBS volumes), coordinated through Amazon SWF API calls, writing results to an output S3 bucket]
Slide 12: Cloud Media Processing Approaches
Phase 1: Lift processing from the premises and shift to the cloud
Phase 2: Refactor and optimize to leverage cloud resources
Phase 3: Decomposed, modular cloud-native architecture

Slide 13: Decomposition and Modularization Ideas for Media Processing
• Decouple *everything* that is not part of the atomic media processing operation
• Use managed services where possible for workflow, queues, databases, etc.
• Manage
  – Capacity
  – Redundancy
  – Latency
  – Security
Slide 14: BBC iPlayer in the Cloud, AKA "Video Factory"
Phil Cluff
Principal Software Engineer & Team Lead
BBC Media Services

Slide 15: BBC iPlayer
Sources: BBC iPlayer Performance Pack August 2013, http://www.bbc.co.uk/blogs/internet/posts/Video-Factory
• The UK's biggest video & audio on-demand service
  – And it's free!
• Over 7 million requests every day
  – ~2% of overall consumption of BBC output
• Over 500 unique hours of content every week
  – Available immediately after broadcast, for at least 7 days
• Available on over 1000 devices including
  – PC, iOS, Android, Windows Phone, Smart TVs, Cable Boxes…
• Both streaming and download (iOS, Android, PC)
• 20 million app downloads to date
Slide 16: Video "Where Next?"

Slide 17: What Is Video Factory?
• Complete in-house rebuild of ingest, transcode, and delivery workflows for BBC iPlayer
• Scalable, message-driven cloud-based architecture
• The result of 1 year of development by ~18 engineers

Slide 18: And here they are!

Slide 19: Why Did We Build Video Factory?
• Old system
  – Monolithic
  – Slow
  – Couldn't cope with spikes
  – Mixed ownership with third party
• Video Factory
  – Highly scalable, reliable
  – Completely elastic transcode resource
  – Complete ownership

Slide 20: Why Use the Cloud?
• Background of 6 channels, spikes up to 24 channels, 6 days a week
• A perfect pattern for an elastic architecture
[Chart: off-air transcode requests for 1 week]
Slide 21: Video Factory – Architecture
• Entirely message driven
  – Amazon Simple Queuing Service (SQS)
  – Some Amazon Simple Notification Service (SNS)
  – We use lots of classic message patterns
• ~20 small components
  – Singular responsibility: "Do one thing, and do it well"
  – Share libraries if components do things that are alike
  – Control bloat
• Components have contracts of behavior
  – Easy to test

Slide 22: Video Factory – Workflow
[Diagram: SDI broadcast video feed (x24, with SMPTE timecode) → broadcast encoder → RTP chunker → mezzanine video capture → Amazon S3 (mezzanine) and a time-addressable media store; a playout data feed drives live ingest logic; a transcode abstraction layer routes work to Amazon Elastic Transcoder and Elemental Cloud, producing transcoded video and metadata into Amazon S3 (distribution renditions), feeding DRM, QC, editorial clipping, and MAM]
Slide 23: Detail
• Mezzanine video capture
• Transcode abstraction
• Eventing demonstration

Slide 24: Mezzanine Video Capture

Slide 25: Mezzanine Capture
[Diagram: SDI broadcast video feed (x24, SMPTE timecode; 3 GB HD / 1 GB SD per hour of output) → broadcast-grade encoder → MPEG2 transport stream (H.264) on RTP multicast (30 MB HD / 10 MB SD chunks) → RTP chunker → MPEG2 transport stream (H.264) chunks → chunk uploader → Amazon S3 (mezzanine chunks) → chunk concatenator, driven by control messages → Amazon S3 (mezzanine)]

Slide 26: Concatenating Chunks
• Build file using Amazon S3 multipart requests
  – 10 GB mezzanine file constructed in under 10 seconds
• Amazon S3 multipart APIs are very helpful
  – Component only makes REST API calls
  – Small instances; still gives very high performance
• Be careful
  – Amazon S3 isn't immediately consistent when dealing with multipart-built files
  – Mitigated with rollback logic in message-based applications
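The concatenator on slide 26 stitches uploaded chunks into one mezzanine file purely through S3 multipart copy calls, so the bookkeeping it must get right is ordering the chunk keys and refusing to build a file with a gap in it. A minimal sketch of that step — the key naming scheme and class name are hypothetical, not the BBC's actual code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ChunkAssembler {

    // Extracts the numeric sequence suffix from a key like "channel1/0042.ts".
    static int sequenceOf(String key) {
        String name = key.substring(key.lastIndexOf('/') + 1);
        return Integer.parseInt(name.replace(".ts", ""));
    }

    // Returns chunk keys in broadcast order, or throws if a chunk is missing,
    // so we never concatenate a mezzanine file with a hole in it.
    public static List<String> orderChunks(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparingInt(ChunkAssembler::sequenceOf));
        for (int i = 1; i < sorted.size(); i++) {
            if (sequenceOf(sorted.get(i)) != sequenceOf(sorted.get(i - 1)) + 1) {
                throw new IllegalStateException("Missing chunk after " + sorted.get(i - 1));
            }
        }
        return sorted;
    }
}
```

Each key in the returned list would then become one server-side multipart copy part (part numbers 1..n), with the final complete-multipart call assembling the 10 GB file without the component ever downloading the video data itself.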
Slide 27: By Numbers – Mezzanine Capture
• 24 channels
  – 6 HD, 18 SD
  – 16 TB of mezzanine data every day per capture
• 200,000 chunks every day
  – And Amazon S3 has never lost one
  – That's ~2 (UK) billion RTP packets every day… per capture
• Broadcast grade resiliency
  – Several data centers / 2 copies each

Slide 28: Transcode Abstraction

Slide 29: Transcode Abstraction
• Abstract away from a single supplier
  – Avoid vendor lock-in
  – Choose suppliers based on performance, quality, and broadcaster-friendly feature sets
  – BBC: Elemental Cloud (GPU), Amazon Elastic Transcoder, in-house for subtitles
• Smart routing & smart bundling
  – Save money on non-time-critical transcode
  – Save time & money by bundling together "like" outputs
• Hybrid cloud friendly
  – Route a baseline of transcode to local encoders, and spike to cloud
• Who has the next game changer?

Slide 30: Transcode Abstraction
[Diagram: transcode request SQS queue → transcode router → per-backend SQS queues → subtitle extraction backend, Amazon Elastic Transcoder backend (REST to Amazon Elastic Transcoder), and Elemental backend (Elemental Cloud), reading from Amazon S3 (mezzanine) and writing to Amazon S3 (distribution renditions)]
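The "smart routing" the router performs in slides 29–30 boils down to a policy decision per transcode request. The sketch below illustrates the idea with a hypothetical policy (subtitles in-house, time-critical HD to GPU encoders, everything else to Amazon Elastic Transcoder); the real BBC rules are not public:

```java
public class TranscodeRouter {

    enum Backend { ELEMENTAL_CLOUD, ELASTIC_TRANSCODER, SUBTITLE_EXTRACTOR }

    // Illustrative routing rules only: subtitle work always goes to the
    // in-house backend, time-critical HD work goes to the GPU encoders,
    // and everything else goes to Amazon Elastic Transcoder.
    public static Backend route(String outputType, boolean timeCritical, boolean hd) {
        if (outputType.equals("subtitles")) {
            return Backend.SUBTITLE_EXTRACTOR;
        }
        if (timeCritical && hd) {
            return Backend.ELEMENTAL_CLOUD;
        }
        return Backend.ELASTIC_TRANSCODER;
    }
}
```

Because each backend sits behind its own SQS queue, routing is just a choice of which queue to publish the request message onto, which is what makes slide 31's "unknown future backend X" a cheap addition.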
Slide 31: Transcode Abstraction – Future
[Diagram: the same routing topology as slide 30, with an additional slot for an unknown future backend X]

Slide 32: Example – A Simple Elastic Transcoder Backend
[Diagram: XML transcode request POSTed to SQS → get message from queue → unmarshal and validate message → initialize transcode on Amazon Elastic Transcoder → wait for SNS callback over HTTP → XML transcode status message POSTed (via SNS), all within an SQS message transaction]
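The first two steps of the backend on slide 32 — unmarshal the XML request and validate it — are what decide whether a message ever reaches the transcoder. A minimal sketch, using a hypothetical request schema (the BBC's actual message format is not shown in the deck):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class TranscodeRequestParser {

    public static final class TranscodeRequest {
        public final String sourceKey;
        public final String preset;
        TranscodeRequest(String sourceKey, String preset) {
            this.sourceKey = sourceKey;
            this.preset = preset;
        }
    }

    // Unmarshals a minimal <transcodeRequest> document. Anything malformed
    // throws, which is exactly the case that would route the message to the
    // bad message queue rather than being retried.
    public static TranscodeRequest parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            String source = doc.getElementsByTagName("sourceKey").item(0).getTextContent();
            String preset = doc.getElementsByTagName("preset").item(0).getTextContent();
            if (source.isEmpty() || preset.isEmpty()) {
                throw new IllegalArgumentException("missing fields");
            }
            return new TranscodeRequest(source, preset);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad transcode request", e);
        }
    }
}
```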
Slide 33: Example – Add Error Handling
[Diagram: the same backend flow as slide 32, with a bad message queue after the unmarshal/validate step, a dead letter queue after the get-message step, and a fail queue after the transcode and callback steps]

Slide 34: Example – Add Monitoring Eventing
[Diagram: the same flow again, with monitoring events emitted at each step — message consumption, unmarshalling, transcode initialization, and callback handling]

Slide 35: BBC Eventing Framework
• Key-value pairs pushed into Splunk
  – Business-level events, e.g.:
    • Message consumed
    • Transcode started
  – System-level events, e.g.:
    • HTTP call returned status 404
    • Application's heap size
    • Unhandled exception
• Fixed model for "context" data
  – Identifiable workflows, grouping of events; transactions
  – Saves us a LOT of time diagnosing failures
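Slide 35's events are plain key-value pairs with a fixed set of context keys so Splunk can group every event belonging to one workflow. A sketch of what one emitted line might look like — the field names here are illustrative, not the BBC's actual schema:

```java
import java.util.Map;

public class EventLogger {

    // Renders one business event as the key=value line a Splunk forwarder
    // would index. "workflowId" is the fixed context key that lets every
    // event from one piece of content be grouped into a single transaction.
    public static String format(String event, String workflowId, Map<String, String> extra) {
        StringBuilder sb = new StringBuilder();
        sb.append("event=").append(event);
        sb.append(" workflowId=").append(workflowId);
        for (Map.Entry<String, String> e : extra.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```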
Slide 36: Component Development – General Development & Architecture
• Java applications
  – Run inside Apache Tomcat on m1.small EC2 instances
  – Run at least 3 of everything
  – Autoscale on queue depth
• Built on top of the Apache Camel framework
  – A platform for building message-driven applications
  – Reliable, well-tested SQS backend
  – Camel route builders – Java DSL
  – Full of messaging patterns
• Developed with Behavior-Driven Development (BDD) & Test-Driven Development (TDD)
  – Cucumber
• Deployed continuously
  – Many times a day, 5 days a week
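"Autoscale on queue depth" with a floor of 3 instances amounts to a simple scaling decision per component. The numbers below (messages-per-instance target, maximum fleet size) are hypothetical, chosen only to make the clamping behavior concrete:

```java
public class QueueDepthScaler {

    static final int MIN_INSTANCES = 3;           // "run at least 3 of everything"
    static final int MESSAGES_PER_INSTANCE = 50;  // hypothetical throughput target

    // Desired fleet size for a component, given the SQS queue depth its
    // autoscaling alarm reports: enough instances to drain the backlog,
    // never fewer than the floor, never more than the ceiling.
    public static int desiredInstances(int queueDepth, int maxInstances) {
        int needed = (queueDepth + MESSAGES_PER_INSTANCE - 1) / MESSAGES_PER_INSTANCE;
        return Math.max(MIN_INSTANCES, Math.min(maxInstances, needed));
    }
}
```

This is the shape of policy that suits the slide 20 traffic pattern: a quiet 6-channel baseline costs only the 3-instance floor, while a 24-channel spike scales the fleet out in proportion to the backlog.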
Slide 37: Error Handling Messaging Patterns
• We use several message patterns
  – Bad message queue
  – Dead letter queue
  – Fail queue
• Key concept: never lose a message
  – Message is either in-flight, done, or in an error queue somewhere
• All require human intervention for the workflow to continue
  – Not necessarily a bad thing

Slide 38: Message Patterns – Bad Message Queue
The message doesn't unmarshal to the object it should, OR we could unmarshal the object, but it doesn't meet our validation rules.
• Wrapped in a message wrapper which contains context
• Never retried
• Very rare in production systems
• Implemented as an exception handler on the route builder

Slide 39: Message Patterns – Dead Letter Queue
We tried processing the message a number of times, and something we weren't expecting went wrong each time.
• Message is an exact copy of the input message
• Retried several times before being put on the DLQ
• Can be common, even in production systems
• Implemented as a bean in the route builder for SQS

Slide 40: Message Patterns – Fail Queue
Something I knew could go wrong went wrong.
• Wrapped in a message wrapper that contains context
• Requires some level of knowledge of the system to be retried
• Often evolves from understanding the causes of DLQ'd messages
• Implemented as an exception handler on the route builder
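The three patterns on slides 38–40 can be summarized as a mapping from processing outcome to destination queue. A sketch of that decision table — exception names and the retry limit are illustrative, not from the BBC's codebase:

```java
public class ErrorRouter {

    enum Queue { BAD_MESSAGE, DEAD_LETTER, FAIL, RETRY, DONE }

    // Unmarshalling or validation failed: the message itself is wrong.
    public static class BadMessageException extends RuntimeException {}
    // A failure mode we knew could happen (e.g. transcoder rejected the job).
    public static class KnownFailureException extends RuntimeException {}

    static final int MAX_DELIVERIES = 5; // hypothetical redelivery limit

    // Maps a processing outcome onto the three error-queue patterns:
    // bad messages are never retried, known failures go straight to the
    // fail queue for a human, and anything unexpected is retried until
    // the delivery count sends it to the dead letter queue. Either way,
    // the message is never lost.
    public static Queue route(RuntimeException error, int deliveryCount) {
        if (error == null) return Queue.DONE;
        if (error instanceof BadMessageException) return Queue.BAD_MESSAGE;
        if (error instanceof KnownFailureException) return Queue.FAIL;
        return deliveryCount >= MAX_DELIVERIES ? Queue.DEAD_LETTER : Queue.RETRY;
    }
}
```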
Slide 41: Demonstration – Eventing Framework

Slide 42: Questions?
philip.cluff@bbc.co.uk / @GeneticGenesis
dsayed@amazon.com / @dsayed

Slide 43: Please give us your feedback on this presentation
MED302
As a thank you, we will select prize winners daily for completed surveys!