AWS Sydney Summit 2013 - Scalable Media Processing on the Cloud
 

Session 1, Presentation 5 from the AWS Sydney Summit


  • Good afternoon everyone, and welcome to the AWS Summit 2013. My name is Daniel Hand; I am a principal consultant, formerly a Solutions Architect, with AWS. During the next 40 minutes I will take you through a journey of scalable media processing and delivery in the cloud. So, let's get started.
  • We will start by looking at how media and its consumption have evolved, from the humble beginnings of analogue through to the latest digital formats. We will then consider how the flexibility and scale of the cloud can help in the storage, processing and delivery of media. We will conclude with a closer look at media workflows in the cloud and a real-life example of how Viocorp have developed a media publishing, management and analysis platform on AWS.
  • There are three significant factors to consider when we review how media has evolved. The first is the changing format of not only the encoded content but also the medium on which it is stored. One of the earliest forms of media and storage was drawing or inscribing pictures, glyphs or text onto stone, wood or similar write-once mediums. While modern-day storage may not be as beautiful as an ancient stone carving, it's certainly more convenient than carrying around a collection of rocks. The advent of vinyl to store audio, and of photographic film and plates, especially ones created by automated processes, allowed media to be consumed by greater numbers of users. Greater resolution in the recording process brought with it larger amounts of data being stored within the actual medium. The start of the digital media revolution that we see today can be traced back to some of the early digital formats such as Betamax and VHS, the tape-based video recorders for the home. As we move through the formats of VCD, DVD, 720p, 1080p, 2K and 4K, apart from the obvious increase in the required amount of storage, we see a growing global population with access to the digital content being created. This creates some interesting and growing challenges as the size of data grows exponentially: how we ingest the data, how we store it, how we process it, and how we deliver it.
  • But as we saw earlier, the latest high-quality media formats are resulting in increasing storage and processing challenges. As the average density of digital media increases, so do total storage requirements. Furthermore, as the number of devices consuming media increases, we often need to deliver the same content in different formats and resolutions.
  • Increasingly, media is consumed in a digital format over the Internet. Can I get a quick show of hands: who here hired a physical DVD from a rental shop in the last 12 months? To put it into context, streaming of online video and music in North America accounts for 65% of all downstream traffic during peak hours, with Netflix alone accounting for half of that traffic (ref: http://www.sandvine.com/news/pr_detail.asp?ID=394).
  • Let's consider the media processing workflow. Arguably, this is becoming more complex. Workflows often consist of multiple steps; different input formats may need to be combined; the output may be required in a number of formats, some of which may need DRM. Oh, and can I have the result yesterday, and at low cost? The more complex the process, the greater the challenge in managing the state of all jobs in the process flow, ensuring coordination between those entities deciding what to do next ("the deciders") and those entities carrying out the work ("the workers").
  • In the case of software, organisations often want to control distribution to prevent piracy or illegal installation, so protecting not only the content format but also the distribution channel is important.
  • Users need tools that scale to meet ever-increasing media storage, processing and delivery challenges.
  • So how can Amazon Web Services help? AWS provides a rich collection of web services for users facing these continually increasing challenges. We take care of the undifferentiated heavy lifting, the things that don't make your product or service different from your competitors'. This allows you to focus your scarce resources on the things that really matter. We are experienced at operating at massive scale, which gives you confidence that as your business grows, we will grow with you and provide a reliable, scalable, secure platform. The platform is pay-as-you-go and pay-for-what-you-use. While we offer different payment models that allow customers to enjoy even greater savings, you can, if you so choose, consume resources with no upfront commitment. Each of the AWS services is designed with security as our number one concern. Resources are elastic and highly available: if your needs increase, the platform can grow rapidly with you, removing unnecessary delays while resources are acquired.
  • Let's start to break the challenges of media processing down into their constituent parts. We'll begin by taking a look at how to store data on AWS. The Simple Storage Service (S3) provides fully redundant data storage for the Internet. It provides a simple web service interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. Placing your data into S3 is the first step in providing access to the massively scalable computing resources of the Elastic Compute Cloud (EC2). Not only are compute resources available on a per-hour basis, but a growing ecosystem of ISVs allows for the consumption of software by the hour via the AWS Marketplace. For those ISVs who don't have applications available on the AWS Marketplace, customers simply need to install them on top of EC2 as they would with on-premise infrastructure. The combined scalability of storage and compute provides an ideal foundation for a highly scalable media processing and delivery platform.
  • As we saw earlier, the Simple Storage Service (S3) is an ideal location to stage your media data while you are actively using it. But what if you rarely access the data yet still require all the benefits of S3, such as its high durability, its security and its scale? Amazon Glacier was created as a cold storage solution to address these specific requirements. Storage costs on Amazon Glacier start from as little as 1c per gigabyte per month, so it's a great option for data archiving. Furthermore, to simplify the process of moving data from S3 into Glacier, we allow you to configure object lifecycle policies that define after what period an object should be migrated from S3 to Glacier.
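The lifecycle policy described above can be sketched as follows. This is a minimal, hypothetical example: the bucket name, prefix and 30-day threshold are illustrative assumptions, not values from the talk.

```python
def glacier_lifecycle_rule(prefix, days):
    """Build an S3 lifecycle rule that transitions matching objects to Glacier."""
    return {
        "ID": f"archive-{prefix or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }

# With boto3, the rule could be applied like this (requires AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-media-archive",  # hypothetical bucket name
#       LifecycleConfiguration={"Rules": [glacier_lifecycle_rule("masters/", 30)]},
#   )
```

Once the rule is in place, S3 migrates the objects automatically; no application code is needed for the archival step itself.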
  • Another storage service provided by AWS is the AWS Storage Gateway. This was introduced to help customers securely store data on the AWS platform. It's used in two main configurations. Option one: store your primary data on S3 and retain frequently accessed data locally, that is, keep a local store of cached volumes. Option two: use the Storage Gateway to take regular snapshots of your local datastore into S3, thereby providing a durable off-site backup solution. So the AWS Storage Gateway actually addresses two areas: the first is data storage, and the second is data ingestion. Let's take a closer look at data ingestion.
  • AWS has a number of services designed to help customers ingest data onto the platform, including S3 multipart upload, AWS Direct Connect, the Storage Gateway and AWS Import/Export. We also have a number of ISVs that provide software to simplify the process. We'll take a closer look at how each of these helps solve the problem of data ingestion over the next few slides. For customers needing to ingest truly large amounts of data, the AWS Import/Export service is a good choice (currently not available in Sydney; it is offered in three US regions, the EU and Singapore). As Andrew Tanenbaum once said, "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." Of course we don't use station wagons, but I think you understand what I mean. The decision to choose Import/Export will depend on a few factors, but it is typically a function of volume and how quickly you need to ingest the data.
  • Back in November 2012, we introduced a feature of the Simple Storage Service called multipart upload, to allow faster, more reliable uploads into S3. It allows you to upload a single object, say a video media file, as a collection of parts. After all of the parts have been uploaded, they are reassembled by S3. This feature allows for parallel uploads of data, the ability to pause and resume an upload, and the ability to begin uploading an object before you know its full size. The parallelisation of an architectural process is critical for scale, and in this case it allows us to maximise the scalability of S3.
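A sketch of the part-splitting step behind a multipart upload. The 5 MB minimum is a real S3 constraint on all parts except the last; the part-size choice itself is up to the caller.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 requires every part except the last to be >= 5 MB

def part_ranges(total_size, part_size):
    """Yield (part_number, offset, length) tuples covering the whole object."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size below the S3 5 MB minimum")
    number, offset = 1, 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        yield number, offset, length
        number += 1
        offset += length
```

Each range would then be sent with `upload_part` and the object assembled with `complete_multipart_upload`; because the parts are independent, they can be uploaded in parallel or retried individually, which is what makes pause/resume possible.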
  • When moving data between locations over TCP, latency affects the overall efficiency, or effective bandwidth, of the communication, because TCP requires acknowledgement between sender and receiver. While there are ways to improve the effective bandwidth, such as increasing the TCP window size (effectively increasing the amount of data that can be in transit without acknowledgement by the recipient), there are limitations. To address the limitations of transmitting data via TCP, UDP-based protocols such as the Fast And Secure Protocol (FASP) used by Aspera were developed. Aspera allows customers to optimise the transmission of data from source to S3 by running a FASP termination point on EC2. As the latency from EC2 to S3 within the same region is small, this allows for the efficient end-to-end transmission of data over great distances and latencies.
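The throughput ceiling alluded to above follows from a simple relation: with a fixed window, at most one window of data can be in flight per round trip, so throughput is bounded by window size divided by RTT. The 64 KB window and 200 ms RTT below are illustrative values, not figures from the talk.

```python
def tcp_throughput_limit(window_bytes, rtt_seconds):
    """Upper bound on single-stream TCP throughput, in bits per second."""
    return window_bytes * 8 / rtt_seconds

# A classic 64 KB window over a 200 ms long-haul link:
#   tcp_throughput_limit(64 * 1024, 0.2)  ->  2,621,440 bps (about 2.6 Mbit/s)
# regardless of how fat the pipe is -- which is why a larger window, multiple
# streams, or a UDP-based protocol such as FASP helps over long distances.
```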
  • When data transfer throughput requirements exceed those of a single EC2 instance, we can scale out with multiple Aspera instances and upload files in parallel to S3.
  • Now that we have covered the areas of storage and ingestion, let's take a look at what's required for the processing of media. Unsurprisingly, we will need some compute resources, and for this we will be using the Elastic Compute Cloud (EC2). EC2 provides access to network and I/O, compute, memory, and local and network storage. Today, AWS provides a number of EC2 instance types, ranging from our t1.micro instance with a small memory and compute footprint right through to powerful cluster compute instances such as cc2.8xlarge.
  • Each running EC2 instance is created from an Amazon Machine Image (AMI); think of it as a blueprint from which you can create identical running virtual machines. When launching an instance, you choose the EC2 instance type best suited to the job at hand based on your compute, memory and network requirements, knowing that if your initial choice is incorrect, you can easily change it with minimal disruption.
  • Of course, any large-scale media processing solution will require a large number of EC2 resources. The good thing here is that we can launch and manage multiple instances as simply as one.
  • As I mentioned earlier in the presentation, AWS allows you to pay as you go and pay only for what you use. However, customers who consume resources at a steady state, or who have a process flow that can take advantage of additional EC2 capacity, can benefit from what are called Reserved Instances and Spot Instances respectively. Let's take a closer look at how a typical transcoding job may consume both. In the diagram, on the right-hand side, we have a transcode queue. This queue can be implemented using the Simple Queue Service (SQS). As its name suggests, the queue contains a number of processing jobs. A typical job may consist of the following attributes: where the source media file resides (e.g. a URL to an S3 object), the input media format, the required output media format, the required output bitrate, and the required output location. Steady-state instances can take a job off the queue, process it (transcode the media), then remove the job to ensure that it is not processed twice. If an instance fails during the transcoding stage, then after a preconfigured amount of time the job reappears in the queue to be processed by another transcoder. But what happens during periods of high demand? For example, imagine that a large amount of new media has just been ingested and there's a race on to get it transcoded and ready for consumption. What can we do? In this case, the EC2 spot market provides a cost-effective means of running transcoding jobs. You simply bid on spare EC2 instances and run them while your bid exceeds the current spot price, which varies in real time based on demand and is often significantly less than the on-demand price. However, it is worth noting that resources on the spot market can be taken away from you with zero notice, so they are not well suited to long-running jobs.
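The worker loop described above can be sketched as follows. The job attribute names are assumptions based on the list in the talk, not a published schema; the queue URL and `transcode` function are placeholders.

```python
import json

REQUIRED_ATTRS = {"source_url", "input_format", "output_format",
                  "output_bitrate", "output_location"}

def parse_job(message_body):
    """Turn an SQS message body (JSON) into a validated transcode job dict."""
    job = json.loads(message_body)
    missing = REQUIRED_ATTRS - job.keys()
    if missing:
        raise ValueError(f"job missing attributes: {sorted(missing)}")
    return job

# The worker loop itself, with boto3 (requires AWS credentials):
#   sqs = boto3.client("sqs")
#   while True:
#       resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20)
#       for msg in resp.get("Messages", []):
#           job = parse_job(msg["Body"])
#           transcode(job)                              # do the actual work
#           sqs.delete_message(QueueUrl=QUEUE_URL,      # delete only on success
#                              ReceiptHandle=msg["ReceiptHandle"])
```

Note that the message is deleted only after the transcode succeeds: if the instance dies mid-job, the message's visibility timeout expires and the job reappears in the queue, exactly the retry behaviour the talk describes.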
  • You can even make use of Amazon CloudWatch and Auto Scaling to control the size of your spot resources in response to the number of jobs in the transcode queue. This is an excellent way to develop not only a highly efficient transcoding platform but one that's elastic and dynamic. As you will hear during our architecture sessions today, decoupling is a very effective way to allow architectural components to scale independently, and here we have made effective use of that design principle by decoupling job creation and job execution in a highly efficient manner.
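A hypothetical scaling rule of the kind a CloudWatch alarm on SQS queue depth might drive. The jobs-per-instance threshold and instance cap are illustrative assumptions; in practice an Auto Scaling policy would adjust the group size rather than a hand-rolled function.

```python
def desired_spot_capacity(queue_depth, jobs_per_instance=10, max_instances=100):
    """Number of spot transcoders to run for the current queue backlog."""
    if queue_depth <= 0:
        return 0
    needed = -(-queue_depth // jobs_per_instance)  # ceiling division
    return min(needed, max_instances)              # cap the bill during spikes
```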
  • Let's take a look at a real-life example of how the scalability of the AWS platform allows an organisation to perform what would otherwise require significant CAPEX and provisioning for peak load. Netflix provides a movie streaming service to their customers over the Internet. When new movies become available, time to market is important, so they may go from running a steady-state number of transcoders to many thousands of transcoders. Remember, it's not just a matter of transcoding media into a single format; they need to do this for a number of devices and formats. Once the work is complete, say via spot instances, the excess capacity is released and consumption goes back to the steady state.
  • As demonstrated by the Netflix transcoding example, they were effectively able to increase their capacity in real time in response to a change in demand. This allows for not only a significant saving in OPEX (represented by the area between the thick grey line and the thick red line), but also a dynamic platform that can cater for unexpected demands that are difficult to plan for.
  • As we saw earlier, spot instances combined with on-demand or reserved instances and a transcode queue are an effective way to create a scalable but simple process flow. Let's take a look at how things change as the process flow becomes more complex.
  • In addition to transcoders (one type of worker), we now have a slicer that splits up the source content, which adds an extra step to the process. While we can certainly use multiple queues to store jobs, we now need to develop some workflow logic, typically inside the application tier. This creates a few problems, such as maintaining consistency between application servers and ensuring that state is highly durable. How can we solve this problem?
  • The solution to this problem is a combination of the Simple Notification Service (SNS), a means to publish messages to a list of subscribers; SQS, which we used earlier; and SWF, the Simple Workflow Service, a means to implement decision logic and worker tasks for complex workflows.
  • So let's take a look at what a real workflow may look like, starting on the right-hand side of the diagram. Content is ingested into the system via AWS Import/Export, over the Internet, or via FASP using, say, Aspera. This content and the job definitions are commonly controlled by a slicer/CMS combination. Transcoding jobs are created by users via a web application; each job is then the starting point for a previously defined process contained within Amazon SWF. In this example, step one of the process may be to slice up the input media or extract the required segment. Step two is to transcode the segment of media, as depicted by the transcode workers. Step three is to apply Digital Rights Management. Step four is to push the content to a suitable streaming service such as Amazon CloudFront. At each step of the process, the next step is decided by a decider, and the state history of a job is persistently held within SWF.
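The four-step pipeline above can be sketched as a toy decider. A real SWF decider polls for decision tasks and reads the full workflow history (`poll_for_decision_task`); here a set of completed steps stands in for that history, which is an illustrative simplification.

```python
# Workflow steps in order, taken from the example in the talk.
PIPELINE = ["slice", "transcode", "apply_drm", "publish"]

def next_step(completed):
    """Return the next activity the decider should schedule, or None when done."""
    for step in PIPELINE:
        if step not in completed:
            return step
    return None
```

The point of a decider is exactly this separation: workers only execute tasks, while the decision of what comes next lives in one place, with SWF durably storing the history between decisions.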
  • Let's take a closer look at this media workflow architecture and focus on the transcoder and slicer/CMS components. While the AWS platform provides all the components required to perform these two duties, as the maintainer of those components you need to develop and maintain the combined CMS/slicer, not to mention make frequent changes to the transcoder nodes as new codecs are introduced. Even if media transcoding is part of your core business, can you outsource some of this undifferentiated heavy lifting?
  • The answer is yes, you can. The introduction of Amazon Elastic Transcoder allows transcoding and job management to be simplified by performing the combined activities as a managed service, allowing you to differentiate yourself from your competitors by focusing on what makes you different.
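Submitting a job to Elastic Transcoder is a single API call. A minimal, hypothetical sketch: the field names (PipelineId, Input, Outputs, PresetId) follow the Elastic Transcoder API, but the pipeline ID, object keys and preset ID below are placeholders, not values from the talk.

```python
def build_transcode_job(pipeline_id, input_key, output_key, preset_id):
    """Assemble the request body for an Elastic Transcoder create_job call."""
    return {
        "PipelineId": pipeline_id,
        "Input": {"Key": input_key},
        "Outputs": [{"Key": output_key, "PresetId": preset_id}],
    }

# With boto3 (requires credentials and an existing pipeline):
#   boto3.client("elastictranscoder").create_job(
#       **build_transcode_job("1111111111111-abcd01",   # placeholder pipeline ID
#                             "masters/talk.mov",
#                             "web/talk.mp4",
#                             "1351620000001-000010"))  # a system preset ID
```

The preset replaces the codec configuration you would otherwise maintain on your own transcoder nodes, which is the "undifferentiated heavy lifting" the previous slide asks about.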
  • The last element of a scalable media transcoding and delivery platform that we will consider today is delivery. After processing and the addition of DRM, in the case of video we generally want to stream the content to the end consumer. Amazon CloudFront allows customers to stream via HTTP/HTTPS or RTMP using an S3 bucket as an origin. Like the other Amazon Web Services that I have introduced during today's talk, there are no upfront commitments or contracts for using the service. Within a matter of minutes you can create a live streaming distribution that's highly performant and cost-effective.
  • You may be thinking: that's all very well, but what if my transcoding platform is currently on-premise? Does that mean I can't use AWS for distribution? Not at all. CloudFront can be used with S3, EC2, or an on-premise web server acting as the origin.
  • Today, AWS has 39 CloudFront edge locations serving a global footprint of customers.
  • No presentation would be complete without some reference to big data, so what has big data got to do with media transcoding and delivery? As it happens, quite a bit. There is a wealth of information contained within access logs that, with suitable processing, can unveil a host of user behaviours. Signals such as viewing preferences (types of movies, viewing times, consumption rates) can be used to provide a better customer experience and targeted marketing via recommendations, as well as to predict trends and usage patterns. AWS provides a number of services that can help find value in this data: from S3, the highly scalable and durable storage location for logs, to Elastic MapReduce (EMR), a managed Hadoop framework for the distributed processing of the analysis, to Amazon Redshift, our data warehouse solution, allowing for the effective analysis of column-oriented data.
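A toy version of the kind of analysis described above: aggregating delivery logs to surface viewing preferences. The `(uri, bytes)` record layout is a simplified stand-in for real CloudFront access-log fields; at real scale this counting would run as a distributed job on EMR rather than in a single process.

```python
from collections import Counter

def top_titles(log_records, n=3):
    """Count requests per URI and return the n most-watched titles."""
    return Counter(uri for uri, _bytes in log_records).most_common(n)
```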
  • Finally, we can orchestrate the deployment and configuration of all the services discussed so far with AWS CloudFormation, a text-based template system allowing you to reliably and consistently deploy your media transcoding and distribution system in one or more regions around the world, all from the AWS Management Console or via an API call for even greater efficiency. And as a CloudFormation template is a simple text file, you can use a version control system of your choice to manage your virtual infrastructure in the same way you manage your software development. To recap, you can deploy a scalable, elastic and secure media encoding and delivery platform on AWS as simply as checking out and compiling some code from version control.
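A minimal, hypothetical CloudFormation template for one piece of the platform, the transcode queue, built as a plain dict and serialised to the text file the talk describes. Resource names and the visibility timeout are illustrative.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Media platform transcode queue",
    "Resources": {
        "TranscodeQueue": {
            "Type": "AWS::SQS::Queue",
            # Seconds a received job stays hidden before it can be retried.
            "Properties": {"VisibilityTimeout": 900},
        }
    },
    "Outputs": {
        "QueueUrl": {"Value": {"Ref": "TranscodeQueue"}},
    },
}

# The serialised body is what you check into version control and pass to
# CloudFormation, e.g. boto3.client("cloudformation").create_stack(
#     StackName="media-platform", TemplateBody=template_body)
template_body = json.dumps(template, indent=2)
```

Because the template is just text, re-running it in another region reproduces the same stack, which is the "deploy by checking out code" idea the slide closes on.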