Building a Scalable Digital Asset Management Platform in the Cloud (MED402) | AWS re:Invent 2013
 


With the breadth of AWS services available that are relevant to digital media, organizations can readily build out complete content/asset management (DAM/MAM/CMS) solutions in the cloud. This session provides a detailed walkthrough for implementing a scalable, rich-media asset management platform capable of supporting a variety of industry use cases. The session includes a code-level walkthrough, AWS architecture strategies, and integration best practices for content storage, metadata processing, discovery, and overall library management functionality, with particular focus on the use of Amazon S3, Amazon Elastic Transcoder, Amazon DynamoDB, and Amazon CloudSearch. A customer case study highlights PBS's successful use of Amazon CloudSearch to enable rich discovery of programming content across the breadth of its network catalog.

    Building a Scalable Digital Asset Management Platform in the Cloud (MED402) | AWS re:Invent 2013 Presentation Transcript

    • MED402: Building a Scalable Video / Digital Asset Management (DAM) Platform in the Cloud
      Michael Limcaco – Enterprise Solutions Architect (AWS)
      Jonathan Rivers – Director, Technical Operations (PBS)
      November 15, 2013
      © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
    • Agenda
      • The big picture
      • Architecture
      • Build-out exercise
      • Customer case study (PBS)
      • Observations and summary
    • Big Picture: Enterprise Media Architecture [Diagram labels: Integrated Workflow; Camera; HD-SDI; RTMP; MPEG-TS; Live Stream; Media Files; Physical Media; Transcoders; Store output profile and file; Content Management, Discovery & Delivery]
    • Big Picture: Digital Asset Management (DAM) [Same diagram, with DAM added alongside Content Management, Discovery, & Delivery]
    • Workflow Management: Ingest, Processing, Storage, Discovery & Delivery
    • Key DAM Requirements
      • Ingest
      • Metadata extraction
      • Create renditions
      • Build the catalog
      • Enable rich search
      • Manage storage lifecycle
      • Provide efficient delivery of media assets
    • Why Scalable?
      • Increasing volume, variety, velocity
        – Collectors, cameras, sensors and sources
          • Ex: UGC, raw source, mezzanine, B-roll, creative collateral
          • Final content
        – Formats and standards
          • Transport, containers, codecs, metadata
          • SD, HD, 4K … 8K
        – Devices and user expectations
      • Opportunities through cloud enablement
        – Media platform as a service
        – Multitenancy
    • What about Search? Ugh …
      • Core elements
        – Project, keyword, asset name, tags, date/time capture, timecode range, subject, format, size
      • Extended structured search
        – Dublin Core, XMP, MPEG-7, IPTC, EXIF, FCXML, SMPTE, MISB
      • Unstructured search
        – Comments, notes, transcript, closed captioning
    • Enough Theory … Let’s Build a DAM in the Cloud!
    • (Demo) The User Experience (Notional Reference Client)
    • Architecture
    • [Architecture diagram. Labels: Delivery Cache; DAM Storage & Archive; S3 Buckets for Renditions, Metadata Sidecar Files; Mailbox Event Handler; DAM Web Service and DAM Web Interface (AWS Elastic Beanstalk); Rendition Processing (EC2 Workers, Auto Scaling group); Metadata Processing (EC2 Workers, Auto Scaling group); Mailbox; DynamoDB Catalog; Amazon CloudSearch]
    • Tools Available to Us
      Need           | Description                                 | AWS Service
      Ingest         | Integrate w/ existing file-based workflows  | Amazon S3
      Metadata       | Process inline and sidecar files            | EC2 / Elastic Beanstalk
      Renditions     | Autogenerate thumbnails and proxies         | Amazon Elastic Transcoder
      Catalog part 1 | Administrative entities, simple retrieval   | Amazon DynamoDB
      Catalog part 2 | Field and free-form search                  | Amazon CloudSearch
      Storage        | Nearline, online, offline infinite storage  | Amazon S3, Amazon Glacier
      Delivery       | Global caching and streaming footprint      | Amazon CloudFront
    • Catalog: A Word on Why DynamoDB [Diagram: NoSQL data model. Labels: Container-A, Container-B, Container-C, each with Header, Layer-1, Layer-2; catalog items with core elements (Core Elem1, Core Elem2; Name_A, Name_B, Name_C; Size) and container-specific elements (Elem from A, Elem from B; Some_Field)]
    • Catalog: A Word on Why CloudSearch
      • Video and text
        – Header fields with textual descriptions, synopsis, comments
        – Tracks with speech to text, closed caption data
        – Links to scripts
      • Video and structured elements
        – XMP dynamic media
        – Sidecar files
      • A managed search engine dedicated to these kinds of problems
        – Case folding, stemming, stopword removal, synonyms
        – Also accent normalization, UTF-8 normalization, etc.
    • Other Goodies
      • Back-end services
        – AWS CLI
        – Open source decode utilities
          • ExifTool
          • MediaInfo
        – ETL support
          • Talend (representative)
      • Front-end services
        – Node.js + AWS Node SDK
    • [Detailed architecture diagram. Labels: CloudFront Download Distribution (Media Content); Amazon S3 Storage for Source, Renditions, Metadata Sidecar Files; EC2 Crawler (EC2 ASG); Amazon SNS Topic; Amazon SQS Queue: Rendition Jobs; Rendition Workers (EC2 ASG); Elastic Transcoder: Proxy / Thumbnail Generation; Amazon SQS Queue: Metadata Processing Jobs; Metadata Workers (EC2 ASG); DAM Catalog (Amazon DynamoDB); DAM Web Service (AWS Elastic Beanstalk); Amazon CloudSearch]
    • (Dual Screen) Walkthrough
    • Setup
      • Amazon Simple Storage Service (S3) buckets ready to go
        – External staging locations
        – Internal working locations
      • Amazon Simple Notification Service (SNS) + Amazon Simple Queue Service (SQS) wired up
      • Catalog data models established
        – Amazon DynamoDB table “catalog” created
        – Amazon CloudSearch search domain “catalog” created
    • 1. Ingest, Crawl, Notify
      a. End user initiates data copy
      b. EC2 worker scans Amazon S3 staging bucket
      c. EC2 worker copies or moves content
      d. EC2 worker broadcasts “NEW DATA” event (SNS)
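The crawl-and-notify step can be sketched as two small pure functions: one that picks out staged objects not seen on earlier crawls, and one that builds the "NEW DATA" message text. This is a minimal sketch, not the session's code. The `seenKeys` bookkeeping, the bucket name `dam-staging`, and the event shape `{ event, bucket, key }` are all assumptions for illustration, and no AWS calls are made.

```javascript
// Sketch only: pure helpers for the crawl-and-notify step; no AWS calls are made.

// Return the staged keys that have not been processed yet.
// `seenKeys` is a Set of keys handled on earlier crawls (hypothetical bookkeeping).
function findNewKeys(stagedKeys, seenKeys) {
  return stagedKeys.filter((key) => !seenKeys.has(key));
}

// Build the text a crawler might publish to the SNS topic.
// The event shape { event, bucket, key } is an assumption, not the session's format.
function buildNewDataEvent(bucket, key) {
  return JSON.stringify({ event: 'NEW DATA', bucket: bucket, key: key });
}

// Example: one object was already handled, one is new.
const seenKeys = new Set(['incoming/a.mp4']);
const newKeys = findNewKeys(['incoming/a.mp4', 'incoming/b.mov'], seenKeys);
const newDataEvents = newKeys.map((key) => buildNewDataEvent('dam-staging', key));
console.log(newDataEvents);
```

In a real crawler the key listing would come from an S3 list call and each message would be published to the SNS topic; both are left out here to keep the sketch runnable on its own.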
    • 2. Metadata Extraction
      a. EC2 worker polls inbox (SQS)
      b. EC2 worker pulls down media asset from Amazon S3
      c. EC2 worker parses media files
      d. EC2 worker pumps metadata through ETL flow to prepare for catalog insertion
      e. EC2 worker inserts into catalog (Amazon DynamoDB)
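When an SNS topic delivers to an SQS queue without raw message delivery, the SQS message body is a JSON envelope whose `Message` field carries the published text. A worker polling the inbox therefore has to unwrap that envelope before it can locate the asset. The sketch below shows only that unwrapping; the inner event shape `{ event, bucket, key }` and the sample names are assumptions for illustration.

```javascript
// Sketch only: unwrap an SQS message body that carries an SNS notification.
function unwrapSnsEnvelope(sqsBody) {
  const envelope = JSON.parse(sqsBody);
  if (envelope.Type !== 'Notification' || typeof envelope.Message !== 'string') {
    throw new Error('Not an SNS notification envelope');
  }
  // The published text is itself JSON in this sketch (an assumption).
  return JSON.parse(envelope.Message);
}

// Turn a "NEW DATA" event into a job description for the metadata worker.
// Returns null for events this worker does not handle.
function toMetadataJob(event) {
  if (event.event !== 'NEW DATA') return null;
  return { bucket: event.bucket, key: event.key };
}

// Example body, shaped like an SNS-to-SQS delivery (fields trimmed for brevity).
const sampleSqsBody = JSON.stringify({
  Type: 'Notification',
  Message: JSON.stringify({ event: 'NEW DATA', bucket: 'dam-working', key: 'clips/b.mov' })
});
const metadataJob = toMetadataJob(unwrapSnsEnvelope(sampleSqsBody));
console.log(metadataJob);
```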
    • Preparing for Amazon DynamoDB Insert
      {
        "COMPLETE_NAME" : { "S" : "01_01_SoccerF_05_A.mp4" },
        "FORMAT" : { "S" : "MPEG-4" },
        "CODEC_ID" : { "S" : "mp42" }
      }
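The item above uses DynamoDB's typed attribute-value format, where strings are wrapped as `{ "S": ... }` and numbers are sent as strings under `{ "N": ... }`. A minimal sketch of producing that shape from a flat metadata record follows. The function name `toDynamoItem`, the upper-casing of attribute names, and the `file_size` and `comment` sample fields are assumptions; only the three attributes shown on the slide come from the session.

```javascript
// Sketch only: convert a flat metadata record into DynamoDB attribute-value form.
function toDynamoItem(metadata) {
  const item = {};
  for (const [name, value] of Object.entries(metadata)) {
    // Skip missing values; empty strings are skipped too, as a conservative choice.
    if (value === null || value === undefined || value === '') continue;
    const attributeName = name.toUpperCase();
    // DynamoDB carries numbers as strings under the "N" type tag.
    item[attributeName] = typeof value === 'number'
      ? { N: String(value) }
      : { S: String(value) };
  }
  return item;
}

// Example record, as a parser such as MediaInfo might report it (values illustrative).
const parsedMetadata = {
  complete_name: '01_01_SoccerF_05_A.mp4',
  format: 'MPEG-4',
  codec_id: 'mp42',
  file_size: 1048576, // hypothetical value
  comment: ''         // dropped by toDynamoItem
};
const dynamoItem = toDynamoItem(parsedMetadata);
console.log(JSON.stringify(dynamoItem, null, 2));
```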
    • Model It and Deploy to EC2! (Talend)
    • 3. Catalog Processing
      a. Store metadata record in Amazon DynamoDB
      b. Reflect searchable subset to Amazon CloudSearch
      c. Go crazy (HTTP GET)
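Step b, reflecting a searchable subset, amounts to picking a few attributes from the DynamoDB item and flattening them into plain field values. The sketch below assumes the search fields are the lower-cased attribute names, which matches the `COMPLETE_NAME` / `complete_name` pairing seen in the session. The choice of which attributes are searchable and the `INTERNAL_NOTE` attribute are hypothetical.

```javascript
// Sketch only: pick the searchable attributes from a DynamoDB item and
// flatten them into plain search fields with lower-cased names.
function toSearchFields(item, searchableAttributes) {
  const fields = {};
  for (const name of searchableAttributes) {
    const attribute = item[name];
    if (!attribute) continue; // attribute absent on this item
    fields[name.toLowerCase()] = attribute.N !== undefined
      ? Number(attribute.N)
      : attribute.S;
  }
  return fields;
}

// Assumed searchable subset; INTERNAL_NOTE stays in the catalog only.
const searchableAttributes = ['COMPLETE_NAME', 'FORMAT', 'CODEC_ID'];
const catalogItem = {
  COMPLETE_NAME: { S: '01_01_SoccerF_05_A.mp4' },
  FORMAT: { S: 'MPEG-4' },
  CODEC_ID: { S: 'mp42' },
  INTERNAL_NOTE: { S: 'not indexed' } // hypothetical attribute
};
const searchFields = toSearchFields(catalogItem, searchableAttributes);
console.log(searchFields);
```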
    • Querying the Catalog (Amazon CloudSearch)
      • http://cloudsearch.demo.aws.com/2011-02-01/search?bq=complete_name : …<field=value>
      • In Node.js
        var optionsget = {
          host : 'cloudsearch.demo.aws.com', // here only the domain name
          port : 80,
          // double quotes outside so the single-quoted search term stays intact
          path : "/2011-02-01/search?bq=complete_name:'-STRAWBERRY'&return-fields=complete_name,text_relevance,codec_id_info,duration,file_size,encoded_date",
          method : 'GET'
        };
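Hand-assembled query strings are easy to get wrong, so a small helper that builds the request path and URL-encodes each parameter may help. This is a sketch under assumptions: the helper name `buildSearchPath` is hypothetical, it covers only a single `field:'term'` clause with the `bq` and `return-fields` parameters seen above, and it rejects terms containing quotes or backslashes rather than guessing at escaping rules.

```javascript
// Sketch only: build the request path for a single-field boolean query
// against the 2011-02-01 search endpoint used in the session.
function buildSearchPath(field, term, returnFields) {
  if (/['\\]/.test(term)) {
    throw new Error('Quotes and backslashes in the term are not handled by this sketch');
  }
  const booleanQuery = field + ":'" + term + "'";
  return '/2011-02-01/search?bq=' + encodeURIComponent(booleanQuery) +
    '&return-fields=' + encodeURIComponent(returnFields.join(','));
}

// Example: options object in the same shape as the session's snippet.
const searchPath = buildSearchPath('complete_name', 'soccer', ['complete_name', 'duration']);
const searchOptions = {
  host: 'cloudsearch.demo.aws.com', // host taken from the session's example
  port: 80,
  path: searchPath,
  method: 'GET'
};
console.log(searchOptions.path);
```

The resulting `searchOptions` object can be passed to Node's `http.request`; the request itself is omitted so the sketch runs without network access.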
    • Customer Case Study (PBS)
    • Merlin: PBS CMS/DAM
      • Code name Merlin
      • Structured metadata
      • 200+ web object records daily
        – 29,046 web objects
      • 150+ video objects daily
        – 91,436 videos
      • Users from over 150 stations and 30 national producers
        – Frontline
        – Downton Abbey
        – PBS NewsHour
    • What’s It Do?
      • Large multitenant system
        – 1200 registered users
      • 250 million streams per month
      • 20 million unique viewers
      • 8 PB of video delivered monthly
    • Getting Data In
      • 33 ingestible web feeds
        – Content editors
        – Web page listings
      • Batch video ingest API
        – Video content editors
        – External workflow integration
      • Manually entered videos
        – Video content editors from all 50 states
        – Number of user accounts
    • System Overview [Diagram labels: User Input; Ingest API; Amazon CloudSearch; Search Util; DAM (Merlin); Workflow Service; Content API; Amazon SWF; RSS; Amazon RDS; Amazon S3; CDN; Amazon RDS]
    • Basic Workflow
      • Object registered with Merlin
      • Images registered and processed with ITS
        – Stored in CDN-fronted Amazon S3 bucket
      • Videos registered with VTS
        – Jobs sent to Zencoder for processing
        – Video stored in CDN-fronted Amazon S3 bucket
      • Objects ready for clients
        – Objects rendered for consumption in Amazon S3
        – Objects registered with APIs
        – Objects discoverable
    • Making It Discoverable
      • Search util service
      • Runs every hour
        – Re-indexes last several hours each time
      • Polls APIs
        – Content API
        – Modified time
      • Updates Amazon CloudSearch index
        – 2 primary indexes
    • Search Considerations
      • Hidden objects
      • Rights management
      • Partitioned search
        – Local station search
        – Results by geo
        – Restrict results for international customers
      • Unify and normalize existing APIs
        – Flatten data model
      • Users looking for programs
        – Specific searches
        – Suitable for structured data
    • Challenges
      • No native time field
        – Convert dates to integers
        – Epoch time
      • Versioning of documents
        – Epoch for versioning
      • Exposing two versions of most fields
        – Text searchable
        – Facets (copy of text version)
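An hourly job that re-indexes the last several hours uses overlapping windows, presumably so that a missed or failed run is covered by the next one. The selection logic can be sketched as below. The three-hour lookback, the `modified` property (epoch milliseconds), and the object shape are assumptions for illustration; the session does not state the actual window or record format.

```javascript
// Sketch only: compute the re-index window and select objects modified inside it.
const HOUR_MS = 60 * 60 * 1000;

function reindexWindow(nowMs, lookbackHours) {
  return { from: nowMs - lookbackHours * HOUR_MS, to: nowMs };
}

// `objects` stands in for records returned by polling a content API;
// each carries a `modified` timestamp in epoch milliseconds (assumed shape).
function selectModified(objects, window) {
  return objects.filter((object) => object.modified >= window.from && object.modified <= window.to);
}

// Example: with a 3-hour lookback at hour 10, only objects modified from hour 7 on qualify.
const indexWindow = reindexWindow(10 * HOUR_MS, 3);
const polledObjects = [
  { id: 'video_1', modified: 6 * HOUR_MS },
  { id: 'video_2', modified: 8 * HOUR_MS }
];
const toReindex = selectModified(polledObjects, indexWindow);
console.log(toReindex.map((object) => object.id));
```

Because consecutive windows overlap, the same object may be submitted more than once, so the index update needs to be safe to repeat.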
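The three workarounds listed above (dates as epoch integers, epoch-based document versions, and a facet copy of text fields) can be combined when building an index document. The sketch below assumes the 2011-02-01 document batch shape of `type`, `id`, `version`, `lang`, and `fields`. The `_facet` suffix, the field names, and the helper names are hypothetical rather than PBS's actual schema.

```javascript
// Sketch only: convert an ISO-8601 date string to whole seconds since the epoch.
function toEpochSeconds(isoDate) {
  const milliseconds = Date.parse(isoDate);
  if (Number.isNaN(milliseconds)) {
    throw new Error('Unparseable date: ' + isoDate);
  }
  return Math.floor(milliseconds / 1000);
}

// Build one "add" document:
// - fields named in options.dateFields become epoch integers
// - fields named in options.facetFields get a copy under "<name>_facet" (assumed suffix)
// - the document version is the current epoch time, so later updates win
function buildAddDocument(id, record, options) {
  const fields = {};
  for (const [name, value] of Object.entries(record)) {
    fields[name] = options.dateFields.includes(name) ? toEpochSeconds(value) : value;
  }
  for (const name of options.facetFields) {
    if (fields[name] !== undefined) fields[name + '_facet'] = fields[name];
  }
  return {
    type: 'add',
    id: id,
    version: Math.floor(options.nowMs / 1000),
    lang: 'en',
    fields: fields
  };
}

// Example with hypothetical field names.
const addDocument = buildAddDocument(
  'video_1',
  { title: 'Frontline', air_date: '2013-11-15T00:00:00Z' },
  { dateFields: ['air_date'], facetFields: ['title'], nowMs: Date.now() }
);
console.log(JSON.stringify(addDocument, null, 2));
```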
    • Search Consumers (PBS.org) Site Search
    • Search Consumers (Video Portal) Site Search Programs A-Z
    • Xbox / OTT
    • Summary
    • Summary
      • Build an enterprise-scale DAM platform now
        – Managed storage and archive (Amazon S3, Amazon Glacier)
        – Managed database for catalog processing (Amazon DynamoDB, Amazon Relational Database Service [RDS])
        – Managed search (CloudSearch)
      • Application development accelerators
        – Elastic Beanstalk harness (web, API, and worker roles)
        – Reduced effort with the AWS CLI
      • (Almost) fire and forget
    • AWS Marketplace Can Help
      • AWS online software store
        – Customers can find, research, buy software
        – Simple pricing, aligns with EC2 usage model
        – 1-click launch in minutes
        – Marketplace billing integrated into your AWS account
        – 1,000+ products across 24 categories
      • Digital asset management related options include:
        – WebDAM: centralize, store, manage and distribute collateral
        – Digital asset management cloud: web-based open source DAM
        – Widen: manage and distribute digital media and brand assets with user roles and permissions
        – Adobe Experience Manager: unified asset management including mobile
      • Learn more at: http://aws.amazon.com/marketplace
    • “DAM!”
    • Please give us your feedback on this presentation: MED-402 Building a Scalable Video / DAM Solution in the Cloud. As a thank you, we will select prize winners daily for completed surveys!