MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012


In this session we discuss the numerous ways to ingest data into AWS, including options such as physical media import and AWS Direct Connect. We also cover policy-based Hierarchical Storage Management (HSM) in the cloud, total cost of ownership, the importance of storage durability, and the infinite scalability of Amazon S3. Finally, Alan Schaaf, founder of the photo-sharing sensation Imgur, speaks about its migration to AWS.

  1. 1. tweet #reinvent
  7. 7. 99.99% vs. 99.999999999% durability of objects over a given year
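To make the two durability figures concrete, here is a minimal sketch that turns each into an expected number of objects lost per year. It assumes the simple model "expected loss = object count × (1 − durability)"; the function name is illustrative, and the object count is the 600 million figure quoted later in the Imgur slides.

```python
from decimal import Decimal


def expected_annual_loss(object_count: int, durability: str) -> Decimal:
    """Expected number of objects lost in a year at the given annual durability.

    `durability` is passed as a string (e.g. "0.9999") so that Decimal keeps
    every digit; a binary float cannot represent eleven nines exactly.
    """
    return Decimal(object_count) * (Decimal(1) - Decimal(durability))


# 600 million objects: the S3 object count quoted in the Imgur slides.
OBJECTS = 600_000_000

loss_four_nines = expected_annual_loss(OBJECTS, "0.9999")
loss_eleven_nines = expected_annual_loss(OBJECTS, "0.99999999999")

print(f"99.99%        -> {loss_four_nines.normalize():f} objects per year")
print(f"99.999999999% -> {loss_eleven_nines.normalize():f} objects per year")
```

Under this model, 600 million objects at 99.99% durability lose about 60,000 objects a year on average, while at 99.999999999% the expected loss is about 0.006 objects a year.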
  8. 8. Customer Decides Where Applications and Data Reside
  9. 9. • Disaster Recovery • Backup • Distribution • Archive • Load for Elastic MapReduce • Migrate / Sync (diagram labels: Enterprise, Region 1, Region 2) • SaaS-based solution • Simplicity, “click and replicate” • Speed, optimized and elastic transfer • Safe, secure and guaranteed delivery
  11. 11. PER GB / MONTH
  12. 12. PER TB / YEAR
  13. 13. DURABILITY
  16. 16. Archive after 30 days (diagram: logs/file1, logs/file2, logs/file3 under the logs prefix; My S3 bucket; Amazon Glacier)
  17. 17. Expire after 365 days (diagram: logs/file1, logs/file2, logs/file3 under the logs prefix; My S3 bucket; Amazon Glacier)
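The two lifecycle slides above can be sketched as a single S3 lifecycle rule. This is only an illustration, not the configuration shown in the talk: it assumes both actions live in one rule on the `logs/` prefix, it uses the dictionary shape that present-day boto3 accepts in `put_bucket_lifecycle_configuration`, and the rule ID and bucket name are hypothetical.

```python
import json


def build_log_lifecycle(prefix: str = "logs/",
                        archive_after_days: int = 30,
                        expire_after_days: int = 365) -> dict:
    """Build a lifecycle configuration: archive to Glacier, then expire."""
    if not 0 < archive_after_days < expire_after_days:
        raise ValueError("objects must be archived before they expire")
    return {
        "Rules": [
            {
                # Hypothetical rule name, chosen for this example only.
                "ID": "archive-then-expire-" + prefix.strip("/"),
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Slide 16: move objects to Amazon Glacier after 30 days.
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                # Slide 17: delete objects after 365 days.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = build_log_lifecycle()
print(json.dumps(config, indent=2))

# To apply it (assumes boto3 is installed, credentials are configured, and a
# bucket with the hypothetical name "my-s3-bucket" exists):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-s3-bucket", LifecycleConfiguration=config)
```

Building the rule as plain data keeps the policy reviewable and testable before anything is sent to AWS.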
  18. 18. Storage Options: Amazon S3 (Standard Storage, Reduced Redundancy Storage) and Amazon Glacier
  19. 19. Glacier Transforms tiered storage
  20. 20. HSM with AWS (SAN in the Corporate Data Center; Amazon S3 and Amazon Glacier in the AWS Cloud) versus Traditional Approach to HSM (SAN, tier 2 disk storage, offsite tape storage, backup; all in the Corporate Data Center)
  21. 21. Compliance with AWS (O/S image; Amazon S3 and Amazon Glacier in the AWS Cloud) versus Traditional Approach (O/S image, disk storage, off-site tape backup; all in the Corporate Data Center)
  22. 22. Options to consider
  23. 23. Workflow diagram (labels: HTTP multipart, fasp, Parallel Transcoding, 14 instances: 3 min, Herndon, VA): 1. Video broadcast capture 2. High-speed upload Direct-to-S3 3. Scale out parallel transcode 4. Deliver back to S3 5. High-speed download from S3 to UFC 6. Insert into CMS for streaming to mobile devices
  24. 24. Common mistakes
  25. 25. What is Imgur? • A simple image sharer • Has the most viral images on the Internet • Anyone can upload as many images as they want – without an account • 2,000,000 images uploaded per day • That’s 23 images per second • Can be embedded and shared on any site
  26. 26. The greatest image site. Full of all the wonders and magic of the interwebs. Be forewarned, time has been known to quicken in this realm. “I spent half a day on Imgur, and it was the greatest 6 hours of my life.” - Urban Dictionary
  27. 27. • Started as a side project while at Ohio University • Redditors needed a place to host their images • Organically grew into a business • Alan was the only developer for 3 years • Moved to San Francisco • Now a team of 7 • (600 million pageviews per engineer)
  28. 28. • Every month, there are: • 11 minutes average visit duration • 2.9 billion page views • 11 pages per visit • 38 billion image views (images loaded) • 46th biggest site in the US • 54 million unique visitors – (according to • 4.7 petabytes of bandwidth used • 600 million objects stored in S3 • 62 million images uploaded * All data as of Nov 2012
  29. 29. • Pageviews are growing 15% every month. • How are we able to support this kind of growth?
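To get a feel for what 15% monthly growth means for capacity planning, here is a small sketch of the compounding arithmetic. It assumes the rate stays constant, which the slides do not claim; the function names are illustrative.

```python
import math

MONTHLY_GROWTH = 0.15  # "growing 15% every month", from the slide


def growth_factor(months: int, monthly_rate: float = MONTHLY_GROWTH) -> float:
    """Multiplier on traffic after `months` of compounding growth."""
    return (1.0 + monthly_rate) ** months


def months_to_multiply(target: float,
                       monthly_rate: float = MONTHLY_GROWTH) -> float:
    """Months needed for traffic to grow by the factor `target`."""
    return math.log(target) / math.log(1.0 + monthly_rate)


yearly = growth_factor(12)
doubling = months_to_multiply(2.0)

print(f"After 12 months traffic is about {yearly:.2f}x the starting level")
print(f"Traffic doubles roughly every {doubling:.1f} months")
```

At a steady 15% per month, traffic doubles in about five months and grows a little more than fivefold in a year, which is why fixed server counts fall behind quickly.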
  30. 30. User makes a request for an image (Don’t do this!)
  31. 31. User makes a request for an image
  32. 32. User makes a request for an image
  33. 33. • Site traffic is increasing more than ever. How many more servers do we need? • Hardware failures • Tweaking every little thing is really hard and easy to get wrong, but necessary • There’s only one man doing all this; how can we make his life easier, while scaling the site at the same time?
  34. 34. • Autoscaling is awesome • Automated DB backups are awesome • Security features are awesome • Much easier to manage in the long run • Because everything’s managed, you need fewer admins to look over everything all the time • AWS has managed solutions for all the core services your website needs (server, database, cache, backups, security, etc.)
  35. 35. • Lots of new stuff to learn and set up • Possible downtime during migration • Very time consuming at first because you’re reconfiguring your entire stack
  36. 36. • AWS has a lot of services; find out which ones can work for you and how • Use the price calculator: • Read the docs: • Set up a test environment • Install the AWS SDK • Call AWS if you have questions. • (You don’t need AWS Support to call in) • Start coding!
  37. 37. • How do you get all your data to S3? • Duplicate writes: 1 to native, 1 to S3. • Upload all your data to S3 in parallel. We had 12 background processes running around the clock all uploading a different subset of data to S3 – it took 2 weeks to finish. • No need to store more than one copy • Turn on versioning for even more protection • Very similar process for Amazon RDS
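The migration pattern on this slide (write new data to both stores while background workers backfill disjoint subsets of the old data) can be sketched as follows. This is not Imgur's code: the slides do not say how the subsets were chosen, so hashing the key is an assumption, and the two writer callables stand in for the real native-storage and S3 upload functions.

```python
import zlib
from typing import Callable, Dict, Iterable, List

WORKERS = 12  # the slide mentions 12 background processes


def worker_for(key: str, workers: int = WORKERS) -> int:
    """Map a key to one worker with a stable hash (same result on restart)."""
    return zlib.crc32(key.encode("utf-8")) % workers


def partition(keys: Iterable[str], workers: int = WORKERS) -> List[List[str]]:
    """Split existing keys into disjoint subsets, one per backfill worker."""
    subsets: List[List[str]] = [[] for _ in range(workers)]
    for key in keys:
        subsets[worker_for(key, workers)].append(key)
    return subsets


def dual_write(key: str, data: bytes,
               write_native: Callable[[str, bytes], None],
               write_s3: Callable[[str, bytes], None]) -> None:
    """Duplicate writes: one to the native store, one to S3."""
    write_native(key, data)
    write_s3(key, data)


# In-memory dictionaries stand in for the two stores in this demonstration.
native: Dict[str, bytes] = {}
s3_copy: Dict[str, bytes] = {}
dual_write("new.jpg", b"image-bytes", native.__setitem__, s3_copy.__setitem__)

existing = [f"img{i}.jpg" for i in range(1000)]
subsets = partition(existing)
print([len(s) for s in subsets])
```

Because every new upload is written to both stores from the start, the backfill only has to cover objects that existed before the dual writes were switched on, and each worker can be restarted without overlapping another worker's subset.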
  38. 38. • There’s no web interface • Have to do everything from command line • Confusing terminology • Hard to verify that it’s working as intended • But in the end, it’s amazing • If you’re not using it, you’d better have a really good reason why
  39. 39. • EC2: • Maximum performance with RAID 0 Elastic Block Store • RAID 0 EBS requires a pretty significant amount of maintenance overhead • Have to come up with your own backup plan • RDS: • Will provide very good performance out of the box (but not maximum) • Management console is fantastic • Easy to upgrade instances • High availability and read-only slaves are a click away • Managed service, which makes it more expensive • If you enjoy tuning every last little bit for maximum performance, then you can consider EC2 + EBS RAID 0 • Still on the fence? Go with RDS
  40. 40. • There’s no access to the underlying file system • Migrating requires a dump and an import of your data, which is extremely time consuming for large databases • No access to the logs when things break • We were able to do it live – without taking the site down – but with lots of headaches
  41. 41. • Wed (1:00 p.m.–1:50 p.m.) MED203: Scalable Media Processing with AWS • Wed (2:05 p.m.–2:55 p.m.) MED202: Netflix’s Transcoding Transformation • Wed (3:25 p.m.–4:15 p.m.) MED303: Addressing Security in Media Workflow • Thu (10:30 a.m.–11:20 a.m.) STG205: Amazon S3: Reduce costs, save time, and better protect your data • Thu (11:35 a.m.–12:25 p.m.) STG203: Cloud Storage War Stories: From the front lines of some of the biggest battles • Thu (4:05 p.m.–4:55 p.m.) STG302: Archive in the Cloud with Amazon Glacier • Wed (1:00 p.m.–1:50 p.m.) STG201: Understanding AWS Storage Options
  42. 42. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance. tweet #reinvent