Video Hosting Architecture

Phillip Sutton, TBD, Technical Advisor, Dale Callahan, Ph.D., P.E., Managing Advisor

Abstract – This paper presents a high-level overview of a possible architecture for building a video hosting and sharing service with little to no upfront cost. While recent advances in commodity hardware have helped drive down the cost of hosting video files across the board, it can still be very expensive for a new startup to establish its IT infrastructure. By implementing the architecture presented here, a new venture can build a very inexpensive infrastructure for storing and delivering video content.

I. PROBLEM DESCRIPTION

According to PricewaterhouseCoopers, the worldwide filmed entertainment market will reach $118 billion in 2009 [1]. In the U.S. alone, consumers spent $36.4 billion in 2007 on movie entertainment, with 68% going towards DVD rentals [2]. By 2013, Insight Research predicts streaming content on the web will produce $70 billion in the U.S. alone [3]. Demand for online video content in one form or another is huge.

Current technological trends combined with deep fragmentation in this market are creating numerous niche opportunities for entrepreneurs to deliver content to viewers hungry for it. However, for the small entrepreneur, the upfront cost of building an infrastructure for the storage and delivery of high-quality video content can be very high. How then does a small company build an infrastructure able to store and deliver massive amounts of high-quality video with as little upfront cost as possible?

A potential option would be to craft a solution based on current trends in technology such as Web 2.0, SaaS, and utility computing. Web 2.0 is the use of the Internet as a platform that aims to facilitate the sharing of information, collaboration, and creativity [5]. Software as a Service (SaaS) is an application model where customers do not pay to own software but rather pay for usage [6]. Utility computing, also known as on-demand computing, packages computation and storage as a metered utility [7].

What makes my idea unique is the use of a combination of existing technologies, Web 2.0 technologies, SaaS, and utility computing services to build an alternative to traditional video hosting services and content delivery networks. A preliminary architecture could be built using Amazon's Simple Storage Service (S3), Simple Queue Service (SQS), and Elastic Compute Cloud (EC2). I believe that by combining these services a simple infrastructure for media hosting can be created that scales easily, incurs no cost for idle capacity, and can handle the spikes in bandwidth that often occur when today's websites suddenly find themselves at the center of the blogosphere.

In summary, I plan to research some of the various methods used to build a scalable media hosting infrastructure. My goal is to determine a low-cost solution for the small business that doesn't have deep pockets to compete with larger, more established competitors.
II. SUPPORTING DOCUMENTATION

Video sharing and distribution is big money. For example, YouTube claims over 100 million daily viewings and accounts for 60% of all videos watched online [10]. Quantcast estimates YouTube attracts 60 million unique visitors per month [11]. There are plenty of YouTube clones, with the vast majority of them serving up short segments of low to medium quality video with limited file size. In general, video sharing sites are extremely popular and form a very competitive marketplace to enter. However, there seem to be plenty of opportunities for niche video websites.

One such niche opportunity would be to deliver full-length DVDs to customers over the Internet. While online video viewing is exploding, many still prefer to watch high-quality DVD content on their home theatre systems from the comfort of their living rooms with friends and family, an experience not yet easily replicated through the Internet and personal computers.

A small business that needs to deliver DVD-quality videos for download will most likely start by building a library of content for users to choose from. The library would consist of DVD ISO images, video clips for each DVD, and video trailers. The storage and bandwidth requirements of such a library could easily grow by terabytes each year.

For any small business there is always a price point to take into consideration. Reliability, scalability, and resources must be factored into that price point. Reliability refers to the guaranteed availability of your resources, including uptime and connectivity. Scalability covers increases in storage, bandwidth, and computing power. Resources refers to human factors such as the system administrators or network technicians required to maintain the system.

What are the options today for the storage and delivery of massive amounts of video? Does a better option exist?
The majority of video sites build their infrastructure in-house, through a web hosting provider offering either hosted or dedicated server space, or on content distribution networks (CDNs). Other possibilities include using YouTube or its clones as an infrastructure, or using Amazon S3.

A. YouTube

YouTube is very good at serving short video clips to a massive number of viewers. Serving up to 100 million videos on a daily basis is no small feat. Following the success of YouTube, several dozen serious video sharing sites have sprung up and have been growing in quality. There are at least another 50 direct clones of YouTube offering various levels of video sharing capability. Most YouTube-style sites offer free hosting of uploaded videos. Why then would one not just build a content library on these free services?

Table I lists some of the specifications for several of the most popular video sharing websites in use today. Since being acquired by Google, YouTube has had the advantage of leveraging Google's highly reliable and scalable infrastructure [12]. Virtually unlimited storage and bandwidth exist; however, the major limitations to hosting high-quality content on popular sites are the imposed limits on file size, playing time, and resolution, and a widely varying quality per provider. Furthermore, no real content management system exists for these sites, and there is no way for an individual to monetize their content other than whatever small ad-based revenue sharing program a particular site may employ. Perhaps in the future, as Google/YouTube opens up its APIs and raises the limits on upload file size, a viable architecture for storing and delivering high-quality content can be devised.

Table I. Video Website Comparisons

Website                      YouTube                Yahoo Video            Veoh          Vimeo
Unique visitors per year     205,593,000            48,026,000             11,476,000    569,000
Max video bit rate (kbps)    ~200 ¹                 300 ³                  1,500         1,600
Max upload file size (MB)    100 ²                  150                    250           500/wk
Max length (min)             10                     N/A                    N/A           N/A
Max screen size              320x240                320x240                640x480       1280x720 ⁴
Host format (streaming)      FLV                    FLV                    FLV           FLV
Processing time              Up to several hours    Up to several hours    Few hours     Minutes ⁵

¹ estimated
² increasing to 1 GB
³ upcoming 700 kbps
⁴ claims this capability

B. In-House Hosting

Hosting high-quality videos on your own servers can be a very expensive proposition. Initial acquisition of equipment, ongoing maintenance, support, and expansion can lead to significant expenditures. For example, suppose you wanted to build a library of independent films containing at least 5000 videos at a minimum DVD-quality file size of 4.7 GB each. That is approximately 23 TB of storage, and it will need to be 23 TB on a quality redundant RAID array, possibly with multiple copies of each file, not to mention multiple versions and multiple formats for different devices. Infrastructure needs such as bandwidth, incoming and outgoing connections, and ongoing operations will also factor into costs.

C. Managed Hosting

A second popular option is to use a hosting solution provider such as HostGator. Selecting a dedicated hosting option with a quad-core dedicated server, 4 GB of memory, 500 GB of storage, and 2,500 GB of monthly bandwidth will cost at least $374 per month, including support. On average, each additional 500 MB of storage will cost $5 per month and each additional 5 GB of bandwidth will cost an additional $5 per month. So for 23 TB of additional storage the cost will be roughly $241,000 per month.
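The storage arithmetic above can be checked with a short sketch. It is only an illustration: the function names are hypothetical, the prices are the ones quoted in the text, and binary units (1 TB = 1024 GB, 1 GB = 1024 MB) are an assumption that happens to reproduce the quoted figure.

```python
# Sizing sketch for the example library described in the text.
# Assumption: binary units (1 TB = 1024 GB, 1 GB = 1024 MB).
GB_PER_TB = 1024
MB_PER_GB = 1024


def library_size_gb(videos=5000, gb_per_video=4.7):
    """Raw size of the library in GB, before redundancy or extra formats."""
    return videos * gb_per_video


def addon_storage_cost(storage_tb, block_mb=500, price_per_block=5.0):
    """Monthly cost of add-on storage sold in fixed-size blocks.

    The $5 per 500 MB rate is the one quoted in the text; a real provider's
    pricing may differ.
    """
    blocks = storage_tb * GB_PER_TB * MB_PER_GB / block_mb
    return blocks * price_per_block


if __name__ == "__main__":
    size_gb = library_size_gb()
    print(f"Library size: {size_gb:,.0f} GB (about {size_gb / GB_PER_TB:.0f} TB)")
    print(f"Add-on storage for 23 TB: ${addon_storage_cost(23):,.0f} per month")
```

Under these assumptions the library comes to 23,500 GB, or about 23 TB, and the add-on storage to a little over $241,000 per month, in line with the figures in the text.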
Additional bandwidth costs will be around $141,312 per month, assuming only 60% of your catalog is requested each month and each requested title is downloaded roughly 10 times (about 138 TB of transfer at $1 per GB). That's a whopping $4,587,774 per year. Managed hosting also can't scale with you, and you can't control the hardware or make favorable networking agreements with providers.

D. Content Distribution Networks
The third traditional hosting solution is the use of content distribution networks. A CDN is a system of computers networked together across the Internet that cooperate transparently to deliver content (especially large media content) to end users [8]. CDNs have many advantages over self-hosting and managed hosting, such as direct backbone access, multiple data centers, and thousands of nodes with tens of thousands of servers per node. CDNs replicate content in multiple places, so there is a better chance that content will be close to the user, with fewer hops, and will run over a friendlier network. The resulting optimizations come in the form of reduced bandwidth costs and improved end-user performance.

The average price to deliver over a CDN varies with a number of factors. The going rate for 100 TB per month is between $0.19 and $0.29 per gigabyte [9]. That equates to roughly $19,456 to $29,696 per month in bandwidth costs, and those costs come in chunks of tiered rates. Storage rates in the terabyte range could average $1.00 per GB. A big drawback of CDNs is the monthly commitment: you pay for bandwidth and storage you may never actually use. CDNs have traditionally been designed for performance and marketed to the enterprise crowd.

E. Amazon S3

The post-Google world has begun to see the development of distributed, on-demand, grid/cloud-computing, redundant, failure-tolerant, scalable systems architectures. Amazon worked out the fundamentals of S3 while developing its own infrastructure, and in the process has opened up that proprietary infrastructure to the world at minimal cost. Amazon Simple Storage Service (S3) provides a managed, Internet-accessible storage service where anyone can store any amount of data and retrieve it again later. The maximum amount of data per object is 5 GB, and the number of objects is not limited. Amazon has a stable and predictable pricing model that is fairly competitive with the industry.
Table II, below, lists the pricing structure of Amazon's S3 service [13].

Table II. S3 Pricing

Storage          $0.10 per GB per month of storage used
Data transfer    $0.10 per GB, all data transferred in
                 $0.18 per GB, first 10 TB/month of data transferred out
                 $0.16 per GB, next 40 TB/month of data transferred out
                 $0.13 per GB, data transferred out per month over 50 TB
Requests         $0.01 per 1,000 PUT or LIST requests
                 $0.13 per 10,000 GET and all other requests
                 $0.00 for DELETE requests

Amazon S3 certainly provides an interesting alternative to traditional video hosting. It combines virtually zero startup costs and fairly competitive pricing with standard REST and SOAP interfaces over HTTP, and leaves the option of building additional protocol or functional layers on top. S3 was built to be scalable, reliable, fast, inexpensive, and simple to use.
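To make the tiered transfer pricing in Table II concrete, the following minimal sketch applies each rate only to the portion of traffic that falls inside its tier. The function and constant names are hypothetical, request charges are ignored, and binary units (1 TB = 1024 GB) are an assumption.

```python
# Tiered pricing sketch using the rates listed in Table II.
# Assumption: 1 TB = 1024 GB; request charges are not modeled.
GB_PER_TB = 1024

# (tier size in TB, price per GB); None marks the final, unbounded tier.
TRANSFER_OUT_TIERS = [(10, 0.18), (40, 0.16), (None, 0.13)]
STORAGE_PRICE_PER_GB = 0.10


def transfer_out_cost(total_tb):
    """Monthly cost of outbound transfer, charged tier by tier."""
    remaining = total_tb
    cost = 0.0
    for tier_tb, price_per_gb in TRANSFER_OUT_TIERS:
        used = remaining if tier_tb is None else min(remaining, tier_tb)
        cost += used * GB_PER_TB * price_per_gb
        remaining -= used
        if remaining <= 0:
            break
    return cost


def storage_cost(total_tb):
    """Monthly cost of keeping total_tb of data in storage."""
    return total_tb * GB_PER_TB * STORAGE_PRICE_PER_GB


if __name__ == "__main__":
    print(f"100 TB transferred out: ${transfer_out_cost(100):,.2f} per month")
    print(f"23 TB stored:           ${storage_cost(23):,.2f} per month")
```

With these rates and units the sketch gives roughly $15,053 per month for 100 TB transferred out. The figures reported elsewhere in the paper may rest on slightly different rates, units, or request charges, so small differences are to be expected.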
Table III, below, lists the average costs of hosting 5000 4.7 GB DVDs and delivering 100 TB of data.

Table III. Costs Associated with Hosting

                   Hosted      CDN        Amazon S3
Storage            $241,000    $23,552    $3,523
Bandwidth          $141,312    $29,696    $15,153
Total per month    $382,312    $53,248    $18,676

III. AMAZON SIMPLE STORAGE SERVICE

A. Overview of S3

S3 (Simple Storage Service) is Amazon's online storage web service, providing unlimited storage through a web services interface. The design of S3 is intended to provide scalability, high availability, and low latency at commodity prices. Amazon runs its own global e-commerce network on the same scalable storage infrastructure [50]. Furthermore, objects stored in S3 can be accessed by unmodified HTTP clients, which makes it possible to replace a portion of an existing web hosting infrastructure.

Highlights of Amazon's S3 service:

• Storage of arbitrary objects up to 5 GB in size, each with 2 KB of metadata.
• Objects are stored in buckets.
• Unlimited number of objects per bucket.
• Each bucket is owned by an Amazon Web Services (AWS) account.
• Each object is identified within its bucket by a unique, user-assigned key.
• REST-style HTTP, SOAP, or HTTP GET/PUT interfaces are used to create, list, and retrieve objects.
• Supports the BitTorrent protocol.
• Requests are authorized using access control lists associated with each bucket and object.
• Authenticated URLs can be created with time-bounded validity.

Buckets are a simple way for S3 to group objects together, much as a folder does. Bucket names have global scope, so no one else can create a bucket of the same name. HTTP access logs can also be configured for delivery to a sibling bucket and later used for data mining tasks.

Objects are the actual files, along with their metadata, that get stored on the platform. Objects can be created or deleted, and associated with a set of permissions. Every object is assigned a key that uniquely identifies the object within its bucket.
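As an illustration of the time-bounded authenticated URLs mentioned in the highlights, the sketch below builds one using the legacy query-string authentication scheme (HMAC-SHA1, often called Signature Version 2) that the S3 REST interface supported for this purpose. It is a minimal sketch, not a production implementation: the function name, bucket, key, and credentials are hypothetical placeholders, and current AWS SDKs generate presigned URLs with the newer Signature Version 4 instead.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def signed_get_url(bucket, key, access_key_id, secret_key, expires_at):
    """Build a GET URL for bucket/key that stops working after expires_at.

    expires_at is a Unix timestamp in seconds. All argument values used in
    the example below are hypothetical placeholders.
    """
    path = quote(key)  # keeps "/" so keys can look like folder paths
    # String to sign: verb, empty Content-MD5, empty Content-Type,
    # expiry time, then the canonical resource "/bucket/key".
    string_to_sign = f"GET\n\n\n{expires_at}\n/{bucket}/{path}"
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = quote(base64.b64encode(digest).decode("ascii"), safe="")
    return (
        f"https://{bucket}.s3.amazonaws.com/{path}"
        f"?AWSAccessKeyId={access_key_id}"
        f"&Expires={expires_at}"
        f"&Signature={signature}"
    )


if __name__ == "__main__":
    one_hour_from_now = int(time.time()) + 3600
    print(signed_get_url("example-video-library", "trailers/demo.flv",
                         "EXAMPLEACCESSKEYID", "example-secret-key",
                         one_hour_from_now))
```

The design point is that the secret key never leaves the server: a web client only ever receives the finished URL, which grants access to one object until the expiry time passes.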
IV. VIDEO HOSTING ARCHITECTURE

A. Proposed S3 Architecture

S3 is an online storage service and an economical hosting and bandwidth provider. It is an ideal solution for a small startup just beginning to build a video content storage and sharing service. Figure 1 represents an oversimplified architecture using Amazon S3 storage as the backbone of a video distribution service.

The Content Management System (CMS) keeps track of all assets contained in the S3 storage space. When the web client is ready to upload a video, it makes a request to the web server, which creates a unique bucket for the user, if necessary, and then creates a unique object for the client's file. Next, the web server issues the appropriate unique user-assigned key and authorization so that the web client can upload the file to the appropriate bucket on S3. When the web client is ready to access a file, it makes a request to the web server, which queries the CMS for the proper Amazon Web Services access identifiers, with which the web client can then access the file directly from S3.

[Figure 1. Oversimplified Video Hosting Architecture (components: Web Client, Web Server / CMS, S3)]

S3 is built with a minimal set of features. Though APIs are provided to interface with S3, actual software utilities are sparse. Some tools do exist, such as S3 Organizer, which integrates into the Firefox browser, S3 Sync, written in Ruby, and professional offerings such as JungleDisk. However, these offerings are geared more towards backup operations between a user's client machine and S3 than towards large-scale management of assets between S3, web servers, and client browsers.

C. Issues

• May suffer from latency when compared to CDNs.
• May still need to host the most popular content on CDNs.
• No server-side processing; a server is still needed to run server-side scripts or to access a database.
• Need a mechanism to handle read/write failures.
• Must build your own software.

D. What's Left

• Still lots of work left to do.
• Create a more detailed architecture.
• Work out coding details.
• Begin implementing the architecture and judging performance.

E. Future

• Fully integrate into content management systems.
• Integrate Amazon EC2 services for on-demand computing power.
• Experiment with Amazon's BitTorrent services for greater throughput.

V. SUMMARY

A. Conclusion

Video sharing is hugely popular today, and online distribution of video is becoming more accepted as the quality and speed of video downloads continue to increase. For a small startup, it can be dauntingly hard to enter the market given the amount of hardware required to build a reliable and scalable hosting solution. Old standards like content delivery networks usually require prepayment for chunks of storage and bandwidth that may never be used. Of course, surpassing the allocated storage or bandwidth limits usually results in steep costs as well.

Amazon Web Services is paving the way for developing new applications based on utility-style computing and storage services. With a bottomless supply of cheap, worry-free storage and CPU power, infrastructure no longer has to be sized for anticipated traffic, and idle capacity no longer has to be paid for. Furthermore, investments won't have to be made in a large amount of hosting infrastructure or services just to handle occasional traffic spikes.

The video hosting architecture presented herein allows a cost-effective, scalable, and reliable solution to be built with minimal upfront cost. It makes it possible for the smaller startup to compete with huge, deep-pocketed companies without having to raise substantial amounts of cash for hardware.

References

[1] Independent Movie Market, indie, February 2008.
[2] Wailin Wong, Apple tunes into movie-rental market, Chicago Tribune Web Edition, Chicago, IL, January 16, 2008, http://www.chicagotribune.com/business/chi-wed_applejan16,0,7918475.story?coll=chi-homepage-fea.
[3] Insight Research: Streaming Content to Generate $70 Billion By 2013, Seeking Alpha, April 2008, http://seekingalpha.com/article/70673-streaming-content-to-generate-70-billion-by-2013-an-unrealistic-claim.
[4] Theatrical Market Statistics 2007, MPAA, March 2008, Theatrical-Market-Statistics-Report.pdf.
[5] Web 2.0, Wikipedia, March 2008, http://
[6] SaaS, Wikipedia, March 2008.
[7] Utility Computing, Wikipedia, March 2008.
[8] Content Delivery Network, Wikipedia, March 2008, tribution_network
[9] Content Delivery Video Pricing Rises In the First Half of This Year, April 2008, ml.
[10] ,2933,203959,00.html, July 18, 2006.
[11] Guide to Video Marketing on YouTube, Search Engine Journal, February 2008.
[12] Google Architecture, January 2008.
[13] Amazon Simple Storage Service, March 2008.