  1. 1. Characteristics of Streaming Media Stored on the Web <ul><ul><li>Mingzhe Li, Mark Claypool, Robert Kinicki and James Nichols </li></ul></ul><ul><ul><li>ACM Transactions on Internet Technology (TOIT) </li></ul></ul><ul><ul><li>Vol. 5, No. 4, November 2005 </li></ul></ul>
  2. 2. Introduction (1 of 2) <ul><li>Improvements to Internet enable users to stream from Web browsers </li></ul><ul><ul><li>Across national and cultural boundaries </li></ul></ul><ul><li>Web users expect “point and click” to stream </li></ul><ul><li>2001, RealNetworks says 350,000 hours [1] </li></ul><ul><li>2002, CAIDA says streaming is significant fraction of traffic </li></ul><ul><ul><li>Going to increase with cellular networks </li></ul></ul><ul><li>Concern drives new protocols, routers, etc. to deal with traffic better </li></ul>
  3. 3. Introduction (2 of 2) <ul><li>Much work characterizes streaming applications to better understand them </li></ul><ul><li>Unfortunately, little shows what current streams stored on the Web look like </li></ul><ul><li>Previous study in 1997 [19] </li></ul><ul><ul><li>Looked at every video on the Web </li></ul></ul><ul><ul><li>Found Internet could not support streaming </li></ul></ul><ul><ul><li>RealPlayer and Media Player not yet created </li></ul></ul><ul><li>In 1985, papers by Ousterhout et al. [21] studied characteristics of files </li></ul><ul><ul><li>Fundamental in designing new file systems </li></ul></ul><ul><li> Need study of streaming media stored on the Web to help research today </li></ul>
  4. 4. Investigation (1 of 2) <ul><li>What are the most popular streaming media products? </li></ul><ul><ul><li>Previous studies [12] show very different results </li></ul></ul><ul><ul><li>Earlier, prevalence of MPEG, AVI, QuickTime made it difficult for newcomers </li></ul></ul><ul><li>What is the ratio of streaming audio versus streaming video? </li></ul><ul><ul><li>Audio has lower bitrate cap (voice, music) than video </li></ul></ul><ul><ul><li>Can give current bitrate expectations </li></ul></ul><ul><li>Are media durations long-tailed? </li></ul><ul><ul><li>Long-tailed durations can contribute to self-similarity </li></ul></ul><ul><ul><li>Self-similar traffic is difficult to manage </li></ul></ul>
  5. 5. Investigation (2 of 2) <ul><li>What are typical streaming media target bitrates? </li></ul><ul><ul><li>Direct impact on network traffic </li></ul></ul><ul><ul><li>Provides insight into frame resolution, frame rates, color depth </li></ul></ul><ul><li>Which streaming codecs are being used, and in what fractions? </li></ul><ul><ul><li>Codecs determine compression efficiency </li></ul></ul><ul><ul><li>Knowledge of codec prevalence suggests how fast improvements are incorporated </li></ul></ul>
  6. 6. Focus <ul><li>Focus on commercial products </li></ul><ul><ul><li>Big 3: Media Player, RealPlayer, QuickTime </li></ul></ul><ul><li>Other studies looked at server side or one client </li></ul><ul><ul><li>This study is broader </li></ul></ul><ul><li>There have been p2p studies, but p2p is not streamed (mostly) </li></ul><ul><ul><li>Instead downloaded, as in file transfer </li></ul></ul><ul><li>Build specialized crawler, crawl over 17 million URLs from different starting points, and analyze about 30 thousand clips </li></ul>
  7. 7. Teasers <ul><li>Volume and relative amount increased since 1997 </li></ul><ul><li>Proprietary most prevalent </li></ul><ul><ul><li>RealPlayer 1st, Media Player 2nd </li></ul></ul><ul><li>Most clips short, with long-tailed duration </li></ul><ul><li>Encoded at low resolution, less than current monitors can handle </li></ul><ul><li>Work useful for: </li></ul><ul><ul><li>Selecting clip workloads </li></ul></ul><ul><ul><li>Generating streaming models </li></ul></ul>
  8. 8. Outline <ul><li>Introduction (done) </li></ul><ul><li>Methodology </li></ul><ul><li>Analysis </li></ul><ul><li>Sampling Issues </li></ul><ul><li>Conclusions </li></ul>
  9. 9. Methodology (Mini-Outline) <ul><li>Media Crawler </li></ul><ul><li>Starting Pages </li></ul><ul><li>Measurement </li></ul>
  10. 10. Media Crawler <ul><li>Modify Larbin Web crawler </li></ul><ul><li>Recursively traverses URLs </li></ul><ul><ul><li>Avoid loops by caching previous </li></ul></ul><ul><li>Identify streaming media based on protocol type </li></ul><ul><ul><li>Ex: mms://, </li></ul></ul><ul><ul><li>rtsp:// </li></ul></ul><ul><li>Also examine </li></ul><ul><li>HTTP extensions </li></ul>
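The classification step on the Media Crawler slide can be sketched as follows. This is a minimal illustration, not the modified Larbin code: the `mms` and `rtsp` schemes come from the slide, while the extension list, the function names, and the `example.com` URLs are assumptions made for the example.

```python
from urllib.parse import urlparse

# Schemes named on the slide as streaming protocols.
STREAMING_SCHEMES = {"mms", "rtsp"}

# Assumed, illustrative list of media extensions to check on HTTP URLs;
# the slide does not say which extensions the crawler examined.
STREAMING_EXTENSIONS = (".rm", ".ram", ".asf", ".wmv", ".wma", ".mov", ".mp3")


def is_streaming_url(url):
    """Return True if the URL looks like streaming media."""
    parsed = urlparse(url)
    scheme = parsed.scheme.lower()
    if scheme in STREAMING_SCHEMES:
        return True
    if scheme in ("http", "https"):
        return parsed.path.lower().endswith(STREAMING_EXTENSIONS)
    return False


def find_media(urls):
    """Yield streaming URLs once each, caching visited URLs to avoid loops."""
    seen = set()
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        if is_streaming_url(url):
            yield url


if __name__ == "__main__":
    crawl = [
        "rtsp://example.com/news/clip1",
        "http://example.com/index.html",
        "http://example.com/audio/song.MP3",
        "rtsp://example.com/news/clip1",  # duplicate, skipped
    ]
    for media_url in find_media(crawl):
        print(media_url)
```

Checking the scheme first keeps the common case cheap; the extension check only matters for media served over plain HTTP.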
  11. 11. Starting Pages <ul><li>Wanted international and popular </li></ul><ul><li>International – chose 10 most wired countries </li></ul><ul><ul><li>Allow for cross-cultural analysis </li></ul></ul><ul><ul><li>If Nielsen gave no additional info, chose domestic newspaper as starting point </li></ul></ul><ul><li>USA – chose 7 popular themes </li></ul><ul><ul><li>Allow for cross-content analysis </li></ul></ul><ul><li>Feb 13, 2003, crawl 1 million URLs from each </li></ul><ul><ul><li>Took 4 to 24 hours, based on RTT </li></ul></ul>
  12. 12. Measurement of Content Characteristics <ul><li>Use specialized tools to access each Media URL </li></ul><ul><ul><li>Collect: encoding, bitrate, duration, size, … </li></ul></ul><ul><ul><li>Tools built from SDK, use player core </li></ul></ul><ul><li>RealNetworks: </li></ul><ul><ul><li>RealAnalyzer, TestPlay (could not do levels) </li></ul></ul><ul><li>Microsoft Media: </li></ul><ul><ul><li>Media Analyzer, Wmprop (could do levels) </li></ul></ul><ul><li>MPlayer </li></ul><ul><ul><li>Open source (could not do bitrate) </li></ul></ul>
  13. 13. Outline <ul><li>Introduction (done) </li></ul><ul><li>Methodology (done) </li></ul><ul><li>Analysis </li></ul><ul><ul><li>Aggregate analysis </li></ul></ul><ul><ul><li>Commercial products </li></ul></ul><ul><ul><ul><li>Video </li></ul></ul></ul><ul><ul><ul><li>Audio </li></ul></ul></ul><ul><ul><li>Codec </li></ul></ul><ul><li>Sampling Issues </li></ul><ul><li>Conclusions </li></ul>
  14. 14. Aggregate Analysis (1 of 3) <ul><li>Remove duplicates, giving about 11 million unique URLs </li></ul><ul><ul><li>About 54,000 were streaming </li></ul></ul><ul><li>In 1997, about 25 million URLs </li></ul><ul><ul><li>About 22,000 were streaming </li></ul></ul><ul><li>Extrapolating </li></ul><ul><ul><li> Today, about 15 million total </li></ul></ul><ul><ul><li> Increase from 0.09% to 0.47% </li></ul></ul>
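The share of streaming URLs can be recomputed from the rounded counts on this slide. The sketch below uses only those rounded figures, so the 2003 value comes out slightly above the 0.47% quoted; presumably the slide's percentage was computed from unrounded counts. The function name is an assumption for illustration.

```python
def streaming_share(streaming_urls, total_urls):
    """Percentage of crawled URLs that point at streaming media."""
    return 100.0 * streaming_urls / total_urls


# Rounded counts taken from the slide.
share_1997 = streaming_share(22_000, 25_000_000)
share_2003 = streaming_share(54_000, 11_000_000)

print(f"1997 share: {share_1997:.2f}%")
print(f"2003 share: {share_2003:.2f}%")
print(f"relative growth: {share_2003 / share_1997:.1f}x")
```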
  15. 15. Aggregate Analysis (2 of 3) Some “heavy hitters”, more so than typical Web servers
  16. 16. Aggregate Analysis (3 of 3) - Real almost ½ of all streaming content - In 1997, MPEG, AVI, QuickTime were all, but now only 10% combined - MP3 is most popular non-proprietary format
  17. 17. Outline <ul><li>Introduction (done) </li></ul><ul><li>Methodology (done) </li></ul><ul><li>Analysis </li></ul><ul><ul><li>Aggregate analysis </li></ul></ul><ul><ul><li>Commercial products </li></ul></ul><ul><ul><ul><li>Video </li></ul></ul></ul><ul><ul><ul><li>Audio </li></ul></ul></ul><ul><ul><li>Codec </li></ul></ul><ul><li>Sampling Issues </li></ul><ul><li>Conclusions </li></ul>
  18. 18. Commercial Product Analysis <ul><li>Run custom tools on commercial </li></ul><ul><li>Of original 39,000 only about 29,000 valid </li></ul><ul><ul><li>50% “cannot find specified file” </li></ul></ul><ul><ul><li>25% “cannot connect to server” </li></ul></ul><ul><ul><li>10% “authorization failure” </li></ul></ul><ul><li>Can be from playlist </li></ul><ul><ul><li>But 97% only 1 clip </li></ul></ul>
  19. 19. Live versus Pre-Recorded - Most pre-recorded - 98% is pre-recorded, 2% live
  20. 20. Percentage of Audio and Video - More RealAudio than MP3 Audio - Proportionally less WSM is audio - Almost no QuickTime is audio
  21. 21. Duration - 1997, 90% only 45 seconds or less - Still, today much shorter than T.V. show or movie
  22. 22. Self-Similar Analysis (1 of 2) Definitive test: Is tail flat? Looks flat, but that is not good enough [31]
  23. 23. Self-Similar Analysis (2 of 2) <ul><li>Measure curvature of tail (1/16th of distribution, others same) </li></ul><ul><ul><li>Curvature defined as 3-point estimate, take derivative </li></ul></ul><ul><li>Estimate Pareto (long-tailed) slope α </li></ul><ul><ul><li>Used aest tool </li></ul></ul><ul><li>Generate 1000 samples from Pareto with α </li></ul><ul><ul><li>Each sample has n points, same as the data </li></ul></ul><ul><ul><li>Calculate curvature of each sample tail, take mean </li></ul></ul><ul><li>Calculate difference (d) between mean and original </li></ul><ul><li>Count number out of 1000 that differ from mean by at least d </li></ul><ul><ul><li>495 (video) and 498 (audio), about ½ </li></ul></ul><ul><li>Cannot reject null hypothesis → May be long-tailed </li></ul>
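The Monte Carlo procedure on this slide can be sketched roughly as below. Several details are assumptions, since the slide does not spell them out: the curvature estimate here is a second finite difference through the first, middle, and last tail points of the log-log complementary CDF, the Pareto slope `alpha` is passed in rather than estimated (the study used the aest tool for that), and all function names are hypothetical.

```python
import math
import random


def tail_curvature(sample, tail_fraction=1 / 16):
    """Three-point curvature estimate of the log-log CCDF over the upper tail.

    Assumes positive values with no ties among the three points used.
    """
    xs = sorted(sample)
    n = len(xs)
    k = max(3, int(n * tail_fraction))  # number of points in the tail
    tail = xs[n - k:]
    # Empirical P(X >= x) for each tail point, on log-log axes.
    pts = [(math.log(x), math.log((k - i) / n)) for i, x in enumerate(tail)]
    (x0, y0), (x1, y1), (x2, y2) = pts[0], pts[k // 2], pts[-1]
    slope_lo = (y1 - y0) / (x1 - x0)
    slope_hi = (y2 - y1) / (x2 - x1)
    # Change in slope per unit of log(x): zero for an ideal straight tail.
    return (slope_hi - slope_lo) / ((x2 - x0) / 2)


def curvature_test(observed, alpha, trials=1000, seed=0):
    """Fraction of simulated Pareto samples at least as far from the mean
    simulated curvature as the observed data are."""
    rng = random.Random(seed)
    n = len(observed)
    sims = [
        tail_curvature([rng.paretovariate(alpha) for _ in range(n)])
        for _ in range(trials)
    ]
    mean_curv = sum(sims) / trials
    d = abs(tail_curvature(observed) - mean_curv)
    extreme = sum(1 for c in sims if abs(c - mean_curv) >= d)
    return extreme / trials


# Usage on synthetic data (not the study's durations): a sample that really
# is Pareto should usually not be rejected.
data_rng = random.Random(42)
synthetic = [data_rng.paretovariate(1.2) for _ in range(1000)]
p_value = curvature_test(synthetic, alpha=1.2, trials=200)
print(f"fraction at least as extreme: {p_value:.3f}")
```

A fraction near ½, like the 495 and 498 out of 1000 reported on the slide, means the observed tail curvature is typical of a true Pareto sample, so the long-tailed hypothesis cannot be rejected.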
  24. 24. Outline <ul><li>Introduction (done) </li></ul><ul><li>Methodology (done) </li></ul><ul><li>Analysis </li></ul><ul><ul><li>Aggregate analysis </li></ul></ul><ul><ul><li>Commercial products </li></ul></ul><ul><ul><ul><li>Video </li></ul></ul></ul><ul><ul><ul><li>Audio </li></ul></ul></ul><ul><ul><li>Codec </li></ul></ul><ul><li>Sampling Issues </li></ul><ul><li>Conclusions </li></ul>
  25. 25. Video Encoded Bitrate In 1997, 1% stream for modem, 50% for broadband, 20% for T1+ - Said, modem could not support streaming Note, today, broadband still not targeted
  26. 26. Streams Encoded Per Clip Media Scaling will be difficult! Note, earlier study [15] found real at 65% Audio is one stream
  27. 27. Aspect Ratios Very uniform, but a few odd-balls 30% above or below Take product for size (next)
  28. 28. Video Resolution - Most much smaller than typical monitors (1024 x 768 would be 786,432) - Room to grow!
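The aspect-ratio and resolution comparison on the two slides above reduces to simple arithmetic. In this sketch the 1024 x 768 monitor is the slide's own reference point, while the 320 x 240 clip and the function names are assumptions chosen for illustration, not figures from the study.

```python
MONITOR_PIXELS = 1024 * 768  # 786,432, the reference monitor from the slide


def aspect_ratio(width, height):
    """Width divided by height, e.g. about 1.33 for a 4:3 frame."""
    return width / height


def pixel_area(width, height):
    """Frame size as the product of width and height."""
    return width * height


def fraction_of_monitor(width, height):
    """Share of the reference monitor's pixels that one frame covers."""
    return pixel_area(width, height) / MONITOR_PIXELS


# Hypothetical 320 x 240 clip.
w, h = 320, 240
print(f"aspect ratio: {aspect_ratio(w, h):.2f}")
print(f"pixels: {pixel_area(w, h)}")
print(f"share of 1024x768 monitor: {fraction_of_monitor(w, h):.1%}")
```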
  29. 29. Outline <ul><li>Introduction (done) </li></ul><ul><li>Methodology (done) </li></ul><ul><li>Analysis </li></ul><ul><ul><li>Aggregate analysis </li></ul></ul><ul><ul><li>Commercial products </li></ul></ul><ul><ul><ul><li>Video </li></ul></ul></ul><ul><ul><ul><li>Audio </li></ul></ul></ul><ul><ul><li>Codec </li></ul></ul><ul><li>Sampling Issues </li></ul><ul><li>Conclusions </li></ul>
  30. 30. Audio Encoded Bitrates - Most for modems, but 10% for broadband - In 1999, 100% found for modems - Will likely increase (MP3 128 kbps), but cap
  31. 31. Video Codecs v8 buffers differently than v9 - Newest versions, v9, still not deployed much - Useful as snapshot in time
  32. 32. Outline <ul><li>Introduction (done) </li></ul><ul><li>Methodology (done) </li></ul><ul><li>Analysis (done) </li></ul><ul><li>Sampling Issues </li></ul><ul><li>Conclusions </li></ul>
  33. 33. Sampling Issues <ul><li>In 1997, could analyze all on Web </li></ul><ul><li>Today, impractical </li></ul><ul><ul><li>Would take 16 years to crawl and analyze clips </li></ul></ul><ul><li>Is 17 million large “enough” sample? </li></ul><ul><ul><li>Is it possible to obtain same results with fewer starting points? </li></ul></ul><ul><ul><li>Is it possible to obtain same results with fewer than 1 million URLs per starting point? </li></ul></ul><ul><ul><li>How does sampling affect distributions? </li></ul></ul><ul><ul><li>How does choice of starting point affect distribution? </li></ul></ul>
  34. 34. Percentage of Media versus URLs Took 200k from each, build set. Overall, above 400k from each is stable → ½ million
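One way to read the subset experiment on this slide is: take the first N URLs from each starting point, pool them, and see how the media percentage changes as N grows. The sketch below follows that reading, which is an assumption; the data layout (one list of booleans per crawl, in crawl order) and the function name are also hypothetical.

```python
def media_share_by_prefix(crawls, prefix_sizes):
    """Percentage of streaming URLs when only the first N URLs of each
    crawl are pooled, for each N in prefix_sizes."""
    shares = {}
    for size in prefix_sizes:
        pooled = [flag for crawl in crawls for flag in crawl[:size]]
        shares[size] = 100.0 * sum(pooled) / len(pooled)
    return shares


# Tiny synthetic crawls (True = streaming media URL), not the study's data.
crawls = [
    [True, False, False, False],
    [False, False, True, True],
]
for size, share in media_share_by_prefix(crawls, [2, 4]).items():
    print(f"first {size} URLs per starting point: {share:.1f}% media")
```

With real crawl data, the point at which successive prefix sizes stop changing the percentage would indicate how many URLs per starting point are enough.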
  35. 35. Duration of Video for Number of URLs Can get away with far fewer and have same distribution of durations
  36. 36. Media Type versus Starting Points 9 Starting points sufficient
  37. 37. Duration for Number of Starting Points
  38. 38. Media Type in USA versus International - International similar - May be because cross-cultural Web
  39. 39. Duration for USA and Non-USA
  40. 40. Summary <ul><li>Many researchers worry about volume increase of video </li></ul><ul><li>Video characterizations are based on old data </li></ul><ul><li>This work: current data on media stored on the Web </li></ul><ul><li>Crawled 17 million URLs, analyzed 30k clips </li></ul>
  41. 41. Conclusions <ul><li>Streaming media increased 600% in past 5 years </li></ul><ul><li>Real Media 1st, Microsoft Media 2nd </li></ul><ul><li>Audio and video about equal </li></ul><ul><li>Vast majority pre-recorded (not live) </li></ul><ul><li>Most targets still for modem </li></ul><ul><li>Potential to be large since monitor resolutions much larger than video </li></ul>
  42. 42. Future Work?
  43. 43. Future Work <ul><li>Correlate to actual data streamed </li></ul><ul><li>Congestion responsiveness </li></ul><ul><li>P2P </li></ul><ul><li>Future study (now ~5 years old!) </li></ul>