20021028-Videoconferencing-Chen.ppt

  1. 1. Challenging 5 Common Assumptions about Videoconferencing. Milton Chen, Computer Systems Lab, Stanford University. Presented at Internet2 Advanced Applications Track, 10/28/2002
  2. 2. The Stanford Video Auditorium: desktop interface; 15’ x 5’ video wall
  3. 3. Video Auditorium publicity/users
       • Intel president Paul Otellini’s Intel Developer Forum keynote
       • Invited demo to NASA headquarters for Paul G. Pastorek
       • CANARIE, Canada
       • CUDI, Mexico
       • Comdex, Brazil
       • IBM Almaden Lab
       • Manhattan College
       • Hopkins Marine Station
       • Stanford Medical School
       • Stanford Learning Lab
       • Stanford Center for Design Research
       • Berkeley Bioengineering Lab
       • Universidade Federal do Rio Grande do Sul, Brazil
  4. 4. Outline
       • Common assumptions
           • Technology
               1. High-fidelity AV requires dedicated hardware
               2. Difficult to install and use
           • Human factors
               3. Life size displays are ideal
               4. Floor control requires interactive frame rate
               5. Eye contact is difficult
       • Beyond MCU and H323
           • Peer-to-peer
           • Stanford’s Port Bootstrap Protocol
           • Personal directory
       • An evaluation of distance learning at Stanford
       • Why videoconferencing is not ubiquitous
  5. 5. 1. High-fidelity low-latency AV requires dedicated hardware
  6. 6. Your PC outperforms all dedicated systems: a $700 Pentium 4 computer outperforms $7000 systems
  7. 7. Comparison of videoconferencing solutions
       System                      Max video resolution   BW required at 352x288 15fps   Max number of links
       NetMeeting                  352x288                200 Kbps
       Stanford Video Auditorium   720x480                100 Kbps
       Vbrick                      720x480                2000 Kbps
       Polycom, Sony, …            352x288                200 Kbps                       4
       WIDE DVTS                   720x480                3000 Kbps                      1
       AccessGrid, VRVS            720x480                400 Kbps                       many
       Max number of links for NetMeeting, Stanford Video Auditorium, and Vbrick (assignment to rows not recoverable from the slide text): 16 to more than 100; 1; 1
       * CU-SeeMe, iVisit, Yahoo Messenger have unacceptable latency
  8. 8. demo
  9. 9. A scalable AV streaming architecture
       • Audio pipeline: capture → compress → send → receive → decompress → render
       • Video pipeline: capture → compress → send → receive → decompress → render
       * TrueSpeech 8.5
       * MPEG-4
       * Encrypted, AES (Rijndael), streaming
       * Simultaneous AV recording
       * Perceptual streaming adapts to network conditions
  10. 10. Beyond MCU and H323
        • MCU vs. peer-to-peer
            • Scalability
            • Ease of deployment
        • H323 vs. Stanford’s Port-Bootstrap Protocol
            • Firewall
            • Ease of deployment
        • Personal directory
  11. 11. 2. Videoconferencing systems are difficult to install and use
  12. 12. One click operation
        • To use the Video Auditorium
            • “Nothing” to install
            • One click on the HTML speed dial
                <OBJECT
                  CLASSID="CLSID:E80F7B8F-7906-4A89-B59E-B19871F474A9"
                  CODEBASE="runtime/VA_Start.ocx#Version=-1,-1,-1,-1">
                  <PARAM NAME="addr" VALUE="stanford -client_only">
                </OBJECT>
        Makes conferencing as simple as surfing the web
  13. 13. 3. Life size displays are ideal
  14. 14. Each video should be between 6° and 14° wide. Subjectively, people reported 6° as minimum and 14° as ideal. Life size is 12°. * 12 people sat 10’ from the display
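
The angular widths on this slide can be turned into on-screen widths with basic trigonometry. The sketch below is illustrative only: it assumes the angles are horizontal visual angles measured at the viewer's eye, that the display is flat and viewed head-on, and it uses the 10’ viewing distance from the study. The function name is hypothetical and not part of the Video Auditorium.

```python
import math


def width_for_visual_angle(angle_deg, viewing_distance):
    """Width of an image that subtends angle_deg at the given viewing distance.

    Assumes a flat image centred on the line of sight; the result is in
    the same unit as viewing_distance.
    """
    return 2 * viewing_distance * math.tan(math.radians(angle_deg) / 2)


# 10 feet expressed in inches, matching the study setup on the slide.
VIEWING_DISTANCE_IN = 10 * 12

for label, angle in (("minimum", 6), ("life size", 12), ("ideal", 14)):
    width = width_for_visual_angle(angle, VIEWING_DISTANCE_IN)
    print(f"{label:>9}: {angle:>2} deg -> {width:.1f} in wide")
```

Under these assumptions, each video at 10’ would be roughly 12.6 in wide at the 6° minimum and roughly 29.5 in wide at the 14° ideal.
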
  15. 15. Balance between size and head movements. Immersive display requires head movements to see all the students. Figure labels: 9°, 14°, 7°, 4°. * 12 people viewed 9 and 36 students on a large and immersive display
  16. 16. 4. Effective floor control requires interactive frame rate
  17. 17. Minimum required frame rate
        • Interactive: 10 fps
        • Tolerable: 5 fps [Tang and Isaacs ’93]
        • Lip synchronization: 5 fps [Watson and Sasse ’96]
        • Content understanding: 5 fps [Ghinea and Thomas ’98]
        • Sign language recognition: 1 fps [Johnson and Caird ’96]
  18. 18. Gesture Detection Algorithm. Visualization of algorithm: input image → frame difference → after erosion
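
The slide names only the stages (input image, frame difference, erosion). A minimal sketch of such a pipeline is given below, assuming grayscale frames stored as lists of rows, a fixed difference threshold, and a 3x3 erosion window. The threshold, the window size, the pixel-count test, and all function names are assumptions for illustration, not details taken from the talk.

```python
def frame_difference(prev, curr, threshold=30):
    """Binary mask: 1 where the absolute pixel change exceeds the threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def erode(mask):
    """3x3 binary erosion: a pixel survives only if it and its 8 neighbours
    are all set. Border pixels are cleared. This removes isolated noise."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if all(mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def gesture_detected(prev, curr, threshold=30, min_pixels=1):
    """True if enough changed pixels remain after erosion (assumed criterion)."""
    eroded = erode(frame_difference(prev, curr, threshold))
    return sum(map(sum, eroded)) >= min_pixels


# Usage: single-pixel noise is eroded away, a 3x3 moving region is not.
still = [[0] * 5 for _ in range(5)]
noisy = [row[:] for row in still]
noisy[2][2] = 255
moving = [[255 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
print(gesture_detected(still, noisy), gesture_detected(still, moving))
```

A sender could use a detector like this to decide when to transmit a new video frame, which is one plausible way to reach the low average frame rates reported for gesture-sensitive streaming.
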
  19. 19. Requires 10% of full-motion bandwidth. Compared: full-motion (10 fps) and gesture-sensitive (0.2 fps). * MPEG4 encoded at 320x240
  20. 20. Gesture-sensitive allows dynamic discussion. Figure labels: 15 fps, ~0.2 fps, 0.2 fps. * 8 groups of 4 people during a discussion
  21. 21. 5. Eye contact is difficult
  22. 22. Eye contact fires up our brain [Kampe et al. ’01]
  23. 23. Eye contact is difficult. Images: looking into the camera; attempting eye contact
  24. 24. Solutions to eye contact: half-silvered mirror [Rosenthal ’47]; MAJIC [Okada, et al. ’94]; ClearBoard [Ishii, et al. ’92]; GazeMaster [Gemmell, et al. ’00]
  25. 25. A simple solution: Hydra [Sellen, Buxton, and Arnott ’92]
  26. 26. Eye contact sensitivity is high
        • Spatial perception task
        • As good as Snellen acuity
        [Gibson and Pick ’63]
        Figure: looker and observer 2 m apart; plot of eye contact (%, 0 to 100) against angle (deg, -8.5 to 8.5); stdev = 2.8°
        * 6 observers judged 1 looker
  27. 27. Sensitivity is symmetric
        • Cline ’67
        • Kruger and Huckstedt ’69
        • Anstis, et al. ’69
        • Stokes ’69
        • Ellgring ’70
        PicturePhone: camera above display. Hydra: camera below display
  28. 28. Methodology
        • Record lookers gazing at different targets
        • Observers watch videos of looker and judge eye contact
        • Large display with camera at the center
        * Two rooms can be linked in a videoconferencing session
  29. 29. Sensitivity is asymmetric. * 16 observers judged recorded videos of 1 looker
  30. 30. An anatomical explanation. Panels: looking at you; looking sideways; looking up; looking down; eye closing. Illustrations from The Artist’s Guide to Facial Expression [Faigin ’90]
  31. 31. Sensitivity is less in conversation. Plot legend: recorded; conversation. * 16 observers judged videos of 1 looker (down)
  32. 32. Sensitivity is less in video. Plot legend: face-to-face; video. * 16 observers judged 1 looker in conversation (down)
  33. 33. We are biased to perceive contact. Plot: eye contact (%, 0 to 100) against angle; curves: sideway, up; down; down & video; down & video & conversation. Annotations: Snellen Acuity; Conferencing Acuity
  34. 34. Maximum camera to eyes distance
        Device        Minimum viewing distance   Camera to rendered eyes distance
        Palm held     1’                         1.5”
        Desktop       2’                         3”
        Wall size     8’                         12”
        * Assuming a sensitivity of 7°
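
The values on this slide are consistent with simple geometry: the camera may sit as far from the rendered eyes as the offset that subtends the sensitivity angle at the minimum viewing distance. The sketch below assumes that relationship (offset = distance × tan(angle)); the slide does not state the formula, and the function name is hypothetical.

```python
import math


def max_camera_offset(viewing_distance_in, sensitivity_deg=7.0):
    """Largest camera-to-rendered-eyes distance (inches) that stays within
    the given angular sensitivity at the given viewing distance (inches)."""
    return viewing_distance_in * math.tan(math.radians(sensitivity_deg))


# Minimum viewing distances from the slide, converted from feet to inches.
devices = {"Palm held": 1 * 12, "Desktop": 2 * 12, "Wall size": 8 * 12}

for name, distance_in in devices.items():
    print(f"{name:>9}: view from {distance_in} in -> camera within "
          f"{max_camera_offset(distance_in):.1f} in of the rendered eyes")
```

With a 7° sensitivity this gives approximately 1.5”, 2.9”, and 11.8”, close to the rounded 1.5”, 3”, and 12” shown on the slide.
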
  35. 35. Eye contact in the Video Auditorium
  36. 36. Why is videoconferencing essential to distance learning: An evaluation of distance learning at Stanford
  37. 37. Distance learning at Stanford
        • Remote students can call in during class
        • Instructor cannot see the remote students
        Images: a 1969 classroom; a 2002 operator console; a 2002 lecture viewer
  38. 38. Students like distance learning. * 120 students, 15 TAs, and 41 faculty
  39. 39. Learning is less effective. * 120 students, 15 TAs, and 41 faculty
  40. 40. F2F interaction is important: F2F is important for lecturing and crucial for discussions
  41. 41. No interaction with remote students
        • Classroom observation of 4 CS classes
            • Instructor on average asked 9 questions per session
            • Local students on average asked/made 3 questions/comments per session
            • Remote students spoke once in 6 months
  42. 42. Value of video beyond audio
        • Cues only transmitted by the visual channel
            • Negative feedback, …
        • Emotional bond
            • Establishing and maintaining relationships
        • Can you imagine it?
            • A new face, …
  43. 43. A proposal
  44. 44. The world’s largest video wall: link all Internet2 members for Spring 03
        • Developed technology
        • One Mouse
        • AV stream migration
        • Bandwidth: 2 x 300 x (100 Kbps + 10 Kbps) ≈ 60 Mbps
        • Cost: 10 P4 laptops + 10 portable projectors ≈ $30K
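
The bandwidth estimate on this slide can be checked with a few lines of arithmetic. The reading of each factor is an assumption, since the slide does not label them: 2 is taken as the number of directions, 300 as the number of links, 100 Kbps as video per stream, and 10 Kbps as audio per stream. The function name is hypothetical.

```python
def wall_bandwidth_kbps(links=300, video_kbps=100, audio_kbps=10, directions=2):
    """Aggregate bandwidth for the proposed video wall, in Kbps.

    The meaning assigned to each factor is an assumed interpretation of
    the slide's expression 2 x 300 x (100 Kbps + 10 Kbps).
    """
    return directions * links * (video_kbps + audio_kbps)


total_kbps = wall_bandwidth_kbps()
print(f"{total_kbps} Kbps = {total_kbps / 1000:.0f} Mbps")
```

The exact product is 66 Mbps, which the slide rounds to about 60 Mbps.
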
  45. 45. A prediction
  46. 46. Why all videoconferencing products have failed. A plane that does not fly is not a plane (image: first flight, Wrights 1903)
        • A videophone that limits communication is not a videophone
            • poor audio fidelity
            • poor video fidelity
            • excessive latency
            • no eye contact
            • poor lip synchronization
  47. 47. Threshold of quality for the 2nd revolution. 1st Revolution: Possible. 2nd Revolution: Practical. Timeline labels: first mobile phone, 1924; first handheld phone, 1973; first videoconferencing system, 1927
  48. 48. Conclusion
        • Common assumptions
            1. High-fidelity AV requires dedicated hardware → higher on a PC
            2. Difficult to install/use → one click
            3. Life size displays are ideal → 6° to 14°
            4. Floor control requires at least 10 fps → 0.2 fps avg
            5. Eye contact is difficult → 7° down
        • Videoconferencing is essential to distance learning
        • An MCU-less and H323-less future
  49. 49. You already have a one-click high-fidelity multiparty videoconferencing system. We are at the dawn of a videoconferencing revolution that will fuel the demand for a 1000X increase in available bandwidth
  50. 50.
        • Acknowledgement
            • NASA
            • Intel
            • Sony
            • Interval Research
            • Wallenberg Global Learning Network
            • Department of Defense
        • Future work
            • Gold release for Feb 2003
            • SDK
            • The Wall ☺
