20021028-Videoconferencing-Chen.ppt
Presentation Transcript

  • Challenging 5 Common Assumptions about Videoconferencing. Milton Chen, Computer Systems Lab, Stanford University. Presented at Internet2 Advanced Applications Track, 10/28/2002.
  • The Stanford Video Auditorium: desktop interface; 15’ x 5’ video wall
  • Video Auditorium publicity/users
    • Intel president Paul Otellini’s Intel Developer Forum keynote
    • Invited demo to NASA headquarters for Paul G. Pastorek
    • CANARIE, Canada
    • CUDI, Mexico
    • Comdex, Brazil
    • IBM Almaden Lab
    • Manhattan College
    • Hopkins Marine Station
    • Stanford Medical School
    • Stanford Learning Lab
    • Stanford Center for Design Research
    • Berkeley Bioengineering Lab
    • Universidade Federal do Rio Grande do Sul, Brazil
  • Outline
    • Common assumptions
      • Technology
        • 1. High-fidelity AV requires dedicated hardware
        • 2. Difficult to install and use
      • Human factors
        • 3. Life size displays are ideal
        • 4. Floor control requires interactive frame rate
        • 5. Eye contact is difficult
    • Beyond MCU and H323
      • Peer-to-peer
      • Stanford’s Port Bootstrap Protocol
      • Personal directory
    • An evaluation of distance learning at Stanford
    • Why videoconferencing is not ubiquitous
  • 1. High-fidelity low-latency AV requires dedicated hardware
  • Your PC outperforms all dedicated systems: a $700 Pentium 4 computer outperforms $7000 systems
  • Comparison of videoconferencing solutions (max video resolution; BW required at 352x288 15fps; max number of links)
    • NetMeeting: 352x288; 200 Kbps
    • Stanford Video Auditorium: 720x480; 100 Kbps
    • Vbrick: 720x480; 2000 Kbps
    • Polycom, Sony, …: 352x288; 200 Kbps; 4
    • WIDE DVTS: 720x480; 3000 Kbps; 1
    • AccessGrid, VRVS: 720x480; 400 Kbps; many
    • Max number of links for NetMeeting, Stanford Video Auditorium, and Vbrick: 1, 1, and 16 to more than 100 (row order not preserved in the transcript)
    • * CUSeeME, iVisit, Yahoo messenger have unacceptable latency
  • demo
  • A scalable AV streaming architecture
    • Audio: capture → compress → send → receive → decompress → render
    • Video: capture → compress → send → receive → decompress → render
    • * TrueSpeech 8.5
    • * MPEG-4
    • * Encrypted, AES (Rijndael), streaming
    • * Simultaneous AV recording
    • * Perceptual streaming adapts to network conditions
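The six stages named on this slide can be traced in a toy sketch. This is not the Video Auditorium code: `zlib` stands in for the real codecs (the slide names TrueSpeech 8.5 and MPEG-4), an in-memory queue stands in for the network, and the names `capture` and `run_pipeline` are hypothetical.

```python
import queue
import zlib


def capture(n_frames):
    """Stand-in capture stage: yields fake raw frames as bytes."""
    for i in range(n_frames):
        yield bytes([i % 256]) * 1024


def run_pipeline(n_frames):
    """Pass frames through capture -> compress -> send -> receive -> decompress -> render."""
    network = queue.Queue()  # assumption: an in-memory queue replaces the real network link
    rendered = []            # assumption: "rendering" just collects the decoded frames

    # Sender side: capture, compress, send.
    for raw in capture(n_frames):
        network.put(zlib.compress(raw))  # zlib is a placeholder codec, not MPEG-4

    # Receiver side: receive, decompress, render.
    while not network.empty():
        packet = network.get()
        rendered.append(zlib.decompress(packet))
    return rendered


frames = run_pipeline(3)
print(len(frames), "frames rendered")
```

In a real system the audio and video chains would run concurrently and the sender and receiver would be separate machines; the sketch only shows the order of the stages.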
  • Beyond MCU and H323
    • MCU vs. peer-to-peer
      • Scalability
      • Ease of deployment
    • H323 vs. Stanford’s Port-Bootstrap Protocol
      • Firewall
      • Ease of deployment
    • Personal directory
  • 2. Videoconferencing systems are difficult to install and use
  • One click operation
    • To use the Video Auditorium
      • “Nothing” to install
      • One click on the html speed dial
        • <OBJECT
        • CLASSID="CLSID:E80F7B8F-7906-4A89-B59E-B19871F474A9"
        • CODEBASE="runtime/VA_Start.ocx#Version=-1,-1,-1,-1">
        • <PARAM NAME="addr" VALUE="stanford -client_only">
        • </OBJECT>
    Makes conferencing as simple as surfing the web
  • 3. Life size displays are ideal
  • Each video should be between 6° and 14° wide
      • * 12 people sat 10’ from the display. Subjectively, people reported 6° as minimum and 14° as ideal. Life size is 12°.
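To make these angles concrete, a visual angle can be converted to a physical width at the 10-foot viewing distance reported on the slide. The sketch below uses the standard relation width = 2 · d · tan(θ/2); the function name is hypothetical and the conversion is not part of the original study.

```python
import math


def width_for_visual_angle(angle_deg, distance):
    """Width of an object subtending angle_deg at the given distance (same units as distance)."""
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)


VIEWING_DISTANCE_FT = 10.0  # from the slide: viewers sat 10 feet from the display

for label, angle in [("minimum", 6.0), ("life size", 12.0), ("ideal", 14.0)]:
    width_in = width_for_visual_angle(angle, VIEWING_DISTANCE_FT) * 12.0
    print(f"{label}: {angle:.0f} deg -> {width_in:.1f} in wide")
```

At 10 feet this works out to roughly 12.6 inches for 6°, 25.2 inches for 12°, and 29.5 inches for 14°.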
  • Balance between size and head movements. * 12 people viewed 9 and 36 students on a large and immersive display. Immersive display requires head movements to see all the students. [Figure labels: 9°, 14°, 7°, 4°]
  • 4. Effective floor control requires interactive frame rate
  • Minimum required frame rate
    • Interactive 10 fps
    • Tolerable 5 fps
      • [Tang and Isaac ’93]
    • Lip synchronization 5 fps
      • [Watson and Sasse ’96]
    • Content understanding 5 fps
      • [Ghinea and Thomas ’98]
    • Sign language recognition 1 fps
      • [Johnson and Caird ’96]
  • Gesture Detection Algorithm. Visualization of algorithm: input image → frame difference → after erosion
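The slide names only the two steps, frame difference and erosion. A minimal sketch of that idea follows, assuming greyscale frames stored as lists of rows, a 3x3 erosion window, and illustrative thresholds; none of these parameters or function names come from the talk.

```python
def frame_difference(prev, curr, threshold):
    """Mark pixels whose grey level changed by more than threshold."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def erode3x3(mask):
    """Binary erosion: a pixel survives only if its whole 3x3 neighbourhood is set.

    Border pixels are always cleared, which removes isolated noise pixels.
    """
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = all(mask[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out


def gesture_detected(prev, curr, threshold=30, min_pixels=20):
    """Assumed decision rule: enough changed pixels remain after erosion."""
    cleaned = erode3x3(frame_difference(prev, curr, threshold))
    return sum(sum(row) for row in cleaned) >= min_pixels


def blank_frame(h=24, w=32):
    return [[0] * w for _ in range(h)]


background = blank_frame()

noisy = blank_frame()          # two isolated changed pixels (sensor noise)
noisy[3][4] = 255
noisy[20][30] = 255

waving = blank_frame()         # a solid 8x10 block of change (a moving hand)
for y in range(8, 16):
    for x in range(10, 20):
        waving[y][x] = 200

print(gesture_detected(background, noisy))   # False: erosion removes the isolated pixels
print(gesture_detected(background, waving))  # True: 48 pixels survive erosion
```

A sender could use such a test to transmit a new frame only when a gesture is detected, which is one way to read the low average frame rates quoted on the following slides.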
  • Requires 10% of full motion bandwidth: full-motion (10 fps) vs. gesture-sensitive (0.2 fps). * MPEG4 encoded at 320x240
  • Gesture sensitive allows dynamic discussion. [Figure labels: 15 fps, ~0.2 fps, 0.2 fps] * 8 groups of 4 people during a discussion
  • 5. Eye contact is difficult
  • Eye contact fires up our brain [Kampe et al. ’01]
  • Eye contact is difficult: looking into the camera vs. attempting eye contact
  • Solutions to eye contact: Half-silvered mirror [Rosenthal ’47]; MAJIC [Okada, et al. ’94]; ClearBoard [Ishii, et al. ’92]; GazeMaster [Gemmell, et al. ’00]
  • A simple solution: Hydra [Sellen, Buxton, and Arnott ’92]
  • Eye contact sensitivity is high
    • Spatial perception task
    • As good as Snellen acuity
    [Gibson and Pick ’63] * 6 observers judged 1 looker. [Figure: looker and observer, 2 m; eye contact (%), 0 to 100, vs. angle (deg), -8.5 to 8.5; stdev = 2.8°]
  • Sensitivity is symmetric
    • Cline ’67
    • Kruger and Huckstedt ’69
    • Anstis, et al. ’69
    • Stokes ’69
    • Ellgring ’70
    PicturePhone: camera above display. Hydra: camera below display.
  • Methodology
    • * Two rooms can be linked in a videoconferencing session
    Observers watch videos of looker and judge eye contact. Large display with camera at the center. Record lookers gazing at different targets.
  • Sensitivity is asymmetric. * 16 observers judged recorded videos of 1 looker
  • An anatomical explanation: looking at you, looking sideways, looking up, looking down, eye closing. Illustrations from The Artist’s Guide to Facial Expression [Faigin ’90]
  • Sensitivity is less in conversation. * 16 observers judged videos of 1 looker (down). [Figure labels: recorded, conversation]
  • Sensitivity is less in video. * 16 observers judged 1 looker in conversation (down). [Figure labels: face-to-face, video]
  • We are biased to perceive contact. [Figure: eye contact (%), 0 to 100, vs. angle, for sideway, up; down; down & video; down & video & conversation; Snellen Acuity to Conferencing Acuity]
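  • Maximum camera to eyes distance (* assuming a sensitivity of 7°)
    • Wall size: minimum viewing distance 8’; camera to rendered eyes distance 12”
    • Desktop: minimum viewing distance 2’; camera to rendered eyes distance 3”
    • Palm held: minimum viewing distance 1’; camera to rendered eyes distance 1.5”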
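These distances are consistent with simple trigonometry: if a gaze offset of up to 7° still reads as eye contact, the camera can sit up to d · tan(7°) from the rendered eyes at viewing distance d. The sketch below assumes that geometry, which the slide does not state explicitly; the function name is hypothetical.

```python
import math

SENSITIVITY_DEG = 7.0  # from the slide: assumed eye contact sensitivity


def max_camera_offset_in(viewing_distance_ft, sensitivity_deg=SENSITIVITY_DEG):
    """Largest camera-to-rendered-eyes distance (inches) that stays within the sensitivity angle."""
    return viewing_distance_ft * 12.0 * math.tan(math.radians(sensitivity_deg))


for device, distance_ft in [("Wall size", 8.0), ("Desktop", 2.0), ("Palm held", 1.0)]:
    print(f"{device}: {distance_ft:.0f} ft -> {max_camera_offset_in(distance_ft):.1f} in")
```

Under this assumption the results are about 11.8, 2.9, and 1.5 inches, which match the slide's 12”, 3”, and 1.5” after rounding.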
  • Eye contact in the Video Auditorium
  • Why is videoconferencing essential to distance learning: An evaluation of distance learning at Stanford
  • Distance learning at Stanford
    • Remote students can call in during class
    • Instructor cannot see the remote students
    [Images: a 1969 classroom; a 2002 operator console; a 2002 lecture viewer]
  • Students like distance learning. * 120 students, 15 TAs, and 41 faculty
  • Learning is less effective. * 120 students, 15 TAs, and 41 faculty
  • F2F interaction is important: F2F is important for lecturing and crucial for discussions
  • No interaction with remote students
    • Classroom observation of 4 CS classes
      • Instructor on average asked 9 questions per session
      • Local students on average asked/made 3 questions/comments per session
      • Remote students spoke once in 6 months
  • Value of video beyond audio
    • Cues only transmitted by the visual channel
      • Negative feedback, …
    • Emotional bond
      • Establishing and maintaining relationships
    • Can you imagine it?
      • A new face, …
  • A proposal
  • The world’s largest video wall: link all Internet2 members for Spring 03
    • Developed technology
    • One Mouse
    • AV stream migration
    • Bandwidth: 2 x 300 x (100 Kbps + 10 Kbps) ≈ 60 Mbps
    • Cost: 10 P4 laptops + 10 portable projectors ≈ $30K
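The bandwidth figure can be checked with a few lines of arithmetic. The sketch below takes the three factors as they appear on the slide; the parameter names are hypothetical, and the slide does not label what the factor of 2 or the two per-stream rates represent (presumably video and audio).

```python
def wall_bandwidth_mbps(factor=2, sites=300, stream_a_kbps=100, stream_b_kbps=10):
    """Total bandwidth in Mbps for the proposed wall, using the slide's factors as given."""
    total_kbps = factor * sites * (stream_a_kbps + stream_b_kbps)
    return total_kbps / 1000.0


print(wall_bandwidth_mbps(), "Mbps")
```

The product is 66 Mbps, which the slide rounds to 60 Mbps.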
  • A prediction
  • Why all videoconferencing products have failed: a plane that does not fly is not a plane (first flight, Wrights, 1903)
    • A videophone that limits communication is not a videophone
      • poor audio fidelity
      • poor video fidelity
      • excessive latency
      • no eye contact
      • poor lip synchronization
  • Threshold of quality for the 2nd revolution: first mobile phone, 1924; first handheld phone, 1973; first videoconferencing system, 1927. 1st Revolution: Possible. 2nd Revolution: Practical.
  • Conclusion
    • Common assumptions
    • 1. High-fidelity AV requires dedicated hardware → higher on a PC
    • 2. Difficult to install/use → one click
    • 3. Life size displays are ideal → 6° to 14°
    • 4. Floor control requires at least 10 fps → 0.2 fps avg
    • 5. Eye contact is difficult → 7° down
    • Videoconferencing is essential to distance learning
    • An MCU-less and H323-less future
  • You already have a one-click high-fidelity multiparty videoconferencing system. We are at the dawn of a videoconferencing revolution that will fuel the demand for a 1000X increase in available bandwidth.
    • Acknowledgement
      • NASA
      • Intel
      • Sony
      • Interval Research
      • Wallenberg Global Learning Network
      • Department of Defense
    • Future work
      • Gold release for Feb 2003
      • SDK
      • The Wall 