Harm Belt & Kees Janse - Lifelike Communication - front-end audio and video technologies



  1. 1. Lifelike Communication: Front-end audio and video technologies
     Harm Belt and Kees Janse, Philips Research, Eindhoven, The Netherlands
     iMinds, Ghent, Belgium, 16 December 2010.
  2. 2. Philips defined: we are…
     “…a global company of leading businesses creating value with meaningful innovations that improve people’s health and well-being.”
     Healthcare, Lighting, Consumer lifestyle
  3. 3. Communication
     • A fundamental social process
     • A basic human need
     Social support, belonging, love, friendship, intimacy, connection, sharing, being near friends and family, feeling secure, …
  5. 5. Communication is part of our lifestyle
     We have the means to communicate any time, anywhere, but it is not natural yet. We want to communicate
     • without being bothered by the equipment,
     • feeling free,
     • lifelike: a natural replacement of “face-to-face”.
  7. 7. Outline
     Lifelike Communication
     The spatial sound & video communication terminal
     Conclusions
  8. 8. Lifelike communication - what is important?
     Speech! “Without video you talk, without audio you walk”
     Video? Definitely! Non-verbal communications
     But the quality must be high: intelligibility, clarity (fatigue), eye contact
     “Lifelike” implies a large screen, hence a distance between people and sensors.
  9. 9. Lifelike communication: Important applications
     Family and Friends connect
     Remote health care
     Doctor at hospital, patient at home
     Family member can join
     Remote patient monitoring
     Doctor and remote colleague during medical procedure
  10. 10. Telepresence systems
      High quality audio and video
      Multiple users
      However:
      Very expensive
      Little freedom to move
      Limited applicability
      Room is fully conditioned from an acoustic and illumination point of view.
  11. 11. PC video phone clients
      Free.
      Great for single user.
      However:
      Only small distance to sensors allowed.
      Audio and video quality in general is not sufficient.
  12. 12. Audio Video Enhancements for Communication
      [Block diagram. Blocks: microphone(s), camera(s), scene analysis, audio enhancement, video enhancement, audio/video (de-)coding, transmit / receive, speaker(s), display(s)]
  13. 13. Communication terminal – spatial sound & video
      [Figure: participants labeled a, b, c, d]
      Lifelike communication
      Spatial audio: no fatigue during simultaneous conversations
      Spatial video: eye contact
      Communication dynamics like in real life.
  14. 14. Communication terminal - Spatial Sound -
  15. 15. Acoustic impulse response
      A typical acoustic impulse response
      [Figure: sound source and microphone, with the direct sound component and the diffuse sound component]
  16. 16. Energy decay curve
      [Plot: h[n] and c[n] versus n]
  17. 17. Speech clarity
      [Plot: h[n] and c[n] versus n]
      This jump in c[n] determines the speech clarity
      Slope: reverberation time (T60)
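The slides do not say how the energy decay curve c[n] is computed from the impulse response h[n]. A minimal sketch, assuming the common Schroeder backward integration (the energy remaining in h from sample n onward, in dB relative to the total) and assuming T60 is estimated by extrapolating the slope between -5 dB and -25 dB to a 60 dB decay. The function names, the sampling rate `fs` and the synthetic response are illustrative choices, not taken from the presentation.

```python
import math


def energy_decay_curve_db(h):
    """Return c[n] in dB: energy of h from sample n onward, relative to the total."""
    total = sum(x * x for x in h)
    if total <= 0.0:
        raise ValueError("impulse response has no energy")
    curve = []
    remaining = total
    for x in h:
        # The floor keeps log10 defined if rounding drives `remaining` to zero.
        curve.append(10.0 * math.log10(max(remaining, 1e-300) / total))
        remaining -= x * x
    return curve


def estimate_t60(curve_db, fs, hi_db=-5.0, lo_db=-25.0):
    """Estimate T60 in seconds by extrapolating the slope between hi_db and lo_db."""
    n_hi = next(n for n, c in enumerate(curve_db) if c <= hi_db)
    n_lo = next(n for n, c in enumerate(curve_db) if c <= lo_db)
    slope_db_per_sample = (curve_db[n_lo] - curve_db[n_hi]) / (n_lo - n_hi)
    return -60.0 / slope_db_per_sample / fs


# Synthetic check: an exponentially decaying response built to have T60 = 0.5 s.
fs = 8000
t60_true = 0.5
decay = 3.0 * math.log(10.0) / (t60_true * fs)  # amplitude falls 60 dB in t60_true
h = [math.exp(-decay * n) for n in range(fs)]
t60_est = estimate_t60(energy_decay_curve_db(h), fs)
```

On a measured response with a strong direct component, the same curve would also show the initial jump that the slide relates to speech clarity; the synthetic response here contains only the decaying tail.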
  18. 18. Speech clarity
      The clarity index is defined by the ratio between direct and diffuse sound.
      A clarity index of at least 7 dB is needed to avoid listener’s fatigue.
      At 4 meters distance in a reverberant room (T60 = 800 ms) this is very difficult to achieve.
      The direct sound is strongly attenuated, giving a bad direct/diffuse ratio -> fatigue
      Multi-microphone adaptive beamforming
      We achieve 7 dB even in reverberant rooms (T60 = 800 ms)
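The ratio described on this slide can be sketched as an energy ratio in dB. This is an illustration under stated assumptions: the impulse response is split at a sample index `split` (a hypothetical parameter; the slides do not say where the direct part ends), and the two toy responses are invented to show how attenuating only the direct path lowers the index.

```python
import math


def clarity_index_db(h, split):
    """Direct-to-diffuse energy ratio in dB; samples before `split` count as direct."""
    direct = sum(x * x for x in h[:split])
    diffuse = sum(x * x for x in h[split:])
    if direct <= 0.0 or diffuse <= 0.0:
        raise ValueError("both parts need non-zero energy")
    return 10.0 * math.log10(direct / diffuse)


# Toy responses: one direct-sound sample followed by a flat diffuse tail.
near = [1.0] + [0.1] * 20  # direct energy 1.0, diffuse energy 0.2
far = [0.5] + [0.1] * 20   # direct path at half the amplitude, same tail
ci_near = clarity_index_db(near, split=1)
ci_far = clarity_index_db(far, split=1)
```

Halving the direct amplitude while the diffuse tail stays the same lowers the index by about 6 dB, which is the effect the slide points to when the talker moves away from the microphone.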
  19. 19. Multiple microphones – improving clarity index
      Simple delay-and-sum beamforming
      [Figure: sound source and a summation node (+)]
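A minimal sketch of delay-and-sum beamforming, assuming the propagation delay from the source to each microphone is known and is a whole number of samples. Real arrays usually need fractional delays and delay estimation, which are left out; the signal values and delays below are made up for illustration.

```python
def delay_and_sum(mic_signals, delays):
    """Advance each microphone signal by its integer delay (in samples) and average."""
    if not mic_signals or len(mic_signals) != len(delays):
        raise ValueError("need one delay per microphone signal")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    n_out = min(len(x) - d for x, d in zip(mic_signals, delays))
    return [
        sum(x[n + d] for x, d in zip(mic_signals, delays)) / len(mic_signals)
        for n in range(n_out)
    ]


# Three microphones receive the same source with different propagation delays.
source = [0.0, 1.0, 0.0, -1.0, 2.0, 0.0]
delays = [0, 2, 5]
total = len(source) + max(delays)
mics = [[0.0] * d + source + [0.0] * (total - len(source) - d) for d in delays]
aligned = delay_and_sum(mics, delays)
```

After alignment the direct sound adds coherently and `aligned` reproduces `source`, whereas components that differ between microphones, such as diffuse reverberation, tend to average down. That is why the technique can raise the direct/diffuse ratio.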
  20. 20. Communication terminal - sound
      Two locations with mono connection
      One-to-one communication goes well.
      Technologies
      Full-duplex Acoustic Echo Cancellation
      Noise Suppression
      Clarity index improvement
      Adaptive beamforming
      Audio/video person localization
      [Diagram blocks: audio enhancement, audio/video tracking]
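The slides name acoustic echo cancellation without giving an algorithm. As a rough illustration only, here is a single-channel NLMS (normalized least mean squares) adaptive filter, a widely used textbook approach and not necessarily what the authors implemented. It has no double-talk handling, so it does not cover the full-duplex case; `taps`, `mu` and `eps` are hypothetical tuning parameters.

```python
def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the microphone signal.

    Returns (residual, weights), where residual[n] is the echo-reduced sample
    and weights is the final estimate of the echo path.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * u for w, u in zip(weights, history))
        error = d - echo_estimate
        # Normalizing by the input energy keeps the step size scale-independent.
        step = mu * error / (sum(u * u for u in history) + eps)
        weights = [w + step * u for w, u in zip(weights, history)]
        residual.append(error)
    return residual, weights
```

With a noise-like far-end signal and no near-end speech, the weights converge to the loudspeaker-to-microphone echo path and the residual decays toward zero.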
  21. 21. Communication terminal - sound
      Two locations with mono connection
      Multi-to-one communication: NOT OK
      There is only a mono sound connection.
      Far-end sound sources cannot be separated by the listener, which creates fatigue
      [Figure: participants a, b, c]
  22. 22. Communication terminal - sound
      Multiple locations with mono transmission
      Each terminal transmits a mono signal, and receives multiple signals.
      Multi-to-one communication goes well.
      In the near-end terminal
      Multiple loudspeakers
      Multi-channel Acoustic Echo Cancellation
      Spatial sound is achieved by sound panning
      Much reduced fatigue
      [Figure: participants a, b, c]
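Sound panning of several received mono signals can be sketched with constant-power amplitude panning over two loudspeakers. The slide only says "multiple loudspeakers" and "sound panning", so the two-speaker layout, the sine/cosine pan law and the source positions are assumptions made for illustration.

```python
import math


def pan_gains(position):
    """Constant-power gains (left, right) for position 0.0 (left) to 1.0 (right)."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def pan_mix(far_end_signals, positions):
    """Mix mono far-end signals into a (left, right) pair of loudspeaker feeds."""
    length = min(len(s) for s in far_end_signals)
    left = [0.0] * length
    right = [0.0] * length
    for signal, position in zip(far_end_signals, positions):
        g_left, g_right = pan_gains(position)
        for n in range(length):
            left[n] += g_left * signal[n]
            right[n] += g_right * signal[n]
    return left, right


# Far-end sources a, b and c placed left, centre and right (hypothetical positions).
left, right = pan_mix([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.5, 1.0])
```

Because the squared gains always sum to one, a source keeps the same total power wherever it is placed, and each far-end terminal gets its own apparent direction for the listener.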
  23. 23. Communication terminal – sound
      Two locations with multichannel transmission
      Each terminal transmits and receives multiple signals.
      Multi-to-one communication goes well.
      In addition to all the previously mentioned technologies, source separation is needed
      Adaptive microphone array processing
      “virtual close talk microphones”
      Each microphone signal contains contributions from a, b, and c. We want to transmit a, b, and c separately.
      [Figure: participants a, b, c]
  24. 24. Communication terminal – stereo sound
      [Diagram: participants a, b, c, d; blocks: source separation (a/v tracker), coder, decoder, spatial sound reproduction]
  25. 25. Communication terminal - Spatial Video -
  26. 26. Eye contact: Today’s issue
      Drawback of traditional display technologies for Telepresence:
      Lack of natural eye contact and directional gaze awareness;
      2D displays do not offer the sense of physical presence.
      [Figure: two photos taken at the same time]
  27. 27. Eye contact telepresence: EU FP7 3DPresence
  28. 28. Eye contact display: EU FP7 3DPresence
  29. 29. 3D displays based on lenticular lenses
      [Figure: display with lenticular lens; views numbered 1 to 9; left eye view, right eye view, merged views]
  30. 30. Eye contact display with lenticular lenses
      A large viewing cone: maximum freedom of movement for the two viewers.
      A sufficiently large number of views: good depth impression from a binocular cue.
      Good picture quality: minimize resolution loss.
      • 15 views
      • 46 degree viewing cone
      • Slant 1/6
  32. 32. Eye contact display with lenticular lenses
      Lens design
      [Figure: lens design, 46°]
  33. 33. Eye contact display: Input format for rendering
      Dual “image + depth” input -> 15 views (7 left + 7 right + 1 transition)
  34. 34. Eye contact display: Based on lenticular lenses
      Natural eye gaze awareness: offering multiple perspectives of the remote person using multi-view display design.
      Immersive feeling: 3D autostereoscopic technology to maximize the feeling of physical presence.
      [Figure: (a) View from position A, (b) View from position B]
  35. 35. Communication terminal – spatial sound & video
      [Figure: participants labeled a, b, c, d]
      Experiences
      Communication dynamics feel like “real life”
      People can talk over each other; casual communication is enabled (no discipline needed)
      Feels relaxed, less fatigue after a longer time
  36. 36. Conclusions
      Lifelike communication is important for Philips
      Family & friends
      Doctor & patient (& family)
      Doctor & doctor
      Lifelike communication
      Spatial sound and video is an important aspect
      Presented: The spatial sound & video communication terminal