Lifelike: a natural replacement for “face-to-face”.
Outline: Lifelike communication; The spatial sound & video communication terminal; Conclusions.
Lifelike communication: what is important? Speech! “Without video you talk, without audio you walk.” Video? Definitely, for non-verbal communication. But the quality must be high: intelligibility, clarity (to limit fatigue), and eye contact. “Lifelike” implies a large screen, hence a distance between people and sensors.
Lifelike communication: important applications. Family and friends connect. Remote health care: doctor at hospital, patient at home (a family member can join); remote patient monitoring; doctor and remote colleague during a medical procedure.
Telepresence systems: high-quality audio and video; multiple users. However: very expensive; little freedom to move; limited applicability. The room is fully conditioned from an acoustic and illumination point of view.
PC video phone clients Free. Great for single user. However: Only small distance to sensors allowed. Audio and video quality in general is not sufficient.
Audio and video enhancements for communication. [Block diagram with the blocks: scene analysis; microphone(s); audio enhancement; speaker(s); audio/video (de-)coding; transmit / receive; camera(s); video enhancement; display(s).]
Communication terminal: spatial sound & video. Lifelike communication. Spatial audio: no fatigue during simultaneous conversations. Spatial video: eye contact. Communication dynamics like in real life. [Figure: participants labeled a, b, c, d.]
Speech clarity. [Figure: impulse response h[n] and curve c[n], plotted against n. The jump in c[n] determines the speech clarity; the slope of c[n] corresponds to the reverberation time (T60).]
Speech clarity. The clarity index is defined as the ratio between direct and diffuse sound. A clarity index of at least 7 dB is needed to avoid listener fatigue. At 4 meters distance in a reverberant room (T60 = 800 ms) this is very difficult to achieve: the direct sound is strongly attenuated, which gives a poor direct/diffuse ratio and hence fatigue. With multi-microphone adaptive beamforming, we achieve 7 dB even in reverberant rooms (T60 = 800 ms).
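The definition above can be made concrete with a minimal sketch. It assumes that c[n] on the preceding slide is the backward-integrated (Schroeder) energy decay curve of the room impulse response h[n], and that the first `split` samples of h[n] count as direct sound; both are assumptions, not statements from the slides. The toy impulse response, its gains, and all function names are hypothetical.

```python
import math

def clarity_index_db(h, split):
    """Direct-to-diffuse energy ratio in dB; h[:split] is treated as direct sound."""
    direct = sum(x * x for x in h[:split])
    diffuse = sum(x * x for x in h[split:])
    return 10.0 * math.log10(direct / diffuse)

def energy_decay_db(h):
    """Assumed c[n]: energy left in h from sample n onward, in dB relative to the total."""
    total = sum(x * x for x in h)
    curve, remaining = [], total
    for x in h:
        curve.append(10.0 * math.log10(remaining / total) if remaining > 0 else float("-inf"))
        remaining -= x * x
    return curve

def synthetic_room(direct_gain, tail_gain, t60_s, fs):
    """Toy impulse response: one direct pulse plus a tail whose energy drops 60 dB in t60_s."""
    n_tail = int(t60_s * fs)
    tail = [tail_gain * 10.0 ** (-3.0 * n / (t60_s * fs)) for n in range(1, n_tail + 1)]
    return [direct_gain] + tail

fs = 8000  # sample rate in Hz (illustrative)
near = synthetic_room(direct_gain=1.0, tail_gain=0.02, t60_s=0.8, fs=fs)
far = synthetic_room(direct_gain=0.5, tail_gain=0.02, t60_s=0.8, fs=fs)  # direct path halved

c_near = clarity_index_db(near, split=1)
c_far = clarity_index_db(far, split=1)
jump_near = -energy_decay_db(near)[1]  # drop in c[n] right after the direct sound

print(f"clarity near: {c_near:.1f} dB, far: {c_far:.1f} dB, jump in c[n]: {jump_near:.1f} dB")
```

In this toy model the diffuse tail does not depend on distance, so halving the direct-path amplitude lowers the clarity index by about 6 dB. That is the mechanism the slide describes for a talker far from the microphones.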
Communication terminal: sound. Two locations with a mono connection. One-to-one communication goes well. Technologies: full-duplex acoustic echo cancellation; noise suppression; clarity index improvement; adaptive beamforming; audio/video person localization. [Figure labels: audio enhancement; audio/video tracking.]
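Of the technologies listed, acoustic echo cancellation is commonly built on an adaptive filter. The sketch below uses a single-channel normalized LMS (NLMS) filter as a stand-in; the slides do not say which algorithm the terminal uses, and the echo path, step size, and function name are illustrative assumptions.

```python
import random

def nlms_echo_canceller(far_end, mic, taps, mu=0.5, eps=1e-8):
    """Adapt an FIR estimate of the loudspeaker-to-microphone echo path.

    far_end: samples sent to the loudspeaker; mic: samples picked up by the microphone.
    Returns the final filter weights and the echo-reduced (residual) signal.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * s for w, s in zip(weights, history))
        error = d - echo_estimate
        power = sum(s * s for s in history) + eps  # normalization term
        weights = [w + mu * error * s / power for w, s in zip(weights, history)]
        residual.append(error)
    return weights, residual

# Hypothetical three-tap echo path and a noise-like far-end signal.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.5, -0.3, 0.1]
mic = [sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
       for n in range(len(far_end))]

weights, residual = nlms_echo_canceller(far_end, mic, taps=3)
```

This demo has no near-end talker and no noise, so the filter can converge to the echo path. A full-duplex system must also handle double talk, which this sketch does not attempt.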
Communication terminal: sound. Two locations with a mono connection. Multi-to-one communication: NOT OK. There is only a mono sound connection, so the listener cannot separate the far-end sound sources, which creates fatigue. [Figure: talkers a, b, c.]
Communication terminal: sound. Multiple locations with mono transmission. Each terminal transmits a mono signal and receives multiple signals. Multi-to-one communication goes well. In the near-end terminal: multiple loudspeakers; multi-channel acoustic echo cancellation; spatial sound achieved by sound panning. Much reduced fatigue. [Figure: talkers a, b, c.]
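Sound panning can be illustrated with a minimal two-loudspeaker sketch. It assumes a constant-power (sine/cosine) pan law and a fixed position per received mono signal; the slides do not specify the pan law or the number of loudspeakers, and the function names and positions are hypothetical.

```python
import math

def constant_power_gains(position):
    """Left/right gains for position in [0, 1] (0 = fully left, 1 = fully right)."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)

def pan_mix(sources):
    """Mix (samples, position) pairs into left and right loudspeaker feeds."""
    length = max(len(samples) for samples, _ in sources)
    left, right = [0.0] * length, [0.0] * length
    for samples, position in sources:
        gain_left, gain_right = constant_power_gains(position)
        for i, x in enumerate(samples):
            left[i] += gain_left * x
            right[i] += gain_right * x
    return left, right

# Three received mono signals, placed left, center, and right (toy sample values).
talker_a = [0.2, 0.4, 0.1]
talker_b = [0.0, 0.3, 0.3]
talker_c = [0.5, 0.0, 0.2]
left_feed, right_feed = pan_mix([(talker_a, 0.1), (talker_b, 0.5), (talker_c, 0.9)])
```

With this pan law the summed power of the two gains is the same at every position, so a talker keeps the same loudness wherever the signal is placed.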
Communication terminal: sound. Two locations with multichannel transmission. Each terminal transmits and receives multiple signals. Multi-to-one communication goes well. In addition to all the previously mentioned technologies, source separation is needed: adaptive microphone array processing (“virtual close-talk microphones”). Each microphone signal contains contributions from a, b, and c; we want to transmit a, b, and c separately. [Figure: talkers a, b, c.]
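The idea of a “virtual close-talk microphone” can be sketched with the simplest array technique, a fixed delay-and-sum beamformer. This is a deliberately simplified stand-in for the adaptive microphone array processing named on the slide: it assumes integer sample delays that are already known (for instance from a tracker), and all names and signals are hypothetical.

```python
def delay_and_sum(mic_signals, delays):
    """Align each microphone signal by its arrival delay (in samples) and average.

    Sound arriving with exactly these delays adds coherently; sound from other
    directions is misaligned and therefore attenuated.
    """
    count = len(mic_signals)
    length = min(len(signal) - delay for signal, delay in zip(mic_signals, delays))
    return [
        sum(signal[i + delay] for signal, delay in zip(mic_signals, delays)) / count
        for i in range(length)
    ]

def pulse(length, index):
    """Unit pulse at the given sample index."""
    signal = [0.0] * length
    signal[index] = 1.0
    return signal

def add(first, second):
    """Sample-wise sum of two equal-length signals."""
    return [x + y for x, y in zip(first, second)]

# Talker a reaches microphone 0 at sample 2 and microphone 1 at sample 5.
# Talker b reaches both microphones at sample 7.
mic0 = add(pulse(12, 2), pulse(12, 7))
mic1 = add(pulse(12, 5), pulse(12, 7))

# Steer toward talker a: a's pulse keeps amplitude 1.0, b's pulse is split and halved.
toward_a = delay_and_sum([mic0, mic1], delays=[2, 5])
```

With only two microphones the suppression of talker b is modest. An adaptive array, as the slide names it, would additionally adjust its weights to the interfering talkers.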
Communication terminal: stereo sound. [Figure with talkers a, b, c, d and the blocks: source separation (a/v tracker); coder; decoder; spatial sound reproduction.]
Eye contact: today’s issue. Drawbacks of traditional display technologies for telepresence: lack of natural eye contact and directional gaze awareness; 2D displays do not offer the sense of physical presence. [Figure: two photos taken at the same time.]
3D displays based on lenticular lenses. [Figure labels: display; lenticular lens; merged views, numbered 1 to 9; left eye view; right eye view.]
Eye contact display with lenticular lenses. A large viewing cone: maximum freedom of movement for the two viewers. A sufficiently large number of views: good depth impression from a binocular cue. Good picture quality: minimize resolution loss.
Eye contact display: input format for rendering. Dual “image + depth” input. 15 views (7 left + 7 right + 1 transition).
Eye contact display based on lenticular lenses. Natural eye gaze awareness: offering multiple perspectives of the remote person using a multi-view display design. Immersive feeling: 3D autostereoscopic technology to maximize the feeling of physical presence. [Figure: (a) view from position A; (b) view from position B.]
Communication terminal: spatial sound & video. Experiences: communication dynamics feel like “real life”. People can talk through each other; casual communication is enabled (no discipline needed). It feels relaxed, with less fatigue after a longer time. [Figure: participants labeled a, b, c, d.]
Conclusions. Lifelike communication is important for Philips: family & friends; doctor & patient (& family); doctor & doctor. Spatial sound and video are an important aspect of lifelike communication. Presented: the spatial sound & video communication terminal.