Harm Belt & Kees Janse - Lifelike Communication - front-end audio and video technologies

Transcript

  • 1. Lifelike Communication: Front-end audio and video technologies. Harm Belt and Kees Janse, Philips Research, Eindhoven, The Netherlands. iMinds, Ghent, Belgium, 16 December 2010.
  • 2. Philips defined: we are…
    “…a global company of leading businesses creating value with meaningful innovations that improve people’s health and well-being.”
    Healthcare, Lighting, Consumer lifestyle
  • 3. Communication
    • A fundamental social process
    • A basic human need
    Social support, belonging, love, friendship, intimacy, connection, sharing, being near friends and family, feeling secure, …
  • 5. Communication is part of our lifestyle
    We have means to communicate, any time anywhere,
    but it is not natural yet. We want to communicate
    • without being bothered by the equipment,
    • feeling free.
    • Lifelike, a natural replacement of “face-to-face”.
  • Outline
    Lifelike Communication
    The spatial sound & video communication terminal
    Conclusions
  • 8. Lifelike communication - what is important?
    Speech!
    “Without video you talk,
    without audio you walk”
    Video? Definitely!
    Non-verbal communications
    But the quality must be high
    Intelligibility
    Clarity (fatigue)
    Eye contact
    “Lifelike” implies a large screen, hence a distance between people and sensors.
  • 9. Lifelike communication: Important applications
    Family and Friends connect
    Remote health care
    Doctor at hospital, patient at home
    Family member can join
    Remote patient monitoring
    Doctor and remote colleague during medical procedure
  • 10. Telepresence systems
    High quality audio and video
    Multiple users
    However:
    Very expensive
    Little freedom to move
    Limited applicability
    Room is fully conditioned from an acoustic and illumination point of view.
  • 11. PC video phone clients
    Free.
    Great for single user.
    However:
    Only small distance to sensors allowed.
    Audio and video quality in general is not sufficient.
  • 12. Audio Video Enhancements for Communication
    [Block diagram with the blocks: microphone(s), speaker(s), audio enhancement, scene analysis, camera(s), display(s), video enhancement, audio/video (de-)coding, transmit/receive]
  • 13. Communication terminal – spatial sound & video
    [Figure: participants a, b, c, d]
    Lifelike communication
    Spatial audio: no fatigue during simultaneous conversations
    Spatial video: eye contact
    Communication dynamics like in real life.
  • 14. Communication terminal
    - Spatial Sound -
  • 15. Acoustic impulse response
    A typical acoustic impulse response
    [Figure: sound source and microphone, with the direct sound component and the diffuse sound component]
  • 16. Energy decay curve
    [Plot: impulse response h[n] and energy decay curve c[n] versus n]
  • 17. Speech clarity
    This jump in c[n] determines the speech clarity.
    Slope: reverberation time (T60).
    [Plot: impulse response h[n] and energy decay curve c[n] versus n]
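The slides do not state how c[n] is computed. The following is a minimal sketch, assuming c[n] is the Schroeder backward integration of the squared impulse response h[n], and assuming T60 is read from a straight-line fit to the decay. The fit range of -5 dB to -35 dB and the function names `energy_decay_curve_db` and `estimate_t60` are illustrative choices, not taken from the presentation.

```python
import math

def energy_decay_curve_db(h):
    """Assumed definition: c[n] = 10*log10(sum_{k>=n} h[k]^2 / sum_k h[k]^2)."""
    if not h:
        raise ValueError("empty impulse response")
    tail = 0.0
    tails = []
    for x in reversed(h):  # accumulate the remaining energy from the end backwards
        tail += x * x
        tails.append(tail)
    tails.reverse()
    total = tails[0]
    if total == 0.0:
        raise ValueError("impulse response has no energy")
    return [10.0 * math.log10(t / total) if t > 0.0 else float("-inf")
            for t in tails]

def estimate_t60(h, fs, hi_db=-5.0, lo_db=-35.0):
    """Least-squares line through c[n] between hi_db and lo_db (assumed range),
    extrapolated to a 60 dB decay. fs is the sample rate in Hz."""
    edc = energy_decay_curve_db(h)
    pts = [(n / fs, c) for n, c in enumerate(edc) if lo_db <= c <= hi_db]
    if len(pts) < 2:
        raise ValueError("not enough decay range for a fit")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_c = sum(c for _, c in pts) / len(pts)
    num = sum((t - mean_t) * (c - mean_c) for t, c in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    slope_db_per_s = num / den  # negative for a decaying response
    return -60.0 / slope_db_per_s
```

For a synthetic, exponentially decaying h[n], the estimate should land close to the decay time used to generate it.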
  • 18. Speech clarity
    The clarity index is defined by the ratio between direct and diffuse sound.
    A clarity index of at least 7 dB is needed to avoid listener fatigue.
    At 4 meters distance in a reverberant room (T60 = 800 ms) this is very difficult to achieve:
    the direct sound is strongly attenuated, giving a poor direct/diffuse ratio and hence fatigue.
    With multi-microphone adaptive beamforming
    we achieve 7 dB even in reverberant rooms (T60 = 800 ms).
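The direct-to-diffuse ratio can be sketched in code. The slides do not say where the direct part of the impulse response ends, so this sketch assumes everything up to a short window after the strongest peak counts as direct sound. The 5 ms default and the name `clarity_index_db` are hypothetical.

```python
import math

def clarity_index_db(h, fs, direct_window_ms=5.0):
    """Ratio of direct to diffuse energy in dB.

    Assumption: samples up to direct_window_ms after the strongest peak
    are "direct"; all later samples are "diffuse".
    """
    if not h:
        raise ValueError("empty impulse response")
    peak = max(range(len(h)), key=lambda n: abs(h[n]))
    split = peak + int(round(direct_window_ms * 1e-3 * fs)) + 1
    direct = sum(x * x for x in h[:split])
    diffuse = sum(x * x for x in h[split:])
    if diffuse == 0.0:
        return float("inf")  # no diffuse energy at all
    if direct == 0.0:
        return float("-inf")
    return 10.0 * math.log10(direct / diffuse)
```

A measured or beamformed response could then be compared against the 7 dB target mentioned on the slide.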
  • 19. Multiple microphones – improving clarity index
    Simple delay-and-sum beamforming
    [Figure: sound source and summation (+) of the microphone signals]
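A minimal sketch of simple delay-and-sum beamforming, assuming the per-microphone arrival delays toward the talker are already known as whole numbers of samples. The function name and the integer-delay simplification are assumptions; this is the fixed, non-adaptive variant named on the slide, not the adaptive beamformer the authors report results for.

```python
import math

def delay_and_sum(mic_signals, delays):
    """Time-align each microphone signal and average the result.

    mic_signals: list of equal-rate sample lists, one per microphone.
    delays: non-negative integer arrival delay (in samples) per microphone.
    Channel m is advanced by delays[m] so the source adds up coherently,
    while sound from other directions adds up incoherently.
    """
    if len(mic_signals) != len(delays) or not mic_signals:
        raise ValueError("need one delay per microphone")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    n_out = min(len(x) - d for x, d in zip(mic_signals, delays))
    count = len(mic_signals)
    return [sum(x[n + d] for x, d in zip(mic_signals, delays)) / count
            for n in range(n_out)]
```

With the correct delays, the direct sound is preserved while diffuse sound is averaged down, which is how the array raises the direct/diffuse ratio.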
  • 20. Communication terminal - sound
    Two locations with mono connection
    One-to-one communication goes well.
    Technologies
    Full-duplex Acoustic Echo Cancellation
    Noise Suppression
    Clarity index improvement
    Adaptive beamforming
    Audio/video person localization
    [Figure: audio enhancement and audio/video tracking blocks]
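The slides list acoustic echo cancellation without naming an algorithm. As an illustration only, the sketch below uses a normalized LMS (NLMS) adaptive filter, a common textbook choice. The helper names, the 8-tap filter length, and the toy echo path are hypothetical, and double-talk handling (needed for true full-duplex operation) is not covered.

```python
import random

def white_noise(num_samples, seed=0):
    """Hypothetical far-end test signal."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]

def simulate_echo(far_end, echo_path):
    """Toy room model: microphone signal = far_end convolved with echo_path."""
    mic = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, g in enumerate(echo_path):
            if n - k >= 0:
                acc += g * far_end[n - k]
        mic.append(acc)
    return mic

def nlms_echo_canceller(far_end, mic, num_taps=8, mu=0.5, eps=1e-8):
    """Adapt an FIR estimate of the echo path and subtract the estimated echo.

    Returns (residual, weights): the echo-reduced microphone signal and
    the final filter coefficients.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimated echo
        e = d - y                                    # signal after cancellation
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w
```

In this noise-free toy setup the filter coefficients converge toward the simulated echo path; a real terminal would also have near-end speech and noise in the microphone signal.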
  • 21. Communication terminal - sound
    Two locations with mono connection
    Multi-to-one communication: NOT OK
    There is only a mono sound connection.
    Far-end sound sources cannot be separated by the listener, which creates fatigue.
    [Figure: participants a, b, c]
  • 22. Communication terminal - sound
    Multiple locations with mono transmission
    Each terminal transmits a mono signal,
    and receives multiple signals.
    Multi-to-one communication goes well.
    In the near-end terminal
    Multiple loudspeakers
    Multi-channel Acoustic Echo Cancellation
    Spatial sound is achieved by sound panning
    Much reduced fatigue
    [Figure: participants a, b, c]
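A minimal sketch of sound panning for two loudspeakers, where each received mono signal is placed at its own position. The slides do not specify a panning law, so the constant-power sine/cosine law, the position range of -1 (left) to +1 (right), and the function names are assumptions.

```python
import math

def pan_gains(position):
    """Constant-power gains (left, right) for a position in [-1, 1]."""
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be in [-1, 1]")
    angle = (position + 1.0) * math.pi / 4.0  # 0 (left) to pi/2 (right)
    return math.cos(angle), math.sin(angle)

def mix_far_end(signals, positions):
    """Pan each far-end mono signal to its position and sum per loudspeaker."""
    if len(signals) != len(positions) or not signals:
        raise ValueError("need one position per signal")
    n = min(len(s) for s in signals)
    left = [0.0] * n
    right = [0.0] * n
    for s, p in zip(signals, positions):
        g_left, g_right = pan_gains(p)
        for i in range(n):
            left[i] += g_left * s[i]
            right[i] += g_right * s[i]
    return left, right
```

Giving each far-end location a distinct position lets the listener separate simultaneous talkers by direction.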
  • 23. Communication terminal – sound
    Two locations with multichannel transmission
    Each terminal transmits and receives multiple signals.
    Multi-to-one communication goes well.
    In addition to all the previously mentioned technologies, source separation is needed:
    Adaptive microphone array processing
    “virtual close talk microphones”
    Each microphone signal contains contributions from a, b, and c.
    We want to transmit a, b, and c separately.
    [Figure: participants a, b, c]
  • 24. Communication terminal – stereo sound
    [Figure: participants a, b, c, d; blocks: source separation (a/v tracker), coder, decoder, spatial sound reproduction]
  • 25. Communication terminal
    - Spatial Video -
  • 26. Eye contact: Today’s issue
    Drawback of traditional display technologies for Telepresence:
    Lack of natural eye contact and directional gaze awareness;
    2D displays do not offer the sense of physical presence.
    Two photos taken at the same time
  • 27. Eye contact telepresence: EU FP7 3DPresence
  • 28. Eye contact display: EU FP7 3DPresence
  • 29. 3D displays based on lenticular lenses
    [Figure: display, lenticular lens, merged views numbered 1 to 9, left eye view, right eye view]
  • 30. Eye contact display with lenticular lenses
    A large viewing cone: maximum freedom of movement for the two viewers.
    A sufficiently large number of views: good depth impression from a binocular cue.
    Good picture quality: minimize resolution loss.
  • Eye contact display with lenticular lenses
    Lens design
    46°
  • 33. Eye contact display: Input format for rendering
    Dual “image + depth” input → 15 views (7 left + 7 right + 1 transition)
  • 34. Eye contact display: Based on lenticular lenses
    Natural eye gaze awareness: Offering multiple perspectives of the remote person using multi-view display design.
    Immersive feeling: 3D autostereoscopic technology to maximize the feeling of physical presence.
    (a) View from position A
    (b) View from position B
  • 35. Communication terminal – spatial sound & video
    [Figure: participants a, b, c, d]
    Experiences
    Communication dynamics feel like “real life”
    People can talk through each other, casual communication enabled (no discipline needed)
    Feels relaxed, less fatigue after longer time
  • 36. Conclusions
    Lifelike communication is important for Philips
    Family & friends
    Doctor & patient (& family)
    Doctor & doctor
    Lifelike communication
    Spatial sound and video are an important aspect
    Presented: The spatial sound & video communication terminal