  • Echo and noise need to be addressed for a truly HD experience, but device design and the added audio bandwidth make the task challenging.
  • High-quality HD Voice codec
      • Suitable for the usage scenario (packet loss, bit-rate, complexity, cost, …)
  • End-to-end network HD Voice support
      • Preferably no transcoding
          • Transcoding is very common today
          • Introduces delay, quality degradation and processing in the network
      • Absolutely no narrowband
  • Many codecs on the market today support HD Voice. People choose a codec for the sound quality it can deliver, but the quality actually delivered also depends on other factors such as available bit rate, cost and complexity.

The diagram on this slide shows results from a listening test covering some of the most popular wideband codecs; it reveals significant differences in speech quality. The codecs tested include standard codecs such as AMR-Wideband and G.729.1, a proprietary codec that is more of a de facto standard (iSAC from Global IP Solutions), and an open-source codec, Speex. Even where the technical differences are not vast, cost tends to be one of the main factors when choosing a codec, along with interoperability. With one exception, the wideband codecs all yield a noticeable improvement over (uncoded) narrowband speech.

The listening test was conducted with 20 listeners. The source material consisted of two utterances each from four male and four female speakers, sampled at 48 kHz in low-noise conditions and down-sampled to 16 kHz. The test followed the standard MUSHRA methodology defined by ITU-R Recommendation BS.1534-1. In MUSHRA, the listener is presented with a labeled reference and a number of unlabeled test samples, and rates the unlabeled samples on a continuous numerical scale from 0 to 100 divided into five descriptive intervals. The range 100-80 is described as “excellent,” the range 80-60 as “good,” the range 60-40 as “fair,” the range 40-20 as “poor,” and the range 20-0 as “bad.” Among the unlabeled samples are a hidden version of the reference and one or more so-called anchor samples.

At the lower rates, AMR-WB performed best, with quality almost equivalent to that at 24 kbps. Also noteworthy is that Speex was rated higher at 16 kbps than at 24 kbps. At the higher rates, iSAC performed best.
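The five descriptive intervals of the MUSHRA scale described above can be written down as a small lookup. This is only an illustrative sketch: the function name `mushra_label` is hypothetical, and because the intervals as stated share their end points, assigning a boundary score (20, 40, 60, 80) to the higher interval is an assumption made here, not something the recommendation is quoted as saying.

```python
def mushra_label(score: float) -> str:
    """Map a MUSHRA score (0-100) to its descriptive interval.

    Assumption: boundary scores such as 80 are placed in the higher
    interval ("excellent"), since the stated ranges share end points.
    """
    if not 0 <= score <= 100:
        raise ValueError("MUSHRA scores range from 0 to 100")
    if score >= 80:
        return "excellent"
    if score >= 60:
        return "good"
    if score >= 40:
        return "fair"
    if score >= 20:
        return "poor"
    return "bad"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only (not data from the test).
    for s in (92, 71, 55, 33, 8):
        print(s, mushra_label(s))
```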
iSAC was coded at a slightly higher average rate than the other codecs to obtain a comparable network load. AMR-WB, G.729.1 and Speex were coded at 24 kbps, which with 20 ms packets corresponds to an IP network load of 40 kbps, counting 40 bytes of overhead in each packet. iSAC was coded at an average rate of 26 kbps with 30 ms packets, corresponding to a network load of around 38 kbps.

This set of experiments compared iSAC to AMR-WB at comparable network bit rates under different degrees of packet loss. As can be seen, iSAC maintains its edge up to 10% packet loss.
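The network-load arithmetic above can be checked with a short calculation. The sketch below assumes only what the note states (a fixed 40 bytes of overhead per packet) and nothing else; the function name `network_load_kbps` is hypothetical. Under these assumptions the 24 kbps, 20 ms case reproduces the 40 kbps figure, while the 26 kbps, 30 ms case comes out nearer 37 kbps than the quoted "around 38 kbps", a gap that may reflect rounding or overhead not modelled here.

```python
def network_load_kbps(codec_rate_kbps: float,
                      packet_ms: float,
                      overhead_bytes: int = 40) -> float:
    """Approximate IP network load for a codec stream.

    Assumes a constant per-packet overhead (40 bytes by default, as in
    the note) and one codec payload per packet.
    """
    payload_bits = codec_rate_kbps * packet_ms   # kbit/s * ms = bits
    overhead_bits = overhead_bytes * 8
    return (payload_bits + overhead_bits) / packet_ms  # bits/ms = kbit/s


if __name__ == "__main__":
    print(network_load_kbps(24, 20))            # AMR-WB, G.729.1, Speex case
    print(round(network_load_kbps(26, 30), 1))  # iSAC case
```

The second term, overhead divided by packet duration, is why longer packets lower the load: 320 overhead bits cost 16 kbps at 20 ms but under 11 kbps at 30 ms.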

    1. Implementing HD Voice on Mobile Devices
    2.
        • Global IP Solutions (GIPS)
            • Recognized leader in world-class voice and video processing technology for IP networks
            • GIPS software is deployed in over a billion end-points
            • Enables developers, operators and mobile manufacturers to offer the highest quality regardless of network conditions
        • Roar Hagen
            • Chief Technical Officer, GIPS
            • R&D in speech and video processing for more than 20 years
            • Ph.D. in Information Theory (source coding)
    3. Where is HD Voice Most Needed?
        • Where audio bandwidth is currently most constrained
        • Where intelligibility is most difficult
        • Where environmental factors (background noise, echo) further impair quality
        • Conferencing/Collaboration
        Mobile telephony is most in need of a quality upgrade.
        [Figure: quality scale labeled "Best possible PSTN Quality*", "Best possible Cell Phone Quality*" and "Hang-up, intolerable speech degradation"]
        *Best quality means clean, or perfect, conditions
        CONFIDENTIAL
    4. Mobile Quality Issues
        • Network Deficiencies
            • Low signal strength
            • Wi-Fi bottlenecks
            • Additional delay, jitter and packet loss in IP network
        • Device Limitations
            • Limited processing power and battery life
            • OS limiting access to real-time VoIP
            • Recording and playout
        • Environmental disturbances
            • Acoustic Echo
            • Background noise
    5. Design Considerations
        Design Challenges
        • Coping with Network Degradation
        • Power Consumption
        • Hardware Issues (Processor, OS, Acoustics, etc.)
        • Echo Cancellation
        • Additional Voice Processing Components
        • Environment: Background Noise, Room Acoustics, etc.
        • Speech Codec
        [Diagram labels: Network, Codec, Hardware, Echo, Power, Voice, Environment]
    6. Impact of Poor Networks
        • Packet Loss
            • Occurs due to flushed buffers in network nodes
            • Same effect if packets are too late to be used
            • Smooth concealment necessary
        • Network Jitter
            • Transmission time differs for each packet
            • Jitter buffer necessary to ensure continuous playout
            • Trade-off between delay and quality
        • Latency
            • Major effect is “stepping on each other’s talk”
            • Long delays make echo more annoying
    7. Enhancing the HD Voice Experience
        • Echo cancellation
            • Speaker and microphone placement can create echo
            • Hands free even more difficult
            • Higher audio bandwidth allows more echo
            • CPU limitations make full duplex difficult
        • Noise suppression
            • Mobile environments inherently noisy
            • Higher audio bandwidth allows more noise
    8. Sound and Noise
    9. So Many Codec Options
        G.729.1, G.719, G.718, RTAudio, SILK, iPCM-WB, Speex, iSAC, AAC-LD, G.722.2 (AMR-WB), G.722.1 (Siren), G.722, BV32, SVOPC, G.711.1, EVRC-WB
        iSAC, G.722, G.711.1
    10. MUSHRA Test Results For Different Codecs
    11. Summary
        • Mobile (and conferencing) in most need of HD
            • But also have the most challenges
        • Need overall high quality HD design
            • Delay, Packet Loss, Jitter handling
            • Suitable codec designed for IP networks
            • Echo Cancellation, Noise Suppression, Gain Control
            • Conference Mixing
        • New devices more powerful and open (Android)
            • Easier to provide high quality
            • Less need for heavy optimization of complexity
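
The jitter-buffer trade-off between delay and quality noted on the "Impact of Poor Networks" slide can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the approach used in any GIPS product: the function name `packets_to_conceal` is hypothetical, the buffer uses a fixed playout delay, and the per-packet network delays are made-up illustration values. A packet is treated as needing concealment if it is lost in the network or arrives after its playout deadline, which the slide describes as having the same effect.

```python
from typing import List, Optional, Sequence


def packets_to_conceal(delays_ms: Sequence[Optional[float]],
                       playout_delay_ms: float) -> List[int]:
    """Return the sequence numbers that must be concealed.

    delays_ms[i] is the one-way network delay of packet i in ms, or
    None if the packet was lost. Assumes a fixed playout delay: packet i
    is played playout_delay_ms after it was sent, so it is usable only
    if its network delay does not exceed that value.
    """
    concealed = []
    for seq, delay in enumerate(delays_ms):
        if delay is None or delay > playout_delay_ms:
            concealed.append(seq)
    return concealed


if __name__ == "__main__":
    # Hypothetical network delays (ms); None marks a lost packet.
    delays = [30, 45, 90, None, 35, 60, 120, 40]
    for playout in (50, 100, 150):
        late = packets_to_conceal(delays, playout)
        print(f"playout delay {playout} ms -> conceal {len(late)} packets")
```

A larger playout delay rescues more late packets but adds latency to the conversation; packets lost outright need concealment no matter how large the buffer is.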