Improving Perceived Speech Quality for Wireless VoIP By Cross-Layer Designs

By Zhuoqun Li

This dissertation is submitted to the University of Plymouth in partial fulfilment of the award of Master of Research in Network System Engineering

Supervisor: Prof. Emmanuel C. Ifeachor
School of Computing, Communication and Electronics
University of Plymouth
September 2003
ABSTRACT

Providing VoIP services with satisfactory speech quality over the wireless/mobile Internet is difficult because of the impairments introduced by the wireless channel, such as packet errors, delay and jitter. Effective packet error recovery mechanisms such as Automatic Repeat reQuest (ARQ) are important in wireless networks because they reduce packet loss due to bit errors. This dissertation focuses on using cross-layer techniques to improve the performance of ARQ, and hence the perceived speech quality of wireless VoIP, an improvement that is difficult to achieve within a strictly layered protocol structure. The research was carried out in two steps. First, we use an objective measure of perceived conversational speech quality (MOSc) as a metric to evaluate the performance of three current retransmission schemes (No Retransmission, Speech Property-Based Retransmission and Full Retransmission). Our findings indicate that the performance of these retransmission mechanisms depends on both the wireless link quality and the delay introduced in the wireline network. We also propose a perceived speech quality driven retransmission mechanism, which automatically switches to the most suitable retransmission scheme according to QoS parameters reported from different layers. Next, we investigate the problems introduced by the retransmission procedure of the Stop and Wait ARQ protocol in a wireless VoIP system.
We then propose a cross-layer framework in which:
1) the retransmission procedure of the link-layer ARQ protocol is constrained by the available playout delay;
2) in the playout delay estimation, the delivery delays in the wireless channel and in the wireline network are estimated separately, and the delivery delay in the wireless channel is constrained to avoid delay accumulation in the transmitting queue;
3) if the retransmission procedure is terminated prematurely, the received noisy copies of a speech packet are combined to reduce the damaged part, and the result is played out at the application layer.
Simulation results show that these cross-layer designs improve the performance of the Stop and Wait ARQ protocol and hence significantly enhance the perceived speech quality of a wireless VoIP system.
TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
  1.1 VoIP and Its Application in Wireless Internet
  1.2 Motivation
    1.2.1 Impairment factors of wireless VoIP speech quality
    1.2.2 Packet error concealment techniques
    1.2.3 Cross-layer designs
    1.2.4 Problem statement
  1.3 Aims and Objectives
  1.4 Thesis Contributions
  1.5 Organization of the Thesis
CHAPTER 2 BACKGROUND THEORIES
  2.1 Speech Quality Evaluations
    2.1.1 Objective Speech Quality Measurement
    2.1.2 PESQ
    2.1.3 E-Model
    2.1.4 Conversational speech quality evaluation
  2.2 Adaptive Playout Buffer
  2.3 Automatic Repeat upon reQuest (ARQ)
CHAPTER 3 PERCEIVED SPEECH QUALITY DRIVEN RETRANSMISSION MECHANISM
  3.1 Introduction
  3.2 Related Works
    3.2.1 Speech property-based retransmission mechanisms
    3.2.2 Measuring conversational speech quality
    3.2.3 Adaptive jitter buffer and retransmission jitters
  3.3 Simulation System Description
  3.4 Performance Comparison of Current Retransmission Schemes
  3.5 Perceived Speech Quality Driven Retransmission Scheme
  3.6 Summary
CHAPTER 4 PLAYOUT DELAY CONSTRAINED ARQ AND ARQ AWARE PLAYOUT BUFFER
  4.1 Introduction
  4.2 The Cross-Layer Design
    4.2.1 System model
    4.2.2 Playout delay constrained ARQ
    4.2.3 ARQ aware playout buffer
      4.2.3.1 Queue model
      4.2.3.2 ARQ aware playout buffer
  4.3 Simulation Model and Experimental Results
    4.3.1 Wireless channel model
    4.3.2 Voice traffic model
    4.3.3 Speech quality evaluation
    4.3.4 Simulation results and analysis
  4.4 Summary
CHAPTER 5 DISCUSSIONS, SUGGESTIONS FOR FURTHER WORKS, AND CONCLUSIONS
  5.1 Discussions
  5.2 Suggestions for Further Works
  5.3 Conclusions
REFERENCES
APPENDICES
  [APPENDIX A] ns-2 Extensions for ARQ Retry Limit Control
  [APPENDIX B] ns-2 Simulation Script for Per Packet Control of ARQ
  [APPENDIX C] C code for Majority-Logic Packet Combining
  [APPENDIX D] List of Items Included in the Appended CD
  [APPENDIX E] Published Papers
LIST OF FIGURES

Figure 1-1 VoIP Protocol Architecture
Figure 1-2 The Wireless VoIP system overview
Figure 1-3 The Basic model of cross-layer designs
Figure 2-1 Basic Structure of Perceptual Evaluation of Speech Quality
Figure 2-2 Schematic diagram for MOSc measurement
Figure 2-3 Timing associated with packet i
Figure 3-1 Simulation Environment
Figure 3-2 Overall packet loss rate comparison
Figure 3-3 Buffered Retx delay comparison
Figure 3-4 MOSc comparison with 175 ms network delay
Figure 3-5 MOSc comparison with packet error probability 0.001
Figure 3-6 Perceived speech quality driven Retx scheme pseudocode
Figure 4-1 Stop and Wait ARQ
Figure 4-2 The Cross-layer design system model
Figure 4-3 Block diagram of the playout delay constrained ARQ with packet combining
Figure 4-4 Timing associated with Packet
Figure 4-5 The Simulation Model
Figure 4-6 Overall packet losses comparison
Figure 4-7 End-to-end delays with different inter-arrival delay
Figure 4-8 End-to-end delay comparison
Figure 4-9 Conversational MOS comparison
Figure 5-1 Perceived speech quality driven packet error recovery scheduler

LIST OF TABLES

Table 2-1 MOS scale
Table 3-1 Average voiced packet losses with fast-exp playout buffer
ACKNOWLEDGEMENTS

I would like to express my sincere and deep gratitude to my supervisor, Professor Emmanuel C. Ifeachor, who gave me the opportunity to undertake the Master of Research study. His continuous advice and encouragement throughout this study are gratefully acknowledged. I also had the opportunity to work with researchers in the Centre for Signal Processing and Multimedia Communications, and I would like to thank them for their friendliness and support. Special thanks go to Ms. Lingfen Sun and Mr. ZiZhi Qiao for their valuable comments and suggestions; without their support, this thesis would not have been possible. I would like to thank all my classmates in MRes/MSc NSE and CE&SP for their generous help and inspiration; with them, I really enjoyed the past year at the University of Plymouth. On the personal side, I would like to thank my parents for their unending love and support.
Improving Perceived Speech Quality for Wireless VoIP by Cross-layer designs

CHAPTER 1
INTRODUCTION

1.1 VoIP and Its Application in Wireless Internet

Packet-switched networks such as the Internet have developed very rapidly over the past decades. Their advantages, such as efficiency and flexibility, are expected to make them eventually replace traditional circuit-switched networks, i.e. the Public Switched Telephone Network (PSTN). VoIP (Voice over Internet Protocol, or Voice over Packet) is one of the success stories of packet network applications. In general, a VoIP service is the real-time delivery of packetized voice traffic across packet-switched networks such as the Internet. Compared with traditional telephone networks, it offers lower communication costs with acceptable speech quality.

Recently, wireless/mobile communication has been growing rapidly and providing increasingly convenient services, so it is not surprising that there is great demand to add voice services to wireless IP networks and wireless handsets. Wireless VoIP services can be provided over Wireless Local Area Networks (WLANs), e.g. IEEE 802.11 [1], or over third-generation (3G) mobile networks, e.g. WCDMA [2]. The protocol stack used to transmit VoIP traffic over wireline and wireless networks is presented in Figure 1-1.

MRes Thesis – University of Plymouth
[Figure 1-1 VoIP Protocol Architecture. Application layer: RTP, RTCP; transport layer: UDP; network layer: IP; data link layer: IEEE 802.3, IEEE 802.11x]

At the application layer, VoIP is supported by the Real-time Transport Protocol (RTP) [3], which provides a means of delivering delay-sensitive real-time data. The services provided by RTP include payload type identification, sequence numbering, timestamping and delivery monitoring. RTP applications typically run on top of UDP, which does not guarantee Quality of Service (QoS) but has lower overhead [4]. RTCP (RTP Control Protocol) is the control protocol associated with RTP; it monitors the quality of service and conveys information about the participants in an ongoing session [3]. After the voice samples are digitised and compressed, they are carried as the payload of an IP packet, whose header includes the IP addresses used for routing in IP networks. At the link layer, IP packets carrying speech data are encapsulated in frames by IEEE 802.3 [4] in wireline networks and by IEEE 802.11 in wireless networks. Both link-layer protocols provide services such as framing, error control and flow control.
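The RTP services listed above (payload type identification, sequence numbering and timestamping) are carried in the 12-byte fixed RTP header specified in RFC 3550. The following is a minimal C sketch of decoding that header from network byte order; it assumes the buffer starts at the RTP header and does not parse the CSRC list or header extensions. The type and function names (rtp_header_t, rtp_parse_fixed_header) are hypothetical, not taken from any library or from this thesis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the 12-byte RTP fixed header (RFC 3550, section 5.1).
   rtp_header_t and rtp_parse_fixed_header are hypothetical names. */
typedef struct {
    unsigned version;      /* V: 2 bits, equal to 2 for RFC 3550 */
    unsigned padding;      /* P: 1 bit */
    unsigned extension;    /* X: 1 bit */
    unsigned csrc_count;   /* CC: 4 bits, number of CSRC entries that follow */
    unsigned marker;       /* M: 1 bit, for audio often the first packet of a talkspurt */
    unsigned payload_type; /* PT: 7 bits, identifies the codec */
    uint16_t sequence;     /* +1 per packet: lets the receiver detect loss and reordering */
    uint32_t timestamp;    /* sampling instant of the first payload octet */
    uint32_t ssrc;         /* synchronisation source identifier */
} rtp_header_t;

/* Decodes the fixed header from network byte order.
   Returns 0 on success, -1 if the buffer is too short or the version is not 2. */
int rtp_parse_fixed_header(const uint8_t *buf, size_t len, rtp_header_t *h)
{
    assert(buf != NULL && h != NULL);
    if (len < 12) return -1;
    h->version      = buf[0] >> 6;
    h->padding      = (buf[0] >> 5) & 0x1;
    h->extension    = (buf[0] >> 4) & 0x1;
    h->csrc_count   = buf[0] & 0x0F;
    h->marker       = buf[1] >> 7;
    h->payload_type = buf[1] & 0x7F;
    h->sequence     = (uint16_t)((buf[2] << 8) | buf[3]);
    h->timestamp    = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                      ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
    h->ssrc         = ((uint32_t)buf[8] << 24) | ((uint32_t)buf[9] << 16) |
                      ((uint32_t)buf[10] << 8) |  (uint32_t)buf[11];
    return h->version == 2 ? 0 : -1;
}
```

For example, a G.711 μ-law packet (payload type 0 in RFC 3551) carrying 20 ms of speech advances the timestamp by 160 per packet, because the RTP clock for this codec runs at 8 kHz; together with the sequence number this lets the receiver detect lost and reordered packets.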
[Figure 1-2 The Wireless VoIP system overview. Sender: speech source (talk/silence periods), encoder, packetizer; Internet; access point; receiver: playout buffer, depacketizer, decoder]

Figure 1-2 depicts a VoIP system deployed over the wireless Internet. Speech is an analog signal that varies slowly in time, with a bandwidth not exceeding 4 kHz. As depicted in Figure 1-2, the speech source alternates between talk and silence periods, whose durations are typically modelled as exponentially distributed. Before being transmitted over packet-switched networks, the analog speech signal has to be digitised at the sender; the reverse process is performed at the receiver. Digitisation consists of sampling, quantization and encoding. Many encoding techniques have been developed and standardised by the ITU. The basic encoder is ITU G.711, which samples the voice signal at 8 kHz and generates 8 bits per sample (64 kbit/s). Code Excited Linear Prediction (CELP) based encoders reduce the bit rate (e.g. 8 kbit/s for G.729, 5.3 and 6.3 kbit/s for G.723.1) at the expense of lower quality, additional complexity and encoding delay [5]. For wireless/mobile communication, variable-rate codecs have been developed, e.g. AMR [6] and EVRC [7].

The encoded speech is then packetized into packets of equal size. Each packet includes the headers of the various protocol layers (e.g. RTP 12 bytes, UDP 8 bytes, IP 20 bytes and 802.11 34 bytes) and a payload carrying encoded speech of a duration that depends on the codec deployed (e.g. 20 ms for an AMR 12.2 kbit/s frame). In this study, the wireless VoIP system is considered in a last-hop scenario: voice streams traverse wireline networks before they reach the access point, which is the junction between the wireline network and the wireless channel.
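The header sizes and frame duration quoted above determine how much of each wireless frame is actually speech. The following is a minimal sketch of that arithmetic using the header sizes given in the text; ignoring codec payload-format headers and rounding the speech bits up to whole bytes are our simplifying assumptions, and the function names are hypothetical.

```c
#include <assert.h>

/* Per-layer header sizes in bytes, as quoted in the text above. */
enum { RTP_HDR = 12, UDP_HDR = 8, IP_HDR = 20, MAC_80211_HDR = 34 };
enum { TOTAL_HDR = RTP_HDR + UDP_HDR + IP_HDR + MAC_80211_HDR }; /* 74 bytes */

/* Speech payload per packet, in whole bytes, for a codec running at
   bitrate_bps that packs frame_ms milliseconds of speech into each packet.
   Assumption: codec-specific payload headers (e.g. the AMR RTP payload
   format's table of contents) are ignored; bits are rounded up to bytes. */
unsigned payload_bytes(unsigned bitrate_bps, unsigned frame_ms)
{
    assert(frame_ms > 0);
    unsigned long bits = ((unsigned long)bitrate_bps * frame_ms + 999UL) / 1000UL;
    return (unsigned)((bits + 7UL) / 8UL);
}

/* Size of one link-layer frame: payload plus RTP/UDP/IP/802.11 headers. */
unsigned packet_bytes(unsigned bitrate_bps, unsigned frame_ms)
{
    return payload_bytes(bitrate_bps, frame_ms) + TOTAL_HDR;
}

/* Bit rate on the wireless link per voice stream. PHY preambles, ACKs and
   contention overhead are NOT included; a real WLAN would add more. */
unsigned long link_rate_bps(unsigned bitrate_bps, unsigned frame_ms)
{
    assert(frame_ms > 0);
    return (unsigned long)packet_bytes(bitrate_bps, frame_ms) * 8UL * 1000UL / frame_ms;
}

/* Fraction of each transmitted frame that is speech payload. */
double payload_efficiency(unsigned bitrate_bps, unsigned frame_ms)
{
    return (double)payload_bytes(bitrate_bps, frame_ms) /
           (double)packet_bytes(bitrate_bps, frame_ms);
}
```

For AMR 12.2 kbit/s in 20 ms packets this gives a 31-byte payload inside a 105-byte frame (about 30% payload, 42 kbit/s on the link), against 160 bytes inside 234 bytes (93.6 kbit/s) for G.711. The headers therefore dominate the frame for low-rate wireless codecs.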
As voice packets are sent over IP networks and the wireless channel, they incur variable delay and possibly loss. To provide smooth playout, a playout buffer at the receiver is used to compensate for the delay variations. Packets are held until a later playout time so that enough packets are buffered to be played out continuously. Any packet arriving after its scheduled playout time is discarded. There are two types of playout algorithm: fixed and adaptive. A fixed playout scheme schedules the playout of packets so that the end-to-end delay (including both network and buffering delay) is the same for all packets. Fixed jitter buffers cannot adapt readily to changes in network delay and as a result are not practical in real VoIP applications. Adaptive playout schemes are more common in VoIP systems: an adaptive playout buffer can adjust the playout delay for each talkspurt, which makes it better suited to time-varying IP networks. The scheduled playout delay is a trade-off between buffer losses and end-to-end delay, and it should be selected so as to maximise the quality of voice communication. A large playout delay decreases packet loss due to late arrivals but hinders interactivity between the communicating parties, while a small playout delay improves interactivity but causes higher buffer losses and degrades the speech quality. The playout buffer delivers a continuous stream of packets at fixed intervals to the depacketizer, which extracts the speech data from the payload and feeds it to the decoder. The main function of the decoder is to reconstruct the speech signal. Some decoders implement packet loss concealment (PLC) methods that produce replacements for lost packets. Having been depacketized and decoded, the speech signal is finally played out by the VoIP end device.
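The discard rule and the delay/loss trade-off described above can be made concrete for the fixed playout scheme. In this minimal sketch (the function name and example data are hypothetical), each packet is scheduled at its generation time plus one constant playout delay, and any packet arriving after that instant is counted as lost.

```c
#include <assert.h>
#include <stddef.h>

/* Counts packets discarded by a fixed playout buffer.
   Packet i is generated at gen_ms[i] and arrives at arr_ms[i]. The fixed
   scheme plays it at gen_ms[i] + playout_delay_ms, so every packet has the
   same end-to-end delay; a packet arriving after that instant is late. */
size_t late_losses(const double *gen_ms, const double *arr_ms, size_t n,
                   double playout_delay_ms)
{
    assert(n == 0 || (gen_ms != NULL && arr_ms != NULL));
    size_t late = 0;
    for (size_t i = 0; i < n; i++) {
        double playout_time = gen_ms[i] + playout_delay_ms;
        if (arr_ms[i] > playout_time)   /* arrived after its slot: discard */
            late++;
    }
    return late;
}
```

With generation times of 0, 20, 40, 60 and 80 ms and network delays of 50, 70, 55, 120 and 60 ms, a 60 ms playout delay loses two packets, 100 ms loses one and 150 ms loses none: every reduction in buffer loss is bought with a longer end-to-end delay. An adaptive buffer would instead recompute playout_delay_ms at the start of each talkspurt from recent delay statistics.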
1.2 Motivation

1.2.1 Impairment factors of wireless VoIP speech quality

The perceived speech quality of VoIP is defined subjectively, i.e. as perceived by the end users. Whatever its cost-saving benefits, the success of a VoIP service depends on providing acceptable perceived speech quality. IP telephony still cannot provide fully satisfactory quality because of the many impairment factors introduced along the transmission path over IP networks. When VoIP is applied in wireless/mobile IP networks, because of the unreliability of wireless channel performance and the
uncertain mobility of wireless handsets, the speech quality is degraded even further. Many correlated impairment factors may seriously affect the perceived speech quality of wireless VoIP. In this study, the main ones are identified as packet loss, bit errors, end-to-end delay, jitter and coding.

Packet Loss
Packet loss is a major impairment factor; it causes more noticeable degradation in voice quality than any other. While traversing interconnected IP networks, speech packets may be lost because of router buffer overflow or link congestion. Moreover, VoIP applications run over the connectionless UDP on top of IP, so speech packets may travel over different paths through the IP networks before they arrive at the destination. As a result, some speech packets arrive out of sequence and may be discarded at the receiver. The decoder may reconstruct lost packets from related information, but the speech information carried by lost packets can never be completely recovered.

Bit Error
Bit errors are rarely a problem for VoIP in wireline networks, where they occur infrequently. However, if wireless channels are included in the path of speech packets, bit errors become a serious challenge. In the wireless environment, the transmitted signal is exposed to absorption, scattering, interference and multipath fading. All these effects degrade the Signal-to-Noise Ratio (SNR) at the receiver and hence determine the Bit Error Rate (BER). In packet communications, bit errors result in packet loss if the whole packet is covered by a checksum, because a single corrupted bit makes the checksum fail and the packet is discarded. However, if a partial checksum is used, as can be done specifically for VoIP applications, speech packets containing bit errors in the payload are still decoded and played out.
In this case, the effect on perceived speech quality is determined by the number and positions of the bit errors.

End-to-end delay
Delay does not directly remove any speech information, but it affects the interactive nature of conversations. The end-to-end delay encompasses:
a. the delay incurred in encoding and decoding;
b. the delay incurred in packetization;
c. the delay incurred on the path from the sender to the receiver (e.g. transmission time over IP
networks, queuing delays in network elements, and propagation and retransmission time in the wireless channel);
d. the delay incurred in the playout buffer.
Delays below 100 ms are not noticed by most users, while delays between 100 ms and 300 ms begin to affect conversational interactivity [9]. Longer delays are obvious to the user and can make conversation impossible.

Jitter
Jitter is the variation in the delay of received packets. At the sender, packets are sent in a continuous stream with even spacing. Because of network congestion, improper queuing or configuration errors, the interval between adjacent packets changes constantly as they cross the network, so the delay experienced by each packet varies instead of remaining constant. Uncompensated jitter makes the received speech very annoying to listen to. Removing jitter requires collecting packets and holding them long enough for the slowest packets to arrive in time, re-sequencing them if necessary, so that they can be played in the correct order. This job is normally performed by the playout buffer, which maintains constant packet intervals at the expense of additional playout delay or of losses for packets that do not arrive in time.

Coding
In transforming the analog speech signal into digital bit streams, some codecs also use compression techniques to remove redundant or less important speech information, reducing the transmission bandwidth requirement while preserving the perceptually important parts of the voice signal. This leads to some loss of speech information and hence affects the speech quality perceived by the user at the receiving side. For wireless VoIP, speech quality can also be affected by the error-correction mechanisms used by codecs.

1.2.2 Packet error concealment techniques

Packet errors caused by packet loss or bit errors are a critical impairment to the perceived speech quality of wireless VoIP.
Many packet error concealment techniques have been developed and refined with great effort, but they are far from perfect, and some cannot work properly in new communication environments such as the growing wireless/mobile Internet. Some of the main packet
error recovery methods are described below.

Forward Error Correction
Forward Error Correction (FEC) [11] enables lost data to be recovered at the receiver without further reference to the sender. Both the original data and redundant information are transmitted to the receiver. There are two kinds of redundant information: media-independent and media-dependent. Media-independent FEC does not need to know the type of the original data; the original data are transmitted together with redundant data computed from them (for example, parity packets). In media-dependent (media-specific) FEC, if an original packet is lost, redundant packets carrying a media-specific representation of the same content are used to recover the loss. Usually the redundant copy is produced with a lower-bandwidth encoding than the primary one, so the recovered speech has lower quality than the original. The costs of FEC are reduced bandwidth efficiency and increased end-to-end delay, because the redundant information is transmitted after the packet it protects.

Interleaving
Interleaving has been widely used in mobile networks to spread burst errors across several frames. In VoIP applications, if the data unit produced by a coder at a time is smaller than the allowed packet payload, several data units may be combined into a single packet. To reduce the effect of packet loss, or of burst bit errors in the wireless environment, the data units are not combined in the sequential order in which the coder produced them; instead, the transmitter interleaves them across packets. When a packet is lost, the result is several small gaps rather than one long one, and each gap typically corresponds to a speech interval considerably shorter than a phoneme. Listeners can therefore mentally interpolate over the gaps, and speech intelligibility is largely preserved.
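The benefit of interleaving can be shown by counting the longest run of consecutive speech frames lost when one packet is dropped. The sketch below assumes a simple block interleaver over a group of packets (our illustrative choice, not necessarily the scheme used by any particular codec or by this thesis); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Block interleaver over a group of `depth` packets, each carrying
   `frames_per_packet` speech frames (depth * frames_per_packet in total).
   Sequential packing:  frame k goes into packet k / frames_per_packet.
   Interleaved packing: frame k goes into packet k % depth, so consecutive
   frames are spread over different packets. */
size_t packet_of_frame(size_t k, size_t depth, size_t frames_per_packet,
                       bool interleaved)
{
    assert(depth > 0 && frames_per_packet > 0);
    return interleaved ? k % depth : k / frames_per_packet;
}

/* Longest run of consecutive frames lost when packet `lost_packet` is dropped. */
size_t longest_gap(size_t lost_packet, size_t depth, size_t frames_per_packet,
                   bool interleaved)
{
    size_t total = depth * frames_per_packet, run = 0, best = 0;
    for (size_t k = 0; k < total; k++) {
        if (packet_of_frame(k, depth, frames_per_packet, interleaved) == lost_packet) {
            if (++run > best) best = run;   /* extend the current gap */
        } else {
            run = 0;                        /* a received frame ends the gap */
        }
    }
    return best;
}
```

With 4 packets of 4 frames each, losing packet 1 removes frames 4 to 7 (a 4-frame gap) under sequential packing, but only frames 1, 5, 9 and 13 (isolated 1-frame gaps) under interleaving. The price is extra delay, because the sender must collect the whole group of frames before the first packet can be sent.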
UDP Lite

UDP Lite [15] is designed for applications that prefer to have damaged data delivered rather than discarded by the network. For VoIP over wireless, it is not necessary to discard speech frames that contain only a few bit errors. At the IP layer, the IP header checksum does not cover the IP payload. However, the UDP checksum covers the entire datagram, including the media payload. In real network applications it is the application layer, not the transport layer, that knows best what should be verified by the checksum. UDP Lite therefore provides a checksum with optional partial coverage.

Automatic Retransmission reQuest

In Automatic Retransmission reQuest (ARQ) [16], when the receiver fails to receive a packet correctly, the sender retransmits it, possibly several times. ARQ-based schemes consist of three main parts:
a. lost data detection, by the receiver or by the sender (timeout);
b. acknowledgment strategy: the receiver sends acknowledgments that indicate which data have been received or which data are missing;
c. retransmission strategy: it determines which data are retransmitted by the sender.
Although ARQ is robust and efficient against burst losses, it also brings a series of problems to real-time applications with delay constraints.

1.2.3 Cross-layer designs

IP networks have been successfully supported by the layered protocol architecture since their early development. However, for real-time applications such as Wireless VoIP, the layered architecture may prevent them from adapting readily to instantaneous changes in the communication environment, which can seriously impact their performance. Examples of system performance degradation due to a lack of cooperation among different layers are given in [18]. Solutions to the problems introduced by the layered protocol architecture have been developed under the name of the cross-layer approach, or cross-layer design. The objective of cross-layer design is to achieve efficient QoS support and network resource allocation through joint-layer techniques, such as QoS knowledge sharing and cooperation of QoS mechanisms among different layers (see Figure 1-3).

[Figure 1-3 The basic model of cross-layer designs (QoS information mapping and joint-layer QoS techniques)]

The performance of future networks may be enhanced by such cross-layer designs spanning the PHY, MAC and higher-layer protocols. Cross-layer designs have been addressed in much recent literature. Krishnamachari et al. [19] proposed a cross-layer framework to enhance the performance of video streaming; it adaptively optimizes link layer ARQ, application layer FEC and packetization according to wireless channel conditions. In [20], a cross-layer design was developed to control the transmission of video streams over wireless links based on information about prefetched video (application layer) and on signal strength and multiple access interference (physical layer).

1.2.4 Problem statement

In this dissertation, we raise the following research questions regarding the improvement of perceived speech quality for Wireless VoIP by a cross-layer approach.
What are the impairment factors of Wireless VoIP applications?
What are the pros and cons of ARQ mechanisms? Do ARQ mechanisms improve the performance of a Wireless VoIP system in terms of perceived speech quality?
How can current ARQ schemes be optimized to improve speech quality? And how can real-time network and wireless channel QoS parameters be mapped into ARQ protocol optimization?
What are the effects of the interactions between ARQ mechanisms and the other components of the Wireless VoIP system? How can these effects be handled if they are negative?
How can other packet error concealment technologies be used together with ARQ?
Or how can ARQ be used as a complement to other packet error concealment technologies?
How can a cross-layer framework be established in which QoS techniques located in different layers are optimized through a joint-layer analysis? And how can a profile of real-time predicted speech quality and QoS parameters collected from different layers be established and eventually serve as the scheduler of a cross-layer framework?
Bearing these questions in mind, we reviewed a large body of related literature and carried out research toward their solutions.

1.3 Aims and Objectives

The aim of this project is to develop and evaluate a cross-layer framework to improve perceived speech quality for Wireless VoIP systems. This framework is expected to utilize QoS parameters from multiple layers and to optimize QoS techniques located in different layers based on a joint-layer analysis, and consequently to achieve an efficient and significant speech quality improvement that may be very hard or even impossible for single-layer approaches.

1.4 Thesis Contributions

The contributions of this dissertation are as follows:
We identify the impairment factors for the perceived speech quality of Wireless VoIP, focusing specifically on the impact of ARQ mechanisms.
We use an objective measure of perceived conversational speech quality (MOSc) as a metric to evaluate the performance of three current retransmission schemes, namely no retransmission, Speech Property-Based (SPB) [21] retransmission and full retransmission, while considering the impact of retransmission jitter. Our findings indicate that the performance of the retransmission mechanisms is a function of both wireless link quality and the delay introduced in the wireline network. Moreover, SPB retransmission, which is intended to protect only perceptually important speech frames, may not achieve the expected performance because it introduces too much jitter.
We propose a new perceived speech quality driven retransmission mechanism [22] which may be used to improve speech quality for wireless VoIP (in terms of the objective mean opinion score) by switching between No retransmission and Full retransmission according to the communication conditions. Through simulations, we show that the proposed method achieves the highest MOSc compared with no retransmission, full retransmission and SPB retransmission, and that it achieves retransmission efficiency similar to SPB retransmission while avoiding the implementation complexity of obtaining the speech property information that SPB retransmission requires.
We propose a cross-layer design in which 1) the retransmission procedure of the link layer Automatic Repeat reQuest (ARQ) protocol is constrained by the available delay budget estimated by the application-level playout buffer; 2) if the retransmission procedure is terminated prematurely, the received noisy copies of a speech packet are presented to the application layer and finally played out; and 3) in the playout delay estimation, the delivery delay in the wireless channel is estimated separately and constrained to avoid delay accumulation in the transmitting queue. The simulation results show that the perceptual speech quality of a wireless VoIP system can be significantly enhanced, since this design reduces retransmission delay, playout buffer losses, and queuing delay and losses.

1.5 Organization of the Thesis

The rest of this dissertation is organized as follows. Chapter 2 introduces some basic theory related to this project, such as speech quality evaluation, adaptive playout buffers and the Automatic Retransmission reQuest (ARQ) protocol. In Chapter 3, we look at the impairment factors introduced by ARQ schemes and introduce a perceived speech quality driven retransmission scheme to achieve optimum conversational speech quality.
In Chapter 4, we consider the problems introduced by an ARQ protocol when it works with other components of a Wireless VoIP system (e.g. the transmitting queue and the adaptive playout buffer) in the layered protocol architecture, and propose a cross-layer design as a solution to these problems. Finally, in Chapter 5 we discuss the research outcomes of this project, present extensions and ideas for future work, and give a short conclusion to the thesis.
CHAPTER 2

BACKGROUND THEORIES

2.1 Speech Quality Evaluations

2.1.1 Objective Speech Quality Measurement

In voice communications, the mean opinion score (MOS) provides a numerical measure of the quality of human speech at the receiving end. MOS indicates the speech quality perceived by the listener and ranges from 1 (bad) to 5 (excellent), as presented in Table 2-1. A number of methods are available to measure the speech quality of a VoIP system. Speech quality measurements can be divided into two categories: subjective and objective. Subjective speech quality measurement requires a large group of people to take part in the test; it is time consuming, unrepeatable and expensive. Compared with subjective tests, objective tests are repeatable and automatic, and do not suffer from environmental effects. The most popular objective measurements are Perceptual Evaluation of Speech Quality (PESQ) [23] and the E-model [24]. PESQ is categorized as an intrusive speech quality measurement, as it requires the original speech signal as well as the degraded one to perform the quality evaluation. The E-model, in contrast, is categorized as a non-intrusive speech quality measurement, as it is parameter-based and does not require the original speech signal.
Quality Scale   Score   Listening Effort Scale
Excellent       5       No effort required
Good            4       No appreciable effort required
Fair            3       Moderate effort required
Poor            2       Considerable effort required
Bad             1       No meaning understood with reasonable effort

Table 2-1 MOS scale

2.1.2 PESQ

PESQ was specifically developed to be applicable to end-to-end voice quality testing under real network conditions. Comparing the reference and degraded signals yields a quality score. The simplified system model of PESQ is given in Figure 2-1. It consists of three key modules: the time alignment module, the perceptual transform module and the cognition/judgment module. The time alignment module synchronizes the degraded signal with the reference signal. The perceptual transform module transforms the signals into a psychophysical representation that approximates human perception. The cognition/judgment module maps the difference between the original (reference) signal and the distorted (degraded) signal into an estimated perceptual distortion, which is then mapped onto the Mean Opinion Score (MOS) scale.

[Figure 2-1 Basic Structure of Perceptual Evaluation of Speech Quality: original and distorted speech pass through time alignment and perceptual transform modules; the cognition/judgment module outputs the estimated distortion]

The results given by PESQ have been calibrated using a large database of subjective tests. PESQ takes into account signal degradations such as coding distortions, errors, packet losses, delay and variable delay, and filtering; it combines transfer function equalization, time alignment and a new algorithm for averaging distortions over time. However, PESQ does not take into account the subjective effect of level changes in the network, echo, or the effect of round-trip delay on conversation.

2.1.3 E-Model

The E-Model is a computational model standardized by ITU-T in [24][27][28]. It uses transmission parameters to predict the subjective speech quality of packetized voice. The E-Model has proven useful as a transmission-planning tool for assessing the combined effects of variations in several transmission parameters that affect the conversational quality of telephony [24]. The primary output of the E-Model is the "Rating Factor" R, which can be further transformed into estimates of customer opinion by mapping it onto the MOS scale. The E-Model equation for the Rating Factor is

$$R = R_0 - I_d - I_s - I_e + A$$

This equation results in an R factor between 0 and 100. The components of R are:
$R_0$, the base R value, determined by noise levels;
$I_d$, representing the effects of impairments caused by delay;
$I_s$, representing the effects of impairments occurring simultaneously with the speech signal;
$I_e$, representing the effects of "equipment" such as DCME or Voice over IP networks;
$A$, the advantage factor, used to compensate for the allowance users make for poor quality when given some additional convenience (e.g. 0 for wireline and 10 for GSM).

Delay impairment Id

The Id factor models the quality degradation due to one-way or "mouth-to-ear" delay.
Id can be computed from the one-way delay as [29]:

$$I_d = 0.024\,T_a + 0.11\,(T_a - 177.3)\,H(T_a - 177.3), \qquad H(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$$

where $T_a$ is the one-way (or "mouth-to-ear") delay in milliseconds.
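The delay impairment formula above is piecewise linear: below 177.3 ms it grows by 0.024 per millisecond, and beyond that knee the slope rises to 0.134 per millisecond. A minimal Python transcription of the formula, with hypothetical function names, makes the knee easy to see:

```python
def heaviside(x: float) -> float:
    """Unit step H(x): 0 for x < 0, 1 for x >= 0."""
    return 1.0 if x >= 0 else 0.0


def delay_impairment(ta_ms: float) -> float:
    """Delay impairment Id for a one-way (mouth-to-ear) delay Ta in milliseconds."""
    excess = ta_ms - 177.3
    return 0.024 * ta_ms + 0.11 * excess * heaviside(excess)


if __name__ == "__main__":
    for ta in (100, 150, 177.3, 200, 300):
        print(f"Ta = {ta:>5} ms -> Id = {delay_impairment(ta):.2f}")
```

Because of this knee, extra delay added by retransmissions costs little when the end-to-end delay is short but becomes expensive once the total delay exceeds roughly 177 ms.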
Equipment impairment Ie

The equipment impairment Ie captures the distortion of the original voice signal due to low-rate codecs and to packet losses in both the network and the playout buffer. Currently, the E-Model can only handle the speech distortion introduced by a limited set of codecs, e.g. G.729 or G.723.

Mapping R factor into MOS scale

R can be mapped onto the MOS scale by the following equations [24]:

$$\mathrm{MOS} = \begin{cases} 1, & R < 0 \\ 1 + 0.035R + R(R - 60)(100 - R) \times 7 \times 10^{-6}, & 0 \le R \le 100 \\ 4.5, & R > 100 \end{cases}$$

2.1.4 Conversational speech quality evaluation

[Figure 2-2 Schematic diagram for MOSc measurement: reference speech passes through the encoder, a loss process driven by loss trace data, and the decoder; PESQ compares reference and degraded speech, and its MOS is converted to R to give Ie; a delay model driven by delay trace data gives Id; the E-Model combines Ie and Id into MOSc]

Perceived speech quality during a VoIP conversation can be expressed as a conversational Mean Opinion Score (MOSc). MOSc values can be obtained by subjective tests or by objective evaluation methods such as the E-Model. As described in Section 2.1.3, the E-Model consists of very complicated equations and is not applicable to some impairment factors, such as some codecs or bit errors in the payload. A prediction method for perceived conversational speech quality has been proposed in [29]. The schematic diagram of this method is illustrated in Figure 2-2. In this method, the MOS index produced by PESQ is first transformed to the R scale by

$$R_{\mathrm{PESQ}} = 3.026x^3 - 25.314x^2 + 87.060x - 57.336$$

where $x$ is the MOS index from PESQ. The equipment impairment factor is then computed as $I_e = R_0 - R_{\mathrm{PESQ}}$. Combined with the delay impairment factor $I_d$, the R value is obtained as $R = R_0 - I_d - I_e$, and MOSc is finally obtained from R using the standard E-Model mapping. Hence the impairments due to delay, packet loss, coding and bit errors are all reflected in the evaluated MOSc.

2.2 Adaptive Playout Buffer

A playout buffer can be fixed or adaptive. In a fixed playout buffer, the playout delay for a packet stream is preset before a conversation begins, so it cannot readily adapt to time-varying network conditions and may result in poor speech quality. For this reason, adaptive playout buffers are considered. Much work has been done on adaptive playout buffer algorithms that achieve the best balance between playout delay and packet losses in the playout buffer. Recent work addressing the problem specifically for the Internet can be found in [30][31][32][33]. In this section, we briefly review some playout buffer algorithms from this literature. Details of how adaptive playout buffers are applied in our Wireless VoIP system can be found in Chapters 3 and 4.

[Figure 2-3 Timing associated with packet i at the sender and receiver (quantities ti, ai, pi, ni, di, bi)]

In [30], Ramjee et al. proposed four algorithms ('exp-avg', 'fast-exp', 'min-delay' and 'spk-delay') that adjust the playout delay according to the estimated network delay. These algorithms estimate the mean network delay $\hat{d}_i$ and its variation $\hat{v}_i$ on the arrival of the ith packet. The playout delay is adjusted at the beginning of each talkspurt. Let $t_i$ be the timestamp of packet i, the first packet in a talkspurt; its playout time $p_i$ is computed as

$$p_i = t_i + \hat{d}_i + \mu\,\hat{v}_i$$

where $\mu$ is a constant. The playout time $p_j$ of a subsequent packet j in the same talkspurt is computed as

$$p_j = p_i + t_j - t_i$$

(see Figure 2-3 for the related timing notation). In all four algorithms, $\hat{v}_i$ is given by

$$\hat{v}_i = \alpha\,\hat{v}_{i-1} + (1 - \alpha)\,\lvert \hat{d}_i - n_i \rvert$$

but they differ in how $\hat{d}_i$ is computed.

1) exponential-average (exp-avg): the mean delay $\hat{d}_i$ is estimated through an exponentially weighted average [30]:

$$\hat{d}_i = \alpha\,\hat{d}_{i-1} + (1 - \alpha)\,n_i$$

where $n_i$ is the one-way network delay of the ith packet. The value of $\alpha$ is chosen to be 0.998002 in [30].

2) fast exponential-average (fast-exp): this algorithm is a modified version of exp-avg. fast-exp computes the weighted mean as [30]:

$$\hat{d}_i = \begin{cases} \beta\,\hat{d}_{i-1} + (1 - \beta)\,n_i, & n_i > \hat{d}_{i-1} \\ \alpha\,\hat{d}_{i-1} + (1 - \alpha)\,n_i, & n_i \le \hat{d}_{i-1} \end{cases}$$

where $\alpha$ and $\beta$ are constants satisfying $0 < \beta < \alpha < 1$. In [30], $\alpha = 0.998002$ and $\beta = 0.750000$, which allows fast-exp to adapt more quickly to increases in the delays $n_i$.

3) minimum delay (min-delay): this algorithm is more aggressive in minimizing delays. It uses the minimum delay of all packets received in the current talkspurt. Letting $S_i$ be this set of packets [30]:

$$\hat{d}_i = \min_{j \in S_i} \{ n_j \}$$

4) spike delay detection (spk-delay): this algorithm focuses on spikes, i.e. sudden and large increases in delay over a sequence of packets. spk-delay normally estimates $\hat{d}_i$ with the same equation as exp-avg, although $\alpha$ is set to 0.875 in [wan]. During a spike, however, spk-delay uses

$$\hat{d}_i = \hat{d}_{i-1} + n_i - n_{i-1}$$

to catch up with the sudden increase in delay.

We also present some more complex algorithms that have been developed from the four classical algorithms described above.

5) window: this algorithm is proposed in [31]. Like spk-delay, it detects spikes. During a spike, the delay of the first packet in the spike is used as the playout delay. After the spike, the playout delay is chosen as the delay corresponding to the qth quantile of the distribution of the last N packets received (N = 10,000 in [31]).

6) adaptive: in [32], Sun et al. proposed an 'adaptive' algorithm to adapt to different networks. It switches between min-delay and fast-exp depending on whether $\hat{d}_i$ is higher than a delay threshold (e.g. 150 ms).

7) E-MOS: Fujimoto et al. [33] proposed a playout buffer algorithm called E-MOS. It models the delay distribution with a Pareto distribution, which is combined with the packet loss ratio in a function Q(d) that models the impact of delay and packet loss on speech quality, expressed as MOS. When a packet is received, E-MOS uses the measured one-way delay to update the Pareto distribution, and then chooses as the playout delay the value of d that maximizes the speech quality Q(d).

2.3 Automatic Repeat upon reQuest (ARQ)

Automatic Repeat reQuest (ARQ) is an error-control system in which the receiver generates a request for retransmission when it detects an error in transmission. A very basic ARQ scheme includes only error detection and retransmission capabilities.
If a packet is found to have errors after decoding, it is discarded and a retransmission is requested from the source. The source then retransmits an exact copy of that packet. This process may be repeated indefinitely, but normally an upper bound on the number of retransmissions is set. If errors still persist after the maximum number of allowed retransmissions is reached, the higher layer has to decide how the situation is to be handled. For retransmission procedures using ARQ, the three most popular schemes are [16]:

Stop and Wait (SW)
In SW-ARQ, the sender, after delivering the first copy of a packet in its buffer, is blocked until a positive acknowledgement (ACK) is received or a timeout expires. In the first case, the sender drops the successfully delivered packet from the buffer and transmits the next packet; in the second case, the sender simply retransmits the same packet.

Go Back N (GBN)
The sender continuously transmits the packets stored in its buffer until a Negative ACK (NACK) is received. The sender then stops transmitting new packets, pulls back to the erroneously received packet, and retransmits a complete sequence of N packets starting with the NACKed packet, where N is the number of packets transmitted within an average round trip time.

Selective Repeat (SR)
In this case the sender continuously transmits the packets stored in its buffer. Whenever a NACK is received, the sender stops transmitting new packets, pulls back to the erroneously received packet, retransmits only that packet, and then resumes transmission of new packets. It is worth noting that SR avoids retransmitting the successfully received packets that follow the corrupted packet, thus allowing better efficiency.
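The Stop and Wait procedure with a retransmission limit can be sketched in a few lines to show the trade-off it creates for voice: each extra attempt lowers the residual loss but adds delay. The sketch assumes that every attempt is corrupted independently with probability `per` (a Bernoulli channel), that ACKs are never lost, and that `retry_limit` counts retransmissions after the first attempt; the names `stop_and_wait`, `simulate` and `Delivery` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Delivery:
    delivered: bool  # True if some attempt got through before the limit
    attempts: int    # transmissions used (first try + retransmissions)


def stop_and_wait(per: float, retry_limit: int, rng: random.Random) -> Delivery:
    """Send one frame with Stop-and-Wait ARQ over a Bernoulli error channel.

    Each attempt is corrupted independently with probability `per`. After a
    corrupted attempt the sender times out and retransmits, up to
    `retry_limit` retransmissions. ACKs are assumed never to be lost.
    """
    for attempt in range(1, retry_limit + 2):
        if rng.random() >= per:  # attempt received correctly and ACKed
            return Delivery(True, attempt)
    return Delivery(False, retry_limit + 1)  # limit reached: frame dropped


def simulate(per: float, retry_limit: int, n_frames: int, seed: int = 1):
    """Return (residual loss rate, mean attempts per frame)."""
    rng = random.Random(seed)
    results = [stop_and_wait(per, retry_limit, rng) for _ in range(n_frames)]
    loss = sum(not r.delivered for r in results) / n_frames
    mean_attempts = sum(r.attempts for r in results) / n_frames
    return loss, mean_attempts


if __name__ == "__main__":
    for per in (0.01, 0.1, 0.3):
        loss, attempts = simulate(per, retry_limit=1, n_frames=100_000)
        print(f"PER={per}: residual loss {loss:.4f} (analytic {per ** 2:.4f}), "
              f"mean attempts {attempts:.3f}")
```

Under these assumptions the residual loss is $PER^{L+1}$ for a limit of L retransmissions, and the mean number of attempts is $(1 - PER^{L+1})/(1 - PER)$; every attempt beyond the first adds one timeout and transmission time to the packet's delay, which is the source of the retransmission jitter discussed in Chapter 3.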
CHAPTER 3

PERCEIVED SPEECH QUALITY DRIVEN RETRANSMISSION MECHANISM

3.1 Introduction

Quality of Service (QoS) support for voice over IP (VoIP) in wireless/mobile networks is an important issue for technical and commercial reasons. However, speech quality for VoIP suffers from high packet loss rates and other impairments in the wireless link. Retransmission mechanisms, such as automatic repeat request (ARQ), have been incorporated in wireless and cellular networks to retransmit lost packets and improve the performance of data transmission over wireless. In wireless networks such as 802.11b [1], the retransmission mechanism is a simple Stop & Wait algorithm implemented at the Medium Access Control (MAC) layer, in which each transmitted packet must be acknowledged before the next packet can be sent. If the sender of a frame does not receive an acknowledgement within a certain timeout period, it retransmits the frame until a maximum retransmission limit is reached. When the wireless link quality is poor, retransmission of MAC frames can effectively recover corrupted packets that contain bit errors. However, retransmission schemes may introduce excessive delays that have significant adverse effects on delay-sensitive real-time applications such as VoIP. A simple retransmission scheme can therefore degrade perceived speech quality in VoIP, and there is a tradeoff between packet loss and delay across the various retransmission schemes. Improved retransmission mechanisms such as Speech Property-Based ARQ (SPB-ARQ) [21] and the hybrid loss recovery scheme [34] have been proposed to reduce speech distortion by protecting packets that are perceptually more relevant. However, these schemes have been assessed only with listening-only quality measures, and do not consider the impact of delay, which is important for conversation and interactivity. Further, they do not consider the impact of retransmission jitter. Since adaptive jitter buffers discard retransmitted packets that arrive after their playout time, the characteristics of the retransmission jitter introduced by different retransmission schemes should be considered.

The primary aim of the study reported here is to investigate new retransmission mechanisms that improve speech quality for wireless VoIP. In this study, we use a perceived conversational speech quality assessment method [29], instead of a listening-only method or individual network parameters (e.g. packet loss and delay), to evaluate the performance of current retransmission mechanisms (No retransmission, Full retransmission, SPB retransmission). We also present a new retransmission policy that adapts to the most suitable retransmission mechanism depending on the wireless link quality and network delay conditions. The ultimate aim of this perceived speech quality driven policy is to achieve optimum speech quality (in terms of the conversational Mean Opinion Score, MOSc) in the face of network impairments and varying wireless channel conditions, while considering the coupling between retransmission jitter and adaptive jitter buffers.
3.2 Related Works

3.2.1 Speech property-based retransmission mechanisms

Speech Property-Based QoS control schemes are based on the fact that some voice frames are perceptually more important than others when encoded speech is transferred through packet networks. Recent experimental results [35] show that in some popular codecs used in wireless applications (e.g. AMR), the position of a frame loss has a significant influence on perceived speech quality. In such codecs, frame loss concealment techniques interpolate the parameters of lost frames from the parameters of previous frames. Lost voice frames at the beginning of a talkspurt are therefore concealed using the decoding information of previous unvoiced frames. Because voiced sounds have higher energy than unvoiced sounds, concealing voiced frames with lower-energy unvoiced frames causes serious degradation in speech quality. Moreover, at the unvoiced/voiced transition, it is difficult for the decoder to conceal the loss of voiced frames correctly using the filter coefficients and excitation of an unvoiced sound, especially when burst losses occur or the frame size grows. To maximize perceptual quality at the receiving end, perceptually important voice packets may be protected by giving them high priority, with unimportant packets handled as 'best-effort'. SPB retransmission, a retransmission scheme that protects only the perceptually important speech frames, is presented in [21] [34]. Experimental results reported in [21] show that SPB retransmission can provide better speech quality (assessed by EMBSD) than the No retransmission scheme, which does not retransmit any packet. In [34], SPB retransmission was shown to be more efficient in reducing retransmission delays than Full retransmission, which retransmits every unacknowledged (unACKed) packet.

3.2.2 Measuring conversational speech quality

In previous studies [21][34], retransmission schemes were assessed with the EMBSD algorithm, which considers only the distortion caused by packet loss. In practice, however, both packet loss and delay are crucial in voice conversation, and long retransmission delays (e.g. on top of a long network delay) would seriously impact speech quality. The E-model was introduced by the ITU as a non-intrusive quality assessment method to obtain a measure of voice quality.
Unfortunately, the E-model is applicable only to a limited number of codecs, which at present does not include the AMR codec. In our simulation, we employed the conversational MOS [29] to quantify the performance of the different retransmission schemes. In the conversational speech quality evaluation (see Chapter 2), the ITU PESQ is first used to quantify the impact of packet loss on speech quality, and the result is converted to the equipment impairment Ie. The delay impairment Id is then calculated from the average end-to-end delay. Finally, the E-model is used to obtain a measure of the speech quality, MOSc, based on Ie and Id (see Figure 3-1).
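The MOSc chain just described (PESQ MOS, then R_PESQ, then Ie, then R, then MOSc) can be sketched directly from the equations of Chapter 2. The sketch assumes R0 = 93.2, the value commonly quoted as the E-model default rating; the function names are hypothetical. Note that substituting $I_e = R_0 - R_{\mathrm{PESQ}}$ into $R = R_0 - I_d - I_e$ gives $R = R_{\mathrm{PESQ}} - I_d$, so the assumed R0 affects the reported Ie but not the resulting MOSc.

```python
def pesq_to_r(x: float) -> float:
    """Map a PESQ MOS index x onto the E-Model R scale (cubic fit quoted from [29])."""
    return 3.026 * x**3 - 25.314 * x**2 + 87.060 * x - 57.336


def r_to_mos(r: float) -> float:
    """Standard E-Model mapping from the rating factor R to MOS."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def delay_impairment(ta_ms: float) -> float:
    """Id for a one-way delay Ta in milliseconds."""
    excess = ta_ms - 177.3
    return 0.024 * ta_ms + (0.11 * excess if excess >= 0 else 0.0)


def mosc(pesq_mos: float, one_way_delay_ms: float, r0: float = 93.2) -> float:
    """Conversational MOS: Ie = R0 - R_PESQ, R = R0 - Id - Ie, then R -> MOS.

    r0 = 93.2 is an assumed default; it cancels out of R.
    """
    ie = r0 - pesq_to_r(pesq_mos)
    r = r0 - delay_impairment(one_way_delay_ms) - ie
    return r_to_mos(r)


if __name__ == "__main__":
    for delay in (50, 150, 250, 400):
        print(f"PESQ 3.5, {delay} ms one-way -> MOSc {mosc(3.5, delay):.2f}")
```

With the delay fixed, MOSc tracks the PESQ score of the degraded speech; with the PESQ score fixed, MOSc falls as the one-way delay grows, which is how a retransmission scheme that trades loss for delay can lower MOSc even when it improves PESQ.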
3.2.3 Adaptive jitter buffer and retransmission jitters

In VoIP applications, jitter is compensated for at the receiver by a jitter buffer. The size of a jitter buffer can be fixed or adjustable. Fixed jitter buffers cannot adapt readily to changes in network delay and as a result are not practical in real VoIP applications. In our study, we investigated fast-exp, one of the classical adaptive jitter buffer algorithms proposed in [30]. By using a smaller weighting factor when delays increase, the fast-exp algorithm can quickly adapt to the increases while avoiding discarding too many packets. When a packet arrives, it estimates the current mean network delay $\hat{d}_i$ and the current variation of network delay $\hat{v}_i$. The mean delay estimation equation is

$$\hat{d}_i = \begin{cases} \beta\,\hat{d}_{i-1} + (1 - \beta)\,n_i, & n_i > \hat{d}_{i-1} \\ \alpha\,\hat{d}_{i-1} + (1 - \alpha)\,n_i, & n_i \le \hat{d}_{i-1} \end{cases}$$

where $n_i$ is the network delay of the ith packet, $\beta = 0.75$ and $\alpha = 0.99802$. The following equation is used to estimate $\hat{v}_i$:

$$\hat{v}_i = \alpha\,\hat{v}_{i-1} + (1 - \alpha)\,\lvert \hat{d}_i - n_i \rvert$$

At the beginning of a talkspurt, the adaptive jitter buffer sets the playout delay using

$$D = \hat{d}_i + \mu\,\hat{v}_i$$

where $D$ is the playout delay and $\mu$ is a constant that can be selected from 1 to 20. We set $\mu$ to 4 in our simulation.

It should be noted that for VoIP over wireless, the network delay $n_i$ consists of delays introduced by the wireline network and by the wireless link. Jitter can be introduced by network congestion in the wireline network or by retransmissions/propagation in the wireless links. Since most jitter buffer algorithms were proposed to compensate for network congestion jitter, it is worth investigating the impact of retransmission jitter for VoIP over wireless.

3.3 Simulation System Description
[Figure 3-1 Simulation Environment: fixed host, network delay, access point (MAC retransmission-limit control, PER error model) and mobile host (adaptive playout buffer, AMR decoder), followed by speech quality evaluation (PESQ MOS/Ie and end-to-end delay Id combined by the E-Model into MOSc)]

Our study is based on the network simulator ns-2 [36], in which we simulated a last-hop wireless scenario. Both the IEEE 802.11 and the Ethernet protocol stacks are implemented in the simulator. A two-way Bernoulli error model was inserted to simulate wireless link transmission errors. In 802.11, if the packet size exceeds the Maximum Transmission Unit (e.g. 1500 bytes for WaveLAN), the packet is fragmented. Since we set the packet size to 71 bytes, carrying one 12.2 kbit/s AMR speech frame per RTP packet, fragmentation is avoided. The simulation system is given in Figure 3-1. In our simulation, the original speech file is first encoded by the AMR codec and then analyzed to extract the speech marking information (voiced/unvoiced) for each packet. The speech marking information is used together with network delay and wireless link quality to control the retransmission policy. The error model determines whether a packet is corrupted according to the packet error probability (PER). The base station (BS) neither sends an ACK to the sender for a corrupted packet nor presents it to the higher layer. If the MAC layer of the sender has not received an acknowledgement for a packet, it retransmits the packet until the packet is ACKed or the retransmission attempt limit is reached (we denote retransmission as Retx in the rest of this chapter). In our simulation, we set the Retx attempt limit to 6 for both SPB Retx and Full Retx. At the receiver, the received speech packets are fed to an adaptive jitter buffer and subsequently decoded to recover the degraded speech file, which is used to obtain a measure of speech quality. In our study, we used the combined PESQ and E-Model method described in Chapter 2 to evaluate the conversational speech quality. The performance index was obtained by averaging the results computed by this method over each 20-second segment of the speech file. The simulation results below were obtained by averaging the results of 50 simulations with different random seeds to avoid the impact of packet loss locations.

The three simulated retransmission schemes are SPB Retx, Full Retx and No Retx. Table 3-1 gives the average number of voiced packet losses when transmitting 73000 speech packets in our simulated wireless network with these schemes. For simplicity, we simulated only the wireless link for the purpose of this study, so only the wireless link (Retx limit exceeded) and the adaptive jitter buffer account for packet losses. In Table 3-1, most of the voiced packet losses in Full Retx or SPB Retx are caused by the jitter buffer. As we deployed a Bernoulli error model in our simulation, most of the retransmitted packets are successfully received. If bursty packet errors were considered, there would be more voiced packet losses in the Full Retx and SPB Retx schemes.

Table 3-1 Average voiced packet losses with fast-exp playout buffer

PER       No Retx   SPB Retx   Full Retx
0.0001    15        53         29
0.0005    36        54         27
0.0008    61        51         26
0.001     69        47         22
0.003     144       28         17
0.005     241       22         13
0.01      474       13         9
0.05      2344      42         16
0.10      4678      931        159

It seems straightforward that SPB Retx should be better than No Retx, and at least as good as Full Retx, at protecting voiced frames.
However, Table 3-1 shows that Full Retx always loses fewer voiced packets than SPB Retx. No Retx loses the fewest voiced packets when link quality is good (packet error probability below 0.0005). The reason is that, in the fast-exp algorithm, the estimated playout delay increases as retransmission jitter increases. When link quality is good, the estimated playout delay stays low. Occasionally retransmitted packets and their neighbouring packets are then discarded by the jitter buffer because of the jitter they introduce. In the No Retx scheme, by contrast, a corrupted packet does not affect the packets that follow it. That is why No Retx has the fewest losses when link quality is very good. In SPB Retx, unvoiced packets are not retransmitted, so the estimated playout delay cannot reflect the current wireless link conditions as link quality deteriorates. In Full Retx, every unACKed packet is retransmitted, which helps the adaptive jitter buffer estimate the playout delay for the next talkspurt. That is why the adaptive jitter buffer discards more packets under SPB Retx than under Full Retx.

3.4 Performance Comparison of Current Retransmission Schemes

Figure 3-2 and Figure 3-3 compare the overall packet loss rates and the buffered retransmission delays. Figure 3-2 shows that Full Retx keeps the packet loss rate low because every unACKed packet is retransmitted. The cost is the higher delay plotted in Figure 3-3. Interestingly, while link quality is not too poor (packet error probability up to 0.01), the packet loss rate of Full Retx decreases as link quality worsens. As mentioned above, this is because under poorer link quality, more retransmissions help the jitter buffer estimate the playout delay more accurately.
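The mechanism invoked above, retransmission jitter pushing up the fast-exp playout delay, can be illustrated with a minimal Python sketch of the estimator from Section 3.2.3. This is not the simulator used in this study. The class and function names are hypothetical, and delays are assumed to be in milliseconds. The estimator is assumed to be seeded with the first packet's delay. The playout delay is simply read out at the end of the trace rather than recomputed at each talkspurt. The two synthetic traces are illustrative only: a constant 60 ms delay, and the same stream with every 25th packet 30 ms late as if it had been retransmitted.

```python
from dataclasses import dataclass

BETA = 0.75       # weight applied when the delay is rising (fast adaptation)
ALPHA = 0.99802   # weight applied when the delay is steady or falling (slow decay)
MU = 4.0          # safety factor on the delay variation, as set in the simulation


@dataclass
class FastExpEstimator:
    """Running estimates of mean network delay (d_hat) and delay variation (v_hat)."""
    d_hat: float = 0.0
    v_hat: float = 0.0
    started: bool = False

    def update(self, n_i: float) -> None:
        """Fold the network delay n_i (ms) of one arriving packet into the estimates."""
        if not self.started:
            # Assumed initialisation: seed the mean with the first observed delay.
            self.d_hat, self.v_hat, self.started = n_i, 0.0, True
            return
        # The branch compares n_i with the previous mean, d_hat_{i-1}.
        w = BETA if n_i > self.d_hat else ALPHA
        self.d_hat = w * self.d_hat + (1.0 - w) * n_i
        # The variation uses the freshly updated mean, |d_hat_i - n_i|.
        self.v_hat = ALPHA * self.v_hat + (1.0 - ALPHA) * abs(self.d_hat - n_i)

    def playout_delay(self) -> float:
        """D = d_hat + MU * v_hat, the value applied at the start of a talkspurt."""
        return self.d_hat + MU * self.v_hat


def run(delays):
    """Feed a delay trace through the estimator and return the final playout delay."""
    est = FastExpEstimator()
    for n in delays:
        est.update(n)
    return est.playout_delay()


# Illustrative traces (not thesis data): a constant 60 ms delay, and the same
# stream with every 25th packet arriving 30 ms late, as if it had been retransmitted.
steady = [60.0] * 500
with_retx = [90.0 if k % 25 == 24 else 60.0 for k in range(500)]
print(f"steady: {run(steady):.1f} ms, with retransmissions: {run(with_retx):.1f} ms")
```

Because $\alpha$ is close to 1, the mean estimate decays only slowly after each late packet, while $\beta$ lets it jump up quickly. Sparse retransmissions therefore keep the playout delay well above the base network delay.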
However, when link quality is very good (packet error probability up to 0.0005), No Retx achieves the lowest packet loss rate. It introduces no retransmission jitter, and few packets are corrupted by bit errors. As a compromise, SPB Retx yields a packet loss rate and a Retx delay between those of No Retx and Full Retx.

Using the evaluation method described in Chapter 2, Figure 3-4 and Figure 3-5 give a more direct performance comparison of these schemes, with MOSc as the metric. To focus on the performance of the Retx schemes, our evaluation did not consider packet losses introduced in the wireline network. Network delay, however, was taken into account. To human listeners, delays below 100 ms are barely noticeable, whereas delays above 150 ms clearly affect conversational interactivity [37].

[Figure 3-2 Overall packet loss rate comparison: loss rate (%) versus packet error probability (10^-4 to 10^0, log scales) for No Retx, SPB Retx and Full Retx]
[Figure 3-3 Buffered retx delay comparison: buffered Retx delay (ms, 0 to 300) versus packet error probability for No Retx, SPB Retx and Full Retx]
[Figure 3-4 MOSc comparison with 175ms network delay: MOSc versus packet error probability for the perceived quality driven scheme, No Retx, SPB Retx and Full Retx]
[Figure 3-5 MOSc comparison with packet error probability 0.001: MOSc versus network delay (100 to 300 ms) for the same schemes]

Retx delays rarely exceed 100 ms. To make their impact clearly visible, we assume that a delay of 175 ms has already been introduced by the wireline network and add it to the end-to-end delay in the MOSc evaluation. In Figure 3-4, the MOSc of Full Retx is lower than that of No Retx and SPB Retx when the packet error probability is below 0.003. This is because Full Retx always introduces more Retx delay, and perceived speech quality is sensitive to high delay when link quality is good. When the packet error probability exceeds 0.003, Full Retx becomes the best scheme, as it greatly reduces the number of corrupted packets that are lost.

Figure 3-5 compares the schemes for different network delays at a packet error probability of 0.001. When the delay is below 150 ms, Full Retx achieves the best MOSc. When the delay is above 150 ms, No Retx becomes the best. This is consistent with 150 ms being the threshold above which delay begins to severely impair speech quality. As in Figure 3-4, the performance of SPB Retx lies between that of No Retx and Full Retx, but it is not the best on either side of the delay threshold.

3.5 Perceived Speech Quality Driven Retransmission Scheme

No Retx and Full Retx each achieve the best MOSc under different link quality and network delay conditions. We therefore propose a new perceived speech quality driven retransmission scheme that switches between these two schemes as link quality and network delay change. The pseudocode of the new scheme is shown in Figure 3-6. Low_Error_Threshold is set to 0.0005 and High_Error_Threshold to 0.003.
These thresholds follow from the simulation results. When the packet error probability is below 0.0005, No Retx achieves the best MOSc even when delay is not taken into account. When it exceeds 0.003, Full Retx is the best even when the network delay is very high. When the packet error probability is between 0.0005 and 0.003, the decision is made according to the network delay. In the proposed scheme, Delay_Threshold is set to 150 ms, the delay above which speech quality begins to be noticeably affected. In real applications, the Bit Error Rate (BER) can be converted to PER. The BER itself can be obtained from the bit errors observed in known bit pattern sequences sent by the BS. The network delay can be estimated by subtracting the average MH-to-BS (last-hop) delay from the average end-to-end delay, which can be retrieved from the RTP packet headers.
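The selection rule described above can be written as a small runnable Python function, together with the BER-to-PER conversion mentioned for real deployments. This is a sketch under stated assumptions:

- The function names are hypothetical.
- Bit errors are assumed independent, so that PER = 1 - (1 - BER)^L for an L-bit packet.
- The example treats only the 71-byte (568-bit) packet from Section 3.3 as exposed to errors, ignoring MAC and PHY overhead.

```python
LOW_ERROR_THRESHOLD = 0.0005
HIGH_ERROR_THRESHOLD = 0.003
DELAY_THRESHOLD_MS = 150.0


def ber_to_per(ber: float, packet_bits: int) -> float:
    """PER for a packet of packet_bits bits, assuming independent bit errors."""
    return 1.0 - (1.0 - ber) ** packet_bits


def estimate_network_delay(avg_end_to_end_ms: float, avg_last_hop_ms: float) -> float:
    """Wireline network delay: average end-to-end delay minus the MH-to-BS delay."""
    return avg_end_to_end_ms - avg_last_hop_ms


def choose_retx_scheme(per: float, network_delay_ms: float) -> str:
    """Return the Retx scheme selected by the rule of Figure 3-6."""
    if per < LOW_ERROR_THRESHOLD:
        return "No Retx"
    if per > HIGH_ERROR_THRESHOLD:
        return "Full Retx"
    # Intermediate link quality: the network delay decides.
    return "Full Retx" if network_delay_ms < DELAY_THRESHOLD_MS else "No Retx"


# Example: a 71-byte (568-bit) packet at BER 1e-6 gives a PER of about 5.7e-4.
# This falls between the two thresholds, so the delay estimate decides.
per = ber_to_per(1e-6, 71 * 8)
print(per, choose_retx_scheme(per, estimate_network_delay(200.0, 25.0)))
print(choose_retx_scheme(per, estimate_network_delay(120.0, 20.0)))
```

With an estimated wireline delay of 175 ms the sketch selects No Retx, and with 100 ms it selects Full Retx, mirroring the behaviour seen in Figure 3-5.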
The performance of the new perceived quality driven scheme under different network delays and packet error probabilities is also shown in Figure 3-4 and Figure 3-5. The curve of the perceived quality driven scheme coincides with those of No Retx and Full Retx wherever they achieve the best MOSc. This is because it switches to whichever of the two is more suitable as communication conditions change. Since this method uses Full Retx only when necessary, it can achieve retransmission efficiency similar to SPB Retx. It also avoids the implementation complexity of obtaining the speech property information that SPB Retx requires.

    if (PER < Low_Error_Threshold)
        No_Retx();
    else if (PER > High_Error_Threshold)
        Full_Retx();
    else {
        if (Network_Delay < Delay_Threshold)
            Full_Retx();
        else
            No_Retx();
    }

Figure 3-6 Perceived speech quality driven Retx scheme pseudo code

3.6 Summary

A suitable retransmission scheme is crucial for obtaining the best possible perceived speech quality in wireless VoIP applications. In this chapter, we investigated the performance of three retransmission schemes (No Retx, SPB Retx and Full Retx) with regard to perceived conversational speech quality. The impact of retransmission jitter with an adaptive jitter buffer was also considered. The simulation results show that the performance of these schemes depends on the network delay and the wireless link quality. Because the wireless environment is variable, we have proposed a perceived speech quality driven retransmission scheme that adapts to wireless link quality and network delay conditions. As SPB Retx is not involved in the new method, the implementation complexity of retrieving speech property information is avoided. Our results show that the proposed method achieves the best MOSc compared with No Retx, Full Retx and SPB Retx.
This is because the new method deploys the most suitable scheme as communication conditions change. In this study, a simplified last-hop wireless network was implemented to demonstrate the wireless VoIP scenario. Further improvements may be achieved by making the simulation closer to a real network, e.g. by incorporating a multi-state error model for the wireless link.
CHAPTER 4

PLAYOUT DELAY CONSTRAINED ARQ AND ARQ AWARE PLAYOUT BUFFER

4.1 Introduction

Because wireless channels are unreliable and error-prone, assuring acceptable perceived speech quality has been a challenging task for wireless VoIP. Automatic Repeat reQuest (ARQ) is one of the packet error recovery techniques for wireless VoIP. Because of its efficiency and simplicity, it may complement or substitute for Forward Error Correction (FEC).

[Figure 4-1 Stop and Wait ARQ: the sender's timer is started, stopped and restarted as packets n and n+1 leave the Tx queue; a frame loss on the wireless channel causes a timeout and backoff before n+1 is retransmitted and acknowledged (ACKn, ACKn+1) by the Rx buffer]
In ARQ, the sender transmits packets, or Protocol Data Units (PDUs), consisting of a payload and a checksum. Depending on the result of checksum validation, the receiver sends acknowledgement messages (e.g. ACK or NACK) back to the transmitter. The sender then retransmits packets based on these acknowledgements. ARQ protocols fall into three basic types, Stop-and-Wait (SW), Go-Back-N (GBN) and Selective Repeat (SR), which differ in how they respond to acknowledgements. The details of these three types have been described in Chapter 2. In this study, we consider the SW-ARQ of the IEEE 802.11 Medium Access Control (MAC) layer [1]. In 802.11 SW-ARQ, a transmitted packet must be acknowledged before the next packet can be sent. If the sender does not receive an acknowledgement for a packet within a timeout period, it retransmits the packet. It repeats this until the packet is acknowledged or a maximum retry limit is reached. In the Distributed Coordination Function (DCF) mode of IEEE 802.11, a backoff procedure randomly defers each retransmission to avoid collisions among multiple transmitters (see Figure 4-1). In this way, corrupted packets may be recovered by their retransmitted copies.

However, ARQ schemes also cause a series of problems that affect perceived speech quality. The retransmission procedure may introduce excessive delay: when packets have already traversed a high-delay wireline network before reaching the wireless hop, any retransmission may be considered unnecessary [22]. The number of retransmission attempts varies with wireless channel quality, which leads to retransmission jitter. Furthermore, the layered protocol architecture, which places ARQ and the playout buffer in different layers, makes matters worse.
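The effect of a fixed retry limit can be quantified with a short Python sketch. It shows both the residual loss and the variable number of attempts, which is the source of retransmission jitter. The sketch assumes, as in the Bernoulli model of Chapter 3, that each attempt fails independently with probability p. It also assumes that a retry limit R permits R retransmissions, i.e. R + 1 transmissions in total. Timeouts and backoff durations are ignored, and the function names are hypothetical.

```python
import random


def residual_loss(p: float, retry_limit: int) -> float:
    """Probability that all retry_limit + 1 transmissions fail and the packet is dropped."""
    return p ** (retry_limit + 1)


def expected_attempts(p: float, retry_limit: int) -> float:
    """Mean transmissions per packet: sum of p**k for k = 0..R, i.e. (1 - p**(R+1)) / (1 - p)."""
    if p == 1.0:
        return float(retry_limit + 1)
    return (1.0 - p ** (retry_limit + 1)) / (1.0 - p)


def simulate_attempts(p: float, retry_limit: int, rng: random.Random) -> int:
    """Transmissions used by one packet under SW-ARQ (R + 1 also covers a dropped packet)."""
    for attempt in range(1, retry_limit + 2):
        if rng.random() >= p:   # this attempt was received intact and ACKed
            return attempt
    return retry_limit + 1


rng = random.Random(1)
samples = [simulate_attempts(0.1, 6, rng) for _ in range(100_000)]
mean_attempts = sum(samples) / len(samples)
print(residual_loss(0.1, 6), expected_attempts(0.1, 6), mean_attempts)
```

With R = 6 and p = 0.1, the residual loss $p^{R+1}$ is $10^{-7}$. This is consistent with the Chapter 3 observation that, under a Bernoulli error model, most retransmitted packets are eventually received. The spread in attempt counts, however, translates directly into delivery delay jitter.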
Firstly, if an adaptive playout buffer is employed in the wireless VoIP system, a packet's delay budget, i.e. its playout delay, is decided at the beginning of each talkspurt. However, the retransmission procedure is constrained only by a fixed maximum retry limit. A high retry limit that exceeds the available delay budget may lead to unnecessary retransmissions and postpone subsequent packets. A low retry limit may terminate the retransmission procedure prematurely while delay budget remains. Secondly, since a transmission queue exists at the sender, a high mean retransmission delay can cause incoming packets to accumulate in the queue, so that queuing delay or losses climb rapidly. Thirdly, in the current protocol stack, packets that fail transport-layer or link-layer checksum validation are discarded, even though noisy voice packets may still be useful to the upper layer [38].
These problems have been addressed in some previous work. In [39][40][41], the retransmission procedure is still constrained by a fixed maximum retry limit, but it can be terminated at a packet's deadline (e.g. its presentation time). Nevertheless, these schemes cannot avoid terminating a retransmission procedure prematurely when delay budget remains for further attempts. They also do not consider the impact of retransmission delay on queuing delay or losses. In [15], UDP-Lite, a modified UDP protocol with a partial checksum, was developed to allow corrupted UDP packets to be reused at the application level. For wireless VoIP, however, the MAC layer checksum must also be made partial. Otherwise, noisy packets are discarded at the MAC layer and never reach the upper layers.

We extended these ideas into a cross-layer design for wireless VoIP, in which retransmission is performed only over the local wireless link. In our design, link-layer ARQ and the playout buffer cooperate in an integrated framework, in which:
1) the retransmission procedure of a packet is constrained to the available delay budget;
2) speech data is not covered by the link-layer or transport-layer checksums, and a packet combining process obtains the least noisy version of a packet from its retransmitted copies;
3) the delivery delay over the wireless channel is estimated separately and kept below the mean inter-arrival delay of the transmission queue.
Simulation results show that this design gives the simulated wireless VoIP system a considerable performance improvement, at the expense of breaking the layered protocol architecture.
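The receiver-side behaviour behind items 1) and 2) can be sketched in Python. This is an illustration under assumptions, not the implementation used in this work:

- All names are hypothetical.
- zlib.crc32 stands in for the application-level (RTP) checksum.
- All received copies are assumed to have the same length.
- Bit positions on which an even number of copies split evenly are resolved in favour of the first copy.

Majority-logic combining is implemented here as a plain bitwise majority vote.

```python
import zlib
from typing import Optional, Sequence


def keep_retransmitting(now_ms: float, playout_time_ms: float,
                        transmissions: int, retry_limit: int) -> bool:
    """Item 1): allow another transmission only before the playout time and while
    the transmissions already made do not exceed the (high) retry limit."""
    return now_ms < playout_time_ms and transmissions <= retry_limit


def majority_combine(copies: Sequence[bytes]) -> bytes:
    """Bitwise majority vote over equally long noisy copies of one packet."""
    n = len(copies)
    out = bytearray(len(copies[0]))
    for i in range(len(out)):
        for bit in range(8):
            mask = 1 << bit
            ones = sum(1 for c in copies if c[i] & mask)
            # Ties (possible with an even number of copies) follow the first copy.
            if 2 * ones > n or (2 * ones == n and copies[0][i] & mask):
                out[i] |= mask
    return bytes(out)


def select_for_playout(copies: Sequence[bytes], expected_crc: int) -> Optional[bytes]:
    """Item 2): at playout time use a copy that passes the CRC, else combine the copies."""
    if not copies:
        return None  # nothing received; left to the decoder's loss concealment
    for candidate in copies:
        if zlib.crc32(candidate) == expected_crc:
            return candidate
    return majority_combine(copies)


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated channel error)."""
    buf = bytearray(data)
    buf[byte_index] ^= 1 << bit
    return bytes(buf)


original = b"AMR speech frame"
crc = zlib.crc32(original)
# Three retransmitted copies, each corrupted in a different place: none passes
# the CRC, but every bit is correct in at least two of the three copies.
noisy = [flip_bit(original, 0, 1), flip_bit(original, 5, 3), flip_bit(original, 9, 7)]
print(select_for_playout(noisy, crc) == original)
```

The example shows why keeping corrupted copies can pay off. No single copy is usable, yet their bitwise majority reproduces the original payload.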
4.2 The Cross-Layer Design

[Figure 4-2 The cross-layer design system model: fixed host (RTP, UDP, IP, Ethernet), access point with an incoming queue and 802.11 MAC/PHY retransmission, and mobile host whose playout buffer combines the received copies (a1, a2, a3) of a packet, terminates its retransmission at the playout time and passes the result to the decoder]
4.2.1 System model

The system model of the proposed cross-layer design is shown in Figure 4-2. We consider the last-hop scenario in an IEEE 802.11 wireless network. Our design is composed of two correlated components:

- playout delay constrained ARQ, in which the playout delay becomes the stop criterion of the retransmission procedure;
- the ARQ aware playout buffer, which calculates packet delivery delay separately for the wireline and wireless parts. It keeps the wireless channel delay budget below the arrival interval of incoming packets, so as to avoid accumulation of queuing delay.

As speech data is not covered by the link-layer and transport-layer checksums, the playout buffer may receive several noisy versions of a packet. If the packet's correct version has not been received by its presentation time, we employ Majority-Logic packet combining [44] to further reduce the damaged part. The combined version is then sent to the decoder. Details of this technique are presented in Appendix D. The two key components of the cross-layer design are described in the following subsections.

4.2.2 Playout delay constrained ARQ

[Figure 4-3 Block diagram of the playout delay constrained ARQ with packet combining: at the application and link layer interface, each received packet is presented to the upper layer; a corrupted packet waits for retransmission and a correct one is ACKed; at playout time the received copies of the packet are checked, a correct version is sent to the decoder, otherwise the copies are merged by packet combining, and the current retransmission process is terminated]

The playout delay constrained ARQ is a specific optimisation of the current protocol stack for wireless VoIP. Its block diagram is given in Figure 4-3. At the receiver, the 802.11 MAC layer presents every received packet to the upper layer, whether it is corrupted or not. At the application layer, the playout buffer can terminate a packet's retransmission procedure at its playout time, avoiding unnecessary retransmissions. If a corrupted packet has not been recovered by the retransmission procedure, the packet combining module combines the received noisy copies into a more reliable version. This version is then decoded and played out. We retain the maximum retry limit of 802.11 SW-ARQ, but set it high enough to avoid terminating the retransmission procedure prematurely while delay budget remains for further attempts. To allow corrupted packets to be passed from the link layer to the application layer, the link-layer and transport-layer checksums have to be made partial (e.g. UDP-Lite). The mechanisms that eliminate duplicate PDUs should also be turned off for the supported VoIP services. Furthermore, an application-level checksum, such as a CRC in the RTP packet, should be enabled so that the application layer can identify a correct packet among several received copies.

4.2.3 ARQ aware playout buffer

4.2.3.1 Queue model

Assume there is a per-flow transmission queue at the sender whose length is large enough that queue losses can be ignored, so that we can focus on the queuing delay. With IEEE 802.11 SW-ARQ, the transmission queue can be modelled as an M/M/1 queuing system with Poisson packet arrivals and exponentially distributed packet service (departure) times [45]. Let $\alpha$ be the average inter-arrival delay and $s$ the average packet departure (service) delay. Then

$$
\lambda = \frac{1}{\alpha}, \qquad \mu = \frac{1}{s}
$$

where $\lambda$ and $\mu$ are the mean arrival rate and the mean service rate.
The mean delay a packet spends in the queuing system (waiting time plus service time) can then be computed as

$$
T_Q = \frac{1}{\mu - \lambda} = \frac{\alpha\, s}{\alpha - s}
$$

It follows that $T_Q \to \infty$ as $s$ approaches $\alpha$. If the mean delivery delay over the wireless channel is not kept below the mean inter-arrival delay of incoming packets, $T_Q$ climbs rapidly.
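A quick numerical check of the M/M/1 expression shows how sharply $T_Q$ rises as the wireless delivery delay $s$ approaches the inter-arrival delay $\alpha$. The sketch assumes $\alpha$ = 20 ms, i.e. one AMR frame per packet every 20 ms. This is an illustrative value not stated in this section. The function name is hypothetical.

```python
import math


def mm1_delay(alpha_ms: float, s_ms: float) -> float:
    """Mean time in an M/M/1 system, 1/(mu - lambda) = alpha*s/(alpha - s), in ms."""
    if s_ms >= alpha_ms:
        return math.inf  # utilisation s/alpha >= 1: no steady state, delay grows without bound
    lam, mu = 1.0 / alpha_ms, 1.0 / s_ms
    return 1.0 / (mu - lam)


ALPHA_MS = 20.0  # assumed: one 20 ms AMR frame per packet
for s in (5.0, 10.0, 15.0, 19.0, 19.9):
    print(f"s = {s:4.1f} ms -> T_Q = {mm1_delay(ALPHA_MS, s):7.1f} ms")
```

Moving $s$ from 10 ms to 19 ms raises $T_Q$ from 20 ms to 380 ms. This steep rise is the motivation for keeping the wireless delivery delay below the inter-arrival delay.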
