Improving Perceived Speech Quality for Wireless VoIP by Cross-Layer Designs

by Zhuoqun Li

This dissertation is submitted to the University of Plymouth in partial fulfilment of the award of Master of Research in Network System Engineering.

Supervisor: Prof. Emmanuel C. Ifeachor
School of Computing, Communication and Electronics
University of Plymouth
September 2003
ABSTRACT

Providing VoIP services with satisfactory speech quality over the wireless/mobile Internet is difficult because of impairments introduced by the wireless channel, such as packet errors, delay and jitter. Effective packet error recovery mechanisms in wireless networks, such as Automatic Repeat reQuest (ARQ), are important because they can reduce packet loss caused by bit errors. This dissertation focuses on using cross-layer techniques to improve the performance of ARQ, and hence the perceived speech quality of Wireless VoIP, which is difficult to achieve within a strictly layered protocol structure. The research was carried out in two steps.

First, we use an objective measure of perceived conversational speech quality (MOSc) as a metric to evaluate the performance of three current retransmission schemes (No Retransmission, Speech Property-Based Retransmission and Full Retransmission). Our findings indicate that the performance of the retransmission mechanisms is a function of both the wireless link quality and the delay introduced in the wireline network. We also propose a perceived speech quality driven retransmission mechanism, which automatically switches to the most suitable retransmission scheme according to QoS parameters reported from different layers.

Next, we investigate the problems introduced by the retransmission procedure of the Stop-and-Wait ARQ protocol in a Wireless VoIP system. We then propose a cross-layer framework in which:
1) the retransmission procedure of the link layer ARQ protocol is constrained by the available playout delay;
2) in the playout delay estimation, the delivery delays in the wireless channel and in the wireline network are estimated separately, and the delivery delay in the wireless channel is constrained to avoid delay accumulation in the transmitting queue;
3) if the retransmission procedure is terminated prematurely, the received noisy copies of a speech packet are combined to reduce the damaged part and are finally played out at the application layer.
Simulation results show that these cross-layer designs improve the performance of the Stop-and-Wait ARQ protocol and hence significantly enhance the perceived speech quality of a wireless VoIP system.
TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
  1.1 VoIP and Its Application in Wireless Internet
  1.2 Motivation
    1.2.1 Impairment factors of wireless VoIP speech quality
    1.2.2 Packet error concealment techniques
    1.2.3 Cross-layer designs
    1.2.4 Problem statement
  1.3 Aims and Objectives
  1.4 Thesis Contributions
  1.5 Organization of the Thesis
CHAPTER 2 BACKGROUND THEORIES
  2.1 Speech Quality Evaluations
    2.1.1 Objective Speech Quality Measurement
    2.1.2 PESQ
    2.1.3 E-Model
    2.1.4 Conversational speech quality evaluation
  2.2 Adaptive Playout Buffer
  2.3 Automatic Repeat upon reQuest (ARQ)
CHAPTER 3 PERCEIVED SPEECH QUALITY DRIVEN RETRANSMISSION MECHANISM
  3.1 Introduction
  3.2 Related Works
    3.2.1 Speech property-based retransmission mechanisms
    3.2.2 Measuring conversational speech quality
    3.2.3 Adaptive jitter buffer and retransmission jitters
  3.3 Simulation System Description
  3.4 Performance Comparison of Current Retransmission Schemes
  3.5 Perceived Speech Quality Driven Retransmission Scheme
  3.6 Summary
CHAPTER 4 PLAYOUT DELAY CONSTRAINED ARQ and ARQ AWARE PLAYOUT BUFFER
  4.1 Introduction
  4.2 The Cross-Layer Design
    4.2.1 System model
    4.2.2 Playout delay constrained ARQ
    4.2.3 ARQ aware playout buffer
      4.2.3.1 Queue model
      4.2.3.2 ARQ aware playout buffer
  4.3 Simulation Model and Experimental Results
    4.3.1 Wireless channel model
    4.3.2 Voice traffic model
    4.3.3 Speech quality evaluation
    4.3.4 Simulation results and analysis
  4.4 Summary
CHAPTER 5 DISCUSSIONS, SUGGESTIONS for FURTHER WORKS, and CONCLUSIONS
  5.1 Discussions
  5.2 Suggestions for Further Works
  5.3 Conclusions
REFERENCES
APPENDICES
  [APPENDIX A] ns-2 Extensions for ARQ Retry Limit Control
  [APPENDIX B] ns-2 Simulation Script for Per Packet Control of ARQ
  [APPENDIX C] C code for Majority-Logic Packet Combining
  [APPENDIX D] List of Items Included in the Appended CD
  [APPENDIX E] Published Papers
LIST OF FIGURES

Figure 1-1 VoIP Protocol Architecture
Figure 1-2 The Wireless VoIP system overview
Figure 1-3 The basic model of cross-layer designs
Figure 2-1 Basic structure of Perceptual Evaluation of Speech Quality
Figure 2-2 Schematic diagram for MOSc measurement
Figure 2-3 Timing associated with packet i
Figure 3-1 Simulation environment
Figure 3-2 Overall packet loss rate comparison
Figure 3-3 Buffered Retx delay comparison
Figure 3-4 MOSc comparison with 175 ms network delay
Figure 3-5 MOSc comparison with packet error probability 0.001
Figure 3-6 Perceived speech quality driven Retx scheme pseudo code
Figure 4-1 Stop and Wait ARQ
Figure 4-2 The cross-layer design system model
Figure 4-3 Block diagram of the playout delay constrained ARQ with packet combining
Figure 4-4 Timing associated with Packet
Figure 4-5 The simulation model
Figure 4-6 Overall packet loss comparison
Figure 4-7 End-to-end delays with different inter-arrival delay
Figure 4-8 End-to-end delay comparison
Figure 4-9 Conversational MOS comparison
Figure 5-1 Perceived speech quality driven packet error recovery scheduler

LIST OF TABLES

Table 2-1 MOS scale
Table 3-1 Average voiced packet losses with fast-exp playout buffer
ACKNOWLEDGEMENTS

I would like to express my sincere and deep gratitude to my supervisor, Professor Emmanuel C. Ifeachor, who gave me the opportunity to undertake the Master of Research. His continuous advice and encouragement throughout this study are acknowledged and greatly appreciated. I also had the opportunity to work with researchers in the Centre for Signal Processing and Multimedia Communications, and I would like to thank them for their friendliness and support. Special thanks go to Ms. Lingfen Sun and Mr. ZiZhi Qiao for their valuable comments and suggestions; without their support, this thesis would not have been possible. I would like to acknowledge all my classmates in MRes/MSc NSE and CE&SP for their generous help and insight. With them, I really enjoyed the past year at the University of Plymouth. On the personal side, I would like to thank my parents for their unending love and support.
Improving Perceived Speech Quality for Wireless VoIP by Cross-layer designs

CHAPTER 1 INTRODUCTION

1.1 VoIP and Its Application in Wireless Internet

Packet-switched networks such as the Internet have developed very rapidly over the past decades. Their advantages, such as efficiency and flexibility, make them likely to eventually replace traditional circuit-switched networks, i.e. the Public Switched Telephone Network (PSTN). VoIP (Voice over Internet Protocol, or Voice over Packet) is one of the success stories among packet network applications. Generally, a VoIP service is the real-time delivery of packetized voice traffic across packet-switched networks such as the Internet. Compared with traditional telephone networks, it offers lower communication cost with acceptable speech quality.

Recently, wireless/mobile communication has been growing rapidly and providing increasingly convenient services. It is therefore not surprising that there is great demand for adding voice services to wireless IP networks and wireless handsets. Wireless VoIP services can be provided over a Wireless Local Area Network (WLAN), e.g. an IEEE 802.11 [1] network, or over a third-generation (3G) mobile network, e.g. WCDMA [2]. The protocol stack for transmitting VoIP traffic over wireline and wireless networks is presented in Figure 1-1.

MRes Thesis –University of Plymouth 1
[Figure 1-1 VoIP Protocol Architecture: application layer RTP/RTCP; transport layer UDP; network layer IP; data link layer IEEE 802.3 (wireline) and IEEE 802.11x (wireless)]

At the application layer, VoIP is supported by RTP (Real-time Transport Protocol) [3]. RTP provides a way to deliver delay-sensitive real-time data; its services include payload type identification, sequence numbering, timestamping and delivery monitoring. RTP applications typically run on top of UDP, which does not guarantee Quality of Service (QoS) but has lower overhead [4]. RTCP (RTP Control Protocol) is the control protocol associated with RTP; it monitors the quality of service and conveys information about the participants in an ongoing session [3]. After a voice sample has been digitised and compressed, it is carried as the payload of an IP packet, whose header contains the IP address used for routing in IP networks. At the link layer, IP packets carrying speech data are encapsulated in frames and handled by IEEE 802.3 [4] in the wireline network and by IEEE 802.11 in the wireless network. Both link layer protocols provide services such as framing, error control and flow control.
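To make RTP's sequence numbering and timestamping concrete, the sketch below packs and parses the 12-byte fixed RTP header as laid out in RFC 3550 (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC). It is a minimal illustration, not the implementation used in this thesis. It assumes no CSRC list or header extension, payload type 0 (PCMU, i.e. G.711 μ-law in the RTP audio/video profile) and 160 timestamp units per 20 ms frame at 8 kHz. The names `RtpHeader` and `packetize` are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List

RTP_VERSION = 2
RTP_FIXED_HEADER = struct.Struct("!BBHII")  # 1+1+2+4+4 = 12 bytes, network byte order


@dataclass
class RtpHeader:
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    marker: bool = False

    def pack(self) -> bytes:
        # First byte: V=2 in the top two bits; padding, extension and CSRC count are zero.
        byte0 = RTP_VERSION << 6
        # Second byte: marker bit followed by the 7-bit payload type.
        byte1 = (int(self.marker) << 7) | (self.payload_type & 0x7F)
        return RTP_FIXED_HEADER.pack(
            byte0,
            byte1,
            self.sequence & 0xFFFF,       # 16-bit sequence number wraps around
            self.timestamp & 0xFFFFFFFF,  # 32-bit media timestamp wraps around
            self.ssrc & 0xFFFFFFFF,
        )

    @classmethod
    def unpack(cls, data: bytes) -> "RtpHeader":
        byte0, byte1, seq, ts, ssrc = RTP_FIXED_HEADER.unpack_from(data)
        if byte0 >> 6 != RTP_VERSION:
            raise ValueError("not an RTP version 2 packet")
        return cls(payload_type=byte1 & 0x7F, sequence=seq, timestamp=ts,
                   ssrc=ssrc, marker=bool(byte1 >> 7))


def packetize(frames: List[bytes], ssrc: int, payload_type: int = 0,
              samples_per_frame: int = 160, first_seq: int = 0,
              first_ts: int = 0) -> List[bytes]:
    """Prefix each speech frame of one talkspurt with an RTP header.

    The sequence number advances by one per packet, the timestamp by the
    number of samples per frame, and the marker bit flags the first packet
    of the talkspurt.
    """
    packets = []
    for i, frame in enumerate(frames):
        header = RtpHeader(payload_type=payload_type,
                           sequence=first_seq + i,
                           timestamp=first_ts + i * samples_per_frame,
                           ssrc=ssrc,
                           marker=(i == 0))
        packets.append(header.pack() + frame)
    return packets
```

For example, `packetize([bytes(160)] * 3, ssrc=0x1234)` yields three 172-byte packets (12-byte header plus 160 bytes of 20 ms G.711 speech) whose timestamps are 0, 160 and 320. The receiver can use the sequence number to detect loss and reordering and the timestamp to schedule playout.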
[Figure 1-2 The Wireless VoIP system overview: sender (speech source alternating between talk and silence, encoder, packetizer), Internet, access point, receiver (playout buffer, depacketizer, decoder)]

Figure 1-2 depicts a VoIP system implemented in the wireless Internet. Speech is an analog signal that varies slowly in time (with a bandwidth not exceeding 4 kHz). As shown in Figure 1-2, the speech source alternates between talk and silence periods, whose durations are typically modelled as exponentially distributed. Before being transmitted over packet-switched networks, the analog speech signal has to be digitised at the sender; the reverse process is performed at the receiver. Digitisation consists of sampling, quantization and encoding. Many encoding techniques have been developed and standardized by the ITU. The basic encoder is ITU G.711, which samples the voice signal at 8 kHz with 8 bits per sample (64 kbit/s). Code Excited Linear Prediction (CELP) based encoders provide rate reduction (e.g. 8 kbit/s for G.729, and 5.3 and 6.3 kbit/s for G.723.1) at the expense of lower quality and additional complexity and encoding delay [5]. For wireless/mobile communication, variable-rate codecs have been developed, e.g. AMR [6] and EVRC [7]. The encoded speech is then packetized into packets of equal size. Each packet includes the headers of the various protocol layers (e.g. RTP 12 bytes, UDP 8 bytes, IP 20 bytes and 802.11 34 bytes) and a payload of encoded speech whose duration depends on the codec deployed (e.g. 20 ms for an AMR 12.2 kbit/s frame). In this study, the Wireless VoIP system is considered in a last-hop scenario, in which voice streams traverse wireline networks before they reach the access point, the junction between the wireline network and the wireless channel.
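The header sizes quoted above make the per-packet overhead easy to quantify. The sketch below is a back-of-the-envelope calculation under stated assumptions: it takes the header sizes exactly as listed in the text, sends one speech frame per packet, rounds the codec payload up to whole bytes, and ignores the AMR RTP payload-format header as well as 802.11 PHY preambles, acknowledgements and retransmissions. The function names are hypothetical.

```python
import math
from typing import Dict

# Per-packet header sizes in bytes, as listed in the text.
HEADER_BYTES: Dict[str, int] = {"RTP": 12, "UDP": 8, "IP": 20, "802.11": 34}

G711_RATE_BPS = 8000 * 8      # 8 kHz sampling x 8 bits per sample = 64 kbit/s
AMR_12_2_RATE_BPS = 12200     # AMR 12.2 kbit/s mode


def payload_bytes(codec_rate_bps: int, frame_ms: int) -> int:
    """Encoded speech carried per packet, rounded up to whole bytes."""
    bits = codec_rate_bps * frame_ms / 1000
    return math.ceil(bits / 8)


def packet_bytes(codec_rate_bps: int, frame_ms: int,
                 headers: Dict[str, int] = HEADER_BYTES) -> int:
    """Payload plus all protocol headers for one packet."""
    return payload_bytes(codec_rate_bps, frame_ms) + sum(headers.values())


def link_rate_bps(codec_rate_bps: int, frame_ms: int,
                  headers: Dict[str, int] = HEADER_BYTES) -> float:
    """Bit rate on the wireless link when one packet is sent per frame."""
    packets_per_second = 1000 / frame_ms
    return packet_bytes(codec_rate_bps, frame_ms, headers) * 8 * packets_per_second
```

With 20 ms frames, G.711 gives 234-byte packets and about 93.6 kbit/s on the link, while AMR 12.2 gives 105-byte packets and 42 kbit/s, so headers make up roughly 70% of each AMR packet under these assumptions.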
As voice packets are sent over IP networks and the wireless channel, they incur variable delay and possibly loss. To provide smooth playout, a playout buffer is used at the receiver to compensate for delay variations. Packets are held until a later playout time so that enough packets are buffered to be played out continuously. Any packet arriving after its scheduled playout time is discarded. There are two types of playout algorithm: fixed and adaptive. A fixed playout scheme schedules the playout of packets so that the end-to-end delay (including both network and buffering delay) is the same for all packets. Fixed jitter buffers cannot adapt readily to changes in network delay and are therefore not practical in real VoIP applications. Adaptive playout schemes, which are more common in VoIP systems, can adjust the playout delay for each talkspurt and are hence better suited to time-varying IP networks. The scheduled playout delay is a trade-off between buffer losses and end-to-end delay, and it is important to select it so as to maximize the quality of voice communication. A large playout delay decreases packet loss due to late arrivals but hinders interactivity between the communicating parties, while a small playout delay improves interactivity but causes higher buffer losses and degrades speech quality. The playout buffer delivers a continuous stream of packets at fixed intervals to the depacketizer, which extracts the speech data from the payload and feeds it to the decoder. The main function of the decoder is to reconstruct the speech signal. Some decoders implement packet loss concealment (PLC) methods that produce replacements for lost packets. Once depacketized and decoded, the speech signal is finally played out by the VoIP end device.
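One widely used way to set the per-talkspurt playout delay is to keep exponentially weighted estimates of the mean network delay and of its variation, and to schedule playout at the mean plus a multiple of the variation. The sketch below illustrates that idea only; it is not necessarily the buffer algorithm used in this thesis. The class and function names, and the default weights (`alpha = 0.998002`, `beta = 4`, values often quoted for this kind of estimator), are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


class ExpAveragePlayout:
    """Exponentially weighted estimates of network delay and its variation.

    Hypothetical helper: d_hat tracks the mean one-way network delay and
    v_hat its mean absolute deviation; the playout delay is d_hat + beta * v_hat.
    """

    def __init__(self, initial_delay_ms: float, alpha: float = 0.998002,
                 beta: float = 4.0) -> None:
        self.alpha = alpha
        self.beta = beta
        self.d_hat = initial_delay_ms
        self.v_hat = 0.0

    def observe(self, network_delay_ms: float) -> None:
        """Update both estimates with the delay of one received packet."""
        a = self.alpha
        self.d_hat = a * self.d_hat + (1 - a) * network_delay_ms
        self.v_hat = a * self.v_hat + (1 - a) * abs(self.d_hat - network_delay_ms)

    def playout_delay(self) -> float:
        return self.d_hat + self.beta * self.v_hat


def late_losses(talkspurts: Sequence[Sequence[float]],
                est: ExpAveragePlayout) -> Tuple[int, List[float]]:
    """Count packets that miss their playout time.

    The playout delay is fixed at the start of each talkspurt (so playout
    stays continuous within it); every packet still updates the estimator.
    """
    late = 0
    schedule: List[float] = []
    for spurt in talkspurts:
        playout = est.playout_delay()
        schedule.append(playout)
        for delay in spurt:
            if delay > playout:
                late += 1          # arrived after its scheduled playout time: discarded
            est.observe(delay)
    return late, schedule
```

In a short trace where constant 100 ms delays are followed by a single 500 ms spike, the spike arrives after the 100 ms playout point and counts as a late loss; with a small `alpha` such as 0.75 the next talkspurt's playout delay then grows to 500 ms. This is the buffer-loss versus end-to-end-delay trade-off described above.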
1.2 Motivation

1.2.1 Impairment factors of wireless VoIP speech quality

The perceived speech quality of VoIP is defined subjectively, as the quality perceived by end users. Whatever its cost-saving benefits, providing acceptable perceived speech quality is key to the success of a VoIP service. Currently, IP telephony still cannot always provide satisfactory quality because of the many impairment factors introduced along the transmission path over IP networks. When VoIP is applied in wireless/mobile IP networks, because of the unreliable performance of the wireless channel and the
uncertainty introduced by the mobility of wireless handsets, speech quality is degraded further. Many correlated impairment factors can seriously affect the perceived speech quality of Wireless VoIP. In this study, the main impairment factors are identified as packet loss, bit errors, end-to-end delay, jitter and coding.

Packet Loss
Packet loss is a major impairment factor and generally causes more noticeable degradation in voice quality than any other. While traversing interconnected IP networks, speech packets may be lost due to router buffer overflow or link congestion. In addition, VoIP runs over connectionless UDP/IP, so speech packets may travel over different paths before they arrive at the destination; as a result, some speech packets arrive out of sequence and are discarded at the receiver. The decoder may reconstruct lost packets from related information, but the speech information carried by lost packets can never be completely recovered.

Bit Error
Bit errors are rarely a problem for VoIP in wireline networks, where they occur infrequently. However, if wireless channels are included in the path traversed by speech packets, bit errors become a significant challenge. In the wireless environment, the transmitted signal is subject to absorption, scattering, interference and multipath fading. All of these effects determine the Signal-to-Noise Ratio (SNR) at the receiver and hence the Bit Error Rate (BER). In packet communications, a bit error results in the loss of the whole packet if the entire packet is covered by a checksum. However, if a partial checksum is used specifically for VoIP applications, speech packets containing bit errors in the payload are still decoded and played out.
In this case, the effect of bit errors on the perceived speech quality is determined by their positions and number.

End-to-end delay
Delay does not directly remove any speech information, but it affects the interactive nature of conversations. The end-to-end delay encompasses: a. the encoding and decoding delay; b. the packetization delay; c. the delay incurred on the path from the sender to the receiver (e.g. transmission time over IP
networks, queuing delays in network elements, and propagation and retransmission time in the wireless channel); d. the delay incurred in the playout buffer. In natural conversation, delays below 100 ms are not noticed by most users, while delays between 100 ms and 300 ms begin to affect conversational interactivity [9]. Longer delays are obvious to the user and can make conversation impossible.

Jitter
Jitter is the variation in the delay of received packets. At the sending side, packets are sent in a continuous stream, evenly spaced. Due to network congestion, improper queuing or configuration errors, the interval between adjacent packets changes constantly, so the delay of each packet varies instead of remaining constant. Jitter can make voice very annoying to the listener. Removing jitter requires collecting packets and holding them long enough for the slowest packets to arrive in time to be played in the correct sequence, re-sequencing them if necessary. This job is normally performed by the playout buffer, which maintains constant packet intervals at the expense of additional playout delay or of losing packets that do not arrive in time.

Coding
In converting the analog speech signal into a digital bit stream, some codecs also use compression techniques to remove redundant or less important speech information, reducing the transmission bandwidth requirement while preserving perceptually important voice signals. This procedure loses a certain amount of speech information and hence affects the speech quality perceived by the user at the receiving side. For Wireless VoIP, speech quality can also be affected by the error-correction mechanisms used by codecs.

1.2.2 Packet error concealment techniques

Packet errors due to packet loss or bit errors are a critical impairment factor for the perceived speech quality of Wireless VoIP.
Many packet error concealment techniques have been developed and refined with great effort, but they are far from perfect and may not even work properly in new communication environments such as the growing wireless/mobile Internet. Some of the main packet
error recovery methods are described below.

Forward Error Correction
Forward Error Correction (FEC) [11] enables lost data to be recovered at the receiver without further reference to the sender. Both the original data and redundant information are transmitted to the receiver. The redundant information is either independent of or dependent on the media stream. Media-independent FEC does not need to know the type of the original data: the original data are simply transmitted together with some redundant data. In media-dependent (media-specific) FEC, if an original packet is lost, redundant data specific to the media are used to recover the loss. Usually the redundant data are produced with a lower-bandwidth encoding than the primary encoding, which results in lower quality than the original. The costs of FEC are reduced bandwidth efficiency and increased end-to-end delay, because the redundant information is transmitted after the packet it protects.

Interleaving
Interleaving has been widely used in mobile networks to distribute burst frame errors across several channels. In VoIP applications, if the data unit produced by a coder at a time is smaller than the allowed packet payload, several data units may be combined into a single packet. However, to reduce the effects of packet loss, or of burst bit errors in the wireless environment, the transmitter does not combine the data units in the sequential order produced by the coder but interleaves them. The loss of a packet then leaves small gaps that typically correspond to speech intervals considerably shorter than a phoneme. Listeners can therefore mentally interpolate across these gaps, and speech intelligibility is not decreased.
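The interleaving idea above can be sketched as a simple block interleaver: within each block, frame `k` goes to packet `k mod depth`, so consecutive frames travel in different packets. This is a minimal illustration with hypothetical function names. It assumes frames are opaque objects, a lost packet is known by its index, and the number of frames is a multiple of the block size.

```python
from typing import AbstractSet, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def interleave(frames: Sequence[Frame], frames_per_packet: int,
               depth: int) -> List[List[Frame]]:
    """Spread consecutive frames over `depth` packets per block."""
    block = frames_per_packet * depth
    if len(frames) % block:
        raise ValueError("number of frames must be a multiple of the block size")
    packets: List[List[Frame]] = []
    for start in range(0, len(frames), block):
        chunk = list(frames[start:start + block])
        for p in range(depth):
            packets.append(chunk[p::depth])   # frames p, p+depth, p+2*depth, ...
    return packets


def deinterleave(packets: Sequence[Sequence[Frame]], depth: int,
                 lost: AbstractSet[int] = frozenset()) -> List[Optional[Frame]]:
    """Restore the original frame order; frames from lost packets become None."""
    frames: List[Optional[Frame]] = []
    for start in range(0, len(packets), depth):
        group = packets[start:start + depth]
        for j in range(len(group[0])):
            for p in range(depth):
                frames.append(None if start + p in lost else group[p][j])
    return frames
```

With 12 frames, 4 frames per packet and a depth of 3, the packets carry frames [0, 3, 6, 9], [1, 4, 7, 10] and [2, 5, 8, 11]. Without interleaving, losing the second packet would remove frames 4 to 7, an 80 ms gap with 20 ms frames; with the interleaver it removes frames 1, 4, 7 and 10, leaving isolated 20 ms gaps. The price is extra delay, since the receiver must wait for a whole block before it can play the block out.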
UDP Lite
UDP Lite [15] is designed for applications that prefer to have damaged data delivered rather than discarded by the network. For VoIP over wireless, it is not necessary to discard speech frames that contain only a few bit errors. At the IP layer, the IP header checksum does not cover the IP payload. However, the UDP checksum covers
the entire datagram, including the media payload. In fact, in real network applications it is the application layer, not the transport layer, that knows best what should be verified by the checksum. UDP Lite provides a checksum with optionally partial coverage.

Automatic Repeat reQuest
In Automatic Repeat reQuest (ARQ) [16], when the receiver fails to receive a packet correctly, the sender retransmits it, possibly several times. ARQ-based schemes mainly consist of three parts: a. lost data detection, by the receiver or by the sender (timeout); b. the acknowledgment strategy, in which the receiver sends acknowledgments indicating which data have been received or which data are missing; c. the retransmission strategy, which determines which data are retransmitted by the sender. Although ARQ is robust and efficient against burst losses, it also brings a series of problems to real-time applications with delay constraints, because every retransmission adds delay.

1.2.3 Cross-layer designs

IP networks have been successfully supported by the layered protocol architecture since their early development. However, for real-time applications such as Wireless VoIP, the layered architecture may prevent them from adapting readily to instantaneous changes in the communication environment, which can seriously impact their performance. Examples of system performance degradation due to a lack of cooperation among different layers are given in [18]. Solutions to the problems introduced by the layered protocol architecture have been developed and are known as the cross-layer approach, or cross-layer design.

[Figure 1-3 The basic model of cross-layer designs (QoS information mapping and joint-layer QoS techniques)]

The objective of cross-layer designs is to achieve efficient QoS support and network resource allocation through joint-layer techniques, such as QoS knowledge sharing and cooperation among QoS mechanisms in different layers (see Figure 1-3). The performance of future networks may be enhanced by such cross-layer designs between the PHY, MAC and higher layer protocols. Cross-layer designs have been addressed in much recent literature. Krishnamachari et al. [19] proposed a cross-layer framework to enhance the performance of video streaming, which adaptively optimizes link layer ARQ, application layer FEC and packetization according to wireless channel conditions. In [20], a cross-layer design was developed to control the transmission of video streams over wireless links based on information about prefetched video (application layer) and on signal strength and multiple access interference (physical layer).

1.2.4 Problem statement

In this dissertation, we raise the following research questions regarding the improvement of perceived speech quality for Wireless VoIP by a cross-layer approach. What are the impairment factors of Wireless VoIP applications? What are the pros and cons of ARQ mechanisms? Do ARQ mechanisms improve the performance of a Wireless VoIP system in terms of perceived speech quality? How can current ARQ schemes be optimized to improve speech quality, and how can real-time network and wireless channel QoS parameters be mapped into ARQ protocol optimization? What are the effects of the interactions between ARQ mechanisms and other components of the Wireless VoIP system, and how can these effects be countered if they are negative? How can other packet error concealment technologies be used together with ARQ?
Or how can ARQ be used as a complementary mechanism for other packet error concealment technologies? How can a cross-layer framework be established in which QoS techniques located in different layers are optimized through a joint-layer analysis? And how can a profile of real-time predicted speech quality and QoS parameters collected from different layers be established and eventually serve as the scheduler of a cross-layer framework? With these questions in mind, we reviewed the related literature and carried out research toward their solutions.

1.3 Aims and Objectives

The aim of this project is to develop and evaluate a cross-layer framework to improve the perceived speech quality of Wireless VoIP systems. This framework is expected to use QoS parameters from multiple layers and to optimize QoS techniques located in different layers based on a joint-layer analysis, thereby achieving efficient and significant speech quality improvement, which may be very hard or even impossible for single-layer approaches.

1.4 Thesis Contributions

The contributions of this dissertation are as follows.

We identify the impairment factors for the perceived speech quality of Wireless VoIP, focusing specifically on the impact of ARQ mechanisms.

We use an objective measure of perceived conversational speech quality (MOSc) as a metric to evaluate the performance of three current retransmission schemes, namely no retransmission, Speech Property-Based (SPB) [21] retransmission and full retransmission, while considering the impact of retransmission jitter. Our findings indicate that the performance of the retransmission mechanisms is a function of both the wireless link quality and the delay introduced in the wireline network. Moreover, SPB retransmission, which is intended to protect only perceptually important speech frames, may not achieve the expected performance because it introduces too much jitter.
We propose a new perceived speech quality driven retransmission mechanism [22], which may be used to improve speech quality for wireless VoIP (in terms of the objective mean opinion score) by switching between No retransmission and Full retransmission according to communication conditions. Through simulations, we show that the proposed method achieves the best MOSc compared with no retransmission, full retransmission and SPB retransmission, and that it achieves retransmission efficiency similar to SPB retransmission while avoiding the implementation complexity of obtaining the speech property information that SPB retransmission requires.

We propose a cross-layer design in which: 1) the retransmission procedure of the link-layer Automatic Repeat reQuest (ARQ) protocol is constrained by the available delay budget estimated by the application-level playout buffer; 2) if the retransmission procedure is terminated prematurely, the received noisy copies of a speech packet are passed to the application layer and eventually played out; and 3) in the playout delay estimation, the delivery delay in the wireless channel is estimated separately and constrained to avoid delay accumulation in the transmitting queue. The simulation results show that the perceived speech quality of a wireless VoIP system can be significantly enhanced, since this design reduces retransmission delay, playout buffer losses, and queuing delay and losses.

1.5 Organization of the Thesis

The rest of this dissertation is organized as follows. Chapter 2 introduces some basic theories related to this project, such as speech quality evaluation, adaptive playout buffers and the Automatic Repeat reQuest (ARQ) protocol. In Chapter 3, we look at the impairment factors introduced by ARQ schemes and introduce a perceived speech quality driven retransmission scheme to achieve optimum conversational speech quality.
In Chapter 4, we consider problems introduced by an ARQ protocol when it works with other components of a Wireless VoIP system (e.g. the transmitting queue and the adaptive playout buffer) in the layered protocol architecture, and propose a cross-layer design as a solution to these problems. Finally, in Chapter 5 we discuss the research outcomes of this project, present extensions and ideas for future work, and give a short conclusion to the thesis.
CHAPTER 2 BACKGROUND THEORIES

2.1 Speech Quality Evaluations

2.1.1 Objective Speech Quality Measurement

In voice communications, the mean opinion score (MOS) provides a numerical measure of the quality of human speech at the receiving end. MOS indicates the speech quality perceived by the listener and ranges from 1 (bad) to 5 (excellent), as presented in Table 2-1. A number of methods are available to measure the speech quality of a VoIP system. Broadly, speech quality measurements can be divided into two categories: subjective measurements and objective measurements. Subjective speech quality measurement requires a large group of people to take part in the test; it is time consuming, not repeatable and expensive. Compared with subjective tests, objective tests are repeatable, automatic and do not suffer from environmental effects. The most popular objective measurements are Perceptual Evaluation of Speech Quality (PESQ) [23] and the E-model [24]. PESQ is categorized as an intrusive speech quality measurement, as it requires the original speech signal as well as the degraded one to perform the quality evaluation. The E-model, in contrast, is a non-intrusive speech quality measurement: it is parameter-based and does not require the original speech signal.
Table 2-1 MOS scale

Quality Scale   Score   Listening Effort Scale
Excellent       5       No effort required
Good            4       No appreciable effort required
Fair            3       Moderate effort required
Poor            2       Considerable effort required
Bad             1       No meaning understood with reasonable effort

2.1.2 PESQ

PESQ was specifically developed to be applicable to end-to-end voice quality testing under real network conditions. Comparing the reference and degraded signals yields a quality score. The simplified system model of PESQ is given in Figure 2-1. It consists of three key modules: a time alignment module, a perceptual transform module and a cognition/judgment module. The time alignment module synchronizes the degraded signal with the reference signal. The perceptual transform module transforms each signal into a psychophysical representation that approximates human perception. The cognition/judgment module maps the difference between the original (reference) signal and the distorted (degraded) signal to an estimated perceptual distortion, which is then mapped onto the Mean Opinion Score (MOS) scale.

[Figure 2-1 Basic Structure of Perceptual Evaluation of Speech Quality: original and distorted speech pass through a time alignment module and perceptual transform modules into a cognition/judgment module that outputs the estimated distortion.]

The results given by PESQ have been calibrated using a large database of subjective tests. PESQ takes into account signal degradations such as coding distortion, errors, packet loss, delay and variable delay, and filtering; it uses transfer function equalization, time alignment and a new algorithm for averaging distortions over time. However, PESQ does not take into account the subjective effect of level changes in the network, echo, or the effect of round-trip delay on conversation.

2.1.3 E-Model

The E-Model is a computational model standardized by ITU-T in [24][27][28]. It uses transmission parameters to predict the subjective speech quality of packetized voice. The E-Model has proven useful as a transmission-planning tool for assessing the combined effects of variations in several transmission parameters that affect the conversational quality of telephony [24]. The primary output of the E-Model is the "Rating Factor" R, which can be further transformed into estimates of customer opinion by mapping it onto the MOS scale. The E-Model equation for the rating factor is

$$R = R_0 - I_d - I_s - I_e + A$$

This equation yields an R factor between 0 and 100. The components of R are:
R0, the base R value (basic signal-to-noise ratio, i.e. noise level);
Id, representing the impairments caused by delay;
Is, representing the effects of impairments occurring simultaneously with the speech signal;
Ie, representing the effects of "equipment" such as DCME or Voice over IP networks;
A, the advantage factor, used to compensate for the allowance users make for poorer quality when given some additional convenience (e.g. 0 for wireline and 10 for GSM).

Delay impairment Id

The Id factor models the quality degradation due to one-way, or "mouth-to-ear", delay.
Id can be computed from the one-way delay as [29]:

$$I_d = 0.024\,T_a + 0.11\,(T_a - 177.3)\,H(T_a - 177.3), \qquad H(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0 \end{cases}$$

where Ta is the one-way (or "mouth-to-ear") delay in milliseconds.
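The two E-Model expressions above can be evaluated directly. The following is a minimal Python sketch, assuming delays are given in milliseconds; the function names (step, delay_impairment, rating_factor) are illustrative and are not taken from the thesis or from the ITU-T recommendations.

```python
def step(x: float) -> float:
    """Unit step H(x): 0 for x < 0, 1 for x >= 0."""
    return 1.0 if x >= 0 else 0.0


def delay_impairment(ta_ms: float) -> float:
    """Simplified delay impairment Id for a one-way (mouth-to-ear) delay Ta in ms."""
    return 0.024 * ta_ms + 0.11 * (ta_ms - 177.3) * step(ta_ms - 177.3)


def rating_factor(r0: float, i_d: float, i_s: float, i_e: float, a: float = 0.0) -> float:
    """E-Model rating factor R = R0 - Id - Is - Ie + A."""
    return r0 - i_d - i_s - i_e + a


if __name__ == "__main__":
    # Id grows slowly up to 177.3 ms and much faster beyond that knee.
    for ta in (100.0, 150.0, 177.3, 200.0, 300.0):
        print(f"Ta = {ta:6.1f} ms -> Id = {delay_impairment(ta):6.2f}")
```

Above 177.3 ms the slope of Id rises from 0.024 to 0.134 per millisecond, which is why one-way delays beyond this point reduce R quickly.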
Equipment impairment Ie

The loss impairment Ie captures the distortion of the original voice signal due to low-rate codecs and to packet losses in both the network and the playout buffer. Currently, the E-Model can only cope with the speech distortion introduced by a limited set of codecs, e.g. G.729 or G.723.

Mapping the R factor onto the MOS scale

R can be mapped onto the MOS scale by the following equations [24]:

$$\mathrm{MOS} = \begin{cases} 1 & R \le 0 \\ 1 + 0.035R + R(R - 60)(100 - R) \times 7 \times 10^{-6} & 0 < R < 100 \\ 4.5 & R \ge 100 \end{cases}$$

2.1.4 Conversational speech quality evaluation

[Figure 2-2 Schematic diagram for MOSc measurement: reference speech passes through an encoder, a loss process driven by loss trace data, and a decoder; PESQ compares reference and degraded speech to give a MOS, which is converted (MOS->R) into Ie; a delay model driven by delay trace data gives Id; E-Model concepts combine Ie and Id into MOSc.]

Perceived speech quality during a VoIP conversation can be expressed as a conversational Mean Opinion Score (MOSc). MOSc values can be obtained by subjective listening tests or by objective evaluation methods such as the E-Model. As described in Section 2.1.3, however, the E-Model consists of very complicated equations and is not applicable to some impairment factors, such as some codecs or bit errors in the payload.

A prediction method for perceived conversational speech quality has been proposed in [29]; its schematic diagram is illustrated in Figure 2-2. In this method, the MOS index produced by PESQ is first transformed to the R scale by

$$R_{PESQ} = 3.026x^3 - 25.314x^2 + 87.060x - 57.336$$

where x is the MOS index from PESQ. The equipment impairment factor is then computed as $I_e = R_0 - R_{PESQ}$; combined with the delay impairment factor Id, this gives the R value $R = R_0 - I_d - I_e$, from which MOSc is finally obtained using the standard E-Model mapping. Hence, the impairments due to delay, packet loss, coding and bit errors are all represented in the evaluated MOSc.

2.2 Adaptive Playout Buffer

A playout buffer can be fixed or adaptive. In a fixed playout buffer, the playout delay for a packet stream is preset before a conversation begins, so it cannot readily adapt to time-varying network conditions and may result in poor speech quality. For this reason, adaptive playout buffers are considered. Much work has been done on adaptive playout buffer algorithms that achieve the best balance between playout delay and packet losses in the playout buffer. Recent work addressing the problem specifically for the Internet can be found in [30][31][32][33]. In this section, we briefly review some playout buffer algorithms from this literature. Details of how the adaptive playout buffer is applied in our Wireless VoIP system can be found in Chapters 3 and 4.

[Figure 2-3 Timing associated with packet i: sender and receiver time lines labelled ti, ai, pi, ni, bi and di.]

In [30], Ramjee et al. proposed four algorithms ('exp-avg', 'fast-exp', 'min-delay' and 'spk-delay') that adjust the playout delay according to the estimated network delay. These algorithms estimate the mean $\hat{d}_i$ and the variation $\hat{v}_i$ of the network delay on the arrival of the ith packet. The playout delay is adjusted at the beginning of each talkspurt. Let ti be the timestamp of packet i, the first packet in a talkspurt; its playout time pi is computed as

$$p_i = t_i + \hat{d}_i + \mu \cdot \hat{v}_i$$

where µ is a constant. The playout time pj of a subsequent packet j in the same talkspurt is computed as

$$p_j = p_i + t_j - t_i$$

(see Figure 2-3 for the related timing notation). In all four algorithms, $\hat{v}_i$ is given by

$$\hat{v}_i = \alpha \cdot \hat{v}_{i-1} + (1 - \alpha) \cdot \left| \hat{d}_i - n_i \right|$$

but they differ in the computation of $\hat{d}_i$.

1) Exponential average (exp-avg): In this algorithm, the mean delay $\hat{d}_i$ is estimated through an exponentially weighted average [30]:

$$\hat{d}_i = \alpha \cdot \hat{d}_{i-1} + (1 - \alpha) \cdot n_i$$

where ni is the one-way delay of the ith packet. The value of α is chosen to be 0.998002 in [30].

2) Fast exponential average (fast-exp): This algorithm is a modified version of exp-avg. fast-exp computes the weighted mean as [30]:

$$\hat{d}_i = \begin{cases} \beta \hat{d}_{i-1} + (1 - \beta) n_i & n_i > \hat{d}_{i-1} \\ \alpha \hat{d}_{i-1} + (1 - \alpha) n_i & n_i \le \hat{d}_{i-1} \end{cases}$$

where α and β are constants satisfying 0 < β < α < 1. In [30], α = 0.998002 and β = 0.75; the smaller weight β allows fast-exp to adapt more quickly to increases in the delay ni.

3) Minimum delay (min-delay): This algorithm is more aggressive in minimizing delays. It uses the minimum delay of all packets received in the current talkspurt. Let Si be this set of packets [30]; then

$$\hat{d}_i = \min_{j \in S_i} \{ n_j \}$$

4) Spike delay detection (spk-delay): This algorithm focuses on spikes, i.e. sudden, large increases in delay over a sequence of packets. Outside spikes, spk-delay obtains the playout delay using the same equation as exp-avg, except that α is set to 0.875 [wan]. During a spike, however, spk-delay uses

$$\hat{d}_i = \hat{d}_{i-1} + n_i - n_{i-1}$$

to catch up with the sudden increase in delay.

We also present here some more complex algorithms that have been developed from the four classical algorithms described above.

5) window: This algorithm is proposed in [31]. Like spk-delay, it detects spikes. During a spike, the delay of the first packet in the spike is used as the playout delay. After the spike, the playout delay is chosen as the delay corresponding to the qth quantile of the distribution of the last N (10,000 in [31]) packets received.

6) adaptive: In [32], Sun et al. proposed an 'adaptive' algorithm that adapts to different networks. It switches between min-delay and fast-exp depending on whether $\hat{d}_i$ exceeds a delay threshold (e.g. 150 ms).

7) E-MOS: Fujimoto et al. [33] proposed a playout buffer algorithm called E-MOS, which models the delay distribution with a Pareto distribution. The Pareto delay distribution is combined with the packet loss ratio in a function Q(d) that models the impact of delay and packet loss on speech quality, expressed as MOS. When a packet is received, E-MOS uses its measured one-way delay to update the Pareto distribution, and then chooses as the playout delay the value of d that maximizes the speech quality Q(d).

2.3 Automatic Repeat upon reQuest (ARQ)

Automatic Repeat reQuest (ARQ) is an error-control scheme in which the receiver requests retransmission when it detects a transmission error. A very basic ARQ scheme includes only error detection and retransmission capabilities.
If a packet is found to contain errors after decoding, it is discarded and a retransmission is requested from the source, which then retransmits an exact copy of the packet. This process could be repeated indefinitely, but normally an upper bound on the number of retransmissions is set. If errors still persist after the maximum number of allowed retransmissions is reached, the higher layers must decide how the situation is to be handled. For retransmission procedures using ARQ, the three most popular schemes are [16]:

Stop and Wait (SW)
In SW-ARQ, the sender, after delivering the first copy of a packet in its buffer, is blocked until a positive acknowledgement (ACK) is received or the timeout expires. In the first case, the sender drops the successfully delivered packet from the buffer and transmits the next packet; in the second case, the sender simply retransmits the same packet.

Go Back N (GBN)
The sender continuously transmits packets stored in its buffer until a negative acknowledgement (NACK) is received. The sender then stops transmitting new packets, goes back to the erroneously received packet, and retransmits a complete sequence of N packets starting with the NACKed packet, where N is the number of packets transmitted within an average round-trip time.

Selective Repeat (SR)
The sender continuously transmits packets stored in its buffer. Whenever a NACK is received, the sender stops transmitting new packets, goes back to the erroneously received packet, retransmits only that packet, and then resumes transmitting new packets. Note that in this case the successfully received packets following the corrupted packet are not retransmitted, which gives better efficiency.
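The Stop and Wait behaviour with a retransmission limit described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: each transmission attempt is modelled by a caller-supplied function (a hypothetical abstraction, not part of any 802.11 implementation) that reports whether an ACK arrives before the timeout, and the bernoulli_link helper assumes independent packet errors with probability per.

```python
import random
from typing import Callable, Tuple


def stop_and_wait_send(attempt_acked: Callable[[int], bool],
                       retry_limit: int) -> Tuple[bool, int]:
    """Send one packet with Stop-and-Wait ARQ.

    attempt_acked(k) says whether transmission k (0 = first copy) is ACKed
    before the timeout. The sender is blocked on each attempt, so it either
    gets an ACK or gives up after 1 + retry_limit transmissions.
    Returns (delivered, number_of_transmissions).
    """
    for attempt in range(retry_limit + 1):
        if attempt_acked(attempt):
            return True, attempt + 1   # ACK: drop the packet from the buffer
    return False, retry_limit + 1      # limit reached: higher layers decide


def bernoulli_link(per: float, rng: random.Random) -> Callable[[int], bool]:
    """Hypothetical channel: every transmission fails independently with prob. per."""
    return lambda _attempt: rng.random() >= per


if __name__ == "__main__":
    rng = random.Random(1)
    link = bernoulli_link(per=0.1, rng=rng)
    results = [stop_and_wait_send(link, retry_limit=6) for _ in range(10_000)]
    lost = sum(1 for delivered, _ in results if not delivered)
    mean_tx = sum(n for _, n in results) / len(results)
    print(f"lost packets: {lost}, mean transmissions per packet: {mean_tx:.3f}")
```

With independent errors, a packet is lost only if all 1 + retry_limit transmissions fail, i.e. with probability PER^(retry_limit + 1), while every extra transmission adds another round of delay; this is the loss/delay tradeoff examined in Chapter 3.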
CHAPTER 3 PERCEIVED SPEECH QUALITY DRIVEN RETRANSMISSION MECHANISM

3.1 Introduction

Quality of Service (QoS) support for voice over IP (VoIP) in wireless/mobile networks is an important issue for technical and commercial reasons. However, speech quality for VoIP suffers from high packet loss rates and other impairments in the wireless link. Retransmission mechanisms, such as automatic repeat request (ARQ), have been incorporated in wireless and cellular networks to retransmit lost packets and so improve the performance of data transmission over wireless links. In wireless networks such as 802.11b [1], the retransmission mechanism is a simple Stop & Wait algorithm implemented at the Medium Access Control (MAC) layer, in which each transmitted packet must be acknowledged before the next packet can be sent. If the sender of a frame does not receive an acknowledgement within a certain timeout period, it retransmits the frame, until a maximum retransmission limit is reached. When the wireless link quality is poor, retransmission of MAC frames can effectively recover corrupted packets that contain bit errors. However, retransmission schemes may introduce excessive delays, which have significant adverse effects on real-time applications such as VoIP that are sensitive to delay. A simple retransmission scheme can therefore degrade perceived speech quality in VoIP, and there is a tradeoff between packet loss and delay across retransmission schemes. Improved retransmission mechanisms such as Speech Property-Based ARQ (SPB-ARQ) [21] and the hybrid loss recovery scheme [34] have been proposed to reduce speech distortion by protecting packets that are perceptually more relevant. However, the evaluation of these schemes is limited to listening-only assessment of the effect of retransmission on speech quality and does not consider the impact of delay, which is important for conversation and interactivity. Further, these schemes do not consider the impact of retransmission jitter. Since adaptive jitter buffers discard retransmitted packets that arrive too late for playout, the characteristics of the jitter introduced by different retransmission schemes should be considered.

The primary aim of the study reported here is to investigate new retransmission mechanisms that improve speech quality for wireless VoIP. In this study, we use a perceived conversational speech quality assessment method [29] to evaluate the performance of current retransmission mechanisms (No retransmission, Full retransmission, SPB retransmission), instead of a listening-only method or individual network parameters (e.g. packet loss and delay). We also present a new retransmission policy that adapts to the most suitable retransmission mechanism depending on the wireless link quality and network delay conditions. The ultimate aim of this perceived speech quality driven policy is to achieve optimum speech quality (in terms of the conversational Mean Opinion Score, MOSc) in the face of network impairments and varying wireless channel conditions, while taking into account the coupling between retransmission jitter and adaptive jitter buffers.
3.2 Related Works

3.2.1 Speech property-based retransmission mechanisms

Speech Property-Based QoS control schemes are based on the fact that some voice frames are perceptually more important than others when encoded speech is transferred through packet networks. Recent experimental results [35] show that in some popular codecs used in wireless applications (e.g. AMR), the position of a frame loss has a significant influence on perceived speech quality. In such codecs, frame loss concealment techniques interpolate the parameters for lost frames from the parameters of previous frames. Lost voice frames at the beginning of a talkspurt are therefore concealed using the decoding information of previous unvoiced frames. However, because voiced sounds always have higher energy than unvoiced sounds, concealing these frames with lower-energy unvoiced frames causes serious degradation in speech quality. Moreover, at the unvoiced/voiced transition, it is difficult for the decoder to correctly conceal the loss of voiced frames using the filter coefficients and excitation of an unvoiced sound, especially when burst losses occur or the frame size grows. To maximize perceptual quality at the receiving end, perceptually important voice packets may be protected by giving them high priority while unimportant packets are handled as 'best-effort'. SPB retransmission, a retransmission scheme that protects only the perceptually important speech frames, is presented in [21][34]. Experimental results reported in [21] show that SPB retransmission can provide better speech quality (assessed by EMBSD) than the No retransmission scheme, which does not retransmit any packet. In [34], SPB retransmission was shown to be more efficient in reducing retransmission delays than Full retransmission, which retransmits every unacknowledged (unACKed) packet.

3.2.2 Measuring conversational speech quality

In previous studies [21][34], the assessment of retransmission schemes was performed using the EMBSD algorithm, which only considers the distortion caused by packet loss. In practice, however, both packet loss and delay are crucial in voice conversation, and long retransmission delays (e.g. when added to a long network delay) can seriously impair speech quality. The E-model was introduced by the ITU as a non-intrusive quality assessment method to obtain a measure of voice quality.
Unfortunately, the E-model is only applicable to a limited number of codecs, which at present does not include the AMR codec. In our simulations, we employed the conversational MOS [29] to quantify the performance of the different retransmission schemes. In the conversational speech quality evaluation (see Chapter 2), ITU PESQ is first used to quantify the impact of packet loss on speech quality, and the result is converted to the equipment impairment Ie. The effect of the average end-to-end delay, Id, is then calculated. Finally, the E-model is used to obtain a measure of the speech quality, MOSc, based on Ie and Id (see Figure 3-1).
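The conversational quality estimate described above (PESQ MOS converted to Ie, delay converted to Id, then combined in the E-model) can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used in this study: the base value R0 = 93.2 is an assumption (close to the ITU-T G.107 default rating; the text does not state R0), Is and A are taken as zero to match $R = R_0 - I_d - I_e$ in Section 2.1.4, and all function names are illustrative.

```python
R0 = 93.2  # assumed base rating value; the text does not specify R0


def r_from_pesq(mos_pesq: float) -> float:
    """Map a PESQ MOS score x to the R scale (third-order fit from [29])."""
    x = mos_pesq
    return 3.026 * x**3 - 25.314 * x**2 + 87.060 * x - 57.336


def mos_from_r(r: float) -> float:
    """Standard E-Model mapping from the rating factor R to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def delay_impairment(ta_ms: float) -> float:
    """Simplified delay impairment Id for one-way delay Ta in ms (Section 2.1.3)."""
    return 0.024 * ta_ms + 0.11 * max(ta_ms - 177.3, 0.0)


def mosc(mos_pesq: float, one_way_delay_ms: float, r0: float = R0) -> float:
    """Conversational MOS: Ie from PESQ, Id from delay, then R -> MOS."""
    i_e = r0 - r_from_pesq(mos_pesq)          # equipment (loss/codec) impairment
    i_d = delay_impairment(one_way_delay_ms)  # delay impairment
    r = r0 - i_d - i_e                        # Is and A assumed to be zero
    return mos_from_r(r)


if __name__ == "__main__":
    # Same listening quality (PESQ MOS 3.5) at increasing end-to-end delays.
    for delay in (100.0, 175.0, 250.0):
        print(f"delay {delay:5.1f} ms -> MOSc {mosc(3.5, delay):.2f}")
```

Note that R0 cancels out here ($R = R_{PESQ} - I_d$), so the assumed value only matters if Ie is reported or clipped on its own.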
3.2.3 Adaptive jitter buffer and retransmission jitters

In VoIP applications, jitter is compensated for at the receiver by a jitter buffer. The size of a jitter buffer can be fixed or adjustable. Fixed jitter buffers cannot adapt readily to changes in network delay and as a result are not practical in real VoIP applications. In our study, we investigated fast-exp, one of the classical adaptive jitter buffer algorithms proposed in [30]. By using a smaller weighting factor when delays increase, the fast-exp algorithm can adapt quickly to the increases while avoiding discarding too many packets. It estimates the current mean network delay $\hat{d}_i$ and the current variation of network delay $\hat{v}_i$ whenever a packet arrives. The mean delay estimation equation is

$$\hat{d}_i = \begin{cases} \beta \hat{d}_{i-1} + (1 - \beta) n_i & n_i > \hat{d}_{i-1} \\ \alpha \hat{d}_{i-1} + (1 - \alpha) n_i & n_i \le \hat{d}_{i-1} \end{cases}$$

where ni is the network delay of the ith packet, β = 0.75 and α = 0.99802. The following equation is used to estimate $\hat{v}_i$:

$$\hat{v}_i = \alpha \hat{v}_{i-1} + (1 - \alpha) \left| \hat{d}_i - n_i \right|$$

At the beginning of a talkspurt, the adaptive jitter buffer sets the playout delay using

$$D_i = \hat{d}_i + \mu \cdot \hat{v}_i$$

where $D_i$ is the playout delay and µ is a constant that can be selected from 1 to 20. We set µ to 4 in our simulations.

It should be noted that for VoIP over wireless, the network delay ni consists of the delays introduced by both the wireline network and the wireless link. Jitter can be introduced by network congestion in the wireline network or by retransmissions and propagation in the wireless link. Since most jitter buffer algorithms were proposed to compensate for congestion jitter, it is worthwhile to investigate the impact of retransmission jitter on VoIP over wireless.

3.3 Simulation System Description
[Figure 3-1 Simulation Environment: a fixed host (original speech, AMR encoder, speech marking, RTP/UDP/IP, Ethernet) connects through the IP network (delay) and an access point (MAC with Retx limit control, PER error model, PHY) to a mobile host (RTP/UDP/IP, adaptive playout buffer, AMR decoder, degraded speech); PESQ (MOS/Ie) and end-to-end delay (Id) feed the E-Model speech quality evaluation that produces MOSc.]

Our study is based on the network simulator ns-2 [36], in which we simulated a last-hop wireless scenario. Both the IEEE 802.11 and the Ethernet protocol stacks are implemented in the simulator. A two-way Bernoulli error model was inserted to simulate wireless link transmission errors. In 802.11, a packet whose size exceeds the Maximum Transmission Unit (e.g. 1500 bytes for WaveLAN) is fragmented. Since each RTP packet carries one 12.2 kbit/s AMR speech frame and the packet size is set to 71 bytes, fragmentation does not occur. The simulation system is shown in Figure 3-1.

In our simulation, the original speech file is first encoded by the AMR codec and then analyzed to extract the speech marking information (voiced/unvoiced) for each packet. The speech marking information is used together with the network delay and wireless link quality to control the retransmission policy. The error model determines whether a packet is corrupted according to the packet error probability (PER). The base station (BS) neither sends an ACK to the sender for a corrupted packet nor passes it to the higher layer. If the MAC layer of the sender does not receive an acknowledgement for a packet, it retransmits the packet until the packet is ACKed or the retransmission attempt limit is reached (in the rest of this chapter, retransmission is abbreviated as Retx). In our simulation, the Retx attempt limit is set to 6 for both SPB Retx and Full Retx. At the receiver, the received speech packets are fed to an adaptive jitter buffer and subsequently decoded to recover the degraded speech file, which is used to obtain a measure of speech quality. We used combined PESQ and E-Model evaluation of conversational speech quality, as described in Chapter 2. The performance index was obtained by averaging the results computed by this method for each 20-second segment of the speech file. The following simulation results were obtained by averaging the results of 50 simulations with different random seeds, to average out the effect of packet loss locations.

The three simulated retransmission schemes are SPB Retx, Full Retx and No Retx. Table 3-1 gives the average number of voiced packet losses when transmitting 73,000 speech packets in our simulated wireless network with each scheme. For simplicity, we simulated only the wireless link for the purpose of this study, so only the wireless link (Retx limit exceeded) and the adaptive jitter buffer account for packet losses. In Table 3-1, most of the voiced packet losses in Full Retx and SPB Retx are caused by the jitter buffer. Because we used a Bernoulli error model, most retransmitted packets are successfully received by the receiver. If bursty packet errors were considered, more voiced packet losses would be expected in the Full Retx and SPB Retx schemes.

Table 3-1 Average voiced packet losses with fast-exp playout buffer

PER       No Retx   SPB Retx   Full Retx
0.0001    15        53         29
0.0005    36        54         27
0.0008    61        51         26
0.001     69        47         22
0.003     144       28         17
0.005     241       22         13
0.01      474       13         9
0.05      2344      42         16
0.10      4678      931        159

Intuitively, SPB Retx should outperform No Retx and perform at least as well as Full Retx in protecting voiced frames.
However, Table 3-1 shows that Full Retx always loses fewer voiced packets than SPB Retx, while No Retx loses the fewest voiced packets when link quality is good (packet error probability lower than 0.0005). In the fast-exp algorithm, the estimated playout delay increases as retransmission jitter increases. When link quality is good, the estimated playout delay stays low, so occasionally retransmitted packets, and the packets adjacent to them, are discarded by the jitter buffer because of the jitter they introduce. In the No Retx scheme, by contrast, a corrupted packet does not affect the packets that follow it. This is why No Retx has the fewest packet losses when link quality is very good. On the other hand, in SPB Retx, unvoiced packets are not retransmitted, so the estimated playout delay cannot reflect the current wireless link conditions when link quality becomes worse. In Full Retx, every unACKed packet is retransmitted, which helps the adaptive jitter buffer estimate the playout delay for the next talkspurt. This is why the adaptive jitter buffer discards more packets in SPB Retx than in Full Retx.

3.4 Performance Comparison of Current Retransmission Schemes

Figure 3-2 and Figure 3-3 compare the overall packet loss rates and buffered retransmission delays. Figure 3-2 shows that Full Retx keeps the packet loss rate low at the expense of the higher delay plotted in Figure 3-3, because every unACKed packet is retransmitted. Interestingly, when link quality is not too bad (packet error probability up to 0.01), the packet loss rate of the Full Retx scheme decreases as link quality becomes worse. As mentioned above, with worse link quality, more retransmissions help the jitter buffer estimate the playout delay more accurately.
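The coupling described above, where retransmission jitter pushes up the fast-exp playout delay, can be illustrated with a small Python sketch of the estimator from Section 3.2.3. Assumptions: the estimator is seeded with the first observed delay (the text does not say how it is initialized), α = 0.998002 and β = 0.75 follow [30] as quoted in Section 2.2, µ = 4 as in our simulations, and the delay trace (80 ms base delay, 10 ms extra for every 50th packet) is purely hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FastExp:
    """fast-exp delay estimator with playout delay D = d_hat + mu * v_hat."""
    alpha: float = 0.998002  # slow weight when delay is not increasing
    beta: float = 0.75       # smaller weight: track delay increases quickly
    mu: float = 4.0
    d_hat: Optional[float] = None
    v_hat: float = 0.0

    def update(self, n_i: float) -> None:
        if self.d_hat is None:  # assumption: seed with the first delay sample
            self.d_hat = n_i
            return
        w = self.beta if n_i > self.d_hat else self.alpha
        self.d_hat = w * self.d_hat + (1 - w) * n_i
        self.v_hat = self.alpha * self.v_hat + (1 - self.alpha) * abs(self.d_hat - n_i)

    def playout_delay(self) -> float:
        """Value the buffer would apply at the start of the next talkspurt."""
        if self.d_hat is None:
            raise ValueError("no packets observed yet")
        return self.d_hat + self.mu * self.v_hat


def run(delays_ms: Iterable[float]) -> float:
    """Feed a delay trace through the estimator and return the final playout delay."""
    est = FastExp()
    for n in delays_ms:
        est.update(n)
    return est.playout_delay()


if __name__ == "__main__":
    base = [80.0] * 500  # hypothetical constant network delay
    # Every 50th packet needs one MAC retransmission, adding 10 ms (illustrative).
    jittery = [d + (10.0 if i % 50 == 49 else 0.0) for i, d in enumerate(base)]
    print(f"no retransmissions  : D = {run(base):.2f} ms")
    print(f"with retransmissions: D = {run(jittery):.2f} ms")
```

Because β is much smaller than α, each late (retransmitted) packet pulls the mean estimate up quickly, while the estimate decays only slowly afterwards; this asymmetry is what ties the retransmission pattern of a scheme to the playout delay, and hence to jitter buffer losses.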
However, when link quality is very good (packet error probability up to 0.0005), No Retx obtains the lowest packet loss rate, because it introduces no jitter and few packets are corrupted by bit errors. As a compromise, SPB Retx has a packet loss rate and Retx delay between those of No Retx and Full Retx.

Using the evaluation method described in Chapter 2, Figure 3-4 and Figure 3-5 give a more direct performance comparison of these schemes with MOSc as the metric. To focus on the performance of the Retx schemes, our evaluation does not consider packet losses introduced in the wireline network; it does, however, consider network delay. For natural conversation, delays below 100 ms are barely perceptible, but delays above 150 ms noticeably affect conversational interactivity [37].

[Figure 3-2 Overall packet loss rate comparison: loss rate (%) versus packet error probability for No Retx, SPB Retx and Full Retx.]
[Figure 3-3 Buffered retx delay comparison: buffered Retx delay (ms) versus packet error probability for No Retx, SPB Retx and Full Retx.]
[Figure 3-4 MOSc comparison with 175ms network delay: MOSc versus packet error probability for Perceived Quality Driven, No Retx, SPB Retx and Full Retx.]
[Figure 3-5 MOSc comparison with packet error probability 0.001: MOSc versus network delay for Perceived Quality Driven, No Retx, SPB Retx and Full Retx.]

Because Retx delays rarely exceed 100 ms, to make the impact of Retx delay visible we assume that a 175 ms delay has been introduced in the wireline network and add it to the end-to-end delay in the MOSc evaluation. In Figure 3-4, the MOSc of Full Retx is lower than that of No Retx and SPB Retx when the packet error probability is lower than 0.003. This is because the Full Retx scheme always introduces more Retx delay, and perceived speech quality is sensitive to high delay when link quality is good. When the packet error probability exceeds 0.003, the Full Retx scheme becomes the best, as it greatly reduces the number of corrupted packets.

Figure 3-5 compares performance for different network delays when the packet error probability is 0.001. When the delay is lower than 150 ms, Full Retx achieves the best MOSc; when the delay is higher than 150 ms, No Retx becomes the best, which confirms that 150 ms is the threshold above which delay begins to have a severe impact on speech quality. As in Figure 3-4, the performance of SPB Retx lies between No Retx and Full Retx, but it is not the best on either side of the delay threshold.

3.5 Perceived Speech Quality Driven Retransmission Scheme

Since No Retx and Full Retx each achieve the best MOSc under different link quality and network delay conditions, we propose a new perceived speech quality driven retransmission scheme that switches between these two schemes as link quality and network delay change. The pseudo code of the new scheme is shown in Figure 3-6. Low_Error_Threshold is set to 0.0005 and High_Error_Threshold to 0.003.
These values follow from the simulation results: when the packet error probability is below 0.0005, No Retx achieves the best MOSc even if delay is not considered, whereas Full Retx becomes the best when the packet error probability exceeds 0.003, even if network delay is very high. When the packet error probability is between 0.0005 and 0.003, the decision is made according to network delay. In the proposed scheme, Delay_Threshold is set to 150ms, as this is the threshold at which delay begins to noticeably affect speech quality. In real applications, the Bit Error Rate (BER) can be converted to PER, and BER can be obtained from the bit errors in bit-pattern series sent by the BS. Network delay can be estimated by subtracting the average MH-to-BS handoff delay from the average end-to-end delay, which can be retrieved from the RTP packet header.
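The decision rule of Figure 3-6, together with the BER-to-PER conversion and the network delay estimate described above, can be sketched in Python as follows. This is a minimal illustration rather than the simulator used in this study: it assumes independent (Bernoulli) bit errors, as in the wireless channel model of Section 4.3.1, and the function names, as well as the 1040-bit packet size and the 200ms/25ms delays in the usage lines, are hypothetical choices for illustration.

```python
# Sketch of the perceived speech quality driven Retx selection (Figure 3-6).
# Function names are hypothetical; thresholds are the values given in the text.

LOW_ERROR_THRESHOLD = 0.0005   # below this PER, No Retx is preferred
HIGH_ERROR_THRESHOLD = 0.003   # above this PER, Full Retx is preferred
DELAY_THRESHOLD_MS = 150.0     # network delay at which delay starts to hurt quality


def ber_to_per(ber: float, packet_bits: int) -> float:
    """PER of a packet_bits-bit packet, assuming independent bit errors."""
    return 1.0 - (1.0 - ber) ** packet_bits


def estimate_network_delay_ms(avg_end_to_end_ms: float, avg_wireless_hop_ms: float) -> float:
    """Network delay: average end-to-end delay minus the average MH-to-BS delay."""
    return avg_end_to_end_ms - avg_wireless_hop_ms


def select_retx_scheme(per: float, network_delay_ms: float) -> str:
    """Return 'No Retx' or 'Full Retx' following the rule of Figure 3-6."""
    if per < LOW_ERROR_THRESHOLD:
        return "No Retx"
    if per > HIGH_ERROR_THRESHOLD:
        return "Full Retx"
    # Intermediate link quality: decide according to the network delay.
    return "Full Retx" if network_delay_ms < DELAY_THRESHOLD_MS else "No Retx"


if __name__ == "__main__":
    per = ber_to_per(1e-6, 1040)                    # roughly 0.00104
    delay = estimate_network_delay_ms(200.0, 25.0)  # 175 ms
    print(round(per, 5), delay, select_retx_scheme(per, delay))
```

With a PER of about 0.001 the rule falls into the intermediate band, so a 175ms network delay selects No Retx while a 100ms delay would select Full Retx, matching the behaviour seen in Figure 3-5.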
The performance of the new perceived speech quality driven scheme is also shown in Figure 3-4 and Figure 3-5 under different network delays and packet error probabilities. The curve of the perceived quality driven scheme overlaps those of No Retx and Full Retx wherever they achieve the best MOSc, because it switches to the more suitable of the two schemes as communication conditions change. Since this method uses Full Retx only when necessary, it can also achieve retransmission efficiency similar to SPB Retx while avoiding the implementation complexity of obtaining the speech property information that SPB Retx requires.

if (PER < Low_Error_Threshold)
    No_Retx();
else if (PER > High_Error_Threshold)
    Full_Retx();
else {
    if (Network_Delay < Delay_Threshold)
        Full_Retx();
    else
        No_Retx();
}

Figure 3-6 Perceived speech quality driven Retx scheme pseudo code

3.6 Summary

A suitable retransmission scheme is crucial for obtaining the best possible perceived speech quality in wireless VoIP applications. In this chapter, we investigated the performance of three retransmission schemes (No Retx, SPB Retx and Full Retx) with regard to perceived conversational speech quality. The impact of retransmission jitter on an adaptive jitter buffer was also considered. The simulation results show that the performance of these schemes depends on the network delay and the wireless link quality. Because the wireless environment is variable, we have proposed a perceived speech quality driven retransmission scheme that adapts to wireless link quality and network delay conditions. As SPB Retx is not involved in the new method, the implementation complexity of retrieving speech property information is avoided. Our results show that the proposed method achieves an optimum MOSc compared with No Retx, Full Retx and SPB Retx.
This is because the new method deploys the most suitable scheme as communication conditions change. In this study, a simplified last-hop wireless network is implemented to demonstrate the wireless voice over IP scenario. Further improvements may be achieved by making the simulation closer to a real network, e.g. by incorporating a multi-state error model in the wireless link.
CHAPTER 4 PLAYOUT DELAY CONSTRAINED ARQ AND ARQ AWARE PLAYOUT BUFFER

4.1 Introduction

Owing to the unreliable and error-prone nature of wireless channels, assuring acceptable perceived speech quality has been a challenging task for Wireless VoIP. Automatic Repeat reQuest (ARQ) is one of the packet error recovery techniques for Wireless VoIP and, because of its efficiency and simplicity, may complement or substitute for Forward Error Correction (FEC).

[Figure 4-1: Stop and Wait ARQ. Labels: Tx queue, wireless channel and Rx buffer; frames n, n+1 and a retransmitted n+1; ACKn and ACKn+1; frame loss, timeout and backoff; timer started, stopped, restarted and stopped.]
In ARQ, the sender transmits packets, or Protocol Data Units (PDUs), consisting of payload and checksums. Depending on the result of checksum validation, the receiver sends acknowledgment messages (e.g. ACK or NACK) back to the transmitter, and the sender retransmits packets based on these acknowledgments. ARQ protocols fall into three basic types, Stop-and-Wait (SW), Go-Back-N (GBN) and Selective Repeat (SR), which differ in how they respond to acknowledgments. The details of these three types of ARQ have been described in Chapter 2. In this study, we consider the SW-ARQ of the IEEE 802.11 Medium Access Control (MAC) layer [1]. In 802.11 SW-ARQ, a transmitted packet must be acknowledged before the next packet can be sent. If the sender does not receive an acknowledgment for a packet within a certain timeout period, it retransmits the packet, repeating until a maximum retry limit is reached. In the Distributed Coordination Function (DCF) mode of IEEE 802.11, a backoff procedure randomly defers each retransmission to avoid collisions among multiple transmitters (see Figure 4-1). With this procedure, corrupted packets may be recovered from the retransmitted copies. However, ARQ schemes also bring a series of problems that affect perceived speech quality. The retransmission procedure may introduce excessive delay; when packets have to traverse a high-delay wireline network before reaching the wireless part, any retransmission may be considered unnecessary [22]. The number of retransmission attempts varies with wireless channel quality, which leads to retransmission jitter. Furthermore, the layered protocol architecture, which places ARQ and the playout buffer in different layers, makes matters worse.
Firstly, if an adaptive playout buffer is employed in the Wireless VoIP system, a packet's delay budget (its playout delay) is decided at the beginning of each talkspurt. Since the retransmission procedure is constrained only by a fixed maximum retry limit, a high retry limit that exceeds the available delay budget may lead to unnecessary retransmissions and postpone subsequent packets, while a low retry limit may terminate the retransmission procedure prematurely even though delay budget remains. Secondly, because a transmission queue exists at the sender, a high mean retransmission delay can make incoming packets accumulate in the queue, so that queuing delay or losses quickly climb. Thirdly, in the current protocol stack, packets that fail transport or link layer checksum validation are discarded, even though noisy voice packets may still be useful at the upper layer [38].
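The mismatch between a fixed retry limit and the playout delay budget can be illustrated with a small Python sketch of Stop-and-Wait retransmission. It is a simplification under stated assumptions: every attempt is modelled as taking a constant time (frame, ACK timeout and backoff lumped together, whereas the 802.11 backoff is actually random), frame errors are independent with probability `per`, and the function names are hypothetical.

```python
import random
from typing import Tuple


def sw_arq_fixed(per: float, retry_limit: int, rng: random.Random) -> Tuple[bool, int]:
    """Stop-and-Wait bounded only by a fixed maximum retry limit.

    Returns (delivered, attempts); attempts counts the original
    transmission plus any retransmissions.
    """
    attempts = 0
    while attempts <= retry_limit:
        attempts += 1
        if rng.random() >= per:  # frame arrives intact and is ACKed
            return True, attempts
    return False, attempts


def sw_arq_deadline(per: float, budget_ms: float, attempt_ms: float,
                    rng: random.Random) -> Tuple[bool, int]:
    """Stop-and-Wait bounded by the remaining playout delay budget.

    An attempt is started only if it can finish within budget_ms
    (assumption: every attempt costs a constant attempt_ms).
    """
    attempts = 0
    elapsed = 0.0
    while elapsed + attempt_ms <= budget_ms:
        attempts += 1
        elapsed += attempt_ms
        if rng.random() >= per:
            return True, attempts
    return False, attempts


if __name__ == "__main__":
    rng = random.Random(1)
    # A retry limit of 7 allows up to 8 x 5 ms = 40 ms of attempts, which can
    # overrun a 20 ms budget; a limit of 1 stops after 10 ms although 10 ms remain.
    runs = [
        ("fixed, limit 7", lambda: sw_arq_fixed(0.3, 7, rng)),
        ("fixed, limit 1", lambda: sw_arq_fixed(0.3, 1, rng)),
        ("deadline 20 ms", lambda: sw_arq_deadline(0.3, 20.0, 5.0, rng)),
    ]
    for label, run in runs:
        results = [run() for _ in range(10_000)]
        delivered = sum(d for d, _ in results) / len(results)
        mean_attempts = sum(a for _, a in results) / len(results)
        print(f"{label}: delivered {delivered:.3f}, mean attempts {mean_attempts:.2f}")
```

In the deadline-bounded variant the number of attempts is set by the delay budget rather than by a constant, which is the idea behind the playout delay constrained ARQ of Section 4.2.2; the variable number of attempts in either variant is what produces retransmission jitter.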
These problems have been addressed in some previous works. In [39][40][41], the retransmission procedure is still constrained by a fixed maximum retry limit, but it can be terminated at a packet's deadline (e.g. its presentation time). Nevertheless, these works still cannot avoid terminating a retransmission procedure prematurely while delay budget remains for further retry attempts, and they do not consider the impact of retransmission delays on queuing delay or losses. In [15], UDP-Lite, a modified UDP protocol with a partial checksum, was developed to allow corrupted UDP packets to be reused at the application level. For Wireless VoIP, however, the MAC layer checksum must be made partial as well; otherwise, noisy packets are discarded at the MAC layer and never reach the upper layers. We extend these ideas in a cross-layer design for Wireless VoIP, in which the retransmission procedure is confined to the local wireless channel. In our design, link layer ARQ and the playout buffer cooperate in an integrated framework in which: 1) the retransmission procedure of a packet is constrained by the available delay budget; 2) speech data is not covered by the link layer or transport layer checksums, and a packet combining process derives a least noisy packet from its retransmitted copies; 3) the delivery delay in the wireless channel is estimated separately and limited to the mean inter-arrival delay of the transmission queue. Simulation results show that, with this design, the simulated Wireless VoIP system gains a considerable performance improvement, at the expense of breaking the layered protocol architecture.
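The packet combining step in item 2) can be illustrated with a bitwise majority vote over the received copies, which is the basic idea behind the Majority-Logic packet combining used in this design [44]. The Python sketch below is an illustrative assumption rather than the implementation described in Appendix D: the function name is hypothetical, all copies are assumed to have the same length, and with an even number of copies ties are resolved to 0.

```python
from typing import Sequence


def majority_combine(copies: Sequence[bytes]) -> bytes:
    """Combine noisy copies of a packet by a per-bit majority vote."""
    if not copies:
        raise ValueError("at least one copy is required")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")
    n = len(copies)
    combined = bytearray(length)
    for i in range(length):
        value = 0
        for bit in range(8):
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if 2 * ones > n:  # a strict majority of copies carry a 1 here
                value |= 1 << bit
        combined[i] = value
    return bytes(combined)


if __name__ == "__main__":
    original = bytes([0b10101010, 0b11110000])
    noisy = [
        bytes([0b10101011, 0b11110000]),  # error in byte 0, bit 0
        bytes([0b10111010, 0b11110000]),  # error in byte 0, bit 4
        bytes([0b10101010, 0b01110000]),  # error in byte 1, bit 7
    ]
    print(majority_combine(noisy) == original)
```

Combining recovers a bit whenever it is corrupted in fewer than half of the copies; when errors hit the same position in most copies, residual bit errors remain, which is why the result is a least noisy rather than a guaranteed correct version.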
4.2 The Cross-Layer Design

[Figure 4-2: The cross-layer design system model. Fixed host, access point and mobile host protocol stacks (RTP, UDP, IP, Ethernet, 802.11 MAC, PHY) with an incoming queue; at the mobile host, copies a1, a2, a3 enter the playout buffer, retransmission is terminated at the playout time, and packet combining feeds the decoder.]
4.2.1 System model

The system model of the proposed cross-layer design is shown in Figure 4-2. We consider the last-hop scenario in an IEEE 802.11 wireless network. Our design is composed of two correlated components: playout delay constrained ARQ, in which playout delays become the stopping criterion of the retransmission procedure; and an ARQ aware playout buffer, which calculates the packet delivery delay for the wireline and wireless parts separately and keeps the wireless channel delay budget below the arrival interval of incoming packets, so as to avoid accumulation of queuing delay. As speech data is not covered by the link layer and transport layer checksums, the playout buffer may receive several noisy versions of a packet. If the packet's correct version has not been received by its presentation time, we employ Majority-Logic packet combining [44] to further reduce the damaged part and then send the combined version to the decoder. Details of this technique are presented in Appendix D. The two key components of the cross-layer design are described in the following subsections.

4.2.2 Playout delay constrained ARQ

[Figure 4-3: Block diagram of the playout delay constrained ARQ with packet combining. Decisions: received a packet? corrupted? playout time? does a correct version exist? Actions: send ACK and present the packet to the upper layer, wait for retransmission, terminate the current retransmission process, check received copies of the playout packet, multi-logical packet combining, send to decoder; spans the application and link layer interface.]

The playout delay constrained ARQ is a specific optimization of the current protocol stack for Wireless VoIP. The block diagram of the playout delay constrained ARQ is
given in Figure 4-3. At the receiver, the 802.11 MAC layer presents every received packet to the upper layer, whether or not it is corrupted. In the application layer, the playout buffer can terminate a packet's retransmission procedure at its playout time, thereby avoiding unnecessary retransmissions. If a corrupted packet has not been recovered by the retransmission procedure, the received noisy copies are combined by the packet combining module to obtain a more reliable version, which is then decoded and played out. We still keep the maximum retry limit of the 802.11 SW-ARQ, but set it high enough to avoid terminating the retransmission procedure prematurely while delay budget remains for more retry attempts. To allow corrupted packets to be passed from the link layer to the application layer, the link layer and transport layer checksums have to be made partial (e.g. UDP-Lite), and the mechanisms that eliminate duplicate PDUs should be turned off for the supported VoIP services. Furthermore, an application level checksum, such as a CRC in the RTP packet, should be enabled so that the application layer can identify a correct packet among several copies.

4.2.3 ARQ aware playout buffer

4.2.3.1 Queue model

Assume there is a per-flow transmission queue at the sender, long enough that queue losses can be ignored, so that we can focus on the queuing delay. With the IEEE 802.11 SW-ARQ, the transmission queue can be seen as an M/M/1 queuing system with Poisson packet arrivals and exponentially distributed packet departure (service) times [45]. Let $a$ be the average inter-arrival delay and $s$ the average packet departure delay. We have $\lambda = 1/a$ and $\mu = 1/s$, where $\lambda$ and $\mu$ are the mean arrival rate and mean service rate.
The mean delay a packet spends in the queuing system (waiting plus service) can be computed as

$$T_Q = \frac{1}{\mu - \lambda} = \frac{a \cdot s}{a - s}$$

We can deduce that $T_Q \to \infty$ as $s \to a$: if the mean delivery delay in the wireless channel is not kept below the mean inter-arrival delay of incoming packets, $T_Q$ will climb rapidly.
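The queue model can be checked numerically with a short Python sketch. It simply evaluates $T_Q = 1/(\mu - \lambda) = a s/(a - s)$ for several mean service delays $s$, using the 26ms inter-arrival delay that appears later in Figure 4-7 as an example value; the function name is hypothetical.

```python
def mm1_mean_delay(a: float, s: float) -> float:
    """Mean delay in an M/M/1 queue (waiting plus service).

    a: mean inter-arrival delay, s: mean service (wireless delivery) delay,
    both in the same time unit. Equivalent to a * s / (a - s).
    """
    if s >= a:
        raise ValueError("unstable queue: s must be smaller than a")
    lam = 1.0 / a  # mean arrival rate
    mu = 1.0 / s   # mean service rate
    return 1.0 / (mu - lam)


if __name__ == "__main__":
    a = 26.0  # ms
    for s in (10.0, 20.0, 24.0, 25.0, 25.9):
        print(f"s = {s:5.1f} ms -> T_Q = {mm1_mean_delay(a, s):8.1f} ms")
```

The delay grows from about 16ms at s = 10ms to several seconds as s approaches a, which is why the ARQ aware playout buffer keeps the wireless delivery delay below the inter-arrival delay.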
4.2.3.2 ARQ aware playout buffer

For Wireless VoIP, the network delay is composed of the delivery delays in the wireline and wireless parts. In our design, besides adjusting the playout delay for each talkspurt, the ARQ aware playout buffer estimates the required delivery delay in the wireless and wireline parts separately. Figure 4-4 gives the timing notation associated with the playout buffer algorithm.

[Figure 4-4: Timing associated with a packet. Sender timestamp $t_i$; access point and retry attempts; receiver timestamps $a_i$ and $r_i$; delays $n_i$, $n_{wi}$, $n_{ci}$ and estimate $\hat{d}_i$.]

Since no noisy copy produced in the retransmission procedure is discarded, several copies of a packet may exist in the playout buffer. Let $a_i$ be the receiver timestamp of the first arrived copy of the $i$th packet, and $t_i$ its sender timestamp. The delivery delay in the wireline network for packet $i$, denoted $n_{wi}$, is $n_{wi} = a_i - t_i$. Let $r_i$ be the receiver timestamp of the last arrived copy. The delivery delay in the wireless channel for packet $i$, denoted $n_{ci}$, is $n_{ci} = r_i - a_i$. If no retransmission is required for packet $i$, $r_i = a_i$. However, recall that the waiting delay in the transmission queue climbs rapidly if the mean delivery delay in the wireless channel is higher than the mean inter-arrival delay of the incoming packets, denoted $\sigma_i$. The playout buffer should therefore limit $n_{ci}$ to $\sigma_i$ when $r_i - a_i \ge \sigma_i$. $\sigma_i$ can be estimated as:

σ i = α ⋅ σ i −1 + (1 − α ) ⋅ abs (σ i i − σ i −1 )

where $\alpha$ is the same constant as used in the estimation of $\hat{v}_i$; it is set to 0.99802 in the simulation. The network delay $n_i$ can be summarized as:
$$n_i = n_{wi} + n_{ci} = \begin{cases} n_{wi} + r_i - a_i, & r_i - a_i < \sigma_i \\ n_{wi} + \sigma_i, & r_i - a_i \ge \sigma_i \\ n_{wi}, & r_i = a_i \end{cases}$$

The ARQ aware playout buffer differs from other algorithms only in the way it computes the network delay $n_i$. The mean network delay $\hat{d}_i$ can be estimated with existing algorithms, e.g. the 'adaptive' algorithm proposed in [35]:

$$\text{if } \hat{d}_i \ge \text{delay\_threshold}: \quad \hat{d}_i = \min_{j \in S_i} \{ n_j \}$$

$$\text{else}: \quad \hat{d}_i = \begin{cases} \beta \hat{d}_{i-1} + (1 - \beta) n_i, & n_i > \hat{d}_{i-1} \\ \alpha \hat{d}_{i-1} + (1 - \alpha) n_i, & n_i \le \hat{d}_{i-1} \end{cases}$$

Details of this algorithm can be found in Chapter 2.

4.3 Simulation Model and Experimental Results

As shown in Figure 4-5, the simulation model comprises the following components: a voice traffic model, an AMR encoder and decoder, a playout buffer, and a wireless network simulator that integrates the 802.11 SW-ARQ and a simple Bernoulli bit error model.

[Figure 4-5: The simulation model. Components: voice traffic, encoder, wireless network simulator, playout buffer, decoder, end-to-end delay, and conversational speech quality evaluation producing MOSc.]

4.3.1 Wireless channel model

We employ a simple Bernoulli model for bit errors, which lead to packet corruption in both the payload and the packet header. The probability that a PHY layer packet is corrupted by bit errors, PER, can be computed as follows:

$$PER = 1 - (1 - BER)^{ph + pl}$$
where BER is the Bit Error Rate and ph is the packet overhead size at the physical level. For our simulations we have used ph = 784 bits: 24, 34, 20, 8 and 12 bytes at the PHY, MAC, IP, UDP and RTP layers respectively (no header compression is used). pl is the payload size, set to 32 bytes, corresponding to an AMR 12.2 kbit/s voice frame. Let $\omega$ denote the estimated playout delay, and $R_\omega$ the corresponding maximum retry limit constrained by $\omega$. The probability of a packet being recovered after $R_\omega$ retransmissions, PKR, is

$$PKR = PER^{R_\omega - 1} \cdot (1 - PER)$$

and the probability of bit errors occurring in the packet header, PHE, is given by:

1 PHE = 1 − (1 − BER ) ph ⋅ pl + 1

If a packet contains bit errors in its header in all $R_\omega$ retransmissions, the speech data carried by the packet cannot be reused. The probability of this event, PLS, is:

$$PLS = PHE^{R_\omega}$$

4.3.2 Voice traffic model

The voice traffic can be represented by the on-off model [48]. The on-off model assumes a two-state chain, with one state for talkspurts and one for silence periods. The holding time in each state is assumed to follow an exponential distribution. In our simulation we selected means of 1.0 s and 1.5 s for the talkspurt and silence states respectively, as suggested in [49].

4.3.3 Speech quality evaluation

In our simulation model, we employ the conversational speech quality evaluation method [29] to quantify the performance of the different strategies. This method combines PESQ and the E-Model to measure perceived speech quality, and its results are expressed as MOSc (Conversational Mean Opinion Score). Details of this method can be found in Chapter 2. In this method, bit errors in the payload, packet losses and delay all contribute to the degradation of the final evaluated speech quality.
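The channel model formulas for PER and PKR in Section 4.3.1 can be evaluated with the Python sketch below, using the header and payload sizes listed there. The function names are hypothetical, and `prob_recovered_within` is an added convenience that simply sums PKR over the first r_max attempts.

```python
# Header sizes in bytes at each layer, as listed in Section 4.3.1.
HEADER_BYTES = {"PHY": 24, "MAC": 34, "IP": 20, "UDP": 8, "RTP": 12}
HEADER_BITS = 8 * sum(HEADER_BYTES.values())  # ph = 784 bits
PAYLOAD_BITS = 8 * 32                           # pl = 256 bits (AMR 12.2 frame)


def packet_error_prob(ber: float, ph: int = HEADER_BITS, pl: int = PAYLOAD_BITS) -> float:
    """PER = 1 - (1 - BER)^(ph + pl) under the Bernoulli bit error model."""
    return 1.0 - (1.0 - ber) ** (ph + pl)


def prob_recovered_at(r: int, per: float) -> float:
    """PKR = PER^(r - 1) * (1 - PER): first r - 1 attempts corrupted, r-th intact."""
    return per ** (r - 1) * (1.0 - per)


def prob_recovered_within(r_max: int, per: float) -> float:
    """Probability of an intact copy within r_max attempts: 1 - PER^r_max."""
    return 1.0 - per ** r_max


if __name__ == "__main__":
    for ber in (1e-5, 1e-4, 1e-3):
        per = packet_error_prob(ber)
        print(f"BER {ber:.0e}: PER {per:.4f}, "
              f"intact within 4 attempts {prob_recovered_within(4, per):.4f}")
```

At a BER of $10^{-4}$ roughly one PHY packet in ten is corrupted, so even a few retry attempts recover most packets; the delay cost of those attempts is what the playout delay constraint has to bound.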
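The on-off voice source of Section 4.3.2 can be sketched as alternating, exponentially distributed talkspurt and silence periods. In this Python sketch, starting in a talkspurt and using 20ms speech frames (200 seconds corresponding to 10,000 PDUs, as in Section 4.3.4) are assumptions, and the function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def on_off_periods(duration_s: float, mean_talk_s: float = 1.0,
                   mean_silence_s: float = 1.5,
                   seed: Optional[int] = None) -> List[Tuple[str, float]]:
    """Generate alternating (state, length_s) periods covering duration_s."""
    rng = random.Random(seed)
    periods: List[Tuple[str, float]] = []
    elapsed = 0.0
    talking = True  # assumption: the source starts in a talkspurt
    while elapsed < duration_s:
        mean = mean_talk_s if talking else mean_silence_s
        length = rng.expovariate(1.0 / mean)  # exponential holding time
        periods.append(("talk" if talking else "silence", length))
        elapsed += length
        talking = not talking
    return periods


def voice_frames(periods: List[Tuple[str, float]], frame_ms: int = 20) -> int:
    """Number of whole speech frames generated during talkspurts."""
    return sum(int(length * 1000 // frame_ms)
               for state, length in periods if state == "talk")


if __name__ == "__main__":
    periods = on_off_periods(200.0, seed=42)
    talk = sum(length for state, length in periods if state == "talk")
    total = sum(length for _, length in periods)
    print(f"activity {talk / total:.2f}, frames {voice_frames(periods)}")
```

Over a long run the fraction of time spent in talkspurts approaches 1.0 / (1.0 + 1.5) = 0.4.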
4.3.4 Simulation results and analysis

We considered three strategies in the simulation study: Strategy A, SW-ARQ and the 'adaptive' playout buffer without the proposed cross-layer design; Strategy B, playout delay constrained ARQ with the 'adaptive' playout buffer; and Strategy C, playout delay constrained ARQ with the ARQ aware playout buffer. The simulation results were obtained by averaging 30 trials with different random seeds to avoid dependence on particular packet loss or bit error locations. Each trial lasted 200 seconds, corresponding to 10,000 PDUs (each PDU encapsulating one RTP packet). Figure 4-6 compares the overall packet loss ratio of these strategies. As BER increases, Strategy A discards many corrupted packets that cannot be fully recovered before their playout time. Strategies B and C follow the same policy regarding packet losses: both reuse noisy packets and discard only those packets that cannot reach the receiver before their playout time. As a result, Strategies B and C discard only a small percentage of packets compared with Strategy A, even when the wireless channel is very noisy.
[Figure 4-6: Overall packet losses. Packet loss ratio (%) vs. BER for Strategies A, B and C.]
[Figure 4-7: End-to-end delay vs. inter-arrival delay in Strategy A. End-to-end delay (ms) vs. BER for inter-arrival delays of 26ms, 28ms and 30ms.]
[Figure 4-8: End-to-end delay comparison. End-to-end delay (ms) vs. BER for Strategies A, B and C.]
[Figure 4-9: Conversational MOS comparison. Conversational MOS vs. BER for Strategies A, B and C.]
We also plotted end-to-end delays under different inter-arrival delays and wireless channel conditions in Figure 4-7 and Figure 4-8, with a fixed 100ms delay in the wireline network. In Figure 4-7, the delay curves begin to diverge at a BER of $5 \times 10^{-4}$. The curve for the shortest inter-arrival delay (26ms) rises fastest. This reflects the queue model: the closer the delivery delay in the wireless channel is to the mean inter-arrival delay, the higher the queuing delay and hence the end-to-end delay. Figure 4-8 shows that the end-to-end delays of all strategies climb as BER increases. Strategy B performs slightly better than Strategy A as BER increases, because Strategy B can terminate unnecessary retransmissions. Strategy C outperforms Strategies A and B with a more stable curve, as it avoids the accumulation of queuing delay. Note that the delay curves dip at some points where the 'adaptive' playout buffer switches to the 'min-delay' algorithm more frequently. The performance enhancement achieved by the cross-layer design in terms of conversational Mean Opinion Score (MOSc) is presented in Figure 4-9. Figure 4-9 shows that the curves of Strategies A and B decrease significantly beyond a BER of $10^{-4}$. At a BER of around $10^{-3}$, Strategy A already reaches 1.0, the worst MOSc. In contrast, Strategy C, the cross-layer design, still achieves a MOSc of 3.0 at the same BER.

4.4 Summary

We investigated the problems introduced by the IEEE 802.11 SW-ARQ protocol when it works with other components of a Wireless VoIP system (e.g. the transmission queue and the adaptive playout buffer) in the layered protocol architecture, and proposed a cross-layer design as a solution to these problems.
The proposed cross-layer design is composed of two correlated components: 1) playout delay constrained ARQ, in which a packet's playout time is the deadline of its retransmission procedure and, instead of corrupted packets simply being discarded, noisy copies of a packet can be combined and then played out; 2) an ARQ aware playout buffer, in which requirements on the delivery delay in the wireless channel (e.g. not to aggravate queuing delay) are taken into account in playout delay estimation. Through simulations, we show that the proposed cross-layer design can improve the perceived speech quality of a Wireless VoIP system in terms of conversational Mean Opinion Score (MOSc). In our simulation, wireless channel errors are represented by a simple Bernoulli error model. Further improvements may be achieved by using multi-state error models to simulate transmission errors in the wireless channel.
CHAPTER 5 DISCUSSIONS, SUGGESTIONS FOR FURTHER WORK, AND CONCLUSIONS

5.1 Discussions

Based on the research carried out in this study, we can now discuss the research questions raised at the beginning of this dissertation.

a. What are the impairment factors of Wireless VoIP applications?

For VoIP, the impairment factors have been identified as packet loss, delay, jitter and coding. For Wireless VoIP, bit errors are an additional impairment factor. If the whole packet carrying speech data is covered by checksums (UDP checksum or MAC checksum), the effect of bit errors perceived at the application level is also packet loss. However, if a partial checksum covering only the packet header is applied, the effect of bit errors can be packet loss or speech distortion, depending on whether the bit errors fall in the packet header or in the payload.

b. What are the pros and cons of ARQ mechanisms? Is the performance of a Wireless VoIP system improved by ARQ mechanisms in terms of perceived speech quality?
Compared with FEC, which requires extra overhead, ARQ is a simple and efficient way to recover damaged packets. The main problems introduced by ARQ schemes are retransmission delay and jitter. Normally, the perceived speech quality of a Wireless VoIP system can be significantly enhanced by using a variant of ARQ, except in some cases, e.g. low BER combined with high wireline network delay. However, the use of ARQ should be constrained, i.e. the delay of the retransmission procedure should be bounded by the playout delay or by the inter-arrival delay of the transmission queue at the access point.

c. How can current ARQ schemes be optimized to improve speech quality? And how can real-time network and wireless channel QoS parameters be mapped into ARQ protocol optimization?

ARQ schemes can be optimized for retransmission efficiency, e.g. by retransmitting only important speech packets in SPB ARQ, or by switching between No Retransmission and Full Retransmission in the proposed perceived speech quality driven scheme. Another optimized version of ARQ is playout delay constrained ARQ, which can terminate the retransmission procedure whenever necessary. All these optimizations need QoS parameters to make decisions. The QoS parameters may be obtained from other layers: the playout delay from the application layer, wireless channel performance from the physical layer, and other information from joint-layer analysis.

d. What are the effects of the interactions between ARQ mechanisms and other components of the Wireless VoIP system? How can these effects be coped with if they are negative?

One example of the interaction between ARQ and other components of the Wireless VoIP system is the effect of the playout buffer on ARQ. If the retransmission procedure is constrained only by a fixed maximum retry limit, a high retry limit may exceed the available delay budget, leading to unnecessary retransmissions and postponed subsequent packets, while a low retry limit may terminate the retransmission procedure prematurely, before the playout time is reached. The corresponding solution is the proposed playout delay constrained ARQ, in which the retry limit is set by the estimated playout delay.
e. How can other packet error concealment technologies be used together with ARQ? Or how can ARQ be used as a complementary mechanism for other packet error concealment technologies?

Using ARQ as a complementary mechanism for other packet error recovery techniques, e.g. FEC, has been addressed in previous works. In this study, we investigated the performance of a cross-layer design that incorporates ARQ, majority-logic packet combining and partial checksums. The several noisy copies produced by the retransmission procedure of ARQ can be combined into a least noisy copy through a packet combining process. We conclude that a hybrid packet recovery solution can achieve a better performance gain than a single technique, provided the available packet error concealment techniques are appropriately scheduled.

f. How can a cross-layer framework be established in which QoS techniques located in different layers are optimized through joint-layer analysis? And how can a profile of real-time predicted speech quality and QoS parameters collected from different layers be established and eventually made to act as the scheduler of a cross-layer framework?

In this study, we have achieved a considerable improvement in speech quality simply by feeding QoS parameters into the optimization of ARQ schemes through joint-layer analysis. Further work remains to establish a perceived speech quality driven cross-layer framework, in which QoS parameters from different layers, the evaluated speech quality fed back from the receiver, and the Service Level Agreement (SLA) all contribute to decisions about which packet error recovery techniques to use and how to combine them in a mutually aware way.

5.2 Suggestions for Further Work

In this study, the packet error recovery techniques of the cross-layer designs are driven by network parameters.
In future work, we plan to improve the performance of cross-layer designs by building a more sophisticated closed-loop packet error recovery scheduler driven by perceived speech quality. By closed-loop, we mean that the effects of the strategy issued by the cross-layer design are fed back and contribute to the next phase of strategy-making. Specifically, we expect the perceived speech quality driven packet error recovery scheduler to have the following abilities:
1) collect QoS parameters (e.g. BER, end-to-end delay, packet loss and bandwidth) from different layers to form a profile of the current communication environment;
2) produce an optimized packet error recovery strategy, taking into account the performance feedback, the current communication environment and the users' requirements (e.g. SLA);
3) schedule packet error recovery techniques according to the chosen strategy; several techniques may be used at the same time, e.g. link layer ARQ and application level FEC;
4) evaluate speech quality periodically and send it back to the Scheduler, together with other QoS parameters, as input for strategy-making.
Figure 5-1 illustrates the block diagram of the perceived speech quality driven packet error recovery scheduler. The framework will be composed of three key components:
[Figure 5-1: Perceived speech quality driven packet error recovery scheduler. Block diagram with a Mobile Host (speech source, encoder, RTP/UDP/IP, MAC/PHY), an Access Point with a Booster, and a Fixed Host (Ethernet/IP/UDP/RTP, adaptive playout buffer, decoder, degraded speech). The Packet Error Recovery Scheduler receives QoS parameters and feedback from the Perceived Speech Quality Evaluation module (MOSc, end-to-end delay, etc.) and issues an error recovery strategy (FEC, ARQ, ...).]
Packet error recovery scheduler: the central part of the framework is a real-time scheduler located in the mobile host or wireless handset. The scheduler takes into account variations in the channel error rate, the overall packet loss rate, speech quality feedback, etc. It tries to produce an optimal packet error recovery strategy for the local wireless channel that maximizes perceived speech quality with the available resources. The strategy determines which packet error recovery technique should be scheduled: FEC, ARQ, a lower coding rate, or a hybrid. The strategy can also specify the parameters of a given technique, e.g. the coding rate or the delay budget for ARQ.
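The closed-loop behaviour described above can be illustrated with a minimal C sketch. This is not the proposed scheduler. The sketch makes four assumptions:
- There are only three candidate strategies: No, SPB and Full retransmission, the schemes compared in this thesis.
- The only feedback is a periodic MOSc report for the strategy currently in force.
- A hypothetical greedy rule applies: each strategy's reported MOSc is smoothed exponentially, and the scheduler switches to the best estimate.
- All type and function names are illustrative.

```c
#include <assert.h>

/* Hypothetical recovery strategies (illustrative names only). */
typedef enum { STRAT_NO_ARQ = 0, STRAT_SPB_ARQ, STRAT_FULL_ARQ, STRAT_COUNT } strategy_t;

typedef struct {
    double mos_estimate[STRAT_COUNT]; /* smoothed MOSc observed under each strategy */
    double alpha;                     /* weight given to new feedback, 0 < alpha <= 1 */
    strategy_t current;               /* strategy currently in force */
} scheduler_t;

void scheduler_init(scheduler_t *s, double initial_mos, double alpha) {
    assert(alpha > 0.0 && alpha <= 1.0);
    for (int k = 0; k < STRAT_COUNT; k++)
        s->mos_estimate[k] = initial_mos; /* untried strategies start optimistic */
    s->alpha = alpha;
    s->current = STRAT_NO_ARQ;
}

/* One feedback step: fold the MOSc reported for the current strategy into
   its estimate, then choose the strategy with the best estimate for the
   next period (ties keep the current strategy). */
strategy_t scheduler_feedback(scheduler_t *s, double reported_mos) {
    double *est = &s->mos_estimate[s->current];
    *est = (1.0 - s->alpha) * *est + s->alpha * reported_mos;

    strategy_t best = s->current;
    for (int k = 0; k < STRAT_COUNT; k++)
        if (s->mos_estimate[k] > s->mos_estimate[best])
            best = (strategy_t)k;
    s->current = best;
    return best;
}
```

Strategies that have not yet been tried keep their initial estimate. When the MOSc of the current strategy drops, the loop therefore moves on and tries another one. A real scheduler would also use the QoS profile (BER, delay, loss) to predict MOSc rather than rely on feedback alone.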
Booster: a Booster will be patched into the access point (AP). The Booster will be capable of per-flow service differentiation and admission control in the Distributed Coordination Function (DCF). The Booster will also cooperate with the Scheduler to distinguish wireless channel delivery delay from network delay, and wireless channel packet losses from network congestion losses. Further, the Booster can be designed to change its QoS policies according to the packet error recovery strategy issued by the Scheduler.
Perceived speech quality evaluation and feedback: the perceived speech quality evaluation module is located in the receiver. This module will evaluate perceived speech quality at a specified interval. It will send the results back to the sender, normally as the conversational Mean Opinion Score (MOSc), together with other QoS parameters such as end-to-end delay. This feedback can be carried in RTCP reports or other forms of in-band signaling. Note that Figure 5-1 only shows the scenario of a Mobile Host sending speech traffic. When the Mobile Host is receiving speech traffic in a conversation, a perceived speech quality evaluation module should also be located in the Mobile Host itself and feed the evaluated quality indexes to the local Scheduler. Besides these functional considerations, implementation complexity, resource requirements, etc. will also be considered in more detail.
5.3 Conclusions
Perceived speech quality is crucial for the success of Wireless VoIP, a typical application in the upcoming wireless Internet or “4G”. The impairment factors affecting the perceived speech quality of a Wireless VoIP system can be summarized as packet loss, end-to-end delay, jitter, bit errors and coding. In this study, we investigated the problems introduced by ARQ schemes with regard to perceived speech quality.
We tried to optimize the current ARQ protocol by mapping cross-layer QoS parameters into the scheduling and configuration of its retransmission procedure. We proposed a perceived speech quality driven retransmission scheme, which can switch to the most suitable retransmission scheme according to QoS parameters reported from lower or upper layers. We also developed a cross-layer framework in which the available playout delay determines the retransmission procedure of the ARQ protocol, and the network delay estimation takes the wireless channel delivery delay into account. Simulation results showed that these cross-layer techniques can achieve significant performance gains. However, the work done so far is far from complete, and further effort is required towards an integrated perceived speech quality driven cross-layer framework.
REFERENCES
[1] IEEE Standards Department, IEEE 802.11 Standard for Wireless LAN, Medium Access Control (MAC) and Physical Layer (PHY) Specification, 1999.
[2] 3GPP2 C.S0001-B, Introduction to cdma2000 Spread Spectrum Systems, May 2002.
[3] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, RFC 1889: RTP: A Transport Protocol for Real-Time Applications, 1996.
[4] A. S. Tanenbaum, Computer Networks, Prentice-Hall, 1996, ISBN 0-13-394248-1.
[5] T. J. Kostas et al., Real-Time Voice Over Packet-Switched Networks, IEEE Network, 12(1): 18-27, Jan. 1998.
[6] 3GPP TS 26.090: Mandatory Speech Codec speech processing functions; AMR speech codec; Transcoding functions, Dec. 1999.
[7] 3GPP2 C.S0014-0: Enhanced Variable Rate Codec (EVRC), Jan. 1997.
[8] S. Rudkin, A. Grace and M. W. Whybray, Real-time applications on the Internet, British Telecom Technology Journal, Vol. 15, No. 2, Apr. 1997.
[9] Agilent Technologies, Web ProForum Tutorials, Voice Quality (VQ) in Converging Telephony and IP Networks, http://www.iec.org/tutorials/voqual.pdf
[10] M. Veeraraghavan, N. Cocker and T. Moors, Support of voice services in IEEE 802.11 wireless LANs, Proc. IEEE Infocom, Anchorage, Alaska, Apr. 2001.
[11] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley and J. Crowcroft, RFC 3453: The Use of Forward Error Correction (FEC) in Reliable Multicast, Dec. 2002.
[12] Moo Young Kim and Renat Vafin, Packet-Loss Recovery Techniques for VoIP, Technical Report, Royal Institute of Technology (KTH), Sweden.
[13] Wenyu Jiang and Henning Schulzrinne, Comparison and Optimization of Packet Loss Repair Methods on VoIP Perceived Quality under Bursty Loss, NOSSDAV 2002.
[14] C. S. Perkins, O. Hodson and V. Hardman, A Survey of Packet-Loss Recovery Techniques for Streaming Audio, IEEE Network Magazine, Sep./Oct. 1998.
[15] L. A. Larzon, M. Degermark and S. Pink, "The UDP Lite Protocol," Internet Draft draft-ietf-tsvwg-udp-lite-00.txt, Jan. 2002.
[16] A. Leon-Garcia and I. Widjaja, Communication Networks: Fundamental Concepts and Key Architectures, McGraw-Hill, 2000, ISBN 0070228396.
[17] Qian Zhang, Wenwu Zhu and Ya-Qin Zhang, A Cross-layer QoS-Supporting Framework for Multimedia Delivery over Wireless Internet, Proc. 12th Packet Video Workshop (PV2002), Pittsburgh, USA, 2002.
[18] Sanjay Shakkottai, Theodore S. Rappaport and Peter C. Karlsson, Cross-layer Design for Wireless Networks, Technical Report submitted for journal publication, 2003.
[19] S. Krishnamachari, M. van der Schaar, S. Choi and X. Xu, Video Streaming over Wireless LANs: A Cross-layer Approach, Proc. Packet Video, Nantes, France, Apr. 2003.
[20] Yo Huh, Ming Hu, Martin Reisslein and Junshan Zhang, MAI-JSQ: A Cross-Layer Design for Real-Time Video Streaming in Wireless Networks, Technical Report, Telecommunications Research Center, Dept. of Electrical Eng., Arizona State University, Aug. 2002.
[21] H. Sanneck, N. Tuong L. Le et al., Selective Packet Prioritization for Wireless Voice over IP, 4th Int. Symp. on Wireless Personal Multimedia Communications, Denmark, 2001.
[22] Z. Li, L. Sun, Z. Qiao and E. Ifeachor, Perceived Speech Quality Driven Retransmission Mechanism for Wireless VoIP, Proc. IEE 3G 2003, pp. 395-399, London, UK, Jun. 2003.
[23] ITU-T Recommendation P.862, Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs.
[24] ITU-T Recommendation G.107 (05/2000), The E-model, a computational model for use in transmission planning.
[25] ITU-T Recommendation P.830, Subjective Performance Assessment of Telephone-band and Wideband Digital Codecs.
[26] Athina P. Markopoulou, Assessing the Quality of Multimedia Communications over Internet Backbone Networks, PhD thesis, Department of Electrical Engineering, Stanford University, USA, Oct. 2002.
[27] ITU-T Recommendation G.108, Application of the E-model: a planning guide, Sep. 1998.
[28] ITU-T Recommendation G.113, Transmission impairments due to speech processing, Feb. 2001.
[29] Lingfen Sun and Emmanuel Ifeachor, "New Methods for Voice Quality Evaluation for IP Networks", Proc. 18th International Teletraffic Congress (ITC18), Berlin, Germany, Sep. 2003.
[30] R. Ramjee, J. Kurose, D. Towsley and H. Schulzrinne, Adaptive playout mechanisms for packetized audio applications in wide-area networks, Proc. IEEE Infocom, vol. 2, pp. 680-688, 1994.
[31] S. B. Moon, J. Kurose and D. Towsley, "Packet audio playout delay adjustment: performance bounds and algorithms," ACM/Springer Multimedia Systems, vol. 5, pp. 17-28, Jan. 1998.
[32] L. Sun and E. C. Ifeachor, Prediction of Perceived Conversational Speech Quality and Effects of Playout Buffer Algorithms, Proc. IEEE ICC 2003.
[33] Kouhei Fujimoto, Shingo Ata and Masayuki Murata, Playout control for streaming applications by statistical delay analysis, Proc. IEEE ICC, vol. 8, pp. 2337-2342, Jun. 2001.
[34] C. Hoene, I. Carreras and A. Wolisz, Voice over IP: Improving the Quality over Wireless LAN by Adopting a Booster Mechanism - An Experimental Approach, Proc. SPIE 2001 - Voice over IP (VoIP) Technology, pp. 157-, Denver, Colorado, USA, 2001.
[35] L. F. Sun, G. Wade, B. M. Lines and E. C. Ifeachor, Impact of Packet Loss Location on Perceived Speech Quality, Proc. 2nd IP-Telephony Workshop (IPTEL '01), Columbia University, New York, pp. 114-122, 2001.
[36] The Network Simulator - ns-2, http://www.isi.edu/nsnam/ns/
[37] ITU-T Recommendation G.114, One-Way Transmission Time, Feb. 1999.
[38] Florian Hammer, Peter Reichl, Tomas Nordstrom and Gernot Kubin, Corrupted Speech Data Considered Useful, Proc. First ISCA Tutorial and Research Workshop on Auditory Quality of Systems, Mont Cenis, Germany, Apr. 2003.
[39] E. Uhlemann, T. M. Aulin, L. K. Rasmussen and P.-A. Wiberg, "Concatenated hybrid ARQ - A flexible scheme for wireless real-time communication", IEEE Real-Time and Embedded Technology and Applications Symposium, Sep. 2002.
[40] Christos Papadopoulos and Gurudatta M. Parulkar, Retransmission-Based Error Control for Continuous Media Applications, Proc. NOSSDAV, 1996.
[41] Guijin Wang, Qian Zhang, Wenwu Zhu and Ya-Qin Zhang, Channel-Adaptive Error Control for Scalable Video over Wireless Channel, 7th International Workshop on Mobile Multimedia Communications (MoMuC), Japan, Oct. 2000.
[42] Qingwen Liu, Shengli Zhou and Georgios B. Giannakis, Cross-Layer Combining of Adaptive Modulation and Coding with Truncated ARQ over Wireless Links, IEEE Transactions on Wireless Communications, 2004 (to appear).
[43] Richard Han and David Messerschmitt, A Progressively Reliable Transport Protocol for Interactive Wireless Multimedia, ACM Multimedia Systems Journal, Mar. 1999.
[44] Stephen B. Wicker, Adaptive Rate Error Control Through the Use of Diversity Combining and Majority-Logic Decoding in a Hybrid-ARQ Protocol, IEEE Transactions on Communications, vol. 39, no. 3, Mar. 1991.
[45] E. Page, Queuing system in OR, the Butterworths Group, 1972, ISBN 0408702370.
[46] F. Cali, M. Conti and E. Gregori, "IEEE 802.11 wireless LAN: Capacity analysis and protocol enhancement", Proc. IEEE INFOCOM, 1998.
[47] J. Rosenberg, L. Qiu and H. Schulzrinne, "Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms on the Internet", Proc. IEEE Infocom 2000, vol. 3, pp. 1705-1714.
[48] P. Brady, "A Technique for Investigating On-Off Patterns of Speech", Bell System Technical Journal, 44(1): 1-22, Jan. 1965.
[49] ITU-T Recommendation P.59, Telephone transmission quality objective measuring apparatus: Artificial conversational speech.
[50] Shyam S. Chakraborty, Erkki Yli-Juuti and Markku Liinaharja, An ARQ Scheme with Packet Combining, IEEE Communications Letters, 1998.
[51] E. Uhlemann, T. M. Aulin, L. K. Rasmussen and P.-A. Wiberg, Packet Combining and Doping in Concatenated Hybrid ARQ Schemes Using Iterative Decoding, Proc. IEEE WCNC 2003.
[52] Wenyu Jiang and Henning Schulzrinne, Comparison and Optimization of Packet Loss Repair Methods on VoIP Perceived Quality under Bursty Loss, NOSSDAV 2002.
APPENDICES
[APPENDIX A] ns-2 Extensions for ARQ Retry Limit Control

```cpp
/* Modifications in mac-802_11.h */
class Mac802_11 : public Mac {
public:
    Mac802_11(PHY_MIB* p, MAC_MIB *m);
    static int retr;   // retry limit shared by all 802.11 MAC instances
    // …
};

// Tcl hooks for the simulator
static class Mac802_11Class : public TclClass {
public:
    Mac802_11Class() : TclClass("Mac/802_11") {}
    TclObject* create(int, const char*const*) {
        return (new Mac802_11(&PMIB, &MMIB));
    }
    virtual void bind();
    virtual int method(int argc, const char*const* argv);
} class_mac802_11;

/* Modifications in mac-802_11.cc */
// The static member must be defined exactly once (zero-initialized here);
// scripts set it with "Mac/802_11 retrNo N".
int Mac802_11::retr;

void Mac802_11Class::bind()
{
    // Call to base class bind() must precede add_method()
    TclClass::bind();
    add_method("retrNo");
}

int Mac802_11Class::method(int ac, const char*const* av)
{
    Tcl& tcl = Tcl::instance();
    int argc = ac - 2;
    const char*const* argv = av + 2;
    if (argc == 2) {
        // "Mac/802_11 retrNo": return the current retry limit
        if (strcmp(argv[1], "retrNo") == 0) {
            tcl.resultf("%d", Mac802_11::retr);
            return (TCL_OK);
        }
    } else if (argc == 3) {
        // "Mac/802_11 retrNo N": set the value of the static variable
        if (strcmp(argv[1], "retrNo") == 0) {
            Mac802_11::retr = atoi(argv[2]);
            return (TCL_OK);
        }
    }
    return TclClass::method(ac, av);
}
```
```cpp
// Retransmission routines
void Mac802_11::RetransmitDATA()
{
    struct hdr_cmn *ch;
    struct hdr_mac802_11 *mh;
    u_int32_t *rcount, *thresh;
    u_int32_t retry_limit;   // holds the Tcl-controlled limit for long frames

    assert(mhBackoff_.busy() == 0);
    assert(pktTx_);
    assert(pktRTS_ == 0);

    ch = HDR_CMN(pktTx_);
    mh = HDR_MAC802_11(pktTx_);

    /*
     * Broadcast packets don't get ACKed and therefore
     * are never retransmitted.
     */
    if ((u_int32_t)ETHER_ADDR(mh->dh_da) == MAC_BROADCAST) {
        // The original early exit is commented out so that the ARQ mechanism
        // can be used for any topology. pktTx_ must remain valid because it
        // is dereferenced again below.
        //Packet::free(pktTx_);
        //pktTx_ = 0;
        //rst_cw();
        //mhBackoff_.start(cw_, is_idle());
        //return;
    }

    macmib_->ACKFailureCount++;

    if ((u_int32_t) ch->size() <= macmib_->RTSThreshold) {
        rcount = &ssrc_;
        thresh = &macmib_->ShortRetryLimit;
    } else {
        rcount = &slrc_;
        // Use the retry limit set from Tcl ("Mac/802_11 retrNo N")
        // instead of macmib_->LongRetryLimit
        retry_limit = (u_int32_t) Mac802_11::retr;
        thresh = &retry_limit;
        printf("threshold=%u\n", *thresh);
    }

    (*rcount)++;

    if (*rcount > *thresh) {
        macmib_->FailedCount++;
        /* tell the callback the send operation failed
           before discarding the packet */
        hdr_cmn *ch = HDR_CMN(pktTx_);
        if (ch->xmit_failure_) {
            ch->size() -= ETHER_HDR_LEN11;
            ch->xmit_reason_ = XMIT_REASON_ACK;
            ch->xmit_failure_(pktTx_->copy(), ch->xmit_failure_data_);
        }
        discard(pktTx_, DROP_MAC_RETRY_COUNT_EXCEEDED);
        pktTx_ = 0;
        printf("(%d)DATA discarded: count exceeded\n", sta_seqno_);
        *rcount = 0;
        rst_cw();
    } else {
        struct hdr_mac802_11 *dh;
        dh = HDR_MAC802_11(pktTx_);
        dh->dh_fc.fc_retry = 1;
        sendRTS(ETHER_ADDR(mh->dh_da));
        //printf("(%d)retxing data:%x..sendRTS..\n", index_, pktTx_);
        inc_cw();
        mhBackoff_.start(cw_, is_idle());
    }
}
```
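The retry limit set through retrNo trades residual loss against delay. RetransmitDATA() discards a frame once the retry counter exceeds the limit R, that is, after R + 1 failed transmissions. Assume each transmission fails independently with probability p, as with the uniform per-packet error model of the Appendix B script, though not on a bursty real channel. The residual loss is then p^(R+1), and the expected number of transmissions is 1 + p + ... + p^R. A minimal C sketch follows; the function names are hypothetical:

```c
#include <assert.h>

/* Probability that a frame is still lost after ARQ with retry limit R:
   all R + 1 independent transmissions must fail (i.i.d. assumption). */
double arq_residual_loss(double p, unsigned retry_limit) {
    assert(p >= 0.0 && p <= 1.0);
    double loss = 1.0;
    for (unsigned k = 0; k <= retry_limit; k++)
        loss *= p;
    return loss;
}

/* Expected number of transmissions per frame: 1 + p + p^2 + ... + p^R
   (attempt k + 1 happens only if the first k attempts all failed). */
double arq_expected_transmissions(double p, unsigned retry_limit) {
    assert(p >= 0.0 && p <= 1.0);
    double sum = 0.0, term = 1.0;
    for (unsigned k = 0; k <= retry_limit; k++) {
        sum += term;
        term *= p;
    }
    return sum;
}
```

With p = 0.5 and R = 2, residual loss is 0.125 and a frame takes 1.75 transmissions on average. Raising R lowers loss geometrically, but each extra attempt adds MAC backoff and delay, which is the loss/delay trade-off discussed in this thesis.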
[APPENDIX B] ns-2 Simulation Script for Per Packet Control of ARQ

```tcl
# wireless2.tcl
# simulation of a wired-cum-wireless scenario consisting of 2 wired nodes
# connected to a wireless domain through a base-station node.
#==================================================================
# Define options
#==================================================================
set opt(chan)         Channel/WirelessChannel   ;# channel type
set opt(prop)         Propagation/TwoRayGround  ;# radio-propagation model
set opt(netif)        Phy/WirelessPhy           ;# network interface type
set opt(mac)          Mac/802_11                ;# MAC type
set opt(ifq)          Queue/DropTail/PriQueue   ;# interface queue type
set opt(ll)           LL                        ;# link layer type
set opt(ant)          Antenna/OmniAntenna       ;# antenna model
set opt(ifqlen)       25000                     ;# max packet in ifq
set opt(nn)           1                         ;# number of mobilenodes
set opt(adhocRouting) DSDV                      ;# routing protocol
set opt(x)            500                       ;# x coordinate of topology
set opt(y)            500                       ;# y coordinate of topology
set opt(seed)         [lindex $argv 0]          ;# seed for random number gen.
set opt(stop)         20000                     ;# time to stop simulation
set opt(utp1-start)   2.0
set num_wired_nodes   2
set num_bs_nodes      1
# ================================================================
# check for boundary parameters and random seed
if { $opt(x) == 0 || $opt(y) == 0 } {
    puts "No X-Y boundary values given for wireless topology\n"
}
if {$opt(seed) > 0} {
    puts "Seeding Random number generator with $opt(seed)\n"
    ns-random $opt(seed)
}

# create simulator instance
set ns_ [new Simulator]

# per-packet error rate of the uniform error model (second argument)
set erate [lindex $argv 1]
puts "erate $erate\n"
proc UniformErr {} {
    global erate
    set em [new ErrorModel]
    $em set rate_ $erate
    $em unit pkt
    $em ranvar [new RandomVariable/Uniform]
    return $em
}
$ns_ node-config -IncomingErrProc UniformErr -OutgoingErrProc UniformErr

# set up for hierarchical routing
$ns_ node-config -addressType hierarchical
AddrParams set domain_num_ 2            ;# number of domains
lappend cluster_num 2 1                 ;# number of clusters in each domain
AddrParams set cluster_num_ $cluster_num
lappend eilastlevel 1 1 2               ;# number of nodes in each cluster
AddrParams set nodes_num_ $eilastlevel  ;# of each domain

set tracefd [open wireless2.tr w]
#set namtrace [open wireless2.nam w]
$ns_ trace-all $tracefd
#$ns_ namtrace-all-wireless $namtrace $opt(x) $opt(y)

# Create topography object
set topo [new Topography]
#set mac80211 [new Mac/802_11]

# define topology
$topo load_flatgrid $opt(x) $opt(y)

# create God
create-god [expr $opt(nn) + $num_bs_nodes]

# create wired nodes
set temp {0.0.0 0.1.0}  ;# hierarchical addresses for wired domain
for {set i 0} {$i < $num_wired_nodes} {incr i} {
    set W($i) [$ns_ node [lindex $temp $i]]
}

# configure for base-station node
$ns_ node-config -adhocRouting $opt(adhocRouting) \
                 -llType $opt(ll) \
                 -macType $opt(mac) \
                 -ifqType $opt(ifq) \
                 -ifqLen $opt(ifqlen) \
                 -antType $opt(ant) \
                 -propType $opt(prop) \
                 -phyType $opt(netif) \
                 -channelType $opt(chan) \
                 -macTrace OFF \
                 -wiredRouting ON \
                 -agentTrace ON \
                 -routerTrace OFF \
                 -topoInstance $topo

# create base-station node
set temp {1.0.0 1.0.1 1.0.2 1.0.3}  ;# hier address to be used for wireless domain
set BS(0) [$ns_ node [lindex $temp 0]]
$BS(0) random-motion 0              ;# disable random motion

# provide some co-ord (fixed) to base station node
$BS(0) set X_ 1.0
$BS(0) set Y_ 2.0
$BS(0) set Z_ 0.0

# configure for mobilenodes
$ns_ node-config -wiredRouting OFF
for {set j 0} {$j < $opt(nn)} {incr j} {
    set node_($j) [ $ns_ node [lindex $temp [expr $j+1]] ]
    $node_($j) base-station [AddrParams addr2id [$BS(0) node-addr]]
}

# create links between wired and BS nodes
$ns_ duplex-link $W(0) $W(1) 5Mb 2ms DropTail
$ns_ duplex-link $W(1) $BS(0) 5Mb 2ms DropTail
$ns_ duplex-link-op $W(0) $W(1) orient down
$ns_ duplex-link-op $W(1) $BS(0) orient left-down

# set up the UDP/CBR voice flow from the mobile node to the base station
set udp1 [new Agent/UDP]
$udp1 set class_ 2
set null1 [new Agent/Null]
set cbr1 [new Application/Traffic/CBR]
$cbr1 set packetSize_ 32
$cbr1 set interval_ 0.020
$cbr1 attach-agent $udp1
# per packet control
$cbr1 set maxpkts_ 1
$ns_ attach-agent $node_(0) $udp1
$ns_ attach-agent $BS(0) $null1
$ns_ connect $udp1 $null1

# Define initial node position in nam
for {set i 0} {$i < $opt(nn)} {incr i} {
    # the last argument defines the node size in nam; adjust it according
    # to your scenario. Must be called after the mobility model is defined
    $ns_ initial_node_pos $node_($i) 5
}

# read in per-packet information, i.e. voiced or unvoiced
set pattern_file_name abmixed.vo
set pattern_fid [open $pattern_file_name r]
set cbrtime 0.0
set j -1
puts "Reading Speech Property Marking files.............."
while {[gets $pattern_fid current_line] >= 0} {
    incr j
    scan $current_line "%d" voice_flag
    set r($j) $voice_flag
}
close $pattern_fid

set i 0
while {$i <= $j} {
    $ns_ at [expr $i*0.027] "Mac/802_11 retrNo 9; $cbr1 start"
    incr i
}
set opt(stop) [expr $i*0.02+10000]
$ns_ at $opt(stop) "$cbr1 stop"

# Tell all nodes when the simulation ends
for {set i 0} {$i < $opt(nn)} {incr i} {
    $ns_ at [expr $opt(stop) + 0.00001] "$node_($i) reset"
}
$ns_ at [expr $opt(stop) + 0.00002] "$BS(0) reset"
$ns_ at [expr $opt(stop) + 0.0002] "puts \"NS EXITING...\" ; $ns_ halt"
$ns_ at [expr $opt(stop) + 0.01] "stop"

proc stop {} {
    global ns_ tracefd
    # $ns_ flush-trace
    close $tracefd
    #close $namtrace
    #exec nam wireless2-out.nam &
    exit 0
}

puts "Starting Simulation..."
$ns_ run
```
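The script above configures ErrorModel in packet units, so erate is a per-packet error probability. The cross-layer scheduler proposed for future work would instead collect the BER from the physical layer. Under two assumptions, independent bit errors (real fading channels are usually bursty) and no MAC/PHY overhead, a BER maps to a packet error rate for a frame of L bytes as PER = 1 - (1 - BER)^(8L). A C sketch follows; the function name is hypothetical:

```c
#include <assert.h>

/* Probability that a frame of frame_bytes bytes contains at least one bit
   error, assuming independent bit errors with probability ber:
   PER = 1 - (1 - ber)^(8 * frame_bytes). */
double ber_to_per(double ber, unsigned frame_bytes) {
    assert(ber >= 0.0 && ber <= 1.0);
    double all_bits_ok = 1.0;
    for (unsigned b = 0; b < 8u * frame_bytes; b++)
        all_bits_ok *= 1.0 - ber;
    return 1.0 - all_bits_ok;
}
```

For example, the 71-byte packet used in the published paper (Appendix E) at BER = 1e-4 gives a packet error rate of about 5.5%.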
[APPENDIX C] C Code for Majority-Logic Packet Combining
Packet combining techniques are used in the decoding process in packet-switched networks with ARQ protocols. The motivation for packet combining is that a received packet, even a corrupted one, almost always contains some useful information. This information can be used together with other received copies of the packet to estimate the transmitted data more reliably than any single copy allows.
There are two basic approaches to combining multiple received packets: code combining and diversity combining. In diversity combining, unlike code combining, multiple copies of a packet encoded at rate R are combined bit by bit to create a single codeword of the original rate-R code. Each bit in the resulting packet is made more reliable through the receipt of multiple copies of that bit. Although diversity combining is not as powerful as code combining, it is much simpler to implement. Majority-logic diversity combining uses the multiple copies of each transmitted bit in a voting scheme to obtain a single, more reliable version of that bit.
Majority-logic packet combining rule
The majority-logic packet combining rule is a simplified majority-logic decoding rule [44]. Let $J$ be the number of received copies of a packet, and let $B_{i,k}$, $1 \le k \le J$, be the bits at the same position $i$ in the $J$ copies. Let $\eta$ be the number of bits with value one among $B_{i,1}, \dots, B_{i,J}$. Bit $B_i$ of the final combined packet is decided as

$$B_i = \begin{cases} 1, & \eta \ge \lfloor J/2 \rfloor + 1, \\ 0, & \eta \le \lfloor (J-1)/2 \rfloor. \end{cases}$$

If $J$ is even, $\eta$ may equal $J/2$, so that $\lfloor (J-1)/2 \rfloor < \eta < \lfloor J/2 \rfloor + 1$ and neither case applies. In this case, we can make $J$ odd through a further retransmission. If there is no time for further retransmissions, we accept a 50 percent risk of choosing the wrong value.
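The rule above can be written as a small C helper. This is a sketch rather than the thesis code. Three details are assumptions made for illustration: the name majority_bit, the layout of one bit per byte, and the -1 return value for a tie.

```c
#include <assert.h>
#include <stddef.h>

/* Majority-logic decision for one bit position i.
   copies[k] is bit i of copy k (0 or 1), k = 0 .. j_copies-1.
   Returns 1 if eta >= floor(J/2) + 1, 0 if eta <= floor((J-1)/2),
   and -1 for a tie (only possible when J is even and eta == J/2). */
int majority_bit(const unsigned char *copies, size_t j_copies) {
    assert(j_copies > 0);
    size_t eta = 0;                   /* number of copies carrying a one */
    for (size_t k = 0; k < j_copies; k++)
        eta += copies[k] ? 1 : 0;
    if (eta >= j_copies / 2 + 1)
        return 1;
    if (eta <= (j_copies - 1) / 2)
        return 0;
    return -1;                        /* tie: request another copy or guess */
}
```

The listing that follows applies a vote of this kind inline, to every payload bit of an AMR frame.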
C code for majority-logic packet combining and for producing bit errors in the payload:

```c
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <stdio.h>
#include <stdlib.h>

enum RXFrameType { RX_SPEECH_GOOD = 0, RX_SPEECH_PROBABLY_DEGRADED, RX_SPARE,
                   RX_SPEECH_BAD, RX_SID_FIRST, RX_SID_UPDATE, RX_SID_BAD,
                   RX_NO_DATA, RX_N_FRAMETYPES /* number of frame types */ };
enum TXFrameType { TX_SPEECH = 0, TX_SID_FIRST, TX_SID_UPDATE, TX_NO_DATA,
                   TX_N_FRAMETYPES /* number of frame types */ };
typedef short Word16;

#define SERIAL_SIZE (1+244+4+1)   /* frame type word + 244 speech bits + 4 + 1 */
#define MAX_COPIES 6              /* maximum number of retransmitted copies */

int main(int argc, char *argv[])
{
    FILE *file_serial, *lossfile, *losspattern;
    Word16 serial[SERIAL_SIZE], serial_noisy[MAX_COPIES][SERIAL_SIZE];
    int frame, iCombine, i, j, iseed, iMajority, erase_flag;
    float rm, errate;
    char buf[50];

    if (argc < 6) {
        printf("Usage: crpacket amr_encodedfile loss_pattern_file output_lossfile Error_rate randomseed\n");
        exit(EXIT_FAILURE);
    }
    if ((file_serial = fopen(argv[1], "rb")) == NULL) {
        printf("%s cannot be opened for read\n", argv[1]);
        exit(EXIT_FAILURE);
    }
    if ((lossfile = fopen(argv[3], "wb")) == NULL) {
        printf("%s cannot be opened for write\n", argv[3]);
        exit(EXIT_FAILURE);
    }
    if ((losspattern = fopen(argv[2], "rb")) == NULL) {
        printf("%s cannot be opened for read\n", argv[2]);
        exit(EXIT_FAILURE);
    }
    errate = atof(argv[4]);
    iseed = atoi(argv[5]);
    frame = 0;
    srand48(iseed);

    while (fread(serial, sizeof(Word16), SERIAL_SIZE, file_serial) == SERIAL_SIZE) {
        printf("\nframe=%d ", ++frame);
        /* each loss pattern line: <erase_flag> <number of retransmitted copies> */
        if (fgets(buf, sizeof buf, losspattern) == NULL ||
            sscanf(buf, "%d %d", &erase_flag, &iCombine) != 2) {
            fprintf(stderr, "\nloss pattern file too short or malformed at frame %d\n", frame);
            break;
        }
        if (iCombine < 0 || iCombine > MAX_COPIES) iCombine = 0;

        if (erase_flag == 0)
            serial[0] = TX_NO_DATA;             /* packet lost */
        else if (iCombine != 0) {
            /* make iCombine independently corrupted retransmitted copies */
            for (j = 0; j < iCombine; j++) {
                for (i = 0; i < SERIAL_SIZE; i++)
                    serial_noisy[j][i] = serial[i];
                for (i = 1; i < SERIAL_SIZE - 5; i++) {
                    rm = drand48();
                    if (rm <= errate) serial_noisy[j][i] = !serial_noisy[j][i];
                }
            }
            /* corrupt the original packet (Bernoulli random bit errors) */
            for (i = 1; i < SERIAL_SIZE - 5; i++) {
                rm = drand48();
                if (rm <= errate) serial[i] = !serial[i];
            }
            /* majority-logic combining over J = iCombine + 1 copies: flip the
               bit when fewer than half of the copies agree with it; a tie
               (J even) keeps the original bit */
            for (i = 1; i < SERIAL_SIZE - 5; i++) {
                iMajority = 1;                  /* the original copy itself */
                for (j = 0; j < iCombine; j++)
                    if (serial[i] == serial_noisy[j][i]) iMajority++;
                if (2 * iMajority < iCombine + 1) {
                    serial[i] = !serial[i];
                    printf("combined ");
                }
            }
        }
        if (fwrite(serial, sizeof(Word16), SERIAL_SIZE, lossfile) != SERIAL_SIZE) {
            fprintf(stderr, "\nerror writing output file: %s\n", argv[3]);
        }
    }
    fflush(lossfile);
    fclose(file_serial);
    fclose(lossfile);
    fclose(losspattern);
    return EXIT_SUCCESS;
}
```
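The following sketch computes the probability that majority-logic combining outputs a wrong bit, which shows what the combining loop above can achieve. Like the Bernoulli error generator in the listing, it assumes that each copy's bit is flipped independently with probability p (errate). A tie is counted as wrong half of the time, which corresponds to the 50 percent risk in the rule above. The function names are hypothetical.

```c
#include <assert.h>

/* Binomial coefficient C(n, k) as a double (fine for small n). */
static double binom(unsigned n, unsigned k) {
    double c = 1.0;
    for (unsigned t = 1; t <= k; t++)
        c = c * (n - k + t) / t;
    return c;
}

/* Probability that majority-logic combining of J copies yields a wrong bit
   when each copy is wrong independently with probability p. */
double majority_error_prob(unsigned J, double p) {
    assert(J > 0 && p >= 0.0 && p <= 1.0);
    double total = 0.0;
    for (unsigned wrong = 0; wrong <= J; wrong++) {
        double pk = binom(J, wrong);                /* ways to pick the wrong copies */
        for (unsigned t = 0; t < wrong; t++) pk *= p;
        for (unsigned t = 0; t < J - wrong; t++) pk *= 1.0 - p;
        if (2 * wrong > J)
            total += pk;          /* wrong copies outvote correct ones */
        else if (2 * wrong == J)
            total += 0.5 * pk;    /* tie: a fair guess */
    }
    return total;
}
```

For p = 0.1, three copies reduce the bit error probability from 0.1 to 0.028, while two copies leave it at 0.1. A two-copy tie carries no information, which is why an odd J is preferred.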
[APPENDIX D] List of Items Included in the Appended CD
The following items are included in the appended CD:
Thesis: the electronic copy of the thesis (Word/PDF)
Papers: papers published or to be published (Word/PDF)
References: papers/documents referenced in the thesis
Presentation: slides presented in the MRes viva
Software: programs developed for the project, including MATLAB/C++ source code and scripts, related software tools (e.g. the AMR codec and PESQ) and data (e.g. ITU-T speech files)
[APPENDIX E] Published Papers
[1] Z. Li, L. Sun, Z. Qiao and E. Ifeachor, Perceived Speech Quality Driven Retransmission Mechanism for Wireless VoIP, Proc. IEE 3G 2003, pp. 395-399, London, UK, Jun. 2003.
PERCEIVED SPEECH QUALITY DRIVEN RETRANSMISSION MECHANISM FOR WIRELESS VoIP
Z Li, L Sun, Z Qiao and E Ifeachor
Department of Communication and Electronic Engineering, University of Plymouth, Plymouth, U.K.

Abstract: Effective link layer retransmission mechanisms in wireless networks are important as they can reduce packet loss due to bit errors. For wireless voice over IP (VoIP), providing the best possible perceived speech quality raises a key question: how can retransmission schemes recover corrupted packets while avoiding excessive retransmission delays? The contributions of this paper are twofold. First, we use an objective measure of perceived conversational speech quality (MOSc) as a metric to evaluate the performance of three current retransmission schemes (i.e. No Retransmission, Speech Property-Based Retransmission and Full Retransmission), while considering the impact of retransmission jitter. Our findings indicate that the performance of the retransmission mechanisms is a function of both wireless link quality and the delay introduced in the wireline network. Second, we propose a new perceived speech quality driven retransmission mechanism. It may be used to achieve optimum perceived speech quality for wireless VoIP (in terms of the objective mean opinion score) by switching to the most suitable retransmission scheme under different communication conditions.

I. INTRODUCTION
Quality of Service (QoS) support for voice over IP (VoIP) in wireless/mobile networks is an important issue for technical and commercial reasons. However, speech quality for VoIP suffers from high packet loss rates and other impairments in the wireless link. Retransmission mechanisms, such as automatic repeat request (ARQ), have been incorporated in wireless and cellular networks to retransmit lost packets and improve data transmission over wireless links. In wireless networks such as 802.11b [1], the retransmission mechanism is a simple Stop & Wait algorithm implemented at the Medium Access Control (MAC) layer, in which each transmitted packet must be acknowledged before the next packet can be sent. If the sender of a frame does not receive an acknowledgement within a certain timeout period, it retransmits the frame until a maximum retransmission limit is reached. When the wireless link quality is poor, retransmission of MAC frames can effectively recover corrupted packets that contain bit errors.
However, retransmission schemes may introduce excessive delays, which have significant adverse effects on delay-sensitive real-time applications such as VoIP. A simple retransmission scheme can therefore negatively affect perceived speech quality in VoIP. Across retransmission schemes, there is a tradeoff between packet loss and delay. Improved retransmission mechanisms such as the Hybrid loss recovery scheme [2] and Speech Property-Based ARQ (SPB-ARQ) [3] have been proposed to reduce speech distortions by protecting packets that are perceptually more relevant. However, these schemes are limited to listening-only assessment of the effect of retransmission on speech quality. They do not consider the impact of delay, which is important for conversation and interactivity. Nor do they consider the impact of retransmission jitter. Since adaptive jitter buffers discard retransmitted packets that arrive too late for playout, the retransmission jitter introduced by different retransmission schemes should be considered.
The primary aim of the study reported in this paper is to investigate new retransmission mechanisms to improve speech quality for wireless VoIP. The contributions of the paper are twofold. First, we evaluate the performance of current retransmission mechanisms (No retransmission, Full retransmission, SPB retransmission) with a perceived conversational speech quality assessment method [4], instead of listening-only methods or individual network parameters (e.g. packet loss and delay). Second, we present a new retransmission policy which can adapt to the most suitable retransmission mechanism, depending on the wireless link quality and network delay conditions. The ultimate aim of this perceived speech quality driven policy is to achieve optimum speech quality (in terms of the conversational Mean Opinion Score, MOSc) under network impairments and varying wireless channel conditions. The policy also takes into account the coupling effect of retransmission jitter and adaptive jitter buffers.
The paper is organized as follows. In Section II we describe the basic issues and methodology, including retransmission mechanisms, conversational speech quality evaluation and adaptive jitter buffers. Section III describes our simulation system. Simulation results and the proposed perceived speech quality driven retransmission scheme are presented in Section IV. Section V concludes the paper.

II. BASIC ISSUES AND METHODOLOGY
A. Speech Property-Based Retransmission Mechanisms
Speech Property-Based QoS control schemes are based on the fact that some voice frames are perceptually more important than others when encoded speech is transferred through packet networks. Recent experimental results [5] show that in some popular codecs used in wireless applications (e.g. AMR) the position of a frame loss has a significant influence on the perceived speech
  • 72. quality. In such codecs, frame loss concealment techniques interpolate the parameters of a lost frame from the parameters of the previous frames. Lost voice frames at the beginning of a talkspurt are therefore concealed using the decoding information of the preceding unvoiced frames. However, because voiced sounds have a higher energy than unvoiced sounds, concealing voiced frames with lower-energy unvoiced frames causes a serious degradation in speech quality. Moreover, at the unvoiced/voiced transition, it is difficult for the decoder to correctly conceal the loss of voiced frames using the filter coefficients and the excitation of an unvoiced sound, especially when burst losses occur or the frame size grows.
To maximise the perceptual quality at the receiving end, perceptually important voice packets may be protected by giving them a high priority, while unimportant packets are handled as 'best-effort'. SPB retransmission, a retransmission scheme that protects only the perceptually important speech frames, is presented in [2][3]. Experimental results reported in [2] show that SPB retransmission can provide better speech quality (assessed by EMBSD) than the No Retransmission scheme, which does not retransmit any packet. In [3], SPB retransmission was shown to be more efficient in reducing retransmission delays than Full Retransmission, which retransmits every unacknowledged (unACKed) packet.

B. Measuring Conversational Speech Quality
In previous studies [2][3], retransmission schemes were assessed using the EMBSD algorithm, which only considers the distortion caused by packet loss. In practice, however, both packet loss and delay are crucial in voice conversation, and long retransmission delays (e.g. on top of a long network delay) can seriously impair speech quality. The E-model [6] was introduced by the ITU as a non-intrusive quality assessment method to obtain a measure of voice quality. Unfortunately, the E-model is only applicable to a limited number of codecs, which at present does not include the AMR codec. In our simulation, we therefore employed a technique that combines PESQ and the E-model to evaluate the performance of different retransmission schemes. In the combined approach, ITU PESQ is first used to quantify the impact of packet loss on speech quality, and the result is converted to the equipment impairment factor Ie. The delay impairment Id is then calculated from the average end-to-end delay. Finally, the E-model is used to obtain a measure of the conversational speech quality, MOSc, based on Ie and Id (see Figure 1). Details of the implementation of the combined method are given in [4].

C. Adaptive Jitter Buffer and Retransmission Jitters
In VoIP applications, jitter is compensated for at the receiver by a jitter buffer, whose size can be fixed or adjustable. Fixed jitter buffers cannot adapt readily to changes in network delay and as a result are not practical in real VoIP applications. In our study, we investigated fast-exp, one of the classical adaptive jitter buffer algorithms proposed in [7]. By using a smaller weighting factor when delays increase, the fast-exp algorithm can quickly adapt to delay increases while avoiding discarding too many packets. When a packet arrives, it updates the estimated mean network delay (denoted $\hat{d}_i$) and the estimated variation of the network delay (denoted $\hat{v}_i$). The mean delay estimate is given by:

$$\hat{d}_i = \begin{cases} \beta\,\hat{d}_{i-1} + (1-\beta)\,n_i, & n_i > \hat{d}_{i-1} \\ \alpha\,\hat{d}_{i-1} + (1-\alpha)\,n_i, & n_i \le \hat{d}_{i-1} \end{cases}$$

where $n_i$ is the network delay of the $i$th packet, $\beta = 0.75$ and $\alpha = 0.99802$. The delay variation is estimated as:

$$\hat{v}_i = \alpha\,\hat{v}_{i-1} + (1-\alpha)\,\lvert \hat{d}_i - n_i \rvert$$

At the beginning of a talkspurt, the adaptive jitter buffer sets the playout delay to:

$$D = \hat{d}_i + \mu\,\hat{v}_i$$

where $D$ is the playout delay and $\mu$ is a constant that can be selected from 1 to 20; we set $\mu = 4$ in our simulation. It should be noted that for VoIP over wireless, the network delay $n_i$ consists of the delays introduced by both the wireline network and the wireless link. Jitter can be introduced by congestion in the wireline network or by retransmissions and propagation in the wireless link. Since most jitter buffer algorithms were proposed to compensate for congestion-induced jitter, it is worth investigating the impact of retransmission jitter on VoIP over wireless.

III. SIMULATION SYSTEM DESCRIPTION
Our study is based on the network simulator ns-2 [8], in which we simulated a last-hop wireless scenario. Both the IEEE 802.11 and the Ethernet protocol stacks are implemented in the simulator. A two-way Bernoulli error model was inserted to simulate wireless link transmission errors. In 802.11, if the packet size exceeds the Maximum Transmission Unit (e.g. 1500 bytes for WaveLAN), the packet is fragmented. Since we set the packet size to 71 bytes (one 12.2 kbit/s AMR speech frame per RTP packet), the impact of fragmentation is avoided.
The simulation system is shown in Figure 1. The original speech file is first encoded by the AMR codec and then analyzed to extract the speech marking information (voiced/unvoiced) for each packet. The speech marking information is used together with the network delay and the wireless link quality to control the retransmission policy. The error model determines whether a packet is corrupted according to
  • 73. [Figure 1: Simulation Environment. Components: fixed host (original speech, AMR encoder, speech marking, RTP/UDP/IP/Ethernet); network delay; base station (BS); mobile host (PHY/MAC with PER and Retx. limit control, IP/UDP/RTP, adaptive jitter buffer, AMR decoder, degraded speech); speech quality evaluation (PESQ giving MOS/Ie, end-to-end delay Id, E-model giving MOSc).]

packet error probability (PER). The base station (BS) neither sends an ACK to the sender for a corrupted packet nor passes it to the higher layer. If the MAC layer of the sender has not received an acknowledgement for a packet, it retransmits the packet until the packet is ACKed or the retransmission limit is reached (we denote retransmission as Retx in the rest of this paper). In our simulation, we set the Retx limit to 6 for both SPB Retx and Full Retx. At the receiver, the received speech packets are fed to an adaptive jitter buffer and subsequently decoded to recover the degraded speech file, which is used to obtain a measure of speech quality. We used the combined PESQ and E-model method described in Section II-B to evaluate conversational speech quality. The performance index was obtained by averaging the results computed with this method for each 20-second segment of the speech file.

IV. RESULT ANALYSIS AND THE PROPOSED RETRANSMISSION SCHEME
The following simulation results were obtained by averaging 50 simulation runs with different random seeds, to average out the influence of particular packet loss locations. The three simulated retransmission schemes are SPB Retx, Full Retx and No Retx.
TABLE.1 gives the average number of voiced packets lost when transmitting 73000 speech packets over our simulated wireless network with each scheme. For simplicity, we only simulated the wireless link for the purpose of this study, so packet losses are caused only by the wireless link (Retx limit exceeded) and by the adaptive jitter buffer. In TABLE.1, most of the voiced packet losses in Full Retx and SPB Retx are caused by the jitter buffer. Because we deployed a Bernoulli error model in our simulation, most retransmitted packets are eventually received successfully. If bursty packet errors were considered, there would be more voiced packet losses in the Full Retx and SPB Retx schemes.

TABLE.1 - Average Voiced Packet Losses with fast-exp Jitter Buffer
PER       No Retx   SPB Retx   Full Retx
0.0001        15        53         29
0.0005        36        54         27
0.0008        61        51         26
0.001         69        47         22
0.003        144        28         17
0.005        241        22         13
0.01         474        13          9
0.05        2344        42         16
0.10        4678       931        159

It might seem straightforward that SPB Retx should be better than No Retx and at least as good as Full Retx at protecting voiced frames. However, TABLE.1 shows that Full Retx always loses fewer voiced packets than SPB Retx, while No Retx loses the fewest voiced packets when link quality is good (packet error probability below 0.0005). This is because, in the fast-exp algorithm, the estimated playout delay increases with the number of retransmission jitters. When link quality is good, the estimated playout delay stays low, so the occasional retransmitted packets, and packets adjacent to them, are discarded by the jitter buffer because of the jitter they introduce. In the No Retx scheme, by contrast, a corrupted packet does not affect the packets that follow it, which is why No Retx has the fewest packet losses when link quality is very good. In SPB Retx, on the other hand, unvoiced
  • 74. [Figure 2: Overall packet loss rate comparison. Loss rate (%) vs. packet error probability, for No Retx, SPB Retx and Full Retx.]
[Figure 3: Buffered Retx delay comparison. Buffered Retx delay (ms) vs. packet error probability, for No Retx, SPB Retx and Full Retx.]
[Figure 4: MOSc comparison with 175 ms network delay. MOSc vs. packet error probability, for Perceived Quality Driven, No Retx, SPB Retx and Full Retx.]
[Figure 5: MOSc comparison with packet error probability 0.001. MOSc vs. network delay, for Perceived Quality Driven, No Retx, SPB Retx and Full Retx.]

packets are not retransmitted, so the estimated playout delay cannot reflect the current wireless link situation when link quality becomes worse. In Full Retx, every unACKed packet is retransmitted, which helps the adaptive jitter buffer estimate the playout delay for the next talkspurt. That is why the adaptive jitter buffer discards more packets in SPB Retx than in Full Retx.
Figure 2 and Figure 3 compare the overall packet loss rates and buffered retransmission delays. Figure 2 shows that Full Retx keeps the packet loss rate low at the expense of higher delay (Figure 3), because every unACKed packet is retransmitted. Interestingly, when link quality is not too bad (packet error probability up to 0.01), the packet loss rate of Full Retx decreases as link quality becomes worse. As mentioned above, at worse link quality, more retransmissions help the jitter buffer estimate the playout delay more accurately. However, when link quality is very good (packet error probability up to 0.0005), No Retx obtains the lowest packet loss rate because it introduces no retransmission jitter and few packets are corrupted by bit errors. As a compromise, SPB Retx has a packet loss rate and Retx delay between those of No Retx and Full Retx.
Using the evaluation method described in Section II-B, Figure 4 and Figure 5 give a more direct performance comparison of these schemes with MOSc as the metric. To focus on the performance of the Retx schemes, our evaluation does not consider packet losses introduced in the wireline network; it does, however, consider network delay. In natural conversation, delays below 100 ms are practically imperceptible, but delays above 150 ms noticeably affect conversational interactivity [9]. Since Retx delays rarely exceed 100 ms, we assume that a 175 ms delay has been introduced in the wireline network and add it to the end-to-end delay in the MOSc evaluation, so that the impact of Retx delay is clearly visible. In Figure 4, the MOSc of Full Retx is lower than that of No Retx and SPB Retx when the packet error probability is below 0.003, because Full Retx always introduces more Retx delay, and perceived speech quality is sensitive to this additional delay when link quality is good. When the packet error probability exceeds 0.003, Full Retx becomes the best scheme, as it greatly reduces the number of corrupted packets. Fig. 5 illustrates the
  • 75. performance comparison for different network delays when the packet error probability is 0.001. Fig. 5 shows that when the delay is below 150 ms, Full Retx achieves the best MOSc, whereas when the delay exceeds 150 ms, No Retx becomes the best. This confirms that 150 ms is the threshold above which delay begins to have a severe impact on speech quality. As in Fig. 4, the performance of SPB Retx lies between No Retx and Full Retx, but it is not the best on either side of the delay threshold.
Since either No Retx or Full Retx achieves the best MOSc depending on link quality and network delay, we propose a new perceived speech quality driven retransmission scheme that switches between these two schemes as link quality and network delay change. The pseudo code of the new scheme is shown in Figure 6. Low_Error_Threshold is set to 0.0005 and High_Error_Threshold to 0.003 because, according to the simulation results, when the packet error probability is below 0.0005, No Retx achieves the best MOSc even when delay is not considered, whereas Full Retx becomes the best when the packet error probability exceeds 0.003, even when the network delay is very high. When the packet error probability is between 0.0005 and 0.003, the decision is made according to the network delay. In the proposed scheme, Delay_Threshold is set to 150 ms, as this is the threshold at which delay begins to noticeably affect speech quality. In real applications, the Bit Error Rate (BER) can be converted to PER, and the BER can be obtained from the bit errors in known bit pattern series sent from the BS. Network delay can be estimated by subtracting the average mobile host (MH) to BS handoff delay from the average end-to-end delay, which can be retrieved from the RTP packet header.
The performance of the new perceived speech quality driven scheme is also shown in Figure 4 and Figure 5 for different network delays and packet error probabilities. The curve of the perceived quality driven scheme overlaps with those of No Retx and Full Retx wherever they achieve the best MOSc, as it switches to the more suitable of the two schemes when communication conditions change. Since this method only uses Full Retx when necessary, it can also achieve retransmission efficiency similar to SPB Retx while avoiding the implementation complexity of obtaining the speech property information that SPB Retx requires.

if (PER < Low_Error_Threshold)
    No_Retx();
else if (PER > High_Error_Threshold)
    Full_Retx();
else {
    if (Network_Delay < Delay_Threshold)
        Full_Retx();
    else
        No_Retx();
}
Figure 6 Perceived speech quality driven Retx scheme pseudo code

V. CONCLUSION
A suitable retransmission scheme is crucial for obtaining the best possible perceived speech quality in wireless VoIP applications. In this paper, we investigated the performance of three retransmission schemes (No Retx, SPB Retx, Full Retx) with regard to perceived conversational speech quality. The impact of retransmission jitter on an adaptive jitter buffer was also considered. The simulation results show that the performance of these schemes depends on the network delay and the wireless link quality. Since the wireless environment is variable, we have proposed a perceived speech quality driven retransmission scheme that adapts to the wireless link quality and network delay conditions. Because SPB Retx is not involved in the new method, the implementation complexity of retrieving speech property information is avoided. Our results show that the proposed method achieves the best MOSc compared with No Retx, Full Retx and SPB Retx, since it deploys the most suitable scheme as communication conditions change. In this study, a simplified last-hop wireless network was implemented to demonstrate a wireless VoIP scenario. Further improvements may be achieved by making the simulation closer to real networks, e.g. by incorporating a multi-state error model for the wireless link.

References
[1] IEEE Standards Department, 1999, IEEE 802.11 Standard for Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications.
[2] C. Hoene, I. Carreras, A. Wolisz, 2001, Voice over IP: Improving the Quality over Wireless LAN by Adopting a Booster Mechanism - An Experimental Approach, Proc. SPIE 2001 - Voice over IP (VoIP) Technology, pp. 157-, Denver, Colorado, USA.
[3] H. Sanneck, N. T. L. Le et al., 2001, Selective Packet Prioritization for Wireless Voice over IP, 4th Int. Symp. on Wireless Personal Multimedia Communications, Denmark.
[4] L. Sun, E. C. Ifeachor, 2003, Prediction of Perceived Conversational Speech Quality and Effects of Playout Buffer Algorithms, to appear in Proc. IEEE ICC 2003.
[5] L. F. Sun, G. Wade, B. M. Lines and E. C. Ifeachor, 2001, Impact of Packet Loss Location on Perceived Speech Quality, Proc. 2nd IP-Telephony Workshop (IPTEL '01), Columbia University, New York, pp. 114-122.
[6] ITU-T G.107, The E-model, a computational model for use in transmission planning, May 2000.
[7] R. Ramjee, J. Kurose, D. Towsley and H. Schulzrinne, 1994, Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks, Proc. IEEE INFOCOM, vol. 2, pp. 680-688.
[8] The Network Simulator - ns-2, available online at http://www.isi.edu/nsnam/ns/
[9] ITU-T G.114, One-way transmission time, Feb 1999.
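The last step of the combined PESQ/E-model evaluation in Section II-B maps the impairments Ie and Id to a conversational MOS. The sketch below illustrates only that step and assumes Ie has already been derived from PESQ as described in [4]. It uses the ITU-T G.107 rating R = 93.2 obtained when all other E-model parameters are at their defaults (advantage factor A = 0), together with the standard G.107 R-to-MOS mapping. For Id it substitutes the widely cited Cole and Rosenbluth simplification of G.107, Id ≈ 0.024d + 0.11(d - 177.3)H(d - 177.3). This is an approximation chosen for illustration and not necessarily the Id computation used in [4]. The function names are hypothetical.

```python
R_DEFAULT = 93.2  # G.107 rating with all other parameters at default values (A = 0)


def delay_impairment(one_way_delay_ms: float) -> float:
    """Approximate delay impairment Id.

    Assumption: Cole-Rosenbluth simplification of G.107, not the method of [4].
    The second term only applies above 177.3 ms (Heaviside step).
    """
    d = one_way_delay_ms
    return 0.024 * d + 0.11 * max(0.0, d - 177.3)


def mos_from_r(r: float) -> float:
    """ITU-T G.107 conversion from the rating factor R to an estimated MOS."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def conversational_mos(ie: float, one_way_delay_ms: float) -> float:
    """MOSc from an equipment impairment Ie (e.g. derived from PESQ) and mouth-to-ear delay."""
    r = R_DEFAULT - delay_impairment(one_way_delay_ms) - ie
    return mos_from_r(r)


if __name__ == "__main__":
    # Hypothetical inputs: Ie = 15, plus 175 ms wireline delay and 20 ms Retx delay.
    print(round(conversational_mos(ie=15.0, one_way_delay_ms=195.0), 2))
```

Because Id grows much faster once the delay passes about 177 ms, the same Ie yields a noticeably lower MOSc at long delays, which is the effect the 175 ms wireline delay assumption is meant to expose.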

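The fast-exp playout estimator of Section II-C can be sketched as a small stateful class that applies the three update equations to each arriving packet. This is a minimal illustration, not the ns-2 implementation used in the study. Initialising the mean estimate with the first observed delay (and the variation with zero) is an assumption, and the class and method names are hypothetical. The parameters β, α and µ take the values given in Section II-C.

```python
from typing import Optional


class FastExpPlayout:
    """Sketch of the fast-exp adaptive playout delay estimator (Section II-C)."""

    def __init__(self, beta: float = 0.75, alpha: float = 0.99802, mu: float = 4.0):
        self.beta = beta    # small weight: follow delay increases quickly
        self.alpha = alpha  # large weight: decay slowly when delay falls
        self.mu = mu        # safety factor applied to the delay variation
        self.d_hat: Optional[float] = None  # estimated mean network delay (ms)
        self.v_hat = 0.0                    # estimated delay variation (ms)

    def on_packet(self, n_i: float) -> None:
        """Update the estimates with the network delay n_i (ms) of a new packet."""
        if self.d_hat is None:
            # Assumption: start from the first observed delay, zero variation.
            self.d_hat = n_i
            return
        w = self.beta if n_i > self.d_hat else self.alpha
        self.d_hat = w * self.d_hat + (1.0 - w) * n_i
        # Variation is updated with the new mean estimate, as in Section II-C.
        self.v_hat = self.alpha * self.v_hat + (1.0 - self.alpha) * abs(self.d_hat - n_i)

    def playout_delay(self) -> float:
        """Playout delay D = d_hat + mu * v_hat, applied at the start of a talkspurt."""
        if self.d_hat is None:
            raise ValueError("no packets observed yet")
        return self.d_hat + self.mu * self.v_hat


if __name__ == "__main__":
    est = FastExpPlayout()
    # Hypothetical delays (ms); the 95 ms sample mimics a retransmitted packet.
    for delay in [40.0, 42.0, 41.0, 95.0, 43.0]:
        est.on_packet(delay)
    print(round(est.playout_delay(), 1))
```

The asymmetric weights are what let the mean estimate jump after a retransmission spike and then relax only slowly, which is why occasional retransmissions on a good link can cause neighbouring packets to be discarded.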
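The decision logic of Figure 6, together with the BER-to-PER conversion and delay estimate mentioned in Section IV, can be written as a runnable sketch. The thresholds are the values given in the text. The conversion PER = 1 - (1 - BER)^L assumes independent bit errors over a frame of L bits, which is consistent with a Bernoulli model but not with bursty errors. Ignoring MAC/PHY header bits in the example frame size is a further simplification. The delay estimate simply subtracts the MH-to-BS component from the end-to-end delay as described. All function names are hypothetical.

```python
from enum import Enum

LOW_ERROR_THRESHOLD = 0.0005  # below this PER, No Retx gave the best MOSc
HIGH_ERROR_THRESHOLD = 0.003  # above this PER, Full Retx gave the best MOSc
DELAY_THRESHOLD_MS = 150.0    # delay beyond which interactivity degrades


class RetxScheme(Enum):
    NO_RETX = "No Retx"
    FULL_RETX = "Full Retx"


def per_from_ber(ber: float, frame_bits: int) -> float:
    """PER of a frame of frame_bits bits, assuming independent bit errors."""
    return 1.0 - (1.0 - ber) ** frame_bits


def estimate_network_delay(avg_end_to_end_ms: float, avg_mh_bs_delay_ms: float) -> float:
    """Network delay estimate: average end-to-end delay minus the MH-to-BS component."""
    return avg_end_to_end_ms - avg_mh_bs_delay_ms


def select_retx_scheme(per: float, network_delay_ms: float) -> RetxScheme:
    """Perceived speech quality driven choice between No Retx and Full Retx (Figure 6)."""
    if per < LOW_ERROR_THRESHOLD:
        return RetxScheme.NO_RETX
    if per > HIGH_ERROR_THRESHOLD:
        return RetxScheme.FULL_RETX
    # Intermediate link quality: let the network delay decide.
    if network_delay_ms < DELAY_THRESHOLD_MS:
        return RetxScheme.FULL_RETX
    return RetxScheme.NO_RETX


if __name__ == "__main__":
    per = per_from_ber(2e-6, 71 * 8)  # 71-byte packet, header bits ignored (assumption)
    delay = estimate_network_delay(190.0, 15.0)  # hypothetical averages in ms
    print(per, delay, select_retx_scheme(per, delay).value)
```

At exactly a threshold value the selection falls through to the delay-based branch, mirroring the strict comparisons in Figure 6.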