Lappeenranta University of Technology
Faculty of Technology Management
Department of Information Technology

ADAPTIVE OPTIMIZATION SCHEMES FOR MOBILE VOIP APPLICATIONS
BATTERY LIFE AND CALL QUALITY ASPECTS

Examiners: Professor Jari Porras and D.Sc. Kari Heikkinen
Instructor: M.Sc. Markus Kaikkonen

Tuomas Piirainen
Tervakukkatie 46 C 18
90580 Oulu
Tel. +358 40 715 8418
ABSTRACT

Lappeenranta University of Technology
Faculty of Technology Management
Department of Information Technology
Tuomas Piirainen

Adaptive optimization schemes for mobile VoIP applications
Battery life and call quality aspects

Thesis for the Degree of Master of Science in Technology
2008
61 pages, 12 figures, 3 tables and 1 appendix
Examiners: Professor Jari Porras and D.Sc. Kari Heikkinen
Keywords: VoIP, QoS, 802.11, power management, battery life, adaptation, mobility

In this thesis, programmatic, application-layer means for improving energy efficiency in the VoIP application domain are studied. The work concentrates on optimizations suitable for VoIP implementations utilizing SIP and IEEE 802.11 technologies. Energy-saving optimizations can have an impact on perceived call quality, and thus energy-saving means are studied together with the factors affecting perceived call quality. The thesis gives a general view of the topic. Based on theory, adaptive optimization schemes for dynamically controlling the application's operation are proposed. A runtime quality model, capable of being integrated into the optimization schemes, is developed for VoIP call quality estimation. Based on the proposed optimization schemes, power consumption measurements are performed to determine the achievable advantages. The measurement results show that a reduction in power consumption can be achieved with the help of adaptive optimization schemes.
TIIVISTELMÄ (ABSTRACT IN FINNISH)

Lappeenrannan teknillinen yliopisto (Lappeenranta University of Technology)
Teknistaloudellinen tiedekunta (Faculty of Technology Management)
Tietotekniikan osasto (Department of Information Technology)
Tuomas Piirainen

Adaptive optimization schemes for mobile VoIP applications
Battery life and call quality aspects

Master's thesis
2008
61 pages, 12 figures, 3 tables and 1 appendix
Examiners: Professor Jari Porras and D.Sc. (Tech.) Kari Heikkinen
Keywords: VoIP, QoS, 802.11, power management, battery life, adaptation, mobility

This Master's thesis studies programmatic, application-layer means for reducing power consumption in the VoIP application domain. The work focuses on optimization methods suitable for implementations that utilize SIP and IEEE 802.11 WLAN technologies. Energy-saving optimizations can affect the call quality experienced by the user, so energy-saving means are studied together with the factors related to call quality. The work builds a theoretical overall picture of the topic and develops adaptive optimization schemes with which operation can be adjusted dynamically to suit the prevailing conditions. For estimating VoIP call quality, a quality model suitable for runtime use is introduced; it can be integrated into the operation of the optimization schemes. Based on the presented optimization schemes, some power consumption measurements are made to determine the achievable benefits. The measurement results show that savings in power consumption can be achieved with the help of adaptive optimization schemes.
CONTENTS

1 INTRODUCTION
  1.1 Objectives of the thesis
  1.2 Scope of the work
  1.3 Structure of the thesis
2 VOIP OVER IEEE 802.11 WLAN
  2.1 Overview on Internet telephony
  2.2 Overview of 802.11 standards
  2.3 Mobile VoIP special characteristics
3 BATTERY LIFE IMPLICATIONS OF VOIP
  3.1 Power-aware software development
  3.2 Power management of the wireless interface
    3.2.1 Energy efficiency of IEEE 802.11 WLAN
      3.2.1.1 802.11 Legacy power save
      3.2.1.2 802.11e U-APSD (Unscheduled Automatic Power Save Delivery)
    3.2.2 Link adaptation strategies for energy-efficiency
      3.2.2.1 Transmission rate adaptation
      3.2.2.2 Adaptive packet size control
      3.2.2.3 Transmission power control (TPC)
  3.3 Power management of display
4 ASSESSING VOIP CALL QUALITY
  4.1 Factors affecting perceived call quality
  4.2 Approaches for voice quality assessment
    4.2.1 The E-Model
    4.2.2 P.VTQ
  4.3 Quality model for VoIP call quality estimation
5 OPTIMIZATION SCHEMES FOR MOBILE VOIP
  5.1 Adaptive packet rate control at application layer
  5.2 Adaptive transmission time determination
  5.3 Throughput optimization by codec and codec mode adaptation
6 MEASUREMENTS AND EVALUATION OF THE RESULTS
  6.1 Measurement setup
  6.2 Measurement results
7 CONCLUSIONS AND FUTURE WORK
REFERENCES
APPENDICES
SYMBOLS AND ABBREVIATIONS

3GPP     3rd Generation Partnership Project
ACK      Acknowledgement
AMR      Adaptive Multi-Rate, a speech data compression scheme
AMR-WB   Adaptive Multi-Rate Wideband, a wideband version of narrowband AMR
ANN      Artificial Neural Network
AP       Access Point
CPU      Central Processing Unit
DCF      Distributed Coordination Function
DTIM     Delivery Traffic Indicator Map
DTX      Discontinuous Transmission, a technique used by speech codecs to lower bandwidth requirements
ETSI     European Telecommunications Standards Institute
FEC      Forward Error Correction, an error control scheme for data transmission that adds redundant data to the carried data packets
G.711    An ITU-T standard for audio compression
G.729    A voice data compression algorithm producing audio frames of 10 milliseconds
GPRS     General Packet Radio Service
GPS      Global Positioning System
GSM      Global System for Mobile Communications, a widely used digital mobile telephony system
HCF      Hybrid Coordination Function
IEEE     Institute of Electrical and Electronics Engineers, an international professional society promoting the development of electrotechnology and fostering many industrial standards
iLBC     Internet Low Bit rate Codec
IP       Internet Protocol
ITU      International Telecommunication Union
ITU-T    International Telecommunication Union, Telecommunications standardization sector
LAN      Local Area Network
MAC      Medium Access Control
MOS      Mean Opinion Score
MTU      Maximum Transmission Unit
NTP      Network Time Protocol, a protocol used to synchronize computer clock times in a network of computers
OS       Operating System
OSI      Open Systems Interconnection
PC       Personal Computer
PCM      Pulse Code Modulation
PLC      Packet Loss Concealment
PCF      Point Coordination Function
PESQ     Perceptual Evaluation of Speech Quality
PS       Power Save
QoS      Quality of Service
QoE      Quality of Experience
RF       Radio Frequency
RTCP     Real-Time Transport Control Protocol
RTP      Real-Time Transport Protocol
SDK      Software Development Kit
SDP      Session Description Protocol
SID      Silence Indicator
SIP      Session Initiation Protocol
TPC      Transmit/Transmission Power Control
TU       Time Unit
UI       User Interface
VAD      Voice Activity Detection
VoIP     Voice over Internet Protocol
VoWLAN   Voice over Wireless Local Area Network
WLAN     Wireless Local Area Network
1 INTRODUCTION

In recent years smart phones have evolved into multimedia computers. CPU speeds, the bandwidth of wireless interfaces, and available memory have been increasing rapidly to meet functional requirements. On-board cameras are a standard accessory, and GPS-capable phones are becoming common. All that feature richness has increased power consumption, while battery technology has not been able to improve battery capacity at the same rate. For that reason, it is increasingly important to limit power consumption at different design levels. One approach is application-level optimization, which aims to achieve efficient resource usage within an application's domain.

In the mobile VoIP application domain, battery life directly affects available talk time, which is one of the most important factors in VoIP quality of experience (QoE). Talk time should match that of GSM cellular terminals to offer good usability and assure good QoE. However, concentrating on energy efficiency alone is not enough. As we will show, software optimizations designed to enable better energy efficiency can have an impact on perceived call quality, which is another big factor in VoIP QoE. Therefore, battery life optimizations must be studied with call quality concerns in mind.

1.1 Objectives of the thesis

The main objective of this thesis is to find means to reduce power consumption in the mobile VoIP application domain using software. Power saving techniques can have an impact on perceived speech quality, which presents us with a trade-off optimization problem to be solved. This work concentrates on those optimization schemes which are feasible to implement in practice, taking into account the constraints set by the used protocols and interoperability concerns.

In this work a survey of previous studies and research in the field of power management and adaptive VoIP is done, and based on the most promising techniques, optimization schemes are introduced. For call quality estimation, a run-time call quality assessment algorithm is proposed. Finally, some basic power consumption measurements are done to find out the achievable advantages with the proposed optimization schemes.
  9. 9. 6 1.2 Scope of the work In this thesis, application layer improvements are studied to make mobile VoIP implementation more energy efficient. Other means, such as protocol improvements, are out of the scope of this thesis though they may be mentioned in the work. Hardware issues are not a concern either. The work is mainly a theoretical analysis of what can be done and further studies are needed to adapt results into real-life VoIP implementations. The main domain for the work is SIP VoIP over 802.11 WLAN. However, optimization schemes should be generic enough to be usable with other signalling protocols. A Symbian OS based mobile terminal is used as a reference environment for the analysis and in the practical part of the work. 1.3 Structure of the thesis After this introduction, the thesis continues with chapter 2, which provides an overview on VoIP technology and wireless networks. In chapter 3, we dive into previous studies about battery life and energy management on various areas. In chapter 4, call quality assessment methods are discussed in general, and a quality model for call quality evaluation is proposed. Chapter 5 presents optimization schemes for adaptive VoIP application based on a literature survey. In chapter 6, some introductory measurements on battery consumption for future study are presented. Differences in power consumption with different test setups are discussed. Chapter 7 summarizes the results of the whole work and points out some topics for further research.
2 VOIP OVER IEEE 802.11 WLAN

In this chapter we present an overview of VoIP technology and IEEE 802.11 WLAN standards and discuss some characteristics of mobile VoIP.

2.1 Overview on Internet telephony

Internet telephony allows for the provision of voice services across networks using internet protocols. Internet telephony consists of signalling and transmission protocols. Voice sessions are established, controlled and terminated with signalling protocols like SIP [1] or H.323 [2], and media are carried with transport protocols like RTP [3].

The principal components of a VoIP system covering end-to-end voice transmission are presented in Figure 2.1. First, the sender captures the speech signal and compresses it using a codec (encoding). Then the compressed media frames are packetized and sent to the network. The receiver performs depacketization, and frames are passed to the codec decoder through a playout scheduler. Finally, frames are played out through the speaker.

[Figure 2.1: Block diagram of media processing. Stages: acoustic processing, encoder, packetization, network, depacketization, playout scheduler, decoder, acoustic processing.]
  11. 11. 8 The main components of a VoIP system participating in media processing in order to deliver end-to-end media are: acoustic processing, speech codecs, transport and network layer protocols, IP network, and a playout scheduler. Acoustic processing represents a very important phase in an overall media path affecting voice quality. Acoustic processing maintains a pleasant level of input and output signals. It filters out background noise so that speech can be separated from the acoustic signal. Acoustic processing must also take care of echo cancellation; the speaker signal is easily fed back to the microphone if the hand-set isn’t used. In this thesis, acoustic processing isn’t considered in detail, however. The purpose of speech codecs is to compress digitalized speech in order to lower bandwidth requirements while maintaining acceptable speech quality. Characteristics of speech codecs include: coding rate (bit/s), frame rate (Hz), algorithmic delay (ms) introduced to the voice processing, complexity, and speech quality (MOS). Because human communication is periodical by nature, some codecs use discontinuous transmission (DTX) a.k.a. silence suppression as a technique to lower bandwidth usage. A codec can use a voice activity detection (VAD) algorithm to decide when to enter discontinuous transmission mode. During a DTX period, the speech codec generates silence description frames at a lower rate than speech frames. Silence description frames can be used to generate artificial background noise at the receiver end so that the user would not assume that call is dropped. If the codec does not support generation of silence description frames, the terminal does not need to send anything during a silent period. On the receiving side, speech decoders usually use a packet loss concealment (PLC) algorithm to compensate for missing frames. Compensation can be done by repeating the last received frame or by extrapolating. Speech codecs are designed with specific goals in mind, which makes them suitable for particular situations such as packet losses occurring in the transmission path. That provides an interesting optimization opportunity through codec change adaptation. In the following paragraphs the most common codecs used in the VoIP systems are introduced.
  12. 12. 9 ITU-T G.711. G.711 is perhaps the most commonly used codec, because it provides good speech quality (MOS 4.3) and low complexity due to its simple linear quantization. It does not introduce algorithmic delay because coding is performed sample by sample. It comes with a cost of relatively high bandwidth requirements taking in all 64 kbps. G.711 has two variants: a-law and µ-law. µ-law is used in Americas while the rest of world uses a-law. ITU has standardized a packet loss concealment (ITU G.711 Appendix I), which limits the effects of occasional packet losses. [4] ITU-T G.729. G.729 partitions speech into 10 ms frames, which corresponds to 80 bits. The encoder takes 16-bit linear PCM data sampled at 8 kHz as input data and produces 8 kbps coded data. The algorithmic delay of the coder is 15 ms (10 ms input frame and 5 ms look-ahead). The codec includes a packet loss concealment algorithm. G.729 has coding memory, which makes it vulnerable to frame errors and packet losses; an error occurring within a previous frame affects the processing of the next one. [5] Adaptive Multi-Rate. AMR was originally developed for GSM, but it is also a selected 3GPP mandatory codec. Bits of the AMR frame are ordered into three significance classes, which make it possible to decode frame partially even if some part is corrupted. AMR is capable of changing bit rates dynamically, which allows it to adapt to the capacity of transmission channel. Supported bit rates range from 4.75 kbps to 12.2 kbps. AMR has a built-in forward error correction mechanism based on redundant data sending, which can be used to dynamically protect flow against transmission errors. It utilizes receiver side PLC and sender side VAD algorithms to compensate for frame losses and lower bandwidth usage during silent periods. [6] Adaptive Multi-Rate Wideband (AMR-WB). AMR-WB provides better speech quality due to a wider speech bandwidth but works otherwise in the same manner as narrow-band AMR. Supported bit rates range from 6.60 kbps to 23.85 kbps. [7] iLBC. iLBC has been developed by Global IP Sound and is designed for narrowband speech. It is especially good at tolerating packet loss and thus suitable for use in lossy channels. Speech quality is at the same level with G.729, but in lossy channels iLBC outperforms G.729. [8], [9]
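To make the bandwidth cost of packetization concrete, the sketch below estimates the on-the-wire IP bit rate for some of the codecs above from their coding rate and packetization interval. It is a rough illustration only: payload sizes are approximated from the coding rate and ignore codec-specific payload headers (for example the AMR RTP payload format), and the header sizes assume plain RTP/UDP/IPv4 without RTP header extensions or link-layer overhead.

```python
# Rough per-packet overhead and on-the-wire bit rate for common VoIP codecs.
# Assumptions: plain RTP/UDP/IPv4 headers (12 + 8 + 20 bytes), payload size
# approximated as coding_rate * packet_interval (codec payload headers ignored).

RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20  # 40 bytes of protocol headers per packet

def wire_bitrate_kbps(coding_rate_kbps: float, packet_interval_ms: float) -> float:
    """IP-level bit rate when one packet is sent every packet_interval_ms."""
    payload_bytes = coding_rate_kbps * 1000 / 8 * (packet_interval_ms / 1000)
    packet_bytes = payload_bytes + RTP_UDP_IPV4_HEADER_BYTES
    packets_per_second = 1000 / packet_interval_ms
    return packet_bytes * 8 * packets_per_second / 1000

if __name__ == "__main__":
    # (codec, coding rate kbps, packetization interval ms)
    examples = [("G.711", 64.0, 20), ("G.729A", 8.0, 20),
                ("AMR 12.2", 12.2, 20), ("AMR 12.2", 12.2, 60)]
    for codec, rate, ptime in examples:
        total = wire_bitrate_kbps(rate, ptime)
        print(f"{codec:9s} {rate:5.1f} kbps, {ptime:2d} ms packets -> "
              f"~{total:.1f} kbps on the wire ({total - rate:.1f} kbps header overhead)")
```

Under these assumptions, G.711 with 20 ms packets takes roughly 80 kbps at the IP level, and packing three 20 ms AMR frames into one 60 ms packet cuts the header overhead to a third, which is why the number of frames per packet is an interesting adaptation parameter.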
Some properties of the introduced codecs are presented in Table 1.

Table 1: Common VoIP codecs and their properties [9], [10]

Codec    Coding rate (kbps)   Speech quality (MOS)   Frame size (ms)
AMR      4.75-12.2            3.5-4.0                20
AMR-WB   6.6-23.85            up to 4.5              20
G.711    64.0                 4.5                    -
G.729A   8.0                  4.0                    10
iLBC     13.33, 15.2          4.0                    30, 20

After the VoIP application has captured speech frames, it concatenates one or more frames into one packet (packetization) according to the format used. When the outgoing packet traverses the protocol stack, every protocol layer adds its own headers to the packet. In an IP network every packet can traverse its own path to the receiver. Packets can get lost due to congestion, transmission errors or connection outages (in wireless networks).

A playout scheduler, a.k.a. a de-jittering buffer, performs temporal buffering of received media frames to compensate for variations in packet transmission times, known as jitter. If a packet arrives too late to be played out on time, it is usually dropped. Thus the frame loss metric seen by the VoIP application is the sum of real transmission loss and excessive delay introduced by a possibly congested network. Because the jitter buffer adds to the end-to-end transmission time, it must not keep frames in the buffer longer than necessary. Instead, frames arriving too late should be dropped. Finding the optimal trade-off between end-to-end delay and dropped packet rate is one of the most critical tasks in the design of a VoIP application.

The playout scheduling scheme can be either static or adaptive. With the static scheme, packets are discarded if they exceed a fixed maximum transmission time. The adaptive playout buffer adjusts playout time dynamically based on the delay process of the network
[11]. The easiest option for adjusting scheduling is during silent periods, because then adjustments are least noticeable.

2.2 Overview of 802.11 standards

802.11 refers to a family of specifications developed by the Institute of Electrical and Electronics Engineers (IEEE) for wireless LANs. IEEE 802.11 is a member of the IEEE 802 family, which is a series of standards for local area networks. All 802 standards focus on the two lowest layers of the OSI reference model, incorporating both the physical and data link layers. Thus 802 networks have both MAC and physical (PHY) layers. The MAC is a set of rules about how to access the medium and send data, whereas the PHY comprises detailed rules for transmission and reception. The relation of the IEEE 802.11 standards to the OSI model is illustrated in Figure 2.2.

[Figure 2.2: IEEE 802.11 standards and relation to the OSI model. The OSI data link layer contains the 802.2 Logical Link Control (LLC) sublayer and the 802.11 MAC sublayer; the OSI physical layer contains the 802.11 FHSS, 802.11 DSSS, 802.11a OFDM, 802.11b HR/DSSS and 802.11g ERP specifications.]

The original 802.11 specification was ratified by IEEE as the first standard for wireless LANs in 1997. The standard included two PHY layer definitions for data rates of 1 Mbps and 2 Mbps at 2.4 GHz. The 802.11b standard was ratified in 1999, providing data rates up to 11 Mbps at 2.4 GHz. Other additions include 802.11a for 54 Mbps at 5 GHz and 802.11g for up to 54 Mbps at 2.4 GHz.

Besides PHY layer additions, there have also been enhancements to the 802.11 MAC. For example, the 802.11a standard was originally developed for the United States only and was
causing interference problems in Europe due to its 5 GHz frequency band. Interference issues and stricter radio regulations were addressed with the 802.11h standard, which adds mechanisms for spectrum and transmit power management to the 802.11 MAC and 802.11a PHY. The transmit power reporting mechanism introduced in 802.11h makes intelligent transmit power control feasible at the MAC layer.

802.11 networks can operate either in infrastructure or ad-hoc mode. In infrastructure mode there is at least one centralized entity called an access point (AP), which is connected to the wired network infrastructure, and a set of wireless end stations. All data transmissions between stations are required to go through the AP. In ad-hoc mode wireless stations are allowed to communicate with each other directly on a peer-to-peer basis.

Access to the shared wireless medium is controlled by MAC coordination functions. The original 802.11 MAC specification defines the Distributed Coordination Function (DCF) for Ethernet-like CSMA/CA access and the Point Coordination Function (PCF) for contention-free service. DCF is responsible for asynchronous data services, while PCF offers time-bounded services. The original MAC coordination functions cannot provide good quality of service when high background traffic is present. IEEE has addressed this problem by standardizing the QoS-enhanced MAC sublayer specification 802.11e [12]. In order to achieve QoS, 802.11e defines separate priority queues for different traffic categories. This way, data packets belonging to a higher-priority category (e.g. real-time VoIP) gain access to the medium with higher probability. 802.11e defines the Hybrid Coordination Function (HCF), which includes QoS-enhanced versions of DCF and PCF: Enhanced Distributed Channel Access (EDCA) and HCF Controlled Channel Access (HCCA). The relationships of the different coordination functions are illustrated in Figure 2.3. DCF is the base on which the other coordination functions are built.
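As a practical note on the 802.11e priority queues mentioned above: an application cannot select the WMM access category directly, but it can mark its RTP packets with a DiffServ code point and let the network stack and WLAN driver map that marking to the voice access category. Whether this mapping actually happens depends on the operating system, driver and access point, so the snippet below is only a sketch of the marking step on a platform that exposes the standard IP_TOS socket option; the local address used is a placeholder.

```python
import socket

# DSCP "Expedited Forwarding" (46) is commonly used for voice media. The DSCP
# value occupies the upper six bits of the IP TOS / Traffic Class byte.
DSCP_EF = 46
TOS_VOICE = DSCP_EF << 2  # 0xB8

def open_voice_socket(local_addr: str = "0.0.0.0", local_port: int = 0) -> socket.socket:
    """Create a UDP socket for RTP media and request voice marking for it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # On many platforms the WLAN driver maps DSCP EF to the WMM voice access
    # category (AC_VO), which gives the packets EDCA priority on the air.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VOICE)
    sock.bind((local_addr, local_port))
    return sock

if __name__ == "__main__":
    rtp_socket = open_voice_socket()
    print("RTP socket bound to", rtp_socket.getsockname(), "with TOS", hex(TOS_VOICE))
```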
  16. 16. 13 Figure 2.3: 802.11x MAC Coordination Functions A contention-free service can be provided only in infrastructure networks, because it requires an AP. Quality of service can be provided in any 802.11 network that has HCF support in stations. 2.3 Mobile VoIP special characteristics There are two fundamental, differentiating factors between wired and wireless VoIP: characteristics originating from wireless medium usage and consequences of mobility. These factors are interrelated. The greatest challenge for mobile VoIP is compensation of voice impairments due to the usage of wireless network. Wireless channels have a higher error rate compared to wired networks. The wireless network range may fluctuate and there may be shadow regions with no coverage. The available bandwidths are lower than in wired networks and they can vary more during the connection, which increases problems due to congestion. User motion sets its own challenges for a system design; the user can easily walk into a shadow region and get annoyed with poor call quality. On the whole, wireless network utilization sets much harder performance requirements for a mobile VoIP application compared to wired VoIP. Another challenge is the terminal’s processing performance. Wired VoIP clients include software clients running on PCs and dedicated, fixed IP phones. PCs have plenty of DCF PCFHCCA Contention-based access Contention-free access EDCA HCF
  17. 17. 14 processing capacity available for VoIP software. Also dedicated IP phones can be designed to meet the performance requirements of VoIP application. In both cases the terminal’s performance is not likely the limiting factor for the quality. Sufficiency of performance is not that obvious with wireless VoIP clients. Modern smart phones are becoming closer and closer to personal computers in functionality and processing power. This kind of feature richness comes, however, at the cost of reduced operation time due to increased processing capabilities and due to intention for a more general system design so that the requirements of various applications would be fulfilled. Unfortunately battery technology has advanced very slowly compared to the increase in processing capabilities. A VoIP application running in a smart phone is typically just another application among others without having any special privileges (except for process/thread priority tuning). Available processing capacity must be shared among various applications, which can be run simultaneously. For that reason, the VoIP application process and threads should be granted as high a priority as possible to ensure sufficient resources to handle the requirements of real-time traffic. In addition, the execution platform can make efficient implementation of VoIP application more difficult compared to the cellular application. Symbian OS, for example, has optimized its telephony architecture for cellular calls by having a dedicated processor for the purpose. Thus it may be a very challenging task for a VoIP application to achieve equal talk times and quality compared with the cellular equivalent inside the same device.
  18. 18. 15 3 BATTERY LIFE IMPLICATIONS OF VOIP The most energy consuming components of mobile devices are usually memory, display, wireless communications, and CPU [13]. Modern smart phones may also have a camera, GPS and hard drive, all of which consume remarkable amounts of energy. The biggest concern in the VoIP application domain is finding optimal utilization of the radio interface under different network conditions. In this work, optimal radio interface utilization refers to the operation, which keeps power consumption as small as possible while maintaining acceptable call quality. Based on the observations above, we identify more detailed factors which can affect battery life in mobile terminals when considering the VoIP application domain. First, we discuss power-aware software development issues and identify some general programming techniques for battery saving and some Symbian OS specific mechanisms. After that characteristics of 802.11 radio interface are discussed in detail. Lastly, alternatives for energy-efficient display utilization are explored. 3.1 Power-aware software development The operation system is mainly responsible for defining and implementing power management architecture for energy-efficient hardware component usage. However, some resources are controllable at the application or middleware layers either explicitly or implicitly. For example, the camera and display can be typically utilized explicitly at the application layer, and efficient CPU and memory utilization is a concern of whole system software. There is no single thing, which alone could solve energy consumption problems. When system resources are used in an efficient and economical way, the operating system has more possibilities to switch unused components to a low power state. For application controllable resources, application design must meet energy concerns. For example, an application should keep different radios powered only for the time it is using them. Software architecture implicitly affects energy consumption by orienting towards more or less CPU intensive solutions. Thinner architecture reduces the volume of instructions executed and accordingly the system has more possibilities to enter a low-power state
during idle periods. For example, in Symbian OS there is a null (a.k.a. idle) thread that gets control when no other thread is ready to run. The null thread puts the CPU into a low-power mode where execution stops until a hardware interrupt is asserted.

One software-architecture-level optimization possibility is the selection of the multi-tasking model. Traditional systems merely use the process-threading model for multi-tasking, but Symbian OS also provides a mechanism called the active object framework. With active objects, multi-tasking can be performed inside the same thread, which reduces the overhead of context switching between threads. Less overhead leads to reduced power consumption. For this reason, applications should prefer the active object mechanism to threading whenever possible.

Code-level optimizations are usually best left to the compiler, but with appropriate design, better performance may be achievable. For example, timer usage must be considered carefully because the device must wake up at every timer interrupt. Thus too frequently firing or unnecessarily ticking timers can shorten battery life.

Some other design guidelines for achieving better power efficiency are presented below:

- Memory usage should be minimized, not only due to the limited amount of memory in mobile terminals but also because moving less data around consumes less power. Also, some hardware architectures are able to power off unused memory blocks.
- Applications should be event-driven so that an application can be put to sleep when there is no user interaction.
- Idle timeouts should be used for devices like the camera to release them automatically when they are no longer in use (see the sketch after this list). That way the device is not powered unnecessarily if the user forgets to close the application.
- In communication applications, unused media streams should be stopped instead of only muted to avoid data being processed unnecessarily.
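The idle-timeout guideline in the list above can be illustrated with a small sketch. The resource callbacks and timeout value are hypothetical; the point is simply that the resource is released automatically by a timer that is re-armed on every use, so a forgotten camera or viewfinder does not stay powered.

```python
import threading
import time

class IdleTimeoutResource:
    """Wraps a power-hungry resource and releases it after an idle timeout."""

    def __init__(self, acquire, release, idle_timeout_s: float = 30.0):
        self._acquire = acquire          # e.g. power the camera up
        self._release = release          # e.g. power it down again
        self._timeout = idle_timeout_s
        self._timer = None
        self._active = False
        self._lock = threading.Lock()

    def use(self):
        """Call on every access; powers the resource up and re-arms the timer."""
        with self._lock:
            if not self._active:
                self._acquire()
                self._active = True
            if self._timer:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout, self._on_idle)
            self._timer.daemon = True
            self._timer.start()

    def _on_idle(self):
        with self._lock:
            if self._active:
                self._release()
                self._active = False

# Hypothetical usage with a camera-like resource:
if __name__ == "__main__":
    camera = IdleTimeoutResource(lambda: print("camera on"),
                                 lambda: print("camera off"),
                                 idle_timeout_s=2.0)
    camera.use()       # powers the camera up and arms a 2 s idle timer
    time.sleep(3)      # no further use: the timer releases the camera
```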
In general, software power optimization problems can be categorized as presented in Table 2 [13]:

Table 2: Categories of energy-related software problems [13]

Category      Description
Transition    When should a component switch from one power mode to another?
Load-change   How should the components' functionality be modified to take advantage of low-power modes more often?
Adaptation    How can the software be modified to permit novel, power-saving uses of the components?

Transition, a.k.a. prediction, strategies determine when to switch from one power mode to another. A transition strategy could define, for example, when to put the wireless interface into power saving mode. The idle timeout mechanism described earlier is also a form of transition strategy. In this case, the prediction strategy is to assume that the longer the device has been idling, the longer it will keep idling.

Methods striving for efficient load distribution are known as load-change strategies. The load of a component does not necessarily need to be reduced; reordering the load might be sufficient. For the hard disk, for example, it is better to make several disk requests one after the other and then spin the disk down. A poorer alternative would be to spin the disk up and down for each request.

Adaptation strategies enable the component to be used in new power-saving ways. For example, the quality of an audio or video stream could be degraded to reduce processor load.

3.2 Power management of the wireless interface

Two different approaches to energy savings for wireless communications are presently available. Research has focused on dynamic transmission power control (TPC) of the radio interface and on inserting power management logic into different layers of the OSI network protocol stack [14]. In the context of wireless communications, the dynamic power
management aspect focuses on when to switch the radio between different operational modes to maximize the time spent in low power states (transition strategy).

In general, the energy consumption of the wireless interface depends on its operation state. A sleeping interface is one that has entered a low power state and cannot receive or transmit data. When the interface is awake, it can be said to be logically in the transmit, receive or idle/listening state. Most energy is consumed in the transmit state, while in the receive and idle states energy consumption is nearly equal, because the interface must sense the channel even when idling. The different operational modes are illustrated in Figure 3.1.

[Figure 3.1: Operation modes of a wireless interface: transmit, receive, idle and sleep.]

When a sleep-wake transition strategy is utilized, one thing to consider is the energy transient that is consumed when the device switches between sleep and active modes. A wake-up time of, for example, 2 ms means that a device can save power by entering sleep mode only if its next transmission or reception is due considerably later than 2 ms.

Previous studies suggest that wireless communication generates most of its energy waste by 1) retransmitting packets after collisions on the communication medium, 2) overhearing traffic intended for another node, 3) handling protocol control packets, and 4) listening for packets when there is no traffic on the network [15]. Reducing these four actions at any layer may result in better energy efficiency.
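The break-even reasoning above can be written out explicitly. Sleeping only pays off when the energy saved relative to idling exceeds the energy spent on the mode transitions, which gives a minimum useful sleep duration. The power and transition-energy figures below are illustrative assumptions, not measured values for any particular WLAN chipset.

```python
# Break-even sleep time: sleeping saves (P_idle - P_sleep) * t but costs the
# transition energy E_trans, so it pays off only when t > E_trans / (P_idle - P_sleep).

def break_even_sleep_time_ms(p_idle_mw: float, p_sleep_mw: float,
                             transition_energy_uj: float) -> float:
    saving_rate_mw = p_idle_mw - p_sleep_mw       # 1 mW equals 1 uJ per ms
    return transition_energy_uj / saving_rate_mw  # milliseconds

if __name__ == "__main__":
    # Illustrative numbers only (assumed): idle 800 mW, sleep 50 mW,
    # and a sleep<->active transition costing 1500 uJ in total.
    t_min = break_even_sleep_time_ms(800.0, 50.0, 1500.0)
    print(f"Sleeping pays off only for gaps longer than ~{t_min:.1f} ms")
    # Under these assumed numbers, the idle gap left by a 20 ms VoIP
    # packetization interval is well above the threshold, so entering the
    # doze state between packets can pay off.
```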
  22. 22. 19 3.2.1 Energy efficiency of IEEE 802.11 WLAN In this section we discuss those characteristics of 802.11 standards which can have an influence on battery life. The main interest is to analyze the applicability of power saving features of the original IEEE 802.11 standard [16] and its QoS enhancements defined in 802.11e [12] with VoIP traffic. A common feature for all 802.11 MAC access schemes is that detection of transmission errors is done by the use of positive acknowledgements (ACK); every successfully received unicast frame must be acknowledged. If a transmission error occurs, the frame is retransmitted until it is successfully received or the maximum number of retransmissions has been reached. Transmission errors can happen either due to mere bit errors or as a result of interference. Interference may happen when two or more stations (e.g. mobile devices) try to access the wireless medium simultaneously (collision). That may happen if transmitting stations are far enough from each other to not sense a busy medium before starting sending. This is known as a hidden node or hidden terminal problem [17]. Naturally retransmissions increase the time spent in transmit mode and thus increase energy consumption. The retransmission overhead can be lowered with efficient link utilization, which is an important optimization area. Link adaptation strategies are described later in chapter 3.2.2. 802.11 standards have specified two power saving techniques: the 802.11 legacy power save and 802.11e U-APSD power save scheme. Both mechanisms define rules how a station can enter the low power sleep state. Based on these rules, several power management schemes have been proposed to enter adaptively power saving doze state at appropriate moments. Using power save mode requires the presence of a central entity, which buffers packets destined to the sleeping stations. Thus the power save mechanisms of 802.11 standards are applicable only in infrastructure networks where access points can buffer packets. Another reason is that in ad-hoc networks, stations act as routers and it would be detrimental to the operation of the network if some stations were sleeping.
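The energy cost of retransmissions noted above can be expressed with a short expected-value calculation: if each transmission attempt of a frame fails independently with probability p, the expected number of attempts is 1/(1 - p), and the transmit airtime (and thus transmit energy) grows by the same factor. The independence assumption and the unlimited retry count are simplifications made for illustration.

```python
# Expected transmission attempts per frame when each attempt fails independently
# with probability p (retry limit ignored): E[attempts] = 1 / (1 - p).

def expected_attempts(frame_error_rate: float) -> float:
    if not 0.0 <= frame_error_rate < 1.0:
        raise ValueError("frame error rate must be in [0, 1)")
    return 1.0 / (1.0 - frame_error_rate)

if __name__ == "__main__":
    for fer in (0.01, 0.1, 0.3, 0.5):
        factor = expected_attempts(fer)
        print(f"FER {fer:>4.0%}: ~{factor:.2f} transmissions per frame "
              f"(~{(factor - 1) * 100:.0f}% extra transmit airtime)")
```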
  23. 23. 20 The IEEE 802.11 standard defines a common platform for both power management schemes and includes definition of two different power states for each station as follows [16]: 1) Awake state. Station has fully powered WLAN interface and consumes energy for frame transmitting/receiving and channel sensing. 2) Doze state. WLAN interface consumes very low power and the station is unable to send or receive any frames. Additionally, the standard defines two power management modes. Active mode and power save (PS) mode. When the station is in active mode, it can send and receive frames at any time; it is in awake state. When operating in PS mode, the station is normally in doze state and enters into the awake state to transmit/receive frames. When the terminal wants to enter power save mode, it sends an 802.11 frame to indicate its current access point (AP) about that intent. After that, the access point buffers transmissions to the terminal. 3.2.1.1 802.11 Legacy power save When the terminal is in power saving mode, it switches the transceiver to a low power sleep state and wakes up periodically to receive beacon messages from the access point. Beacon messages indicate whether the access point has buffered frames for the terminal. If there are, the terminal asks access point to send the buffered frames. Access points broadcast beacons at regular intervals. That so-called “beacon interval” or “beacon period” is measured in time units (TU) of 1024 µs. The Delivery Traffic Indicator Map (DTIM) period specifies how often a terminal in power save mode should wake up to listen buffered multicast and broadcast messages from an AP. The DTIM period is measured in beacon intervals. A typical beacon interval length is 100 TUs and the DTIM period varies between 1 and 3. For example, if the DTIM period is 2, the terminal wakes up every second beacon interval to check for buffered frames. The terminal can fetch buffered frames one by one by sending so-called PS-poll frames. [16] In general, WLAN implementations do not support power save in ad hoc mode. In ad hoc mode, all nodes in the network act as routers and repeaters for the other nodes. If some nodes are sleeping, overall network performance would therefore not be acceptable.
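A small calculation makes the listen-interval arithmetic above concrete: following the description above, the station in legacy power save wakes at every DTIM-th beacon, so with the beacon interval expressed in 1024 µs time units the wake-up period and the resulting buffering delay for a downlink voice packet follow directly. Treating the average added delay as roughly half the wake-up period is a simplifying assumption (downlink arrivals spread evenly over the interval), not a figure from the thesis.

```python
TU_US = 1024  # one 802.11 time unit in microseconds

def legacy_ps_delays_ms(beacon_interval_tu: int, dtim_period: int):
    """Wake-up period and rough downlink buffering delays under legacy power save."""
    wake_period_ms = beacon_interval_tu * dtim_period * TU_US / 1000.0
    worst_case_ms = wake_period_ms      # packet arrives just after a wake-up
    average_ms = wake_period_ms / 2.0   # assuming uniformly spread arrivals
    return wake_period_ms, worst_case_ms, average_ms

if __name__ == "__main__":
    for dtim in (1, 2, 3):
        wake, worst, avg = legacy_ps_delays_ms(beacon_interval_tu=100, dtim_period=dtim)
        print(f"DTIM {dtim}: wake every {wake:.1f} ms, "
              f"added downlink delay up to {worst:.1f} ms (avg ~{avg:.1f} ms)")
```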
  24. 24. 21 The pitfall of the legacy power save is that it introduces delays to communication (AP to station) while the station is operating in PS mode. If the beacon interval is set at the typical 100 TUs, the station will receive downlink frames in a burst once per 100 milliseconds. This addition to the end-to-end delay may degrade service quality noticeably with VoIP calls, where end-to-end delay should not exceed 250 ms [18]. In addition, the bursty reception of downlink frames sets high requirements for jitter buffer implementation so that the rate of dropped packets would not increase significantly. There are also other quality of service issues with legacy power save. The standard mandates that the PS-poll frame must use best effort priority instead of the higher voice priority because the station may not know the priority of buffered frames at the AP. Thus downlink frames are granted only best effort priority, which may degrade service quality if there is both data and voice traffic in the network. [19] Due to the above mentioned problems legacy power save utilization is not recommended for VoIP applications. However, if battery life is to be optimized at the expense of interaction loss, VoIP application should utilize a high quality adaptive jitter buffer to prevent dropped packets from occurring due to bursts in downlink traffic. 3.2.1.2 802.11e U-APSD (Unscheduled Automatic Power Save Delivery) Unscheduled automatic power save delivery is part of the 802.11e WLAN QoS standard. U-APSD improves QoS provided to stations, which use the EDCA mechanism for channel access. The basic idea of U-APSD is to use a specific period called an unscheduled service period (U-SP) for downlink frame delivery to the station. The station is expected to be awake during U-SP. The station starts U-SP by sending either data or a null-data frame to the uplink. Frame used to initialize the U-SP is called a trigger frame. When the AP receives a trigger frame, it knows that station is awake and can start the downlink frame delivery. The U-SP ends when the station receives data or a null-data frame with EOSP (end of service period) field set to 1 in QoS control field. The maximum length of the U-SP is defined by MAX SP (maximum service period) length, which indicates the maximum number of frames the AP can deliver during a U-SP. [12]
  25. 25. 22 Upon receiving a frame with EOSP set to 1 the station is permitted to enter the doze mode. If there are more data frames destined to the station, the AP informs the station by setting more data field in the MAC header to 1. When the station receives a frame with EOSP and more data fields set, it may decide to start a new U-SP to get the remaining frames immediately. If the station does not have data frames to send, the station can send a QoS null frame to request the delivery of buffered frames. [12] This enables usage of U-APSD with applications, which does not generate regular uplink traffic to meet QoS requirements for the application. In this case the U-APSD operation is essentially the same as a legacy power save operation (polling approach). U-APSD has three main advantages with respect to 802.11 legacy power save mode [20]: 1) Because triggers can be generated at any moment, the maximum delay for downlink frame delivery can be bound based on application QoS requirements instead of depending on the beacon listen interval configuration of the 802.11 power save. 2) The overhead required to retrieve frames from the AP is smaller, because the sent data frames trigger delivery of downlink frames. Thus not so many null-data frames are needed and a significant amount of PS-poll overhead required by the legacy power save is reduced. 3) A U-APSD capable AP can deliver up to a station the configured maximum service period length number of frames per trigger compared to legacy power save, which requires a PS-poll for obtaining every buffered frame. From the VoIP point of view, the main advantage to the legacy PS is that the delivery of downlink packets can be triggered at the appropriate moment while being in power-save mode. That way interaction loss due to increased delay in power save mode can be bound. This requires that the VoIP application is able to configure QoS requirements to the MAC- layer software, which can then assure that delay requirements are met by sending trigger frames at appropriate moments. Application-level generation of trigger frames would have a much higher overhead due to the addition of higher level protocol headers and would produce end-to-end traffic increasing further network load. There would also be a risk that an inappropriately selected trigger frame format poses interoperability problems when processed at the receiver side.
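The first advantage listed above can be quantified for a VoIP flow: when every uplink voice packet acts as a trigger frame, a buffered downlink frame waits at the AP at most until the next uplink packet, that is, at most one packetization interval. The sketch below restates that bound for typical packetization intervals; it is a simplified model that ignores channel access and transmission times.

```python
def uapsd_max_downlink_wait_ms(uplink_packet_interval_ms: float) -> float:
    """Upper bound on AP buffering delay under U-APSD when every uplink voice
    packet acts as the trigger frame (channel access and airtime ignored)."""
    return uplink_packet_interval_ms

if __name__ == "__main__":
    for ptime_ms in (20, 40, 60):   # typical VoIP packetization intervals
        bound = uapsd_max_downlink_wait_ms(ptime_ms)
        print(f"{ptime_ms} ms uplink packets -> downlink buffered at most ~{bound:.0f} ms")
    # Compare with the ~102-307 ms wake-up periods of legacy power save computed earlier.
```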
  26. 26. 23 Application-level trigger frame generation is also problematic if downlink traffic is not regular. In that case the AP may not have any buffered frames when trigger frames are sent. The terminal needs to wake up at beacon interval anyway, and if blind polling is also used, energy efficiency would decrease due to unsynchronized sleep-wake transitions. 3.2.2 Link adaptation strategies for energy-efficiency Link adaptation refers to those automatic operations the mobile station is doing to adapt its transmission parameters to the current conditions of the wireless channel. Typically an adaptation criterion is maximum throughput, which is also beneficial from an energy- saving point of view: good throughput is achieved by minimizing retransmissions and protocol overhead. Basic parameters to adapt according to the channel condition are PHY data rate, packet size, and transmission power. The basic 802.11 standard does not define any procedures for rate switching or transmission power control. Only edge conditions for the operation are defined. For example, standard defines 4 and 8 power levels for DSSS and FHSS respectively. These power levels are implementation-dependent. Standard also defines Received Signal Strength Indicator (RSSI), which is used by some adaptation algorithms. [16] Because standards are open on adaptation algorithms, numerous studies on the effects of rate adaptation and transmission power control have been proposed. Algorithms measure signal quality either directly by the signal-to-noise-ratio (SNR) or indirectly by observing how many frames must be retransmitted. Some proposed algorithms rely on feedback from the receiver. These proposals mostly do not conform to the 802.11 MAC standard as such, because basic 802.11 standards do not define any protocol means for the receiver to give feedback. There is, however, emerging 802.11h standard that gives feasible mechanisms for feedback-based link adaptation when it comes to transmission power control [21]. 3.2.2.1 Transmission rate adaptation Data rate adaptation can save power by minimizing time spent in the transmitting state. With good channel conditions time spent in transmit state per one frame is directly proportional to the rate; the highest data rate is preferable. But the higher the data rate is
  27. 27. 24 the more vulnerable transmission tends to be for bit errors. One of the reasons is the higher intersymbol interference at higher rates. The simplest approaches use a single parameter for link quality estimation. For example Chevillat et al. has proposed an algorithm for link adaptation [22], which relies solely on the 802.11 error recovery mechanism and keeps track of how many retransmissions are needed. This proposal has a drawback in that it neglects hidden station problem: the receiver may fail to receive frames due to interference with the transmitting station at the range of the receiver. In this situation, retransmission count does not indicate the quality of local link. In [23] the authors propose a link adaptation strategy for improving the system throughput by adapting the transmission rate to the current link condition. The proposed algorithm uses Received Signal Strength (RSS) along with the number of retransmissions. The idea is that when the receiver uses a fixed transmission power, RSS should be indicative of the changes in path loss and channel behaviour. With that restriction, the method is applicable for infrastructure and ad-hoc networks and all physical layer specifications. With RSS measurements the algorithm is able to react to moving stations better than simpler methods based purely on retransmission counts. 3.2.2.2 Adaptive packet size control Adaptive packet size control mechanisms rely on the observation that when radio channel conditions get weaker, the probability for bit errors in transmission grows. So when packet sizes get bigger, the probability for the individual packet to get lost due to transmission error increases. With WLAN that means that also MAC-layer retransmissions happen more often, which has a detrimental effect on the energy efficiency of mobile device. Lettieri et al. have studied MAC-layer packet size control mechanisms in [24] and [25]. WLAN MAC defines fragmentation and reassembly mechanisms which makes this kind of link adaptation possible to implement at the MAC level by adapting MTU size dynamically. Link layer fragmentation has an advantage in that only the lost fragment must be retransmitted in contrast to network layer fragmentation where the entire packet must be resent end-to-end if any of the fragments are lost. The drawback is that the relative
The drawback is that the relative overhead increases when each data fragment is encapsulated with MAC and PHY layer headers. Link layer adaptation may, however, boost speed over a single hop when the wireless medium is noisy.

Packet size adaptation logic can also be inserted into the application layer, but with more restrictions. The VoIP media engine only has control over how many audio frames are encapsulated within one RTP packet, which makes the adaptation resolution coarser than what is achievable with the link layer approach. With sample-based codecs such as G.711 one can in theory have a resolution of one sample, but this would most likely cause interoperability problems, and the protocol overhead would be too large in comparison to the transmitted payload.

The application layer has the most information regarding the overall situation, such as end-to-end packet loss due to e.g. congestion in the network, which suggests that overall performance should be optimized at the application layer. Thus a promising optimization option for an energy-efficient VoIP implementation is to adapt the packet size according to application-specific performance goals.

3.2.2.3 Transmission power control (TPC)

Controlling transmission power is a promising approach because the amplifier is a significant source of power drain in any mobile device. In addition to saving energy, TPC can have additional benefits such as an increase in network capacity due to smaller occupied ranges. The drawback is that even if the transmitting terminal gains access to the medium more often thanks to more efficient spectrum usage, TPC has been shown to aggravate the hidden node problem under contention-based MAC access schemes like DCF [26].

TPC is a more direct energy-saving approach than the indirect packet size and transmission rate adaptation methods. Authors have typically suggested joint TPC mechanisms where transmit power is adjusted according to the packet size, data rate or channel state. The easiest network configuration for applying TPC is an infrastructure network with the PCF access scheme, because then the hidden node problem is not a concern. Qiao et al. have proposed a method where both transmit power and transmission rate are adaptively
chosen [27]. The authors showed that combined adaptation saves considerably more energy than pure rate adaptation schemes. That approach requires a feedback mechanism for path loss estimation, and thus the method is not practical to implement until the 802.11h standard is widely adopted.

In [28] the authors propose fully 802.11 standard-compliant methods to jointly optimize transmission power and data rate. The authors introduce algorithms for high-performance and low-power operation modes. In high-performance mode, the optimization goal is to transmit at the highest possible rate so that system throughput is maximized; the transmit power is of secondary importance. In low-power mode, the optimization strategy is to send with the lowest possible power, and the data rate is then adjusted accordingly. It is shown that in most cases the best choice is to send data frames at the highest possible rate with rather high power levels.

One possible drawback of the methods defined in [28] is that the results assume that ACK frames are sent with maximum power. This may slightly reduce the applicability of the method: in heterogeneous ad-hoc networks all stations do not necessarily behave in the same way. From the results it can be seen that improper step size selection for power level adaptation algorithms can dramatically reduce throughput, the reason being that a large decrease in transmit power results in additional transmission errors. For that reason the algorithm may not be applicable with all PHY specifications, because e.g. FHSS only allows 4 transmit power levels, which results in quite large power level step sizes.

One way to minimize power consumption is to adapt transmission power according to the packet size, or vice versa. The simulation results of Ebert et al. suggest that small packets should be sent with low RF power while bigger packets should be sent with higher RF transmit power [29]. Small packets are less vulnerable to packet corruption, which decreases MAC-level retransmissions and improves energy efficiency.

Transmission power can also be adapted according to channel state information on short time scales. Examples of this approach include [30] and [31]. If energy efficiency is to be optimized, the basic approach is not to waste energy on overcoming bad channel conditions. Thus the terminal should send short, low-power packets when the channel is noisy and continue
with high-power, bigger packets when the channel is good again. Depending on the radio bearer, the packetization period has an effect on overall energy efficiency. Since e.g. WLAN keeps sending MAC-level retransmissions regardless of channel conditions, overall energy efficiency could be improved by lowering the packetization period at the application layer when bad channel conditions are detected.

3.3 Power management of display

There is not much that can be done programmatically about the energy consumption of displays. In practice, every mobile phone already takes advantage of dimming or switching off the backlight when the user does not interact with the UI. Even more energy savings can be achieved by switching off the display completely. Power consumption of the display can also be reduced slightly by switching to monochrome mode or by lowering the refresh rate [13]. An unlighted display shows colours poorly anyway, so switching to monochrome mode can save power without degrading the user experience.

An application should redraw the display as infrequently as possible and redraw only the area of the screen that needs updating. Also, updating areas outside the screen should be avoided if the rendered area may be larger than the screen. When an application is moved to the background, redrawing and other operations that are relevant only in the foreground should be paused.

One approach to energy-efficient display usage is zoned backlighting, as proposed by Flinn and Satyanarayanan [32]. The authors define zoned backlighting as a display feature that allows independent control of the illumination level for different regions of the screen under software control. The idea is most usable in multi-windowing environments where the application window in focus could be highlighted by dimming the other windows. Typically the applications of current mobile terminals cover the entire display, because display sizes are quite small. However, the idea of partial display updates is used in the S60 SDK screen saver framework, which allows defining the active screen area where the screen saver plug-in is going to draw during the next refresh period.
A telephony-domain-specific optimization could be to switch off the backlight and/or the display immediately when a conversation starts and the phone is held against the ear, without waiting for the default user inactivity period.
4 ASSESSING VOIP CALL QUALITY

Adaptation strategies are often complex and include trade-offs between desired qualities. Tuning one parameter may lead to a performance increase in one particular area but may have undesired impacts on the overall call quality. For example, decreasing the dropped packet rate by increasing the jitter buffer length improves listening-only speech quality but may decrease conversational call quality due to increased mouth-to-ear delay. For that reason, effects on the overall call quality should be taken into account when making optimizations to the VoIP transmission path.

In this chapter we first study the factors affecting call quality as perceived by the user and then survey the relevant state-of-the-art methodologies for call quality assessment. Finally, based on the survey, a quality model suitable for real-time VoIP call quality estimation is proposed.

4.1 Factors affecting perceived call quality

In this section the factors affecting perceived voice quality are briefly described. Most of the impairment to voice quality is introduced by the packet-switched network and the wireless interface, with their time-varying capacity, packet loss rate, and delay. The quality of the terminal's acoustic processing naturally has an impact on voice quality as well, but that is not considered in this thesis.

There are two primary qualities that affect perceived VoIP call quality: end-to-end delay and speech quality. These primary qualities are further affected by several other parameters, which depend on the VoIP configuration used and on the network characteristics. The parameters involved are illustrated in Figure 4.1.
Figure 4.1: Parameters affecting VoIP call quality

End-to-end delay, a.k.a. latency, is the time it takes a packet to get across the packet-switched network to its destination. A long delay makes the conversation more like a radiophone discussion, because feedback from the other party does not arrive quickly enough. Loss of interactivity starts to occur gradually when the end-to-end delay exceeds 250 ms [18]. A longer delay may also cause an echo effect where the speaker hears his own voice.

Latency is a sum of many factors. The algorithmic delay of the codec used, the packetization period of frames, serialization of frames, network delay, and frame buffering at the receiver all add some delay. Forward error correction (FEC) schemes (e.g. RFC 2198 [33]) also usually add delay, because the receiver must wait for the error correction data before playing the data out.

In addition to the technical latency measure, psychological factors can have an impact on how good the user experience is. Users may be willing to trade off latency in certain situations. People are, for example, happy with free VoIP calls even if the delays may be longer than in cellular calls. Longer talk times are also usually better than a short end-to-end
delay when the user does not have the possibility to charge the battery for a while and has important calls to make.

Speech quality. Codecs have a great impact on perceived voice quality. In addition to the absolute coding quality, the ability to tolerate network-introduced impairments differs among codecs. Codecs have typically been designed to perform well in certain environments. For example, AMR can adapt its bit rate according to current network congestion, whereas the iLBC codec has been designed to perform well in a lossy channel [6], [8].

Silence suppression, a.k.a. DTX, may affect speech quality indirectly by lowering the bandwidth used and thus the network load, which helps with impairments due to network congestion. Silence suppression can also degrade call quality if the total silence is not compensated at the receiver. Typically VoIP-capable devices generate artificial, low-volume noise during silent periods so that the user does not start to believe that the call has been dropped. That mechanism is known as comfort noise generation. With RTP, silence description parameters for comfort noise generation at the receiver can be sent as specified in RFC 3389 [34].

The acoustic processing capabilities of the device have an impact on speech quality. The device must keep loudness at a pleasant level in various conditions (gain control). Background noise should be eliminated from the input signal before the encoding process to achieve a better encoding result. The device may also receive a delayed version of the local speaker's voice, which must be addressed by some kind of echo cancellation mechanism.

Dropped packet rate. This defines the frequency with which a packet does not get played out at its destination in the available time. The term includes lost packets, packets dropped due to bit errors, and packets which arrive too late to be played out. PLC algorithms can compensate for the loss of isolated frames, but consecutive losses may cause the user to suffer from annoying speech clipping.
Packet loss is mainly due to congestion and connection outages. Some network elements may flush their buffers in congestion situations. For the same reasons, packets may arrive too late for playout. The dropped packet rate can be reduced by proper adjustment of the playout buffer and by using some forward error correction scheme. Although FEC schemes can help in packet loss and bit error situations, they may increase congestion and thus packet loss, because sending redundancy consumes more bandwidth.

The playout buffer is the primary means of compensating for jitter and packet reordering. Jitter can be defined as the variance in packet arrival times. The jitter distribution depends both on the network path the packets are traversing and on competing traffic sharing the same path. Competing traffic is the primary cause of jitter, resulting in varying queuing delays at the intermediary routers. Packets may even arrive reordered at the receiver due to the different paths packets may traverse. Finding the optimal playout buffer size is a challenging task, because a longer buffer increases mouth-to-ear delay, which makes conversational call quality worse. A playout buffer should be adaptively adjusted according to the present network state.

Protocol performance can have an impact on the dropped packet rate, especially when the required bandwidth is considered. Each protocol layer adds its own headers around the voice payload to be delivered. This introduces a large overhead and can produce delays and congestion in the network. This is a big issue especially with a low-bandwidth radio bearer such as GPRS, because the data rate remaining for the payload is significantly lower. Header compression, e.g. Robust Header Compression described in [35], saves energy consumed in communication, but overall energy consumption may not decrease as much as expected due to the higher computational cost (computation-communication trade-off).
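As a rough, illustrative calculation of that overhead (assuming IPv4, no header compression, and ignoring link-layer headers), consider a 20 ms packet time:

    G.711 payload:  64 000 bit/s x 0.020 s = 1280 bits = 160 bytes
    headers:        RTP 12 + UDP 8 + IPv4 20 = 40 bytes
    overhead:       40 / (160 + 40) = 20 %

With a 20 ms AMR 12.2 kbps frame the payload is only roughly 32 bytes, so the same 40 bytes of headers already make up over half of the packet before any link-layer headers are added.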
4.2 Approaches for voice quality assessment

In this section different ways to assess voice quality are categorized, and methods suitable for real-time VoIP call quality estimation are highlighted.

Two approaches to voice quality assessment presently exist: subjective tests and objective measurements, a.k.a. instrumental evaluation. For subjective tests, ITU-T has standardized the process for human quality assessment [36]. According to the process, quality is described with a Mean Opinion Score (MOS) value, which ranges from 1 (bad) to 5 (excellent). Subjective MOS grading has been found to be the most reliable technique of speech quality evaluation so far. The drawbacks are that subjective tests require much work, money, and time. Repeatability is also difficult due to the natural variance in human grading. In addition, they cannot be used for long-term or large-scale voice quality monitoring.

Instrumental methods predict human ratings by using mathematical quality models. Two distinct approaches to objective evaluation exist, intrusive and non-intrusive, based on whether a reference speech signal is needed or not. Objective evaluation methods are calibrated with data from subjective tests. The classification of speech quality assessment methods is illustrated in Figure 4.2.

Figure 4.2: Classification of speech quality assessment methods [37]

Intrusive methods compare original speech fragments from the encoder with the degraded versions of the fragments at the other end of the voice path. Speech quality estimation is based on the measured amount of distortion. Intrusive schemes are more reliable and better suited for measurements of quality as perceived by end users, because they use the original reference signal. For the same reason they are unsuitable for monitoring real-time traffic. One popular example of intrusive methods is the Perceptual Evaluation of Speech Quality (PESQ) [38], which was standardized as ITU-T recommendation P.862 in 2001.
PESQ is suitable for narrow-band speech quality evaluation and does not consider impairments related to two-way interaction, such as variable delay and listener echo.

Non-intrusive methods assess the quality of distorted speech in the absence of the reference signal and are better suited to the supervision of network QoS. ITU-T has standardized recommendation P.563 for non-intrusive evaluation of speech quality in narrow-band telephone applications [39]. There are two categories of non-intrusive methods: signal-based and parameter-based.

Signal-based methods estimate speech quality by directly analyzing the degraded speech signals. One example of signal-based methods is ANIQUE+ proposed by Kim & Taraff [40]. Most signal-based schemes are listening-only models and do not consider factors affecting conversational quality, such as mouth-to-ear delay. For that reason they are not suitable for assessing the conversational quality of real-time voice.

Parameter-based methods use network and/or speech related parameters, such as delay and coding mode, for speech quality estimation. They try to establish the relationship between perceived voice quality and network/non-network related parameters. Typical methods utilizing the parametric approach are ITU-T recommendation G.107 (the E-Model [41]) and artificial neural network (ANN) models. One example of an ANN model has been proposed by Mohamed et al. [42]. The suggested model does not consider transmission delay, though, and is thus unsuitable for conversational call quality estimation. A real-time quality model that also considers conversational quality is proposed by Sun and Ifeachor [43]. The authors combine PESQ and the E-Model and describe a set of linear equations that approximate perceived voice quality. The method is suitable for adaptive QoS control in VoIP applications. Hoene also combines PESQ and the E-Model in his quality model for adaptive VoIP applications, but additionally takes playout scheduling schemes into account [10].

Parameter-based approaches are the most suitable methods found so far for real-time VoIP call quality estimation, and therefore some of them are described in more detail in the following sections.
4.2.1 The E-Model

The ITU-T E-Model is perhaps the most widely used parameter-based conversational speech quality estimation method. With the E-Model, conversational MOS scores can be estimated from IP network and/or terminal parameters. The E-Model was originally meant to be a transmission planning tool for telecommunication systems [41]. Nevertheless, various methods for voice quality prediction rely on the E-Model nowadays [44], [45], [46], [47].

The fundamental principle of the E-Model is that impairment factors on the psychological scale are additive [41]. The E-Model applies this concept and combines the individual impairments due to both speech and network characteristics into a single measure of conversational voice quality called the R-factor. The R-factor ranges from the best value of 100 to the worst value of 0; however, 70 can be considered the minimal quality for telephone calls. [41] The ITU-T recommendation G.109 [48] defines the speech transmission quality categories with R-value ranges as illustrated in Table 3.

Table 3: Speech transmission quality categories [48]

R-value range    Speech transmission quality category    User satisfaction
90 <= R < 100    Best                                     Very satisfied
80 <= R < 90     High                                     Satisfied
70 <= R < 80     Medium                                   Some users dissatisfied
60 <= R < 70     Low                                      Many users dissatisfied
50 <= R < 60     Poor                                     Nearly all users dissatisfied

The calculation of the R-value is based on 21 input parameters that represent network, terminal, and environmental quality factors. The basic formula of the E-Model is:

R = R0 - IS - ID - IE + A    (4.1)

where

R0 (basic signal-to-noise ratio) groups the effects of noise sources.

IS (simultaneous impairment factor) represents impairments occurring simultaneously with speech, such as quantization noise, received speech level, and sidetone level.
ID (delay impairment factor) represents all impairments due to the delay of voice signals and is further composed of talker echo, listener echo, and too-long absolute delay impairment factors.

IE (equipment impairment factor) captures the effect of information loss due to the coding scheme, packet loss, or jitter that cannot be compensated for.

A (advantage factor) represents the user's willingness to accept quality degradation in return for some advantage, such as ease of access or a free VoIP call.

The E-Model is based on fixed, empirical formulae, which makes it quite reliable. The flip side is that subjective tests are required to derive the model parameters, and thus the basic E-Model specification is applicable to a restricted number of codecs and network conditions. Further subjective tests are required for new emerging codecs. [37], [41]

Another limitation of the E-Model is that it relies on static transmission parameters (e.g. average mouth-to-ear delay and average packet loss) that do not change during run-time. Thus it does not capture the effects of time-varying impairments like delay variation, and it may therefore give misleading results if used as such for human rating prediction. In fact, ITU-T does not recommend the E-Model as a tool for human rating prediction [41]. The limitations of the E-Model regarding delay and packet loss variation have been addressed in Annex B of the ETSI TS 102 024-5 technical specification [49]. The methodology regarding the effects of burst packet loss has been adapted from Clark's work from 2001 [46].

4.2.2 P.VTQ

ITU-T P.VTQ is a non-intrusive parametric VoIP call monitoring standard, which was in the method selection phase at the time of writing this thesis. According to Takahashi et al., the competing schemes to be included in the standard are VQMon and PsyVoIP [50]. Both approaches use parameters from RTP and RTCP streams to compute voice quality impairments, and both may be integrated into the E-Model framework.

The VQMon methodology calculates the equipment impairment factor IE taking into account burst packet loss situations, as proposed by Clark [46]. VQMon supports narrowband and wideband
codecs, as well as listening and conversational quality estimation. It is productized by Telchemy [51].

PsyVoIP takes into account differences between VoIP devices and allows calibration of the call quality estimation model with PESQ. In this way proprietary jitter buffer and error concealment implementations are taken into account [52]. PsyVoIP is a product of Psytechnics [53].

4.3 Quality model for VoIP call quality estimation

The main requirement for a quality model is that it must be good enough to reliably show the effect of the optimization schemes. The quality model does not need to consider the acoustic capabilities of the terminal, because they remain constant. The optimization schemes considered in this thesis only affect transmission behaviour and wireless link utilization; receiver-side behaviour and acoustic processing capabilities remain fixed. Therefore, call quality estimation based on transmission qualities is enough when the impairments introduced by different optimization schemes are compared.

According to the general requirements above, the following criteria for a quality model were identified:

- it is a fully automatic, instrumental method suitable for passive quality monitoring
- it is free from license fees and patents
- it has low computational cost and delay, and works at run-time

Though the E-Model has its weaknesses, an E-Model based monitoring tool meets the above-mentioned requirements and is a suitable tool when comparing the impact of the transmission impairments introduced by different VoIP optimizations on conversational voice quality.

The terms R0, IS, and A in equation (4.1) consist of several parameters which do not depend on packet transport. Therefore, the most relevant factors in the context of a VoIP application are ID and IE. When the G.109 recommended default values for the non-relevant parameters are used, equation (4.1) can be simplified to cover the effect of transport-level quantities only (R0 = 94.77, IS = 1.41, A = 0):
R = 93.36 - ID - IE    (4.2)

Refer to Appendix 1 for a detailed derivation of the formula.

The delay impairment factor ID includes talker echo, listener echo, and absolute delay impairments. We further simplify the estimation method and assume perfect echo cancellation, in which case the impairment factor ID can be reduced to the following expression, as shown by Cole & Rosenbluth [44]:

ID = 0.024 d + 0.11 (d - 177.3) H(d - 177.3)    (4.3)

where d is the mouth-to-ear delay (absolute delay) in milliseconds and H(x) is the Heaviside step function defined as:

H(x) = 0, if x < 0, else H(x) = 1    (4.4)

For the impairment factor IE, values for a limited set of codecs and packet times with certain packet loss distributions are available in the G.113 specification [54]. More extensive measurements of impairment factors are needed for equation (4.2) to be applicable in a call quality monitoring application.
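As an illustration, the simplified model of equations (4.2)-(4.4) can be computed with a few lines of code. The sketch below is only that, a sketch: the equipment impairment value used in the example is a placeholder, since, as noted above, comprehensive IE data is not available.

    #include <iostream>

    // Heaviside step function, equation (4.4).
    double H(double x) {
        return (x < 0.0) ? 0.0 : 1.0;
    }

    // Delay impairment Id for a mouth-to-ear delay d in milliseconds, equation (4.3),
    // assuming perfect echo cancellation as in Cole & Rosenbluth [44].
    double DelayImpairment(double d) {
        return 0.024 * d + 0.11 * (d - 177.3) * H(d - 177.3);
    }

    // Simplified R-factor of equation (4.2): R = 93.36 - Id - Ie.
    double RFactor(double mouthToEarDelayMs, double equipmentImpairment) {
        return 93.36 - DelayImpairment(mouthToEarDelayMs) - equipmentImpairment;
    }

    int main() {
        const double d  = 150.0;  // example mouth-to-ear delay in milliseconds
        const double ie = 5.0;    // placeholder codec/loss impairment, not a measured value
        std::cout << "R = " << RFactor(d, ie) << std::endl;  // about 84.8, i.e. "High" in Table 3
        return 0;
    }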
5 OPTIMIZATION SCHEMES FOR MOBILE VOIP

In this chapter the most promising application-level optimization schemes for VoIP are described and considered in detail based on theory and previous work on the subject. The applicability of the schemes is discussed by considering the limitations set by standards as well as protocol and interoperability issues. The proposed schemes use channel state and network conditions as input parameters and produce behaviour which tries to save battery and maintain acceptable call quality.

5.1 Adaptive packet rate control at application layer

Bigger packets incorporate less protocol overhead, which is more energy-efficient if the transmission conditions in the wireless network are good. When the packet rate is lowered, the terminal also has more time to spend in the power save state. When the wireless link quality is bad, it is more probable that smaller packets get through without bit errors, and thus they should be preferred if the link layer uses packet retransmission logic in error situations. Bit errors occurring in the wired network have no effect on energy efficiency when the protocols above the link layer do not incorporate packet retransmission logic. In addition, congestion of the transmission path must also be taken into account. Mahlo et al. have shown in their study that bigger packets should be used in congestion situations [55]. They also showed that lowering the packet rate is enough when a link technology with a high packet switching overhead, such as IEEE 802.11, is used.

Clues about the network congestion status are available through RTCP receiver reports if RTCP is used (which is true in most VoIP calls). Basic RTCP defined in [3] directly provides only the "cumulative number of packets lost" and "inter-arrival jitter" metrics, but RTCP XR [56] can provide more detailed QoS statistics relating to VoIP call quality.

From a call quality monitoring point of view, the difficulty is how to measure the mouth-to-ear delay at runtime. Network delay can be estimated by calculating the round trip delay from an NTP timestamp of basic RTCP receiver reports [3].
An estimate of the end-to-end delay can then be obtained by halving the round trip time. However, this is only a rough approximation of the end-to-end delay, because the network path may have asymmetric delays. Also, delay metrics for RTP are likely to differ from the RTCP metrics due to the higher bandwidth RTP utilizes. Further, the mouth-to-ear delay cannot be calculated without knowledge of the jitter buffer size at the receiver. Again, RTCP-XR reports include more comprehensive statistics, such as delay metrics and jitter buffer parameters [56]. Based on this information, the mouth-to-ear delay can be calculated more reliably.

In summary, one should usually use as low a packet rate as possible when considering VoIP over IEEE 802.11. The only exception to this rule is operation under bad channel conditions, in which case smaller packets should be used. To be able to adapt the packet rate we should have information about the current wireless signal quality both at the sender and at the receiver, as well as the network congestion status. At the time of writing there was no standard-compliant mechanism for giving feedback about signal quality at the receiver. Without that information we cannot know whether packet loss or jitter reported by the receiver is due to network congestion or due to bad channel conditions at the receiver end. If the packet size is simply adapted according to the feedback from the receiver, we are likely to make the situation worse due to the conflicting adaptation strategies in these two scenarios. Without call quality concerns we could (from a local power-save point of view) adapt the packet size according to our own wireless network signal quality and battery state.

The signal quality detection problem has been noted in previous studies, but a standard solution does not yet exist. An example of a non-standard solution to the problem is the one proposed by Yoshimura et al. [57]. In their solution there is an RTP monitoring agent at the boundary of the wired and wireless network that sends RTCP reports regarding the quality of the wired network. The sender can distinguish network congestion from bad wireless link conditions by comparing the RTCP reports from the monitoring agent and from the peer.

In this thesis only a simple rate control algorithm is proposed, because reliable information regarding signal quality at the other end is not available. Absolute delay metrics are not available either without RTCP-XR reports, and thus no optimization scheme can take
conversational call quality into account. So the basic idea is to optimize the packet size purely for energy saving and to fall back to normal operation if RTCP reports indicate deterioration in transmission quality. That way some energy saving can be achieved without making call quality much worse compared to unoptimized operation.

The proposed algorithm is a modified version of the one presented by Barberis et al. [58]. The algorithm runs at the sender and uses information carried in cyclic RTCP receiver reports. The sender starts with a configured minimum packet size. If the RTCP statistics indicate no problems and the local wireless signal quality is good, the packet size is increased by the codec's frame size. This is repeated every time a new RTCP report is received, until the configured maximum packet size is reached. If the RTCP statistics start to show problems or the local wireless signal quality deteriorates, a fallback to normal operation is done and the packet size is lowered to the configured minimum. Adaptation decisions are based on thresholds for the highest acceptable packet loss and jitter. Some example threshold values can be found in [59]. A flow chart of the algorithm is presented in Figure 5.1, and a code sketch is given below.

Figure 5.1: Flow chart of the adaptive packet rate control algorithm
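A minimal sketch of the control loop in Figure 5.1 follows. The media engine interface, the RTCP statistics structure and the threshold constants are assumptions for illustration; a real implementation would read the loss and jitter figures from the RTP stack's RTCP receiver reports and calibrate the thresholds experimentally (see [59]).

    #include <algorithm>

    // Hypothetical interface to the VoIP media engine.
    class MediaEngine {
    public:
        virtual ~MediaEngine() {}
        virtual int  FrameSizeMs() const = 0;           // codec frame duration, e.g. 20 ms for AMR
        virtual void SetPacketTimeMs(int ptimeMs) = 0;  // audio frames per RTP packet = ptime / frame size
        virtual bool LocalSignalQualityGood() const = 0;
    };

    // Loss and jitter figures derived from an RTCP receiver report.
    struct RtcpStatistics {
        double fractionLost;   // 0.0 .. 1.0
        double jitterMs;
    };

    const double KMaxAcceptableLoss     = 0.05;  // example thresholds only, to be calibrated
    const double KMaxAcceptableJitterMs = 40.0;

    class PacketRateController {
    public:
        PacketRateController(MediaEngine& engine, int minPtimeMs, int maxPtimeMs)
            : iEngine(engine), iMinPtime(minPtimeMs), iMaxPtime(maxPtimeMs), iPtime(minPtimeMs) {
            iEngine.SetPacketTimeMs(iPtime);   // start from the configured minimum packet size
        }

        // Called whenever a new RTCP receiver report arrives.
        void OnRtcpReport(const RtcpStatistics& stats) {
            const bool problems = stats.fractionLost > KMaxAcceptableLoss
                               || stats.jitterMs > KMaxAcceptableJitterMs
                               || !iEngine.LocalSignalQualityGood();
            if (problems) {
                iPtime = iMinPtime;                                            // fall back to normal operation
            } else {
                iPtime = std::min(iMaxPtime, iPtime + iEngine.FrameSizeMs());  // grow by one codec frame
            }
            iEngine.SetPacketTimeMs(iPtime);
        }

    private:
        MediaEngine& iEngine;
        const int iMinPtime;
        const int iMaxPtime;
        int iPtime;
    };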
Application layer packet size adaptation is partially frustrated if packets are fragmented in the network layer (IP stack) or in the link layer software. In that case the packet size control performed at the application layer has no effect on the transmission efficiency over the first hop. Better power save state utilization is still possible, however, if the packet rate is lowered. When fragmentation alternatives are considered, one should note that network layer fragmentation has the disadvantage that reassembly is done by the final destination: if any fragments are lost, the whole packet must be retransmitted. On the contrary, if fragmentation is performed at the 802.11 link layer, only the missing fragment needs to be retransmitted. Thus link layer fragmentation can be a better alternative when local channel conditions are not good. In addition, adaptation can be done at a finer resolution at the link layer, which can improve energy efficiency and boost speed over a single hop. Link and application layer adaptations can, however, be used as complementary strategies, because at the application level we can react to network congestion, and at the link layer we can optimize transmission according to local channel conditions.

The disadvantages of the packet rate control approach are increased mouth-to-ear delay and possible interoperability problems if the other end is not able to handle abnormal packet sizes. However, the majority of implementations should be able to handle packet times up to 200 ms as specified in [60], unless the operation is restricted by negotiation. A maximum packet time can be negotiated with SDP for some audio payload formats, e.g. iLBC [61].

5.2 Adaptive transmission time determination

In [62] the authors introduce the idea of optimal transmission time determination. In this method uplink and downlink traffic are synchronized so that the time spent in power save mode is maximized. The study is done for an infrastructure mode network using legacy 802.11. The study assumes that a terminal can enter sleep mode immediately after sending a packet. In practice that is not true; rather, the state transition may include a delay. Another drawback of the proposed method is that it does not take into account the jitter of the incoming data flow. Further, the algorithm expects a symmetric data flow, and DTX used by the remote party is likely to confuse the proposed algorithm. With legacy 802.11, the
application layer does not have the possibility to affect the time when the WLAN software receives packets. The proposed method does not work with U-APSD power save either, because downlink packets are received only when some data is sent to the AP. Actually, U-APSD makes the proposed scheme needless, because in a way U-APSD uses the same idea of synchronizing uplink and downlink traffic: buffered frames are sent as a result of uplink traffic generation.

However, the method inspires an idea where uplink packets are buffered and sent in bursts instead of making the packetization period longer. U-APSD enables the terminal to enter sleep mode immediately after the uplink packets have been sent and the pending packets have been fetched from the AP [12]. This way the overhead due to rapid state transitions to sleep mode and back can be lowered, which saves energy. Bursty sending may have a negative effect on call quality if the receiver is not able to handle bursts. Compared to the packet rate control approach, the increased overhead due to protocol headers may cancel out any possible advantages. The network may also get congested during bursty periods. One advantage compared to the packet rate control method is that sending packets in bursts does not require receiver support for larger packet sizes.

5.3 Throughput optimization by codec and codec mode adaptation

As we noted earlier, one should use as small packets as possible in bad channel conditions to minimize bit errors and energy-consuming retransmissions. Retransmissions can be decreased with the packet rate adaptation approach as far as it goes, but the codec's frame size sets a limit on the adaptation resolution. However, the resolution can be further increased with codecs that allow dynamic bit rate adjustments (codec mode adaptation). Possible coding rates for some common codecs are presented in Table 1. Another alternative for reducing the packet size is switching to a less bandwidth-consuming codec (codec adaptation).
One example of a codec mode adaptation technique for speech transmission over an 802.11 wireless packet network is presented by Servetti and De Martin [63]. The speech coding rate is adapted according to the instantaneous wireless channel conditions. As an example, the variable bit rate codec AMR is used in the simulations. The proposed rate selection algorithm uses only two rates: in good channel conditions the maximum bit rate of 12.2 kbps is used, and in bad channel conditions the output rate is decreased to the minimum of 4.75 kbps. According to the simulations, the proposed adaptation strategy consistently outperforms the constant bit rate approach. Packet loss rates and end-to-end delays were both decreased with the adaptation scheme. A possible drawback of the approach is that speech quality may be decreased unnecessarily much due to the very limited set of allowed rates. By allowing intermediate rates in moderate channel conditions, speech quality could be maintained at a better level. On the other hand, wireless signal strength changes tend to be sudden, and a more drastic lowering of the bit rate can be a good option.

In this thesis a modified algorithm based on the study in [63] is presented. When channel conditions are good, the bit rate is changed to the configured maximum. Under bad channel conditions the bit rate is changed to the configured minimum. Between mode changes, the configured or negotiated minimum time is waited. For example, the AMR payload format specification defines a negotiable "mode-change-period" parameter [64]. Suitable threshold values for deciding good and bad channel conditions must be investigated experimentally with the target system. If the signal quality is moderate and neither threshold is reached, no action is taken. A flow chart of the algorithm is presented in Figure 5.2, and a code sketch is given at the end of this section.
Figure 5.2: Flow chart of the adaptive coding rate algorithm

The advantage of this approach compared to the packet rate adaptation mechanism is that the mouth-to-ear delay is not increased. Moreover, the packet loss rate and the jitter due to the variable number of retransmissions needed are both decreased under bad channel conditions.

The proposed scheme works well only with codecs which have a built-in capability for bit rate adaptation during a session without out-of-band negotiation. For some codecs, alternative bit rates can be negotiated but not changed in the middle of the session without re-negotiation, e.g. iLBC [61]. AMR is a good example of a multi-mode codec suitable for this kind of adaptation. AMR implementations should be able to handle mode changes to arbitrary modes at the resolution of one frame time, unless restricted through negotiation. The AMR payload format specification defines the operation of mode adaptation so that the decoder has control over which mode should be used by the encoder. The desired mode is signalled to the sender (encoder) using the so-called Codec Mode Request (CMR) field in outgoing packets. Thus sender-side mode adaptation according to the local signal quality is not possible if the peer wants to receive with a certain mode.
However, the specification recommends that IP terminals should not set the CMR, and so transmission-side adaptation should be possible in many cases. [64]

Throughput can also be optimized by switching to a less bandwidth-consuming codec when signal quality deteriorates. With SDP, several alternative media formats can be negotiated during the initial session setup [65]. That allows dynamic media format changes during the session without re-negotiation. Unfortunately, mobile phones have limited processing and memory capacity and in general do not support changing the media format on the fly. Most implementations select only one media format to be used in the session, in which case this option is unavailable. However, the option may be available when the peer is a softphone running on a personal computer. Another option for codec changing with SIP is to issue a re-INVITE with modified parameters. If the new offer is unacceptable to the answerer, the old parameters should remain in place according to the SDP offer/answer model [65]. Thus it should be safe to try session modification, though it may be the case that the new offer is rejected.
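As a summary of the mode adaptation logic of Figure 5.2, the sketch below implements the rate selection loop. The encoder interface and the RSSI thresholds are assumptions to be calibrated on the target system, and the minimum interval between mode changes is expressed here simply in seconds rather than through the negotiated mode-change-period parameter of [64].

    #include <ctime>

    // Hypothetical encoder interface; for AMR the bit rate maps to one of the codec modes.
    class SpeechEncoder {
    public:
        virtual ~SpeechEncoder() {}
        virtual void SetBitrateBps(int bps) = 0;   // e.g. 4750 or 12200 for AMR
    };

    const int KGoodRssiDbm = -65;   // illustrative thresholds only, to be calibrated
    const int KBadRssiDbm  = -85;

    class CodingRateController {
    public:
        CodingRateController(SpeechEncoder& encoder, int minBps, int maxBps, int minChangeIntervalS)
            : iEncoder(encoder), iMinBps(minBps), iMaxBps(maxBps),
              iMinChangeIntervalS(minChangeIntervalS), iCurrentBps(maxBps), iLastChange(0) {
            iEncoder.SetBitrateBps(iCurrentBps);
        }

        // Called periodically with the locally measured signal quality.
        void OnSignalQuality(int rssiDbm) {
            const std::time_t now = std::time(0);
            if (now - iLastChange < iMinChangeIntervalS) {
                return;                            // respect the minimum time between mode changes
            }
            int target = iCurrentBps;
            if (rssiDbm >= KGoodRssiDbm) {
                target = iMaxBps;                  // good channel: configured maximum rate
            } else if (rssiDbm <= KBadRssiDbm) {
                target = iMinBps;                  // bad channel: configured minimum rate
            }                                      // moderate channel: keep the current mode
            if (target != iCurrentBps) {
                iCurrentBps = target;
                iEncoder.SetBitrateBps(iCurrentBps);
                iLastChange = now;
            }
        }

    private:
        SpeechEncoder& iEncoder;
        const int iMinBps;
        const int iMaxBps;
        const int iMinChangeIntervalS;
        int iCurrentBps;
        std::time_t iLastChange;
    };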
6 MEASUREMENTS AND EVALUATION OF THE RESULTS

In this chapter, the test setup for the energy-efficiency measurements is described and the measurement results are evaluated. Only basic power consumption measurements were done, due to the unavailability of certain features in the tested software and the lack of a suitable WLAN emulator.

6.1 Measurement setup

Energy efficiency was measured in a U-APSD capable infrastructure network. Measurements were done using the AMR codec with 20 ms, 40 ms, and 60 ms packet times at the maximum bit rate of 12.2 kbps. In addition, the effect of bit rate changing was tested by measuring power consumption with two different bit rates, 4.75 kbps and 12.2 kbps, using a 20 ms frame size.

Power consumption measurements were done programmatically with the Nokia Energy Profiler tool, using a Nokia N95 S60 mobile phone as hardware. The tool allows real-time profiling of the energy consumption in the target device and can be used in devices with Nokia S60 3rd Edition, Feature Pack 1 and onwards. The Energy Profiler and related documentation are available on Forum Nokia [66].

Comfort noise and DTX were disabled during the measurements to ensure a continuous data flow between the terminals. Non-deterministic gaps in the data flow would otherwise obscure the measurements, because a test run utilizing more DTX would have more time to spend in power-save mode. As a test signal, some speaking was performed at each phone in turn until five minutes had elapsed. Speaking was performed in order to estimate conversational call quality; the contribution of speaking to power consumption is minimal, because with DTX disabled the speech payload is transmitted even during periods of silence. The impact of the display's backlight on power consumption was also measured by letting the backlight go off after about 30 seconds. Only the VoIP application and the Energy Profiler application were kept running during the measurements to minimize the impact of other software.
All tests were executed under good channel conditions, with an AP dedicated to the test and no interfering traffic.

6.2 Measurement results

In Figure 6.1 the power consumption with a 20 ms packet time and the minimum narrowband AMR bit rate (4.75 kbps) is presented. From the results it can be seen that the contribution of the backlight to the overall power consumption during a VoIP call is approximately 20 mA (230 mA with the backlight on versus 210 mA with the backlight off). That makes about a 10% reduction in power consumption when the backlight is kept off during the call.

Figure 6.1: Power consumption with the AMR codec, 20 ms packet time and 4.75 kbps bit rate (current consumption in amperes versus time in seconds during a VoIP call with U-APSD on; roughly 0.23 A with the backlight on and 0.21 A with the backlight off)

Speech quality was quite bad due to the low bit rate but still understandable. This low bit rate is recommended only when channel conditions are so bad that the retransmissions needed due to transmission errors can be reduced by lowering the bit rate.
In Figure 6.2 the power consumption with a 20 ms packet time and the maximum narrowband AMR bit rate (12.2 kbps) is illustrated. From the results we see that the increased processor load at the higher bit rate does not noticeably increase overall power consumption during a VoIP call; power consumption was the same as with the minimum bit rate. Thus bit rate changing is not an effective way to reduce power consumption when channel conditions are fine.

Figure 6.2: Power consumption with the AMR codec, 20 ms packet time and 12.2 kbps bit rate (current consumption in amperes versus time in seconds during a VoIP call with U-APSD on; roughly 0.23 A with the backlight on and 0.21 A with the backlight off)

Speech quality was very good compared to the speech quality observed with the minimum bit rate. Conversational call quality was also good, and the mouth-to-ear delay was not noticeable with a 20 ms packet time and 12.2 kbps bit rate. The maximum bit rate should be used unless the peer or gateway requests another codec mode or the local wireless channel conditions are bad.
In Figure 6.3 the power consumption with a 40 ms packet time is presented. From the diagram we can see that the power consumption is about 10 mA lower than with a 20 ms packet time (200 mA versus 210 mA with the backlight off). As a percentage, the reduction is about 5%. With the N95 8GB's default 1200 mAh battery, about 17 minutes of additional talk time is achieved compared to the talk time with a 20 ms packet time (5 h 43 min versus 6 h).

Figure 6.3: Power consumption with the AMR codec, 40 ms packet time and 12.2 kbps bit rate (current consumption in amperes versus time in seconds during a VoIP call with U-APSD on; roughly 0.22 A with the backlight on and 0.20 A with the backlight off)

Subjective estimation showed no degradation in conversational call quality with a 40 ms packet time compared to the quality with a 20 ms packet time. A 40 ms packet time can normally be used without making the mouth-to-ear delay too long, but it is recommended that call quality monitoring logic is in place whenever anything other than the default packet time is used. Also, the packet time should be lowered when wireless channel conditions get weaker.
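The talk-time figures follow directly from dividing the nominal battery capacity by the measured average current (ignoring conversion losses and battery ageing):

    20 ms packet time: 1200 mAh / 210 mA ≈ 5.71 h ≈ 5 h 43 min
    40 ms packet time: 1200 mAh / 200 mA = 6.00 h
    difference:        about 17 min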
Finally, the power consumption with a 60 ms packet time was measured. The results are illustrated in Figure 6.4. As expected, there was a further reduction in power consumption compared to the measurement with a 40 ms packet time. The reduction was the same as between the 20 ms and 40 ms packet times, i.e. 10 mA.

Figure 6.4: Power consumption with the AMR codec, 60 ms packet time and 12.2 kbps bit rate (current consumption in amperes versus time in seconds during a VoIP call with U-APSD on; roughly 0.21 A with the backlight on and 0.19 A with the backlight off)

Speech was understandable and did not crackle with a 60 ms packet time, and the subjectively estimated conversational call quality was still good. However, this large packet size is susceptible to network delay variations and bad channel conditions, and it should not be used unless reliable QoS monitoring and signal quality observation logic is available. If network conditions can be determined to be good, a 60 ms packet time can be used quite safely. The user is unlikely to notice the extra 40 ms of delay compared to operation with a 20 ms packet time. Instead, the user is probably pleased with half an hour of additional talk time compared to the talk time with a 20 ms packet time (5 h 43 min versus 6 h 19 min with a 1200 mAh battery).
7 CONCLUSIONS AND FUTURE WORK

The main objective of this thesis was to examine how a VoIP application could adjust its operation to the present conditions in its environment in a power-aware manner. Besides battery life, another big factor affecting VoIP call QoE is perceived call quality. These factors interrelate, and optimizations in one area can cause degradation in another. For that reason, energy-saving means were studied together with the factors affecting perceived call quality.

In this thesis we focused on programmatic, application-layer means for better energy efficiency in the mobile VoIP application domain. Besides application-layer means, we also discussed improvements in the other layers of software. We concentrated on optimizations which are suitable for mobile VoIP implementations utilizing SIP and IEEE 802.11 technologies.

We identified three relevant optimization areas for mobile VoIP applications: first, attention should be paid to general software efficiency; secondly, wireless interface utilization can be made more energy-efficient; and finally, display usage can be optimized. We conducted a literature survey on these areas and provided a theoretical background on what can be done in the field of energy management. Basically, all adaptation strategies aim to decrease power consumption through more efficient resource usage.

Based on previous studies, we identified the factors affecting perceived call quality. We also made a survey of different voice quality estimation approaches. Based on the theory and previous work, we presented a theoretical quality model for real-time call quality estimation. The quality model can be integrated into the optimization scheme in question. To be able to utilize the quality model on the transmitter side, some detailed statistics regarding the VoIP call are needed from the audio receiver. The recommended, standard-compliant way to create a feedback channel is to implement support for RTCP-XR reports. Statistics calculated from basic RTCP would not be reliable enough for call quality monitoring purposes.
For the quality model, the contribution of the delay impairment factor ID to overall call quality can be modelled universally, but for the equipment impairment factor IE no comprehensive public measurement data is available. More extensive measurements of impairment factors with different codecs and network metrics are needed to be able to adapt operation in a call-quality-aware manner.

We identified some potential optimization schemes for better energy management in the area of wireless interface utilization. The schemes covered transmitter-side adaptation mechanisms and were related to dynamically changing transmission properties according to the current network conditions. When call quality is considered, we must observe whether call quality is deteriorating due to any optimization attempt. We found that quality-conserving adaptation decisions cannot be made merely on the basis of packet loss or jitter metrics, because those metrics can indicate either network congestion or weak signal strength conditions at the receiver end, and in these two scenarios the adaptation strategies conflict with each other. Without signal strength knowledge, adaptation can only be based on trial and error, so that when one strategy seems to fail, another one is tried.

We found that some optimization means are better implemented below the application layer. For example, throughput optimization with link-layer fragmentation can save energy without deteriorating call quality when the local signal strength is weak. However, we showed that optimizations in different architectural layers can be complementary to each other. For example, dynamic transmit power control can save power regardless of application-layer optimizations. The drawback is that this particular technique is possible only if 802.11h is supported.

In this thesis only basic power consumption measurements were done, and more comprehensive measurement data is needed to find suitable thresholds for the proposed algorithms and to make them reliable. Measurements should be done in a controlled environment where different network properties such as jitter, packet loss and signal strength can be adjusted. Nonetheless, we have shown in this study that adaptive optimization schemes can reduce power consumption.
