Adaptive Optimization Schemes for Mobile VoIP Applications - Battery Life and Call Quality Aspects
Lappeenranta University of Technology
Faculty of Technology Management
Department of Information Technology
ADAPTIVE OPTIMIZATION SCHEMES FOR MOBILE VOIP APPLICATIONS
BATTERY LIFE AND CALL QUALITY ASPECTS
Examiners: Professor Jari Porras and D.Sc. Kari Heikkinen
Instructor: M.Sc. Markus Kaikkonen
Tervakukkatie 46 C 18
Tel. +358 40 715 8418
Lappeenranta University of Technology
Faculty of Technology Management
Department of Information Technology
Adaptive optimization schemes for mobile VoIP applications
Battery life and call quality aspects
Thesis for the Degree of Master of Science in Technology
61 pages, 12 figures, 3 tables and 1 appendix
Examiners: Professor Jari Porras
D.Sc. Kari Heikkinen
Keywords: VoIP, QoS, 802.11, power management, battery life, adaptation, mobility
In this thesis, programmatic, application-layer means for better energy-efficiency in the
VoIP application domain are studied. The work concentrates on optimizations suitable
for VoIP implementations utilizing SIP and IEEE 802.11 technologies.
Energy-saving optimizations can have an impact on perceived call quality, and thus
energy-saving means are studied together with the factors affecting perceived call
quality. This thesis gives a general view of the topic. Based on theory, adaptive optimization
schemes for dynamic control of the application's operation are proposed. A runtime
quality model, capable of being integrated into the optimization schemes, is developed for
VoIP call quality estimation.
Based on the proposed optimization schemes, power consumption measurements are
performed to determine the achievable benefits. The measurement results show that a
reduction in power consumption can be achieved with the help of adaptive optimization schemes.
Lappeenranta University of Technology
Adaptive optimization schemes for mobile VoIP applications
Battery life and call quality aspects
61 pages, 12 figures, 3 tables and 1 appendix
Examiners: Professor Jari Porras
D.Sc. Kari Heikkinen
Keywords: VoIP, QoS, 802.11, power management, battery life, adaptation, mobility
This thesis studies programmatic, application-layer means for reducing power
consumption in the VoIP application domain. The work concentrates on optimization
methods suitable for implementations utilizing SIP and IEEE 802.11 WLAN technologies.
Energy-saving optimizations can affect the call quality perceived by the user, so
energy-saving means are studied together with the factors related to call quality.
The work forms a theoretical overall picture of the topic and develops adaptive
optimization schemes with which operation can be adjusted dynamically to suit the
prevailing conditions. For estimating VoIP call quality, a quality model suitable for
runtime use is introduced, which can be integrated into the optimization schemes.
Based on the proposed optimization schemes, some power consumption measurements
are performed to determine the achievable benefits. The measurement results show that
power consumption savings can be achieved with the help of adaptive optimization schemes.
1 INTRODUCTION .....................................................................................................5
1.1 Objectives of the thesis ..................................................................................5
1.2 Scope of the work...........................................................................................6
1.3 Structure of the thesis.....................................................................................6
2 VOIP OVER IEEE 802.11 WLAN............................................................................7
2.1 Overview of Internet telephony.....................................................................7
2.2 Overview of 802.11 standards......................................................................11
2.3 Mobile VoIP special characteristics.............................................................13
3 BATTERY LIFE IMPLICATIONS OF VOIP........................................................15
3.1 Power-aware software development ............................................................15
3.2 Power management of the wireless interface...............................................17
3.2.1 Energy efficiency of IEEE 802.11 WLAN..............................................19
3.2.1.1 802.11 Legacy power save .................................................................20
3.2.1.2 802.11e U-APSD (Unscheduled Automatic Power Save Delivery)...21
3.2.2 Link adaptation strategies for energy-efficiency.....................................23
3.2.2.1 Transmission rate adaptation ..............................................................23
3.2.2.2 Adaptive packet size control...............................................................24
3.2.2.3 Transmission power control (TPC) ....................................................25
3.3 Power management of display .....................................................................27
4 ASSESSING VOIP CALL QUALITY....................................................................29
4.1 Factors affecting perceived call quality .......................................................29
4.2 Approaches for voice quality assessment ....................................................32
4.2.1 The E-Model............................................................................................35
4.3 Quality model for VoIP call quality estimation ...........................................37
5 OPTIMIZATION SCHEMES FOR MOBILE VOIP..............................................39
5.1 Adaptive packet rate control at application layer.........................................39
5.2 Adaptive transmission time determination...................................................42
5.3 Throughput optimization by codec and codec mode adaptation..................43
6 MEASUREMENTS AND EVALUATION OF THE RESULTS...........................47
6.1 Measurement setup.......................................................................................47
6.2 Measurement results.....................................................................................48
7 CONCLUSIONS AND FUTURE WORK..............................................................52
SYMBOLS AND ABBREVIATIONS
3GPP 3rd Generation Partnership Project
AMR Adaptive Multi-Rate is a speech data compression scheme.
AMR-WB Adaptive Multi-Rate Wideband is a wideband version of the narrowband AMR codec.
ANN Artificial Neural Network
AP Access Point
CPU Central Processing Unit
DCF Distributed Coordination Function
DTIM Delivery Traffic Indication Map
DTX Discontinuous Transmission is a technique used by the speech codecs to
lower bandwidth requirements.
ETSI European Telecommunications Standards Institute
FEC Forward Error Correction is an error control scheme for data transmission
based on adding redundant data to the carried data packets.
G.711 G.711 is an ITU-T standard for audio compression.
G.729 G.729 is a voice data compression algorithm producing audio frames of 10 ms.
GPRS General Packet Radio Service
GPS Global Positioning System
GSM Global System for Mobile Communications is a widely used digital mobile
telephony system.
HCF Hybrid Coordination Function
IEEE Institute of Electrical and Electronics Engineers is an international
professional society promoting the development of electrotechnology. The
IEEE fosters many industrial standards.
iLBC Internet Low Bit rate Codec
IP Internet Protocol
ITU International Telecommunication Union
ITU-T International Telecommunication Union – Telecommunication Standardization Sector
LAN Local Area Network
MAC Medium Access Control
MOS Mean Opinion Score
MTU Maximum Transmission Unit
NTP Network Time Protocol is a protocol used to synchronize computer clock
times in a network of computers.
OS Operating System
OSI Open Systems Interconnection
PC Personal Computer
PCM Pulse Code Modulation
PLC Packet Loss Concealment
PCF Point Coordination Function
PESQ Perceptual Evaluation of Speech Quality
PS Power Save
QoS Quality of Service
QoE Quality of Experience
RF Radio Frequency
RTCP Real-Time Transport Control Protocol
RTP Real-Time Transport Protocol
SDK Software Development Kit
SDP Session Description Protocol
SID Silence Indicator
SIP Session Initiation Protocol
TPC Transmit/Transmission Power Control
TU Time Unit
UI User Interface
VAD Voice Activity Detection
VoIP Voice over Internet Protocol
VoWLAN Voice over Wireless Local Area Network
WLAN Wireless Local Area Network
In recent years smart phones have evolved into multimedia computers. CPU speeds,
bandwidth of wireless interfaces, and available memory have increased rapidly to
meet functional requirements. On-board cameras are a standard accessory, and GPS
capable phones are becoming common. All that feature richness has increased power
consumption while battery technology has not been able to improve battery capacity at the
same rate. For that reason, it is increasingly important to limit power consumption on
different design levels. One approach is application-level optimization, which aims to
achieve efficient resource usage within the application's domain.
In the mobile VoIP application domain, battery life directly affects available talk time,
which is one of the most important factors in VoIP quality of experience (QoE). Talk time
should match that of GSM cellular terminals to offer comparable usability and assure good QoE.
However, concentration on energy efficiency alone is not enough. As we will show,
software optimizations designed to enable better energy efficiency can have an impact on
perceived call quality, which is another major factor of VoIP QoE. Therefore battery life
optimizations must be studied with call quality concerns in mind.
1.1 Objectives of the thesis
The main objective of this thesis is to find means to reduce power consumption in the
mobile VoIP application domain using software. Power saving techniques can have an
impact on perceived speech quality, which presents us with a trade-off optimization
problem to be solved. This work concentrates on those optimization schemes which are
feasible to implement in practice, taking into account the constraints set by the protocols
used and interoperability concerns.
In this work a survey of previous studies and research in the field of power management and
adaptive VoIP is done, and based on the most promising techniques, optimization schemes
are introduced. For call quality estimation, a run-time call quality assessment algorithm is
developed. Finally, some basic power consumption measurements are done to find out the
achievable advantages of the proposed optimization schemes.
1.2 Scope of the work
In this thesis, application layer improvements are studied to make a mobile VoIP
implementation more energy efficient. Other means, such as protocol improvements, are
out of the scope of this thesis, though they may be mentioned in the work. Hardware issues
are not a concern either. The work is mainly a theoretical analysis of what can be done, and
further studies are needed to adapt the results to real-life VoIP implementations.
The main domain for the work is SIP VoIP over 802.11 WLAN. However, optimization
schemes should be generic enough to be usable with other signalling protocols. A
Symbian OS based mobile terminal is used as a reference environment for the analysis and
in the practical part of the work.
1.3 Structure of the thesis
After this introduction, the thesis continues with Chapter 2, which provides an overview of
VoIP technology and wireless networks. Chapter 3 dives into previous studies on
battery life and energy management in various areas. Chapter 4 discusses call quality
assessment methods in general, and a quality model for call quality evaluation is
proposed. Chapter 5 presents optimization schemes for an adaptive VoIP application based on
a literature survey. Chapter 6 presents some introductory power consumption measurements
intended as groundwork for future study, and discusses differences in power consumption
between different test setups. Chapter 7 summarizes the results of the whole work and points
out some topics for further research.
2 VOIP OVER IEEE 802.11 WLAN
In this chapter we present an overview of VoIP technology and IEEE 802.11 WLAN
standards and discuss some characteristics of mobile VoIP.
2.1 Overview of Internet telephony
Internet telephony allows for the provision of voice services across networks using internet
protocols. Internet telephony consists of signalling and transmission protocols. Voice
sessions are established, controlled and terminated with signalling protocols like SIP or
H.323, and media are carried with transport protocols like RTP.
The principal components of a VoIP system covering end-to-end voice transmission are
presented in Figure 2.1. First, the sender captures the speech signal and compresses it
using a codec (encoding). Then the compressed media frames are packetized and sent to the
network. The receiver performs depacketization, and the frames are passed to the codec
decoder through a playout scheduler. Finally, the frames are played out through the speaker.
Figure 2.1: Block diagram of media processing
The main components of a VoIP system participating in media processing in order to
deliver end-to-end media are: acoustic processing, speech codecs, transport and network
layer protocols, IP network, and a playout scheduler.
Acoustic processing represents a very important phase in the overall media path, affecting
voice quality. Acoustic processing maintains a pleasant level of input and output signals. It
filters out background noise so that speech can be separated from the acoustic signal.
Acoustic processing must also take care of echo cancellation; the speaker signal is easily
fed back to the microphone if a handset isn't used. In this thesis, however, acoustic
processing isn't considered in detail.
The purpose of speech codecs is to compress digitized speech in order to lower
bandwidth requirements while maintaining acceptable speech quality. Characteristics of
speech codecs include: coding rate (bit/s), frame rate (Hz), algorithmic delay (ms)
introduced to the voice processing, complexity, and speech quality (MOS).
Because human conversation is intermittent by nature, some codecs use discontinuous
transmission (DTX), a.k.a. silence suppression, as a technique to lower bandwidth usage. A
codec can use a voice activity detection (VAD) algorithm to decide when to enter
discontinuous transmission mode. During a DTX period, the speech codec generates
silence description frames at a lower rate than speech frames. Silence description frames
can be used to generate artificial background noise at the receiver end so that the user
does not assume the call has been dropped. If the codec does not support generation of silence
description frames, the terminal does not need to send anything during a silent period. On
the receiving side, speech decoders usually use a packet loss concealment (PLC) algorithm
to compensate for missing frames. Compensation can be done by repeating the last
received frame or by extrapolating.
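The DTX decision described above can be sketched as a simple control loop. The following Python fragment is an illustrative sketch only, assuming a naive energy-threshold VAD; the frame length, threshold, and SID interval are invented values, not taken from any particular codec standard.

```python
# Sketch of a DTX decision loop driven by a simple energy-threshold VAD.
# FRAME_MS, ENERGY_THRESHOLD and SID_INTERVAL are illustrative values.

FRAME_MS = 20          # one speech frame per 20 ms
SID_INTERVAL = 8       # during DTX, emit a SID frame every 8th frame slot
ENERGY_THRESHOLD = 1e-3

def frame_energy(samples):
    """Mean squared amplitude of one PCM frame (normalized floats)."""
    return sum(s * s for s in samples) / len(samples)

def classify_frames(frames):
    """Yield 'SPEECH', 'SID' or 'NONE' for each input frame."""
    silent_count = 0
    for samples in frames:
        if frame_energy(samples) >= ENERGY_THRESHOLD:
            silent_count = 0
            yield "SPEECH"          # encode and send a full speech frame
        else:
            # DTX period: send a silence descriptor only occasionally so the
            # receiver can synthesize comfort noise; send nothing otherwise.
            if silent_count % SID_INTERVAL == 0:
                yield "SID"
            else:
                yield "NONE"
            silent_count += 1
```

During long silent periods the sender thus transmits only one SID frame per eight frame slots, which is the bandwidth (and, for a wireless sender, energy) saving that DTX provides.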
Speech codecs are designed with specific goals in mind, which makes each suitable for
particular situations, such as coping with packet losses in the transmission path. This
provides an interesting optimization opportunity through codec change adaptation. In the
following paragraphs the most common codecs used in VoIP systems are introduced.
ITU-T G.711. G.711 is perhaps the most commonly used codec, because it provides good
speech quality (MOS 4.3) and low complexity due to its simple linear quantization. It does
not introduce algorithmic delay because coding is performed sample by sample. This comes
at the cost of relatively high bandwidth requirements, a total of 64 kbps. G.711 has two
variants: A-law and µ-law. µ-law is used in the Americas while the rest of the world uses A-law.
ITU has standardized a packet loss concealment algorithm (ITU-T G.711 Appendix I), which
limits the effects of occasional packet losses.
ITU-T G.729. G.729 partitions speech into 10 ms frames, each corresponding to 80 bits.
The encoder takes 16-bit linear PCM data sampled at 8 kHz as input data and produces 8
kbps coded data. The algorithmic delay of the coder is 15 ms (10 ms input frame and 5 ms
look-ahead). The codec includes a packet loss concealment algorithm. G.729 has coding
memory, which makes it vulnerable to frame errors and packet losses; an error occurring
within a previous frame affects the processing of the next one. 
Adaptive Multi-Rate. AMR was originally developed for GSM, but it has also been selected
as a mandatory 3GPP codec. Bits of an AMR frame are ordered into three significance classes,
which makes it possible to decode a frame partially even if some part of it is corrupted.
AMR is capable of changing its bit rate dynamically, which allows it to adapt to the capacity
of the transmission channel. Supported bit rates range from 4.75 kbps to 12.2 kbps. AMR has a
built-in forward error correction mechanism based on redundant data sending, which can
be used to dynamically protect flow against transmission errors. It utilizes receiver side
PLC and sender side VAD algorithms to compensate for frame losses and lower bandwidth
usage during silent periods. 
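As a sketch of how such bit rate adaptation might be driven at the application layer, the following hypothetical Python fragment steps through the standard AMR mode set based on observed packet loss. The loss thresholds are invented for illustration; a real policy would also consider available channel capacity.

```python
# Hypothetical AMR mode adaptation: step the bit rate down when observed
# packet loss is high (favouring robustness) and up when the channel is
# clean (favouring quality). The 5% / 1% thresholds are invented values.

AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]

def adapt_mode(current_index, loss_ratio):
    """Return the new mode index given recent packet loss (0.0-1.0)."""
    if loss_ratio > 0.05 and current_index > 0:
        return current_index - 1      # lossy channel: lower the bit rate
    if loss_ratio < 0.01 and current_index < len(AMR_MODES_KBPS) - 1:
        return current_index + 1      # clean channel: raise the bit rate
    return current_index              # otherwise keep the current mode
```

Stepping one mode at a time avoids abrupt quality changes that the listener would notice; hysteresis between the two thresholds prevents oscillation when the loss ratio hovers near a single boundary.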
Adaptive Multi-Rate Wideband (AMR-WB). AMR-WB provides better speech quality
due to a wider speech bandwidth but works otherwise in the same manner as narrow-band
AMR. Supported bit rates range from 6.60 kbps to 23.85 kbps. 
iLBC. iLBC has been developed by Global IP Sound and is designed for narrowband
speech. It is especially good at tolerating packet loss and thus suitable for use in lossy
channels. Speech quality is on the same level as G.729, but in lossy channels iLBC
outperforms G.729.
Some properties of the introduced codecs are presented in Table 1.

Table 1: Common VoIP codecs and their properties

Codec    Coding rate (kbps)  Speech quality (MOS)  Frame size (ms)
AMR      4.75-12.2           3.5-4.0               20
AMR-WB   6.6-23.85           up to 4.5             20
G.711    64.0                4.5                   -
G.729A   8.0                 4.0                   10
iLBC     13.33, 15.2         4.0                   30, 20
After the VoIP application has captured speech frames, it concatenates one or more frames
into one packet (packetization) according to the format used. When the outgoing packet
traverses the protocol stack, every protocol layer adds its own headers to the packet. In an
IP network every packet can traverse its own path to the receiver. Packets can get lost due to
congestion, transmission errors or connection outages (in wireless networks).
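The bandwidth cost of packetization and header overhead can be illustrated with a small calculation. The Python sketch below assumes typical header sizes (RTP 12 B, UDP 8 B, IPv4 20 B) and ignores link-layer overhead.

```python
# Rough one-direction bandwidth estimate for a VoIP stream, assuming
# typical header sizes: RTP 12 B, UDP 8 B, IPv4 20 B. Link-layer
# (e.g. 802.11 MAC) overhead is ignored in this sketch.

RTP, UDP, IPV4 = 12, 8, 20  # header sizes in bytes

def voip_bandwidth_kbps(codec_rate_kbps, frame_ms, frames_per_packet=1):
    """Total IP-level bandwidth (kbps) for one direction of a call."""
    payload_bytes = codec_rate_kbps * 1000 / 8 * frame_ms / 1000 * frames_per_packet
    packet_bytes = payload_bytes + RTP + UDP + IPV4
    packets_per_s = 1000 / (frame_ms * frames_per_packet)
    return packet_bytes * 8 * packets_per_s / 1000
```

For G.729 (8 kbps, 10 ms frames) this gives 40 kbps with one frame per packet and 24 kbps with two, which shows how packing more frames per packet cuts header overhead at the cost of added delay.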
A playout scheduler, a.k.a. a de-jittering buffer, temporarily buffers received media
frames to compensate for variations in packet transmission times, known as jitter. If a
packet arrives too late to be played out on time, it is usually dropped. Thus the frame loss
metric seen by the VoIP application is the sum of real transmission loss and excessive
delay introduced by a possibly congested network. Because the jitter buffer introduces
additional delay to the end-to-end transmission time, it must not keep frames buffered
longer than necessary; frames arriving too late should be dropped instead. Finding the
optimal trade-off between end-to-end delay and dropped packet rate is one of the most
critical tasks in the design of a VoIP application.
The playout scheduling scheme can be either static or adaptive. With the static scheme
packets are discarded if they exceed a fixed maximum transmission time. An adaptive
playout buffer adjusts the playout time dynamically based on the delay process of the
network. The easiest moments for adjusting the schedule are silent periods, because then
adjustments are least noticeable.
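An adaptive playout buffer of this kind is often built on an autoregressive delay estimator in the spirit of the classic scheme by Ramjee et al. The following Python sketch is illustrative; the smoothing constants are conventional values, not taken from this thesis.

```python
# Sketch of an adaptive playout delay estimator: track a smoothed network
# delay and its variation, and set the playout deadline a few deviations
# above the mean. ALPHA and BETA are conventional illustrative constants.

ALPHA = 0.998  # smoothing factor for the running averages
BETA = 4.0     # safety margin, in units of delay deviation

class PlayoutEstimator:
    def __init__(self, initial_delay_ms):
        self.d = float(initial_delay_ms)  # smoothed one-way delay estimate
        self.v = 0.0                      # smoothed delay deviation

    def update(self, delay_ms):
        """Feed the measured network delay of one received packet."""
        self.d = ALPHA * self.d + (1 - ALPHA) * delay_ms
        self.v = ALPHA * self.v + (1 - ALPHA) * abs(delay_ms - self.d)

    def playout_delay(self):
        """Delay to apply from the start of the next talkspurt."""
        return self.d + BETA * self.v
```

Because the new playout delay is applied only at talkspurt boundaries, the adjustment happens during silent periods, where, as noted above, it is least noticeable to the listener.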
2.2 Overview of 802.11 standards
802.11 refers to a family of specifications developed by the Institute of Electrical and
Electronics Engineers (IEEE) for wireless LANs. IEEE 802.11 is a member of the IEEE
802 family, which is a series of standards for local area networks. All 802 standards are
focused on the two lowest layers of the OSI reference model incorporating both physical
and data link layers. Thus 802 networks have both MAC and physical (PHY) layers. The
MAC is a set of rules about how to access the medium and send data, whereas PHY is
comprised of detailed rules for transmission and reception. The relation of IEEE 802.11
standards to the OSI model is illustrated in Figure 2.2.
Figure 2.2: IEEE 802.11 standards and relation to OSI model
The original 802.11 specification was ratified by IEEE as the first standard for wireless
LANs in 1997. The standard included two PHY layer definitions for data rates of 1 Mbps
and 2 Mbps at 2.4 GHz. The 802.11b standard was ratified in 1999, providing data rates up
to 11 Mbps at 2.4 GHz. Other additions include 802.11a for 54 Mbps at 5 GHz and
802.11g up to 54 Mbps at 2.4 GHz.
Besides PHY layer additions there have been enhancements also to the 802.11 MAC. For
example, the 802.11a standard was originally developed for the United States only and was
causing interference problems in Europe due to its 5 GHz frequency band. Interference
issues and stricter radio regulations were addressed with the 802.11h standard that defines
mechanisms for spectrum and transmit power management to the 802.11 MAC and
802.11a PHY. The transmit power reporting mechanism introduced in 802.11h makes
intelligent transmit power control feasible at the MAC layer.
802.11 networks can operate either in infrastructure or ad-hoc mode. In infrastructure
mode there is at least one centralized entity called an access point (AP), which is connected
to the wired network infrastructure, and a set of wireless end stations. All data
transmissions between stations are required to go through the AP. In ad-hoc mode wireless
stations are allowed to communicate with each other directly on a peer-to-peer basis.
Access to the shared wireless medium is controlled by MAC coordination functions. The
original 802.11 MAC specification defines the Distributed Coordination Function (DCF)
for Ethernet-like CSMA/CA access and the Point Coordination Function (PCF) for contention-
free service. DCF is responsible for asynchronous data services, while PCF offers time-bounded service.
The original MAC coordination functions cannot provide good quality of service when high
background traffic is present. IEEE has addressed this problem by standardizing the QoS
enhanced MAC sublayer specification 802.11e. In order to achieve QoS, 802.11e
defines separate priority queues for different traffic categories. This way data packets
belonging to a higher priority category (e.g. real-time VoIP) gain access to the medium
with higher probability. 802.11e defines the Hybrid Coordination Function (HCF), which
includes QoS-enhanced versions of DCF and PCF: Enhanced Distributed Channel Access
(EDCA) and HCF Controlled Channel Access (HCCA). Relationships of different
coordination functions are illustrated in Figure 2.3. DCF is a base on which other
coordination functions are built.
Figure 2.3: 802.11x MAC Coordination Functions
A contention-free service can be provided only in infrastructure networks, because it
requires an AP. Quality of service can be provided in any 802.11 network that has HCF
support in stations.
2.3 Mobile VoIP special characteristics
There are two fundamental, differentiating factors between wired and wireless VoIP:
characteristics originating from wireless medium usage and consequences of mobility.
These factors are interrelated.
The greatest challenge for mobile VoIP is compensating for voice impairments caused by the
use of a wireless network. Wireless channels have a higher error rate than wired
networks. The wireless network's range may fluctuate and there may be shadow regions with
no coverage. The available bandwidth is lower than in wired networks and can vary
more during a connection, which increases congestion problems. User motion sets its own
challenges for system design; the user can easily walk into a shadow region and
get annoyed by poor call quality. On the whole, wireless network utilization sets much
harder performance requirements for a mobile VoIP application than wired VoIP does.
Another challenge is the terminal’s processing performance. Wired VoIP clients include
software clients running on PCs and dedicated, fixed IP phones. PCs have plenty of
processing capacity available for VoIP software. Also, dedicated IP phones can be designed
to meet the performance requirements of a VoIP application. In both cases the terminal's
performance is not likely to be the limiting factor for quality.
The sufficiency of performance is not that obvious with wireless VoIP clients. Modern smart
phones are coming closer and closer to personal computers in functionality and
processing power. This feature richness comes, however, at the cost of reduced operation
time, both because of the increased processing capability and because the system design
aims to be general enough to fulfil the requirements of various applications.
Unfortunately, battery technology has advanced very slowly compared to the increase in
power consumption.
A VoIP application running on a smart phone is typically just another application among
others, without any special privileges (except for process/thread priority tuning).
Available processing capacity must be shared among various applications, which may run
simultaneously. For that reason, the VoIP application's process and threads should be
granted as high a priority as possible to ensure sufficient resources for handling the
requirements of real-time traffic. In addition, the execution platform can make efficient
implementation of a VoIP application more difficult compared to its cellular equivalent.
Symbian OS, for example, has optimized its telephony architecture for cellular calls by
having a dedicated processor for the purpose. Thus it may be a very challenging task for a
VoIP application to achieve equal talk times and quality compared with the cellular
equivalent inside the same device.
3 BATTERY LIFE IMPLICATIONS OF VOIP
The most energy consuming components of mobile devices are usually memory, display,
wireless communications, and CPU. Modern smart phones may also have a camera,
GPS and a hard drive, all of which consume considerable amounts of energy. The biggest
concern in the VoIP application domain is finding optimal utilization of the radio interface
under different network conditions. In this work, optimal radio interface utilization refers
to operation that keeps power consumption as low as possible while maintaining
acceptable call quality.
Based on the observations above, we identify more detailed factors which can affect
battery life in mobile terminals when considering the VoIP application domain. First, we
discuss power-aware software development issues and identify some general programming
techniques for battery saving and some Symbian OS specific mechanisms. After that, the
characteristics of the 802.11 radio interface are discussed in detail. Lastly, alternatives
for energy-efficient display utilization are explored.
3.1 Power-aware software development
The operating system is mainly responsible for defining and implementing the power
management architecture for energy-efficient hardware component usage. However, some
resources are controllable at the application or middleware layers either explicitly or
implicitly. For example, the camera and display can typically be controlled explicitly at the
application layer, whereas efficient CPU and memory utilization is a concern of the whole
system design.
There is no single technique which alone could solve energy consumption problems. When
system resources are used in an efficient and economical way, the operating system has
more possibilities to switch unused components to a low power state. For application
controllable resources, application design must meet energy concerns. For example, an
application should keep different radios powered only for the time it is using them.
Software architecture implicitly affects energy consumption by orienting towards more or
less CPU-intensive solutions. A thinner architecture reduces the volume of instructions
executed, and accordingly the system has more opportunities to enter a low-power state
during idle periods. For example, in Symbian OS there is a null (a.k.a. idle) thread that gets
control when there is no other thread ready to run. The null thread puts the CPU in a
low-power mode where execution stops until a hardware interrupt is asserted.
One software-architecture-level optimization possibility is the selection of the multi-tasking
model. Traditional systems typically use a process-and-thread model for multi-tasking,
but Symbian OS also provides a mechanism called the active object
framework. With active objects, multi-tasking can be performed inside a single thread,
which reduces the overhead of context switching between threads. Less overhead leads to
reduced power consumption. Accordingly, applications should prefer the active object
mechanism to threading whenever possible.
Code level optimizations are usually best left to the compiler, but with appropriate
design, better performance may be achievable. For example, timer usage must be
considered carefully because the device must wake up at every timer interrupt. Thus too
frequently firing or unnecessarily ticking timers can shorten battery life.
Presented below are some other design guidelines for achieving better power-efficiency:
- Memory usage should be minimized, not only due to the limited amount of memory
  in mobile terminals but also because moving less data around consumes less power.
  Some hardware architectures are also able to power off unused memory blocks.
- Applications should be event-driven so that an application can be put to sleep when
  there is no user interaction.
- Idle timeouts should be used for devices like the camera to release them
  automatically when they are no longer in use. That way the device is not powered
  unnecessarily if the user forgets to close the application.
- In communication applications, unused media streams should be stopped instead of
  only being muted, to avoid data being processed unnecessarily.
In general, software power optimization problems can be categorized as presented in Table 2.

Table 2: Categories of energy-related software problems

Transition    When should a component switch from one power mode to another?
Load-change   How should the components' functionality be modified to take
              advantage of low-power modes more often?
Adaptation    How can the software be modified to permit novel, power-saving
              uses of the components?
Transition (a.k.a. prediction) strategies determine when to switch from one power mode to
another. A transition strategy could define, for example, when to put the wireless interface
into power saving mode. The idle timeout mechanism described earlier is also a form of
transition strategy; its prediction is that the longer a device has been idle, the longer it
will remain idle.
Methods striving for efficient load distribution are known as load-change strategies. The
load of a component does not necessarily have to be reduced; reordering may be sufficient.
For a hard disk, for example, it is better to make several disk requests one after the other
and then spin the disk down. A poorer alternative would be to spin the disk up and down
for each request.
Adaptation strategies enable a component to be used in new power-saving ways. For
example, the quality of an audio or video stream could be degraded to reduce processor load.
3.2 Power management of the wireless interface
Two different approaches to energy savings for wireless communications are presently
available. Research has focused on dynamic transmission power control (TPC) of the radio
interface and on inserting power management logic into different layers of the OSI network
protocol stack . In the context of wireless communications, the dynamic power
management aspect focuses on when to switch the radio between different operational
modes to maximize time spent in low power states (transition strategy).
In general, energy consumption of the wireless interface depends on its operation state. A
sleeping interface is one, which has entered into a low power state and cannot receive or
transmit data. When the interface is awake, it can be said to be logically in transmit,
receive or idle/listening state. Most energy is consumed in the transmit state while in
receive and idle states energy consumption is nearly equal, because the interface must
sense the channel, even if idling. Different operational modes are illustrated in Figure 3.1.
Figure 3.1: Operation modes of a wireless interface
When a sleep-wake transition strategy is utilized, one thing to consider is the transient
energy consumed when the device switches between sleep and active modes. A
wake-up time of, for example, 2 ms means that a device saves power by entering
sleep mode only if its next transmission or reception is considerably more than
2 ms away.
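The break-even reasoning above can be made concrete with a small calculation; the power and energy figures in the example are made-up values, not measurements:

```python
def break_even_sleep_time(p_idle_w, p_sleep_w, e_transition_j):
    """Minimum sleep duration for which entering sleep mode pays off.

    Sleeping for t seconds saves (p_idle_w - p_sleep_w) * t joules but
    costs e_transition_j joules for the sleep/wake transition, so the
    break-even point is t = e_transition_j / (p_idle_w - p_sleep_w).
    """
    return e_transition_j / (p_idle_w - p_sleep_w)

# Example with made-up figures: 0.8 W idle, 0.05 W asleep, 3 mJ per
# transition -> sleeping pays off only for gaps longer than 4 ms.
```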
Previous studies suggest that wireless communication generates most of its energy waste
by 1) retransmitting packets after collision on the communication medium, 2) overhearing
traffic intended for another node, 3) handling protocol control packets, and 4) listening for
packets when there is no traffic on the network . Reducing these four actions at any
layer may result in better energy efficiency.
3.2.1 Energy efficiency of IEEE 802.11 WLAN
In this section we discuss those characteristics of 802.11 standards which can have an
influence on battery life. The main interest is to analyze the applicability of power saving
features of the original IEEE 802.11 standard  and its QoS enhancements defined in
802.11e  with VoIP traffic.
A common feature for all 802.11 MAC access schemes is that detection of transmission
errors is done by the use of positive acknowledgements (ACK); every successfully
received unicast frame must be acknowledged. If a transmission error occurs, the frame is
retransmitted until it is successfully received or the maximum number of retransmissions
has been reached. Transmission errors can happen either due to mere bit errors or as a
result of interference. Interference may happen when two or more stations (e.g. mobile
devices) try to access the wireless medium simultaneously (collision). That may happen if
transmitting stations are far enough from each other not to sense a busy medium before
starting to send. This is known as the hidden node or hidden terminal problem .
Naturally retransmissions increase the time spent in transmit mode and thus increase
energy consumption. The retransmission overhead can be lowered with efficient link
utilization, which is an important optimization area. Link adaptation strategies are
described later in chapter 3.2.2.
802.11 standards have specified two power saving techniques: the 802.11 legacy power
save and the 802.11e U-APSD power save scheme. Both mechanisms define rules for how a
station can enter the low power sleep state. Based on these rules, several power
management schemes have been proposed for entering the power-saving doze state
adaptively at appropriate moments.
Using power save mode requires the presence of a central entity, which buffers packets
destined to the sleeping stations. Thus the power save mechanisms of 802.11 standards are
applicable only in infrastructure networks where access points can buffer packets. Another
reason is that in ad-hoc networks, stations act as routers and it would be detrimental to the
operation of the network if some stations were sleeping.
The IEEE 802.11 standard defines a common platform for both power management
schemes and includes definition of two different power states for each station as follows
1) Awake state. Station has fully powered WLAN interface and consumes energy for
frame transmitting/receiving and channel sensing.
2) Doze state. WLAN interface consumes very low power and the station is unable to
send or receive any frames.
Additionally, the standard defines two power management modes: active mode and power
save (PS) mode. When the station is in active mode, it can send and receive frames at any
time; it is in the awake state. When operating in PS mode, the station is normally in the doze
state and enters the awake state to transmit/receive frames. When the terminal wants to
enter power save mode, it sends an 802.11 frame to inform its current access point (AP)
of that intent. After that, the access point buffers transmissions destined to the terminal.
3.2.1.1 802.11 legacy power save
When the terminal is in power saving mode, it switches the transceiver to a low power
sleep state and wakes up periodically to receive beacon messages from the access point.
Beacon messages indicate whether the access point has buffered frames for the terminal. If
there are, the terminal asks the access point to send the buffered frames.
Access points broadcast beacons at regular intervals. That so-called “beacon interval” or
“beacon period” is measured in time units (TU) of 1024 µs. The Delivery Traffic Indicator
Map (DTIM) period specifies how often a terminal in power save mode should wake up to
listen for buffered multicast and broadcast messages from an AP. The DTIM period is
measured in beacon intervals. A typical beacon interval length is 100 TUs and the DTIM
period varies between 1 and 3. For example, if the DTIM period is 2, the terminal wakes
up every second beacon interval to check for buffered frames. The terminal can fetch
buffered frames one by one by sending so-called PS-poll frames. 
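The wake-up schedule implied by this configuration is easy to compute; the sketch below uses the typical values mentioned above:

```python
TU_US = 1024  # one 802.11 time unit (TU) in microseconds

def dtim_wakeup_interval_ms(beacon_interval_tu=100, dtim_period=2):
    """Interval at which a station in legacy power save wakes up to
    receive DTIM beacons, given the AP's beacon configuration."""
    return beacon_interval_tu * TU_US * dtim_period / 1000.0

# With a 100 TU beacon interval and a DTIM period of 2, the station
# wakes roughly every 205 ms to check for buffered frames.
```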
In general, WLAN implementations do not support power save in ad hoc mode. In ad hoc
mode, all nodes in the network act as routers and repeaters for the other nodes. If some
nodes are sleeping, overall network performance would therefore not be acceptable.
The pitfall of the legacy power save is that it introduces delays to communication (AP to
station) while the station is operating in PS mode. If the beacon interval is set at the typical
100 TUs, the station will receive downlink frames in a burst once per 100 milliseconds.
This addition to the end-to-end delay may degrade service quality noticeably with VoIP
calls, where end-to-end delay should not exceed 250 ms . In addition, the bursty
reception of downlink frames sets high requirements for jitter buffer implementation so
that the rate of dropped packets would not increase significantly.
There are also other quality of service issues with legacy power save. The standard
mandates that the PS-poll frame must use best effort priority instead of the higher voice
priority because the station may not know the priority of buffered frames at the AP. Thus
downlink frames are granted only best effort priority, which may degrade service quality if
there is both data and voice traffic in the network. 
Due to the above mentioned problems legacy power save utilization is not recommended
for VoIP applications. However, if battery life is to be optimized at the expense of
interaction loss, VoIP application should utilize a high quality adaptive jitter buffer to
prevent dropped packets from occurring due to bursts in downlink traffic.
3.2.1.2 802.11e U-APSD (Unscheduled Automatic Power Save Delivery)
Unscheduled automatic power save delivery is part of the 802.11e WLAN QoS standard.
U-APSD improves QoS provided to stations, which use the EDCA mechanism for channel
access. The basic idea of U-APSD is to use a specific period called an unscheduled service
period (U-SP) for downlink frame delivery to the station. The station is expected to be
awake during the U-SP. The station starts a U-SP by sending either a data or a null-data frame
in the uplink. The frame used to initiate the U-SP is called a trigger frame. When the AP
receives a trigger frame, it knows that the station is awake and can start the downlink frame
delivery. The U-SP ends when the station receives a data or a null-data frame with the EOSP
(end of service period) field set to 1 in the QoS control field. The maximum length of the U-SP
is defined by the MAX SP (maximum service period) length, which indicates the maximum
number of frames the AP can deliver during a U-SP.
Upon receiving a frame with EOSP set to 1, the station is permitted to enter doze mode.
If there are more data frames destined to the station, the AP informs the station by setting the
more data field in the MAC header to 1. When the station receives a frame with EOSP and
more data fields set, it may decide to start a new U-SP to get the remaining frames
immediately. If the station does not have data frames to send, it can send a QoS
null frame to request the delivery of buffered frames. This enables the use of U-APSD
with applications that do not generate regular uplink traffic while still meeting the QoS
requirements of the application. In this case the U-APSD operation is essentially the same as
the legacy power save operation (polling approach).
U-APSD has three main advantages with respect to 802.11 legacy power save mode :
1) Because triggers can be generated at any moment, the maximum delay for downlink
frame delivery can be bounded based on application QoS requirements instead of depending
on the beacon listen interval configuration of the 802.11 power save.
2) The overhead required to retrieve frames from the AP is smaller, because the sent data
frames trigger the delivery of downlink frames. Thus fewer null-data frames are needed,
and a significant amount of the PS-poll overhead required by the legacy power save is
avoided.
3) A U-APSD capable AP can deliver to a station up to the configured maximum service
period length in frames per trigger, whereas legacy power save requires
a PS-poll for obtaining every buffered frame.
From the VoIP point of view, the main advantage over the legacy PS is that the delivery of
downlink packets can be triggered at the appropriate moment while in power-save
mode. That way interaction loss due to increased delay in power save mode can be bounded.
This requires that the VoIP application is able to configure QoS requirements to the MAC-
layer software, which can then assure that delay requirements are met by sending trigger
frames at appropriate moments.
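A minimal sketch of such MAC-layer trigger scheduling might look as follows. The function and parameter names are hypothetical; a real implementation would live in the WLAN driver or firmware:

```python
def should_send_trigger(now_ms, last_service_ms, uplink_pending,
                        delay_budget_ms=50):
    """Decide whether a station in U-APSD power save should start a new
    unscheduled service period (U-SP).

    An outgoing data frame acts as a trigger frame by itself; otherwise
    a QoS null frame is sent once the downlink delay budget given by the
    application is about to be used up.
    """
    if uplink_pending:
        return True  # uplink data doubles as the trigger frame
    return now_ms - last_service_ms >= delay_budget_ms
```

With a 50 ms budget, the station stays in doze state as long as uplink traffic keeps triggering service periods, and falls back to explicit QoS null triggers only during uplink silence.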
Application-level generation of trigger frames would have a much higher overhead due to
the addition of higher level protocol headers and would produce end-to-end traffic,
further increasing network load. There would also be a risk that an inappropriately selected
trigger frame format poses interoperability problems when processed at the receiver side.
Application-level trigger frame generation is also problematic if downlink traffic is not
regular. In that case the AP may not have any buffered frames when trigger frames are
sent. The terminal needs to wake up at beacon interval anyway, and if blind polling is also
used, energy efficiency would decrease due to unsynchronized sleep-wake transitions.
3.2.2 Link adaptation strategies for energy-efficiency
Link adaptation refers to the automatic operations a mobile station performs to adapt its
transmission parameters to the current conditions of the wireless channel. Typically an
adaptation criterion is maximum throughput, which is also beneficial from an energy-
saving point of view: good throughput is achieved by minimizing retransmissions and
protocol overhead. Basic parameters to adapt according to the channel condition are PHY
data rate, packet size, and transmission power.
The basic 802.11 standard does not define any procedures for rate switching or
transmission power control; only boundary conditions for the operation are defined. For
example, the standard defines 4 and 8 power levels for DSSS and FHSS, respectively. These
power levels are implementation-dependent. The standard also defines the Received Signal
Strength Indicator (RSSI), which is used by some adaptation algorithms. 
Because the standards leave adaptation algorithms open, numerous algorithms for
rate adaptation and transmission power control have been proposed. Algorithms measure
signal quality either directly by the signal-to-noise-ratio (SNR) or indirectly by observing
how many frames must be retransmitted. Some proposed algorithms rely on feedback from
the receiver. These proposals mostly do not conform to the 802.11 MAC standard as such,
because basic 802.11 standards do not define any protocol means for the receiver to give
feedback. There is, however, an emerging 802.11h standard that provides feasible mechanisms
for feedback-based link adaptation when it comes to transmission power control .
3.2.2.1 Transmission rate adaptation
Data rate adaptation can save power by minimizing the time spent in the transmitting state.
With good channel conditions the time spent in the transmit state per frame is inversely
proportional to the data rate, so the highest data rate is preferable. But the higher the data
rate, the more vulnerable the transmission tends to be to bit errors. One of the reasons is the
higher intersymbol interference at higher rates.
The simplest approaches use a single parameter for link quality estimation. For example,
Chevillat et al. have proposed a link adaptation algorithm , which relies solely on
the 802.11 error recovery mechanism and keeps track of how many retransmissions are
needed. This proposal has the drawback that it neglects the hidden station problem: the
receiver may fail to receive frames due to interference from a station within range of the
receiver but hidden from the transmitter. In this situation, the retransmission count does not
indicate the quality of the link.
In  the authors propose a link adaptation strategy for improving the system throughput
by adapting the transmission rate to the current link condition. The proposed algorithm
uses Received Signal Strength (RSS) along with the number of retransmissions. The idea is
that when the receiver uses a fixed transmission power, RSS should be indicative of the
changes in path loss and channel behaviour. With that restriction, the method is applicable
for infrastructure and ad-hoc networks and all physical layer specifications. With RSS
measurements the algorithm is able to react to moving stations better than simpler methods
based purely on retransmission counts.
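A simplified sketch combining retransmission counts with received signal strength could look like this; the rate set, thresholds and step rules are illustrative assumptions, not the algorithm proposed in the cited work:

```python
RATES_MBPS = [1, 2, 5.5, 11]  # e.g. the 802.11b PHY rates

def adapt_rate(rate_idx, retries, rss_dbm,
               rss_up_dbm=-65, rss_down_dbm=-80, retry_limit=3):
    """Pick the next PHY rate index from the retry count of the last
    frame and the received signal strength.

    Step down on heavy retransmission or a weak signal; step up only
    when the last frame went through cleanly and the signal is strong.
    """
    if retries >= retry_limit or rss_dbm < rss_down_dbm:
        return max(0, rate_idx - 1)                     # channel looks bad
    if retries == 0 and rss_dbm > rss_up_dbm:
        return min(len(RATES_MBPS) - 1, rate_idx + 1)   # channel looks good
    return rate_idx                                     # keep current rate
```

The RSS condition lets the algorithm react to a moving station even before the retry count rises, which is the advantage the combined approach claims over purely retry-based methods.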
3.2.2.2 Adaptive packet size control
Adaptive packet size control mechanisms rely on the observation that when radio channel
conditions deteriorate, the probability of bit errors in transmission grows. The bigger the
packet, the higher the probability that it is lost due to a transmission error. With WLAN this
means that MAC-layer retransmissions also happen more often, which has a detrimental
effect on the energy efficiency of the mobile device. Lettieri et
al. have studied MAC-layer packet size control mechanisms in  and .
The WLAN MAC defines fragmentation and reassembly mechanisms which make this kind of
link adaptation possible to implement at the MAC level by adapting the MTU size
dynamically. Link layer fragmentation has the advantage that only the lost fragment must
be retransmitted in contrast to network layer fragmentation where the entire packet must be
resent end-to-end if any of the fragments are lost. The drawback is that the relative
overhead increases when each data fragment is encapsulated with MAC & PHY layer
headers. Link layer adaptation may, however, boost speed over a single hop when the
wireless medium is noisy.
Packet size adaptation logic can also be inserted at the application layer, but with more
restrictions. The VoIP media engine only has control over how many audio frames are
encapsulated within one RTP packet, which makes the adaptation resolution coarser than
what is achievable with the link layer approach. With sample-based codecs such as G.711 one
can in theory have a resolution of one sample, but this would most likely cause
interoperability problems. Also the protocol overhead would be too big in comparison to
the transmitted payload.
The application layer has the most information regarding the overall situation such as end-
to-end packet loss due to e.g. congestion in network, which suggests that overall
performance should be optimized at the application layer. Thus a promising optimization
option for energy efficient VoIP implementation is to adapt packet size according to the
application specific performance goals.
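As a sketch of such application-layer adaptation, the number of audio frames bundled per RTP packet could be adjusted as follows; all thresholds and the 20 ms frame duration are illustrative assumptions:

```python
def adapt_frames_per_packet(fpp, end_to_end_loss, mac_retry_rate,
                            min_fpp=1, max_fpp=6):
    """Adjust how many 20 ms audio frames go into one RTP packet.

    Heavy MAC retransmission or end-to-end loss favours smaller packets
    (each lost packet costs less speech); a clean link favours bigger
    packets (less RTP/UDP/IP/MAC header overhead per payload byte).
    """
    if mac_retry_rate > 0.20 or end_to_end_loss > 0.05:
        return max(min_fpp, fpp - 1)
    if mac_retry_rate < 0.05 and end_to_end_loss < 0.01:
        return min(max_fpp, fpp + 1)
    return fpp
```

Note that growing the packet also grows the packetization delay, so in a real implementation the upper bound would be derived from the mouth-to-ear delay budget.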
3.2.2.3 Transmission power control (TPC)
Controlling transmission power is a promising approach because the amplifier is a
significant source of power drain in any mobile device. In addition to saving energy, TPC
can have additional benefits such as an increase in network capacity due to smaller
interference ranges. The drawback is that even if the transmitting terminal gains access to the
medium more often with more efficient spectrum usage, TPC has been shown to aggravate
the hidden node problem under contention-based MAC access schemes like DCF .
TPC is a more direct energy-saving approach than the indirect packet size and transmission
rate adaptation methods. Authors have typically suggested joint TPC mechanisms where
transmit power is adjusted according to the packet size, data rate or channel state.
The easiest network configuration for applying TPC is an infrastructure network with PCF
access scheme, because then the hidden node problem is not a concern. Qiao et al. have
proposed a method where both transmit power and transmission rates are adaptively
chosen . The authors showed that combined adaptation saves considerably more
energy than pure rate adaptation schemes. That approach requires a feedback mechanism
for path loss estimation and thus the method is not suitable to implement in practice until
the 802.11h standard has emerged.
In  the authors propose fully 802.11 standard-compliant methods to jointly optimize
transmission power and data rate. The authors introduce algorithms for high-performance
and low-power operation modes. In high-performance mode, the optimization goal is to
transmit at the highest possible transmit rate so that system throughput is maximized. The
transmit power is of secondary importance. In the low-power mode, the optimization
strategy is to send with the lowest possible power, and the data rate is then adjusted
accordingly. It is shown that in most cases the best choice is to send data frames at
the highest possible rate with rather high power levels.
One possible drawback of the methods defined in  is that the results are based on the
assumption that ACK frames are sent with maximum power. This may slightly reduce the
applicability of the method: in heterogeneous ad-hoc networks all stations do not necessarily
behave in the same way. From the results it can be seen that improper step size selection for
power level adaptation algorithms can dramatically reduce throughput, the reason being that
a large decrease in transmit power results in additional transmission errors. For that reason
the algorithm may not be applicable with all PHY specifications, because e.g. FHSS only
allows 4 transmit power levels, which results in quite large power level step sizes.
One way to minimize power consumption is to adapt transmission power according to the
packet size or vice versa. Simulation results of Ebert et al. suggest that small packets
should be sent with a low RF power while bigger packets should be sent with higher RF
transmit power . Small packets are less vulnerable to packet corruption which
decreases MAC-level retransmissions and makes energy-efficiency better.
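The packet-size-dependent power rule can be illustrated with a simple mapping; the power levels and the linear size-to-level mapping are made-up assumptions in the spirit of, not taken from, the cited simulations:

```python
def tx_power_dbm(packet_bytes, levels_dbm=(0, 6, 12, 18), max_bytes=1500):
    """Map packet size to one of a few discrete transmit power levels:
    the bigger the packet, the higher the RF power used to protect it."""
    idx = min(len(levels_dbm) - 1,
              packet_bytes * len(levels_dbm) // max_bytes)
    return levels_dbm[idx]
```

A 100-byte packet would thus go out at the lowest level and a near-MTU packet at the highest, reflecting the observation that large packets suffer more from corruption-induced retransmissions.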
Transmission power can also be adapted according to channel state information on short
time scales. Examples of this approach include  and . If energy-efficiency is to be
optimized, the basic approach is not to waste energy on overcoming bad channel conditions.
Thus the terminal should send short, low power packets when the channel is noisy and
continue with bigger, high power packets when the channel is good again. Depending on the
radio bearer, the packetization period has an effect on the overall energy-efficiency. Since
e.g. WLAN continues to send MAC-level retransmissions regardless of channel conditions,
overall energy-efficiency could be optimized by lowering the packetization period at the
application layer when bad channel conditions are detected.
3.3 Power management of display
There is little that can be done programmatically about the energy consumption of displays.
In practice, every mobile phone already takes advantage of dimming or switching off the
backlight when the user does not interact with the UI. Even more energy savings can be
achieved by completely switching off the display. Power consumption of the display can
also be reduced slightly by switching to monochrome mode or by lowering the refresh rate
. An unlighted display shows colours poorly anyway so switching to monochrome
mode can save power without degrading the user experience.
An application should redraw the display as infrequently as possible and redraw only the
area of the screen that needs updating. Also, updating areas outside the screen should be
avoided if the rendered area may be larger than the screen. When an application is moved to
the background, redrawing and other operations that are relevant only in the foreground
should be paused.
One approach for energy efficient display usage is zoned backlighting as proposed by
Flinn and Satyanarayanan . The authors define zoned backlighting as a display feature
that allows independent control of the illumination level for different regions of the screen
under software control. The idea is most usable in multi-windowing environments where
the application window in focus could be highlighted by dimming other windows.
Typically the applications of current mobile terminals cover the entire display, because
display sizes are quite small. However, the partial display idea is used in the S60 SDK
screen saver framework, which allows defining the active screen area where the screen
saver plug-in is going to draw during the next refresh period.
A telephony application domain specific optimization could be switching off the backlight
and/or display immediately when a conversation starts and the phone is held against the ear,
without waiting for the default user inactivity period.
4 ASSESSING VOIP CALL QUALITY
Adaptation strategies are often complex and include different trade-offs between desired
qualities. Tuning one parameter may lead to a performance increase in one particular area
but may have undesired impacts on the overall call quality. For example, decreasing the
dropped packet rate by increasing jitter buffer length improves listening-only speech
quality but may decrease conversational call quality due to increased mouth-to-ear delay.
For that reason, effects on the overall call quality should be taken into account when
making optimizations to the VoIP transmission path.
In this chapter we first study factors affecting call quality as perceived by the user and then
make a survey on relevant state-of-the-art methodologies for call quality assessment.
Finally, based on the survey, a quality model suitable for a real-time VoIP call quality
estimation is proposed.
4.1 Factors affecting perceived call quality
In this section factors affecting perceived voice quality are briefly described. Most
impairment to voice quality is introduced by the packet switched network and the wireless
interface with their time-varying capacity, packet loss rate, and delay. Of course, the quality
of the terminal’s acoustic processing has an impact on voice quality, but that is not considered
in this thesis.
There are two primary qualities, which affect perceived VoIP call quality: end-to-end delay
and speech quality. These primary qualities are further affected by several other parameters,
which depend on the used VoIP configuration and network characteristics. The affecting
parameters are illustrated in Figure 4.1.
Figure 4.1: Parameters affecting VoIP call quality
End-to-end delay a.k.a. latency defines the time it takes a packet to get across the packet-
switched network to its destination. A long delay makes conversation resemble a
radiophone discussion, because feedback from the other party does not arrive quickly
enough. Interaction loss starts to happen gradually when end-to-end delay exceeds 250 ms
. A longer delay may also cause an echo effect where the speaker hears his own voice.
Latency is a sum of many factors. The algorithmic delay of the used codec, the packetization
period of frames, serialization of frames, network delay, and frame buffering at the
receiver all add some delay. Also, forward error correction (FEC) schemes (like e.g. RFC
2198 ) usually add delay, because the receiver must wait for error correction data
before playing data out.
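The summation can be illustrated with a small helper that totals the delay components against the 250 ms interaction limit; the default figures are illustrative example values, not measurements:

```python
INTERACTION_LIMIT_MS = 250  # threshold above which interaction loss begins

def mouth_to_ear_delay_ms(codec_algorithmic=25, packetization=20,
                          network=80, jitter_buffer=60, fec_wait=0):
    """Total one-way delay as the sum of its components, in milliseconds."""
    return (codec_algorithmic + packetization + network
            + jitter_buffer + fec_wait)
```

With these example figures the budget is 185 ms; adding a 100 ms FEC waiting time would push it to 285 ms, past the interaction limit.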
In addition to the technical latency measure, some psychological factors can have an impact
on how good the user experience is. People may be ready to bargain over latencies in certain
situations. People are, for example, happy with free VoIP calls even if delays may be
longer than in cellular calls. Longer talk times are also usually better than short end-to-end
delay when the user does not have the possibility to charge the battery for a while and has
important calls to make.
Speech quality. Codecs have a great impact on perceived voice quality. In addition to the
absolute coding quality, the ability to tolerate network introduced impairments differs
among codecs. Codecs have been typically designed to perform well in certain
environments. For example, AMR can adapt its bit rate according to current network
congestion, whereas the iLBC codec has been designed to perform well in a lossy channel.
Silence suppression a.k.a. DTX may affect speech quality indirectly by lowering the used
bandwidth and thus the network load, which helps with impairments due to network
congestion. Silence suppression can also degrade call quality if total silence is not
compensated at the receiver. Typically VoIP capable devices generate artificial, low
volume noise during silent periods so that the user does not start to believe that the call is
dropped. That mechanism is known as comfort noise generation. With RTP, silence
description parameters for comfort noise generation at the receiver can be sent as specified
in RFC 3389 .
The acoustic processing capabilities of the device have an impact on speech quality. The
device must keep loudness at a pleasant level in various conditions (gain control).
Background noise should be eliminated from the input signal before the encoding process
to achieve a better encoding result. The device may also receive a delayed version of the
local speaker’s voice, which must be addressed by some kind of echo cancellation mechanism.
Dropped packet rate. This defines the frequency with which a packet does not get played out
at its destination in the available time. The term includes lost packets, packets dropped due
to bit errors, and packets which arrive too late to be played out. PLC algorithms can
compensate for losses of individual frames, but consecutive losses may cause the user to
suffer from noticeable quality degradation.
Packet loss is mainly due to congestion and connection outages. Some network elements
may flush their buffers in congestion situations. For the same reasons, packets may come
too late for playout. The dropped packet rate can be reduced by proper adjustment of the
playout buffer and by using some forward error correction scheme. Although FEC schemes
can help in packet loss and bit error situations, they may increase congestion and thus
packet loss, because redundancy sending consumes more bandwidth.
The playout buffer is the primary means for jitter and packet reordering compensation. Jitter
can be defined as variance in packet arrival times. The jitter distribution depends on both
the network path packets are traversing and competing traffic sharing the same path.
Competing traffic is the primary cause of jitter, resulting in varying queuing delays at the
intermediary routers. Packets may even arrive reordered at the receiver due to the different
paths packets may traverse. Finding the optimal playout buffer size is a challenging task,
because a longer buffer increases mouth-to-ear delay, which makes conversational call
quality worse. A playout buffer should be adaptively adjusted according to the present
network conditions.
Protocol performance can have an impact on the dropped packet rate, especially when the
required bandwidth is considered. Each protocol layer adds its own headers around the
voice payload to be delivered. This introduces a big overhead and can produce delays and
congestion in the network. This is a big issue especially with a low bandwidth radio bearer
such as GPRS, because the data rate remaining for payload is significantly lower. Header
compression, like e.g. Robust Header Compression described in , saves energy
consumed in communication but overall energy consumption may not decrease as much as
expected due to higher computational cost (computation – communication trade-off).
4.2 Approaches for voice quality assessment
In this section different ways to assess voice quality are categorized and methods suitable
for real-time VoIP call quality estimation are highlighted.
Two approaches for voice quality assessment presently exist: subjective tests and objective
measurements, a.k.a. instrumental evaluation. For subjective tests, ITU-T has
standardized the process for human quality assessment . According to the process,
quality is described with a Mean Opinion Score (MOS) value, which ranges from 1 (bad)
to 5 (excellent). Subjective MOS grading has been found to be the most reliable technique
of speech quality evaluation so far. The drawbacks are that subjective tests require much
work, money, and time. Repeatability is also difficult due to the natural variance in human
grading. In addition, they cannot be used for long term or large scale voice quality monitoring.
Instrumental methods predict human ratings by using mathematical quality models. Two
distinct approaches for objective evaluation exist, intrusive and non-intrusive, depending on
whether a reference speech signal is needed. Objective evaluation methods are calibrated
by data from subjective tests. The classification of speech quality assessment methods is
illustrated in Figure 4.2.
Figure 4.2: Classification of speech quality assessment methods 
Intrusive methods compare original speech fragments from the encoder with the degraded
versions of fragments at the other end of the voice path. Speech quality estimation is based
on the measured amount of distortion. Intrusive schemes are more reliable and better suited
for measuring the quality perceived by end users, because they use the original
reference signal. For the same reason they are unsuitable for monitoring real time traffic.
One popular example of intrusive methods is the Perceptual Evaluation of Speech Quality
(PESQ) , which was standardized as an ITU-T P.862 recommendation in 2001. PESQ
is suitable for narrow-band speech quality evaluation and does not consider impairments
related to two-way interaction like variable delay and listener echo.
Non-intrusive methods assess the quality of distorted speech in the absence of the
reference signal and are better suited to the supervision of network QoS. ITU-T has
standardized recommendation P.563 for non-intrusive evaluation of speech quality in
narrow-band telephone applications. There are two categories of non-intrusive
methods: signal-based and parameter-based.
Signal-based methods estimate speech quality by directly analyzing degraded speech
signals. One example of signal-based methods is ANIQUE+, proposed by Kim and Tarraf. Most signal-based schemes are listening-only models and do not consider factors
affecting conversational quality such as mouth-to-ear delay. For that reason they are not
suitable for assessing conversational quality of real-time voice.
Parameter-based methods use network and/or speech related parameters like delay and
coding mode for speech quality estimation. They try to establish the relationship between
perceived voice quality and network/non-network related parameters. Typical methods
utilizing the parametric approach are the ITU-T recommendation G.107 (E-Model) and artificial neural network (ANN) models. One example of an ANN model has been proposed by Mohamed et al. The suggested model does not consider transmission delay, though, and is thus unsuitable for conversational call quality estimation.
Another real-time quality model that also considers conversational quality is proposed by Sun and Ifeachor. The authors combine PESQ and the E-Model and describe a set of linear equations that approximate perceived voice quality. The method is suitable for adaptive QoS control in VoIP applications. Hoene also combines PESQ and the E-Model in his quality model for adaptive VoIP applications, but additionally takes playout scheduling schemes into account.
Parameter-based approaches are the most suitable methods found so far for real-time VoIP
call quality estimation, and therefore a more detailed description of some of them is given in the following sections.
4.2.1 The E-Model
The ITU-T E-Model is perhaps the most widely used parameter-based conversational
speech quality estimation method. With E-Model, conversational MOS scores can be
estimated from IP network and/or terminal parameters. The E-Model was originally meant
to be a transmission planning tool for telecommunication systems. Nevertheless, various methods for voice quality prediction rely on the E-Model nowadays.
The fundamental principle of the E-Model is that psychological factors on the psychological scale are additive. The E-Model applies this concept and combines individual impairments due to both speech and network characteristics into a single measure of conversational voice quality called the R-factor. The R-factor ranges from the best value of 100 to the worst value of 0; however, 70 can be considered the minimum quality for telephone calls. The ITU-T recommendation G.109 defines the speech transmission quality categories with R-value ranges as illustrated in Table 3.
Table 3: Speech transmission quality categories 
R-value range | Speech transmission quality | User satisfaction
90 ≤ R < 100 | Best | Very satisfied
80 ≤ R < 90 | High | Satisfied
70 ≤ R < 80 | Medium | Some users dissatisfied
60 ≤ R < 70 | Low | Many users dissatisfied
50 ≤ R < 60 | Poor | Nearly all users dissatisfied
The calculation of the R-value is based on 21 input parameters that represent network, terminal, and environmental quality factors. The basic formula of the E-Model is as follows:

R = R0 - IS - ID - IE + A (4.1)
R0 (basic signal-to-noise ratio) groups the effects of noise sources.
IS (simultaneous impairment factor) represents impairments occurring simultaneously with speech, such as quantization noise, received speech level, and sidetone level.
ID (delay impairment factor) represents all impairments due to the delay of voice signals and is further composed of talker echo, listener echo, and too long absolute delay impairments.
IE (equipment impairment factor) captures the effect of information loss due to coding
scheme, packet loss or uncompensatable jitter.
A (advantage factor) represents user willingness to accept quality degradation in return for
some advantage like ease of access or a free VoIP call.
The E-Model is based on fixed, empirical formulae, which makes it quite reliable. The flip side is that subjective tests are required to derive the model parameters, and thus the basic E-Model specification is applicable to a restricted number of codecs and network conditions. Further subjective tests are required for new emerging codecs.
Another limitation of the E-model is that it relies on some static transmission parameters
(e.g. average mouth-to-ear delay and average packet loss) that do not change during run-
time. Thus it does not capture the effects of time varying impairments like delay variation
and may therefore give misleading results if used as such for human rating prediction. In fact, ITU-T does not recommend the E-Model as a tool for human rating prediction. Limitations of the E-Model regarding delay and packet loss variation have been addressed by Annex B of the ETSI TS 102 024-5 technical specification. The methodology regarding the effects of bursty packet loss has been adapted from Clark's work from 2001.
ITU-T P.VTQ is a non-intrusive parametric VoIP call monitoring standard, which was in
method selection phase at the time of writing this thesis. According to Takahashi et al., the competing schemes to be included in the standard are VQMon and PsyVoIP. Both approaches use parameters from RTP and RTCP streams to compute voice quality impairments and may be integrated into the E-Model framework.
The VQMon methodology calculates the equipment impairment factor IE taking into account burst packet loss situations as proposed by Clark. VQMon supports narrowband and wideband codecs, as well as listening and conversational quality estimation, and is available as a commercial product.
PsyVoIP takes into account differences between VoIP devices and allows calibration of the call quality estimation model with PESQ. This way, proprietary jitter buffer and error concealment implementations are taken into account. PsyVoIP is likewise a commercial product.
4.3 Quality model for VoIP call quality estimation
The main requirement for a quality model is that it must be good enough to reliably show
the effect of optimization schemes. The quality model does not need to consider acoustic
capabilities of the terminal, because they remain constant. Optimization schemes
considered in this thesis only affect transmission behaviour and wireless link utilization.
Receiver side behaviour and acoustic processing capabilities remains fixed. Therefore, call
quality estimation based on transmission qualities is enough when impairments introduced
by different optimization schemes are compared.
According to the general requirements above, the following criteria for a quality model can be derived:
- is a fully automatic, instrumental method suitable for passive quality monitoring
- is free from license fees and patents
- has low computational cost and delay, works at run-time
Though the E-Model has its weaknesses, an E-Model based monitoring tool meets the above-mentioned requirements and is a suitable tool when comparing the impact of transmission impairments introduced by different VoIP optimizations on conversational call quality.
The terms R0, IS, and A in equation (4.1) consist of several parameters which do not depend on packet transport. Therefore, the most relevant factors in the context of a VoIP application are ID and IE. When the default values recommended by ITU-T are used for the non-relevant parameters (R0 = 94.77, IS = 1.41, A = 0), equation (4.1) can be simplified to cover the effect of transport-level quantities only:
R = 93.36 - ID - IE (4.2)
Refer to appendix 1 for detailed derivation of the formula.
The delay impairment factor ID includes talker echo, listener echo, and absolute delay impairments. We further simplify the estimation method and assume perfect echo cancellation, in which case the impairment factor ID can be reduced to the following expression, as shown by Cole & Rosenbluth:

ID = 0.024 d + 0.11 (d - 177.3) H(d - 177.3) (4.3)
where d is the mouth-to-ear delay (absolute delay) in milliseconds and H(x) is the Heaviside function defined as:

H(x) = 0 if x < 0, else H(x) = 1 (4.4)
For the impairment factor IE, some values for a limited set of codecs and packet times with certain packet loss distributions are available in the G.113 specification. More extensive measurements of impairment factors are needed before equation (4.2) is applicable in a call quality monitoring application.
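To make the simplified model concrete, the following Python sketch combines equations (4.2)-(4.4) and maps the resulting R-value to a Table 3 category. It is illustrative only: the codec-specific IE value and the standard R-to-MOS conversion of ITU-T G.107 are assumed inputs, not results of this thesis.

```python
def delay_impairment(d_ms: float) -> float:
    """I_D per equation (4.3); H(x) is the Heaviside step function."""
    h = 1.0 if d_ms - 177.3 >= 0 else 0.0
    return 0.024 * d_ms + 0.11 * (d_ms - 177.3) * h

def r_factor(d_ms: float, ie: float) -> float:
    """Simplified E-Model, equation (4.2): R = 93.36 - I_D - I_E."""
    return 93.36 - delay_impairment(d_ms) - ie

def quality_category(r: float) -> str:
    """Speech transmission quality categories of ITU-T G.109 (Table 3)."""
    if r >= 90: return "Best"
    if r >= 80: return "High"
    if r >= 70: return "Medium"
    if r >= 60: return "Low"
    return "Poor"

def r_to_mos(r: float) -> float:
    """Standard R-to-MOS mapping of ITU-T G.107 (assumed here)."""
    if r <= 0: return 1.0
    if r >= 100: return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

# Example: 150 ms mouth-to-ear delay and an assumed codec impairment IE = 10
# give R = 93.36 - 3.6 - 10 = 79.76, i.e. the "Medium" category.
```

A monitoring application would feed measured delay and an IE value for the codec in use into r_factor and track the resulting category over the call.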
5 OPTIMIZATION SCHEMES FOR MOBILE VOIP
In this chapter the most promising application level optimization schemes for VoIP are
described and considered in detail based on theory and previous work on the subject. The
applicability of schemes is discussed by considering limitations set by standards and
protocol and interoperability issues.
Proposed schemes use channel state and network conditions as input parameters and
produce behaviour, which tries to save battery and maintain acceptable call quality.
5.1 Adaptive packet rate control at application layer
Bigger packets incorporate less protocol overhead, which is more energy-efficient if
transmission conditions in the wireless network are good. When the packet rate is lowered,
the terminal also has more time to spend in power save state. When wireless link quality is
bad, it is more probable that smaller packets get through without bit errors, and thus smaller packets should be preferred if the link layer uses packet retransmission logic in error situations. Bit errors occurring in the wired network have no effect on energy-efficiency as long as the protocols above the link layer do not incorporate packet retransmission logic.
In addition, congestion of the transmission path must also be taken into account. Mahlo et al. have shown in their study that bigger packets should be used in congestion situations. They also showed that lowering the packet rate is enough when a link technology with a high packet switching overhead, such as IEEE 802.11, is used.
Clues about network congestion status are available through RTCP receiver reports if RTCP is used (which is true in most VoIP calls). Basic RTCP directly provides only the "cumulative number of packets lost" and "inter-arrival jitter" metrics, but RTCP XR can provide more detailed QoS statistics relating to VoIP call quality.
From a call quality monitoring point of view, the difficulty is how to measure mouth-to-ear
delay at runtime. Network delay can be estimated by calculating round trip delay from an
NTP timestamp of basic RTCP receiver reports. An estimate of the end-to-end delay is obtained by halving the round-trip time. However, this is only a rough approximation of the end-to-end delay, because the network path may have asymmetric delays.
Also, delay metrics for RTP are likely to be different from RTCP metrics due to a higher
utilized bandwidth. Further, mouth-to-ear delay cannot be calculated without knowledge of
jitter buffer size at the receiver. Again, RTCP-XR reports include more comprehensive
statistics, like delay metrics and jitter buffer parameters. Based on this information,
mouth-to-ear delay can be calculated more reliably.
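The round-trip estimation described above can be sketched as follows. This is a simplification: real implementations work on the 32-bit "middle" NTP timestamp format of RFC 3550 and must handle wraparound, both of which are ignored here.

```python
def round_trip_time(arrival_s: float, lsr_s: float, dlsr_s: float) -> float:
    """RTT from an RTCP receiver report, all values in seconds:
    arrival_s: local time at which the receiver report arrived,
    lsr_s:     'last SR' timestamp echoed back in the report,
    dlsr_s:    'delay since last SR' reported by the peer."""
    return arrival_s - lsr_s - dlsr_s

def one_way_delay_estimate(arrival_s: float, lsr_s: float, dlsr_s: float) -> float:
    """Rough end-to-end delay: half the RTT (assumes a symmetric path)."""
    return round_trip_time(arrival_s, lsr_s, dlsr_s) / 2.0
```

For example, a report arriving at t = 1.000 s that echoes LSR = 0.800 s with DLSR = 0.150 s yields an RTT of 50 ms and a one-way estimate of 25 ms.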
In summary, one should usually use as low a packet rate as possible when considering
VoIP over IEEE 802.11. The only exception to this rule is operation under bad channel
conditions in which case smaller packets should be used.
To be able to adapt packet rate we should have information about current wireless signal
quality both at the sender and the receiver and network congestion status. At the time of
writing there was not a standard-compliant mechanism to give feedback about signal
quality at the receiver. Without that information we cannot know if packet loss or jitter
reported by the receiver is due to network congestion or bad channel conditions at the
receiver end. If packet size is simply adapted according to the feedback from the receiver
we are likely to make the situation worse due to conflicting adaptation strategies in these scenarios. Without call quality concerns, we could (from a local power-save point of view)
adapt packet size according to our own wireless network signal quality and battery state.
The signal quality detection problem is noted in previous studies but a standard solution
does not yet exist. An example of a non-standard solution to the problem is the one proposed by Yoshimura et al. In their solution, there is an RTP monitoring agent at the boundary of the wired and wireless networks that sends RTCP reports regarding the quality of the wired network. The sender can distinguish network congestion from bad wireless link conditions by comparing the RTCP reports from the monitoring agent and from the receiver.
In this thesis only a simple rate control algorithm is proposed because reliable information
regarding signal quality at the other end is not available. Absolute delay metrics are not available either without RTCP-XR reports, and thus no optimization scheme can take
conversational call quality into account. So, the basic idea is to optimize packet size purely
for energy-saving and make a fallback to normal operation if RTCP reports indicate
deterioration in transmission quality. That way some energy saving advantage can be
achieved without making call quality too bad compared to an unoptimized operation.
The proposed algorithm is a modified version of the one presented by Barberis et al. The algorithm runs at the sender and uses information carried in cyclic RTCP receiver reports. The sender starts with a configured minimum packet size. If the RTCP statistics indicate no problem and the local wireless signal quality is good, the packet size is increased by the codec's frame size. This is repeated whenever a new RTCP report is received, until the configured maximum packet size is reached. If the RTCP statistics start to show problems or the local wireless signal quality deteriorates, a fallback to normal operation is done and the packet size is lowered back to the configured minimum. Adaptation decisions are based on the highest acceptable packet loss and jitter thresholds.
A flow chart of the algorithm is presented in Figure 5.1.
Figure 5.1: Flow chart of the adaptive packet rate control algorithm
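The algorithm of Figure 5.1 can be sketched as follows. This is a simplified illustration: the threshold values are hypothetical and, as noted above, suitable values would have to be found experimentally; RTCP parsing is assumed to happen elsewhere.

```python
class PacketRateController:
    """Sender-side packet time adaptation driven by cyclic RTCP receiver reports."""

    def __init__(self, frame_ms=20, min_ms=20, max_ms=60,
                 max_loss=0.05, max_jitter_ms=40.0):
        # Hypothetical thresholds; must be tuned for the target system.
        self.frame_ms = frame_ms
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.max_loss = max_loss
        self.max_jitter_ms = max_jitter_ms
        self.packet_ms = min_ms          # start from the configured minimum

    def on_rtcp_report(self, loss_fraction, jitter_ms, local_signal_good):
        """Called for every RTCP receiver report; returns the new packet time."""
        if (loss_fraction > self.max_loss or jitter_ms > self.max_jitter_ms
                or not local_signal_good):
            # Fallback to normal operation: back to the configured minimum.
            self.packet_ms = self.min_ms
        elif self.packet_ms < self.max_ms:
            # Conditions are good: grow the packet by one codec frame.
            self.packet_ms += self.frame_ms
        return self.packet_ms
```

With a 20 ms frame, two problem-free reports in a row take the packet time from 20 ms to the 60 ms maximum, and the first bad report drops it back to 20 ms.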
Application-layer packet size adaptation is partially frustrated if packets are fragmented in the network layer (IP stack) or link layer software. In that case, packet size control performed at the application layer has no effect on transmission efficiency over the first hop. Better power save state utilization is still possible, however, if the packet rate is lowered.
When fragmentation alternatives are considered, one should notice that network layer fragmentation has the disadvantage that reassembly is done at the final destination. If any fragments are lost, the whole packet must be retransmitted. In contrast, if fragmentation is performed at the 802.11 link layer, only the missing fragment needs to be retransmitted. Thus link layer fragmentation could be the better alternative when local channel conditions are not good. In addition, adaptation can be done with higher resolution at the link layer, which can improve energy-efficiency and boost speed over a single hop. Link and application layer adaptations can, however, be used as complementary strategies, because at the application level we can react to network congestion, and at the link layer we can optimize transmission according to local channel conditions.
The disadvantages of the packet rate control approach are increased mouth-to-ear delay
and possible interoperability problems if the other end is not able to handle abnormal packet sizes. However, the majority of implementations should be able to handle different packet times, up to 200 ms, unless the operation is restricted by negotiation. A maximum packet time can be negotiated with SDP for some audio payload formats, e.g. iLBC.
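As an illustration with hypothetical values, an SDP media description can constrain packet times with the ptime and maxptime attributes, and the AMR mode change period with an fmtp parameter:

```
m=audio 49170 RTP/AVP 97
a=rtpmap:97 AMR/8000
a=fmtp:97 mode-change-period=2
a=ptime:20
a=maxptime:200
```

Here the offerer prefers 20 ms packets but announces willingness to receive packet times up to 200 ms.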
5.2 Adaptive transmission time determination
In the referenced study, the authors introduce the idea of optimal transmission time determination. In this
method uplink and downlink traffic are synchronized so that the time spent in power save
mode is maximized. The study is done for infrastructure mode network and using legacy
802.11. The study assumes that a terminal can enter sleep mode immediately after sending a packet. In practice that is not true; rather, the state transition may involve a delay.
Another drawback of the proposed method is that it does not take into account the jitter of
incoming data flow. Further, the algorithm expects symmetric data flow, and DTX used by
a remote party is likely to confuse the proposed algorithm. With legacy 802.11, the application layer has no way to affect the time at which the WLAN software enters the power save mode.
The proposed method does not work with U-APSD power save either, because downlink packets are received only when some data is sent to the AP. In fact, U-APSD makes the proposed scheme unnecessary, because U-APSD in a way uses the same idea of synchronizing uplink and downlink traffic: buffered frames are sent as a result of uplink traffic.
However, the method inspires the idea of buffering uplink packets and sending them in bursts instead of making the packetization period longer. U-APSD enables the terminal to enter sleep mode immediately after the uplink packets are sent and the pending packets are fetched from the AP. This way the overhead due to rapid state transitions to sleep mode and back can
be lowered, which saves energy.
Bursty sending may have a negative effect upon call quality if the receiver is not able to
handle bursts. Compared to the packet rate control approach, increased overhead due to
protocol headers may compensate for any possible advantages. The network may also get
congested during bursty periods. One advantage compared to the packet rate control method is that sending packets in bursts does not require receiver support for larger packet sizes.
5.3 Throughput optimization by codec and codec mode adaptation
As we have noticed earlier, one should use as small packets as possible in bad channel
conditions to minimize bit errors and energy consuming retransmissions. Retransmissions
can be decreased with the packet rate adaptation approach as far as it goes, but a codec’s
frame size sets a limit for adaptation resolution. However, resolution can be further
increased with the codecs that allow dynamic bit rate adjustments (codec mode
adaptation). Possible coding rates for some common codecs are presented in Table 1.
Another alternative for reducing packet size is switching to a less bandwidth-consuming codec (codec adaptation).
One example of a codec mode adaptation technique for speech transmission over 802.11
wireless packet networks is presented by Servetti and De Martin. The speech coding rate is adapted according to instantaneous wireless channel conditions. As an example, the
variable bit rate codec AMR is used in the simulations. The proposed rate selection
algorithm uses only two rates. In good channel conditions the maximum bit rate 12.2 kbps
is used and in bad channel conditions the output rate is decreased to the minimum 4.75
kbps. According to the simulations, the proposed adaptation strategy outperforms
consistently the constant bit rate approach. Packet loss rates and end-to-end delays were
both decreased with the adaptation scheme. A possible drawback of the approach is that speech quality may be decreased more than necessary due to the very limited set of allowed rates. By allowing intermediate rates in moderate channel conditions, speech quality could
be maintained at a better level. On the other hand, wireless signal strength changes tend to
be sudden and more drastic lowering of the bit rate could be a good option.
In this thesis a modified algorithm based on the referenced study is presented. When channel conditions are good, the bit rate is changed to the configured maximum. Under bad channel conditions, the bit rate is changed to the configured minimum. Between mode changes, the configured or negotiated minimum time is waited. For example, the AMR payload format specification defines a negotiable "mode-change-period" parameter. Suitable threshold values for deciding between good and bad channel conditions must be investigated experimentally with the target system. If signal quality is moderate and neither threshold is reached, no action is taken. A flow chart of the algorithm is presented in Figure 5.2.
Figure 5.2: Flow chart of adaptive coding rate algorithm
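The rate selection logic of Figure 5.2 could look roughly like the following. The RSSI thresholds are purely hypothetical and, as noted above, must be determined experimentally for the target system.

```python
def select_amr_mode(rssi_dbm, current_kbps, last_change_s, now_s,
                    good_dbm=-60.0, bad_dbm=-80.0,
                    min_period_s=0.04, max_kbps=12.2, min_kbps=4.75):
    """Return (new_bit_rate_kbps, time_of_last_change).

    Keeps the current mode while the minimum mode change period has not
    elapsed, or while signal quality is between the two thresholds."""
    if now_s - last_change_s < min_period_s:
        return current_kbps, last_change_s      # honour mode-change-period
    if rssi_dbm >= good_dbm and current_kbps != max_kbps:
        return max_kbps, now_s                  # good channel: maximum rate
    if rssi_dbm <= bad_dbm and current_kbps != min_kbps:
        return min_kbps, now_s                  # bad channel: minimum rate
    return current_kbps, last_change_s          # moderate signal: no action
```

The function would be called whenever a fresh local signal quality reading is available; the returned state is carried over to the next call.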
The advantage of this approach compared to the packet rate adaptation mechanism is that
mouth-to-ear delay is not increased. Moreover, the packet loss rate and jitter due to the variable number of retransmissions needed are both decreased under bad channel conditions.
The proposed scheme works well only with codecs which have the built-in capability for
bit rate adaptation during session without out-of-band negotiation. For some codecs,
alternative bit rates can be negotiated, but not changed in the middle of the session without
re-negotiation (e.g. iLBC). AMR is a good example of a multi-mode codec suitable for this kind of adaptation. AMR implementations should be able to handle mode changes to arbitrary modes at the resolution of one frame time unless restricted through session negotiation.
The AMR payload format specification defines the operation of mode adaptation so that
the decoder has control over which mode should be used by the encoder. The desired mode
is signalled to the sender (encoder) by using the so-called Codec Mode Request (CMR) field in outgoing packets. Thus source-side mode adaptation according to the local signal quality is not possible if the peer wants to receive with a certain mode. However, the specification recommends that IP terminals should not set CMR, and so transmission-side adaptation should be available in many cases.
Throughput can be optimized also by switching to a less bandwidth consuming codec
when signal quality deteriorates. With SDP, several alternative media formats can be
negotiated during the initial session setup, which allows dynamic media format changing during the session without re-negotiation. Unfortunately, mobile phones have limited processing and memory capacity and in general do not support changing the media format on the fly. Most implementations select only one media format to be used in the session,
in which case this option is unavailable. However, the option may be available when the
peer is a softphone running on a personal computer.
Another option for codec changing with SIP is to issue a re-INVITE with modified parameters. If the new offer is unacceptable to the answerer, the old parameters should remain in place according to the SDP offer/answer model. Thus it should be safe to try session modification, even though the new offer may be rejected.
6 MEASUREMENTS AND EVALUATION OF THE OPTIMIZATION SCHEMES
In this chapter, a test setup for energy-efficiency measurements is described and the measurement results are evaluated. Only basic power consumption measurements were done, due to the unavailability of certain features in the tested software and the lack of a suitable test environment.
6.1 Measurement setup
Energy-efficiency was measured in a U-APSD capable infrastructure network.
Measurements were done using the AMR codec with 20 ms, 40 ms, and 60 ms packet
times with maximum bit rate 12.2 kbps. In addition, the effect of bit rate changing was
tested by measuring power consumption with two different bit rates, 4.75 kbps and 12.2
kbps, using 20 ms frame size.
Power consumption measurements were done programmatically with the Nokia Energy Profiler tool, using a Nokia N95 S60 mobile phone as hardware. The tool allows real-time profiling of the energy consumption in the target device and can be used in devices with Nokia S60 3rd Edition, Feature Pack 1 and onwards. The Energy Profiler and related documentation are available on Forum Nokia.
Comfort noise and DTX were disabled during the measurements to ensure a continuous data flow between the terminals. Nondeterministic gaps in the data flow would otherwise obscure the measurements, because a test run utilizing more DTX would have more time to spend in power-save mode. As a test signal, some speaking was performed at each phone in turn until five minutes had elapsed. Speaking was performed to estimate conversational call
quality; the contribution of speaking to power consumption is minimal, because with DTX
disabled, the speech payload is transmitted even during periods of silence.
The impact of the display's backlight on power consumption was also measured by letting the backlight go off after about 30 seconds. Only the VoIP application and the Energy Profiler application were kept running during the measurements to minimize the impact of other applications. All tests were executed under good channel conditions with an AP dedicated to the test and without interfering traffic.
6.2 Measurement results
In Figure 6.1 power consumption with 20 ms packet time and a minimum narrowband
AMR bit rate (4.75 kbps) is presented. From the results it can be seen that the contribution of the backlight to the overall power consumption during a VoIP call is approximately 20 mA (230 mA with backlight on versus 210 mA with backlight off). That makes about a 10%
reduction in power consumption when the backlight is kept off during the call.
Figure 6.1: Power consumption with AMR codec with 20 ms packet time and 4.75 kbps bit rate
Speech quality was quite bad due to the low bit rate but understandable anyway. This low
bit rate is recommended only when channel conditions are so bad that the needed
retransmissions due to transmission errors can be reduced by lowering the bit rate.
In Figure 6.2 power consumption with 20 ms packet time and a maximum narrowband
AMR bit rate (12.2 kbps) is illustrated. From the results we see that the increased processor load at the higher bit rate does not noticeably increase the overall power consumption during a VoIP call. Power consumption was the same as with the minimum bit rate. Thus bit rate changing is not an effective way to reduce power consumption when channel conditions are good.
Figure 6.2: Power consumption with AMR codec with 20 ms packet time and 12.2 kbps bit rate
Speech quality was very good compared to speech quality observed with minimum bit rate.
Also conversational call quality was good and mouth-to-ear delay was not noticeable with
20 ms packet time and 12.2 kbps bit rate. The maximum bit rate should be used unless the peer/gateway requests another codec mode or the local wireless channel conditions are bad.
In Figure 6.3 power consumption with 40 ms packet time is presented. From the diagram
we can see that power consumption is about 10 mA less when compared to consumption
with 20 ms packet time (200 mA versus 210 mA with backlight off). As a percentage, the reduction is about 5%. With the N95 8GB's default 1200 mAh battery, about 17 minutes longer talk time is achieved compared to the talk time with 20 ms packet time (5 h 43 min versus 6 h 00 min).
Figure 6.3: Power consumption with AMR codec with 40 ms packet time and 12.2 kbps bit rate
Subjective estimation showed no degradation in conversational call quality with 40 ms
packet time compared to the quality with a 20 ms packet time. A 40 ms packet time can normally be used without making mouth-to-ear delay too long, but it is recommended that
there is call quality monitoring logic in place when anything other than default packet time
is used. Also, packet time should be lowered when wireless channel conditions get weaker.
Finally, power consumption with 60 ms packet time was measured. Results are illustrated
in Figure 6.4. As expected, there was a further reduction in power consumption compared
to the measurement with 40 ms packet time. The reduction in power consumption was the same as between the 20 ms and 40 ms packet times, i.e. 10 mA.
Figure 6.4: Power consumption with AMR codec with 60 ms packet time and 12.2 kbps bit rate
Speech was understandable and not crackling with the 60 ms packet time, and the subjectively estimated conversational call quality was still good. However, such a large packet size is susceptible to network delay variations and bad channel conditions, and should not be used unless reliable QoS monitoring and signal quality observation logic is available. If network conditions can be discerned as good, the 60 ms packet time can be used quite safely. The user is unlikely to notice the extra 40 ms delay compared to operation with a 20 ms packet time. Instead, the user is probably pleased with the half an hour longer talk time compared to the talk time with a 20 ms packet time (5 h 43 min versus 6 h 19 min with a 1200 mAh battery).
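The talk-time figures above follow directly from the battery capacity and the measured average currents (210 mA, 200 mA, and 190 mA with the backlight off); a quick sanity check, ignoring battery inefficiencies:

```python
def talk_time(capacity_mah: float, current_ma: float) -> tuple:
    """Ideal talk time as (hours, minutes) from capacity and average current."""
    total_minutes = round(capacity_mah / current_ma * 60)
    return divmod(total_minutes, 60)

# 1200 mAh battery, measured averages with backlight off:
#   20 ms packet time -> 210 mA, 40 ms -> 200 mA, 60 ms -> 190 mA
```

The computed values (5 h 43 min, 6 h 00 min, and 6 h 19 min) match the figures quoted in the text.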
7 CONCLUSIONS AND FUTURE WORK
The main objective of this thesis was to examine how a VoIP application could adjust its operation to the prevailing conditions in its environment in a power-aware manner. Besides battery life, another big factor affecting VoIP call QoE is perceived call quality. These factors interrelate, and optimizations in one area can cause degradation in another. For that reason, energy-saving means were studied together with the factors affecting perceived call quality.
In this thesis we focused on programmatic, application-layer means for better energy-efficiency in the mobile VoIP application domain. Besides application-layer means, we also discussed improvements in the other software layers. We concentrated on optimizations which are suitable for mobile VoIP implementations utilizing SIP and IEEE 802.11 technologies.
We identified three relevant optimization areas for mobile VoIP applications: first, attention should be paid to general software efficiency; second, wireless interface utilization can be made more energy-efficient; and finally, display usage can be optimized.
We conducted a literature survey on these areas and provided a theoretical background on
what can be done in the field of energy-management. Basically, all adaptation strategies
aim to decrease power consumption by more efficient resource usage.
Based on previous studies, we identified the factors affecting perceived call quality. We also made a survey of different voice quality estimation approaches. Based on the theory and
previous work, we presented a theoretical quality model for real-time call quality
estimation. The quality model can be integrated into the optimization scheme in question. To be able to utilize the quality model on the transmitter side, some detailed statistics
regarding VoIP call are needed from the audio receiver. The recommended, standard
compliant way to create a feedback channel is to implement support for RTCP-XR reports.
Statistics calculated from basic RTCP would not be reliable enough for call quality estimation.
For the quality model, the contribution of the delay impairment factor ID to overall call quality can be modelled universally, but for the equipment impairment factor IE no comprehensive public measurement data is available. More extensive measurements of
impairment factors with different codecs and network metrics are needed to be able to
adapt operation in a call quality aware manner.
We identified some potential optimization schemes for better energy-management in the
wireless interface utilization area. Schemes covered transmitter side adaptation
mechanisms and were related to dynamic changing of transmission properties according to
the current network conditions. When call quality is considered, we must observe whether
call quality is deteriorating due to any optimization attempt. We found that quality-conserving adaptation decisions cannot be made merely on the basis of packet loss or jitter metrics, because those metrics can indicate either network congestion or weak signal strength conditions at the receiver end, and in these scenarios the adaptation strategies conflict with each other. Without signal strength knowledge, adaptation can only be based on trial and error, so that when one strategy seems to fail, another one is tried.
We found that some optimization means are better implemented below the application layer. For example, throughput optimization with link-layer fragmentation can save energy
without deteriorating call quality when the local signal strength is weak. However, we
showed that optimizations in different architectural layers can be complementary to each
other. For example, dynamic transmit power control can save power regardless of
application-layer optimizations. The drawback is that this particular technique is possible
only if 802.11h is supported.
In this thesis only basic power consumption measurements were done and more
comprehensive measurement data is needed to find out suitable thresholds for the proposed
algorithms and make them reliable. Measurements should be done in a controlled
environment where different network properties like jitter, packet loss or signal strength
can be adjusted. Nonetheless, we have shown in this study that adaptive optimization schemes can reduce power consumption.