VoIP in 3G Networks: An End-to-End Quality of Service Analysis
Renaud Cuny (1), Ari Lakaniemi (2)
(1) Nokia Networks, P.O.Box 301, 00045 Nokia Group, Finland
(2) Nokia Research Center, P.O.Box 407, 00045 Nokia Group, Finland

Abstract-- This paper presents the results of a Quality of Service (QoS) study for VoIP service over 3G WCDMA networks. An end-to-end simulation platform has been used for this purpose. The simulations have been run using the Adaptive Multi-Rate (AMR) speech codec at 12.2 kbit/s in combination with the RTP, UDP and IPv6 protocols. The simulated transmission path includes two radio links (uplink and downlink), connected with a packet switched core network and UTRAN Radio Access Networks, with several different radio transmission conditions. Furthermore, RObust Header Compression (ROHC) is applied on both radio links. The results include buffering statistics, end-to-end delay estimates, and packet loss statistics.

I. INTRODUCTION

During the last few years, voice over data network services have gained increased popularity. The quick growth of Internet Protocol (IP) based networks, especially the Internet, has directed a lot of interest towards Voice over IP (VoIP). VoIP technology has been used in some cases to replace traditional long-distance telephone technology, reducing costs for the end user. Naturally, to make VoIP infrastructure and services commercially viable, the Quality of Service (QoS) needs to be at least close to that provided by the Public Switched Telephone Network (PSTN). On the other hand, VoIP associated technology will bring to the end user value added services that are currently not available in the PSTN.

On another front, current developments in cellular radio network technologies are paving the way towards IP capable radio networks. The so-called Third Generation (3G) cellular networks, developed and standardized by the Third Generation Partnership Project (3GPP), will provide IP over wireless services, thereby also enabling VoIP. In current cellular systems, e.g. in GSM, the telephony service is based on a circuit switched approach. This service is currently highly optimized for transmission of voice, thereby providing good speech quality and good spectral efficiency. However, carrying VoIP will also be possible in 3G WCDMA networks, e.g. 3GPP release 5, and may be of special interest to mobile network operators for multiple reasons. Firstly, as the bandwidth for individual flows in the packet switched domain is not reserved in advance, the multiplexing effects should bring significant capacity savings. Secondly, VoIP service will be supported by the Session Initiation Protocol (SIP), which is a text-based protocol, similar to HTTP and SMTP, for initiating interactive communication sessions between users [1]. Such sessions can include voice, but also e.g. video, chat, interactive games, and virtual reality. Finally, the convergence towards packet switched and IP technology may convince mobile operators to go for solutions that are truly all-IP in order to simplify network interconnection and network management.

Naturally, for wide end user acceptance and deployment, the VoIP service is required to provide perceived voice quality similar to that provided by current highly optimized GSM networks. The challenges in achieving this include typical VoIP related QoS problems, such as packet loss, delay, and delay variation (i.e. jitter), as well as the additional overhead brought by the VoIP protocol stack. Therefore the end-to-end VoIP QoS should be studied and evaluated carefully. As an example, it is likely that the packet switched technology, although managed by e.g. Differentiated Services [2], will generate more delay and jitter than the circuit switched technology. Further delay and jitter may be caused by packet segmentation in the radio interface. Since the end-to-end delay is likely to be close to the maximum delay still providing acceptable conversational quality (around 250-300 ms [3]), extra attention needs to be paid to jitter: too much jitter in a voice stream may be problematic, since basic jitter compensation methods may not apply very well or may have limited effect. So one important issue to investigate is whether the jitter in 3G networks will have a negative impact on the voice quality perceived by the end user.

This paper is organized as follows. Section II presents the end-to-end VoIP simulator used for this study, describing each component of the tool in detail. Section III presents the simulation results, focusing on packet loss ratio and end-to-end delays. Finally, the conclusion in Section IV summarizes the main findings of this study and points out the areas that could be investigated further.

II. END-TO-END VOIP SIMULATION

Protocols used by VoIP over 3G can be roughly divided into two categories: signalling related protocols and media related protocols. Although the signalling protocols, such as SIP, are a very important part of a VoIP system, in this study we concentrate only on media related protocols and transmission of media data.

To run the end-to-end simulations we developed a VoIP speech simulator application for modelling the telephony application and the protocol layers from the application down to IP and PDCP. The lower layers required for radio link and core network modelling were simulated using external simulation tools, and the resulting network conditions were applied in the VoIP speech simulator using error pattern files. The different components of the simulation chain are described in detail in the following subsections.

A. Speech application

On the application level we assumed usage of the Adaptive Multi-Rate (AMR) speech codec, which is a mandatory codec for conversational speech services within 3G systems. For all simulation runs we selected the AMR 12.2 kbit/s mode with DTX functionality enabled, and employed the bandwidth efficient mode of the AMR RTP payload format. This implies that during talk spurts the source generates a 32-byte speech payload at 20 ms intervals, while during silence periods, due to DTX, we have a 7-byte payload carrying a Silence Descriptor (SID) frame at 160 ms intervals.

We further assumed the typical VoIP protocol stack employing the Real-Time Transport Protocol (RTP) encapsulated in the User Datagram Protocol (UDP), which is further carried by IP. The combination of these protocols introduces a total of 40 bytes of header data when using IP version 4 (IPv4), and 60 bytes of header data when using IP version 6 (IPv6). We selected IPv6, which has two implications: the size of an IP packet carrying one AMR frame will be either 92 bytes (speech) or 67 bytes (SID), and we need to enable the UDP checksum, because the IPv6 header does not include a checksum of its own but the most critical fields of the header are covered as part of the UDP pseudo header.

Protocol layers below IP follow the 3GPP release 5 specifications, as illustrated in Figure 1.

[Figure 1: 3GPP protocol stack across the MS, UTRAN, 3G-SGSN and 3G-GGSN nodes and the Uu, Iu-PS, Gn and Gi interfaces. Layers shown: Application, IP, PDCP, RLC, MAC, GTP-U, UDP/IP, L2, L1.]

B. Robust Header Compression (ROHC)

When operating in bandwidth limited 3G networks it is important to use the radio band as effectively as possible, and a header overhead of up to 60 bytes can seriously degrade the spectral efficiency of a VoIP service over such a link. The RObust Header Compression (ROHC) protocol [4] has been developed to tackle this problem. ROHC provides link-based compression of IP/UDP/RTP headers, in the best case down to 1 byte. The effective compression makes use of the fact that the majority of the fields in the combined IP/UDP/RTP header either remain constant or change by a constant amount throughout a session. However, the maximum compression mentioned above can only be reached when imposing some limitations; a more typical compressed header size would be three or four bytes. The ROHC operation is based on synchronized compression (at the sender site) and decompression (at the receiver site) contexts. The decompression context is initialised by transmitting full IP/UDP/RTP headers in the beginning of the session. Irregularities in the transmitted stream, caused e.g. by DTX operation or a lost packet, can also introduce compressed headers slightly larger than in the optimal state. In error prone transmission conditions a feedback mechanism is an important part of robust compression operation, enabling recovery in case the synchronization between compressor and decompressor is lost. The ROHC protocol was implemented in our simulator.

ROHC in R-mode is assumed on both radio links, providing a feedback mechanism to enable safe convergence to the optimal compression state. We also assume that the ROHC Context Identifier is transmitted as a part of the compressed packet. These settings imply that the minimum size of a compressed IP/UDP/RTP header is four bytes.

C. Radio network modeling

The model for the radio network included the actual radio link, processing in layers below PDCP, and access transport in UTRAN. The radio link error patterns were prepared using a separate WCDMA system simulator. Three different radio conditions were investigated, introducing frame error rates (FER) of 1%, 3% and 5%. Additionally we included an error-free case in the set of simulation conditions. Different error patterns were prepared for uplink (UL) and downlink (DL), and the error patterns were obtained from a traced terminal that was moving along a predefined route.

For the UL radio network we assumed a processing and transport delay of 36 ms, and for the DL radio network the corresponding delay is 49 ms. Note that 36+49=85 ms is the lower limit for the time before the ROHC compressor can receive a feedback message from the decompressor regarding a specific packet. This delay is significant because in the beginning of a stream the ROHC decompressor context needs to be initialized by sending full headers, which will be sent until a feedback message indicating successful decompressor context initialization is received. A similar situation can also occur if the decompression context gets corrupted for some reason, e.g. a hard handover or an excessive amount of transmission errors. However, for this work we assumed that no ROHC decompressor context re-initialization is required during a session.

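The packet sizes and the feedback delay quoted in subsections A to C follow from simple arithmetic, which the sketch below reproduces. It is only an illustration, not part of the simulator described in this paper: the function names are hypothetical, the per-protocol header sizes are the standard minimal ones (no IP options or extension headers, no RTP CSRC list), and the last function assumes that one packet is sent every 20 ms during initialization and that feedback arrives no earlier than the 85 ms lower limit.

```python
import math

# Payload sizes quoted in the text (AMR 12.2 kbit/s, bandwidth efficient mode).
SPEECH_PAYLOAD_BYTES = 32
SID_PAYLOAD_BYTES = 7

# Minimal header sizes (assumption: no options, extension headers or CSRC list).
RTP_HEADER_BYTES = 12
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20
IPV6_HEADER_BYTES = 40

# Minimum compressed header size under the ROHC settings assumed in the text.
ROHC_MIN_HEADER_BYTES = 4

# Radio network delays and frame interval assumed in the text.
UL_DELAY_MS = 36
DL_DELAY_MS = 49
FRAME_INTERVAL_MS = 20


def full_header_bytes(ip_header_bytes):
    """Combined IP/UDP/RTP header size for a given IP header size."""
    return ip_header_bytes + UDP_HEADER_BYTES + RTP_HEADER_BYTES


def packet_bytes(payload_bytes, header_bytes):
    """Size of one packet carrying one AMR frame."""
    return payload_bytes + header_bytes


def min_full_header_packets(feedback_delay_ms, interval_ms):
    """Lower bound on packets sent before feedback on the first one can arrive.

    Packets leave at t = 0, interval, 2*interval, ...; every packet sent
    strictly before feedback_delay_ms still has to carry a full header.
    """
    return math.ceil(feedback_delay_ms / interval_ms)


ipv6_header = full_header_bytes(IPV6_HEADER_BYTES)
print(full_header_bytes(IPV4_HEADER_BYTES))                         # 40
print(ipv6_header)                                                  # 60
print(packet_bytes(SPEECH_PAYLOAD_BYTES, ipv6_header))              # 92
print(packet_bytes(SID_PAYLOAD_BYTES, ipv6_header))                 # 67
print(packet_bytes(SPEECH_PAYLOAD_BYTES, ROHC_MIN_HEADER_BYTES))    # 36
print(min_full_header_packets(UL_DELAY_MS + DL_DELAY_MS,
                              FRAME_INTERVAL_MS))                   # 5
```

Under these assumptions at least five full-header packets are sent at the start of a talk spurt before the compressor can move to smaller headers, which is why the initialization phase matters for the buffering discussed later.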
D. RLC, MAC and PDCP layers

The WCDMA unacknowledged radio mode is the natural choice for transmitting the VoIP packets over the radio link. This mode provides the possibility for segmentation and padding of IP packets into radio Transmission Time Intervals (TTIs) to make the best possible use of the allocated radio resources. The radio bearer was configured at 16 kbit/s; with a TTI length of 20 ms this enables transmission of 40 bytes of user data every 20 ms.

E. Packet switched domain modelling

In the packet switched domain we considered the following delay components: delay in the IP backbone, delay in the gateway elements (SGSN and GGSN), and delay in the IuPS interface. Typically, the backbone elements (IP routers) and gateways may introduce some jitter to VoIP traffic, depending on the load in the network. However, appropriate traffic prioritisation (e.g. based on Differentiated Services) can limit the queuing delay (and thus potential jitter) to specific values defined by the operator.

We modelled this kind of PS domain structure to generate a delay distribution file for a stream of 30 000 packets transmitted at 20 ms intervals. The resulting delay distribution is illustrated in Figure 2, and it introduces a 19 ms average delay with 1.0 ms standard deviation. The minimum and maximum values for the delay are 12.4 ms and 23.7 ms, respectively.

Figure 2: Delay distribution in PS domain.

F. Buffering

Typically an audio playout device in the receiving terminal is synchronized to a local clock signal to make sure that there is always signal available for playback. In practice this implies that a new frame is required regularly at intervals determined by the frame rate. On the other hand, due to jitter the packets can arrive at the receiver at an irregular rate that is not synchronous to the playout. Therefore, buffering of speech packets is needed to ensure continuous data flow between asynchronous input and synchronous output. In VoIP this kind of jitter buffering plays an important part in the overall speech quality. The basic approach to jitter buffering is to wait for a predetermined time after the reception of the first packet before playing out the frame carried by this packet. The purpose of the playout delay is to allow some variation in the arrival times of subsequent packets. Frames arriving after their scheduled playout time are discarded, and from the speech decoder point of view they are lost frames. Naturally, in this approach the predetermined buffering delay is the most important factor in the buffering performance: too short a buffering delay risks buffer underflows when packets do not arrive in time due to jitter, while too long a buffering time introduces unnecessarily long delay and can also introduce buffer overflows.

However, for this study we configured the jitter buffer in the receiving terminal in such a way that no frames were discarded, neither due to late arrival nor due to buffer overflow. The main reason for this choice was the aim to concentrate on the QoS issues that are dependent on the network.

When considering VoIP traffic over a wireless 3G network, it is not sufficient to buffer only in the receiving terminal. In this environment the most critical link between asynchronous input and synchronous output is between the PS core network and the DL radio network. At this point of the data path the units we are buffering are IP packets received from the packet switched core network, which will be forwarded to the radio path. Here we assume a slightly different buffering strategy from the one described above for jitter buffering in the receiving terminal: instead of relying on a long enough buffering delay, we use a FIFO buffer with limited size (as number of packets in the buffer) and specify a maximum time a packet can be stored in the buffer. That is, if a predetermined number of packets are already stored in the buffer, a new incoming packet will be dropped. And if a packet has been waiting in the buffer for longer than specified by the discard timer, it will be dropped to avoid accumulating delay for subsequent packets. However, to make sure that the large packets required for ROHC initialization get through without an unfeasibly large value for the discard timer, we made the assumption that a packet (or the tail of a packet) that has already been partially transmitted due to segmentation is never dropped, even if the timer has elapsed.

In general it might not seem sensible to perform buffering in the transmitting terminal for a VoIP application. However, because the radio bandwidth is strictly limited and allocated according to optimally compressed headers, while ROHC initialisation requires transmission of full IP/UDP/RTP headers, we also need to consider buffering prior to UL transmission. We apply a buffering mechanism similar to the one described for DL, i.e. we specify a fixed size FIFO buffer with a discard timer to make sure that this bottleneck does not cause unfeasibly long delay.

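The FIFO buffer with a size limit and a discard timer described above can be sketched as a small discrete-time model. This is a minimal illustration under stated assumptions, not the simulator used for this study: the names `Packet` and `simulate_buffer` are hypothetical, the buffer limit and discard timer values are free parameters (the text does not state the values used), RLC/MAC overhead is ignored, and each 20 ms TTI is assumed to carry exactly 40 bytes of buffered data.

```python
from collections import deque
from dataclasses import dataclass

TTI_MS = 20      # TTI length from the text
TTI_BYTES = 40   # 16 kbit/s * 20 ms / 8 bits per byte


@dataclass
class Packet:
    seq: int               # index of the packet in the arrival list
    arrival_ms: int        # time the packet entered the buffer
    remaining: int         # bytes not yet transmitted
    started: bool = False  # True once any segment has been sent


def simulate_buffer(arrivals, max_packets, discard_ms):
    """Run the buffer over a list of (arrival_ms, size_bytes), sorted by time.

    Returns (delivered, dropped): delivered maps seq -> time in ms at which
    the last segment finished; dropped lists (seq, reason) tuples.
    """
    queue = deque()
    delivered = {}
    dropped = []
    next_idx = 0
    now = 0
    while next_idx < len(arrivals) or queue:
        # 1. Admit packets that have arrived; drop new ones if the buffer is full.
        while next_idx < len(arrivals) and arrivals[next_idx][0] <= now:
            arrival_ms, size = arrivals[next_idx]
            if len(queue) >= max_packets:
                dropped.append((next_idx, "overflow"))
            else:
                queue.append(Packet(next_idx, arrival_ms, size))
            next_idx += 1
        # 2. Discard timer: drop expired packets, except partially sent ones.
        kept = deque()
        for pkt in queue:
            if not pkt.started and now - pkt.arrival_ms > discard_ms:
                dropped.append((pkt.seq, "timeout"))
            else:
                kept.append(pkt)
        queue = kept
        # 3. Fill one TTI, continuing into the next packet if room remains.
        room = TTI_BYTES
        while room > 0 and queue:
            head = queue[0]
            sent = min(room, head.remaining)
            head.remaining -= sent
            head.started = True
            room -= sent
            if head.remaining == 0:
                delivered[head.seq] = now + TTI_MS
                queue.popleft()
        now += TTI_MS
    return delivered, dropped


# One 92-byte full-header packet followed by 36-byte compressed packets.
arrivals = [(0, 92), (20, 36), (40, 36), (60, 36), (80, 36)]
delivered, dropped = simulate_buffer(arrivals, max_packets=2, discard_ms=20)
print(delivered)  # {0: 60, 1: 80, 3: 100, 4: 100}
print(dropped)    # [(2, 'overflow')]
```

In this toy run the large first packet occupies three TTIs, the packets queued behind it are split across TTI boundaries, and one packet is dropped because the buffer is full. The limit of two packets and the 20 ms timer are arbitrary values chosen to make the drop visible.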
G. Additional simulation settings

We used the same speech input sequence for all simulation runs. This speech sequence has a duration of approximately 6 minutes 30 seconds and is an excerpt of a real discussion, and therefore has a realistic structure of alternating talk spurts and silence periods. The speech is in Finnish and was recorded in a low-noise office environment. The observed speech activity is approximately 50%.

We also repeated all simulation scenarios ten times with different randomly selected starting points in the radio link error pattern files and in the PS domain delay distribution file, to make sure that the results are not affected by some local anomaly in the simulated network conditions.

III. SIMULATION RESULTS

Since a fixed-delay jitter buffering scheme was assumed in the receiving terminal, the total end-to-end delay is fixed throughout the session. However, because of the TTI structure, packet segmentation at the RLC level and ROHC behaviour, the network delay on the packet level is not fixed throughout the connection. Packets too large to be carried by a single TTI need to be segmented over several TTIs, thus introducing longer transmission delay over a radio link, in most cases in both uplink and downlink. Furthermore, some of the packets following the large packets are also segmented over two TTIs, although in principle they could fit into a single TTI, because they are not aligned with the TTI structure. The reason for this is that when these packets are obtained from the buffer, there is still some room in the tail of the current radio frame, and as much data as possible from the beginning of the next packet, if available, is carried there.

For these scenarios there are two causes for frame losses (packet losses): a packet can be lost on the radio path due to transmission errors, or a packet can be dropped due to buffering, either in the transmitting terminal, in the DL RNC or in the receiving terminal. We would like to point out one observation regarding frame losses: the observed packet loss rate on the radio link seems to be slightly higher than the nominal frame error rate specified for the error patterns, over all radio FER conditions. Because of the segmentation, the loss of a single radio frame can cause the loss of two packets: when a radio frame carrying data from two separate packets is lost, both these packets will be unusable and will be dropped by the receiver.

The simulation results are summarized in Table 1. The results include packet loss rate (PLR) and buffering time statistics, as well as end-to-end delays in different scenarios. There is also a further breakdown of packet loss statistics into losses due to DL buffering and losses due to transmission errors on the radio path. Since we carry one AMR frame per packet, the FER at the speech decoder input equals the PLR. Note that losses in the UL terminal buffering are not presented in the table, but they are included in the total packet loss rate. Note also that the average network delay includes the jitter buffering time in the receiving terminal.

Table 1: Simulation results.

Radio link FER   PLR in DL buffer   PLR on radio   Total PLR   Avg. DL buffer delay   Avg. network delay
0%               0.02%              0%             0.06%       9.79 ms                221.96 ms
1%               0.02%              2.05%          2.08%       9.79 ms                221.96 ms
3%               0.02%              6.02%          6.08%       9.79 ms                221.96 ms
5%               0.02%              10.26%         10.31%      9.79 ms                221.96 ms

The overall frame error rate (FER) can be used as a rough objective speech quality estimate. Typically, with the AMR codec the speech quality can still be considered good when the FER is around 1-2%, but it should be noted that the distribution of frame losses also has an effect on the subjective speech quality.

IV. CONCLUSIONS

Our end-to-end Quality of Service analysis shows that 3GPP networks will be able to offer an adequate level of quality for Voice over IP (VoIP) services. The difference in QoS compared with current voice services technology (CS voice) is very small: the additional packet loss ratio introduced by packet switched characteristics is less than 1%, whereas the average end-to-end network delay is expected to be around 220 ms. The enabling features for the obtained quality level are:

- WCDMA unacknowledged mode in radio.
- ROHC at the PDCP layer, which allows usage of a limited bandwidth radio bearer (16 kbit/s).
- Relevant buffering limits and discarding rules in the PDCP buffer (DL) and in the transmitting terminal to avoid potential cumulative delay and jitter.
- Differentiated Services support in the core network and backbone to ensure minimal buffering delay in the packet switched domain.

Nevertheless, there are a few other important aspects that require further investigation in order to determine whether VoIP services will be quickly deployed in 3G networks.

1. The User Equipment (UE) may contribute to the mouth-to-ear delay: the processing time needed to compress the VoIP headers is unlikely to be negligible. Also, because the first few packets during the ROHC initialisation phase are transmitted with full headers, the UE may require a special buffering mechanism in order to minimize the delay.

2. The radio capacity needed to transfer VoIP flows is slightly higher than the capacity needed for sending circuit switched voice frames, even with header compression. A detailed analysis that takes pricing into account would be useful to determine whether offering VoIP services in 3G networks is efficient and interesting from an operator's point of view.

ACKNOWLEDGMENT

The authors wish to thank Zhi-chun Honkasalo and Mattias Wahlqvist for their frequent feedback during this study. Mika Kolehmainen and Outi Hiironniemi also contributed to this work by providing support for radio link error and PS domain

REFERENCES

[1] IETF Session Initiation Protocol (SIP) Working Group.
[2] IETF Differentiated Services (DiffServ) Working Group.
[3] ITU-T Recommendation G.114, "One-way transmission time", 05/2000.
[4] RFC 3095, "RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed", July 2001.