Validating voice over LTE end-to-end


Published on

Published in: Technology, Business
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Validating voice over LTE end-to-end

  1. 1. Good grades for VoLTE4Validating voice over LTEend-to-endAs LTE becomes more widely adopted, interest in how to carry voice over LTE is growing rapidly. M IC H A E L A N E H I L L , M AGN U S L A R S S ON, GÖR A N S T RÖM BE RG, E R IC PA R S ON SUnlike previous 3GPP wireless necessary migration to VoIP, extensive software interfacing between speechtechnologies, LTE has no circuit- work needs to be carried out so that the BOX B sampling, RTP/IP packet generation,switched bearer to support mobile-broadband capabilities of LTE MOS-LSQM header compression and modem sched-voice, so carrying voice over networks reach the same coverage and Mean opinion uling functions.LTE requires a migration to a reliability levels, as is possible for circuit- score – listen- For VoLTE solutions to reach the lev-voice over IP (VoIP) solution – switched voice in 2G and 3G networks. ing quality el of maturity required for commercialcommonly under the umbrella of Similarly, the implementation of LTE subjective mixed deployment, a number of lab, field andIMS. Until this migration occurs, devices and the way they interact with (MOS-LQSM) is market trials must be conducted. LabLTE-capable handsets need to the network must mature so that per- the unit for sub- trials have been under way at Ericssonrevert to 2G or 3G for voice calls: formance levels are equivalent to those jective grading in for nearly two years, and over-the-airan approach that is not ideal in achievable with 3G handsets. a MOS test that (OTA) field trials for about a year. Thethe long term. Driven by major When compared to its predeces- includes both most recent field trials were carried outoperators, the voice over LTE sors, LTE differs in many ways. Most narrowband and using commercial-track form-factor-(VoLTE) ecosystem is maturing significantly, the nature of dynam- wideband voice. accurate smartphones, as these devicesrapidly and tracking well to target ic scheduling coupled with variable are approaching the levels of integra-commercial timelines. retransmissions, interspersed with tion required for industry players to ful- Several VoIP solutions for wireline IP inter-cell handover, can introduce sig- ly understand how target solutions willnetworks have existed for some time nificant jitter. To cope with this, devices perform end-to-end (E2E). Some of thenow, whereas solutions for wireless net- require sophisticated jitter-buffer solu- trials included video-telephony as anworks are less mature. To facilitate the tions. LTE introduces new hardware and enhanced communication service. BOX A Terms and abbreviations 2G 2nd-generation wireless telephone standard Server technology H.263 ITU-T video compression standard – OTA over-the-air 3G 3rd-generation wireless telephone predecessor to H.264 OTT over-the-top technology H.264 ITU-T video compression standard PDN-GW packet data network gateway 3GPP 3rd Generation Partnership Project equivalent to MPEG-4 AVC (Part 10) PLR packet loss ratio AILG air interface load generator HARQ hybrid automatic repeat request PS packet-switched AMR-NB Adaptive Multi-Rate Narrowband HSS Home Subscriber Server QCI QoS class identifier AMR-WB Adaptive Multi-Rate Wideband IMS IP Multimedia Subsystem QoS quality of service AVC advanced video coding IP Internet Protocol QVGA Quarter Video Graphics Array CBP Constrained Baseline Profile ISC IMS service control/IMS service RAN radio-access network CBR constant bit-rate control interface RF radio frequency CS circuit-switched ITU-T International Telecommunication RTP Real-time Transport Protocol CSCF Call Session Control Function Union Telecommunication SGW service gateway DL downlink Standardization Sector SINR signal-to-interference-plus-noise ratio E2E end-to-end JBM jitter-buffer management SIP Session Initiation Protocol ECM EPS Connection Management LTE Long Term Evolution SPD speech path delay EPC Evolved Packet Core MME Mobility Management Entity TCP Transmission Control Protocol EPS Evolved Packet System MOS mean opinion score UE user equipment FER frame error rate MOS- UL uplink fps frames per second LSQM mean opinion score – listening quality VoIP voice over IP FTP File Transfer Protocol subjective mixed VoLTE voice over LTE G.114 ITU-T one-way transmission time MPEG Moving Picture Experts Group WCDMA Wideband Code Division aspect of transmission quality MTAS Multimedia Telephony Application Multiple AccessE R I C S S O N R E V I E W • 1 2012
  2. 2. 5This article covers the key performance PS, a large amount of jitter is introduced of the packet scheduler may also intro-characteristics of voice and video-tele- in the LTE radio network – as a result duce significant jitter, at times transmit-phony solutions according to the GSMA of speech- and video-quality and net- ting packets as soon as they arrive and atIR.92 and IR.94 specifications, from a work-capacity/coverage optimization. other times, very close to the maximumcontrol- and media-plane perspective. Despite the introduction of jitter, the delay budget (taking into account theThe field-trial results show that the eco- target is still to provide a fixed E2E delay HARQ retransmissions). In some cases,system is maturing quickly, with good for PS-call users. the handling techniques lead to the lateE2E performance observed. Certain The two factors directly related to the arrival of packets, which – from a qual-devices exhibited implementation LTE radio network that impact the qual- ity perspective – results in a lost (faulty)weaknesses that were subsequently cor- ity of speech and video are packet delay speech or video frame.rected. Although more rigorous testing and packet loss rate. These factors are, in The JBM in the terminal needs to beneeds to be carried out, current results turn, highly dependent on the capabil- designed so that the user perceives delaysuggest that the ecosystem is on track ity of the LTE radio network to perform variations caused in the radio networkto meet the deployment timeframe link adaptation and packet scheduling as negligible.desired by leading VoLTE operators. under various load and interference conditions, while maintaining efficient Call setup timeCharacteristics: media quality use of the radio-interface resources. The elapsed time between the momentUser perception of the overall merits of (link adaptation refers to the the abili- the originating UE sends an SIP INVITEa VoLTE call is determined by the time ty to change transmission mode based to initiate the call and when 180spent setting it up, and the quality of on the radio condition of the terminal RINGING is received by the terminat-the speech (and eventual video) aspects and packet scheduling refers to the pri- ing UE is defined as the call setup time.of the session. oritization of packets between differ- Speech quality varies according to a ent terminals and IP flows). The use of Targets – media qualitynumber of factors relating to the ter- L2 HARQ retransmission in LTE can cre- The overall quality target for VoLTE isminal and the network. For terminals, ate significant additional jitter in packet for voice calls to be perceived as on aspeech quality depends on the perfor- delivery; some packets may be delivered par with, or better than, a 3G call. Usingmance of the microphone and speaker, on first transmission, while others may WCDMA as the 3G benchmark, theas well as the functionality to handle use up to the maximum configured default target is for speech quality toacoustic echo, background noise, com- number of retransmissions to be deliv- be greater than 3.5 MOS-LQSM on a call-pensation of speech level, and especial- ered successfully. The dynamic nature by-call basis, as illustrated in Figure the speech codec and jitter-bufferm­ anagement (JBM). Video quality is also determined by FIGURE 1 Speech quality versus frame error ratea combination of factors, even more ofwhich are terminal-related than withspeech. The quality of JBM, the cam- Mean opinion score – listening quality subjective mixed (MOS-LQSM)era and display, the appropriate video 5.0codec, the settings of frame-rate and AMR–WB 12.65image fidelity for the format used, as AMR–WB 8.85 4.5well as the adaption of speech versus AMR–WB 6.60 AMR–NB 12.2video delay (lip-synch alignment) all 4.0affect user-perceived video quality. In the absence of media gateway func-tions, the network impacts speech and 3.5video quality in terms of transport andmobility. The network mainly affects 3.0quality through available bandwidth,packet loss (which causes speech/video 2.5frame error), and delay as well as inter-ruption time at handover. 2.0 Except for the speech- and video-path delay factor, the impact of all oth-er speech and video quality factors is 1.5the same, regardless of whether thecall is handled over a circuit-switched 1.0(3G/2G) or a packet switched (LTE) net- 0% 1% 2% 3% frame error ratework. For calls over CS, the delay is fixedon both an E2E and on a node level fortheir entire duration. For calls over E R I C S S O N R E V I E W • 1 2012
  3. 3. Good grades for VoLTE6 by WCDMA. Assuming the H.264 FIGURE 2 Speech quality versus mouth-to-ear delay BOX C Constrained Baseline Profile (CBP) codec † For voice, the is used, the performance targets – as E model rating recommenda- illustrated in Figure 3 – are: 100 tion is for a 1:1 a video bit-rate of more than 350kbps§; ITU-T G.114 Very satisfied users relationship to a PLR* of less than 0.1 percent – corre- Satisfied users exist between sponding to a visible error no more than 80 Some dissatisfied users speech frames once every 20 seconds; and Many dissatisfied users and RTP packets a camera-to-display delay of less than Nearly all dissatisfied users 80 – in other words, 400ms – of which more than half is dedi- PLR should be cated to handling the jitter introduced by the same as FER. the LTE radio network††. 70 ‡ In reality, 99 Basically all packet loss and introduced percent of the jitter come from the LTE air interface. RTP packets 60 will have been Call setup time delivered to the The target for VoLTE-to-VoLTE calls is to 50 UE within the have shorter setup times than WCDMA- 100 200 300 400 500 600 budgeted time. to-WCDMA calls. The target setup time Mouth to ear delay (ms) § Note that only for VoLTE calls is to be below three sec- video bit-rate onds – the setup time for a WCDMA is mentioned, call is typically more than this. If the implying that t ­ erminating UE is in an ongoing data The focus here is the network, so a packet loss ratio (PLR)† of less than transport bit- session, the setup time for the VoLTE callin terms of transport parameters, this one percent; and rate must be a will be significantly lower as no ­ aging pmeans a frame error rate (FER) of less a mouth-to-ear delay of less than 200ms little higher. is needed.than 1 percent and a mouth-to-ear delay – of which nearly half of the time is dedi- * For video, mul-of less than 200ms (reflecting a UE to UE cated to handling jitter from the LTE Measured VoLTE voice and video tiple RTP pack-call), as shown in Figure 2. radio network‡. performance ets – at least two To achieve the same speech quality For a VoLTE video call, the overall at 350kbps – are During the second half of 2011, Ericssonfor a VoLTE call as for a WCDMA call, the target is to deliver video quality that needed to trans- conducted a VoLTE field trial, which wastarget performance levels are: is much better than that provided port one video further complemented by LTE OTA‡‡ frame, implying lab tests. The objective for both types of that the relation- tests was to characterize and assess the FIGURE 3 Video quality versus video bit-rate for low and medium ship between quality and performance for voice and motion content lost frames video calls over LTE in varying radio, and packets for load and mobility conditions. video is not 1:1. The tests used the latest develop- H.264 MOS scores for QVGA MOS ments for facilitating VoLTE, including †† In reality, the Ericsson LTE solution, a multi-ven- 5.0 99.9 percent of dor IMS solution and pre-commercial Low motion the RTP packets 4.5 will have been UEs with VoLTE capability. All were in a Medium motion delivered to the commercial deployment configuration. 4.0 UE within the Test conditions budgeted time. 3.5 To ensure that the tests reflected real- world conditions, each case was exe- 3.0 cuted with varying RF and mobility scenarios. The following conditions 2.5 were used in the trial: good: SINR of more than 16dB; 2.0 medium: SINR of 2dB to 16dB; 1.5 poor: SINR of less than 2dB; single-sector mobility: the drive route for 1.0 this test condition covered varying loca- 0 125 250 375 500 tions within the cell/sector, from near the Bit-rate (kbps) eNodeB site to the cell edge; intra-eNodeB handover; and inter-eNodeB handover.E R I C S S O N R E V I E W • 1 2012
  4. 4. 7The originating UE was moving aboutin locations with various RF conditions FIGURE 4 Data capture pointswhile the terminating UE was station-ary in good RF conditions. Windows server Embedded Linux serverBackground traffic load Wireshark log Wireshark log Wireshark logTo enhance the approximation to real-world conditions, all test cases were exe-cuted with background load generatedin the same cell as the test calls. Three Radio Networktypes of background load were simulat-ed: TCP traffic, load on the calling UE, UE S1-U S1-MME S5 SGi ISCand eNodeB load. The iPerf tool executing on four devic-es (PCs with dongles) generated the TCPtraffic. All devices were stationary, posi-tioned in locations with varying RF con- S1-MME MMEditions, generating DL traffic with twodevices generating UL traffic. Typical S11throughput measured 4-5Mbps on the S1-U S5 ISCDL and 0.5-1Mbps on the UL. A Linuxserver connected via the SGi interface X2 SGi SGW PDN- CSCF MTASin the core network site served as both GWserver and client for each UE. An FTP client was used to generateload on the calling UE. This client down- Internetloaded a large file from the Linux serv-er in the core network site while the testcall was in progress. Throughput varieddepending on RF conditions, and wastypically in the range of 1-5Mbps. at 12.65kbps. To benefit from the full used for all video test cases: The AILG tool was used to generate BOX D capability and voice quality of mobile profile: Constrained Baseline Profile;load on the eNodeB. The target load lev- ‡‡ Over-the-air broadband, only AMR-WB at 12.65kbps compression level: 1.2 (384kbps max vid-el was set to 100 percent, to ensure that test system, was used throughout the trial. eo bit-rate);the eNodeB was under maximum load indicating that For video telephony calls, two choices resolution: QVGA 320x240;regardless of the actual traffic level. the UE is con- of codec were available: H.263 and the frame rate: 20fps; and nected to the more recent H.264/AVC (MPEG-4 Part bit-rate control: constant bit-rate (CBR).QoS configuration lab network via 10). Again, to benefit from the full capa-All test cases were executed using QCI 5 the air interface, bility of the network, the H.264 codec Data collection and analysis toolsfor SIP signaling, QCI 1 for voice and QCI not via a coaxial was chosen as a video codec through- A variety of tools for data collection8 for default/background traffic. For the cable. out the trial. were used during the trial, the choice ofpurposes of this study, video was car- The H.264 codec used in the trial which was determined by the definedried on a dedicated non-guaranteed bit- offered a limited set of parameter con- KPIs and by the test-case design for mea-rate bearer. figurations. The following settings were suring them. A combination ofUser equipmentTwo types of UE were used in the trial. A Table 1: Test cases executed in the lab and field trialsleading terminal vendor provided a pre-commercial handset with VoLTE capa- VoLTE session setup over QCI 5 VoLTE session setup over QCI 5bilities that was used as the originating Good RF Poor RF with background loadand terminating device in all test cases. UE-A ECM-IDLE UE-A ECM-CONNECTEDThe same vendor provided the LTE don- UE-B ECM-IDLE UE-B ECM-CONNECTEDgles for the PC laptops generating back- UE-A ECM-IDLEground data traffic. UE-B ECM-CONNECTEDCodecs UE-A ECM-CONNECTEDThe handset provided by the terminal UE-B ECM-IDLEvendor offered two codec selections for UE-A ECM-CONNECTEDvoice: AMR-NB at 12.2kbps and AMR-WB UE-B ECM-CONNECTED E R I C S S O N R E V I E W • 1 2012
  5. 5. Good grades for VoLTE8 embedded Wireshark in the handset FIGURE 5 Measured call setup time BOX E UEs were used for analysis. Figure 4 §§ Since PLR illustrates the capture points for each Measured setup time and FER here interface. (Seconds) is the same for 6 voice, the term Test results PLR is used for Call setup time Lab tests the results. For VoLTE, call setup time is measured 5 Field tests ** The UE used from the moment the originating in the field trial UE sends the INVITE to the moment 4 RINGING is received from the termi- was a second- generation nating UE. 3 handset. In the The call-setup-time results from the lab trial, a laptop lab and field trials meet the targets for 2 with a non- VoLTE, and are significantly shorter optimized client than WCDMA setup time for non-idle- combined with idle cases. The systematic difference 1 a dongle was between lab and trial can be attributed used. to the UE and the IMS deployment, as 0 these two components contributed to a Idle-idle Idle- Connected- Connected- Poor radio connected idle connected and load greater part of the call setup time in the field trial than in the lab trial. Speech quality If the test results show a PLR§§ of less than one percent and a mouth-to-ear delay of commercial and internal tools was Network data capture points less than 200ms, the quality target forused as a result: A Windows 2003 server located in the speech has been met. Consequently, the Wireshark, an IP packet analyzer; core network site captured Wireshark results of these parameters are used in Audacity, an audio analysis tool for traces on the S1, S5 and SGi interfaces this article to determine whether qual- recording and analyzing analog audio and a Linux server captured Wireshark ity targets were met. samples; traces on the ISC interface. These tra­ Both mouth-to-ear delay and PLR are terminal RF logs; and ces were used primarily for call-setup measured over the QCI 1 bearer, which internal Perl scripts to calculate skew/ latency measurements. For most of the has been optimized for VoIP. clock drift, jitter, RTP latency and PLR. test cases, the traces captured from the As illustrated in Figure 5, the target mouth-to-ear delay – less than 200ms – FIGURE 6 Measured mouth-to-ear delay was met in the field trial but not in the lab trial. The main reason for a high- Mouth-to-ear delay er mouth-to-ear in the lab trial was the (ms) UE**. The UE factors contributing to this 300 UE delays (such as, internal long delay were processing from sound- processing and jitter handling) card implementation and prioritization Estimated encoding/ of other functions. 250 decoding delay The mouth-to-ear delay results, com- Average RTP latency bined with a very low PLR shown in (E2E transport delay) Table 2 – measured to be about 0.4 per- 200 cent at the cell border – indicate that VoLTE speech quality meets its targets (better than 3.5MOS–LQSM), and will 150 offer very good performance in the entire cell. 100 Inter-arrival jitter When comparing voice over packet- 50 switched options with voice over cir- cuit-switched options, the main issue is transport – where the packet-switched 0 radio network introduces a significant Lab Trial amount of jitter. The inter-arrival jitterE R I C S S O N R E V I E W • 1 2012
  6. 6. 9measured in the lab, for voice calls andvideo telephony calls in the field trial FIGURE 7 Voice UL inter-arrival jitter measured in the lab trial (left) and in the fieldwith varying RF conditions, is calculat- trial (right)ed according to the formula in Box F. The values for inter-arrival jitter PDF (%) Inter arrival jitter cell Edge (UL) PDF (%) Inter arrival jitter cell Edge (UL)shown in figures 7, and 8 indicate the 45 50delay – the difference between actual 40 45arrival time and expected arrival time 35 40 35– of the current package with respect 30 30to the previous package. A value of 0ms 25 25indicates that both packets have the 20 20same delay; a positive value indicates 15 15that the delay for the current packet is 10 10longer. A negative value indicates that 5 5the delay for the current packet is short- 0 22 20 16 12 8 4 0 4 8 12 16 20 22 0 22 20 18 16 12 14 10 8 6 4 2 0 2 4 6 8 10 12 14 16 18 20 22 (ms) (ms)er than the preceding one. The inter-arrival jitter for a voicecall at the cell border is illustrated inFigure 7. The inter-arrival jitter mea-sured in the trials is quite adequate, as it ability, users will revert back to existing mid-2011, the first E2E OTA field trials ofis well within the delay budget specified BOX F circuit-switched options or, in some VoLTE were possible using pre-commer-as the target for speech quality. These Inter-arrival cases, simply rely on over-the-top cial form-factor-accurate devices – andresults show that LTE radio-network jitter solutions. these trials form the basis of this article.performance is more than adequate for The inter-arrival Clearly, the performance of the UE The industry is progressing rapidlyproviding good VoLTE speech quality. jitter presented will play an even greater role for servic- in terms of KPIs, measurement practic- was calculated es provided over packet-switched net- es and, ultimately, the performance ofVideo quality for two works than over circuit-switched. To early solutions. While some improve-The inter-arrival jitter for a video tele- consecutive RTP meet VoLTE performance targets, JBM ments remain to be made, particularlyphony call is illustrated in Figure 8. packets: i-1 and for the UE needs to meet specifications, in relation to UE JBM and implementa-Naturally, the measured inter-arrival jit- i, as: and the delay contribution from other tion delays from other UE internal pro-ter is higher for video telephony than for (post-decoder) UE internal processing cessing, the current performance level D(i, i-1) =a voice-only call, due to the larger pack- arrival_time(i) – outside the normal functionality must is already close to, and in some casesets sizes and the associated scheduling arrival_time(i-1) – be negligible. superior to, those specified in the targetvariability. However, since the time allo- (RTP_timestamp(i) – Efforts to validate VoLTE from an objectives. In parallel, it is likely that LTEcated for handling jitter is much greater RTP_timestamp(i-1))/ E2E perspective, which have been schedulers will become more sophisti-for video telephony than for voice-only divider under way for more than a year, have cated in terms of managing mixed voicetelephony, the inter-arrival jitter mea- the key objective of retaining users. and data traffic, while ensuring goodsured should result in a PLR of less than Where the LTE was launched commercially in late battery performance.0.1 percent – as targeted. divider is 16 for 2009, and the first VoLTE functional In short, VoLTE works and is ready The video was run at a CBR of 384kbps AMR-WB and 90 tests were conducted at that time. In for widespread adoption. It is wellwith 20fps, which is in line with the rec- for H.264 early 2010, substantial lab trials were defined in 3GPP and GSMA standards,ommended target. This, combined with undertaken, first to begin understand- it has been implemented and has nowthe low PLR indicated from the inter- ing how to measure voice quality and been verified. Due to its telecom-gradearrival jitter results, indicates that a then to validate it in the first commer- implementation, which supports stablehigh-quality video telephony service cial systems. In Germany in early 2011, quality levels throughout the entire net-can be offered over LTE. the first solution for wireless voice over work regardless of load, VoLTE should, IP using an LTE network was commer- under normal network conditions, out-Conclusion cially deployed, validating many of the perform unpredictable OTT voice andIn early launches of VoLTE, either in QoS and scheduler enablers that are video services.the form of market trials or commer- required for VoLTE. It is a testament tocial availability, the need to ensure a the sound design of LTE that this deploy-good user experience is paramount. ment could occur so quickly after theGiven the trend toward data-rich appli- introduction of a new mobile-broad-cations, voice now represents a small- band wireless technology. Finally, iner fraction of total device usage. It does,however, remain an essential capabil- Table 2: Measured PLRity, and user expectations for voice are Lab Fieldvery high. If voice services do not deliv-er the necessary level of quality and reli- PLR 0.4% 0.4% E R I C S S O N R E V I E W • 1 2012
  7. 7. Good grades for VoLTE10 UL inter-arrival jitter measured in the field trial – video FIGURE 8 Michael Anehill telephony joined Ericsson in 1992. PDF Inter-arrival jitter – cell Edge (UL) He has worked with (%) speech quality since 1996, 20 starting with TDMA 18 followed by PDC, GSM and WCDMA, and is now focusing on 16 Audio Video multimedia telephony. He holds an 14 M.Sc. in applied physics and electrical engineering from Linköping University, 12 Sweden. 10 8 6 4 Magnus Larsson 2 started with Ellemtel (now wholly owned by 0 –60 –50 –40 –30 –20 –10 0 10 20 30 40 50 60 Ericsson) in 1991 to work on (ms) the development of the next generation AXE system. He has spent more than 25 years in the telecoms industry. He is currently a senior specialist in the area of IMS characteristics at Product Development Unit, Core and IMS in Stockholm, Sweden. Göran Strömberg is a system manager at Mobile Broadband Engagement Practices in Region North America. He has recently been working with VoLTE performance and characteristics in networks with commercial deployment configuration. He holds a B.Sc. in applied mathematics and computer science from Uppsala University, Sweden. Eric Parsons is a strategic product manager for telephony over LTE. He has held a number of senior technical leadership positions in wireless product architecture and standards focusing on end-to-end solutions with more than 20 years industry experience. He has published over 20 refereed publications and numerous patents. He has a Ph.D. in Computer Science from the University of Toronto, Canada.E R I C S S O N R E V I E W • 1 2012