SlideShare a Scribd company logo
1 of 15
Download to read offline
Speech Communication 43 (2004) 17–31
                                                                                               www.elsevier.com/locate/specom




        An efficient transcoding algorithm for G.723.1 and
       G.729A speech coders: interoperability between mobile
                       and IP network q
      Sung-Wan Yoon *, Hong-Goo Kang, Young-Cheol Park, Dae-Hee Youn
              MCSP LAB, Department of Electrical & Electronic Engineering, Yonsei University, 134 Shinchon-dong,
                                       Sudaemun-gu, Seoul 120-749, South Korea
                                                   Received 16 December 2003



Abstract
   In this paper, an efficient transcoding algorithm for G.723.1 and G.729A speech coders is proposed. Transcoding in
this paper is completed through four processing steps: LSP conversion, pitch interval conversion, fast adaptive-code-
book search, and fast fixed-codebook search. For maintaining minimum distortion, sensitive parameters to quality such
as adaptive- and fixed-codebooks are re-estimated from synthesized target signals. To reduce overall complexity, other
parameters are directly converted in parametric levels without running through the decoding process. Objective and
subjective preference tests verify that the proposed transcoding algorithm has comparable quality to the classical en-
coder–decoder tandem approach. To compare the complexity of the algorithms, we implement them on the TI
TMS320C6201 DSP chip. As a result, the proposed algorithm achieves 26–38% reduction of the overall complexity with
a shorter processing delay.
Ó 2004 Elsevier B.V. All rights reserved.

Keywords: Speech coding; Transcoding; Tandem; G.723.1; G.729; G.729A; CELP; LSP conversion; Pitch conversion; Fast codebook
search



1. Introduction                                                      distinctive features, but they obviously tend to
                                                                     share common applications such as voice messag-
   For the last two decades, many new speech                         ing, and voice over internet protocol (VoIP). In
coding standards have been established. Among                        such applications, the interoperability between the
them, ITU-T G.723.1 (ITU-T Rec. G.723.1, 1996)                       networks is crucial, so that endpoint devices are
and G.729 (ITU-T Rec. G.729, 1996a) cover a wide                     required to support both standard coders. How-
range of applications requiring low bit rates. Both                  ever, supporting multiple coders in a single device
standards have different applications due to their                    costs system complexity and power, which will
                                                                     become a significant issue when the device needs to
                                                                     be portable and is powered by small dry cell bat-
  q                                                                  teries. Thus, to manifest interoperability, it is
    This work was supported by BERC-KOSEF.
  *
    Corresponding author. Tel.: +82-2-2123-4534; fax: +82-2-
                                                                     desirable for network systems to support commu-
312-4584.                                                            nications between two endpoint devices employing
   E-mail address: yocello@mcsp.yonsei.ac.kr (S.-W. Yoon).           different types of coders.
0167-6393/$ - see front matter Ó 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.specom.2004.01.004
18                            S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31

   A simple approach to overcome this interoper-             connection. But, if the direct mapping is not pos-
ability problem is to place the decoder/encoder of           sible due to dissimilarities in quantizing the exci-
one coder and the encoder/decoder of the other               tation signal, the performance of the direct
coder in tandem. However, the encoder–decoder                mapping method will not be guaranteed. In that
tandem is often associated with several problems.            case, the other algorithms such as partial encoding
First of all, quality degradation of the synthesized         process that uses the source coder information to
speech is inevitable because a speech signal is en-          reduce the complexity of the encoding process in
coded and decoded twice by the two different                  the target coder maintaining the quality should be
speech coders. Quantization errors due to each               applied.
encoding process are accumulated, which results in              In this paper, we propose an efficient transcod-
the degradation of speech quality. In the case of            ing algorithm working with 5.3/6.3 kbit/s G.723.1
cross tandeming, the mean opinion score (MOS)                and 8 kbit/s G.729A (ITU-T Rec. G.729, 1996b)
drops more than that of single tandeming (Cam-               speech coders. According to the survey in (Hersent
pos Neto and Crcoran, 1999, 2000). Second,                   et al., 2000), two coders are the most widely de-
computational demands for the decoder–encoder                ployed standards in VoIP systems nowadays.
tandem may be too high to implement it into the              Considering the frame length of two-speech cod-
service systems for a large number of users. When            ers, parameters corresponding to one frame of
the tandem algorithms are placed in the gateways             G.723.1 are converted to three sets of equivalent
of cable networks or the base stations of wireless           G.729A parameters. The proposed transcoding
networks, the complexity eventually will be limit of         algorithm is composed of four processing steps:
the full communication systems. Finally, tandem              line spectral pair (LSP) conversion, pitch interval
coding is accompanied with an additional delay in            conversion, fast adaptive-codebook search, and
the communication links because an additional                fast fixed-codebook search. In the LSP conversion,
look-ahead delay for the LPC analysis is inevitable          since variation of LSP trajectory in a steady state
to obtain target bitstreams.                                 speech segment of each coder is somehow negli-
   Unlike the tandem coding, transcoding can be              gible in quality aspects, we directly convert them in
used to overcome these difficulties. Transcoding is            the LSP domain using a codebook matching pro-
to translate source bitstreams to target ones                cedure. For the pitch interval conversion, we
without running through complete decoding–                   cannot use a direct mapping method because the
encoding processes. Thus, it can minimize the                search ranges of pitch intervals in two coders are
quality degradation of the synthesized speech and            different. From the observation that pitch contour
computational complexities with no additional                is monotonically varied with a small variation in
delay.                                                       voiced or voice-like speech signals, we estimate the
   Past researches on transcoding, especially                open-loop pitch in constrained regions relating to
(Campos Neto and Crcoran, 1999) which pro-                   the pitch values of the adjacent frames. This is an
posed the transcoding between G.729 and IS-641               efficient way in the respect of complexity reduc-
(TIA/EIA PN-3467, 1996), was focused on the                  tion. In the adaptive-codebook and fixed-code-
conversion of LSP and gain information. In the               book search process, the codebook structures of
process of the other encoding modules such as                the two coders are different from each other. We
adaptive-codebook delay search and fixed-code-                apply a fast search algorithm for reducing the
book index search, both coders share the infor-              complexity of the transcoding system. Using the
mation for excitation of each other. In other                above four processing steps, the proposed trans-
words, the pitch delay and fixed-codebook index               coding algorithm translates bitstreams of the
are not changed, and directly mapped from one                source coder to those of the target coder with
coder to the other coder. It is possible because of          minimum distortion and reduced complexity.
the many similarities in the structure of comprising            To verify the performance of the proposed
the excitation. This algorithm showed the im-                algorithm, we perform objective and subjective
proved performance compared to cross tandeming               tests such as LPC cepstral distance (LPC-CD)
S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31                          19

(Kitawaki et al., 1998) and perceptual evaluation            used to construct the short-term perceptual
of speech quality (PESQ) (ITU-T Rec. P.862,                  weighting filter which is used for filtering the entire
2000) with various speech sets. In addition we               frame to obtain the perceptual weighted speech
compare the complexity of them to that of the                signal. For every two sub-frames, the open-loop
tandem algorithm by implementing the algorithms              pitch period is computed using the weighted
using TI TMS320C6201 DSP processor (Texas                    speech signal. Later, the speech signal runs
Instruments, 1996). Evaluation results confirm                through the adaptive and fixed-codebook search
that the proposed transcoding achieves the                   procedures on a sub-frame basis. The adaptive-
reduction of overall complexity with a shorter               codebook search is performed using a 5th-order
processing delay. Also, the preference tests reveal          pitch predictor, and the closed-loop pitch and
the superiority of the proposed transcoding algo-            pitch gain are computed. Finally, the non-periodic
rithm in comparison to the tandem.                           component of the excitation is approximated by
   This paper is organized as follows. G.723.1 and           two types of fixed codebooks: multi-pulse maxi-
G.729 speech coding algorithms are briefly de-                mum likelihood quantization (MP-MLQ) for a
scribed in Section 2. In Section 3, transcoding              higher bit rate, and an algebraic code excited linear
algorithms from G.723.1 to G.729A and from                   prediction (ACELP) for a lower bit rate.
G.729A to G.723.1 are proposed. Section 4 de-
scribes the performance evaluation of the pro-               2.2. ITU-T G.729 speech coder
posed transcoding algorithm. Conclusions are
presented in Section 5.                                         ITU-T 8 kbit/s G.729 (ITU-T Rec. G.729,
                                                             1996a) is developed based on conjugated-structure
                                                             ACELP (CS-ACELP). G.729 operates on the
2. ITU-T G.723.1 and G.729A                                  frame length of 10 ms, and there is 5 ms look-
                                                             ahead for linear prediction (LP) analysis, resulting
   Transcoding algorithm in this paper is associ-            in a total 15 ms algorithmic delay. For every 10 ms
ated with ITU-T G.723.1 (ITU-T Rec. G.723.1,                 frame, 10th-order LPC coefficients are computed
1996) and ITU-T G.729A (ITU-T Rec. G.729,                    using the Levinson–Durbin recursion. The LPC
1996b). Each speech coder is widely used for                 coefficients for the 2nd sub-frame are quantized
multimedia communication with low bit-rate                   using multi-stage vector quantization (MSVQ).
applications such as VoIP and digital cellular.              The unquantized LPC coefficients are used to
Their operational features are briefly described in           construct the short-term perceptual weighting fil-
this chapter.                                                ter. After computing the weighted speech signal,
                                                             the open-loop pitch period is computed. To avoid
2.1. ITU-T G.723.1 speech coder                              pitch multipling errors, the delay range is divided
                                                             into three sections, and smaller pitch values are
   ITU-T G.723.1 (ITU-T Rec. G.723.1, 1996), the             favored. Then, the adaptive- and fixed-codebooks
standard for the multimedia communication                    are searched to find optimum codewords. The
speech coder, has two modes whose bit rates are              adaptive-codebook search is performed using a
5.3 and 6.3 kbit/s. G.723.1 takes 30 ms of speech or         1st-order pitch predictor, and a fractional pitch
other audio signals for encoding, and signals with           delay is searched in a 1/3 sample resolution. In the
the same length are reproduced by the decoder. In            fixed-codebook search, non-periodic components
addition, there is a look-ahead of 7.5 ms, resulting         of excitation are modeled using algebraic code-
in a total algorithmic delay of 37.5 ms.                     books with four pulses. For the efficiency of
   For every 60-sample sub-frame, a set of 10th-             quantization of pitch- and fixed-codebook gains,
order linear prediction coefficients (LPC) is com-             two codebooks with conjugate structures are used.
puted. The LPC set of the last sub-frame is                     G.729 Annex A is a complexity-reduced version
quantized using a predictive split vector quantizer          of G.729 (ITU-T Rec. G.729, 1996b). The com-
(PSVQ). The unquantized LPC coefficients are                   plexity of G.729A is about 50% of G.729 (Salami
20                              S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31

et al., 1997a). Algorithmic differences between                       3.1. Transcoding from G.723.1 to G.729A
G.729 and G.729A are found in (Salami et al.,
1997b).                                                                  The transcoding algorithm proposed in this
                                                                     paper has an asymmetric structure between Tx and
                                                                     Rx paths. For the transmission of speech data
3. The proposed transcoding algorithm                                from G.723.1 to G.729A, the transcoding process
                                                                     is involved with the LSP conversion and the open-
   A simple transcoding algorithm can be obtained                    loop pitch conversion. Transcoding is executed on
by placing the decoder/encoder of one coder and                      the basis of G.723.1 frame size; thus one G.723.1
the encoder/decoder of the other coder in tandem,                    frame is converted to three frames of G.729A. A
as shown in Fig. 1(a). However, the decoder–en-                      block diagram of the proposed transcoding algo-
coder tandem often raises several problems such as                   rithm from G.723.1 to G.729A is shown in Fig. 2.
degradation of speech quality, high computational                    The left dotted box in the figure corresponds to the
load, and additional delay. All these problems are                   decoder module of G.723.1 and the right one
due to the fact that the speech signal should pass                   corresponds to the encoder module of G.729A.
complete processes for the decoding and encoding                         For transcoding from the G.723.1 and G.729A,
of two associated coders.                                            LSP sets of G.723.1 are converted to those of
   Comparing with the tandem coding, transcod-                       G.729A, and they are quantized by the G729A
ing, a direct translation of one bitstream to an-                    encoding scheme. Then, the quantized LSP sets are
other, is more beneficial. Fig. 1(b) shows a block                    converted to LPC again, so the perceptually
diagram of the transcoding. The transcoding can                      weighted synthesis filter is constructed for each
avoid the degradation of the synthesized speech                      sub-frame. Afterwards, the target signal for open-
quality because the several source parameters are                    loop pitch estimation is computed by filtering the
directly translated in the parametric domain in-                     excitation through the perceptually weighted filter.
stead of being re-estimated from the decoded PCM                     For the open-loop pitch estimation of G.729A, a
data. In this respect, transcoding algorithm is ex-                  pitch smoothing technique with the closed-loop
pected to show less distortion than the tandem                       pitch intervals of G.723.1 and G.729A is used.
coding. Also, no additional delay is required, and                   Considering the similarity and the continuity of
a reduced computational load is expected.                            the pitch parameter for the consecutive frames, the



                                   packets                                           packets
                                encoded by A                 PCM                  encoded by B
                      Encoder                    Encoder                Encoder                   Encoder
                     /Decoder                   /Decoder               /Decoder                  /Decoder

                        A                          A                        B                       B
                                   packets                   PCM                     packets
                                encoded by A                                      encoded by B
                                                   Base station / Gateway
                                                               (a)



                                   packets                                           packets
                                encoded by A                                      encoded by B
                      Encoder                                                                     Encoder
                                                       Mutual Transcoder
                     /Decoder                                                                    /Decoder
                                                            A↔B
                        A                                                                           B
                                   packets                                           packets
                                encoded by A                                      encoded by B
                                                   Base station / Gateway

                                                               (b)

                                           Fig. 1. (a) Tandem and (b) transcoding.
S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31                                                                             21

                                                                                                                                            <G.729A>
                                                                                                                  LSP index
                      <G.723.1>

                                                                     Perceptually     sw(n)         Open-loop pitch
                     LSP index                 LSP                    Weighting                     Estimation using
                                           Transcoder                   Filter                      Pitch Smoothing


                                                                                                               t(n)       Pirch delay
                                                                                                                                 Li
                                    Li                                        e(n)                 Adaptive-codebook                         BIT-
                   pitch delay,            Adaptive-          u(n)                                        Search
                                           Codebook                       +
                                                                          +                                                      gp
                                                                                                                              pitch gain
                                                                                                                                           PACKING
                  pitch gain, g p          Decoder
                                                                                                         Remove
                                                                                                   Adaptive-codebook
                                                                                                      contribution
                                                                                                               x(n)    Codebook index
                 Fixed codebook              Fixed-           v(n)                                                               ci
                                           Codebook                                                  Fixed-codebook
                    information                                                                          Search
                                            Decoder
                                                                                                                                 gc
                                                                                                                       c odebo ok ga in




                                         Fig. 2. Block diagram of transcoding from G.723.1 to G.729A.


pitch smoothing method estimates the open-loop                                           the bitstream and transmitted to the decoder of
pitch in the constrained narrow region, around the                                       G.729A.
previous closed-loop pitch value of G.729A or the
current closed-loop pitch of G.723.1. Using this
                                                                                         3.1.1. LSP conversion using linear interpolation
approach, it is possible to generate equivalent
                                                                                            A linear interpolation technique is employed to
speech quality to tandem coding while spending
                                                                                         translate the LSP parameters. Given a set of
much less computational load.
                                                                                         G.723.1 LSP parameters, three frame sets of
   After determining the open-loop pitch, the
                                                                                         G.729A LSP parameters are computed because the
adaptive- and fixed-codebooks are searched for
                                                                                         frame length of G.723.1 is three times longer than
G.729. The closed-loop pitch of the second sub-
                                                                                         that of G.729A. Fig. 3 shows the LSP conversion
frame at the current processing frame of G.729 is
                                                                                         procedure, which can be denoted by
used for a candidate of the open-loop pitch esti-
mation for the next frame. It means that the open-                                                8 A
                                                                                                  < p1 ðjÞ;                                             i ¼ 1;
loop pitch of the next frame is estimated around
                                                                                         piB ðjÞ ¼ 1ðp2 ðjÞ þ p3 ðjÞÞ;
                                                                                                        A      A
                                                                                                                                                        i ¼ 2;   1 6 j 6 10;
the closed-loop pitch of the second sub-frame ob-                                                 :2A
                                                                                                    p4 ðjÞ;                                             i ¼ 3;
tained in the current frame. After the translation
of LSP parameters and the open-loop pitch from                                                                                                                             ð1Þ
G.723.1 to G.729A, the adaptive-codebook and
the fixed-codebook parameters of G.729A are                                               where pA and pB are the LSP parameters of
determined by the standard encoding process. Fi-                                         G.723.1 and G.729A, respectively, and i denotes
nally, the parameters of G.729A are encoded to                                           the frame index of G.729A.


                                             1st subframe            2nd subframe              3rd subframe                             4th subframe
                     <G.723.1>

                                             LSP (G.723.1)           LSP (G.723.1)                 LSP (G.723.1)                        LSP (G.723.1)
                                                 p1A                      A
                                                                         p2                              p3A                                  A
                                                                                                                                             p4

                                                                                     Interpolation


                                                 LSP (G.729A)                        LSP (G.729A)                            LSP (G.729A)
                                                             p1B                               B
                                                                                              p2                                            B
                                                                                                                                           p3
                     <G.729A>                                                                                                    3rd frame
                                                       1st frame                      2nd frame

                           Fig. 3. LSP conversion from G.723.1 to G.729A using linear interpolation.
22                                       S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31

   To compare the performance of the tandem and                         shown in Fig. 4, the LPC spectrum obtained after
the LSP conversion using linear interpolation, we                       the LSP conversion given in Eq. (1) matches clo-
measured the spectral distortion to validate the                        sely to the reference spectrum, especially in the low
efficiency of the proposed method. The decoded                            frequency region and around the formant. The
LPC coefficients of G.723.1 were used as a refer-                         LPC spectrum after the decoder–encoder tandem,
ence, to compare the distortions of the proposed                        however, indicates a larger spectral distortion than
and tandem method, because we intended to show                          the proposed LSP conversion. Since speech quality
the distortion added to the original spectrum. As                       is mainly determined from the accuracy of low and
shown in Table 1, the spectral distortion of the                        formant frequency region components (Rabiner
transcoding is much less than that of tandem.                           and Schafer, 1978), it can be said that the pro-
Moreover the both outliers of 2–4 dB and above                          posed LSP conversion technique can provide bet-
the 4 dB are extremely small for transcoding cases.                     ter speech quality than the tandem.
We also compared the LPC spectrum of the tan-                              Also, it should be mentioned that the proposed
dem and transcoding to that of the original one.                        LSP conversion method is more beneficial than the
Fig. 4 shows the LPC spectrum in the voiced re-                         simple tandem with regard to complexity and
gion of the speech signal, in which the LPC spec-                       processing delay. As described in (Kang et al.,
trum of G.729A is also shown as a reference. As                         2000), the total delay of a speech communication
                                                                        system is decided by three factors: algorithmic
                                                                        delay, processing delay, and network delay. Con-
Table 1                                                                 sidering the algorithmic and processing delays
Spectral distortion––from G.723.1 to G.729A                             only, the total delays of the decoder–encoder
     Method        Average       Outliers (%)                           tandem and direct LSP conversion, denoted by
                   SD [dB]                                              Dtan and Dtrans are written by
                                                                          AB        AB
                                 2–4 dB                >4 dB
     Tandem        2.81          61.71                 14.03
                                                                        Dtan ¼ 42:5 þ aA þ bA þ aB þ bB ;
                                                                         AB                                              ð2Þ
     Transcoding   0.81           0.29                  0.00
                                                                        Dtrans ¼ 37:5 þ aA þ PAB þ bB ;
                                                                         AB                                              ð3Þ

                                                                  LPC spectrum
                               40
                                                                                      G.729A (reference)
                                                                                      Tandem
                               30                                                     Transcoding


                               20


                               10
                          dB




                                0


                               -10


                               -20


                               -30
                                     0       500    1000       1500   2000   2500     3000    3500     4000
                                                                       Hz

                                     Fig. 4. Comparison of LPC spectrum (from G.723.1 to G.729A).
S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31                                   23

Table 2
Transmission delay––from G.723.1 to G.729A
                                 Operation                      Tandem (ms)                         Transcoding (ms)
  A (G.723.1)                    Buffering                       37.5                                37.5
                                                                 E                                   E
                                 Encoding                       pA                                  pA
                                                                 D                                   TR
  Intermediate processing        Decoding                       pA                                  pAB
                                                                     E
                                 Encoding                       3 Â pB
                                 Delay                          5                                   0
                                                                     D                                   D
  B (G.729A)                     Decoding                       3 Â pB                              3 Â pB
                                                                        E    D      E    D                  E    TR     D
  Total delay (ms)                                              42:5 þ pA þ pA þ 3ðpB þ pB Þ        37:5 þ pA þ pAB þ 3pB



where subscripts A and B denote G.723.1 and                      sponding to the adjacent sub-frame (Yoon et al.,
G.729A, respectively, and AB denotes the con-                    2001).
nection from A to B. Also, am and bm (m ¼ A or B)                   The proposed open-loop pitch estimation is
denote encoder and decoder delays, respectively,                 simplified using a pitch smoothing scheme shown
and PAB denotes the delay introduced by the pro-                 in Fig. 5. At first, the closed-loop pitch of G.723.1
posed LSP conversion method. In the decoder–                     is compared with the pitch value obtained at the
encoder tandem, an additional 5 ms look-ahead is                 2nd sub-frame of the previous G.729A frame. If
needed for LPC analysis. But the proposed LSP                    the distance between the two pitch values is less
conversion technique does not introduce this look-               than 10 samples, by considering the continuity of
ahead delay because the processing of LPC anal-                  the pitch values, the closed-loop pitch of G.723.1 is
ysis is not executed. As shown in Eqs. (2) and (3),              determined as the open-loop pitch of G.729A.
the total delay of the proposed method is at least 5             Otherwise, the pitch smoothing method is applied.
ms less than that of the decoder–encoder tandem.                 To determine the threshold value of 10 samples for
Because the processing delay, PAB , is less than the             applying the pitch smoothing method in pitch
sum of bA and aB , the total delay will be further               interval conversion, the deviation of the adaptive-
reduced. The delays induced by the decoder–en-                   codebook pitch search range, depending on the
coder tandem and the LSP conversion technique                    corresponding open-loop pitch, is considered. 1
are summarized in Table 2.                                          If the pitch difference is larger than 10 samples,
                   E    D    E    D    TR
   In this table, pA ; pA ; pB ; pB ; pAB is the processing      local delays maximizing Rðki Þ in Eq. (4) are
time of encoding of G.723.1, decoding of G.723.1,                searched in the range of Æ3 samples around the
encoding of G.729A, decoding of G.729A, and                      closed-loop pitch delays of G.723.1 and G.729A,
transcoding from G.723.1 and G.729A, respec-                     respectively:
tively.
   When the LSP conversion is completed, the                                X
                                                                            79
                                                                 Rðki Þ ¼         sw ðnÞ Á sw ðn À ki Þ;
perceptual weighting filter for the G.729A encoder                                                                           ð4Þ
                                                                            n¼0
is constructed using the converted LSP parame-
                                                                 pi À 3 6 ki 6 pi þ 3ði ¼ A; BÞ;
ters. Then, the perceptually weighted speech signal
for the open-loop pitch estimation is generated
using the perceptual weighting filter.                            where sw ðnÞ is the weighted speech signal, sub-
                                                                 scripts A and B, respectively, denote G.723.1 and
                                                                 G.729A, pi s are the closed-loop pitch delays, and
3.1.2. Open-loop pitch estimation using a pitch
                                                                 ki s are the open-loop pitch candidates.
smoothing
   After the LSP conversion, the open-loop pitch
for each frame of G.729A is estimated. In the
proposed transcoding algorithm, the open-loop                       1
                                                                      For example, the closed-loop pitch search range of G.729A
pitch is searched around the interval between the                is from )5 to +4 sample around the open-loop pitch or
pitch values of the source and target coder corre-               preceding sub-frame pitch.
24                                            S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31

                                                            1st                 2nd                3rd                 4th
                                                         subframe             subframe           subframe            subframe
                       <G.723.1>
                                                          Clp_A1              Clp_A2               Clp_A3




                                                      Pitch-                   Pitch-                     Pitch-
                                                    Smoothing                 Smoothing                 Smoothing



                       <G.729 A>                            T_op 1                       T _op2                     T_op 3
                                           Clp_B0                    Clp_B1                    Clp_B2                    Clp_B3

                                Preivous
                                                           1st frame                   2nd frame                 3rd frame
                                 frame

                                   Fig. 5. Open-loop pitch estimation using pitch smoothing (G.723.1 fi G.729A).




  After determining the local delays, ti , i ¼ A; B                               G.723.1 is selected. Consequently, the smoothed
maximizing Rðki Þ in each range, Rðki Þ is normalized                             open-loop pitch, Top , is determined as
by the energy at the local maximum delays:
                                                                                       Top ¼ t1
  0              Rðti Þ                                                                R0 ðTop Þ ¼ R0 ðt1 Þ
R ðti Þ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ;
           P 2                              i ¼ A; B:                  ð5Þ
                n sw ðn À ti Þ                                                         if R0 ðt2 Þ P 0:75 Á R0 ðTop Þ
                                                                                           R0 ðTop Þ ¼ R0 ðt2 Þ
   To evaluate the validity of this pitch estimation                                       Top ¼ t2
approach, we perform two experiments such as                                           end
observation of the pitch contour of two coders and
comparison of the estimated open-loop pitch                                          As shown in Fig. 6, the pitch contour of the
contour of the target coder with the adaptive-                                    proposed method, dashed line one, matches well
codebook pitch contour of the source and target                                   with the original pitch value of G.729A without
coders. The adaptive-codebook pitch contour of                                    any severe fluctuation. But, in the case of the
target coder matches well with the estimated open-                                tandem approach, a drastic fluctuation appears
loop pitch contour rather than the pitch value of                                 such as pitch multiple errors even in the stable
source one. We infer from the observation that the                                voiced speech segment.
variation of pitch value between adjacent sub-                                       The word ‘‘smoothed’’ means that the rapid
frames is relatively stable and the estimated open-                               and sudden pitch variation often produced by the
loop pitch is close to the previous closed-loop                                   estimation in distorted speech segments, an even
pitch value. Finally, the normalized local maxima                                 voiced or voice-like region, is avoided, so the
are compared with each other, and the local                                       estimated open-loop pitch can represent the
maximum from G.729A is favored. Thus, if the                                      monotonically varying contour.
local maximum of G.729A is larger than 3/4 times                                     In the proposed scheme, the autocorrelation of
of that of G.723.1, the open-loop pitch of G.729A                                 the weighted speech is either computed around the
is determined as the local maximum delay of                                       constrained region or not at all. Also, as the local
G.729A. 2 Otherwise, the local maximum delay of                                   maximum value from the search process of origi-
                                                                                  nal G.729A encoder in full search range is not
                                                                                  involved, the open-loop pitch can be estimated
  2
    The 3/4 was determined from the results of our experiments                    with much less computational load in the con-
based on large number of test database.                                           strained search range. Furthermore, the pitch
S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31                            25

  4000                                                                            smoothing scheme can reduce the drastic fluctua-
  3000                                                                            tion of pitch contour in a voiced or voice-like
  2000
                                                                                  segment, as shown in Fig. 6, like that of the tan-
                                                                                  dem approach which results in sticky noise due to
  1000
                                                                                  inaccurate pitch information.
        0

 -1000                                                                            3.2. From G.729A to G.723.1
 -2000

 -3000
                                                                                     For the case of speech transmission from
                                                                                  G.729A to G.723.1, the proposed algorithm con-
 -4000
                                                                                  sists of LSP conversion using linear interpolation,
 -5000                                                                            pitch smoothing for the open-loop pitch conver-
 -6000                                                                            sion, fast adaptive-codebook search, and fast
      0          200       400       600     800     1000        1200    1400
                                     sample index                                 fixed-codebook search. Parameters corresponding
                                        (a)                                       to three frames of G.729A are collected and
                                                                                  simultaneously converted to parameters corre-
        65                                                                        sponding to one frame of G.723.1. A block dia-
                                                         reference(G .729)
        60
                                                         tandem                   gram of the proposed transcoding algorithm from
                                                         pitch s moothing
                                                                                  G.729A to G.723.1 is shown in Fig. 7. The overall
        55                                                                        structure is similar to the connection from G.723.1
        50
                                                                                  to G.729A, but fast adaptive-codebook and fixed-
                                                                                  codebook search modules are different from
pitch




        45                                                                        G.723.1 to G.729A.
        40
                                                                                  3.2.1. LSP conversion using a linear interpolation
        35                                                                           LSP conversion from G.729A to G.723.1 is
                                                                                  accomplished with parameters corresponding to
        30
                                                                                  three frames of G.729A. However, only the 4th
        25
             0   2     4         6     8     10     12      14     16        18
                                                                                  frame parameters of G.723.1 are computed via a
                                     sample index                                 linear interpolation, as given by
                                        (b)
                                                                                  pA ðjÞ ¼ w1 Á p1 ðjÞ þ w2 Á p2 ðjÞ
                                                                                                 B             B
Fig. 6. Comparison of the open-loop pitch contour. (a) Voiced                                      B
speech segment. (b) The estimated open-loop pitch contour.                                 þ w3 Á p3 ðjÞ;   1 6 j 6 10;          ð6Þ




                                              Fig. 7. Block diagram of transcoding from G.729A to G.723.1.
26                                    S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31

                                               1st frame                 2nd frame                     3rd frame
                                                                                  B                           B
                          <G.729A>                    p1B                        p2                          p3
                                              LSP (G.729A)             LSP (G.729A)                   LSP (G.729A)



                                Previous
                                                                      Interpolation                     Interpolation
                             LSP (G.723.1)
                                                                                                                   pA

                                           LSP (G.723.1)      LSP (G.723.1)           LSP (G.723.1)      LSP (G.723.1)
                          <G.723.1>        1st subframe       2nd subframe         3rd subframe          4th subframe

                             Fig. 8. LSP conversion from G.729A to G.723.1 using linear interpolation.


where superscripts A and B again denote parame-                                  As described in Section 3.1.1, the proposed
ters corresponding to G.723.1 and G.729, respec-                              approach reduces overall complexity and overall
tively, and subscript denotes frame indices of                                processing delays. Processing delays are summa-
G.729. w1 , w2 , and w3 are weighting factors. They                           rized and compared in Table 4, LPC spectra ob-
are usually decided by considering geometric dis-                             tained in a voiced region shown in Fig. 9. As
tance (Epperson, 2002). We set them at 0.05, 0.15                             shown in the figure, the LPC spectrum of the
and 0.80 respectively. The developed LSP con-                                 proposed LSP conversion scheme matches well
version process is illustrated in Fig. 8. In this                             with the reference compared to the tandem coding.
transcoding algorithm, only the LSP set of the 4th
sub-frame is computed by linear interpolation,                                3.2.2. Open-loop pitch estimation using pitch
because the LPC sets of the other sub-frames are                              smoothing
computed by linear interpolation using the LSP                                   Similar to the case from G.723.1 to G.729A, the
information of 4th sub-frames between the previ-                              perceptually weighted speech signal is computed as
ous and current frame. To compare the perfor-                                 a target signal for the open-loop pitch estimation
mance of the tandem and the LSP conversion                                    of G.723.1. The same pitch smoothing technique is
using linear interpolation in this direction, we                              applied. A block diagram of the developed open-
measure the spectral distortion. The decoded LPC                              loop pitch estimation is shown in Fig. 10. As a cost
coefficients of G.729A were used as a reference. As                             function, the normalized cross correlation at the
shown in Table 3, the spectral distortion of the                              pitch-lag candidate is computed, which have been
LSP conversion is much less than that of tandem.                              used for the original open-loop pitch estimation
Moreover the both outliers of 2–4 dB and above                                module in G.723.1 (ITU-T Rec. G.723.1, 1996).
the 4 dB are extremely less, too. Also, as shown in
Fig. 9, the LPC spectrum of the transcoding                                   3.2.3. Fast adaptive-codebook search
matches well with the reference spectrum obtained                                The adaptive-codebook search in G.723.1 uses
from the original G.723.1 encoder compared to                                 a 5th order pitch predictor, and estimates the pitch
that of the tandem coding.                                                    delay and gain simultaneously. Thus, the adaptive
                                                                              codebook search in G.723.1 is computationally
                                                                              demanding mainly because the pitch delay and
Table 3                                                                       pitch gain are searched simultaneously. Previously,
Spectral distortion––from G.729A to G.723.1                                   we proposed a fast adaptive-codebook search
     Method        Average SD     Outliers (%)                                algorithm (Jung et al., 2001). In this algorithm, the
                   [dB]                                                       pitch delay and pitch gain are computed sequen-
                                  2–4 dB              >4 dB
                                                                              tially, rather than simultaneously. At first, the
     Tandem        2.90           66.29               14.47                   pitch delay is computed using a 1st-order pitch
     Transcoding   1.15            3.90                0.02
                                                                              predictor, and later, the pitch gains of the 5th
S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31                                                    27

                                                                         LPC spectrum
                             40
                                                                                                         G.723.1(original)
                                                                                                         Tandem
                                                                                                         Transcoding
                             30



                             20
                        dB




                             10



                              0



                             -10



                             -20
                                   0        500      1000        1500           2000       2500       3000        3500         4000
                                                                                 Hz

                                       Fig. 9. Comparison of LPC spectrum (from G.729A to G.723.1).




Table 4
Transmission delay––from G.729A to G.723.1
                                         Operation                       Tandem (ms)                                           Transcoding (ms)
  B (G.729A)                             Buffering                        35                                                    35
                                                                              E                                                     E
                                         Encoding                        3 Â pB                                                3 Â pB
                                                                              E                                                 TR
  Intermediate processing                Decoding                        3 Â pA                                                pBA
                                                                          E
                                         Encoding                        pA
                                         Delay                           5                                                     0
                                                                          D                                                     D
  A (G.723.1)                            Decoding                        pA                                                    pA
                                                                                 E    D      E    D                                  E    TR     D
  Total delay (ms)                                                       40 þ 3ðpB þ pB Þ þ pA þ pA                            25 þ pA þ pBA þ 2pB




                                                            1st frame                   2nd frame                  3rd frame
                        <G.729A>
                                                      Clp_B1                                   Clp_B4


                                                       Pitch-                                   Pitch-
                                                     Smoothing                                Smoothing
                       <G.723.1>
                                          Clp_A0                        T_op1                                  T_op2


                                                           1st                    2nd                  3rd               4th
                                                        subframe                subframe            subframe           subframe
                                                                                       Current frame

                        Fig. 10. Open-loop pitch estimation using pitch smoothing (G.729A ! G.723.1).
28                                         S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31

order pitch predictor are estimated using the                                                               G.729A pitch gain

computed pitch delay. The pitch gain is computed                                                                 gp
                                                                           Training    G.729A         G.729A             G.723.1
using the function given by                                                speech
                                                                           data        Encoder        Encoder            Encoder

             X
             N À1
MS– 0ACB ¼
  E                 ft½nŠ À g^ À LŠg2
                             t½n
             n¼0
                                                                                                 Adaptive-Codebook
                                                                                                  Gain Search Index
             X
             N À1                 X
                                  N À1                                                            Table Generation
                                                                                                                                indexg

         ¼          t2 ½nŠ À 2g           t½nŠ^½n À LŠ
                                              t                                                                                          G.723.1
                                                                                                                                         pitch gain
                                                                                                                                         table index
             n¼0                  n¼0
                     X
                     N À1
                                                                          Fig. 11. Gain index table generation for fast adaptive-code-
             þ g2           t2 ½n À LŠ;                         ð7Þ       book search.
                     n¼0

where t½nŠ and ^ are target signals of the adap-
                    t½nŠ                                                  tics. The process of the gain index table generation
tive-codebook search and weighted synthesis sig-                          or pre-selection is shown in Fig. 11.
nal, respectively. g is the pitch gain of the 1st-order                       In this approach, the similarity or relationship
pitch predictor, and N is the sub-frame size.                             of pitch gains of each coder is considered. As
Optimum pitch delay minimizing Eq. (7) is ob-                             shown in Fig. 11, the decoded PCM signal from
tained by differentiating Eq. (7) with respect to g                        G.729A is encoded by G.723.1, like a tandem
and by setting the derivative to zero:                                    connection. In this process, we can find the sta-
     PN À1                                                                tistical information between the pitch gains of
                ^
        n¼0 t½nŠt ½n À LŠ
g ¼ PN À1                 :                         ð8Þ                   two coders. The dynamic range of the pitch gain
             ^2 ½n À LŠ
         n¼0 t                                                            value of G.729A is from 0 to 1.2. We divide this
    Substituting Eq. (8) into Eq. (7) gives an                            pitch gain range into eight sub-sections, and the
expression for the weighted squared error in terms                        conjugate structure of G.729A gain codebook
of the pitch delay, and we can conclude that                              is considered for the boundary value of each sub-
minimizing Eq. (7) is equivalent to maximizing                            section. Then, in the following encoding process
       P                     2                                          of G.723.1, the determined adaptive-codebook
            N À1                                                          gain table indices are stored at each sub-frame.
                    ^
            n¼0 t½nŠt ½n À LŠ
  0
Clag ¼     PN À1 2               :              ð9Þ                       As a result, the distribution of the most proba-
                  ^
              n¼0 t ½n À LŠ                                               ble top 40 or 85 gain indices for 85 or 170 entry
                          0                                               gain table, respectively, can be listed up at each
Finally, L maximizing Clag is determined from a
                                                                          pitch gain sub-section of G.729A. For reliability of
closed-loop search procedure. This scheme is
                                                                          gain index distribution, we use speech signals re-
computationally very efficient, and there is no
                                                                          corded by a female and male speaker, with each
quality loss compared to the original coder (Jung
                                                                          sentence 8 s long, and a total of 96 sentences per
et al., 2001).
                                                                          speaker. Results of the subjective listening test
   In addition to above process, we apply an extra
                                                                          confirm that no degradation of speech quality was
algorithm in the vector quantization of pitch gains.
                                                                          introduced.
The pitch gain of the G.723.1 coder is vector-
quantized using an 85 or 170-entry codebook, 170
for lower rate and 85 or 170 for higher rate,                             3.2.4. Fast fixed-codebook search
respectively. This process is another major com-                             In the fixed-codebook search of G.723.1 at 5.3
putational burden for implementation. In the                              kbit/s, four pulses are searched based on an
developed transcoding algorithm, the search range                         ACELP structure for every sub-frame (ITU-T
of the gain codebook is limited depending on the                          Rec. G.723.1, 1996). Each sub-frame is divided
pitch gain of G.729A. In other words, the indices                         into four tracks, and the pulse and sign of each
of the adaptive-codebook gain table are pre-                              pulse are determined using a nested-loop search. In
selected depending on speech signal characteris-                          the theoretically worst case, the pulse locations are
S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31                                           29
                                                                               sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
searched with the combination of 8 Â 8 Â 8 Â 8 ¼                                   X  p

84 with the analysis-by-synthesis technique. In              LPC CD ¼ 10= log10 2         ðCx ðiÞ À Cy ðiÞÞ2 ;
practice, limiting the number of entering the loop                                              i¼1

for the last pulse search can significantly reduce                                                                              ð10Þ
the complexity of the fixed-codebook search. In
order to simplify the fixed-codebook search of                where Cx ðiÞ and Cy ðiÞ are LPC cepstral coefficients
G.723.1, a depth-first tree search which is used in           of the input and output of the codec, and p is the
the G.729A fixed-codebook search is employed in               order of LPC filter. Measurement results are
this paper. Using this scheme, the combination of            summarized in Tables 5 and 6, where lower LPC-
the pulse location can be reduced down to                    CD and highly PESQ imply better speech quality.
2 Â fð8 Â 8Þ þ ð8 Â 8Þg (Jung et al., 2001).                 As shown, for both measures, the proposed tran-
   The fixed-codebook excitation of G.723.1 at 6.3            coding scheme is judged as having better quality
kbit/s is modeled by MP-MLQ based on multi                   than tandem coding.
pulse excitation. The fixed-codebook parameters
such as quantized gain, the signs and locations of           4.2. Subjective quality evaluation
the pulses are sequentially optimized. This process
is repeated for both the odd and even grids. To                 For subjective evaluations, informal A–B pref-
reduce the complexity of this module, we adopt the           erence tests are conducted. The tests are performed
fast algorithm proposed in (Jung et al., 2001),              by 30 naive listeners. In the tests, the subjects are
which is based on ACELP search. In this fast                 asked to choose a sound between pairs of samples
search algorithm, the 60 positions in a sub-frame            presented through a headset. If the subjects can
are divided into several tracks. Each pulse can              not distinguish the quality difference, they are
have either amplitude +1 or )1. The result in (Jung
et al., 2001) shows that the reduction ratio is about        Table 5
80% when fully optimized for DSP implementa-                 Objective test results (G.723.1 at 5.3 kbit/s)
tion.                                                                               LPC-CD (dB)             PESQ
                                                                                    Female       Male       Female           Male

4. Performance evaluations                                     Tandem (A–B)         3.98         3.90       3.023            3.362
                                                               Transcoding          3.66         3.54       3.051            3.376
                                                                 (A–B)
   To evaluate the performance of the proposed                 Tandem (B–A)         4.17         3.65       3.017            3.384
transcoding algorithm, subjective preference tests             Transcoding          3.86         3.22       3.077            3.426
are performed together with an objective quality                 (B–A)
evaluation and complexity check. Subjective                  Notice: A to B (from G.723.1 to G.729A), B to A (from G.729A
quality is evaluated via an A–B preference test, and         to G.723.1).
as an objective quality measure, LPC-CD and
PESQ are used.                                               Table 6
                                                             Objective test results (G.723.1 at 6.3 kbit/s)
4.1. Objective quality evaluation                                                   LPC-CD (dB)             PESQ
                                                                                    Female       Male       Female           Male
   For objective evaluation measures, the NTT
Korean speech database is used (NTT-AT, 1994).                 Tandem (A–B)         3.95         3.86       3.106            3.465
                                                               Transcoding          3.53         3.37       3.118            3.455
Each sentence is 8 s long with 8 kHz sampling
                                                                 (A–B)
frequency, four male and female speakers with 24               Tandem (B–A)         3.86         3.59       3.107            3.492
sentences per speaker are recorded under a quiet               Transcoding          3.70         3.11       3.133            3.507
environment, so a total of 96 sentences were used                (B–A)
for LPC-CD and PESQ evaluation. LPC-CD                       Notice: A to B (from G.723.1 to G.729A), B to A (from G.729A
(Kitawaki et al., 1998) is defined as                         to G.723.1).
30                                      S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31

Table 7                                                                   processor (Texas Instruments, 1996, 1998). Since
Preference test results (G.723.1 at 5.3 kbit/s)                           our purpose is to compare the relative complexity
     Preference      G.723.1 fi G.729A         G.729A fi G.723.1            both methods, we do not use a code optimization
                     Female      Male         Female       Male           process. And the complexity for total encoding
                     (%)         (%)          (%)          (%)            process can be varied because the constrained
     Tandem          26.9        30.6        26.3          35.8           search range for open-loop pitch depends on the
     Transcoding     42.5        36.9        25.4          45.4           input signal. But, the variation of complexity in
     No preference   30.6        32.5        48.3          18.8           that module is extensively small. So the depen-
                                                                          dency of complexity reduction ratio on input sig-
                                                                          nal is absolutely negligible.
Table 8
Preference test results (G.723.1 at 6.3 kbit/s)                              Results in Table 9 indicate that, being com-
                                                                          pared with tandem coding, the processing time of
     Preference       G.723.1 fi G.729A       G.729A fi G.723.1
                                                                          each module is noticeably reduced by using the
                      Female      Male       Female       Male
                                                                          transcoding algorithm. Also, the total encoding
                      (%)         (%)        (%)          (%)
                                                                          time of transcoding is close to 62–74% of the
     Tandem           32.4        35.8       39.8         38.6
                                                                          encoding time needed for tandem coding. Thus, it
     Transcoding      47.7        45.4       41.4         41.5
     No Preference    19.9        18.8       18.8         19.9            can be said that the developed transcoding algo-
                                                                          rithm can synthesize equivalent quality to the
                                                                          tandem coding with complexity about 26–38%
asked to choose ‘‘no preference’’. The test material                      lower than tandem coding.
includes 16 clean speech sentences obtained with
four male and four female speakers. Tables 7 and 8
show the results. As shown, tandem and trans-                             5. Conclusions
coding are preferred in a similar ratio or slightly
preferred to transcoding. Results imply that the                             In this paper, we proposed an efficient trans-
listeners cannot distinguish the quality of tandem                        coding algorithm that could convert 5.3/6.3 kbit/s
coding from that of transcoding. Thus, it can be                          G.723.1 bitstream into 8 kbit/s G.729A bitstream,
inferred that the proposed transcoding algorithm                          and vice versa. This transcoding algorithm can
produces speech with a quality equivalent to that                         reduce several problems caused by the tandem
of tandem coding.                                                         method, such as quality degradation, high com-
                                                                          plexity, and longer delay time. The proposed
4.3. Complexity check                                                     transcoding algorithm is composed of four steps:
                                                                          LSP conversion, open-loop pitch conversion, fast
   To check the complexity of the proposed algo-                          adaptive-codebook search, and fast fixed-code-
rithm, both tandem and transcoding algorithms                             book search. Subjective and objective evaluation
are implemented on a TI TMS320C6201 DSP                                   results showed that the proposed transcoding

Table 9
Comparison of complexity
     MIPS                    G.723.1 fi G.729A                     G.729A fi G.723.1(5.3k)           G.729A fi G.723.1(6.3k)
                             Tandem               Transcoding     Tandem           Transcoding     Tandem          Transcoding
     LPC  LSP                6.41                 2.36            6.94             5.55            6.94            5.55
     Open-loop                0.94                 0.21            1.54             1.19            1.54            1.19
     ACB                      2.45                 2.45           10.14             6.34            8.89            4.78
     FCB                      4.30                 4.30           10.50             2.16           17.27            6.68
     Others                   4.04                 4.04            8.05             8.05            8.05            8.05
     Total (encoding)        18.15                13.37           37.17            23.28           42.69           26.25
     Reduction ratio (%)     26.3                                 37.4                             38.5
S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31                                  31

algorithm could produce equivalent speech quality                   Kang, H.G., Kim, H.K., Cox, R.V., 2000. Improving trans-
to the tandem coding with shorter delays and less                      coding capability of speech coders in clean and frame
                                                                       erasured channel environments. In: Proceedings of IEEE
computational complexity.                                              Workshop on Speech Coding. pp. 78–80.
                                                                    Kitawaki, N., Nagabuchi, H., Itoh, K., 1998. Objective quality
                                                                       evaluation for low-bit-rate speech coding system. IEEE J.
References                                                             Selected Areas Commun. 7 (2), 242–248.
                                                                    NTT-AT multi-lingual speech database for telephonometry
Campos Neto, A.F., Crcoran, F.L., 1999. Performance assess-            1994. Available from http://www.ntt-at.com/products_e/
   ment of tandem connection of enhanced cellular coders.              speech/index.html.
   IEEE Proceedings of International Conference on Acoustics        Rabiner, L.R., Schafer, R.W., 1978. Digital Processing of
   Speech Signal Processing, March 1999, pp. 177–180.                  Speech Signals. Prentice Hall.
Epperson, J.F., 2002. Introduction to Numerical Methods and         Salami, R., Laflamme, C., Bessette, B., Adoul, J.P., 1997a.
   Analysis. Wiley.                                                    ITU-T G.729 Annex A: reduced complexity 8 kb/s CS-
Hersent, O., Gurle, D., Petit, J.-P., 2000. IP Telephony Packet-       ACELP Codec for digital simultaneous voice and data.
   based Multimedia Communications Systems. Addison Wesley.            IEEE Commun. Mag., 53–63.
ITU-T Rec. G.723.1, 1996. Dual-rate Speech Coder For Multi-         Salami, R., Laflamme, C., Bessette, B., Adoul, J-P., 1997b.
   media Communications Transmitting at 5.3 and 6.3 kbit/s.            Description of ITU-T recommendation G.729 Annex A:
ITU-T Rec. G.729, 1996a. Coding of Speech at 8 kbit/s Using            reduced complexity 8 kbit/s CS-ACELP Codec. In: IEEE
   Conjugate Structure Algebraic-Code-Excited-Linear-Pre-              Proceedings of International Conference Acoustic Speech
   diction (CS-ACELP).                                                 Signal Processing, Vol. 2. pp. 775–778.
ITU-T Rec. G.729 Annex A, 1996b. Reduced Complexity 8               Texas Instruments, 1996. TMS320C62x/67x CPU and Instruc-
   kbit/s CS-ACELP Speech Codec.                                       tion Set.
ITU-T Rec. P.862, 2000. Perceptual evaluation of speech             Texas Instruments, 1998. TMS320C6x C Source Debugger
   quality (PESQ), an objective method of end-to-end speech            UserÕs Guide.
   quality assessment of narrowband telephone networks and          TIA/EIA PN-3467, 1996. TDMA Radio Interface, Enhanced
   speech codecs, May 2000.                                            Full-Rate Speech Codec, February 1996.
Jung, S.K., Park, Y.C., Yoon, S.W., Cha, I.H., Youn, D.H.,          Yoon, S.W., Jung, S.K., Park, Y.C., Youn, D.H., 2001. An
   2001. A proposal of fast algorithms of ITU-T G.723.1 for            efficient transcoding algorithm for G.723.1 and G.729A
   efficient multi channel implementation. In: Proceedings of            speech coders. In: Proceedings of Eurospeech, September
   Eurospeech, September 2001, pp. 2017–2020.                          2001, pp. 2499–2502.

More Related Content

What's hot

A NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEM
A NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEMA NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEM
A NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEMVLSICS Design
 
FPGA Implementation of LDPC Encoder for Terrestrial Television
FPGA Implementation of LDPC Encoder for Terrestrial TelevisionFPGA Implementation of LDPC Encoder for Terrestrial Television
FPGA Implementation of LDPC Encoder for Terrestrial TelevisionAI Publications
 
A low complex modified golden
A low complex modified goldenA low complex modified golden
A low complex modified goldenIJCNCJournal
 
Ldpc based error correction
Ldpc based error correctionLdpc based error correction
Ldpc based error correctionVijay Balaji
 
Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...researchinventy
 
A Compression & Encryption Algorithms on DNA Sequences Using R 2 P & Selectiv...
A Compression & Encryption Algorithms on DNA Sequences Using R 2 P & Selectiv...A Compression & Encryption Algorithms on DNA Sequences Using R 2 P & Selectiv...
A Compression & Encryption Algorithms on DNA Sequences Using R 2 P & Selectiv...IJMERJOURNAL
 
An analytical approach to generate unique song signal(auss)
An analytical approach to generate unique song signal(auss)An analytical approach to generate unique song signal(auss)
An analytical approach to generate unique song signal(auss)csandit
 
AN ANALYTICAL APPROACH TO GENERATE UNIQUE SONG SIGNAL (AUSS)
AN ANALYTICAL APPROACH TO GENERATE UNIQUE SONG SIGNAL (AUSS) AN ANALYTICAL APPROACH TO GENERATE UNIQUE SONG SIGNAL (AUSS)
AN ANALYTICAL APPROACH TO GENERATE UNIQUE SONG SIGNAL (AUSS) cscpconf
 
Hardware Architecture of Complex K-best MIMO Decoder
Hardware Architecture of Complex K-best MIMO DecoderHardware Architecture of Complex K-best MIMO Decoder
Hardware Architecture of Complex K-best MIMO DecoderCSCJournals
 
Chaos Encryption and Coding for Image Transmission over Noisy Channels
Chaos Encryption and Coding for Image Transmission over Noisy ChannelsChaos Encryption and Coding for Image Transmission over Noisy Channels
Chaos Encryption and Coding for Image Transmission over Noisy Channelsiosrjce
 
Combining cryptography with channel coding to reduce complicity
Combining cryptography with channel coding to reduce complicityCombining cryptography with channel coding to reduce complicity
Combining cryptography with channel coding to reduce complicityIAEME Publication
 
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdm
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdmHybrid ldpc and stbc algorithms to improve ber reduction in ofdm
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdmIAEME Publication
 
Performance Analysis of DRA Based OFDM Data Transmission With Respect to Nove...
Performance Analysis of DRA Based OFDM Data Transmission With Respect to Nove...Performance Analysis of DRA Based OFDM Data Transmission With Respect to Nove...
Performance Analysis of DRA Based OFDM Data Transmission With Respect to Nove...IJERA Editor
 
Performance analysis and implementation for nonbinary quasi cyclic ldpc decod...
Performance analysis and implementation for nonbinary quasi cyclic ldpc decod...Performance analysis and implementation for nonbinary quasi cyclic ldpc decod...
Performance analysis and implementation for nonbinary quasi cyclic ldpc decod...ijwmn
 
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...IRJET Journal
 

What's hot (20)

A NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEM
A NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEMA NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEM
A NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEM
 
79 83
79 8379 83
79 83
 
FPGA Implementation of LDPC Encoder for Terrestrial Television
FPGA Implementation of LDPC Encoder for Terrestrial TelevisionFPGA Implementation of LDPC Encoder for Terrestrial Television
FPGA Implementation of LDPC Encoder for Terrestrial Television
 
A low complex modified golden
A low complex modified goldenA low complex modified golden
A low complex modified golden
 
Ldpc based error correction
Ldpc based error correctionLdpc based error correction
Ldpc based error correction
 
Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...
 
Ijecet 06 06_002
Ijecet 06 06_002Ijecet 06 06_002
Ijecet 06 06_002
 
A Compression & Encryption Algorithms on DNA Sequences Using R 2 P & Selectiv...
A Compression & Encryption Algorithms on DNA Sequences Using R 2 P & Selectiv...A Compression & Encryption Algorithms on DNA Sequences Using R 2 P & Selectiv...
A Compression & Encryption Algorithms on DNA Sequences Using R 2 P & Selectiv...
 
An analytical approach to generate unique song signal(auss)
An analytical approach to generate unique song signal(auss)An analytical approach to generate unique song signal(auss)
An analytical approach to generate unique song signal(auss)
 
AN ANALYTICAL APPROACH TO GENERATE UNIQUE SONG SIGNAL (AUSS)
AN ANALYTICAL APPROACH TO GENERATE UNIQUE SONG SIGNAL (AUSS) AN ANALYTICAL APPROACH TO GENERATE UNIQUE SONG SIGNAL (AUSS)
AN ANALYTICAL APPROACH TO GENERATE UNIQUE SONG SIGNAL (AUSS)
 
Hardware Architecture of Complex K-best MIMO Decoder
Hardware Architecture of Complex K-best MIMO DecoderHardware Architecture of Complex K-best MIMO Decoder
Hardware Architecture of Complex K-best MIMO Decoder
 
ewsn09
ewsn09ewsn09
ewsn09
 
Chaos Encryption and Coding for Image Transmission over Noisy Channels
Chaos Encryption and Coding for Image Transmission over Noisy ChannelsChaos Encryption and Coding for Image Transmission over Noisy Channels
Chaos Encryption and Coding for Image Transmission over Noisy Channels
 
Combining cryptography with channel coding to reduce complicity
Combining cryptography with channel coding to reduce complicityCombining cryptography with channel coding to reduce complicity
Combining cryptography with channel coding to reduce complicity
 
Turbo codes.ppt
Turbo codes.pptTurbo codes.ppt
Turbo codes.ppt
 
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdm
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdmHybrid ldpc and stbc algorithms to improve ber reduction in ofdm
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdm
 
Performance Analysis of DRA Based OFDM Data Transmission With Respect to Nove...
Performance Analysis of DRA Based OFDM Data Transmission With Respect to Nove...Performance Analysis of DRA Based OFDM Data Transmission With Respect to Nove...
Performance Analysis of DRA Based OFDM Data Transmission With Respect to Nove...
 
Ijairp14 radhe
Ijairp14 radheIjairp14 radhe
Ijairp14 radhe
 
Performance analysis and implementation for nonbinary quasi cyclic ldpc decod...
Performance analysis and implementation for nonbinary quasi cyclic ldpc decod...Performance analysis and implementation for nonbinary quasi cyclic ldpc decod...
Performance analysis and implementation for nonbinary quasi cyclic ldpc decod...
 
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...
Optimizing Data Encoding Technique For Dynamic Power Reduction In Network On ...
 

Viewers also liked

Jesse Sampermans Research Project Presentation
Jesse Sampermans Research Project PresentationJesse Sampermans Research Project Presentation
Jesse Sampermans Research Project PresentationJesse Sampermans
 
PlenaryAwards Presentation 2011.ppt
PlenaryAwards Presentation 2011.pptPlenaryAwards Presentation 2011.ppt
PlenaryAwards Presentation 2011.pptgrssieee
 
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONS
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONSTH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONS
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONSgrssieee
 
Adaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancementAdaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancementHarshal Ladhe
 
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...CSCJournals
 

Viewers also liked (9)

Jesse Sampermans Research Project Presentation
Jesse Sampermans Research Project PresentationJesse Sampermans Research Project Presentation
Jesse Sampermans Research Project Presentation
 
test.pptx
test.pptxtest.pptx
test.pptx
 
Yasset iso point-cigb-2012
Yasset iso point-cigb-2012Yasset iso point-cigb-2012
Yasset iso point-cigb-2012
 
Sport tandem
Sport tandemSport tandem
Sport tandem
 
PlenaryAwards Presentation 2011.ppt
PlenaryAwards Presentation 2011.pptPlenaryAwards Presentation 2011.ppt
PlenaryAwards Presentation 2011.ppt
 
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONS
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONSTH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONS
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONS
 
Gomezetal ismir2012
Gomezetal ismir2012Gomezetal ismir2012
Gomezetal ismir2012
 
Adaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancementAdaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancement
 
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...
 

Similar to An efficient transcoding algorithm for G.723.1 and G.729A ...

Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Editor IJARCET
 
Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Editor IJARCET
 
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICES
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICESCODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICES
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICESijmnct
 
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICES
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICESCODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICES
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICESijmnct_journal
 
Iterative network channel decoding with cooperative space-time transmission
Iterative network channel decoding with cooperative space-time transmissionIterative network channel decoding with cooperative space-time transmission
Iterative network channel decoding with cooperative space-time transmissionijasuc
 
A New Bit Split and Interleaved Channel Coding for MIMO Decoder
A New Bit Split and Interleaved Channel Coding for MIMO DecoderA New Bit Split and Interleaved Channel Coding for MIMO Decoder
A New Bit Split and Interleaved Channel Coding for MIMO DecoderIJARBEST JOURNAL
 
Coverage of WCDMA Network Using Different Modulation Techniques with Soft and...
Coverage of WCDMA Network Using Different Modulation Techniques with Soft and...Coverage of WCDMA Network Using Different Modulation Techniques with Soft and...
Coverage of WCDMA Network Using Different Modulation Techniques with Soft and...ijcnac
 
A Multiple Access Technique for Differential Noise Shift Keying: A Review of ...
A Multiple Access Technique for Differential Noise Shift Keying: A Review of ...A Multiple Access Technique for Differential Noise Shift Keying: A Review of ...
A Multiple Access Technique for Differential Noise Shift Keying: A Review of ...IRJET Journal
 
Implementation of High Speed OFDM Transceiver using FPGA
Implementation of High Speed OFDM Transceiver using FPGAImplementation of High Speed OFDM Transceiver using FPGA
Implementation of High Speed OFDM Transceiver using FPGAMangaiK4
 
Analysis of Women Harassment inVillages Using CETD Matrix Modal
Analysis of Women Harassment inVillages Using CETD Matrix ModalAnalysis of Women Harassment inVillages Using CETD Matrix Modal
Analysis of Women Harassment inVillages Using CETD Matrix ModalMangaiK4
 

Similar to An efficient transcoding algorithm for G.723.1 and G.729A ... (20)

Ff34970973
Ff34970973Ff34970973
Ff34970973
 
Turbo encoder and decoder chip design and FPGA device analysis for communicat...
Turbo encoder and decoder chip design and FPGA device analysis for communicat...Turbo encoder and decoder chip design and FPGA device analysis for communicat...
Turbo encoder and decoder chip design and FPGA device analysis for communicat...
 
Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377
 
Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377
 
37 44
37 4437 44
37 44
 
Ck25516520
Ck25516520Ck25516520
Ck25516520
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICES
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICESCODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICES
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICES
 
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICES
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICESCODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICES
CODING SCHEMES FOR ENERGY CONSTRAINED IOT DEVICES
 
Iterative network channel decoding with cooperative space-time transmission
Iterative network channel decoding with cooperative space-time transmissionIterative network channel decoding with cooperative space-time transmission
Iterative network channel decoding with cooperative space-time transmission
 
Gg2612711275
Gg2612711275Gg2612711275
Gg2612711275
 
B0530714
B0530714B0530714
B0530714
 
A New Bit Split and Interleaved Channel Coding for MIMO Decoder
A New Bit Split and Interleaved Channel Coding for MIMO DecoderA New Bit Split and Interleaved Channel Coding for MIMO Decoder
A New Bit Split and Interleaved Channel Coding for MIMO Decoder
 
Coverage of WCDMA Network Using Different Modulation Techniques with Soft and...
Coverage of WCDMA Network Using Different Modulation Techniques with Soft and...Coverage of WCDMA Network Using Different Modulation Techniques with Soft and...
Coverage of WCDMA Network Using Different Modulation Techniques with Soft and...
 
Successive cancellation decoding of polar codes using new hybrid processing ...
Successive cancellation decoding of polar codes using new  hybrid processing ...Successive cancellation decoding of polar codes using new  hybrid processing ...
Successive cancellation decoding of polar codes using new hybrid processing ...
 
A Multiple Access Technique for Differential Noise Shift Keying: A Review of ...
A Multiple Access Technique for Differential Noise Shift Keying: A Review of ...A Multiple Access Technique for Differential Noise Shift Keying: A Review of ...
A Multiple Access Technique for Differential Noise Shift Keying: A Review of ...
 
Ai25204209
Ai25204209Ai25204209
Ai25204209
 
Implementation of High Speed OFDM Transceiver using FPGA
Implementation of High Speed OFDM Transceiver using FPGAImplementation of High Speed OFDM Transceiver using FPGA
Implementation of High Speed OFDM Transceiver using FPGA
 
Analysis of Women Harassment inVillages Using CETD Matrix Modal
Analysis of Women Harassment inVillages Using CETD Matrix ModalAnalysis of Women Harassment inVillages Using CETD Matrix Modal
Analysis of Women Harassment inVillages Using CETD Matrix Modal
 
38 41
38 4138 41
38 41
 

More from Videoguy

Energy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingEnergy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingVideoguy
 
Microsoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresMicrosoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresVideoguy
 
Proxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingProxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingVideoguy
 
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksFree-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksVideoguy
 
Instant video streaming
Instant video streamingInstant video streaming
Instant video streamingVideoguy
 
Video Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideo Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideoguy
 
Video Streaming
Video StreamingVideo Streaming
Video StreamingVideoguy
 
Reaching a Broader Audience
Reaching a Broader AudienceReaching a Broader Audience
Reaching a Broader AudienceVideoguy
 
Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Videoguy
 
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGVideoguy
 
Impact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingImpact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingVideoguy
 
Application Brief
Application BriefApplication Brief
Application BriefVideoguy
 
Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Videoguy
 
Streaming Video into Second Life
Streaming Video into Second LifeStreaming Video into Second Life
Streaming Video into Second LifeVideoguy
 
Flash Live Video Streaming Software
Flash Live Video Streaming SoftwareFlash Live Video Streaming Software
Flash Live Video Streaming SoftwareVideoguy
 
Videoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoguy
 
Streaming Video Formaten
Streaming Video FormatenStreaming Video Formaten
Streaming Video FormatenVideoguy
 
iPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareiPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareVideoguy
 
Glow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxGlow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxVideoguy
 

More from Videoguy (20)

Energy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingEnergy-Aware Wireless Video Streaming
Energy-Aware Wireless Video Streaming
 
Microsoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresMicrosoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_Pres
 
Proxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingProxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video Streaming
 
Adobe
AdobeAdobe
Adobe
 
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksFree-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
 
Instant video streaming
Instant video streamingInstant video streaming
Instant video streaming
 
Video Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideo Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A Survey
 
Video Streaming
Video StreamingVideo Streaming
Video Streaming
 
Reaching a Broader Audience
Reaching a Broader AudienceReaching a Broader Audience
Reaching a Broader Audience
 
Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...
 
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
 
Impact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingImpact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video Streaming
 
Application Brief
Application BriefApplication Brief
Application Brief
 
Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Video Streaming Services – Stage 1
Video Streaming Services – Stage 1
 
Streaming Video into Second Life
Streaming Video into Second LifeStreaming Video into Second Life
Streaming Video into Second Life
 
Flash Live Video Streaming Software
Flash Live Video Streaming SoftwareFlash Live Video Streaming Software
Flash Live Video Streaming Software
 
Videoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions Cookbook
 
Streaming Video Formaten
Streaming Video FormatenStreaming Video Formaten
Streaming Video Formaten
 
iPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareiPhone Live Video Streaming Software
iPhone Live Video Streaming Software
 
Glow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxGlow: Video streaming training guide - Firefox
Glow: Video streaming training guide - Firefox
 

An efficient transcoding algorithm for G.723.1 and G.729A ...

  • 1. Speech Communication 43 (2004) 17–31 www.elsevier.com/locate/specom An efficient transcoding algorithm for G.723.1 and G.729A speech coders: interoperability between mobile and IP network q Sung-Wan Yoon *, Hong-Goo Kang, Young-Cheol Park, Dae-Hee Youn MCSP LAB, Department of Electrical & Electronic Engineering, Yonsei University, 134 Shinchon-dong, Sudaemun-gu, Seoul 120-749, South Korea Received 16 December 2003 Abstract In this paper, an efficient transcoding algorithm for G.723.1 and G.729A speech coders is proposed. Transcoding in this paper is completed through four processing steps: LSP conversion, pitch interval conversion, fast adaptive-code- book search, and fast fixed-codebook search. For maintaining minimum distortion, sensitive parameters to quality such as adaptive- and fixed-codebooks are re-estimated from synthesized target signals. To reduce overall complexity, other parameters are directly converted in parametric levels without running through the decoding process. Objective and subjective preference tests verify that the proposed transcoding algorithm has comparable quality to the classical en- coder–decoder tandem approach. To compare the complexity of the algorithms, we implement them on the TI TMS320C6201 DSP chip. As a result, the proposed algorithm achieves 26–38% reduction of the overall complexity with a shorter processing delay. Ó 2004 Elsevier B.V. All rights reserved. Keywords: Speech coding; Transcoding; Tandem; G.723.1; G.729; G.729A; CELP; LSP conversion; Pitch conversion; Fast codebook search 1. Introduction distinctive features, but they obviously tend to share common applications such as voice messag- For the last two decades, many new speech ing, and voice over internet protocol (VoIP). In coding standards have been established. Among such applications, the interoperability between the them, ITU-T G.723.1 (ITU-T Rec. G.723.1, 1996) networks is crucial, so that endpoint devices are and G.729 (ITU-T Rec. G.729, 1996a) cover a wide required to support both standard coders. How- range of applications requiring low bit rates. Both ever, supporting multiple coders in a single device standards have different applications due to their costs system complexity and power, which will become a significant issue when the device needs to be portable and is powered by small dry cell bat- q teries. Thus, to manifest interoperability, it is This work was supported by BERC-KOSEF. * Corresponding author. Tel.: +82-2-2123-4534; fax: +82-2- desirable for network systems to support commu- 312-4584. nications between two endpoint devices employing E-mail address: yocello@mcsp.yonsei.ac.kr (S.-W. Yoon). different types of coders. 0167-6393/$ - see front matter Ó 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.specom.2004.01.004
  • 2. 18 S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 A simple approach to overcome this interoper- connection. But, if the direct mapping is not pos- ability problem is to place the decoder/encoder of sible due to dissimilarities in quantizing the exci- one coder and the encoder/decoder of the other tation signal, the performance of the direct coder in tandem. However, the encoder–decoder mapping method will not be guaranteed. In that tandem is often associated with several problems. case, the other algorithms such as partial encoding First of all, quality degradation of the synthesized process that uses the source coder information to speech is inevitable because a speech signal is en- reduce the complexity of the encoding process in coded and decoded twice by the two different the target coder maintaining the quality should be speech coders. Quantization errors due to each applied. encoding process are accumulated, which results in In this paper, we propose an efficient transcod- the degradation of speech quality. In the case of ing algorithm working with 5.3/6.3 kbit/s G.723.1 cross tandeming, the mean opinion score (MOS) and 8 kbit/s G.729A (ITU-T Rec. G.729, 1996b) drops more than that of single tandeming (Cam- speech coders. According to the survey in (Hersent pos Neto and Crcoran, 1999, 2000). Second, et al., 2000), two coders are the most widely de- computational demands for the decoder–encoder ployed standards in VoIP systems nowadays. tandem may be too high to implement it into the Considering the frame length of two-speech cod- service systems for a large number of users. When ers, parameters corresponding to one frame of the tandem algorithms are placed in the gateways G.723.1 are converted to three sets of equivalent of cable networks or the base stations of wireless G.729A parameters. The proposed transcoding networks, the complexity eventually will be limit of algorithm is composed of four processing steps: the full communication systems. Finally, tandem line spectral pair (LSP) conversion, pitch interval coding is accompanied with an additional delay in conversion, fast adaptive-codebook search, and the communication links because an additional fast fixed-codebook search. In the LSP conversion, look-ahead delay for the LPC analysis is inevitable since variation of LSP trajectory in a steady state to obtain target bitstreams. speech segment of each coder is somehow negli- Unlike the tandem coding, transcoding can be gible in quality aspects, we directly convert them in used to overcome these difficulties. Transcoding is the LSP domain using a codebook matching pro- to translate source bitstreams to target ones cedure. For the pitch interval conversion, we without running through complete decoding– cannot use a direct mapping method because the encoding processes. Thus, it can minimize the search ranges of pitch intervals in two coders are quality degradation of the synthesized speech and different. From the observation that pitch contour computational complexities with no additional is monotonically varied with a small variation in delay. voiced or voice-like speech signals, we estimate the Past researches on transcoding, especially open-loop pitch in constrained regions relating to (Campos Neto and Crcoran, 1999) which pro- the pitch values of the adjacent frames. This is an posed the transcoding between G.729 and IS-641 efficient way in the respect of complexity reduc- (TIA/EIA PN-3467, 1996), was focused on the tion. In the adaptive-codebook and fixed-code- conversion of LSP and gain information. In the book search process, the codebook structures of process of the other encoding modules such as the two coders are different from each other. We adaptive-codebook delay search and fixed-code- apply a fast search algorithm for reducing the book index search, both coders share the infor- complexity of the transcoding system. Using the mation for excitation of each other. In other above four processing steps, the proposed trans- words, the pitch delay and fixed-codebook index coding algorithm translates bitstreams of the are not changed, and directly mapped from one source coder to those of the target coder with coder to the other coder. It is possible because of minimum distortion and reduced complexity. the many similarities in the structure of comprising To verify the performance of the proposed the excitation. This algorithm showed the im- algorithm, we perform objective and subjective proved performance compared to cross tandeming tests such as LPC cepstral distance (LPC-CD)
  • 3. S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 19 (Kitawaki et al., 1998) and perceptual evaluation used to construct the short-term perceptual of speech quality (PESQ) (ITU-T Rec. P.862, weighting filter which is used for filtering the entire 2000) with various speech sets. In addition we frame to obtain the perceptual weighted speech compare the complexity of them to that of the signal. For every two sub-frames, the open-loop tandem algorithm by implementing the algorithms pitch period is computed using the weighted using TI TMS320C6201 DSP processor (Texas speech signal. Later, the speech signal runs Instruments, 1996). Evaluation results confirm through the adaptive and fixed-codebook search that the proposed transcoding achieves the procedures on a sub-frame basis. The adaptive- reduction of overall complexity with a shorter codebook search is performed using a 5th-order processing delay. Also, the preference tests reveal pitch predictor, and the closed-loop pitch and the superiority of the proposed transcoding algo- pitch gain are computed. Finally, the non-periodic rithm in comparison to the tandem. component of the excitation is approximated by This paper is organized as follows. G.723.1 and two types of fixed codebooks: multi-pulse maxi- G.729 speech coding algorithms are briefly de- mum likelihood quantization (MP-MLQ) for a scribed in Section 2. In Section 3, transcoding higher bit rate, and an algebraic code excited linear algorithms from G.723.1 to G.729A and from prediction (ACELP) for a lower bit rate. G.729A to G.723.1 are proposed. Section 4 de- scribes the performance evaluation of the pro- 2.2. ITU-T G.729 speech coder posed transcoding algorithm. Conclusions are presented in Section 5. ITU-T 8 kbit/s G.729 (ITU-T Rec. G.729, 1996a) is developed based on conjugated-structure ACELP (CS-ACELP). G.729 operates on the 2. ITU-T G.723.1 and G.729A frame length of 10 ms, and there is 5 ms look- ahead for linear prediction (LP) analysis, resulting Transcoding algorithm in this paper is associ- in a total 15 ms algorithmic delay. For every 10 ms ated with ITU-T G.723.1 (ITU-T Rec. G.723.1, frame, 10th-order LPC coefficients are computed 1996) and ITU-T G.729A (ITU-T Rec. G.729, using the Levinson–Durbin recursion. The LPC 1996b). Each speech coder is widely used for coefficients for the 2nd sub-frame are quantized multimedia communication with low bit-rate using multi-stage vector quantization (MSVQ). applications such as VoIP and digital cellular. The unquantized LPC coefficients are used to Their operational features are briefly described in construct the short-term perceptual weighting fil- this chapter. ter. After computing the weighted speech signal, the open-loop pitch period is computed. To avoid 2.1. ITU-T G.723.1 speech coder pitch multipling errors, the delay range is divided into three sections, and smaller pitch values are ITU-T G.723.1 (ITU-T Rec. G.723.1, 1996), the favored. Then, the adaptive- and fixed-codebooks standard for the multimedia communication are searched to find optimum codewords. The speech coder, has two modes whose bit rates are adaptive-codebook search is performed using a 5.3 and 6.3 kbit/s. G.723.1 takes 30 ms of speech or 1st-order pitch predictor, and a fractional pitch other audio signals for encoding, and signals with delay is searched in a 1/3 sample resolution. In the the same length are reproduced by the decoder. In fixed-codebook search, non-periodic components addition, there is a look-ahead of 7.5 ms, resulting of excitation are modeled using algebraic code- in a total algorithmic delay of 37.5 ms. books with four pulses. For the efficiency of For every 60-sample sub-frame, a set of 10th- quantization of pitch- and fixed-codebook gains, order linear prediction coefficients (LPC) is com- two codebooks with conjugate structures are used. puted. The LPC set of the last sub-frame is G.729 Annex A is a complexity-reduced version quantized using a predictive split vector quantizer of G.729 (ITU-T Rec. G.729, 1996b). The com- (PSVQ). The unquantized LPC coefficients are plexity of G.729A is about 50% of G.729 (Salami
  • 4. 20 S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 et al., 1997a). Algorithmic differences between 3.1. Transcoding from G.723.1 to G.729A G.729 and G.729A are found in (Salami et al., 1997b). The transcoding algorithm proposed in this paper has an asymmetric structure between Tx and Rx paths. For the transmission of speech data 3. The proposed transcoding algorithm from G.723.1 to G.729A, the transcoding process is involved with the LSP conversion and the open- A simple transcoding algorithm can be obtained loop pitch conversion. Transcoding is executed on by placing the decoder/encoder of one coder and the basis of G.723.1 frame size; thus one G.723.1 the encoder/decoder of the other coder in tandem, frame is converted to three frames of G.729A. A as shown in Fig. 1(a). However, the decoder–en- block diagram of the proposed transcoding algo- coder tandem often raises several problems such as rithm from G.723.1 to G.729A is shown in Fig. 2. degradation of speech quality, high computational The left dotted box in the figure corresponds to the load, and additional delay. All these problems are decoder module of G.723.1 and the right one due to the fact that the speech signal should pass corresponds to the encoder module of G.729A. complete processes for the decoding and encoding For transcoding from the G.723.1 and G.729A, of two associated coders. LSP sets of G.723.1 are converted to those of Comparing with the tandem coding, transcod- G.729A, and they are quantized by the G729A ing, a direct translation of one bitstream to an- encoding scheme. Then, the quantized LSP sets are other, is more beneficial. Fig. 1(b) shows a block converted to LPC again, so the perceptually diagram of the transcoding. The transcoding can weighted synthesis filter is constructed for each avoid the degradation of the synthesized speech sub-frame. Afterwards, the target signal for open- quality because the several source parameters are loop pitch estimation is computed by filtering the directly translated in the parametric domain in- excitation through the perceptually weighted filter. stead of being re-estimated from the decoded PCM For the open-loop pitch estimation of G.729A, a data. In this respect, transcoding algorithm is ex- pitch smoothing technique with the closed-loop pected to show less distortion than the tandem pitch intervals of G.723.1 and G.729A is used. coding. Also, no additional delay is required, and Considering the similarity and the continuity of a reduced computational load is expected. the pitch parameter for the consecutive frames, the packets packets encoded by A PCM encoded by B Encoder Encoder Encoder Encoder /Decoder /Decoder /Decoder /Decoder A A B B packets PCM packets encoded by A encoded by B Base station / Gateway (a) packets packets encoded by A encoded by B Encoder Encoder Mutual Transcoder /Decoder /Decoder A↔B A B packets packets encoded by A encoded by B Base station / Gateway (b) Fig. 1. (a) Tandem and (b) transcoding.
  • 5. S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 21 <G.729A> LSP index <G.723.1> Perceptually sw(n) Open-loop pitch LSP index LSP Weighting Estimation using Transcoder Filter Pitch Smoothing t(n) Pirch delay Li Li e(n) Adaptive-codebook BIT- pitch delay, Adaptive- u(n) Search Codebook + + gp pitch gain PACKING pitch gain, g p Decoder Remove Adaptive-codebook contribution x(n) Codebook index Fixed codebook Fixed- v(n) ci Codebook Fixed-codebook information Search Decoder gc c odebo ok ga in Fig. 2. Block diagram of transcoding from G.723.1 to G.729A. pitch smoothing method estimates the open-loop the bitstream and transmitted to the decoder of pitch in the constrained narrow region, around the G.729A. previous closed-loop pitch value of G.729A or the current closed-loop pitch of G.723.1. Using this 3.1.1. LSP conversion using linear interpolation approach, it is possible to generate equivalent A linear interpolation technique is employed to speech quality to tandem coding while spending translate the LSP parameters. Given a set of much less computational load. G.723.1 LSP parameters, three frame sets of After determining the open-loop pitch, the G.729A LSP parameters are computed because the adaptive- and fixed-codebooks are searched for frame length of G.723.1 is three times longer than G.729. The closed-loop pitch of the second sub- that of G.729A. Fig. 3 shows the LSP conversion frame at the current processing frame of G.729 is procedure, which can be denoted by used for a candidate of the open-loop pitch esti- mation for the next frame. It means that the open- 8 A < p1 ðjÞ; i ¼ 1; loop pitch of the next frame is estimated around piB ðjÞ ¼ 1ðp2 ðjÞ þ p3 ðjÞÞ; A A i ¼ 2; 1 6 j 6 10; the closed-loop pitch of the second sub-frame ob- :2A p4 ðjÞ; i ¼ 3; tained in the current frame. After the translation of LSP parameters and the open-loop pitch from ð1Þ G.723.1 to G.729A, the adaptive-codebook and the fixed-codebook parameters of G.729A are where pA and pB are the LSP parameters of determined by the standard encoding process. Fi- G.723.1 and G.729A, respectively, and i denotes nally, the parameters of G.729A are encoded to the frame index of G.729A. 1st subframe 2nd subframe 3rd subframe 4th subframe <G.723.1> LSP (G.723.1) LSP (G.723.1) LSP (G.723.1) LSP (G.723.1) p1A A p2 p3A A p4 Interpolation LSP (G.729A) LSP (G.729A) LSP (G.729A) p1B B p2 B p3 <G.729A> 3rd frame 1st frame 2nd frame Fig. 3. LSP conversion from G.723.1 to G.729A using linear interpolation.
  • 6. 22 S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 To compare the performance of the tandem and shown in Fig. 4, the LPC spectrum obtained after the LSP conversion using linear interpolation, we the LSP conversion given in Eq. (1) matches clo- measured the spectral distortion to validate the sely to the reference spectrum, especially in the low efficiency of the proposed method. The decoded frequency region and around the formant. The LPC coefficients of G.723.1 were used as a refer- LPC spectrum after the decoder–encoder tandem, ence, to compare the distortions of the proposed however, indicates a larger spectral distortion than and tandem method, because we intended to show the proposed LSP conversion. Since speech quality the distortion added to the original spectrum. As is mainly determined from the accuracy of low and shown in Table 1, the spectral distortion of the formant frequency region components (Rabiner transcoding is much less than that of tandem. and Schafer, 1978), it can be said that the pro- Moreover the both outliers of 2–4 dB and above posed LSP conversion technique can provide bet- the 4 dB are extremely small for transcoding cases. ter speech quality than the tandem. We also compared the LPC spectrum of the tan- Also, it should be mentioned that the proposed dem and transcoding to that of the original one. LSP conversion method is more beneficial than the Fig. 4 shows the LPC spectrum in the voiced re- simple tandem with regard to complexity and gion of the speech signal, in which the LPC spec- processing delay. As described in (Kang et al., trum of G.729A is also shown as a reference. As 2000), the total delay of a speech communication system is decided by three factors: algorithmic delay, processing delay, and network delay. Con- Table 1 sidering the algorithmic and processing delays Spectral distortion––from G.723.1 to G.729A only, the total delays of the decoder–encoder Method Average Outliers (%) tandem and direct LSP conversion, denoted by SD [dB] Dtan and Dtrans are written by AB AB 2–4 dB >4 dB Tandem 2.81 61.71 14.03 Dtan ¼ 42:5 þ aA þ bA þ aB þ bB ; AB ð2Þ Transcoding 0.81 0.29 0.00 Dtrans ¼ 37:5 þ aA þ PAB þ bB ; AB ð3Þ LPC spectrum 40 G.729A (reference) Tandem 30 Transcoding 20 10 dB 0 -10 -20 -30 0 500 1000 1500 2000 2500 3000 3500 4000 Hz Fig. 4. Comparison of LPC spectrum (from G.723.1 to G.729A).
  • 7. S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 23 Table 2 Transmission delay––from G.723.1 to G.729A Operation Tandem (ms) Transcoding (ms) A (G.723.1) Buffering 37.5 37.5 E E Encoding pA pA D TR Intermediate processing Decoding pA pAB E Encoding 3 Â pB Delay 5 0 D D B (G.729A) Decoding 3 Â pB 3 Â pB E D E D E TR D Total delay (ms) 42:5 þ pA þ pA þ 3ðpB þ pB Þ 37:5 þ pA þ pAB þ 3pB where subscripts A and B denote G.723.1 and sponding to the adjacent sub-frame (Yoon et al., G.729A, respectively, and AB denotes the con- 2001). nection from A to B. Also, am and bm (m ¼ A or B) The proposed open-loop pitch estimation is denote encoder and decoder delays, respectively, simplified using a pitch smoothing scheme shown and PAB denotes the delay introduced by the pro- in Fig. 5. At first, the closed-loop pitch of G.723.1 posed LSP conversion method. In the decoder– is compared with the pitch value obtained at the encoder tandem, an additional 5 ms look-ahead is 2nd sub-frame of the previous G.729A frame. If needed for LPC analysis. But the proposed LSP the distance between the two pitch values is less conversion technique does not introduce this look- than 10 samples, by considering the continuity of ahead delay because the processing of LPC anal- the pitch values, the closed-loop pitch of G.723.1 is ysis is not executed. As shown in Eqs. (2) and (3), determined as the open-loop pitch of G.729A. the total delay of the proposed method is at least 5 Otherwise, the pitch smoothing method is applied. ms less than that of the decoder–encoder tandem. To determine the threshold value of 10 samples for Because the processing delay, PAB , is less than the applying the pitch smoothing method in pitch sum of bA and aB , the total delay will be further interval conversion, the deviation of the adaptive- reduced. The delays induced by the decoder–en- codebook pitch search range, depending on the coder tandem and the LSP conversion technique corresponding open-loop pitch, is considered. 1 are summarized in Table 2. If the pitch difference is larger than 10 samples, E D E D TR In this table, pA ; pA ; pB ; pB ; pAB is the processing local delays maximizing Rðki Þ in Eq. (4) are time of encoding of G.723.1, decoding of G.723.1, searched in the range of Æ3 samples around the encoding of G.729A, decoding of G.729A, and closed-loop pitch delays of G.723.1 and G.729A, transcoding from G.723.1 and G.729A, respec- respectively: tively. When the LSP conversion is completed, the X 79 Rðki Þ ¼ sw ðnÞ Á sw ðn À ki Þ; perceptual weighting filter for the G.729A encoder ð4Þ n¼0 is constructed using the converted LSP parame- pi À 3 6 ki 6 pi þ 3ði ¼ A; BÞ; ters. Then, the perceptually weighted speech signal for the open-loop pitch estimation is generated using the perceptual weighting filter. where sw ðnÞ is the weighted speech signal, sub- scripts A and B, respectively, denote G.723.1 and G.729A, pi s are the closed-loop pitch delays, and 3.1.2. Open-loop pitch estimation using a pitch ki s are the open-loop pitch candidates. smoothing After the LSP conversion, the open-loop pitch for each frame of G.729A is estimated. In the proposed transcoding algorithm, the open-loop 1 For example, the closed-loop pitch search range of G.729A pitch is searched around the interval between the is from )5 to +4 sample around the open-loop pitch or pitch values of the source and target coder corre- preceding sub-frame pitch.
  • 8. 24 S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 1st 2nd 3rd 4th subframe subframe subframe subframe <G.723.1> Clp_A1 Clp_A2 Clp_A3 Pitch- Pitch- Pitch- Smoothing Smoothing Smoothing <G.729 A> T_op 1 T _op2 T_op 3 Clp_B0 Clp_B1 Clp_B2 Clp_B3 Preivous 1st frame 2nd frame 3rd frame frame Fig. 5. Open-loop pitch estimation using pitch smoothing (G.723.1 fi G.729A). After determining the local delays, ti , i ¼ A; B G.723.1 is selected. Consequently, the smoothed maximizing Rðki Þ in each range, Rðki Þ is normalized open-loop pitch, Top , is determined as by the energy at the local maximum delays: Top ¼ t1 0 Rðti Þ R0 ðTop Þ ¼ R0 ðt1 Þ R ðti Þ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ; P 2 i ¼ A; B: ð5Þ n sw ðn À ti Þ if R0 ðt2 Þ P 0:75 Á R0 ðTop Þ R0 ðTop Þ ¼ R0 ðt2 Þ To evaluate the validity of this pitch estimation Top ¼ t2 approach, we perform two experiments such as end observation of the pitch contour of two coders and comparison of the estimated open-loop pitch As shown in Fig. 6, the pitch contour of the contour of the target coder with the adaptive- proposed method, dashed line one, matches well codebook pitch contour of the source and target with the original pitch value of G.729A without coders. The adaptive-codebook pitch contour of any severe fluctuation. But, in the case of the target coder matches well with the estimated open- tandem approach, a drastic fluctuation appears loop pitch contour rather than the pitch value of such as pitch multiple errors even in the stable source one. We infer from the observation that the voiced speech segment. variation of pitch value between adjacent sub- The word ‘‘smoothed’’ means that the rapid frames is relatively stable and the estimated open- and sudden pitch variation often produced by the loop pitch is close to the previous closed-loop estimation in distorted speech segments, an even pitch value. Finally, the normalized local maxima voiced or voice-like region, is avoided, so the are compared with each other, and the local estimated open-loop pitch can represent the maximum from G.729A is favored. Thus, if the monotonically varying contour. local maximum of G.729A is larger than 3/4 times In the proposed scheme, the autocorrelation of of that of G.723.1, the open-loop pitch of G.729A the weighted speech is either computed around the is determined as the local maximum delay of constrained region or not at all. Also, as the local G.729A. 2 Otherwise, the local maximum delay of maximum value from the search process of origi- nal G.729A encoder in full search range is not involved, the open-loop pitch can be estimated 2 The 3/4 was determined from the results of our experiments with much less computational load in the con- based on large number of test database. strained search range. Furthermore, the pitch
  • 9. S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 25 4000 smoothing scheme can reduce the drastic fluctua- 3000 tion of pitch contour in a voiced or voice-like 2000 segment, as shown in Fig. 6, like that of the tan- dem approach which results in sticky noise due to 1000 inaccurate pitch information. 0 -1000 3.2. From G.729A to G.723.1 -2000 -3000 For the case of speech transmission from G.729A to G.723.1, the proposed algorithm con- -4000 sists of LSP conversion using linear interpolation, -5000 pitch smoothing for the open-loop pitch conver- -6000 sion, fast adaptive-codebook search, and fast 0 200 400 600 800 1000 1200 1400 sample index fixed-codebook search. Parameters corresponding (a) to three frames of G.729A are collected and simultaneously converted to parameters corre- 65 sponding to one frame of G.723.1. A block dia- reference(G .729) 60 tandem gram of the proposed transcoding algorithm from pitch s moothing G.729A to G.723.1 is shown in Fig. 7. The overall 55 structure is similar to the connection from G.723.1 50 to G.729A, but fast adaptive-codebook and fixed- codebook search modules are different from pitch 45 G.723.1 to G.729A. 40 3.2.1. LSP conversion using a linear interpolation 35 LSP conversion from G.729A to G.723.1 is accomplished with parameters corresponding to 30 three frames of G.729A. However, only the 4th 25 0 2 4 6 8 10 12 14 16 18 frame parameters of G.723.1 are computed via a sample index linear interpolation, as given by (b) pA ðjÞ ¼ w1 Á p1 ðjÞ þ w2 Á p2 ðjÞ B B Fig. 6. Comparison of the open-loop pitch contour. (a) Voiced B speech segment. (b) The estimated open-loop pitch contour. þ w3 Á p3 ðjÞ; 1 6 j 6 10; ð6Þ Fig. 7. Block diagram of transcoding from G.729A to G.723.1.
  • 10. 26 S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 1st frame 2nd frame 3rd frame B B <G.729A> p1B p2 p3 LSP (G.729A) LSP (G.729A) LSP (G.729A) Previous Interpolation Interpolation LSP (G.723.1) pA LSP (G.723.1) LSP (G.723.1) LSP (G.723.1) LSP (G.723.1) <G.723.1> 1st subframe 2nd subframe 3rd subframe 4th subframe Fig. 8. LSP conversion from G.729A to G.723.1 using linear interpolation. where superscripts A and B again denote parame- As described in Section 3.1.1, the proposed ters corresponding to G.723.1 and G.729, respec- approach reduces overall complexity and overall tively, and subscript denotes frame indices of processing delays. Processing delays are summa- G.729. w1 , w2 , and w3 are weighting factors. They rized and compared in Table 4, LPC spectra ob- are usually decided by considering geometric dis- tained in a voiced region shown in Fig. 9. As tance (Epperson, 2002). We set them at 0.05, 0.15 shown in the figure, the LPC spectrum of the and 0.80 respectively. The developed LSP con- proposed LSP conversion scheme matches well version process is illustrated in Fig. 8. In this with the reference compared to the tandem coding. transcoding algorithm, only the LSP set of the 4th sub-frame is computed by linear interpolation, 3.2.2. Open-loop pitch estimation using pitch because the LPC sets of the other sub-frames are smoothing computed by linear interpolation using the LSP Similar to the case from G.723.1 to G.729A, the information of 4th sub-frames between the previ- perceptually weighted speech signal is computed as ous and current frame. To compare the perfor- a target signal for the open-loop pitch estimation mance of the tandem and the LSP conversion of G.723.1. The same pitch smoothing technique is using linear interpolation in this direction, we applied. A block diagram of the developed open- measure the spectral distortion. The decoded LPC loop pitch estimation is shown in Fig. 10. As a cost coefficients of G.729A were used as a reference. As function, the normalized cross correlation at the shown in Table 3, the spectral distortion of the pitch-lag candidate is computed, which have been LSP conversion is much less than that of tandem. used for the original open-loop pitch estimation Moreover the both outliers of 2–4 dB and above module in G.723.1 (ITU-T Rec. G.723.1, 1996). the 4 dB are extremely less, too. Also, as shown in Fig. 9, the LPC spectrum of the transcoding 3.2.3. Fast adaptive-codebook search matches well with the reference spectrum obtained The adaptive-codebook search in G.723.1 uses from the original G.723.1 encoder compared to a 5th order pitch predictor, and estimates the pitch that of the tandem coding. delay and gain simultaneously. Thus, the adaptive codebook search in G.723.1 is computationally demanding mainly because the pitch delay and Table 3 pitch gain are searched simultaneously. Previously, Spectral distortion––from G.729A to G.723.1 we proposed a fast adaptive-codebook search Method Average SD Outliers (%) algorithm (Jung et al., 2001). In this algorithm, the [dB] pitch delay and pitch gain are computed sequen- 2–4 dB >4 dB tially, rather than simultaneously. At first, the Tandem 2.90 66.29 14.47 pitch delay is computed using a 1st-order pitch Transcoding 1.15 3.90 0.02 predictor, and later, the pitch gains of the 5th
  • 11. S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 27 LPC spectrum 40 G.723.1(original) Tandem Transcoding 30 20 dB 10 0 -10 -20 0 500 1000 1500 2000 2500 3000 3500 4000 Hz Fig. 9. Comparison of LPC spectrum (from G.729A to G.723.1). Table 4 Transmission delay––from G.729A to G.723.1 Operation Tandem (ms) Transcoding (ms) B (G.729A) Buffering 35 35 E E Encoding 3 Â pB 3 Â pB E TR Intermediate processing Decoding 3 Â pA pBA E Encoding pA Delay 5 0 D D A (G.723.1) Decoding pA pA E D E D E TR D Total delay (ms) 40 þ 3ðpB þ pB Þ þ pA þ pA 25 þ pA þ pBA þ 2pB 1st frame 2nd frame 3rd frame <G.729A> Clp_B1 Clp_B4 Pitch- Pitch- Smoothing Smoothing <G.723.1> Clp_A0 T_op1 T_op2 1st 2nd 3rd 4th subframe subframe subframe subframe Current frame Fig. 10. Open-loop pitch estimation using pitch smoothing (G.729A ! G.723.1).
  • 12. 28 S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 order pitch predictor are estimated using the G.729A pitch gain computed pitch delay. The pitch gain is computed gp Training G.729A G.729A G.723.1 using the function given by speech data Encoder Encoder Encoder X N À1 MS– 0ACB ¼ E ft½nŠ À g^ À LŠg2 t½n n¼0 Adaptive-Codebook Gain Search Index X N À1 X N À1 Table Generation indexg ¼ t2 ½nŠ À 2g t½nŠ^½n À LŠ t G.723.1 pitch gain table index n¼0 n¼0 X N À1 Fig. 11. Gain index table generation for fast adaptive-code- þ g2 t2 ½n À LŠ; ð7Þ book search. n¼0 where t½nŠ and ^ are target signals of the adap- t½nŠ tics. The process of the gain index table generation tive-codebook search and weighted synthesis sig- or pre-selection is shown in Fig. 11. nal, respectively. g is the pitch gain of the 1st-order In this approach, the similarity or relationship pitch predictor, and N is the sub-frame size. of pitch gains of each coder is considered. As Optimum pitch delay minimizing Eq. (7) is ob- shown in Fig. 11, the decoded PCM signal from tained by differentiating Eq. (7) with respect to g G.729A is encoded by G.723.1, like a tandem and by setting the derivative to zero: connection. In this process, we can find the sta- PN À1 tistical information between the pitch gains of ^ n¼0 t½nŠt ½n À LŠ g ¼ PN À1 : ð8Þ two coders. The dynamic range of the pitch gain ^2 ½n À LŠ n¼0 t value of G.729A is from 0 to 1.2. We divide this Substituting Eq. (8) into Eq. (7) gives an pitch gain range into eight sub-sections, and the expression for the weighted squared error in terms conjugate structure of G.729A gain codebook of the pitch delay, and we can conclude that is considered for the boundary value of each sub- minimizing Eq. (7) is equivalent to maximizing section. Then, in the following encoding process P 2 of G.723.1, the determined adaptive-codebook N À1 gain table indices are stored at each sub-frame. ^ n¼0 t½nŠt ½n À LŠ 0 Clag ¼ PN À1 2 : ð9Þ As a result, the distribution of the most proba- ^ n¼0 t ½n À LŠ ble top 40 or 85 gain indices for 85 or 170 entry 0 gain table, respectively, can be listed up at each Finally, L maximizing Clag is determined from a pitch gain sub-section of G.729A. For reliability of closed-loop search procedure. This scheme is gain index distribution, we use speech signals re- computationally very efficient, and there is no corded by a female and male speaker, with each quality loss compared to the original coder (Jung sentence 8 s long, and a total of 96 sentences per et al., 2001). speaker. Results of the subjective listening test In addition to above process, we apply an extra confirm that no degradation of speech quality was algorithm in the vector quantization of pitch gains. introduced. The pitch gain of the G.723.1 coder is vector- quantized using an 85 or 170-entry codebook, 170 for lower rate and 85 or 170 for higher rate, 3.2.4. Fast fixed-codebook search respectively. This process is another major com- In the fixed-codebook search of G.723.1 at 5.3 putational burden for implementation. In the kbit/s, four pulses are searched based on an developed transcoding algorithm, the search range ACELP structure for every sub-frame (ITU-T of the gain codebook is limited depending on the Rec. G.723.1, 1996). Each sub-frame is divided pitch gain of G.729A. In other words, the indices into four tracks, and the pulse and sign of each of the adaptive-codebook gain table are pre- pulse are determined using a nested-loop search. In selected depending on speech signal characteris- the theoretically worst case, the pulse locations are
  • 13. S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 29 sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi searched with the combination of 8 Â 8 Â 8 Â 8 ¼ X p 84 with the analysis-by-synthesis technique. In LPC CD ¼ 10= log10 2 ðCx ðiÞ À Cy ðiÞÞ2 ; practice, limiting the number of entering the loop i¼1 for the last pulse search can significantly reduce ð10Þ the complexity of the fixed-codebook search. In order to simplify the fixed-codebook search of where Cx ðiÞ and Cy ðiÞ are LPC cepstral coefficients G.723.1, a depth-first tree search which is used in of the input and output of the codec, and p is the the G.729A fixed-codebook search is employed in order of LPC filter. Measurement results are this paper. Using this scheme, the combination of summarized in Tables 5 and 6, where lower LPC- the pulse location can be reduced down to CD and highly PESQ imply better speech quality. 2 Â fð8 Â 8Þ þ ð8 Â 8Þg (Jung et al., 2001). As shown, for both measures, the proposed tran- The fixed-codebook excitation of G.723.1 at 6.3 coding scheme is judged as having better quality kbit/s is modeled by MP-MLQ based on multi than tandem coding. pulse excitation. The fixed-codebook parameters such as quantized gain, the signs and locations of 4.2. Subjective quality evaluation the pulses are sequentially optimized. This process is repeated for both the odd and even grids. To For subjective evaluations, informal A–B pref- reduce the complexity of this module, we adopt the erence tests are conducted. The tests are performed fast algorithm proposed in (Jung et al., 2001), by 30 naive listeners. In the tests, the subjects are which is based on ACELP search. In this fast asked to choose a sound between pairs of samples search algorithm, the 60 positions in a sub-frame presented through a headset. If the subjects can are divided into several tracks. Each pulse can not distinguish the quality difference, they are have either amplitude +1 or )1. The result in (Jung et al., 2001) shows that the reduction ratio is about Table 5 80% when fully optimized for DSP implementa- Objective test results (G.723.1 at 5.3 kbit/s) tion. LPC-CD (dB) PESQ Female Male Female Male 4. Performance evaluations Tandem (A–B) 3.98 3.90 3.023 3.362 Transcoding 3.66 3.54 3.051 3.376 (A–B) To evaluate the performance of the proposed Tandem (B–A) 4.17 3.65 3.017 3.384 transcoding algorithm, subjective preference tests Transcoding 3.86 3.22 3.077 3.426 are performed together with an objective quality (B–A) evaluation and complexity check. Subjective Notice: A to B (from G.723.1 to G.729A), B to A (from G.729A quality is evaluated via an A–B preference test, and to G.723.1). as an objective quality measure, LPC-CD and PESQ are used. Table 6 Objective test results (G.723.1 at 6.3 kbit/s) 4.1. Objective quality evaluation LPC-CD (dB) PESQ Female Male Female Male For objective evaluation measures, the NTT Korean speech database is used (NTT-AT, 1994). Tandem (A–B) 3.95 3.86 3.106 3.465 Transcoding 3.53 3.37 3.118 3.455 Each sentence is 8 s long with 8 kHz sampling (A–B) frequency, four male and female speakers with 24 Tandem (B–A) 3.86 3.59 3.107 3.492 sentences per speaker are recorded under a quiet Transcoding 3.70 3.11 3.133 3.507 environment, so a total of 96 sentences were used (B–A) for LPC-CD and PESQ evaluation. LPC-CD Notice: A to B (from G.723.1 to G.729A), B to A (from G.729A (Kitawaki et al., 1998) is defined as to G.723.1).
  • 14. 30 S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 Table 7 processor (Texas Instruments, 1996, 1998). Since Preference test results (G.723.1 at 5.3 kbit/s) our purpose is to compare the relative complexity Preference G.723.1 fi G.729A G.729A fi G.723.1 both methods, we do not use a code optimization Female Male Female Male process. And the complexity for total encoding (%) (%) (%) (%) process can be varied because the constrained Tandem 26.9 30.6 26.3 35.8 search range for open-loop pitch depends on the Transcoding 42.5 36.9 25.4 45.4 input signal. But, the variation of complexity in No preference 30.6 32.5 48.3 18.8 that module is extensively small. So the depen- dency of complexity reduction ratio on input sig- nal is absolutely negligible. Table 8 Preference test results (G.723.1 at 6.3 kbit/s) Results in Table 9 indicate that, being com- pared with tandem coding, the processing time of Preference G.723.1 fi G.729A G.729A fi G.723.1 each module is noticeably reduced by using the Female Male Female Male transcoding algorithm. Also, the total encoding (%) (%) (%) (%) time of transcoding is close to 62–74% of the Tandem 32.4 35.8 39.8 38.6 encoding time needed for tandem coding. Thus, it Transcoding 47.7 45.4 41.4 41.5 No Preference 19.9 18.8 18.8 19.9 can be said that the developed transcoding algo- rithm can synthesize equivalent quality to the tandem coding with complexity about 26–38% asked to choose ‘‘no preference’’. The test material lower than tandem coding. includes 16 clean speech sentences obtained with four male and four female speakers. Tables 7 and 8 show the results. As shown, tandem and trans- 5. Conclusions coding are preferred in a similar ratio or slightly preferred to transcoding. Results imply that the In this paper, we proposed an efficient trans- listeners cannot distinguish the quality of tandem coding algorithm that could convert 5.3/6.3 kbit/s coding from that of transcoding. Thus, it can be G.723.1 bitstream into 8 kbit/s G.729A bitstream, inferred that the proposed transcoding algorithm and vice versa. This transcoding algorithm can produces speech with a quality equivalent to that reduce several problems caused by the tandem of tandem coding. method, such as quality degradation, high com- plexity, and longer delay time. The proposed 4.3. Complexity check transcoding algorithm is composed of four steps: LSP conversion, open-loop pitch conversion, fast To check the complexity of the proposed algo- adaptive-codebook search, and fast fixed-code- rithm, both tandem and transcoding algorithms book search. Subjective and objective evaluation are implemented on a TI TMS320C6201 DSP results showed that the proposed transcoding Table 9 Comparison of complexity MIPS G.723.1 fi G.729A G.729A fi G.723.1(5.3k) G.729A fi G.723.1(6.3k) Tandem Transcoding Tandem Transcoding Tandem Transcoding LPC LSP 6.41 2.36 6.94 5.55 6.94 5.55 Open-loop 0.94 0.21 1.54 1.19 1.54 1.19 ACB 2.45 2.45 10.14 6.34 8.89 4.78 FCB 4.30 4.30 10.50 2.16 17.27 6.68 Others 4.04 4.04 8.05 8.05 8.05 8.05 Total (encoding) 18.15 13.37 37.17 23.28 42.69 26.25 Reduction ratio (%) 26.3 37.4 38.5
  • 15. S.-W. Yoon et al. / Speech Communication 43 (2004) 17–31 31 algorithm could produce equivalent speech quality Kang, H.G., Kim, H.K., Cox, R.V., 2000. Improving trans- to the tandem coding with shorter delays and less coding capability of speech coders in clean and frame erasured channel environments. In: Proceedings of IEEE computational complexity. Workshop on Speech Coding. pp. 78–80. Kitawaki, N., Nagabuchi, H., Itoh, K., 1998. Objective quality evaluation for low-bit-rate speech coding system. IEEE J. References Selected Areas Commun. 7 (2), 242–248. NTT-AT multi-lingual speech database for telephonometry Campos Neto, A.F., Crcoran, F.L., 1999. Performance assess- 1994. Available from http://www.ntt-at.com/products_e/ ment of tandem connection of enhanced cellular coders. speech/index.html. IEEE Proceedings of International Conference on Acoustics Rabiner, L.R., Schafer, R.W., 1978. Digital Processing of Speech Signal Processing, March 1999, pp. 177–180. Speech Signals. Prentice Hall. Epperson, J.F., 2002. Introduction to Numerical Methods and Salami, R., Laflamme, C., Bessette, B., Adoul, J.P., 1997a. Analysis. Wiley. ITU-T G.729 Annex A: reduced complexity 8 kb/s CS- Hersent, O., Gurle, D., Petit, J.-P., 2000. IP Telephony Packet- ACELP Codec for digital simultaneous voice and data. based Multimedia Communications Systems. Addison Wesley. IEEE Commun. Mag., 53–63. ITU-T Rec. G.723.1, 1996. Dual-rate Speech Coder For Multi- Salami, R., Laflamme, C., Bessette, B., Adoul, J-P., 1997b. media Communications Transmitting at 5.3 and 6.3 kbit/s. Description of ITU-T recommendation G.729 Annex A: ITU-T Rec. G.729, 1996a. Coding of Speech at 8 kbit/s Using reduced complexity 8 kbit/s CS-ACELP Codec. In: IEEE Conjugate Structure Algebraic-Code-Excited-Linear-Pre- Proceedings of International Conference Acoustic Speech diction (CS-ACELP). Signal Processing, Vol. 2. pp. 775–778. ITU-T Rec. G.729 Annex A, 1996b. Reduced Complexity 8 Texas Instruments, 1996. TMS320C62x/67x CPU and Instruc- kbit/s CS-ACELP Speech Codec. tion Set. ITU-T Rec. P.862, 2000. Perceptual evaluation of speech Texas Instruments, 1998. TMS320C6x C Source Debugger quality (PESQ), an objective method of end-to-end speech UserÕs Guide. quality assessment of narrowband telephone networks and TIA/EIA PN-3467, 1996. TDMA Radio Interface, Enhanced speech codecs, May 2000. Full-Rate Speech Codec, February 1996. Jung, S.K., Park, Y.C., Yoon, S.W., Cha, I.H., Youn, D.H., Yoon, S.W., Jung, S.K., Park, Y.C., Youn, D.H., 2001. An 2001. A proposal of fast algorithms of ITU-T G.723.1 for efficient transcoding algorithm for G.723.1 and G.729A efficient multi channel implementation. In: Proceedings of speech coders. In: Proceedings of Eurospeech, September Eurospeech, September 2001, pp. 2017–2020. 2001, pp. 2499–2502.