HIGH-SPEED LOW-POWER VITERBI DECODER DESIGN FOR TCM DECODERS
VELAGAPUDI RAMAKRISHNA SIDDHARTHA
The main aim of this Viterbi decoder design is to reduce power consumption
without degrading performance.
To achieve low power consumption, we propose a pre-computation architecture incorporating the T-algorithm for the Viterbi decoder (VD), which can
effectively reduce power consumption without degrading the decoding
speed appreciably.
In general, power reduction in VDs can be achieved by reducing
the number of states (for example, reduced-state sequence decoding (RSSD),
the M-algorithm and the T-algorithm) or by over-scaling the supply voltage.
RSSD is in general not as efficient as the M-algorithm, and the T-algorithm is
more commonly used than the M-algorithm in practical applications, because the
M-algorithm requires a sorting process in a feedback loop while the T-algorithm
only searches for the optimal path metric (PM), that is, the minimum value or
the maximum value of all PMs.
The T-algorithm has been shown to be very efficient in reducing
power consumption. However, searching for the optimal PM in the
feedback loop still reduces the decoding speed.
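The difference between the two pruning rules can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the example path metrics and the convention that a smaller PM is better are assumptions, not taken from the slides.

```python
def t_algorithm_survivors(path_metrics, threshold):
    """Keep every state whose PM is within `threshold` of the optimal PM.

    Assumes smaller PM = better, so the optimal PM is the minimum.
    """
    pm_opt = min(path_metrics)      # the search that sits in the feedback loop
    limit = pm_opt + threshold      # PMopt + T
    return [s for s, pm in enumerate(path_metrics) if pm <= limit]


def m_algorithm_survivors(path_metrics, m):
    """Keep the m best states; this needs a sort instead of a single minimum."""
    order = sorted(range(len(path_metrics)), key=lambda s: path_metrics[s])
    return sorted(order[:m])


pms = [7, 3, 12, 4, 9, 3, 15, 6]                # hypothetical PMs of 8 states
print(t_algorithm_survivors(pms, threshold=3))  # [1, 3, 5, 7]
print(m_algorithm_survivors(pms, m=3))          # [1, 3, 5]
```

The T-algorithm needs only one minimum search per iteration, whereas the M-algorithm must rank all states, which is why the sort is the costlier operation to place in a feedback loop.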
To overcome this drawback, two variations of the T-algorithm have
been proposed: the relaxed adaptive VD, which suggests using an
estimated optimal PM instead of finding the real one each cycle, and
the limited-search parallel state VD based on scarce state transition (SST).
Functional diagram of a Viterbi decoder.
BMU: Branch metrics (BMs) are calculated in the BM unit (BMU) from the
received symbols. In a TCM decoder this module is replaced by the transition
metrics unit (TMU), which is more complex than the BMU.
ACSU: BMs are fed into the ACSU, which recursively computes the path
metrics (PMs) and outputs decision bits for each possible state transition.
SMU: The decision bits are stored in and retrieved from the survivor-path
memory unit (SMU) in order to decode the source bits along the final
survivor path.
PMU: The PMs of the current iteration are stored in the PM unit (PMU).
The T-algorithm requires extra computation in the ACSU loop for calculating
the optimal PM.
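The division of work between the four units can be illustrated with a small hard-decision decoder. The sketch below is not the TCM decoder of these slides: it assumes a 4-state, rate-1/2 convolutional code with generators (7, 5) in octal and Hamming-distance branch metrics, and all function names are hypothetical.

```python
N_STATES = 4  # assumed: constraint length 3, state = two most recent input bits


def encode_step(state, u):
    """Return (next_state, output_pair) for input bit u; generators (7, 5) octal."""
    b1, b2 = state >> 1, state & 1
    return (u << 1) | b1, (u ^ b1 ^ b2, u ^ b2)


def encode(bits):
    state, out = 0, []
    for u in bits:
        state, sym = encode_step(state, u)
        out.append(sym)
    return out


def branch_metric(received, expected):
    """BMU: Hamming distance between received and expected symbol."""
    return sum(r != e for r, e in zip(received, expected))


def viterbi_decode(symbols):
    inf = float("inf")
    pm = [0] + [inf] * (N_STATES - 1)   # PMU: encoder starts in state 0
    decisions = []                      # SMU: one predecessor table per step
    for rx in symbols:
        new_pm = [inf] * N_STATES
        dec = [None] * N_STATES
        for s in range(N_STATES):       # ACSU: add, compare, select
            for u in (0, 1):
                nxt, expected = encode_step(s, u)
                cand = pm[s] + branch_metric(rx, expected)
                if cand < new_pm[nxt]:
                    new_pm[nxt], dec[nxt] = cand, s
        decisions.append(dec)
        pm = new_pm
    # SMU trace-back along the final survivor path
    state = min(range(N_STATES), key=lambda s: pm[s])
    bits = []
    for dec in reversed(decisions):
        bits.append(state >> 1)         # the MSB of a state is the bit that entered it
        state = dec[state]
    return bits[::-1]


message = [1, 0, 1, 1, 0, 0]
print(viterbi_decode(encode(message)) == message)
```

In a T-algorithm decoder, the minimum of `new_pm` would additionally have to be found inside the loop over `symbols` before the next iteration can start, which is the extra ACSU-loop computation mentioned above.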
A topology of pre-computation pipelining.
REVIEW OF PREVIOUS WORK ON LOW-POWER
VITERBI DECODER DESIGN
I review the three most relevant works on low-power Viterbi decoder
design. Seki, Kubota, Mizoguchi and Kato suggested a scarce state
transition (SST) scheme to reduce the switching activity of a Viterbi
decoder. The input is pre-decoded by a simple, and hence power-efficient,
decoder. The pre-decoded sequence, which is not optimal
under a noisy channel, is reprocessed by a Viterbi decoder to improve
performance. The authors showed that the pre-decoded sequence
reduces the switching activity of the Viterbi decoder, thereby reducing
power dissipation.
Kang and Wilson suggested applying existing low-power design
methodologies at different levels. At the architectural level, they suggested
partitioning major blocks and memory modules to reduce power dissipation.
They considered Gray coding for memory addressing, which incurs less switching
than binary coding.
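The saving from Gray-coded addressing can be checked by counting address-bit toggles over a sequential sweep. The sketch below is illustrative only; the 16-word memory and the function names are assumptions.

```python
def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def toggles(codes):
    """Total number of bit flips between consecutive codes."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))


addresses = list(range(16))                 # hypothetical 16-word memory
binary_toggles = toggles(addresses)
gray_toggles = toggles([gray(a) for a in addresses])
print(binary_toggles, gray_toggles)         # 26 15
```

With Gray coding exactly one address bit changes per sequential access, so the sweep costs 15 toggles instead of 26 for plain binary counting.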
Garrett and Stan suggested a low-power architecture of the soft-output Viterbi
decoder for turbo codes. They proposed an orthogonal access memory structure,
which enables parallel access to sequentially received data. Use of such a memory
structure reduces the switching activity for reads and writes of the survivor path.
All the above works aim to reduce the switching activities of Viterbi decoders,
which is an effective scheme for power reduction.
VITERBI DECODER DESIGN
VD with 2-step pre-computation T-algorithm.
The minimum value of each BM group (BMG) can be calculated
in the BMU or TMU and then passed to the “Threshold Generator”
unit (TGU), which calculates (PMopt + T). This threshold and the new
PMs are then compared in the “Purge Unit” (PU).
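The idea behind two-step pre-computation is that the optimal PM at time n can be obtained directly from the PMs at time n-2 and the BMs of the last two steps, so the search no longer has to wait for the current ACS result. The sketch below checks this identity by brute force on a full (unpurged) trellis. It assumes a generic 4-state binary trellis and random metrics, not the TCM trellis of the slides, and it enumerates all two-branch paths rather than using the per-group minima that the TGU receives.

```python
import itertools
import random

N_STATES = 4  # assumed toy trellis, two branches leaving each state


def next_state(s, u):
    return (u << 1) | (s >> 1)


def acs_step(pm, bm):
    """One add-compare-select step; bm[(s, u)] is the metric of the branch leaving s on input u."""
    new_pm = [float("inf")] * N_STATES
    for s in range(N_STATES):
        for u in (0, 1):
            j = next_state(s, u)
            new_pm[j] = min(new_pm[j], pm[s] + bm[(s, u)])
    return new_pm


def pm_opt_two_step(pm_old, bm1, bm2):
    """Optimal PM at time n, computed directly from the PMs at time n-2."""
    best = float("inf")
    for s in range(N_STATES):
        for u1, u2 in itertools.product((0, 1), repeat=2):
            mid = next_state(s, u1)
            best = min(best, pm_old[s] + bm1[(s, u1)] + bm2[(mid, u2)])
    return best


def random_bms(rng):
    return {(s, u): rng.randint(0, 7) for s in range(N_STATES) for u in (0, 1)}


rng = random.Random(0)                      # arbitrary seed, for repeatability
pm_old = [rng.randint(0, 20) for _ in range(N_STATES)]
bm1, bm2 = random_bms(rng), random_bms(rng)
sequential = min(acs_step(acs_step(pm_old, bm1), bm2))
print(sequential == pm_opt_two_step(pm_old, bm1, bm2))   # True
```

Because `pm_opt_two_step` depends only on values that are already available two cycles earlier, it can be pipelined outside the ACSU feedback loop.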
The “MIN 16” unit for finding the minimum value in each cluster
is constructed with 2 stages of 4-input comparators. This
architecture has been optimized to meet the iteration bound.
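A two-stage tree of 4-input comparators can be modeled behaviorally as follows. This is a sketch with hypothetical function names, not the authors' hardware description.

```python
def min4(a, b, c, d):
    """Behavioral model of one 4-input comparator block."""
    return min(min(a, b), min(c, d))


def min16(values):
    """Minimum of 16 values using two stages of 4-input comparators."""
    if len(values) != 16:
        raise ValueError("min16 expects exactly 16 values")
    stage1 = [min4(*values[i:i + 4]) for i in range(0, 16, 4)]  # four blocks in parallel
    return min4(*stage1)                                        # one block in stage 2


cluster_pms = [9, 4, 7, 12, 5, 5, 8, 3, 10, 6, 11, 14, 13, 2, 15, 6]  # hypothetical PMs
print(min16(cluster_pms))   # 2
```

The first stage uses four comparator blocks side by side and the second stage uses one, so the delay is two comparator levels regardless of where the minimum lies.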
Compared with the conventional T-algorithm, the computational
overhead of this architecture is 12 addition operations and a
The full-trellis VD, the VD with the 2-step pre-computation architecture and one
with the conventional T-algorithm are modeled with Verilog HDL code.
This is because the former decoder has a much longer critical path and the
synthesis tool took extra measures to improve the clock speed (e.g., using many
standard cells with larger driving strength, duplicating logic and registers to
reduce fan-out and load capacitance, etc.).
It is clear that the conventional T-algorithm is not suitable for high-speed
applications. If the target throughput is moderately high, the proposed architecture
can, owing to its much shorter critical path, operate at a lower supply voltage,
which leads to a quadratic power reduction compared to the conventional scheme.
Thus I next focus on the power comparison between the full-trellis VD and
the proposed scheme.
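The quadratic dependence follows from the standard CMOS dynamic-power relation P = a * C * Vdd^2 * f. The sketch below evaluates it at a fixed clock frequency; the voltages and the function names are hypothetical and are not results from this work.

```python
def dynamic_power(activity, capacitance, vdd, freq):
    """Standard CMOS dynamic-power estimate: a * C * Vdd^2 * f."""
    return activity * capacitance * vdd ** 2 * freq


def power_ratio(v_new, v_old):
    """Relative dynamic power after supply scaling at unchanged a, C and f."""
    return (v_new / v_old) ** 2


# Hypothetical example: scaling the supply from 1.0 V to 0.75 V
print(power_ratio(0.75, 1.0))   # 0.5625
```

A design with timing slack can spend that slack on a lower supply voltage, so a 25% voltage reduction in this example cuts dynamic power by roughly 44% at the same throughput.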
The Viterbi algorithm is found to be advantageous because of its cost
effectiveness; at the same time, in some situations its functional
performance can be maintained without raising the original cost.
Emerging linear functioning of linear pulse distance is due to convenient
The pre-computation architecture that incorporates the T-algorithm efficiently
reduces the power consumption of VDs without reducing the decoding
speed appreciably. I have also analyzed the pre-computation algorithm.
The algorithm is suitable for TCM systems, which always employ high-rate
convolutional codes. Finally, I presented a design case. Both the ACSU and
SMU are modified to correctly decode the signal. ASIC synthesis and
power estimation results
1. J. He, Z. Wang, and H. Liu, “An efficient 4-D 8PSK TCM decoder
architecture,” IEEE Trans. VLSI Syst., vol. 18, no. 5, pp. 808-817, May 2010.
2. J. He, H. Liu, and Z. Wang, “A fast ACSU architecture for Viterbi decoder using T-algorithm,” in Proc. 43rd IEEE Asilomar Conf. on Signals, Systems and
Computers, pp. 231-235, Nov. 2009.
3. R. A. Abdallah and N. R. Shanbhag, “Error-resilient low-power Viterbi decoder
architectures,” IEEE Trans. Sig. Proc., vol. 57, no. 12, pp. 4906-4917, Dec. 2009.
4. J. Jin and C.-Y. Tsui, “Low-power limited-search parallel state Viterbi decoder
implementation based on scarce state transition,” IEEE Trans. VLSI Syst., vol.
15, no. 10, pp. 1172-1176, Oct. 2007.
5. F. Sun and T. Zhang, “Low power state-parallel relaxed adaptive Viterbi decoder
design and implementation,” in Proc. IEEE ISCAS, pp. 4811-4814, May 2006.
6. “Bandwidth-Efficient Modulations,” CCSDS 401(3.3.6) Green Book, April 2003.
7. F. Chan and D. Haccoun, “Adaptive Viterbi decoding of convolutional codes
over memoryless channels,” IEEE Trans. Commun., vol. 45, no. 11, pp. 1389-1400, Nov.
8. J. B. Anderson and E. Offer, “Reduced-state sequence detection with convolutional
codes,” IEEE Trans. Inf. Theory, vol. 40, no. 3, pp. 965-972, May 1994.
9. S. J. Simmons, “Breadth-first trellis decoding with adaptive effort,” IEEE
Trans. Commun., vol. 38, no. 1, pp. 3-12, Jan. 1990.
10. J. He, H. Liu, Z. Wang, X. Huang, and K. Zhang, “High-speed low-power Viterbi decoder design for TCM decoders,” IEEE
Trans. VLSI Syst., vol. 20, no. 4, Apr. 2012.