A Low Power Analog Channel Decoder for Ultra Portable Devices in 65 nm Technology

Reza Meraji, John B. Anderson, Henrik Sjöland, Viktor Öwall
Dept. of Electrical and Information Technology, Box 118, Lund University, Sweden
Email: {reza.meraji, john.anderson, henrik.sjoland, viktor.owall}@eit.lth.se

Abstract: This paper presents the architecture and the corresponding simulation results for a digitally interfaced ultra-low power extended Hamming decoder implemented in analog integrated circuitry. ST's 65 nm low power CMOS design library was used to simulate the complete decoder, including a serial input digital interface, an analog decoding core and a serial output digital interface. The simulated bit error rate (BER) performance of the decoder is presented and compared to the ideal performance of the Hamming code. Transistor-level simulation results show that an ultra-low power, high throughput Hamming decoder running at up to 2.5 Mb/s can be implemented using analog circuitry working in the sub-threshold (sub-VT) region with a total power consumption below 40 µW. The decoder consumes less than 16 µW when a lower throughput of 250 kb/s is desired.

I. INTRODUCTION

Error correcting codes (ECCs) have been extensively used in various communication devices in order to provide better performance for a certain level of transmit power. Although employing ECC can be beneficial in a communication system thanks to the offered coding gain, one should also consider the underlying trade-offs in the design process. Decoding algorithms are in general computationally complex and, when implemented in hardware, require a noticeable amount of power to decode the message. Using an ECC block might not be efficient if the power saved by the provided coding gain is instead dissipated as power consumption of the ECC circuit itself [1]. This is even more crucial in modern portable or distributed wireless communication devices such as medical implants and sensor networks. These devices need to be small and inexpensive, have a reasonably long lifetime and operate on an extremely limited power budget.

Conventionally, it has been common practice to implement decoding algorithms using digital circuitry. However, there have recently been several proposals in the literature pointing out the advantages of implementing those algorithms in the continuous domain instead. It has been observed that analog computation is much faster and less power consuming than digital. In certain applications, analog computations can achieve the robustness of digital systems while consuming several orders of magnitude less power.

The concept of analog decoding was initially presented in 1998 by Hagenauer [2] and Loeliger [3]. The idea was then pursued by other researchers to demonstrate the advantage of implementing soft iterative decoding algorithms in analog circuitry over digital implementations in terms of silicon area, speed and power consumption [4], [5], [6]. Initially, simple analog decoders were realized in hardware using bipolar transistors. Since CMOS devices biased in weak inversion, referred to as the sub-threshold (sub-VT) region, show an exponential I-V behavior similar to that of bipolar transistors, in recent years they have been used to successfully implement iterative decoding algorithms in analog circuitry. Gains are obtained in terms of area and power consumption for the same data throughput compared to digital implementations. As an example, an analog Hamming decoder working in sub-VT was fabricated in 0.18 µm CMOS and the measurement results are reported in [7] and [8].

Since the introduction of the analog decoding concept in 1998, a handful of successfully operating analog decoders have been implemented and reported in the academic literature, but so far they have not found their way into real world applications. Therefore, a key question in analog decoding is whether it can be applied to real world applications and what gains can be expected in speed, area and power consumption compared to digital decoder implementations.

II. SYSTEM CONSIDERATIONS

Analog decoders are envisioned to reduce the total power consumption of the receiver in two ways. First, computation and processing in the analog domain is much less power demanding than in digital implementations. Second, they can take the analog soft data and directly produce the decoded digital data. In this scheme there is no need for an analog-to-digital converter (ADC), which is normally an essential block in wireless receivers, because analog decoders can theoretically act as a joint decoder-ADC in the system.

On the other hand, wireless standardization is a costly and time-consuming process. In order to deploy the available resources in an efficient way, radio transmitters and receivers are generally expected to conform to industry standards. If we take a bottom-up strategy, a suitable candidate sub-block in a wireless system might not necessarily be the one that offers a significant reduction in power consumption, but instead the one that fits well into the system without demanding drastic changes in the currently agreed and well developed standard. Real world applications and industry standards typically require that the decoder is integrated into a digital receiver where its input is provided in the form of quantized soft information and the decoder output is hard decisions.

978-1-4244-8971-8/10$26.00 © 2010 IEEE
To the authors' best knowledge, the analog decoders proposed so far mainly concentrate on the decoder core and do not specifically consider the system perspective. In order to incorporate the analog decoder into a complete receiver, different alternatives should be investigated. There are two main options: a) to apply the analog decoding directly on the received signals and b) to use it after digital demodulation. In a), synchronization and symbol detection still have to be performed, most likely in the digital domain, and the complete receiver has to be redesigned compared to a traditional digital receiver. In b), a digital-to-analog conversion (DAC) is required before the analog decoder, introducing overhead. In this paper we consider approach b) to investigate whether the analog decoder remains a feasible alternative despite the additional complexity and power consumption of the DAC.

The best technology for wireless communication would be processing blocks that provide the robustness and programmability of digital designs but the power and speed performance of analog circuits. These points motivate the investigation of low power digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) combined with an analog decoding core. These circuits are necessary for the analog decoders to interface with the surrounding digital circuitry, and they can also eliminate the costly and inefficient storage capacitors which are normally required in fully analog interfaces.

Fig. 1. Generalized Gilbert multiplier network for implementing the sum-product algorithm, shown with the corresponding trellis representation.

III. ANALOG DECODING MODEL

In digital implementations values are represented by digits with limited word-lengths, while for analog computing continuous-time currents and/or voltages are used to represent real values. This is intrinsically helpful in soft decision decoding algorithms, in which the strength of the received signals in the coded block plays an important role in decoding the transmitted message. In order to implement these algorithms in analog, the probabilities of the received bits being 0 or 1, also called "soft information", are naturally represented by voltages or currents. Since the most commonly used soft decoding algorithms require a significant number of additions and multiplications, these tasks have to be realized in analog circuits.

Implementing adders in analog is straightforward and is done by shorting wires together, assuming the data is represented by current values. Thus, addition does not require any power or dedicated area on silicon. Unlike the challenge of implementing a large number of high precision multipliers in digital VLSI, which is costly in terms of area and power, the well known Gilbert multiplier can be used to perform the required multiplications in the analog domain [9]. The transistors in a Gilbert multiplier should have an exponential relation between gate-source voltage and drain current for proper multiplication. In bipolar transistors such exponential behavior between collector current and base-emitter voltage readily exists in normal operation. For CMOS at high drain currents the current-voltage characteristic is quadratic; however, it becomes the desired exponential at low current levels in the sub-VT operating region. In other words, MOS transistors provide the required exponential dependency only when operating in the sub-VT region.

Analog decoders are normally built on a network of Gilbert vector multipliers. Despite the slow operation of the transistors in the sub-VT region, high throughput can be achieved by a highly parallel network of transistors operating in continuous time. The decoding process of a coded block starts by loading soft values from the channel in parallel to the network. The soft data then stand as voltages or currents in the highly connected network of analog multipliers arranged within a certain topology. Assuming that the network topology successfully represents a soft iterative decoding algorithm, feedback loops in the network make the voltages or currents converge to steady state levels which correspond to the decoded data. Since there is no need for any kind of memory in this scheme, the settling time needed for convergence of such networks is only limited by the intrinsic transistor speed and the parasitic capacitances of the routing.

IV. GENERIC SUM-PRODUCT ALGORITHM

A commonly used iterative decoding algorithm is the sum-product algorithm (SP). As the name suggests, the decoder's computation is mainly composed of sum-product operations. The basic computations of the algorithm underlying iterative decoding can be expressed as

$$p_z(z) = \gamma \sum_{x} \sum_{y} p_x(x)\, p_y(y)\, f(x, y, z), \qquad (1)$$
where $p_x(x)$ and $p_y(y)$ are probability distributions such that $\sum_x p_x(x) = 1$ and $\sum_y p_y(y) = 1$, and $\gamma$ is a scaling factor that ensures $\sum_z p_z(z) = 1$. Also, $f$ is a function that takes either 0 or 1 values. In decoding algorithms, $f$ is conveniently illustrated by a trellis diagram. In a trellis representation, $f(x, y, z) = 1$ if and only if an edge labeled $y$ exists between the left-hand node $x$ and the right-hand node $z$.

An example of a generic sum-product module based on Gilbert vector multipliers at transistor level is shown in Fig. 1 together with the corresponding trellis representation. The inputs of the module are all the probabilities of the random variables $x$ and $y$, represented by the current vectors $I_{x_i}$, $i \in \{0, \dots, N\}$, and $I_{y_j}$, $j \in \{0, \dots, M\}$, respectively. The vector multiplier generates all the possible probability products of the two input variables, labeled by the currents $I_{z_{ij}}$. The output currents representing the probability products are summed in the connectivity network by shorting the corresponding wires, or discarded by connecting the wire to Vdd if they are not needed in the sum.

A key parameter is the reference current of the primitive blocks. Since the probabilities are represented by currents, there has to be a unique reference current in the network corresponding to a probability of 1. All the real valued probabilities can then be defined as a fraction of this current, the so-called unit current $I_U$. Therefore, the input probability vectors must satisfy $\sum_i I_{x_i} = I_U$ and $\sum_j I_{y_j} = I_U$. The same requirement holds for the output current vector $I_{z_{ij}}$, which is the input for the next block in the network. If some of the partial products are discarded in the multiplier, then all the currents to the next block must be renormalized in order to satisfy $\sum_{i,j} I_{z_{ij}} = I_U$. This renormalization corresponds to the scaling factor $\gamma$ in (1).

Fig. 2. Extended Hamming (8,4) analog decoder architecture.

V. DECODER ARCHITECTURE

In analog decoding circuits, because of the continuous-time parallel computations, every block of the received coded data must be applied simultaneously to the decoding core. That means the outputs of the computation block are also available in parallel. The architecture of the decoder is shown in Fig. 2. It consists of an analog computational core, input and output interfaces and a digital controller.

A. Analog Decoding Core

The decoder core is a transistor-level implementation of the SP decoding algorithm on the tail-biting trellis of an extended Hamming code. The decoder for the (8,4) Hamming code receives 8 parallel input samples from the channel and decodes the 4 information bit estimates in parallel. Every analog input sample represents a soft bit, which is a function of the probability that the received bit is 1 over the probability that it is 0. This function is usually expressed by the log-likelihood ratio (LLR), defined as

$$\mathrm{LLR} = \ln \frac{p(r = 1)}{p(r = 0)} \qquad (2)$$

The probability of a variable $x$ is represented by the currents on a pair of wires, the vector $(I_{x_0}, I_{x_1})$ corresponding to $(p(x = 0), p(x = 1))$. A probability of 1 is therefore denoted by the unit current $I_U$; thus, $I_{x_0} + I_{x_1} = I_U$. The value chosen for the unit current must ensure that all transistors are biased in, and stay in, the sub-VT region. As one might notice, the integrity of the decoding process is highly dependent on the accuracy of the unit currents used throughout the network. Deviation from the desired current values is common in the fabrication process and introduces mismatch between different unit current values. However, it has been shown that the error caused by mismatch in analog decoders has been negligible so far [10].
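The relations in (1) and (2) and the unit-current constraint can be checked numerically. The following is a minimal behavioral sketch, not the transistor-level circuit: the function names `llr_to_currents` and `sum_product_module` are hypothetical, the edge set passed in plays the role of $f(x, y, z)$, and the default unit current of 100 nA is the value quoted later in the simulation section.

```python
import math

I_U = 100e-9  # unit current in amperes (100 nA, as quoted in the simulation section)


def llr_to_currents(llr, i_u=I_U):
    """Map LLR = ln(p(1)/p(0)) to the pair (I_x0, I_x1) with I_x0 + I_x1 = I_U."""
    # Numerically stable logistic function for either sign of the LLR.
    if llr >= 0:
        p1 = 1.0 / (1.0 + math.exp(-llr))
    else:
        e = math.exp(llr)
        p1 = e / (1.0 + e)
    return ((1.0 - p1) * i_u, p1 * i_u)


def sum_product_module(i_x, i_y, edges, n_z, i_u=I_U):
    """Behavioral model of equation (1) operating on currents.

    edges is the set of (x, y, z) triples for which f(x, y, z) = 1.
    Products not listed in edges are simply never summed, which models
    the discarded partial products.
    """
    i_z = [0.0] * n_z
    for x, y, z in edges:
        # Ideal multiplier output I_x * I_y / I_U; wires sharing the same z
        # are shorted together, so their currents add.
        i_z[z] += i_x[x] * i_y[y] / i_u
    total = sum(i_z)
    # Renormalization to the unit current, i.e. the scaling factor gamma in (1).
    return [i * i_u / total for i in i_z]


if __name__ == "__main__":
    # Single parity-check section: an edge exists iff z = x XOR y.
    xor_edges = {(x, y, x ^ y) for x in (0, 1) for y in (0, 1)}
    i_x = llr_to_currents(math.log(9.0))  # p(x = 1) = 0.9
    i_y = llr_to_currents(math.log(4.0))  # p(y = 1) = 0.8
    i_z = sum_product_module(i_x, i_y, xor_edges, n_z=2)
    print([round(i / I_U, 4) for i in i_z])
```

In the XOR example all four products are used, so the outputs already sum to $I_U$ and the renormalization step leaves them unchanged; it only matters when some products are dropped from the edge set.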
The block diagram of the core is illustrated in Fig. 3 and Fig. 4. It is a direct implementation of the forward-backward algorithm, a special case of the SP algorithm, applied to the tail-biting trellis diagram of the Hamming code. Each trellis section can be realized in analog circuitry as illustrated in Fig. 1.

Fig. 3. Trellis representation of the core modules.

Fig. 4. Block diagram of the decoder core.

B. Input Interface

An input interface is needed to take the serially incoming quantized digital soft information and temporarily store it in a memory. As soon as all the soft information for a block of 8 coded values has been received, it needs to be translated to differential electrical currents and applied in parallel to the analog decoding core. In order to do so, a separate current-steering DAC is required for each quantized value in the received block of coded data. Compared to architectures with fully analog interfaces, where sample-and-hold blocks are used to store the received values, data storage in this scheme is robust and there is no need for any capacitor. Here, for the Hamming decoder, an array of eight 6-bit registers is enough to store a block of 8 soft information values, each quantized to 6 bits. Our preliminary simulations showed that 6-bit quantization is enough for the desired bit error rate (BER) performance. In addition, an array of D flip-flops (DFFs) is placed between the registers and the DACs. The data is first clocked into the registers. Then the DFFs are simultaneously clocked to transfer and hold the data for the DAC inputs. New data (i.e. the next block of coded data) can now be clocked into the registers. The DFFs hold the DAC input words so that the decoding core can work while the new data is clocked in.

Each pair of differential inputs required for the decoding core can be generated simultaneously in a current-steering DAC with differential output. The DACs are built from arrays of current sources directly injecting differential current into the decoder inputs. The sum of the currents from each DAC should match that of the decoding core, i.e. it should match the reference current $I_U$. Essentially, each bit in the DAC consists of a number of PMOS current source transistors and a PMOS differential pair. The outputs of the differential pair are connected to the two differential inputs of the core. In this way the current is always on and steered to the core inputs.

C. Output Interface

Since the output of the decoder core is an analog vector showing the probabilities of the decoded bits being 0 or 1 by means of electrical currents, an output interface is needed to decide on the value of each bit. For this purpose, an array of latched current-mode comparators is used. The comparators are based on a design using a pair of cross-coupled inverters with a flip-flop latch. The design was first introduced in [11] and more details can be found in [12].

Every comparator takes a pair of electrical currents representing the probabilities of the output bit being 0 or 1. If the current representing the probability of 1 is greater than the other one, the comparator output voltage reaches Vdd; it goes to zero when the condition is reversed. Thus, the output interface translates the analog probability currents to the digital decided bits. Finally, a parallel-to-serial shift register feeds out the decoded bits serially.

D. Digital Controller

A digital timing circuit provides the required signaling for the whole circuit. Signals provided by this section manage the serial reception of the quantized soft information and its storage in the embedded registers. Additionally, the controller provides the required signaling to load the stored soft information and sets the decoding time.

VI. SIMULATION RESULTS

ST's 65 nm low-leakage high-VT (LL-HVT) CMOS transistor library was used to simulate the analog Hamming decoder architecture. A wireless link with BPSK modulation and a memoryless AWGN channel is considered in order to evaluate the performance of the decoder.
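The link model described above (BPSK, memoryless AWGN, 6-bit quantized soft inputs, hard-decision outputs) can be mimicked in software to get a feel for the expected behavior. The sketch below is only a behavioral stand-in and rests on several assumptions that are not taken from the paper: the generator matrix `G` is one common systematic choice for an (8,4) extended Hamming code, the quantizer range `full_scale` is arbitrary, the SNR is interpreted as Eb/N0, and decoding is done by exhaustive maximum-likelihood search over the 16 codewords rather than by the sum-product network of the analog core.

```python
import itertools
import math
import random

# Assumed systematic generator rows of an (8,4) extended Hamming code.
# The paper does not list its generator matrix; this is one common choice.
G = [
    (1, 0, 0, 0, 1, 1, 0, 1),
    (0, 1, 0, 0, 1, 0, 1, 1),
    (0, 0, 1, 0, 0, 1, 1, 1),
    (0, 0, 0, 1, 1, 1, 1, 0),
]


def encode(info):
    """Multiply the 4 information bits by G over GF(2)."""
    return tuple(sum(b * g[k] for b, g in zip(info, G)) % 2 for k in range(8))


CODEBOOK = {info: encode(info) for info in itertools.product((0, 1), repeat=4)}


def quantize(r, bits=6, full_scale=2.0):
    """Uniform quantizer with 2**bits levels, clipping at +/- full_scale (assumed range)."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    idx = math.floor((r + full_scale) / step)
    idx = min(max(idx, 0), levels - 1)
    return (idx + 0.5) * step - full_scale


def decode(received):
    """Exhaustive soft-decision ML decoding; bit 1 is mapped to +1, bit 0 to -1."""
    def correlation(codeword):
        return sum(r * (2 * c - 1) for r, c in zip(received, codeword))
    return max(CODEBOOK, key=lambda info: correlation(CODEBOOK[info]))


def simulate_ber(ebn0_db, n_blocks=2000, seed=1):
    """Monte Carlo estimate of the information-bit error rate at a given Eb/N0."""
    rng = random.Random(seed)
    rate = 4 / 8
    sigma = math.sqrt(1.0 / (2.0 * rate * 10 ** (ebn0_db / 10.0)))
    errors = 0
    for _ in range(n_blocks):
        info = tuple(rng.randint(0, 1) for _ in range(4))
        tx = [2 * c - 1 for c in CODEBOOK[info]]
        rx = [quantize(s + rng.gauss(0.0, sigma)) for s in tx]
        errors += sum(a != b for a, b in zip(info, decode(rx)))
    return errors / (4 * n_blocks)


if __name__ == "__main__":
    for snr in (0, 2, 4, 6):
        print(snr, "dB:", simulate_ber(snr))
```

Changing `bits` in `quantize` gives a quick way to explore how coarse the soft inputs can be before the error rate degrades, which is the kind of question the 6-bit choice in the input interface answers for the real design.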
In the simulations, the reference current $I_U$ = 100 nA was chosen, which ensures that all transistors in the analog core as well as in the current-steering DACs operate in the sub-VT region. The BER performance of the decoder resulting from accurate transistor-level simulations is shown in Fig. 5. The curve closely follows the ideal performance that is expected from the extended Hamming decoder. The BER performance of an uncoded system with a signal corrupted in an AWGN channel is provided for comparison.

Fig. 5. Bit error rate performance, 2.5 Mb/s. (Plot of bit error rate probability, from $10^{-1}$ down to $10^{-4}$, versus SNR from 0 to 8 dB, with three curves: uncoded BPSK (theoretical), soft-decision extended Hamming (ideal), and the analog extended Hamming decoder.)

Power consumption estimates and characteristics for the decoder are summarized in Tables I and II, respectively. The analog circuits and the input DACs use a 1.2 V supply, whereas the digital circuitry operates on 0.8 V. The decoder converges to a 4-bit codeword in less than 2 µs, which translates to a maximum decoding speed of 2.5 Mb/s without loss in BER performance. The complete decoder consumes only about 40 µW at a throughput of 2.5 Mb/s. The required power reduces to a total of 16 µW at a lower throughput of 250 kb/s, mostly thanks to power savings in the digital circuitry at lower clock frequencies.

TABLE I
POWER CONSUMPTION OF DIFFERENT SECTIONS OF THE DECODER

Sub-circuit              Power consumption [µW]
                         2.5 Mb/s     250 kb/s
DACs                     5            <2
Analog decoding core     6            6 (rate independent)
Digital circuitry        28           8
Output comparators       1            <1
Total                    40           16

TABLE II
ANALOG DECODER CHARACTERISTICS

Technology               65 nm CMOS, LL-HVT
Analog supply voltage    1.2 V
Digital supply voltage   0.8 V
Clock frequency          up to 5 MHz, max. coding gain
Decoder throughput       up to 2.5 Mb/s, max. coding gain
Energy per decoded bit   16 pJ/b @ 2.5 Mb/s
Coding gain @ BER = 10^-3   1.5 dB

The power consumption of the reported analog Hamming decoders is provided in Table III. The table should be read with caution, since the power consumption depends heavily on factors such as the chosen technology and decoder type. The required energy per decoded bit (E/b) is also included as an indicator for a crude comparison.

TABLE III
ENERGY COMPARISON FOR ANALOG HAMMING DECODERS

Reference           Tech. CMOS [µm]   I_U       E/b       P_core [µW]   P_tot [µW]
[7] (sim.)          0.18              1 µA      640 pJ    N/A           283
[6] (meas.)         0.25              100 nA    140 nJ    <5            55
[8] (meas.)         0.18              10 µA     102 pJ    150           229
This paper (sim.)   0.065             100 nA    16 pJ     6             40

VII. CONCLUSION

In this paper we investigated the possibility of using an analog decoder in the digital domain. Our preliminary results show that the proposed approach could be a viable option for portable devices with a limited power budget. Digital I/O interfaces make it possible to use the decoder in the same way as an ordinary digital decoder without a need for changes in the receiver architecture. Furthermore, the introduced architecture is quite general and can be applied to more complex analog decoders. Finally, it should be mentioned that no significant effort was made in this paper to reduce the power consumption of the digital part of the circuit. The power consumption of the digital circuitry could therefore be further reduced by methods like clock gating for the input register arrays.

ACKNOWLEDGMENT

The authors would like to thank the Swedish Foundation for Strategic Research (SSF) for funding the Wireless Communication for Ultra Portable Devices project at Lund University.

REFERENCES

[1] N. Sadeghi et al., "Analysis of error control code use in ultra-low-power wireless sensor networks," ISCAS, Island of Kos, Greece, 2006.
[2] J. Hagenauer and M. Winklhofer, "The analog decoder," ISIT98, Cambridge, MA, USA, 1998.
[3] H. A. Loeliger, M. Helfenstein, F. Lustenberger, and F. Tarköy, "Iterative sum-product decoding with analog VLSI," ISIT98, Cambridge, MA, USA, 1998.
[4] F. Lustenberger et al., "All analog decoder for a binary (18,9,5) tailbiting trellis code," ESSCIRC, Duisburg, Germany, 1999.
[5] M. Moerz et al., "An analog 0.25 µm BiCMOS tailbiting MAP decoder," ISSCC, San Francisco, CA, USA, 2000.
[6] M. Frey et al., "Two experimental analog decoders," Int. Analog VLSI Workshop, Bordeaux, France, 2005.
[7] N. Nguyen et al., "A 0.8 V CMOS analog decoder for an (8,4,4) extended Hamming code," ISCAS, Vancouver, Canada, 2004.
[8] C. Winstead et al., "Low-voltage CMOS circuits for analog iterative decoders," TCAS I: Regular Papers, 2006.
[9] B. Gilbert, "A precise four-quadrant multiplier with subnanosecond response," JSSC, 1968.
[10] F. Lustenberger and H. A. Loeliger, "On mismatch errors in analog-VLSI error correcting decoders," ISCAS, Sydney, Australia, 2001.
[11] S. Yu, "Design and test of error control decoders in analog CMOS," PhD dissertation, Univ. of Utah, Logan, USA, 2004.
[12] C. Winstead, "Analog iterative decoders," PhD dissertation, University of Alberta, Edmonton, AB, Canada, 2005.
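As a cross-check of the figures reported in Tables I and II, the energy per decoded bit follows directly from total power divided by throughput. The short sketch below uses hypothetical helper names; the 64 pJ/b value at 250 kb/s and the 1.6 µs block time (8 input samples clocked in at 5 MHz, one sample per cycle) are derived here under those assumptions and are not stated in the paper.

```python
def energy_per_bit_pj(power_uw, throughput_mbps):
    """Energy per decoded bit in pJ: microwatts divided by Mb/s gives pJ/b."""
    return power_uw / throughput_mbps


def throughput_mbps(info_bits, block_time_us):
    """Information throughput in Mb/s for one decoded block every block_time_us."""
    return info_bits / block_time_us


if __name__ == "__main__":
    # Table I totals: 40 uW at 2.5 Mb/s and 16 uW at 250 kb/s.
    print(energy_per_bit_pj(40.0, 2.5))    # matches the 16 pJ/b of Table II
    print(energy_per_bit_pj(16.0, 0.25))   # derived value, not quoted in the paper
    # Assumed block time: 8 samples at a 5 MHz clock = 1.6 us per block.
    print(throughput_mbps(4, 8 / 5.0))
```

The comparison suggests that, per decoded bit, the design is more energy efficient at the higher rate, since part of the power (for example the analog core, which Table I lists as rate independent) does not scale down with throughput.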