268                                                                                                PROCEEDINGS OF THE IEEE...
FORNEY: THE VITERBI ALGORITFIM                                                                                            ...
270                                                                                                            PROCEEDINGS...
272                                                                                                   PROCEEDINGS OF THE I...
FORKEY: THE VITERBI ALGORITHM                                                                                             ...
274                                                                                                     PROCEEDINGS OF THE...
FORNEY: THE VITERBI ALGORITIIM                                                                                            ...
276                                                                                                                      P...
FORNEY: THE VITERBI ALGORITHM                                                                                             ...
218                                                                                                        PROCEEDINGS OF ...
Upcoming SlideShare
Loading in...5



Published on

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Transcript of "Viterbi2"

  1. 1. 268 PROCEEDINGS OF THE IEEE, VOL. 61, NO. 3, MARCH 1973 [2] C. Mfiller, The TkoryofRelatioity. London, England: Oxford Univ. accelerated system of reference,” Phys. Reu., vol. 134,pp.A799- Press, 1952, pp. 274-276, 302-309. A804, 1964. [3] E. J. Post, Formal Structure of Elecfromagnetics. Amsterdam, The [9] T. C. Mo, “Theoryof electrodynamics in media in noninertial frames Netherlands: North-Holland, 1962, chs. 3, 6, 7. and applications,” J. Math. Phys., vol. 11, pp. 2589-2610, 1970. [4] A. Sommerfeld, Elektrodynamik. Wiesbaden, Germany: Die- [lo] R. M. Fano, L. J. Chu, and R. B. Adler, Electromagnetk Fields, En- terich’sche Verlag, 1948, pp. 281-289. ergy and Forces. New York: Wiley, 1960.pp. 407-416. [SI T.Schlomka, “Das Ohmsche Gesetz beibewegtenKorpern,” A n n . [ll] L.Robin, Fonctions Sphiriques de Legendre et Fonctions Sphtroidulcs, Phys., vol. 6, no. 8, pp. 246-252, 1951. vol. 3.Paris, France: Gauthier-Villars,1959,pp. 94-10. [6] H. Epheser and T. Schlomka, “Flachengrossen und elektrodyn- [12] W. R. Symthe, Static and Dynamic Elec#ricity. New York: Mc- amische Grenzbedingungen bei bewegten Korpern,” Ann. Phys., Graw-Hill, 1950, p. 208. vol. 6, no. 8, pp. 211-220, 1951. [13] M. Faraday, Experimental Researches in Electricity, vol. 3. New [7] J. L. Anderson and J. W. .Ryon,“Electromagneticradiation in York:Dover, 1965, pp. 362-368. accelerated systems,” Phys. Rev., vol. 181, pp. 1765-1775, 1969. [14] E . G.Cullwick, Electromagnetism and Relativity. London, England: [8] C. V. Heer, “Resonant frequencies of an electromagnetic cavity in an Longmans, pp. 36-141, 167. The Viterbi Algorithm G. DAVID FORNEY, JR. Invited Paper Abstrucf-The Viterbi algorithm (VA) is a recursive optimal solu- 11. STATEMENT OF THE PROBLEMtion to the problem of estimating the state sequence of a discrete-time finite-stateMarkovprocess observed in memoryless noise. VA In its most general form, the may be viewed as a solu-Many problems i areas such as digital communicationscan be cast n tion to theproblem of maximum a posteriori probabilityin this form. T i paper gives a tutorial exposition of the algorithm hs (MAP) estimation of the state sequence of a finite-state dis-and of how it is implemented and analyzed. Applications to date are crete-time Markov process observed in memoryless noise. I nreviewed. Increasing use of the algorithm in a widening variety ofareas i foreseen. s this section we set up the problem in this generality, and then illustrate by example the different sorts problems that can of I. INTRODUCTION be made to such a model. The general approach fit also has the HE VITERBI algorithm (VA) was proposed in 1967 virtue of tutorial simplicity. TheunderlyingMarkovprocessischaracterized as fol- PI as a method of decoding convolutional codes. Since that time, it has been recognized an attractive solu- lows. Time is discrete. The statex k a t time k is one of a finite astion to a variety of digital estimation problems, somewhat as number M of states m , 1 s m 5 14;i.e., the state space X isthe Kalman filter has been adapteda to variety of analog esti- simply { 1, 2, . . , A I ] , Initiallyweshallassumethatthemation problems. Like the Kalman filter, the VA tracks the process runs only from time 0 t o t i m e K and that the initials t a t e of a stochastic process with a recursive method that is a n d final states x0 a n d XK are known; the state sequence isoptimum in a certain sense, and that lends itself readily to then represented by a finite vector X = (20, + , x ~ ) We see 1 .implementation and analysis. However, the underlying pro- later that extension to infinite sequences is trivial.cess is assumed to be finite-state Markov rather than Gaus- T h e process is Markov, in the sense that the probabilitysian, which leads to marked differences in structure. P ( X k + l I x 9 x1, . * . , xk) of being in state 0 X k + l at time k f 1, This paper is intended to be principally a tutorial intro- given all states up to time k , depends only on t h e s t a t e xk atduction to theVA, its structure, and its analysis. It also pur-time k :ports to review more or less exhaustively all work inspired by P ( X k + l j xo, x1, . . . , Xk) = P(Xk+lI Xk).o r related to the algorithm up to the time writing (summer of1972). Our belief is that the algorithmwill find application in The transition probabilities P ( x k + l ~ k ) may be time varying, xa n increasing diversity of areas. Our hope is t h a t we can ac- but we do not explicitly indicate this in the notation.celerate this process for the readers this paper. of I t is convenient to define the transition & at time k as t h e pair of states (xk+l, x k ) : This invited paper is one of a series planned on topics of general interest- tk (%+I, xk).The Editor. Manuscript received September 20, 1972;revised November 27, 1972. The author is with Codex Corporation, Newton, Mass. 02195. We let E bethe(possiblytime-varying)set of transitions
  2. 2. FORNEY: THE VITERBI ALGORITFIM 269 “ h a SHIFT REGISTER DETERMINISTIC FUNCTION t.- Fig. 1. Most general model. MEWRYLESS CHANNEL& = (%&I, xk) for which P(xe11 x k ) ZO,and I X I their number.Clearly I X 1 5 M2. There is evidently a one-to-one correspon-dencebetweenstatesequences X andtransitionsequences Fig. 2. Shift-register model.f = (50, * * , &-I). (We write x= t.) Theprocess is assumedtobeobservedinmemorylessnoise; t h a t is, there is a sequence z of observations z k in which I ’Ik ,Zk depends probabilistically only on the transition E at timek:l k-0In the parlance of information theory, z can be described asthe output of some memoryless channel whose input sequenceis(seeFig. 1). Again,though weshallnotindicateit ex-plicitly, the channel may be time varying in the sense that Fig. 3. A convolutionalencoder.P ( z k I b ) may be a function of k . This formulation subsumest h e following as special cases. Finally, we state the problem to which the VA is a solu- 1) The case in which z k depends only on the state x k : tion. Given a sequence z of observations of a discrete-time finite-stateMarkov process memoryless in noise, find the state sequence x for which the a posteriori probability P(xl I) is maximum. Alternately, find the transition sequence F, for 2) The case in which 4 depends probabilistically on an which P(F,j z) is maximum (since x 2 E ) . In the shift-registeroutput y k of the process at time k , where y k is in turn a de- model this is also the same as finding the most probable inputterministic function of the transition b or the state ?&. sequence u, sincet&. X ; or also the most probable signal se- Example: T h e following model frequently arises in digital yL1 quence y, if X . I t is well known that this MAP rule mini-communications. There is an input sequence = (UO, u ul, * . ), mizes the error probability in detecting the whole sequencewhere each u k is generated independently according to some (the block-, message-, or word-error probability), and thus isprobability distribution P ( i ( k ) and can take on one of a finite optimum in this sense. We shall see that in many applicationsnumber of values, say m. There is a noise-free signal sequence it is effectively optimum in any sense.y, not observable, in which eachy k is some deterministic func- Application Examples: We now give examples showingt h a ttion of the present and thev previous inputs: the problem statement above applies to a number of diverse - y k = f ( U k , * ’ * 7 Uk-v) fields,includingconvolutionalcoding,intersymbolinterfer- ence,continuous-phase frequency-shift keying(FSK),andThe observed sequence z is the output of a memoryless chan- text recognition. The adequately motivated reader may skipnel whose input is y. We call such a process a shift-register immediately to the next section.process, since (as illustrated in Fig. 2) it can be modeled by ashift register of length Y with inputs u k . (Alternately, it is a A . Convolutional Codesvth-orderm-aryMarkovprocess.) T o completethecorre- A rate-l/n binary convolutional encoder is a shift-registerspondence to our general model we define: circuit exactly like that of Fig. 2, where the inputs u k are in- 1) the state formationbitsandtheoutputs y k areblocks of # bits, y k x k & (uk-1, ‘ ‘ 3 uk-v) - =(Pa, * * , p,,~),each of which is a parity check on (modulo- 2 sum of) some subset of the v + l information bits (Uk, 2) the transition Uk-1, * . , Uk-v). When the encoded sequence (codeword) y is b & ( u k , * * ’ Uk--.). sentthroughamemorylesschannel, we have precisely the model of Fig. 2. Fig. 3 shows a particular rate-$ code withThe number of states is thus 1 X =m’, and of transitions, v = 2 . (This codeis the only one ever used for illustration in the I IEl =m’+l. If theinputsequence“starts” at time 0 and VA coding literature, but the reader must not infer that it is‘stops” at time K - v , i.e., the only one the VA can handle.) u = ( . . . 0, u0, u 1 , ’ ’ , ~ K - P 0, 0, ‘ * ’ ) , More general convolutional encoders exist: the rate may be k / n , the inputs may be nonbinary, and the encoder maythen the shift-register process effectively starts at time and even contain feedback. In every case, however, the code may 0ends at time K with x0 = XK = (0, 0, , 0). be taken to be generated by a shift-register process [2]. We might also note that other types of transmission codes (e.g., dc-free codes, run-length-limited codes, and others) can 1 The notation is appropriate when observations are discrete-valued;for continuous-valued Zk, simply substitute a density P ( z t I&) for the dis- be modeled as outputs of a finite-state machine and hence falltribution P ( z t I&). into our general setup [49].
  3. 3. 270 PROCEEDINGS OF THE IEEE, MARCH 1973B . IntersymbolInterference In digital transmission through analog channels, we fre-quently encounter following the situation. input The se- J i - EJ - E Yh +quence u, discrete-time and discrete-valued as in the shift-register model, is used modulate some continuous waveform to h0which is transmitted through a channel and then sampled. Fig. 4. Model of PAM system subject to intersymbolIdeally, samples z k would equal the correspondingU k , or some interference and white Gaussian noise.simple function thereof; in fact, however, the samples ,zk a r eperturbed both by noise and by neighboring inputs U p . T h elattereffectiscalledintersymbolinterference.Sometimesintersymbolinterferenceisintroduceddeliberatelyforpur-poses of spectralshaping,inso-calledpartial-responsesys-tems. In such cases the output samples can often modeled as zk = yk + nk be L9-jw-g "k Fig. 5. SIGNAL Model for binarycontinuous-phase FSK withdeviationratio + "lk 3 and coherent detection in white Gaussian noise.where y k is a deterministic function of a finite number of in-puts, say, y k = f ( U k , * * . , Uk-y), a n d n k is a white Gaussiannoise sequence. This is precisely Fig. 2. input sequenceu be binary and let ( 0 ) a n d w(1) be chosen so w To be still more specific, in pulse-amplitude modulation t h a t w ( 0 ) goesthroughanintegernumber of cyclesin T(PAM) the signal sequence y may be taken as the convolu- seconds and w(1) through an odd half-integer number; i.e.,tion of the input sequenceu with some discrete-time channel w(O)T=Oandw(l)T=7rmodulo27r. Thenif e o = O , O1=Oorr,impulse-responsesequence (ho,hl, * * ) : accordingtowhether u equalszero or one,andsimilarly o & = O or r , according to whether an even or odd number of yk = hi24k-i. ones has been transmitted. i Here we have a two-state process, with X = (0, 7r 1. T h eIf h, = O for i>v (finite impulse response), then we obtain our transmitted signal y k is a function of both the current inputshift-register model. An illustration of such a model in which u k and the state x k :intersymbol interference spans three time unitspears in Fig. 4. ( v = 2) ap- yk = cos [W(Uk)t + Xk] = cos cos w(Uk)f, It was shown in [29] that even problems where time is KT t < ( R + 1)T.actually continuous-Le., the received signal r(t) has the form Since transitions & = ( x k + l , % are one-to-one functions t h e $ of K - KT) + n(1) current state x k and input U k , we may alternately regard y k as r(t) = Ukh(1 being determined by ( k . k=O If we takeqo(t) &os w(0)t a n d vl(t) &os w(1)t as bases offor some impulse responseh ( t ) ,signaling interval T , and reali- t h e signal space, we may writezation n(t) of a white Gaussian noise process-can be reducedwithout loss of optimality to the aforementioned discrete-time y k = yOk?O(l) ylkvl(t) +form (via a "whitened matched filter"). where the coordinates ( y o , ylk) are given byC. Continuous-Phase FSK This example is cited not for its practical importance, butbecause, first, it leads to a simple model we shall later use inan example, and, second, it shows how the VA may lead to Ifresh insight even in the most traditional situations. ( ( 0 ,- I ) , if U k = 1 , Xk = In FSK, a digital input sequence u selects one of m fre- Finally, if the received signal ( ( t ) is q(t) plus white Gaussianquencies (if z& is m-ary) in each signaling interval length T ; of noise v ( t ) , then by correlating the received signal against boththat is, the transmitted signal q(t) is q o ( t ) and ql(t) in each signal interval (coherent detection), we may arrive withoutloss of information a t a discrete-time out- put signalwhere O ( U k ) is the frequency selected by U k , and e k is somephase angle. It is desirable for reasons both of spectral shap- + z k = (ZW, Z l k ) = (yak, y l k ) (nokt flu)ing and of modulator simplicity that the phase be continuous where no a n d n1 are independent equal-variance white Gaus-at the transition interval; thatis, t h a t sian noise sequences. This model appears in Fig. 5, where the w(Uk-1)kT + ek-1 E w ( u k ) k T + e k modulo 27r. signal generator generates( y ~y a ) according to the aforemen- tioned rules. ,This is called continuous-phase FSK. The continuity of the phase introduces memory into the D . TextRecognitionmodulation process; i.e., it makes the signal actually trans- VA is Ve include this example to show that the not limitedmitted in the kth interval dependent on previous signals. T o to digital communication. optical-character-recognition Intake the simplest possible case ("deviation ratio" =a), let t h e (OCR) readers, individual characters scanned, are salient
  4. 4. FORNEY: THE VITERBI ALGORITHM MARKOV PROCESS CHARACTERS OCR OCR OUTPU REPRESENTING ENGLISH yI * DEVICE zI Fig. 6. Use of VA to improve character recognition by exploiting context. asfeatures isolated, and some decision made to what letter orother character lies below the reader. When the charactersactually occur as part of natural-language text, it has longbeen recognized that contextual information can be used toassist the reader in resolving ambiguities. One way of modeling contextual constraints is to treat anatural language like English as though it were a discrete-time Markov process. For instance, we can suppose that theprobability of occurrence of eachletterdependsonthe v 01 IO 11 2K:::wpreviousletters,andestimatetheseprobabilitiesfromthe (b)frequencies of (vf1)-letter combinations[(v+l)-grams].Whilesuchmodelsdonotfullydescribethegeneration of Fig. 7. (a) State diagram of a four-state shift-register process. (b) Trellis f o r a four-state shift-register process.natural language(for examples of digram and trigram English,see Shannon [45], [46]), they account for a large part of t h estatistical dependencies language are in the and easily instant of time. The trellis begins and endsthe knownstates athandled. x. and XK. Its most important property is that every possi- to With such a model, English letters are viewed as the out- ble.state sequence x there corresponds a unique path throughputs of a n m’-state Markov process, wherem is the number of the trellis, and vice versa.distinguishable characters, such as 27 (the 26 letters and a Now we show how, given a sequence of observations I,space). If it is further assumed that the OCR output zk is every path may be assigned a “length” proportional to -Independent only on the corresponding input character then P ( x , z), where x is the state sequence associated with that yk,the OCR reader isa memoryless channel to whose output se- path. This will allow us t o solve the problem of finding t h e quencewemayapplytheVA to exploitcontextualcon- state sequence for whichP(xl z) is maximum, or equivalentlystraints; see Fig. 6. Here the ‘‘OCR output” may be anything for which P ( x , z) = P ( x i z ) P ( z ) is maximum, by finding the afrom the raw sensor data, possibly grid of zeros and ones, to path whose length --in P ( x , Z) is minimum, since In P ( x ,t) isthe actual decision which would be made by the reader in the a monotonic function of P ( x , Z) and there isa one-to-one cor- absence of contextual clues. Generally, the more raw the data, respondencebetweenpathsandsequences.Wesimply ob-the more useful the VA will be. servethatduetotheMarkovandmemorylessproperties, P ( x , z) factors as follows:E . Othw Recognizing certain similarities between magnetic record- P(x, 2) = 1 P(x)P(z x )ing media and intersymbol interference channels, Kobayashi[SO] has proposed applying the VA to digital magnetic re-cording systems. Timor [SI] has applied the algorithm to asequentialrangingsystem.Use of thealgorithminsource Hence if weassigneachbranch(transition)the ‘‘length”coding has been proposed [52]. Finally, Preparata and Ray [53] have suggested using the algorithm to search ‘‘semanticmaps”insyntacticpatternrecognition.Theseexhausttheapplications known to the author. then the total length of the path corresponding to some x is 111. THEALGORITHM We now show that the MAP sequence estimation problem k=Opreviously stated is formally identical to the problem find- of as claimed.ing the shortest route through a certain graph. The VA then Finding the shortest route througha graph is an old prob-arises as a natural recursive solution. lem in operations research. The most succinct solution was Ye areaccustomedtoassociatingwith a discrete-time givenbyMintyin a quarter-pagecorrespondencein 1957finite-state Markov process a state diagramof the type shown [ 5 5 ] , which we quote almost in its entirety:in Fig. 7(a), for a four-state shift-register process like that of . . . as follows: Build problem .model of the travel network,Fig. 3 or Fig. 4 (in this case, a de Bruijn diagram [54]). Here The shortest-route . . can be solved verysimply a string~K~des representstates,branchesrepresenttransitions,and where knots representcities and string lengths represnt dis-over the course of time the process traces some path from state tances (or costs). Seize the knot ‘Los Angeles”in your leftto state the diagram. throughstate hand and knot the ’Boston” in your right and pull them apart.
  5. 5. 272 PROCEEDINGS OF THE IEEE, MARCH 1973 Unfortunately, the Minty algorithm is not well adapted to modern methods of machine computation, nor are assis- tants as pliable as formerly. I t therefore becomes necessary to move on to the VA, which is also well known in operations 1 1 research [ 5 6 ] . I t requires one additional observation. 1 We denote by xok a segment (xo, xl, . , xk) consisting of the states to time k of the state sequence x = (xo, x 1 , * , x k ) . I n t h etrellis xok corresponds to a path segment starting t h e at node x0 and terminatingat Xk. For any particular time-k node %kt there will in general be several such path segments, each with some length k-1 UXOk> = 6 0 m>. The shortest such path segment is called the surriuor corre- sponding to the node x k , and is denoted ? ( X k ) . For any time k>O, there are M survivors in all, one for each k . The obser- x vation is this: the shortest complete path i must begin with one of these survivors. (If it did not, but went through state 7 Z k at time k , thenwecouldreplaceitsinitialsegmentby f ( x k ) to get a shorter path-contradiction.) T h u s at any time k we need remember only the M sur- vivors f(xk) and their lengths l(xk) p X [ f ( x k ) ] . To get to time k f l , we need only extend all time-k survivors by one time of unit, compute the lengths the extended path segments, and for each node xk+l select the shortest extended path segment terminatingin xk+1 as thecorrespondingtime-(kfl)sur- vivor. Recursion proceeds indefinitely without the number of survivors ever exceedingM . r3 Many readers will recognizethisalgorithm as a simple [%I. version of forward dynamic programming [57], By this or anyothername,thealgorithmiselementaryoncethe problem has been cast in the shortest route form. k=5 We illustrate the algorithm for a simple four-state trellis covering 5 timeunitsinFig. 8. Fig. 8(a)showsthecom- plete trellis, with each branch labeled with a length. (In a real (b) application, the lengths would be functions of the received Fig. 8. (a) Trellis labeled with branch lengths; M = 4, K = 5. data.)Fig.8(b)showsthe 5 recursivestepsbywhichthe (b) Recursive determination of the shortest path via the VA. algorithm determines the shortest path from the initial to the final node. At each stage only t h e 4 (or fewer) survivors are shown, along with their lengths. for each x k + l ; store r ( x r + l ) and the corresponding survivor A formal statement of the algorithm follows: i(Xk+l). S e t k to k+1 and repeat untilk = K . Viterbi Algorithm With finite state sequences x the algorithm terminates at time K with the shortest complete path stored the survivor as Storage: ?(xK).(time) k ; ?(xk), 1< x k < & f (survivor terminating in xk): Certain trivialmodifications necessary practice. are in r(xk), 1<xk <M (survivor length). When state sequences are very long or infinite, it is necessary to truncate survivors to some manageable length 6. In other Initialization: words,thealgorithmmustcometo a definitedecision o n k = 0.; nodes up to time k-6 at time k. Note that in Fig. 8(b) all time-4 survivors go through the same nodes up t o time 2. In X(xo) = x o ; i(m) arbitrary, m # xo; general, if thetruncationdepth 6 ischosenlargeenough, r(Xo) 0 ; = r(m) = Q), m # xo. there is a high probability that all time-k survivors will go through the same nodes up to time k - 6 , so that the initial Recursion: Compute segment of the maximum-likelihood path is known up to time k-6 and can be put out as the algorithms firm decision; in this case, truncation costs nothing. In the rare cases when for all = ( x k + l , xk). survivors disagree, any reasonable strategy for determining Find thealgorithms time-(k-6)decision will work [20], [21]: choose an arbitrary time-(k-6) node, or the node associated with the shortest survivor, or node chosen by majority vote, a
  6. 6. FORKEY: THE VITERBI ALGORITHM 273etc. If 6 is large enough, the effect on performance is negligi-ble. Also, if k becomes large, it is necessary to renormalize thelengths r ( m ) from time to time by subtracting a constantfrom all of them. Fig. 9. Typical cell of a shift-register process trellis. Finally, the algorithm may be required get started with- toout knowledge of the initial state xg. In this case it may beinitializedwithanyreasonableassignment of initialnodelengths, such as r(m)=0, all m, or else r(m)= -In r,,,if t h e 4 IIstates are known to have a priori probabilities r,,,. Usually,after an initial transient, there is a high probability that allsurvivors will merge with the correct path. Thus the algorithmsynchronizes itself without any special procedures. The complexity of the algorithm is easily estimated. First,memory: the algorithm requires M storage locations, one foreach state, where each location must be capable of storing a / ,, i(Uii,-. II 1, &r ,, i(Cii,-. ll 9+ 9 SELECT SELECT SELECT SELECT 1-OF- 2 I-OF-2 I-OF-2 1-OF-2“length” r ( m ) and a truncatedsurvivorlisting - % ( m ) of 6symbols. Second, computation: each in unitalgorithm must make IEI additions, one for each transition, of timethe ! {; *and M comparisons among the ] E l results. Thus the amount - y - - - - l TO ANOTHER LOGIC UNIT TOANOTHER LOGIC UNITof storage is proportional to the number of states, and theamount of computation to the numberof transitions. ‘ith a Fig, 10. Basic VA logic unit for binary shift-register process.shift-register process, M = my and 1 El = mV+l, that the com- soplexity increases exponentially with the length v of the shift Fourier transform (FFT). In fact, it is identical, except forregister. length, and indeed the FFT is also ordinarily organized cell- In the previous paragraph, have ignored the complexity wise. Li‘hile the add-and-compare computations of the we VAinvolvedingeneratingtheincrementallengths X(&). In a are unlike those involved in the FFT, some of the memory-shift-register process, it is typically true that P ( x k + l I xk) is organization tricks developed for the FFT may be expectedeither l/m or 0, depending on whether xk+l is an allowable suc- to be equally useful here. cessor t o xk or not; then all allowable transitions have the Because of its highly parallel structure and need for onlysame value of -In P(xk+lI xk) and this component of X(&) add, compare, and select operations, the VA is well suited to may be ignored. Note that in more general cases P(xk+l I xk) is high-speed applications. A convolutional decoder for a v = 6 known in advance: hence this component can be precomputed code (Ji= 64, / E l = 128) t h a t is built out of 356 transistor- a n d ‘wired in.” The component -In P ( z k / 4;c) is the only com- transistor logiccircuitsandthatcanoperate at u pt o 2 ponentthatdepends on the data; again, it is typical that Mbits/s perhaps represents the current state of the art [22]. many f k lead to the same output y k , andhencethevalue Even a software decoder for a similar code can be run at a -In P ( z k 1 yk) need be computed or looked up for all these .$x. rate order the of 1000 bits/s on a minicomputer. Such only once, givenz k . (Then the noise is Gaussian, -In P ( z k I y k ) moderate complexity qualifies the VA for inclusion in many is proportional simply to ( z k - y k ) * . ) Finally, all of this can be signal-processing systems. done outside the central recursion (“pipelined”). Hence the For further details of implementation, see [22], [23], a n d complexity of this computation tends not to be significant. the references therein. Furthermore, note that once the X&) are computed, the observation Zk need not be stored further, an attractive fea- IV. ASALYSIS PERFORMAXE OF ture in real-time applications. Just as important as thestraightforwardness of imple- A closer lookat t h e trellis of a shift-register process reveals mentation of t h e VA is the straightforwardness with which its additional detail that can be exploited in implementation. Forperformance can be analyzed. In many cases, tight upper and a binary shift register, the transitions in any unit time can lower bounds for error probability can be derived. Even when of be segregated into 2y-1 disjoint groups of four, each originat- t h e VA is not actually implemented, calculation of its per- ing in a common pair of states and terminating in another formanceshowshowfartheperformance of lesscomplex common pair. A typical such cell is illustrated in Fig. 9, with schemes is from ideal, and often suggests simple suboptimum the time-k states labeled and x‘l and the time-(k+l) statesschemes that attain nearly optimal performance. x’0 labelled Ox‘ a n d lx’, where x’ stands fora sequence of v - 1 bits Thekeyconceptinperformanceanalysisisthat of a n that is constant within any one cell. For example, each time errorevent.Let x be the actual state sequence, and 4 the unit in the trellis of Fig. 8 is made up of two such cells. Ye state sequence actually chosen by the VA. Over a long time x note that only quantities within the same interact in any and i will typically diverge and remerge a number of times, cell onerecursion.Fig. 10 shows a basiclogic m i t that imple- as illustrated in Fig. 11. Each distinct separation is called a n mentsthecomputationwithinanyone cell. A high-speed errorevent.Erroreventsmayingeneralbe of unbounded parallel implementation of the algorithm can be built around length if x is infinite, but the probability of a n infinite error 2-l such identical logic units; a low-speed implementation, event will usually be zero. aroundonesuchunit,time-shared; or a softwareversion, The importance of erroreventsisthattheyareprob- around the corresponding subroutine. abilistically independent of one another; in the language of Many readers will have noticed that the trellis Fig. 7(b) probability theory they are of recurrent. Furthermore, they allow reminds them of the computational flow diagram of the fast us to calculate error probability per unit time, which neces- is
  7. 7. 274 PROCEEDINGS OF THE IEEE, u 1973 r n k.0 LrK - u w - W Fig. 11. Typical correct path X (heavy line) and estimated path 2 (lighter line) in the trellis, showing three error events.sary since usually the probability any error in MAP estima- of Fig. 12. Typical correct path X (heavy line) and time-k incorrecttion of a block of length K goes t o 1 as K goes to infinity. subset for trellis of Fig. 7 b . ()Instead we calculate the probabilityof an error event startingat some given time, given that the starting state is correct,i.e., that an error event is not already in progress at that time. Given the correct path X , the set &k of all possible error ... . ..events startingat some time k is a treelike trellis which startsat x k and each of whose branches ends on the correct path, asillustratedinFig. 12, for the trellis of Fig.7(b).Incoding Fig. 13. Trellis for continuous-phase FSK.theory this is called the incorrect subset (at time ) . R The probability of any particular error event is easily cal-culated; it is simply the probability that the observations willbe such that over the time span during which 4 is differentfrom X , f is more likely thanX . If the error event has length T,this is simplya two-hypothesis decision problem between twosequences of length T, and typically has a standard solution. Fig. 14. Typical incorrect subset. Heavy line: correct p t . ah The probability P ( & k ) t h a t any error event in occurs can &k Lighter lines: incorrect subset.then be upper-bounded, usually tightly, by a union bound,i.e., by the sum of the probabilities of all error events in &k.While this sum may well be infinite, it is typically dominated a typical incorrect subset in Fig. 14. The shortest possibleby one or a few large leading terms representing particularly error event is of length 2 and consists of a decision t h a t t h elikely error events, whose sum then forms a good approxima- signalwas{cos w(l)t, -cos o(1)t) rather than {cos w(O)t,tion to P(&k).True upper bounds can often be obtained by cos w ( 0 ) t 1, or in our coordinate notation that(0, (0, - 1) ] { l),flow-graph techniques [4], [SI, [%I. is chosen over { (1, 0),(1, 0) 1. This is a two-hypothesis de- On the other hand, a lowerbound to error-event prob- cisionproblemin a four-dimensionalsignalspace [59]. I nability,againfrequentlytight,canbeobtainedbyagenie Gaussian noise of variance u* per dimension, only the Euclid-argument. Take the particular error event that has the great- ean distance d between the two signals matters; in this case-est probability of all those in &k. Suppose thata friendly genie d = 4 3 , and therefore the probability of this particular errortells you that the true state sequence is one of two possibili- eventis Q(d/%)=Q(l/ad/Z), where Q(x) istheGaussianties: the actual correct path, the incorrect path correspond- error probability function defined by oring to that error event. Even with this side information, youwill still make an error if the incorrect path ismorelikelygiven z, so your probability of error is still no better than theprobability of this particular error event. In the absencethe ofgenie, your error probability must be worse still, since one of By examination of Fig. 14 we see that the error events ofthe strategies you have, given the genie’s information, is to lengths 3, 4, . . lie at distances 4 6 , dG, . from theignore it. In summary, the probability of any particular error correct path; hence we arrive at upper and lower bounds onevent isa lower bound toP ( & k ) . P ( & k ) of An important side observation is that this lower bound VA.applies to any decision scheme, not just the Simple exten- + Q ( 4 2 / 2 u ) 2 P(&k) 5 Q ( d 2 / 2 ~ ) Q(d6/eu)sions give lower bounds to related quantities like bit prob- + + Q(4i6/2u) . ..*ability of error. If the VA givesperformanceapproachingthese bounds, then it may be claimed that it is effectively opti- I n view of the rapid decreaseof Q ( x ) with x , this implies thatmum with respect to these related quantities as well [3O]. P ( & k ) is accurately estimated as Q(d/2/2u) for any reasonable In conclusion, the probability of any error event starting noise variance a*. I t is easily verified from Fig. 13 that thisat time k may be upper- and lower-bounded as follows: result is independentof the correct pathx or the timeK. This result is interesting in that the best one can do withmax P(error event) 5 P(&k) 5 max P(error event) coherent detection in a single symbol interval is Q(1/2a) for + other terms. orthogonal signals. Thus exploiting the memory doubles the effective signal energy, or improves the signal-to-noise ratioWith luck these bounds will be close. by 3 dB. It may therefore be claimed that continuous-phase FSK is inherently 3 dB better than noncontinuous for devia-Example tion ratio 3, or as good as antipodal phase-shift keying. (While For concreteness, and because the resultis instructive, we we have proved this only a deviation ratio of ), it holds for forcarry through the calculation for continuous-phase FSK of nearly any deviation ratio.) Even though the is quite sim- VAtheparticularlysimpletype definedearlier.Thetwo-state ple for a two-state trellis, the fact that only one type error oftrellis for this process is shown in Fig. and the first part of event has any significant probability permits still simpler sub- 13,
  8. 8. FORNEY: THE VITERBI ALGORITIIM 215optimum schemes to achieve effectively the same performance formance superior to all other coding schemes save sequential[431,[441.’ decoding,anddoesthis at highspeeds,withmodestcom- plexity,andwithconsiderablerobustnessagainstvarying V. APPLICATIONS channelparameters. A number of prototypesystemshave We conclude with a review of the results, both theoretical been implemented and tested [21]-[26], some quite original,and practical, obtained with the VA to date. and it seems likely that Viterbi decoders become common will in space communication systems.A , ConvolutionalCodes Finally, Viterbi decoders have been used as elements in It was for convolutional codes that the algorithm was first very-high-performance concatenated coding schemes [lo]-developed, and naturally it has had its greatest impact here. [12], [27] andindecoding of convolutionalcodesinthe The principal theoretical result is contained in Viterbi’s presence of intersymbol interference [35], [6O].original paper [ l ] ;see also [SI-[7]. I t shows that for a suita-bly defined ensemble of random trellis codes and MAP decod- B . IntersymbolInterferenceing, the error probability can be made to decrease exponen- Application of the VA to intersymbol interference prob-tially with the constraint length a t all code rates R less t h a n lems i s more recent, and the main achievements have been vchannel capacity. Furthermore, the rate of decrease is con- theoretical. The principal result, for PAM in white Gaussiansiderably faster than for block codes with comparable decod- noise, is t h a t P ( & k ) can be tightly bounded as follows:ing complexity (although the same as that for block codeswith the same decoding delay). In our view, the most telling K ~ Q ( d m i n / 2I & k ) I ~ )P ( KrQ(drnirJ2~)comparison between block and convolutional codes is that an where K L and Kv are small constants, Q(x) is the Gaussianeffectively optimum block code of a n y specified length and error probability function defined earlier, u? is t h e noise vari-rate can be created by suitably terminating a convolutional ance, and dmin is the minimum Euclidean distance between(trellis)code, but with a lowerratefortheblockcode of any two distinct signals[29]. This result implies that on mostcourse [7]. In fact, if convolutional codes were any better, channels intersymbol interference need not lead to any sig-they could be terminated to yield better block codes than are nificant degradation in performance, which comes as rather atheoretically possible-an observation which shows that the surprise.bounds on convolutional code performance must be tight. Forexample,withthemostcommonpartialresponse For fixed binaryconvolutionalcodes of nonasymptotic systems, the VA recovers the 3-dB loss sustained by conven-length on symmetric memoryless channels, principal the tional detectors relative to full-response systems [29],[32].result [6] is that P ( & k ) is approximately given by Simplesuboptimumprocessors [29],[33] candonearly as P(&k) ;r,2-dD well. Several workers [34],[35],[42] have proposed adaptivewhere d is the free distance, Le., the minimum Hamming dis- versions of the algorithm unknown time-varying for andtance of any path in the incorrect subset &h from the correct channels. Ungerboeck [41], [42] and hlackechnie [35]path: N d is the number of such paths; and D is the Bhat- have shown that only matched filter rather thana whitened atacharyya distance matched filter is needed in PAM. D = logs P ( z O)l’*P(z 1)l I 2 1 I I t seems most likely that the greatest effect of the VA on digital modulation systems will be t o reveal those instances in Z which conventional detection techniques significantly fall z inwhere the sum is over all outputs the channel output spaceshort of optimum, to and suggest effective suboptimum2. methods of closing the gap. PAM channels that cannot be On Gaussian channels linearly equalized without excessive noise enhancement due t o nulls or near nulls in the transmission band are the likeliest candidates for nonlinear techniques of this kind.where & / N O is the signal-to-noise ratio per information bit.The tightness of this bound is confirmed by simulations [8], C. Text Recognition [9], [22]. For a v = 6 , d = 10, R=+ code, for example, error An experiment was run in which digram statistics wereprobabilities o l e * , lC5,a n d le7 achieved at Eb/No used t o correct garbled text produced by a simulated noisy f are =3.0, 4.3, a n d 5.5 dB,respectively,which is within a few characterrecognizer [47]. Resultsweresimilartothose oftenths of a dceibel of this bound [22]. Raviv [48] (using digram statistics), although the algorithm Channels available space for communications fre-are was simpler and only had hard decisions rather than confi-quently accurately modeled as white Gaussian channels. The dence levels t o work with. It appears that the algorithm mayVA attractive such is for channels because gives it per- be a useful adjunct t o sophisticated character-recognition sys- tems resolving for ambiguities when confidence levels for * De Buda [44] actually proves thatanoptimum decision onthe different characters are available.phase x t at time k can be made by examining the received waveform onlyat times k- 1 and k ; Le., ( z o . ~ - I z ~ t - l ) (ZU, w ) . The proof is that the log , , VI. CONCLUSIOKlikelihood ratio P ( Z 0 . k - 1 , Z1.k-I, ZOk, Zlk I Xk-1, Xk 0, The VA has already hadsignificant impact on our under- a -In standing of certain problems, notably in the theories convo- of P ( Z 0 . t - 1 , Z1.t-1, Z O t , &t-1, Xk = *, Y t + d lutional codes and of intersymbol interference. I t is beginningis proportional to - z ~ , t - l + z ~ . ~ - 1 - z u - -for any values of the pair of ~~states ( ~ l t - 1 ~ xk+1). For this phase decision (which differs slightly from our to have a substantial practical impact as well in the engineer-sequence decision) the error probability is exactly Q(&/2n). ing of space-communication links. The amount work it has of
  9. 9. 276 PROCEEDINGS OF THE IEEE, MARCH 1973inspired in the intersymbol interference area suggests that Now we note the recursive formulahere too practical applications are not far off. The generalityof the model t o which it applies and the straightforwardness p(%,zOL-) = p ( z k ,k - 1 , x ZO") Zk-1with which it can be analyzed and implemented lead one tobelieve that in both theory and practice it will find increasing = p ( z k - 1 ,O " ) p ( z k ,- 1 Z 4 1 zk-1) zk-1application in the years ahead. M which allows us to calculate the quantities P(%, ~0") from APPENDIX the M quantities P ( x ~ 1zo"*) with IZI multiplications and , RELATED ALGORITHMS additions using the exponentiated lengths In this Appendix we mention some processing structuresthat are closely related t o t h eVA, chiefly sequential decoding = p ( z k %k-l)P(zk-l I &-I). Iand minimum-bit-error-probability algorithms. We also men- Similarly we have the backward recursiontion extensions of the algorithm to generate reliability infor-mation, erasures, and lists. p(Zk" xk) = 1 p(zkK,x k + l Z k ) I ZW1 When the trellis becomes large, it is natural to abandonthe exhaustive search of the VA in favorof a sequential trial- = p ( z kx k + l , xk)p(zk+l" I Zk+d I Zkiland-error search that selectively examines only those pathslikely t o be the shortest. In the coding literature, such al- which has a similar complexity. Completion of these forwardgorithms are collectively known as sequential decoding [Is]- and backward recursions for all nodes allowsP ( x k , Z) and/or [l6]. The simplest to explain is the stack" algorithm [IS], P(&,z) t o be calculated for all nodes. [16], which a list is maintainedof the shortest partial paths in Now, to be specific, let us consider a shift-register processfound to date, the path on the top the list is extended, and and let S(uk) be the setof all states zel whose first component ofits successors reordered in the list until some path is found is %k. Thent h a t reachesthe &eLminalnode, or elsedecreaseswithout p ( u k , 2) = p ( x k + l , 2)-limit. (That some pathwill eventually do so is ensured in cod- Zk+lES(Ut)ing applications by the subtraction of a bias term such thatthe length of the correct path tends to decrease while thatof Since P(uk, z) =P(ukI z ) P ( z ) , MAP estimation of u k reducesall incorrect paths tends to increase.) Searches that start from o findingthemaximum t of thisquantity.Similarly, if weeither end of a finite trellis [I71 are alsouseful. wish t o find the MAP estimate of an output y k , say, then let In coding applications, sequential decoding has many of S(p) be the setof all & that leadto yk and computethe same properties as Viterbi decoding, including the sameerror probability. I t allows the decoding of longer and there- P(Yk, 4 = p ( f k , 2). €t= k t )fore more powerful codes, at the cost of a variable amount ofcomputationnecessitatingbufferstoragefortheincoming A similar procedure can be used to estimate any quantityd a t a z. It is probably less useful outside of coding, since i t which isa deterministic functionof states or transitions.depends on the decoders ability to recognize when the best Besidesrequiringmultiplications,thisalgorithmislesspathhasbeenfoundwithoutexaminingotherpaths,and attractive than the VA in requiring a backward as well as atherefore requires eithera finite trellisor a very large distance forward recursion and consequently storage of all data. Thebetween the correct path and possible error events. following amended algorithm [39], [48] eliminates the latter Intheintersymbolinterferenceliterature,many of the ugly feature at the cost of suboptimal performance and addi- find usearly attempts to optimum nonlinear algorithms used bit- tional computation. Let restrict ourselves to shift-register aerrorprobability as theoptimalitycriterion.TheMarkov process with input sequence u and agree to use only observa-property of the process leads to algorithms that are manage- tions up to time k+6 in estimating U k , say, where 6 2v- 1. W eable but less attractive than the Viterbi [36]-[42]. then have The general principle of several of these algorithms is as P ( u k , 20") = * * p(Ukk*, ZOw").follows.8 First, we calculate the joint probability P ( x k , z) for uti1 Uti6every state x k in the trellis, or alternately P(&,z) for everytransition &. This is done by observing that Thequantities in thesumcan be determinedrecursively by I p ( z k , 2) = p ( x k , ZO")p(ZkK x k , 20") = = p(xk, I ZOk1)p(zkK xk) p ( U k k M , ZOk*) UL-1 p(Uk-lk", ZOkM)since, given Xk, the outputs zkK from time k t o K are indepen- = p(uk-lkul, ZOku 1d e n t of the outputs zoL- from time 0 t o K - 1 . Similarly ut-1 p(uk+d, &+d I W k Y l , * uk+6-r). While the recursion is now forward only, we now must store m quantities rather than m. If I is large, this is most un- * attractive; if 6 is close t o v-1, the estimate may well be de- cidedly suboptimum. These variations thus seem considerably less attractive The author is indebted to Bahl et al. 1181 foraparticularly lucid t h a n t h e VA. Nonetheless, something like this may need toexposition of this type of algorithm. be hybridized with the VA in certain situations suchtrack-as
  10. 10. FORNEY: THE VITERBI ALGORITHM 277ing a finite-state source overa finite-state channel, where only space communication,” I E E E Trans. Commun. Technol., vol. COM- 19, pp. 835-847, Oct. 1971.the state sequence the source is interest. of of [23] G. C. Clark, Jr., and R. C. Davis, “Two recent applications error- of Finally, we can consider augmented outputs from the VA. correction coding to communications system design,” I E E E Trans.A good general indication of how well the algorithm is doing [24] Commun. Technol., vol. COM-19, pp. 856-863,TDMA satellite com- I. M. Jacobs and R. J. Sims, “Configuring a Oct. 1971.is the depth which all paths are merged; this can be to at used munication system with coding, in Proc. 5th Hawaii Int. Conf. Sys-establish whether or not a communications channel is on the tems Science. Honolulu, Hawaii:WesternPeriodicals, 1972, pp. 443-446.air, in synchronism, etc. l l ] ,[28]. A more selective indicator 1251 A. R. Cohen, J. A. Heller, andA. J. Viterbi, “A new coding technique [of howreliableparticularsegmentsare is the difference in forasynchronousmultiple access communication,” I E E E Trans.lengths between the best and the next-best paths the point at Commun. Technol., vol. COM-19, pp. 849-855, Oct. 1971. (261 D. Quagliato,“Errorcorrecting codes applied to satellitechan-of merging; this reliability indicator can be quantized into an nels,= in 1972 I E E E Int. Conf. Communications (Philadelphia, Pa.),erasure output. Lastly, the algorithm can be altered to store pp. 15/13-18.t h e L best paths, rather than the single best path, as the sur- [27] Linkabit NAS2-6722,coding Ames Res. Cen., Rep. Field, Calif., Contract Corp.,“Hybrid NASA systemstudy,” Final Moffett onvivors in each recursion, thus eventually generating a list of NASA Rep. CR114486, Sept. 1972.t h e L most likely path sequences. [281 G. C. Clark, Jr., and R. C. Davis, “Reliability-of-decodingindicators for maximum likelihood decoders,” in Proc. 5th Hawaii I n f . Conf. SystemsScience. Honolulu, Hawaii: Western Periodicals, 1927, pp. REFERENCES 447-450.Conuolutwnal Codes Intersymbol Interference (291 G. D.Forney,Jr.,“Maximum-likelihoodsequenceestimation of [l] A. J. Viterbi, “Error bounds for convolutional codes and an asymp- digital sequences in the presence of intersynlbol interference,” E E E I toticallyoptimumdecoding algorithm,” I E E E Trans. Inform. Trans. Inform. Theory, vol. IT-18, pp. 363-378, May 1972. Theory, vol. IT-13, pp. 260-269, Apr. 1967. [30] -, “Lower bounds on error probability in the presence of large 121 G. D.Forney,Jr.,“Convolutional codes I: Algebraic structure,” intersymbol interference,” I E E E Trans.Commun. Technol. (Cor- I E E E Trans. Inform. Theory, vol. IT-16, pp. 720-738, Nov. 1970. resp.), vol. COM-IO, pp. 76-77, Feb. 1972. [3] - , “Review of randomtreecodes,” NASA Ames Res. Cen., [31] J. K. Omura, optimum ”On receivers forchannels inter- with Moffett Field, Calif., Contract NAS2-3637, NASA CR 73176, Final symbol interference,” abstract presented at the IEEE Int. Symp. Rep., Dec. 1967, appendix A. Information Theory, Noordwijk, The Netherlands, June 1970. [4] G. C. Clark, R. C. Davis,J. C. Herndon, and D. D. McRae, “Interim [32] H. Koba;ashi, ”Correlative level codingand maximum-likelihood report on convolution coding research,” Advanced System Opera- decoding, I E E E Trans.Inform.Theory, vol. IT-17, PP. 586-594, tion, Radiation Inc., Melbourne, Fla., Memo Rep.38, Sept. 1969. Sept. 1971. (5) A. J. Viterbi and J. P. Odenwalder, “Further results on optimal de- (331 M. J. Ferguson, ”Optimal receptionfor binary partial response chan- coding of convolutional codes,” I E E E Trans. Inform. Theory (Cor- nels,” Bell Syst. Tech. J . , vol. 51, pp. 493-505, Feb. 1972. resp.), vol. IT-15, pp. 732-734, Nov. 1969. (341 F. R. Magee, Jr., and J. G. Proakis, “Adaptive maximum-likelihood [a] A. J. Viterbi, “Convolutional codes and their performance in com- sequence estimationfor digital signaling in thepresence of intersym- municationsystems,” I E E E Trans. Commun. Technol., COM-19,vol. bo1 interference,” I E E E Trans. Inform.Theory (Corresp.), vol. pp. 751-772, Oct. 1971. IT-19, pp. 120-124, Jan. 1973. [7] G. D.Forney,Jr.,“Convolutional codes 11: Maximum likelihood (351 L. K. Mackechnie, ”Maximum likelihood receivers for channels hav- decoding,’! Stanford Electronics Labs., Stanford, Calif., Tech. Rep. ing memory,’ Ph.D. dissertation, Dep. Elec. Eng., Univ. of Notre 7004-1, June 1972. Dame, Notre Dame, Ind., Jan. 1973. 181 J. A. Heller, “Short constraint length convolutional codes,” in Space J. [36] R. W. Chang and C. Hancock, “Onreceiver structures for channels Program Summary 37-54, vol. 111. Jet Propulsion Lab., Calif. Inst. having memory,’ I E E E Trans. Inform. Theory, vol. IT-12, pp. 463- Technol., pp. 171-177, 0ct.-Nov. 1968. 468, Oct. 1966. (91 - , “Improved performance of short constraint length convolu- [37] K.Abend,T. J. Harley,Jr., B. D.Fritchman,and C. Gumacos, tional codes,” in Space Program Summary37-56,vol. 111. Jet Propul- “On optimum receivers for channels having memory,”I E E E Trans. sion Lab., Calif. Inst. Technol., pp. 83-84, Feb.-Mar. 1969. Inform. Theory (Corresp.), vol. IT-14, pp. 819-820, Nov. 1968.[lo] J. P. Odenwalder, “Optimal decoding of convolutional codes,” Ph.D. [38] R. R. ?wen, “Bayesian decision procedures for interfering digital dissertation,Dep.Syst. Sci., Sch.Eng. Appl. Sci., Univ. of Cali- signals, I E E E Trans. Inform. Theory (Corresp.), vol. IT-15, pp. fornia, Los Angeles, 1970. 506-507, July 1969.[Ill J. L. Ramsey,“Cascadedtreecodes,” M I T Res. Lab.Electron., [39] K. Abend and B. D. Fritchman, “Statistical detection for communi- Cambridge, Mass., Tech. Rep. 478, Sept. 1970. cationchannelswithintersymbolinterference,” Proc. I E E E , vol.[12] G. W. Zeoli, “Coupled decodingof block-convolutional concatenated 58, pp. 779-785, May 1970. codes,”Ph.D.dissertation,Dep. Elec. Eng.,Univ. of California, (401 C. G. Hilborn, Jr., “Applications of unsupervised learning to prob- Los Angeles, 1971. lems of digital communication,” in Proc. 9th I E E E Symp. Adaptive[I31 J. M. Wozencraft,“Sequential decoding for reliable communica- Processes, Decision, and Control (Dec. 7-9, 1970). tion,” in 1957 I R E Nat. Conv. Rec., vol. 5, pt. 2, pp. 11-25. C. G. Hilborn, Jr., andD.G.Lainiotis,“Optimalunsupervised[I41 R. M. Fano,“A heuristic discussion of probabilistic decoding,”I E E E learning multicategory dependent hypotheses pattern recognition,” Trans. Inform. Theory,vol. IT-9, pp. 64-74, Apr. 1963. I E E E Trans. Inform. Theory, vol. IT-14, pp. 468-470, May 1968.1151 K. S. Zigangirov, “Some sequentialdecodingprocedures,” Probl. [41] G. Ungerboeck, “Nonlinear equalization of binary signals in Gaus- Pered. Inform., vol. 2, pp. 13-25, 1966. sian noise,”I E E E Trans. Commun. Technol., vol. COM-19, pp. 1128-[161 F. Jelinek, “Fast sequential decoding algorithmusing a stack,” I B M 1137, Dec. 1971. J . Res. Develop., vol. 13, pp. 675-685, Nov. 1969. [42] --, ”Adaptive maximum-likelihood receiver for carrier-modulated[171 L. R. Bahl, C. D. Cullum, W. D. Frazer, and F. Jelinek, ”An effi- data transmission systems,” in preparation. cient algorithm for computing free distance,” IEEE Trans. Inform. Theory (Corresp.), vol. IT-18, pp. 437-439, May 1972. Continuous-Phase FSK[ l a ] L. R. Bahl, J. Cocke,F. Jelinek, and J. Raviv, “Optimal decodingof [43] M. G. Pelchat, R. C. Davis, and M . B. Luntz, “Coherent demodula- linear codes for minimizing symbol error rate’ (Abstract), in 1972 tion of continuous-phase binary FSK signals,” inProc. Inl. Telemetry In#. Symp. InformationTheory (Pacific Grove, Calif., Jan.1972), Conf. (Washington, D. C., 1971). p. 90. [44] R. de Buda, “Coherent demodulation of frequency-shift keying with1191 P. L. McAdam, L. R. Welch, and C. L. Weber, “M.A.P. bit decoding low deviation ratio,”I E E E Trans. Commun. Technol.,vol. COM-20, of convolutional codes,” in 1972 I n f . Symp. Informafion Theory pp. 429-435, June 1972. (Pacific Grove, Calif., Jan. 1972), p. 91. Text Recognition [45] C. E. Shannon and W. Weaver, The MUthemdica! Theory of Com-Space Applications munication. Urbana, Ill.: Univ. of Ill.Press, 1949.[20] Linkabit Corp., “Coding systems study for high data rate telemetry [46] C. E. Shannon, “Prediction and entropy of printed English,” Bell links, ’Final Rep. on N4SA Ames Res. Cen., Moffett Field, Calif., Syst. Tech. J . , vol. 30, pp. 50-64, Jan. 1951. Contract NASZ-6024, Rep. CR-114278, 1970. (471 D. L. Neuhoff, “The Viterbi algorithm as aid in textrecognition,’ an[21] G. C. Clark, “Implementation of maximum likelihood decoders for Stanford Electronic Labs., Stanford, Calif., unpublished. convolutional codes,” in Proc. Int. Telcmetering Conf. (Washington, (481 J. Raviv, “Decision making in Markov chains applied to problem the D. C., 1971). of pattern recognition,” I E E E Trans.Inform.Theory, vol. IT-13,[22] J. A. Heller and I. M. Jacobs, “Viterbi decoding for satellite and pp. 536-551, Oct. 1967.
  11. 11. 218 PROCEEDINGS OF THE IEEE, MARCH 1973Miscellaneous [54] S. W. Golomb, Shift Regisln Sequcrrccs. San Franasm, Calif.: Holden-Day, 1967, pp..13-17.[49] H.Kobayashi,“Asurvey of codingschemesfortransmissionor [ 5 5 ] G. J. Minty, ‘ comment on the shortest-route problem,” O p a . A recording o digital data,” IEEETrans.Commun.Technol., f vol. Res., vol. 5 , p. 724, Oct. 1957. COM-19, pp. 1087-1100, Dec. 1971. [56] M. Pollack andW. Wiebenson,“Solutions of the shortest-route[SO] -, “Application of probabilistic decoding t o digital magnetic re- problem-A review,” O p a . Res., vol. 8, pp. 224-230, Mar. 1960. cordingsystems,” I B M J . Res. Develop., vol. 15, pp. 64-74, Jan. 1971. [57] R. Busacker and T. Saaty, Finite Graphs and Networks: A n Z n t r o - duction with Applications. New York: McGraw-Hill, 1965.[51] U. Tirnor, ‘Sequential ranging with the Viterbi algorithm,” Jet Pro- pulsion Lab., Pasadena, Calif., JPL Tech. Rep. 32-1526, vol. 11, pp. [58] J. K. Omura, “On the Viterbi decoding algorithm,” ZEEE Trans. 75-79, Jan. 1971. Inform. T h m y , vol. IT-15, pp. 177-179, Jan. 1969.[52] J.K.Omura, “On the Viterbi algorithm forsourcecoding”(Ab- [59] J. M. Wozencraft and I. M. Jacobs, Principres of Cmnunicaliorr stract), in1972 ZEEE Znt. Symp. Information Theory (Pacific Grove, Engineering. New York: Wiley, 1965, ch. 4. Calif., Jan. 1972), p. 21. [a] K. Omura, ‘Optimal receiver design for convolutional codes and J.[53] F. P. Preparata and S. R. Ray, “An approach to artificial nonsym- channelswithmemory via control theoreticalconcepts,” Inform. bolic cognition,”Inform. Sci.,vol. 4, pp. 65-86, Jan. 1972. Sci., vol. 3, pp. 24S-266, July 1971.