Like this document? Why not share!

# Toeplitz (2)

## on Dec 01, 2012

• 534 views

matrix

matrix

### Views

Total Views
534
Views on SlideShare
534
Embed Views
0

Likes
0
Downloads
1
Comments
0

No embeds

### Upload Details

Uploaded via as Adobe PDF

### Usage Rights

© All Rights Reserved

### Report content

• Comment goes here.
Are you sure you want to
Your message goes here
Edit your comment

## Toeplitz (2)Document Transcript

• Toeplitz and Circulant Matrices: A review
• Toeplitz and Circulant Matrices: A review Robert M. Gray Deptartment of Electrical Engineering Stanford University Stanford 94305, USA rmgray@stanford.edu
• ContentsChapter 1 Introduction 11.1 Toeplitz and Circulant Matrices 11.2 Examples 51.3 Goals and Prerequisites 9Chapter 2 The Asymptotic Behavior of Matrices 112.1 Eigenvalues 112.2 Matrix Norms 142.3 Asymptotically Equivalent Sequences of Matrices 172.4 Asymptotically Absolutely Equal Distributions 24Chapter 3 Circulant Matrices 313.1 Eigenvalues and Eigenvectors 323.2 Matrix Operations on Circulant Matrices 34Chapter 4 Toeplitz Matrices 37 v
• vi CONTENTS4.1 Sequences of Toeplitz Matrices 374.2 Bounds on Eigenvalues of Toeplitz Matrices 414.3 Banded Toeplitz Matrices 434.4 Wiener Class Toeplitz Matrices 48Chapter 5 Matrix Operations on Toeplitz Matrices 615.1 Inverses of Toeplitz Matrices 625.2 Products of Toeplitz Matrices 675.3 Toeplitz Determinants 70Chapter 6 Applications to Stochastic Time Series 736.1 Moving Average Processes 746.2 Autoregressive Processes 776.3 Factorization 80Acknowledgements 83References 85
• Abstract   t0 t−1 t−2 · · · t−(n−1)  t1 t0 t−1      t2 . .   t1 t0 .    . ..  .  . .  tn−1 ··· t0 The fundamental theorems on the asymptotic behavior of eigenval-ues, inverses, and products of banded Toeplitz matrices and Toeplitzmatrices with absolutely summable elements are derived in a tutorialmanner. Mathematical elegance and generality are sacriﬁced for con-ceptual simplicity and insight in the hope of making these results avail-able to engineers lacking either the background or endurance to attackthe mathematical literature on the subject. By limiting the generalityof the matrices considered, the essential ideas and results can be con-veyed in a more intuitive manner without the mathematical machineryrequired for the most general cases. As an application the results areapplied to the study of the covariance matrices and their factors oflinear models of discrete time random processes. vii
• 1 Introduction1.1 Toeplitz and Circulant MatricesA Toeplitz matrix is an n × n matrix Tn = [tk,j ; k, j = 0, 1, . . . , n − 1]where tk,j = tk−j , i.e., a matrix of the form   t0 t−1 t−2 · · · t−(n−1)  t1 t0 t−1     . .  Tn =  t2  t1 t0 . .  (1.1)  . ..  .  . .  tn−1 ··· t0Such matrices arise in many applications. For example, suppose that   x0  x1  x = (x0 , x1 , . . . , xn−1 )′ =    . .   .  xn−1 1
• 2 Introductionis a column vector (the prime denotes transpose) denoting an “input”and that tk is zero for k < 0. Then the vector   t0 0 0 · · · 0  x0   t1 t0 0 x1      .  x  .  y = Tn x =  t2 t1 t0 .  2      .  .   . ..  .  . . .  tn−1 · · · t0 xn−1   x0 t0  t x +t x   1 0 0 1  2 i=0 t2−i xi   =    . .  .     n−1 i=0 tn−1−i xiwith entries k yk = tk−i xi i=0represents the the output of the discrete time causal time-invariant ﬁlterh with “impulse response” tk . Equivalently, this is a matrix and vectorformulation of a discrete-time convolution of a discrete time input witha discrete time ﬁlter. As another example, suppose that {Xn } is a discrete time ran-dom process with mean function given by the expectations mk =E(Xk ) and covariance function given by the expectations KX (k, j) =E[(Xk − mk )(Xj − mj )]. Signal processing theory such as predic-tion, estimation, detection, classiﬁcation, regression, and communca-tions and information theory are most thoroughly developed underthe assumption that the mean is constant and that the covarianceis Toeplitz, i.e., KX (k, j) = KX (k − j), in which case the processis said to be weakly stationary. (The terms “covariance stationary”and “second order stationary” also are used when the covariance isassumed to be Toeplitz.) In this case the n × n covariance matricesKn = [KX (k, j); k, j = 0, 1, . . . , n − 1] are Toeplitz matrices. Muchof the theory of weakly stationary processes involves applications of
• 1.1. Toeplitz and Circulant Matrices 3Toeplitz matrices. Toeplitz matrices also arise in solutions to diﬀeren-tial and integral equations, spline functions, and problems and methodsin physics, mathematics, statistics, and signal processing. A common special case of Toeplitz matrices — which will resultin signiﬁcant simpliﬁcation and play a fundamental role in developingmore general results — results when every row of the matrix is a rightcyclic shift of the row above it so that tk = t−(n−k) = tk−n for k =1, 2, . . . , n − 1. In this case the picture becomes   t0 t−1 t−2 · · · t−(n−1)  t−(n−1) t0 t−1     . .  Cn =  t−(n−2) t−(n−1) t0 . .  (1.2)  . . ..   . .  t−1 t−2 ··· t0A matrix of this form is called a circulant matrix. Circulant matricesarise, for example, in applications involving the discrete Fourier trans-form (DFT) and the study of cyclic codes for error correction. A great deal is known about the behavior of Toeplitz matrices— the most common and complete references being Grenander andSzeg¨ [16] and Widom [33]. A more recent text devoted to the subject ois B¨ttcher and Silbermann [5]. Unfortunately, however, the necessary olevel of mathematical sophistication for understanding reference [16]is frequently beyond that of one species of applied mathematician forwhom the theory can be quite useful but is relatively little understood.This caste consists of engineers doing relatively mathematical (for anengineering background) work in any of the areas mentioned. This ap-parent dilemma provides the motivation for attempting a tutorial intro-duction on Toeplitz matrices that proves the essential theorems usingthe simplest possible and most intuitive mathematics. Some simple andfundamental methods that are deeply buried (at least to the untrainedmathematician) in [16] are here made explicit. The most famous and arguably the most important result describingToeplitz matrices is Szeg¨’s theorem for sequences of Toeplitz matrices o{Tn } which deals with the behavior of the eigenvalues as n goes toinﬁnity. A complex scalar α is an eigenvalue of a matrix A if there is a
• 4 Introductionnonzero vector x such that Ax = αx, (1.3)in which case we say that x is a (right) eigenvector of A. If A is Hermi-tian, that is, if A∗ = A, where the asterisk denotes conjugate transpose,then the eigenvalues of the matrix are real and hence α∗ = α, wherethe asterisk denotes the conjugate in the case of a complex scalar.When this is the case we assume that the eigenvalues {αi } are orderedin a nondecreasing manner so that α0 ≥ α1 ≥ α2 · · · . This eases theapproximation of sums by integrals and entails no loss of generality.Szeg¨’s theorem deals with the asymptotic behavior of the eigenvalues o{τn,i ; i = 0, 1, . . . , n − 1} of a sequence of Hermitian Toeplitz matricesTn = [tk−j ; k, j = 0, 1, 2, . . . , n − 1]. The theorem requires that severaltechnical conditions be satisﬁed, including the existence of the Fourierseries with coeﬃcients tk related to each other by ∞ f (λ) = tk eikλ ; λ ∈ [0, 2π] (1.4) k=−∞ 2π 1 tk = f (λ)e−ikλ dλ. (1.5) 2π 0Thus the sequence {tk } determines the function f and vice versa, hencethe sequence of matrices is often denoted as Tn (f ). If Tn (f ) is Hermi-tian, that is, if Tn (f )∗ = Tn (f ), then t−k = t∗ and f is real-valued. k Under suitable assumptions the Szeg¨ theorem states that o n−1 2π 1 1 lim F (τn,k ) = F (f (λ)) dλ (1.6) n→∞ n 2π 0 k=0for any function F that is continuous on the range of f . Thus, forexample, choosing F (x) = x results in n−1 2π 1 1 lim τn,k = f (λ) dλ, (1.7) n→∞ n 2π 0 k=0so that the arithmetic mean of the eigenvalues of Tn (f ) converges tothe integral of f . The trace Tr(A) of a matrix A is the sum of its
• 1.2. Examples 5diagonal elements, which in turn from linear algebra is the sum of theeigenvalues of A if the matrix A is Hermitian. Thus (1.7) implies that 2π 1 1 lim Tr(Tn (f )) = f (λ) dλ. (1.8) n→∞ n 2π 0Similarly, for any power s n−1 2π 1 s 1 lim τn,k = f (λ)s dλ. (1.9) n→∞ n 2π 0 k=0 If f is real and such that the eigenvalues τn,k ≥ m > 0 for all n, k,then F (x) = ln x is a continuous function on [m, ∞) and the Szeg¨ otheorem can be applied to show that n−1 2π 1 1 lim ln τn,i = ln f (λ) dλ. (1.10) n→∞ n 2π 0 i=0From linear algebra, however, the determinant of a matrix Tn (f ) isgiven by the product of its eigenvalues, n−1 det(Tn (f )) = τn,i , i=0so that (1.10) becomes n−1 1 lim ln det(Tn (f ))1/n = lim ln τn,i n→∞ n→∞ n i=0 2π 1 = ln f (λ) dλ. (1.11) 2π 0As we shall later see, if f has a lower bound m > 0, than indeed all theeigenvalues will share the lower bound and the above derivation applies.Determinants of Toeplitz matrices are called Toeplitz determinants and(1.11) describes their limiting behavior.1.2 ExamplesA few examples from statistical signal processing and information the-ory illustrate the the application of the theorem. These are described
• 6 Introductionwith a minimum of background in order to highlight how the asymp-totic eigenvalue distribution theorem allows one to evaluate results forprocesses using results from ﬁnite-dimensional vectors.The diﬀerential entropy rate of a Gaussian processSuppose that {Xn ; n = 0, 1, . . .} is a random process described byprobability density functions fX n (xn ) for the random vectors X n =(X0 , X1 , . . . , Xn−1 ) deﬁned for all n = 0, 1, 2, . . .. The Shannon diﬀer-ential entropy h(X n ) is deﬁned by the integral h(X n ) = − fX n (xn ) ln fX n (xn ) dxnand the diﬀerential entropy rate of the random process is deﬁned bythe limit 1 h(X) = lim h(X n ) n→∞ nif the limit exists. (See, for example, Cover and Thomas[7].) A stationary zero mean Gaussian random process is completely de-scribed by its mean correlation function rk,j = rk−j = E[Xk Xj ] or,equivalently, by its power spectral density function f , the Fourier trans-form of the covariance function: ∞ f (λ) = rn einλ , n=−∞ 2π 1 rk = f (λ)e−iλk dλ 2π 0For a ﬁxed positive integer n, the probability density function is 1 n′ −1 n e− 2 x Rn x fX n (xn ) = , (2π)n/2 det(Rn )1/2where Rn is the n × n covariance matrix with entries rk−j . A straight-forward multidimensional integration using the properties of Gaussianrandom vectors yields the diﬀerential entropy 1 h(X n ) = ln(2πe)n detRn . 2
• 1.2. Examples 7The problem at hand is to evaluate the entropy rate 1 1 1 h(X) = lim h(X n ) = ln(2πe) + lim ln det(Rn ). n→∞ n 2 n→∞ nThe matrix Rn is the Toeplitz matrix Tn generated by the power spec-tral density f and det(Rn ) is a Toeplitz determinant and we have im-mediately from (1.11) that 2π 1 1 h(X) = log 2πe ln f (λ) dλ . (1.12) 2 2π 0This is a typical use of (1.6) to evaluate the limit of a sequence of ﬁnite-dimensional qualities, in this case speciﬁed by the determinants of of asequence of Toeplitz matrices.The Shannon rate-distortion function of a Gaussian processAs a another example of the application of (1.6), consider the eval-uation of the rate-distortion function of Shannon information theoryfor a stationary discrete time Gaussian random process with 0 mean,covariance KX (k, j) = tk−j , and power spectral density f (λ) given by(1.4). The rate-distortion function characterizes the optimal tradeoﬀ ofdistortion and bit rate in data compression or source coding systems.The derivation details can be found, e.g., in Berger [3], Section 4.5,but the point here is simply to provide an example of an application of(1.6). The result is found by solving an n-dimensional optimization interms of the eigenvalues τn,k of Tn (f ) and then taking limits to obtainparametric expressions for distortion and rate: n−1 1 Dθ = lim min(θ, τn,k ) n→∞ n k=0 n−1 1 1 τn,k Rθ = lim max(0, ln ). n→∞ n 2 θ k=0
• 8 IntroductionThe theorem can be applied to turn this limiting sum involving eigen-values into an integral involving the power spectral density: 2π Dθ = min(θ, f (λ)) dλ 0 2π 1 f (λ) Rθ = max 0, ln dλ. 0 2 θAgain an inﬁnite dimensional problem is solved by ﬁrst solving a ﬁnitedimensional problem involving the eigenvalues of matrices, and thenusing the asymptotic eigenvalue theorem to ﬁnd an integral expressionfor the limiting result.One-step prediction errorAnother application with a similar development is the one-step predic-tion error problem. Suppose that Xn is a weakly stationary randomprocess with covariance tk−j . A classic problem in estimation theory isto ﬁnd the best linear predictor based on the previous n values of Xi ,i = 0, 1, 2, . . . , n − 1, n ˆ Xn = ai Xn−i , i=1 ˆin the sense of minimizing the mean squared error E[(Xn −Xn )2 ] over allchoices of coeﬃcients ai . It is well known (see, e.g., [14]) that the min-imum is given by the ratio of Toeplitz determinants det Tn+1 / det Tn .The question is to what this ratio converges in the limit as n goes to∞. This is not quite in a form suitable for application of the theorem, 1/nbut we have already evaluated the limit of detTn in (1.11) and forlarge n we have that 2π 1 (det Tn )1/n ≈ exp ln f (λ) dλ ≈ (det Tn+1 )1/(n+1) 2π 0and hence in particular that (det Tn+1 )1/(n+1) ≈ (det Tn )1/nso that 2π det Tn+1 1 ≈ (det Tn )1/n → exp ln f (λ) dλ , det Tn 2π 0
• 1.3. Goals and Prerequisites 9providing the desired limit. These arguments can be made exact, butit is hoped they make the point that the asymptotic eigenvalue distri-bution theorem for Hermitian Toeplitz matrices can be quite useful forevaluating limits of solutions to ﬁnite-dimensional problems.Further examplesThe Toeplitz distribution theorems have also found application in morecomplicated information theoretic evaluations, including the channelcapacity of Gaussian channels [30, 29] and the rate-distortion functionsof autoregressive sources [11]. The examples described here were chosenbecause they were in the author’s area of competence, but similar appli- TMcations crop up in a variety of areas. A Google search using the titleof this document shows diverse applications of the eigenvalue distribu-tion theorem and related results, including such areas of coding, spec-tral estimation, watermarking, harmonic analysis, speech enhancement,interference cancellation, image restoration, sensor networks for detec-tion, adaptive ﬁltering, graphical models, noise reduction, and blindequalization.1.3 Goals and PrerequisitesThe primary goal of this work is to prove a special case of Szeg¨’s oasymptotic eigenvalue distribution theorem in Theorem 4.2. The as-sumptions used here are less general than Szeg¨’s, but this permits omore straightforward proofs which require far less mathematical back-ground. In addition to the fundamental theorems, several related re-sults that naturally follow but do not appear to be collected togetheranywhere are presented. We do not attempt to survey the ﬁelds of ap-plications of these results, as such a survey would be far beyond theauthor’s stamina and competence. A few applications are noted by wayof examples. The essential prerequisites are a knowledge of matrix theory, an en-gineer’s knowledge of Fourier series and random processes, and calculus(Riemann integration). A ﬁrst course in analysis would be helpful, but itis not assumed. Several of the occasional results required of analysis are
• 10 Introductionusually contained in one or more courses in the usual engineering cur-riculum, e.g., the Cauchy-Schwarz and triangle inequalities. Hopefullythe only unfamiliar results are a corollary to the Courant-Fischer the-orem and the Weierstrass approximation theorem. The latter is an in-tuitive result which is easily believed even if not formally proved. Moreadvanced results from Lebesgue integration, measure theory, functionalanalysis, and harmonic analysis are not used. Our approach is to relate the properties of Toeplitz matrices to thoseof their simpler, more structured special case — the circulant or cyclicmatrix. These two matrices are shown to be asymptotically equivalentin a certain sense and this is shown to imply that eigenvalues, inverses,products, and determinants behave similarly. This approach providesa simpliﬁed and direct path to the basic eigenvalue distribution andrelated theorems. This method is implicit but not immediately appar-ent in the more complicated and more general results of Grenander inChapter 7 of [16]. The basic results for the special case of a bandedToeplitz matrix appeared in [12], a tutorial treatment of the simplestcase which was in turn based on the ﬁrst draft of this work. The re-sults were subsequently generalized using essentially the same simplemethods, but they remain less general than those of [16]. As an application several of the results are applied to study certainmodels of discrete time random processes. Two common linear modelsare studied and some intuitively satisfying results on covariance matri-ces and their factors are given. We sacriﬁce mathematical elegance and generality for conceptualsimplicity in the hope that this will bring an understanding of theinteresting and useful properties of Toeplitz matrices to a wider audi-ence, speciﬁcally to those who have lacked either the background or thepatience to tackle the mathematical literature on the subject.
• 2 The Asymptotic Behavior of MatricesWe begin with relevant deﬁnitions and a prerequisite theorem and pro-ceed to a discussion of the asymptotic eigenvalue, product, and inversebehavior of sequences of matrices. The major use of the theorems of thischapter is to relate the asymptotic behavior of a sequence of compli-cated matrices to that of a simpler asymptotically equivalent sequenceof matrices.2.1 EigenvaluesAny complex matrix A can be written as A = U RU ∗ , (2.1)where the asterisk ∗ denotes conjugate transpose, U is unitary, i.e.,U −1 = U ∗ , and R = {rk,j } is an upper triangular matrix ([18], p.79). The eigenvalues of A are the principal diagonal elements of R. IfA is normal, i.e., if A∗ A = AA∗ , then R is a diagonal matrix, whichwe denote as R = diag(αk ; k = 0, 1, . . . , n − 1) or, more simply, R =diag(αk ). If A is Hermitian, then it is also normal and its eigenvaluesare real. A matrix A is nonnegative deﬁnite if x∗ Ax ≥ 0 for all nonzero vec- 11
• 12 The Asymptotic Behavior of Matricestors x. The matrix is positive deﬁnite if the inequality is strict for allnonzero vectors x. (Some books refer to these properties as positivedeﬁnite and strictly positive deﬁnite, respectively.) If a Hermitian ma-trix is nonnegative deﬁnite, then its eigenvalues are all nonnegative. Ifthe matrix is positive deﬁnite, then the eigenvalues are all (strictly)positive. The extreme values of the eigenvalues of a Hermitian matrix H canbe characterized in terms of the Rayleigh quotient RH (x) of the matrixand a complex-valued vector x deﬁned by RH (x) = (x∗ Hx)/(x∗ x). (2.2)As the result is both important and simple to prove, we state and proveit formally. The result will be useful in specifying the interval containingthe eigenvalues of a Hermitian matrix. Usually in books on matrix theory it is proved as a corollary tothe variational description of eigenvalues given by the Courant-Fischertheorem (see, e.g., [18], p. 116, for the case of real symmetric matrices),but the following result is easily demonstrated directly.Lemma 2.1. Given a Hermitian matrix H, let ηM and ηm be themaximum and minimum eigenvalues of H, respectively. Then ηm = min RH (x) = min z ∗ Hz (2.3) x ∗ z:z z=1 ηM = max RH (x) = max z ∗ Hz. (2.4) x ∗ z:z z=1Proof. Suppose that em and eM are eigenvectors corresponding to theminimum and maximum eigenvalues ηm and ηM , respectively. ThenRH (em ) = ηm and RH (eM ) = ηM and therefore ηm ≥ min RH (x) (2.5) x ηM ≤ max RH (x). (2.6) xSince H is Hermitian we can write H = U AU ∗ , where U is unitary and
• 2.1. Eigenvalues 13A is the diagonal matrix of the eigenvalues ηk , and therefore x∗ Hx x∗ U AU ∗ x = x∗ x x∗ x n 2 y ∗ Ay k=1 |yk | ηk = = n , y∗ y k=1 |yk | 2where y = U ∗ x and we have taken advantage of the fact that U isunitary so that x∗ x = y ∗ y. But for all vectors y, this ratio is boundbelow by ηm and above by ηM and hence for all vectors x ηm ≤ RH (x) ≤ ηM (2.7)which with (2.5–2.6) completes the proof of the left-hand equalities ofthe lemma. The right-hand equalities are easily seen to hold since if xminimizes (maximizes) the Rayleigh quotient, then the normalized vec-tor x/x∗ x satisﬁes the constraint of the minimization (maximization)to the right, hence the minimum (maximum) of the Rayleigh quotionmust be bigger (smaller) than the constrained minimum (maximum)to the right. Conversely, if x achieves the rightmost optimization, thenthe same x yields a Rayleigh quotient of the the same optimum value.2 The following lemma is useful when studying non-Hermitian ma-trices and products of Hermitian matrices. First note that if A is anarbitrary complex matrix, then the matrix A∗ A is both Hermitian andnonnegative deﬁnite. It is Hermitian because (A∗ A)∗ = A∗ A and it isnonnegative deﬁnite since if for any complex vector x we deﬁne thecomplex vector y = Ax, then n x∗ (A∗ A)x = y ∗ y = |yk |2 ≥ 0. k=1Lemma 2.2. Let A be a matrix with eigenvalues αk . Deﬁne the eigen-values of the Hermitian nonnegative deﬁnite matrix A∗ A to be λk ≥ 0.Then n−1 n−1 λk ≥ |αk |2 , (2.8) k=0 k=0with equality iﬀ (if and only if) A is normal.
• 14 The Asymptotic Behavior of MatricesProof. The trace of a matrix is the sum of the diagonal elements of amatrix. The trace is invariant to unitary operations so that it also isequal to the sum of the eigenvalues of a matrix, i.e., n−1 n−1 Tr{A∗ A} = (A∗ A)k,k = λk . (2.9) k=0 k=0From (2.1), A = U RU ∗ and hence n−1 n−1 Tr{A∗ A} = Tr{R∗ R} = |rj,k |2 k=0 j=0 n−1 = |αk |2 + |rj,k |2 k=0 k=j n−1 ≥ |αk |2 (2.10) k=0Equation (2.10) will hold with equality iﬀ R is diagonal and hence iﬀA is normal. 2 Lemma 2.2 is a direct consequence of Shur’s theorem ([18], pp. 229-231) and is also proved in [16], p. 106.2.2 Matrix NormsTo study the asymptotic equivalence of matrices we require a metricon the space of linear space of matrices. A convenient metric for ourpurposes is a norm of the diﬀerence of two matrices. A norm N (A) onthe space of n × n matrices satisﬁes the following properties: (1) N (A) ≥ 0 with equality if and only if A = 0, is the all zero matrix. (2) For any two matrices A and B, N (A + B) ≤ N (A) + N (B). (2.11) (3) For any scalar c and matrix A, N (cA) = |c|N (A).
• 2.2. Matrix Norms 15The triangle inequality in (2.11) will be used often as is the followingdirect consequence: N (A − B) ≥ |N (A) − N (B)|. (2.12) Two norms — the operator or strong norm and the Hilbert-Schmidtor weak norm (also called the Frobenius norm or Euclidean norm whenthe scaling term is removed) — will be used here ([16], pp. 102–103). Let A be a matrix with eigenvalues αk and let λk ≥ 0 be the eigen-values of the Hermitian nonnegative deﬁnite matrix A∗ A. The strongnorm A is deﬁned by A = max RA∗ A (x)1/2 = max [z ∗ A∗ Az]1/2 . (2.13) x ∗ z:z z=1From Lemma 2.1 2 ∆ A = max λk = λM . (2.14) kThe strong norm of A can be bound below by letting eM be the normal-ized eigenvector of A corresponding to αM , the eigenvalue of A havinglargest absolute value: 2 A = max z ∗ A∗ Az ≥ (e∗ A∗ )(AeM ) = |αM |2 . M (2.15) z:z z=1 ∗If A is itself Hermitian, then its eigenvalues αk are real and the eigen-values λk of A∗ A are simply λk = α2 . This follows since if e(k) is an keigenvector of A with eigenvalue αk , then A∗ Ae(k) = αk A∗ e(k) = αk e(k) . 2Thus, in particular, if A is Hermitian then A = max |αk | = |αM |. (2.16) k The weak norm (or Hilbert-Schmidt norm) of an n × n matrixA = [ak,j ] is deﬁned by  1/2 n−1 n−1 1 |A| =  |ak,j |2  n k=0 j=0 n−1 1/2 1 1 = ( Tr[A∗ A])1/2 = λk . (2.17) n n k=0
• 16 The Asymptotic Behavior of Matrices √The quantity n|A| is sometimes called the Frobenius norm or Eu-clidean norm. From Lemma 2.2 we have n−1 1 |A|2 ≥ |αk |2 , with equality iﬀ A is normal. (2.18) n k=0 The Hilbert-Schmidt norm is the “weaker” of the two norms since n−1 2 1 A = max λk ≥ λk = |A|2 . (2.19) k n k=0 A matrix is said to be bounded if it is bounded in both norms. The weak norm is usually the most useful and easiest to handle ofthe two, but the strong norm provides a useful bound for the productof two matrices as shown in the next lemma.Lemma 2.3. Given two n × n matrices G = {gk,j } and H = {hk,j },then |GH| ≤ G |H|. (2.20)Proof. Expanding terms yields 1 |GH|2 = | gi,k hk,j |2 n i j k 1 ∗ ∗ = gi,k gi,m hk,j hm,j n m i j k 1 = h∗ G∗ Ghj , j (2.21) n jwhere hj is the j th column of H. From (2.13), h∗ G∗ Ghj j 2 ≤ G h∗ hj jand therefore 1 |GH|2 ≤ G 2 h∗ hj = G j 2 |H|2 . n j 2 Lemma 2.3 is the matrix equivalent of (7.3a) of ([16], p. 103). Notethat the lemma does not require that G or H be Hermitian.
• 2.3. Asymptotically Equivalent Sequences of Matrices 172.3 Asymptotically Equivalent Sequences of MatricesWe will be considering sequences of n × n matrices that approximateeach other as n becomes large. As might be expected, we will use theweak norm of the diﬀerence of two matrices as a measure of the “dis-tance” between them. Two sequences of n × n matrices {An } and {Bn }are said to be asymptotically equivalent if (1) An and Bn are uniformly bounded in strong (and hence in weak) norm: An , Bn ≤ M < ∞, n = 1, 2, . . . (2.22) and (2) An − Bn = Dn goes to zero in weak norm as n → ∞: lim |An − Bn | = lim |Dn | = 0. n→∞ n→∞Asymptotic equivalence of the sequences {An } and {Bn } will be ab-breviated An ∼ Bn . We can immediately prove several properties of asymptotic equiva-lence which are collected in the following theorem.Theorem 2.1. Let {An } and {Bn } be sequences of matrices witheigenvalues {αn , i} and {βn , i}, respectively. (1) If An ∼ Bn , then lim |An | = lim |Bn |. (2.23) n→∞ n→∞ (2) If An ∼ Bn and Bn ∼ Cn , then An ∼ Cn . (3) If An ∼ Bn and Cn ∼ Dn , then An Cn ∼ Bn Dn . (4) If An ∼ Bn and A−1 , Bn n −1 ≤ K < ∞, all n, then An−1 ∼ B −1 . n (5) If An Bn ∼ Cn and A−1 ≤ K < ∞, then Bn ∼ A−1 Cn . n n (6) If An ∼ Bn , then there are ﬁnite constants m and M such that m ≤ αn,k , βn,k ≤ M , n = 1, 2, . . . k = 0, 1, . . . , n − 1. (2.24)
• 18 The Asymptotic Behavior of MatricesProof. (1) Eq. (2.23) follows directly from (2.12). −→ (2) |An −Cn | = |An −Bn +Bn −Cn | ≤ |An −Bn |+|Bn −Cn | n→∞ 0 (3) Applying Lemma 2.3 yields |An Cn − Bn Dn | = |An Cn − An Dn + An Dn − Bn Dn | ≤ An |Cn − Dn |+ Dn |An − Bn | −→ n→∞ 0. (4) n −1 |A−1 − Bn | = |Bn Bn A−1 − Bn An A−1 | −1 n −1 n −1 ≤ Bn · A−1 n ·|Bn − An | −→ n→∞ 0. (5) Bn − A−1 Cn n = A−1 An Bn − A−1 Cn n n ≤ A−1 n |An Bn − Cn | −→ n→∞ 0. (6) If An ∼ Bn then they are uniformly bounded in strong norm by some ﬁnite number M and hence from (2.15), |αn,k | ≤ M and |βn,k | ≤ M and hence −M ≤ αn,k , βn,k ≤ M . So the result holds for m = −M and it may hold for larger m, e.g., m = 0 if the matrices are all nonnegative deﬁnite. 2 The above results will be useful in several of the later proofs. Asymp-totic equality of matrices will be shown to imply that eigenvalues, prod-ucts, and inverses behave similarly. The following lemma provides aprelude of the type of result obtainable for eigenvalues and will itselfserve as the essential part of the more general results to follow. It showsthat if the weak norm of the diﬀerence of the two matrices is small, thenthe sums of the eigenvalues of each must be close.
• 2.3. Asymptotically Equivalent Sequences of Matrices 19Lemma 2.4. Given two matrices A and B with eigenvalues {αk } and{βk }, respectively, then n−1 n−1 1 1 | αk − βk | ≤ |A − B|. n n k=0 k=0Proof: Deﬁne the diﬀerence matrix D = A − B = {dk,j } so that n−1 n−1 αk − βk = Tr(A) − Tr(B) k=0 k=0 = Tr(D).Applying the Cauchy-Schwarz inequality (see, e.g., [22], p. 17) to Tr(D)yields n−1 2 n−1 2 |Tr(D)| = dk,k ≤n |dk,k |2 k=0 k=0 n−1 n−1 ≤ n |dk,j |2 = n2 |D|2 . (2.25) k=0 j=0Taking the square root and dividing by n proves the lemma. 2 An immediate consequence of the lemma is the following corollary.Corollary 2.1. Given two sequences of asymptotically equivalent ma-trices {An } and {Bn } with eigenvalues {αn,k } and {βn,k }, respectively,then n−1 1 lim (αn,k − βn,k ) = 0, (2.26) n→∞ n k=0and hence if either limit exists individually, n−1 n−1 1 1 lim αn,k = lim βn,k . (2.27) n→∞ n n→∞ n k=0 k=0Proof. Let Dn = {dk,j } = An − Bn . Eq. (2.27) is equivalent to 1 lim Tr(Dn ) = 0. (2.28) n→∞ n
• 20 The Asymptotic Behavior of MatricesDividing by n2 , and taking the limit, results in 1 −→ 0 ≤ | Tr(Dn )|2 ≤ |Dn |2 n→∞ 0 (2.29) nfrom the lemma, which implies (2.28) and hence (2.27). 2 The previous corollary can be interpreted as saying the sample orarithmetic means of the eigenvalues of two matrices are asymptoticallyequal if the matrices are asymptotically equivalent. It is easy to seethat if the matrices are Hermitian, a similar result holds for the meansof the squared eigenvalues. From (2.12) and (2.18), |Dn | ≥ | |An | − |Bn | | n−1 n−1 1 1 = α2 n,k − 2 βn,k n n k=0 k=0 −→ n→∞ 0 −→if |Dn | n→∞ 0, yielding the following corollary.Corollary 2.2. Given two sequences of asymptotically equivalent Her-mitian matrices {An } and {Bn } with eigenvalues {αn,k } and {βn,k },respectively, then n−1 1 lim n,k 2 (α2 − βn,k ) = 0, (2.30) n→∞ n k=0and hence if either limit exists individually, n−1 n−1 1 1 lim α2 = lim n,k 2 βn,k . (2.31) n→∞ n n→∞ n k=0 k=0 Both corollaries relate limiting sample (arithmetic) averages ofeigenvalues or moments of an eigenvalue distribution rather than in-dividual eigenvalues. Equations (2.27) and (2.31) are special cases ofthe following fundamental theorem of asymptotic eigenvalue distribu-tion.
• 2.3. Asymptotically Equivalent Sequences of Matrices 21Theorem 2.2. Let {An } and {Bn } be asymptotically equivalent se-quences of matrices with eigenvalues {αn,k } and {βn,k }, respectively. sThen for any positive integer s the sequences of matrices {An } and s } are also asymptotically equivalent,{Bn n−1 1 lim n,k s (αs − βn,k ) = 0, (2.32) n→∞ n k=0and hence if either separate limit exists, n−1 n−1 1 1 lim αs = lim n,k s βn,k . (2.33) n→∞ n n→∞ n k=0 k=0Proof. Let An = Bn + Dn as in the proof of Corollary 2.1 and consider s ∆As − Bn = ∆n . Since the eigenvalues of As are αs , (2.32) can be n n n,kwritten in terms of ∆n as 1 lim Tr(∆n ) = 0. (2.34) n→∞ nThe matrix ∆n is a sum of several terms each being a product of Dn ’sand Bn ’s, but containing at least one Dn (to see this use the binomialtheorem applied to matrices to expand As ). Repeated application of nLemma 2.3 thus gives −→ |∆n | ≤ K|Dn | n→∞ 0, (2.35)where K does not depend on n. Equation (2.35) allows us to applyCorollary 2.1 to the matrices As and Dn to obtain (2.34) and hence n s(2.32). 2 Theorem 2.2 is the fundamental theorem concerning asymptoticeigenvalue behavior of asymptotically equivalent sequences of matri-ces. Most of the succeeding results on eigenvalues will be applicationsor specializations of (2.33). Since (2.33) holds for any positive integer s we can add sums corre-sponding to diﬀerent values of s to each side of (2.33). This observationleads to the following corollary.
• 22 The Asymptotic Behavior of MatricesCorollary 2.3. Suppose that {An } and {Bn } are asymptoticallyequivalent sequences of matrices with eigenvalues {αn,k } and {βn,k },respectively, and let f (x) be any polynomial. Then n−1 1 lim (f (αn,k ) − f (βn,k )) = 0 (2.36) n→∞ n k=0and hence if either limit exists separately, n−1 n−1 1 1 lim f (αn,k ) = lim f (βn,k ) . (2.37) n→∞ n n→∞ n k=0 k=0Proof. Suppose that f (x) = m as xs . Then summing (2.32) over s s=0yields (2.36). If either of the two limits exists, then (2.36) implies thatboth exist and that they are equal. 2 Corollary 2.3 can be used to show that (2.37) can hold for any ana-lytic function f (x) since such functions can be expanded into complexTaylor series, which can be viewed as polynomials with a possibly in-ﬁnite number of terms. Some eﬀort is needed, however, to justify theinterchange of limits, which can be accomplished if the Taylor seriesconverges uniformly. If An and Bn are Hermitian, however, then a muchstronger result is possible. In this case the eigenvalues of both matricesare real and we can invoke the Weierstrass approximation theorem ([6],p. 66) to immediately generalize Corollary 2.3. This theorem, our onereal excursion into analysis, is stated below for reference.Theorem 2.3. (Weierstrass) If F (x) is a continuous complex functionon [a, b], there exists a sequence of polynomials pn (x) such that lim pn (x) = F (x) n→∞uniformly on [a, b]. Stated simply, any continuous function deﬁned on a real intervalcan be approximated arbitrarily closely and uniformly by a polynomial.Applying Theorem 2.3 to Corollary 2.3 immediately yields the followingtheorem:
• 2.3. Asymptotically Equivalent Sequences of Matrices 23Theorem 2.4. Let {An } and {Bn } be asymptotically equivalent se-quences of Hermitian matrices with eigenvalues {αn,k } and {βn,k }, re-spectively. From Theorem 2.1 there exist ﬁnite numbers m and M suchthat m ≤ αn,k , βn,k ≤ M , n = 1, 2, . . . k = 0, 1, . . . , n − 1. (2.38)Let F (x) be an arbitrary function continuous on [m, M ]. Then n−1 1 lim (F (αn,k ) − F (βn,k )) = 0, (2.39) n→∞ n k=0and hence if either of the limits exists separately, n−1 n−1 1 1 lim F (αn,k ) = lim F (βn,k ) (2.40) n→∞ n n→∞ n k=0 k=0 Theorem 2.4 is the matrix equivalent of Theorem 7.4a of [16]. Whentwo real sequences {αn,k ; k = 0, 1, . . . , n−1} and {βn,k ; k = 0, 1, . . . , n−1} satisfy (2.38) and (2.39), they are said to be asymptotically equallydistributed ([16], p. 62, where the deﬁnition is attributed to Weyl). As an example of the use of Theorem 2.4 we prove the followingcorollary on the determinants of asymptotically equivalent sequencesof matrices.Corollary 2.4. Let {An } and {Bn } be asymptotically equivalent se-quences of Hermitian matrices with eigenvalues {αn,k } and {βn,k }, re-spectively, such that αn,k , βn,k ≥ m > 0. Then if either limit exists, lim (det An )1/n = lim (det Bn )1/n . (2.41) n→∞ n→∞Proof. From Theorem 2.4 we have for F (x) = ln x n−1 n−1 1 1 lim ln αn,k = lim ln βn,k n→∞ n n→∞ n k=0 k=0and hence n−1 n−1 1 1 lim exp ln αn,k = lim exp ln βn,k n→∞ n n→∞ n k=0 k=0
• 24 The Asymptotic Behavior of Matricesor equivalently 1 1 lim exp[ ln det An ] = lim exp[ ln det Bn ], n→∞ n n→∞ nfrom which (2.41) follows. 2 With suitable mathematical care the above corollary can be ex-tended to cases where αn,k , βn,k > 0 provided additional constraintsare imposed on the matrices. For example, if the matrices are assumedto be Toeplitz matrices, then the result holds even if the eigenvalues canget arbitrarily small but remain strictly positive. (See the discussion onp. 66 and in Section 3.1 of [16] for the required technical conditions.)The diﬃculty with allowing the eigenvalues to approach 0 is that theirlogarithms are not bounded. Furthermore, the function ln x is not con-tinuous at x = 0, so Theorem 2.4 does not apply. Nonetheless, it ispossible to say something about the asymptotic eigenvalue distributionin such cases and this issue is revisited in Theorem 5.2(d). In this section the concept of asymptotic equivalence of matrices wasdeﬁned and its implications studied. The main consequences are the be-havior of inverses and products (Theorem 2.1) and eigenvalues (Theo-rems 2.2 and 2.4). These theorems do not concern individual entries inthe matrices or individual eigenvalues, rather they describe an “aver- −1 −1age” behavior. Thus saying An ∼ Bn means that |A−1 − Bn | n→∞ 0 −1 −→ nand says nothing about convergence of individual entries in the matrix.In certain cases stronger results on a type of elementwise convergenceare possible using the stronger norm of Baxter [1, 2]. Baxter’s resultsare beyond the scope of this work.2.4 Asymptotically Absolutely Equal DistributionsIt is possible to strengthen Theorem 2.4 and some of the interim re-sults used in its derivation using reasonably elementary methods. Thekey additional idea required is the Wielandt-Hoﬀman theorem [34], aresult from matrix theory that is of independent interest. The theoremis stated and a proof following Wilkinson [35] is presented for com-pleteness. This section can be skipped by readers not interested in thestronger notion of equal eigenvalue distributions as it is not neededin the sequel. The bounds of Lemmas 2.5 and 2.5 are of interest in
• 2.4. Asymptotically Absolutely Equal Distributions 25their own right and are included as they strengthen the the traditionalbounds.Theorem 2.5. (Wielandt-Hoﬀman theorem) Given two Hermitianmatrices A and B with eigenvalues αk and βk , respectively, then n−1 1 |αk − βk |2 ≤ |A − B|2 . n k=0Proof: Since A and B are Hermitian, we can write them as A =U diag(αk )U ∗ , B = W diag(βk )W ∗ , where U and W are unitary. Sincethe weak norm is not eﬀected by multiplication by a unitary matrix, |A − B| = |U diag(αk )U ∗ − W diag(βk )W ∗ | = |diag(αk )U ∗ − U ∗ W diag(βk )W ∗ | = |diag(αk )U ∗ W − U ∗ W diag(βk )| = |diag(αk )Q − Qdiag(βk )|,where Q = U ∗ W = {qi,j } is also unitary. The (i, j) entry in the matrixdiag(αk )Q − Qdiag(βk ) is (αi − βj )qi,j and hence n−1 n−1 n−1 n−1 1 ∆ |A − B|2 = |αi − βj |2 |qi,j |2 = |αi − βj |2 pi,j (2.42) n i=0 j=0 i=0 j=0where we have deﬁned pi,j = (1/n)|qi,j |2 . Since Q is unitary, we alsohave that n−1 n−1 |qi,j |2 = |qi,j |2 = 1 (2.43) i=0 j=0or n−1 n−1 1 pi,j = pi,j = . (2.44) n i=0 j=0This can be interpreted in probability terms: pi,j = (1/n)|qi,j |2 is aprobability mass function or pmf on {0, 1, . . . , n − 1}2 with uniformmarginal probability mass functions. Recall that it is assumed that the
• 26 The Asymptotic Behavior of Matriceseigenvalues are ordered so that α0 ≥ α1 ≥ α2 ≥ · · · and β0 ≥ β1 ≥β2 ≥ · · · . We claim that for all such matrices P satisfying (2.44), the right-hand side of (2.42) is minimized by P = (1/n)I, where I is the identitymatrix, so that n−1 n−1 n−1 |αi − βj |2 pi,j ≥ |αi − βi |2 , i=0 j=0 i=0which will prove the result. To see this suppose the contrary. Let ℓbe the smallest integer in {0, 1, . . . , n − 1} such that P has a nonzeroelement oﬀ the diagonal in either row ℓ or in column ℓ. If there is anonzero element in row ℓ oﬀ the diagonal, say pℓ,a then there must alsobe a nonzero element in column ℓ oﬀ the diagonal, say pb,ℓ in order forthe constraints (2.44) to be satisﬁed. Since ℓ is the smallest such value,ℓ < a and ℓ < b. Let x be the smaller of pl,a and pb,l . Form a newmatrix P ′ by adding x to pℓ,ℓ and pb,a and subtracting x from pb,ℓ andpℓ,a . The new matrix still satisﬁes the constraints and it has a zero ineither position (b, ℓ) or (ℓ, a). Furthermore the norm of P ′ has changedfrom that of P by an amount x (αℓ − βℓ )2 + (αb − βa )2 − (αℓ − βa )2 − (αb − βℓ )2 = −x(αℓ − αb )(βℓ − βa ) ≤ 0since ℓ > b, ℓ > a, the eigenvalues are nonincreasing, and x is posi-tive. Continuing in this fashion all nonzero oﬀdiagonal elements can bezeroed out without increasing the norm, proving the result. 2 From the Cauchy-Schwarz inequality n−1 n−1 n−1 n−1 |αk − βk | ≤ (αk − βk )2 12 = n (αk − βk )2 , k=0 k=0 k=0 k=0which with the Wielandt-Hoﬀman theorem yields the followingstrengthening of Lemma 2.4, n−1 n−1 1 1 |αk − βk | ≤ (αk − βk )2 ≤ |An − Bn |, n n k=0 k=0
• 2.4. Asymptotically Absolutely Equal Distributions 27which we formalize as the following lemma.Lemma 2.5. Given two Hermitian matrices A and B with eigenvaluesαn and βn in nonincreasing order, respectively, then n−1 1 |αk − βk | ≤ |A − B|. n k=0Note in particular that the absolute values are outside the sum inLemma 2.4 and inside the sum in Lemma 2.5. As was done in theweaker case, the result can be used to prove a stronger version of The-orem 2.4. This line of reasoning, using the Wielandt-Hoﬀman theorem,was pointed out by William F. Trench who used special cases in hispaper [23]. Similar arguments have become standard for treating eigen-value distributions for Toeplitz and Hankel matrices. See, for example,[32, 9, 4]. The following theorem provides the derivation. The speciﬁcstatement result and its proof follow from a private communicationfrom William F. Trench. See also [31, 24, 25, 26, 27, 28].Theorem 2.6. Let An and Bn be asymptotically equivalent sequencesof Hermitian matrices with eigenvalues αn,k and βn,k in nonincreasingorder, respectively. From Theorem 2.1 there exist ﬁnite numbers m andM such that m ≤ αn,k , βn,k ≤ M , n = 1, 2, . . . k = 0, 1, . . . , n − 1. (2.45)Let F (x) be an arbitrary function continuous on [m, M ]. Then n−1 1 lim |F (αn,k ) − F (βn,k )| = 0. (2.46) n→∞ n k=0 The theorem strengthens the result of Theorem 2.4 because ofthe magnitude inside the sum. Following Trench [24] in this case theeigenvalues are said to be asymptotically absolutely equally distributed.Proof: From Lemma 2.5 1 |αn,k − βn,k | ≤ |An − Bn |, (2.47) n k=0
• 28 The Asymptotic Behavior of Matriceswhich implies (2.46) for the case F (r) = r. For any nonnegative integerj |αj − βn,k | ≤ j max(|m|, |M |)j−1 |αn,k − βn,k |. n,k j (2.48)By way of explanation consider a, b ∈ [m, M ]. Simple long divisionshows that j aj − bj = aj−l bl−1 a−b l=1so that aj − bj |aj − bj | | | = a−b |a − b| j = | aj−l bl−1 | l=1 j ≤ |aj−l bl−1 | l=1 j = |a|j−l |b|l−1 l=1 ≤ j max(|m|, |M |)j−1 ,which proves (2.48). This immediately implies that (2.46) holds forfunctions of the form F (r) = r j for positive integers j, which in turnmeans the result holds for any polynomial. If F is an arbitrary contin-uous function on [m, M ], then from Theorem 2.3 given ǫ > 0 there is apolynomial P such that |P (u) − F (u)| ≤ ǫ, u ∈ [m, M ].
• 2.4. Asymptotically Absolutely Equal Distributions 29Using the triangle inequality, n−11 |F (αn,k ) − F (βn,k )|n k=0 n−1 1 = |F (αn,k ) − P (αn,k ) + P (αn,k ) − P (βn,k ) + P (βn,k ) − F (βn,k )| n k=0 n−1 n−1 1 1 ≤ |F (αn,k ) − P (αn,k )| + |P (αn,k ) − P (βn,k )| n n k=0 k=0 n−1 1 + |P (βn,k ) − F (βn,k )| n k=0 n−1 1 ≤ 2ǫ + |P (αn,k ) − P (βn,k )| n k=0As n → ∞ the remaining sum goes to 0, which proves the theoremsince ǫ can be made arbitrarily small. 2
• 3 Circulant MatricesA circulant matrix C is a Toeplitz matrix having the form   c0 c1 c2 ··· cn−1  . .    cn−1 c0 c1 c2 .   .. .   C=  cn−1 c0 c1 ,  (3.1) . . .. .. .. . . .     . c2    c1  c1 ··· cn−1 c0where each row is a cyclic shift of the row above it. The structure canalso be characterized by noting that the (k, j) entry of C, Ck,j , is givenby Ck,j = c(j−k) mod n .The properties of circulant matrices are well known and easily derived([18], p. 267,[8]). Since these matrices are used both to approximate andexplain the behavior of Toeplitz matrices, it is instructive to presentone version of the relevant derivations here. 31
• 32 Circulant Matrices3.1 Eigenvalues and EigenvectorsThe eigenvalues ψk and the eigenvectors y (k) of C are the solutions of Cy = ψ y (3.2)or, equivalently, of the n diﬀerence equations m−1 n−1 cn−m+k yk + ck−m yk = ψ ym ; m = 0, 1, . . . , n − 1. (3.3) k=0 k=mChanging the summation dummy variable results in n−1−m n−1 ck yk+m + ck yk−(n−m) = ψ ym ; m = 0, 1, . . . , n − 1. (3.4) k=0 k=n−mOne can solve diﬀerence equations as one solves diﬀerential equations —by guessing an intuitive solution and then proving that it works. Sincethe equation is linear with constant coeﬃcients a reasonable guess isyk = ρk (analogous to y(t) = esτ in linear time invariant diﬀerentialequations). Substitution into (3.4) and cancellation of ρm yields n−1−m n−1 ck ρk + ρ−n ck ρk = ψ. k=0 k=n−mThus if we choose ρ−n = 1, i.e., ρ is one of the n distinct complex nthroots of unity, then we have an eigenvalue n−1 ψ= ck ρk (3.5) k=0with corresponding eigenvector ′ y = n−1/2 1, ρ, ρ2 , . . . , ρn−1 , (3.6)where the prime denotes transpose and the normalization is chosen togive the eigenvector unit energy. Choosing ρm as the complex nth rootof unity, ρm = e−2πim/n , we have eigenvalue n−1 ψm = ck e−2πimk/n (3.7) k=0
• 3.1. Eigenvalues and Eigenvectors 33and eigenvector 1 ′ y (m) = √ 1, e−2πim/n , · · · , e−2πim(n−1)/n . nThus from the deﬁnition of eigenvalues and eigenvectors, Cy (m) = ψm y (m) , m = 0, 1, . . . , n − 1. (3.8)Equation (3.7) should be familiar to those with standard engineeringbackgrounds as simply the discrete Fourier transform (DFT) of thesequence {ck }. Thus we can recover the sequence {ck } from the ψk bythe Fourier inversion formula. In particular, n−1 n−1 n−1 1 1 ψm e2πiℓm = ck e−2πimk/n e2πiℓm/n n n m=0 m=0 k=0 n−1 n−1 1 = ck e2πi(ℓ−k)m/n = cℓ , (3.9) n m=0 k=0where we have used the orthogonality of the complex exponentials: n−1 n k mod n = 0 e2πimk/n = nδk mod n = , (3.10) m=0 0 otherwisewhere δ is the Kronecker delta, 1 m=0 δm = . 0 otherwiseThus the eigenvalues of a circulant matrix comprise the DFT of theﬁrst row of the circulant matrix, and conversely ﬁrst row of a circulantmatrix is the inverse DFT of the eigenvalues. Eq. (3.8) can be written as a single matrix equation CU = U Ψ, (3.11)where U = [y (0) |y (1) | · · · |y (n−1) ] = n−1/2 [e−2πimk/n ; m, k = 0, 1, . . . , n − 1]
• 34 Circulant Matricesis the matrix composed of the eigenvectors as columns, andΨ = diag(ψk ) is the diagonal matrix with diagonal elementsψ0 , ψ1 , . . . , ψn−1 . Furthermore, (3.10) implies that U is unitary. Byway of details, denote that the (k, j)th element of U U ∗ by ak,j andobserve that ak,j will be the product of the kth row of U , which is √{e−2πimk/n / n; m = 0, 1, . . . , n−1}, times the jth column of U ∗ , which √is {e2πimj/n / n; m = 0, 1, . . . , n − 1} so that n−1 1 ak,j = e2πim(j−k)/n = δ(k−j) mod n n m=0and hence U U ∗ = I. Similarly, U ∗ U = I. Thus (3.11) implies that C = U ΨU ∗ (3.12) Ψ = U ∗ CU. (3.13)Since C is unitarily similar to a diagonal matrix it is normal.3.2 Matrix Operations on Circulant MatricesThe following theorem summarizes the properties derived in the previ-ous section regarding eigenvalues and eigenvectors of circulant matricesand provides some easy implications.Theorem 3.1. Every circulant matrix C has eigenvectors y (m) = 1 ′√ n 1, e−2πim/n , · · · , e−2πim(n−1)/n , m = 0, 1, . . . , n − 1, and corre-sponding eigenvalues n−1 ψm = ck e−2πimk/n k=0and can be expressed in the form C = U ΨU ∗ , where U has the eigen-vectors as columns in order and Ψ is diag(ψk ). In particular all circulantmatrices share the same eigenvectors, the same matrix U works for allcirculant matrices, and any matrix of the form C = U ΨU ∗ is circulant. Let C = {ck−j } and B = {bk−j } be circulant n × n matrices witheigenvalues n−1 n−1 −2πimk/n ψm = ck e , βm = bk e−2πimk/n , k=0 k=0
• 3.2. Matrix Operations on Circulant Matrices 35respectively. Then (1) C and B commute and CB = BC = U γU ∗ , where γ = diag(ψm βm ), and CB is also a circulant matrix. (2) C + B is a circulant matrix and C + B = U ΩU ∗ , where Ω = {(ψm + βm )δk−m } (3) If ψm = 0; m = 0, 1, . . . , n − 1, then C is nonsingular and C −1 = U Ψ−1 U ∗ .Proof. We have C = U ΨU ∗ and B = U ΦU ∗ where Ψ = diag(ψm ) andΦ = diag(βm ). (1) CB = U ΨU ∗ U ΦU ∗ = U ΨΦU ∗ = U ΦΨU ∗ = BC. Since ΨΦ is diagonal, the ﬁrst part of the theorem implies that CB is circulant. (2) C + B = U (Ψ + Φ)U ∗ . (3) If Ψ is nonsingular, then CU Ψ−1 U ∗ = U ΨU ∗ U Ψ−1 U ∗ = U ΨΨ−1 U ∗ = U U ∗ = I. 2 Circulant matrices are an especially tractable class of matrices sinceinverses, products, and sums are also circulant matrices and hence bothstraightforward to construct and normal. In addition the eigenvaluesof such matrices can easily be found exactly and the same eigenvectorswork for all circulant matrices. We shall see that suitably chosen sequences of circulant matricesasymptotically approximate sequences of Toeplitz matrices and henceresults similar to those in Theorem 3.1 will hold asymptotically forsequences of Toeplitz matrices.
• 4 Toeplitz Matrices4.1 Sequences of Toeplitz MatricesGiven the simplicity of sums, products, eigenvalues,, inverses, and de-terminants of circulant matrices, an obvious approach to the study ofasymptotic properties of sequences of Toeplitz matrices is to approxi-mate them by sequences asymptotically equivalent of circulant matricesand then applying the results developed thus far. Such results are mosteasily derived when strong assumptions are placed on the sequence ofToeplitz matrices which keep the structure of the matrices simple andallow them to be well approximated by a natural and simple sequenceof related circulant matrices. Increasingly general results require corre-sponding increasingly complicated constructions and proofs. Consider the inﬁnite sequence {tk } and deﬁne the correspondingsequence of n × n Toeplitz matrices Tn = [tk−j ; k, j = 0, 1, . . . , n − 1] asin (1.1). Toeplitz matrices can be classiﬁed by the restrictions placed onthe sequence tk . The simplest class results if there is a ﬁnite m for whichtk = 0, |k| > m, in which case Tn is said to be a banded Toeplitz matrix.A banded Toeplitz matrix has the appearance of the of (4.1), possessinga ﬁnite number of diagonals with nonzero entries and zeros everywhere 37
• 38 Toeplitz Matriceselse, so that the nonzero entries lie within a “band” including the maindiagonal:   t0 t−1 · · · t−m  t  1 t0    .  .   . 0    .. ..    . .    tm     ..   .  Tn =  .    tm · · · t1 t0 t−1 · · · t−m  ..   .     .. ..     . . t−m    . .  .     0 t0 t−1     tm · · · t1 t0 (4.1) In the more general case where the tk are not assumed to be zerofor large k, there are two common constraints placed on the inﬁnitesequence {tk ; k = . . . , −2, −1, 0, 1, 2, . . .} which deﬁnes all of the ma-trices Tn in the sequence. The most general is to assume that the tkare square summable, i.e., that ∞ |tk |2 < ∞. (4.2) k=−∞Unfortunately this case requires mathematical machinery beyond thatassumed here; i.e., Lebesgue integration and a relatively advancedknowledge of Fourier series. We will make the stronger assumption thatthe tk are absolutely summable, i.e., that ∞ |tk | < ∞. (4.3) k=−∞Note that (4.3) is indeed a stronger constraint than (4.2) since ∞ ∞ 2 2 |tk | ≤ |tk | . (4.4) k=−∞ k=−∞
• 4.1. Sequences of Toeplitz Matrices 39 The assumption of absolute summability greatly simpliﬁes themathematics, but does not alter the fundamental concepts of Toeplitzand circulant matrices involved. As the main purpose here is tutorialand we wish chieﬂy to relay the ﬂavor and an intuitive feel for theresults, we will conﬁne interest to the absolutely summable case. Themain advantage of (4.3) over (4.2) is that it ensures the existence andof the Fourier series f (λ) deﬁned by ∞ n ikλ f (λ) = tk e = lim tk eikλ . (4.5) n→∞ k=−∞ k=−nNot only does the limit in (4.5) converge if (4.3) holds, it convergesuniformly for all λ, that is, we have that n −n−1 ∞ f (λ) − tk eikλ = tk eikλ + tk eikλ k=−n k=−∞ k=n+1 −n−1 ∞ ≤ tk eikλ + tk eikλ , k=−∞ k=n+1 −n−1 ∞ ≤ |tk | + |tk | k=−∞ k=n+1where the right-hand side does not depend on λ and it goes to zero asn → ∞ from (4.3). Thus given ǫ there is a single N , not depending onλ, such that n f (λ) − tk eikλ ≤ ǫ , all λ ∈ [0, 2π] , if n ≥ N. (4.6) k=−nFurthermore, if (4.3) holds, then f (λ) is Riemann integrable and the tkcan be recovered from f from the ordinary Fourier inversion formula: 2π 1 tk = f (λ)e−ikλ dλ. (4.7) 2π 0As a ﬁnal useful property of this case, f (λ) is a continuous function ofλ ∈ [0, 2π] except possibly at a countable number of points.
• 40 Toeplitz Matrices A sequence of Toeplitz matrices Tn = [tk−j ] for which the tk areabsolutely summable is said to be in the Wiener class,. Similarly, afunction f (λ) deﬁned on [0, 2π] is said to be in the Wiener class if ithas a Fourier series with absolutely summable Fourier coeﬃcients. Itwill often be of interest to begin with a function f in the Wiener classand then deﬁne the sequence of of n × n Toeplitz matrices 2π 1 Tn (f ) = f (λ)e−i(k−j)λ dλ ; k, j = 0, 1, · · · , n − 1 , (4.8) 2π 0which will then also be in the Wiener class. The Toeplitz matrix Tn (f )will be Hermitian if and only if f is real. More speciﬁcally, Tn (f ) =Tn (f ) if and only if tk−j = t∗ for all k, j or, equivalently, t∗ = t−k all ∗ j−k kk. If t∗ = t−k , however, k ∞ ∞ ∗ f (λ) = t∗ e−ikλ k = t−k e−ikλ k=−∞ k=−∞ ∞ = tk eikλ = f (λ), k=−∞so that f is real. Conversely, if f is real, then 2π 1 t∗ = k f ∗ (λ)eikλ dλ 2π 0 2π 1 = f (λ)eikλ dλ = t−k . 2π 0 It will be of interest to characterize the maximum and minimummagnitude of the eigenvalues of Toeplitz matrices and how these relateto the maximum and minimum values of the corresponding functions f .Problems arise, however, if the function f has a maximum or minimumat an isolated point. To avoid such diﬃculties we deﬁne the essentialsupremum Mf = ess supf of a real valued function f as the smallestnumber a for which f (x) ≤ a except on a set of total length or mea-sure 0. In particular, if f (x) > a only at isolated points x and not onany interval of nonzero length, then Mf ≤ a. Similarly, the essentialinﬁmum mf = ess inff is deﬁned as the largest value of a for which
• 4.2. Bounds on Eigenvalues of Toeplitz Matrices 41f (x) ≥ a except on a set of total length or measure 0. The key ideahere is to view Mf and mf as the maximum and minimum values of f ,where the extra verbiage is to avoid technical diﬃculties arising fromthe values of f on sets that do not eﬀect the integrals. Functions f inthe Wiener class are bounded since ∞ ∞ ikλ |f (λ)| ≤ |tk e |≤ |tk | (4.9) k=−∞ k=−∞so that ∞ m|f | , M|f | ≤ |tk |. (4.10) k=−∞4.2 Bounds on Eigenvalues of Toeplitz MatricesIn this section Lemma 2.1 is used to obtain bounds on the eigenvalues ofHermitian Toeplitz matrices and an upper bound bound to the strongnorm for general Toeplitz matrices.Lemma 4.1. Let τn,k be the eigenvalues of a Toeplitz matrix Tn (f ).If Tn (f ) is Hermitian, then mf ≤ τn,k ≤ Mf . (4.11)Whether or not Tn (f ) is Hermitian, Tn (f ) ≤ 2M|f | , (4.12)so that the sequence of Toeplitz matrices {Tn (f )} is uniformly boundedover n if the essential supremum of |f | is ﬁnite.Proof. From Lemma 2.1, max τn,k = max(x∗ Tn (f )x)/(x∗ x) (4.13) k x min τn,k = min(x∗ Tn (f )x)/(x∗ x) k x
• 42 Toeplitz Matricesso that n−1 n−1 x∗ Tn (f )x = tk−j xk x∗ j k=0 j=0 n−1 n−1 2π 1 = 2π f (λ)ei(k−j)λ dλ xk x∗ j (4.14) k=0 j=0 0 2 2π n−1 1 = 2π xk eikλ f (λ) dλ 0 k=0and likewise n−1 2π n−1 1 x∗ x = |xk |2 = | xk eikλ |2 dλ. (4.15) 2π 0 k=0 k=0Combining (4.14)–(4.15) results in n−1 2 2π f (λ) xk eikλ dλ 0 k=0 x∗ Tn (f )x mf ≤ 2 = ≤ Mf , (4.16) 2π n−1 x∗ x xk eikλ dλ 0 k=0which with (4.13) yields (4.11). We have already seen in (2.16) that if Tn (f ) is Hermitian, then ∆ Tn (f ) = maxk |τn,k | = |τn,M |. Since |τn,M | ≤ max(|Mf |, |mf |) ≤M|f | , (4.12) holds for Hermitian matrices. Suppose that Tn (f ) is notHermitian or, equivalently, that f is not real. Any function f can bewritten in terms of its real and imaginary parts, f = fr +ifi , where bothfr and fi are real. In particular, fr = (f + f ∗ )/2 and fi = (f − f ∗ )/2i.From the triangle inequality for norms, Tn (f ) = Tn (fr + ifi ) = Tn (fr ) + iTn (fi ) ≤ Tn (fr ) + Tn (fi ) ≤ M|fr | + M|fi | .
• 4.3. Banded Toeplitz Matrices 43Since |(f ± f ∗ )/2 ≤ (|f |+ |f ∗ |)/2 ≤ M|f | , M|fr | + M|fi | ≤ 2M|f | , proving(4.12). 2 Note for later use that the weak norm of a Toeplitz matrix takes aparticularly simple form. Let Tn (f ) = {tk−j }, then by collecting equalterms we have n−1 n−1 1 |Tn (f )|2 = |tk−j |2 n k=0 j=0 n−1 1 = (n − |k|)|tk |2 n k=−(n−1) n−1 = (1 − |k|/n)|tk |2 . (4.17) k=−(n−1) We are now ready to put all the pieces together to study the asymp-totic behavior of Tn (f ). If we can ﬁnd an asymptotically equivalentsequence of circulant matrices, then all of the results regarding cir-culant matrices and asymptotically equivalent sequences of matricesapply. The main diﬀerence between the derivations for simple sequenceof banded Toeplitz matrices and the more general case is the sequenceof circulant matrices chosen. Hence to gain some feel for the matrixchosen, we ﬁrst consider the simpler banded case where the answer isobvious. The results are then generalized in a natural way.4.3 Banded Toeplitz MatricesLet Tn be a sequence of banded Toeplitz matrices of order m + 1, thatis, ti = 0 unless |i| ≤ m. Since we are interested in the behavior or Tnfor large n we choose n >> m. As is easily seen from (4.1), Tn lookslike a circulant matrix except for the upper left and lower right-handcorners, i.e., each row is the row above shifted to the right one place.We can make a banded Toeplitz matrix exactly into a circulant if we ﬁllin the upper right and lower left corners with the appropriate entries.
• 44 Toeplitz MatricesDeﬁne the circulant matrix Cn in just this way, i.e.,   t0 t−1 · · · t−m tm ··· t1  .. . .   t1  . .     tm    . ..  .   . .    tm 0        ..   .      Cn =   tm · · · t1 t0 t−1 · · · t−m  .. ..  . .                 0 t−m    t−m     . . ..   . . .  . .   t0 t−1     t−1 · · · t−m tm · · · t1 t0   (n) (n) c0 ··· cn−1      (n) (n)   c  n−1 c0 ···   = . (4.18)    . .. .   . . .   . .      (n) (n) (n) c1 ··· cn−1 c0 (n) (n)Equivalently, C, consists of cyclic shifts of (c0 , · · · , cn−1 ) where  t−k  k = 0, 1, · · · , m  (n) ck = tn−k k = n − m, · · · , n − 1 (4.19)  0  otherwise If a Toeplitz matrix is speciﬁed by a function f and hence denotedby Tn (f ), then the circulant matrix deﬁned by (4.18–4.19) is similarly
• 4.3. Banded Toeplitz Matrices 45denoted Cn (f ). The function f will be explicitly shown when it is usefulto do so, for example when the results being developed speciﬁcallyinvolve f . The matrix Cn is intuitively a candidate for a simple matrix asymp-totically equivalent to Tn — we need only demonstrate that it is indeedboth asymptotically equivalent and simple.Lemma 4.2. The matrices Tn and Cn deﬁned in (4.1) and (4.18) areasymptotically equivalent, i.e., both are bounded in the strong normand lim |Tn − Cn | = 0. (4.20) n→∞Proof. The tk are obviously absolutely summable, so Tn are uniformlybounded by 2M|f | from Lemma 4.1. The matrices Cn are also uni- ∗formly bounded since Cn Cn is a circulant matrix with eigenvalues|f (2πk/n)|2 ≤ 4M 2 . The weak norm of the diﬀerence is |f | m 1 |Tn − Cn |2 = n k(|tk |2 + |t−k |2 ) k=0 . m 1 −→ ≤ mn (|tk |2 + |t−k |2 ) n→∞ 0 k=0 2 The above lemma is almost trivial since the matrix Tn − Cn hasfewer than m2 non-zero entries and hence the 1/n in the weak normdrives |Tn − Cn | to zero. From Lemma 4.2 and Theorem 2.2 we have the following lemma.Lemma 4.3. Let Tn and Cn be as in (4.1) and (4.18) and let theireigenvalues be τn,k and ψn,k , respectively, then for any positive integers n−1 1 s s lim τn,k − ψn,k = 0. (4.21) n→∞ n k=0In fact, for ﬁnite n, n−1 1 s s τn,k − ψn,k ≤ Kn−1/2 , (4.22) n k=0
• 46 Toeplitz Matriceswhere K is not a function of n.Proof. Equation (4.21) is direct from Lemma 4.2 and Theorem 2.2.Equation (4.22) follows from Corollary 2.1 and Lemma 4.2. 2 The lemma implies that if either of the separate limits converges,then both will and n−1 n−1 1 s 1 s lim τn,k = lim ψn,k . (4.23) n→∞ n n→∞ n k=0 k=0The next lemma shows that the second limit indeed converges, and infact provides an evaluation for the limit.Lemma 4.4. Let Cn (f ) be constructed from Tn (f ) as in (4.18) andlet ψn,k be the eigenvalues of Cn (f ), then for any positive integer s wehave n−1 2π 1 s 1 lim ψn,k = f s (λ) dλ. (4.24) n→∞ n 2π 0 k=0If Tn (f ) is Hermitian, then for any function F (x) continuous on[mf , Mf ] we have n−1 2π 1 1 lim F (ψn,k ) = F (f (λ)) dλ. (4.25) n→∞ n 2π 0 k=0Proof. From Theorem 3.1 we have exactly n−1 (n) ψn,j = ck e−2πijk/n k=0 m n−1 = t−k e−2πijk/n + tn−k e−2πijk/n k=0 k=n−m m 2πj = tk e−2πijk/n = f ( ) (4.26) n k=−mNote that the eigenvalues of Cn (f ) are simply the values of f (λ) with λuniformly spaced between 0 and 2π. Deﬁning 2πk/n = λk and 2π/n =
• 4.3. Banded Toeplitz Matrices 47∆λ we have n−1 n−1 1 s 1 lim ψn,k = lim f (2πk/n)s n→∞ n n→∞ n k=0 k=0 n−1 = lim f (λk )s ∆λ/(2π) n→∞ k=0 2π 1 = f (λ)s dλ, (4.27) 2π 0where the continuity of f (λ) guarantees the existence of the limit of(4.27) as a Riemann integral. If Tn (f ) and Cn (f ) are Hermitian, thanthe ψn,k and f (λ) are real and application of the Weierstrass theoremto (4.27) yields (4.25). Lemma 4.2 and (4.26) ensure that ψn,k and τn,kare in the interval [mf , Mf ]. 2 Combining Lemmas 4.2–4.4 and Theorem 2.2 we have the followingspecial case of the fundamental eigenvalue distribution theorem.Theorem 4.1. If Tn (f ) is a banded Toeplitz matrix with eigenvaluesτn,k , then for any positive integer s n−1 2π 1 s 1 lim τn,k = f (λ)s dλ. (4.28) n→∞ n 2π 0 k=0Furthermore, if f is real, then for any function F (x) continuous on[mf , Mf ] n−1 2π 1 1 lim F (τn,k ) = F (f (λ)) dλ; (4.29) n→∞ n 2π 0 k=0i.e., the sequences {τn,k } and {f (2πk/n)} are asymptotically equallydistributed. This behavior should seem reasonable since the equations Tn (f )x =τ x and Cn (f )x = ψx, n > 2m + 1, are essentially the same nth orderdiﬀerence equation with diﬀerent boundary conditions. It is in fact the“nice” boundary conditions that make ψ easy to ﬁnd exactly whileexact solutions for τ are usually intractable.
• 48 Toeplitz Matrices With the eigenvalue problem in hand we could next write down the-orems on inverses and products of Toeplitz matrices using Lemma 4.2and results for circulant matrices and asymptotically equivalent se-quences of matrices. Since these theorems are identical in statementand proof with the more general case of functions f in the Wiener class,we defer these theorems momentarily and generalize Theorem 4.1 tomore general Toeplitz matrices with no assumption of bandedeness.4.4 Wiener Class Toeplitz MatricesNext consider the case of f in the Wiener class, i.e., the case wherethe sequence {tk } is absolutely summable. As in the case of sequencesof banded Toeplitz matrices, the basic approach is to ﬁnd a sequenceof circulant matrices Cn (f ) that is asymptotically equivalent to the se-quence of Toeplitz matrices Tn (f ). In the more general case under con-sideration, the construction of Cn (f ) is necessarily more complicated.Obviously the choice of an appropriate sequence of circulant matricesto approximate a sequence of Toeplitz matrices is not unique, so weare free to choose a construction with the most desirable properties.It will, in fact, prove useful to consider two slightly diﬀerent circulantapproximations. Since f is assumed to be in the Wiener class, we havethe Fourier series representation ∞ f (λ) = tk eikλ (4.30) k=−∞ 2π 1 tk = f (λ)e−ikλ dλ. (4.31) 2π 0Deﬁne Cn (f ) to be the circulant matrix with top row (n) (n) (n)(c0 , c1 , · · · , cn−1 ) where n−1 (n) 1 ck = f (2πj/n)e2πijk/n . (4.32) n j=0
• 4.4. Wiener Class Toeplitz Matrices 49Since f (λ) is Riemann integrable, we have that for ﬁxed k n−1 (n) 1 lim ck = lim f (2πj/n)e2πijk/n n→∞ n→∞ n j=0 (4.33) 2π 1 = 2π f (λ)eikλ dλ = t−k 0 (n)and hence the ck are simply the sum approximations to the Riemannintegrals giving t−k . Equations (4.32), (3.7), and (3.9) show that theeigenvalues ψn,m of Cn (f ) are simply f (2πm/n); that is, from (3.7) and(3.9) n−1 (n) ψn,m = ck e−2πimk/n k=0   n−1 n−1 = 1 f (2πj/n)e2πijk/n  e−2πimk/n n k=0 j=0 n−1 n−1 1 = f (2πj/n) e2πik(j−m)/n n j=0 k=0 = f (2πm/n). (4.34)Thus, Cn (f ) has the useful property (4.26) of the circulant approxi-mation (4.19) used in the banded case. As a result, the conclusionsof Lemma 4.4 hold for the more general case with Cn (f ) constructedas in (4.32). Equation (4.34) in turn deﬁnes Cn (f ) since, if we aretold that Cn (f ) is a circulant matrix with eigenvalues f (2πm/n), m =0, 1, · · · , n − 1, then from (3.9) n−1 (n) 1 ck = ψn,m e2πimk/n n m=0 n−1 1 = f (2πm/n)e2πimk/n , (4.35) n m=0
• 50 Toeplitz Matricesas in (4.32). Thus, either (4.32) or (4.34) can be used to deﬁne Cn (f ). The fact that Lemma 4.4 holds for Cn (f ) yields several useful prop-erties as summarized by the following lemma.Lemma 4.5. Given a function f satisfying (4.30–4.31) and deﬁne thecirculant matrix Cn (f ) by (4.32). (1) Then ∞ (n) ck = t−k+mn , k = 0, 1, · · · , n − 1. (4.36) m=−∞ (Note, the sum exists since the tk are absolutely summable.) (2) If f (λ) is real and mf = ess inf f > 0, then Cn (f )−1 = Cn (1/f ). (3) Given two functions f (λ) and g(λ), then Cn (f )Cn (g) = Cn (f g).Proof. (1) Applying (4.31) to λ = 2πj/n gives ∞ j f (2π ) = tℓ eiℓ2πj/n n ℓ=−∞ which when inserted in (4.32) yields n−1 (n) 1 j ck = f (2π )e2πijk/n n n j=0 n−1 ∞ 1 = tℓ eiℓ2πj/n e2πijk/n (4.37) n j=0 ℓ=−∞ ∞ n−1 ∞ 1 i2π(k+ℓ)j/n = tℓ e = tℓ δ(k+ℓ) mod n , n ℓ=−∞ j=0 ℓ=−∞ where the ﬁnal step uses (3.10). The term δ(k+ℓ) mod n will be 1 whenever ℓ = −k plus a multiple mn of n, which yields (4.36).
• 4.4. Wiener Class Toeplitz Matrices 51 (2) Since Cn (f ) has eigenvalues f (2πk/n) > 0, by Theorem 3.1 Cn (f )−1 has eigenvalues 1/f (2πk/n), and hence from (4.35) and the fact that Cn (f )−1 is circulant we have Cn (f )−1 = Cn (1/f ). (3) Follows immediately from Theorem 3.1 and the fact that, if f (λ) and g(λ) are Riemann integrable, so is f (λ)g(λ). 2 Equation (4.36) points out a shortcoming of Cn (f ) for applicationsas a circulant approximation to Tn (f ) — it depends on the entire se-quence {tk ; k = 0, ±1, ±2, · · · } and not just on the ﬁnite collection ofelements {tk ; k = 0, ±1, · · · , ±(n − 1)} of Tn (f ). This can cause prob-lems in practical situations where we wish a circulant approximationto a Toeplitz matrix Tn when we only know Tn and not f . Pearl [19]discusses several coding and ﬁltering applications where this restrictionis necessary for practical reasons. A natural such approximation is toform the truncated Fourier series n−1 ˆ fn (λ) = tm eimλ , (4.38) m=−(n−1)which depends only on {tm ; m = 0, ±1, · · · , ±n − 1}, and then deﬁne ˆthe circulant matrix Cn (fn ); that is, the circulant matrix having as top (n) (n)row (ˆ0 , · · · , cn−1 ) where analogous to the derivation of (4.37) c ˆ n−1 (n) 1 ˆ 2πj )e2πijk/n ck ˆ = fn ( n n j=0   n−1 n−1 1 =  tℓ eiℓ2πj/n  e2πijk/n n j=0 ℓ=−(n−1) n−1 n−1 1 = tℓ ei2π(k+ℓ)j/n n ℓ=−(n−1) j=0 n−1 = tℓ δ(k+ℓ) mod n . ℓ=−(n−1)
• 52 Toeplitz MatricesNow, however, we are only interested in values of ℓ which have the form−k plus a multiple mn of n for which −(n − 1) ≤ −k + mn ≤ n − 1.This will always include the m = 0 term for which ℓ = −k. If k = 0,then only the m = 0 term lies within the range. If k = 1, 2, . . . , n − 1,then m = −1 results in −k + n which is between 1 and n − 1. No othermultiples lie within the range, so we end up with (n) t0 k=0 ck ˆ = . (4.39) t−k + tn−k k = 1, 2, . . . , n − 1 ˆ ˆ Since Cn (fn ) is also a Toeplitz matrix, deﬁne Cn (fn ) = Tn = {t′ } ′ k−jwith  c(n) = tk + tn+k ˆ−k k = −(n − 1), . . . , −1  ′ (n) tk = c0 = t0 ˆ k=0 , (4.40)  (n) c  ˆ =t + t k = 1, 2, . . . , n − 1 n−k −(n−k) kwhich can be pictured as ′ Tn =  t0 t−1 + tn−1 t−2 + tn−2 · · · t−(n−1) + t1 t1 + t−(n−1) t0 t−1 + tn−1   t2 + t−(n−2) t1 + t−(n−1) . .   (4.41) t0 .  . . ..  . .  tn−1 + t1 ··· t0 ˆ Like the original approximation Cn (f ), the approximation Cn (fn )reduces to the Cn (f ) of (4.19) for a banded Toeplitz matrix of order mif n > 2m + 1. The following lemma shows that these circulant matricesare asymptotically equivalent to each other and to Tm .Lemma 4.6. Let Tn (f ) = {tk−j } where ∞ |tk | < ∞, k=−∞
• 4.4. Wiener Class Toeplitz Matrices 53and ∞ n−1 f (λ) = ˆ tk eikλ , fn (λ) = tk eikλ . k=−∞ k=−(n−1) ˆDeﬁne the circulant matrices Cn (f ) and Cn (fn ) as in (4.32) and (4.38)–(4.39). Then, ˆ Cn (f ) ∼ Cn (fn ) ∼ Tn . (4.42) ˆProof. Since both Cn (f ) and Cn (fn ) are circulant matrices with thesame eigenvectors (Theorem 3.1), we have from part 2 of Theorem 3.1and (2.17) that n−1 ˆ 1 ˆ |Cn (f ) − Cn (fn )|2 = |f (2πk/n) − fn (2πk/n)|2 . n k=0 ˆRecall from (4.6) and the related discussion that fn (λ) uniformly con-verges to f (λ), and hence given ǫ > 0 there is an N such that for n ≥ Nwe have for all k, n that ˆ |f (2πk/n) − fn (2πk/n)|2 ≤ ǫand hence for n ≥ N n−1 ˆ 1 |Cn (f ) − Cn (fn )|2 ≤ ǫ = ǫ. n i=0Since ǫ is arbitrary, ˆ lim |Cn (f ) − Cn (fn )| = 0 n→∞proving that ˆ Cn (f ) ∼ Cn (fn ). (4.43)
• 54 Toeplitz MatricesApplication of (4.40) and (4.17) results in n−1 ˆ |Tn (f ) − Cn (fn )|2 = (1 − |k|/n)|tk − t′ |2 k k=−(n−1) −1 n−1 n+k n−k = |tn+k |2 + |t−(n−k) |2 n n k=−(n−1) k=1 −1 n−1 k k = |tk |2 + |t−k |2 n n k=−(n−1) k=1 n−1 k = |tk |2 + |t−k |2 (4.44) n k=1Since the {tk } are absolutely summable, they are also square summablefrom (4.4) and hence given ǫ > 0 we can choose an N large enough sothat ∞ |tk |2 + |t−k |2 ≤ ǫ. k=NTherefore ˆ lim |Tn (f ) − Cn (fn )| n→∞ n−1 = lim (k/n)(|tk |2 + |t−k |2 ) n→∞ k=0 N −1 n−1 2 2 = lim (k/n)(|tk | + |t−k | ) + (k/n)(|tk |2 + |t−k |2 ) n→∞ k=0 k=N N −1 ∞ 1 ≤ lim k(|tk |2 + |t−k |2 ) + (|tk |2 + |t−k |2 ) ≤ ǫ n→∞ n k=0 k=NSince ǫ is arbitrary, ˆ lim |Tn (f ) − Cn (fn )| = 0 n→∞
• 4.4. Wiener Class Toeplitz Matrices 55and hence ˆ Tn (f ) ∼ Cn (fn ), (4.45)which with (4.43) and Theorem 2.1 proves (4.42). 2 Pearl [19] develops a circulant matrix similar to Cn (fnˆ ) (dependingonly on the entries of Tn (f )) such that (4.45) holds in the more generalcase where (4.2) instead of (4.3) holds. We now have a sequence of circulant matrices {Cn (f )} asymptoti-cally equivalent to the sequence {Tn (f )} and the eigenvalues, inversesand products of the circulant matrices are known exactly. ThereforeLemmas 4.2–4.4 and Theorems 2.2–2.2 can be applied to generalizeTheorem 4.1.Theorem 4.2. Let Tn (f ) be a sequence of Toeplitz matrices such thatf (λ) is in the Wiener class or, equivalently, that {tk } is absolutelysummable. Let τn,k be the eigenvalues of Tn (f ) and s be any positiveinteger. Then n−1 2π 1 s 1 lim τn,k = f (λ)s dλ. (4.46) n→∞ n 2π 0 k=0Furthermore, if f (λ) is real or, equivalently, the matrices Tn (f ) are allHermitian, then for any function F (x) continuous on [mf , Mf ] n−1 2π 1 1 lim F (τn,k ) = F (f (λ)) dλ. (4.47) n→∞ n 2π 0 k=0 Theorem 4.2 is the fundamental eigenvalue distribution theorem ofSzeg¨ (see [16]). The approach used here is essentially a specialization oof Grenander and Szeg¨ ([16], ch. 7). o Theorem 4.2 yields the following two corollaries.Corollary 4.1. Given the assumptions of the theorem, deﬁne theeigenvalue distribution function Dn (x) = (number of τn,k ≤ x)/n. As-sume that dλ = 0. λ:f (λ)=x
• 56 Toeplitz MatricesThen the limiting distribution D(x) = limn→∞ Dn (x) exists and isgiven by 1 D(x) = dλ. 2π f (λ)≤xThe technical condition of a zero integral over the region of the set ofλ for which f (λ) = x is needed to ensure that x is a point of continuityof the limiting distribution. It can be interpreted as not allowing f (λ)to have a ﬂat region around the point x. The limiting distributionfunction evaluated at x describes the fraction of the eigenvalues thatsmaller than x in the limit as n → ∞, which in turn implies that thefraction of eigenvalues between two values a and b > a is D(b) − D(a).This is similar to the role of a cumulative distribution function (cdf)in probability theory.Proof. Deﬁne the indicator function 1 mf ≤ α ≤ x 1x (α) = 0 otherwiseWe have n−1 1 D(x) = lim 1x (τn,k ). n→∞ n k=0Unfortunately, 1x (α) is not a continuous function and hence Theo-rem 4.2 cannot be immediately applied. To get around this problem wemimic Grenander and Szeg¨ p. 115 and deﬁne two continuous functions othat provide upper and lower bounds to 1x and will converge to it inthe limit. Deﬁne  1  α≤x  + 1x (α) = 1 − α−x x < α ≤ x + ǫ ǫ  0  x+ǫ<α  1  α≤x−ǫ  − α−x+ǫ 1x (α) = 1 − ǫ x−ǫ<α≤x  0  x<α
• 4.4. Wiener Class Toeplitz Matrices 57The idea here is that the upper bound has an output of 1 everywhere1x does, but then it drops in a continuous linear fashion to zero at x + ǫinstead of immediately at x. The lower bound has a 0 everywhere 1xdoes and it rises linearly from x to x − ǫ to the value of 1 instead ofinstantaneously as does 1x . Clearly 1− (α) < 1x (α) < 1+ (α) for all α. x x Since both 1+ and 1− are continuous, Theorem 4.2 can be used to x xconclude that n−1 1 lim 1+ (τn,k ) x n→∞ n k=0 1 = 1+ (f (λ)) dλ x 2π 1 1 f (λ) − x = dλ + (1 − ) dλ 2π f (λ)≤x 2π x<f (λ)≤x+ǫ ǫ 1 1 ≤ dλ + dλ 2π f (λ)≤x 2π x<f (λ)≤x+ǫand n−1 1 lim 1− (τn,k ) x n→∞ n k=0 1 = 1− (f (λ)) dλ x 2π 1 1 f (λ) − (x − ǫ) = dλ + (1 − ) dλ 2π f (λ)≤x−ǫ 2π x−ǫ<f (λ)≤x ǫ 1 1 = dλ + (x − f (λ)) dλ 2π f (λ)≤x−ǫ 2π x−ǫ<f (λ)≤x 1 ≥ dλ 2π f (λ)≤x−ǫ 1 1 = dλ − dλ 2π f (λ)≤x 2π x−ǫ<f (λ)≤x These inequalities imply that for any ǫ > 0, as n grows the sample
• 58 Toeplitz Matrices n−1average (1/n) k=0 1x (τn,k ) will be sandwiched between 1 1 dλ + dλ 2π f (λ)≤x 2π x<f (λ)≤x+ǫand 1 1 dλ − dλ. 2π f (λ)≤x 2π x−ǫ<f (λ)≤xSince ǫ can be made arbitrarily small, this means the sum will besandwiched between 1 dλ 2π f (λ)≤xand 1 1 dλ − dλ. 2π f (λ)≤x 2π f (λ)=xThus if dλ = 0, f (λ)=xthen 2π 1 D(x) = 2π 1x [f (λ)]dλ 0 . 1 = 2π v dλ f (λ)≤x 2Corollary 4.2. Assume that the conditions of Theorem 4.2 hold andlet mf and Mf denote the essential inﬁmum and the essential supre-mum of f , respectively. Then lim max τn,k = Mf n→∞ k lim min τn,k = mf . n→∞ kProof. From Corollary 4.1 we have for any ǫ > 0 D(mf + ǫ) = dλ > 0. f (λ)≤mf +ǫ
• 4.4. Wiener Class Toeplitz Matrices 59The strict inequality follows from the continuity of f (λ). Since 1 lim {number of τn,k in [mf , mf + ǫ]} > 0 n→∞ nthere must be eigenvalues in the interval [mf , mf + ǫ] for arbitrarilysmall ǫ. Since τn,k ≥ mf by Lemma 4.1, the minimum result is proved.The maximum result is proved similarly. 2
• 5 Matrix Operations on Toeplitz MatricesApplications of Toeplitz matrices like those of matrices in general in-volve matrix operations such as addition, inversion, products and thecomputation of eigenvalues, eigenvectors, and determinants. The prop-erties of Toeplitz matrices particular to these operations are based pri-marily on three fundamental results that have been described earlier: (1) matrix operations are simple when dealing with circulant ma- trices, (2) given a sequence of Toeplitz matrices, we can instruct asymp- totically equivalent sequences of circulant matrices, and (3) asymptotically equivalent sequences of matrices have equal asymptotic eigenvalue distributions and other related prop- erties. In the next few sections some of these operations are explored inmore depth for sequences of Toeplitz matrices. Generalizations andrelated results can be found in Tyrtyshnikov [31]. 61
• 62 Matrix Operations on Toeplitz Matrices5.1 Inverses of Toeplitz MatricesIn some applications we wish to study the asymptotic distribution of afunction F (τn,k ) of the eigenvalues that is not continuous at the mini-mum or maximum value of f . For example, in order for the results de-rived thus far to apply to the function F (f (λ)) = 1/f (λ) which ariseswhen treating inverses of Toeplitz matrices, it has so far been neces-sary to require that the essential inﬁmum mf > 0 because the functionF (1/x) is not continuous at x = 0. If mf = 0, the basic asymptoticeigenvalue distribution Theorem 4.2 breaks down and the limits andthe integrals involved might not exist. The limits might exist and equalsomething else, or they might simply fail to exist. In order to treat theinverses of Toeplitz matrices when f has zeros, we state without proofan intuitive extension of the fundamental Toeplitz result that showshow to ﬁnd asymptotic distributions of suitably truncated functions.To state the result, deﬁne the mid function  z y ≥ z   ∆ mid(x, y, z) = y x ≤ y ≤ z (5.1)  x y ≤ z x < z. This function can be thought of as having input y and thresholdsz and X and it puts out y if y is between z and x, z if y is smaller thanz, and x if y is greater than x. The following result was proved in [13]and extended in [25]. See also [26, 27, 28].Theorem 5.1. Suppose that f is in the Wiener class. Then for anyfunction F (x) continuous on [ψ, θ] ⊂ [mf , Mf ] n−1 2π 1 1 lim F (mid(ψ, τn,k , θ) = F (mid(ψ, f (λ), θ) dλ. (5.2) n→∞ n 2π 0 k=0 Unlike Theorem 4.2 we pick arbitrary points ψ and θ such that F iscontinuous on the closed interval [ψ, θ]. These need not be the minimumand maximum of f .Theorem 5.2. Assume that f is in the Wiener class and is real andthat f (λ) ≥ 0 with equality holding at most at a countable number ofpoints. Then (a) Tn (f ) is nonsingular
• 5.1. Inverses of Toeplitz Matrices 63(b) If f (λ) ≥ mf > 0, then Tn (f )−1 ∼ Cn (f )−1 , (5.3)where Cn (f ) is deﬁned in (4.35). Furthermore, if we deﬁne Tn (f ) −Cn (f ) = Dn then Tn (f )−1 has the expansion Tn (f )−1 = [Cn (f ) + Dn ]−1 −1 = Cn (f )−1 I + Dn Cn (f )−1 2 = Cn (f )−1 I + Dn Cn (f )−1 + Dn Cn (f )−1 + · · · , (5.4)and the expansion converges (in weak norm) for suﬃciently large n.(c) If f (λ) ≥ mf > 0, then π 1 ei(k−j)λ Tn (f )−1 ∼ Tn (1/f ) = dλ ; (5.5) 2π −π f (λ)that is, if the spectrum is strictly positive, then the inverse of a sequenceof Toeplitz matrices is asymptotically Toeplitz. Furthermore if ρn,k arethe eigenvalues of Tn (f )−1 and F (x) is any continuous function on[1/Mf , 1/mf ], then n−1 π 1 1 lim F (ρn,k ) = F ((1/f (λ)) dλ. (5.6) n→∞ n 2π −π k=0(d) Suppose that mf = 0 and that the derivative of f (λ) exists andis bounded for all λ. Then Tn (f )−1 is not bounded, 1/f (λ) is not inte-grable and hence Tn (1/f ) is not deﬁned and the integrals of (5.2) maynot exist. For any ﬁnite θ, however, the following similar fact is true:If F (x) is a continuous function on [1/Mf , θ], then n−1 2π 1 1 lim F (min(ρn,k , θ)) = F (min(1/f (λ), θ)) dλ. (5.7) n→∞ n 2π 0 k=0
• 64 Matrix Operations on Toeplitz MatricesProof. (a) Since f (λ) > 0 except at possibly countably many points,we have from (4.14) n−1 2 π ∗ 1 ikλ x Tn (f )x = xk e f (λ)dλ > 0. 2π −π k=0Thus for all n min τn,k > 0 kand hence n−1 det Tn (f ) = τn,k = 0 k=0so that Tn (f ) is nonsingular.(b) From Lemma 4.6, Tn ∼ Cn and hence (5.1) follows from Theo-rem 2.1 since f (λ) ≥ mf > 0 ensures that Tn (f )−1 , Cn (f )−1 ≤ 1/mf < ∞.The series of (5.4) will converge in weak norm if |Dn Cn (f )−1 | < 1. (5.8)Since −→ |Dn Cn (f )−1 | ≤ Cn (f )−1 |Dn | ≤ (1/mf )|Dn | n→∞ 0,Eq. (5.8) must hold for large enough n.(c) We have from the triangle inequality that |Tn (f )−1 − Tn (1/f )| ≤ |Tn (f )−1 − Cn (f )−1 | + |Cn (f )−1 − Tn (1/f )|.From (b) for any ǫ > 0 we can choose an n large enough so that ǫ |Tn (f )−1 − Cn (f )−1 | ≤ . (5.9) 2From Theorem 3.1 and Lemma 4.5, Cn (f )−1 = Cn (1/f ) and fromLemma 4.6 Cn (1/f ) ∼ Tn (1/f ). Thus again we can choose n largeenough to ensure that |Cn (f )−1 − Tn (1/f )| ≤ ǫ/2 (5.10)
• 5.1. Inverses of Toeplitz Matrices 65so that for any ǫ > 0 from (5.7)–(5.8) can choose n such that |Tn (f )−1 − Tn (1/f )| ≤ ǫ,which implies (5.5). Equation (5.6) follows from (5.5) and Theorem 2.4.Alternatively, if G(x) is any continuous function on [1/Mf , 1/mf ] and(5.4) follows directly from Lemma 4.6 and Theorem 2.3 applied toG(1/x).(d) When f (λ) has zeros (mf = 0), then from Corollary 4.2 lim minτn,k = 0 and hencen→∞ k −1 Tn = max ρn,k = 1/ min τn,k (5.11) k kis unbounded as n → ∞. To prove that 1/f (λ) is not integrable andhence that Tn (1/f ) does not exist, consider the disjoint sets Ek = {λ : 1/k ≥ f (λ)/Mf > 1/(k + 1)} = {λ : k ≤ Mf /f (λ) < k + 1} (5.12)and let |Ek | denote the length of the set Ek , that is, |Ek | = dλ. λ:Mf /k≥f (λ)>Mf /(k+1)From (5.12) π ∞ 1 1 dλ = dλ −π f (λ) f (λ) k=1 Ek ∞ |Ek |k ≥ . (5.13) Mf k=1For a given k, Ek will comprise a union of disjoint intervals of the form(a, b) where for all λ ∈ (a, b) we have that 1/k ≥ f (λ)/Mf > 1/(k + 1).There must be at least one such nonempty interval, so |Ek | will bebound below by the length of this interval, b − a. Then for any x, y ∈(a, b) y df |f (y) − f (x)| = | dλ| ≤ η|y − x|. x dλ
• 66 Matrix Operations on Toeplitz MatricesBy assumption there is some ﬁnite value η such that df ≤ η, (5.14) dλso that |f (y) − f (x)| =≤ η|y − x|.Pick x and y so that f (x) = Mf /(k + 1) and f (y) = Mf /k (since f iscontinuous at almost all points, this argument works almost everywhere– it needs more work if these end points are not points of continuity off ), then 1 1 Mf b − a ≥ |y − x| ≥ Mf ( − )= . k k+1 k+1Combining this with (5.13) yields π ∞ Mf dλ/f (λ) ≥ (k/Mf )( ))/η (5.15) −π k(k + 1 k=1 ∞ 1 = , (5.16) k+1 k=1which diverges so that 1/f (λ) is not integrable. To prove (5.5) letF (x) be continuous on [1/Mf , θ], then F (min(1/x, θ)) is continuouson [0, Mf ] and hence Theorem 2.4 yields (5.5). Note that (5.5) im-plies that the eigenvalues of Tn (f )−1 are asymptotically equally dis-tributed up to any ﬁnite θ as the eigenvalues of the sequence of matricesTn [min(1/f, θ)]. 2 A special case of (d) is when Tn (f ) is banded and f (λ) has at leastone zero. Then the derivative exists and is bounded since m df /dλ = iktk eikλ k=−m . m ≤ |k||tk | < ∞ k=−mThe series expansion of (b) is due to Rino [20]. The proof of (d) ismotivated by one of Widom [33]. Further results along the lines of (d)
• 5.2. Products of Toeplitz Matrices 67regarding unbounded Toeplitz matrices may be found in [13]. Relatedresults considering asymptotically equal distributions of unbounded se-quences can be found in Tyrtyshnikov [32] and Trench [25]. These worksextend Weyl’s deﬁnition of asymptotically equal distributions to un-bounded sequences using the mid function used here to treat inverses.This leads to conditions for equal distributions and their implications. Extending (a) to the case of non-Hermitian matrices can be some-what diﬃcult, i.e., ﬁnding conditions on f (λ) to ensure that Tn (f ) isinvertible. Parts (a)-(d) can be straightforwardly extended if f (λ) iscontinuous. For a more general discussion of inverses the interestedreader is referred to Widom [33] and the cited references. The resultsof Baxter [1] can also be applied to consider the asymptotic behaviorof inverses in quite general cases.5.2 Products of Toeplitz MatricesWe next combine Theorem 2.1 and Lemma 4.6 to obtain the asymptoticbehavior of products of Toeplitz matrices. The case of only two matricesis considered ﬁrst since it is simpler. A key point is that while theproduct of Toeplitz matrices is not Toeplitz, a sequence of productsof Toeplitz matrices {Tn (f )Tn (g)} is asymptotically equivalent to asequence of Toeplitz matrices {Tn (f g)}.Theorem 5.3. Let Tn (f ) and Tn (g) be deﬁned as in (4.8) where f (λ)and g(λ) are two functions in the Wiener class. Deﬁne Cn (f ) and Cn (g)as in (4.35) and let ρn,k be the eigenvalues of Tn (f )Tn (g)(a) Tn (f )Tn (g) ∼ Cn (f )Cn (g) = Cn (f g). (5.17) Tn (f )Tn (g) ∼ Tn (g)Tn (f ). (5.18) n−1 2π 1 lim n−1 ρs = n,k [f (λ)g(λ)]s dλ s = 1, 2, . . . . (5.19) n→∞ 2π 0 k=0(b) If Tn (f ) and Tn (g) are Hermitian, then for any F (x) continuous on
• 68 Matrix Operations on Toeplitz Matrices[mf mg , Mf Mg ] n−1 2π −1 1 lim n F (ρn,k ) = F (f (λ)g(λ)) dλ. (5.20) n→∞ 2π 0 k=0(c) Tn (f )Tn (g) ∼ Tn (f g). (5.21)(d) Let f1 (λ), ., fm (λ) be in the Wiener class. Then if the Cn (fi ) aredeﬁned as in (4.35) m m m Tn (fi ) ∼ Cn fi ∼ Tn fi . (5.22) i=1 i=1 i=1 m(e) If ρn,k are the eigenvalues of Tn (fi ), then for any positive integer i=1s n−1 m s 2π −1 1 lim n ρs n,k = fi (λ) dλ (5.23) n→∞ 2π 0 k=0 i=1 If the Tn (fi ) are Hermitian, then the ρn,k are asymptotically real,i.e., the imaginary part converges to a distribution at zero, so that n−1 m s 2π 1 s1 lim (Re[ρn,k ]) = fi (λ) dλ. (5.24) n→∞ n 2π 0 k=0 i=1 n−1 1 lim (ℑ[ρn,k ])2 = 0. (5.25) n→∞ n k=0Proof. (a) Equation (5.14) follows from Lemmas 4.5 and 4.6 and The-orems 2.1 and 2.3. Equation (5.16) follows from (5.14). Note that whileToeplitz matrices do not in general commute, asymptotically they do.Equation (5.17) follows from (5.14), Theorem 2.2, and Lemma 4.4.(b) Proof follows from (5.14) and Theorem 2.4. Note that the eigen-values of the product of two Hermitian matrices are real ([18], p. 105).
• 5.2. Products of Toeplitz Matrices 69(c) Applying Lemmas 4.5 and 4.6 and Theorem 2.1 |Tn (f )Tn (g) − Tn (f g)| = |Tn (f )Tn (g) − Cn (f )Cn (g) + Cn (f )Cn (g) − Tn (f g)| ≤ |Tn (f )Tn (g) − Cn (f )Cn (g)| + |Cn (f g) − Tn (f g)| −→ n→∞ 0.(d) Follows from repeated application of (5.14) and part (c).(e) Equation (5.22) follows from (d) and Theorem 2.1. For the Her-mitian case, however, we cannot simply apply Theorem 2.4 since theeigenvalues ρn,k of i Tn (fi ) may not be real. We can show, however,that they are asymptotically real in the sense that the imaginary partvanishes in the limit. Let ρn,k = αn,k + iβn,k where αn,k and βn,k arereal. Then from Theorem 2.2 we have for any positive integer s n−1 n−1 lim n−1 (αn,k + iβn,k )s = lim n−1 s ψn,k n→∞ n→∞ k=0 k=0 m s 2π 1 = fi (λ) dλ, (5.26) 2π 0 i=1 mwhere ψn,k are the eigenvalues of Cn fi . From (2.17) i=1 n−1 n−1 m 2 −1 2 −1 n |ρn,k | = n α2 n,k + 2 βn,k ≤ Tn (fi ) . k=0 k=0 i=iFrom (4.57), Theorem 2.1 and Lemma 4.4 m 2 m 2 lim Tn (fi ) = lim Cn fi n→∞ n→∞ i=1 i=1 m 2 2π −1 = (2π) fi (λ) dλ. (5.27) 0 i=1
• 70 Matrix Operations on Toeplitz MatricesSubtracting (5.26) for s = 2 from (5.27) yields n−1 1 2 lim βn,k ≤ 0. n→∞ n k=1Thus the distribution of the imaginary parts tends to the origin andhence s n−1 2π m 1 s 1 lim αn,k = fi (λ) dλ. n→∞ n 2π 0 k=0 i=1 2 Parts (d) and (e) are here proved as in Grenander and Szeg¨ ([16], opp. 105-106. We have developed theorems on the asymptotic behavior of eigenval-ues, inverses, and products of Toeplitz matrices. The basic method hasbeen to ﬁnd an asymptotically equivalent circulant matrix whose spe-cial simple structure could be directly related to the Toeplitz matricesusing the results for asymptotically equivalent sequences of matrices.We began with the banded case since the appropriate circulant matrixis there obvious and yields certain desirable properties that suggest thecorresponding circulant matrix in the inﬁnite case. We have limited ourconsideration of the inﬁnite order case functions f (λ) or Toeplitz ma-trices in the Wiener class and hence to absolutely summable coeﬃcientsfor simplicity. The more general case of square summable tk is treatedin Chapter 7 of [16] and requires signiﬁcantly more mathematical care,but can be interpreted as an extension of the approach taken here. We did not treat sums of Toeplitz matrices as no additional con-sideration is needed: a sum of Toeplitz matrices of equal size is also aToeplitz matrix, so the results immediately apply. We also did not con-sider the asymptotic behavior of eigenvectors for the simple reason thatthere do not exist results along the lines that intuition suggests, thatis, that show that in some sense the eigenvectors for circulant matricesalso work for Toeplitz matrices.5.3 Toeplitz DeterminantsWe close the consideration of matrix operations on Toeplitz matrices byreturning to a problem mentioned in the introduction and formalize the
• 5.3. Toeplitz Determinants 71behavior of limits of Toeplitz determinants. Suppose now that Tn (f ) is asequence of Hermitian Toeplitz matrices such that that f (λ) ≥ mf > 0.Let Cn (f ) denote the sequence of circulant matrices constructed fromf as in (4.32). Then from (4.34) the eigenvalues of Cn (f ) are f (2πm/n) n−1for m = 0, 1, . . . , n − 1 and hence det(Cn (f )) = m=0 f (2πm/n). Thisin turn implies that n−1 1 1 1 m ln (det(Cn (f ))) = ln detCn (f ) = n ln f (2π ). n n m=0 nThese sums are the Riemann approximations to the limiting integral,whence 1 1 lim ln (det(Cn (f ))) n = ln f (2πλ) dλ. n→∞ 0 Exponentiating, using the continuity of the logarithm for strictlypositive arguments, and changing the variables of integration yields 2π 1 1 lim (det(Cn (f ))) n = exp ln f (λ) dλ. n→∞ 2π 0This integral, the asymptotic equivalence of Cn (f ) and Tn (f )(Lemma 4.6), and Corollary 2.4 together yield the following result ([16],p. 65).Theorem 5.4. Let Tn (f ) be a sequence of Hermitian Toeplitz matricesin the Wiener class such that ln f (λ) is Riemann integrable and f (λ) ≥mf > 0. Then 2π 1 1 lim (det(Tn (f ))) n = exp ln f (λ) dλ . (5.28) n→∞ 2π 0
• 6 Applications to Stochastic Time SeriesToeplitz matrices arise quite naturally in the study of discrete timerandom processes. Covariance matrices of weakly stationary processesare Toeplitz and triangular Toeplitz matrices provide a matrix repre-sentation of causal linear time invariant ﬁlters. As is well known andas we shall show, these two types of Toeplitz matrices are intimatelyrelated. We shall take two viewpoints in the ﬁrst section of this chaptersection to show how they are related. In the ﬁrst part we shall con-sider two common linear models of random time series and study theasymptotic behavior of the covariance matrix, its inverse and its eigen-values. The well known equivalence of moving average processes andweakly stationary processes will be pointed out. The lesser known factthat we can deﬁne something like a power spectral density for autore-gressive processes even if they are nonstationary is discussed. In thesecond part of the ﬁrst section we take the opposite tack — we startwith a Toeplitz covariance matrix and consider the asymptotic behav-ior of its triangular factors. This simple result provides some insightinto the asymptotic behavior of system identiﬁcation algorithms andWiener-Hopf factorization. Let {Xk ; k ∈ I} be a discrete time random process. Generally we 73
• 74 Applications to Stochastic Time Seriestake I = Z, the space of all integers, in which case we say that theprocess is two-sided, or I = Z+ , the space of all nonnegative integers,in which case we say that the process is one-sided. We will be interestedin vector representations of the process so we deﬁne the column vector(n−tuple) X n = (X0 , X1 , . . . , Xn−1 )′ , that is, X n is an n-dimensionalcolumn vector. The mean vector is deﬁned by mn = E(X n ), which weusually assume is zero for convenience. The n × n covariance matrixRn = {rj,k } is deﬁned by Rn = E[(X n − mn )(X n − mn )∗ ]. (6.1)Covariance matrices are Hermitian since Rn = E[(X n − mn )(X n − mn )∗ ]∗ = E[(X n − mn )(X n − mn )∗ ]. (6.2) ∗Setting m = 0 yields the This is the autocorrelation matrix. Subscriptswill be dropped when they are clear from context. If the matrix Rn isToeplitz for all n, say Rn = Tn (f ), then rk,j = rk−j and the process issaid to be weakly stationary. In this case f (λ) = ∞ k=−∞ rk e ikλ is thepower spectral density of the process. If the matrix Rn is not Toeplitzbut is asymptotically Toeplitz, i.e., Rn ∼ Tn (f ), then we say thatthe process is asymptotically weakly stationary and f (λ) is the powerspectral density. The latter situation arises, for example, if an otherwisestationary process is initialized with Xk = 0, k ≤ 0. This will cause atransient and hence the process is strictly speaking nonstationary. Thetransient dies out, however, and the statistics of the process approachthose of a weakly stationary process as n grows. We now proceed to investigate the behavior of two common linearmodels for random processes, both of which model a complicated pro-cess as the result of passing a simple process through a linear ﬁlter. Forsimplicity we will assume the process means are zero.6.1 Moving Average ProcessesBy a linear model of a random process we mean a model wherein wepass a zero mean, independent identically distributed (iid) sequence ofrandom variables Wk with variance σ 2 through a linear time invariantdiscrete time ﬁltered to obtain the desired process. The process Wk is
• 6.1. Moving Average Processes 75discrete time “white” noise. The most common such model is called amoving average process and is deﬁned by the diﬀerence equation n n k=0 bk Wn−k = k=0 bn−k Wk n = 0, 1, . . . Un = . (6.3) 0 n<0We assume that b0 = 1 with no loss of generality since otherwise wecan incorporate b0 into σ 2 . Note that (6.3) is a discrete time convolu-tion, i.e., Un is the output of a ﬁlter with “impulse response” (actuallyKronecker δ response) bk and input Wk . We could be more general byallowing the ﬁlter bk to be noncausal and hence act on future Wk ’s.We could also allow the Wk ’s and Uk ’s to extend into the inﬁnite pastrather than being initialized. This would lead to replacing of (6.3) by ∞ ∞ Un = bk Wn−k = bn−k Wk . (6.4) k=−∞ k=−∞We will restrict ourselves to causal ﬁlters for simplicity and keep theinitial conditions since we are interested in limiting behavior. In addi-tion, since stationary distributions may not exist for some models itwould be diﬃcult to handle them unless we start at some ﬁxed time.For these reasons we take (6.3) as the deﬁnition of a moving average. Since we will be studying the statistical behavior of Un as n getsarbitrarily large, some assumption must be placed on the sequence bkto ensure that (6.3) converges in the mean-squared sense. The weakestpossible assumption that will guarantee convergence of (6.3) is that ∞ |bk |2 < ∞. (6.5) k=0In keeping with the previous sections, however, we will make thestronger assumption ∞ |bk | < ∞. (6.6) k=0As previously this will result in simpler mathematics. Equation (6.3) can be rewritten as a matrix equation by deﬁning
• 76 Applications to Stochastic Time Seriesthe lower triangular Toeplitz matrix   1 0  b 1   1   b2 b1    Bn =  .  . .. ..  (6.7)  . b2 . .       bn−1 . . . b2 b1 1so that U n = Bn W n . (6.8)If the ﬁlter bn were not causal, then Bn would not be triangular. If inaddition (6.4) held, i.e., we looked at the entire process at each timeinstant, then (6.8) would require inﬁnite vectors and matrices as inGrenander and Rosenblatt [15]. Since the covariance matrix of Wk issimply σ 2 In , where In is the n × n identity matrix, we have for thecovariance of Un : (n) RU = EU n (U n )∗ = EBn W n (W n )∗ Bn ∗ = σ 2 Bn Bn ∗   min(k,j) = σ 2 bℓ−k b∗  ℓ−j ℓ=0 (n)The matrix RU = [rk,j ] is not Toeplitz. For example, the upper leftentry is 1 and the second diagonal entry is 1 + b2 . However, as we next 1 (n)show, the sequence RU becomes asymptotically Toeplitz as n → ∞.If we deﬁne ∞ b(λ) = bk eikλ (6.9) k=0then Bn = Tn (b) (6.10)so that (n) RU = σ 2 Tn (b)Tn (b)∗ . (6.11)
• 6.2. Autoregressive Processes 77 (n)Observe that RU is Hermitian, as all covariance matrices must be.We can now apply the results of the previous sections to obtain thefollowing theorem.Theorem 6.1. Let Un be a moving average process with covariancematrix RUn (n) given by (6.9)–(6.11). Let ρn,k be the eigenvalues of (n)RU . Then (n) RU ∼ σ 2 Tn (|b|2 ) = Tn (σ 2 |b|2 ) (6.12)so that Un is asymptotically stationary. If m = ess inf σ 2 |b(λ)|2 andM = ess sup σ 2 |b(λ)|2 and F (x) is any continuous function on [m, M ],then n−1 2π 1 1 lim F (ρn,k ) = F (σ 2 |b(λ)|2 ) dλ. (6.13) n→∞ n 2π 0 k=0If σ 2 |b(λ)|2 ≥ m > 0, then (n) −1 RU ∼ σ −2 Tn (1/|b|2 ). (6.14) (n)Proof. Since RU is Hermitian, the results follow from Theorems 4.2and 5.2 and (2.3). 2 If the process Un had been initiated with its stationary distributionthen we would have had exactly (n) RU = σ 2 Tn (|b|2 ). (n) −1More knowledge of the inverse RU can be gained from Theorem 5.2,e.g., circulant approximations. Note that the spectral density of themoving average process is σ 2 |b(λ)|2 and that sums of functions of eigen-values tend to an integral of a function of the spectral density. In eﬀectthe spectral density determines the asymptotic density function for the (n)eigenvalues of RU and σ 2 Tn (|b|2 ).6.2 Autoregressive ProcessesLet Wk be as previously deﬁned, then an autoregressive process Xn isdeﬁned by n − k=1 ak Xn−k + Wn n = 0, 1, . . . Xn = (6.15) 0 n < 0.
• 78 Applications to Stochastic Time SeriesAutoregressive process include nonstationary processes such as theWiener process. Equation (6.15) can be rewritten as a vector equationby deﬁning the lower triangular matrix.   1  a 1 0   1  a1 1     An =  .. ..  (6.16)   . .       an−1 a1 1so that An X n = W n .Since (n) (n) RW = An RX A∗ n (6.17)and det An = 1 = 0, An is nonsingular. Hence (n) RX = σ 2 A−1 An n −1∗ (6.18)or (n) (RX )−1 = σ −2 A∗ An . n (6.19) (n)Equivalently, if (RX )−1 = {tk,j } then min(k,j) tk,j = a∗ am−j . m−k m=0Unlike the moving average process, we have that the inverse covariancematrix is the product of Toeplitz triangular matrices. Deﬁning ∞ a(λ) = ak eikλ (6.20) k=0we have that (n) (RX )−1 = σ −2 Tn (a)∗ Tn (a). (6.21) (n)Observe that (RX )−1 is Hermitian.
• 6.2. Autoregressive Processes 79Theorem 6.2. Let Xn be an autoregressive process with absolutely (n)summable {ak } and covariance matrix RX with eigenvalues ρn,k . Then (n) (RX )−1 ∼ σ −2 Tn (|a|2 ). (6.22)If m = ess inf σ −2 |a(λ)|2 and M = ess sup σ −2 |a(λ)|2 , then for anyfunction F (x) on [m, M ] we have n−1 2π 1 1 lim F (1/ρn,k ) = F (σ 2 |a(λ)|2 ) dλ, (6.23) n→∞ n 2π 0 k=0 (n)where 1/ρn,k are the eigenvalues of (RX )−1 . If |a(λ)|2 ≥ m > 0, then (n) RX ∼ σ 2 Tn (1/|a|2 ) (6.24)so that the process is asymptotically stationary.Proof. Theorem 5.3. 2 Note that if |a(λ)| 2 > 0, then 1/|a(λ)|2 is the spectral density of X . n (n)If |a(λ)|2 has a zero, then RX may not be even asymptotically Toeplitzand hence Xn may not be asymptotically stationary (since 1/|a(λ)|2may not be integrable) so that strictly speaking xk will not have aspectral density. It is often convenient, however, to deﬁne σ 2 /|a(λ)|2 asthe spectral density and it often is useful for studying the eigenvalue (n)distribution of Rn . We can relate σ 2 /|a(λ)|2 to the eigenvalues of RXeven in this case by using Theorem 5.2 (d).Corollary 6.1. Given the assumptions of the theorem, then for anyﬁnite θ and any function F (x) continuous on [1/m, θ] n−1 2π 1 1 lim F (min(ρn,k , θ)) = F (min(1/|a(γ)|2 , θ)) dλ. (6.25) n→∞ n 2π 0 k=0Proof. Theorems 6.2 and 5.1. 2 If we consider two models of a random process to be asymptoticallyequivalent if their covariances are asymptotically equivalent, then fromTheorems 6.1 and 6.2 we have the following corollary.
• 80 Applications to Stochastic Time SeriesCorollary 6.2. Given the assumptions of Theorems 6.1 and 6.2, con-sider the moving average process deﬁned by U n = Tn (b)W nand the autoregressive process deﬁned by Tn (a)X n = W n .Then the processes Un and Xn are asymptotically equivalent if a(λ) = 1/b(λ).Proof. Follows from Theorems 5.2 and 5.3 and (n) RX = σ 2 Tn (a)−1 Tn (a)∗ −1 ∼ σ 2 Tn (1/a)Tn (1/a)∗ ∼ σ 2 Tn (1/a)∗ Tn (1/a). (6.26)Comparison of (6.26) with (6.11) completes the proof. 2 The methods above can also easily be applied to study the mixedautoregressive-moving average linear models [33].6.3 FactorizationConsider the problem of the asymptotic behavior of triangular factorsof a sequence of Hermitian covariance matrices Tn (f ) in the Wienerclass. It is well known that any such matrix can be factored into theproduct of a lower triangular matrix and its conjugate transpose ([15],p. 37), in particular ∗ Tn (f ) = {tk,j } = Bn Bn , (6.27)where Bn is a lower triangular matrix with entries (n) bk,j = {(det Tk ) det(Tk−1 )}−1/2 γ(j, k), (6.28)where γ(j, k) is the determinant of the matrix Tk with the right-handcolumn replaced by (tj,0 , tj,1 , . . . , tj,k−1 )′ . Note in particular that thediagonal elements are given by (n) bk,k = {(det Tk )/(det Tk−1 )}1/2 . (6.29)
• 6.3. Factorization 81Equation (6.28) is the result of a Gaussian elimination or a Gram-Schmidt procedure. The factorization of Tn allows the construction of alinear model of a random process and is useful in system identiﬁcationand other recursive procedures. Our question is how Bn behaves forlarge n; speciﬁcally is Bn asymptotically Toeplitz? Suppose that f (λ) has the form f (λ) = σ 2 |b(λ)|2 (6.30) b∗ (λ) = b(−λ) ∞ b(λ) = bk eikλ k=0 b0 = 1.The decomposition of a nonnegative function into a product with thisform is known as a Wiener-Hopf factorization . For a current surveysee the discussion and references in Kailath et al. [17] We have alreadyconstructed functions of this form when considering moving averageand autoregressive models. It is a classic result that a necessary andsuﬃcient condition for f to have such a factorization is that ln f havea ﬁnite integral. From (6.27) and Theorem 5.2 we have Bn Bn = Tn (f ) ∼ Tn (σb)Tn (σb)∗ . ∗ (6.31)We wish to show that (6.31) implies that Bn ∼ Tn (σb). (6.32)Proof. Since det Tn (σb) = σ n = 0, Tn (σb) is invertible. Likewise, sincedet Bn = [det Tn (f )]1/2 we have from Theorem 5.2 (a) that det Tn (f ) =0 so that Bn is invertible. Thus from Theorem 2.1 (e) and (6.31) wehave Tn Bn = [Bn Tn ]−1 ∼ Tn Bn = [Bn Tn ]∗ . −1 −1 ∗ ∗−1 −1 (6.33)Since Bn and Tn are both lower triangular matrices, so is Bn and−1 −1hence Bn Tn and [Bn Tn ]−1 . Thus (6.33) states that a lower triangularmatrix is asymptotically equivalent to an upper triangular matrix. This
• 82 Applications to Stochastic Time Seriesis only possible if both matrices are asymptotically equivalent to a (n)diagonal matrix, say Gn = {gk,k δk,j }. Furthermore from (6.33) we haveGn ∼ G∗−1 n (n) |gk,k |2 δk,j ∼ In . (6.34)Since Tn (σb) is lower triangular with main diagonal element σ, Tn (σb)−1is lower triangular with all its main diagonal elements equal to 1/σ even (n) (n)though the matrix Tn (σb)−1 is not Toeplitz. Thus gk,k = bk,k /σ. SinceTn (f ) is Hermitian, bk,k is real so that taking the trace in (6.34) yields n−1 −2 1 (n) 2 lim σ bk,k = 1. (6.35) n→∞ n k=0 From (6.29) and Corollary 2.4, and the fact that Tn (σb) is triangularwe have that n−1 1 (n)lim σ −1 bk,k = σ −1 lim {(det Tn (f ))/(det Tn−1 (f ))}1/2n→∞ n n→∞ k=0 = σ −1 lim {det Tn (f )}1/2n σ −1 lim {det Tn (σb)}1/n n→∞ n→∞ = σ −1 σ = 1. (6.36)Combining (6.35) and (6.36) yields −1 lim |Bn Tn − In | = 0. (6.37) n→∞Applying Theorem 2.1 yields (6.32). 2 Since the only real requirements for the proof were the existence ofthe Wiener-Hopf factorization and the limiting behavior of the deter-minant, this result could easily be extended to the more general casethat ln f (λ) is integrable. The theorem can also be derived as a specialcase of more general results of Baxter [1] and is similar to a result ofRissanen and Barbosa [21].
• AcknowledgementsThe author would like to thank his brother, Augustine Heard Gray,Jr., for his assistance long ago in ﬁnding the eigenvalues of the in-verse covariance matrices of discrete time Wiener processes, his ﬁrst en-counter with Toeplitz and asymptotically Toeplitz matrices. He wouldlike to thank Adriano Garsia and Tom Pitcher for helping him strugglethrough Grenander and Szeg¨’s book during summer lunches in 1967 owhen the author was a summer employee at JPL during his graduatestudent days at USC. This manuscript ﬁrst appeared as a technical re-port in 1971 as an expanded version of the tutorial paper [12] and wasrevised in 1975. After laying dormant for many years, it was revisedand converted to L TEXand posted on the World Wide Web. That re- Asulted in signiﬁcant feedback, corrections, and suggestions and in manyrevisions through the years. Particular thanks go to Ronald M. Aartsof the Philips Research Labs for correcting many typos and errors inthe 1993 revision, Liu Mingyu in pointing out errors corrected in the1998 revision, Paolo Tilli of the Scuola Normale Superiore of Pisa forpointing out an incorrect corollary and providing the correction, andto David Neuhoﬀ of the University of Michigan for pointing out sev-eral typographical errors and some confusing notation. For corrections, 83
• 84 Acknowledgementscomments, and improvements to the 2001 revision thanks are due toWilliam Trench, John Dattorro, and Young-Han Kim. In particular,Professor Trench brought the Wielandt-Hoﬀman theorem and its use toprove strengthened results to my attention. Section 2.4 largely followshis suggestions, although I take the blame for any introduced errors.For the 2002 revision, particular thanks to Cynthia Pozun of ENSTfor several corrections. For the 2005–2006 revisions, special thanks toJean-Fran¸ois Chamberland-Tremblay, Lee Patton, Sergio Verdu and ctwo very preceptive and helpful anonymous reviewers. Finally, the au-thor would like to thank the National Science Foundation for the sup-port of the author’s research involving Toeplitz matrices which led tothe original paper and report.
• References [1] G. Baxter, “A Norm Inequality for a ‘Finite-Section’ Wiener-Hopf Equation,” Illinois J. Math., 1962, pp. 97–103. [2] G. Baxter, “An Asymptotic Result for the Finite Predictor,” Math. Scand., 10, pp. 137–144, 1962. [3] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compres- sion, Prentice Hall, Englewood Cliﬀs, New Jersey, 1971. [4] A. B¨ttcher and S.M. Grudsky, Toeplitz Matrices, Asymptotic Linear Algebra, o and Functional Analysis, Birkh¨user, 2000. a [5] A. B¨ttcher and B. Silbermann, Introduction to Large Truncated Toeplitz Ma- o trices, Springer, New York, 1999. [6] W. Cheney, Introduction to Approximation theory, McGraw-Hill, 1966. [7] T. A. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991. [8] P. J. Davis, Circulant Matrices, Wiley-Interscience, NY, 1979. [9] D. Fasino and P. Tilli, “Spectral clustering properties of block multilevel Hankel matrices, Linear Algebra and its Applications, Vol. 306, pp. 155–163, 2000.[10] F.R. Gantmacher, The Theory of Matrices, Chelsea Publishing Co., NY 1960.[11] R.M. Gray, “Information Rates of Autoregressive Processes,” IEEE Trans. on Info. Theory, IT-16, No. 4, July 1970, pp. 412–421.[12] R. M. Gray, “On the asymptotic eigenvalue distribution of Toeplitz matrices,” IEEE Transactions on Information Theory, Vol. 18, November 1972, pp. 725– 730.[13] R.M. Gray, “On Unbounded Toeplitz Matrices and Nonstationary Time Series with an Application to Information Theory,” Information and Control, 24, pp. 181–196, 1974. 85
• 86 References[14] R.M. Gray and L.D. Davisson, An Introduction to Statistical Signal Processing, Cambridge University Press, London, 2005.[15] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Se- ries, Wiley and Sons, NY, 1966, Chapter 1.[16] U. Grenander and G. Szeg¨, Toeplitz Forms and Their Applications, University o of Calif. Press, Berkeley and Los Angeles, 1958.[17] T. Kailath, A. Sayed, and B. Hassibi, Linear Estimation, Prentice Hall, New Jersey, 2000.[18] P. Lancaster, Theory of Matrices, Academic Press, NY, 1969.[19] J. Pearl, “On Coding and Filtering Stationary Signals by Discrete Fourier Transform,” IEEE Trans. on Info. Theory, IT-19, pp. 229–232, 1973.[20] C.L. Rino, “The Inversion of Covariance Matrices by Finite Fourier Trans- forms,” IEEE Trans. on Info. Theory, IT-16, No. 2, March 1970, pp. 230–232.[21] J. Rissanen and L. Barbosa, “Properties of Inﬁnite Covariance Matrices and Stability of Optimum Predictors,” Information Sciences, 1, 1969, pp. 221–236.[22] W. Rudin, Principles of Mathematical Analysis, McGraw-Hill, NY, 1964.[23] W. F. Trench, “Asymptotic distribution of the even and odd spectra of real symmetric Toeplitz matrices,” Linear Algebra Appl., Vol. 302-303, pp. 155–162, 1999.[24] W. F. Trench, “Absolute equal distribution of the spectra of Hermitian matri- ces,” Lin. Alg. Appl., 366 (2003), 417–431.[25] W. F. Trench, “Absolute equal distribution of families of ﬁnite sets,” Lin. Alg. Appl. 367 (2003), 131–146.[26] W. F. Trench, “A note on asymptotic zero distribution of orthogonal polyno- mials,” Lin. Alg. Appl. 375 (2003) 275–281[27] W. F. Trench, “Simpliﬁcation and strengthening of Weyl’s deﬁnition of asymp- totic equal distribution of two families of ﬁnite sets,” Cubo A Mathematical Journal Vol. 06 N 3 (2004), 47–54.[28] W. F. Trench, “Absolute equal distribution of the eigenvalues of discrete Sturm– Liouville problems,” J. Math. Anal. Appl., Volume 321, Issue 1 , 1 September 2006, Pages 299–307.[29] B.S. Tsybakov, “Transmission capacity of memoryless Gaussian vector chan- nels,” (in Russian),Probl. Peredach. Inform., Vol 1, pp. 26–40, 1965.[30] B.S. Tsybakov, “On the transmission capacity of a discrete-time Gaussian chan- nel with ﬁlter,” (in Russian),Probl. Peredach. Inform., Vol 6, pp. 78–82, 1970.[31] E.E. Tyrtyshnikov, “Inﬂuence of matrix operations on the distribution of eigen- values and singular values of Toeplitz matrices,” Linear Algebra and its Appli- cations, Vol. 207, pp. 225–249, 1994.[32] E.E. Tyrtyshnikov, “A unifying approach to some old and new theorems on distribution and clustering,” Linear Algebra and its Applications, Vol. 232, pp. 1–43, 1996.[33] H. Widom, “Toeplitz Matrices,” in Studies in Real and Complex Analysis, edited by I.I. Hirschmann, Jr., MAA Studies in Mathematics, Prentice-Hall, Englewood Cliﬀs, NJ, 1965.[34] A.J. Hoﬀman and H. W. Wielandt, “The variation of the spectrum of a normal matrix,” Duke Math. J., Vol. 20, pp. 37–39, 1953.
• References 87[35] James H. Wilkinson, “Elementary proof of the Wielandt-Hoﬀman theorem and of its generalization,” Stanford University, Department of Computer Science Report Number CS-TR-70-150, January 1970 .
• Indexabsolutely summable, 32, 38, 48 20absolutely summable Toeplitz characteristic equation, 5 matrices, 41 circulant matrix, 2, 25analytic function, 16 conjugate transpose, 6, 70asymptotic equivalence, 38 continuous, 17, 21, 22, 33, 39,asymptotically absolutely 41, 48, 49, 52, 54–57, equally distributed, 21 67, 69asymptotically equally dis- continuous complex function, tributed, 17, 56 16asymptotically equivalent ma- convergence trices, 11 uniform, 32asymptotically weakly station- Courant-Fischer theorem, 6 ary, 64 covariance matrix, 1, 63, 64autocorrelation matrix, 64 cyclic matrix, 2autoregressive process, 68 cyclic shift, 25bounded matrix, 10 determinant, 17, 31, 60, 71bounded Toeplitz matrices, 31 DFT, 28 diagonal, 8Cauchy-Schwartz inequality, 13, diﬀerential entropy, 73 88
• INDEX 89diﬀerential entropy rate, 73 circulant, 2, 25discrete time, 63 covariance, 1 cyclic, 2eigenvalue, 5, 26, 31 Hermitian, 6eigenvalue distribution theo- normal, 6 rem, 40, 48 Toeplitz, 26, 31eigenvector, 5, 26 matrix, Toeplitz, 1Euclidean norm, 9 mean, 64 metric, 8factorization, 70 moments, 15ﬁlter, 1 moving average, 65 linear time invariant, 63ﬁnite order, 31 noncausal, 65ﬁnite order Toeplitz matrix, 36 nonnegative deﬁnite, 6Fourier series, 32 nonsingular, 53 truncated, 44 norm, 8Fourier transform axioms, 10 discrete, 28 Euclidean, 9Frobenius norm, 9 Frobenius, 9function, analystic, 16 Hilbert-Schmidt, 8, 9 operator, 8Gaussian process, 73 strong, 8, 9Hermitian, 6 weak, 8, 9Hilbert-Schmidt norm, 8, 9 normal, 6, 8identity matrix, 19 one-sided, 64impulse respone, 65 operator norm, 8information theory, 73 polynomials, 16inverse, 29, 31, 53 positive deﬁnite, 6Kronecker delta, 27 power specral density, 64Kronecker delta response, 65 power spectral density, 63, 64, 73linear diﬀerence equation, 26 probability mass function, 19 product, 29, 31matrix bounded, 10 random process, 63
• 90 INDEX discrete time, 64Rayleigh quotient, 6Riemann integrable, 48, 53, 61Shannon information theory, 73Shur’s theorem, 8spectrum, 53square summable, 31Stone-Weierstrass approxima- tion theorem, 16Stone-Weierstrass theorem, 40strictly positive deﬁnite, 6sum, 29Taylor series, 16time series, 63Toeplitz determinant, 60Toeplitz matrix, 1, 26, 31Toeplitz matrix, ﬁnite order, 31trace, 8transpose, 26triangle inequality, 10triangular, 63, 66, 68, 70, 72two-sided, 64uniform convergence, 32uniformaly bounded, 38unitary, 6upper triangular, 6weak norm, 9weakly stationary, 64 asymptotically, 64white noise, 65Wielandt-Hoﬀman theorem, 18, 20Wiener-Hopf factorization, 63