1.
1 Gaussian Process Regressors for Multiuser Detection in DS-CDMA systems Juan Jos´ Murillo-Fuentes Member, IEEE, and Fernando P´ rez-Cruz Senior Member, IEEE. e e Abstract— In this paper we present Gaussian processes for Linear ﬁlter receivers, based on zero-forcing (decorrelator)Regression (GPR) as a novel detector for CDMA digital com- or minimum mean-squared error (MMSE) criteria [7], [8], [9],munications. Particularly, we propose GPR for constructing [1], are widely used to avoid the computational complexity ofanalytical nonlinear multiuser detectors in CDMA communi-cation systems. GPR can easily compute the parameters that nonlinear solutions, although they are suboptimal and clearlydescribe its nonlinearities by maximum likelihood. Thereby, no underperform in some fairly standard scenarios. Regularizationcross-validation is needed, as it is typically used in nonlinear [10] has been extensively used to improve the detectionestimation procedures. The GPR solution is analytical, given capabilities of linear MUD, when the empirical correlationits parameters, and it does not need to solve an optimization matrix is ill-conditioned because short sequences have beenproblem for building the nonlinear estimator. These propertiesprovide fast and accurate learning, two major issues in digital transmitted. Blind algorithms [11], [12], [13] make use ofcommunications. The GPR with a linear decision function can regularizations tools as ridge regression [14] and in [15] thebe understood as a regularized MMSE detector, in which the authors review parametric regularized MMSE detectors forregularization parameter is optimally set. We also show the CDMA and propose several extensions. However, these papersGPR receiver to be a straightforward nonlinear extension of the do not detail how the regularizer shape and its parameterslinear minimum mean square error (MMSE) criterion, widelyused in the design of these receivers. We argue the beneﬁts of are selected and standard methods for setting them, as cross-this new approach in short codes CDMA systems where little validation [16], typically need long training sequences. In [12]information on the users’ codes, users’ amplitudes or the channel the regularization is blindly estimated using a neural network,is available. The paper includes some experiments to show that although the channel is memoryless.GPR outperforms linear (MMSE) and nonlinear (SVM) state-of- Nonlinear detectors enhance the linear detectors by approxi-the-art solutions. mating the optimal solution at a lower complexity cost. There are two major approaches in machine learning that can be I. I NTRODUCTION followed to obtain nonlinear detectors. On the one hand, if full information on the system is available, we may estimate In direct-sequence code-division multiple-access (DS- the symbols transmitted by each user and then cancel theCDMA) systems [1], [2], [3] the different users transmit over interference (IC) over the user of interest (UoI). These methodsthe same frequency band and during the same time slot using usually exhibit near-optimum performance at the expense ofdifferent codes, known as spreading codes. Multiuser detectors perfect information on the number of users, users’ signatures,aim to recover the information of one or more users. There are channel impulse response (CIR), . . . Furthermore, most oftwo main aspects a multi-user detection (MUD) receiver for them are based on equal transmitted power for each user, asDS-CDMA must face: near-far [1] and multipath [4] problems. they assume perfect power control. The detection process inThe former is grounded in the nonorthogonality of the users’ the uplink in wireless cellular CDMA systems can be roughlycodes and the high signal-power deviation from user to user. matched to this scenario and, in statistical inference, theseThe latter causes inter-symbolic interference (ISI). The optimal learning problems are commonly referred to as generativesolution to MUD in CDMA communication systems is known modeling. Generative modeling concentrates on describing theto be nonlinear [5], [6]. This nonlinearity is stronger for dependencies between all the different variables (sources ofshort spreading codes and less pronounced for longer codes. information), so as to understand the correlations betweenFurthermore, the ﬁnite-response multipath-mitigation equal- the variables, gain knowledge about causal interactions, andizer and the radiofrequency ampliﬁer at the receiver enhance answer any query using the constructed model. Graphical mod-the nonlinear nature of the optimal multi-user detector. The els [17] and density estimation [14] belong in this category.complexity of the optimal nonlinear MUD receiver [5] grows And good examples of multiuser detectors using generativeexponentially with the number of users (joint Viterbi decoding modeling as IC can be found in [18]-[22].[1]), which limits its practical applicability. On the other hand, we may have little knowledge of the This work was partially funded by Spanish government (Ministerio de Ed- whole communication system. We may not know the numberucaci´ n y Ciencia TEC2006-13514-C02-01,02/TCM and Consolider-Ingenio o of users, nor the other users’ signatures, nor the CIR; and2010 CSD2008-00010) and the European Union (FEDER). Fernando P´ rez- e the users are received with different signal strength. In thisCruz is supported by Marie Curie Fellowship 040883-AI-COM. F. P´ rez-Cruz is at the Electrical Engineering Department in Princeton e scenario, typically encountered in the downlink of wirelessUniversity. Princeton, NJ. He is also an Associate Professor at Universidad cellular CDMA systems, the objective is to detect the incomingCarlos III de Madrid. E-mail: fp@princeton.edu symbols for the UoI. These estimates might not be reliable to J.J. Murillo-Fuentes is with the Dept. Teor´a de la Se˜ al y Comunica- ı nciones, Escuela Superior de Ingenieros, Universidad de Sevilla, Paseo de los apply IC techniques or we might not have enough knowledgeDescubrimientos s/n, 41092 Sevilla, Spain. E-mail: murillo@us.es to estimate and suppress the other users’ symbols. In statistical
2.
2inference, we resort to discriminative modeling to provide the or parameters, which can be suboptimal for each individualmost accurate answer for a given supervised learning problem. instantiation of our problem. These characteristics translateDiscriminative modeling focuses on solving a given task (e.g., to shorter training sequences and improved convergence forregression, classiﬁcation) without trying to model the underly- GPR-based CMDA receivers.ing probability density of the data [23]. By not fully modeling Linear GPR detectors can be presented as a regularizedthe data density this approach is limited in its ability to provide linear MMSE and its regularization parameter can be learnt, asinterpretations and answer arbitrary queries about the data. well, by maximum likelihood. Consequently, we avoid settingHowever, the predictive performance of discriminative models the regularization parameter by hand and rely on the trainingis superior to that of generative models. The applicability and sequence to optimally set it. The linear GPR detector outper-degree of success of either approach depends on the available forms linear MMSE receivers for short training sequences, asinformation at the receiver. In the discriminative group we shown in the experimental section, because its regularizationhave tools such as Gaussian processes [24] and support vector parameter is optimum for every training sequence. We alsomachines [25]. For DS-CDMA, successful approaches have address its computational complexity, to show it is identicalbeen proposed in the literature using nonlinear discriminative to that of the linear MMSE detector, and hence it can be usedmodeling [26]-[30]. In [29] a multilayered perceptron (MLP) in any low complexity decoder.was applied to enhance the performance of linear detectors. Finally, we can understand the nonlinear GPR as a linearHowever, training was time consuming and its convergence GPR in which the data has been previously nonlinearly trans-rate was unpredictable. A MUD receiver was successfully de- formed to a high-dimensional space [24]. Thereby the non-signed using a Gaussian radial basis function network (RBFN) linear GPR is presented as a nonlinear regularized Bayesian[26] that signiﬁcantly improved training times compared to MMSE and its natural extension for building nonlinear re-[29]. Neither of them considered multipath wireless channels. ceivers. The complexity of training a full GPR solution isThere have been several proposals to jointly address MUD and demanding and it can be a limiting factor for its use over fast-channel equalization: RBFN and Volterra series [30], [27]; fading channels. Nevertheless, recent advances in GPR trainingand, more recently, Support Vector Machines (SVMs) [28]. can be applied for designing low complexity receivers.SVMs were developed from well-founded learning theoryresults [23] and they present features such as: a convex The remainder of this paper is organized as follows. Wefunctional and universality [31], which makes them a desirable introduce DS-CDMA systems and the linear MMSE-MUDtool for solving general nonlinear detection problems. All these receiver in Section II. We present the GPR analytical solutionnonlinear methods solve an optimization functional to build a in Section III and describe its likelihood function for thenonlinear MUD receiver, in which either the architecture or parameters. In Section IV we focus on the design of a MUDthe parameters have to be prespeciﬁed. They cannot be learnt receiver for DS-CDMA using GPR and its interpretation asfor each MUD receiver individually, because standard cross- a regularized nonlinear MMSE receiver. We report in Sectionvalidation techniques [32], typically used for this purpose, are V the performance of GPR-MUD compared to state-of-the-artnot feasible in digital communication systems, as described in linear and nonlinear receivers, using a DS-CDMA system inSection III. different scenarios. We conclude with some ﬁnal remarks and Our proposal is motivated by the need of improving syn- proposed further work in Section VI.chronous DS-CDMA receivers for multipath channels whenthe only available information is the spreading code of theUoI and a short training sequence. We present a novel dis-criminative model based on Gaussian processes for regression II. S YSTEM MODEL AND MUD(GPR) [33], [34], [24]. The GPR can be used to design linearand nonlinear detectors and we examine both structures in thispaper. We have shown some preliminary results applying GPR A. The DS-CDMA system modelfor MUD in DS-CDMA systems in [35], [36]. In this paper,we provide a deeper understanding of GPR for CDMA and In this paper we focus on synchronous DS-CDMA [1],consider its properties in detail. We extend our investigations [37], and we assume all users transmit at the same symbol-to include a GPR computational complexity analysis and its rate using a BPSK modulation. Although, the obtained resultsrelation to linear and nonlinear regularized MMSE. We also can be readily generalized to other scenarios, as asynchronousface multipath channels. DS-CDMA, different rates, or modulations. In the discrete The GPR solution for MUD receivers is advantageous in chip-rate-sampled baseband synchronous DS-CDMA modelseveral ways, when compared to other nonlinear machine proposed in [1], we transmit K symbols (one per user) everylearning tools (e.g. MLPs, RBFNs and SVMs). First, the GPR Ts sec., bt = [bt (1), bt (2), ..., bt (K)] . Each user’s symbol,solution is analytical, given its parameters. There is no need bt (j), is ampliﬁed by a different aj , i.e., in the downlink of afor solving a complex optimization problem at this stage. mobile (cellular) communication system larger amplitudes areSecond, Given a training dataset, a likelihood function for assigned to users further away, causing the near-far problem tothe GPR parameters can be stated and, hence, its parameters users closest to the base station. Finally, each user is multipliedcan be optimally set (in maximum likelihood sense). MLPs, by its spreading code sj , which is a sequence of N pseudo-RBFNs or SVMs need to specify beforehand its structure random binary values regarded as chips.
3.
3 The N -chip signal at the receiver end yields: III. G AUSSIAN P ROCESSES FOR R EGRESSION A. The GPR solution SA 0 ... 0 bt Gaussian Processes for regression (GPRs) [33], [34], [24] .. . b . 0 SA . . t−1 is a Bayesian technique for nonlinear regression estimation, rt = H . · +nt .. .. . . which can be extended for classiﬁcation problems [39], [40]. . . . . . 0 bt−M +1 It assumes a zero-mean GP prior over the space of possible 0 . . . 0 SA functions and a Gaussian likelihood model. The posterior can = P · vt + nt , (1) be analytically computed, it is a Gaussian density function, and the predictions given by the model are also Gaussians.where S is an N × K matrix whose columns contains the Given a labelled training data set (D = {xi , yi }L , where i=1spreading codes, A is a K × K diagonal matrix containing the input xi ∈ RD and the output yi ∈ R) and a newthe user’s amplitudes and nt is an N M -dimensional column- input location x∗ , we aim to predict the probability distribu-vector with additive white Gaussian noise (AWGN) of variance tion for its output y∗ , i.e., p(y∗ |x∗ , D). In GPs we assume 2 a generalized linear regressor model for y with Gaussianσn I. 2 We have pre-multiplied the received chips by H to incor- noise: p(y|x, w) = N (y; w φ(x), σν ), where φ(·) deﬁnes aporate the effect of a multipath channel in the CDMA system, nonlinear transformation of the input space, and a zero-mean 2[28]. Throughout the paper, we consider a time-invariant Gaussian prior over w, p(w) = N (w; 0, σw I). The predictedchannel with inter-symbolic interference (ISI) characterized output, p(y∗ |x∗ , D) = N (y∗ ; µy∗ , σy∗ ), yieldsby its discrete channel impulsive response h(z) of length (or µy∗ = φ (x∗ )µw = k C−1 y (3)order) Mc in chip periods. The length of the channel response 2 −1times the chip period is the maximum delay considered in our σy∗ = k(x∗ , x∗ ) + k C k (4)multipath model. The matrix P in (1) summarizes the effect of where µw is the MAP estimate of w, beingthe channel, the spreading codes and the different amplitudes 2for each user, and vt = [bt , bt−1 . . . bt−M +1 ] is a KMs - (C)ij = k(xi , xj ) + σν δij (5)dimensional vector including the transmitted bits. the covariance matrix of the Gaussian process and k = [k(x∗ , x1 ), . . . , k(x∗ , xn )] . The function k(·, ·) is known as the kernel function of the nonlinear transformation φ(·), as itB. Multi-User Detection represents the inner product in the transformed feature space: 2 k(xi , xj ) = σw φ (xi )φ(xj ). (6) At the receiver, we learn the system model to estimate thetransmitted symbols. Usually, a training sequence is trans- The covariance matrix is also referred to as the kernel matrix.mitted to solve this task. In wireless communications short We use both terms as synonyms in the paper, as they aretraining sequences are mandatory [38] to increase the number typically used in the machine learning literature [31], [41],of information bits. The objective for the DS-CDMA MUD [24]. The nontrivial steps needed to obtain (3) and (4) arereceiver is to recover the transmitted bit for a particular user, detailed in [34]. It is important to point out that the term C−1 ythe UoI. We can use the received vector in (1) or the projection in (3) is computed only once using the set of training samplesof this vector onto the spreading codes, if available. Linear and then used to estimate the output for any test input sample.MUDs are useful when the ISI is negligible and the codesare quasi-orthogonal. When the multipath effect and the near- B. Covariance Matrixfar problem are strong, the optimal detector becomes highly If either φ(·) or k(·, ·) are known, we can analyticallynonlinear. The nonlinearity of the detector is signiﬁcantly predict the output of any incoming sample. But for mostmore disruptive for short spreading codes. In these scenarios regression problems the best nonlinear transformation (or itsnonlinear detectors are useful. Nonlinear MUD for DS-CDMA kernel) is unknown. We need to describe a parametric kernelestimate the symbol of the UoI as ˆ∗ (j) = f (r∗ ). If we knew b that can be adjusted for each regression problem. Kernelall the 2KMs possible received noise free states, we could design is one of the most challenging open problems inderive a MUD by studying a Bayes-optimal classiﬁer [28]. machine learning [24], as we need to incorporate our priorThis optimal one-shot detector is given by: knowledge into the kernel and, at the same time, we want the kernel to be ﬂexible to explain previously unknown trends in 2KMs ˜(i) (j) 2 ˆ∗ (j) = sign b r − Pv(i) the data. The use of a parametric kernel function leads us to the b √ exp − , 2πσn 2 2σn problem of ﬁnding the optimal setting of its hyperparameters1 . i=1 For example, if we know the optimal solution to be linear, we (2) 2 could use the linear kernel: k(xi , xj ) = σw xi xj , in whichwhere ˜(i) (j) is the class label {±1} for the ith noise free state b φ(x) = x. In this case, the only unknown hyperparametersPv(i) . This structure resembles that of the Gaussian RBFN 2 2 would be σν and σw in (5)-(6), as we do not need to knowused in [26], and it suggests gaussian kernels in SVM, asdescribed in [28]. But its complexity is exponential in the 1 We refer to the kernel parameters as hyperparameters to distinguish themnumber of users and the length of the channel response. from the parameters of the regression model (the ws).
4.
4these variances a priori. In general, kernel functions are more IV. T HE GPR-MUD DETECTORcomplex than the linear one and they incorporate several A. The GPR-MUDhyperparameters. To set the hyperparameters of the covariancefunction for each speciﬁc problem, we can proceed as we The GPR mean prediction in (3) can be directly used as ado for the parameters w. We ﬁrst deﬁne a prior over these nonlinear multi-user detector,hyperparameters, then compute its posterior using Bayes rule, ˆ∗ (j) = sign(φ (x∗ )µw ) = sign(k C−1 y). b (7) jand ﬁnally integrate them out to obtain predictions. However inthis case, the posterior is non-analytical. Hence the integration This GPR estimate resembles that of a linear detector. It has ahas to be done either by sampling or approximation. Although weight vector, either µw or C−1 y, multiplied by a nonlinearthis approach is well principled [34], it is computational transformation, either φ(x∗ ) or k, of the input to be predicted.intensive and it is not feasible for communications systems. The output of the GPR detector is the prediction of the bitAlternatively, we can use the likelihood function of the hyper- transmitted by the UoI (yi = bi (j)). The input to the GPR areparameters and compute its maximum to obtain its optimal Q consecutives samples from the channel:setting, which is used to describe the kernel for the test xi = [ri , ri−1 , . . . , ri−Q+1 ] , (8)samples [34]. We use this second approach, because it isless computational demanding [24]. Moreover, as the number where ri are the N chips received at time step i in (1). If theof training samples increases the posterior distribution peaks codes of the other users are available, we could ﬁrst projectaround its maximum likelihood estimate and the solutions of the received chips onto them as follows,both approaches should not differ signiﬁcantly. xi = [ri S, ri−1 S, . . . , ri−Q+1 S] . (9) Next, we need to design the most suitable kernel functionC. Discussion for our GPR-MUD receiver, which should be able to capture our prior knowledge and allow extracting unknown interac- Gaussian Processes for regression is a general nonlinear tions. We propose, for the GPR-MUD, the following versatileregression tool that, given the covariance function, provides covariance functionan analytical solution to any regression estimation problem. dMoreover, it does not only give point estimates, but it also (Cθ )ij = k(ri , rj ) = α1 exp −α3 (ri ( ) − rj ( ))2assigns conﬁdence intervals for them. We perform the op- =1timization step to set the hyperparameters of the covari- + α2 ri rj + α0 δij , (10)ance function by maximum likelihood, unlike SVM or othernonlinear machine learning tools, in which the optimization where the αi weights need to be nonnegative for Cθ tois used to set the optimal parameters. In SVM and other be positive-deﬁnite and, to deal with an unconstrained opti-tools the hyperparameters have to be either prespeciﬁed or mization problem, we deﬁne a vector of hyperparameters asestimated by cross-validation [32]. Cross-validation optimizes θ = [log α1 , log α2 , log α3 , log α0 ]. This covariance functionseveral functionals, typically less than 10, for each possible contains three terms. The second term is the linear covariancesetting of the hyperparameters [14]. The number of hyper- function. Therefore, the GPR model contains as a particularparameters that can be tuned is quite limited (at most 2 case the linear regressor (α1 = 0). The third term correspondsor 3), as the computational complexity of cross-validation 2 to σν δij in the deﬁnition of C in (5), which is considered asincreases exponentially with the number of hyperparameters. an extra hyperparameter of the covariance function. The ﬁrstThese remarkable drawbacks limit the application of these term is a radial basis kernel with the same length-scale fornonlinear tools in digital communications receivers, since we each input dimension, since in DS-CDMA all chips affect theface complex nonlinear problems with reduced computational solution in the same manner.resources and short training sequences. By exploiting the GPs The covariance function in (10) is a good kernel for solvingframework, as stated in this paper, we can avoid them to build the GP-MUD, because it contains a linear and a nonlinearprecise nonlinear receivers for DS-CDMA applications. part. The optimal decision surface for MUD is nonlinear. As mentioned earlier, the Gaussian processes framework can However, in many cases, a linear detector is close to optimalbe extended for solving classiﬁcation tasks. Gaussian process as spreading codes are almost orthogonal to each other and tofor classiﬁcation (GPC) solution is non-analytical and we need their delayed replicas. Typically, a minor nonlinear correctionto approximate its posterior distribution to make posterior allows for optimal decisions. In this sense the proposed GPRprobability predictions for the new samples and to train its covariance function is ideal for this problem. The linear parthyperparameters. These approximations are computationally can mimic the best linear decision boundary and the nonlinearintensive and make GPC harder to train than GPR. Moreover, part modiﬁes it, where the linear explanation is not optimal.as detailed in [24] (Chapter 6), in many cases GPR performs Using a radial basis kernel for the nonlinear part is a goodequally well to GPC, if we are only interested in minimizing choice to achieve optimal nonlinear decisions. Because, thethe misclassiﬁcation rate. Therefore, although GPC seems received chips form a constellation of 2KMs clouds withthe natural tool for solving this task, we decided to use Gaussian spread around their centers. But we do not need toGPR because it is less computationally demanding and its have exponentially many of them, as the linear terms explainsmisclassiﬁcation rate is similar to that of GPC. away most of the received symbols.
5.
5B. The GP solution: a nonlinear MMSE good example of a regularized solution is the CMOE detector The MMSE criterion minimizes: in [11], fmmse (·) = argmin E (b − f (r)) 2 (11) wcmoe = (RR /L + αI)−1 R · b/L, (18) f (·) where α is set using an ad-hoc procedure. Others examples ofin which b and r are, respectively, the transmitted and received more involved methods for regularization can be found in [22],symbols and f (·) is the function that estimates b from r. It [15]. A common drawback of these approaches is that they dois widely known [42] that the minimum of (11) for AWGN not clearly indicate how the regularization parameter is set andchannels is given by the conditional mean of b given r: how it affects the solution. GPR provides a simple method, ˆ∗ = fmmse (r∗ ) = E [b∗ |r∗ ] , b (12) based on maximum likelihood, to compute the regularization. GPR trains its regularization parameter as to automaticallywhich is a nonlinear function of the inputs, unless the inputs choosing the best linear detector between the matched ﬁlter 2 2 2 2are Gaussian distributed. (σν /σw → ∞) and the SMMSE (σν /σw = 0), improving the In DS-CDMA, linear MMSE estimators are typically used, performance of the MUD-SMMSE.due to their simplicity and nearly optimal results in many The nonlinear GPR-MUD inherits this property from thestandard scenarios. The linear, f (r) = r w, MMSE detector linear case, as the nonlinear GPR can be understood as a linearcan be expressed as a Wiener ﬁlter [1]: solution in a transformed space. Therefore, if the proposed 2 −1 kernel is ﬂexible to approximate any desired function, the wmmse = argmin E b−r w = (Rrr ) Rrb , (13) GPR predictions are optimal in the MMSE sense. The kernel w we proposed in Section IV-A is universal and it is able towhere Rrr and Rrb are, respectively, the autocorrelation of approximate (12) and produce optimal decisions.the inputs and the cross-correlation between the inputs andoutputs. Under mild conditions, the MMSE linear detector can be C. Computational loadbuilt from the spreading codes of every user and the channel’s GPR uses the ‘kernel trick’ [31], [41], [24] to achieveimpulsive response: nonlinear solutions without explicitly describing its nonlinear transformation to the feature space, which can be inﬁnite wmmse = (PP + σn I)−1 pj , 2 (14) dimensional. The computational complexity of GPRs is limitedwhere pj denotes the jth column of P. If the other users’ by the inversion of the covariance matrix, which is O(L3 ).spreading codes or the channel state information are unknown, This inversion is needed in every step of any gradient-basedwe need to replace Rrr and Rrb by their sampled versions: optimization method to estimate the hyperparameters [24] in the training stage. Then, the result of the training stage is wsmmse = (R · R /L)−1 R · b/L, (15) used to estimate the output for any new received data at awhere the columns of R are the received chips at every symbol computational cost given just by the computation of k in (3).in the training set. We denote this solution as sampled MMSE For large training sequences this computational complexity(SMMSE). can be prohibitively large and we need to approximate the The GPR-MUD estimates the output for a new incoming GPR solution to deal with thousands of samples, as we showsample using (7), which can be simpliﬁed if we use a linear in short. In the linear GPR solutions, if the input dimensionkernel (φ(r) = r) to: is lower than the number of training samples (L > D), the computational complexity reduces to O(D3 ) as we derive next, 2 2 2 −1 wgpr = µw = σw R σw R R + σν I b. (16) which makes the complexity for this algorithm independent of the number of training examples and similar to that of the After some algebraic transformations we can express the linear SMMSE-MUD.linear GPR solution, up to a scaling constant, as 1) The general case: There are several proposals in the 2 −1 literature to reduce the complexity of the GPR training and σν wgpr = RR + 2 I Rb. (17) prediction stages [43]-[47]. A detailed review of all these σw methods can be found in [48]. The main motif of theseSince RR and Rb are, respectively, the empirical estimates approaches is representing the covariance matrix of the GPof the correlations matrices Rrr and Rrb , this expression is with a reduced set of J samples (or pseudo-samples) much 2 2 smaller than L, with the same accuracy as the whole trainingequal to (15) except for the term σν /σw I. This term endowsthe SMMSE solution in (15) with ridge regularization [14]. dataset.Ridge regression is a quite standard tool to avoid overﬁtting The algorithms in [43]-[47] propose different ways ofand improve the performance of the SMMSE detector. The selecting the J samples in the reduced set. Some of them trainregularization term must fade away as the number of training the locations of the pseudo-samples. This allows obtaining asamples increases [10], thereby achieving the optimal solution further reduced set at a larger computational complexity duringfor inﬁnite samples. For short training sequences, the estimated training. Other approaches use heuristics to select the pseudo-correlation matrices represent poorly Rrr and Rrb and the samples from the training set. Also, the pseudo-samples canregularization term should be kept high to avoid overﬁtting. A be choosen incrementally until a desired accuracy is achieved.
6.
6 2) Linear GPR: If we are interested only in linear detectors, SMMSEthe complexity of each iteration of the optimization procedure CMOEcan be signiﬁcantly reduced from O(L3 ) to O(min{L, D}) SVM −1 10 CMTwithout loss of accuracy. We propose to run a very simple GPRalgorithm for computing the linear GPR-MUD. For a linearkernel the covariance function becomes: −2 10 BER Cθ = eθ1 X X + eθ2 I = eθ1 X X + eθ2 −θ1 I , (19)where the columns of X are the sample inputs in (8) and θ1 , θ2 −3are the to be estimated hyperparameters. We can write X X 10in terms of its eigenvalues: X X = UΛU , (20) −4 10where U is an orthonormal matrix, containing the eigenvectors 0 5 10 15 20 25of X X, and Λ is a diagonal matrix with its eigenvalues. This SNR(dB)representation is useful to compute the inverse of Cθ as −1 C−1 = e−θ1 U Λ + eθ2 −θ1 I θ U . (21) Fig. 1 BER ALONG THE SNR IN A CDMA SCENARIO WITH K = 10 USERS AND This eigenvalue-eigenvector representation simpliﬁes the G OLD SEQUENCES WITH N = 31 FOR LINEAR SVM (∗), CMT ( ),gradient with respect to θi , as follows: CMOE ( ), SMMSE (◦), AND LINEAR GPR ( ) MUD FOR L = 64 D D TRAINING SAMPLES . ∂l(θ) 1 λi 1 λi zi e−θ1 2 = + (22) ∂θ1 2 i=1 λi + eθ2 −θ1 2 i=1 (λi + eθ2 −θ1 ) 2 D D ∂l(θ) 1 eθ2 −θ1 1 eθ2 −2θ1 zi 2 SMMSE = θ2 −θ1 + , (23) CMOE ∂θ2 2 λi + e 2 (λi + eθ2 −θ1 ) 2 i=1 i=1 SVM −1 10 CMTwhere z = U y. At most, the number of nonzero eigenvalues GPRis min(L, D). We have used D in the previous equations asin most cases L > D. −2 10 The complexity of the linear SMMSE-MUD is O(D3 ) as BERit needs to invert a D × D matrix. The complexity of trainingthe GPR is linear in D, once we have computed its eigenvalue −3decomposition which is O(D3 ). Each optimization step of 10the GPR is insigniﬁcant with respect to the matrix inversion.Therefore, the complexity of the GPR detector with a linearkernel is of the same order of magnitude as that of the −4 10SMMSE-MUD. 50 100 150 200 250 300 350 Training Samples V. E XPERIMENTAL R ESULTSA. Regularization Fig. 2 We ﬁrst propose the same scenario as in [15] with K = 10 BER ALONG THE NUMBER OF TRAINING DATA IN A CDMA SCENARIOactive users and Gold spreading sequences of length N = 31. WITH K = 10 USERS AND G OLD SEQUENCES WITH N = 31 FOR LINEARThe amplitudes of the interferer were equal to that of the user SVM (∗), CMT ( ), CMOE ( ), SMMSE (◦), AND LINEAR GPR ( )of interest. We include the average for 300 simulated chip- MUD FOR A SNR=15 D B.spaced channels of length Mc = 15 with equally distributedzero-mean random Gaussian for the channel paths. We usedQ = 2 in (8). We compare the BER of the SMMSE detector in (15) to a function of the SNR for L = 64 samples in Fig. 1 and thethe GPR with linear kernel (α1 = 0 in (10)), the SVM with BER as a function of the training samples for an SNR of 15dBlinear kernel and soft margin 0.5, the CMOE in (18) and the in Fig. 2.tappered regularized method (CMT) in [15]. The regularization The linear GPR-MUD receiver clearly outperforms the otherparameters for the CMOE and the CMT where set as described detectors for short training sequences for the SNR range ofin [49], where the covariance matrix was estimated with L = interest, see Fig. 1. The SVM is the second best procedure,372 samples. The training sequences were generated randomly although its performance is signiﬁcantly poorer than that of thefor every channel. The BER was estimated with test inputs GPR detector. The CMOE and CMT use a ﬁxed regularizationdifferent from the training sequences. We report the BER as procedure that precludes them to perform well for all training
7.
7 1.5sequences and SNRs, which is a severe limitation as thesedetectors must perform well in many different scenarios. TheSMMSE convergence is signiﬁcantly slower than that of the 1GPR and, for short training sequences, its BER is orders ofmagnitude above the GPR-MUD detector. 0.5 In Fig. 2, we compare the performance of the differentmethods as the number of training samples increases. Aswe expected, once the number of training samples is long xt(2) 0enough, the SMMSE and GPR detectors tend to coincide. Butfor shorter training sequences the GPR-MUD receiver learns −0.5much faster than the SMMSE one. This is a very importantfeature, as the longer the training sequence needs to be thefewer information bits we can transmit in each burst of data. −1The linear SVM solution follows the GPR although there is aconstant gap between its solution and the GPR; this is due −1.5to the ﬁxed soft margin parameter used for all lengths of −2 −1.5 −1 −0.5 0 0.5 1 1.5 2the training sequence. The CMOE and CMT present poor xt(1)performances, even for L = 372. This experiment illustratesone of the major advantages of GPR detectors for CDMA Fig. 3communications: they can tune their hyperparameters to ﬁnd D ECISION BOUNDARY IN A CDMA SCENARIO WITH 2 USERS , 160the best regularized linear solution. Its capability of resorting TRAINING SAMPLES AND A SNR OF 9 D B FOR SVM ( DOTTED ), GPto nonlinear solutions is its other main advantage, as illustrated ( SOLID ) AND THE OPTIMUM ONE - SHOT ( DASH - DOTTED ) CENTRALIZEDnext. MUD. We now consider a detector in a simple situation for K = 2users transmitting with the same power. The spreading factoris N = 4 and the spreading codes are [+1 + 1 − 1 − 1] and[+1 − 1 − 1 + 1]. The UoI corresponds to the second code, We repeat Example 2 in [28], where the spreading factori.e. j = 2, and the channel impulse response is given by is N = 8 and we have K = 3 users with equal power. We report the BER for Users 2 and 3. The spreading codes are, c(z) = 0.3 + 0.7z −1 + 0.3z −2 . (24) respectively, [+1 +1 +1 +1 −1 −1 −1 −1], [+1 −1 +1 −1The inputs were as in (9) with Q = 1. Both the SVM and the −1 + 1 − 1 + 1] and [+1 − 1 − 1 + 1 −1 + 1 + 1 − 1]GPR use a Gaussian kernel (α2 = 0 in (10)). Following [28] and the channel response is given by:for the SVM, the width of the kernel is set to the noise standard c(z) = 0.4 + 0.9z −1 + 0.4z −2 . (25)deviation and the soft margin to 0.6. In Fig. 3 we depict thedecision boundaries for a signal to noise ratio (SNR) of 9 dB. For comparison purposes, we include the BER for theWe include the optimal one-shot detector in (2) (dash-dotted), SVM-MUD in [28] and SMMSE-MUD. Unlike [28], werethe SVM-MUD (dotted) and the GPR-MUD (solid). The bit the authors use the chips projected onto the users spreadingerror rate (BER) achieved by the SVM-MUD and the GPR- codes, we use the chips as inputs (8) with Q = 1. WeMUD are similar, although the boundaries are quite different. also include the BER for the ideal case, memoryless channelWe observe that the SVM-MUD is unable to generalize without interfering users (dashed). The covariance matrix ofproperly. The SVM boundary is sinuous, causing more than the GPR-MUD is given by (10). The SVM-MUD is trainedtwo decision regions, and in an eventual decrease in SNR, using a Gaussian kernel with its width equal to 3 times theit would present a poor BER performance. In comparison, noise standard deviation and the soft margin parameter 2 setGPR-MUD would degrade its performance gracefully, due to to 0.6.its regularized solution. In Fig. 4 we report the BER along the SNR. We depict the averaged results for 1000 independent experiments with 105 test samples and the same 32 training samples in eachB. A ﬂexible solution run with different noise. For such short training sequence, the In the previous experiments the kernels were either lineal SVM cannot learn a good classiﬁer and it is outperformed byor nonlinear. Since in digital communications the transmitter, the GPR-MUD and the linear SMMSE-MUD. The GPR, asradio channel and receivers are typically close to linear sys- the SVM, does not have enough training examples to buildtems, we propose the kernel in (10) to estimate the incoming a nonlinear classiﬁer. However, the GPR is able to “see” thatbits. The linear part of this covariance matrix allows a fast the training sequence is too short to train a nonlinear classiﬁerlearning of the system response and the nonlinear part adapts and it resorts to the linear SMMSE solution.the linear solution to accommodate the nonlinearities. Using 2 In [28] the authors do not report the width used in the experiments, butthe same kernel in other machine learning tools, such as SVM, they say it is related to the noise standard deviation. We found that 3 timesis cumbersome, as we have a large number of hyperparameters the standard deviation of the noise for the kernel width provided good resultsto learn by means of cross-validation. for the SVM-MUD.
8.
8 −1 SMMSE −1 SMMSE 10 10 GPR GPR SVM SVM −2 Ideal −2 Ideal 10 10 −3 −3 10 10 BER BER −4 −4 10 10 −5 −5 10 10 −6 −6 10 10 −7 −7 10 10 10 15 20 25 30 10 15 20 25 30 SNR(dB) SNR(dB) Fig. 4 Fig. 5BER FOR U SER 2 IN E XPERIMENT 2: A CDMA SCENARIO WITH K = 3 BER FOR U SER 3 IN E XPERIMENT 2: A CDMA SCENARIO WITH K = 3USERS , N = 8 AND 32 TRAINING SAMPLES . SMMSE (◦), SVM (∗) AND USERS , N = 8 AND 100 TRAINING SAMPLES . SMMSE (◦), SVM (∗) AND GPR ( ) MUD. GPR ( ) MUD. In Fig. 5, we plot the BER for User 3 averaged over 1000 outperform these nonlinear classiﬁers for MUD in CDMA.experiments with 105 test samples and 100 random training They provide better solutions at shorter training sequences.examples. The optimal decision boundary is nonlinear. This We have shown that the GPR solution can be understoodscenario is illustrative of the nice properties of the GPR as a nonlinear MMSE. The linear part of the GPR-MUDdetector compared to the SVM-MUD receiver. For low SNR performs as the linear MMSE for large training sequences.both the GPR and SVM-MUD obtain a nonlinear detector For short training sequences, the GPR-MUD outperforms thethat outperforms the linear SMMSE. For high SNR and short linear MMSE-MUD, because it trains its regularization hy-training sequence, the nonlinear algorithms are unable to perparameter to accommodate the received training sequence.improve the linear solution. The GPR-MUD mimics the linear The GPR-MUD receiver with linear kernel is able to optimallySMMSE and the SVM solution degrades, unable to improve set the regularization parameter, instead of relying on ﬁxed orthe linear SMMSE detector. ad-hoc procedures for selecting it. Since for real scenarios the number of training samples is The linear GPR-MUD, as presented in this paper, can belimited, the GPR provides optimal results either by obtain- implemented directly for fast-fading multi-path channels anding the best nonlinear detector or by mimicking the linear its computational complexity is similar to that of the SMMSE.SMMSE-MUD, if there is not enough information available. For the nonlinear GPR, we have used the full covariance matrix and its optimization can be computationally costly in VI. C ONCLUSIONS AND FURTHER WORK some scenarios. There are several proposals that address this computational complexity issue [43]-[47], which can be used In this paper we have introduced Gaussian Processes for to implement the proposed GPR-MUD with low computationalRegression (GPR) as a nonlinear detector for DS-CDMA complexity with nonlinear kernels.digital communications systems. The GPR is a discriminativelearning tool and it does not assume anything about the CDMAcommunication system. It does not need to know how many ACKNOWLEDGEMENTSusers are active and what spreading codes they are using. It The authors thank Professor Guillermo Estevez at Vodafonedoes not need to know the channel model or its length. It Research and Development for his helpful comments andonly relies on a training sequence for the UoI to detect the review on some aspects of cellular systems in this paper.incoming chips and it can train linear and nonlinear modelsdepending on which suits the application best. This makes it R EFERENCESa very desirable tool for designing CDMA MUD receivers. GPR solution is analytical and its hyperparameters can [1] S. Verd´ , Multiuser Detection. Cambridge University Press, 1998. u [2] R. Prasad, CDMA for Wireless Personal Communications. Norwood,be learnt by maximum likelihood. This is a considerable MA: Artech House, 1996.improvement compared to other nonlinear tools as neural [3] M. Zeng, A. Annamalai, and V. K. Bhargava, “Recent advances innetworks or SVMs, which need to prespecify its structure or cellular wireless communications,” IEEE Communication Magazine, vol. 37, no. 1, pp. 128–138, Sep 1999.hyperparameters because the optimization step is taken to ﬁnd [4] J. G. Proakis, Digital Communications, 4th ed. New York, NY:the optimal parameters. This extra ﬂexibility allows GPR to McGraw-Hill, 2000.
9.
9 [5] S. Verd´ , “Minimum probability of error for asynchronous gaussian u [31] B. Sch¨ lkopf and A. Smola, Learning with kernels. M.I.T. Press, 2001. o multiple access channels,” IEEE Transactions on Information Theory, [32] G. S. Kimeldorf and G. Wahba, “Some results in Tchebychefﬁan spline vol. 32, pp. 85–96, Jan 1986. functions,” Journal of Mathematical Analysis and Applications, vol. 33, [6] R. Lupas and S. Verdu, “Linear multiuser detectors for synchronous pp. 82–95, 1971. code-division multiple-access channels,” IEEE Transactions on Infor- [33] C. K. I. Williams and C. E. Rasmussen, “Gaussian processes for mation Theory, vol. 35, no. 1, pp. 123–136, Jan 1989. regression,” in Proc. Conf. Advances in Neural Information Processing [7] Z. Xie, R. Short, and C. Rushforth, “A family of suboptimum detectors Systems, NIPS, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, for coherent multiuser communications,” IEEE J. Selected Areas on Eds., vol. 8. MIT Press, 1995. Communications, vol. 8, pp. 683–690, 1990. [34] C. Williams, “Prediction with gaussian processes: From linear regression [8] U. Madhow and M. L. Honig, “MMSE interference suppression for to linear prediction and beyond,” in Learning and Inference in Graphical direct-sequence spread-spectrum CDMA,” IEEE Transactions on Com- Models, M. Jordan, Ed. Kluwer Academic Press, 1998. munications, vol. 42, pp. 3178–3188, 1994. [35] J. Murillo-Fuentes, S. Caro, and F. Perez-Cruz, “Gaussian processes [9] H. V. Poor and S. Verd´ , “Probability of error in MMSE multiuser u for multiuser detection in CDMA receivers,” in In Advances in Neural detection,” IEEE Transactions on Information Theory, vol. 43, pp. 858– Information Processing Systems, NIPS, Vancouver, CA, Dec. 2005. 871, 1997. [36] F. P´ rez-Cruz and J. Murillo-Fuentes, “Gaussian processes for digital e[10] A. N. Tikhonov and V. Y. Arsenin, Solution of Ill-posed Problems. communications,” in ICASSP, vol. V, Tolousse, France, May 2006, pp. Wiston, Washington DC, 1977. 781–784.[11] U. Madhow, “Blind adaptive interference suppression for direct sequence [37] T. S. Rappaport, Wireless Communications. Prentice Hall PTR, 2002. CDMA,” Proceedings of the IEEE, vol. 86, pp. 2048–2069, Oct 1998. [38] J. G. Proakis, “Adaptive equalization for TDMA digital mobile radio,”[12] J. W. Shubao Liu, “Blind adaptive multiuser detection using a recurrent IEEE Transactions on Vehicular Technology, vol. 40, no. 2, pp. 333–341, neural netowrk,” in IEEE Int. Conf. on Communications, Circuits and Feb. 1991. Systems (ICCCAS), vol. 2, Jun 2004, pp. 1071–1075. [39] C. K. I. Williams and D. Barber, “Bayesian classiﬁcation with gaus-[13] K. Zariﬁ, S. Shahbazpanahi, A. B. Gershman, and Z.-Q. Luo, “Robust sian processes,” IEEE Transactions on Pattern Analysis and Machine blind multiuser detection based on the worst-case performance optimiza- Intelligence, vol. 20, no. 12, pp. 1342–1351, 1998. tion of the mmse receiver,” IEEE Transactions on Signal Processing, [40] M. Kuss and C. E. Rasmussen, “Assessing approximations for Gaussian vol. 57, no. 1, pp. 295–296, Jan 2005. process classiﬁcation,” in Proc. Conf. Advances in Neural Information[14] C. M. Bishop, Neural Networks for Pattern Recognition. Clarendon Processing Systems, NIPS, Y. Weiss, B. Sch¨ lkopf, and J. Platt, Eds., o Press, 1995. vol. 18. MIT Press, 2006, pp. 699–706.[15] L. Rugini, P. Banelli, and S. Cacopardi, “A full rank regularization [41] F. P´ rez-Cruz and O. Bousquet, “Kernel methods and their potential use e technique for MMSE detection in multiuser CDMA systems,” IEEE in signal processing,” Signal Processing Magazine, vol. 21, no. 3, pp. Communications Letters, vol. 9, no. 1, p. 200, May 2005. 57–65, 2004.[16] G. Wahba and S. Wold, “A completely automatic french curve: ﬁtting [42] D. Guo, S. Shamai, and S. Verd´ , “Mutual information and minimum u spline functions by cross-validation,” Communications in Statistics, vol. mean-square error in gaussian channels,” IEEE Trans. Information Series A 4, no. 1, pp. 257–263, 1975. Theory, vol. 51, no. 4, pp. 1261–1283, Apr. 2005.[17] M. I. Jordan, Ed., Learning in Graphical Models. Cambridge, MA: [43] A. J. Smola and P. L. Bartlett, “Sparse greedy gaussian process regres- MIT Press, 1999. sion,” in Advances in Neural Information Processing Systems 13, T. K.[18] K.-S. C. Anders Høst-Madsen, “MMSE/PIC multiuser detection for Leen, T. G. Dietterich, and V. Tresp, Eds. Cambridge, MA: MIT Press, DS/CDMA systems with inter- and intra-cell interference,” IEEE Trans- 2001. actions on Communications, vol. 47, no. 2, p. 1999, Feb 1999. [44] L. Csat’o and M. Opper, “Sparse online gaussian processes,” Neural[19] Y. Kabashima, “A CDMA multiuser detection algorithm on the basis of Computation, vol. 14, no. 3, pp. 641– 669, 2002. belief propagation,” Journal of Physics A: Mathematical and General, [45] M. Seeger, C. K. I. Williams, and N. Lawrence, “Fast forward selection vol. 36, no. 2003, pp. 11 111–11 121, Oct 2003. to speed up sparse gaussian process regression,” in International Work-[20] M. O. T. Tanaka, “Approximate belief propagation, density evolution, shop on Artiﬁcial Intelligence and Statistics, C. Bishop and B. J. Frey, and statistical neurodynamics for CDMA multiuser detection,” IEEE Eds., 2003. Transactions on Information Theory, vol. 51, no. 2, pp. 700–706, Feb [46] E. Snelson and Z. Ghahramani, “Sparse gaussian processes using 2005. pseudo-inputs,” in Advances in Neural Information Processing Systems[21] L. K. R. P. H. Tan, “Asymptotically optimal non-linear MMSE mul- 18, Y. Weiss, B. Sch¨ lkopf, and J. Platt, Eds. Cambridge, MA: MIT o tiuser detection based on multivariate gaussian approximation,” IEEE Press, 2006. Transactions on Communications, vol. 54, no. 8, pp. 1427–1438, Aug [47] E. Snelson and Z. G. (2007), “Local and global sparse gaussian process 2006. approximations,” in Artiﬁcial Intelligence and Statistics, vol. 11, 2007.[22] D. D. Lin and T. J. Lim, “A variational free energy minimization inter- [48] J. Qui˜ onero-Candela and C. E. Rasmussen, “A unifying view of sparse n pretation of multiuser detection in CDMA,” in IEEE Global Telecom- approximate gaussian process regression,” Journal of Machine Learning munications Conference (Globecom), vol. 3, St. Louis, MO, USA, Dec Research, vol. 6, pp. 1935–1959, 2005. 2005. [49] L. Rugini, P. Banelli, and S. Cacopardi, “Regularized MMSE multiuser[23] V. N. Vapnik, Statistical Learning Theory. New York: John Wiley & detection using covariance matrix tapering,” in EEE International Con- Sons, 1998. ference on Communications, vol. 4, May 2003, pp. 2460 – 2464.[24] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press, 2006.[25] A. J. Smola and B. Sch¨ lkopf, “A tutorial on support o vector regression,” Royal Holloway College, University of London, UK, Tech. Rep. NC-TR-98-030, 1998, ftp://www.neurocolt.com/pub/neurocolt/tech reports/1998/98030.ps.Z.[26] U. Mitra and H. V. Poor, “Neural network techniques for adaptive mul- tiuser demodulation,” IEEE Journal Selected Areas on Communications, vol. 12, pp. 1460–1470, 1994.[27] R. Tanner and D. G. M. Cruickshank, “Volterra based receivers for DS- CDMA,” in IEEE Int. Symp. Personal, Indoor, Mobile Radio Commun., vol. 3, September 1997, pp. 1166–1170.[28] S. Chen, A. K. Samingan, and L. Hanzo, “Support vector machine multiuser receiver for DS-CDMA signals in multipath channels,” IEEE Transactions on Neural Network, vol. 12, no. 3, pp. 604–611, December 2001.[29] B. Aazhang, B. P. Paris, and G. C. Orsak, “Neural networks for multiuser detection in code-division multiple-access communications,” IEEE Transactions on Communications, vol. 40, pp. 1212–1222, 1992.[30] D. G. M. Cruickshank, “Radial basis function receivers for DS-CDMA,” IEE Electronic Letter, vol. 32, no. 3, pp. 188–190, 1996.
Be the first to comment