M.J.C. Pontes et al. / Analytica Chimica Acta 642 (2009) 12–18

method in terms of soil line demarcation and number of detected soil classes.

Mouazen et al. employed VIS–NIR reflectance spectroscopy (306.5–1710.9 nm) to discriminate soil texture classes. Factorial discriminant analysis (FDA) was applied to the first five principal components (PCs) resulting from the principal component analysis (PCA) of the VIS–NIR spectra. Four different classes of soil samples were classified with 85.7% and 81.8% of correct classification for the calibration and validation sets, respectively. After two similar classes (coarse and fine sand) were merged, the correct classification rate increased to 89.9% (calibration) and 85.1% (validation).

The present paper proposes a novel analytical methodology for soil classification based on the use of laser-induced breakdown spectroscopy (LIBS). In LIBS, a high-power pulsed laser is focused on the sample surface. The high power per unit area (irradiance) vaporizes sample constituents and forms a plasma. The emission spectrum of the plasma is then acquired and used as the analytical response. This technique can be applied to solid, liquid or gaseous materials with little or no sample treatment.

LIBS has been successfully applied to the classification of different samples, including chemical and biological warfare agent simulants, alloys, archaeological objects, polymers and explosives, among others. However, only one paper has been published on the use of LIBS in the context of soil classification. In that work, LIBS spectra initially containing more than 50,000 points were reduced to 68 points corresponding to the spectral lines of eight major soil elements (aluminium, silicon, iron, calcium, magnesium, potassium, titanium and manganese) and PCA was applied to the reduced data. As a result, only two soil classes could be discriminated.
The substantial dispersion of the remaining samples prevented an adequate classification.

The present paper investigates the use of LIBS and chemometric techniques for classification of Brazilian soil samples into three different orders, namely Argissolo, Latossolo and Nitossolo. These soil orders were defined in the Brazilian System of Soil Classification, created in 1999. According to the international classification of FAO (Food and Agriculture Organization of the United Nations), the Argissolo, Latossolo and Nitossolo orders are equivalent to the Acrisol, Ferralsol and Nitisol soil groups, respectively. The Argissolo order consists of morphologically and physically heterogeneous soils that are poor in exchangeable basic cations. Latossolo soils are also poor in exchangeable basic cations but are morphologically homogeneous. The Nitossolo order comprises soils with variable content of exchangeable cations and a unique set of physical and morphological properties that is reflected in a typical hydrological and mechanical behaviour. These three orders are representative of humid tropical regions with soils typically developed from highly weathered parent material. These soils consist mostly of iron and aluminium oxides (e.g. goethite and gibbsite) and 1:1 (Si:Al) layer silicate (basically kaolinite). According to IBGE, Argissolo and Latossolo are predominant in Brazil, as well as in other countries of South America. Nitossolo corresponds to approximately 1% of the Brazilian territory.

Owing to the very large number of variables in a LIBS spectrum, the use of appropriate feature extraction procedures is required. In this context, a possible approach consists of selecting spectral lines corresponding to specific elements. However, in order to reduce the possibility of losing relevant information for the classification task, the present work employs statistical variable selection algorithms instead of a priori considerations.
More specifically, the successive projections algorithm (SPA), the genetic algorithm (GA), and a stepwise formulation (SW) are adopted for this purpose. Linear discriminant analysis is then employed to obtain a classification model based on the selected spectral variables. In addition, the use of a data compression procedure in the wavelet domain is proposed to reduce the computational workload involved in the variable selection process. For comparison, the results obtained by using SIMCA models are also presented.

2. Theory

The linear discriminant analysis (LDA) classification method employs linear decision boundaries (hyperplanes), which are defined in order to maximize the ratio of between-class to within-class dispersion. In order to have a well-posed problem, the number of calibration (training) objects must be larger than the number of variables to be included in the LDA model. Therefore, the use of LDA for classification of spectral data usually requires appropriate variable selection procedures [18,19,21]. In this section, the three algorithms adopted for this purpose in the present work (SPA, SW, and GA) will be described. Moreover, a wavelet compression (WC) method, which can be employed prior to variable selection, will also be presented.

2.1. Successive projections algorithm

The successive projections algorithm (SPA) was originally proposed by Araújo et al. to minimize multi-collinearity effects and thus improve the conditioning of multiple linear regression (MLR) modelling for spectral data. In the original formulation, candidate subsets of variables were defined as the result of projection operations carried out on the matrix of instrumental response data. These subsets were then used to build MLR models, which were compared in terms of the prediction error in a set of validation samples. This validation set was not employed in either the projection operations or the calibration of the MLR models.
At the end, the subset of variables leading to the smallest root-mean-square error of validation (RMSEV) was adopted.

In a subsequent paper, SPA was adapted for use in classification problems. As in the original formulation, the candidate subsets of variables were formed as the result of projection operations intended to minimize multi-collinearity effects, which are a known cause of poor generalization performance in LDA. However, the RMSEV metric was replaced with an average risk $G$ of LDA misclassification. Such a cost function is calculated in the validation set as

$$G = \frac{1}{K_v} \sum_{k=1}^{K_v} g_k, \tag{1}$$

where $g_k$ (risk of misclassification of the kth validation object $x_k$, $k = 1, \ldots, K_v$) is defined as

$$g_k = \frac{r^2(x_k, \mu_{I_k})}{\min_{I_j \neq I_k} r^2(x_k, \mu_{I_j})}. \tag{2}$$

In this definition, the numerator $r^2(x_k, \mu_{I_k})$ is the squared Mahalanobis distance between object $x_k$ (of class index $I_k$) and the sample mean $\mu_{I_k}$ of its true class. The denominator in Eq. (2) corresponds to the squared Mahalanobis distance between object $x_k$ and the center of the closest wrong class. In the Mahalanobis distance calculations, the sample mean for each class and the pooled covariance matrix for each variable subset under consideration are computed by using the training data.

2.2. Stepwise algorithm

The stepwise (SW) selection algorithm adopted in the present work was proposed by Caneca et al. for classification of diesel-engine lubricating oils on the basis of near and mid-infrared spectra. Initially, the algorithm calculates the discriminability of each spectral variable with respect to the classes under consideration.
The variable with the largest discriminability value is selected and a leave-one-out cross-validation procedure is carried out by using LDA. Among the remaining variables, those having a large correlation with the selected one are then discarded to avoid collinearity problems. This process is repeated at each subsequent iteration by successively adding variables to the LDA model until no more variables are available for selection. The subset of variables leading to the smallest number of cross-validation errors is then adopted. If different subsets lead to the same number of cross-validation errors, the subset with the smallest number of variables is chosen.

It is worth noting that, after the second iteration, the discarding of variables is based on the coefficient of multiple correlation, which is defined, for each variable $x_i$ still available for selection, as

$$r_i = \frac{\sigma(\hat{x}_i)}{\sigma(x_i)}, \tag{3}$$

where $\sigma(\cdot)$ denotes the standard deviation calculated in the training set and $\hat{x}_i$ is an estimate of $x_i$ obtained by multiple linear regression from the variables already selected. If $r_i$ is close to one, variable $x_i$ is redundant because its values can be predicted, with good accuracy, from the variables already included in the LDA model. A drawback of this algorithm is the need to set a threshold for $r_i$ in order to decide which variables are to be discarded. However, it is possible to test different threshold values and then compare the resulting LDA models on the basis of the classification errors obtained in a separate validation set.

Fig. 1. Filter bank implementation of the wavelet transform. In this diagram, H and G represent a low-pass and a high-pass digital filter, respectively, and ↓2 denotes the dyadic downsampling operation.

2.3. Genetic algorithm

The GA is a versatile search technique inspired by the biological mechanisms of evolution by natural selection [25–27].
In variable selection problems, the algorithm typically encodes subsets of variables in the form of strings of binary (0/1) values termed "chromosomes". Each position (or "gene") in the chromosome is associated with one of the variables available for selection. Genes with a "1" value indicate that the corresponding variables are to be included in the model. The algorithm starts with a population of randomly generated chromosomes, which are then combined according to certain rules in order to produce a new generation of chromosomes (offspring). This process is repeated until a given stopping criterion is satisfied.

The present work adopts a previously published GA formulation, which has the following features. A fitness value is defined for each chromosome as the inverse of the validation cost defined in Eq. (1), calculated for the subset of variables encoded in the chromosome ("1" genes). The probability of a given chromosome being selected for offspring generation is proportional to its fitness ("roulette" method). By using this probabilistic method, pairs of chromosomes are formed and then combined to generate pairs of descendants by one-point crossover and mutation operators. The population size is kept constant, each generation being completely replaced by its descendants. However, the best individual is automatically transferred to the next generation (elitism) to avoid the loss of good solutions. This evolutionary process is repeated until a pre-specified number of cycles is completed.

Fig. 2. Diagram of LIBS instrument. (a) Laser source and cooler, (b) Nd:YAG laser head, (c) dichroic mirror, (d) focusing lens, (e) soil sample, (f) sample cell, (g) collecting lens, (h) fiber optic, (i) detector trigger signal, (j) echelle polychromator, (k) ICCD detector and (l) computer.

2.4. Wavelet compression

The SPA, SW and GA algorithms described above may involve considerable computational workload if the number of variables is large, as in the case of LIBS spectra.
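Part of this workload comes from evaluating the cost of Eq. (1) for every candidate subset of variables. A minimal sketch of one such evaluation is given below, using only the Python standard library. The function names and the toy data are hypothetical and are not taken from the original routines; the sketch follows Eqs. (1) and (2), with class means and pooled covariance computed from the training data.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equally sized vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def class_means_and_pooled_cov(train_x, train_y):
    """Class means and pooled within-class covariance from training data."""
    classes = sorted(set(train_y))
    p = len(train_x[0])
    means = {c: mean_vector([x for x, y in zip(train_x, train_y) if y == c])
             for c in classes}
    scatter = [[0.0] * p for _ in range(p)]
    for x, y in zip(train_x, train_y):
        d = [xi - mi for xi, mi in zip(x, means[y])]
        for a in range(p):
            for b in range(p):
                scatter[a][b] += d[a] * d[b]
    dof = len(train_x) - len(classes)  # pooled degrees of freedom
    return means, [[v / dof for v in row] for row in scatter]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def sq_mahalanobis(x, mean, cov):
    """Squared Mahalanobis distance r^2(x, mean) for covariance cov."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * zi for di, zi in zip(d, solve(cov, d)))


def average_risk(train_x, train_y, val_x, val_y):
    """Average misclassification risk G of Eq. (1), with g_k from Eq. (2)."""
    means, cov = class_means_and_pooled_cov(train_x, train_y)
    risks = []
    for x, y in zip(val_x, val_y):
        own = sq_mahalanobis(x, means[y], cov)
        closest_wrong = min(sq_mahalanobis(x, means[c], cov)
                            for c in means if c != y)
        risks.append(own / closest_wrong)
    return sum(risks) / len(risks)


# Toy data (hypothetical): two selected variables, two well-separated classes.
train_x = [[0, 0], [1, 0], [0, 1], [1, 1], [4, 4], [5, 4], [4, 5], [5, 5]]
train_y = ["A", "A", "A", "A", "B", "B", "B", "B"]
val_x = [[0.5, 1.0], [4.0, 4.5]]
val_y = ["A", "B"]

G = average_risk(train_x, train_y, val_x, val_y)
print(f"G = {G:.4f}")  # values well below 1 indicate low misclassification risk
```

A value of G well below one means that each validation object lies much closer to its own class mean than to any wrong class. Repeating this calculation for every candidate subset drawn from tens of thousands of spectral variables is what makes the search expensive.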
This problem can be alleviated by using a compression technique to reduce the dimensionality of the data prior to the variable selection procedures. In the present work, a wavelet compression method is adopted for this purpose.

The wavelet transform (WT) is a multi-resolution signal processing tool that has found several applications in denoising, feature extraction and compression of instrumental signals [29–34]. The WT of a spectrum $x = [x(\lambda_1)\ x(\lambda_2)\ \cdots\ x(\lambda_J)]$, where $\lambda_j$ is the jth wavelength, can be obtained by using a digital filter bank structure [28,31,35] of the form depicted in Fig. 1.

The basic structure of the filter bank consists of a pair of low-pass (H) and high-pass (G) filters, followed by a downsampling operation, which discards one in every two points of the filtering outcome. The downsampled output of the low-pass filter, termed "approximation coefficients", is a smoothed version of the spectrum at a coarser resolution. The downsampled output of the high-pass filter, termed "detail coefficients", corresponds to high-frequency noise, as well as sharp features of the spectrum, such as narrow peaks. This operation can be reapplied to the approximation coefficients up to the number of decomposition levels specified by the analyst. The result of the transform comprises the final approximation coefficients, as well as the detail coefficients obtained along the entire filter bank. With a slight abuse of language, this result will be henceforth termed "wavelet coefficients".

The H and G filters employed in the filter bank are typically of finite length, which implies that each approximation or detail coefficient corresponds to a reduced range of wavelengths within the spectrum. This spatial localization feature is often invoked as one of the main advantages of WT over the Fourier transform [28,35]. However, the choice of appropriate H and G filters for a specific application may not be straightforward [29,31]. In the present work, different wavelet filters were tested and compared in terms of compression ability for the LIBS data set under consideration. The decomposition levels were set to the maximum number for which the spatial localization features of the WT are not lost. This limit situation occurs when the H and G filters span the entire length of the downsampled approximation coefficients.

Table 1
Number of training and validation samples in each class.

Class       Training   Validation
Argissolo   31         15
Latossolo   56         28
Nitossolo   12         7
Total       99         50

Fig. 3. Mean LIBS spectrum of each soil order.

3. Experimental

3.1. Brazilian soil data set

A total of 149 Brazilian soil samples of three different orders (Argissolo: 46, Latossolo: 84 and Nitossolo: 19) collected at the B horizon (subsurface layer) were employed in the study.
Before LIBS spectral recording, these samples were dried in an oven at 105 °C for 2.5 h, ground and sieved to a particle size smaller than 350 µm.

Table 2
Classification rates (%) obtained with GA–LDA, SW–LDA, SPA–LDA and SIMCA for (1) Argissolo, (2) Latossolo and (3) Nitossolo. The number of spectral variables employed in each model is indicated in parentheses. N indicates the number of samples employed in the calculation of the classification rates. Columns 1, 2 and 3 under each model give the predicted class index.

True class  N    GA–LDA (17)    SW–LDA (7)     SPA–LDA (5)    SIMCA
                 1    2    3    1    2    3    1    2    3    1    2    3
Validation set
1           15   73   13   13   73   27   0    80   20   0    100  80   80
2           28   0    89   11   0    79   21   4    89   7    93   100  79
3           7    0    29   71   0    29   71   0    0    100  100  100  100
Cross-validation
1           46   72   15   13   74   20   7    70   20   11   98   72   67
2           84   11   69   20   10   75   16   11   73   17   79   98   60
3           19   16   32   53   16   16   68   11   16   74   90   95   100
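The per-class rates of Table 2 can be condensed into a single average correct classification rate per model. The short sketch below takes the diagonal entries of Table 2 (true class equal to predicted class) for the three LDA models and assumes that the average is the unweighted mean over the three classes; the variable and function names are illustrative only.

```python
# Correct-classification rates (%) from the diagonal of Table 2,
# in the order Argissolo, Latossolo, Nitossolo.
validation = {
    "GA-LDA": [73, 89, 71],
    "SW-LDA": [73, 79, 71],
    "SPA-LDA": [80, 89, 100],
}
cross_validation = {
    "GA-LDA": [72, 69, 53],
    "SW-LDA": [74, 75, 68],
    "SPA-LDA": [70, 73, 74],
}


def average_rate(rates):
    """Unweighted mean over classes (assumption: classes are not weighted by N)."""
    return sum(rates) / len(rates)


for model in validation:
    val = average_rate(validation[model])
    cv = average_rate(cross_validation[model])
    print(f"{model}: validation {val:.1f}%, cross-validation {cv:.1f}%")
```

Under this assumption the rounded validation averages are 78%, 74% and 90% for GA–LDA, SW–LDA and SPA–LDA, and the cross-validation averages are 65%, 72% and 72%. Weighting by the class sizes N would give different figures, because the Nitossolo class is much smaller than the other two.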
Fig. 4. (a) PC2 × PC1 and (b) PC3 × PC1 score plots for the overall set of 149 soil samples (Argissolo, Latossolo and Nitossolo).

3.2. LIBS instrument

The measurements were carried out with a lab-made LIBS instrument consisting of a Nd:YAG laser (Quantel, 1064 nm, 360 mJ/pulse and pulse duration of 5 ns), an echelle polychromator (52.13 lines/mm, Mechelle 5000, Andor Technology), an Intensified Charge-Coupled Device (ICCD) detector with an array of 1024 × 1024 pixels (Model DH734, Andor Technology) and an adjustable position plate for the sample. Fig. 2 presents a diagram of the LIBS instrument.

3.3. Spectra acquisition

Thirty spectra were acquired for each sample by applying the laser pulse to different points of the sample surface. Prior to the measurement process, the sample cell was filled and the soil surface was levelled. After every five measurements, each taken at a different point, the sample surface was re-levelled to eliminate the small craters produced by the laser beam.

The laser energy, delay time and integration time gate were 110 mJ/pulse, 500 ns and 10 µs, respectively. The focal point was situated 0.5 cm below the sample surface. The spectra were acquired in the range 203.13–987.64 nm. Each resulting spectrum had 26,624 points.

3.4. Software

Each individual spectrum was pre-treated by Standard Normal Variate (SNV). Afterwards, the average spectrum for each sample was calculated. The average spectra were then divided into training and validation sets by using the classic Kennard-Stone (KS) algorithm. The KS algorithm was applied to each class separately, as described in a previous work. The number of samples in each set is presented in Table 1.

For the purpose of WC, 22 different wavelets were tested (Symlet 4–10, Daubechies 1–10 and Coiflet 1–5). The low-pass and high-pass filters for dbN, symN and coifN have length 2N, 2N, and 6N, respectively (i.e., small values of N are associated with wavelets of small width).
These wavelets were selected in view of previous works concerning FT-IR and UV–VIS spectrometry. The maximum number of decomposition levels for each wavelet was employed, as discussed in Section 2.4. The percentage of data variance retained in the compression process was set to 95%.

SNV, PCA and SIMCA were performed with the default settings of the Unscrambler® 9.6 software (CAMO A/S). The optimal number of PCs was determined from the residual variance curve. The first local minimum was adopted unless later PCs gave significantly lower residual variance. The significance level of the F-test for SIMCA classification was set to the default value (5%). The WC, KS, GA–LDA, SW–LDA and SPA–LDA classification routines were implemented in Matlab® 6.5. The GA routine was run for 200 generations with 400 chromosomes each. Crossover and mutation probabilities were set to 60% and 10%, respectively, following previous work. Moreover, the algorithm was repeated three times, starting from different random initial populations. The best solution (in terms of the fitness value) resulting from the three realizations of the GA was employed.

Seven threshold values (0.1, 0.2, 0.5, 0.7, 0.8, 0.9, and 0.95) for the coefficient of multiple correlation were tested in the SW–LDA algorithm. The best threshold was selected on the basis of the classification errors in the validation set. If two threshold values provided the same number of classification errors, the threshold providing the simplest model (smallest number of selected variables) was favoured.

The results were expressed in terms of classification rates for the validation set. In addition, cross-validation results were obtained by applying the leave-one-out approach to the entire data set of 149 samples.

4. Results and discussion

Fig. 3 presents the mean LIBS spectrum of each soil order in the range of approximately 203–1000 nm.
As can be seen, discriminating the three soil orders on the basis of LIBS measurements is not straightforward, owing to the complexity of the spectra. The difficulty involved in the classification task is also apparent in the PC score plots presented in Fig. 4, which show considerable dispersion within each class. Such dispersion can be ascribed to the poor repeatability of the LIBS measurements, as well as the large chemical and mineralogical variability within each soil type.

In Fig. 4, the best discrimination is found between Latossolo and Argissolo samples, which are reasonably well separated along PC1. In fact, these two orders are the most distinct in terms of mineralogical constitution. However, they are considerably overlapped by Nitossolo. It may be argued that distinctive features of Nitossolo are not adequately captured by the LIBS spectra.

4.1. Classification in the original spectral domain

Table 2 presents the classification results (validation set and cross-validation) obtained in the original spectral domain. This table also indicates the number of spectral variables (wavelengths) employed in each model. In the case of SW–LDA, the threshold value selected according to the criteria described in Section 3.4 was 0.2. The number of variables for SPA–LDA was determined from the minimum of the cost function displayed in Fig. 5.

Fig. 5. Determination of the optimum number of variables in SPA–LDA.

The rates in Table 2 express both correct classifications (predicted class index equal to correct class index) and incorrect classifications (predicted class index different from correct class index). In each LDA model, the three rates in a row add up to 100%, apart from rounding, because every sample is included in one and only one class. For example, the 15 validation samples of class 1 (Argissolo) were classified by GA–LDA in the following manner: 11 samples (73%) were correctly included in class 1, two samples (13%) were incorrectly included in class 2 (Latossolo), and two samples (13%) were incorrectly included in class 3 (Nitossolo). In contrast, SIMCA may include a given sample in more than one class. Therefore, the sum of the three rates in a row may be larger than 100% for SIMCA.

Among the LDA models, the worst overall results in terms of validation and cross-validation were obtained with GA–LDA. This finding may be ascribed to the fact that GA–LDA does not take into account multicollinearity effects in the variable selection process, whereas SPA–LDA and SW–LDA were designed to minimize such effects. In fact, it is worth noting that GA–LDA selected a larger number of spectral variables (17), as compared to SW–LDA (7) and SPA–LDA (5). As regards the comparison between SW–LDA and SPA–LDA, it can be seen that SPA–LDA provides better results in the validation set for all three soil types (average correct classification rate of 90%).
In terms of overall cross-validation performance, SW–LDA and SPA–LDA are similar, as the average correct classification rate was 72% for both models.

SIMCA provided good validation and cross-validation results in terms of correctly including the samples in their true class. However, almost all samples were also included in an incorrect class. This problem may be ascribed to the dispersion and overlapping of the soil classes, as seen in the score plots presented in Fig. 4.

4.2. Use of wavelet compression

As discussed in Section 3.4, 22 wavelets were tested for compression of the LIBS spectra. Table 3 presents the results, which are expressed in terms of the number of coefficients required to explain 95% of the data variance. Overall, the best performances (i.e., the smallest number of required coefficients) were obtained with the smallest wavelets within each family. In fact, small wavelets may be a better match to the narrow emission peaks found in LIBS spectra.

Table 3
Number of wavelet coefficients required to explain 95% of the data variance.

Wavelet   Number of retained coefficients
Sym4      663
Sym5      684
Sym6      696
Sym7      701
Sym8      723
Sym9      729
Sym10     751
Db1       785
Db2       692
Db3       690
Db4       738
Db5       753
Db6       781
Db7       818
Db8       858
Db9       865
Db10      896
Coif1     678
Coif2     677
Coif3     700
Coif4     719
Coif5     751

Classification tests were carried out by using the five best wavelets in terms of compression performance (sym4, db2, db3, coif1 and coif2). Table 4 presents the validation results obtained by applying GA–LDA, SW–LDA and SPA–LDA to the compressed data set. The best wavelets for GA–LDA, SW–LDA and SPA–LDA were sym4 (considering compression performance in addition to the classification rate), coif1 and coif2, respectively. By using these wavelets, a classification rate of 84% was obtained with the three LDA models. For GA–LDA and SW–LDA, this rate is an improvement in comparison with the results obtained in the original spectral domain.
In the case of SPA–LDA, the result became slightly worse, as the classification rate in the original domain was 90%. However, the computational workload involved in the modelling process was substantially reduced by the use of WC, as the number of variables was reduced by a factor of about 40 (from 26,624 to 677 with coif2, for example). By using a computer with a Celeron 2.66 GHz processor and 2 GB RAM, the time required for variable selection by SPA was reduced from approximately 1000 min to 8 min. It is worth noting that the time spent in the WC process itself is relatively small (approximately 28 s for the coif2 wavelet).

By using the best wavelet for each model, the cross-validation rates for GA–LDA, SW–LDA and SPA–LDA were 69%, 70% and 71%, respectively. For SW–LDA and SPA–LDA, these results are slightly worse than the rate obtained in the original domain (72%). For GA–LDA, the result is actually better, as the rate obtained in the original domain was 65%. In view of the overall validation and cross-validation results, it can be concluded that the WC process does not significantly compromise the classification performance of the resulting models.

Table 4
Average classification rate (%) in the validation set (original spectral domain and wavelet-compressed data).

                  GA–LDA   SW–LDA   SPA–LDA
Original domain   78       74       90
Sym4              84       81       79
Db2               84       77       68
Db3               84       83       80
Coif1             77       84       79
Coif2             83       75       84
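As an illustration of the compression step, a minimal sketch is given below. It uses the Haar wavelet (Db1 in Table 3) rather than coif2 for brevity, and it assumes that the wavelet coefficients are ranked by their variance across the spectra and retained until 95% of the total variance is reached. The source states only the 95% criterion, so this ranking rule, the function names and the toy spectra are assumptions.

```python
import math


def haar_dwt(signal, levels):
    """Multi-level Haar transform via the low-pass/high-pass filter bank.

    Returns the final approximation coefficients followed by the detail
    coefficients, coarsest level first.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            break  # sketch only: stop when the length cannot be halved
        pairs = list(zip(approx[0::2], approx[1::2]))
        # High-pass filter + dyadic downsampling -> detail coefficients
        details.insert(0, [(a - b) / math.sqrt(2) for a, b in pairs])
        # Low-pass filter + dyadic downsampling -> approximation coefficients
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    coeffs = list(approx)
    for d in details:
        coeffs.extend(d)
    return coeffs


def variance(values):
    """Sample variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def select_coefficients(spectra, levels, retained=0.95):
    """Indices of the fewest coefficients explaining `retained` of the variance."""
    transformed = [haar_dwt(s, levels) for s in spectra]
    variances = [variance(col) for col in zip(*transformed)]
    order = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    total = sum(variances)
    kept, running = [], 0.0
    for j in order:
        kept.append(j)
        running += variances[j]
        if running >= retained * total:
            break
    return sorted(kept), transformed


# Toy "spectra" (hypothetical): a flat baseline with one peak of varying height.
spectra = [[0, 0, h, h, 0, 0, 0, 0] for h in (1, 2, 3, 4)]
kept, transformed = select_coefficients(spectra, levels=3)
print(f"{len(spectra[0])} variables -> {len(kept)} coefficients, indices {kept}")
```

In this toy case, all of the variance is concentrated in three of the eight coefficients, so variable selection would subsequently operate on three candidates instead of eight. The same idea, applied with longer filters to the 26,624-point LIBS spectra, underlies the reduction reported in Table 3.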
Table 5
Classification rates (%) obtained with SPA–LDA and SIMCA for (1) Argissolo, (2) Latossolo and (3) Nitossolo. The number of wavelet coefficients employed in each model is indicated in parentheses. N indicates the number of samples employed in the calculation of the classification rates. Columns 1, 2 and 3 under each model give the predicted class index.

True class  N    SPA–LDA (6)    SIMCA (677)    SIMCA (6)
                 1    2    3    1    2    3    1    2    3
Validation set
1           15   67   20   13   100  80   80   100  80   67
2           28   4    86   11   93   100  68   86   96   54
3           7    0    0    100  100  100  100  71   43   100
Cross-validation
1           46   67   17   15   96   70   65   93   65   63
2           84   10   71   19   79   98   52   80   94   60
3           19   16   11   74   95   95   100  74   90   100

For comparison purposes, Table 5 presents the classification results of SPA–LDA and SIMCA for the coif2-compressed data set. SIMCA models were constructed with the 677 coefficients resulting from the WC process and also with the six coefficients selected by SPA–LDA. Overall, the SIMCA classification rates were similar to those obtained in the original domain with the full spectrum (Table 2). This result corroborates the conclusion that the wavelet compression retains discriminatory information concerning the soil classes under study.

5. Conclusions

This paper presented a novel methodology for soil classification based on the use of LIBS data and chemometric methods. The methodology was validated in a case study involving three Brazilian soil types (Argissolo, Latossolo and Nitossolo). Better discrimination of the soil types was attained by employing a subset of selected spectral variables for LDA, as compared to the use of full-spectrum SIMCA modelling.
More specifically, the best results were obtained with SPA–LDA, which achieved an average classification rate of 90% in the validation set and 72% in cross-validation. The proposed wavelet compression procedure was useful to reduce the computational workload (by a factor of about 100) without significantly compromising the classification accuracy. It is worth noting that, after the classification models have been obtained, the proposed methodology can be applied to new samples in a fast and straightforward manner.

Future work could investigate the combination of LIBS with other techniques, such as VIS–NIR spectroscopy, for the purpose of improving the classification outcome.

Acknowledgments

The authors thank PROCAD/CAPES (Grant 0081/05-1) and FAPESP (Grant 03/07419-5) for partial financial support. The research fellowships and scholarships granted by CNPq and CAPES are also gratefully acknowledged.

References

[1] H.G. Santos, P.K.T. Jacomine, L.H.C. Anjos, V.A. Oliveira, J.B. Oliveira, R.M. Coelho, J.F. Lumbreras, T.J.F. Cunha, Sistema Brasileiro de Classificação de Solos, 2nd edition, Embrapa Solos, Rio de Janeiro, 2006.
[2] Soil Survey Staff, Keys to Soil Taxonomy, 9th ed., United States Department of Agriculture, Washington, 2003.
[3] D. Baize, M.C. Girard, Référentiel pédologique, Paris, 1995.
[4] P. Tittonell, K.D. Shepherd, B. Vanlauwe, K.E. Giller, Agr. Ecosyst. Environ. 123 (2008) 137.
[5] J.D. Phillips, D.A. Marion, Geoderma 141 (2007) 89.
[6] E.A.G. Zagatto, Análises Químicas Multielementares em Sistemas FIA-ICP-GSAM e Classificações dos Solos do Estado de São Paulo, Doctoral thesis, Universidade Estadual de Campinas, Campinas, 1981.
[7] J.A.M. Demattê, R.C. Campos, M.C. Alves, P.R. Fiorio, M.R. Nanni, Geoderma 121 (2004) 95.
[8] A.M. Mouazen, R. Karoui, J. Baerdemaeker, H. Ramon, J. Near Infrared Spectrosc. 13 (2005) 231.
[9] C. Pasquini, J. Cortez, L.M.C. Silva, F.B. Gonzaga, J. Braz. Chem. Soc. 18 (2007) 463.
[10] C.A. Munson, F.C. Lucia Jr., T. Piehler, K.L. McNesby, A.W. Miziolek, Spectrochim. Acta Part B 60 (2005) 1217.
[11] S.R. Goode, S.L. Morgan, R. Hoskins, A. Oxsher, J. Anal. At. Spectrom. 15 (2000) 1133.
[12] M. Corsi, G. Cristoforetti, M. Giuffrida, M. Hidalgo, S. Legnaioli, L. Masotti, V. Palleschi, A. Salvetti, E. Tognoni, C. Vallebona, A. Zanini, Microchim. Acta 152 (2005) 105.
[13] R. Sattmann, I. Mönch, H. Krause, R. Noll, S. Couris, A. Hatziapostolou, A. Mavromanolakis, C. Fotakis, E. Larrauri, R. Miguel, Appl. Spectrosc. 52 (1998) 456.
[14] W. Schade, C. Bohling, K. Hohmann, D. Scheel, Laser Part. Beams 24 (2006) 241.
[15] B. Bousquet, J.-B. Sirven, L. Canioni, Spectrochim. Acta Part B 62 (2007) 1582.
[16] IBGE (Brazilian Institute of Geography and Statistics), EMBRAPA (Brazilian Agriculture Research Institute), Soil Map of Brazil (1:5,000,000), 2001. Available at: http://mapas.ibge.gov.br/solos/viewer.htm (accessed in March 2008).
[17] IUSS Working Group WRB, World Reference Base for Soil Resources, World Soil Resources Reports, 103, 128, 2006.
[18] M.J.C. Pontes, R.K.H. Galvão, M.C.U. Araújo, P.N.T. Moreira, O.D.P. Neto, G.E. José, T.C.B. Saldanha, Chemom. Intell. Lab. Syst. 78 (2005) 11.
[19] A.R. Caneca, M.F. Pimentel, R.K.H. Galvão, C.E. Matta, F.R. Carvalho, I.M. Raimundo Jr., C. Pasquini, J.J.R. Rohwedder, Talanta 70 (2006) 344.
[20] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd ed., John Wiley, New York, 2001.
[21] Y. Mallet, D. Coomans, O. de Vel, Chemom. Intell. Lab. Syst. 35 (1996) 157.
[22] M.C.U. Araújo, T.C.B. Saldanha, R.K.H. Galvão, T. Yoneyama, H.C. Chame, V. Visani, Chemom. Intell. Lab. Syst. 57 (2001) 65.
[23] T. Naes, B.H. Mevik, J. Chem. 15 (2001) 413.
[24] R. de Maesschalck, D. Jouan-Rimbaud, D.L. Massart, Chemom. Intell. Lab. Syst. 50 (2000) 1.
[25] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Longman Publishing Co., Inc., Boston, 1989.
[26] D. Jouan-Rimbaud, D.L. Massart, R. Leardi, O.E. Noord, Anal. Chem. 67 (1995) 4295.
[27] R. Leardi, J. Chem. 15 (2001) 559.
[28] B. Walczak, Wavelets in Chemistry, Elsevier Science, New York, 2000.
[29] C. Cai, P.B. Harrington, J. Chem. Inf. Comput. Sci. 38 (1998) 1161.
[30] U. Depczynski, K. Jetter, K. Molt, A. Niemoller, Chemom. Intell. Lab. Syst. 49 (1999) 151.
[31] C.J. Coelho, R.K.H. Galvão, M.C.U. Araújo, M.F. Pimentel, E.C. Silva, J. Chem. Inf. Comput. Sci. 43 (2003) 928.
[32] R.K.H. Galvão, H.A.D. Filho, M.N. Martins, M.C.U. Araújo, C. Pasquini, Anal. Chim. Acta 581 (2007) 159.
[33] A.C. Sousa, M.M.L.M. Lucio, O.F. Bezerra Neto, G.P.S. Marcone, A.F.C. Pereira, E.O. Dantas, W.D. Fragoso, M.C.U. Araújo, R.K.H. Galvão, Anal. Chim. Acta 588 (2007) 231.
[34] S. Ren, L. Gao, Talanta 50 (2000) 1163.
[35] M. Vetterli, J. Kovacevic, Wavelets and Subband Coding, Prentice-Hall, New Jersey, 1995.
[36] R.N.F. Santos, R.K.H. Galvão, M.C.U. Araújo, E.C. Silva, Talanta 71 (2007) 1136.
[37] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Appl. Spectrosc. 43 (1989) 772.
[38] R.W. Kennard, L.A. Stone, Technometrics 11 (1969) 137.
[39] L. Gao, S. Ren, Spectrochim. Acta Part A 61 (2005) 1136.