THEME – 2 On Normalizing Transformations of the Coefficient of Variation for a Normal Population with an Application to Evaluation of Uniformity of Plant Varieties
  • 1. On Normalizing Transformations of the Coefficient of Variation for a Normal Population with an Application to Evaluation of Uniformity of Plant Varieties
    Yogendra P. Chaubey∗, Department of Mathematics and Statistics, Concordia University, Montreal, Canada H3G 1M8. E-mail: yogen.chaubey@concordia.ca
    ∗ Joint work with M. Singh, ICARDA, Aleppo, Syria, and Debaraj Sen, Department of Mathematics and Statistics, Concordia University, Montreal, Canada.
    Talk to be presented at the International Workshop on Applied Mathematics and Omics Technologies for Discovering Biodiversity and Genetic Resources for Climate Change Mitigation and Adaptation to Sustain Agriculture in Drylands, ICARDA, Rabat, Morocco, June 24-27, 2014.
  • 2. Abstract
    The variance stabilizing transformation (VST), formally introduced by Bartlett (1947, Biometrics), is popular in statistical applications because of its approximate normalizing property. This property arises mainly because a variance-stabilized statistic may be more symmetric than the untransformed statistic. Chaubey and Mudholkar (1983, Technical Report, Concordia University) developed a differential equation, analogous to Bartlett’s, for obtaining an approximately symmetrizing transformation and illustrated its use in some common examples. In general, the transformation may be computationally intensive, as illustrated in Chaubey, Singh and Sen (2013, Comm. Stat. - Theor. Meth.) for the coefficient of variation from normal samples. In this talk we review these transformations in this light and examine some new transformations, along with an application to evaluating the uniformity of plant varieties.
  • 3. Outline
    1 Introduction
    2 Symmetrizing and Variance Stabilizing Transformations
    3 A Condition under which VST is ST
        Fisher’s transformation of correlation coeff.
        Arcsin Transformation for the Binomial Proportion
        Square root transformation for Poisson RV
        Chi-square Random Variable
    4 Symmetrizing transformations in Standard Cases
    5 VST and ST for Coefficient of Variation
        Appendix: R-Codes for Computing the Symmetrizing Transformation
        Small Sample Adjustment
        Inverse Gaussian Distribution
    6 An Application
  • 4. Introduction
    Transformations, together with the associated approximations, are important for both genetic resources data and climate data, and appear as a prerequisite for raw data analysis.
    The earliest consideration of a transformation that stabilizes the variance is due to Fisher (1915, 1922), who proposed $Z = \tanh^{-1} r$ and $\sqrt{2\chi^2_\nu} - \sqrt{2\nu - 1}$ as approximately normalizing transformations of the correlation coefficient $r$ and the $\chi^2_\nu$ variable, respectively.
    Bartlett (1947) introduced variance stabilizing transformations formally, for the purpose of using the usual analysis of variance in the absence of homoscedasticity. He showed how to derive them from a differential equation and, as illustrations, confirmed the variance stabilizing character of $Z$ and $\sqrt{2\chi^2_\nu}$. He also gave many additional examples, including the square root of a Poisson random variable and the function $\arcsin\sqrt{p}$ of the binomial sample proportion $p$.
  • 5. Introduction
    Since then, these transformations have been studied and refined in various ways, essentially with a view to improving normality. Thus, Anscombe (1948) improved $\sqrt{X}$ for the Poisson variable $X$ to $\sqrt{X + 3/8}$ and $\arcsin\sqrt{p}$ to $\arcsin\sqrt{(p + 3/(8n))/(1 + 3/(4n))}$, and Hotelling (1953), in his definitive study of the distribution of the correlation coefficient, proposed numerous improvements of $Z$.
    Many variance stabilizing transformations of random variables have nearly normal distributions and simplify inference problems such as confidence interval estimation of the parameter. Nevertheless, stability of variance is not necessary for normality. Approximate symmetry, however, is clearly a prerequisite of any approximately normalizing transformation.
  • 6. Introduction
    Hence, an approximately symmetrizing transformation of a random variable may be a more effective method of normalizing it than stabilizing its variance.
    Historically, this was first illustrated by Wilson and Hilferty (1931). They obtained the cube root of a chi-square variable as an approximately symmetrizing power transformation and showed that it provides a normal approximation superior to that based on Fisher’s variance stabilizing transformation.
    Their approach of constructing a skewness-reducing power transformation has since been extended to many other distributions, e.g. to the non-central chi-square by Sankaran (1959), to quadratic forms by Jensen and Solomon (1972), and to the sample variance from non-normal populations and multivariate likelihood ratio statistics by Mudholkar and Trivedi (1980, 1981a, 1981b).
  • 7. Introduction
    In this talk, we present the results of Chaubey and Mudholkar (1983) on developing a differential equation, analogous to Bartlett’s, that gives an approximately symmetrizing transformation. That paper also examines some of the standard transformations in this light.
    Next we consider the computational aspects of these transformations, illustrated for the coefficient of variation for normal populations as discussed in Chaubey, Singh and Sen (2013), and indicate the adaptation to the inverse Gaussian case.
    An application to assessing the uniformity of two plant varieties is then illustrated.
  • 8. Preliminaries
    Let $T_n$ be a statistic based on a random sample of size $n$, constructed to estimate a parameter $\theta$. Further, assume that $\sqrt{n}(T_n - \theta)$ is asymptotically $N(0, \sigma^2(\theta))$ as $n \to \infty$. Denote the $j$th central moment of $T_n$ by
    $$\mu_j(\theta) = E(T_n - \mu(\theta))^j, \quad j = 1, 2, \ldots,$$
    where $\mu(\theta) = E(T_n)$.
  • 9. Preliminaries
    A smooth function $g(T_n)$, intended for use as a transformation, can be approximated by its Taylor expansion as
    $$g(T_n) - g(\theta) \approx (T_n - \theta)\,g'(\theta) + \tfrac{1}{2}(T_n - \theta)^2 g''(\theta), \tag{2.1}$$
    where $g'(\theta) = dg(\theta)/d\theta$ and $g''(\theta) = d^2g(\theta)/d\theta^2$. Hence, as a first approximation, we have
    $$g(T_n) - E[g(T_n)] \approx (T_n - \mu(\theta))\,\bigl(g'(\theta) + \xi_1(\theta)\,g''(\theta)\bigr) + \tfrac{1}{2}\bigl[(T_n - \mu(\theta))^2 - \mu_2(\theta)\bigr]\,g''(\theta), \tag{2.2}$$
    where $\xi_1(\theta) = \mu(\theta) - \theta$.
  • 10. Preliminaries
    Define
    $$R = \frac{g''(\theta)}{g'(\theta)} \quad\text{and}\quad R_1 = \frac{R}{1 + \xi_1(\theta)R}.$$
    Then from (2.2) we obtain an approximate expression for the variance $\mu_{2g}$ of $g(T_n)$:
    $$\mu_{2g} = (g'(\theta))^2 (1 + \xi_1(\theta)R)^2 \Bigl[\mu_2(\theta) + R_1\mu_3(\theta) + \tfrac{1}{4}R_1^2\bigl(\mu_4(\theta) - \mu_2^2(\theta)\bigr)\Bigr]. \tag{2.3}$$
    Similarly, the third central moment $\mu_{3g}$ of $g(T_n)$ (up to order $O(1/n^2)$) is approximately given by
    $$\mu_{3g} = (g'(\theta))^3 (1 + \xi_1(\theta)R)^3 \Bigl[\mu_3(\theta) + \tfrac{3}{2}R_1\bigl(\mu_4(\theta) - \mu_2^2(\theta)\bigr)\Bigr], \tag{2.4}$$
  • 11. Variance Stabilizing Transformation
    where we have omitted terms containing central moments of order higher than 4 (this assumes that the third and fourth central moments are of order $O(1/n^2)$ and that the higher-order moments are of smaller order).
    Variance stabilizing transformation (VST): see Rao (1973). The VST may now be obtained using (2.3). Ignoring the last two terms, $g(\cdot)$ is an approximate VST if $(g'(\theta))^2\mu_2(\theta)$ is constant, or
    $$g'(\theta) = \frac{C}{\sigma(\theta)},$$
    where $C$ is a constant. Hence
    $$g(\theta) = C\int \frac{1}{\sigma(\theta)}\,d\theta. \tag{2.5}$$
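Equation (2.5) can be evaluated numerically when no closed form is at hand. The following is a minimal R sketch under stated assumptions: the helper names `sigma_pois` and `vst_numeric` are illustrative and do not come from the talk, and the Poisson case with $\sigma(\theta) = \sqrt{\theta}$ is used only because its closed form $2\sqrt{\theta}$ allows a check.

```r
# Sketch: numerical VST g(theta) = C * integral of 1/sigma(theta), eq. (2.5).
# `sigma_pois` and `vst_numeric` are illustrative names, not from the talk.
sigma_pois <- function(theta) sqrt(theta)   # Poisson: sigma(theta) = sqrt(theta)

vst_numeric <- function(theta, sigma, lower = 1, C = 1) {
  # Integrate 1/sigma from a fixed lower limit; changing the limit
  # only shifts g by an additive constant.
  sapply(theta, function(t)
    C * integrate(function(u) 1 / sigma(u), lower, t)$value)
}

theta   <- c(2, 4, 9)
g_num   <- vst_numeric(theta, sigma_pois)
g_exact <- 2 * sqrt(theta) - 2 * sqrt(1)    # closed form of the same integral
print(cbind(theta, g_num, g_exact))
```

The lower limit of integration is arbitrary, since a VST is determined only up to location and scale.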
  • 12. Symmetrizing Transformation
    To derive the symmetrizing transformation (ST), the third central moment of $g(T_n)$ given in (2.4) may be equated to zero. Thus, for an ST $g$,
    $$\mu_3(\theta) + \tfrac{3}{2}R_1\bigl(\mu_4(\theta) - \mu_2^2(\theta)\bigr) = 0, \tag{2.6}$$
    which gives
    $$\frac{g''(\theta)}{g'(\theta)} = -\frac{2}{3}\,\frac{\mu_3(\theta)}{\mu_4(\theta) - \mu_2^2(\theta)}, \tag{2.7}$$
    where again the term involving $\xi_1\mu_3(\theta)$ has been ignored. The solution of this equation can be written as (see Chaubey and Mudholkar, 1983)
    $$g(\theta) = \int e^{-a(\theta)}\,d\theta, \tag{2.8}$$
  • 13. A Condition under which VST is ST
    where
    $$a(\theta) = \frac{2}{3}\int \frac{f_1(\theta)}{f_2(\theta)}\,d\theta, \tag{3.1}$$
    with $f_1(\cdot)$ and $f_2(\cdot)$ defined as
    $$f_1(\theta) = \mu_3(\theta), \tag{3.2}$$
    $$f_2(\theta) = \mu_4(\theta) - \mu_2^2(\theta). \tag{3.3}$$
    It is natural to ask if and when a VST can also be an ST. Such a condition may be derived by setting $\mu_{3g} = 0$ for the $g$ obtained from the VST, using Eq. (2.7). It is easily seen that the condition is
    $$\frac{1}{\sigma(\theta)}\Bigl\{f_1(\theta) - \frac{3}{2}f_2(\theta)\frac{d\ln\sigma(\theta)}{d\theta}\Bigr\} = 0,$$
    that is,
    $$\frac{d\ln\sigma(\theta)}{d\theta} = \frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)}. \tag{3.4}$$
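The step from the VST to condition (3.4) can be written out explicitly. This is a sketch that uses only the VST relation $g'(\theta) = C/\sigma(\theta)$ from (2.5) and the symmetry condition (2.7):

```latex
\begin{aligned}
g'(\theta) = \frac{C}{\sigma(\theta)}
  \;&\Longrightarrow\;
  \frac{g''(\theta)}{g'(\theta)}
  = \frac{d}{d\theta}\ln g'(\theta)
  = -\frac{d\ln\sigma(\theta)}{d\theta}, \\
\frac{g''(\theta)}{g'(\theta)}
  = -\frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)}
  \;&\Longrightarrow\;
  \frac{d\ln\sigma(\theta)}{d\theta}
  = \frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)} .
\end{aligned}
```

The first line holds for any VST; the second imposes symmetry on that same $g$, so the two together give (3.4).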
  • 14. Standard Transformations
    We may examine the extent to which some standard VSTs are STs in the light of the above condition.
    Fisher’s transformation of the correlation coefficient: Using the results from Hotelling (1953), we have
    $$f_1(\rho) = -6\rho(1 - \rho^2)^3/n^2, \quad f_2(\rho) = 2(1 - \rho^2)^4/n^2, \quad \sigma(\rho) = 1 - \rho^2.$$
    It is easily seen that the condition in Eq. (3.4) is satisfied, as both sides of the equation equal $-2\rho/(1 - \rho^2)$.
    Arcsin transformation for the binomial proportion: For the binomial proportion $\theta$, we have
    $$f_1(\theta) = \theta(1 - \theta)(1 - 2\theta)/n^2, \quad f_2(\theta) = 2\theta^2(1 - \theta)^2/n^2, \quad \sigma(\theta) = \sqrt{\theta(1 - \theta)}.$$
    In this case
    $$\frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)} = \frac{1}{3}\,\frac{1 - 2\theta}{\theta(1 - \theta)}.$$
  • 15. Standard Transformations
    However,
    $$\frac{d\ln\sigma(\theta)}{d\theta} = \frac{1}{2}\,\frac{1 - 2\theta}{\theta(1 - \theta)}.$$
    Hence the condition in (3.4) is not satisfied. This implies that a better normalizing transformation than the VST, $\arcsin\sqrt{p}$, may be available.
    Square root transformation for the Poisson RV: In this case
    $$f_1(\theta) = \theta, \quad f_2(\theta) = \theta + 2\theta^2, \quad \sigma(\theta) = \sqrt{\theta},$$
    and
    $$\frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)} = \frac{2}{3(1 + 2\theta)}, \quad\text{whereas}\quad \frac{d\ln\sigma(\theta)}{d\theta} = \frac{1}{2\theta}.$$
    Again, the condition does not hold in this case.
  • 16. Standard Transformations
    Chi-square random variable: Let $X$ be distributed as $\chi^2_{n\theta}$. Letting $T_n = X/n$, we have
    $$f_1(\theta) = 8\theta/n^2, \quad f_2(\theta) = 8\theta^2/n^2 + O(1/n^3), \quad \sigma(\theta) = \sqrt{2\theta}.$$
    The VST is given by $\sqrt{2T_n}$. Here
    $$\frac{2}{3}\,\frac{f_1(\theta)}{f_2(\theta)} = \frac{2}{3\theta}, \quad\text{whereas}\quad \frac{d\ln\sigma(\theta)}{d\theta} = \frac{1}{2\theta},$$
    and the condition is again not satisfied.
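Condition (3.4) is easy to check numerically for the correlation coefficient, the binomial proportion and the Poisson variable. The R sketch below is illustrative only: `check_34` is a hypothetical helper, the derivative of $\ln\sigma$ is taken by central differences, and $\sigma(\theta)$ for the binomial and Poisson cases is taken as the square-root form, which is the form consistent with the stated derivatives.

```r
# Sketch: compare both sides of condition (3.4) at a sample point.
# `check_34` is an illustrative helper; d ln(sigma)/d theta is taken numerically.
check_34 <- function(f1, f2, sigma, theta, h = 1e-6) {
  lhs <- (log(sigma(theta + h)) - log(sigma(theta - h))) / (2 * h)
  rhs <- (2 / 3) * f1(theta) / f2(theta)
  c(dlnsigma = lhs, two_thirds_ratio = rhs)
}

n <- 50  # sample size; it cancels in the ratio f1/f2
fisher <- check_34(function(r) -6 * r * (1 - r^2)^3 / n^2,
                   function(r) 2 * (1 - r^2)^4 / n^2,
                   function(r) 1 - r^2, theta = 0.4)
binom  <- check_34(function(p) p * (1 - p) * (1 - 2 * p) / n^2,
                   function(p) 2 * p^2 * (1 - p)^2 / n^2,
                   function(p) sqrt(p * (1 - p)), theta = 0.3)
pois   <- check_34(function(t) t,
                   function(t) t + 2 * t^2,
                   function(t) sqrt(t), theta = 4)
print(rbind(fisher, binom, pois))
```

Only the Fisher row should show the two columns agreeing, which is the numerical counterpart of the statement that Fisher's $Z$ is both a VST and an ST.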
  • 17. Symmetrizing transformations in Standard Cases
    The above examples demonstrate that it may be possible to obtain a better normalizing transformation than the variance stabilizing transformation. We now use equation (2.8) to obtain such transformations in the examples discussed above.
    Correlation coefficient: In this case
    $$g(\rho) = \int \exp\Bigl[\int \frac{2\rho}{1 - \rho^2}\,d\rho\Bigr]\,d\rho = \int \frac{1}{1 - \rho^2}\,d\rho = \frac{1}{2}\ln\frac{1 + \rho}{1 - \rho}, \tag{4.1}$$
    which is the well-known Fisher’s $Z$ transformation, confirming the conclusion reached earlier (see Chaubey and Mudholkar (1984)).
  • 18. Symmetrizing transformations in Standard Cases
    Binomial proportion: In this case the ST is given by
    $$g(\theta) = \int \theta^{-1/3}(1 - \theta)^{-1/3}\,d\theta. \tag{4.2}$$
    This integral has no elementary closed form; however, it can be evaluated numerically. Later on we include a program for finding the ST for the coefficient of variation that can easily be adapted here. The ST may be contrasted with the VST, given by
    $$g_v(\theta) = \int \theta^{-1/2}(1 - \theta)^{-1/2}\,d\theta \propto \sin^{-1}\sqrt{\theta}. \tag{4.3}$$
    Poisson variable: In this case, using $f_1(\theta)/f_2(\theta) \approx 1/(2\theta)$ for large $\theta$, the ST is given by
    $$g(\theta) = \int \theta^{-1/3}\,d\theta = \frac{3}{2}\,\theta^{2/3}. \tag{4.4}$$
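The integral in (4.2) is an incomplete beta integral, so base R can evaluate it without a custom quadrature routine. This is a minimal sketch, assuming the lower limit of integration is taken as 0; the names `st_binom` and `vst_binom` are illustrative and not from the talk.

```r
# Sketch: the binomial ST (4.2) via the incomplete beta function.
# integral_0^theta t^(-1/3) (1 - t)^(-1/3) dt = B(2/3, 2/3) * pbeta(theta, 2/3, 2/3)
st_binom  <- function(theta) beta(2/3, 2/3) * pbeta(theta, 2/3, 2/3)

# VST (4.3) with the same lower limit:
# integral_0^theta t^(-1/2) (1 - t)^(-1/2) dt = 2 * asin(sqrt(theta))
vst_binom <- function(theta) 2 * asin(sqrt(theta))

# Cross-check the closed form against direct quadrature on an interior interval.
num <- integrate(function(t) t^(-1/3) * (1 - t)^(-1/3), 0.1, 0.3)$value
closed <- st_binom(0.3) - st_binom(0.1)
print(c(closed_form = closed, quadrature = num))
```

Applying `st_binom` to a sample proportion gives the symmetrized statistic, in the same way that `vst_binom` gives the arcsine-transformed one.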
  • 19. Symmetrizing transformations in Standard Cases
    Thus the Poisson variable is better normalized by a power transformation with power 2/3 than by the VST with power 1/2.
    Chi-square random variable: In the set-up considered earlier, the symmetrizing transformation is given by
    $$g(\theta) = \int e^{-(2/3)\ln\theta}\,d\theta = 3\theta^{1/3}. \tag{4.5}$$
    Thus the symmetrizing transformation for the chi-square random variable is the well-known Wilson-Hilferty cube-root transformation.
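The claim about the Poisson powers can be examined by simulation. The sketch below is an illustration under assumptions chosen here (a Poisson mean of 20, 200,000 replicates, a fixed seed); `skew` is a hypothetical helper computing the sample skewness $m_3/m_2^{3/2}$.

```r
# Sketch: sample skewness of a Poisson variable under power transformations.
set.seed(1)
skew <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

x  <- rpois(2e5, lambda = 20)
sk <- c(untransformed = skew(x),
        sqrt          = skew(sqrt(x)),     # VST, power 1/2
        two_thirds    = skew(x^(2/3)))     # ST, power 2/3
print(round(sk, 3))
```

In absolute value, the skewness should be smallest for the 2/3 power, with the square root in between and the untransformed variable largest.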
  • 20. VST and ST for Coefficient of Variation
    The transformations above have been well investigated in the literature. Next we report on our recent investigations concerning the VST and ST for the coefficient of variation (CV), $\phi = \sigma/\mu$, where $\sigma$ is the population standard deviation and $\mu$ is the population mean, assumed to be non-negative.
    The CV is used in many applied areas as an alternative to the standard deviation:
      - Engineering applications: signal-to-noise ratio, Kordonsky and Gertsbakh (1997).
      - Agricultural research: measure of homogeneity of an experimental field, Taye and Njuho (2008); uniformity of a plant variety for seed acceptability, Singh, Niane and Chaubey (2010).
      - Biometry: measure of reproducibility of observations, Butcher and O’Brien (1991) and Quan and Shih (1996).
      - Economics: a measure of income diversity, Bedeian and Mossholder (2000).
  • 21. VST and ST for Coefficient of Variation
    Normal samples: Inference on $\phi$ can be handled through inference on $\theta = 1/\phi$, based on the estimate $\hat\theta = \bar X/S$, where $\bar X$ denotes the sample mean and $S^2$ the sample variance of a random sample $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$.
    Since $\sqrt{n}\,\hat\theta \sim t_\nu(\delta)$, i.e. a non-central $t$ (see Johnson and Kotz 1970) with $\nu = n - 1$ degrees of freedom and non-centrality parameter $\delta = \sqrt{n}\,\theta$, the moments of $\hat\theta$ (using the moments of the non-central $t$ from Hogben et al. (1961)) are:
    $$E(\hat\theta) = c_{11}\theta, \tag{5.1}$$
    $$\mu_2(\hat\theta) = E(\hat\theta - E(\hat\theta))^2 = c_{22}\theta^2 + \frac{c_{20}}{n}, \tag{5.2}$$
    $$\mu_3(\hat\theta) = E(\hat\theta - E(\hat\theta))^3 = \Bigl(c_{33}\theta^2 + \frac{c_{31}}{n}\Bigr)\theta, \tag{5.3}$$
    $$\mu_4(\hat\theta) = E(\hat\theta - E(\hat\theta))^4 = c_{44}\theta^4 + \frac{c_{42}}{n}\theta^2 + \frac{c_{40}}{n^2}, \tag{5.4}$$
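  • 22. VST and ST for Coefficient of Variation
    where, with $\nu = n - 1$,
    $$c_{11} = \sqrt{\frac{\nu}{2}}\,\frac{\Gamma\bigl(\frac{\nu - 1}{2}\bigr)}{\Gamma\bigl(\frac{\nu}{2}\bigr)}, \quad c_{22} = \frac{\nu}{\nu - 2} - c_{11}^2, \quad c_{20} = \frac{\nu}{\nu - 2},$$
    $$c_{33} = \Bigl[\frac{\nu(7 - 2\nu)}{(\nu - 2)(\nu - 3)} + 2c_{11}^2\Bigr]c_{11}, \quad c_{31} = \frac{3\nu c_{11}}{(\nu - 2)(\nu - 3)},$$
    $$c_{44} = \frac{\nu^2}{(\nu - 2)(\nu - 4)} - \frac{2\nu(5 - \nu)c_{11}^2}{(\nu - 2)(\nu - 3)} - 3c_{11}^4,$$
    $$c_{42} = \frac{6\nu}{\nu - 2}\Bigl[\frac{\nu}{\nu - 4} - \frac{(\nu - 1)c_{11}^2}{\nu - 3}\Bigr], \quad\text{and}\quad c_{40} = \frac{3\nu^2}{(\nu - 2)(\nu - 4)}.$$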
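The constants and moments in (5.1)-(5.4) translate directly into a small R function. This is a sketch, not the authors' code: the name `cv_moments` is illustrative, and `lgamma()` is used in place of a ratio of `gamma()` values only to avoid overflow for large $n$.

```r
# Sketch: moments (5.1)-(5.4) of theta_hat = xbar / s for a normal sample.
# `cv_moments` is an illustrative name; theta = mean/sd = 1/CV, n = sample size.
cv_moments <- function(theta, n) {
  nu  <- n - 1
  c11 <- sqrt(nu / 2) * exp(lgamma((nu - 1) / 2) - lgamma(nu / 2))
  c20 <- nu / (nu - 2)
  c22 <- c20 - c11^2
  c31 <- 3 * nu * c11 / ((nu - 2) * (nu - 3))
  c33 <- c11 * (nu * (7 - 2 * nu) / ((nu - 2) * (nu - 3)) + 2 * c11^2)
  c40 <- 3 * nu^2 / ((nu - 2) * (nu - 4))
  c42 <- 6 * c20 * (nu / (nu - 4) - (nu - 1) * c11^2 / (nu - 3))
  c44 <- nu^2 / ((nu - 2) * (nu - 4)) -
         2 * nu * (5 - nu) * c11^2 / ((nu - 2) * (nu - 3)) - 3 * c11^4
  list(mean = c11 * theta,
       mu2  = c22 * theta^2 + c20 / n,
       mu3  = (c33 * theta^2 + c31 / n) * theta,
       mu4  = c44 * theta^4 + c42 * theta^2 / n + c40 / n^2)
}

m <- cv_moments(theta = 10, n = 30)   # CV = 0.1, n = 30
print(unlist(m))
```

The ratio `m$mu3 / (m$mu4 - m$mu2^2)` is the quantity $f_1/f_2$ that enters the symmetrizing transformation.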
  • 23. VST and ST for Coefficient of Variation
    The above moments can be substituted into the functions $f_1(\theta)$ and $f_2(\theta)$ in equations (3.2) and (3.3) to obtain the symmetrizing transformation.
    The integral in equation (2.8) is too complex to obtain explicitly; we therefore evaluate it numerically for various values of $\theta$ and a given sample size $n$. For the integration of a function $s(x)$ we have used
    $$\int s(x)\,dx = S(x) = \int_0^x s(u)\,du + S(0).$$
    For ease of access, and to show the reader how easily this transformation can be obtained, the R source code used to compute these values is given in the appendix.
  • 24-25. R-Codes for Computing the Symmetrizing Transformation

    ## Symmetrizing transformation
    ## Name of the function: fsym
    ## Arguments: x is the argument at which the function is computed
    ##            (sqrt(ss) * x is used as the non-centrality parameter)
    ##            ss is the sample size
    ## Output: the value of the symmetrizing function
    fsym <- function(x, ss) {
      ## hfun: the ratio f1/f2 = mu3 / (mu4 - mu2^2) evaluated at phi
      hfun <- function(phi, ss) {
        nu <- ss - 1; d <- sqrt(ss) * phi
        ## lgamma avoids overflow of the gamma function for large ss
        c11 <- sqrt(nu / 2) * exp(lgamma((nu - 1) / 2) - lgamma(nu / 2))
        c20 <- nu / (nu - 2); c22 <- c20 - c11^2
        c31 <- 3 * c11 * c20 / (nu - 3)
        c33 <- c11 * (2 * c11^2 + (nu * (7 - 2 * nu) / ((nu - 2) * (nu - 3))))
        c40 <- 3 * c20 * nu / (nu - 4)   # 3 nu^2 / ((nu - 2)(nu - 4))
        c42 <- 6 * c20 * ((nu / (nu - 4)) - ((nu - 1) * c11^2 / (nu - 3)))
        c44 <- (c20 * nu / (nu - 4)) - (2 * c20 * c11^2 * (5 - nu) / (nu - 3)) - 3 * c11^4
        mu2 <- (c22 * d^2 + c20) / ss
        mu3 <- (c31 * d + c33 * d^3) / ss^1.5
        mu4 <- (c40 + c42 * d^2 + c44 * d^4) / ss^2
        mu3 / (mu4 - mu2^2)
      }
      ## f1f2: g'(x) = exp(-(2/3) * integral of f1/f2 from 0 to x)
      f1f2 <- function(x, ss) {
        fval <- integrate(hfun, 0, x, ss = ss)$value
        exp(-2 * fval / 3)
      }
      ## vectorized version of f1f2, as required by integrate()
      f1f2int <- function(x, ss) sapply(x, f1f2, ss = ss)
      ## g(x) = integral of g'(u) from 0 to x
      integrate(f1f2int, 0, x, ss = ss)$value
    }
  • 26. Symmetrizing transformation
    [Figure 1. Symmetrizing transformation values of the coefficient of variation (θ) for varying values of sample size. Four panels: n = 30, n = 50, n = 100, n = 200; horizontal axis θ (0.00 to 0.30), vertical axis g(1/θ).]
  • 27. Comparison of ST and VST
    Chaubey, Singh and Sen (2013) carried out a large-scale simulation comparing the VST, the ST and the UT (untransformed statistic) in terms of their normalizing quality.
    The VST was studied in Singh (1993) and is available in explicit form:
    $$g(\theta) = \sinh^{-1}(B\theta) = \ln\Bigl(B\theta + \sqrt{1 + B^2\theta^2}\Bigr), \tag{5.5}$$
    where $B = \bigl(1 + \frac{3}{4\nu}\bigr)\sqrt{\frac{n}{2\nu}}$.
    Based on 100,000 simulations, it was concluded that the VST reduces the skewness compared with the untransformed statistic, but the skewness is still significant even for sample sizes as large as 200. The ST, on the other hand, reduces skewness to a considerable degree for sample sizes as small as 30.
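The variance stabilizing property of (5.5) can be checked by a small simulation. The sketch below rests on an assumption: the constant is read as $B = (1 + 3/(4\nu))\sqrt{n/(2\nu)}$, which may differ from the exact form in Singh (1993). The helper `sim_var`, the seed and the number of replicates are illustrative choices.

```r
# Sketch: empirical check that asinh(B * theta_hat) has roughly constant variance.
# Assumption: B is read as (1 + 3/(4*nu)) * sqrt(n / (2*nu)).
set.seed(2014)
n  <- 30
nu <- n - 1
B  <- (1 + 3 / (4 * nu)) * sqrt(n / (2 * nu))

sim_var <- function(cv, reps = 20000) {
  x <- matrix(rnorm(reps * n, mean = 1, sd = cv), nrow = reps)  # one sample per row
  m <- rowMeans(x)
  s <- sqrt(rowSums((x - m)^2) / (n - 1))
  theta_hat <- m / s                       # inverse of the sample CV
  c(untransformed = var(theta_hat), vst = var(asinh(B * theta_hat)))
}

v <- sapply(c(cv_0.1 = 0.1, cv_0.3 = 0.3), sim_var)
print(signif(v, 3))
```

Under this reading of $B$, the `vst` row should be nearly the same for both CV values, while the `untransformed` row changes by a large factor.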
  • 28. Comparison of ST and VST
    For simulating the probability distribution of $g(\hat\theta)$ we consider the standardized statistic
    $$Z_g = \frac{g(\hat\theta) - E(g(\hat\theta))}{\sqrt{\mathrm{var}(g(\hat\theta))}},$$
    where $g(\cdot)$ is any of the functions associated with the symmetrizing transformation, the variance stabilizing transformation, or no transformation. The expected value $E(g(\hat\theta))$, using the expansion (2.1) of $g(T_n)$ with $T_n = \hat\theta$, is obtained as
    $$E(g(T_n)) = g(\theta) + g'(\theta)\xi_1(\theta) + \tfrac{1}{2}g''(\theta)\bigl(\mu_2(\theta) + \xi_1^2(\theta)\bigr) = g(\theta) + g'(\theta)\Bigl[\xi_1(\theta) + \frac{R}{2}\bigl(\mu_2(\theta) + \xi_1^2(\theta)\bigr)\Bigr]. \tag{5.6}$$
  • 29. Comparison of ST and VST
    Note that, for computing the above expectation for the ST, $R = g''(\theta)/g'(\theta)$ is substituted from (2.7) and $g'$ is obtained numerically from
    $$g'(\theta) = \exp\Bigl\{-\frac{2}{3}\int_0^\theta \frac{f_1(u)}{f_2(u)}\,du\Bigr\}. \tag{5.7}$$
    The simulated probabilities are given in Table 1.
    It was noted that, for sample sizes less than 50, the ST does not provide a significant improvement over the VST. Hence, an adjustment for small sample sizes was provided, as described next.
  • 30. Table 1. Probability distribution (P(Z ≤ z_α))* of standardized transforms of CV

                                 α
    CV   n    Transformation†    0.005  0.025  0.05   0.5    0.95   0.975  0.995
    0.1  30   ST                 0.003  0.021  0.046  0.514  0.939  0.965  0.990
              VST                0.002  0.016  0.039  0.517  0.944  0.969  0.991
              UT                 0.000  0.006  0.025  0.547  0.937  0.961  0.985
         50   ST                 0.004  0.023  0.049  0.504  0.943  0.970  0.993
              VST                0.002  0.018  0.042  0.511  0.945  0.970  0.992
              UT                 0.001  0.010  0.031  0.533  0.939  0.964  0.987
         100  ST                 0.005  0.025  0.051  0.502  0.946  0.972  0.994
              VST                0.003  0.020  0.046  0.509  0.946  0.971  0.993
              UT                 0.001  0.015  0.038  0.523  0.941  0.966  0.990
    0.2  30   ST                 0.003  0.021  0.045  0.511  0.939  0.966  0.990
              VST                0.002  0.017  0.039  0.514  0.943  0.969  0.991
              UT                 0.000  0.007  0.025  0.543  0.937  0.961  0.985
         50   ST                 0.004  0.023  0.048  0.510  0.943  0.970  0.993
              VST                0.002  0.018  0.042  0.516  0.945  0.970  0.992
              UT                 0.001  0.010  0.031  0.536  0.939  0.963  0.987
         100  ST                 0.005  0.024  0.049  0.501  0.947  0.973  0.994
              VST                0.003  0.020  0.044  0.508  0.947  0.971  0.993
              UT                 0.002  0.015  0.037  0.522  0.942  0.966  0.989
    0.3  30   ST                 0.003  0.022  0.047  0.511  0.941  0.967  0.991
              VST                0.002  0.017  0.040  0.516  0.945  0.969  0.991
              UT                 0.000  0.007  0.026  0.543  0.938  0.962  0.985
         50   ST                 0.004  0.025  0.050  0.505  0.943  0.969  0.993
              VST                0.002  0.020  0.043  0.512  0.944  0.969  0.992
              UT                 0.001  0.012  0.033  0.532  0.938  0.962  0.987
         100  ST                 0.005  0.025  0.050  0.503  0.947  0.973  0.994
              VST                0.003  0.021  0.045  0.510  0.946  0.971  0.993
              UT                 0.001  0.015  0.038  0.524  0.942  0.966  0.990

    † ST: symmetrizing transformation. VST: variance stabilizing transformation. UT: untransformed.
    * z_α is such that, for Z ∼ N(0, 1), P(Z ≤ z_α) = α.
  • 31. Small Sample Adjustment
    To adjust the normal approximation provided by the ST, the mixture approximation technique suggested in Mudholkar and Chaubey (1975) was used. Let
    $$Z_{ST} = \frac{g(T_n) - E(g(T_n))}{\sqrt{\mu_{2g}}}$$
    denote the standardized version of the ST. Then $Z_{ST}$ is modeled as
    $$\lambda\,N(0, 1) \oplus (1 - \lambda)\,\frac{\chi^2_\nu - \nu}{\sqrt{2\nu}},$$
    where $\oplus$ denotes the mixture of the corresponding distributions. The values of $\nu$ and $\lambda$ are obtained by matching the simulated skewness and kurtosis, denoted by $\beta_1(ST)$ and $\beta_2(ST)$ respectively, i.e.
    $$\nu = \frac{8}{\beta_1(ST)} \quad\text{and}\quad \lambda = 1 - \frac{2}{3}\,\frac{\beta_2(ST) - 3}{\beta_1(ST)}. \tag{5.8}$$
  • 32. Small Sample Adjustment
    The lower tail probabilities for $Z_{ST}$ can now be approximated as
    $$P(Z_{ST} \le x) = \lambda\Phi(x) + (1 - \lambda)\,P\bigl(\chi^2_\nu \le \nu + x\sqrt{2\nu}\bigr). \tag{5.9}$$
    The confidence intervals are obtained using the following approximate representation of the quantiles of a mixture distribution in terms of those of its components. Let $z_\alpha$ and $z^*_\alpha$ be the $\alpha$ quantiles of the standardized distributions $N(0, 1)$ and $(\chi^2_\nu - \nu)/\sqrt{2\nu}$, respectively. Then the $\alpha$ quantile $x_\alpha$ of the mixture distribution is approximated as
    $$x_\alpha = \lambda z_\alpha + (1 - \lambda)z^*_\alpha, \tag{5.10}$$
    where $z^*_\alpha$ is given in terms of the $\alpha$ quantile $\chi^2_{\nu,\alpha}$ as
    $$z^*_\alpha = \frac{\chi^2_{\nu,\alpha} - \nu}{\sqrt{2\nu}}. \tag{5.11}$$
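Equations (5.8), (5.10) and (5.11) chain together into a few lines of R. The sketch below uses the hypothetical name `mixture_quantile`. As an example it takes the Entry 4 values of $\beta_1$ and $\beta_2$ reported in Table 3 of this talk, so the output can be compared with the $\lambda$, $\nu$ and $x_\alpha$ values quoted there.

```r
# Sketch of (5.8), (5.10) and (5.11); `mixture_quantile` is an illustrative name.
mixture_quantile <- function(alpha, beta1, beta2) {
  nu     <- 8 / beta1                                     # (5.8)
  lambda <- 1 - (2 / 3) * (beta2 - 3) / beta1             # (5.8)
  z      <- qnorm(alpha)                                  # N(0,1) quantile
  zstar  <- (qchisq(alpha, df = nu) - nu) / sqrt(2 * nu)  # (5.11)
  c(nu = nu, lambda = lambda,
    x = lambda * z + (1 - lambda) * zstar)                # (5.10)
}

# beta1 and beta2 for Entry 4, as listed in Table 3 of the talk
lo <- mixture_quantile(0.025, beta1 = 0.2288, beta2 = 3.1010)
hi <- mixture_quantile(0.975, beta1 = 0.2288, beta2 = 3.1010)
print(round(rbind(lo, hi), 4))
```

Because the chi-square component is right-skewed, the two quantiles are not symmetric about zero, which is what shifts the resulting confidence limits.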
  • 33. Small Sample Adjustment
    We have used simulated values of $\beta_1$ and $\beta_2$ for the ST to develop polynomial approximations in powers of $\phi$ and $1/n$. Here we used multiple linear regression, including up to quadratic terms as well as their interactions, on a grid of 105 combinations of $\phi$ and $n$ values. This resulted in the following expressions:
    $$\beta_{1ST} \approx -0.06694 + 8.51908/n + 15.42537/n^2 + (0.2456 - 14.69333/n + 155.42357/n^2)\,\phi - (0.25299 - 9.73724/n + 162.48528/n^2)\,\phi^2, \tag{5.12}$$
    $$\beta_{2ST} \approx 3.02586 - 4.67269/n + 209.31385/n^2 + (0.16502 - 5.7324/n + 4.18595/n^2)\,\phi - (0.12802 - 5.69879/n + 93.2359/n^2)\,\phi^2. \tag{5.13}$$
    These models were judged to be adequate on the basis of their squared multiple correlation coefficients, which were 99.6% and 98%, respectively.
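The empirical formulae (5.12) and (5.13) can be coded directly. In this sketch the function names `beta1_st` and `beta2_st` are illustrative, and the example inputs are the two CV values and the sample size listed in Table 3 of this talk.

```r
# Sketch: empirical skewness and kurtosis models (5.12) and (5.13).
# `beta1_st` and `beta2_st` are illustrative names; phi = CV, n = sample size.
beta1_st <- function(phi, n) {
  -0.06694 + 8.51908 / n + 15.42537 / n^2 +
    (0.2456 - 14.69333 / n + 155.42357 / n^2) * phi -
    (0.25299 - 9.73724 / n + 162.48528 / n^2) * phi^2
}
beta2_st <- function(phi, n) {
  3.02586 - 4.67269 / n + 209.31385 / n^2 +
    (0.16502 - 5.7324 / n + 4.18595 / n^2) * phi -
    (0.12802 - 5.69879 / n + 93.2359 / n^2) * phi^2
}

phi <- c(entry4 = 0.068157, entry5 = 0.022864)
b   <- rbind(beta1 = beta1_st(phi, 30), beta2 = beta2_st(phi, 30))
print(round(b, 4))
```

The printed values can be compared with the $\beta_1$ and $\beta_2$ columns of Table 3.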
  • 34. Small Sample Adjustment
    Table 2 compares, for θ = 0.1, 0.2, 0.3 and n = 20, 30, 40, 50, the probabilities obtained by the mixture approximation, using the simulated as well as the modeled values of skewness and kurtosis, with the corresponding probabilities obtained by simulation (based on 100,000 runs).
    It may be seen from this table that the mixture approximation based on modeled skewness (see Eq. (5.12)) and kurtosis (see Eq. (5.13)) gives values reasonably close to those based on their simulated values. These, in turn, are close to the probabilities obtained by simulation.
  • 35. Small Sample Adjustment
    Table 2. A comparison of the mixture approximation for P(Z_ST ≤ z_α): (1) by simulation; (2) mixture approximation with skewness and kurtosis obtained by simulation; (3) mixture approximation with skewness and kurtosis obtained by the empirical formulae (Eqs. 5.12 and 5.13). (z_α is such that, for Z ∼ N(0, 1), P(Z ≤ z_α) = α.)

              Approximation   Lower Tail Probability (α)
    CV   n    Method          0.005  0.025  0.05   0.5    0.95   0.975  0.995
    0.1  20   (1)             0.002  0.017  0.041  0.520  0.935  0.961  0.987
              (2)             0.002  0.015  0.037  0.522  0.943  0.968  0.990
              (3)             0.002  0.015  0.037  0.522  0.943  0.967  0.990
         30   (1)             0.003  0.021  0.046  0.514  0.939  0.965  0.990
              (2)             0.004  0.021  0.045  0.509  0.947  0.972  0.993
              (3)             0.004  0.021  0.045  0.510  0.947  0.972  0.993
         40   (1)             0.004  0.023  0.048  0.508  0.942  0.968  0.992
              (2)             0.004  0.023  0.048  0.505  0.948  0.973  0.994
              (3)             0.005  0.024  0.048  0.503  0.949  0.974  0.994
         50   (1)             0.004  0.023  0.049  0.504  0.943  0.970  0.993
              (2)             0.005  0.024  0.049  0.503  0.949  0.974  0.994
              (3)             0.004  0.023  0.048  0.504  0.948  0.973  0.994
    0.2  20   (1)             0.002  0.018  0.043  0.521  0.935  0.961  0.987
              (2)             0.002  0.015  0.038  0.521  0.943  0.968  0.990
              (3)             0.002  0.015  0.037  0.523  0.943  0.967  0.990
         30   (1)             0.003  0.021  0.045  0.511  0.939  0.966  0.990
              (2)             0.004  0.021  0.045  0.509  0.947  0.972  0.993
              (3)             0.004  0.021  0.046  0.508  0.947  0.972  0.993
         40   (1)             0.004  0.023  0.049  0.507  0.943  0.969  0.992
              (2)             0.004  0.023  0.047  0.505  0.948  0.973  0.994
              (3)             0.004  0.023  0.047  0.505  0.948  0.973  0.994
         50   (1)             0.004  0.023  0.048  0.510  0.943  0.970  0.993
              (2)             0.004  0.024  0.048  0.503  0.949  0.974  0.994
              (3)             0.005  0.024  0.049  0.502  0.949  0.974  0.995
  • 36. Small Sample Adjustment
    Table 2. Continued

              Approximation   Lower Tail Probability (α)
    CV   n    Method          0.005  0.025  0.05   0.5    0.95   0.975  0.995
    0.3  20   (1)             0.002  0.017  0.041  0.520  0.937  0.963  0.988
              (2)             0.002  0.016  0.038  0.521  0.943  0.968  0.990
              (3)             0.003  0.017  0.040  0.517  0.944  0.969  0.991
         30   (1)             0.003  0.022  0.047  0.511  0.941  0.967  0.991
              (2)             0.004  0.021  0.045  0.509  0.947  0.972  0.993
              (3)             0.004  0.021  0.046  0.508  0.947  0.972  0.993
         40   (1)             0.004  0.022  0.048  0.507  0.941  0.968  0.991
              (2)             0.004  0.023  0.047  0.505  0.948  0.973  0.994
              (3)             0.004  0.022  0.046  0.507  0.947  0.972  0.993
         50   (1)             0.004  0.025  0.050  0.505  0.943  0.969  0.993
              (2)             0.004  0.023  0.048  0.504  0.949  0.974  0.994
              (3)             0.005  0.024  0.049  0.502  0.949  0.974  0.995
  • 37. Inverse Gaussian Distribution
    The inverse Gaussian (IG) distribution is regarded as a natural choice for modeling non-negative data in many situations; see Chhikara and Folks (1974). The pdf of an IG distribution is given by
    $$f(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}}\;\exp\Bigl\{-\frac{\lambda(x - \mu)^2}{2\mu^2 x}\Bigr\},$$
    where $x, \lambda, \mu > 0$. For this distribution
    $$E(X) = \mu, \quad \mathrm{Var}(X) = \mu^3/\lambda, \quad CV(X) = \sqrt{\mu/\lambda},$$
    and therefore the ratio $\varphi = \mu/\lambda$, being the squared CV, presents an alternative way to parametrize the distribution.
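The stated mean, variance and CV of the IG distribution can be confirmed by numerical integration of the pdf. This is a sketch with illustrative choices: the function name `dinvgauss_sketch` and the parameter values $\mu = 2$, $\lambda = 5$ are assumptions made here, and the density is computed on the log scale purely for numerical stability near zero.

```r
# Sketch: inverse Gaussian pdf and a numerical check of its moments.
# `dinvgauss_sketch` is an illustrative name, not a base R function.
dinvgauss_sketch <- function(x, mu, lambda) {
  # log scale avoids Inf * 0 for very small x
  exp(0.5 * log(lambda / (2 * pi)) - 1.5 * log(x) -
        lambda * (x - mu)^2 / (2 * mu^2 * x))
}

mu <- 2; lambda <- 5
total <- integrate(dinvgauss_sketch, 0, Inf, mu = mu, lambda = lambda)$value
m1 <- integrate(function(x) x * dinvgauss_sketch(x, mu, lambda), 0, Inf)$value
m2 <- integrate(function(x) (x - mu)^2 * dinvgauss_sketch(x, mu, lambda), 0, Inf)$value
print(c(total = total, mean = m1, var = m2,
        cv = sqrt(m2) / m1, sqrt_mu_over_lambda = sqrt(mu / lambda)))
```

The last two printed entries compare the numerically computed CV with $\sqrt{\mu/\lambda}$.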
  • 38. Inverse Gaussian Distribution
    Based on a random sample $X_1, X_2, \ldots, X_n$ from $IG(\mu, \lambda)$, the parameter $\varphi$ may be of interest for inference on the CV. Its unbiased estimator is given by $\hat\varphi = \bar X U$, where
    $$U = \frac{1}{n - 1}\sum_{i=1}^n \Bigl(\frac{1}{X_i} - \frac{1}{\bar X}\Bigr).$$
    It is known that $\bar X$ and $U$ are independent, with
    $$\bar X \sim IG(\mu, n\lambda) \quad\text{and}\quad (n - 1)\lambda U \sim \chi^2_{n-1}.$$
    These properties may be used to set up the VST and ST in this situation. The details will be communicated in a forthcoming publication.
  • 39. An Application
    We compare the 95% confidence intervals for the CVs using data on heights (cm) of n = 30 wheat plants of two varieties (Singh et al. 2010). The sample values were:
      - Variety 1 (Entry 4): x̄ = 91.7 cm, sd = 6.25 cm, CV = 0.06814.
      - Variety 2 (Entry 5): x̄ = 115.03 cm, sd = 2.63 cm, CV = 0.0229.
    For a general transformation, we have the standardized random variate
    $$Z_g = \frac{g(\hat\phi) - E(g(\hat\phi))}{\sqrt{\mathrm{Var}(g(\hat\phi))}}.$$
    The $100(1 - \alpha)\%$ confidence limits are the solutions $(\phi_L, \phi_U)$ of the equations
    $$\frac{g(\hat\phi) - E(g(\hat\phi))}{\sqrt{\mathrm{Var}(g(\hat\phi))}} = x_{\alpha/2},\; x_{1-\alpha/2}.$$
  • 40. An Application
    Here $x_{\alpha/2}$ and $x_{1-\alpha/2}$ are obtained using the distribution of $Z_g$ as described earlier:
    $$P\Bigl(x_{\alpha/2} \le \frac{g(\hat\phi) - E(g(\hat\phi))}{\sqrt{\mathrm{Var}(g(\hat\phi))}} \le x_{1-\alpha/2}\Bigr) = 1 - \alpha.$$
    The above equations involve the parameter $\phi$, and hence $\theta$, through non-linear functions in the expected values and variances of all three transformations (except the variance of the variance stabilizing transformation), so the solutions need to be obtained numerically. In our application the uniroot function available in the R software was used.
    For the variance stabilizing transformation and the untransformed case, $x_\alpha$ is the $\alpha$-quantile of the standard normal distribution. For the symmetrizing transformation, the skewness ($\beta_1$) and kurtosis ($\beta_2$) were modeled using the equations given in the preceding section. The constants required for the approximations are given in Table 3.
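The root-finding step can be illustrated with `uniroot()` for the simplest case. The sketch below is not the authors' program and is not meant to reproduce their reported intervals. It assumes the untransformed statistic $\hat\theta = \bar x/s$ on the inverse-CV scale, its exact mean and variance from (5.1) and (5.2), and standard normal quantiles; the name `ci_cv_untransformed` is hypothetical.

```r
# Sketch: invert the standardized, untransformed statistic for theta = 1/phi.
# Assumptions: normal quantiles, moments (5.1)-(5.2); illustrative names only.
ci_cv_untransformed <- function(xbar, s, n, level = 0.95) {
  nu  <- n - 1
  c11 <- sqrt(nu / 2) * exp(lgamma((nu - 1) / 2) - lgamma(nu / 2))
  c20 <- nu / (nu - 2)
  c22 <- c20 - c11^2
  theta_hat <- xbar / s
  # standardized statistic as a function of the unknown theta
  z_stat <- function(theta) {
    (theta_hat - c11 * theta) / sqrt(c22 * theta^2 + c20 / n)
  }
  solve_for <- function(q) {
    uniroot(function(theta) z_stat(theta) - q,
            interval = c(1e-6, 10 * theta_hat), tol = 1e-10)$root
  }
  a <- (1 - level) / 2
  theta_lims <- c(solve_for(qnorm(1 - a)), solve_for(qnorm(a)))  # lower, upper theta
  rev(1 / theta_lims)                                            # lower, upper phi
}

ci <- ci_cv_untransformed(xbar = 91.7, s = 6.25, n = 30)
print(round(ci, 5))
```

For the ST, the same inversion would use $g$, $E(g)$ and $\mu_{2g}$ in place of the untransformed moments, and the mixture quantiles in place of `qnorm()`.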
  • 41. An Application
    Table 3. Constants for the approximation.

    Variety   n    θ         β1      β2      λ       ν
    Entry 4   30   0.068157  0.2288  3.1010  0.7056  34.97
    Entry 5   30   0.022864  0.2325  3.1022  0.7070  34.41

    The values of $x_\alpha$ from equation (5.10) are $x_{0.025} = -1.8907$ and $x_{0.975} = 2.0235$. The resulting 95% confidence intervals for θ for the various transformations are given in Table 4.
  • 42. An Application
    Table 4. The 95% confidence intervals of θ.

                           Entry 4                      Entry 5
    Transformation         Lower    Upper    Width      Lower    Upper    Width
    Symmetrizing           0.05425  0.09051  0.03636    0.01821  0.03031  0.01210
    Variance stabilizing   0.05317  0.09037  0.03720    0.01785  0.03028  0.01242
    Untransformed          0.04936  0.08704  0.03767    0.01657  0.02916  0.01259
    Vangel’s Approx.       0.05409  0.09106  0.03697    0.01820  0.03072  0.01252

    In this example, we note that the symmetrizing transformation provides narrower confidence intervals than the others.
  • 43. References
    Anscombe, F.J. (1948). The transformation of Poisson, Binomial, Negative Binomial data. Biometrika 35, 246-254.
    Bartlett, M.S. (1947). The use of transformations. Biometrics 1, 39-52.
    Bedeian, A.G. and Mossholder, K.W. (2000). On the use of the coefficient of variation as a measure of diversity. Organizational Research Methods 3, 285-297.
    Butcher, J.M. and O’Brien, C. (1991). The reproducibility of biometry and keratometry measurements. Eye 5, 708-711.
    Chaubey, Y.P. and Mudholkar, G.S. (1983). On the symmetrizing transformations of random variables. Preprint, Concordia University, Montreal. Available at http://spectrum.library.concordia.ca/973582/
  • 45. References
    Chaubey, Y.P. and Mudholkar, G.S. (1984). On the almost symmetry of Fisher’s Z. Metron 42(I/II), 165-169.
    Chaubey, Y.P., Singh, M. and Sen, D. (2013). On symmetrizing transformation of the sample coefficient of variation from a normal population. Communications in Statistics - Simulation and Computation 42, 2118-2134.
    Chhikara, R.S. and Folks, J.L. (1989). The inverse Gaussian distribution. Marcel Dekker, New York.
    Fisher, R.A. (1915). Frequency distribution of the values of correlation coefficient from an indefinitely large population. Biometrika 10, 507-521.
    Fisher, R.A. (1922). On the interpretation of χ2 from contingency tables and calculation of ρ. J. Roy. Statist. Soc. Ser. A 85, 87-94.
  • 46. References
    Hogben, D., Pinkham, R.S. and Wilk, M.B. (1961). The moments of the non-central t-distribution. Biometrika 9, 119-127.
    Hotelling, H. (1953). New light on the correlation coefficient and its transforms. J. Roy. Statist. Soc. Ser. B 15, 193-224.
    Jensen, D.R. and Solomon, H. (1972). A Gaussian approximation to the distribution of a quadratic form in normal variables. J. Amer. Statist. Assoc. 67, 898-902.
    Johnson, N.L. and Kotz, S. (1970). Distributions in statistics: continuous univariate distributions - 2 (Chapter 27). New York: John Wiley & Sons.
    Kordonsky, K.B. and Gertsbakh, I. (1997). Multiple Time Scales and the Lifetime Coefficient of Variation: Engineering Applications. Lifetime Data Analysis 2, 139-156.
  • 47. References
    Mudholkar, G.S. and Chaubey, Y.P. (1975). Use of logistic distribution for approximating probabilities and percentiles of Student’s distribution. Journal of Statistical Research 9, 1-9.
    Mudholkar, G.S. and Trivedi, M.C. (1980). A normal approximation for the distribution of the likelihood ratio statistic in multivariate analysis of variance. Biometrika 67, 485-488.
    Mudholkar, G.S. and Trivedi, M.C. (1981a). A Gaussian approximation to the distribution of the sample variance for nonnormal populations. Journal of the American Statistical Association 76, 479-485.
    Mudholkar, G.S. and Trivedi, M.C. (1981b). A normal approximation for the multivariate likelihood ratio statistics. In Statistical Distributions in Scientific Work (C. Taillie, C.P. Patil and A.A. Baldessari, Eds.). Dordrecht: Reidel, Vol. 5, 219-230.
  • 48. References
    Quan, H. and Shih, J. (1996). Assessing reproducibility by the within-subject coefficient of variation with random effects models. Biometrics 52, 1195-1203.
    Rao, C.R. (1973). Linear Statistical Inference and Its Applications. New York: John Wiley.
    Singh, M. (1993). Behavior of sample coefficient of variation drawn from several distributions. Sankhyā 55, 65-76.
    Singh, M., Niane, A.A. and Chaubey, Y.P. (2010). Evaluating uniformity of plant varieties: sample size for inference on coefficient of variation. Journal of Statistics and Applications 5, 1-13.
    Sankaran, M.S. (1959). On the noncentral χ2 distribution. Biometrika 46, 235-237.
  • 49. References
    Taye, G. and Njuho, P. (2008). Monitoring Field Variability Using Confidence Interval for Coefficient of Variation. Communications in Statistics - Theory and Methods 37, 831-846.
    Wilson, E.B. and Hilferty, M.M. (1931). The distribution of Chi-square. Proc. Nat. Acad. Sc. 17, 684-688.