
# Sampling based approximation of confidence intervals for functions of genetic covariance matrices

Approximate lower bound sampling errors of maximum likelihood estimates of covariance components and their linear functions can be obtained from the inverse of the information matrix. For non-linear functions, sampling variances are commonly determined as the variance of their first-order Taylor series expansions. This is used to obtain sampling errors for estimates of heritabilities and correlations, and these quantities can be computed with most software performing such analyses. In other instances, however, more complicated functions are of interest or the linear approximation is difficult or inadequate. A pragmatic alternative then is to evaluate sampling characteristics by repeated sampling of parameters from their asymptotic, multivariate normal distribution, calculating the function(s) of interest for each sample and inspecting the distribution across replicates. This paper demonstrates the use of this approach and examines the quality of the approximation obtained.


### Sampling based approximation of confidence intervals for functions of genetic covariance matrices

1. Sampling based approximation of confidence intervals for functions of genetic covariance matrices
    - Karin Meyer¹ and David Houle²
    - ¹ Animal Genetics and Breeding Unit, University of New England, Armidale NSW 2351
    - ² Department of Biological Science, Florida State University, Tallahassee, FL 32306-4295
    - AAABG 2013
2. Sampling standard errors | Introduction: REML sampling variances
    - REML estimates of covariance components
        - multivariate normal distribution: $\hat{\boldsymbol\theta} \sim N\left(\boldsymbol\theta, \mathbf{I}(\boldsymbol\theta)^{-1}\right)$
        - inverse of information matrix → sampling errors
        - large sample theory; asymptotic lower bounds
    - Linear functions of estimates
        - sampling variances readily obtained
    - Non-linear functions ("Delta method")
        - obtain 1st order Taylor series expansion
        - evaluate sampling variance of linear approximation
        - needs partial derivatives w.r.t. all variables
            - → can be complicated / tedious
            - → options for evaluating in REML software limited
    - Confidence intervals: $\pm z_\alpha\,\mathrm{s.e.}$
        - misleading at boundary of parameter space?

    K. M. | 2 / 12
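As a reminder of what the first-order expansion amounts to, the general form of the Delta method can be written as below. The symbols $g$ (the function of interest) and $\mathbf{g}$ (its gradient) are notation assumed here for illustration; they do not appear on the slide.

$$
g(\hat{\boldsymbol\theta}) \approx g(\boldsymbol\theta) + \mathbf{g}'(\hat{\boldsymbol\theta} - \boldsymbol\theta),
\qquad
\mathrm{Var}\left(g(\hat{\boldsymbol\theta})\right) \approx \mathbf{g}'\,\mathbf{I}(\hat{\boldsymbol\theta})^{-1}\,\mathbf{g},
\qquad
\mathbf{g} = \left.\frac{\partial g}{\partial \boldsymbol\theta}\right|_{\boldsymbol\theta = \hat{\boldsymbol\theta}}
$$

Every element of $\mathbf{g}$ has to be derived by hand for each new function, which is where the approach becomes tedious.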
4. Introduction: Alternatives
    - Dealing with boundary conditions
        - derive confidence intervals from profile likelihood
        - Bayesian estimation
    - General procedure
        - sample data, repeat analysis → distribution over replicates
        - slow and laborious!
    - Objectives
        1. Propose new scheme: sample from (theoretical) distribution of estimates; simple and fast
        2. Examine quality of approximation of sampling errors
6. Method: Sampling scheme
    - Large sample theory
        - (RE)ML estimates have MVN distribution
        - sampling covariance ∝ inverse of information matrix
    - Sample from this distribution: $\tilde{\boldsymbol\theta} \sim N\left(\hat{\boldsymbol\theta}, \mathbf{H}(\hat{\boldsymbol\theta})^{-1}\right)$, where $\mathbf{H}$ is the information matrix
        - use same parameterisation as REML analysis → eliminate linear approximation, account for constraints
        - evaluate function(s) of interest for $\tilde{\boldsymbol\theta}$
        - examine distribution over replicates
    - Reference: Mandel, M. (2013) Simulation-based confidence intervals for functions with complicated derivatives. American Statistician 67, 76–81.
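The sampling scheme can be sketched in a few lines of NumPy. This is a minimal illustration, not the WOMBAT implementation: the function name `sample_function`, the two-component example and all numerical values are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

def sample_function(theta_hat, info_matrix, func, n_rep=10_000, seed=1):
    """Draw theta ~ N(theta_hat, info_matrix^-1) and evaluate func on each draw."""
    rng = np.random.default_rng(seed)
    cov = np.linalg.inv(info_matrix)
    cov = 0.5 * (cov + cov.T)  # remove round-off asymmetry from the inversion
    draws = rng.multivariate_normal(theta_hat, cov, size=n_rep)
    return np.array([func(t) for t in draws])

# Hypothetical estimates of two variance components and their sampling covariance
theta_hat = np.array([30.0, 70.0])
sampling_cov = np.array([[16.0, -6.0], [-6.0, 9.0]])
info_matrix = np.linalg.inv(sampling_cov)

# Non-linear function of interest: a heritability-like ratio
ratio = lambda t: t[0] / (t[0] + t[1])

values = sample_function(theta_hat, info_matrix, ratio)
se = values.std(ddof=1)                              # approximate sampling s.e.
lower, upper = np.percentile(values, [2.5, 97.5])    # percentile interval
print(f"s.e. = {se:.4f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

No derivatives of `ratio` are needed; swapping in another function only changes the callable passed to `sample_function`.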
7. Simulation: Does it work?
    - Simulate two data sets
        - 4000 animals, 6 traits
        - $h^2 = 2 \times (0.2, 0.3, 0.4)$
        - $\sigma^2_P = 100$
        - $r_E = 0.3$
        - a) $r_G = 0.5$, b) $r_G = |0.7|^{|i-j|}$
    - REML analysis
        - AI algorithm
        - Cholesky factor
    - Estimates
        - $\hat{\boldsymbol\theta}$
        - $\mathbf{H}(\hat{\boldsymbol\theta})$
    - Compare estimates of sampling variances
        - REML: based on $\mathbf{H}(\hat{\boldsymbol\theta})$, "Delta" method
        - Empirical: re-sample data using estimates as population values, repeat analysis; 10000 replicates
        - Approximate: sample from MVN distribution, $N\left(\hat{\boldsymbol\theta}, \mathbf{H}(\hat{\boldsymbol\theta})^{-1}\right)$; 200000 replicates
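The population covariance matrices implied by this design could be built as follows. This is a sketch under two assumptions not stated on the slide: the six heritabilities are ordered (0.2, 0.2, 0.3, 0.3, 0.4, 0.4), and the phenotypic variance is the sum of a genetic and a single environmental component.

```python
import numpy as np

q = 6
h2 = np.array([0.2, 0.2, 0.3, 0.3, 0.4, 0.4])  # assumed trait ordering
var_p = 100.0
var_g = h2 * var_p            # genetic variances
var_e = var_p - var_g         # environmental variances (assumed decomposition)

def cov_from_corr(variances, corr):
    """Scale a correlation matrix by standard deviations."""
    sd = np.sqrt(variances)
    return corr * np.outer(sd, sd)

def uniform_corr(r, q):
    """Correlation matrix with the same correlation r for every pair."""
    corr = np.full((q, q), r)
    np.fill_diagonal(corr, 1.0)
    return corr

idx = np.arange(q)
corr_g_a = uniform_corr(0.5, q)                          # case a
corr_g_b = 0.7 ** np.abs(idx[:, None] - idx[None, :])    # case b

sigma_g_a = cov_from_corr(var_g, corr_g_a)
sigma_g_b = cov_from_corr(var_g, corr_g_b)
sigma_e = cov_from_corr(var_e, uniform_corr(0.3, q))

n_components = q * (q + 1) // 2                       # distinct (co)variances
n_sampling = n_components * (n_components + 1) // 2   # their sampling (co)variances
print(n_components, n_sampling)
```

The two counts printed at the end correspond to the 21 (co)variance components and 231 sampling (co)variances compared in the results.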
8. Results: Sampling covariances for $\hat{\boldsymbol\Sigma}_G$, case a*
    - [Figure: three scatter plots titled Empirical vs. REML, Approximate vs. REML and Approximate vs. Empirical; axis ticks from 0 to 15]
    - 6 traits, 21 (co)variance components, 231 sampling (co)variances
    - Markers distinguish variances from covariances (◦)
    - *Case a: all genetic eigenvalues > 0
9. Results: Sampling covariances for $\hat{\boldsymbol\Sigma}_G$, case b†
    - [Figure: scatter plots with panels titled Rank 6 and Rank 5; axis labels Empirical and Approximate, axis ticks from 0 to 15]
    - Approximation unreliable if model is over-parameterised
    - †Case b: one genetic eigenvalue ≈ 0
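10. Results: Delta method for $\hat r_{ij}$
    - Estimate elements of the Cholesky factor $\mathbf{L}$ of $\boldsymbol\Sigma = \mathbf{L}\mathbf{L}'$
        - $\mathbf{H}(\hat{\boldsymbol\theta})^{-1}$ gives $\mathrm{Cov}(\hat l_{ij}, \hat l_{mn})$
    - Covariances between the $\hat\sigma_{ij}$, using $\sigma_{ij} = \sum_t l_{it} l_{jt}$:

      $$
      \mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma_{km}) \approx \sum_{t=1}^{f(i,j)} \sum_{s=1}^{f(k,m)} \Big[ \hat l_{jt}\hat l_{ms}\,\mathrm{Cov}(\hat l_{it}, \hat l_{ks}) + \hat l_{jt}\hat l_{ks}\,\mathrm{Cov}(\hat l_{it}, \hat l_{ms}) + \hat l_{it}\hat l_{ms}\,\mathrm{Cov}(\hat l_{jt}, \hat l_{ks}) + \hat l_{it}\hat l_{ks}\,\mathrm{Cov}(\hat l_{jt}, \hat l_{ms}) \Big]
      $$

      The upper limits $f(i,j)$ and $f(k,m)$ are not defined on the slide; for a full-rank lower-triangular $\mathbf{L}$ they equal $\min(i,j)$ and $\min(k,m)$.
    - For $\hat r_{ij} = \hat\sigma_{ij} / \sqrt{\hat\sigma^2_i \hat\sigma^2_j}$:

      $$
      \mathrm{Var}(\hat r_{ij}) \approx \Big[ 4\hat\sigma^4_i \hat\sigma^4_j\,\mathrm{Var}(\hat\sigma_{ij}) + \hat\sigma^2_{ij} \hat\sigma^4_j\,\mathrm{Var}(\hat\sigma^2_i) + \hat\sigma^2_{ij} \hat\sigma^4_i\,\mathrm{Var}(\hat\sigma^2_j) - 4\hat\sigma_{ij} \hat\sigma^2_i \hat\sigma^4_j\,\mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma^2_i) - 4\hat\sigma_{ij} \hat\sigma^4_i \hat\sigma^2_j\,\mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma^2_j) + 2\hat\sigma^2_{ij} \hat\sigma^2_i \hat\sigma^2_j\,\mathrm{Cov}(\hat\sigma^2_i, \hat\sigma^2_j) \Big] \Big/ \left( 4\hat\sigma^6_i \hat\sigma^6_j \right)
      $$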
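The expanded variance formula for a correlation is the quadratic form of the gradient with the sampling covariance matrix of $(\hat\sigma_{ij}, \hat\sigma^2_i, \hat\sigma^2_j)$. The sketch below checks numerically that the two agree; the function names and all input values are hypothetical.

```python
import numpy as np

def delta_var_gradient(s_ij, s_ii, s_jj, V):
    """Delta-method variance of r = s_ij / sqrt(s_ii * s_jj) as g' V g.

    V is the 3 x 3 sampling covariance matrix of (s_ij, s_ii, s_jj),
    where s_ii and s_jj denote the variances sigma_i^2 and sigma_j^2.
    """
    g = np.array([
        1.0 / np.sqrt(s_ii * s_jj),
        -0.5 * s_ij / (s_ii ** 1.5 * s_jj ** 0.5),
        -0.5 * s_ij / (s_ii ** 0.5 * s_jj ** 1.5),
    ])
    return g @ V @ g

def delta_var_expanded(s_ij, s_ii, s_jj, V):
    """The same variance written out term by term, as in the expanded formula."""
    num = (4 * s_ii**2 * s_jj**2 * V[0, 0]
           + s_ij**2 * s_jj**2 * V[1, 1]
           + s_ij**2 * s_ii**2 * V[2, 2]
           - 4 * s_ij * s_ii * s_jj**2 * V[0, 1]
           - 4 * s_ij * s_ii**2 * s_jj * V[0, 2]
           + 2 * s_ij**2 * s_ii * s_jj * V[1, 2])
    return num / (4 * s_ii**3 * s_jj**3)

# Hypothetical covariance estimates and their sampling covariance matrix
V = np.array([[4.0, 1.0, 1.0],
              [1.0, 9.0, 2.0],
              [1.0, 2.0, 9.0]])
v_gradient = delta_var_gradient(14.0, 20.0, 20.0, V)
v_expanded = delta_var_expanded(14.0, 20.0, 20.0, V)
print(v_gradient, v_expanded)
```

Note that `s_ii**2` in the code corresponds to $\hat\sigma^4_i$ in the formula, because `s_ii` already holds a variance.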
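11. Results: Approximation for $\hat r_{ij}$
    - Let $\boldsymbol\Sigma = \mathbf{L}\mathbf{L}'$ and $\boldsymbol\theta = \mathrm{vech}(\mathbf{L})$
    - For many replicates
        - sample $\tilde{\boldsymbol\theta} \sim N\left(\hat{\boldsymbol\theta}, \mathbf{H}(\hat{\boldsymbol\theta})^{-1}\right)$
        - construct $\tilde{\mathbf{L}}$ from $\tilde{\boldsymbol\theta}$
        - calculate $\tilde{\boldsymbol\Sigma} = \tilde{\mathbf{L}}\tilde{\mathbf{L}}'$
        - calculate correlation $\tilde r_{ij} = \tilde\sigma_{ij} / \sqrt{\tilde\sigma^2_i \tilde\sigma^2_j}$
    - Evaluate $\mathrm{Var}(\hat r_{ij})$ as empirical variance of $\tilde r_{ij}$ across replicates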
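These steps translate almost directly into code. The sketch below is illustrative only: the helper names, the two-trait example and the sampling covariance matrix are hypothetical, and the lower triangle is filled row by row, which is an assumption that would need to match the ordering used by the REML software.

```python
import numpy as np

def lower_from_vector(theta, q):
    """Fill a q x q lower-triangular matrix row by row (assumed ordering)."""
    L = np.zeros((q, q))
    L[np.tril_indices(q)] = theta
    return L

def sampled_correlations(theta_hat, cov_theta, q, i, j, n_rep=20_000, seed=1):
    """Sample Cholesky elements and return the correlation r_ij for each draw."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(theta_hat, cov_theta, size=n_rep)
    r = np.empty(n_rep)
    for k, theta in enumerate(draws):
        L = lower_from_vector(theta, q)
        S = L @ L.T                       # always positive semi-definite
        r[k] = S[i, j] / np.sqrt(S[i, i] * S[j, j])
    return r

# Hypothetical two-trait estimates of L and sampling covariance of its elements
L_hat = np.array([[4.5, 0.0],
                  [3.1, 3.2]])
theta_hat = L_hat[np.tril_indices(2)]
cov_theta = 0.05 * np.eye(3)

S_hat = L_hat @ L_hat.T
r_hat = S_hat[0, 1] / np.sqrt(S_hat[0, 0] * S_hat[1, 1])
r = sampled_correlations(theta_hat, cov_theta, q=2, i=0, j=1)
print(f"r_hat = {r_hat:.3f}, s.e. = {r.std(ddof=1):.3f}")
```

Because every sampled matrix is formed as a product of a matrix with its transpose, each sampled correlation lies within [-1, 1], which is how sampling on the Cholesky scale accounts for the parameter-space constraints.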
12. Results: Distribution of $\hat r_{G12}$, case b
    - [Figure: histograms of the Empirical and Approximate distributions; x-axis Correlation, 0.5 to 1.0]

    |  | REML | Empirical | Approximate |
    |---|---|---|---|
    | $\hat r_{G12}$ | 0.897 | 0.873 | 0.866 |
    | s.e. | 0.059 | 0.066 | 0.063 |
13. Results: Distribution of second eigenvalue
    - [Figure: histograms of the Empirical and Approximate distributions; x-axis Eigenvalue, 20 to 40]

    |  | REML | Empirical | Approximate |
    |---|---|---|---|
    | $\hat\lambda_2$ | 32.93 | 33.25 | 33.84 |
    | s.e. | – | 3.27 | 3.30 |
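An eigenvalue is a case where no Delta-method standard error is reported but sampling is straightforward. The following sketch shows one way this might look; the three-trait Cholesky factor, the sampling covariance matrix and the row-by-row ordering of the elements are all assumptions made for illustration.

```python
import numpy as np

def second_eigenvalue(theta, q):
    """Second-largest eigenvalue of Sigma = L L' built from the elements of L."""
    L = np.zeros((q, q))
    L[np.tril_indices(q)] = theta            # row-by-row fill (assumed ordering)
    return np.linalg.eigvalsh(L @ L.T)[-2]   # eigvalsh returns ascending order

# Hypothetical three-trait Cholesky factor and sampling covariance of its elements
L_hat = np.array([[6.0, 0.0, 0.0],
                  [2.0, 5.0, 0.0],
                  [1.0, 1.0, 3.0]])
theta_hat = L_hat[np.tril_indices(3)]
cov_theta = 0.04 * np.eye(6)

rng = np.random.default_rng(1)
draws = rng.multivariate_normal(theta_hat, cov_theta, size=20_000)
lam2 = np.array([second_eigenvalue(t, 3) for t in draws])

lam2_hat = second_eigenvalue(theta_hat, 3)
se = lam2.std(ddof=1)
lower, upper = np.percentile(lam2, [2.5, 97.5])
print(f"lambda_2 = {lam2_hat:.2f}, s.e. = {se:.2f}, "
      f"95% interval = ({lower:.2f}, {upper:.2f})")
```

The percentile interval is read directly off the sampled distribution, so it need not be symmetric about the estimate.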
14. Conclusions
    - Sampling from MVN distribution
        - accommodates arbitrary functions
        - yields good approximation of sampling variances
        - easier than Delta method for complicated derivatives
        - more appropriate confidence interval at boundary of parameter space
        - but:
            - relies on large sample theory
            - information matrix needs to be safely positive definite
            - assumes $\hat{\boldsymbol\theta} \approx \boldsymbol\theta$
    - Simple but useful addition to our toolkit
        - implemented in WOMBAT