Sampling based approximation of confidence intervals for functions of genetic covariance matrices

Karin Meyer¹, David Houle²
¹ Animal Genetics and Breeding Unit, University of New England, Armidale NSW 2351
² Department of Biological Science, Florida State University, Tallahassee, FL 32306-4295
AAABG 2013

Sampling standard errors | Introduction
REML sampling variances
REML estimates of covariance components
– multivariate normal distribution: $\hat{\boldsymbol\theta} \sim N\left(\boldsymbol\theta, \mathbf{I}(\boldsymbol\theta)^{-1}\right)$
– inverse of information matrix −→ sampling errors
– large sample theory; asymptotic lower bounds
Linear functions of estimates
– sampling variances readily obtained
Non-linear functions
– obtain 1st order Taylor series expansion (“Delta method”)
– evaluate sampling variance of linear approximation
– needs partial derivatives w.r.t. all variables
−→ can be complicated / tedious
−→ options for evaluating in REML software limited
Confidence intervals: $\pm z_\alpha$ s.e.
– misleading at boundary of parameter space?
K. M. | 2 / 12
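As an illustration of the Delta method, the following sketch (Python, standard library only) computes the first-order sampling variance of a single non-linear function, heritability $h^2 = \sigma^2_A/(\sigma^2_A + \sigma^2_E)$, from the sampling (co)variances of the two component estimates. The function name `delta_var_h2` and all numbers are hypothetical; in practice the (co)variances would come from $\mathbf{I}(\hat{\boldsymbol\theta})^{-1}$.

```python
import math


def delta_var_h2(s2a, s2e, var_a, var_e, cov_ae):
    """First-order Taylor ("Delta method") sampling variance of
    h2 = s2a / (s2a + s2e) from the sampling (co)variances of s2a and s2e."""
    total = s2a + s2e
    d_a = s2e / total**2            # partial derivative of h2 w.r.t. s2a
    d_e = -s2a / total**2           # partial derivative of h2 w.r.t. s2e
    # Var(h2) ~= g' V g with gradient g = (d_a, d_e) and V the sampling covariance matrix
    return d_a**2 * var_a + 2.0 * d_a * d_e * cov_ae + d_e**2 * var_e


# Hypothetical estimates and sampling (co)variances
s2a_hat, s2e_hat = 30.0, 70.0
h2_hat = s2a_hat / (s2a_hat + s2e_hat)
se = math.sqrt(delta_var_h2(s2a_hat, s2e_hat, var_a=25.0, var_e=30.0, cov_ae=-10.0))
z_alpha = 1.96                      # two-sided 95% interval
print(f"h2 = {h2_hat:.3f}, s.e. = {se:.4f}, "
      f"CI = ({h2_hat - z_alpha * se:.3f}, {h2_hat + z_alpha * se:.3f})")
```

The interval $\hat h^2 \pm z_\alpha$ s.e. is symmetric by construction, so for estimates close to 0 or 1 it can extend outside the parameter space.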

Alternatives
Dealing with boundary conditions
– Derive confidence intervals from profile likelihood
– Bayesian estimation
General procedure
– Sample data, repeat analysis −→ distribution over reps
– slow & laborious!
Objectives
1 Propose new scheme
– sample from (theoretical) distribution of estimates
– simple & fast
2 Examine quality of approximation of sampling errors

Sampling standard errors | Method
Sampling scheme
Large sample theory
– (RE)ML estimates have MVN distribution
– Sampling covariance ∝ inverse of information matrix
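Sample from this distribution:
$\tilde{\boldsymbol\theta} \sim N\left(\hat{\boldsymbol\theta}, \mathbf{H}(\hat{\boldsymbol\theta})^{-1}\right)$, with $\mathbf{H}(\hat{\boldsymbol\theta})$ the information matrix
– Use the same parameterisation as the REML analysis
→ no linear approximation needed; constraints are accounted for
– Evaluate the function(s) of interest for each $\tilde{\boldsymbol\theta}$
– Examine their distribution over replicates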
Mandel, M. (2013) Simulation-based confidence intervals for functions with complicated derivatives. American Statistician 67, 76–81.
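A minimal sketch of the sampling scheme, assuming the estimates $\hat{\boldsymbol\theta}$ and $\mathbf{H}(\hat{\boldsymbol\theta})^{-1}$ are already available. For brevity, $\boldsymbol\theta$ here holds two variance components directly and the function of interest is heritability; the slides instead sample in the parameterisation of the REML analysis. Helper names and numbers are hypothetical.

```python
import math
import random
import statistics


def cholesky(a):
    """Lower-triangular C with A = C C' for a small symmetric p.d. matrix A."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c


def sample_function(theta_hat, v, func, reps, seed=1):
    """Draw theta ~ N(theta_hat, V) and evaluate func for each draw."""
    rng = random.Random(seed)
    c = cholesky(v)
    n = len(theta_hat)
    values = []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        theta = [theta_hat[i] + sum(c[i][k] * z[k] for k in range(i + 1))
                 for i in range(n)]
        values.append(func(theta))
    return values


def h2(theta):
    """Function of interest: heritability from (sigma2_A, sigma2_E)."""
    return theta[0] / (theta[0] + theta[1])


theta_hat = [30.0, 70.0]                  # hypothetical estimates
v = [[25.0, -10.0], [-10.0, 30.0]]        # hypothetical H(theta_hat)^-1
draws = sample_function(theta_hat, v, h2, reps=20000)
se = statistics.stdev(draws)
cuts = statistics.quantiles(draws, n=40)  # 39 cut points: first = 2.5%, last = 97.5%
print(f"s.e. = {se:.4f}, 95% interval = ({cuts[0]:.3f}, {cuts[-1]:.3f})")
```

The standard deviation of the draws serves as the approximate s.e., while the 2.5% and 97.5% quantiles give an interval that, unlike $\pm z_\alpha$ s.e., is not forced to be symmetric.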

Sampling standard errors | Simulation
Does it work?
Simulate two data sets
– 4000 animals, 6 traits
– $h^2 = 2 \times (0.2, 0.3, 0.4)$
– $\sigma^2_P = 100$
– $r_E = 0.3$
– a) $r_G = 0.5$, b) $r_G = 0.7^{|i-j|}$
REML analysis
– AI (average information) algorithm
– parameterised in terms of the Cholesky factor
Estimates
– $\hat{\boldsymbol\theta}$
– $\mathbf{H}(\hat{\boldsymbol\theta})$
Compare estimates of sampling variances
– REML: based on $\mathbf{H}(\hat{\boldsymbol\theta})$, “Delta” method
– Empirical: re-sample data using estimates as population values, repeat analysis; 10000 replicates
– Approx.: sample from MVN distribution $N\left(\hat{\boldsymbol\theta}, \mathbf{H}(\hat{\boldsymbol\theta})^{-1}\right)$; 200000 replicates
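The population covariance matrices for the two simulated data sets can be set up as sketched here. This assumes $\sigma^2_P = \sigma^2_A + \sigma^2_E$ (no other random effects) and reads “$h^2 = 2 \times (0.2, 0.3, 0.4)$” as each heritability applying to two traits; the trait order and the helper names are hypothetical.

```python
import math

SIGMA2_P = 100.0                          # phenotypic variance, all traits
H2 = [0.2, 0.2, 0.3, 0.3, 0.4, 0.4]       # assumed trait order for "2 x (0.2, 0.3, 0.4)"
R_E = 0.3                                 # residual correlation


def cov_from_corr(variances, corr):
    """Covariance matrix with entries corr(i, j) * sqrt(v_i * v_j)."""
    n = len(variances)
    return [[corr(i, j) * math.sqrt(variances[i] * variances[j]) for j in range(n)]
            for i in range(n)]


def is_pd(a):
    """True if a Cholesky factorisation succeeds, i.e. a is positive definite."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                c[i][i] = math.sqrt(s)
            else:
                c[i][j] = s / c[j][j]
    return True


var_g = [h * SIGMA2_P for h in H2]             # sigma2_A = h2 * sigma2_P
var_e = [(1.0 - h) * SIGMA2_P for h in H2]     # assumes sigma2_P = sigma2_A + sigma2_E
sigma_e = cov_from_corr(var_e, lambda i, j: 1.0 if i == j else R_E)
sigma_g = {
    "a": cov_from_corr(var_g, lambda i, j: 1.0 if i == j else 0.5),
    "b": cov_from_corr(var_g, lambda i, j: 0.7 ** abs(i - j)),
}
for case, sg in sigma_g.items():
    n = len(sg)
    sigma_p = [[sg[i][j] + sigma_e[i][j] for j in range(n)] for i in range(n)]
    print(case, is_pd(sg), [round(sigma_p[i][i], 6) for i in range(n)])
```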

Sampling standard errors | Results
Sampling covariances for $\hat{\boldsymbol\Sigma}_G$ (case a∗)
[Figure: scatter plots of sampling (co)variances in three panels, Empirical vs. REML, Approximate vs. REML and Approximate vs. Empirical; both axes 0 to 15]
6 traits, 21 (co)variance components, 231 sampling (co)variances
● variance, ◦ covariance
∗Case a: all genetic eigenvalues > 0

Sampling covariances for $\hat{\boldsymbol\Sigma}_G$ (case b†)
[Figure: Approximate vs. Empirical sampling (co)variances in two panels, Rank 6 and Rank 5; both axes 0 to 15]
Approximation unreliable if model is over-parameterised
†Case b: one genetic eigenvalue ≈ 0

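Delta method for $\hat r_{ij}$
Estimate elements of the Cholesky factor $\mathbf{L}$ of $\boldsymbol\Sigma = \mathbf{L}\mathbf{L}'$
– $\mathbf{H}(\hat{\boldsymbol\theta})^{-1}$ gives $\mathrm{Cov}(\hat l_{ij}, \hat l_{mn})$
– covariances between the $\hat\sigma_{ij}$, with $f(i,j) = \min(i,j)$ for lower-triangular $\mathbf{L}$:
$$
\mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma_{km}) \approx \sum_{t=1}^{f(i,j)} \sum_{s=1}^{f(k,m)} \Big[
\hat l_{jt}\hat l_{ms}\,\mathrm{Cov}(\hat l_{it}, \hat l_{ks})
+ \hat l_{jt}\hat l_{ks}\,\mathrm{Cov}(\hat l_{it}, \hat l_{ms})
+ \hat l_{it}\hat l_{ms}\,\mathrm{Cov}(\hat l_{jt}, \hat l_{ks})
+ \hat l_{it}\hat l_{ks}\,\mathrm{Cov}(\hat l_{jt}, \hat l_{ms}) \Big]
$$
For $\hat r_{ij} = \hat\sigma_{ij} \big/ \sqrt{\hat\sigma^2_i\,\hat\sigma^2_j}$:
$$
\begin{aligned}
\mathrm{Var}(\hat r_{ij}) \approx \Big[\,
& 4\hat\sigma^4_i\hat\sigma^4_j\,\mathrm{Var}(\hat\sigma_{ij})
+ \hat\sigma^2_{ij}\hat\sigma^4_j\,\mathrm{Var}(\hat\sigma^2_i)
+ \hat\sigma^2_{ij}\hat\sigma^4_i\,\mathrm{Var}(\hat\sigma^2_j) \\
& - 4\hat\sigma_{ij}\hat\sigma^2_i\hat\sigma^4_j\,\mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma^2_i)
- 4\hat\sigma_{ij}\hat\sigma^4_i\hat\sigma^2_j\,\mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma^2_j) \\
& + 2\hat\sigma^2_{ij}\hat\sigma^2_i\hat\sigma^2_j\,\mathrm{Cov}(\hat\sigma^2_i, \hat\sigma^2_j) \Big] \Big/ \left(4\hat\sigma^6_i\hat\sigma^6_j\right)
\end{aligned}
$$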
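The two Delta-method formulas translate almost term by term into code. This sketch assumes $\mathbf{L}$ is lower triangular, $\boldsymbol\theta = \mathrm{vech}(\mathbf{L})$ is stacked column by column, and traits are indexed from 0; `vech_index`, `cov_sigma` and `var_r` are hypothetical names and the input values are made up.

```python
import math


def vech_index(n):
    """Map (row, col), row >= col, of an n x n lower-triangular L to its
    position in vech(L), taken column by column (assumed ordering)."""
    idx, pos = {}, 0
    for col in range(n):
        for row in range(col, n):
            idx[(row, col)] = pos
            pos += 1
    return idx


def cov_sigma(low, cov_l, idx, i, j, k, m):
    """Cov(sigma_ij, sigma_km) from the four-term first-order expansion,
    where sigma_ij = sum over t <= min(i, j) of l_it * l_jt."""
    def c(a, b, p, q):
        return cov_l[idx[(a, b)]][idx[(p, q)]]

    total = 0.0
    for t in range(min(i, j) + 1):
        for s in range(min(k, m) + 1):
            total += (low[j][t] * low[m][s] * c(i, t, k, s)
                      + low[j][t] * low[k][s] * c(i, t, m, s)
                      + low[i][t] * low[m][s] * c(j, t, k, s)
                      + low[i][t] * low[k][s] * c(j, t, m, s))
    return total


def var_r(low, cov_l, i, j):
    """Delta-method Var(r_ij) for r_ij = sigma_ij / sqrt(sigma_i^2 * sigma_j^2)."""
    idx = vech_index(len(low))

    def sig(a, b):
        return sum(low[a][t] * low[b][t] for t in range(min(a, b) + 1))

    def cs(a, b, p, q):
        return cov_sigma(low, cov_l, idx, a, b, p, q)

    sij, si, sj = sig(i, j), sig(i, i), sig(j, j)   # si, sj are the variances
    num = (4 * si**2 * sj**2 * cs(i, j, i, j)
           + sij**2 * sj**2 * cs(i, i, i, i)
           + sij**2 * si**2 * cs(j, j, j, j)
           - 4 * sij * si * sj**2 * cs(i, j, i, i)
           - 4 * sij * si**2 * sj * cs(i, j, j, j)
           + 2 * sij**2 * si * sj * cs(i, i, j, j))
    return num / (4 * si**3 * sj**3)


# Hypothetical 2-trait example: L-hat and Cov(vech(L-hat)) ordered (l_11, l_21, l_22)
low = [[2.0, 0.0], [1.5, 1.0]]
cov_l = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.05]]
v = var_r(low, cov_l, 0, 1)
print(f"Var(r_12) ~= {v:.6f}, s.e. ~= {math.sqrt(v):.4f}")
```

Even for two traits this needs six distinct (co)variances of the $\hat\sigma$'s, each a double sum over Cholesky elements, which illustrates why the Delta method becomes tedious for more complicated functions.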

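Approximation for $\hat r_{ij}$
Let $\boldsymbol\Sigma = \mathbf{L}\mathbf{L}'$ and $\boldsymbol\theta = \mathrm{vech}(\mathbf{L})$
For many replicates
– Sample $\tilde{\boldsymbol\theta} \sim N\left(\hat{\boldsymbol\theta}, \mathbf{H}(\hat{\boldsymbol\theta})^{-1}\right)$
– Construct $\tilde{\mathbf{L}}$ from $\tilde{\boldsymbol\theta}$
– Calculate $\tilde{\boldsymbol\Sigma} = \tilde{\mathbf{L}}\tilde{\mathbf{L}}'$
– Calculate correlation $\tilde r_{ij} = \tilde\sigma_{ij} \big/ \sqrt{\tilde\sigma^2_i\,\tilde\sigma^2_j}$
Evaluate $\mathrm{Var}(\hat r_{ij})$ as the empirical variance of $\tilde r_{ij}$ across replicates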
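A sketch of the replicate loop for $\tilde r_{ij}$, assuming $\hat{\boldsymbol\theta} = \mathrm{vech}(\hat{\mathbf{L}})$ stacked by columns and a made-up covariance matrix standing in for $\mathbf{H}(\hat{\boldsymbol\theta})^{-1}$. Helper names and values are hypothetical; the multivariate normal draws use a hand-written Cholesky factor to stay within the standard library.

```python
import math
import random
import statistics


def cholesky(a):
    """Lower-triangular C with A = C C' for a small symmetric p.d. matrix A."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c


def unvech(theta, n):
    """Rebuild lower-triangular L from vech(L) stacked column by column."""
    low = [[0.0] * n for _ in range(n)]
    pos = 0
    for col in range(n):
        for row in range(col, n):
            low[row][col] = theta[pos]
            pos += 1
    return low


def sample_r(theta_hat, cov_theta, n, i, j, reps, seed=1):
    """Draw theta ~ N(theta_hat, cov_theta), form Sigma = L L' and return r_ij per draw."""
    rng = random.Random(seed)
    c = cholesky(cov_theta)
    k = len(theta_hat)
    draws = []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        theta = [theta_hat[a] + sum(c[a][b] * z[b] for b in range(a + 1))
                 for a in range(k)]
        low = unvech(theta, n)
        sigma = [[sum(low[a][t] * low[b][t] for t in range(n)) for b in range(n)]
                 for a in range(n)]
        draws.append(sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j]))
    return draws


theta_hat = [2.0, 1.5, 1.0]                   # hypothetical vech(L-hat) = (l_11, l_21, l_22)
cov_theta = [[0.004, 0.001, 0.000],           # hypothetical stand-in for H(theta-hat)^-1
             [0.001, 0.009, 0.002],
             [0.000, 0.002, 0.005]]
draws = sample_r(theta_hat, cov_theta, n=2, i=0, j=1, reps=20000)
var_r = statistics.variance(draws)            # empirical variance across replicates
cuts = statistics.quantiles(draws, n=40)      # first/last cut points = 2.5% / 97.5%
print(f"Var(r_12) ~= {var_r:.6f}, 95% interval = ({cuts[0]:.3f}, {cuts[-1]:.3f})")
```

Because each $\tilde{\boldsymbol\Sigma} = \tilde{\mathbf{L}}\tilde{\mathbf{L}}'$ is positive semi-definite by construction, every sampled correlation lies in $[-1, 1]$, which is one way sampling in the REML parameterisation respects the constraints on the parameters.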

Distribution of $\hat r_{G12}$ (case b)
[Figure: distribution of $\hat r_{G12}$ over replicates, panels Empirical and Approximate; x-axis: Correlation, 0.5 to 1.0]
                  REML    Empirical   Approximate
$\hat r_{G12}$    0.897   0.873       0.866
s.e.              0.059   0.066       0.063

Distribution of second eigenvalue
[Figure: distribution of the second eigenvalue over replicates, panels Empirical and Approximate; x-axis: Eigenvalue, 20 to 40]
                   REML    Empirical   Approximate
$\hat\lambda_2$    32.93   33.25       33.84
s.e.               –       3.27        3.30

Sampling standard errors | Results | Conclusions
Conclusions
Sampling from MVN distribution
– accommodates arbitrary functions
– yields good approximation of sampling variances
– easier than Delta method for complicated derivatives
– more appropriate confidence intervals at the boundary of the parameter space
– but:
−→ relies on large sample theory
−→ information matrix needs to be safely p.d.
−→ assumes $\hat{\boldsymbol\theta} \approx \boldsymbol\theta$
Simple but useful addition to our toolkit
– implemented in WOMBAT