Upcoming SlideShare
×

# Local Genomic Relationship Matrices

441 views
385 views

Published on

Multiple kernel learning, chromosomal breeding, plant breeding, hierarchical testing designs

0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total views
441
On SlideShare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
12
0
Likes
0
Embeds 0
No embeds

No notes for slide

### Local Genomic Relationship Matrices

1. 1. Locally Epistatic Genetic Relationship Matrices for Improved Genomic Association, Prediction and Selection Deniz Akdemir Cornell University Plant Breeding & Genetics da346@cornell.edu January 24, 2013Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 1 / 42
2. 2. Semi-Parametric Mixed Model Selection in animal or plant breeding is usually based on estimates of genetic breeding values (GEBV) obtained with semi-parametric mixed models (SPMM). A SPMM for the n × 1 response vector y is expressed as y = Xβ + Zg + e (1) where X is the n × p design matrix for the ﬁxed eﬀects, β is a p × 1 vector of ﬁxed eﬀects coeﬃcients, Z is the n × q design matrix for the random eﬀects; the random eﬀects (g′ , e′ )′ are assumed to follow a multivariate normal distribution with mean 0 and covariance 2 σg K 0 0 2 σe I n where K is a q × q kernel matrix.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 2 / 42
3. 3. Kernel Functions A kernel function, k(., .) maps a pair of input points x and x′ into real numbers. A kernel function is by deﬁnition symmetric (k(x, x′ ) = k(x′ , x)) and non-negative. Given the inputs for the n individuals we can compute a kernel matrix K whose entries are Kij = k(xi , xj ). Linear kernel function: k(x; y) = x′ y. Polynomial kernel function: k(x; y) = (x′ y + c)d for c and d ∈ R. Gaussian kernel function: k(x; y) = exp(−(x′ − y)′ (x′ − y)/h) where h > 0.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 3 / 42
4. 4. Kernels, Heritability, Breeding value Taylor expansions of these kernel functions reveal that each of these kernels correspond to a diﬀerent feature map. For the marker based SPMM’s, a genetic kernel matrix calculated using a linear kernel matrix incorporates only additive eﬀects of markers. A genetic kernel matrix based on the polynomial kernel of order k incorporates all of the one to k order monomials of markers in an additive fashion. The Gaussian kernel function allows us to implicitly incorporate the additive and complex epistatic eﬀects of the markers.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 4 / 42
5. 5. Kernels, Heritability, Breeding value Simulation studies and results from empirical experiments show that the prediction accuracies of models with Gaussian or polynomial kernel are usually better than the models with linear kernel. However, it is not possible to know how much of the increase in accuracy can be transfered into better generations because some of the predicted epistatic eﬀects that will be lost by recombination.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 5 / 42
6. 6. Kernels, Heritability, Breeding value Commercial value of a line: overall genetic eﬀect (additive+epistatic) Breeding value of a line: potential for being a good parent (additive) It can be argued that linear kernel model estimates the breeding value where as the Gaussian kernel estimates the commercial value. The breeder can take advantage of some of the epistatic marker eﬀects in regions of low recombination. The models introduced here aim to estimate local epistatic line heritability by using the genetic map information and combine the local main and epistatic eﬀects.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 6 / 42
7. 7. Kernels, Heritability, Breeding value Heritability is deﬁned as the percentage of total variation that can be explained by the genotypic component. One can argue that SPMM’s with linear kernels produce estimates of narrow sense line heritability, and the SPMM’s with Gaussian Kernel produces estimates of broad sense line heritability. We expect that the estimates of local heritability developed in this paper to be between narrow and broad sense heritability.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 7 / 42
8. 8. Multiple Kernel Learning In recent years, several methods have been proposed to combine multiple kernel matrices instead of using a single one. These kernel matrices may correspond to using diﬀerent notions of similarity or may be using information coming from multiple sources. For example, genomic kernel + pedigree kernel, chromosome model, linear mixed models with linear covariance structure. A good review and taxanomy of multiple kernel learning algorithms in the machine learning literature can be found in Gonen and Alpaydin (2011)Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 8 / 42
9. 9. Multiple Kernel Learning Multiple kernel learning methods use multiple kernels by combining them into a single one via a combination function. The most commonly used combination function is linear. Given kernels K1 , K2 , . . . , Kp , a linear kernel is of the form K = η1 K1 + η2 K2 + . . . + ηp Kp . The kernel weights η1 , η2 , . . . , ηp are usually assumed to be positive and this corresponds to calculating a kernel in the combined feature spaces of the individual kernels. We will also assume that the weights sum to one.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 9 / 42
10. 10. Interaction Terms The components of K are usually input variables from diﬀerent sources or diﬀerent kernels calculated from same input variables. The kernel K can also include interaction components like Ki ⊙ Kj , Ki ⊗ Kj , or perhaps −(Ki − Kj ) ⊙ (Kj − Ki ). For example, if KE is the environment kernel matrix and KG is the genetic kernel matrix, then a component KE ⊙ KG can be used to capture the gene by environment interaction eﬀects.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 10 / 42
11. 11. Mixed Models with Heteregenous Covariance Structures Models in Burgue˜o et al (2007) n (Separating main eﬀects of genotype (x genotype) from genotype genotype(x genotype) x environment interaction) Data from g genotypes, s sites, and r replicates y = X + Zg (a+) + Zge (+) + Zr + a ∼ N(0, Ga ); ∼ N(0, Gi ); ∼ N(0, Gae ); ∼ N(0, Gie ); ∼ N(0, R); ∼ N(0, N). R = Σr ⊗ I , N = Σn ⊗ I ; Σr and Σe diagonal. Ga = Σg ⊗ A. Gi = Σi ⊗ A ⊙ A. Gae = Σge ⊗ A. Gie = Σie ⊗ A ⊙ A.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 11 / 42
12. 12. Mixed Models with Heteregenous Covariance Structures Models in Burgue˜o et al (2007) n Why A ⊙ A? ”If one assumes no dominance, all terms will vanish except the terms for additive and additive x additive variances, which will take the form Ci ′ i = 2fi ′ i σa + (2fi ′ i )2 σaa where fi ′ i is the COP between individuals i ′ and 2 2 i, σa 2 is the additive genetic variance, and σ aa is the additive x additive genetic variance. Assuming linkage and identity equilibrium, it seems justiﬁed to use (2fi ′ i )2 , which in matrix notation can be represented by ˜ (A ⊙ A) = A as the coeﬃcient of the additive x additive component (where ⊙ is the element-wise multiplication operator) (Falconer and Mackay, 1996).”Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 12 / 42
13. 13. Learning Kernel Weights Qiu and Lane (2009) propose simple heuristics to select kernel weights in regression problems: 2 rm ηm = p 2 h=1 rh and p h=1 Mh − Mm ηm = p (1 − p) h=1 Mh where rm is the Pearson correlation coeﬃcient between true response and the predicted response and Mm is the mean square error generated by the regression using the kernel matrix Km alone. Another approach in Qiu and Lane (2009) uses the kernel alignment.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 13 / 42
14. 14. Multiple Kernels Using Mixed Models 1 In the context of the SPMM’s we propose using weights that are proportional to the estimated variances attributed to the kernels. One possible approach is to use a SPMM with multiple kernels in the form of y = X β + Z1 g 1 + Z2 g 2 + . . . + Zk g k + e (2) 2 ˆ2 where gj ∼ Nqk (0, σgj Kj ) for j = 1, 2, . . . , k. Let σgj for ˆ2 j = 1, 2, . . . , k and σe be the estimated variance components. Under this model calculate local heritabilities as hm = σgm /( k σgℓ + σe ) for m = 1, 2, . . . , k. 2 ˆ2 ℓ=1 ˆ 2 ˆ2Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 14 / 42
15. 15. Multiple Kernels Using Mixed Models 2 Another model incorporates the marginal variance contribution for each kernel matrix. For this we use the following SPMM: y = X β + Zj gj + Z−j g−j + e (3) 2 where gj ∼ Nqk (0, σgj Kj ) for j = 1, 2, . . . , k. Z−j and g−j correspond to the design matrix and the random eﬀects for the input components other than the ones in group j. In this case calculate local heritabilities as 2 ˆ 2 σ2 ˆ2 ˆ2 hm = σgm /(ˆgm + σg−m + σem ) for m = 1, 2, . . . , k.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 15 / 42
16. 16. Multiple Kernels Using Mixed Models 3 A simpler approach is to use a separate SPMM for each kernel. ˆ2 ˆ2 Let σgm and σem be the estimated variance components from the SPMM model in (1) with kernel K = Km . 2 ˆ 2 σ2 ˆ2 Let hm = σgm /(ˆgm + σem ). Note that, in this case, the markers corresponding to the random eﬀect g−j which mainly accounts for the sample structure can now be incorporated by a ﬁxed eﬀects via their ﬁrst principal components.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 16 / 42
17. 17. Estimation of Parameters After local heritabilities are obtained, calculate the kernel weights η1 , η2 , . . . , ηp as h2 ηm = p m 2 . (4) h=1 hh The estimates of parameters for models in (1), (2) and (3) can be by maximizing the likelihood or the restricted (or residual, or reduced) maximum likelihood (REML). There are very fast algorithms devised for estimating the parameters of the single kernel model in (1).Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 17 / 42
18. 18. Local kernels Regions or groups in the genome and a separate kernel matrix for each group and region. The regions can be overlapping or discrete. If the some markers are associated with each other in terms of linkage or function it might be useful to combine them together. The whole genome can be divided physically into chromosomes, chromosome arms. Further divisions could be based on recombination hot-spots, or just merely based on local proximity. We could calculate a seperate kernel for introns and exons, non coding, promoter or repressor sequences. We can also use a grouping of markers based on their eﬀects on low level traits like lipids, metabolites, gene expressions, or based on their allele frequencies. When no such guide is present one can use a hierarchical clustering of the variables.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 18 / 42
19. 19. Kernels: Kernel Scanning This approach is similar to the one in Kizilkaya where the chromosomes are scanned with windows of 5 consecutive markers. M: q × p matrix of p markers on q lines, which is partitioned with respect to the chromosomes as (M1 , M2 , . . . , Mc ) where Mk has pk columns. Cumulative distances based on LD between markers in each chromosome: pk for k = 1, 2, . . . , c. Based on pk calculate block diagonal p × p kernel matrix S. k column of S is sk . A local kernel matrix Kk at position k involves using diag (sk )1/2 M in kernel matrix calculations. Kernel scanning approach involves calculation of a kernel matrix for selected marker across the genome at each marker location. By adjusting the kernel width parameter, we are able to determine the smoothness and locality of these kernel matrices.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 19 / 42
20. 20. I shrunk the relationship matrices too! When the number of markers in a region is less than the number of individuals in the training set the kernel matrix for this region becomes singular or ill conditioned. In these cases we can use shrinkage approaches to obtain well conditioned positive deﬁnite kernel. Shrinkage towards the identity matrix is not suitable to use with the SPMM since this involves allocating a ﬁxed proportion of the error variance to the variance of the random eﬀects. We instead propose and use shrinkage estimators which aim to introduce sparsity in the oﬀ diagonal elements of the kernel matrix. Many algorithms have been devised for learning sparse covariance matrices in the recent years. Some sparse covariance estimation techniques are implemented in an R package ”huge”.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 20 / 42
21. 21. Sparse relationship matrix In addition to possible decrease of computational burden by use of sparse matrix methods in mixed model parameter estimation, we can produce graphical representations of the kernel matrices. For a normally distributed random vector, the independence between two components is implied by zero covariance between the components, more interestingly, the conditional independence between two components is implied by the zero components in the inverse covariance matrix.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 21 / 42
22. 22. Sparse relationship matrix on graphDeniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 22 / 42
23. 23. Sparse relationship matrix on graph-GBS-Stem rust dataDeniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 23 / 42
24. 24. Sparse relationship matrix on graph cap123 FHB DON DataDeniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 24 / 42
25. 25. Hypothesis Testing 1 2 Under Model (1), twice the log-likelihood of y given the parameters β, σe 2 /σ 2 is, up to a constant, and λ = σg e −1 2 2 (y − X β)′ Vλ (y − X β) L(β, σe , λ) = −n log σe − log |Vλ | − 2 (5) σe where Vλ = In + λZKZ ′ and n is the size of the vector y. Twice the residual log-likelihood (Patterson, Harville) is, up to a constant, ˆ −1 ˆ (y − X βλ )′ Vλ (y − X βλ ) 2 2 RL(β, σe , λ) = −(n−p) log σe −log |Vλ |−log(X ′ Vλ X )− 2 σe (6)Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 25 / 42
26. 26. Hypothesis Testing 2 From both of these likelihoods, a test statistic for the signiﬁcance of the variance component 2 H0 : λ = 0 (σg = 0) 2 HA : λ > 0 (σg > 0) can be obtained by calculating the likelihood ratio statistic (Cox and Hinkley 1974).Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 26 / 42
27. 27. Hypothesis Testing 3 Under standard regularity conditions the null distribution of the likelihood ratio test statistic has a χ2 distribution with degrees of freedom given by the diﬀerence in the number of parameters between the null and alternative hypothesis. However, Self and Liang (1987) showed that the asymptotic distribution is a weighted mixture of χ2 distributions. For the SPMM in (1) they have recommended using a equally weighted mixture of χ2 (0) and χ2 (1) distribution where χ2 (0) distribution refers to a distribution degenerate at 0. In simulation studies Pinheiro and Bates (2000) Morrell (1998) found that equal contributions work well with residual log-likelihood where as a 0.65 to 0.35 mixture works better for the log-likelihood. A ﬁnite sample null distribution was recommended for the SPMM in (1) in Ruppert (2003) and it was shown that the mixture proportions depended on the kernel matrix.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 27 / 42
28. 28. Testing Scheme: Hierarchical testing scheme Relevant regions are divided further subregions and the procedure is repeated to a desired detail level. Nevertheless, suitable hierarchical testing procedures have been developed. Blanchard and Geman (2005) proposed and analyzed several hierarchical designs in terms of their cost / power properties. Multiple testing procedures where coarse to ﬁne hypotheses are tested sequentially have been proposed to control the family wise error rate or false discovery rate (Benjamini and Yekutieli (2003), Meinshausen (2007)). These procedures can be used along the ”keep rejecting until ﬁrst acceptance” scheme to test hypotheses in an hierarchy.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 28 / 42
29. 29. Meinshausen’s hierarchical testing procedure Meinshausen’s hierarchical testing procedure controls the family wise error by adjusting the signiﬁcance levels of single tests in the hierarchy. The procedure starts testing the root node H0 at level α. When a parent hypothesis is rejected one continues with testing all the child nodes of that parent. The signiﬁcance level to be used at each node H is adjusted by a factor proportional to the number of variables in that node: |H| αH = α |H0 | where |.| denotes the cardinality of a set.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 29 / 42
30. 30. Meinshausen’s hierarchical testing procedureDeniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 30 / 42
31. 31. Simulated Example 1.0 0.80 outhieri\$hypothesistests[(Nparts + 1 + 1):length(outhieri\$hypothesistests)] 0.8 0.75 0.70 0.6 0.65 0.4 0.60 0.2 0.55 0.0 0.50 0 50 100 150 200 1 2 3 IndexDeniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 31 / 42
32. 32. 0.8 0.7 0.6 0.5 0.4 4 9 16 25 36 49 64 81 100 121 144 169 196 225 linearDeniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 32 / 42
33. 33. 0.10 0.08 0.06 0.04 0.02 0.00 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 Figure: Local kernel weights for non overlapping sections24, 2013Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 33 / 42
34. 34. 0.20 0.15 0.10 Hkv 0.05 0.00 0 500 1000 1500 2000 2500 3000 1:length(Hkv)Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 34 / 42
35. 35. Balancing short and long term gains While we may have conﬁdence that GS can accelerate short-term gain, no such conﬁdence is justiﬁed for long-term gain. Weighted GS model of Jean Luc was used so that markers for which the favorable allele had a low frequency should be weighted more heavily to avoid losing such alleles. We recommend using the similar approach which aims to conserve rare but favorable alleles for balancing short-term and long-term gains from GS.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 35 / 42
36. 36. Balancing short and long term gains Let gji for regions j = 1, 2, . . . , k and individuals i = 1, 2, . . . , N be ˆ the EBLUPs of random eﬀect components that correspond to the k local kernels and let pji denote the estimated density value of the ˆ alleles in region j for individual i. A selection criterion in the spirit of Goddart and Jean Luc for the ith individual is then k ˆ gji ci = ˆ . (7) j=1 1 + pjiˆ However, this criterion also down weights favorable but common alleles.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 36 / 42
37. 37. Balancing short and long term gains A weighting scheme that up weights both rare and common favorable alleles. Let gj(n) be the maximum EBLUP value among the individuals for region j ˆ and let ηj be the weight of the kernel j. The selection criterion k ηj ˆ 1 (ˆji − gj(n) )2 g ˆ ˆ2 pji ci = ˆ exp {− 2 + 2 } (8) 2πh1 sd(ˆj )h2 sd(ˆj ) g p 2 h1 var (ˆj ) g h2 var (ˆj ) p j=1 where the constants h1 and h2 are selected by the breeder based on preferences given to short term and long term gains correspondingly.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 37 / 42
38. 38. Balancing short and long term gains Kernel scanning 10 −20 3 380 709 33 31 39 8 92 70 73 49 4 790 94 46 93 419 5 347 69 19 84 1 423 54 Kernel scanning with density weighting 10 −10 0 3 550 214 889 541 129 292 143 964 95 65 270 66 118 614 86 916 60 67 1 311 226 Kernel scanning (best 50) 25 0 10 304 769 226 954 792 727 777 663 146 285 127 200 535 828 54 945 800 Kernel scanning with density weighting (best 50) 8 4 0 186 411 226 117 285 957 849 663 194 828 610 673 773 671 764 341 800Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 38 / 42
39. 39. Final remarks Knowledge transfer between experimentsDeniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 39 / 42
40. 40. Robust to Missing Data The local kernels use information collected over a region in the genome and thus will not be eﬀected by a few missing or outlier data points so this approach is also robust to missing data and outliers.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 40 / 42
41. 41. Chromosomal Breeding/ gene transplant Breed good chromosomes, combine them.Deniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 41 / 42
42. 42. Chromosomal Breeding/ gene transplantDeniz Akdemir (Plant Breeding & Genetics) Locally Epistatic Genetic Relationship Matrices January 24, 2013 42 / 42