
Genomic selection in Livestock

Presented by Raphael Mrode, ILRI, at the workshop on Essential Knowledge for Effective Improvement and Dissemination of Genetics in Sheep and Goats, Addis Ababa, Ethiopia, 3–5 November 2020


  1. Genomic selection in Livestock. Raphael Mrode, ILRI. Essential Knowledge for Effective Improvement and Dissemination of Genetics in Sheep and Goats, 3–5 November 2020, Addis Ababa, Ethiopia
  2. The basic goal: genetic progress and increased productivity
     • Identify the animals with the best genetic merit and use them as parents of the next generation; this produces genetic improvement.
     • [Figure: distribution of phenotypes in the parental generation, the animals selected to be parents, and the distribution of offspring phenotypes, shifted by the genetic improvement.]
  3. The basic goal: genetic progress and increased productivity
     • To achieve this goal we need accurate estimated breeding values (EBVs).
     • However, what is available are phenotypes (Y), which are influenced by both genetic and environmental effects: Y = Genetic (G) + Environment (E).
     • Thus, assuming G and E are uncorrelated, Var(Y) = Var(G) + Var(E).
     • Sources of Var(E) include environmental and management factors.
     • Var(G) arises from different forms of inheritance, which lead to different components of genetic variance (e.g. additive genetic variance, additive maternal genetic variance and so on).
     • Accurately estimating G (the EBVs) is our challenge.
  4. Reality of field data
     • We often deal with field data: a variety of environmental factors, animals with different degrees of relatedness, many generations, and unbalanced data.
     • We need a framework that models phenotypic observations while accounting for non-genetic systematic sources of variation (Var(E)), so that EBVs are estimated accurately.
     • The linear mixed model provides such a framework.
  5. Example data and pedigree files
     • [Figure: example data file and pedigree file]
  6. Linear mixed model
     • In matrix notation, a linear mixed model may be represented as y = Xb + Za + e, where:
     • y = n × 1 vector of observations; n = number of records
     • b = p × 1 vector of fixed effects; p = number of levels of the fixed effects (here, lactation number)
     • a = q × 1 vector of random animal effects; q = number of levels of the random effects
     • e = n × 1 vector of random residual effects
     • X = design matrix of order n × p that relates records to fixed effects
     • Z = design matrix of order n × q that relates records to random animal effects
     • X and Z are both termed design or incidence matrices.
  7. Assumptions of the linear mixed model
     • Residual effects are assumed to be independently distributed with variance σ²e, so var(e) = Iσ²e = R.
     • var(a) = G = Iσ²a, or Aσ²a when relationships are included, where A is the numerator relationship matrix.
     • Thus
     $$\mathrm{var}\begin{bmatrix} \mathbf{a} \\ \mathbf{e} \end{bmatrix} = \begin{bmatrix} \mathbf{G} & \mathbf{0} \\ \mathbf{0} & \mathbf{R} \end{bmatrix}$$
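These assumptions fix the variance of the observations. The short derivation below assumes, as is standard for this model, that b is fixed and that a and e are uncorrelated:

$$\mathbf{V} = \mathrm{var}(\mathbf{y}) = \mathrm{var}(\mathbf{Z}\mathbf{a} + \mathbf{e}) = \mathbf{Z}\,\mathrm{var}(\mathbf{a})\,\mathbf{Z}' + \mathrm{var}(\mathbf{e}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}$$

With G = Aσ²a and R = Iσ²e this becomes V = ZAZ'σ²a + Iσ²e.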
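  8. MME for breeding values
     • The mixed model equations (MME), with the relationship matrix incorporated, are
     $$\begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \mathbf{A}^{-1}\alpha \end{bmatrix} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{a}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \end{bmatrix}$$
     • with α = σ²e/σ²a = (1 − h²)/h².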
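The sketch below sets up and solves these equations in pure Python for a model with a single fixed effect (an overall mean). The pedigree, records and heritability are hypothetical illustration values. The helpers `tabular_A`, `animal_model_mme`, `solve` and `inverse` are also hypothetical and belong to no package. A is built with the tabular method and then inverted directly, which is only reasonable for tiny examples.

```python
"""Minimal sketch of the animal-model MME with A^-1 (hypothetical data)."""


def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def inverse(M):
    """Invert a small matrix column by column."""
    n = len(M)
    cols = [solve(M, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def tabular_A(pedigree):
    """Numerator relationship matrix by the tabular method.

    pedigree[i] = (sire, dam) of animal i + 1 (1-based IDs, 0 = unknown);
    parents must be listed before their offspring.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        for j in range(i):
            v = 0.0
            if s:
                v += 0.5 * A[j][s - 1]
            if d:
                v += 0.5 * A[j][d - 1]
            A[i][j] = A[j][i] = v
        A[i][i] = 1.0 + (0.5 * A[s - 1][d - 1] if s and d else 0.0)
    return A


def animal_model_mme(y, rec_animal, A, alpha):
    """Build and solve the MME for y = 1*mu + Z a + e."""
    q = len(A)
    # Row k of [X Z]: 1 for the mean, then 1 in the column of the recorded animal
    W = [[1.0] + [1.0 if rec_animal[k] == j else 0.0 for j in range(q)]
         for k in range(len(y))]
    size = 1 + q
    C = [[sum(w[i] * w[j] for w in W) for j in range(size)] for i in range(size)]
    A_inv = inverse(A)
    for i in range(q):
        for j in range(q):
            C[1 + i][1 + j] += alpha * A_inv[i][j]   # Z'Z + A^-1 * alpha
    rhs = [sum(w[i] * yk for w, yk in zip(W, y)) for i in range(size)]
    return C, rhs, solve(C, rhs)


# Hypothetical pedigree: 1 and 2 are founders, 3 = 1 x 2, 4 = 1 x unknown, 5 = 3 x 4
pedigree = [(0, 0), (0, 0), (1, 2), (1, 0), (3, 4)]
A = tabular_A(pedigree)
y = [4.5, 2.9, 3.9]              # hypothetical records on animals 3, 4 and 5
h2 = 0.25
alpha = (1 - h2) / h2            # sigma2_e / sigma2_a
C, rhs, sol = animal_model_mme(y, [2, 3, 4], A, alpha)
print("mean:", round(sol[0], 3))
print("EBVs:", [round(v, 3) for v in sol[1:]])
```

Animals 1 and 2 have no records, yet they still receive EBVs through A⁻¹. This is the main reason the relationship matrix is included.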
  9. Limitations of the A matrix
     • The pedigree-based relationship matrix A gives average relationships and assumes an infinite number of loci.
     • Realized relationships differ because the genome is finite in size.
     • A is therefore the expectation of the realized relationships.
     • For example, two half-sibs have an expected relationship of 0.25 but may actually show a correlation of 0.3 or 0.2.
  10. Use of microsatellites as markers and their limitations
     • Microsatellites were initially used as genetic markers in the 1980s and 1990s.
     • Microsatellites are short DNA sequences repeated at a particular locus on a chromosome; the number of repeats varies between individuals, so they can be used as markers.
     • The most significant marker can be 10 cM or more from the QTL, so QTL are not mapped precisely.
     • The association between marker and QTL may not persist across the population.
     • The phase between marker and QTL may have to be estimated for each family.
  11. Single Nucleotide Polymorphism (SNP)
     • A SNP is a DNA sequence variation that occurs when a single nucleotide (A, T, C or G) in the genome differs between paired chromosomes in an individual.
     • For example, two sequenced DNA fragments from an individual, AAGCCTA and AAGCTTA, differ in a single nucleotide.
     • In this case we say that there are two alleles: C and T. Almost all common SNPs have only two alleles.
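The fragment comparison in the example can be illustrated with a small, hypothetical helper (`find_snps`). It assumes the two fragments are already aligned and of equal length and simply reports the sites that differ. Real SNP discovery and calling from sequence data is far more involved.

```python
def find_snps(seq1, seq2):
    """Return (position, allele_in_seq1, allele_in_seq2) for every site where two
    aligned, equal-length DNA fragments differ (0-based positions)."""
    if len(seq1) != len(seq2):
        raise ValueError("fragments must be aligned and of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


print(find_snps("AAGCCTA", "AAGCTTA"))  # [(4, 'C', 'T')]
```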
  12. SNP
  13. Whole genome sequence and genotyping chips
     • Whole genome sequencing began with human (2001) and mouse, followed by chicken (2004), dog (2005), bovine (2006), horse (2007), pig (2009), ...
     • New technologies for genotyping and sequencing allow simultaneous genotyping of many SNPs, from a few dozen up to several million.
     • There are two main technology providers: Illumina and Affymetrix.
     • Illumina products in cattle: 3000 (7000, 1000, 20000) = "LD"; 54 000 = "50k"; 777 000 = "HD".
  14. Genomic selection (GS)
     • GS is the use of genomic breeding values (GEBVs) for the selection of animals.
     • Genomic selection requires that markers (SNPs) be in linkage disequilibrium (LD) with the QTL across the whole population.
     • Using SNPs as markers therefore allows all QTL in the genome to be traced, by tracing the chromosome segments defined by adjacent SNPs.
  15. Steps in genomic selection (GS)
     • Genotype animals with phenotypes (for sex-limited traits, sires with daughter records). These genotyped and phenotyped animals form the reference population.
     • Estimate the SNP solutions (the SNP key) in the reference population.
     • Validate the SNP key on another data set of genotyped and phenotyped animals whose phenotypes are excluded from the estimation, to determine its accuracy. These are the validation candidates.
     • Genotype animals at birth or a young age (no phenotypes yet), use the SNP key to predict their GEBVs, and select among them. These genotyped animals without phenotypes are the selection candidates.
  16. Main advantages of genomics
     • Young bulls can be genotyped early in life and their breeding values computed.
     • Genomic breeding values can be used to choose which young bulls to progeny test, thereby reducing cost.
     • For young bulls, accuracy is about 20–40% higher than that of the parent average.
     • Reduction in the generation interval.
  17. Genomic selection: efficiency
     • Two main factors:
       - accuracy of SNP effect estimation, which depends on the size of the reference population, the heritability of the trait and the statistical methodology used;
       - linkage disequilibrium (LD) between markers and QTL, which depends on marker density and on the effective size of the population (which determines the number of "independent" segments).
     • Relationship between candidates and the reference population.
  18. Size of the reference population
     • The size of the reference population greatly influences the accuracy of genomic evaluations (Goddard, 2008).
  19. Validation test
     • [Figure: estimated BV plotted against DYD for the training set, and GEBV plotted against DYD (proxy of true BV) for the training and validation sets.]
     • Two important parameters:
       - R² or r(DYD, GEBV), which should be "large enough";
       - the slope of the regression, which should be close to 1. Overestimation of the GEBVs is referred to as "inflation".
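The two validation parameters can be computed as in the sketch below. It assumes the common convention of regressing DYD on GEBV; under that convention a slope below 1 signals inflated GEBVs. The function name and the numbers in the usage line are hypothetical.

```python
import math


def validation_stats(dyd, gebv):
    """Return r(DYD, GEBV) and the slope of the regression of DYD on GEBV."""
    n = len(dyd)
    mean_g, mean_d = sum(gebv) / n, sum(dyd) / n
    sxy = sum((g - mean_g) * (d - mean_d) for g, d in zip(gebv, dyd))
    sxx = sum((g - mean_g) ** 2 for g in gebv)
    syy = sum((d - mean_d) ** 2 for d in dyd)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope


# Hypothetical validation bulls whose GEBVs are twice as spread out as their DYDs
print(validation_stats([0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 3.0, 4.0]))  # (1.0, 0.5)
```

A perfect correlation combined with a slope of 0.5 shows why both parameters are needed. The ranking is correct, but the GEBVs overstate differences between bulls.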
  20. Increasing the size of the reference populations
     • Genotype as many progeny-tested sires as possible.
     • International collaborations. For Holstein there are two big consortia:
       - USA + Canada + UK + Italy: more than ~35 000 bulls;
       - EuroGenomics (France, the Netherlands, (Germany), Nordic countries, Spain, Poland): ~34 000 bulls (?).
     • For small breeds or other species (goats, sheep, beef cattle) there are not enough sires, so combine them with many genotyped cows.
     • About 4–5 cow records provide information equivalent to one proven sire (Goddard, 2009; Daetwyler et al., 2013).
  21. General linear model
     • The general linear model underlying genomic evaluation is of the form
     $$\mathbf{y} = \mathbf{X}\mathbf{b} + \sum_{i=1}^{m} \mathbf{M}_i g_i + \mathbf{e}$$
       where m is the number of SNPs, y is the data vector, b is the vector of the mean or fixed effects, g_i is the genetic effect of the ith SNP genotype and e is the error.
     • The matrix M has dimension n (number of animals) × m, and its column M_i relates the ith SNP to the data.
     • All the additive genetic variance is assumed to be explained by the marker effects, so the estimate of an animal's total genetic merit or breeding value (a) is
     $$\mathbf{a} = \sum_{i=1}^{m} \mathbf{M}_i g_i$$
  22. Data types used for genomic evaluation
     • y = YD (yield deviation): the individual record corrected for all fixed and non-genetic random effects.
     • y = DYD (daughter yield deviation): for a bull, twice the average of the YDs of all his daughters, corrected for half the genetic merit of their dams, with an associated weight, the EDC (equivalent daughter contribution).
     • y = de-regressed proofs: obtained by solving the MME to get the right-hand side.
     • EBVs: NO, not to be used as y.
  23. Coding and scaling genotypes
     • The genotypes of animals (elements of M) are commonly coded as 2 and 0 for the two homozygotes (AA and BB) and 1 for the heterozygote (AB).
     • Alternatively, if alleles are expressed as nucleotides and the reference allele at a locus is G and the alternative allele is C, then the codes are 0 = GG, 1 = GC and 2 = CC.
     • The diagonal elements of MM' then indicate each individual's relationship with itself (inbreeding), and the off-diagonal elements indicate the number of alleles shared by relatives.
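In the nucleotide coding, each genotype is coded as the count of alternative alleles it carries, as sketched below. `code_genotype` is a hypothetical helper, and genotypes are assumed to arrive as two-letter strings.

```python
def code_genotype(genotype, alt_allele):
    """Code a diploid genotype such as 'GC' as the number of copies of alt_allele (0, 1 or 2)."""
    if len(genotype) != 2:
        raise ValueError("expected two alleles, e.g. 'GC'")
    return genotype.count(alt_allele)


# Reference allele G, alternative allele C, as in the example above
codes = [code_genotype(g, "C") for g in ("GG", "GC", "CG", "CC")]
print(codes)  # [0, 1, 1, 2]
```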
  24. Scaling of genotypes
     • A SNP has 2 alleles (A/B), but only one effect is defined: the allele substitution effect m_i.
     • The elements of M are commonly scaled:
       - to set the mean of the allele effects to zero;
       - to account for differences in allele frequencies between SNPs.
     • Let p_j be the frequency of the second (alternative) allele at locus j. The elements of M can be scaled by subtracting 2p_j: if every element of column j of a matrix P equals 2p_j, the matrix Z of scaled elements is Z = M − P.
     • Furthermore, the elements of Z can be normalised by dividing the column for marker j by its standard deviation, assumed to be √(2p_j(1 − p_j)).
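Both scalings can be sketched as follows. Allele frequencies are estimated from M itself (the count of the coded allele divided by 2n). A monomorphic SNP has zero standard deviation and is simply set to 0 when standardising, a choice made for this sketch only. `scale_genotypes` and the small M in the usage line are hypothetical.

```python
import math


def scale_genotypes(M, standardize=False):
    """Return (p, Z) with Z = M - P, where column j of P is 2*p_j.

    If standardize is True, column j of Z is also divided by sqrt(2*p_j*(1 - p_j));
    monomorphic columns (p_j = 0 or 1) are set to 0 to avoid dividing by zero.
    """
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
    Z = []
    for row in M:
        z = [row[j] - 2 * p[j] for j in range(m)]
        if standardize:
            z = [z[j] / math.sqrt(2 * p[j] * (1 - p[j])) if 0 < p[j] < 1 else 0.0
                 for j in range(m)]
        Z.append(z)
    return p, Z


# Hypothetical genotypes of 2 animals for 4 SNPs (the last SNP is monomorphic)
p, Z = scale_genotypes([[0, 1, 2, 0], [2, 1, 0, 0]])
print(p, Z)
```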
  25. Mixed linear model for computing SNP effects
     • The most common random model assumes that:
       - the SNP effects are normally distributed;
       - all SNP effects come from a common normal distribution (e.g. the same genetic variance for all SNPs).
     • There are two equivalent models under these assumptions:
       (1) SNP-BLUP: a model that fits all individual SNP effects simultaneously. DGVs for selection candidates are calculated as DGV = Zĝ, where ĝ are the estimates of the random SNP effects. The model assumes σ²g is known. In practice this may not be the case, and σ²g may be approximated from σ²a.
       (2) GBLUP: a model that estimates DGVs directly, with (co)variance among breeding values Gσ²a, where G is the genomic relationship matrix, i.e. the realised proportion of the genome that animals share, estimated from the SNPs.
  26. SNP-BLUP model
     • In matrix form, the model is y = Xb + Zg + e.
     • y = vector of observations: these can be de-regressed EBVs or phenotypes corrected for all fixed effects.
     • g = vector of additive genetic effects corresponding to the allele substitution effect of each SNP, and Z = scaled matrix of genotypes.
     • With α = σ²e/σ²g, the MME are
     $$\begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \mathbf{I}\alpha \end{bmatrix} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{g}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \end{bmatrix}$$
  27. SNP-BLUP
     • If y in the MME consists of de-regressed breeding values of bulls, then:
       - each observation may be associated with a different reliability;
       - a weighted analysis may therefore be required to account for these differences in bull reliabilities;
       - weight (wt_i) = effective daughter contribution, or wt_i = (1/rel_dtr) − 1, where rel_dtr is the bull's reliability from daughters with parent information excluded.
  28. SNP-BLUP
     • The MME then are
     $$\begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{I}\alpha \end{bmatrix} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{g}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \end{bmatrix}$$
       where R = D and D is a diagonal matrix with diagonal element i equal to wt_i.
     • In practice the value of σ²g may not be known. It can be obtained either as σ²g = σ²a/m, with m the number of markers, or as σ²g = σ²a/(2Σp_j(1 − p_j)), in which case α = 2Σp_j(1 − p_j) × (σ²e/σ²a).
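The reliability-based weighting can be sketched as follows, reading wt_i = (1/rel_dtr) − 1 as the diagonal of D = R. The MME then use the reciprocal, rel/(1 − rel), so bulls with more reliable proofs receive more weight. `reliability_weights` and the reliabilities used are hypothetical.

```python
def reliability_weights(rel_dtr):
    """Return (d, r_inv): d_i = 1/rel_i - 1 is diagonal i of D (= R), and
    r_inv_i = 1/d_i = rel_i / (1 - rel_i) is the matching diagonal of R^-1."""
    d, r_inv = [], []
    for rel in rel_dtr:
        if not 0.0 < rel < 1.0:
            raise ValueError("reliability must lie strictly between 0 and 1")
        d.append(1.0 / rel - 1.0)
        r_inv.append(1.0 / d[-1])
    return d, r_inv


d, r_inv = reliability_weights([0.5, 0.8, 0.9])   # hypothetical bull reliabilities
print([round(v, 3) for v in d])       # [1.0, 0.25, 0.111]
print([round(v, 3) for v in r_inv])   # [1.0, 4.0, 9.0]
```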
  29. Example 1

| Animal | Sire | Dam | Mean | Fat EDC | Fat DYD (kg) | SNP genotypes (SNPs 1–10) |
|---|---|---|---|---|---|---|
| 13 | 0 | 0 | 1 | 558 | 9.0 | 2 0 1 1 0 0 0 2 1 2 |
| 14 | 0 | 0 | 1 | 722 | 13.4 | 1 0 0 0 0 2 0 2 1 0 |
| 15 | 13 | 4 | 1 | 300 | 12.7 | 1 1 2 1 1 0 0 2 1 2 |
| 16 | 15 | 2 | 1 | 73 | 15.4 | 0 0 2 1 0 1 0 2 2 1 |
| 17 | 15 | 5 | 1 | 52 | 5.9 | 0 1 1 2 0 0 0 2 1 2 |
| 18 | 14 | 6 | 1 | 87 | 7.7 | 1 1 0 1 0 2 0 2 2 1 |
| 19 | 14 | 9 | 1 | 64 | 10.2 | 0 0 1 1 0 2 0 2 2 0 |
| 20 | 14 | 9 | 1 | 103 | 4.8 | 0 1 1 0 0 1 0 2 2 0 |
| 21 | 1 | 3 | 1 | 13 | 7.6 | 2 0 0 0 0 1 2 2 1 2 |
| 22 | 14 | 8 | 1 | 125 | 8.8 | 0 0 0 1 1 2 0 2 0 0 |
| 23 | 14 | 11 | 1 | 93 | 9.8 | 0 1 1 0 0 1 0 2 2 1 |
| 24 | 14 | 10 | 1 | 66 | 9.2 | 1 0 0 0 1 1 0 2 0 0 |
| 25 | 14 | 7 | 1 | 75 | 11.5 | 0 0 0 1 1 2 0 2 1 0 |
| 26 | 14 | 12 | 1 | 33 | 13.3 | 1 0 1 1 0 2 0 1 0 0 |
  30. Example 1
     • The observations are the daughter yield deviations (DYD) for fat yield; the effective daughter contribution (EDC) of each bull is also given.
     • The EDC could be used as weights in the analysis but is ignored in this presentation.
     • The genetic variance for fat yield is assumed to be 35.241 kg² and the residual variance 245 kg².
     • Animals 13 to 20 are used as the reference population and animals 21 to 26 as validation candidates.
     • SNP effects are predicted using all 10 SNPs.
     • The incidence matrix X is a column vector of q ones, with q = 8, the number of animals in the reference population.
  31. Computing the matrices we need
     • X' = [1 1 1 1 1 1 1 1], one element per reference animal.
     • Computing Z requires the allele frequency of each SNP.
  32. Computing matrices
     • The allele frequency of the ith SNP was computed as
     $$p_i = \frac{\sum_{j=1}^{n} m_{ji}}{2n}$$
       with n = 14, the number of genotyped animals, where m_ji is the element of M for animal j and SNP i.
     • The allele frequencies of SNPs 1 to 10 are 0.321, 0.179, 0.357, 0.357, 0.143, 0.607, 0.071, 0.964, 0.571 and 0.393, respectively.
     • Using these frequencies, 2Σp_j(1 − p_j) = 3.5383. Thus α = 3.5383 × (245/35.241) = 24.598.
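The arithmetic above can be reproduced from the genotype table. The sketch below assumes, as the slide does, that frequencies are estimated from all 14 genotyped bulls and that σ²g = σ²a/(2Σp_j(1 − p_j)).

```python
# Genotypes (0/1/2) of bulls 13-26 for SNPs 1-10, from the example table
M = [
    [2, 0, 1, 1, 0, 0, 0, 2, 1, 2],  # 13
    [1, 0, 0, 0, 0, 2, 0, 2, 1, 0],  # 14
    [1, 1, 2, 1, 1, 0, 0, 2, 1, 2],  # 15
    [0, 0, 2, 1, 0, 1, 0, 2, 2, 1],  # 16
    [0, 1, 1, 2, 0, 0, 0, 2, 1, 2],  # 17
    [1, 1, 0, 1, 0, 2, 0, 2, 2, 1],  # 18
    [0, 0, 1, 1, 0, 2, 0, 2, 2, 0],  # 19
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 0],  # 20
    [2, 0, 0, 0, 0, 1, 2, 2, 1, 2],  # 21
    [0, 0, 0, 1, 1, 2, 0, 2, 0, 0],  # 22
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 1],  # 23
    [1, 0, 0, 0, 1, 1, 0, 2, 0, 0],  # 24
    [0, 0, 0, 1, 1, 2, 0, 2, 1, 0],  # 25
    [1, 0, 1, 1, 0, 2, 0, 1, 0, 0],  # 26
]
n = len(M)
p = [sum(row[i] for row in M) / (2 * n) for i in range(len(M[0]))]  # p_i = sum_j m_ji / 2n
two_sum_pq = 2 * sum(pi * (1 - pi) for pi in p)
var_a, var_e = 35.241, 245.0              # kg^2, from the example
alpha = two_sum_pq * var_e / var_a        # sigma2_e / sigma2_g
print([round(pi, 3) for pi in p])  # [0.321, 0.179, 0.357, 0.357, 0.143, 0.607, 0.071, 0.964, 0.571, 0.393]
print(round(two_sum_pq, 4), round(alpha, 3))  # 3.5383 24.598
```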
  33. Z matrix
     • Z = M − P. For the reference animals 13 to 20 (rows) and SNPs 1 to 10 (columns):
     $$\mathbf{Z} = \begin{bmatrix}
     1.357 & -0.357 & 0.286 & 0.286 & -0.286 & -1.214 & -0.143 & 0.071 & -0.143 & 1.214 \\
     0.357 & -0.357 & -0.714 & -0.714 & -0.286 & 0.786 & -0.143 & 0.071 & -0.143 & -0.786 \\
     0.357 & 0.643 & 1.286 & 0.286 & 0.714 & -1.214 & -0.143 & 0.071 & -0.143 & 1.214 \\
     -0.643 & -0.357 & 1.286 & 0.286 & -0.286 & -0.214 & -0.143 & 0.071 & 0.857 & 0.214 \\
     -0.643 & 0.643 & 0.286 & 1.286 & -0.286 & -1.214 & -0.143 & 0.071 & -0.143 & 1.214 \\
     0.357 & 0.643 & -0.714 & 0.286 & -0.286 & 0.786 & -0.143 & 0.071 & 0.857 & 0.214 \\
     -0.643 & -0.357 & 0.286 & 0.286 & -0.286 & 0.786 & -0.143 & 0.071 & 0.857 & -0.786 \\
     -0.643 & 0.643 & 0.286 & -0.714 & -0.286 & -0.214 & -0.143 & 0.071 & 0.857 & -0.786
     \end{bmatrix}$$
     • We have now computed X and Z. The remaining matrices X'Z, Z'X and Z'Z are computed by multiplication; Iα is then added to Z'Z to form the MME.
     • Solving the MME gives the solutions below.
  34. Computing GEBVs
     • Solutions:
       - Mean effect: 9.944
       - SNP effects (ĝ): SNP 1: 0.087; SNP 2: -0.311; SNP 3: 0.262; SNP 4: -0.080; SNP 5: 0.110; SNP 6: 0.139; SNP 7: 0.000; SNP 8: 0.000; SNP 9: -0.061; SNP 10: -0.016
     • The SNP solutions are also called the SNP key.
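The whole SNP-BLUP computation for this example fits in a short pure-Python sketch. EDC weights are ignored (R = I), as stated for the example. `solve` is a hypothetical Gauss-Jordan helper, not a library call.

```python
"""SNP-BLUP sketch for the fat-yield example (EDC weights ignored)."""


def solve(C, rhs):
    """Solve C x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(C)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(C, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Genotypes (0/1/2) of bulls 13-26 for SNPs 1-10, from the example table
M = [
    [2, 0, 1, 1, 0, 0, 0, 2, 1, 2], [1, 0, 0, 0, 0, 2, 0, 2, 1, 0],
    [1, 1, 2, 1, 1, 0, 0, 2, 1, 2], [0, 0, 2, 1, 0, 1, 0, 2, 2, 1],
    [0, 1, 1, 2, 0, 0, 0, 2, 1, 2], [1, 1, 0, 1, 0, 2, 0, 2, 2, 1],
    [0, 0, 1, 1, 0, 2, 0, 2, 2, 0], [0, 1, 1, 0, 0, 1, 0, 2, 2, 0],
    [2, 0, 0, 0, 0, 1, 2, 2, 1, 2], [0, 0, 0, 1, 1, 2, 0, 2, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 1], [1, 0, 0, 0, 1, 1, 0, 2, 0, 0],
    [0, 0, 0, 1, 1, 2, 0, 2, 1, 0], [1, 0, 1, 1, 0, 2, 0, 1, 0, 0],
]
y = [9.0, 13.4, 12.7, 15.4, 5.9, 7.7, 10.2, 4.8]   # fat DYD of reference bulls 13-20
n, m = len(M), len(M[0])

# Allele frequencies from all 14 genotyped bulls, then Z = M - P
p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]
alpha = 2 * sum(pj * (1 - pj) for pj in p) * 245.0 / 35.241   # sigma2_e / sigma2_g

# Row k of [X Z] for the 8 reference bulls: 1 for the mean, then the scaled genotypes
W = [[1.0] + Z[k] for k in range(len(y))]
size = 1 + m
C = [[sum(w[i] * w[j] for w in W) for j in range(size)] for i in range(size)]
for j in range(1, size):
    C[j][j] += alpha                       # Z'Z + I * alpha
rhs = [sum(w[i] * yk for w, yk in zip(W, y)) for i in range(size)]

sol = solve(C, rhs)
mean, g_hat = sol[0], sol[1:]
print("mean:", round(mean, 3))
print("SNP key:", [round(g, 3) for g in g_hat])
```

SNPs 7 and 8 are monomorphic among the reference bulls. Their Z columns are therefore constant and confounded with the mean, and the shrinkage term forces their effects to exactly zero. This matches the 0.000 values listed above.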
  35. GEBVs for validation animals
     • The DGVs for the reference animals (animals 13–20) are then computed as Zĝ.
     • For the validation animals (animals 21–26), DGV = Z₂ĝ, where Z₂ contains the centred genotypes of the validation candidates.
  36. Solutions for validation animals
     $$\begin{bmatrix} \hat a_{21} \\ \hat a_{22} \\ \hat a_{23} \\ \hat a_{24} \\ \hat a_{25} \\ \hat a_{26} \end{bmatrix} = \begin{bmatrix}
     1.357 & -0.357 & -0.714 & -0.714 & -0.286 & -0.214 & 1.857 & 0.071 & -0.143 & 1.214 \\
     -0.643 & -0.357 & -0.714 & 0.286 & 0.714 & 0.786 & -0.143 & 0.071 & -1.143 & -0.786 \\
     -0.643 & 0.643 & 0.286 & -0.714 & -0.286 & -0.214 & -0.143 & 0.071 & 0.857 & 0.214 \\
     0.357 & -0.357 & -0.714 & -0.714 & 0.714 & -0.214 & -0.143 & 0.071 & -1.143 & -0.786 \\
     -0.643 & -0.357 & -0.714 & 0.286 & 0.714 & 0.786 & -0.143 & 0.071 & -0.143 & -0.786 \\
     0.357 & -0.357 & 0.286 & 0.286 & -0.286 & 0.786 & -0.143 & -0.929 & -1.143 & -0.786
     \end{bmatrix} \begin{bmatrix} 0.087 \\ -0.311 \\ 0.262 \\ -0.080 \\ 0.110 \\ 0.139 \\ 0.000 \\ 0.000 \\ -0.061 \\ -0.016 \end{bmatrix} = \begin{bmatrix} 0.027 \\ 0.114 \\ -0.240 \\ 0.143 \\ 0.054 \\ 0.354 \end{bmatrix}$$
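Each validation DGV is one row of Z₂ times the SNP key. The sketch below reproduces the products above, using the rounded allele frequencies and SNP key printed on the slides. The rounding causes differences of about 0.001 in the last digit. `dgv` is a hypothetical helper.

```python
# SNP key (g_hat) and allele frequencies as printed on the slides
g_hat = [0.087, -0.311, 0.262, -0.080, 0.110, 0.139, 0.000, 0.000, -0.061, -0.016]
p = [0.321, 0.179, 0.357, 0.357, 0.143, 0.607, 0.071, 0.964, 0.571, 0.393]

# Genotypes of the validation bulls, from the example table
validation = {
    21: [2, 0, 0, 0, 0, 1, 2, 2, 1, 2],
    22: [0, 0, 0, 1, 1, 2, 0, 2, 0, 0],
    23: [0, 1, 1, 0, 0, 1, 0, 2, 2, 1],
    24: [1, 0, 0, 0, 1, 1, 0, 2, 0, 0],
    25: [0, 0, 0, 1, 1, 2, 0, 2, 1, 0],
    26: [1, 0, 1, 1, 0, 2, 0, 1, 0, 0],
}


def dgv(genotypes, freqs, snp_key):
    """DGV = sum_j (m_j - 2 p_j) * g_j, i.e. one row of Z2 times g_hat."""
    return sum((mj - 2 * pj) * gj for mj, pj, gj in zip(genotypes, freqs, snp_key))


dgvs = {animal: dgv(geno, p, g_hat) for animal, geno in validation.items()}
for animal, value in dgvs.items():
    print(animal, round(value, 3))
```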
  37. GBLUP
     • An equivalent model to SNP-BLUP.
     • The usual BLUP MME are used, but with A⁻¹ replaced by G⁻¹.
     • The DGV, which equals the sum of the SNP effects (a = Zg), is estimated directly.
     • The model is y = Xb + Wa + e, where a = vector of DGVs and W is the design matrix linking records to animals.
     • X is as defined before, and W is an identity matrix (a diagonal matrix with all diagonal elements = 1).
  38. GBLUP
     • Given that a = Zg, var(a) = ZZ'σ²g.
     • Since σ²g = σ²a/(2Σp_j(1 − p_j)), the matrix ZZ' can be scaled such that
     $$\mathbf{G} = \frac{\mathbf{Z}\mathbf{Z}'}{2\sum_j p_j(1 - p_j)}$$
       and var(a) = Gσ²a.
     • Division by 2Σp_j(1 − p_j) makes G analogous to A.
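The scaling can be illustrated by building G from the 10 SNPs of the example, as sketched below. This G is only a demonstration and differs from the 42K-SNP matrix that follows. Because the genotypes are centred with frequencies from the same 14 bulls, every row of this G sums to zero, so it is singular and cannot be inverted as it stands.

```python
# Genotypes (0/1/2) of bulls 13-26 for SNPs 1-10, from the example table
M = [
    [2, 0, 1, 1, 0, 0, 0, 2, 1, 2], [1, 0, 0, 0, 0, 2, 0, 2, 1, 0],
    [1, 1, 2, 1, 1, 0, 0, 2, 1, 2], [0, 0, 2, 1, 0, 1, 0, 2, 2, 1],
    [0, 1, 1, 2, 0, 0, 0, 2, 1, 2], [1, 1, 0, 1, 0, 2, 0, 2, 2, 1],
    [0, 0, 1, 1, 0, 2, 0, 2, 2, 0], [0, 1, 1, 0, 0, 1, 0, 2, 2, 0],
    [2, 0, 0, 0, 0, 1, 2, 2, 1, 2], [0, 0, 0, 1, 1, 2, 0, 2, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 1], [1, 0, 0, 0, 1, 1, 0, 2, 0, 0],
    [0, 0, 0, 1, 1, 2, 0, 2, 1, 0], [1, 0, 1, 1, 0, 2, 0, 1, 0, 0],
]
n, m = len(M), len(M[0])
p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]      # Z = M - P
k = 2 * sum(pj * (1 - pj) for pj in p)                        # 2 * sum p_j (1 - p_j)
G = [[sum(zi[l] * zj[l] for l in range(m)) / k for zj in Z] for zi in Z]

# Print the lower triangle, one row per bull
for animal, row in zip(range(13, 27), G):
    print(animal, [round(v, 3) for v in row[: animal - 12]])
```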
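  39. G matrix from 42K SNPs (lower triangle; the matrix is symmetric)

| Animal | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 0.957 |
| 14 | -0.108 | 0.973 |
| 15 | 0.452 | -0.116 | 1.182 |
| 16 | 0.209 | -0.058 | 0.424 | 1.025 |
| 17 | 0.234 | -0.083 | 0.425 | 0.312 | 1.037 |
| 18 | -0.040 | 0.438 | 0.097 | -0.047 | -0.043 | 1.151 |
| 19 | -0.089 | 0.458 | 0.039 | -0.067 | -0.070 | 0.426 | 1.175 |
| 20 | -0.093 | 0.460 | 0.053 | -0.058 | -0.063 | 0.432 | 0.707 | 1.183 |
| 21 | 0.077 | -0.082 | 0.064 | 0.104 | 0.082 | -0.071 | -0.069 | -0.069 | 1.031 |
| 22 | -0.056 | 0.418 | 0.093 | -0.046 | -0.038 | 0.408 | 0.355 | 0.342 | -0.044 | 1.139 |
| 23 | -0.005 | 0.464 | -0.038 | -0.035 | -0.038 | 0.206 | 0.223 | 0.215 | 0.011 | 0.280 | 0.993 |
| 24 | -0.070 | 0.468 | 0.075 | -0.027 | -0.053 | 0.403 | 0.521 | 0.550 | -0.079 | 0.424 | 0.260 | 1.198 |
| 25 | -0.052 | 0.416 | 0.098 | -0.009 | -0.031 | 0.386 | 0.363 | 0.342 | -0.038 | 0.370 | 0.219 | 0.419 | 1.125 |
| 26 | -0.070 | 0.493 | -0.084 | -0.039 | -0.044 | 0.258 | 0.241 | 0.270 | -0.072 | 0.253 | 0.178 | 0.259 | 0.214 | 1.009 |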
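  40. A matrix for the same individuals (lower triangle; the matrix is symmetric)

| Animal | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 1.008 |
| 14 | 0.033 | 1.037 |
| 15 | 0.545 | 0.021 | 1.041 |
| 16 | 0.288 | 0.021 | 0.536 | 1.016 |
| 17 | 0.285 | 0.031 | 0.541 | 0.293 | 1.020 |
| 18 | 0.047 | 0.580 | 0.036 | 0.028 | 0.032 | 1.062 |
| 19 | 0.033 | 0.613 | 0.021 | 0.021 | 0.031 | 0.365 | 1.095 |
| 20 | 0.033 | 0.613 | 0.021 | 0.021 | 0.031 | 0.365 | 0.613 | 1.095 |
| 21 | 0.099 | 0.031 | 0.082 | 0.118 | 0.074 | 0.028 | 0.031 | 0.031 | 1.021 |
| 22 | 0.046 | 0.586 | 0.032 | 0.031 | 0.039 | 0.351 | 0.373 | 0.373 | 0.044 | 1.068 |
| 23 | 0.096 | 0.569 | 0.067 | 0.043 | 0.047 | 0.329 | 0.357 | 0.357 | 0.042 | 0.338 | 1.050 |
| 24 | 0.041 | 0.574 | 0.027 | 0.019 | 0.026 | 0.331 | 0.406 | 0.406 | 0.028 | 0.335 | 0.335 | 1.056 |
| 25 | 0.033 | 0.548 | 0.035 | 0.039 | 0.039 | 0.315 | 0.336 | 0.336 | 0.037 | 0.321 | 0.310 | 0.310 | 1.029 |
| 26 | 0.035 | 0.588 | 0.023 | 0.024 | 0.039 | 0.337 | 0.376 | 0.376 | 0.036 | 0.347 | 0.341 | 0.348 | 0.325 | 1.070 |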
  41. GBLUP
     • The MME are
     $$\begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{W} \\ \mathbf{W}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \mathbf{G}^{-1}\alpha \end{bmatrix} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{a}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{W}'\mathbf{R}^{-1}\mathbf{y} \end{bmatrix}$$
       where α now equals σ²e/σ²a. Solutions for the example data are given on the next slide.
     • Advantages:
       - existing software for genetic evaluation can be used by replacing A with G;
       - the systems of equations have the size of the number of animals, which tends to be smaller than the number of SNPs;
       - in pedigreed populations, G discriminates among sibs and other relatives, capturing information on Mendelian sampling;
       - the method is attractive for populations without good pedigree, as G captures the relationships among the genotyped individuals.
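The sketch below sets up these equations for the example, with R = I (EDC ignored), the DYDs of bulls 13–20 as records and the 42K-SNP G tabulated above for all 14 bulls. It is not expected to reproduce the solutions on the next slide exactly. Those solutions are very close to the SNP-BLUP values obtained earlier, which suggests they may have been computed with a G built from the 10 example SNPs. `solve` and `inverse` are hypothetical helpers.

```python
"""GBLUP sketch: MME with G^-1 for the 14 bulls of the example."""


def solve(C, rhs):
    """Solve C x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(C)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(C, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def inverse(M):
    n = len(M)
    cols = [solve(M, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


# Lower triangle of the 42K-SNP G matrix (bulls 13-26), as tabulated above
G_LOWER = [
    [0.957],
    [-0.108, 0.973],
    [0.452, -0.116, 1.182],
    [0.209, -0.058, 0.424, 1.025],
    [0.234, -0.083, 0.425, 0.312, 1.037],
    [-0.040, 0.438, 0.097, -0.047, -0.043, 1.151],
    [-0.089, 0.458, 0.039, -0.067, -0.070, 0.426, 1.175],
    [-0.093, 0.460, 0.053, -0.058, -0.063, 0.432, 0.707, 1.183],
    [0.077, -0.082, 0.064, 0.104, 0.082, -0.071, -0.069, -0.069, 1.031],
    [-0.056, 0.418, 0.093, -0.046, -0.038, 0.408, 0.355, 0.342, -0.044, 1.139],
    [-0.005, 0.464, -0.038, -0.035, -0.038, 0.206, 0.223, 0.215, 0.011, 0.280, 0.993],
    [-0.070, 0.468, 0.075, -0.027, -0.053, 0.403, 0.521, 0.550, -0.079, 0.424, 0.260, 1.198],
    [-0.052, 0.416, 0.098, -0.009, -0.031, 0.386, 0.363, 0.342, -0.038, 0.370, 0.219, 0.419, 1.125],
    [-0.070, 0.493, -0.084, -0.039, -0.044, 0.258, 0.241, 0.270, -0.072, 0.253, 0.178, 0.259, 0.214, 1.009],
]
G = [[G_LOWER[max(i, j)][min(i, j)] for j in range(14)] for i in range(14)]
y = [9.0, 13.4, 12.7, 15.4, 5.9, 7.7, 10.2, 4.8]   # fat DYD of reference bulls 13-20
alpha = 245.0 / 35.241                             # sigma2_e / sigma2_a

q = len(G)
size = 1 + q
C = [[0.0] * size for _ in range(size)]
rhs = [0.0] * size
for k, yk in enumerate(y):        # record k belongs to bull 13 + k (index k in G)
    for i in (0, 1 + k):
        rhs[i] += yk
        for j in (0, 1 + k):
            C[i][j] += 1.0        # accumulates X'X, X'W, W'X and W'W
G_inv = inverse(G)
for i in range(q):
    for j in range(q):
        C[1 + i][1 + j] += alpha * G_inv[i][j]
sol = solve(C, rhs)
b_hat, a_hat = sol[0], sol[1:]
print("mean:", round(b_hat, 3))
for animal, a in zip(range(13, 27), a_hat):
    print(animal, round(a, 3))
```

The bulls without records (21–26) get DGVs only through G⁻¹. Their solutions equal G₂₁G₁₁⁻¹â₁, a regression on the DGVs of the reference bulls, which the test checks.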
  42. Solutions for the example data
     • Reference animals: 13: 0.069; 14: 0.116; 15: 0.049; 16: 0.260; 17: -0.500; 18: -0.359; 19: 0.146; 20: -0.231
     • Selection or validation candidates: 21: 0.028; 22: 0.115; 23: -0.240; 24: 0.143; 25: 0.054; 26: 0.353
  43. Single-step method
     • GBLUP computes genomic breeding values only for genotyped animals. How can non-genotyped animals benefit from genomic information?
     • Let g2 be the genetic (genomic) values of genotyped animals and g1 the genetic values of non-genotyped animals.
     • An estimate of g1 based on genomic information is obtained by regressing g1 on g2, and is combined with the information from BLUP through the usual MME.
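  44. Single-step method
     • The variance of the vector of g1 (non-genotyped) and g2 (genotyped) is var[g1; g2] = Hσ²a, with
     $$\mathbf{H} = \begin{bmatrix} \mathbf{H}_{11} & \mathbf{H}_{12} \\ \mathbf{H}_{21} & \mathbf{H}_{22} \end{bmatrix} = \begin{bmatrix} \mathbf{A}_{11} + \mathbf{A}_{12}\mathbf{A}_{22}^{-1}(\mathbf{G} - \mathbf{A}_{22})\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G} \\ \mathbf{G}\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{G} \end{bmatrix}$$
       where subscript 1 refers to non-genotyped and subscript 2 to genotyped animals.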
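  45. Single-step method
     • The model is as before but uses all data (genotyped and non-genotyped animals): y = Xb + Za + e.
     • The MME are the usual ones, but with A⁻¹ replaced by H⁻¹:
     $$\begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \mathbf{H}^{-1}\alpha \end{bmatrix} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{a}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \end{bmatrix}$$
     • Surprisingly, H⁻¹ has a simple form:
     $$\mathbf{H}^{-1} = \mathbf{A}^{-1} + \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1} \end{bmatrix}$$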
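The simple form of H⁻¹ can be checked numerically on a tiny case. The sketch builds H explicitly from the block formula for H, builds H⁻¹ from A⁻¹, G⁻¹ and A₂₂⁻¹, and checks that their product is the identity. The three-animal pedigree and the 2 × 2 G are hypothetical values chosen only so that both matrices are invertible, and the helpers are illustrative.

```python
"""Single-step check: H built from its blocks times the simple-form H^-1 gives I."""


def solve(C, rhs):
    """Solve C x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(C)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(C, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def inverse(M):
    n = len(M)
    cols = [solve(M, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def block(M, rows, cols):
    return [[M[i][j] for j in cols] for i in rows]


# Hypothetical pedigree: animals 1 and 2 are founders, animal 3 is their offspring.
# Non-genotyped animals come first, so A is ordered [group 1, group 2].
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
ng, gt = [0], [1, 2]                 # animal 1 not genotyped; animals 2 and 3 genotyped
G = [[1.05, 0.45],
     [0.45, 0.98]]                   # hypothetical genomic relationships of animals 2 and 3

A11, A12 = block(A, ng, ng), block(A, ng, gt)
A21, A22 = block(A, gt, ng), block(A, gt, gt)
A22_inv = inverse(A22)

# H from its blocks: H11 = A11 + A12 A22^-1 (G - A22) A22^-1 A21, H12 = A12 A22^-1 G, ...
G_minus_A22 = [[G[i][j] - A22[i][j] for j in range(len(gt))] for i in range(len(gt))]
extra = matmul(matmul(matmul(matmul(A12, A22_inv), G_minus_A22), A22_inv), A21)
H11 = [[A11[i][j] + extra[i][j] for j in range(len(ng))] for i in range(len(ng))]
H12 = matmul(matmul(A12, A22_inv), G)
H21 = matmul(matmul(G, A22_inv), A21)
H = [H11[i] + H12[i] for i in range(len(ng))] + [H21[i] + G[i] for i in range(len(gt))]

# Simple form: H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]]
H_inv = inverse(A)
G_inv = inverse(G)
for a, i in enumerate(gt):
    for b, j in enumerate(gt):
        H_inv[i][j] += G_inv[a][b] - A22_inv[a][b]

check = matmul(H, H_inv)
print([[round(v, 6) for v in row] for row in check])
```

The practical appeal is that H⁻¹ only needs A⁻¹, plus G⁻¹ and A₂₂⁻¹ for the genotyped animals. H itself never has to be formed or inverted. It is formed here only to verify the identity.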
  46. Better lives through livestock (ilri.org). ILRI thanks all donors and organizations who globally supported its work through their contributions to the CGIAR Trust Fund.
