Genomic selection, prediction models, GEBV values, genomic selection in plant breeding

Seminar-I
Mahesh Biradar
I PhD
PGS19AGR7965
Dept. of GPB, UASD
“Genomic Selection for Crop Improvement”

Introduction
Genomic selection (GS) is a specialized form of marker-assisted selection (MAS).
The concept was introduced by Haley and Visscher at the 6th World Congress on Genetics Applied to Livestock Production, Armidale, Australia, in 1998.
The term GS was introduced by Meuwissen et al. (2001) in their seminal paper: Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157: 1819-1829.

How is GS specialized from MAS?
Conventional MAS: indirect selection of a desired allele based on molecular markers linked to the phenotype.
GS/GWS: dense markers covering the entire genome are used to predict the genetic value of a trait or individual.
[Figure: marker coverage of QTL1, QTL2 and QTL3 under conventional MAS and under GS/GWS]

Schematic representation of Genomic selection

Genomic Selection vs Marker Assisted Selection
Nakaya & Isobe, 2012

Process of GS
Genomic estimated breeding values (GEBVs) are estimated for individuals that have only genotypic data (the breeding population), using a model trained on individuals that have both genotypic and phenotypic data (the training population).
GEBVs serve as an ideal selection criterion.

Steps involved in GS
1. Development of training population
2. Statistical model development
3. Estimation of GEBVs
4. Cross validation
5. Selection of individuals
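The steps listed for GS (apart from cross validation) can be sketched end to end as follows. This is a minimal illustration on simulated data, not the workflow of any study cited in this seminar: the population sizes, the 0/1/2 genotype coding, the ridge-type estimator and the shrinkage value `lam` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: training population (TP) with genotypes and phenotypes (simulated here).
# Genotypes are coded as 0/1/2 copies of one allele; all sizes are assumed values.
n_train, n_breed, n_markers = 200, 50, 500
Z_train = rng.integers(0, 3, size=(n_train, n_markers)).astype(float)
Z_breed = rng.integers(0, 3, size=(n_breed, n_markers)).astype(float)  # genotypes only
true_effects = rng.normal(0.0, 0.1, size=n_markers)
y_train = Z_train @ true_effects + rng.normal(0.0, 1.0, size=n_train)


def fit_marker_effects(Z, y, lam):
    """Step 2: ridge-type estimate of marker effects (lam is an assumed shrinkage value)."""
    mu = y.mean()
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                       # centre marker columns
    A = Zc.T @ Zc + lam * np.eye(Z.shape[1])
    u = np.linalg.solve(A, Zc.T @ (y - mu))
    return mu, col_means, u


def predict_gebv(Z, mu, col_means, u):
    """Step 3: GEBV = mean + sum of estimated marker effects for each individual."""
    return mu + (Z - col_means) @ u


mu, col_means, u_hat = fit_marker_effects(Z_train, y_train, lam=100.0)
gebv = predict_gebv(Z_breed, mu, col_means, u_hat)

# Step 5: select the individuals with the highest GEBVs (top 5 is an arbitrary choice).
top = np.argsort(gebv)[::-1][:5]
print("Selected individuals:", top)
```

In practice the shrinkage parameter would be estimated from the data (for example from variance components) rather than fixed by hand.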

Training population
Population with phenotypic and genotypic data
It must be representative of the breeding
population
Larger training population size improves the
accuracy of GEBV estimates
May consist of germplasm lines or bi-parental populations (F2, RIL, DH)

TP - Genotyping
• Markers such as SNPs, DArT and SSRs, and platforms such as GBS (genotyping by sequencing), are widely used in GS.
• Dominant markers give lower GEBV prediction accuracy than co-dominant markers.
• Genotyping should be inexpensive and high-density.
No. of markers…?
• Dense marker coverage is needed to maximize the number of QTL in linkage disequilibrium with at least one marker.
TP - Phenotyping
• Accurate, replicated and multi-location.
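One way to see why dominant markers carry less information than co-dominant markers is to look at how each type scores the same genotypes. The snippet below is a small illustration with hypothetical genotypes at a single locus; the 0/1/2 coding is a common convention assumed here, not something specified in the slides.

```python
import numpy as np

# Hypothetical genotypes at one locus: number of copies of allele A (aa=0, Aa=1, AA=2).
allele_count = np.array([0, 1, 2, 1, 0, 2])

# Co-dominant marker (e.g. SNP, SSR): all three genotype classes are distinguishable.
codominant = allele_count.copy()

# Dominant marker: only presence/absence is scored, so Aa and AA both score 1.
dominant = (allele_count > 0).astype(int)

n_classes_codominant = len(np.unique(codominant))
n_classes_dominant = len(np.unique(dominant))
print(codominant, n_classes_codominant)
print(dominant, n_classes_dominant)
```

Because heterozygotes cannot be separated from one of the homozygotes, a dominant marker gives the model fewer genotype classes to work with at each locus.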

Breeding population
Population with only genotypic data.
Genotyped for the same markers as the training population.
Derived from parental lines that are present in the training population.

Statistical model development
1. Shrinkage models
• SR, RR-BLUP, G-BLUP
2. Dimension reduction methods
• Partial least squares regression
• Principal component regression
• Least absolute shrinkage and selection operator (LASSO)
3. Variable selection models
• BayesA, BayesB, BayesCπ, BayesDπ
4. Kernel regression and machine learning methods
• Support vector machine (SVM) regression
• Random forest (RF)

Most widely used statistical models
(Whittaker et al., 2000)

Stepwise Regression (SR)
• Treats marker effects as fixed.
• Selects the most significant markers: only markers with significant effects are retained, and the rest are discarded.
• Effects of non-significant markers are set to zero.
(Lande and Thompson, 1990)
Limitations:
• Detects only large effects, which leads to overestimation of the significant effects. (Goddard and Hayes, 2007; Beavis, 1998)
• SR resulted in low GEBV accuracy due to limited detection of QTLs. (Meuwissen et al., 2001)
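The selection logic described for SR can be sketched as a simple forward-selection loop. This is only an illustrative variant on simulated data: the partial F-statistic entry rule, the threshold `f_enter`, and the cap `max_markers` are assumptions, and published SR procedures may differ in their entry and exit criteria.

```python
import numpy as np


def forward_stepwise(Z, y, f_enter=10.0, max_markers=10):
    """Forward selection: repeatedly add the marker with the largest partial F.

    f_enter is an assumed entry threshold. Markers that never enter the model
    keep an effect of exactly zero, as described for SR.
    """
    n, m = Z.shape
    selected = []
    X = np.ones((n, 1))                          # start from an intercept-only model
    rss = float(np.sum((y - y.mean()) ** 2))
    while len(selected) < max_markers:
        best = None
        for j in range(m):
            if j in selected:
                continue
            Xj = np.column_stack([X, Z[:, j]])
            beta, *_ = np.linalg.lstsq(Xj, y, rcond=None)
            rss_j = float(np.sum((y - Xj @ beta) ** 2))
            df = n - Xj.shape[1]
            f_stat = (rss - rss_j) / (rss_j / df)  # partial F for adding marker j
            if best is None or f_stat > best[1]:
                best = (j, f_stat, rss_j)
        if best is None or best[1] < f_enter:
            break                                  # no remaining marker is significant
        selected.append(best[0])
        X = np.column_stack([X, Z[:, best[0]]])
        rss = best[2]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    effects = np.zeros(m)
    effects[selected] = beta[1:]                   # fixed effects of retained markers
    return selected, effects


# Simulated example: 30 markers, two of which (3 and 17) are large-effect QTL.
rng = np.random.default_rng(1)
Z = rng.binomial(2, 0.5, size=(100, 30)).astype(float)
y = 2.0 * Z[:, 3] - 1.5 * Z[:, 17] + rng.normal(0.0, 0.5, size=100)
selected, effects = forward_stepwise(Z, y)
print("Retained markers:", selected)
```

The sketch also makes the limitation visible: any QTL whose effect is too small to pass the threshold contributes nothing to the prediction.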

Ridge Regression-BLUP (RR-BLUP)
• Estimates all marker effects simultaneously by treating markers as random effects with equal variance, rather than categorizing them as significant or non-significant.
• Shrinks all marker effects towards zero and over-shrinks large marker effects.
• Appropriate when there are many QTL with small effects. (Meuwissen et al., 2001)
• RR-BLUP is superior to SR.
Limitation:
• RR-BLUP assumes that all markers have the same effect variance, which is unrealistic. (Xu et al., 2003)
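The RR-BLUP assumptions can be written compactly. The notation below is a common textbook convention and is assumed here rather than taken from the slides: y is the vector of phenotypes, Z the matrix of marker genotypes (columns centred), u the vector of marker effects, and m the number of markers.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \mathbf{e},
  \qquad \mathbf{u} \sim N(\mathbf{0}, \mathbf{I}\sigma_u^2),
  \quad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2) \\
\hat{\mathbf{u}} &= \left(\mathbf{Z}^{\top}\mathbf{Z} + \lambda\mathbf{I}\right)^{-1}
  \mathbf{Z}^{\top}\left(\mathbf{y} - \mathbf{1}\hat{\mu}\right),
  \qquad \lambda = \sigma_e^2 / \sigma_u^2 \\
\widehat{\mathrm{GEBV}}_i &= \sum_{j=1}^{m} z_{ij}\,\hat{u}_j
\end{aligned}
```

The single variance \sigma_u^2 shared by all markers is what produces equal shrinkage of every marker effect, and it is also the source of the limitation noted for RR-BLUP.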

Bayesian Regression (BR)
• Estimates a separate variance for each marker and so accommodates marker effects of different sizes.
• BayesA: uses a scaled inverted chi-square prior on the marker variances to regress them towards zero; all marker effects are shrunk close to zero, but none is exactly zero.
• BayesB: allows some markers to have zero effect, while other markers have non-zero effects.
(Meuwissen et al., 2001)
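The difference between BayesA and BayesB lies in the prior placed on each marker's variance. A sketch of the two priors, using assumed symbols (u_j for the effect of marker j, \sigma_j^2 for its variance, \nu and S for the degrees of freedom and scale of the scaled inverted chi-square distribution, and \pi for the prior probability that a marker has no effect):

```latex
\begin{aligned}
u_j \mid \sigma_j^2 &\sim N(0, \sigma_j^2) \\
\text{BayesA:}\quad \sigma_j^2 &\sim \chi^{-2}(\nu, S) \\
\text{BayesB:}\quad \sigma_j^2 &
\begin{cases}
= 0 & \text{with probability } \pi \\
\sim \chi^{-2}(\nu, S) & \text{with probability } 1 - \pi
\end{cases}
\end{aligned}
```

Setting \pi = 0 in BayesB recovers BayesA, which is why BayesA shrinks every effect but never sets one exactly to zero.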

Applications of GS in Plant breeding
Elisabeth Jonas and Dirk-Jan de Koning, 2013
• Example of a tested breeding scheme using multiple DH maize populations.
• CV was performed using random subsets of different DH lines (shown in different colors).
• Prediction accuracies were high, and only slight differences existed between the tested methods for estimating GEBVs.

Elisabeth Jonas and Dirk-Jan de Koning, 2013
• Study of half-diallel crosses in maize.
• A total of 4 inbred lines were used to produce half-diallel crosses (104-143 plants per cross), which were further selfed to F3 and to F3.4.
• The test cross with the opposite cross was phenotyped.

Cross validation
• Used to check model performance, i.e. how well the model predicts outcomes in a validation set.
• The training data are divided into k subsets; in each of the k rounds (folds), one subset is held out as the validation set and the remaining k-1 subsets are used to train the model, so every subset is used for validation exactly once.
E.g. five-fold cross validation:
       | Subset 1       | Subset 2       | Subset 3       | Subset 4       | Subset 5
Fold 1 | Training set   | Training set   | Training set   | Training set   | Validation set
Fold 2 | Training set   | Training set   | Training set   | Validation set | Training set
Fold 3 | Training set   | Training set   | Validation set | Training set   | Training set
Fold 4 | Training set   | Validation set | Training set   | Training set   | Training set
Fold 5 | Validation set | Training set   | Training set   | Training set   | Training set
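The five-fold scheme in the table can be sketched in code as follows. This is a minimal illustration on simulated data; the function names, the ridge predictor used as a stand-in model, and its shrinkage value `lam` are assumptions for the example, not part of the seminar.

```python
import numpy as np


def kfold_splits(n, k, seed=0):
    """Yield (train_idx, valid_idx) pairs; each individual is validated exactly once."""
    rng = np.random.default_rng(seed)
    subsets = np.array_split(rng.permutation(n), k)   # k roughly equal subsets
    for i in range(k):
        valid = subsets[i]
        train = np.concatenate([subsets[j] for j in range(k) if j != i])
        yield train, valid


def ridge_fit_predict(Z_tr, y_tr, Z_va, lam=50.0):
    """Stand-in prediction model: ridge regression with an assumed lam."""
    mu, means = y_tr.mean(), Z_tr.mean(axis=0)
    Zc = Z_tr - means
    u = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Zc.shape[1]), Zc.T @ (y_tr - mu))
    return mu + (Z_va - means) @ u


def cv_accuracy(Z, y, k, fit_predict, seed=0):
    """Mean correlation between predicted and observed values across the k folds."""
    r = []
    for train, valid in kfold_splits(len(y), k, seed):
        pred = fit_predict(Z[train], y[train], Z[valid])
        r.append(np.corrcoef(pred, y[valid])[0, 1])
    return float(np.mean(r))


# Simulated training data (assumed sizes): 100 individuals, 200 markers.
rng = np.random.default_rng(2)
Z = rng.integers(0, 3, size=(100, 200)).astype(float)
y = Z @ rng.normal(0.0, 0.1, size=200) + rng.normal(0.0, 1.0, size=100)
accuracy = cv_accuracy(Z, y, k=5, fit_predict=ridge_fit_predict)
print("Mean cross-validated accuracy:", round(accuracy, 3))
```

Passing the model as a function (`fit_predict`) keeps the fold logic separate from the choice of statistical model, so the same loop could be used to compare several models on identical folds.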

• Example of a study using six-row barley lines from three different breeding populations (colored differently), consisting of two subpopulations.
• Lines were inbred to at least F4. CV was performed in the final inbred generation using training and validation sets separated by entry.

Advantage of GS over PS
• Based on GS, the best lines could go directly to the second stage of multi-location evaluation.

Objective: Assessing the predictive efficiency of genomic selection
for seed weight using SCAR markers
Material:
SCAR markers = 79
Soybean varieties = 288
Training population (N = 238)
Validation population (N = 50)

Method
• Genotyped: 79 SCAR markers were genotyped in 288 soybean varieties.
• Phenotyped: phenotypic data for these varieties were collected from CGRIS.
• Trained: the GS model was trained on the TP using RR-BLUP and BLR.
• Evaluated: prediction efficiency was evaluated as the correlation between the predicted GEBVs and the true HSW values in the validation population.
• Compared: the RR-BLUP and BLR models were compared to evaluate the predictive performance of the two methods.
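The evaluation step amounts to a Pearson correlation computed over the validation population. Written out with assumed symbols (\hat{g}_i for the GEBV and y_i for the HSW value of validation variety i, and n_v = 50 for the validation population size stated in the material):

```latex
r_{\hat{g},y} =
\frac{\sum_{i=1}^{n_v} \left(\hat{g}_i - \bar{\hat{g}}\right)\left(y_i - \bar{y}\right)}
     {\sqrt{\sum_{i=1}^{n_v} \left(\hat{g}_i - \bar{\hat{g}}\right)^2}\,
      \sqrt{\sum_{i=1}^{n_v} \left(y_i - \bar{y}\right)^2}}
```

A value close to 1 means the ranking of varieties by GEBV closely matches their ranking by HSW.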

Conclusion
GEBVs from GS were highly correlated with true breeding values, especially considering the low genome-wide density of the SCAR markers.
The maximum correlation values were 0.854 and 0.904.
Results indicated that HSW was controlled by many small-effect genes, a genetic architecture better suited to GS than to MAS.
Therefore, GS would be suitable for estimating breeding values of crop traits in soybean.

Objective:
• To compare the response resulting from genome-wide selection with that from MARS.
• To determine the extent to which phenotyping can be minimized and genotyping maximized.

Objective: To report on gains made through GS and to compare the breeders' practice of developing improved source populations through S1 test-crosses and subsequent per se selection with that of GS.
Material:

Conclusions
• A positive selection response for grain yield under drought can be obtained with the use of markers.
• The statistical model used for determining marker effects works in practice and thus stands validated.
• GEBV-enabled selection of superior plant phenotypes, in the absence of the target stress, resulted in rapid genetic gains in drought tolerance in maize.

Objective:
To report the realized genetic gains of four cycles (C1, C2, C3, and C4),
plus the original training population (C0) in multi-environmental field trials
of RCGS-assisted breeding evaluation.

Material
18 CIMMYT tropical maize inbred lines
Sl. No. | Inbred line
1       | CML247
2       | CML264
3       | CML448
4       | CML494
5       | CML498
6       | CML531
7       | CLRCW72
8       | CLRCW75
9       | CLRCW76
10      | CLRCW93
11      | CLRCW100
12      | CLRCW260
13      | CLWN201
14      | CLWN228
15      | CLWN229
16      | CLWN247
17      | CLG2312
18      | CLSPLW04

Methods followed in RCGS
Fig. 1. Breeding scheme used in the MPPs reported in this study.
Figure annotations:
• 4800 individuals
• With single-cross tester CML495/CML549 from the complementary heterotic group
• dent type; heterotic group “B”; flint type kernel
• 955,690 SNPs were generated for each DNA sample

Table 2. Mean of GY (ton ha-1) for each genomic cycle C0, C1, C2, C3, and C4,
broad-sense heritability (H2) and mean of the four testers at Agua Fria
and Tlaltizapan and combined across the two locations

Table 3. Means of entry and checks for traits anthesis days (AD, days), silking
days (SD, days), plant height (PH, cm), ear height (EH, cm), and moisture
content (MOI, %) in each cycle across the two locations.

Conclusions
• Results described in this study are the first report of RCGS in MPPs.
• A realized genetic gain of 2% for GY with two rapid cycles per year saves
time and produces efficient genetic gains overall.
• The realized gain achieved in this study was 0.100 ton ha-1 yr-1 when
only GS cycles were considered (C1–C4).
• Although other traits were correlated with grain yield, they did not show any important change after three cycles of RCGS for GY.
• RCGS is an effective breeding strategy for simultaneously conserving genetic diversity and achieving high genetic gain in a short period of time.

Projects on GS
Crop       | Trait                                  | Markers                  | Funding agency
Tomato     | Quality, shape, shelf life             | SNP                      | USDA
Barley     | FHB resistance                         | SNP                      | Univ. of Minnesota
Trifolium  | Yield                                  | SNP                      | Danish plant research and for Aarhus University
Wheat      | Winter wheat                           | Genotyping-by-sequencing | Wheat Breeding Presidential Chair
Maize      | Drought                                | SNP                      | CIMMYT
Maize      | Total biomass yield and silage quality | SNP                      | USDA-AFRI
Sugar beet | White sugar yield, sugar content       | SNP                      | State Plant Breeding Institute, University of Hohenheim
