This document summarizes a seminar presentation on genomic selection for crop improvement. The key points are:
1. Genomic selection is a specialized form of marker-assisted selection that uses dense molecular markers covering the entire genome to predict the genetic value or breeding value of individuals based on their genotypes.
2. The process of genomic selection involves developing a training population with both genotypic and phenotypic data to train statistical models, estimating genomic estimated breeding values (GEBVs) for individuals in a breeding population based only on their genotypes using the trained models, and selecting best individuals for further breeding.
3. Common statistical models used in genomic selection include ridge regression best linear unbiased prediction (RR-BLUP), Bayesian regression, and machine learning methods.
Introduction:
Proposed by Meuwissen et al. (2001)
GS is a specialized form of MAS, in which information from genotype data on marker alleles covering the entire genome forms the basis of selection.
The effects associated with all the marker loci, irrespective of whether the effects are significant or not, covering the entire genome are estimated.
The marker effect estimates are used to calculate the genomic estimated breeding values (GEBVs) of different individuals/lines, which form the basis of selection.
Why go for genomic selection:
Marker-assisted selection (MAS) is well suited for handling oligogenes and quantitative trait loci (QTLs) with large effects, but not minor QTLs.
Marker-assisted recurrent selection (MARS) attempts to take small-effect QTLs into account by combining trait phenotype data with marker genotype data into a combined selection index.
However, MARS is based only on markers showing significant association with the trait(s), and for this reason it has been criticized as inefficient.
The genomic selection (GS) scheme was proposed to rectify the deficiencies of the MAS and MARS schemes. GS utilizes information from genome-wide marker data whether or not the marker associations with the concerned trait(s) are significant.
GEBV: Genomic Estimated Breeding Value
The sum total of effects associated with all the marker alleles present in the individual and included in the GS model applied to the population under selection
Calculated on a single individual basis
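The definition of a GEBV as a sum of marker effects can be illustrated with a minimal sketch. The genotype coding (allele counts 0, 1, 2), the marker effect values, and the function name `gebv` are all assumptions made for illustration; they do not come from the seminar.

```python
import numpy as np

# Hypothetical marker genotypes coded as allele counts (0, 1, 2);
# rows are individuals, columns are marker loci.
genotypes = np.array([
    [0, 1, 2, 1],
    [2, 2, 0, 1],
    [1, 0, 1, 0],
])

# Hypothetical marker effect estimates from an already trained GS model.
marker_effects = np.array([0.5, -0.2, 0.1, 0.3])


def gebv(genotypes, marker_effects):
    """GEBV of each individual: sum over markers of allele count x effect."""
    return genotypes @ marker_effects


# One value per individual, i.e. calculated on a single-individual basis.
gebvs = gebv(genotypes, marker_effects)
print(gebvs)
```

Each row of `genotypes` gives one GEBV, so individuals can be ranked without any phenotype record of their own.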
Gene-assisted genomic selection:
A GS model that uses information about previously known QTLs; with this model, the targeted QTLs were accumulated at much higher frequencies than when standard ridge regression was used.
Population used:
Training population: used for training of the GS model and for obtaining estimates of the marker-associated effects needed for estimation of GEBVs of individuals/lines in the breeding population.
Breeding population: the population subjected to GS for achieving the desired improvement and isolation of superior lines for use as new varieties/parents of new improved hybrids.
Training population:
Must be large enough and representative of the breeding population, capturing maximum trait variance with the markers (assessed by cluster analysis).
Should have LD and LD decay rates equal or comparable to those of the breeding population.
Should be updated by including individuals/lines from the breeding population.
Training should cover more than one generation.
Low collinearity between markers is needed, since high collinearity tends to reduce the prediction accuracy of certain GS models (collinearity is disturbed by recombination).
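One simple way to inspect collinearity between markers is the squared correlation between their genotype codes. This is only a sketch under assumptions: the 0/1/2 coding, the simulated markers, and the helper name `pairwise_r2` are illustrative and not part of the seminar.

```python
import numpy as np


def pairwise_r2(genotypes):
    """Squared Pearson correlation between every pair of marker columns."""
    corr = np.corrcoef(genotypes, rowvar=False)
    return corr ** 2


rng = np.random.default_rng(0)
m1 = rng.integers(0, 3, size=200)   # simulated marker
m2 = m1.copy()                      # perfectly collinear with m1
m3 = rng.integers(0, 3, size=200)   # simulated independent marker
X = np.column_stack([m1, m2, m3])

r2 = pairwise_r2(X)
print(np.round(r2, 3))
```

Marker pairs with values close to 1 carry nearly the same information; pairs close to 0 are effectively independent.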
Presentation delivered by Dr. Jesse Poland (Kansas State University, USA) at Borlaug Summit on Wheat for Food Security. March 25 - 28, 2014, Ciudad Obregon, Mexico.
http://www.borlaug100.org
Association mapping, also known as "linkage disequilibrium mapping", is a method of mapping quantitative trait loci (QTLs) that takes advantage of linkage disequilibrium to link phenotypes to genotypes. Various strategies involved in association mapping are discussed in this presentation.
Also called "association genetics", "association studies", or "linkage disequilibrium mapping".
Tool to resolve complex trait variation down to the sequence level by exploiting historical and evolutionary recombination events at the population level.
Natural populations are surveyed to determine marker-trait associations (MTAs) using LD.
A QTL is a gene or chromosomal region that affects a quantitative trait. It must be polymorphic (have allelic variation) to have an effect in a population, and it must be linked to a polymorphic marker allele to be detected. QTL mapping consists of four steps: development of the mapping population, generation of a polymorphic marker data set among the parents, construction of the linkage map, and finally the QTL analysis.
All the above steps are described in these slides very briefly along with two case studies.
Association mapping approaches for tagging quality traits in maize (Senthil Natesan)
Association mapping has been widely used to study the genetic basis of complex traits in human and animal systems and is a very efficient and effective method for confirming candidate genes or for identifying new genes (Altshuler et al., 2008). Association mapping is now being increasingly used in a wide range of plants (Rafalski, 2010), where it appears to be more powerful than in humans or animals (Zhu et al., 2008). Unlike linkage mapping, association mapping can explore all the recombination events and mutations in a given population, and with a higher resolution (Yu and Buckler, 2006). However, association mapping has lower power than linkage mapping to detect rare alleles in a population, even those with large effects (Hill et al., 2008). Using the candidate gene strategy of association mapping, Yan et al. (2010) demonstrated that the gene encoding β-carotene hydroxylase 1 (crtRB1) underlies a principal quantitative trait locus associated with β-carotene concentration and conversion in maize kernels.
Multiple inbred founder lines are inter-mated for several generations prior to creating inbred lines, resulting in a diverse population whose genomes are fine scale mosaics of contributions from all founders.
Heterotic group “is a group of related or unrelated genotypes from the same or different populations, which display similar combining ability and heterotic response when crossed with genotypes from other genetically distinct germplasm groups.”
Golden Helix’s SNP & Variation Suite (SVS) has been used by researchers around the world to do trait analysis and association testing on large cohorts of samples in both humans and other species. As Next-Generation Sequencing of whole genomes becomes more affordable, large cohorts of Whole Genome Sequencing (WGS) samples are available to search for additional trait association signals that were not found in array-based testing. In fact, recent papers have shown that WGS analysis using advanced GREML (Genomic Relatedness Restricted Maximum Likelihood) techniques is able to outperform micro-array based GWAS methods in the analysis of complex traits and proportion of the trait heritability explained.
Our latest update release of SVS has expanded the existing maximum likelihood and GRM methods to support these new techniques. We have also enhanced various other association testing and prediction methodologies. This webcast showcases:
- Newly supported analysis workflow for whole genome variants using LD binning and enhanced GBLUP analysis
- Enhanced gender correction using REML
- Additional capabilities for genomic prediction and phenotype prediction
We are continually improving our products based on our customers' feedback. We hope you enjoy this recording highlighting the exciting new features and select enhancements we have made.
Potential for genomic selection in indigenous cattle breeds and results of GWAS in Gir dairy cattle of Gujrat by Dr.Pravin Kandhani and Dr. Vijay Trivedi KAMDHENU UNIVERSITY GANDHINAGAR
Application of nuclear and genomic technologies for improving livestock produ...ILRI
Presented by Raphael Mrode at the IAEA International Symposium on Sustainable Animal Production and Health—Current Status and Way Forward, Vienna, 28 June-2 July 2021
Longevity is a highly desirable trait that considerably affects overall profitability. With increased longevity, the mean production of the herd increases because a greater proportion of the culling decisions are based on production. Longevity did not receive adequate attention in breeding programs because genetic evaluation for this trait is generally difficult as some animals are still alive at the time of genetic evaluation. Therefore, three basic strategies were suggested to evaluate longevity for cows: Firstly, cow survival to a specific age, which can be analyzed as a binary trait by either linear or threshold models. Secondly, estimating life expectancy of live cows and including these records in a linear model analysis. Thirdly, survival analysis: a method of combining the information of dead (uncensored) and alive (censored) cows in same analysis. This review represents an attempt to shed a light on different strategies of genetic evaluation of longevity in dairy cattle in most of developed countries.
5. Introduction
Specialized form of MAS.
Concept introduced by Haley and Visscher at the 6th World Congress on Genetics Applied to Livestock Production, Armidale, Australia, in 1998.
Term GS: Meuwissen et al., 2001 (seminal paper). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157: 1819-1829.
6. How specialized from MAS?
Conventional MAS: indirect selection of the desired allele based on molecular markers linked to the phenotype.
GS/GWS: dense markers covering the entire genome are used to predict the genetic value of a trait or individual.
[Figure labels: QTL1, QTL2, QTL3]
9. Process of GS
Estimation of genomic estimated breeding values (GEBVs) for individuals having only genotypic data (breeding population), using a model trained on individuals having both genotypic and phenotypic data (training population).
GEBVs serve as an ideal selection criterion.
10. Steps involved in GS
1. Development of training population
2. Statistical model development
3. Estimation of GEBVs
4. Cross validation
5. Selection of individuals
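The steps listed above can be sketched end to end on simulated data. Everything here is an assumption for illustration: the population sizes, the 0/1/2 genotype coding, the ridge-type model with a fixed shrinkage parameter `lam`, and the number of lines selected. Cross validation (step 4) is left out of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_breed, n_markers = 150, 50, 300

# Step 1: training population with genotypes and phenotypes (simulated).
Z_train = rng.integers(0, 3, size=(n_train, n_markers)).astype(float)
true_effects = rng.normal(0.0, 0.1, size=n_markers)
y_train = Z_train @ true_effects + rng.normal(0.0, 1.0, size=n_train)

# Breeding population: genotypes only, same markers as the training population.
Z_breed = rng.integers(0, 3, size=(n_breed, n_markers)).astype(float)

# Step 2: train a ridge-type model on the centered training data.
marker_means = Z_train.mean(axis=0)
Zc = Z_train - marker_means
lam = 100.0  # assumed shrinkage parameter, not tuned
effects = np.linalg.solve(
    Zc.T @ Zc + lam * np.eye(n_markers),
    Zc.T @ (y_train - y_train.mean()),
)

# Step 3: GEBVs for the breeding population from genotypes alone.
gebv_breed = (Z_breed - marker_means) @ effects

# Step 5: select the individuals with the highest GEBVs.
n_select = 5
selected = np.argsort(gebv_breed)[::-1][:n_select]
print(selected, np.round(gebv_breed[selected], 2))
```

In practice the shrinkage parameter would be estimated from the data rather than fixed, and the model would be checked by cross validation before selection.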
11. Training population
Population with phenotypic and genotypic data.
It must be representative of the breeding population.
Larger training population size improves the accuracy of GEBV estimates.
May be germplasm lines or a bi-parental derived population (F2, RIL, DH).
12. TP - Genotyping
Markers such as SNPs, DArT, and SSRs, and genotyping by sequencing (GBS), are widely used in GS.
Dominant markers give lower GEBV prediction accuracy than co-dominant markers.
Inexpensive, high-density genotypes are needed.
No. of markers? Dense marker coverage is needed to maximize the number of QTL captured.
TP - Phenotyping
Accurate, replicated, and multi-location.
13. Breeding population
Population with only genotypic data.
Genotyping is done for the same markers as in the training population.
Breeding population is derived from the parental lines that are present in the training population.
14. Statistical model development
1. Shrinkage models
SR, RR-BLUP, G-BLUP
2. Dimension reduction methods
Partial least squares regression
Principal component regression
Least absolute shrinkage and selection operator (LASSO)
3. Variable selection models
BayesA and BayesB, BayesCπ, BayesDπ
4. Kernel regression and machine learning methods
Support vector machine regression (SVM)
Random forest (RF)
16. Stepwise Regression (SR)
Treats marker effects as fixed.
Selects the most significant markers: only markers associated with significant effects are retained, and the others are discarded.
Non-significant marker effects are assigned zero values. (Lande and Thompson, 1990)
Limitations:
Detects only large effects, which causes overestimation of the significant effects. (Goddard and Hayes, 2007; Beavis, 1998)
SR resulted in low GEBV accuracy due to limited detection of QTLs. (Meuwissen et al., 2001)
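A rough sketch of the idea behind SR is forward selection: markers enter the model one at a time while they pass a significance cut-off, and all other marker effects stay at zero. This is a simplified variant written for illustration, not the exact procedure used in the cited studies; the F threshold of 15, the simulated data, and the function name `forward_stepwise` are assumptions.

```python
import numpy as np


def forward_stepwise(Z, y, f_threshold=15.0):
    """Forward selection of markers by partial F statistic.

    Returns (effects, selected); effects are zero for markers never selected.
    """
    n, p = Z.shape
    selected = []
    X = np.ones((n, 1))                     # intercept-only model
    rss = np.sum((y - y.mean()) ** 2)
    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            Xj = np.column_stack([X, Z[:, j]])
            beta, *_ = np.linalg.lstsq(Xj, y, rcond=None)
            rss_j = np.sum((y - Xj @ beta) ** 2)
            if best is None or rss_j < best[1]:
                best = (j, rss_j)
        j, rss_j = best
        df_resid = n - X.shape[1] - 1       # residual df after adding marker j
        f_stat = (rss - rss_j) / (rss_j / df_resid)
        if f_stat < f_threshold:
            break                           # no remaining marker is significant
        selected.append(j)
        X = np.column_stack([X, Z[:, j]])
        rss = rss_j
    effects = np.zeros(p)
    if selected:
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        effects[selected] = beta[1:]
    return effects, selected


# Simulated example: two large QTL among 30 markers.
rng = np.random.default_rng(2)
Z = rng.integers(0, 3, size=(200, 30)).astype(float)
y = 2.0 * Z[:, 3] - 1.5 * Z[:, 17] + rng.normal(0.0, 0.5, size=200)
effects, selected = forward_stepwise(Z, y)
print(sorted(selected))
```

Because only markers that pass the cut-off keep a non-zero effect, small-effect QTL are ignored, which is the limitation noted above.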
17. Ridge Regression-BLUP (RR-BLUP)
Simultaneously estimates all marker effects by treating markers as random effects with equal variance, rather than categorizing them as significant or non-significant.
It shrinks all marker effects towards zero and over-shrinks large marker effects.
Appropriate when there are many QTL with small effects. (Meuwissen et al., 2001)
RR-BLUP is superior to SR.
Limitation:
RR-BLUP incorrectly treats all marker effects equally, which is unrealistic. (Xu et al., 2003)
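The shrinkage behaviour of RR-BLUP can be seen from the ridge solution u = (Z'Z + λI)^-1 Z'y, where in the mixed-model view λ is the ratio of residual variance to marker-effect variance. The sketch below assumes fixed values of λ instead of estimating the variance components, and uses simulated data; the function name `rr_blup` is illustrative.

```python
import numpy as np


def rr_blup(Z, y, lam):
    """Ridge estimate of marker effects: (Z'Z + lam*I)^-1 Z'(y - mean(y))."""
    Zc = Z - Z.mean(axis=0)                 # center marker codes
    yc = y - y.mean()
    p = Z.shape[1]
    return np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ yc)


rng = np.random.default_rng(3)
Z = rng.integers(0, 3, size=(100, 50)).astype(float)
u_true = rng.normal(0.0, 0.2, size=50)
y = Z @ u_true + rng.normal(0.0, 1.0, size=100)

u_weak = rr_blup(Z, y, lam=1.0)             # little shrinkage
u_strong = rr_blup(Z, y, lam=1000.0)        # heavy shrinkage

print(np.linalg.norm(u_weak), np.linalg.norm(u_strong))
```

The same λ is applied to every marker, which is why all effects are shrunk alike regardless of their true size.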
18. Bayesian Regression (BR)
Estimates a separate variance for each marker and accommodates marker effects of different sizes.
BayesA: uses an inverted chi-square prior to regress the marker variances towards zero. All marker effects shrink close to zero, but not exactly to zero.
BayesB: allows some markers to have zero effect, while other markers may have non-zero effects.
(Meuwissen et al., 2001)
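A minimal Gibbs sampler gives a feel for how BayesA assigns each marker its own variance. This is a sketch under stated assumptions, not a reference implementation: the prior degrees of freedom `nu`, the prior scale `s2`, the noninformative prior on the residual variance, the chain length, and the simulated data are all chosen for illustration.

```python
import numpy as np


def bayes_a(Z, y, n_iter=300, burn_in=100, nu=4.0, s2=0.01, seed=0):
    """Minimal BayesA Gibbs sampler; returns posterior-mean marker effects."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)
    zz = np.sum(Zc ** 2, axis=0)
    mu = y.mean()
    a = np.zeros(p)                  # marker effects
    var_a = np.full(p, s2)           # one variance per marker
    var_e = np.var(y)                # residual variance
    resid = y - mu - Zc @ a
    a_sum = np.zeros(p)
    for it in range(n_iter):
        # Sample the intercept.
        resid += mu
        mu = rng.normal(resid.mean(), np.sqrt(var_e / n))
        resid -= mu
        for j in range(p):
            # Sample effect j given all other effects.
            resid += Zc[:, j] * a[j]
            denom = zz[j] + var_e / var_a[j]
            a[j] = rng.normal(Zc[:, j] @ resid / denom, np.sqrt(var_e / denom))
            resid -= Zc[:, j] * a[j]
            # Marker-specific variance from a scaled inverse chi-square.
            var_a[j] = (nu * s2 + a[j] ** 2) / rng.chisquare(nu + 1.0)
        # Residual variance under an assumed noninformative prior.
        var_e = (resid @ resid) / rng.chisquare(n)
        if it >= burn_in:
            a_sum += a
    return a_sum / (n_iter - burn_in)


# Simulated example: two large effects among 20 markers.
rng = np.random.default_rng(4)
Z = rng.integers(0, 3, size=(100, 20)).astype(float)
y = 1.0 * Z[:, 0] - 0.8 * Z[:, 5] + rng.normal(0.0, 0.5, size=100)
a_hat = bayes_a(Z, y)
print(np.round(a_hat, 2))
```

Markers with large sampled effects receive large variances and are shrunk little, while the remaining effects are pulled close to zero without being set exactly to zero.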
22. Applications of GS in plant breeding
Elisabeth Jonas and Dirk-Jan de Koning, 2013
Example of a tested breeding scheme using multiple DH maize populations.
CV was performed using random subsets of different DH lines (shown in different colors).
Accuracies of predictions were high, and only slight differences existed between the tested methods for estimation of GEBVs.
23. Elisabeth Jonas and Dirk-Jan de Koning, 2013
Study of half-diallel crosses in maize.
A total of 4 inbred lines were used to produce half-diallel crosses (104-143 plants per cross), which were further selfed to F3 and to F3.4.
The test cross with the opposite cross was phenotyped.
24. Cross validation
Used to check model performance, i.e. to predict outcomes in a validation set.
Done by dividing the training-set data into k subsets; in each of the k folds, one subset is held out as the validation set and the model is trained on the remaining subsets.
E.g. five-fold cross validation:
          Subset 1        Subset 2        Subset 3        Subset 4        Subset 5
Fold 1    Training set    Training set    Training set    Training set    Validation set
Fold 2    Training set    Training set    Training set    Validation set  Training set
Fold 3    Training set    Training set    Validation set  Training set    Training set
Fold 4    Training set    Validation set  Training set    Training set    Training set
Fold 5    Validation set  Training set    Training set    Training set    Training set
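The five-fold scheme in the table can be sketched as follows. The ridge model, its fixed shrinkage parameter, the simulated data, and the function names are assumptions for illustration; accuracy is taken here as the correlation between predicted and observed values in the held-out subset.

```python
import numpy as np


def ridge_effects(Z, y, lam):
    """Ridge estimate of marker effects on centered data."""
    Zc = Z - Z.mean(axis=0)
    return np.linalg.solve(
        Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ (y - y.mean())
    )


def k_fold_accuracy(Z, y, k=5, lam=50.0, seed=0):
    """Correlation between predictions and observations in each fold."""
    rng = np.random.default_rng(seed)
    subsets = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i in range(k):
        val = subsets[i]                                   # held-out subset
        train = np.concatenate([subsets[j] for j in range(k) if j != i])
        u = ridge_effects(Z[train], y[train], lam)
        pred = (Z[val] - Z[train].mean(axis=0)) @ u + y[train].mean()
        accuracies.append(np.corrcoef(pred, y[val])[0, 1])
    return np.array(accuracies)


# Simulated training set: 200 lines, 100 markers.
rng = np.random.default_rng(5)
Z = rng.integers(0, 3, size=(200, 100)).astype(float)
y = Z @ rng.normal(0.0, 0.3, size=100) + rng.normal(0.0, 1.0, size=200)

acc = k_fold_accuracy(Z, y, k=5)
print(np.round(acc, 2), round(acc.mean(), 2))
```

Every line is used for validation exactly once, so the mean over folds summarizes how well the model predicts lines it has not seen.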
25. Example of a study using six-row barley lines from three different breeding populations (colored differently), consisting of two subpopulations.
Lines were inbred to at least F4. CV was performed in the final inbred generation using training and validation sets separated by entry.
30. Objective: Assessing the predictive efficiency of genomic selection for seed weight using SCAR markers
Material:
SCAR markers = 79
Soybean varieties = 288
Training population (N = 238)
Validation population (N = 50)
31. Method
• Genotyped: 79 SCAR markers were genotyped in 288 soybean varieties.
• Phenotyped: The phenotypic data of these varieties were collected from CGRIS.
• Trained the model: The model was trained on the TP using RR-BLUP and BLR.
• Evaluated: The correlation between the predicted GEBVs and the true HSW values in the validation population was calculated to evaluate prediction efficiency.
• Compared: The GS models for RR-BLUP and BLR were compared to evaluate the predictive effects of the two methods.
34. Conclusion
GEBVs from GS were highly correlated with true breeding values, especially considering the low genome-wide density of the SCAR markers.
The maximum correlation values were 0.854 and 0.904.
Results indicated that HSW was controlled by many small-effect genes, which makes it better suited to GS than to MAS.
Therefore, GS would be suitable for estimation of crop breeding traits in soybean.
35. Objective:
• Response resulting from genome-wide selection compared with MARS
• Extent to which we can minimize phenotyping and maximize genotyping.
37. Objective: To report on gains made through GS and to compare breeders' practice of developing improved source populations through S1 test-crosses and subsequent per se selections with that of GS.
Material:
41. Conclusions
• A positive selection response can be obtained with the use of markers for grain yield under drought.
• The statistical model used for determining marker effects works in practice and thus stands validated.
• The use of GEBV-enabled selection of superior plant phenotypes, in the absence of the target stress, resulted in rapid genetic gains in drought tolerance in maize.
42. Objective:
To report the realized genetic gains of four cycles (C1, C2, C3, and C4), plus the original training population (C0), in multi-environment field trials of rapid cycling genomic selection (RCGS)-assisted breeding.
44. Methods followed in RCGS
Fig. 1. Breeding scheme used in the MPPs reported in this study.
[Figure labels: 4800 individuals; single-cross tester CML495/CML549 from the complementary heterotic group (dent type); heterotic group "B" (flint type kernel); 955,690 SNPs were generated for each DNA sample]
46. Table 2. Mean of GY (ton ha-1) for each genomic cycle (C0, C1, C2, C3, and C4), broad-sense heritability (H2), and mean of the four testers at Agua Fria and Tlaltizapan and combined across the two locations.
47. Table 3. Means of entries and checks for the traits anthesis days (AD, days), silking days (SD, days), plant height (PH, cm), ear height (EH, cm), and moisture content (MOI, %) in each cycle across the two locations.
48. Conclusions
• Results described in this study are the first report of RCGS in MPPs.
• A realized genetic gain of 2% for GY with two rapid cycles per year saves
time and produces efficient genetic gains overall.
• The realized gain achieved in this study was 0.100 ton ha-1 yr-1 when
only GS cycles were considered (C1–C4).
• Although other traits were correlated with grain yield, they did not show any
important change after three cycles of RCGS for GY.
• RCGS is an effective breeding strategy for simultaneously conserving
genetic diversity and achieving high genetic gain in a short period of
time.
50. Projects on GS
Crop | Trait | Markers | Funding agency
Tomato | Quality, shape, shelf life | SNP | USDA
Barley | FHB resistance | SNP | Univ. of Minnesota
Trifolium | Yield | SNP | Danish plant research and for Aarhus University
Wheat | Winter wheat | Genotyping-by-sequencing | Wheat Breeding Presidential Chair
Maize | Drought | SNP | CIMMYT
Maize | Total biomass yield and silage quality | SNP | USDA-AFRI
Sugar beet | White sugar yield, sugar content | SNP | State Plant Breeding Institute, University of Hohenheim
Good afternoon, everyone. I welcome you all to my first doctoral seminar, on genomic selection for crop improvement.
The flow of my seminar is as follows.
First, I will introduce the concept of GS.
Then I will describe the process of genomic selection, which includes the training population, its genotyping and phenotyping, and the estimation of GEBVs.
After that I will give some insights into GS. Then I will cover the applications of GS, i.e., where and when we need to apply GS in plant breeding, followed by research studies on GS.
Finally, I will conclude my seminar.
Coming to the introduction: as plant breeders, our major goal is to breed for novel traits and genotypes. We all know that plant breeding is the art and science of improving the genetic make-up of crop plants.
During crop breeding we carry out different activities. First we need to create variability, either by natural or by artificial means. Natural means include domestication, i.e., bringing wild species under human management; germplasm collection from different countries or locations; and introduction of a cultivar into a new area where it was not grown earlier. Artificial means include hybridization between plants, mutation and polyploidy, variation induced in clonally propagated crops (somaclonal variation), and recombinant DNA technology (genetic engineering).
After creating the variability, we need to select the right variability for the improvement of a particular trait. Here selection is the key step.
It plays a crucial role for plant breeders. There are two types of selection. The first is natural selection, i.e., selection by nature based on the principle of survival of the fittest, as proposed by Charles Darwin. The other is artificial selection, i.e., selection by humans based on experience and phenotypic observations of a particular trait.
Over time, Hazel and Lush gave the concept of the selection index, in which selection is based on a linear combination of the characters associated with a particular trait. It is more reliable than single-trait selection and increases the aggregate genetic gain.
But selecting merely on the basis of phenotype is not precise and may mislead. There is a chance of selecting undesirable traits or individuals. So, markers were developed.
Conventional selection is based on phenotype, so it is called phenotypic selection; here the environment has a drastic impact.
Breeders choose good offspring using their experience and the observed phenotypes of crops, so as to achieve genetic improvement of the target traits.
Here the breeder considers one trait at a time.
So, in 1942 Hazel and Lush proposed the selection index method, which uses a total score to select for multiple traits simultaneously. It improves the aggregate genetic gain.
With the development of computer science, genetic evaluation methods for the combined analysis of multiple traits were also developed.
In the 1990s, markers came to the rescue of plant breeders. As we all know, molecular markers are landmarks on the chromosome that are used to track the trait of interest.
Molecular markers are not crop-stage specific,
they have simple inheritance, and
they are environmentally neutral, so they are more effective for selection than relying on phenotype alone.
Markers are surrogates for the trait of interest. Selecting a genotype based on marker data is called MAS.
MAS is suitable for traits controlled by a small number of major genes.
But most economic traits of crops are complex and affected by a large number of genes, each having a small effect.
So MAS also has some limitations, i.e.,
it is effective only for major genes/QTLs;
success has been achieved mainly with qualitative traits;
MAS does not identify minor QTL effects.
But most traits are quantitative in nature and are controlled by both large- and small-effect QTLs.
So, MAS for quantitative traits and small-effect QTLs has resulted in less genetic gain.
So, there is a need for another method that overcomes all these limitations.
The method that rectifies these limitations of MAS is genomic selection.
GS is a specialized form of MAS.
The concept was introduced by Haley and Visscher at the 6th World Congress on Genetics Applied to Livestock Production at Armidale, Australia, in 1998.
The term GS was coined by Meuwissen in 2001 in the seminal paper entitled Prediction of total genetic value using genome-wide dense marker maps,
published in the journal Genetics.
Seminar paper: a work of original research that presents a specific thesis and is presented to a group of interested people, usually in an academic setting.
GS is a specialized form of MAS because:
In conventional MAS, we indirectly select the desired allele based on the molecular markers linked to the trait of interest.
In GS, all the dense markers covering the entire genome are used for selection, whether they are significant or not.
This slide shows a schematic representation of the genomic selection process.
GS involves two types of population, i.e.,
the training population and the breeding population.
The training population is genotyped and phenotyped to 'train' the genomic selection (GS) prediction model.
Genotyping is done with a large number of markers evenly distributed over the entire genome.
The phenotyping should be accurate and replicated, and multi-location data should be available.
The GS model is trained on the genotypic and phenotypic data.
Coming to the breeding population: it is only genotyped, with the same markers as the training population, and no phenotypic evaluation is done.
The genotypic data of the BP are fed into the GS model to estimate the GEBVs of individuals or lines from the marker data.
We select individuals based on their GEBVs.
In GS, the main role of phenotyping is to estimate marker effects and to perform cross-validation.
This slide shows the difference between GS and MAS. Both consist of a training phase and a breeding phase.
In the training phase of GS, phenotyping and genotyping are done to train the GS model.
In the training phase of MAS, a mapping population is phenotyped and genotyped to identify QTLs.
In the breeding phase of GS, GEBVs are calculated from the genotypic data using the GS model, and the individuals with the highest GEBVs are selected.
In MAS, the plants carrying the QTLs are selected.
GS is a specialized form of MAS in which data on marker alleles covering the entire genome form the basis of selection.
Information from genotypic data on all markers covering the entire genome forms the basis of selection.
The effects of all marker loci covering the entire genome are estimated, irrespective of whether they are significant or not.
These marker effects are used to estimate the GEBVs.
The basic steps involved in GS are
Development of training population with complete genotypic and phenotypic data
Statistical model development
Estimation of Genomic Estimated Breeding Values of new breeding lines with genotypic data
Cross validation
Selection of individuals
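The steps listed above can be sketched end to end on simulated data. This is only a minimal illustration, assuming ridge regression as a stand-in for RR-BLUP with a fixed shrinkage parameter; the population sizes, the 0/1/2 marker coding, the value of `lam` and all function names are assumptions made for the example, not details from the seminar.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 300 training lines, 50 breeding lines, 200 markers.
n_train, n_breed, n_markers = 300, 50, 200

# Simulated marker genotypes coded as 0/1/2 copies of one allele.
Z_train = rng.integers(0, 3, size=(n_train, n_markers)).astype(float)
Z_breed = rng.integers(0, 3, size=(n_breed, n_markers)).astype(float)

# Simulated marker effects and training phenotypes (toy data only).
true_effects = rng.normal(0.0, 0.1, size=n_markers)
y_train = Z_train @ true_effects + rng.normal(0.0, 1.0, size=n_train)


def fit_ridge(Z, y, lam):
    """Step 2: estimate marker effects by ridge regression on centred data."""
    marker_means = Z.mean(axis=0)
    Zc = Z - marker_means
    mu = y.mean()
    # Solve (Zc'Zc + lam * I) u = Zc'(y - mu) for the marker effects u.
    A = Zc.T @ Zc + lam * np.eye(Z.shape[1])
    u = np.linalg.solve(A, Zc.T @ (y - mu))
    return mu, marker_means, u


def predict_gebv(Z, mu, marker_means, u):
    """Step 3: GEBV = mean + sum of (centred genotype x marker effect)."""
    return mu + (Z - marker_means) @ u


# Step 1 is the simulated training population above; train on it.
mu, marker_means, u_hat = fit_ridge(Z_train, y_train, lam=100.0)

# The breeding population is genotyped only; predict its GEBVs.
gebv = predict_gebv(Z_breed, mu, marker_means, u_hat)

# Step 5: select the lines with the highest GEBVs.
n_select = 10
selected = np.argsort(gebv)[::-1][:n_select]
print("Selected breeding lines:", selected)
```

In a real programme the shrinkage parameter would normally be estimated from variance components rather than fixed by hand, and cross-validation (step 4) would be run before the model is used for selection.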
Training population:
The TP is the population with both phenotypic and genotypic data.
It must be representative of the BP so that the markers capture the maximum proportion of trait variance. This can be achieved by including lines with divergent GEBVs.
E.g., a Holstein-Friesian (HF) TP did not produce accurate GEBVs in a Jersey population.
A larger training population increases the accuracy of GEBV estimates.
The TP may consist of germplasm lines or biparental-derived populations, i.e., F2, RILs or DHs.
Whichever lines or populations are used as the TP, note that they should capture the maximum allelic diversity for the trait under study.
For example, Heffner et al. (2011) reported that the average ratio of GS accuracy to PS accuracy for grain quality traits in biparental wheat
populations was 0.66, 0.54 and 0.42 for training population sizes of 96, 48 and 24, respectively.
A higher training:breeding population ratio is required with greater genetic diversity, smaller breeding populations, lower trait heritability and larger numbers of QTLs to obtain GEBVs with high accuracy.
The TP should exhibit low collinearity between markers.
The TP can be genotyped with the following marker systems:
Single nucleotide polymorphism (SNP), Diversity Array Technology (DArT), simple sequence repeat (SSR), restriction site associated DNA (RAD) and genotyping-by-sequencing (GBS) markers are widely used,
because they can scan genome-wide polymorphisms.
Codominant markers should be used because they distinguish heterozygotes from homozygotes.
RAD (Restriction site Associated DNA)
SNPs are
RAD and GBS marker systems, which can scan genome-wide polymorphisms de novo,
bypass the need for prior marker development and allow direct genotyping of the training and breeding populations,
thereby also maximizing the number of QTL whose effects will be captured by markers.
Breeding population:
The BP is the population with only genotypic data.
Genotyping is done with the same markers as in the training population.
Ideally, the breeding population should be derived from parental lines that are present in the training population.
These are the models used to predict the GEBVs.
A GEBV is calculated as the sum of the effects of the markers across the genome.
GEBVs are calculated from all the markers covering the entire genome, and these GEBVs form the selection criterion in GS.
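As a small worked illustration of a GEBV being the sum of marker effects, the sketch below uses five made-up marker effects and three made-up lines. The 0/1/2 allele-count coding and every number are assumptions chosen only to make the arithmetic easy to follow.

```python
# Hypothetical estimated effects for five markers (made-up values).
marker_effects = [0.30, -0.10, 0.05, 0.00, 0.20]

# Hypothetical genotypes: copies (0, 1 or 2) of the reference allele per marker.
genotypes = {
    "line_A": [2, 0, 1, 2, 1],
    "line_B": [0, 2, 1, 0, 0],
    "line_C": [1, 1, 2, 1, 2],
}


def gebv(genotype, effects):
    """Sum of allele count x marker effect over all markers."""
    return sum(g * e for g, e in zip(genotype, effects))


gebvs = {name: gebv(g, marker_effects) for name, g in genotypes.items()}

# Rank the lines from highest to lowest GEBV; this ranking is the selection criterion.
ranking = sorted(gebvs, key=gebvs.get, reverse=True)
print(gebvs)
print(ranking)
```

Note that every marker contributes to the sum, including those with very small or zero estimated effects; no significance threshold is applied.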
There are different models for GEBV prediction. They are divided into:
1. Shrinkage models:
Shrink the marker-effect estimates towards zero. They are SR, RR-BLUP and G-BLUP.
2. Dimension reduction methods:
Reduce the number of random variables under study, which makes analysing the data much easier and faster. They are partial least squares regression, principal component regression and LASSO (least absolute shrinkage and selection operator), which captures a small number of QTL with larger effects.
3. Variable selection models:
Select a subset of relevant features, i.e., variables or predictors, for use in model construction. They are BayesA, BayesB, BayesCπ and BayesDπ.
4. Kernel regression:
A non-parametric technique used to estimate the conditional expectation of a random variable. The objective is to find a non-linear relation between a pair of random variables X and Y.
5. Machine learning methods:
Construction of algorithms that can learn from data and make predictions. They are SVM and RF: support vector machine regression and random forest.
In shrinkage models the variance of the marker effects is shrunk towards zero; these models are SR, RR-BLUP and G-BLUP.
…and in the day of high-density markers, this means we probably have many more markers than observations, resulting in the well-known large p, small n problem. This means ordinary least squares cannot be used for estimation, but a variety of other, more sophisticated models can be used. The most popular is RR-BLUP, where marker effects are treated as random effects sampled from a common distribution. That's all I'll say about that.
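The large p, small n problem can be seen directly in a toy example. The sketch below, on simulated genotypes with hypothetical sizes, shows that the marker matrix cannot have rank above the number of lines, so the ordinary least squares normal equations have no unique solution, while adding a ridge penalty (the idea behind RR-BLUP) makes the system solvable. The penalty values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical case with far more markers (p) than phenotyped lines (n).
n_lines, n_markers = 20, 100
Z = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
y = rng.normal(size=n_lines)

# The rank of Z (and hence of Z'Z) cannot exceed n_lines, so the
# 100 x 100 matrix Z'Z is singular and OLS has no unique solution.
rank = np.linalg.matrix_rank(Z)
print("rank of Z:", rank, "number of markers:", n_markers)


def ridge_effects(Z, y, lam):
    """Solve (Z'Z + lam * I) u = Z'y; lam > 0 makes the matrix invertible."""
    A = Z.T @ Z + lam * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ y)


u_ridge = ridge_effects(Z, y, lam=10.0)
u_strong = ridge_effects(Z, y, lam=1000.0)

# A larger penalty shrinks all marker effects more strongly towards zero.
print(np.linalg.norm(u_ridge), np.linalg.norm(u_strong))
```

All markers receive an estimated effect and all are shrunk by the same penalty, which corresponds to treating marker effects as samples from one common distribution.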
The most widely used statistical models are SR (stepwise regression), RR-BLUP (ridge regression best linear unbiased prediction) and Bayesian regression.
Genomic Prediction: basic idea
The choice of statistical method for estimating marker effects can also affect model accuracy. A variety of methods for genomic prediction are currently available. For brevity, we highlight three statistical methods available to train the GS model: ridge regression best linear unbiased prediction (RR-BLUP), Bayes-A, and Bayes-B.
Select the most significant markers on the basis of arbitrary significance thresholds, and set the effects of non-significant markers to zero.
Estimate the effects of the significant markers using multiple regression; only a portion of the genetic variance will be captured.
Marker variance is treated more realistically by assuming a specified prior distribution.
Assume a prior mass at zero, thereby allowing for markers with no effect.
Some marker effects can be equal to 0.
Least demanding in terms of computation
GS uses all markers as predictors to achieve assessment and selection in early generations; it reduces the time cost per cycle and shortens the generation interval.
Population design plays a vital role in GS.
A germplasm pool can be sampled as the training set for GS, but it may limit the scope of prediction.
In hybrid breeding, breeders evaluate an inbred line not by its phenotype but by its potential to create superior hybrids; thus selection of desirable hybrids needs field trials.
GS facilitates hybrid breeding by obtaining better hybrids with fewer crosses.
Xing Wang suggested looking at two aspects before implementing GS: 1. population design and 2. model design.
Population design should be as I described earlier.
While designing the model, we need to select a model suited to the population under selection. If the trait shows both dominance and additive effects, we need to select a model that explains those effects effectively.
We also need to use a model that incorporates multi-trait and multi-environment data.
All this accelerates breeding and increases the genetic gain.
The GS methods for GEBV estimation are classified into parametric and non-parametric methods.
The parametric methods are RR-BLUP, BayesA, BayesB, BayesC, LASSO, Bayesian LASSO, GBLUP and elastic net.
The non-parametric methods are SVM (support vector machine), RKHS (reproducing kernel Hilbert spaces), random forest and RBFNN.
Where to apply GS in the breeding cycle (which generations) and how many lines to select for genotyping.
Testing and development of models to implement GS in existing breeding schemes.
In this case, the training and breeding populations are used in the same generation.
Cross-validation is done to assess the results of the statistical model.
It is a natural way of assessing model performance from the breeder's perspective.
The data are divided into a training set and a validation set; models are fitted in the training set, and the fitted models are used to predict outcomes in the validation set,
or to verify the accuracy of the GEBVs.
2. Verification of the accuracy of GEBVs on selection candidates:
We will have GEBVs predicted by the genomic selection model, and at the same time we will calculate breeding values conventionally.
The two are compared using Pearson's correlation.
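The cross-validation idea described above can be sketched as a k-fold loop. This is a rough illustration on simulated data, assuming ridge regression as the prediction model and using observed phenotypes in the validation fold as the stand-in for conventionally estimated breeding values; the fold count, `lam` and all names are assumptions for the example.

```python
import numpy as np


def ridge_effects(Z, y, lam):
    """Marker effects from (Z'Z + lam * I) u = Z'y."""
    A = Z.T @ Z + lam * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ y)


def cross_validated_accuracy(Z, y, lam=25.0, k=5, seed=0):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), val_idx)
        # Fit the model on the training fold only.
        mu = y[train_idx].mean()
        means = Z[train_idx].mean(axis=0)
        u = ridge_effects(Z[train_idx] - means, y[train_idx] - mu, lam)
        # Predict the held-out lines from their genotypes alone.
        pred = mu + (Z[val_idx] - means) @ u
        accuracies.append(np.corrcoef(pred, y[val_idx])[0, 1])
    return float(np.mean(accuracies))


# Toy data: 250 lines, 100 markers, simulated effects and phenotypes.
rng = np.random.default_rng(42)
Z = rng.integers(0, 3, size=(250, 100)).astype(float)
y = Z @ rng.normal(0.0, 0.2, size=100) + rng.normal(0.0, 1.0, size=250)

accuracy = cross_validated_accuracy(Z, y)
print("Mean cross-validated prediction accuracy:", round(accuracy, 3))
```

Because the held-out lines play no part in estimating the marker effects, the correlation gives a breeder-oriented estimate of how well GEBVs would rank untested selection candidates.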
An exotic inbred can be crossed with an adapted inbred and the F2 and subsequent generations from this cross can be handled by the two-step procedure described above.
In case it is desired to continue GS beyond 7-8 cycles,
the plants in cycle 7 or 8 should be genotyped as well as evaluated for testcross performance, marker effects should be estimated afresh, and the new estimates should be used for further GS.
The time efficiency over PS could come from the second cycle of selection, which uses the TP from the previous cycle to predict the new DH lines, thus excluding testcross formation and first-stage multi-location evaluation trials.
Reducing cost by up to 50%.
Saving time by selecting lines directly for stage II instead of going through stage I (used in PS).
This significantly reduces the cost of testcross formation and evaluation at each stage of multi-location evaluations.
This is a research study on GS for seed weight based on low-density SCAR markers in soybean.
The material consists of
288 soybean varieties: TP, N = 238; VP, N = 50.
For genotyping, 79 SCAR markers were used.
The prerequisites for GS are a TP and a BP.
The TP is genotyped and phenotyped.
Genotyping was done using 79 SCAR markers in 288 soybean varieties.
Phenotypic data were collected from the CGRIS, the Chinese Crop Germplasm Information System.
Using both, the model was trained with RR-BLUP and BLR (Bayesian linear regression).
The BP was genotyped and the data were fed to the trained GS model to estimate the GEBVs.
Once the GEBVs were estimated, they were compared with the true breeding values to evaluate the predictive performance.
This slide shows the genotyping results of the soybean individuals based on SCAR markers.
This shows the correlation coefficient between GEBVs and TBVs for hundred-seed weight of soybean.
The correlation between the predicted and true values is 0.9042.
This is indicated by the small dots.
GEBVs obtained using SCAR markers were highly correlated with the true breeding values.
The maximum relationship values were 0.854 and 0.904.
Results indicated that HSW was controlled by many small-effect genes, which made it more suited to GS than MAS.
Therefore, GS would be suitable for estimation of crop breeding traits in soybean.
Vivek and co-workers published a paper in 2017 in the journal The Plant Genome entitled Use of Genomic Estimated Breeding Values Results in Rapid Genetic Gains for Drought Tolerance in Maize. The objective of this study was to report the genetic gains made through GS and to compare the breeders' practice of developing source (i.e., training) populations through S1 testcrosses and subsequent per se selections with that of GS.
Material: For this study they used two bi-parental maize populations referred to as CAP, i.e., CIMMYT Asia Populations. One population is from CML470 and CML444 and the other is from VL1012767 and CML444; parent 2, CML444, is common to both. Table 1 gives the sizes of the F2.3 populations: CAP1 is 276 with 342 polymorphic SNPs and CAP2 is 178 with 377 polymorphic SNPs.
This slide shows the flowchart of the genomic selection procedures used in the development of the various improved cycles of selection.
After polymorphism screening, the parents were crossed to develop F1s; the F1s were selfed to form F2s, and the F2s were selfed again to produce the F2.3 populations of each cross, CAP1 and CAP2.
Then the F2.3 families of each population were crossed to the tester CML474.
Testcrosses were divided into several trials based on seed availability and were phenotyped under drought and optimal (well-watered) environments. CAP1 was evaluated at ICRISAT, Sabour and Ludhiana, and CAP2 was evaluated at ICRISAT, Sabour, Belgaum and Davanagere.
An alpha lattice design with two to three replications per location was used.
The drought stress trials were conducted during the dry season. Drought stress from flowering to mid-grain fill was imposed on the crop by withholding subsequent irrigations. The package of practices (POP) recommended by the respective state Department of Agriculture was followed at each location. Rows were 4 m long, with 0.75 m spacing between rows and 0.2 m spacing between plants, and a plant population of 63,636 plants per hectare was maintained. Grain yield was recorded after adjusting the moisture to 12.5 percent. Best linear unbiased predictors (BLUPs) were calculated for each entry to assess performance across sites; BLUPs were also calculated for plant-stand-adjusted GY under both drought and well-watered conditions, and BLUPs for ASI were calculated under drought.
Genotyping was done for the polymorphic markers in the F2.3 families of both CAP1 and CAP2 populations using SNP markers.
Formation of Cycle 1: A selection index was calculated based on BLUPs from testcross data, weighted as 35% grain yield under drought, 25% ASI under drought and 40% grain yield under optimal (well-watered) conditions. For the formation of Cycle 1, the top 10% of the F3 families were selected based on this selection index and then recombined.
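A weighted selection index of this kind could be computed roughly as follows. The 35/25/40 weights and the top-10% cut-off come from the study as described; everything else is an assumption made for illustration, in particular the standardisation of each trait, the sign flip for ASI (on the assumption that a shorter anthesis-silking interval is favourable), and the made-up BLUP values for ten hypothetical families.

```python
from statistics import mean, pstdev

# Weights as described for the Cycle 1 selection index.
WEIGHTS = {"gy_drought": 0.35, "asi_drought": 0.25, "gy_optimal": 0.40}

# Assumption: higher grain yield is favourable, shorter ASI is favourable.
FAVOURABLE_SIGN = {"gy_drought": 1, "asi_drought": -1, "gy_optimal": 1}

# Hypothetical testcross BLUPs for ten families (made-up numbers).
blups = {
    "F01": {"gy_drought": 2.1, "asi_drought": 3.0, "gy_optimal": 6.5},
    "F02": {"gy_drought": 1.8, "asi_drought": 4.5, "gy_optimal": 7.0},
    "F03": {"gy_drought": 2.6, "asi_drought": 2.5, "gy_optimal": 6.0},
    "F04": {"gy_drought": 1.5, "asi_drought": 5.0, "gy_optimal": 5.5},
    "F05": {"gy_drought": 2.9, "asi_drought": 2.0, "gy_optimal": 7.2},
    "F06": {"gy_drought": 2.0, "asi_drought": 3.5, "gy_optimal": 6.8},
    "F07": {"gy_drought": 3.5, "asi_drought": 1.0, "gy_optimal": 8.0},
    "F08": {"gy_drought": 1.2, "asi_drought": 6.0, "gy_optimal": 5.0},
    "F09": {"gy_drought": 2.4, "asi_drought": 2.8, "gy_optimal": 7.5},
    "F10": {"gy_drought": 2.2, "asi_drought": 3.2, "gy_optimal": 6.2},
}


def standardise(values):
    """Convert raw values to z-scores so traits on different scales can be combined."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


families = list(blups)
index = {f: 0.0 for f in families}
for trait, weight in WEIGHTS.items():
    z_scores = standardise([blups[f][trait] for f in families])
    for family, z in zip(families, z_scores):
        index[family] += weight * FAVOURABLE_SIGN[trait] * z

# Keep the top 10% of families (at least one) for recombination.
n_select = max(1, round(0.10 * len(families)))
selected = sorted(families, key=index.get, reverse=True)[:n_select]
print(selected)
```

With these toy values family F07 is best for every trait, so it is the single family retained at a 10% selection proportion.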
Marker effects were calculated by correlating the testcross phenotypic performance with the genotypic data of the respective F2.3 families using R software.
Formation of Cycle 2: 350 seeds of C1 were planted and the plants were genotyped. GEBVs were calculated from the genotyping data. A larger GEBV indicates a more favourable plant. The 24-30 plants with the top GEBVs were identified and recombined. This population is termed C2(TC-GS) because it was generated by GS using marker effects estimated from testcross data.
C2(PerSe-PS): C2 generated by per se phenotypic selection. C2 plants were grown under optimal conditions, and visually appealing plants (standability/no lodging, vigour, general plant aspect including ear position, and disease) were selected; visually appealing cobs were then selected based on ear rots, texture, colour, general ear aspect, grain fill and cob size to form C2(PerSe-PS).
Superior families identified under drought are recombined under optimal conditions.
Based on average grain yield across the drought locations and the two populations, the per se performance of C2(PerSe-PS), representing phenotypic selection, ranged from 32 to 39%,
while that of C2(TC-GS), representing GS, ranged from 53 to 59%.
The per se performance of C2(TC-GS) was 10 to 20% better than that of C2(PerSe-PS).
Zhang et al. published a paper in 2017 in G3: Genes, Genomes, Genetics entitled Rapid Cycling Genomic Selection in a Multiparental Tropical Maize Population.
The objective of their study was to report the genetic gains of four cycles (C1, C2, C3 and C4) and of the original population (C0) in multi-environment field trials of rapid cycling genomic selection (RCGS)-assisted breeding, with four checks, in two Mexican environments.
The rapid cycling GS experiment was started in 2009 using 18 CIMMYT tropical maize inbred lines. The list of inbred lines is as follows.
The steps followed in rapid cycling GS are presented in Figure 1. The selected CIMMYT tropical maize inbred lines, which belong to the flint-type kernel heterotic group, were used as parents and crossed in a half-diallel fashion in season 2010B.
In season 2011A, the F1s were intermated to form S1s; there were 4800 individuals. Among them, the 1000 best ears were selected and planted ear-to-row in season 2012A.
They were testcrossed with the tester CML495 from the complementary dent heterotic group. These form the training population (C0) for developing genomic prediction models. Genotyping was done with SNP markers on the GBS platform. Phenotyping was done on the same population, and more than 10 agronomic characters were recorded.
The best 50 families with the best plant type, flowering and maturity were selected and planted ear-to-row, with 25 plants per family.
Cycle 1 was formed by intermating the 50 selected families. Based on visual evaluation of flowering time, plant type, plant/ear height, well-filled ears and reaction to naturally occurring diseases, along with among- and within-family selection, 157 ears were harvested and shelled individually to form C1. In C1, DNA was extracted, genotyping was done and GEBVs were calculated. Based on the highest GEBVs, the top 25 families were selected and intermated to form the C2 population. Within-family selection was implemented based on visual evaluation of flowering time, plant type, plant height, well-filled ears and reaction to naturally occurring diseases. A total of 91 ears were harvested and shelled individually to form C2. DNA was isolated and genotyping was done using SNP markers. GEBVs were calculated. Based on the highest GEBVs, the top 22 families were selected. They were intermated to form the C4 population. A total of 45 cobs were harvested. These C4 were testcrossed using testers to evaluate the genetic gains across cycles at two locations.
The F1s produced were intermated and then self-pollinated. From those populations, testcrosses were made with the tester CML495/CML549. They evaluated the testcrosses in 4 optimal locations with location-specific checks.
In each cycle families were selected: 50 in C0, 25 in C1, 18 in C2 and 22 in C3.
This table shows the number of families and individual plants sown, the individual plants selected and the plants advanced in each breeding cycle, together with the among-family, within-family and total selection intensity.
For example, in C0, 1000 families were sown.
Coming to the results: Table 2 shows the mean grain yield (tons per hectare) for each genomic cycle, i.e., C0, C1, C2, C3 and C4, the broad-sense heritability, and the mean of the four testers at the two locations and combined across the two locations. The average genetic gain in grain yield across cycles was estimated for each location and across locations, both for the genomic selection cycles only (C1–C4) and for all selection cycles (C0–C4). They observed that the genetic gain in the C4 cycle was the highest compared with the other cycles at each location and combined across locations.
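One common way to express an average gain per cycle is the slope of a regression of cycle means on cycle number; the sketch below assumes that approach and is not necessarily the exact procedure used in the study. The cycle means are hypothetical placeholders, not the values from Table 2.

```python
# Hypothetical mean grain yield (t/ha) per cycle; NOT the values from Table 2.
cycle_means = {0: 6.0, 1: 6.1, 2: 6.3, 3: 6.4, 4: 6.7}


def regression_slope(xs, ys):
    """Least-squares slope of ys on xs, i.e. average change per cycle."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def gain_per_cycle(means, cycles):
    return regression_slope(list(cycles), [means[c] for c in cycles])


# Gain over all selection cycles (C0 to C4).
gain_all = gain_per_cycle(cycle_means, range(0, 5))

# Gain over the genomic selection cycles only (C1 to C4).
gain_gs = gain_per_cycle(cycle_means, range(1, 5))

# Gain per cycle expressed as a percentage of the C0 mean.
gain_all_percent = 100.0 * gain_all / cycle_means[0]

print(round(gain_all, 3), round(gain_gs, 3), round(gain_all_percent, 2))
```

A per-cycle gain can be converted to a per-year figure by multiplying by the number of cycles completed per year or dividing by the years elapsed, depending on how the breeding scheme was timed.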
This table shows the means of entries and checks for the traits anthesis days, silking days, plant height, ear height and moisture content in each cycle across the two locations.
On average, anthesis and silking days did not increase across the GS cycles C0, C1, C2, C3 and C4. They were about 56 days for anthesis and 57 days for silking, showing good synchrony between the flowering times.
However, GS produced taller plants and higher ear insertions during cycles C3 and C4 compared to C1 and C2.
Grain moisture content did not seem to have been affected after the three cycles of rapid cycling GS.
Genetic diversity was well maintained up to C3, after which it declined.
These are some examples of recent GS work in different crops. In maize, GS has been applied for different traits such as grain yield, anthesis date, ASI and plant height under normal and water-stress conditions, and for diseases like northern corn leaf blight; I am working on GS for Fusarium stalk rot in maize.
These are the projects on GS around the world:
In tomato, GS for quality, shape and shelf life using SNP markers
In barley, GS for FHB resistance using SNP markers
In Trifolium, GS for yield using SNP markers
In wheat, GS using genotyping-by-sequencing
In maize, GS for drought using SNP markers
In maize, GS for total biomass yield and silage quality using SNP markers, and
In sugar beet, GS for white sugar yield and sugar content using SNP markers
These projects were funded by different organizations for the improvement of the desired characters.