Whole Genome Selection

Published in: Education

  • Introduction (GS) and method of selection: where do we come from? Why genomic selection? Factors contributing to the success of GS. Steps involved in GS. Genomic selection prediction models. GS prediction accuracies. Future directions of GS. Conclusion.
  • Selection has played an important role in human-plant co-evolution. Conditions have changed over time, and resources are constrained. The goal of selection has never changed; the target has. Selection should happen at short intervals and with accuracy. To increase yield and productivity, we need to increase genetic gain. Genetic gain (genetic advance) is the increase in performance achieved through a breeding program after each cycle of selection. Selection in plant breeding is usually based on estimates of breeding values obtained with pedigree-based mixed models.
  • Selection should happen at short intervals and with accuracy. To increase yield and productivity, we need to increase genetic gain. Selection in GS is usually based on genomic estimated breeding values (GEBVs). In the near future, selection may take place in the laboratory. Selection in plant breeding is usually based on estimates of breeding values obtained with pedigree-based mixed models.
  • Limitations: traits with low heritability; traits that are expressed late in an individual's life; traits that cannot be measured easily (e.g. disease resistance and quality traits); time-consuming, and the rate of breeding is slow. In addition, the expected increase in inbreeding among elite populations derived from intense prior selection may limit the creation of new genetic combinations for future gain. Intermating source populations for genetic recombination may overcome this problem, but delays line development.
  • Limitations: MAS captures the effects of major QTLs, but not those of minor QTLs.
  • Prediction of total genetic value using genome-wide dense marker maps. A close look at the livestock breed where GS has been implemented to its full potential: Holstein-Friesian dairy cattle.
  • In recent years, tremendous advances in plant genomics have dramatically increased the number of genomic tools and technologies available for almost every crop species. This progress has been driven by next-generation sequencing (NGS) technologies and high-throughput (HTP) marker genotyping systems, which have revolutionized plant genomics. Cost-effective NGS platforms such as Roche 454 and Illumina have been successfully employed for de novo whole-genome sequencing (WGS), whole-genome resequencing (WGRS), GBS, RAD and NGS-based marker discovery. This has given access to a plethora of genome-wide genetic markers, especially single nucleotide polymorphism (SNP) markers, and to an enormous amount of genomic information. MAS has been successfully employed for the improvement of monogenic traits and for gene pyramiding programmes with major-effect QTLs, but it is inefficient for quantitative traits (QTs), which are often controlled by many small-effect loci, so a considerable proportion of the genetic variation remains unaccounted for. To overcome these deficiencies of MAS, and to make use of the abundant genome-wide markers, a variant of MAS was proposed, called whole genome selection (WGS), genomic selection (GS) or genome-wide selection (GWS). Genomic selection is an emerging breeding methodology designed to exploit high-throughput, inexpensive DNA marker information to accurately predict the genetic value of breeding candidates for complex traits (Lorenz lab). EBVs: estimated breeding values calculated from phenotypic records and pedigrees, and from knowledge of the heritability of each trait. GEBVs: predictions of the genetic merit of an individual based on its genome. GEBVs are estimated using the genomic relationship matrix (instead of the pedigree) in combination with the EBVs or phenotypes of individuals. A wide variety of methods exist to estimate GEBVs; they differ primarily in their assumptions about the genetic architecture of the trait of interest. The GEBV of an individual is the sum of all its marker effects.
  • Here p is the number of chromosome segments across the genome, X_i is a design matrix allocating animals to the haplotype effects at segment i, and ĝ_i is the vector of effects of the haplotypes within chromosome segment i. Genomic selection exploits LD: the assumption is that haplotypes or markers within chromosome segments have the same effect across the whole population. Genomic selection avoids bias in the estimation of effects due to multiple testing, as all effects are fitted simultaneously.
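The equation that this note annotates is not reproduced in the notes themselves. Using only the symbols the note defines, it can be reconstructed as the sum of segment effects below; this is a reconstruction, not a copy of the slide.

```latex
\mathrm{GEBV} = \sum_{i=1}^{p} X_i \, \hat{g}_i
```

Each term X_i ĝ_i assigns every animal the estimated effect of the haplotype it carries at segment i, so the sum over all p segments gives one predicted breeding value per animal.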
  • How do we estimate this EBV? Traditional selection depends strongly on phenotypic observations. What is the breeding value of this cow for milk production?
  • How is GS different from MAS? GS: a genome-wide panel of dense markers, so that all QTLs are in LD with at least one marker. MAS concentrates on a small number of QTLs that are tagged by markers with well-verified associations. Marker-assisted selection strategies increase gain mainly through gain per unit time, rather than gain per cycle (Bernardo and Yu, 2007). From marker-assisted selection to genomic selection.
  • Markers used for GS must be able to tag all loci that explain some of the phenotypic variation of the trait of interest in the selection population. Genotyping platforms must cover the whole genome sufficiently, implying that linkage disequilibrium (LD) between neighbouring markers and the accuracy of genotyping must be high. Higher marker densities are required where LD is expected to be low; biparental populations, which are common in crop breeding, have extensive LD and therefore need fewer markers.
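To make "LD between a marker and a locus" concrete, the sketch below computes the usual r² statistic from phased haplotypes coded 0/1. The function name `ld_r2` and the toy haplotypes are hypothetical and chosen only for illustration; they are not data from the presentation.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic loci from phased haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # allele frequency at locus A
    p_b = sum(hap_b) / n                      # allele frequency at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Toy haplotypes (hypothetical): one marker, one tightly linked QTL, one loosely linked QTL.
marker = [1, 1, 1, 0, 0, 0, 1, 0]
qtl_close = [1, 1, 1, 0, 0, 0, 1, 0]
qtl_far = [1, 0, 1, 0, 1, 0, 1, 0]

print(ld_r2(marker, qtl_close), ld_r2(marker, qtl_far))
```

A marker in complete LD with a QTL (r² = 1) carries all the information about that QTL; as r² falls, more markers are needed for at least one of them to tag each locus.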
  • Genotyping gives us a 'picture' or 'snapshot' of the genetic makeup of an individual: the more SNPs, the clearer the picture. How? By identifying the markers linked to the trait of interest. MABC: the goal is to upgrade an established elite genotype with trait(s) controlled by one or a few loci; backcrossing is used to introgress a single gene. MARC: information from a large number of markers could be used to estimate breeding values without precise knowledge of where specific genes are located on the genome. This new source of information can now (or soon) be used in genetic evaluation by combining genotyping data with traditional pedigree and phenotypic records. Can the breeder shorten the time intervals?
  • The 'training population' is genotyped and phenotyped to 'train' the genomic selection (GS) prediction model. In GS, the main role of phenotyping is to estimate marker effects and to support cross-validation. Genotypic information from the breeding material is then fed into the model to calculate genomic estimated breeding values (GEBVs) for these lines. Implementation of GS can be summarized in four steps: (i) designing training populations with complete phenotypic and genotypic data, (ii) estimating marker effects in the training population, (iii) calculating GEBVs of new breeding lines from genotype data, and (iv) selection.
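The four steps above can be sketched on simulated data. This is a minimal illustration, not the presenter's method: it assumes NumPy is available, uses ridge regression as an RR-BLUP-style estimator, and every number (population sizes, effect variance, the shrinkage parameter `lam`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# (i) Training population: simulated genotypes (0/1/2 allele counts) and phenotypes.
n_train, n_cand, n_markers = 200, 50, 500
true_effects = rng.normal(0.0, 0.1, n_markers)      # many small-effect loci (assumption)
geno_train = rng.integers(0, 3, (n_train, n_markers)).astype(float)
pheno_train = geno_train @ true_effects + rng.normal(0.0, 1.0, n_train)

# (ii) Estimate all marker effects simultaneously with ridge regression.
marker_means = geno_train.mean(axis=0)
Z = geno_train - marker_means                       # centred genotypes
y = pheno_train - pheno_train.mean()                # centred phenotypes
lam = 100.0                                         # shrinkage parameter, hypothetical value
effects = np.linalg.solve(Z.T @ Z + lam * np.eye(n_markers), Z.T @ y)

# (iii) GEBV of unphenotyped candidates = sum of their estimated marker effects.
geno_cand = rng.integers(0, 3, (n_cand, n_markers)).astype(float)
gebv = (geno_cand - marker_means) @ effects

# (iv) Select the top 10 % of candidates on GEBV.
n_select = max(1, n_cand // 10)
selected = np.argsort(gebv)[::-1][:n_select]

# Possible only in simulation: correlate GEBV with the true genetic values.
true_values = (geno_cand - marker_means) @ true_effects
accuracy = np.corrcoef(gebv, true_values)[0, 1]
print(f"selected candidates: {selected.tolist()}, accuracy: {accuracy:.2f}")
```

Note that the candidates are never phenotyped: once the marker effects are estimated, selection needs genotypes only, which is what allows the generation interval to shrink.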
  • Validation is not theoretically essential for GS, although it is practically important to confirm the adequacy of a GS model before moving on to the breeding phase.
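One common way to carry out such a validation is k-fold cross-validation within the training population. The sketch below assumes NumPy, reuses a ridge estimator as the GS model, and runs on simulated data; the function names and all parameter values are hypothetical.

```python
import numpy as np


def ridge_effects(Z, y, lam):
    """Ridge estimates of marker effects for centred genotypes Z and centred phenotypes y."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)


def cross_validated_accuracy(geno, pheno, lam=100.0, k=5, seed=0):
    """Mean correlation between predicted and observed phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(pheno)), k)
    correlations = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(pheno)), test_idx)
        means = geno[train_idx].mean(axis=0)        # centre with training-fold means only
        mu = pheno[train_idx].mean()
        effects = ridge_effects(geno[train_idx] - means, pheno[train_idx] - mu, lam)
        predicted = mu + (geno[test_idx] - means) @ effects
        correlations.append(np.corrcoef(predicted, pheno[test_idx])[0, 1])
    return float(np.mean(correlations))


# Simulated data, for illustration only.
rng = np.random.default_rng(1)
geno = rng.integers(0, 3, (150, 300)).astype(float)
pheno = geno @ rng.normal(0.0, 0.1, 300) + rng.normal(0.0, 1.0, 150)
print(f"5-fold predictive ability: {cross_validated_accuracy(geno, pheno):.2f}")
```

The correlation is taken against observed phenotypes, so it measures predictive ability; with real data the true breeding values are unknown and this is the quantity that can actually be checked before the breeding phase.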
  • The choice of marker is therefore very important for accurately estimating GEBVs. How can this be accomplished in polyploids such as wheat, or in crops with few markers?
  • A SNP is a biallelic marker, abundant in number. SNP markers occur throughout the genome, on average one SNP every 1000 bases, or one in every 100 to 300 bases.
  • Precise phenotyping for a trait leads to accurate prediction of GEBVs from a GS model.
  • Genotyping by sequencing (GBS) in any large-genome species requires reduction of genome complexity and an efficient barcoding system. GBS can be applied to different populations or even different species without any prior genomic knowledge, as marker discovery is simultaneous with the genotyping of the population. The use of GBS for GS should therefore be applicable to a range of model and non-model crop species to implement genomics-assisted breeding. Related methods: restriction-site-associated DNA sequencing (RAD-seq) and low-coverage sequencing for genotyping. These methods reduce the proportion of the genome targeted for sequencing so that each marker can be sequenced at high coverage with limited resources, thus enabling markers to be genotyped accurately across many individuals.
  • Prediction accuracies with GBS were 0.28 to 0.45 for grain yield, an improvement of 0.1 to 0.2 over an established marker platform for wheat. The prediction accuracies found in this study are sufficiently high to merit implementation of GS in applied breeding programs (Jp). It is unclear why more accurate predictions were observed with GBS than with DArT, even when controlling for marker number. One possibility is that the GBS markers are free of the genotypic ascertainment bias that is found with fixed-array genotyping. Advantages of GBS: no ascertainment bias; low per-sample cost; polymorphism discovery simultaneous with genotyping, which is very good for wheat, where polyploidy and duplications cause problems with hybridization/PCR assays. GBS has become an attractive alternative technology for genomic selection. GBS technologies have proven capacity for delivering large numbers of marker genotypes with potentially less ascertainment bias than standard single nucleotide polymorphism (SNP) arrays. GBS is an NGS approach that reduces genome complexity via restriction enzymes. RF, MVN EM. Cornell University, Buckler Group: genotyping by sequencing (GBS), 1 million SNPs, $30/sample; later this year, $10-20/sample + DNA extraction. GEBVs are more accurate than current EBVs.
  • Genomic prediction: basic idea. The choice of statistical method for estimating marker effects can also affect model accuracy. A variety of methods for genomic prediction are currently available. For brevity, we highlight three statistical methods available to train the GS model: ridge regression best linear unbiased prediction (RR-BLUP), Bayes-A, and Bayes-B.
  • (loss of elite breeding lines)
  • Genomic selection for Fusarium head blight resistance in barley. Genomic selection for winter wheat breeding. Genomic selection for soybean breeding. Genomic selection for maize silage breeding.
  • Ongoing projects
  • It is expected that genomic selection will revolutionize breeding in the next decade
  • Thank you one and all!
  • Transcript

    • 1. Whole Genome Selection: Theoretical Consideration. Raghavendra N.R, Ph.D Scholar, Plant Breeding & Genetics
    • 2. Presentation Overview: 1. Introduction (GS) 2. Why genomic selection? 3. Steps involved in GS 4. Factors contributing to the success of GS 5. Future directions of GS 6. Conclusion
    • 3. Method of selection: where do we come from? Selection has played an important role in human-plant co-evolution. Genetic gain (GA): ΔG = (accuracy of selection × intensity of selection × genetic standard deviation) / generation interval. Selection in GS is usually based on genomic estimated breeding values. Selection can take place in the laboratory.
    • 4. Method of selection: where do we come from? [Diagram labels: genetic gain, selection, QTL/gene, phenotype, breeder, genotype, find markers, find population]
    • 5. Traditional Selection: traits with low heritability; traits that are expressed late in an individual's life; traits that cannot be measured easily (e.g. disease resistance and quality traits); time-consuming, and the rate of breeding is slow.
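A small worked example of the genetic gain formula on this slide. The input values are hypothetical and serve only to show how a shorter generation interval can outweigh a lower selection accuracy.

```python
def genetic_gain(accuracy, intensity, genetic_sd, generation_interval):
    """Expected genetic gain per unit time: (accuracy * intensity * genetic SD) / interval."""
    if generation_interval <= 0:
        raise ValueError("generation_interval must be positive")
    return accuracy * intensity * genetic_sd / generation_interval


# Hypothetical scenarios: same intensity and genetic SD, different accuracy and interval.
phenotypic = genetic_gain(accuracy=0.5, intensity=1.76, genetic_sd=2.0, generation_interval=4.0)
genomic = genetic_gain(accuracy=0.4, intensity=1.76, genetic_sd=2.0, generation_interval=1.0)

print(f"phenotypic: {phenotypic:.3f} per year, genomic: {genomic:.3f} per year")
```

With these assumed numbers the genomic scheme gains more per year despite its lower accuracy, because the generation interval sits in the denominator.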
    • 6. PS
    • 7. Limitations of MAS: "Picking the low-hanging fruit", i.e. the genes with big QTL effects. Major success has been achieved only with qualitative traits. The biparental mapping populations used in most QTL studies do not readily translate to breeding applications.
    • 8. The term 'GS' was first introduced by Haley and Visscher at the 6th World Congress on Genetics Applied to Livestock Production at Armidale, Australia, in 1998. GS was first propounded by Dr. Theo Meuwissen and colleagues in the seminal paper: Meuwissen et al (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819-29.
    • 9. Whole Genome Selection. Genomic selection: an emerging breeding methodology designed to exploit high-throughput, inexpensive DNA marker information to accurately predict the genetic value of breeding candidates for complex traits. EBV: an estimate of the additive genetic merit for a particular trait that an individual will pass on to its descendants. GEBV: a prediction of the genetic merit of an individual based on its genome.
    • 10. Genome Selection: Trace all segments of the genome with markers, so as to capture all QTL (= all genetic variance). Predict genomic breeding values as the sum of effects over all segments. Genomic selection exploits LD. Genomic selection avoids bias in the estimation of effects due to multiple testing, as all effects are fitted simultaneously.
    • 11. How to estimate breeding value? What is the breeding value of this cow for milk production? [Figure labels: X 10 litre; 0.5 litre, 8 litre, 10 litre, 12 litre] Breeding value = h² × (milk production − average) = h² × (12 − 7.625) = h² × 4.375 litres (given as 4.35 litres on the slide; the h² used is not stated).
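The slide's arithmetic can be checked with a few lines of code. The sketch assumes the average 7.625 is the mean of the four records 0.5, 8, 10 and 12 litres (which it equals); the slide does not state h², so the value 0.3 below is purely hypothetical.

```python
def estimated_breeding_value(own_record, herd_records, h2):
    """EBV from a single own record: h2 * (record - mean of the records)."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("h2 must lie between 0 and 1")
    mean = sum(herd_records) / len(herd_records)
    return h2 * (own_record - mean)


herd = [0.5, 8.0, 10.0, 12.0]                                # litres, as shown on the slide
deviation = estimated_breeding_value(12.0, herd, h2=1.0)     # h2 = 1 returns the raw deviation
ebv = estimated_breeding_value(12.0, herd, h2=0.3)           # h2 = 0.3 is a hypothetical value

print(deviation, ebv)
```

The phenotypic deviation of the 12-litre cow is 4.375 litres; the breeding value is that deviation scaled down by the heritability, so a low-heritability trait yields an EBV much closer to zero than the raw record suggests.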
    • 12. GS: a genome-wide panel of dense markers, so that all QTLs are in LD with at least one marker. MAS: concentrates on a small number of QTLs that are tagged by markers with well-verified associations. [Figure labels: 120 cms, 15 cms]
    • 13. Nakaya et al 2012
    • 14. Why is genomic selection important to turn to now? Relatively slow progress via phenotypic selection. Large cost of phenotyping. Limited throughput (plot area, time, people). QTs with small effects. Decreasing cost of genotyping. Promising results from simulation and cross-validation of GS. Meeting the challenge of feeding 9.5 billion people by 2050.
    • 15. Prerequisites for the introduction of GS: adequate and affordable genotyping platforms; relatively simple breeding schemes in which selection on additive genetic effects will generate useful results; statistical methods.
    • 16. Cont.. Genomic Information
    • 17. How can we do that where crops are concerned? Prerequisites: training population (genotypes + phenotypes); selection candidates (genotypes); accurate phenotypes; inexpensive, high-density genotypes. Heffner et al (2009)
    • 18. Training Population: Biparental vs. Multi-Family. Biparental: 1. Population specific 2. Reduced epistasis 3. Reduced number of markers required 4. Smaller training populations required 5. Balanced allele frequencies 6. Best for introgression of exotic. Multi-Family: 1. Allows prediction across a broader range of adapted germplasm 2. Allows sampling of more E 3. Cycle duration is reduced because retraining of the model is ongoing 4. Allows larger training populations 5. Greater genetic diversity
    • 19. Genomic Selection
    • 20. Cardinal points for success of GS 1. Population type & size of training population 2. Genotyping Platforms & marker densities. 3. Availability of HD genome wide markers. 4. Appropriate statistical methods for accurate GEBVs. 5. Epistasis & G x E. 6. Linkage disequilibrium 7. Long term selection
    • 21. Marker types & Marker density SNP DArT GBS
    • 22. SNP chip in genomic selection: Single markers (genes) predict only very small differences. Abundant in nature: 1kb-2SNP. Predicting differences in BVs.
    • 23. What sequences can we call haplotypes? Similar haplotypes make up a haplotype block, within which there is high LD and little recombination.
    • 24. Is GBS a suitable marker platform for genomic selection? Obviously ..!!!
    • 25. GBS (Elshire et al 2010): GBS accesses regulatory regions and allows sequence tag mapping. Flexibility and low cost. GBS markers led to higher genomic prediction accuracies. Missing data can be imputed. Highly multiplexed. Works even for a species with a genome as challenging as wheat (absence of a reference genome).
    • 26. Poland et al (2011). Statistical models used: i. RF; ii. MVN EM. GBS markers are more uniformly distributed across the genome than the DArT markers.
    • 27. GS Prediction Accuracies: number and size of QTLs; LD between markers and QTLs; marker density, marker type, and training population size; as the number of lines increases, the accuracy of GEBVs increases.
    • 28. GS Prediction Accuracies: heritability of the trait; genetic structure of the trait; simulation study results; cross-validation: how close is the simulated data to real data?
    • 29. Genomic selection prediction models Meuwissen et al (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819-29.
    • 30. Stepwise Regression (SR): Selects the most significant markers on the basis of arbitrary significance thresholds and sets the effects of non-significant markers to zero (Lande and Thompson, 1990). The effects of the significant markers are then estimated using multiple regression, so only a portion of the genetic variance is captured. Limitations: detects only large effects, which causes overestimation of the significant effects (Goddard and Hayes, 2007; Beavis, 1998). SR resulted in low GEBV accuracy due to limited detection of QTLs (Meuwissen et al 2001).
    • 31. Ridge Regression BLUP (RR-BLUP): Simultaneously estimates all marker effects rather than categorizing them as significant or having no effect. Ridge regression shrinks all marker effects towards zero. The method assumes that markers are random effects with equal variance (Meuwissen et al 2001). Limitations: RR-BLUP treats all effects equally, which is unrealistic (Xu et al 2003). RR-BLUP is superior to SR.
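In symbols, the RR-BLUP assumptions on this slide correspond to the standard ridge-regression form below. The notation is an assumption and does not appear on the slide: y is the vector of phenotypes, Z the centred marker matrix, u the marker effects, and λ the shrinkage parameter.

```latex
y = \mu \mathbf{1} + Z u + e, \qquad u \sim N(0, I\sigma_u^2), \qquad e \sim N(0, I\sigma_e^2)

\hat{u} = (Z^{\top} Z + \lambda I)^{-1} Z^{\top} (y - \hat{\mu}\mathbf{1}), \qquad \lambda = \sigma_e^2 / \sigma_u^2
```

The single variance σ_u² shared by all markers is the "equal variance" assumption: every marker effect is shrunk towards zero by the same λ, whatever its true size.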
    • 32. Bayesian Regression (BR): Marker variances are treated more realistically by assuming a specified prior distribution. BayesA: uses an inverted chi-square prior to regress the marker variances towards zero; all marker effects are non-zero. BayesB: assumes a prior mass at zero, thereby allowing for markers with no effect; some marker effects can be zero (Meuwissen et al 2001).
    • 33. Other potential genomic selection prediction models: i. Least absolute shrinkage and selection operator (LASSO). ii. Reproducing kernel Hilbert spaces (RKHS) and support vector machine regression, Gianola et al (2006). iii. Partial least squares regression and principal component regression. iv. RF (R package randomForest). v. MVN EM algorithm. R packages for GS: http://www.r-project.org
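The difference between the two priors can be written compactly. The sketch below follows the usual presentation of the priors in Meuwissen et al (2001), the paper the slide cites; the symbols ν, S and π are not on the slide and are introduced here as assumptions (degrees of freedom, scale, and the proportion of markers with no effect).

```latex
\text{BayesA:}\quad g_i \mid \sigma_{g_i}^2 \sim N(0, \sigma_{g_i}^2), \qquad \sigma_{g_i}^2 \sim \chi^{-2}(\nu, S)

\text{BayesB:}\quad \sigma_{g_i}^2 = 0 \ \text{with probability } \pi, \qquad \sigma_{g_i}^2 \sim \chi^{-2}(\nu, S) \ \text{with probability } 1 - \pi
```

Because each marker has its own variance, large-effect markers are shrunk less than under RR-BLUP; in BayesB a marker whose variance is zero has an effect of exactly zero.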
    • 34. A genome of 1000 cM was simulated with a marker spacing of 1 cM
    • 35. Modeling epistasis and dominance: Accurate prediction of dominance and epistatic effects would be advantageous. Lorenz et al pointed out that including epistatic effects in prediction models will improve accuracy, on condition that epistasis is present and can be modelled accurately. Blanc et al (2006) reported that epistasis will contribute to marker effects. Empirical studies are illuminating on this topic.
    • 36. GS in relation to strong subpopulation structure (SPS): In GWAS studies, SPS can cause spurious long-distance or unlinked associations between marker alleles and phenotype. In GS, the focus shifts to maintaining predictive ability despite a structured training data set, and spurious association will not be an important cause of loss of predictive ability. Where LD is not consistent, allelic effects estimated in one subpopulation will not be predictive for another subpopulation.
    • 37. Long-term selection: Improving gain in the long term necessarily requires a trade-off with short-term gain. The trade-off is often explicit, as in quantitative genetic models that maximize immediate predicted gain subject to a constraint on the rate of inbreeding (Meuwissen, 1997). Two approaches: 1. Selecting individuals or groups. 2. Analytical prediction, deterministic simulation using numerical approaches to optimization, and stochastic simulation.
    • 38. GS has proved its value in animal breeding, particularly in dairy cattle (Hayes and Goddard, 2010). It has still to prove its value over generations in crop plants. Simulation studies in plants suggest potential for improved gain per unit time (Jannink et al 2010).
    • 39. Future Directions: GS has seldom been implemented in the field. Where should GS be applied in the breeding cycle (which generations)? How many lines should be selected for genotyping? Where and how do we place our training population relative to the selection candidates?
    • 40. Future Directions: How many markers are required (determined by the extent of LD)? How can we implement non-additive effects in our models to allow predictions across multiple generations? How do non-additive effects affect the accuracy of genomic selection? How often should the chromosome segment effects be re-estimated?
    • 41. Outstanding questions that remain unanswered: How much gain do we expect when using GS? How much potential loss can a breeding program absorb?
    • 42. GS future perspectives Training population design. Epistatic modelling in GS. Strength of different statistical methods. Managing short & long term gain.
    • 43. Further Interest..?? Visit…. Lorenz Lab Department of Agronomy & Horticulture University of Nebraska-Lincoln http://www.lorenzlab.net Rex Bernardo Department of Agronomy and Plant Genetics University of Minnesota
    • 44. Ongoing projects on GS
        Crop | Trait | Markers | Funding agency | Project duration
        Tomato | Quality, shape, shelf life | SNP | |
        Barley | FHB resistance | SNP | Univ. of Minnesota | 2013
        Trifolium | Yield | SNP | Danish plant research and for Aarhus University | 2010-2015
        Wheat | Winter wheat | genotyping-by-sequencing | Wheat Breeding Presidential Chair | 2014
        Maize | Drought | SNP | CIMMYT | 2014
        Maize | Total biomass yield and silage quality | SNP | USDA-AFRI | 2014
        Sugar beet | White sugar yield, sugar content | SNP | State Plant Breeding Institute, University of Hohenheim | 2013
        USDA/AFRI | 2009-2013 (row assignment unclear in the source)
    • 45. Conclusion “Nothing In Science Has Any Value To Society If It Is Not Communicated”-Anne Roe