The CIMMYT Global Maize Program: Progress and Challenges

Published in: Business, Technology
  • The existence of genotype-by-environment interactions in the TPE may indicate that subdivision is necessary. However, small differences between environments do not always require specifically adapted varieties, and dividing the TPE does not always increase selection response. Whether to select across the TPE or separately in each subregion has to be validated for each breeding program individually. Same data used, but restricted to rDS and WW.

    1. The CIMMYT Global Maize Program: Progress and Challenges
        Gary Atlin and the GMP team
        El Batan, 22 June 2012
    2. Outline
        1. The role of GMP in the world’s maize seed system
        2. How do our products compare to those of the multinationals?
        3. Adaptation to mega-environments: implications for breeding
        4. The role of managed stress testing in the breeding pipeline
        5. Identifying donors and delivering markers for abiotic and biotic stress tolerance
        6. Applying high-density genotyping to maize breeding and managing the “data tsunami”
        7. An “open-source” model for delivering the benefits of high-density genotyping and genomic selection to small breeding programs
        8. Some things to watch out for
    3. 1. CIMMYT’s role in the world’s maize seed system
        • Only source of freely available maize parental lines
        • Our products support dozens of independent regional seed companies in Africa, Latin America, and Asia
        • Our products help local companies compete with multinationals
        • We provide direct support to seed companies in the commercialization of our hybrids (DTMA, IMIC)
        • We are a key source of donors of drought tolerance and disease resistance
    4. CIMMYT’s maize breeding effort
        • Africa: 5 line development breeders, 2 molecular breeders, 4 seed specialists, 1 physiologist, 2 biotic stress specialists
        • Latin America: 4 line development breeders, 1 physiologist, 1 nutritional specialist, 1 molecular breeder, 1 seed specialist
        • India: 1 line development breeder, 1 physiologist
        • China: 1 molecular breeding lead, 1 pathologist/breeder, 1 bioinformaticist, 2 molecular geneticists
        • ca. 10,000 lines genotyped with 500K SNPs via GBS
        • ca. 5,000 DH lines produced in-house in 2011
        • ca. 400,000 nursery and yield plots worldwide*
        • At least 2 million phenotypic data points annually
        • At least 25 billion genotypic data points annually
    5. How do our products get to farmers?
        • Hybrids are marketed mainly through regional seed companies
        • OPVs are distributed mainly through national subsidy schemes
    6. 2. Where do we stand relative to the multinationals?
        Latin American tropics: PCCMCA trial 2011 (21 locations)

        Hybrid        Pedigree                    Grain yield (t/ha)  Bad husk cover (%)  Ear rot (%)
        MJ-9297                                   8.08                7.2                 5.2
        MH-9058                                   8.02                5.1                 6.8
        DK-357        Best Commercial Check       7.72                8.0                 9.8
        CIMMYT-2      CLRCW100/CLRCW96//CML494    7.68                5.2                 9.5
        CIMMYT-4      CML491/CLQ6316//CLRCWQ48    7.36                5.8                 10.8
        P4092W                                    6.93                3.3                 6.5
        P4063W                                    6.79                4.8                 8.0
        Heritability                              0.91                0.89                0.91
        LSD (0.05)                                0.29                1.6                 1.7
    7. Regional tropical hybrid trial (PCCMCA), 2009: 28 public and private sector hybrids, 18 locations in Mexico and Central America

        Pedigree                            Yield  % ear rot  % root lodging  % stem lodging
        P-4082W                             7.25   7.45       3.42            5.27
        DK-357                              6.99   9.80       5.29            3.53
        (CML-264/CML-269)/CML494            6.64   8.67       4.11            6.00
        MG9051                              6.56   11.44      3.60            3.57
        P-4081W                             6.43   9.63       1.42            9.13
        (CLQ-RCWQ26/CLQ-RCWQ108)/CML-491    6.23   6.69       8.13            10.16
        NC 7218                             6.20   12.84      1.84            1.24
        INIFAP check (Mexico)               5.43   9.01       9.38            5.49
        LSD (0.05)                          0.52   3.93       5.15            4.66
        Repeatability                       0.94   0.67       0.53            0.47
    8. Validation trials, 18 locations, México, 2010

        Hybrid                                       Yield  % bad husk cover  % ear rot  % root lodging  % stem lodging
        (CML269/CML264)//CML494                      7.033  11.70             5.48       3.49            5.52
        P4082W                                       6.299  10.55             6.37       3.78            2.59
        (CLG2312/CML495)//CML494                     6.457  8.46              7.11       3.60            5.10
        H565                                         6.294  8.74              10.14      1.82            3.45
        H561                                         6.134  4.90              6.78       4.36            5.76
        H564C                                        5.899  4.94              11.71      1.74            5.17
        (LPS C7 F64-2-6-2-2-BBB/CML-495)//CML494     5.552  9.61              8.55       2.64            2.73
        H520                                         5.091  6.64              8.70       8.37            5.73
        No. of locs.                                 18     11                13         13              13
        LSD                                          0.62   7.60              2.84       2.69            2.94
        Repeatability                                0.86   0.00              0.78       0.80            0.41
    9. Regional on-farm trials in ESA (2010/11 season)

        Name             GY, locations > 3 t/ha  GY, locations < 3 t/ha  Days to anthesis
        CZH0616          5.96                    2.37                    64.4
        CZH0946          4.48                    2.22                    56.8
        CZH0837          5.62                    2.08                    62.7
        SC627            4.82                    2.03                    64.6
        SC403            4.61                    2.03                    57.8
        ZM627            4.74                    1.90                    65.0
        ZM309            4.09                    1.74                    55.1
        ZM521            4.27                    1.73                    59.6
        SC513            4.76                    1.60                    64.0
        Farmers Variety  4.64                    1.54                    64.7
        Pan53            5.39                    1.51                    64.5
        Mean             4.87                    1.80                    62.30
        n                30                      19
        H                0.80                    0.72
    10. Mean yield of CIMMYT hybrids in the 2005 and 2010 Early and Intermediate Regional Hybrid Trials (EIHYB) for Southern Africa

        Measure                                  Optimal, <3 t/ha  Optimal, >3 t/ha  Managed drought + low N
        CIMMYT hybrid mean, % of checks, 2005    102.3             104.3             86.9
        CIMMYT hybrid mean, % of checks, 2010    101.9             104.8             107.2
        Mean of checks 2005 (t/ha)               1.73              4.63              1.30
        Mean of checks 2010 (t/ha)               2.11              6.24              2.09
        No. of trials 2005                       6                 14                6
        No. of trials 2010                       7                 29                6
    11. Mean yield of CIMMYT hybrids in the 2005 and 2010 Intermediate and Late Regional Hybrid Trials (ILHYB) for Southern Africa

        Measure                                  Optimal, >3 t/ha  Managed drought + low N
        CIMMYT hybrid mean, % of checks, 2005    92.0              88.0
        CIMMYT hybrid mean, % of checks, 2010    94.5              101.8
        Mean of checks 2005                      6.08              1.57
        Mean of checks 2010                      7.29              2.08
        No. of trials 2005                       15                6
        No. of trials 2010                       24                7
    12. So, overall, where do we stand?
        1. In Latin America, our materials compete with the best multinational products, but we are not ahead
            • Low-cost three-way and double crosses are competitive!
        2. In ESA, our materials are superior in low-yield, short-duration locations. We are equivalent or ahead in high-yield locations
        3. Investment by MNSCs is increasing in the tropics. We need to increase our rates of gain, especially in favorable rainfed
    13. 3. Adaptation to mega-environments: implications for breeding
        1. Within and across huge regions, there is little local adaptation that is not explained by local diseases, elevation, and rainfall
            - Breeding programs in Eastern and Southern Africa must be fully integrated
            - Germplasm moves easily from one continent to another
            - We need efficient methods for transferring resistances to adaptive diseases
            - This means we need markers linked to QTLs!
            - This means we need a marker-development pipeline!
    14. Retrospective analysis in EIHYB and ILHYB
        • Years: 2001-2009
        • Genotypes: 448 (24-65/year)
        • Maturity: early and late
        • 513 trials with h² > 0.15 in 17 countries
        • α-lattice design with 3 reps
        Weber et al. (2012a, b), Crop Science
    15. Subdivision strategies of the TPE

        Subdivision        Typical environment
        Climate            A: Mid altitude, humid warm
                           B: Mid altitude, humid hot
                           C: Mid altitude, dry
                           D: Lowland, tropical humid
                           E: Lowland, tropical dry
        Yield level        Low-yielding subregion, < 3 t ha-1
                           High-yielding subregion, ≥ 3 t ha-1
        Geographic region  East
                           South
        Bänziger et al., 2006
    16. Variance components of maize grain yield in five different subdivision systems of the undivided target population of environments from 2001 to 2009: Southern Africa. Early maturity group (n=219)
        [Equation residue: the slide shows an expression in the variance components $\sigma^2_g$, $\sigma^2_{gs}$, $\sigma^2_{gy(s)}$, and $\sigma^2_{ge(ys)}$ that is not recoverable from the transcript.]

        Subdivision †      VG         VGS        VGY(S)     VGE(YS)    VE
        Climate            0.18±0.10  0.01±0.01  0.06±0.08  0.32±0.09  0.56±0.09
        Altitude           0.15±0.09  0.01±0.01  0.07±0.10  0.33±0.09  0.56±0.09
        Yield level        0.09±0.04  0.05±0.05  0.08±0.12  0.30±0.09  0.56±0.10
        Geographic region  0.19±0.09  0.00±0.00  0.06±0.12  0.33±0.09  0.57±0.10
        Country            0.21±0.11  0.01±0.01  0.06±0.07  0.30±0.09  0.57±0.11
    18. Rank changes over yield levels in the 2011 Southern African regional trial
        Top 10 of 54 entries in 14 high-yield trials and 9 low-yield trials

        Rank        All trials  High yield trials  Low yield trials
        1           PEX 501     PEX 501            CZH1033
        2           SC535       X7A344W            CZH0935
        3           AS113       AS113              CZH1036
        4           X7A344W     SC535              CZH0928
        5           AS115       AS115              CZH1031
        6           013WH63     CZH0923            CZH0946
        7           CZH0935     013WH63            CZH1030
        8           CZH0923     013WH29            AS115
        9           CZH1036     CZH0935            013WH63
        10          013WH29     CZH1036            CZH0831
        Mean yield  4.81        6.51               2.17
        H           0.88        0.89               0.75
    19. Rank changes over yield levels in the 2011 Southern African regional trial
        Top 10 of 54 entries in 14 high-yield trials and 9 low-yield trials

        Rank        All trials  High yield trials  Low yield trials
        1           PEX 501     PEX 501            CZH1033
        2           SC535       X7A344W            CZH0935
        3           AS113       AS113              CZH1036
        4           X7A344W     SC535              CZH0928
        5           AS115       AS115              CZH1031
        6           013WH63     CZH0923            CZH0946
        7           CZH0935     013WH63            CZH1030
        8           CZH0923     013WH29            AS115
        9           CZH1036     CZH0935            013WH63
        10          013WH29     CZH1036            CZH0831
        Mean yield  4.81        6.51               2.17
        H           0.88        0.89               0.75

        Correlations among yield levels
              All   High
        High  0.97
        Low   0.57  0.36
    20. Some important points about maize hybrid adaptation:
        2. Genotype x trial interaction and field “noise” are huge constraints on precision of screening
            - Large multi-location testing networks drive gains
            - Genotype x trial interaction and plot-to-plot variability in managed stress trials are greater than in optimally managed trials
            - Too much weight on low-H managed stress trials can reduce gains
    21. Means, variances ($\sigma^2_g$, $\sigma^2_{ge}$), and H for ESA regional trials conducted under optimal, managed drought (MD), low N, and random abiotic stress* (RAB), 2001-9

        Int-late trials
        Test environment  No. of trials  Grain yield (t ha-1)  VG    VGE   VE    Predicted H, 5 trials  Predicted H, 20 trials
        Optimal           175            6.26                  22.2  22.4  55.3  0.68                   0.92
        RAB               63             1.73                  10.4  18.2  71.5  0.38                   0.83
        MD                22             2.11                  17.6  15.7  66.7  0.49                   0.90
        Low-N             34             1.82                  15.7  15.3  68.9  0.49                   0.89
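The dependence of H on the number of trials can be sketched with the textbook formula for heritability of entry means, H = VG / (VG + VGE/t + VE/(t·r)). This is an assumption: the slide does not state the model, the number of replicates, or whether VE is on a per-plot basis, so the values printed here are not expected to reproduce the slide's "Predicted H" columns. The function name `predicted_h` and the default of one replicate are illustrative only.

```python
def predicted_h(vg, vge, ve, n_trials, n_reps=1):
    """Heritability of entry means over n_trials trials with n_reps reps each.

    Textbook form, assumed here: H = VG / (VG + VGE/t + VE/(t*r)).
    """
    if n_trials < 1 or n_reps < 1:
        raise ValueError("n_trials and n_reps must be at least 1")
    return vg / (vg + vge / n_trials + ve / (n_trials * n_reps))


# Variance components (VG, VGE, VE) copied from the slide 21 table.
components = {
    "Optimal": (22.2, 22.4, 55.3),
    "RAB": (10.4, 18.2, 71.5),
}

for name, (vg, vge, ve) in components.items():
    h5 = predicted_h(vg, vge, ve, n_trials=5)
    h20 = predicted_h(vg, vge, ve, n_trials=20)
    print(f"{name}: H(5 trials) = {h5:.2f}, H(20 trials) = {h20:.2f}")
```

Whatever the exact model, the qualitative point holds: adding trials shrinks the VGE and VE terms, so H rises with the size of the testing network, and it rises from a lower base in stress trials.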
    22. Managing field variation: developing comprehensive field maps
        [Figure: field maps from EM38, penetrometer, and NDVI measurements at Kiboko, Chiredzi, and Harare; scale: soil penetration resistance (MPa)]
    23. 4. The role of managed stress testing in the breeding pipeline
        PH Zaidi, CIMMYT
    24. Managed stress screening
        • Notable border effect indicates N depletion was successful
        • 60-80% yield reduction targeted for both low N and drought
    25. Managed stress screening over 30 years led to the development of the world’s most drought-tolerant maize germplasm
        Edmeades, Lafitte, Bolaños, Bänziger
    26. Pedigree selection for drought tolerance by CIMMYT in eastern and southern Africa: Stage 1 evaluation

        Management       Season  Sites  Weight
        Optimal          Main    3-5    ?
        Managed low N    Main    1      ?
        Managed drought  Dry     1      ?

        • 3000+ genotypes per year in Stage 1 testcross evaluation
        • Screens weighted based on their (assumed) importance in the target environment (= southern and eastern Africa)
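    27. We select in selection environments (SE) to make gains in the target population of environments (TPE) via correlated response
        $CR_{(TPE-SE)} = i \, r_{G(SE-TPE)} \sqrt{H_{SE}} \, \sigma_{P}$
        [Diagram: SE, with heritability $H_{SE}$, linked to TPE by the genetic correlation $r_{G(SE-TPE)}$]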
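A minimal sketch of how the correlated-response expression can be used to compare selection environments. It assumes the standard quantitative-genetics form (selection intensity × genetic correlation × √H of the selection environment × a standard deviation in the target); the slide's own formula is garbled in this transcript, so the exact scaling term is an assumption. The genetic correlations are the late-maturity values from slide 30 and the H values are the 5-trial predictions from slide 21; pairing them this way, and setting `i` and `sigma` to 1, is purely illustrative.

```python
import math


def correlated_response(i, r_g, h_se, sigma):
    """Correlated response in the TPE from selection in an SE (assumed form)."""
    if not 0.0 <= h_se <= 1.0:
        raise ValueError("h_se must be between 0 and 1")
    return i * r_g * math.sqrt(h_se) * sigma


# (genetic correlation with RAB target, H of the selection environment)
selection_environments = {
    "Optimal": (0.75, 0.68),
    "Managed drought": (0.76, 0.49),
    "Low-N": (0.90, 0.49),
}

# With i = sigma = 1 the result is a relative ranking, not a yield gain.
relative_cr = {
    name: correlated_response(1.0, r_g, h, 1.0)
    for name, (r_g, h) in selection_environments.items()
}

for name, value in sorted(relative_cr.items(), key=lambda kv: -kv[1]):
    print(f"{name}: relative correlated response = {value:.2f}")
```

The comparison shows why a high genetic correlation alone is not enough: a stress screen with low H can deliver less correlated response than an optimal screen with a somewhat lower genetic correlation.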
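    28. Using managed-stress data to improve breeding gains is complicated!
        [Diagram: stress and non-stress selection environments (SE), with heritabilities Hstress and Hnonstress, linked to the stress and non-stress parts of the TPE by the genetic correlations rGSS, rGSN, rGNS, and rGNN, and to each other by rG(SE)]
        • Hnonstress > Hstress
        • All of the rG’s are positive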
    30. Genetic correlations for yield between low-N and random abiotic stress (RAB) target environments and optimal, managed drought, and low-N selection environments: ESA 2001-9

        Selection environment  Genetic correlation with random abiotic stress*
        Early maturity group
          Optimal              0.80
          Managed drought      0.64
          Low-N                0.91
        Late maturity group
          Optimal              0.75
          Managed drought      0.76
          Low-N                0.90
    31. 5. Success in identifying donors for abiotic and biotic stress tolerance
        • A massive effort has been undertaken by the breeders and physiologists to characterize AM sets to identify donors for drought, heat, and low N tolerance
        • George has established a large hot-spot screening network to characterize donors for MSV, GLS, turcicum, tar spot, rust, and ear rots
        • Sudha and Babu have implemented a pipeline for developing breeder-ready markers
        • MSV is in validation now
    32. 5. Success in identifying donors for abiotic and biotic stress tolerance
        CIMMYT donors of drought and heat tolerance identified through screening in multiple environments in Mexico, Africa, and Asia

                                                          Grain yield (t ha-1)
        Pedigree                        Colour  Texture  Drought  Drought + heat  Well-watered
        DTPWC9-F24-4-3-1                White   Flint    3.10     1.43            6.97
        DTPYC9-F46-1-2-1-1-2            Yellow  Flint    3.07     1.58            7.12
        La Posta Sequia C7-F64-2-6-2-2  White   Flint    3.06     1.39            7.72
        Check (CML442/CML444)                            2.36     0.96            7.70
        Number of locations                              7        3               7
        H                                                0.64     0.50            0.84
        Trial mean                                       2.58     1.13            6.88

        Finally on the DTMA website! …but these lines are at least 15 years old!
    33. Best-bet sources of disease resistance (G. Mahuku)
        Mean disease rating (1-5)

        Stock ID  GLS       MSV       NCLB       Rust      Pedigree
                  (6 locs)  (3 locs)  (12 locs)  (5 locs)
        DTMA-3    1.43      1.12      1.74       1.30      [(CML395/CML444)-B-4-1-3-1-B/CML395//DTPWC8F31-1-1-2-2]-5-1-2-2-BB
        DTMA-10   2.06      1.60      1.67       2.13      CIMCALI8843/S9243-BB-#-B-5-1-BB-2-3-4
        DTMA-11   1.74      1.41      1.41       1.26      CIMCALI8843/S9243-BB-#-B-5-1-BB-4-1-3
        DTMA-12   1.71      1.72      1.79       1.63      CIMCALI8843/S9243-BB-#-B-5-1-BB-4-3-3
        DTMA-13   1.93      1.60      1.70       1.38      CIMCALI8843/S9243-BB-#-B-5-1-BB-4-3-4
        DTMA-17   1.87      1.12      1.80       1.59      [CML312/CML445//[TUXPSEQ]C1F2/P49-SR]F2-45-3-2-1-BBB]-1-2-1-1-2-BBB-B
        DTMA-90   2.24      2.37      2.50       1.59      CML311/MBR C3 Bc F112-1-1-1-B-B-B-B-B
        DTMA-146  2.25      2.45      1.94       1.71      [CML-384 X CML-176]F3-107-3-1-1-B-B-B
        DTMA-268  2.25      2.23      1.99       1.58      La Posta Sequia C7-F33-1-2-1-B-B
        DTMA-293  2.50      2.35      2.33       2.43      La Posta Seq C7-F153-1-1-1-2-B-B-B
        DTMA-40   2.01      2.03      1.70       1.52      [CML144/[CML144/CML395]F2-8sx]-1-2-3-2-B*5
        DTMA-19   2.20      1.61      1.77       1.23      [CML312/CML445//[TUXPSEQ]C1F2/P49-SR]F2-45-3-2-1-BBB]-1-2-1-1-1-BBB-B
        DTMA-26   1.71      1.60      1.76       1.51      P502SRC0-F2-54-2-3-1-B
    34. Association Mapping for Disease Resistance
        [Figure: association mapping plots for MSV, Harare 2010 data (heritability = 0.79), and GLS, combined analysis (heritability = 0.6)]
    35. Msv1 – Case Study
        • QTL mapping in three populations and identification of consensus interval
        • Initial interval identified about 75-132 Mb on chr1 for Msv1
        • Large F2 populations screened for the flanking markers of Msv1 and other QTLs
        • QTL isogenic recombinants identified
        [Figure: markers PZE01132220936, PHM14104_23, PZE0175698629, PZA00529_4, PZA02090_1, PZA03527_1, PZA02614_2, and PZA03651_1 on Chr. 1, 3, 4, and 8; Msv1 recombinants scored R, R, R, S, S, S]
        • Phenotyping of recombinants under artificial disease pressure in field conditions at Harare and in IITA greenhouse facilities
        • Association analysis in the DTMA panel with 55K SNP chip and GBS genotypes identified SNP hits in the same interval
        • The SNP hits and other markers in the interval were used in further linkage mapping on recombinants for fine-scale mapping
        • The mapping confidence interval was reduced to 7 Mb
        • 8 SNPs in this interval tested for validation in breeders’ populations
        • Initial results are encouraging!
        • Further reduction in interval to a probable gene-based marker expected with the recombinants in this interval
    36. 6. Applying high-density genotyping to maize breeding and managing the “data tsunami”
        [Figure: a maize breeder facing the genotypic data tsunami (25 billion data points annually)]
    37. Reduced representation sequencing for rapidly genotyping highly diverse species
        RJ Elshire, JC Glaubitz, Q Sun, JA Poland, K Kawamoto, ES Buckler, and SE Mitchell
        Institute for Genomic Diversity
        http://www.maizegenetics.net/
    38. Genotyping-by-sequencing (GBS)
        [Figure: genomes reduced to genome representations; polymorphism within the fragments is scored as SNPs]
        SNP: ATGACATATCAG
             ATGAAATATCAG
    39. Main genotyping options used by CIMMYT
        • Low density: KASPar uniplex assays through KBiosciences
            - KBio uniplex SNP assays cost $20 to develop
            - CIMMYT has about 3000 and can share them
            - KBio SNPs are used for low-density QTL mapping, tracking specific (“forward breeding”) @ ca. $0.10 per data point ($20/DNA sample for 200 markers)
            - Heterozygote calls are easily made
        • High density: genotyping-by-sequencing (GBS) for GWAS, genomic selection, and soon forward breeding @ $20/DNA sample for 500K+ markers
            - ca. 50% missing data that must be imputed
            - Heterozygotes are not easily called, but heterozygote calls probably don’t matter for GS applications
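The per-data-point costs quoted on the slide can be checked with a few lines of arithmetic. The function name `cost_per_data_point` and the `call_rate` parameter are illustrative; the 0.5 call rate simply restates the slide's "ca. 50% missing data", and treating imputed calls as free is a simplifying assumption.

```python
def cost_per_data_point(cost_per_sample, markers_per_sample, call_rate=1.0):
    """Cost in dollars per marker call actually observed for one DNA sample."""
    if not 0.0 < call_rate <= 1.0:
        raise ValueError("call_rate must be in (0, 1]")
    return cost_per_sample / (markers_per_sample * call_rate)


# Low density: $20 per DNA sample for 200 uniplex SNP markers.
kasp = cost_per_data_point(20.0, 200)

# High density: $20 per DNA sample for 500K GBS markers.
gbs_nominal = cost_per_data_point(20.0, 500_000)

# Same, counting only the ~50% of calls that are observed rather than imputed.
gbs_observed = cost_per_data_point(20.0, 500_000, call_rate=0.5)

print(f"Uniplex SNP: ${kasp:.2f} per data point")
print(f"GBS, nominal: ${gbs_nominal:.6f} per data point")
print(f"GBS, observed calls only: ${gbs_observed:.6f} per data point")
```

Even after allowing for missing data, GBS is cheaper per data point by more than three orders of magnitude, which is why the slide positions it for GWAS and genomic selection and keeps uniplex assays for tracking a small number of markers.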
    40. Status of our breeding informatics effort
        • All breeders, but not all phenotypers, are routinely generating pedigrees in the IMIS database
        • All lines have a Genotype Identification Number (GID) to link pedigree, phenotypic data, and genotypic data
        • We have no high-density genotype database. Relational databases do not work with more than 100K data points per element. Flat files are searched with custom scripts. New database systems are being developed by Cornell
        • We have mixed-model software for combined analysis available via SAS and R scripts in Fieldbook, in routine use by breeders
        • Plan is for all lines entering replicated testing to be genotyped at high density next year
        • Statistical support is excellent; informatics support is inadequate
    41. Current status of high-density genotyping application in CIMMYT GMP
        • All new CIMMYT lines have a GID and are in the IMIS pedigree database
        • Over 10,000 breeding lines have been GBS’d by the Cornell IGD
        • Past phenotypic data are poorly linked to pedigree and genotype data
        • No database capable of storing and searching 500K+ allele calls is in place
        • GS pipeline is conceptualized but not in place; models are developed de novo for each GS experiment
    42. Where should we be in two years?
        • Over half of breeding lines should be DH
        • All lines entering replicated field trials should be genotyped at high density
        • All phenotypic data should be linked through the GID to pedigree and genotype
        • Imputation, allele calling, and prediction pipeline should be delivering predictions to breeders
        • SAGA should be operational
    43. Lessons from our experience with high-density genotypic data
        • As a rule of thumb, 25% of the PYs in a modern maize breeding program in a MNSC are devoted to breeding informatics
        • Breeding informatics and breeding pipeline teams must be closely linked
        • If you have no database, you have no molecular breeding program
        • Pedigree and phenotypic databases must be linked and in very good condition
        • Development teams are led by breeders or other agricultural scientists, preferably with programming skills
        • Development scientists are the interface between breeders and programmers
        • These scientists do not manage breeding programs but are devoted full-time to application development
        • Support must be available in real time
    44. At Pioneer, molecular breeding scientists support the adoption and use of new tools
        [Diagram: an MB scientist links line breeders 1, 2, and 3 with app teams 1, 2, and 3]
    45. What is genomic selection?
        • Much research shows that the inheritance of quantitative traits like yield in maize is controlled by many genes with small effects. QTL-based breeding approaches do not work well for such traits
        • Genomic selection (GS) is the selection of genotypes for advancement or use as parents based on a high-density marker genotype, rather than phenotype
        • GS differs from older QTL-based breeding approaches in that it uses all markers in a prediction of performance, the genomic estimated breeding value (GEBV)
        • Low-cost genotyping systems make selection based on high-density markers feasible
        • Bioinformatics requirements and breeding methods are complex
        • Being used by multinational companies
        • Networked approaches needed for small companies
    46. Genomic selection systems can be used to:
        - Discard unpromising lines based on genotype for disease resistance, abiotic stress tolerance
        - Predict the best lines within a full-sib family for advancement of lines that have not been phenotyped
        - Drastically reduce breeding cycle time through the use of recurrent selection schemes with selection based on genotype rather than phenotype
    47. Basic steps in the GS process:
        1. A set of lines (training population) is genotyped at high density
            - These lines can be unselected testcrosses in the breeding pipeline
        2. Lines are phenotyped in testcross and/or per se
        3. Effects of markers or haplotype alleles are estimated
        4. The sum of marker effects in a line is the Genomic Estimated Breeding Value (GEBV)
        5. GEBVs are calculated on the next cohort of unselected lines and used to predict their performance
        6. GEBVs can be calculated for any trait for which the training population has been phenotyped
        7. Accuracy of the GEBV is expressed as the correlation between the phenotype and the GEBV. It depends on population size, heritability, and marker number
        8. The accuracy of a GEBV doesn’t need to be 1. It just needs to be close to √H for the screening system
        (see Heffner et al. 2009 Crop Sci. 49:1-12)
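The steps above can be sketched on simulated data. This is a minimal illustration, not the program's pipeline: it estimates marker effects by ridge regression (one common choice; the slides do not name an estimator), sums them into GEBVs for an unphenotyped cohort, and measures accuracy against the simulated true genetic values. All sizes, the -1/+1 marker coding, the simulated heritability of about 0.5, and the penalty `lam` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_new, n_markers = 200, 100, 300

# Step 1: genotypes. DH lines are homozygous, so markers are coded -1/+1.
X_train = rng.choice([-1.0, 1.0], size=(n_train, n_markers))
X_new = rng.choice([-1.0, 1.0], size=(n_new, n_markers))

# Simulated truth: many small marker effects (unknown in practice).
true_effects = rng.normal(0.0, 1.0, size=n_markers)
g_train = X_train @ true_effects
g_new = X_new @ true_effects

# Step 2: phenotypes of the training set; noise variance equals genetic
# variance, so heritability is roughly 0.5.
y_train = g_train + rng.normal(0.0, g_train.std(), size=n_train)


def ridge_marker_effects(X, y, lam):
    """Step 3: ridge-regression estimates of marker effects."""
    n_cols = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_cols), X.T @ (y - y.mean()))


# lam = n_markers matches the simulated ratio of noise to per-marker variance.
effects = ridge_marker_effects(X_train, y_train, lam=float(n_markers))

# Steps 4-5: GEBV of each new, unphenotyped line is the sum of its marker effects.
gebv_new = X_new @ effects

# Step 7: accuracy, here measured against the simulated true genetic value.
accuracy = float(np.corrcoef(gebv_new, g_new)[0, 1])
print(f"Prediction accuracy in the new cohort: {accuracy:.2f}")
```

In a real program the true genetic values are unknown, so accuracy is estimated by cross-validation against phenotypes and compared with √H, as step 8 describes.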
    48. Factors that affect GS accuracy
        1. Relatedness between training and selected populations
        2. Training population size
        3. Broad-sense heritability in the phenotyping system used for model training
        4. Marker density
    49. Advantages of GS for stress-prone environments
        • GS allows programs to select for traits for which they cannot screen, if they have access to haplotype effects from other programs
        • Breeding cycle times could be reduced five-fold, greatly increasing gains
        • Sharing haplotype effects permits novel and synergistic ways to network small breeding programs
        • GS networks could make available to NARS and SME breeding programs tools, methods, and scale now only available to multinationals
    50. There are 3 main ways to use GS in cultivar development
        1. Incorporate GEBVs into a conventional pedigree breeding pipeline to discard lines with weaknesses
            • As the number of DH lines increases, we will need to discard many lines without phenotyping, based on GEBV
            • First use will be for defensive traits, with slightly higher H than yield
            • Breeder will receive a two-way table of GEBVs for all traits, and discard lines predicted to have a serious weakness
            • Breeders will assess the reliability of predictions by comparing validation r with √H achieved in field testing
            • To achieve gains, many more lines must be genotyped than phenotyped

        Entry                                       GY-Opt  GY-DT  GLS   Ear rot
        CKL001                                      4.69    1.4    2.5   14.5
        CKL002                                      5.24    4.2    4.0   3.8
        CKL003                                      7.15    3.1    2.2   4.9
        r between geno. and pheno. in training pop  0.34    0.22   0.62  0.58
        √H                                          0.80    0.55   0.85  0.80
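A minimal sketch of how a breeder might work with the two-way GEBV table above: cull entries predicted to have a serious weakness, and compare the validation r with √H for each trait. The culling thresholds are hypothetical, as is the assumption that higher GLS and ear rot values mean more disease; the entry values, r, and √H are taken from the slide.

```python
# GEBV table from the slide (entry -> trait -> predicted value).
gebv = {
    "CKL001": {"GY-Opt": 4.69, "GY-DT": 1.4, "GLS": 2.5, "Ear rot": 14.5},
    "CKL002": {"GY-Opt": 5.24, "GY-DT": 4.2, "GLS": 4.0, "Ear rot": 3.8},
    "CKL003": {"GY-Opt": 7.15, "GY-DT": 3.1, "GLS": 2.2, "Ear rot": 4.9},
}

# Hypothetical culling thresholds for defensive traits (not from the slide).
max_allowed = {"GLS": 3.5, "Ear rot": 10.0}


def has_serious_weakness(traits):
    """True if any defensive trait exceeds its culling threshold."""
    return any(traits[trait] > limit for trait, limit in max_allowed.items())


kept = [entry for entry, traits in gebv.items() if not has_serious_weakness(traits)]

# Reliability check: validation r relative to the ceiling set by sqrt(H).
r_training = {"GY-Opt": 0.34, "GY-DT": 0.22, "GLS": 0.62, "Ear rot": 0.58}
sqrt_h = {"GY-Opt": 0.80, "GY-DT": 0.55, "GLS": 0.85, "Ear rot": 0.80}
relative_accuracy = {t: r_training[t] / sqrt_h[t] for t in r_training}

print("Entries kept:", kept)
for trait, ratio in relative_accuracy.items():
    print(f"{trait}: r / sqrt(H) = {ratio:.2f}")
```

With these thresholds only CKL003 survives, and the ratios show that predictions for the two disease traits come much closer to their √H ceiling than those for yield, which is consistent with starting GS on defensive traits.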
    51. Empirical results to date
        - Zhao et al., Theor Appl Genet (2012) 124:769–776: for grain yield, r across half-sib pops summing to 788 lines: 0.54
        - Albrecht et al., 2011: for grain yield, r = 0.7 when prediction and validation sets contain close relatives; 0.5 for prediction across distantly related families
        - Crossa et al., 2010: for yield and other traits, r up to 0.79
        - These are all huge over-estimates of GS accuracy!!
    52. GS prediction ability across breeding groups for grain yield (GY) and anthesis date (AD) on 55K markers

                              GY         AD
        Breeding populations  0.12±0.28  0.02±0.25

        • Cross-validation studies that use random lines with population structure overestimate GS accuracy
        • Markers simply assign the lines to groups, and the means of the groups predict the phenotype
        • Not relevant to real breeding situations
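The overestimation described above can be illustrated with a small simulation. Here the "prediction" carries no information beyond group membership: it is the group mean plus unrelated noise. Group sizes, means, and noise levels are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group = 200

# Two breeding groups whose mean phenotypes differ (population structure).
group = np.repeat([0, 1], n_per_group)
group_means = np.array([4.0, 7.0])

# Phenotype: group mean plus within-group variation.
phenotype = group_means[group] + rng.normal(0.0, 1.0, size=group.size)

# "Prediction" that only recovers the group mean, plus noise unrelated to
# the within-group differences between lines.
prediction = group_means[group] + rng.normal(0.0, 0.3, size=group.size)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


r_overall = corr(prediction, phenotype)
r_within = [corr(prediction[group == k], phenotype[group == k]) for k in (0, 1)]

print(f"Accuracy across all lines: {r_overall:.2f}")
print("Accuracy within each group:", [f"{r:.2f}" for r in r_within])
```

Across all lines the correlation looks impressive, yet within either group, which is where a breeder actually has to choose among lines, the same predictions are close to useless.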
    53. 2. Use GEBVs to select unphenotyped DH lines within full-sib families for advancement from Stage 1 to Stage 2
        • As the number of DH lines increases, we will need to discard many lines without phenotyping, based on GEBV
        • We know predictions are very poor across families, and only work for close relatives in high-LD populations
        • Models can be trained on part of a large full-sib family, then used to advance some unphenotyped lines to Stage 2
        Example
        • A set of 200 DH lines is extracted from an elite cross
        • All lines are genotyped
        • 50 are phenotyped and used as a training set to build a GS model
        • Best lines from the training set are advanced based on phenotype
        • Best lines from the unphenotyped group are advanced based on GEBV
        • Should result in modest gains from increased selection intensity
    54. Correlation between GEBV and phenotype within full-sib families: mean of cross-validation in 6 bi-parental populations

        Size of training pop  Mean accuracy
        50                    0.38
        70                    0.40
        90                    0.41

        √H              0.70
        No. of lines    236.5
        No. of markers  240.2
        No. of trials   4.33
    55. 3. Set up closed synthetic populations of key inbreds, and conduct recurrent selection
        • Advantages for GS are greatest with rapid cycling
        • Closed populations where a few elite parents contribute equally ensure that marker allele effect estimates relate directly to the population under selection
        • High LD → low marker density required
        • Improved populations can be used directly or as sources of new inbreds
        • Most CIMMYT breeding programs have now set up these populations in the A and B heterotic groups, and are beginning to phenotype
    56. 7. Implementing an open-source GS network
        • “Open-source” breeding networks can provide companies with proprietary lines, but allow haplotypes to be shared
        • Sharing haplotype effects allows phenotyping done by one program to benefit another, even if they don’t test the same lines
        • Small programs could receive unique, unphenotyped DH lines (say, 500) from a “hub” program, with a GEBV predicting their performance
        • Lines would then be testcrossed
        • Company would phenotype the testcrossed set, and contribute the phenotypes to the “training population” for the next cycle
        • Company advances the lines with the best performance into product testing
    57. “Open-source” genomic selection breeding plan
        • Rapid-cycle marker-only selection
    58. “Open-source” genomic selection breeding plan
        • Rapid-cycle marker-only selection
        • Lines extracted, genotyped: untested, proprietary DH lines provided to companies based on GEBVs
    59. “Open-source” genomic selection breeding plan
        • Rapid-cycle marker-only selection
        • Lines extracted, genotyped: untested, proprietary DH lines provided to companies based on GEBVs
        • Phenotyping: company 1, company 2, company 3
    61. “Open-source” genomic selection breeding plan
        • Rapid-cycle marker-only selection
        • Lines extracted, genotyped: untested, proprietary DH lines provided to companies based on GEBVs
        • Phenotyping: company 1, company 2, company 3
        • Commercialization: company 1, company 2, company 3
    62. Distribution of roles in an open-source breeding network
        Hub program
        • Manages rapid-cycle source pops
        • Extracts DH lines
        • Genotypes DH lines at high density
        • Coordinates managed stress screening
        • Estimates GEBVs
        • Updates model with new phenotypic data from partners
        • Maintains database
    63. Distribution of roles in an open-source breeding network
        Partner (spoke?) programs
        • Receive and own proprietary DH lines with GEBV
        • Phenotype, and contribute phenotypes to model
        • Commercialize and deliver to farmers the best lines on the basis of their own phenotyping
        • Form new pedigree breeding populations, provide to hub for DH line extraction, genotyping
        Does this model make sense for pre-breeding in China?
    64. Advantages of open-source network model
        • Small programs can access haplotype effect estimates for stresses, environments, and traits for which they cannot do evaluation
        • Partners benefit from the phenotyping done by other network members, without having to share germplasm
        • The small partner program accesses DH lines without the cost of setting up a DH facility
        • Lines are proprietary: only haplotype (marker) effects are shared
        • The hub program provides partners with efficient DH, genotyping, and informatics pipeline services, with economies of scale
        • Low-cost out-sourced genotyping allows breeding programs to focus on screening, selection, seed production, and marketing
        The open-source GS network model can provide SMEs and NARS with powerful breeding technologies now only available to multinationals
    65. Things to watch out for:
        • Projects vs pipelines
        • Over-weighting and inappropriate use of managed stress data
        • Failure to deliver the products of molecular breeding to the product development pipeline
        • Failure to exploit synergies and economies of scale across regions
        • Failure to exploit synergies and economies of scale across maize and wheat
        • Failure to come to grips with our data and breeding informatics needs
        • Thinking small about our science
    66. The CIMMYT biparental populations: the world’s largest resource for GS, GWAS in tropical maize
        • 28 biparental populations from DTMA and WEMA MARS pops
        • >200 lines/pop, over 5000 lines in total
        • All elite Africa-adapted parents or drought donors
        • Several linked half-sib families
        • All genotyped with ca. 200 SNPs
        • 100 lines per family GBS’d
        • Imputation will permit assignment of genotypes for >500K SNPs to each of the >5000 lines
        • Phenotyped in 3-4 drought and 3-4 optimal environments
        • We will find genes for drought tolerance and disease resistance, and pilot GS methods that work
    67. Conclusions
        1. GMP is the world’s most important source of elite and stress-resistant germplasm, and the only large “open” public breeding program
        2. Our germplasm is competitive with MNSC hybrids in most of our target regions, and usually superior in low-yield environments
        3. Gains in favorable conditions are inadequate. We must remain competitive in commercial systems to interest seed company partners
        4. We need to think hard about how to use managed stress data
        5. Our drought and heat-tolerant germplasm is well-characterized and unequalled: it needs to be used
        6. Using our stress-tolerant germplasm requires development of breeder-ready markers
        7. We have made no gains on maximum DT since the end of the physiology breeding program
        8. We have unparalleled resources for genetic and breeding research for development. Are we up to the task?
