Searching for traits in PGR collections using Focused Identification of Germplasm Strategy

  1. Searching for traits in PGR collections using Focused Identification of Germplasm Strategy (FIGS). Abdallah Bari, Kenneth Street, Michael Mackay, Eddy De Pauw, Dag Endresen, Ahmed Amri, Kumarse Nazari and Ammor Yahiaoui. CIAT, Palmira, Colombia, 14 March 2012. Grain Research & Development Corporation.
  2. Content
     • Background – PGR traits – FIGS traits
     • Objective – Develop a priori information – Develop best-bet subset of accessions with traits
     • Datasets – Trait data – Environmental data
     • Methodologies – Data preparation – Modeling techniques
     • Results/Discussion – Sub-setting (accessions/variables) – “Hot spots”
     • Conclusion
  3. ICARDA’s worldwide presence. International Center for Agricultural Research in the Dry Areas (ICARDA).
  4. ICARDA: PGR centers of origin and diversity.
  5. PGR contribution. Traits of importance to agriculture:
     – phenological adaptation (short growth duration),
     – efficient use of water,
     – resistance to biotic stresses (diseases and insects),
     – tolerance to abiotic stresses (such as drought and salinity), and
     – superior grain quality.
     [Image label: plant pre-evaluation]
  6. PGR challenges • 50,000 to 60,000 traits (loci) • 7 million accessions • 1,400 genebanks. [Image label: seed samples]
  7. PGR challenges: a needle in a haystack. PGR users want variation for specific traits and a hundred germplasm accessions to evaluate.
  8. PGR challenges and concerns • Size of collections – Addressed by Brown et al. 1999 • Cost of evaluating accessions lacking the desired trait – Addressed by Gollin et al. 2000
  10. Objective. FIGS searches genetic resources data (germplasm collections) to detect particular trait-environment patterns or relationships, which serve as a priori information. This a priori information is then used to develop predictive models to find novel genetic variation for the traits of interest and where it is most likely to occur. [Diagram: Quantification of trait-environment relationship → A priori information → Develop trait subsets → Utilization of genetic resources]
  11. Origin of the FIGS approach. Boron toxicity of wheat and barley: early FIGS examples. Wheat landraces from marine-origin soils in the Mediterranean region provided all the genetic variation needed to produce boron-tolerant varieties (M.C. Mackay, 1995). [Map label: Mediterranean Sea]
  12. FIGS approach. “FIGS applies to plant genetic resources (stored collections) the same selection pressure exerted on plants by evolution.” [Diagram: PGR (biodiversity) → sampling → core collection; PGR (biodiversity) → sampling → trait (user)]
  13. FIGS approach. FIGS has helped breeders identify long sought-after plant traits such as resistance to: – Net blotch (barley), – Powdery mildew, – Russian wheat aphid (RWA) and – Sunn pest. Braidotti, G. 2009. Keys to the gene bank, Biotechnology. Partners in Research for Development 16-17.
  14. Sunn pest resistance trait. 8 landrace accessions from Afghanistan and 2 from Tajikistan were identified as resistant at the juvenile stage. Mapping populations are now being developed.
  15. FIGS approach to Pm (powdery mildew). 16,000 wheat landraces → FIGS applied → 1,300 selected → phenotyping → 211 accessions between R and IR → genotyping → 7 new alleles, at least 2 of which have a new race specificity. 40% yielded accessions that were resistant to the isolates used. 100 years of classical genetics = 7 alleles. Kaur K; Street K; Mackay M; Yahiaoui N; Keller B (2008). Allele mining and sequence diversity at the wheat powdery mildew resistance locus Pm3. 11th IWGS, 24-29 Aug., Brisbane.
  16. Locating new Pm3 alleles. The distribution of the seven new functional alleles of Pm3 (out of 96.2% of the total set screened). [Map labels: Turkey, Afghanistan, Iran, Pakistan and Armenia]
  17. The FIGS picture. Genotypes × Environments × Time (plus some selection) = Genetic variation. Can we use the same evolutionary principles in reverse to identify the environments that ‘engender’ trait-specific genetic variation? Environments × Traits × Time = Trait variation (E×T)?
  18. Examples of eco-geographic variation of traits linked to environmental influences (after M. C. Mackay)

     | Environment influence | Trait | Species | Reference |
     |---|---|---|---|
     | Low altitudes, high winter temp., low summer rain, spring cloudiness | Cyanogenesis | Trifolium repens | Pederson, Fairbrother et al. 1996 |
     | Aridity | Seed dormancy, early flowering, high seed to pod ratio | Annual legumes | Ehrman and Cocks 1996 |
     | Soil type | Tolerance to boron toxicity | Bread wheat | Mackay (1990) |
     | Altitude, winter temp, RWA distribution | Russian Wheat Aphid (RWA) resistance | Bread wheat | Bohssini, et al accepted for publication 2008 |
     | Temperature, aridity | Drought resistance | Triticum dicoccoides | Peleg, Fahima et al. 2005 |
     | Altitude | Glume colour and beak length | Durum wheat | Bechere, Belay et al. 1996 |
     | Climate, soil and water availability | Heading date, culm length, biomass, grain yield and its components | Triticum dicoccoides | Beharav and Nevo 2004 |
     | Precipitation, minimum January temperature, altitude | Glutenin diversity | Durum wheat | Vanhintum and Elings 1991 |
     | Temperature, aridity | More efficient RUBISCO activity | Woody perennials | Galmes et al, 2005 |
     | Water relations, temperature | Hordatine accumulation (disease defence) | Barley | Batchu, Zimmermann et al. 2006 |
  19. FIGS system. [Diagram: PGR collections (database) and user-defined needs (interface) pass through filters (type of material, evaluation data, collection site, other information, size limit) to produce a new subset; sizes shown: 250, 500, 750, 1500.] See www.figstraitmine.com. After M. C. Mackay 1995.
  20. Mining natural variation. By linking traits, environments (and associated selection pressures) with genebank accessions (e.g. landraces and crop relatives) we can ‘focus’ in on those accessions most likely to possess trait-specific genetic variation. [Figure: map of latitude vs. longitude with layers labelled Environment, Trait and FIGS subset]
  21. FIGS approach, summarized: Focused Identification of Germplasm Strategy. [Diagram: Environment (E): geo-referencing of collecting places; Trait (T): evaluation (phenotyping); Accession (G)]
  23. Eco-climate data (X). [Figure: ICARDA eco-climatic database, average annual temperature (front), annual precipitation (middle), and winter precipitation (back) (De Pauw 2008)] Climate data (X as independent variables):

     | site_code1 | prec01 | prec02 | prec03 | prec04 | prec05 | … | ari01 | ari02 | ari03 | ari04 | ari05 |
     |---|---|---|---|---|---|---|---|---|---|---|---|
     | ETH-S893 | 25 | 36 | 72 | 154.22 | 148.88 | … | 0.167 | 0.246 | 0.439 | 1.098 | 1.169 |
     | ETH-S1222 | 29 | 44 | 92 | 167.46 | 168 | … | 0.223 | 0.344 | 0.646 | 1.354 | 1.612 |
     | NS_339 | 44 | 67 | 130.43 | 177.96 | 185.74 | … | 0.351 | 0.552 | 0.949 | 1.457 | 1.751 |
     | ETH-S1153 | 36 | 48 | 86 | 140.92 | 131.94 | … | 0.28 | 0.39 | 0.609 | 1.108 | 1.078 |
     | NS_415 | 32 | 46.61 | 95.42 | 150.3 | 157 | … | 0.271 | 0.419 | 0.732 | 1.289 | 1.437 |
     | NS_424 | 31.94 | 45 | 90 | 143.62 | 150 | … | 0.257 | 0.38 | 0.641 | 1.146 | 1.272 |
     | ETH64:55 | 28 | 38.26 | 57 | 97.57 | 81 | … | 0.247 | 0.344 | 0.45 | 0.834 | 0.662 |
     | NS_525 | 28 | 39 | 57 | 97.13 | 80.78 | … | 0.248 | 0.352 | 0.452 | 0.836 | 0.669 |
     | NS_526 | 27 | 39 | 57 | 97.01 | 80.77 | … | 0.241 | 0.354 | 0.455 | 0.842 | 0.68 |
     | NS_559 | 23 | 40 | 61.89 | 129.04 | 102 | … | 0.226 | 0.397 | 0.511 | 1.206 | 0.998 |
     | … | | | | | | | | | | | |

     Source: International Center for Agricultural Research in the Dry Areas (ICARDA)
  24. Eco-climate data (X). Layers used in the stem rust studies: • Precipitation (rainfall) • Maximum temperatures • Minimum temperatures. Plus derived GIS layers such as: • Potential evapotranspiration (water loss) • Agro-climatic zone (UNESCO classification) • Moisture/aridity index (mean values for month and year)
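  25. Trait data set (Y). Trait data (Y as dependent variable). Image credit: http://www.news.cornell.edu/

     | site_code1 | R_state0 | R_state1 | R_state2 | R_state3 | R_state4 | R_state5 | R_state6 | R_state7 | R_state8 | R_state9 |
     |---|---|---|---|---|---|---|---|---|---|---|
     | ETH-S893 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
     | ETH-S1222 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
     | NS_339 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
     | ETH-S1153 | 0 | 0 | 0 | 0 | 2 | 1 | 3 | 0 | 0 | 0 |
     | NS_415 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
     | NS_424 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
     | ETH64:55 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
     | NS_525 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
     | NS_526 | 0 | 1 | 2 | 1 | 2 | 0 | 3 | 0 | 0 | 0 |
     | NS_559 | 2 | 5 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
     | ETH64:53 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
     | … | | | | | | | | | | |

     Source: (USDA) National Genetic Resources Program (NGRP) GRIN database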
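
The climate table (X) and the trait table (Y) share the `site_code1` key, so pairing them is a simple join. The sketch below is only an illustration in Python (the slides name R as the modeling platform): the values are copied from the two slide tables, while the dictionary layout and the helper name `join_by_site` are assumptions made here.

```python
# Illustrative join of climate (X) and trait (Y) records on site_code1.
# Values come from the slide tables; the data layout and the helper
# name join_by_site are assumptions made for this sketch.

climate = {
    "ETH-S893": {"prec01": 25.0, "ari01": 0.167},
    "ETH-S1222": {"prec01": 29.0, "ari01": 0.223},
    "NS_339": {"prec01": 44.0, "ari01": 0.351},
}

# One value per column R_state0 .. R_state9, as listed on the slide.
trait = {
    "ETH-S893": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "ETH-S1222": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "ETH64:53": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
}


def join_by_site(x, y):
    """Return {site: (predictors, response)} for sites present in both tables."""
    return {site: (x[site], y[site]) for site in x if site in y}


paired = join_by_site(climate, trait)
for site, (predictors, response) in paired.items():
    print(site, predictors, response)
```

Sites that appear in only one table (here `NS_339` and `ETH64:53`) are dropped, which is the usual behavior when only complete X, Y pairs can be used for modeling.
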
  26. Searching for the stem rust resistance trait: concerns. Stem rust is spreading to wheat production areas. Image credit: http://www.news.cornell.edu/
  27. Stem rust on wheat landraces: trait data. Green dots indicate collecting sites for resistant wheat landraces and red dots indicate collecting sites for susceptible landraces. Field experiments made in Minnesota by Don McVey. USDA GRIN, trait data online: http://www.ars-grin.gov/cgi-bin/npgs/html/desc.pl?65049
  29. Data preparation. Climate data (X as independent variables). Power relationship ~ 2(p) (spread).

     | site_code | ari02 |
     |---|---|
     | ETH-S893 | 0.246 |
     | ETH-S1222 | 0.344 |
     | NS_339 | 0.552 |
     | ETH-S1153 | 0.390 |
     | NS_415 | 0.419 |
     | NS_424 | 0.380 |
     | ETH64:55 | 0.344 |
     | NS_525 | 0.352 |
     | NS_526 | 0.354 |
     | NS_559 | 0.397 |

     [Figure: two histograms of frequency vs. aridity or moisture index during February; the left panel spans 0 to 15 and the right panel spans -4 to 4]
  30. Platform
     • Geographical Information System (GIS), ArcGIS: environmental data/layers (surfaces). Purpose: generation of environmental data.
     • R language (development of algorithms): > Data transformation ( ) > Model <- model(trait ~ climate) > Measuring accuracy metrics > …. Purpose: modeling.
  31. Modeling framework. Trait data (Y), environmental data (X): Y ~ f(X). First, a linear approach, irrespective of the underlying distributions describing the data. X is the set of explanatory variables or predictors (climate data), where X ∈ R^m, and Y is either a categorical (label) or a numerical response (trait descriptor states).
  32. Modeling framework • Principal component analysis (PCA) • Partial Least Square (PLS) • Random Forest (RF) • Support Vector Machines (SVM) • Neural Networks (NN). Bari A., Street K., Mackay M., Endresen D.T.F., De Pauw E. & Amri A. (2011) Focused identification of germplasm strategy (FIGS) detects wheat stem rust resistance linked to environmental variables. Genetic Resources and Crop Evolution. http://www.springerlink.com/content/m7140x68v2065113/fulltext.pdf
  33. Principal Component Analysis (PCA). B is a matrix of coefficients. The prediction was initially carried out using the number of principal components (PCs) that account for 95% of the explained variance, followed by adding one component at a time until the error reached a minimum.
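
The starting point described on this slide, the number of components that account for 95% of the variance, could be computed roughly as follows. This is a sketch in Python rather than the R used by the authors; the eigenvalues are hypothetical and the function name `components_for_variance` is an assumption.

```python
def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative share
    of the total variance reaches `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues of the climate covariance matrix.
eigenvalues = [6.0, 2.0, 1.2, 0.5, 0.3]
n_start = components_for_variance(eigenvalues)
print(n_start)
```

With these made-up eigenvalues the first three components explain about 92% of the variance, so the fourth is needed to pass the 95% threshold.
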
  34. Partial Least Square (PLS). PLS: a product of factors and their loadings (regression coefficients), where both the environmental dataset and the trait dataset are used simultaneously. The prediction was initially carried out using the number of components that account for 95% of the explained variance, followed by adding one component at a time until the error reached a minimum.
  35. Random Forest (RF). [Diagram: Data → bootstrapping (with replacement) → training set and out-of-bag (OOB) set → trees (ntree) 1, 2, …, 1000]
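
The bootstrapping step in the diagram can be illustrated with a few lines of Python. This is only a sketch of how one tree's training and out-of-bag sets arise, not the authors' R implementation; the function name `bootstrap_split` and the sample size are assumptions.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement (the in-bag training set)
    and return them with the indices never drawn (the out-of-bag set)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_split(1000, rng)
oob_fraction = len(out_of_bag) / 1000
print(len(in_bag), len(out_of_bag), oob_fraction)
```

On average about 1/e, roughly 37%, of the samples are left out of each bootstrap draw; a random forest uses those out-of-bag samples to estimate the error of the corresponding tree.
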
  36. Support Vector Machines (SVM). SVM is a learning-based technique that maps input data to a high-dimensional space and optimally separates the mapped input into the respective classes. The mapping goes from an l-dimensional space (the input variable space) into a k-dimensional space, where k is much higher than l.
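
A toy case shows why mapping to a higher-dimensional space helps. In the Python sketch below, one-dimensional points cannot be split by a single threshold, but after the hypothetical map x → (x, x²) they can. The data, the map and the helper names are all assumptions for illustration; this is not the kernel or the SVM used in the study.

```python
def feature_map(x):
    """Map a 1-D input into a 2-D feature space: x -> (x, x^2)."""
    return (x, x * x)


def separable_by_threshold(values_a, values_b):
    """True if a single cut point puts all of one group on each side."""
    return max(values_a) < min(values_b) or max(values_b) < min(values_a)


inner = [-0.5, 0.0, 0.5]   # hypothetical class 1
outer = [-2.0, 2.0]        # hypothetical class 2

# In the input space the classes interleave along the line.
in_input_space = separable_by_threshold(inner, outer)

# Along the second mapped coordinate (x^2) a single cut separates them.
inner_sq = [feature_map(x)[1] for x in inner]
outer_sq = [feature_map(x)[1] for x in outer]
in_feature_space = separable_by_threshold(inner_sq, outer_sq)

print(in_input_space, in_feature_space)
```
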
  37. Neural Networks (NN). Neural Networks (RBF). [Figure: network with inputs x1, x2, …, xp and output F(x); error vs. number of epochs for the training set and the test set]
  38. Optimization/tuning. Trend of output error versus the number of components (PCs/LVs) or epochs (NN). [Figure: error vs. number of PCs, LVs or epochs for the training set and the test set]
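
Choosing the model size at the minimum of the test-set error curve can be sketched as below. The error values are hypothetical and the function name `best_complexity` is an assumption; the Python is illustrative only.

```python
def best_complexity(test_errors):
    """Return the 1-based number of components (or epochs) at which
    the test-set error is lowest; ties go to the smaller model."""
    best_index = min(range(len(test_errors)), key=test_errors.__getitem__)
    return best_index + 1


# Hypothetical test-set RMSE for models with 1, 2, ..., 6 components.
test_rmse = [0.46, 0.44, 0.42, 0.41, 0.43, 0.45]
n_selected = best_complexity(test_rmse)
print(n_selected)
```

The training error usually keeps falling as components or epochs are added, so the test-set curve, not the training curve, is the one to minimize.
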
  39. Accuracy metrics. Parameters that provide information on the specificity (“trait agro-climate”). Confusion matrix (2-by-2 contingency table):

     | Predicted \ Observed | Resistant | Susceptible |
     |---|---|---|
     | Resistant | a | b |
     | Susceptible | c | d |

     Sensitivity = a / (a + c). Specificity = d / (b + d). Sensitivity and specificity are indicators of the model's ability to correctly classify observations.
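
Using the slide's cell names a, b, c and d, the two metrics can be computed directly. The counts in this Python sketch are hypothetical and the function names are assumptions.

```python
def sensitivity(a, b, c, d):
    """Share of observed-resistant accessions predicted resistant: a / (a + c)."""
    return a / (a + c)


def specificity(a, b, c, d):
    """Share of observed-susceptible accessions predicted susceptible: d / (b + d)."""
    return d / (b + d)


# Hypothetical counts: a = 40, b = 20, c = 10, d = 130.
counts = (40, 20, 10, 130)
sens = sensitivity(*counts)
spec = specificity(*counts)
print(round(sens, 3), round(spec, 3))
```
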
  40. Accuracy metrics. Parameters that provide information on the specificity (“trait agro-climate”). High AUC (area under the ROC curve) values are an indication of a potential trait-environment relationship. [Figure: the ROC curve and the resulting pdfs of the trait distribution (trait states)]
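
One way to read the AUC is as the probability that a randomly chosen resistant accession receives a higher model score than a randomly chosen susceptible one. The Python sketch below computes it by direct pairwise comparison; the scores are hypothetical and the function name `auc` is an assumption.

```python
def auc(scores_resistant, scores_susceptible):
    """Area under the ROC curve as the fraction of (resistant, susceptible)
    pairs ranked correctly, counting ties as half."""
    wins = 0.0
    for r in scores_resistant:
        for s in scores_susceptible:
            if r > s:
                wins += 1.0
            elif r == s:
                wins += 0.5
    return wins / (len(scores_resistant) * len(scores_susceptible))


# Hypothetical model scores.
resistant = [0.9, 0.7, 0.4]
susceptible = [0.8, 0.3, 0.2, 0.1]
area = auc(resistant, susceptible)
print(round(area, 3))
```

A value near 0.5 means the scores carry no information about the trait, while a value near 1 means the two score distributions barely overlap.
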
  41. Accuracy metrics: randomness. [Figure: ROC curve and pdfs of the trait distribution]
  43. Data preparation: raw data. PCs = 42; AUC = 0.67; Kappa = 0.40. [Figure: RMSE vs. number of components; ROC curve (true positive rate vs. false positive rate); density distribution by trait]
  44. Data preparation: transformed data. PCs = 42; AUC = 0.71; Kappa = 0.45. [Figure: RMSE vs. number of components; ROC curve (true positive rate vs. false positive rate); density distribution by trait]
  45. Data preparation: raw data (PLS). LVs = 30; AUC = 0.70; Kappa = 0.43. [Figure: RMSE vs. number of components; ROC curve (true positive rate vs. false positive rate); density distribution by trait]
  46. Data preparation: transformed data. LVs = 22; AUC = 0.71; Kappa = 0.44. [Figure: RMSE vs. number of components; ROC curve (true positive rate vs. false positive rate); density distribution by trait]
  47. Optimization process. Root mean square error of prediction (RMSEP) for PCA (left) and PLS (right) models. Arrows indicate the minimum errors, where the number of components (PCs and LVs) was selected for prediction (red/discontinuous line = test data, continuous line = training set). [Figure: RMSEP vs. number of components for R_CALC, two panels]
  48. PCA, PC2. Few components ~ random. [Figure: ROC curve (true positive rate vs. false positive rate) and density distribution per R_CALC for resistant and susceptible accessions]
  49. PCA, PC5. [Figure: ROC curve (true positive rate vs. false positive rate) and density distribution per R_CALC for resistant and susceptible accessions]
  50. PLS, LV2. 2 latent variables of PLS are better than 2 PCs of PCA. [Figure: ROC curve (true positive rate vs. false positive rate) and density distribution per R_CALC for resistant and susceptible accessions]
  51. PLS, LV10. [Figure: ROC curve (true positive rate vs. false positive rate) and density distribution per R_CALC for resistant and susceptible accessions]
  52. PCA (optimized). [Figure: ROC curve and density of predictions]
  53. PLS (optimized). [Figure: ROC curve and density of predictions]
  54. RF. [Figure: ROC curve and density of predictions]
  55. SVM. [Figure: ROC curve and density of predictions]
  56. NN. [Figure: ROC curve and density of predictions]
  57. Random (PCA). Complete random distribution of the stem rust resistance trait: AUC ~ 0.5. Partially random distribution of the stem rust resistance trait. [Figure: for each case, ROC curve and RMSEP vs. number of components]
  58. Stem rust hot spots. [Figure: map, latitude vs. longitude]
  59. Stem rust hot spots: areas where resistance is likely to occur (longitude wise). [Figure: two maps, latitude vs. longitude]
  60. PLS (optimized). Areas where resistance is likely to occur (dark red). [Figure: contour map of predictions, latitude vs. longitude; semivariance vs. distance]
  61. Random Forest (RF). Areas where resistance is likely to occur (dark red). [Figure: contour map of predictions, latitude vs. longitude; semivariance vs. distance]
  62. SVM. Areas where resistance is likely to occur (dark red). [Figure: contour map of predictions, latitude vs. longitude]
  64. Results: stem rust on wheat

     | Dataset (unit) | PPV | LR+ | Estimated gain |
     |---|---|---|---|
     | Stem rust (accession) | 0.54 (0.50-0.59) | 3.07 (2.66-3.54) | 1.95 (1.79-2.09) |
     | Random (28 % resistant samples) | 0.29 (0.26-0.33) | 1.04 (0.90-1.20) | 1.03 (0.91-1.16) |
     | Stem rust (site) | 0.50 (0.40-0.60) | 4.00 (2.85-5.66) | 2.51 (2.02-2.98) |
     | Random (20 % resistant samples) | 0.19 (0.13-0.26) | 0.94 (0.63-1.39) | 0.95 (0.66-1.33) |

     PPV = Positive Predictive Value; LR+ = Positive Diagnostic Likelihood Ratio. Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw (2011). Predictive association between biotic stress traits and ecogeographic data for wheat and barley landraces. Crop Science 51: 2036-2055. DOI: 10.2135/cropsci2010.12.0717
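
The three columns can be related to the a, b, c, d cells of the confusion matrix on slide 39. PPV and LR+ follow their standard definitions. For the estimated gain, the sketch assumes it is PPV divided by the proportion of resistant samples; the slide does not define it, but the assumption roughly reproduces the reported values (0.50 / 0.20 = 2.5 against 2.51, and 0.54 / 0.28 ≈ 1.93 against 1.95). The counts and function names are hypothetical.

```python
def ppv(a, b, c, d):
    """Positive predictive value: share of predicted-resistant that are resistant."""
    return a / (a + b)


def lr_plus(a, b, c, d):
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    true_positive_rate = a / (a + c)
    false_positive_rate = b / (b + d)
    return true_positive_rate / false_positive_rate


def estimated_gain(ppv_value, prevalence):
    """Assumed definition: PPV relative to the share of resistant samples."""
    return ppv_value / prevalence


# Hypothetical counts: a = 40, b = 20, c = 10, d = 130.
counts = (40, 20, 10, 130)
print(round(ppv(*counts), 3), round(lr_plus(*counts), 3))

# Gain recomputed from the reported site-level PPV and prevalence.
site_gain = estimated_gain(0.50, 0.20)
print(round(site_gain, 2))
```
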
  65. Results: stem rust on wheat. AUC = Area Under the ROC Curve (ROC, Receiver Operating Characteristic).

     | Classifier method | AUC | Cohen’s Kappa |
     |---|---|---|
     | Principal Component Regression (PCR) | 0.69 (0.68-0.70) | 0.40 (0.37-0.42) |
     | Partial Least Squares (PLS) | 0.69 (0.68-0.70) | 0.41 (0.39-0.43) |
     | Random Forest (RF) | 0.70 (0.69-0.71) | 0.42 (0.40-0.44) |
     | Support Vector Machines (SVM) | 0.71 (0.70-0.72) | 0.44 (0.42-0.45) |
     | Artificial Neural Networks (ANN) | 0.71 (0.70-0.72) | 0.44 (0.42-0.46) |

     Bari, A., K. Street, M. Mackay, D.T.F. Endresen, E. De Pauw, and A. Amri (2011). Focused Identification of Germplasm Strategy (FIGS) detects wheat stem rust resistance linked to environment variables. Genetic Resources and Crop Evolution [online first]. doi:10.1007/s10722-011-9775-5; Published online 3 Dec 2011.
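
Cohen's Kappa measures agreement between predicted and observed classes beyond what chance alone would give. A minimal Python sketch for the 2-by-2 case, using the a, b, c, d cell names from slide 39 and hypothetical counts:

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2-by-2 confusion matrix.

    a, d are the agreeing cells; b, c are the disagreeing cells.
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts: a = 40, b = 20, c = 10, d = 130.
kappa = cohens_kappa(40, 20, 10, 130)
print(round(kappa, 3))
```

A Kappa of 0 means agreement no better than chance and 1 means perfect agreement, so the values of 0.40 to 0.44 in the table indicate moderate agreement.
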
  66. Results: stem rust on wheat

     | Classifier method | PPV | LR+ | Estimated gain |
     |---|---|---|---|
     | kNN (pre-study) | 0.29 (0.13-0.53) | 5.61 (2.21-14.28) | 4.14 (1.86-7.57) |
     | SIMCA | 0.28 (0.14-0.48) | 5.26 (2.51-11.01) | 4.00 (2.00-6.86) |
     | Ensemble classifier | 0.33 (0.12-0.65) | 8.09 (2.23-29.42) | 6.47 (2.05-11.06) |
     | Random (pre-study, 550 + 275 accessions) | 0.06 (0.01-0.27) | 0.95 (0.13-6.73) | 0.97 (0.16-4.35) |
     | Ensemble | 0.26 (0.22-0.30) | 2.78 (2.34-3.31) | 2.32 (2.00-2.68) |
     | Random (blind study, 825 + 3738 accessions) | 0.11 (0.09-0.15) | 1.02 (0.77-1.36) | 0.95 (0.77-1.32) |

     PPV = Positive Predictive Value; LR+ = Positive Diagnostic Likelihood Ratio. Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw, K. Nazari, and A. Yahyaoui (2012). Sources of Resistance to Stem Rust (Ug99) in Bread Wheat and Durum Wheat Identified Using Focused Identification of Germplasm Strategy (FIGS). Crop Science [online first]. doi: 10.2135/cropsci2011.08.0427; Published online 8 Dec 2011.
  67. Results of stem rust (Ug99) on wheat. 4563 wheat landraces were screened for Ug99; 10.2 % of the accessions were resistant. The true trait scores were revealed for 20% of the accessions (825 samples). From the remaining 3738 accessions, whose true scores were hidden, FIGS selected 500 accessions more likely to be resistant. This subset contained 25.8 % resistant samples, 2.3 times higher than expected by chance.
  69. Conclusion
     • Results – Raw data vs. transformed data – PLS vs. PCA – Non-linear vs. linear – FIGS vs. random (selection)
     • Issues – Extent of variables (trait/agro-climate) – Phenology (adaptation) – Fuzzy approach (trait variation capture)

Editor's Notes

  1. Landrace samples (genebank seed accessions). Trait observations (experimental design): high-cost data. Climate data (for the landrace location of origin): low-cost data. The accession identifier (accession number) provides the bridge to the crop trait observations. The longitude and latitude coordinates for the original collecting site of the accessions (landraces) provide the bridge to the environmental data.
  2. GRIN database (USDA-ARS, National Plant Germplasm System, Germplasm Resources Information Network, online http://www.ars-grin.gov/npgs) USDA GRIN, trait data online: http://www.ars-grin.gov/cgi-bin/npgs/html/desc.pl?65049
  3. Photo: USDA ARS Image k1192-1, http://www.ars.usda.gov/is/graphics/photos/mar09/k11192-1.htm
  4. USDA ARS Image Archive, http://www.ars.usda.gov/is/graphics/photos/
  5. Photo: Wheat infected by stem rust (Ug99) at the Kenya Agricultural Research Station in Njoro northwest of Nairobi.
  6. Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw, K. Nazari, and A. Yahyaoui (2012). Sources of Resistance to Stem Rust (Ug99) in Bread Wheat and Durum Wheat Identified Using Focused Identification of Germplasm Strategy (FIGS). Crop Science [online first]. doi: 10.2135/cropsci2011.08.0427; Published online 8 Dec 2011.