THEME – 1 Mining CIMMYT germplasm data to inform breeding targets for CC adaptation


  1. Mining CIMMYT germplasm data to inform breeding targets for CC adaptation
     Zakaria KEHEL, Jose CROSSA, Thomas PAYNE and Matthew REYNOLDS
     Rabat, Morocco, 24-27 June 2014
  2. CIMMYT Wheat Germplasm Bank

     Collection          Wild   Landrace   Breeding materials   Genetic stocks   Cultivars   Unknown or other     Total
     Bread wheat          213     32,428               41,995            8,150       6,278                331    89,395
     Durum wheat           25      5,578               14,262            1,089       1,156                 58    22,356
     Triticale              0          0               16,964            3,402         345                  9    20,720
     Barley                 0        669               13,898              200       1,755                 11    16,533
     Species & other    6,541      1,658                  155              820          30                 15     9,219
     Rye                   36        109                  132              168         219                 13       677
     Total              6,816     40,442               91,057           13,829       9,783                437   158,713
     Total (excl. barley)                                                                                       142,180
  3. WGB: Opportunities, Challenges and Gaps
     ● Pedigrees, for GWAS or GS precision
     ● Phenotypes, so expensive (Curation)
     ● Core reference sets (SeeD, GCP, WGB, FIGS)
     ● GRIN Global and GeneSys
     ● Actions as a “global system”
     ● Little overlap with USDA and ICARDA
     The phenotypic values, representing over 11.2M data points, are held in CIMMYT’s IWIS database. Their value exceeds USD 100M, measured as the cost of repeating today the trials that produced the data.
  4. WGB: Opportunities, Challenges and Gaps
     ● Species accessions
       Too many!
       Yet, extent of in situ diversity?
       Generate new diversity with existing accessions?
     ● Frustration of limited access to new, improved germplasm (this might also extend to collecting landraces)
     ● Most exchange is bank-to-bank
     ● “my institution/government owns the germplasm”
  5. Data control of wheat nurseries
     Data quality control (single field analysis)
     Identification of outliers
     Verification (field books)
     Data storage
     Database with metadata available
  6. Data verified by trait and by nursery
  7. FIGS without roots, or imbalanced passport, characterization & evaluation data

     LOC_NO  COUNTRY       LOCDESCRIP                       INSTITUTEN
     10601   MAURITIUS     REDUIT                           Agricultural Research and Exte
     19011   ALGERIA       ITGC-DAHMOWNE                    ITGC
     19012   ALGERIA       EL HARRACH                       ITGC
     19121   EGYPT         SERS EL-LIYAN                    Agr. Res. Center
     20701   LEBANON       BEKA'A VALLEY                    Agric. Res. Inst.
     21221   TURKEY        AGRICULTURE FACULTY              University of Trakya
     22243   INDIA         NAGAON EXP. STA.                 DWR
     24059   CHINA         AN DA ALKALI SALINE SOIL INST.   Heilongjiang Academy
     27121   THAILAND      NONGKAI RICE EXP. STN.           Rice Research Inst.
     41303   UNITEDSTATES  ALABAMA AMU                      Alabama A & M Univ.
     42109   MEXICO        MEXICALI                         CIMMYT
     42138   MEXICO        CIANO - FULL IRRIGATION          CIMMYT
     65001   GREECE        KENTZIKO THERMI                  NA
     65004   GREECE        CEREAL INSTITUTE (EPANOMI)       NAGREF-DW Dept.
     65009   GREECE        SCHOOL OF AGRICULTURE            YPSILON SA

     LOC_NO  Point:COUNTRY  Polyg:COUNTRY  LOCDESCRIP                INSTITUTE
     12308   KENYA          Ethiopia       ENDEBESS                  Kenya Seed Company Ltd.
     19013   ALGERIA        Morocco        AIN EL HADJAR             ITGC
     19126   EGYPT          India          KHATTARA                  Agr. Res. Center
     20011   AFGHANISTAN    Kazakhstan     TAKHAR-TALOQAN            CIMMYT
     20330   IRAN           Russia         BIRJAND AGRIC. RES. STN.  SPII
     21115   SYRIA          Turkey         AL RQA                    Ministry of Agriculture
     21117   SYRIA          Turkey         HRAN                      Ministry of Agriculture
     21121   SYRIA          Iraq           HIMO                      Ministry of Agriculture
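The second table on this slide pairs a country recorded for each location with a country obtained from a polygon lookup. A check of that kind might be sketched as below. This is only an illustration: it assumes that `Point:COUNTRY` is the recorded country and `Polyg:COUNTRY` is the country whose boundary contains the stored coordinates, which the slide does not state explicitly. The function name and the last record are hypothetical.

```python
# Each row: (LOC_NO, recorded country, country from polygon lookup, description).
# The first eight rows are taken from the slide; the interpretation of the two
# country columns is an assumption.
rows = [
    (12308, "KENYA", "Ethiopia", "ENDEBESS"),
    (19013, "ALGERIA", "Morocco", "AIN EL HADJAR"),
    (19126, "EGYPT", "India", "KHATTARA"),
    (20011, "AFGHANISTAN", "Kazakhstan", "TAKHAR-TALOQAN"),
    (20330, "IRAN", "Russia", "BIRJAND AGRIC. RES. STN."),
    (21115, "SYRIA", "Turkey", "AL RQA"),
    (21117, "SYRIA", "Turkey", "HRAN"),
    (21121, "SYRIA", "Iraq", "HIMO"),
    # Hypothetical consistent record, added only to show a row that passes.
    (0, "MEXICO", "Mexico", "EXAMPLE SITE"),
]


def country_mismatches(rows):
    """Return rows whose recorded country differs from the polygon country."""
    return [
        r for r in rows
        if r[1].strip().casefold() != r[2].strip().casefold()
    ]


mismatches = country_mismatches(rows)
for loc_no, point, polyg, desc in mismatches:
    print(f"{loc_no} {desc}: recorded {point}, coordinates fall in {polyg}")
```

Records flagged this way would still need manual review, since either the country field or the coordinates could be the wrong one.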
  8. Data control: a continuing process
     • The same location under different management systems has only one planting date and one harvest date
     • Full-irrigation or irrigated locations with “NO” in the corresponding irrigation field
     • Same location, IRR (irrigated) yield less than RF (rainfed) yield
     • 13 t/ha in an RF location (Obregon, Mexico), as an example
     Other control methods added over time:
     • Outliers across locations and years
     • Validating dates using earlier years or neighboring locations
     • RF versus IRR
     • …
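The checks listed on this slide can be expressed as simple rules over trial records. The following is a minimal sketch, not the IWIS implementation: the record fields (`loc`, `year`, `management`, `irrigated`, `yld`), the function name and the 10 t/ha rainfed ceiling are assumptions made for the example, and the records are invented.

```python
from collections import defaultdict

# Hypothetical ceiling (t/ha) above which a rainfed yield is flagged for review.
RF_YIELD_CEILING = 10.0


def qc_flags(records):
    """Return a sorted list of (loc, year, message) for failed checks."""
    flags = []
    by_site = defaultdict(dict)  # (loc, year) -> {management: yield}

    for r in records:
        key = (r["loc"], r["year"])
        by_site[key][r["management"]] = r["yld"]
        # Irrigated management, but the irrigation field says "NO".
        if r["management"] == "IRR" and r["irrigated"] == "NO":
            flags.append((*key, "IRR trial recorded with irrigation = NO"))
        # Implausibly high yield under rainfed management.
        if r["management"] == "RF" and r["yld"] > RF_YIELD_CEILING:
            flags.append((*key, "RF yield above ceiling"))

    # Same location and year: irrigated yield should not be below rainfed.
    for key, ylds in by_site.items():
        if "IRR" in ylds and "RF" in ylds and ylds["IRR"] < ylds["RF"]:
            flags.append((*key, "IRR yield below RF yield"))

    return sorted(flags)


# Invented records, for illustration only.
records = [
    {"loc": "A", "year": 2010, "management": "IRR", "irrigated": "YES", "yld": 6.1},
    {"loc": "A", "year": 2010, "management": "RF", "irrigated": "NO", "yld": 7.4},
    {"loc": "B", "year": 2010, "management": "IRR", "irrigated": "NO", "yld": 5.0},
    {"loc": "C", "year": 2011, "management": "RF", "irrigated": "NO", "yld": 13.0},
]
flags = qc_flags(records)
for flag in flags:
    print(flag)
```

Rules like these only flag records for review; whether a flagged value is an error still has to be settled against the field books.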
  9. MET Analysis
     MET data (All, RF, IRR)
     GxE analysis: variance components, G corr, BLUPs, stability
     GxE with covariables: patterns of GxE (spatially changing relationships)
     Identification of co-variables (factors, variates)
     Metadata stored in the DB
  10. Change in yield variability in Wheat Nursery
      [Figure, two panels.
      Panel 1: series mean, max and min, each with a linear trend. Fitted lines: y = 0.0176x + 4.3539 (R² = 0.1952); y = 0.113x + 8.0045 (R² = 0.5792); y = 0.0013x + 1.0054 (R² = 0.0005).
      Panel 2: series Vg and Vgxe, each with a linear trend. Fitted lines: y = -0.3232x + 15.653 (R² = 0.4153); y = 0.308x + 82.476 (R² = 0.3959).]
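Trend lines and R² values of the kind reported on this slide come from an ordinary least-squares fit of a nursery statistic against time. A minimal sketch follows; the function name and the data are invented for illustration and are not the nursery values.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R²."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² = 1 - (residual sum of squares / total sum of squares).
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot


# Invented example: nursery cycle index against mean yield (t/ha).
cycles = [1, 2, 3, 4, 5]
mean_yield = [4.2, 4.5, 4.3, 4.6, 4.4]
slope, intercept, r2 = linear_fit(cycles, mean_yield)
print(f"y = {slope:.4f}x + {intercept:.4f}, R² = {r2:.4f}")
```

A low R², as on one of the slide's trend lines, means the fitted slope explains little of the year-to-year variation and should be read with caution.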
  11. Climate/stage driving variability
      [Figure, two panels, for nurseries ESWYT20 and ESWYT22 to ESWYT30.
      Panel 1 series: tmin_veg, tmin_rep, tmin_gf, tmin_seas.
      Panel 2 series: tmax_veg, tmax_rep, tmax_gf, tmax_seas.]
  12. How to use all these outputs? Genotypic sensitivities

      tavg_gf   tavg_rep  tavg_seas tavg_veg  tmax_gf   tmax_rep  tmax_seas tmax_veg  tmin_gf   tmin_rep  tmin_seas tmin_veg
      5398434   5390631   5398434   5390631   5398434   5390631   5398434   5390631   5398434   5551629   5551629   5551629
      5534459   5551747   5551629   5551629   5534459   5551747   5390631   5552140   5534459   5551747   5398434   5390631
      2430154   5551765   5390631   5551765   2430154   2430154   2430154   5551765   2430154   5390631   5398450   5552189
      5398450   5551629   5398450   5552140   5398450   5398434   5398450   5551629   5534344   5551765   5390631   5552010
      5398424   2430154   2430154   5552010   5398424   5551765   5551629   5398450   5398424   5534344   2430154   5552193
      5551747   5534459   5551747   5398450   5390631   5534459   5551747   5398434   5398450   5534459   5534312   5534312

      FDgf      FDrep     FDveg     prec_gf   prec_veg  R10mmCL   R10mmgf   R10mmrep  R10mmveg  R5mmCL    R5mmrep   R5mmveg
      5398530   5398530   5535500   5534326   5534326   5534312   5398530   5534312   5534312   5534312   5534312   5534312
      5534312   2673706   5534475   5551704   5551704   5535534   5534312   5535534   5535534   5535534   5535534   5535534
      5534459   4893489   5398136   5551798   5551798   5552327   5534459   5552327   5552327   5552327   5552327   5552327
      5551690   5398471   2673706   5398160   5535534   5398530   5551690   5398530   5398530   5398530   5398530   5398530
      5552193   5535415   5535514   5534335   5534335   5535415   5552193   5535415   5535415   5535415   5535415   5535415
      2430154   5535428   5398471   5534339   5398160   5390809   2430154   5390809   5390809   5390809   5390809   5390809
  13. Post-silking Tmax (3% of the variability of grain weight in the African maize nursery)
      Uses: building stress populations, or using pedigrees to identify useful parents.
      Again, genotypes’ sensitivities to climate are useful!
      [Figure: value per genotype, axis from -0.25 to 0.25. Genotypes shown: PHB30R73; 1368/(9071XBABAMGOYO)-1//9091; PHB30H83; 1368/9071//9091; SC621; PHB30H37; 102/1368//9071; CZH99052; CZH99063(QPM); CZH99044; PAN6573; CZH99055; CZH99053(QPM); SC713; CZH99049(QPM); PAN67; CZH00023; 9071/(KU1403X1368)-2-1//1393; PAN5503; CZH00025; TZ9043DMRSR/9071; SC633; CZH00028; CZH99038; CZH99040; 983WH23; DK8051; CZH99061; SC627; CZH00029; CZH00030; CZH99020; CZH99037; CZH00026; PHB30G97; CZH99025; CZH00024; CZH00027; SC709; CZH99021; SC715; CZH99030.]
  14. Attempt to dissect GxE
      Basic Model: YLD = Line + Location + Location x Year + Error
      Full Model: YLD = Line + Location + Location x Year + Climate + Genetic Markers + Genetic Markers x Climate + Genetic Markers x Location + Error
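In symbols, the two models on this slide might be written as below. The notation is an assumption introduced here for readability, since the slide gives only the term names: $y_{ijk}$ is the yield of line $i$ at location $j$ in year $k$, $L$ is the line effect, $E$ the location effect, $(EY)$ the location-by-year effect, $W$ the climate covariates, $M$ the marker-based genetic effect, and $\varepsilon$ the error.

```latex
\begin{align}
\text{Basic:}\quad y_{ijk} &= L_i + E_j + (EY)_{jk} + \varepsilon_{ijk} \\
\text{Full:}\quad  y_{ijk} &= L_i + E_j + (EY)_{jk} + W_{jk} + M_i
                              + (MW)_{ijk} + (ME)_{ij} + \varepsilon_{ijk}
\end{align}
```

The full model keeps every term of the basic model and adds the climate, marker, marker-by-climate and marker-by-location terms, so any gain in prediction accuracy can be attributed to those additions.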
  15. Predicting all genotypes in single location (4 Years)
      [Figure: BasicModel versus FullModel for LOC1 to LOC5; axis from 0.2 to 0.45.]
  16. Best matching clusters for all ESWYT locations
  17. Yield prediction on Elite nursery worldwide
      Basic Model: Line + Loc x Year
      Full Model: Line + Loc x Year + Line x Loc x Year + W + Line x W + G + G x Loc x Year + G x W
      [Figure: series M1 and M5 over categories 1 to 14; axis from 0.1 to 0.4.]
  18. Yield prediction on Elite nursery SEA
      [Figure: series M1 and M5 over categories 1 to 7; axis from -0.15 to 0.3.]
  19. Can we predict some genotypes in all locations?

             25 - 75   50 - 50
      RF        82.2      88.9
      IRR       46.7      57.8
  20. Genetic structure of 32 years of ESWYT
      [Figure; annotation: Latest years.]
  21. Modeling Maize African nurseries (EIHYB)
      [Figure, two panels: Model and CV; axis from 0 to 0.9.
      Methods: Linear Regression, SVM Regression, Random Forest, PLS Regression, each with groups ALL, ENV and GENO.
      Series: Drought, Optimal, LowN.]
  22. Maize landraces in Latin America
  23. Modeling TS and genetic structure with long-term climate
      [Figure, two panels: Training and CV; axis from 0 to 1.
      Methods: LR, RF, SVM, KNN.
      Series: TS=C, TS=C+G(PCs), TS=C+Pop, TS=PCs, PC1=C, PC2=C.]
  24. The Wheat Atlas Website
      Environmental data
        Fields: Cycle, SOW_julian, Emergence_Julian, HARVEST_julian, FOLIAR_DISEASE_DEVELOPMENT, IRRIGATED, LODGING
        Example values: 2005, 11/19/2005, 4/21/2006, TRACES, YES, SLIGHT
      Varieties tested
        PBW343; CHAM 6; KLEIN CHAMACO; HIDDAB; CHAKWAL 86; DHARWAR DRY; MILAN/KAUZ//PASTOR; FLORKWA-1/DHARWAR DRY; PASTOR/BAV92; CNDO/R143//ENTE/MEXI_2/3/AEGILOPS SQUARROSA (TAUS)/4/WEAVER/5/PASTOR
      Traits
        Grain yield; Days to heading; Plant height
      Can feed the phenology table presented earlier
  25. The CIMMYT IWIS web-page
      http://apps.cimmyt.org/wpgd/index.htm
      Table of genotype-by-location values, plus mean, max, min and SD of genotypes and locations
      Establish a win-win relationship with collaborators: they send data, we provide analysis and reports
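The report described on this slide, a genotype-by-location table with mean, max, min and SD for each genotype and each location, might be assembled as follows. This is a sketch using invented yields and hypothetical genotype and location labels. It is not the IWIS code, and it uses the sample standard deviation, which the slide does not specify.

```python
from statistics import mean, stdev


def summarize(values):
    """Mean, max, min and sample standard deviation of a list of values."""
    return {
        "mean": mean(values),
        "max": max(values),
        "min": min(values),
        "sd": stdev(values),
    }


# Invented yields (t/ha): genotype -> location -> value.
table = {
    "G1": {"L1": 4.0, "L2": 6.0, "L3": 5.0},
    "G2": {"L1": 3.0, "L2": 5.0, "L3": 7.0},
}
locations = sorted({loc for row in table.values() for loc in row})

# One summary per genotype (across locations) and per location (across genotypes).
by_genotype = {g: summarize([row[loc] for loc in locations]) for g, row in table.items()}
by_location = {loc: summarize([table[g][loc] for g in table]) for loc in locations}

for g, stats in by_genotype.items():
    print(g, stats)
for loc, stats in by_location.items():
    print(loc, stats)
```

The sketch assumes a complete table; real nursery data would need a rule for genotypes missing at some locations.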
  26. IWIN-DAP: An Excel Add-In to analyze CIMMYT data
  27. Conclusions
      ● Curation is important
      ● Very helpful to complete information at the genebank and to create stress populations, and so accelerate germplasm exchange
      ● Pipelines for prediction and genomic selection: pedigrees and markers
      ● Data management and sharing; analytical and visualization tools
      ● Collaborations
