
The role of a biometrician in an International Agricultural Center: service and research in an interdisciplinary frame


  1. The role of a biometrician in an International Agricultural Center: service and research in an interdisciplinary frame
      Some ideas and examples
      JFD
  2. Interdisciplinary collaboration
      • No man (or woman) is an island
      • Few problems are simple problems
      • Problems are better solved by teams of specialists working together
      • The biometrician needs the scientists' data to propose and test methodologies, and the scientists need the analytical tools to address their problems
  3. The biometrician should be able to participate in all or some of:
      • Planning of surveys and experiments
      • Data processing, analysis and interpretation
      • Design of novel tools/methodologies for analysis
      • Writing and/or editing results
      • Seminars and courses on new methodologies and software (the free-software issue)
      • Capacity building (internal and external)
  4. Trying to:
      • Allow researchers to work in depth within their own disciplines and problems, not on methods for data analysis, computer routines, software, etc.
      • Supply new points of view and new methodological tools for the research work
      • Improve the quality of inferences
  5. Four levels of participation
      1. Routine analysis and known problems: short and fast responses
      2. New problems, known methodologies: we need to think together
      3. New problems, existing methodology: we need to understand, study and propose solutions
      4. New problems, unknown methodology: we need to do some methodological research
  6. Some examples of interdisciplinary collaboration
  7. Experimental design (continuous traits, mixed model)
      • Spatial analysis of an experimental design
      • Variety trial designed as a row-column design with 64 varieties planted in two contiguous replicates laid out in 8 rows and 16 columns (Nguyen and Williams, 1993, Austral. J. Stat. 35: 363-370)
  8. Layout (r = 2, t = 64), α-lattice (0,1,2)
      rep  row | columns 1 to 16
       1    1  |  5 19 55 23 27 38 64 44 14 13 59 25 45 49 11 43
       1    2  |  2 36 62 29 34 40 52 54 28 26 58 42 32  6 39 53
       1    3  | 37  3 17 24 15 50 47 56 16 30 31 60  7 18  8 21
       1    4  | 20 35 57 63 46 12 41 10 22  1 51 61 33 48  4  9
       2    5  | 58 44 15 25 18 53  5 22 55 36 33 29 12 47  8  1
       2    6  | 41 17 63 37 38 19 26 21 28  4  6 45 56 46 59 42
       2    7  | 14  9 62 48 34 31 60 13 20  7 64  3 39 40 10 23
       2    8  | 24 52 11 54 51 57 35  2 50 27 30 61 43 49 32 16
      λ(5,3) = 0, λ(5,2) = 1, λ(5,44) = 2
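  9. Statistical models (local control of field heterogeneity)
      RCBD:    $y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$
      Row-col: $y_{ijkl} = \mu + \tau_i + \beta_j + r_{k(j)} + c_{l(j)} + \varepsilon_{ijkl}$
      where $\tau_i$ is the effect of entry $i$, $\beta_j$ the effect of replicate $j$, and $r_{k(j)}$, $c_{l(j)}$ the effects of row $k$ and column $l$ within replicate $j$.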
  10. Assumptions on the "error" random variable: $V(\varepsilon) = \sigma^2 R$, where $d_{ij}$ is the distance in time and/or space between observations $i$ and $j$
      Usual GLM (independence): $R = I_n$ (1 on the diagonal, 0 elsewhere)
      Repeated measures and spatial analysis:
      $$R = \begin{pmatrix} 1 & f(d_{12}) & \cdots & f(d_{1n}) \\ f(d_{21}) & 1 & \cdots & f(d_{2n}) \\ \vdots & \vdots & \ddots & \vdots \\ f(d_{n1}) & f(d_{n2}) & \cdots & 1 \end{pmatrix}$$
  11. Three distance functions $f(d_{ij})$
      [Figure: f(d), from 1.00 down to 0.00, plotted against distance (0 to 50) for the spherical, exponential and Gaussian functions]
  12. Spatial statistical model
      $$y_{ijkl} = \mu + \tau_i + \varepsilon_{ijkl}, \qquad \varepsilon \sim N(0, R)$$
      where $\tau_i$ is the entry effect and
      i = 1, 2, …, t (entries)
      j = 1, 2, …, J (replications)
      k = 1, 2, …, K (rows)
      l = 1, 2, …, L (columns)
  13. Results: precision of estimates and tests of hypotheses
      Average standard error of LS-means
                          rcbd     row-col   spat-sph
      Average             0.5076   0.4041    0.4526
      Standard errors of some LS-mean differences
      Label               rcbd     row-col   spat-sph
      5 vs 3     0        0.7179   0.5256    0.4737
      5 vs 2     1        0.7179   0.4833    0.4286
      5 vs 14    2        0.7179   0.5232    0.4906
      5 vs 44    2        0.7179   0.5159    0.4938
  14. Conclusion
      When possible, it is convenient to use layouts that allow a spatial analysis, with the objective of reaching lower standard errors for the contrasts.
  15. Spatial distribution of two insect pests (apple and peach trees) in a region
      • Grafolita (Cydia molesta)
      • Carpocapsa pomonella
      • Pheromone traps
  16. Objectives
      • To obtain regionalized maps of the spatial distribution of an insect pest across time and over the whole crop growth stage
      • To define the "convenient" distance between (pheromone) traps
      • Software: SAS, S+, GS+, GIS software, R?
  17. Trap locations (25 × 26 km)
  18. Semivariogram (geostatistics)
      [Figure: empirical semivariogram, semivariance (about 1000 to 2600) against distance (0 to 15000 m), marking the nugget (intercept), the sill and the range]
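As a sketch of how the points of such a semivariogram are usually obtained, the classical (Matheron) estimator averages half the squared differences between trap counts within distance bins. This is a minimal standard-library version. It assumes planar (x, y) trap coordinates in metres and fixed-width bins. The function name and the binning rule are illustrative; this is not the SAS/S+/GS+ procedure used in the study. The nugget is read at the smallest lags, the sill is the plateau, and the range is the distance at which the plateau is reached.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, values, bin_width):
    """Classical (Matheron) estimator, binned by distance.

    gamma(h) = sum over pairs in bin h of (z_i - z_j)**2, divided by 2 * N(h).
    coords: (x, y) trap positions; values: trap counts; bin_width in the
    same units as coords (e.g. metres). Returns (mean lag, gamma, n_pairs)
    for every non-empty bin, ordered by distance.
    """
    sq_diff = defaultdict(float)
    lag_sum = defaultdict(float)
    n_pairs = defaultdict(int)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            b = int(d // bin_width)  # index of the distance bin
            sq_diff[b] += (values[i] - values[j]) ** 2
            lag_sum[b] += d
            n_pairs[b] += 1
    return [
        (lag_sum[b] / n_pairs[b], sq_diff[b] / (2 * n_pairs[b]), n_pairs[b])
        for b in sorted(n_pairs)
    ]


# Toy example: three hypothetical traps on a line, 1 m apart.
print(empirical_semivariogram([(0, 0), (1, 0), (2, 0)], [1, 3, 7], bin_width=1.5))
# [(1.0, 5.0, 2), (2.0, 18.0, 1)]
```

Real trap data would use bins of a few hundred metres, and the spherical, exponential or Gaussian model would then be fitted to the binned points.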
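  19. Spherical: $f(d_{ij}) = \left[1 - 1.5\,\frac{d_{ij}}{\rho} + 0.5\left(\frac{d_{ij}}{\rho}\right)^3\right] I(d_{ij} \le \rho)$
      Exponential: $f(d_{ij}) = e^{-d_{ij}/\rho}$
      Gaussian: $f(d_{ij}) = e^{-d_{ij}^2/\rho^2}$
      with $f(d_{ii}) = 1$. $\rho$ (range) is the distance beyond which observations are independent. This holds exactly for the spherical function; the exponential and Gaussian correlations only approach zero asymptotically.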
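The three correlation functions translate directly into code, together with the matrix F = {f(d_ij)} that enters V(ε) = σ²F. This is a minimal sketch. It assumes planar coordinates and the parametrisation shown above, with ρ as the range parameter. The function names are hypothetical.

```python
import math


def spherical(d, rho):
    """Spherical correlation: exactly 0 for d >= rho."""
    if d >= rho:
        return 0.0
    r = d / rho
    return 1.0 - 1.5 * r + 0.5 * r ** 3


def exponential(d, rho):
    """Exponential correlation: approaches 0 only asymptotically."""
    return math.exp(-d / rho)


def gaussian(d, rho):
    """Gaussian correlation: exp(-d^2 / rho^2)."""
    return math.exp(-(d ** 2) / rho ** 2)


def correlation_matrix(coords, f, rho):
    """F = {f(d_ij)} for plots or traps at the given (x, y) coordinates.

    The diagonal is f(0) = 1, so that V(eps) = sigma^2 * F.
    """
    return [[f(math.dist(p, q), rho) for q in coords] for p in coords]


plots = [(0, 0), (0, 1), (0, 3)]  # hypothetical plot centres
for f in (spherical, exponential, gaussian):
    first_row = correlation_matrix(plots, f, rho=2.0)[0]
    print(f.__name__, [round(v, 3) for v in first_row])
```

With ρ = 2, the spherical correlation between the plots 3 units apart is exactly 0, while the exponential and Gaussian values remain positive.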
  20. A linear mixed model is used to select the best-fitting model and to estimate the range:
      $$y_{ij} = \mu + \varepsilon_{ij}, \qquad V(\varepsilon) = \sigma^2 R, \qquad R = F = \{f(d_{ij})\}$$
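  21. • Null model: the independence model, $R_0 = \sigma^2 I$, in which the correlation between $\varepsilon_i$ and $\varepsilon_j$ is 0 for all $i \ne j$
      • Alternative model: the spatially correlated model, $R = \sigma^2 F$, with $F = \{f(d_{ij})\}$
      • The two models are compared with a likelihood ratio test (LRT)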
  22. Results
      Model           ρ̂ (m)   -2 log L (1)   -2 log Λ   P(χ² > χ²c)
      Carpocapsa
      Independence            1169
      Spherical       4818    1156           12.7       0.002 **
      Gaussian        2256    1157           11.9       0.003 **
      Exponential     1343    1157           11.8       0.003 **
      Grafolita
      Independence            1785
      Spherical       1290    1782           3.5        0.174 ns
      Gaussian         604    1781           4.0        0.135 ns
      Exponential      527    1780           4.6        0.100 ns
      (1) Smaller -2 log L is better
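The p-values in this table are reproduced by a chi-square distribution with 2 degrees of freedom. For example, exp(-12.7/2) ≈ 0.0017, which is reported as 0.002. Note that the degrees of freedom are inferred from the numbers; the slide does not state them. The sketch below uses the closed-form chi-square upper tail for even degrees of freedom, so it needs only the standard library. The function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square with even df, via the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("closed form implemented only for even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


# LRT statistics (-2 log Lambda) copied from the results table
stats = {
    ("Carpocapsa", "spherical"): 12.7,
    ("Carpocapsa", "gaussian"): 11.9,
    ("Carpocapsa", "exponential"): 11.8,
    ("Grafolita", "spherical"): 3.5,
    ("Grafolita", "gaussian"): 4.0,
    ("Grafolita", "exponential"): 4.6,
}
for (insect, model), lr in stats.items():
    print(insect, model, round(chi2_sf_even_df(lr, df=2), 3))
```

The loop prints 0.002, 0.003, 0.003, 0.174, 0.135 and 0.1, which matches the last column of the table.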
  23. Prediction (interpolation)
      Prediction of unobserved values using the observed values plus knowledge of a good spatial model: kriging procedures.
      Krige, Danie G. (1951). "A statistical approach to some basic mine valuation problems on the Witwatersrand". J. of the Chem., Metal. and Mining Soc. of South Africa 52 (6): 119–139.
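Below is a minimal sketch of ordinary kriging. It is one of several kriging variants, and the slide does not say which one was used. The prediction at an unobserved point is a weighted sum of the observed counts. The weights minimise the prediction variance under the constraint that they sum to 1, which is enforced through a Lagrange row. The exponential covariance, its sill and range, and the trap positions and counts are all hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, cov):
    """Ordinary kriging prediction at `target`; cov(d) is the covariance at distance d."""
    n = len(coords)
    a = [[cov(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to 1
    b = [cov(math.dist(p, target)) for p in coords] + [1.0]
    weights = solve(a, b)[:n]    # drop the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights


def cov(d):
    """Hypothetical exponential covariance: sill 2000, range parameter 1500 m."""
    return 2000.0 * math.exp(-d / 1500.0)


traps = [(0, 0), (2000, 0), (0, 2000), (2000, 2000)]  # hypothetical positions (m)
counts = [40, 80, 55, 120]                              # hypothetical trap counts
pred, w = ordinary_kriging(traps, counts, (1000, 1000), cov)
print(round(pred, 2), [round(x, 3) for x in w])
```

The target point is equidistant from the four traps, so the weights are all 0.25 and the prediction is the mean count, 73.75. Without a nugget, kriging at an observed trap returns the observed value.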
  24. Carpocapsa
      Mean   Standard deviation   Range    Number of traps
      60     49                   6-259    111
  25. Forecasting (a more complex model)
      • Forest inventory and growth models
        – Nonlinear (but linearized) models
        – Structured equation models
  26. Forestry growth model
      • Eucalyptus (bicostata)
      • In a region
      • Using the inventory data sets (yearly data)
      • n = 2461 from 506 plots (300 m², circular), measured each year (from 3 to 15)
      • "Difference" models
  27. First data set (coefficient estimation and model fit)
      nplot  Y1  Y2   H1    H2    BA1   BA2   N1    N2    V1     V2     S
      100     3   4   12.5  16.9  15.7  22.2  1367  1367   53.2  108.0  22.3
      100     4   5   16.9  17.7  22.2  26.1  1367  1367  108.0  135.9  24.2
      100     5   6   17.7  22.9  26.1  30.3  1367  1367  135.9  183.9  21.7
      100     6   7   22.9  23.3  30.3  33.2  1367  1333  183.9  219.3  25.0
      100     7   8   23.3  23.3  33.2  36.6  1333  1333  219.3  244.9  23.3
      100     8   9   23.3  23.9  36.6  38.2  1333  1333  244.9  264.6  21.7
      100     9  10   23.9  24.1  38.2  39.7  1333  1333  264.6  275.7  21.1
      100    10  11   24.1  25.7  39.7  41.7  1333  1333  275.7  304.9  20.3
      100    11  12   25.7  25.0  41.7  46.4  1333  1333  304.9  327.0  20.9
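  28. An external equation (dominant height and site index)
      $$h_2 = h_1 \left[\frac{1 - \exp(-\beta_1 A_2)}{1 - \exp(-\beta_1 A_1)}\right]^{\beta_2}, \qquad \hat S = h_1 \left[\frac{1 - \exp(-\hat\beta_1 \cdot 7)}{1 - \exp(-\hat\beta_1 A_1)}\right]^{\hat\beta_2}$$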
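The site-index equation is the height-projection equation evaluated at a base age of 7 years. In the first data set, S equals H1 in the row where Y1 = 7 (23.3 m), which is consistent with that base age. Below is a minimal sketch. The coefficients β1 = 0.3 and β2 = 1.5 are hypothetical placeholders, not the fitted values, and the function names are illustrative.

```python
import math


def project_height(h1, a1, a2, b1, b2):
    """Dominant height at age a2 from height h1 observed at age a1:
    h2 = h1 * ((1 - exp(-b1 * a2)) / (1 - exp(-b1 * a1))) ** b2."""
    return h1 * ((1.0 - math.exp(-b1 * a2)) / (1.0 - math.exp(-b1 * a1))) ** b2


def site_index(h1, a1, b1, b2, base_age=7.0):
    """Site index: the height projected to the base age (7 years here)."""
    return project_height(h1, a1, base_age, b1, b2)


B1, B2 = 0.3, 1.5  # hypothetical coefficients, for illustration only
s = site_index(12.5, 3, B1, B2)  # plot 100 at age 3, H1 = 12.5 m
print(round(s, 1))
```

The difference form is path-invariant: projecting a height forward and then back to the original age returns the starting value. The same property makes S independent of the row used, when the model fits exactly.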
  29. And 5 structured nonlinear (but linearized) difference equations
      1. Density:       ln2 ~ ln1 + I(lA2 - lA1) + ε1
      2. Basal area-1:  lg1 ~ I(1/A1) + ln1 + lh1 + I(lh1/A1) + I(S*A1/100) + ε2
      3. Basal area-2:  lg2 ~ lg1 + I(1/A2 - 1/A1) + I(ln2 - ln1) + I(lh2 - lh1) + I(lh2/A2 - lh1/A1) + I(S*(A2/100 - A1/100)) + ε3
      4. Volume-1:      lv1 ~ ln1 + lh1 + lg1 + I(S*A1/100) + ε4
      5. Volume-2:      lv2 ~ lv1 + I(ln2 - ln1) + I(lh2 - lh1) + I(lg2 - lg1) + I(S*(A2/100 - A1/100)) + ε5
  30. Characteristics
      • Results from one equation are independent variables in another
      • The "residuals" (ε1, ε2, …, ε5) are correlated and possibly do not have homogeneous variances
      • A simultaneous estimation process is necessary
      • 22 regression coefficients
      • Possible methods: OLS, SUR, 2SLS, 3SLS
      • Software: systemfit in R (free software)
  31. Model evaluation
      • Internal: model fit on a randomly selected sub-dataset
      • External: fit of the estimated model on another "independent" sub-dataset, year by year and long term
  32. Second data set: forecasting (prediction and external fit)
      nplot  Y1  Y2   HD1   BA1    N1      V1    S
      100     3   4   12.5  15.74  1366.7  53.2  22.3
      100     3   5   12.5  15.74  1366.7  53.2  22.3
      100     3   6   12.5  15.74  1366.7  53.2  22.3
      100     3   7   12.5  15.74  1366.7  53.2  22.3
      100     3   8   12.5  15.74  1366.7  53.2  22.3
      100     3   9   12.5  15.74  1366.7  53.2  22.3
      100     3  10   12.5  15.74  1366.7  53.2  22.3
      100     3  11   12.5  15.74  1366.7  53.2  22.3
      100     3  12   12.5  15.74  1366.7  53.2  22.3
  33. Observed correlations between residuals (3SLS method)
                   density   basal-1   basal-2   vol-1     vol-2
      density      1.0000    0.0993    0.0362   -0.0861   -0.0234
      basal-1                1.0000   -0.4837   -0.7354    0.1089
      basal-2                          1.0000    0.2935   -0.6356
      vol-1                                      1.0000   -0.4777
      vol-2                                                1.0000
  34. Predicted growth curves (4 plots)
      [Figure: observed (obs) and predicted (pred) volume (m³, 0 to 500) against age for four plots, with lower and upper limits]
  35. Competitiveness
      Comparison of the socio-economic competitiveness of three regional blocks: Mercosur, NAFTA and the EU (UNDP)
  36. Competitiveness drivers
      Driver                        Nr. variables
      1. Knowledge                  11
      2. Innovation platform        12
      3. Connectivity               12
      4. Infrastructure              8
      5. Macroeconomic variables     9
      6. Social cohesion            13
      Total                         65
  37. Objectives and methods
      • Generating a Competitiveness Index (CI)
        – STAGE 1: PCA per driver
        – STAGE 2: PCA using the first PC of each driver from STAGE 1 (CI = scores on the first PC from STAGE 2)
      • Improvement comparison: advances by driver
        – Relative improvement (2000 vs. 1990)
        – Weighted average per driver (weight = share of each driver in the CI)
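The two-stage construction of the CI can be sketched with a power-iteration PCA on correlation matrices. This is a minimal standard-library version. It assumes that each driver is supplied as a list of variables measured over the same units, for example the block-years MS90 to EU00. The function names are hypothetical, and a real analysis would use a numerical library's eigen-decomposition instead. The sign of each principal component is arbitrary.

```python
import math
import statistics


def standardize(col):
    m, s = statistics.fmean(col), statistics.stdev(col)
    return [(x - m) / s for x in col]


def first_pc(columns, iters=1000):
    """First principal component of the correlation matrix of `columns`.

    columns: list of variables, each a list of values over the same units.
    Returns (scores, share of total variance explained by the first PC).
    """
    z = [standardize(c) for c in columns]
    p, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    v = [1.0 + i for i in range(p)]  # start vector that is not an eigenvector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):  # power iteration
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    scores = [sum(v[i] * z[i][k] for i in range(p)) for k in range(n)]
    return scores, eigenvalue / p


def competitiveness_index(drivers):
    """STAGE 1: PCA per driver; STAGE 2: PCA on the first-PC scores of each driver.
    Returns the CI (STAGE 2 first-PC scores) and the STAGE 1 explained shares."""
    stage1 = [first_pc(cols) for cols in drivers]
    ci, _ = first_pc([scores for scores, _ in stage1])
    return ci, [share for _, share in stage1]
```

The STAGE 1 shares correspond to the "explained variability (first PC within driver)" column in the table that follows.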
  38. Driver              Nr. vars.   Contribution to CI (%)   Explained variability (%) (1)
      Knowledge           11          20.1                     74.5
      I-platform          12          18.4                     66.1
      Connectivity        12          21.6                     82.5
      Infrastructure       8          13.4                     78.0
      Macro-econ. vars     9           5.6                     47.8
      S-cohesion          13          20.9                     62.4
      Sum                 65         100
      (1) First PC within driver
  39. Driver-year CI
      [Figure: driver scores (-2.0 to 2.0) for KNOW, IPLAT, CONEC, INFR, MACV and COHE by block-year: MS90, MS95, MS00, NF90, NF95, NF00, EU90, EU95, EU00]
  40. One-dimension CI
      [Figure: Competitiveness Index (axis from -3.50 to 3.50) for MS (Mercosur), NF (NAFTA) and UE (EU)]
  41. Relative improvement 2000/1990 (weighted means)
      [Radar chart, scale 0.0 to 0.8: knowledge, I-platform, connectivity, infrastructure, macro-vars and S-cohesion for MS, NF and UE]
  42. Association mapping
      • The mixed model in association mapping
      • Example:
        – 46 wheat genotypes
        – 374 markers (DArT)
        – 5 traits: w1000, leaf rust, stem rust, maturity, yield
        – Environments: 15, 17, 5, 15, 10
  43. Components of the mixed model
      • Phenotypic data: traits (Y1, …, Ye)
      • Markers (DArT): binary values {0, 1}
      • Underlying structure: groups as co-variables (Q matrix, from STRUCTURE)
      • Relations between genotypes: coancestry (parentage), K matrix: K = 2p_ij or a similarity from non-linked markers
  44. Data set (subset: one environment, 7 groups, 4 markers)
      geno  y2    gr1    gr2    gr3    gr4    gr5    gr6    gr7    m45  m46  m47  m48
      1     8.6   0.001  0.994  0.001  0.001  0.001  0.001  0.001  0    1    1    1
      2     12.9  0.996  0.001  0.001  0.001  0.000  0.000  0.001  1    1    1    1
      3     11.8  0.001  0.001  0.001  0.001  0.001  0.993  0.002  1    0    1    1
      4     12.3  0.281  0.003  0.001  0.002  0.462  0.199  0.052  1    1    0    0
      5     10.0  0.002  0.007  0.003  0.239  0.136  0.610  0.005  1    0    1    1
      ...
      45    10.8  0.001  0.001  0.991  0.002  0.001  0.002  0.002  0    1    1    1
      46    7.0   0.001  0.002  0.948  0.042  0.004  0.001  0.002  0    1    1    1
  45. Proposition
      • A marker is associated with the phenotype if the trait mean of genotypes carrying "0" is "very" different from the trait mean of genotypes carrying "1"
      • "Very" = statistically different in a hypothesis test
      • 3 possible models: one-way ANOVA, Q model, Q+K model
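The simplest of the three models, the one-way ANOVA, compares trait means between the two marker states with an F test on 1 and n - 2 degrees of freedom. The minimal sketch below uses a hypothetical function name. It is applied only to the seven genotypes visible in the data-set subset (trait y2, marker m45), so it illustrates the computation and is not a result from the study. The Q and Q+K models add the group memberships and the random genotype effect, and they need mixed-model software.

```python
def marker_f_test(y, marker):
    """One-way ANOVA F statistic (1 and n - 2 df) for marker state 0 vs 1."""
    groups = {0: [], 1: []}
    for yi, mi in zip(y, marker):
        groups[mi].append(yi)
    if not groups[0] or not groups[1]:
        raise ValueError("both marker states must be present")
    n = len(y)
    grand = sum(y) / n
    means = {s: sum(g) / len(g) for s, g in groups.items()}
    ss_between = sum(len(g) * (means[s] - grand) ** 2 for s, g in groups.items())
    ss_within = sum((v - means[s]) ** 2 for s, g in groups.items() for v in g)
    return (ss_between / 1) / (ss_within / (n - 2))


# Only the seven genotypes shown in the data-set subset (trait y2, marker m45)
y2 = [8.6, 12.9, 11.8, 12.3, 10.0, 10.8, 7.0]
m45 = [0, 1, 1, 1, 1, 0, 0]
print(round(marker_f_test(y2, m45), 2))  # 6.23
```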
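  46. Q+K model (one site)
      $$y_{ij} = \underbrace{\mu + \alpha_i + \sum_{l=1}^{g-1} \gamma_{(l)}\,\hat q_{ijl}}_{\text{fixed}} + \underbrace{u_{j(i)}}_{\text{random}} + \varepsilon_{ij}, \qquad \mathbf{y} = X\beta + Zu + \varepsilon$$
      • i = 0, 1 (the two states of the marker): i = 1 if the marker is present in the jth genotype
      • $\hat q_{ijl}$ = membership probability of the jth genotype in the lth group
      • $u_{j(i)}$ = genotype nested in the marker state (i = 1 or i = 0)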
  47. Variance-covariance matrices
      $V(\varepsilon) = \sigma^2 I, \qquad V(u) = \sigma_a^2 A$
      • A is known: coefficient of parentage or some similarity computed from non-linked markers
      • The variances must be estimated
  48. Matrix K (= A)
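  49. Bonferroni bound (when tests are not independent)
      • Example: m = 46 tests, α = 0.05:
        $P[\text{reject at least one } H_0 \mid H_0 \text{ true}] = 1 - (1 - \alpha)^m = 1 - 0.95^{46} = 0.906$
      • If you want a type I error of 0.05 over all tests, reject $H_0$ when
        $\hat p \le 1 - (1 - 0.05)^{1/46} = 0.001114$
      • $1 - (1 - \alpha)^m$ is exact only for independent tests, and the per-test level above is the Šidák correction. The Bonferroni bound $\alpha/m = 0.05/46 \approx 0.00109$ is slightly stricter and holds under any dependence between tests.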
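The arithmetic in the example can be checked in a few lines. The sketch also computes the Bonferroni level α/m for comparison. The function names are hypothetical.

```python
def familywise_error(alpha, m):
    """P(at least one false rejection) for m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def sidak_level(alpha_family, m):
    """Per-test level giving family-wise error alpha_family for m independent tests."""
    return 1.0 - (1.0 - alpha_family) ** (1.0 / m)


def bonferroni_level(alpha_family, m):
    """Per-test level alpha_family / m; controls the family-wise error under any dependence."""
    return alpha_family / m


print(round(familywise_error(0.05, 46), 3))   # 0.906
print(round(sidak_level(0.05, 46), 6))        # 0.001114
print(round(bonferroni_level(0.05, 46), 6))   # 0.001087
```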
  50. Results (yield) (10 environments, 374 DArT markers, Q+K model)
                                                 0.10            0.05
      Site  DArT       F         p          p ≤ 0.00028     p ≤ 0.000138
      348   wPt-0086   14.9740   1.26E-04   1               1
      275   wPt-1859   14.3168   1.77E-04   1               0
      279   wPt-3992   15.1771   1.13E-04   1               1
      194   wPt-4115   15.3556   1.04E-04   1               1
      163   wPt-4487   24.1353   1.27E-06   1               1
      234   wPt-9075   19.5032   1.30E-05   1               1
      424   wPt-9930   14.7447   1.42E-04   1               0
      Total                                 7               5
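The 0/1 columns follow from comparing each p-value with the per-test cut-off in the header. The short check below reproduces the totals of 7 and 5 markers; its names are hypothetical. For reference, the Šidák per-test level for 374 tests is about 0.00028 at 0.10 and about 0.000137 at 0.05. Both are close to the printed cut-offs.

```python
# Marker p-values from the yield results table (Q+K model).
p_values = {
    "wPt-0086": 1.26e-04, "wPt-1859": 1.77e-04, "wPt-3992": 1.13e-04,
    "wPt-4115": 1.04e-04, "wPt-4487": 1.27e-06, "wPt-9075": 1.30e-05,
    "wPt-9930": 1.42e-04,
}
# Per-test cut-offs as printed in the table header, keyed by family-wise level.
cutoffs = {0.10: 0.00028, 0.05: 0.000138}


def significant_markers(p_values, cutoff):
    """Markers whose p-value is at or below the per-test cut-off."""
    return sorted(name for name, p in p_values.items() if p <= cutoff)


for alpha, cutoff in cutoffs.items():
    hits = significant_markers(p_values, cutoff)
    print(f"alpha={alpha}: {len(hits)} markers", hits)
```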
  51. Methodological research
      1. The Ward-MLM three-way strategy for classifying genetic resources in multiple environments
      2. Sampling strategies for conserving diversity when forming core subsets using genetic markers
  52. Interdisciplinary group
      • Jose Crossa (CIMMYT-Biometrics)
      • Suketoshi Taba (CIMMYT-Maize Genebank)
      • Marilyn Warburton (CIMMYT-USDA-Molecular genetics)
      • Sarah Hearne (IITA-Molecular genetics)
      • Steve A. Eberhart (USDA-Genebank)
      • Jose Villaseñor (C.P.-Mathematics)
      • Jorge Franco (UDELAR-CIMMYT-Biometrics)
  53. The Ward-MLM three-way strategy for classifying genetic resources in multiple environments: evaluating G×E
      Franco et al. (2003) Crop Sci. 43: 1249-1258
  54. Properties of the Ward-MLM strategy
      1. It assigns the observations to an optimal number of clusters based on membership probabilities (it is a statistical method)
      2. It uses discrete and continuous variables simultaneously
      3. It optimizes two objective functions (minimum variance within groups and maximum log-likelihood)
      4. It allows estimation of the quality of the resulting clustering (average of the assignment probabilities)
      5. It can be used on three-way data sets (genotype × environment × trait)
  55. Example: 256 Caribbean maize genotypes
      Discrete variable:
      • Agronomic scale (1 = poor, 2 = regular, 3 = good)
      Continuous variables:
      • Days to anthesis (DA)
      • Plant height (PH, cm)
      • Days to senescence of the ear leaf (DS)
      • Ear length (EL, cm)
      Three environments (Mexico)
  56. Input data for the three-way analysis (first 5 observations)
      NOBS  DA1   PH1    DS1   EL1   DA2   PH2    DS2   EL2   DA3   PH3    DS3   EL3   AS1  AS2  AS3
      1     73.5  222.4  45.5  14.7  91.6  210.0  48.1  17.3  59.8  262.8  45.8  21.0  1    2    3
      2     71.4  225.7  47.5  14.9  90.1  209.8  40.0  13.1  61.7  262.3  43.9  15.5  2    2    2
      3     64.5  165.8  51.1  13.5  75.6  171.3  56.5  16.4  54.3  224.1  47.4  17.5  2    3    2
      4     73.5  180.3  47.5  12.2  91.9  176.2  49.7  16.3  61.1  234.8  55.0  17.5  2    2    3
      5     74.5  200.5  50.4  15.3  93.0  186.6  56.8  16.2  61.2  249.1  51.4  19.0  3    3    3
  57. Canonical representation (87%)
      [Figure: groups G1 to G8 plotted on Can 1 (DS+, DA-, EL+; 79%) against Can 2 (EL+, PH+; 8%)]
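  58. Agronomic performance of genotypes
      Column headings: good, regular, poor, non-identified; rating patterns (GGG, GGR), (RPP, PPP), (GRP)
      Group 1 (46): 46
      Group 2 (25): 18, 5, 2
      Group 3 (27): 19, 8
      Group 4 (53): 53
      Group 5 (24): 24
      Group 6 (22): 13, 9
      Group 7 (32): 7, 19, 6
      Group 8 (27): 3, 24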
  59. Average values, groups 5 and 8
             DA            PH              DS            EL            AgS
      G      E1  E2  E3    E1   E2   E3    E1  E2  E3    E1  E2  E3    E1   E2   E3    n
      5      75  94  63    213  218  266   45  52  49    15  16  19    3.0  3.0  2.9   24
      8      81  102 70    218  219  257   38  43  39    13  15  16    1.3  1.4  1.0   27
      m      77  97  66    214  218  266   42  48  45    14  16  18    1.9  2.0  2.4   256
  60. Group × environment interaction (no COI)
      [Figure: four panels (DA, PH, DS, EL) showing the means of groups 1 to 8 in env 1, env 2 and env 3]
  61. Agronomic scale (COI**)
      [Figure: agronomic scale (0.5 to 3.5) by group (1 to 8) in env 1, env 2 and env 3]
  62. Variable relations vs environment (between groups)
      [Figure: biplot of Dimension 1 against Dimension 2 showing DA, PH, DS and EL in environments 1 to 3 and the agronomic scale (Q, Q1 to Q3)]
  63. Sampling strategies for conserving diversity when forming core subsets using genetic markers
      • DEF: a core collection (or core subset) is a sample from a large germplasm collection that contains, with a minimum of repetitiveness, the maximum possible genetic diversity of the species in question (Frankel and Brown, 1984)
      • Forming core subsets requires sampling
  64. STEP 1. Numerical classification
      • The most used methods are Ward (minimum variance within clusters) and UPGMA (average of distances)
      • They require an initial matrix of distances between genotypes
      • With SSR markers, genetic distances can be used (modified Rogers, Cavalli-Sforza & Edwards); both are Euclidean metrics
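UPGMA is the simpler of the two methods to sketch: it repeatedly merges the two clusters with the smallest average pairwise distance. The minimal standard-library version below assumes a precomputed symmetric distance matrix, for example modified Rogers distances. It stops when a chosen number of clusters (strata) remains. The function names and the toy distances are hypothetical, and Ward's method is not shown.

```python
from itertools import combinations


def average_linkage(dist, a, b):
    """UPGMA distance between clusters a and b: mean pairwise distance."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def upgma_clusters(dist, k):
    """Agglomerate accessions with UPGMA until k clusters (strata) remain.

    dist: symmetric matrix of genetic distances between accessions.
    Returns the clusters as sorted lists of accession indices.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(dist, *pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


# Four hypothetical accessions at positions 0, 1, 10 and 11 on a line.
pos = [0, 1, 10, 11]
d = [[abs(x - y) for y in pos] for x in pos]
print(upgma_clusters(d, 2))  # [[0, 1], [2, 3]]
```

The resulting clusters are the strata from which accessions are drawn in STEP 2.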
  65. STEP 2. Drawing accessions from clusters (how many accessions from each cluster?)
      Allocation methods determine the number of observations to be drawn at random from each stratum (cluster):
      • Optimal (Neyman): proportional to the size and variability of the cluster
      • P: proportional to the cluster size
      • L: proportional to the log of the cluster size
  66. D-method: proportional to any measure of the cluster diversity
      $$n_i = n\,\frac{d_i}{\sum_t d_t}, \qquad t = 1, 2, \ldots, \text{number of clusters}$$
      • $n_i$: number of accessions to be drawn from the ith cluster
      • $d_i$: average of the distances between accessions within the ith cluster
      • $n$: size of the core subset
      Franco et al. (2006) Crop Sci. 46: 854-864
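The allocation rules can all be sketched as proportional allocation with different weights:

* cluster sizes for P;
* their logarithms for L;
* size times standard deviation for the Neyman rule;
* a within-cluster diversity measure for D.

The sketch makes two assumptions that the slides do not specify. It uses largest-remainder rounding to keep the total equal to n, and it ignores the case where an allocation exceeds the cluster size. The function names are hypothetical.

```python
import math


def allocate(n, weights):
    """Split a core subset of size n across clusters proportionally to weights.

    Largest-remainder rounding keeps the total equal to n (an assumed rule).
    """
    total = sum(weights)
    raw = [n * w / total for w in weights]
    alloc = [math.floor(r) for r in raw]
    leftover = n - sum(alloc)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


def p_method(n, sizes):
    """P: proportional to the cluster size N_i."""
    return allocate(n, sizes)


def l_method(n, sizes):
    """L: proportional to log(N_i); a singleton cluster gets weight 0."""
    return allocate(n, [math.log(s) for s in sizes])


def neyman_method(n, sizes, sds):
    """Optimal (Neyman): proportional to N_i * S_i (size times variability)."""
    return allocate(n, [s * sd for s, sd in zip(sizes, sds)])


def d_method(n, diversities):
    """D: n_i = n * d_i / sum_t d_t, with d_i a within-cluster diversity measure."""
    return allocate(n, diversities)


print(p_method(12, [30, 10, 20]))        # [6, 2, 4]
print(d_method(20, [0.5, 0.25, 0.25]))   # [10, 5, 5]
```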
  67. Diversity
      • Diversity indexes (allele richness): expected heterozygosity (He), number of effective alleles (Ne), Shannon index
      • Distances among individuals or groups:
        – Non-informative markers, p ∈ {0, 1}: simple matching, Jaccard, Nei and Li (Dice)
        – Informative markers, p ∈ [0, 1]: modified Rogers, Cavalli-Sforza and Edwards, Euclidean
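Here is a sketch of three per-locus diversity indexes, plus the two Euclidean distances for informative markers. The two distances are written in one common form: the square root of the summed squared differences in allele frequencies (or in their square roots), divided by 2m for m loci. Treat these forms as an assumption, not as the exact definitions used in the study. Averaging the per-locus indexes over loci is also an assumption. The function names are hypothetical.

```python
import math


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_a^2) at one locus (freqs: allele frequencies summing to 1)."""
    return 1.0 - sum(p * p for p in freqs)


def effective_alleles(freqs):
    """Ne = 1 / sum(p_a^2) at one locus."""
    return 1.0 / sum(p * p for p in freqs)


def shannon_index(freqs):
    """H = -sum(p_a * ln p_a), skipping absent alleles."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def modified_rogers(x, y):
    """MR = sqrt(sum over loci and alleles of (p - q)^2 / (2m)); x, y: per-locus frequency lists."""
    m = len(x)
    ss = sum((p - q) ** 2 for lx, ly in zip(x, y) for p, q in zip(lx, ly))
    return math.sqrt(ss / (2 * m))


def cavalli_sforza_edwards(x, y):
    """CE in Euclidean form: as MR, but on square-root allele frequencies."""
    m = len(x)
    ss = sum((math.sqrt(p) - math.sqrt(q)) ** 2
             for lx, ly in zip(x, y) for p, q in zip(lx, ly))
    return math.sqrt(ss / (2 * m))


# Two hypothetical accessions, two loci with two alleles each.
a = [[1.0, 0.0], [0.5, 0.5]]
b = [[0.0, 1.0], [0.5, 0.5]]
print(modified_rogers(a, b), cavalli_sforza_edwards(a, b))
```

Written this way, both distances lie between 0 (identical frequencies) and 1 (no shared alleles at any locus).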
  68. Example: three maize data sets (SSR markers)
                   Obs.   Alleles   Markers   Values       Missing
      Bulks        275    186       24        [0, 1]       1.5 %
      Landraces    521    209       26        {0, .5, 1}   no
      Populations   25    209       26        [0, 1]       no
  69. 24 stratified strategies + MSTRAT (20 independent core subsets)
      Code      Cluster           Distance                   Allocation methods
      1 to 12   Ward              Modified Rogers            P, L, D: MR, SH, HE, NE
                                  Cavalli-Sforza & Edwards   P, L, D: CE, SH, HE, NE
      13 to 24  Average (UPGMA)   Modified Rogers            P, L, D: MR, SH, HE, NE
                                  Cavalli-Sforza & Edwards   P, L, D: CE, SH, HE, NE
      25        MSTRAT (non-stratified method)
  70. Means of 20 reps (different core subsets)
      Bulk data set          MRs      CEs      SHs      HEs      NEs
      Best strategy (24)†    0.503a   0.578a   4.411b   0.626b   2.980b
      M-strategy             0.467b   0.559b   4.478a   0.644a   3.166a
      Entire collection      0.440    0.521    4.399    0.620    2.937
      δ(%)                   1.37     1.20     0.31     0.77     1.60
      Landraces data set     MRs      CEs      SHs      HEs      NEs
      Best strategy (24)     0.653a   0.719a   4.525a   0.619a   2.963a
      M-strategy             0.637b   0.704b   4.504b   0.599b   2.808b
      Entire collection      0.635    0.700    4.467    0.591    2.742
      δ(%)                   0.48     0.37     0.26     0.82     1.23
      † (24) UPGMA, Cavalli-Sforza & Edwards, number of effective alleles
  71. Bulks
      [Figure: Dim-1 against Dim-2 for the data, the U-CE-NE (24) strategy and MSTRAT]
  72. Landraces
      [Figure: Dim-1 against Dim-2 for the data, the U-CE-NE (24) strategy and MSTRAT]
  73. Conclusions
      • When constructing core subsets of individuals (landraces/accessions), the D allocation method used with a stratified sampling strategy was better than the M-strategy
      • For bulks and populations, the M-strategy was better for the diversity indexes
      • Some stratified sampling strategies (21-24) were always better, showing higher average distances (MR and CE) between accessions
      • All 25 strategies selected non-informative alleles, but the M-strategy selected fewer than the others
  74. Uses of the D allocation method
      • The D method was used to define reference collections of inbred lines and populations of maize from Mexico, with Marilyn Warburton (CIMMYT)
      • The D method was used in a collaboration with Sarah Hearne (IITA) to define the reference germplasm collection of cowpea accessions
  75. Current research
      A method for classifying genotypes using phenotypic and genotypic information simultaneously
      • Phenotypic: continuous and categorical
      • Genotypic: SSR, DArT, SNP
      We have a draft.