Trait data mining seminar at the Carlsberg research institute (CRI) (4 Nov 2009)

Scientific seminar at the Carlsberg Research Institute (CRI) in Copenhagen, Denmark on trait data mining using the Focused Identification of Germplasm Strategy (FIGS), 4th November 2009.
Endresen, D.T.F. (2010). Predictive association between trait data and ecogeographic data for Nordic barley landraces. Crop Sci. 50(6):2418-2430. doi: 10.2135/cropsci2010.03.0174

  1. Outline
     •  Domestication bottleneck
     •  Utilization of genetic diversity
     •  Core collection subset selection
     •  Trait mining selection
     •  Computer modeling
     •  Example 1:
        –  Nordic barley landraces (2005)
        –  N-PLS regression (in MATLAB)
     •  Example 2:
        –  Net blotch in barley (ICARDA, USDA)
        –  Discriminant analysis (DA)
  2. [Images: wild tomato and tomato; teosinte and corn (maize).]
  3. [Figure: allelic variation in crop wild relatives, traditional landraces and modern cultivars.]
     Genetic bottlenecks during crop domestication and modern plant breeding. The circles represent allelic variation. The funnels represent allelic variation of genes found in the crop wild relatives but gradually lost during domestication, traditional cultivation and modern plant breeding.
  4. •  Scientists and plant breeders want a few hundred germplasm accessions to evaluate for a particular trait.
     •  How does the scientist select a small subset likely to have the useful trait?
     •  Example: More than 560 000 wheat accessions in genebanks worldwide.
     Slide adapted from a slide by Ken Street, ICARDA (FIGS team).
  5. •  The scientist or the breeder needs a smaller subset to cope with the field screening experiments.
     •  A common approach is to create a so-called core collection.
     Sir Otto H. Frankel (1900-1998) proposed a limited set or "core collection" established from an existing collection with minimum similarity between its entries. The core collection is of limited size and chosen to represent the genetic diversity of a large collection, a crop, a wild species or group of species (1984).
  6. •  Given that the trait property you are looking for is relatively rare:
     •  Perhaps as rare as a unique allele for one single landrace cultivar...
     •  Getting what you want is largely a question of LUCK!
     Slide adapted from a slide by Ken Street, ICARDA (FIGS team).
  7. [Image slide, no text.]
  8. •  Wild relatives are shaped by the environment.
     •  Primitive cultivated crops are shaped by local climate and humans.
     •  Traditional cultivated crops (landraces) are shaped by climate and humans.
     •  Modern cultivated crops are mostly shaped by humans (plant breeders).
     •  Perhaps future crops are shaped in the molecular laboratory…?
  9. Objective of this study:
     –  Explore climate data as a prediction model for "pre-screening" of crop traits BEFORE full-scale field trials.
     –  Identification of landraces with a higher probability of holding an interesting trait property.
  10. •  Primitive crops and traditional landraces are an important source of novel traits for improvement of modern crops.
      •  Landraces are often not well described for the economically valuable traits.
      •  Identification of novel crop traits will often be the result of a larger field trial screening project (thousands of individual plants).
      •  Large-scale field trials are very costly in both area and human working hours.
  11. The underlying assumption of FIGS selection is that the climate at the original source location, where the landrace was developed during long-term traditional cultivation, is correlated to the trait.
      The aim is to build a computer model explaining the crop trait score (dependent variables) from the climate data (independent variables).
  12. Three datasets are needed:
      1)  Landrace samples (genebank seed accessions)
      2)  Trait observations (experimental design)
      3)  Climate data (for the landrace location of origin)
      •  The accession identifier (accession number) provides the bridge to the crop trait observations.
      •  The longitude and latitude coordinates for the original collecting site of the accessions (landraces) provide the bridge to the environmental data.
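The two "bridges" described on this slide can be sketched as a simple join. This is a minimal illustration, not the procedure used in the study: the field names, the trait scores and the `climate_at` stand-in are hypothetical, and only the two accession numbers and their coordinates are taken from the accession table later in the talk.

```python
# Minimal sketch of linking accessions to traits and to climate.
# Field names, trait scores and climate_at() are hypothetical.

accessions = [
    {"acc_num": "NGB9529", "lat": 56.5667, "lon": 9.35},
    {"acc_num": "NGB456", "lat": 66.1167, "lon": 12.5},
    {"acc_num": "NGB0000", "lat": 60.0, "lon": 10.0},  # made-up, no trait data
]

# Trait observations keyed by accession number (made-up scores).
traits = {
    "NGB9529": {"heading_days": 60},
    "NGB456": {"heading_days": 52},
}


def climate_at(lat, lon):
    """Stand-in for a lookup in a gridded climate layer (hypothetical)."""
    return {"tmin_jan": round(-0.1 * lat, 1)}


def link(accessions, traits, climate_lookup):
    """Join each accession to its traits (by number) and climate (by coordinates)."""
    rows = []
    for acc in accessions:
        trait = traits.get(acc["acc_num"])
        if trait is None:
            continue  # no trait observation for this accession
        rows.append({
            "acc_num": acc["acc_num"],
            "traits": trait,
            "climate": climate_lookup(acc["lat"], acc["lon"]),
        })
    return rows


rows = link(accessions, traits, climate_at)
for row in rows:
    print(row["acc_num"], row["traits"], row["climate"])
```

Accessions without a trait observation are dropped here; a real workflow might instead keep them as candidates for prediction.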
  13. [Photos: Alnarp, Sweden; Lima, Peru; Svalbard; Benin.]
  14. [Photos: Faba bean, Finland; Field trials, Gatersleben, Germany; Potato, Priekuli, Latvia; Forage crops, Dotnuva, Lithuania; Radish (S. Jeppson); Linnés äpple.]
      •  Powdery mildew (Blumeria graminis)
      •  Leaf spots (Ascochyta sp.)
      •  Yellow rust (Puccinia striiformis)
      •  Black stem rust (Puccinia graminis)
      http://barley.ipk-gatersleben.de
  15. •  The climate data is extracted from the WorldClim dataset: http://www.worldclim.org/
      •  Data from weather stations worldwide are combined into a continuous surface layer.
      •  Climate data for each landrace is extracted from this surface layer.
      Precipitation: 20 590 stations. Temperature: 7 280 stations.
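Extracting a value from a continuous surface layer amounts to finding the grid cell that contains the collecting site. The sketch below assumes a regular latitude/longitude grid stored as a list of rows with the northern edge first; the toy grid, its extent and the function names are hypothetical and are not the WorldClim file format or API.

```python
# Minimal sketch: point lookup in a regular lat/lon grid (toy data).

def cell_index(lat, lon, lat_max, lon_min, cell_size):
    """Row and column of the cell containing (lat, lon); row 0 is the northern edge."""
    row = int((lat_max - lat) // cell_size)
    col = int((lon - lon_min) // cell_size)
    return row, col


def extract(grid, lat, lon, lat_max, lon_min, cell_size):
    """Return the grid value at (lat, lon), or None outside the grid."""
    row, col = cell_index(lat, lon, lat_max, lon_min, cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        return None
    return grid[row][col]


# Toy 2 x 2 layer covering latitude 56..58 and longitude 9..11, 1-degree cells.
toy_layer = [
    [1.0, 2.0],  # latitude 57..58
    [3.0, 4.0],  # latitude 56..57
]

inside = extract(toy_layer, 56.5667, 9.35, lat_max=58.0, lon_min=9.0, cell_size=1.0)
outside = extract(toy_layer, 70.0, 9.35, lat_max=58.0, lon_min=9.0, cell_size=1.0)
print(inside, outside)
```

A nearest-cell lookup like this ignores interpolation between cells, which is usually acceptable when the grid is much finer than the uncertainty of the georeferenced site.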
  16. FIGS selection is a new method to predict crop traits of primitive cultivated material from climate variables by using multivariate statistical methods.
  17. What is
      http://www.figstraitmine.org/
      Origin of concept (1980s): Wheat and barley landraces from marine soils in the Mediterranean region provided genetic variation for boron toxicity.
      [Map labels: Mediterranean region; South Australia.]
      Slide made by Michael Mackay, 1995.
  18. FIGS
      The FIGS technology takes much of the guesswork out of choosing which accessions are most likely to contain the specific characteristics being sought by plant breeders to improve plant productivity across numerous challenging environments.
      http://www.figstraitmine.org/
  19. Slide made by Michael Mackay, 1995.
  20. •  No sources of Sunn pest resistance previously found in hexaploid wheat.
      •  2 000 accessions screened at ICARDA without result (during the last 7 years).
      •  A FIGS set of 534 accessions was developed and screened (2007, 2008).
      •  10 resistant accessions were found!
      •  The FIGS selection started from 16 000 landraces from VIR, ICARDA and AWCC.
      •  Exclude origins CHN, PAK and IND, where Sunn pest was only recently reported (6 328 acc).
      •  Only one accession per collecting site (2 830 acc).
      •  Excluding dry environments below 280 mm/year.
      •  Excluding sites of low winter temperature below 10 degrees Celsius (1 502 acc).
      Slide adapted from Ken Street, ICARDA (FIGS team).
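The successive exclusions listed on this slide can be read as a filter chain. The sketch below is only an illustration under stated assumptions: the record layout and the five sample records are made up, and the thresholds are copied verbatim from the slide rather than from the original FIGS implementation.

```python
# Minimal sketch of a FIGS-style filter chain (hypothetical record layout).

def figs_filter(accessions,
                excluded_countries=("CHN", "PAK", "IND"),
                min_rain_mm=280,
                min_winter_temp_c=10):
    """Apply the slide's exclusion steps in order and return the survivors."""
    seen_sites = set()
    selected = []
    for acc in accessions:
        if acc["country"] in excluded_countries:
            continue  # origin where the pest was only recently reported
        if acc["site"] in seen_sites:
            continue  # keep only one accession per collecting site
        seen_sites.add(acc["site"])
        if acc["rain_mm_per_year"] < min_rain_mm:
            continue  # dry environment
        if acc["winter_temp_c"] < min_winter_temp_c:
            continue  # low winter temperature (threshold as given on the slide)
        selected.append(acc)
    return selected


# Made-up records for illustration only.
sample = [
    {"id": "A", "country": "SYR", "site": "s1", "rain_mm_per_year": 350, "winter_temp_c": 12},
    {"id": "B", "country": "SYR", "site": "s1", "rain_mm_per_year": 350, "winter_temp_c": 12},
    {"id": "C", "country": "CHN", "site": "s2", "rain_mm_per_year": 400, "winter_temp_c": 12},
    {"id": "D", "country": "TUR", "site": "s3", "rain_mm_per_year": 200, "winter_temp_c": 12},
    {"id": "E", "country": "IRN", "site": "s4", "rain_mm_per_year": 300, "winter_temp_c": 5},
]

kept = [acc["id"] for acc in figs_filter(sample)]
print(kept)
```

Because the climate values belong to the site rather than the accession, it does not matter whether the one-per-site step runs before or after the climate filters.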
  21. [Image slide, no text.]
  22. –  The initial model is developed from the training set.
      –  Fine tuning of model parameters and settings.
      –  No model can ever be absolutely correct.
      –  A simulation model can only be an approximation.
      –  A model is always created for a specific purpose.
      –  The simulation model is applied to make predictions based on new data.
      –  Take care to avoid extrapolation problems.
  23. –  For the initial calibration or training step.
      –  Further calibration, tuning step.
      –  Often cross-validation on the training set is used to reduce the consumption of raw data.
      –  For the model validation or goodness-of-fit testing.
      –  New external data, not used in the model calibration.
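The split into calibration data and external validation data, with cross-validation inside the training set, might look like the following sketch. The function names, the 25% test fraction and the choice of five folds are assumptions for illustration; the slides do not state which split was used.

```python
# Minimal sketch: hold-out test set plus k-fold cross-validation on the rest.
import random


def split_train_test(items, test_fraction, seed=0):
    """Shuffle reproducibly and hold out a fraction as an external test set."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(items, k):
    """Yield (calibration, held_out) pairs so each item is held out exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        calibration = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield calibration, held_out


samples = list(range(14))  # e.g. indices of 14 landraces
train, test = split_train_test(samples, test_fraction=0.25)
fold_pairs = list(k_folds(train, k=5))
print(len(train), len(test), len(fold_pairs))
```

Cross-validation reuses the training samples for tuning, so the external test set stays untouched until the final goodness-of-fit check.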
  24. [Image slide, no text.]
  25. [Image slide, no text.]
  26. [Image slide, no text.]
  27. Trial stations (two years: 2002 and 2003)

      Station                      Altitude  Latitude  Longitude
      Priekuli, Latvia             83 m      57.3167   25.3667
      Bjørke forsøksgård, Norway   149 m     60.7667   11.2167
      Landskrona, Sweden           3 m       55.8667   12.8333

  28. Landrace accessions

      accide  AccNum    Country        Locality               Elevation  Latitude  Longitude  Coordinate
      7436    NGB27     Finland        Sarkalahti, Luumäki    95 m       61.0333   27.3333    SESTO
      9717    NGB456    Norway         Dønna, Nordland        71 m       66.1167   12.5       Georeferenced
      9601    NGB468    Norway         Trysil                 400 m      61.2833   12.2833    Georeferenced
      9600    NGB469    Norway         BJØRNEBY               400 m      61.2833   12.2833    Georeferenced
      7966    NGB775    Sweden         Överkalix, Allsån      45 m       66.4      22.9333    SESTO
      8510    NGB776    Sweden         Överkalix              100 m      66.4      22.7667    SESTO
      7810    NGB792    Finland        Luusua, Kemijärvi      145 m      66.4833   27.35      SESTO
      9538    NGB2072   Norway         Finset                 1220 m     60.6      7.5        Georeferenced
      8482    NGB2565   Sweden         Öland                  11 m       56.7333   16.6667    Georeferenced
      9102    NGB4641   Denmark        Støvring, Jylland      55 m       56.8833   9.8333     Georeferenced
      9015    NGB4701   Faroe Islands  Faroe Islands          81 m       62.0167   -6.7667    Georeferenced
      9039    NGB6300   Faroe Islands  Faroe Islands          81 m       62.0167   -6.7667    Georeferenced
      8531    NGB9529   Denmark        Lyderupgaard           9 m        56.5667   9.35       Georeferenced
      7344    NGB13458  Finland        Koskenkylä, Rovaniemi  91 m       66.5167   25.8667    Georeferenced

  29. From a total of 19 landrace accessions included in the dataset, only 4 included geo-referenced coordinates in the NordGen SESTO database.
      10 accessions were geo-referenced from the reported place name and descriptions of the original gathering site included in SESTO and other sources.
      For 5 accessions there was not enough information available to locate the original gathering location.
      Right side illustration: Example of georeferencing for NGB9529, a landrace reported as originating from Lyderupgaard, using KRAK.dk and maps.google.com.
  30. [Image slide, no text.]
  31. Climate data: 3-way array of 14 landraces (location of origin) × 12 monthly means × 3 climate variables.
      Climate data (mode 3):
      •  Minimum temperature
      •  Maximum temperature
      •  Precipitation
      •  … (many more layers can be added)
      Unfolded layout: Min. temperature (Jan, Feb, Mar, …), Max. temperature (Jan, Feb, Mar, …), Precipitation (Jan, Feb, Mar, …) for 14 samples.
  32. Trait data: 3-way array of 28 records (14 landraces, x2) × 6 traits × 6 station-years.
      Mode 2 (traits):
      *  Heading days
      *  Ripening days
      *  Length of plant
      *  Harvest index
      *  Volumetric weight
      *  Grain weight (tgw)
      Mode 3:
      *  LVA 2002
      *  LVA 2003
      *  NOR 2002
      *  NOR 2003
      *  SWE 2002
      *  SWE 2003
      Unfolded layout: Bjørke (N) 2002 and 2003, Landskrona (S) 2002 and 2003, Priekuli (Lv) 2002 and 2003, each with 6 traits, for 28 records.
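The array dimensions on these two slides, and the unfolded layout in which the mode-3 slabs sit side by side, can be checked with a short sketch. Nested lists stand in for the MATLAB arrays, the values are placeholders, and the helper names are hypothetical.

```python
# Minimal sketch: shapes of the 3-way arrays and their unfolding (placeholder values).

def zeros(n_i, n_j, n_k):
    """Build an n_i x n_j x n_k array of zeros as nested lists."""
    return [[[0.0] * n_k for _ in range(n_j)] for _ in range(n_i)]


def shape(array):
    return len(array), len(array[0]), len(array[0][0])


def unfold(array):
    """One row per first-mode entry; mode-3 slabs are placed side by side."""
    n_i, n_j, n_k = shape(array)
    return [[array[i][j][k] for k in range(n_k) for j in range(n_j)]
            for i in range(n_i)]


climate = zeros(14, 12, 3)  # landraces x monthly means x (tmin, tmax, prec)
traits = zeros(28, 6, 6)    # records x traits x station-years

climate_2d = unfold(climate)  # tmin Jan..Dec, then tmax Jan..Dec, then prec Jan..Dec
traits_2d = unfold(traits)
print(shape(climate), len(climate_2d), len(climate_2d[0]))
print(shape(traits), len(traits_2d), len(traits_2d[0]))
```

Unfolding keeps every value but discards the explicit three-way structure, which is what multi-way methods such as PARAFAC and N-PLS are designed to exploit.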
  33. [Image slide, no text.]
  34. [Image slide, no text.]
  35. Mode 3 (climate variables): tmin, tmax, prec.
      [Figure: box plots of tmin, tmax and prec, raw data and after scaling across mode 3.]
      The box plot of the raw data shows that tmin, tmax and prec have very different ranges of numerical values. Scaling across mode 3 is thus applied to the multi-way models.
      The box plot on the left shows the 3-way data unfolded as tmin, tmax, prec to keep the dimensions of mode 3.
      The 3-way climate data was reasonably well described by a PARAFAC model of two components.
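One common way to put variables with very different ranges on a comparable footing is to divide each mode-3 slab by its root mean square. The sketch below assumes that convention; the slides do not say which scaling formula was applied, and the toy values are made up.

```python
# Minimal sketch: scale each mode-3 slab (climate variable) to unit root mean square.
# The RMS convention is an assumption, not confirmed by the slides.
import math


def scale_mode3(array):
    """Return a copy where every mode-3 slab has root mean square 1."""
    n_i, n_j, n_k = len(array), len(array[0]), len(array[0][0])
    scaled = [[[0.0] * n_k for _ in range(n_j)] for _ in range(n_i)]
    for k in range(n_k):
        values = [array[i][j][k] for i in range(n_i) for j in range(n_j)]
        rms = math.sqrt(sum(v * v for v in values) / len(values))
        for i in range(n_i):
            for j in range(n_j):
                scaled[i][j][k] = array[i][j][k] / rms if rms else 0.0
    return scaled


def slab_rms(array, k):
    values = [row[k] for plane in array for row in plane]
    return math.sqrt(sum(v * v for v in values) / len(values))


# Toy 2 x 2 x 2 array: slab 0 in degrees, slab 1 in millimetres (made-up values).
toy = [
    [[-5.0, 40.0], [3.0, 80.0]],
    [[-8.0, 55.0], [1.0, 120.0]],
]
scaled_toy = scale_mode3(toy)
print(round(slab_rms(scaled_toy, 0), 6), round(slab_rms(scaled_toy, 1), 6))
```

After scaling, no single variable dominates the model simply because it is recorded in larger numbers.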
  36. PARAFAC split-half (mode 1) analysis:
      The two PARAFAC models, each calibrated from one of two independent split-half subsets, both converge to a solution very similar to that of the model calibrated from the complete dataset.
      The PARAFAC model is thus a general and stable model for the scope of Scandinavia.
  37. [Image slide, no text.]
  38. •  Often the critical levels (α) for p-value significance are set to 0.05, 0.01 and 0.001.
      •  Modeling 14 samples (landraces) gives:
         –  12 degrees of freedom for the correlation tests (mean x, y).
         –  One-tailed test (looking only at positive correlation of predictions versus the reference values).
         –  A coefficient of determination (r²) larger than 0.56 is significant at the 0.001 (0.1%) level for 14 values/samples.
      Many introductory textbooks on statistics include a table of critical values for Pearson's r.
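The 0.56 threshold can be reproduced from the relation between Pearson's r and Student's t, r = t / sqrt(t² + df). The sketch below takes the one-tailed critical t for α = 0.001 and 12 degrees of freedom (3.930) from a standard t table rather than computing it, and the function name is hypothetical.

```python
# Minimal sketch: critical Pearson r from a tabulated critical t value.
import math


def critical_r(t_crit, df):
    """Convert a critical t value to the matching critical Pearson r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


n_samples = 14
df = n_samples - 2   # 12 degrees of freedom
T_CRIT = 3.930       # one-tailed, alpha = 0.001, df = 12 (from a standard t table)

r_crit = critical_r(T_CRIT, df)
r2_crit = r_crit ** 2
print(round(r_crit, 2), round(r2_crit, 2))
```

With only 14 samples the bar is high: the predictions must explain more than half of the variance in the reference values to reach the 0.1% level.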
  39. [Figure: panels for the traits Heading, Ripening, Length, H-Index, Vol wgt and TGW at Priekuli (L), Bjørke (N) and Landskrona (S).]
  40. [Figure: panels for LVA (2002), LVA (2003), NOR (2002), NOR (2003), SWE (2002) and SWE (2003).]
  41. •  Latvia 2002 (LY11)
         –  May 2002 was extremely dry in Priekuli.
         –  June 2002 was extremely wet in Priekuli.
         –  The wet June caused germination on the spikes for many of the early varieties.
      •  Landskrona 2003 (LY32)
         –  June 2003 was extremely dry in Landskrona.
         –  June was the time for grain filling here.
      •  Too extreme for the genotype to be "normally" expressed?
      •  Too large an effect from "G by E" interaction?
  42. Sowing week and monthly rainfall (mm)

      Station                      Year  Sowing week  May   June   July   August
      Bjørke forsøksgård, Norway   2002  17           82.9  67.4   128.5  136.5
                                   2003  21           75.1  85.7   67.1   53.2
      Landskrona, Sweden           2002  13           53.5  75.3   76.4   68.9
                                   2003  15           70.7  40.4   76.0   45.7
      Priekuli, Latvia             2002  17           38.2  111.1  67.0   11.3
                                   2003  19           88.0  59.2   87.8   175.8
  43. [Image slide, no text.]
  44. [Image slide, no text.]
  45. [Image slide, no text.]
  46. •  The first dataset I started to work with is a "FIGS" dataset with genebank accessions of barley (Hordeum vulgare ssp. vulgare) collected from different countries worldwide and tested for susceptibility to net blotch infection. Net blotch is a common disease of barley caused by the fungus Pyrenophora teres.
      •  The barley plants were inoculated with the fungus, and the percentage of the leaves infected with the disease was normalized to an interval scale (1 to 9).
      •  1-3 are basically resistant: group 1
      •  4-6 are intermediate: group 2
      •  7-9 are susceptible: group 3
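The grouping of the 1 to 9 scores into three classes can be written as a small helper. This is a sketch assuming integer scores; the function name is hypothetical.

```python
# Minimal sketch: map a 1-9 net blotch score to resistance groups 1-3.

def resistance_group(score):
    """Return 1 (resistant), 2 (intermediate) or 3 (susceptible) for an integer score."""
    if not 1 <= score <= 9:
        raise ValueError("score must be between 1 and 9")
    return (score - 1) // 3 + 1


groups = [resistance_group(s) for s in range(1, 10)]
print(groups)
```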
  47. Environmental variables:
      •  Agro-climatic zone (UNESCO classification)
      •  Soil classification (FAO soil map)
      •  Aridity (dryness)
      •  Precipitation
      •  Potential evapotranspiration (water loss)
      •  Temperature
      •  Maximum temperatures
      •  Minimum temperatures
      (mean values for month and year)
  48. Discriminant Analysis: obs_nb versus acz_moisture; ...
      Quadratic Method for Response: obs_nb
      Predictors: acz_moisture; acz_winter_temp; acz_summer_temp; arid_annual; pet_annual; prec_annual; temp_annual; tmax_annual; tmin_annual

      Group       1      2      3
      Count    1049   1190    234

      Summary of classification
      Put into Group    1      2      3
      1               523    427     48
      2               287    451     25
      3               238    314    163
      Total N        1048   1192    236
      N correct       523    451    163
      Proportion    0,499  0,378  0,691

      N = 2476    N Correct = 1137    Proportion Correct = 0,459

      •  The proportion correctly classified for the training dataset was 45.9%, and we would expect a similar success rate for the prediction of the "blinded" values.
      •  Remember that random classification into three groups gives 33.3%.
      •  A test set of 9 samples showed a proportion of correct classifications of 44.4%.
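The summary figures in this output follow directly from the classification table. As a quick check, the sketch below recomputes them, reading rows as the predicted group and columns as the true group; the function name is hypothetical.

```python
# Minimal sketch: recompute the classification summary from the table above.
# Rows are "put into group", columns are the true groups 1-3.

confusion = [
    [523, 427, 48],
    [287, 451, 25],
    [238, 314, 163],
]


def summarize(confusion):
    """Return column totals, correct counts, per-group and overall proportions."""
    n_groups = len(confusion)
    totals = [sum(row[c] for row in confusion) for c in range(n_groups)]
    correct = [confusion[g][g] for g in range(n_groups)]
    per_group = [correct[g] / totals[g] for g in range(n_groups)]
    overall = sum(correct) / sum(totals)
    return totals, correct, per_group, overall


totals, correct, per_group, overall = summarize(confusion)
print(totals, correct)
print([round(p, 3) for p in per_group], round(overall, 3))
```

The overall proportion is dominated by the two large groups, which is why it sits well below the 0.691 reached for the smaller susceptible group.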
  49. Acknowledgements:
      •  Michael Mackay: FIGS coordinator
      •  Ken Street: FIGS project leader
      •  Harold Bockelman: Net blotch data
      •  Eddy De Pauw: Climate data
      •  Dag Endresen: Data analysis
  50. [Image slide, no text.]