Trait data mining using FIGS (2006)



Trait Mining, prediction of agricultural traits in plant genetic resources with ecological parameters. Focused Identification of Germplasm Strategy (FIGS). For the Vavilov seminars at the IPK Gatersleben 13th June 2007. Dag Endresen, Michael Mackay, Kenneth Street.

Published in: Technology
  • 13-JUN-2007 10:00 AM. Genebank seminar room (Seminarraum der Genbank). Vavilov-Seminar. Prediction of agricultural traits in plant genetic resources with ecological parameters; Dag Terje Filip Endresen, Nordic Gene Bank, Alnarp, Sweden. Host: Dr. H. Knüpffer
  • Some keywords; the main topics of the talk
  • Image: Wine grapes, field trip during a PGR Forum workshop at the Azores (2004). Photographer Dag Terje Endresen (NGB Picture Archive, image 003869)
  • Image: A field of sugar beet at Alnarp. Sugar beet (Beta vulgaris L.). Photographer Dag Terje Endresen (NGB Picture Archive, image 003896)
  • [1] Hutchinson, G.E. (1957). Concluding Remarks. In Cold Spring Harbor Symposia on Quantitative Biology. 22: 415-42. [2] Sutton, T., R. de Giovanni, M. F. de Siqueira (2007). Introducing openModeller. OSGeo Journal volume 1, May 2007. Image: Söderåsen, Sweden, June 24 2006, Dag Endresen.
  • * Alercia A., Diulgheroff S., Metz T. (2001). Multicrop Passport Descriptors. FAO, IPGRI. * Frankel O.H. (1984). Genetic perspectives of germplasm conservation. In Arber W., Illmensee K., Peacock W.J., Starlinger P. (eds.): Genetic Manipulation: Impact on Man and Society. Cambridge University Press, for ICSU Press, Cambridge U.K., 161-170. * Frankel O.H. (1970): Preface. In Frankel O.H., Bennett E. (eds.): Genetic Resources in Plants – Their Exploration and Conservation. Oxford & Edinburgh, International Biological Programme & Blackwell Scientific Publications, 1-4. * Mackay M. C., Von Bothmer R., & Skovmand B. (2005). Conservation and utilization of genetic resources - what will happen in the future? 5th Triticeae Symposium, Prague, Czech Republic. * Image: Centers of origin. This image is a work of a United States Department of Agriculture employee, taken or made during the course of the person's official duties. As a work of the U.S. federal government, the image is in the public domain.
  • Image: Wheat spikes, Dag Endresen, 2004-07-20. Illustration: Distribution of the FIGS core set. Yellow dots represent collection sites: 700 sites, 750 accessions, 52 countries represented.
  • We can actually begin to introduce “layers” of information for a range of environmental and geographic parameters. Here we can see precipitation data for collection sites.
  • Just quickly explain the diagram.
  • These layers of information can be in the form of continuous surfaces of, for example, temperature, humidity, precipitation, salinity problems, and soil types. In this example we are looking at the probability of salinity being a problem based on soil types. This is recent work undertaken by ICARDA scientists, and the collection sites are for bread wheat landraces held by the AWCC. The great thing about GIS technology is that, in this case, we can directly assign a ‘salinity probability index’ to the germplasm collected at each individual site.
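As a rough illustration of assigning a layer value to each collection site, the sketch below samples a small regular grid at site coordinates. The grid values, the corner coordinates, the cell size, and the function name `sample_grid` are all hypothetical; a real GIS workflow would read a georeferenced raster instead.

```python
import math

def sample_grid(grid, north, west, cell_size, lat, lon):
    """Return the grid value at (lat, lon), or None if the point is off the grid.

    The grid is a list of rows; row 0 starts at latitude `north` and
    column 0 starts at longitude `west` (north-west corner origin).
    """
    row = math.floor((north - lat) / cell_size)
    col = math.floor((lon - west) / cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

# Hypothetical 2 x 3 salinity probability layer with 1-degree cells.
SALINITY = [
    [0.1, 0.4, 0.8],
    [0.2, 0.6, 0.9],
]

# Hypothetical collection sites: (accession id, latitude, longitude).
SITES = [("A1", 36.2, 37.1), ("A2", 35.5, 38.9), ("A3", 40.0, 30.0)]

for acc_id, lat, lon in SITES:
    index = sample_grid(SALINITY, 37.0, 36.0, 1.0, lat, lon)
    print(acc_id, index)
```

Sites that fall outside the layer get `None` rather than a guessed value, so they can be reviewed separately.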
  • Here we can compare the two approaches geographically. As I said earlier, this is a work in progress - and we hope to have much more information about the results that the FIGS approach can produce within a few years. But it certainly represents a new way of combining biological and environmental information to benefit PGR utilization and plant improvement.
  • We can then use these layers of information as sieves, through which we can filter our accessions to identify those most likely to contain the genetic variation we are seeking. We can “tune in” the environments from which we want to sample germplasm. For example, we can specify annual precipitation, temperature during heading, humidity at another growth stage - and so on. Then we tip our collection of accessions in the top of the FIGS machine and wait for those that filter out the bottom. These ones we evaluate thoroughly.
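The “sieve” idea can be sketched as a simple range filter. This is only a minimal illustration, not the FIGS implementation: the accession records, the parameter names, and the thresholds are invented for the example.

```python
def figs_filter(accessions, criteria):
    """Keep accessions whose site values fall inside every (low, high) range.

    `criteria` maps an environmental parameter name to an inclusive range.
    Accessions missing a required parameter are dropped.
    """
    selected = []
    for acc in accessions:
        env = acc["env"]
        if all(
            name in env and low <= env[name] <= high
            for name, (low, high) in criteria.items()
        ):
            selected.append(acc)
    return selected

# Hypothetical accessions with environmental values for their collection sites.
ACCESSIONS = [
    {"id": "W001", "env": {"annual_precip_mm": 220, "heading_temp_c": 27}},
    {"id": "W002", "env": {"annual_precip_mm": 640, "heading_temp_c": 19}},
    {"id": "W003", "env": {"annual_precip_mm": 280, "heading_temp_c": 25}},
    {"id": "W004", "env": {"annual_precip_mm": 250}},  # no temperature value
]

# Hypothetical "tuning" for a drought-and-heat set.
DROUGHT_CRITERIA = {
    "annual_precip_mm": (0, 300),
    "heading_temp_c": (24, 35),
}

subset = figs_filter(ACCESSIONS, DROUGHT_CRITERIA)
print([acc["id"] for acc in subset])
```

Each extra entry in the criteria acts as one more sieve, so the selected subset can only shrink as layers are added.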
  • OK, we have different types of PGR and a range of these are likely to provide the resources we need to breed cultivars in the future. The next question to ask is - Do we have sufficient PGR? Is our coverage of the available genetic variation good enough? Ex situ gene banks were a growth industry in the 1960s, 1970s and 1980s. Collecting missions were conducted in many parts of the globe, including Vavilov’s centres of diversity, and thousands of landraces and wild relatives were sampled and put into long term storage. But there does not seem to be a collated central register of what was sampled - at the accession level. So, it is still difficult to make definitive statements about how well we have achieved the goal of conserving genetic variation before it is displaced by modern cultivars. One way we can get an idea of how well we have sampled genetic variation is by using GIS, or geographic information system, technology. By mapping the sites where primitive varieties and wild relatives were collected we can quite easily provide an overview of the coverage achieved. It is also possible to include different layers of information, using this technology, to make decisions about collection coverage on an environmental basis as well as geographically. Dr Valkoun also demonstrated the value of GIS in his lecture on wild cereal relatives yesterday. This example shows how several gene banks can collaborate to build a ‘virtual’ primitive varieties (landraces) germplasm distribution map.
  • Image: GBIF-MAPA banner (rotated 90°). Illustration: PYXIS grid system.
  • The images are from the openModeller web site.
  • Image: Barley accessions from NGB, EURISCO and SINGER as input to the MaxEnt niche modeling algorithm, prediction of occurrences, displayed with ArcView (created by Dag Endresen during the GBIF workshop on Niche Modeling, December 2004 in Kansas, US).
  • Image: Rheum x hybridum Murray, 2004, Photographer Gitte K. Björn.
  • Dynamic Evaluation Data Analyzer. Image: Solanum tuberosum L. Potato. Light sprout. Photographer NGB (NGB Picture Archive, image 001289).
  • Dynamic Evaluation Data Analyzer. The original version was developed in SESTO.
  • Dynamic Evaluation Data Analyzer. When a trait character is selected the results are displayed split by Site, Year, Taxon, Biological status of sample, Country of origin … or any other useful categories as defined by the responsible administrator.
  • Photo: Field bean from Boreal, accession NGB11518, 2005-03-05, Dag Endresen
  • Image: Rheum x hybridum Murray, Rhubarb (2004). Photographer Gitte K. Björn (NGB Picture Archive, image 003683). Image: Rheum x hybridum Murray, Rhubarb (2004). Photographer Gitte K. Björn (NGB Picture Archive, image 003714). Image: Brassica nigra (L.) W. D. J. Koch, Black Mustard. Photographer Dag Terje Endresen (NGB Picture Archive, image 003840)
  • * IPGRI Descriptors lists (119 descriptor lists, 2005) * UPOV: The International Union for the Protection of New Varieties of Plants (French: Union internationale pour la protection des obtentions végétales) is an intergovernmental organization with headquarters in Geneva, Switzerland. * COMECON: The Council for Mutual Economic Assistance (COMECON / Comecon / CMEA / CEMA), 1949 – 1991, was an economic organisation of communist states and a kind of Eastern European equivalent to the European Economic Community. The military counterpart to the Comecon was the Warsaw Pact. * Multi-crop Passport Descriptors (MCPD): FAO (Food and Agriculture Organization of the United Nations) and IPGRI (International Plant Genetic Resources Institute). This is a revised version (December 2001) of the 1997 MCPD List. * FAO World Information and Early Warning System (WIEWS) * 19 Plant Uses Categories based on categories developed for the Working Group on Taxonomic Databases (TDWG) (Cook, Frances E.M., 1995. Economic Botany: Data Collection Standard. Royal Botanic Gardens Kew). * The mapping of MCPD to ABCD was started in 2004 by Helmut Knüpffer and Walter Berendsohn, and continued by Javier de la Torre and Dag Terje Filip Endresen in 2005.
  • * Illustration: Corn earworm pupae that will be used to produce control parasites for release in the field. Photo by Scott Bauer. * UBIF is an attempt to define a common foundation for several TDWG/GBIF standards like SDD (see SDD WIKI), ABCD (see ABCD content schema homepage) or TaxonConceptNames (see Taxonomic Concept Transfer Schema WIKI). * Unified Biosciences Information Framework (UBIF): XML schema for data exchange and integration across knowledge domains. The schema has been designed for biological data, but is applicable to other knowledge areas as well. It is based on work of the TDWG SDD and ABCD subgroups and is currently jointly authored by the SDD, ABCD, TaxonName subgroups and by GBIF (Global Biodiversity Information Facility). The framework may be used without changes for new schemata; no registration is necessary. * Complex Types are part of the UBIF infrastructure (TDWG common complex types for several schemas: ABCD, SDD, TCS, Linnean Core, etc.)
  • * The mapping of MCPD to ABCD was started in 2004 by Helmut Knüpffer and Walter Berendsohn, and continued by Javier de la Torre and Dag Terje Filip Endresen in 2005. [http://www.bgbm.org/TDWG/CODATA/Schema/Mappings/EURISCO-2-ABCD.pdf]
  • GCP_Passport v 1.03. The GCP Passport 1.03 descriptor standard is based on the MCPD and ABCD standards and implemented for the PyWrapper/BioCASE data exchange software. A mapping for automatic “upgrade” between ABCD 2.06 and GCP_Passport_1.03 is also included in the PyWrapper/BioCASE software. The Generation Challenge Programme is a research and capacity building network that uses plant genetic diversity, advanced genomic science, and comparative biology to develop tools and technologies that enable plant breeders in the developing world to produce better crop varieties for resource-poor farmers.
  • OMG-LSR, Object Management Group – Life Science Research
  • Solanum tuberosum L. Potato. Light sprout. Photographer NGB (NGB Picture Archive, image 001289).
  • Image: Field of wheat at Alnarp. Photographer Dag Terje Endresen (NGB Picture Archive, image 002981). Image: Spider in a spiderweb. Image: Dag Terje Filip Endresen in Benin. Image: Michael Mackay.

    1. 1. Cover slide: Utilization of Genetic Resources. Prediction of agricultural traits in plant genetic resources with ecological parameters. June 13, 2007, IPK Gatersleben. Dag Terje Filip Endresen, Nordic Gene Bank (NGB), Sweden; Michael Mackay, Australian Winter Cereals Collection (AWCC), Tamworth Agricultural Institute, NSW DPI, Australia; Kenneth Street, Project Coordinator, Genetic Resource Unit, ICARDA
    2. 2. TOPICS <ul><li>Utilization of genetic resources: </li></ul><ul><li>Prediction of agricultural trait values with ecological parameters </li></ul><ul><li>Distributed information network, data standards, data exchange tools </li></ul>
    3. 3. Utilization <ul><li>Utilization of genetic resources </li></ul><ul><li>Strategies to improve the utilization of the accessions in the genebank collections are of high priority to increase the genetic diversity of the food crops for enhanced food security. </li></ul><ul><li>Data access, interoperability </li></ul><ul><li>Data mining tools </li></ul>
    4. 4. Utilization, data access <ul><li>Utilization of genetic resources, data access </li></ul><ul><li>Limited access to user-friendly, interoperable documentation on genetic resources across genebank collections could be a constraint for wider use of the genetic resources conserved in genebanks. </li></ul><ul><li>Today there is no global one-stop data portal for accessing genebank accessions from all parts of the world. In Europe the EURISCO search catalogue was developed. </li></ul><ul><li>Methods to improve data exchange with web services and enhanced interoperability with data standards and ontologies will be explored further. </li></ul><ul><li>(Examples: TDWG, GBIF, BioCASE, Bioversity International) </li></ul>
    5. 5. Utilization, tools <ul><li>Utilization of genetic resources, data mining tools </li></ul><ul><li>The limited availability of powerful tools to analyze and find (mine) accessions with a higher probability of having attractive phenotypes for further crop improvement is perhaps another constraint for wider utilization. </li></ul><ul><li>The new methods under development for prediction of agrobotanical traits will be the main topic of this study. </li></ul><ul><li>This will involve the building of ecological niche models based on the ecological parameters of the site of origin for the genetic resources included in a study. </li></ul><ul><li>(FIGS, openModeller, GBIF-MAPA) </li></ul>
    6. 6. Trait mining <ul><li>Trait mining with ecological parameters </li></ul><ul><li>Landraces and other cultivars have been created during cultivation over centuries. The ecological parameters of this cultural landscape take part in shaping the cultivated plant material. </li></ul><ul><li>When searching for suitable genebank accessions for a specific crop improvement program, a breeder may have thousands of candidate accessions. </li></ul><ul><li>Often only a small number of the candidate accessions have previously been screened for the relevant phenotypic trait characters. </li></ul><ul><li>Based on the assumption that these trait characters are correlated with the ecological attributes of the site of origin, trait values can be predicted in accessions not yet screened. </li></ul>
    7. 7. Trait mining <ul><li>Trait mining with ecological parameters </li></ul><ul><li>A subset of accessions with known character expression is used to build a “habitat signature” (niche model) to rank the probability of the different “habitats” to “produce” a valuable phenotypic trait. </li></ul><ul><li>This niche model is then applied to the ecological parameters of the origin sites to predict desired trait values. </li></ul><ul><li>The first studies within the FIGS project proved this assumption and generated very interesting results. </li></ul>
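To make the “habitat signature” idea concrete, here is a deliberately simplified sketch. It is not the method used in the FIGS studies: as a stand-in, it standardizes the environmental variables, averages the sites of screened accessions that showed the trait, and ranks unscreened accessions by their distance to that average. All data and names are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so that variables with large units do not dominate."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def rank_unscreened(screened, unscreened):
    """Rank unscreened accession ids, most promising first.

    screened:   list of (env_vector, has_trait) for evaluated accessions
    unscreened: list of (accession_id, env_vector) for accessions not yet evaluated
    """
    rows = [env for env, _ in screened] + [env for _, env in unscreened]
    z = standardize(rows)
    z_screened, z_unscreened = z[: len(screened)], z[len(screened):]
    positives = [zv for zv, (_, has_trait) in zip(z_screened, screened) if has_trait]
    if not positives:
        raise ValueError("need at least one screened accession with the trait")
    signature = [mean(c) for c in zip(*positives)]  # the crude "habitat signature"
    scored = [
        (math.dist(zv, signature), acc_id)
        for zv, (acc_id, _) in zip(z_unscreened, unscreened)
    ]
    return [acc_id for _, acc_id in sorted(scored)]

# Hypothetical data: (annual precipitation in mm, mean temperature in deg C).
SCREENED = [([200, 25], True), ([220, 24], True), ([800, 12], False), ([750, 14], False)]
UNSCREENED = [("U1", [210, 26]), ("U2", [780, 13]), ("U3", [500, 19])]

print(rank_unscreened(SCREENED, UNSCREENED))
```

The ranking only says which origin sites resemble those where the trait was observed; the top candidates would still need to be evaluated in the field.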
    8. 8. Ecological Niche <ul><li>The fundamental ecological niche of an organism was formalized by Hutchinson [1] in 1957 as a multidimensional hypervolume defining the ecological conditions that allow a species to exist. </li></ul><ul><li>Full understanding of all the environmental conditions for any organism is a monumental task [2]. </li></ul><ul><li>By extrapolating from the occurrence localities together with selected associated environmental conditions such as rainfall, temperature and day length, an approximation of the fundamental niche can be made. </li></ul><ul><li>Some popular software implementations for modeling the ecological niche include BioCLIM, DesktopGARP and MaxEnt. </li></ul>
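The multidimensional niche described above can be approximated very crudely as a rectangular box in environmental space: the minimum and maximum of each variable over the known occurrence sites. The sketch below follows the spirit of envelope methods but is not the BioCLIM algorithm itself; the variables and values are hypothetical.

```python
def fit_envelope(occurrences):
    """Return one (min, max) pair per environmental variable."""
    return [(min(col), max(col)) for col in zip(*occurrences)]

def inside(envelope, site):
    """True if every variable of the site lies within the fitted range."""
    return all(low <= value <= high for (low, high), value in zip(envelope, site))

# Hypothetical occurrence sites: (annual rainfall in mm, mean temperature in deg C).
OCCURRENCES = [(310, 14.0), (450, 16.5), (380, 12.5)]

ENVELOPE = fit_envelope(OCCURRENCES)
print(ENVELOPE)                        # one range per variable
print(inside(ENVELOPE, (400, 15.0)))   # candidate site inside the box
print(inside(ENVELOPE, (900, 15.0)))   # too wet for the fitted box
```

A box ignores interactions between variables and is sensitive to outliers, which is one reason more elaborate algorithms are used in practice.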
    9. 9. Ecological Niche <ul><li>Ecological niche, trait mining </li></ul><ul><li>In this study of agrobotanical trait mining, correlation between the trait and this fundamental niche is assumed. </li></ul><ul><li>The distribution of the studied traits in geographic space will be related to an approximated fundamental ecological niche, that is, their “distribution in the ecological space”. </li></ul><ul><li>The ecological niche of a trait will be used to predict trait values in the studied dataset of genebank accessions. </li></ul>
    10. 10. Biological status of sample <ul><li>Types of plant genetic resources: </li></ul><ul><li>Wild relatives of crop species (MCPD: 100, 200) </li></ul><ul><li>Landraces, traditional cultivars (MCPD: 300) </li></ul><ul><li>Research material, genetic stocks (MCPD: 400) </li></ul><ul><li>Modern varieties, advanced cultivars (MCPD: 500) </li></ul><ul><li>Multi-Crop Passport Descriptors (SAMPSTAT). FAO, IPGRI (Alercia et al. 2001). Genetic Resources in Plants (Frankel, 1984, 1970) </li></ul><ul><li>It is more often the landraces and non-cultivated types of plant genetic resources that are targeted as sources of novel genetic variation for breeding activities, especially for overcoming biotic and abiotic stresses. (Mackay et al., 2005; Harlan, 1977) </li></ul>
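The grouping of SAMPSTAT codes listed on this slide can be written as a small lookup. The group labels follow the slide; treating more detailed codes (for example 110) as belonging to their hundred-level group is an assumption made for this sketch, and the function name is hypothetical.

```python
# Hundred-level SAMPSTAT codes grouped as on the slide.
SAMPSTAT_GROUPS = {
    100: "wild relative",
    200: "wild relative",
    300: "landrace / traditional cultivar",
    400: "research material / genetic stock",
    500: "modern variety / advanced cultivar",
}

def sampstat_group(code):
    """Map a SAMPSTAT code to its group; detailed codes are rounded down to the hundred."""
    return SAMPSTAT_GROUPS.get((code // 100) * 100, "unknown")

for code in (100, 300, 110, 500, 999):
    print(code, sampstat_group(code))
```

A lookup like this makes it easy to restrict a trait mining study to landraces and wild relatives, the material the slide singles out as the usual source of novel variation.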
    11. 11. FIGS <ul><li>Focused Identification of Germplasm Strategy </li></ul><ul><li>The FIGS technology takes much of the guesswork out of choosing which accessions are most likely to contain the specific characteristics being sought by plant breeders to improve plant productivity across numerous challenging environments. </li></ul><ul><li>The development of FIGS was a joint project involving the Australian Winter Cereals Collection (AWCC), Tamworth, Australia, the International Center for Agricultural Research in the Dry Areas (ICARDA) in Syria, and the N. I. Vavilov Research Institute of Plant Industry (VIR) in St. Petersburg, Russia. </li></ul>
    12. 12. FIGS <ul><li>The Focused Identification of Germplasm Strategy (FIGS) exploits the relationships between genotype and environment to select sets of collected germplasm containing specified genetic variation. </li></ul><ul><li>The coordinates of the collection sites provide the link between germplasm and the environment where it evolved over millennia. </li></ul><ul><li>Using geographic information system (GIS) technology, each collection site can be individually profiled for available environmental parameters such as precipitation, humidity, temperature, agro-climatic zoning, and soil characteristics. </li></ul>
    13. 13.
    14. 14. Long-term average precipitation for all collection sites
    15. 15. Logical process Select Parents Identify the Problem Understand Problem Information & Knowledge Identify Likely Accs. Evaluate Sub Set Breeding & Selection Cultivar
    16. 16. Diagram: genebanks (VIR, ICARDA, AWCC, possibly USDA and IPK) feed a database and GIS; trait-specific selection produces several FIGS sets, which go to evaluation.
    17. 17. FIGS salinity set
    18. 18. Core and FIGS drought sets  Core accessions  FIGS accessions
    19. 19. After M C Mackay 1995
    20. 20. Distribution of 17,000 bread wheat landraces ICARDA, Aleppo, Syria VIR, St Petersburg, Russia AWCC, Tamworth, Australia A virtual collection from these gene banks:
    21. 21. FIGS: What is the Focused Identification of Germplasm Strategy? Origin of concept: the boron toxicity example for wheat and barley from the late 1980s
    22. 22. Online web application <ul><li>GBIF-MAPA is a similar project that served as inspiration for the PhD research. </li></ul><ul><li>The trait mining methods to be developed in the PhD study will be implemented as a public online web application as well as a downloadable desktop tool. </li></ul><ul><li>Users will be able to extract occurrence data from the GBIF index and environmental parameters for the sites in a similar manner as with the GBIF-MAPA application. </li></ul>
    23. 23. GBIF-MAPA <ul><li>GBIF-MAPA: Mapping and Analysis Portal Application </li></ul><ul><li>Survey Gap Analysis. The survey gap analysis (SGA) tool helps you design a biodiversity survey that will best complement the existing survey effort by identifying those areas least well surveyed in terms of environmental conditions. </li></ul><ul><li>Species Richness Assessment. Use this tool to provide an estimate, from GBIF data, of the number of species recorded in an area; and to gain insight into the adequacy of sampling based on abundance distributions for those species. </li></ul><ul><li>Environment Values Extraction. Query a range of environmental layers (e.g. climate) using GBIF species record point data to create a table showing the environmental values at those points. This data can then be used in your own statistical analyses. </li></ul><ul><li>The main target of the GBIF-MAPA is users who have a focus on conservation planning and habitat conservation. </li></ul><ul><li>GBIF-MAPA is developed by researchers from the Australian Museum, the University of Colorado (Boulder, Colorado, USA) and the New South Wales Department of Environment and Conservation. </li></ul>
    24. 24. openModeller The openModeller project aims to provide a flexible, user-friendly, cross-platform environment where the entire process of conducting a fundamental niche modeling experiment can be carried out. The software includes facilities for reading species occurrence and environmental data, selection of environmental layers on which the model should be based, creating a fundamental niche model and projecting the model into an environmental scenario. The project is currently being developed by the Centro de Referência em Informação Ambiental (CRIA), Escola Politécnica da USP (Poli), and Instituto Nacional de Pesquisas Espaciais (INPE) as an open-source initiative.
    25. 25. openModeller <ul><li>openModeller was initiated in 2003 by CRIA (Brazil). </li></ul><ul><li>Developed in C++ and cross-platform: MS Windows, Mac OS X, and Linux. </li></ul><ul><li>Open source, freely available under the GPL license. </li></ul><ul><li>It has a plug-in architecture, and plug-ins are available today for a number of fundamental niche modeling algorithms (Bioclim [Bioclimatic Envelopes], GARP, CSM [Climate Space Model], Environmental Distance and others). </li></ul><ul><li>There is a user-friendly desktop version, a web service API based on SOAP, a CGI application and a console interface for the command line. </li></ul><ul><li>Occurrence data can be retrieved directly from GBIF in the openModeller Desktop application to start a new experiment. </li></ul><ul><li>openModeller Desktop comes with a mapping module to visualize the predicted niche model (species) distribution on a map view. </li></ul>
    26. 26. <ul><li>Phenotype agricultural traits </li></ul>
    27. 27. Phenotype data <ul><li>NGB has developed a tool to preview evaluation and characterization data. © NGB, 2003, GPL 2.0, (Morten Hulden, 2001) </li></ul><ul><li>Dynamic Evaluation Data Analyzer </li></ul><ul><li>When an individual trait character is selected the results are displayed split by </li></ul><ul><ul><li>variation by observation site </li></ul></ul><ul><ul><li>variation by observation year </li></ul></ul><ul><ul><li>variation in observed taxon </li></ul></ul><ul><ul><li>variation in observed biological status of sample </li></ul></ul><ul><ul><ul><li>(wild relative, landrace, advanced cultivar…) </li></ul></ul></ul><ul><ul><li>variation by country of origin </li></ul></ul><ul><ul><li>variation in observed accessions </li></ul></ul><ul><li>… or any other useful categories as defined by the responsible administrator. </li></ul>
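The “split by category” view can be illustrated with a small grouping function. This is only a sketch of the idea, not the code of the NGB tool; the observation records and field names are invented.

```python
from collections import defaultdict
from statistics import mean

def summarize(records, key):
    """Group trait observations by one category and return (count, mean) per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["value"])
    return {category: (len(values), mean(values)) for category, values in groups.items()}

# Hypothetical plant height observations (cm) for one trait.
OBSERVATIONS = [
    {"accession": "NGB1", "site": "Alnarp", "year": 2003, "value": 80},
    {"accession": "NGB2", "site": "Alnarp", "year": 2004, "value": 100},
    {"accession": "NGB1", "site": "Jokioinen", "year": 2003, "value": 70},
]

print(summarize(OBSERVATIONS, "site"))
print(summarize(OBSERVATIONS, "year"))
```

Because the category is passed in as a key, the same function covers site, year, taxon, country of origin or any other field the administrator chooses to expose.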
    28. 28. Phenotype
    29. 29. <ul><li>Template slide </li></ul>
    30. 30. … screen dump cropped Screen dump continued…
    31. 31. <ul><li>Data Standards </li></ul>
    32. 32. TDWG :: SDD <ul><li>Structured Descriptive Data </li></ul><ul><li>In taxonomy, descriptive data takes a number of very different forms. </li></ul><ul><li>Natural-language descriptions are semi-structured, semi-formalised descriptions of a taxon (or occasionally of an individual specimen). They may be simple, short and written in plain language (if used for a popular field guide), or long, highly formal and using specialised terminology when used in a taxonomic monograph or other treatment. </li></ul><ul><li>The goal of the SDD standard is to allow capture, transport, caching and archiving of descriptive data in all the forms shown above, using a platform- and application-independent, international standard. Such a standard is crucial to enabling lossless porting of data between existing and future software platforms including identification, data-mining and analysis tools, and federated databases. </li></ul><ul><li>Hagedorn, G.; Thiele, K.; Morris, R. & Heidorn, P. B. 2005. The Structured Descriptive Data (SDD) w3c-xml-schema, version 1.0. [http://www.tdwg.org/standards/116/]. [Retrieved 05-May-2007] </li></ul>
    33. 33. Crop Descriptors <ul><li>The Bioversity International (IPGRI) crop descriptors are developed to standardize characterization and evaluation data – called “descriptive data” in TDWG context. </li></ul><ul><li>The MCPD (Multi Crop Passport Descriptors) is designed to standardize &quot;passport data&quot; across crops. It enables compatibility with the crop specific descriptor lists and the FAO World Information and Early Warning System (WIEWS) and serves as a basis for data exchange. </li></ul><ul><li>The MCPD descriptor list was made fully compatible with ABCD 2.06 </li></ul>
    34. 34. Taxonomic Databases Working Group <ul><li>Darwin Core 2 - &quot;Element definitions designed to support the sharing and integration of primary biodiversity data&quot;. </li></ul><ul><li>Access to Biological Collection Data (ABCD) 2.06 - &quot;An evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data)&quot;. </li></ul>
    35. 35. ABCD: Access to Biological Collection Data <ul><li>ABCD is a common data specification for data on biological specimens and observations (including the plant genetic resources seed banks). </li></ul><ul><li>The design goal is to be both comprehensive and general (about 1200 elements). </li></ul><ul><li>Development of the ABCD started after the 2000 meeting of the TDWG. </li></ul><ul><li>ABCD was developed with support from TDWG/CODATA, ENHSIN, BioCASE, and GBIF. </li></ul><ul><li>The MCPD descriptor list is completely mapped to, and compatible with, ABCD 2.06 </li></ul>
    36. 36. Generation Challenge Programme, GCP_Passport_1.04 <ul><li>The Generation Challenge Programme is a research and capacity building network that uses plant genetic diversity to produce better crop varieties for resource-poor farmers. </li></ul><ul><li>In the context of the GCP (Generation Challenge Programme), the GCP Passport data exchange schema was developed. </li></ul>
    37. 37. W3C :: RDF <ul><li>Resource Description Framework </li></ul><ul><li>Scenario: You have a dataset of genebank accessions with pointers to the source datasets of the holding genebanks. You produce phenotypic evaluation data on accessions in this dataset. You find evaluation data from other sources on some of the accessions in your dataset. Some of the evaluation data are produced in areas of different day length, rainfall, soils… Some of the accessions in your dataset originate from areas of higher population densities; other accessions originate from more natural habitats. Unfortunately most of the different sources of information are located on different web sites and it is difficult to bring the information together. </li></ul><ul><li>You would need to go through more or less the same process as other researchers in many domains: gathering heterogeneous data from multiple sources, then combining and analysing it. This is the challenge that faces the web as a whole and is being addressed by the Semantic Web project. </li></ul><ul><li>RDF can help you relate information from different sources. </li></ul><ul><li>An RDF triple looks like this: subject-predicate-object </li></ul><ul><li><rdf:Description rdf:about=&quot;;> </li></ul><ul><li> <dc:creator>John Smith</dc:creator> </li></ul><ul><li></rdf:Description> </li></ul>Illustration: tag cloud of related keywords (for example LSID, accession number, GUID, ontology, landrace, RDF, ABCD, SDD, semantic web, plant genetic resources, germplasm, agricultural traits)
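The subject-predicate-object idea can be shown without any RDF library by treating each statement as a plain tuple. This is a teaching sketch only: the identifiers under `example.org`, the predicate names, and the values are all hypothetical, and a real application would use an RDF toolkit and shared vocabularies.

```python
def describe(triples, subject):
    """Collect everything the given sources say about one subject."""
    description = {}
    for s, predicate, obj in triples:
        if s == subject:
            description.setdefault(predicate, []).append(obj)
    return description

ACCESSION = "http://example.org/accession/NGB11518"  # hypothetical identifier

# Passport data from the holding genebank (hypothetical source 1).
PASSPORT = [
    (ACCESSION, "sampstat", "300"),
    (ACCESSION, "countryOfOrigin", "SWE"),
]

# Evaluation data published elsewhere (hypothetical source 2).
EVALUATION = [
    (ACCESSION, "daysToFlowering", "62"),
    ("http://example.org/accession/OTHER", "daysToFlowering", "75"),
]

# Because both sources use the same subject identifier, merging is just concatenation.
merged = describe(PASSPORT + EVALUATION, ACCESSION)
print(merged)
```

The point of the example is that a shared identifier for the subject is what lets statements from different web sites be combined.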
    38. 38. W3C/TDWG :: LSID <ul><li>Life Science IDentifiers </li></ul><ul><li>LSID is a digital name tag. </li></ul><ul><li>LSIDs are GUIDs, Globally Unique Identifiers. </li></ul><ul><li>Structure: urn:lsid:authority:namespace:object:revision </li></ul><ul><li>Example (fictive) </li></ul><ul><li>The LSID concept introduces a straightforward approach to naming and identifying data resources stored in multiple, distributed data stores. </li></ul><ul><li>LSID defines a simple, common way to identify and access biologically significant data; whether that data is stored in files, relational databases, in applications, or in internal or public data sources, LSID provides a naming standard to support interoperability. </li></ul><ul><li>Developed by OMG-LSR and W3C, implemented by IBM. </li></ul>
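The structure given on the slide can be parsed with a few lines of code. The sketch below assumes the revision part is optional, and the LSID used in the usage lines is made up for illustration; it is not a registered identifier.

```python
def parse_lsid(lsid):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("expected 5 or 6 colon-separated parts: %r" % lsid)
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: %r" % lsid)
    authority, namespace, obj = parts[2:5]
    if not (authority and namespace and obj):
        raise ValueError("authority, namespace and object must be non-empty")
    return {
        "authority": authority,
        "namespace": namespace,
        "object": obj,
        "revision": parts[5] if len(parts) == 6 else None,
    }

# Hypothetical LSID for a genebank accession.
print(parse_lsid("urn:lsid:example.org:accessions:NGB11518:1"))
print(parse_lsid("urn:lsid:example.org:accessions:NGB11518"))
```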
    39. 39. <ul><li>Biodiversity data exchange tools </li></ul>
    40. 40. Data Provider Software <ul><li>DiGIR, Distributed Generic Information Retrieval. </li></ul><ul><li>PyWrapper, based on the BioCASE Python wrapper software. </li></ul>
    41. 41. Decentralized model
    42. 42. Outlook <ul><li>The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community. </li></ul><ul><li>Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work. </li></ul><ul><li>Use of GBIF technology and integration of PGR collection data into GBIF allows PGR users to simultaneously search PGR collections and other biodiversity collections, and to get access to the data (and possibly the material) of relevant biodiversity collections. </li></ul><ul><li>The same data sharing methods can also be applied to germplasm trait data. Agrobotanical phenotype characters could be described by common global standard and shared with the same data exchange tools as the accession passport data. </li></ul>
    43. 43. Thank you for listening!