GCP GRM Theme Reports - Crop Information

Progress on Data Management

1. We have established a good network of crop informatics curators who form the nucleus of an Informatics COP, which will have a social space on the IBPortal. There is a lot going on in the curation of public Crop Information for each crop.

2. Arllet, Clarissa and I are working steadily to make that public information accessible through the IBPortal, either by download or through online query tools.

3. The RI groups are doing a good job of inventorying data sets that are, or will be, available from GCP research. There are several options for publishing that data:
a) By upload to public CI databases by the Crop Curator of the CLC.
b) By sending through GCP for help with formatting before upload.
c) By publication in data-type-specific repositories.
d) By deposition in the GCP central registry.

4. We are making progress on the IBP tools, and prototypes are available for download from the IBPortal. Updates will be posted there continually, and there are facilities there for feedback, bug reports and feature requests.

5. We have distributed the IB Fieldbook to many of you; this is still a prototype for your feedback. We anticipate the first release, with full functionality for storage and retrieval of data to and from the IB database, in the next few days.

6. We have chosen the Galaxy Tablet for evaluation for data collection in the field. This updates the iPAQ and Honeywell devices some of you tested last year and addresses many of the concerns you had with those tools (screen size, battery life, weight, etc.). The downside is that these modern tools use the Android operating system, which means that we have to catch up by reprogramming the Fieldlog to run on the new system. Prototypes of Fieldroid will be available from the IBPortal.

Report on the Brainstorming Sessions

1. Data most useful for accelerating crop improvement

Pedigree information and evaluation data are the most valuable to breeders for choosing parents and focusing crop improvement. The evaluation data need to be accompanied by characterization data for biophysical conditions and biotic constraints. Breeders would prefer peer-reviewed data but would use data from trusted colleagues. Breeders' recommendations could also provide useful guidance.

Fingerprinting data on elite lines would be valuable for diversity maintenance, and genotypic data for known genes and validated markers would be useful for trait selection.

2. Publication of Crop Information through the Crop Lead Centers
The CLCs present in the discussions recognized a responsibility to support crop improvement for mandate crops in any way possible, including integration and publication of crop information from external partners.

Partners present recognized the advantage of seeing their own data integrated with public information, but had concerns about policy and authority to publish through the CLCs. Incorporation of the reformed CGIAR with global international status could facilitate a solution to the policy issues in some cases.

3. Management and sharing of geo-referenced and time-indexed data

The importance and usefulness of biophysical characterization of evaluation sites was reaffirmed. It was recognized that such data are often anecdotal or separated from the evaluation data, so more care is needed in storing and publishing them.

Use and publication of socioeconomic data were found to be woefully inadequate, and most groups had at least one horror story about an expensive technology that could not be adopted because of socioeconomic constraints that should have been seen at an early stage of technology development, including crop improvement.

Informatics meetings

1. Crop Ontologies and Trait Dictionaries

The first task is to establish the trait dictionary. This may take some work to review and agree on all the traits in it, but once that is done little further work will be needed, perhaps one or two new traits per season.

The second task is to create the trait templates, a job performed by informatics staff in collaboration with breeders at the planning stage of a project.

An ontology of traits may be useful in upstream biological research, but not at the field level, because it lacks two key components: the protocols for measuring those traits, and the scales or units in which those traits are reported in the field books.
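To make the trait/method/scale relationship concrete, here is a minimal Python sketch of a trait dictionary. It assumes, in line with the methods-and-scales and ontology-structure notes in this section, that each scale belongs to exactly one method and each method to exactly one trait, so a scale identifier alone resolves the full triplet; trait names are stored per language to reflect the multilingual requirement. All class names, fields and identifiers (Trait, Method, Scale, TraitDictionary, "T1", "S1") are hypothetical and do not reflect the actual ICIS or IBP schema.

```python
# Hypothetical sketch of a trait dictionary; not the actual ICIS/IBP schema.
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Trait:
    trait_id: str
    names: Dict[str, str]  # language code -> trait name


@dataclass
class Method:
    method_id: str
    trait_id: str  # parent trait measured by this protocol
    protocol: str


@dataclass
class Scale:
    scale_id: str
    method_id: str  # parent method whose readings use this scale
    unit: str


class TraitDictionary:
    """Stores traits, methods and scales once each, keyed by identifier."""

    def __init__(self) -> None:
        self.traits: Dict[str, Trait] = {}
        self.methods: Dict[str, Method] = {}
        self.scales: Dict[str, Scale] = {}

    def add(self, trait: Trait, method: Method, scale: Scale) -> None:
        # Enforce the hierarchy: trait (grandparent) -> method (parent) -> scale (child).
        if method.trait_id != trait.trait_id or scale.method_id != method.method_id:
            raise ValueError("scale must belong to the method, and the method to the trait")
        self.traits[trait.trait_id] = trait
        self.methods[method.method_id] = method
        self.scales[scale.scale_id] = scale

    def resolve(self, scale_id: str) -> Tuple[Trait, Method, Scale]:
        # A scale identifier alone implies its method and its trait.
        scale = self.scales[scale_id]
        method = self.methods[scale.method_id]
        return self.traits[method.trait_id], method, scale

    def trait_name(self, scale_id: str, lang: str = "en") -> str:
        # Return the trait name in the requested language, falling back to English.
        trait, _, _ = self.resolve(scale_id)
        if lang in trait.names:
            return trait.names[lang]
        return trait.names["en"]
```

With this layout, annotating a data column with only a scale identifier such as "S1" is enough to recover the method and trait, which is the reason given for using scale identifiers when annotating data.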
Adding these two components (measurement protocols and scales) to the crop ontology turns it into a crop trait dictionary.

a) Multilingual capabilities are needed:
- Templates must indicate the language in which the traits are expressed.
- Templates must contain a trait identifier column when existing traits are updated with definitions in another language.
- API requests should indicate the language.
b) Methods and scales
- ICIS groups trait, method and scale together as a TRIPLET in the variate table.
- How do we handle ICIS variate tables? Variate entries are created only when data are associated with a trait/method/scale triplet.
- In the ontology, the trait is the grandparent, the method the parent and the scale the child.
- When annotating data, the scale identifier will be used, since it implies a specific method and trait.

c) Ontology structure
- Traits should not be repeated; for example, plant height should not be duplicated for each crop but should belong to a single ontology.
- Each trait, category, method and scale is identified in a collection, without relationships.
- Working terms (combinations of trait, method and scale under a specific crop) are nodes of a tree referencing the terms described in the previous point.
- Each node of the tree represents a unique combination of terms, so that the cassava plant height method will have an identifier different from the coconut plant height method.

d) Synchronization between the tool and the fieldbook/database
- Workflow step 1 is user-driven, as the Trait Dictionary is new: manual annotation of the local data dictionary, upload of the Trait Dictionary to our tool, and generation of the annotated template to be uploaded to the local database.
- A feature is being developed to upload directly from the Trait Dictionary into the tool, with direct display as a tree, mapping template columns to allocate the information under traits, methods and scales, and adding predicates and attributes.
- When a new local database is installed, the Trait Dictionary can be synchronized automatically with the ontology to populate the new database in the proper language.
- New traits: after synchronization with the central database, the annotation will be returned to the local database in the next iteration.

2. Technical Issues for curation and use of trait dictionaries

There was a discussion on handling the triple terms (trait, scale and method). We have IDs for each component as well as IDs for all valid triplets.

3. Data Uploading

William Eusebio from IRRI gave a tutorial to Crop Curators on the procedure for uploading data from a local database to the central database. William's procedures can be used as a sequence of SQL macros, and they are being integrated into an Administrator's Tool by Shawn Yates from AAFC. The prototype is available on CropForge:

4. Analytical Pipeline
a. Establish a source repository on CropForge (for R scripts and the Java wave interface).
b. Relocate the prototype wave interface as soon as possible for feedback; the objective is to have a menu system into which R scripts can be plugged easily.
c. Specify standards for data structures and parameter passing for R-script developers.
d. Develop data access facilities for reading standard tools and datasets from the database.

5. Simulation
a. Develop a training module.
b. Explore opportunities for simulating strategic and tactical issues with IBP use cases.

Future direction for the Crop Information Theme

As with most aspects of GCP activities, the direction for the last lap of Phase II is more or less set, and no issues were raised at GRM that require major changes:

1. Informatics Infrastructure
Develop and deploy the Configurable Workflow System.

2. User Support
Support the use of GCP tools and best practices, including standards (ontology, trait dictionaries), data management, analysis, and decision support for traditional and emerging technologies (NGS and GWS).

3. Research
a. Tools and methods for emerging technologies (NGS, GWS).
b. Improve statistical analysis (mapping, QTL analysis, QTLxE, selection indices).
c. BLUPs for haplotypes in GWS.
d. Trait dictionaries and ontologies.
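The Analytical Pipeline plan from the informatics meetings calls for a menu system into which R scripts can be plugged easily, plus a standard for passing parameters to those scripts. The Python sketch below illustrates one way such a registry could work; it is not the Java wave interface. It assumes each script declares its required parameters and receives them as `--name=value` arguments through `Rscript`; the class names, script path and parameter names are hypothetical.

```python
# Hypothetical sketch of a menu registry for R scripts; names and paths are illustrative.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MenuEntry:
    script_path: str
    required_params: Tuple[str, ...]


class ScriptMenu:
    """Maps menu labels to R scripts and builds their command lines."""

    def __init__(self) -> None:
        self._entries: Dict[str, MenuEntry] = {}

    def register(self, label: str, script_path: str, required_params: List[str]) -> None:
        # Plugging in a new script only requires registering it under a menu label.
        self._entries[label] = MenuEntry(script_path, tuple(required_params))

    def labels(self) -> List[str]:
        # Menu items in alphabetical order.
        return sorted(self._entries)

    def command(self, label: str, **params: str) -> List[str]:
        entry = self._entries[label]
        missing = [p for p in entry.required_params if p not in params]
        if missing:
            raise ValueError(f"missing parameters for {label!r}: {missing}")
        # Assumed convention: declared parameters are passed as --name=value, in
        # declared order; parameters the script does not declare are ignored.
        args = [f"--{name}={params[name]}" for name in entry.required_params]
        return ["Rscript", entry.script_path, *args]
```

The returned list could be passed to `subprocess.run`, and the R script would read its arguments with `commandArgs(trailingOnly = TRUE)`. Fixing the argument convention in one place is what lets R-script developers add menu items without touching the interface code.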