1. GCP GRM Theme Reports - Crop Information
Progress on Data Management
1. We have established a good network of crop informatics curators who form the nucleus
of an Informatics COP which will have a social space on the IBPortal. There is a lot
going on for the curation of public Crop Information for each crop.
2. Arllet, Clarissa and I are working steadily to make that public information accessible by
download or on-line query tools through the IB Portal.
3. The RI groups are doing a good job of inventorying data sets which are available or will
be available from GCP research. There are several options available for publication of
that data:
a) By upload to Public CI databases by the Crop Curator of the CLC.
b) By submission through GCP for help with formatting before upload.
c) By publication in data-type specific repositories.
d) By deposition in the GCP central registry.
4. We are making progress on the IBP tools, and prototypes are available for download from
the IBPortal. Updates will be posted there regularly, and the portal has facilities for
feedback, bug reports and feature requests.
5. We have distributed the IB Fieldbook to many of you – this is still a prototype for your
feedback. We anticipate the first release with full functionality for storage and retrieval of
data in the IB database in the next few days.
6. We have chosen the Galaxy Tablet for evaluation for data collection in the field. This
updates the IPAQ and Honeywell devices some of you tested last year and addresses
many of the concerns you all had with those tools - screen size, battery life, weight etc.
The downside is that the modern tools use the Android operating system, which means
that we have to catch up by reprogramming the fieldlog to run on the new system.
Prototypes of Fieldroid will be available from the IBPortal.
Report on the Brainstorming Sessions
1. Data most useful for accelerating crop improvement
Pedigree information and evaluation data are most valuable to breeders for choosing parents
and focusing crop improvement. The evaluation data needs to have characterization data for
biophysical conditions and biotic constraints. Breeders would prefer peer reviewed data but
would use data from trusted colleagues. Breeders’ recommendations could also provide
useful guidance.
Fingerprinting data on elite lines would be valuable for diversity maintenance, and genotypic
data for known genes and validated markers would be useful for trait selection.
2. Publication of Crop Information through the Crop Lead Centers
The CLCs present in the discussions recognized a responsibility to support crop improvement
for mandate crops in any way possible including integration and publication of crop
information from external partners.
Partners present recognized the advantage of seeing their own data integrated with public
information but had concerns about policy and authority to publish through the CLCs.
Incorporation of the reformed CGIAR with global international status could facilitate a
solution to the policy issues in some cases.
3. Management and sharing of geo-referenced and time indexed data
The importance and usefulness of biophysical characterization of evaluation sites was
reaffirmed. It was recognized that data is often anecdotal or separated from the evaluation
data so there is a need to take more care in storing and publishing this data.
Use and publication of socioeconomic data was found to be woefully inadequate and most
groups had at least one horror story about expensive technology which was not adoptable
because of socioeconomic constraints which should have been seen at an early stage in
technology development – including crop improvement.
Informatics meetings
1. Crop Ontologies and Trait Dictionaries
The first task is to establish the trait dictionary. This may take some work, because all the
traits in it must be reviewed and agreed on, but once that is done little further effort will be
needed, perhaps one or two new traits per season.
The second task is to create the trait templates, which is a job that will be performed by
informatics people in collaboration with breeders. This task will be performed at the planning
stage of a project.
An ontology of traits may be useful in upstream biological research, but not at the field level,
because it lacks two key components: the protocols for measuring those traits and the scales
or units in which those traits are reported in the field books. So, if you add these two
components to the crop ontology you create a crop trait dictionary.
a) Multilingual capabilities:
Multilingual support is needed.
Templates must indicate the language in which the traits are expressed.
Templates must contain a trait identifier column when updating existing traits with
definitions in other languages.
API requests should indicate the language.
b) Methods & scales
ICIS groups trait, method and scale together as a TRIPLET in the variate table.
How do we handle ICIS variate tables? Variate entries are created only when
data is associated with a trait/method/scale triplet.
The ontology makes the trait the grandparent, the method the parent and the scale the child.
When annotating data, it is the scale identifier that will be used, since it implies a
specific method and trait.
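The parent chain described above can be sketched as follows. This is only an illustration of why a scale identifier is enough to annotate data; the class names, the identifiers (T001, M001, S001) and the example method text are assumptions, not the ICIS or ontology schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trait:
    trait_id: str
    name: str


@dataclass(frozen=True)
class Method:
    method_id: str
    name: str
    trait: Trait      # parent of the method


@dataclass(frozen=True)
class Scale:
    scale_id: str
    name: str
    method: Method    # parent of the scale


def resolve_triplet(scale_id, scales):
    """Recover the (trait, method, scale) triplet from a scale identifier alone."""
    scale = scales[scale_id]          # raises KeyError for an unknown scale
    return scale.method.trait, scale.method, scale


# Hypothetical example entries, for illustration only.
plant_height = Trait("T001", "plant height")
ruler = Method("M001", "ruler, soil surface to tip", plant_height)
cm = Scale("S001", "cm", ruler)
SCALES = {cm.scale_id: cm}

trait, method, scale = resolve_triplet("S001", SCALES)
print(trait.name, "|", method.name, "|", scale.name)
```

Because each scale has exactly one method and each method exactly one trait, a data point annotated with the scale identifier needs no separate trait or method column.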
c) Ontology structure
Traits should not be repeated, e.g. plant height should not be duplicated for each
crop, but it should belong to a single ontology.
Each trait, category, method and scale is identified in a collection, without
relationships.
Working terms (combination of trait, method and scale under a specific crop) are
nodes of a tree referencing terms described in the previous point.
Each node of the tree represents a combination of terms that is unique, so that
cassava plant height method will have an identifier different from coconut plant
height method.
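A minimal sketch of this structure, assuming a flat term collection plus one tree per crop. The term keys, the Node class and the working_term helper are hypothetical names used for illustration, not the actual tool.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

# Flat collection: each term is defined once, with no relationships.
TERMS = {
    "trait:plant_height": "Plant height",
    "method:ruler": "Measure with a ruler from soil surface to tip",
    "scale:cm": "centimetres",
}

_node_ids = count(1)  # source of unique node identifiers


@dataclass
class Node:
    term_id: Optional[str]   # reference into TERMS; None for a crop root
    label: str
    node_id: int = field(default_factory=lambda: next(_node_ids))
    children: list = field(default_factory=list)

    def add(self, term_id):
        """Attach a child node that references a term in the flat collection."""
        child = Node(term_id, TERMS[term_id])
        self.children.append(child)
        return child


def working_term(crop_root, trait_id, method_id, scale_id):
    """Build a trait > method > scale path under one crop."""
    trait = crop_root.add(trait_id)
    method = trait.add(method_id)
    scale = method.add(scale_id)
    return trait, method, scale


cassava = Node(None, "cassava")
coconut = Node(None, "coconut")
_, cassava_method, _ = working_term(cassava, "trait:plant_height", "method:ruler", "scale:cm")
_, coconut_method, _ = working_term(coconut, "trait:plant_height", "method:ruler", "scale:cm")

# Same shared term, but a different node identifier per crop.
print(cassava_method.term_id == coconut_method.term_id)
print(cassava_method.node_id != coconut_method.node_id)
```

The sketch always creates a new path; a real tool would presumably reuse an existing trait node under the same crop rather than duplicate it.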
d) Synchronization between the tool and fieldbook/database:
Workflow step 1 is user driven because the Trait Dictionary is new: manual annotation,
local data dictionary, upload of the Trait Dictionary to our tool, and generation of
the annotated template to be uploaded to the local database. A feature is being
developed to upload directly from the Trait Dictionary into the tool, with direct
display as a tree, mapping template columns to allocate the information under trait,
method and scale, and to add predicates and attributes.
At the installation of a new local database, we can automatically synchronize
the Trait Dictionary with the ontology and populate the new
DB in the proper language.
For a new trait, the central database is synchronized first, and the
annotation is returned to the local DB at the next iteration.
2. Technical Issues for curation and use of trait dictionaries
There was a discussion on handling the triple terms - trait, scale and method. We have ids
for each component as well as ids for all valid triplets.
3. Data Uploading
William Eusebio from IRRI gave a tutorial to Crop Curators on the procedure for
uploading data from a local database to the central database. William’s procedures can be
used as a sequence of SQL macros, and they are being integrated into an Administrator’s
Tool by Shawn Yates from AAFC. A prototype of this tool is available on CropForge:
4. Analytical Pipeline
a. Establish a source repository on CropForge (for R-scripts and the Java wave
interface).
b. Relocate the prototype wave interface ASAP for feedback. The objective is to have a
menu system into which R-scripts can be plugged easily.
c. Specify standards for data structure and parameter passing for R-script
developers.
d. Develop data access facilities for reading standard tools and datasets from the
database.
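The idea of a menu system into which scripts can be plugged, with a standard data structure and parameter passing, can be sketched as below. The pipeline itself is planned around R-scripts and a Java interface; Python is used here purely for illustration, and the register decorator, the MENU table and the list-of-rows dataset convention are all assumptions.

```python
from typing import Callable, Dict, List

# Assumed convention: every analysis takes (dataset, params) and returns a dict.
AnalysisFn = Callable[[List[dict], dict], dict]
MENU: Dict[str, AnalysisFn] = {}


def register(menu_label):
    """Decorator that plugs an analysis into the menu under a label."""
    def decorator(fn):
        if menu_label in MENU:
            raise ValueError(f"menu entry already exists: {menu_label}")
        MENU[menu_label] = fn
        return fn
    return decorator


@register("Trait mean")
def trait_mean(dataset, params):
    """Example analysis: mean of one trait column, skipping missing values."""
    column = params["trait"]
    values = [row[column] for row in dataset if row.get(column) is not None]
    if not values:
        raise ValueError(f"no values for trait: {column}")
    return {"trait": column, "n": len(values), "mean": sum(values) / len(values)}


def run(menu_label, dataset, params):
    """Look up a menu entry and call it with the standard arguments."""
    return MENU[menu_label](dataset, params)


# Hypothetical dataset in the assumed list-of-rows structure.
data = [
    {"entry": 1, "plant_height": 100.0},
    {"entry": 2, "plant_height": 110.0},
    {"entry": 3, "plant_height": None},
]
result = run("Trait mean", data, {"trait": "plant_height"})
print(result)
```

With a fixed calling convention, adding an analysis means writing one function and registering it; the menu code does not change.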
5. Simulation
a. Development of a training module
b. Explore opportunities for simulation of strategic and tactical issues with IBP use
cases.
Future direction for the Crop Information Theme
As with most aspects of GCP activities, the direction of the last lap of Phase II is more or less
set, and no particular issues were raised at GRM that would require major changes:
1. Informatics Infrastructure
Develop and deploy the Configurable Workflow System
2. User Support
Support the use of GCP tools and best practices, including standards (ontology, trait
dictionaries), data management, analysis, and decision support for traditional and emerging
technologies (NGS and GWS)
3. Research
a. Tools and methods for emerging technologies (NGS, GWS)
b. Improve statistical analysis (mapping, QTL analysis, QTLxE, selection indices)
c. BLUPs for haplotypes in GWS
d. Trait dictionaries and ontologies.