TLI 2012: Data management for bean research
DATA MANAGEMENT - BEANS
ALBERTO FABIO GUERRERO
CIAT
Annual Meeting TLI-Phase 2
Addis Ababa, Ethiopia
7-11 May
AGENDA
1. ONTOLOGY IN BEAN
2. TL1 DATA
1. ONTOLOGY IN BEAN
BACKGROUND
• The project to develop the bean ontology was initially led by Dr. Matthew Blair, with Juana Cordoba in charge of the work. Since the departure of Dr. Blair and Juana, Dr. Steve Beebe has led this activity, and I have been responsible for coordinating it for the past year.
• By December 2011, the bean team had described a total of 140 traits. The process involved scientists from the disciplines of breeding, pathology, entomology, virology and physiology.
RANKING
• In December 2011, GCP decided to rank the 140 traits and, based on this ranking, to establish a list of the 50 most commonly used traits to be included initially in the IB-Fieldbook.
• Fernando Rojas (data manager consultant) helped in this process. Members of the bean community of practice, created in October 2011 in Malawi, participated in the ranking.
• Traits were ranked by frequency of use in breeding (1 = always used, 2 = sometimes used, 3 = seldom used, 4 = never used so far).
RANKING BEAN TRAITS
RANKING SUMMARY
RANKING   TOTAL TRAITS   MORPHOLOGICAL   AGRONOMIC   BIOTIC STRESS   ABIOTIC STRESS   QUALITY   PASSPORT
1         23             7               3           7               4                0         2
2         60             20              1           5               22               2         10
3, 4      57             20              1           14              10               6         6
Total     140            47              5           26              36               8         18
• According to the ranking, 23 traits were classified as most used, 60 as sometimes used and 57 as seldom or never used. The first two groups (83 traits in total) were further prioritized down to 60 by the bean team for the IB-Fieldbook.
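The arithmetic behind the ranking summary can be checked with a short sketch. The counts are copied from the table above; the variable and function names are only illustrative and do not come from any GCP or IB-Fieldbook tool.

```python
# Trait classes, in the column order of the ranking summary table.
CLASSES = ["Morphological", "Agronomic", "Biotic stress",
           "Abiotic stress", "Quality", "Passport"]

# Number of traits per class for each ranking group (values from the slide).
RANKING_SUMMARY = {
    "1": [7, 3, 7, 4, 0, 2],
    "2": [20, 1, 5, 22, 2, 10],
    "3, 4": [20, 1, 14, 10, 6, 6],
}


def group_totals(summary):
    """Total number of traits in each ranking group (row sums)."""
    return {rank: sum(counts) for rank, counts in summary.items()}


def class_totals(summary):
    """Total number of traits in each trait class (column sums)."""
    return [sum(column) for column in zip(*summary.values())]


totals = group_totals(RANKING_SUMMARY)
per_class = dict(zip(CLASSES, class_totals(RANKING_SUMMARY)))

# Rankings 1 and 2 together form the pool from which the primary traits were chosen.
candidates = totals["1"] + totals["2"]

print(totals)      # row totals per ranking group
print(per_class)   # column totals per trait class
print(candidates)  # size of the candidate pool
```

The row sums reproduce the 23, 60 and 57 in the table, and rankings 1 and 2 together give the 83 candidate traits mentioned in the bullet.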
Crop Ontology CoP Workshop
• In March of this year we attended the "Crop Ontology Community of Practice Workshop" in Rome, where the traits were classified into two categories. The most used traits (60) were called Primary Traits and will be used in the Fieldbook; the rest were called Secondary Traits. The full documentation for the primary traits should be completed in May.
Some traits divided
• After this meeting, the bean team decided to subdivide some traits. For example, all disease traits were divided into greenhouse and field evaluations, and each of these by reaction on leaves and on pods.
• Anthracnose, for instance, was divided into Anthracnose on leaves in greenhouse, Anthracnose on leaves in field and Anthracnose on pods in field. Recording the origin of the data and the plant organ is important because it sets the context of the data.
Traits subdivided
OLD TRAIT                  NEW TRAIT
Angular Leaf Spot          Angular Leaf Spot on leaves in field
                           Angular Leaf Spot on leaves in greenhouse
                           Angular Leaf Spot on pods in field
Anthracnose                Anthracnose on leaves in field
                           Anthracnose on leaves in greenhouse
                           Anthracnose on pods in field
Common Bacterial Blight    Common Bacterial Blight on leaves in field
                           Common Bacterial Blight on leaves in greenhouse
                           Common Bacterial Blight on pods in field
Fusarium solani            Fusarium solani in field
                           Fusarium solani in greenhouse
Halo blight                Halo blight on leaves in field
                           Halo blight on leaves in greenhouse
                           Halo blight on pods in field
Pythium spp.               Pythium spp. in field
                           Pythium spp. in greenhouse
• The above disease traits were subdivided.
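The subdivision pattern in this table can be written down as a small sketch. The two pattern labels ("leaf_and_pod" and "site_only") are hypothetical names introduced here for illustration; the trait names and suffixes are taken from the table.

```python
# Suffixes appended to an old trait name; the pattern labels are illustrative only.
PATTERNS = {
    "leaf_and_pod": ["on leaves in field", "on leaves in greenhouse", "on pods in field"],
    "site_only": ["in field", "in greenhouse"],
}

# Old disease traits and the pattern each one follows in the table.
OLD_TRAITS = {
    "Angular Leaf Spot": "leaf_and_pod",
    "Anthracnose": "leaf_and_pod",
    "Common Bacterial Blight": "leaf_and_pod",
    "Fusarium solani": "site_only",
    "Halo blight": "leaf_and_pod",
    "Pythium spp.": "site_only",
}


def subdivide(old_trait):
    """Return the new, context-specific trait names for one old trait."""
    pattern = OLD_TRAITS[old_trait]
    return [f"{old_trait} {suffix}" for suffix in PATTERNS[pattern]]


new_traits = [name for old in OLD_TRAITS for name in subdivide(old)]

print(subdivide("Anthracnose"))
print(len(new_traits))
```

Keeping the evaluation site and the plant organ in the trait name means a score cannot be separated from its context when datasets are merged.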
70 primary traits
• In total, 70 primary traits were defined (previously 60).
Trait Class       Number
Morphological     10
Agronomic         4
Biotic stress     22
Abiotic stress    25
Quality           3
Passport          6
Total             70
• The Fieldbook template with these 70 traits (description, scale, method) was provided to GCP in April.
Other tasks defined in Rome Workshop:
May:
• In parallel with the English version of the 70 primary traits, we are working on a Spanish version, which will also be delivered in May.
October:
• Translation into French - Primary Traits
• Translation into Portuguese - Primary Traits
• Complete documentation of all methods and scales (English and Spanish) - Secondary Traits
December:
• Translation into French - Secondary Traits
• Translation into Portuguese - Secondary Traits
Phenotypic Data
• Phenotypic data are present in activities 1, 4 and 5 (28 datasets).
• We have focused on curating these Excel files. All our phenotypic datasets include primary and secondary traits. As mentioned previously, our priority has been to have the 70 primary traits in English and Spanish. At present, these traits are fully documented (unit, method, synonym, references) and will be delivered to GCP in May.
• The other (secondary) traits have not yet been reviewed in detail by the bean team; they will be ready by October, as decided in Rome. It is extremely important to have a completely revised trait dictionary-ontology (primary and secondary traits) before uploading information to the database.
Phenotypic Data
Some problems encountered
• Checking file by file, we have found that the majority of these traits are not documented adequately. Many trait names are not fully defined and lack the unit and plot area, which are important for conversions, etc.
• We also found that investigators use different units for some traits. These need to be standardized.
• Many new traits (up to 20) will be added to the trait dictionary, and these will be fully defined.
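To illustrate why a missing unit or plot area blocks conversion, here is a minimal sketch of standardizing a yield value to one unit. The units (g/plot, t/ha, kg/ha), the choice of kg/ha as the target and the function name are assumptions made for this example; the presentation does not say which units the bean datasets actually use.

```python
def to_kg_per_ha(value, unit, plot_area_m2=None):
    """Convert a yield value to kg/ha (hypothetical target unit for this sketch)."""
    if unit == "kg/ha":
        return float(value)
    if unit == "t/ha":
        # 1 tonne = 1000 kg
        return value * 1000.0
    if unit == "g/plot":
        if not plot_area_m2:
            # Without the plot area, a per-plot value cannot be scaled to a hectare.
            raise ValueError("plot area (m2) is required to convert g/plot")
        # g -> kg divides by 1000; m2 -> ha divides by 10000; combined factor is 10 / area.
        return value * 10.0 / plot_area_m2
    # An undocumented or unrecognized unit cannot be converted safely.
    raise ValueError(f"unknown unit: {unit!r}")


print(to_kg_per_ha(1200, "g/plot", plot_area_m2=6))
print(to_kg_per_ha(1.5, "t/ha"))
```

A record that arrives as "g/plot" with no plot area raises an error instead of returning a guess. That is the situation described above for files where the unit or plot area was never recorded.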
Phenotypic Data
• As mentioned before, we have renamed variables, converted many traits so that their units match the ontology, and added traits to the ontology.
• To complete this work and have all files fully curated, it is essential that the bean team finish fully defining the secondary traits.
• For all the above reasons, it was decided to postpone uploading the information to the IPHIS database.
Next steps for Phenotypic Data
• Upload all completed datasets from bean trials by December, after completing the trait dictionary.
Genotypic Data
• Genotypic datasets (23) are assembled from activities 2 and 3.
• We are curating these files.
Next steps
• Upload all datasets with status "completed" to GDMS by February 2013.