Grassbase: the data volume challenge Maria Vorontsova 26 May 2011

  1. 1. Grassbase: the data volume challenge Maria Vorontsova 26 May 2011
  2. 2. Grassbase: the first botanical agglomerate database behemoth? Strengths: coherent classification system; complete and up to date. Weaknesses: divided stakeholder community; dated software; poor web functionality; limited usefulness for identification; no plan for data exploitation. Comparison (area of boxes proportional to number of descriptions included): CATE Araceae, 2,000 descriptions; Solanaceae Source, 1,000 descriptions; Grassbase, 11,161 descriptions.
  3. 3. Grass genera are defined by the variability in spikelet and floret composition. Pooid spikelet similar to Bromus (Holcus); panicoid spikelet common in tropical grasses (Panicum). Labels: glume 1, glume 2, lower lemma, lower palea, upper lemma, upper palea.
  4. 4. FAO 2009: global consumption of 10 major vegetal foods (2003-2005)
  5. 5. Grass taxonomy at Kew. Otto Stapf: Flora of Tropical Africa. 1934. Charles Hubbard: Grasses (of Britain). 1954. C.R. Metcalfe: Anatomy of the Monocotyledons: Gramineae. 1960. N.L. Bor: The Grasses of India. 1960. Derek Clayton: Flora of Tropical East Africa. 1970. Flora of West Tropical Africa. 1972. Genera Graminum. 1986. World Grass Flora: “The Kew View” 1985 onwards.
  6. 7. Evolutionary reconstruction of DELTA grass datasets. 1977: Clifford & Watson, Australian Grass Genera, 332 characters. 1985: Lazarides, Clayton & Palmer, World Grass Species, 600 characters. 1992: Watson & Dallwitz, Grass Genera of the World. 2011: Grassbase, GrassWorld, 1017 characters. Also shown: Australian National University morphological data; Clayton Genera Graminum dataset; 25 years full time data entry.
  7. 8. NAMES species + infra: 63,000; NAMES generic: 2,000; DESCRIPTIONS species; DESCRIPTIONS genera; INTKEY species; INTKEY genera; ACCEPTED SPECIES: 11,000; ACCEPTED GENERA: 700.
  8. 9. Access SYNON groups names into homotypic groups
  9. 10. an average of 88 pieces of information per species in DELTA descriptive language
  10. 11. species description webpages linked by a single index page
  11. 12. NAMES species + infra: 63,000; NAMES generic: 2,000; DESCRIPTIONS species; DESCRIPTIONS genera; INTKEY species; INTKEY genera; ACCEPTED SPECIES: 11,000; ACCEPTED GENERA: 700.
  12. 13. NAMES species + infra: 63,000; NAMES generic: 2,000; DESCRIPTIONS species; DESCRIPTIONS genera; INTKEY species; INTKEY genera; ACCEPTED SPECIES: 11,000; ACCEPTED GENERA: 700; TRIBES; TYPES species + infra; AREAS TDWG countries.
  13. 14. NAMES species + infra: 63,000; NAMES generic: 2,000; DESCRIPTIONS species; DESCRIPTIONS genera; INTKEY species; INTKEY genera; ACCEPTED SPECIES: 11,000; ACCEPTED GENERA: 700. Supporting tools: programs to check coding and tidy descriptions; programs for renaming files; programs to output natural language and INTKEY files; ca. 50 permanent and temporary queries; buttons to output species and generic lists; code to check data presence and spelling within and between tables; simplified version for website generated with one button; special program for putting name lists into two columns!
  14. 15. NAMES species + infra: 63,000; NAMES generic: 2,000; DESCRIPTIONS species; DESCRIPTIONS genera; INTKEY species; INTKEY genera; ACCEPTED SPECIES: 11,000; ACCEPTED GENERA: 700; TRIBES; TYPES species + infra; AREAS TDWG countries. Updates: IPNI updates for newly described names only; coding from literature; holding files for new items; custom set of programs to sync Access and DELTA; multi-stage import procedure via a series of tables; macros for different data type imports.
  15. 16. Grassbase: coding a new species
  16. 17. Grassbase: coding a new species
  17. 18. Grassbase: coding a new species. Protologue = 28 characters; Items file = 87 characters. Program “Check” confirms internal consistency of data.
  18. 19. Grassbase: coding a new species. Protologue = 28 characters; Items file = 87 characters.
  19. 20. Recent changes in grass names affect common species: Bromus sterilis, Anisantha sterilis.
  20. 21. “Panicum” as used by Grassbase includes numerous evolutionary lineages with simple panicoid spikelets
  21. 22. Widening gap between morphological and phylogenetic classifications: ca. 15% in species and generic names. Grassbase: The Kew View, an authoritative system with morphologically defined genera. Grass Phylogeny Working Group, GrassWorld, and others: multiple research groups in USA and Australia.
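
Slide 9 says that SYNON groups names into homotypic groups, meaning names that share the same type. The slides do not show how this is done, so the following is only a minimal sketch of the idea, not the Grassbase implementation. It assumes each name record carries a direct link to its basionym (or `None` when the name is itself the basionym); the `NAMES` records and the `homotypic_groups` function are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical name records: (name, basionym).
# basionym is None when the name is itself the basionym of its group.
NAMES = [
    ("Bromus sterilis", None),
    ("Anisantha sterilis", "Bromus sterilis"),
    ("Holcus lanatus", None),
]


def homotypic_groups(records):
    """Group names by shared basionym.

    Assumes every combination points directly at its basionym,
    so no chains of links need to be followed.
    """
    groups = defaultdict(list)
    for name, basionym in records:
        # A name with no basionym link anchors its own group.
        key = basionym if basionym is not None else name
        groups[key].append(name)
    # Sort members so the output is stable regardless of input order.
    return {key: sorted(members) for key, members in groups.items()}


groups = homotypic_groups(NAMES)
for basionym, members in groups.items():
    print(basionym, "->", ", ".join(members))
```

With this layout the two names from slide 20 fall into one group keyed by the basionym, while a name with no other combinations forms a group of its own. A real table would also need to handle replacement names and missing links, which this sketch ignores.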