Kampmeier ecn 2012


  • First a little background…
  • We were fortunate to have two rounds of funding for this project on a medium-sized family of flies. We trained dipterists that are contributing their expertise even today, and continuing to work on the family Therevidae as well as other Diptera.
  • For better or worse, not yet.
  • The main part of the work, which has consumed many person-hours to enter and verify, is in the Mandala database devoted to the Therevidae.
  • Find all specimens with valid names and a localityID
  • You cannot create style sheets in Acrobat
  • Photo of Kevin
  • A spreadsheet is not flexible; neither is a field notebook or a set of index cards
  • But not just the community; the individual also needs to see and embrace this for himself or herself
  • All this goes on in the background, once you have indicated which taxa you want to delimit.

    1. Catalog magic: Behind the Scenes of Creating a World Catalog of the Therevidae
       Gail E. Kampmeier, Illinois Natural History Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, gkamp@illinois.edu
       Irina Brake, National Museum of Natural History, London
       Kristin Algmin, University of Illinois at Urbana-Champaign
    2. Why is it so Difficult to get from Here… to… Here? Therevidae
    3. What Would Taxonomists Rather Be Doing?
    4. What do Taxonomists Wish Would Happen?
    5. 1995 Freshmen of NSF PEET*
       • Towards a World Monograph of the Therevidae (Insecta: Diptera)
         – 1995–2006
       • Therevidae is a medium-sized family with (now)
         – 4 subfamilies
         – ~130 genera
         – ~1150 species
       *National Science Foundation's Partnerships for Enhancing Expertise in Taxonomy
    6. Products
       • Trained
         – 9 dipterists, 7 through Ph.D.
         – Scientific illustrator
         – Dozens of students in databasing
       • Publications
         – 71 publications during grant
         – 20 more since & counting
       • Digitization
         – Mandala database 1995-
         – Website
         – Collaborations with DiscoverLife.org & GBIF
       …and the world is unlikely to run out of flies to study!
    7. Process: Specimens
       • Collect, sort, curate, label, sex, determine, & database specimen information
         – Assign unique identifiers where none exist
       • Visit & borrow material from museums
       • Examine types
    8. Is All that Work Worthwhile?
       • "A taxonomic paper often plants the very seeds of its own obsolescence." (Johnson 2011)
       • There is no getting around the work required to produce a catalog or any taxonomic treatment.
       • What we can do is make sure that the information is accessible and reusable.
       • Is it time to ditch traditional catalogs?
       [Illustration: Henicomyia by J. Marie Metz]
    9. What Choices Do You Have?
       • Last year's symposium on Arthropod Collections databases explored some of your options, but not all are suitable.
         – Online collections database platforms (not suitable for creating taxonomic catalogues that cross collections not included)
           • Arctos
           • Specify 6
         – Online taxonomic database platforms: optimize creation of species pages
           • Species File: taxonomic authority files
           • Scratchpads: community-oriented contributions
           • 3I: online revisions of taxa
           • Encyclopedia of Life: Expert LifeDesks
    10. What Choices Do You Have?
        • Last year's symposium on Arthropod Collections databases explored some of your options, but not all are suitable.
          – Online platforms designed to parse or take parsed data & repurpose it (incl. online taxonomic database platforms above)
            • GBIF's Integrated Publishing Toolkit (IPT): not thought of as a workbench-level tool
            • LUCID: especially good for keys & descriptive data
            • Biodiversity Data Journal: will take in parsed data from Scratchpads and IPT & eventually databases (mechanism unclear)
          – Desktop or server-based platforms, usually in Filemaker or 4D or MSAccess
            • Mandala: http://www.inhs.illinois.edu/research/mandala/
            • Biota: http://viceroy.eeb.uconn.edu/biota/
            • Mantis: http://insects.oeb.harvard.edu/etypes/Downloads.htm
    11. The Process: Decide on a Format
        • It was decided to publish as a traditional Myia catalog
        • Expectations about what is in a "traditional catalog" or taxonomic treatment & how it should be formatted
          – Print styles (italics, bold, centered, hanging indents)
          – Accented characters (for literature references, authority names, and localities)
          – Special characters (for ♂ and ♀ signs)
          – Notes kept with the taxon entry or as an appendix?
        • Use Mandala to achieve retrievability & formatting of output
    12. General Workflow: TherevidMandala Database
        • Input raw data: The Bulk of the Work is HERE!
        • Link data in related tables
        • Create fields for catalog output for
          – Taxa & their history
          – Literature (including disambiguation of similar citations)
          – List of countries (& selected states/provinces) by biogeographic region for valid taxa
        • Create & number notes for listing in appendix
        • Create a script that finds data to be exported
        • Create scripts to format data including styles (bold, italics, codes for paragraph formatting)
        • Export TaxonID & catalog output field only to Filemaker Pro to isolate output & preserve formatting including accented characters
        [Diagram: Mandala production db → Catalog Output to new FMP db → Acrobat → MSWord → Catalog]
    13. Things Can Get Messy
        • Some operations require expert eyes to determine fitness-for-use
        • A database can find, sort, & summarize, but ultimately does not "see" anomalies unless specifically programmed to do so
        • Automation (scripting, creation of calculated fields) requires time, refinement, & expertise
        • Parsed data are key to flexibility
    14. Create Taxonomic Hierarchy
        Use to automate searches & sort catalog output by classification hierarchy, rank, & alphabetically
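The hierarchy-driven sort on slide 14 can be sketched outside FileMaker. The following is a minimal Python illustration, not the Mandala implementation: the `Taxon` fields, the `RANK_ORDER` table, and the function names are all assumptions made for this example. Each taxon points at its parent, and the sort key is the taxon's full lineage, so children stay directly under their parent and siblings are ordered by rank and then alphabetically.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rank order; the slides do not give Mandala's actual rank list.
RANK_ORDER = {"family": 0, "subfamily": 1, "genus": 2, "species": 3}


@dataclass
class Taxon:
    taxon_id: int
    name: str
    rank: str
    parent_id: Optional[int]  # None marks the root of the hierarchy


def lineage(taxon, by_id):
    """Return the chain of taxa from the root down to `taxon`.

    Assumes the parent links contain no cycles.
    """
    chain = []
    current = taxon
    while current is not None:
        chain.append(current)
        current = by_id.get(current.parent_id)
    return list(reversed(chain))


def sort_key(taxon, by_id):
    # Comparing lineage tuples keeps each child directly below its parent,
    # ordered by rank and then alphabetically at every level.
    return tuple((RANK_ORDER[t.rank], t.name.lower()) for t in lineage(taxon, by_id))


def catalog_order(taxa):
    by_id = {t.taxon_id: t for t in taxa}
    return sorted(taxa, key=lambda t: sort_key(t, by_id))


# Small illustrative input, deliberately out of order.
taxa = [
    Taxon(4, "Thereva", "genus", 2),
    Taxon(3, "Phycinae", "subfamily", 1),
    Taxon(1, "Therevidae", "family", None),
    Taxon(5, "Phycus", "genus", 3),
    Taxon(2, "Therevinae", "subfamily", 1),
]

for t in catalog_order(taxa):
    print(t.rank, t.name)
```

A parent always sorts before its children because its lineage tuple is a prefix of theirs.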
    15. Use Reason for Status to Dictate Formatting
    16. We Used the Specimen* Table to Define our Distribution
        *based on 105,889 specimens with valid names & parsed localities
    17. Script to Find & Sort Specimens
        • Once sorted, export a summary for each taxon
    18. • Summary can then be formatted in MSWord
        • Bring back into Filemaker for final formatting
        • Spot possible outliers
        • Match TaxonID to import formatted information into production db
        TaxonID × Biogeographic Region × Country × State/Province
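The find, sort, and summarize steps on slides 16 to 18 can be sketched in Python as follows. This is an illustration under assumptions, not the FileMaker script used for the catalog: the record keys (`taxon_id`, `valid_name`, `locality_id`, `region`, `country`, `state`) and the output punctuation are invented for the example. Records without a valid name or a locality are skipped, and the rest are grouped by TaxonID, biogeographic region, country, and state/province.

```python
from collections import defaultdict


def summarize_distribution(specimens):
    """Group specimen records into taxon -> region -> country -> set of states."""
    summary = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for rec in specimens:
        # Only specimens with a valid name and a locality are used.
        if not rec.get("valid_name") or rec.get("locality_id") is None:
            continue
        states = summary[rec["taxon_id"]][rec["region"]][rec["country"]]
        if rec.get("state"):
            states.add(rec["state"])
    return summary


def format_distribution(regions):
    """Render one taxon's distribution as a single line of text."""
    parts = []
    for region in sorted(regions):
        countries = []
        for country in sorted(regions[region]):
            states = sorted(regions[region][country])
            countries.append(f"{country} ({', '.join(states)})" if states else country)
        parts.append(f"{region}: {', '.join(countries)}")
    return "; ".join(parts)


# Illustrative records only; the TaxonID and localities are made up.
records = [
    {"taxon_id": 101, "valid_name": True, "locality_id": 1,
     "region": "Nearctic", "country": "USA", "state": "Illinois"},
    {"taxon_id": 101, "valid_name": True, "locality_id": 2,
     "region": "Nearctic", "country": "USA", "state": "California"},
    {"taxon_id": 101, "valid_name": True, "locality_id": 3,
     "region": "Nearctic", "country": "Canada", "state": None},
    {"taxon_id": 101, "valid_name": True, "locality_id": 4,
     "region": "Neotropical", "country": "Mexico", "state": None},
    {"taxon_id": 101, "valid_name": True, "locality_id": None,
     "region": "Palaearctic", "country": "Spain", "state": None},
]

print(format_distribution(summarize_distribution(records)[101]))
```

An unexpected region or country in a line like this is a candidate outlier, which still needs an expert to check.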
    19. Filling in the Cracks
        • All taxa, literature, and specimens to be included in the catalog were marked by an expert with a code for easier retrieval
        • Communication about scripts & field calculations was done in Google Docs
        • Literature with the same authors and years had to be disambiguated with letters following the year.
          – Used in both the literature cited and text of the catalog
        • After including the notes in the text flow, it was decided by the authors to number and put them into an appendix.
          – Finding & sorting of these could be automated
          – Replace with series allowed numbering of notes
          – Awkward (but necessary) to renumber notes when new ones were found to be needed.
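The author-year disambiguation described on slide 19 can be sketched as below. This is a hedged Python illustration, not the calculation used in Mandala: the record keys (`ref_id`, `authors`, `year`, `title`) are assumptions, and ordering each group by title is an arbitrary choice for the example (a catalog might order by publication date instead). Letters are appended only where two or more references share the same authors and year.

```python
from collections import defaultdict
from string import ascii_lowercase


def disambiguate(citations):
    """Return {ref_id: 'Authors Year[letter]'}, adding letters only where needed."""
    groups = defaultdict(list)
    for ref in citations:
        groups[(ref["authors"], ref["year"])].append(ref)

    labels = {}
    for (authors, year), refs in groups.items():
        if len(refs) == 1:
            labels[refs[0]["ref_id"]] = f"{authors} {year}"
            continue
        # Assumption: order within a group by title.
        # This sketch handles at most 26 references per author-year group.
        ordered = sorted(refs, key=lambda r: r["title"])
        for letter, ref in zip(ascii_lowercase, ordered):
            labels[ref["ref_id"]] = f"{authors} {year}{letter}"
    return labels


# Placeholder references, not real publications.
citations = [
    {"ref_id": 1, "authors": "Smith", "year": 2001, "title": "A first paper"},
    {"ref_id": 2, "authors": "Smith", "year": 2001, "title": "A second paper"},
    {"ref_id": 3, "authors": "Jones", "year": 1999, "title": "Another paper"},
]

print(disambiguate(citations))
```

Because the labels are keyed by reference ID, the same label can be reused in both the literature cited and the catalog text.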
    20. General Workflow
        • TaxonID is for reference only
        • Resize catalog output field (in layout mode) so all contents will always be seen (page size) & make sure to size the field to fit the contents
        • Open in Preview to check
        • Save as PDF
        [Diagram: Mandala production db → Catalog Output to new FMP db]
    21. General Workflow
        • This step mainly preserves catalog text styles & accented characters out of FMP
        • Save As MS Word document after verifying expected results.
        • Saving as Word will collapse the formatting into giant paragraphs
        [Diagram: Mandala production db → Catalog Output to new FMP db → Acrobat]
    22. General Workflow
        [Diagram: Mandala production db → Catalog Output to new FMP db → Acrobat → MSWord → Catalog]
        • Create styles in MSWord for formatting text & paragraphs
        • Search & replace special characters (%%, $$, zzz, ||, //); ♂ and ♀ signs
        • Clean up extra spaces, paragraphs, & punctuation
        • Using Google Docs is not (yet) an option for a traditionally published catalog as the formatting tools aren't adequate
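The search-and-replace cleanup on slide 22 was done in MSWord; the same idea can be sketched in Python. The mapping below is hypothetical: the slides list the placeholder codes (%%, $$, zzz, ||, //) but not what each one stood for, so the meanings assigned here are invented purely for illustration, and only three of the five codes are mapped.

```python
import re

# HYPOTHETICAL mapping: the slides name the codes but not their meanings.
REPLACEMENTS = {
    "%%": "\u2642",  # male sign (assumed meaning)
    "$$": "\u2640",  # female sign (assumed meaning)
    "||": "\n",      # paragraph break (assumed meaning)
}


def expand_codes(text, replacements=REPLACEMENTS):
    """Replace placeholder codes, then tidy spaces around the results."""
    for code, value in replacements.items():
        text = text.replace(code, value)
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop a stray space on either side of a line break.
    text = re.sub(r" ?\n ?", "\n", text)
    return text.strip()


print(expand_codes("Holotype %%;  paratype $$ || Next entry"))
```

Plain-text codes of this kind survive export steps that might otherwise drop special characters or paragraph formatting, which is presumably why they were inserted upstream and expanded only at the end.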
    23. Send Out to Experts
    24. Consensus!
        • When the experts are happy, we're done, right?
        • Still have to update the database & web output online, which complements the printed catalog because it is dynamic
        • Push corrections to public portals of data (own website, DiscoverLife, GBIF, etc.)
        • So "magic" is a relative, kind of wishful term; the future is more likely in platforms such as those being coordinated by Pensoft.
    25. References, Resources
        • Miller, J. et al. 2012. From taxonomic literature to cybertaxonomic content. BMC Biology 10:87. http://www.biomedcentral.com/content/pdf/1741-7007-10-87.pdf
        • Johnson, N.F. 2011. A collaborative, integrated and electronic future for taxonomy. Invertebrate Systematics 25: 471–475. http://www.publish.csiro.au/?act=view_file&file_id=IS11052.pdf
        • Biodiversity Data Journal (publication debut Dec. 2012). http://www.pensoft.net/journals/bdj
        • Symposium: Arthropod Collections Databases. 2011 ECN meeting, Reno, NV. http://www.ecnweb.org/past/2011
        • Darwin Core Standard. http://rs.tdwg.org/dwc/
        • Kampmeier, G. E. and M. E. Irwin. 2009. Meeting the interrelated challenges of tracking specimen, nomenclature, and literature data in Mandala. Chapter 15 in T. Pape, D. Bickel, and R. Meier (eds.) Diptera Diversity: Status, Challenges and Tools. Leiden: Brill Academic Publishers, pp. 407-437. http://www.inhs.illinois.edu/research/mandala/Ch15_Mandala_DiptDiv2009.pdf
    26. More Refs & Resources
        • Kennedy, J., R. Hyam, R. Kukla, T. Paterson. 2006. Standard data model representation for taxonomic information. OMICS: A Journal of Integrative Biology 10(2): 220-230. http://www.hyam.net/publications/omi.2006.10.220.pdf
        • Penev, L., T. Georgiev, P. Stoev, D. Roberts, V. Smith. 2012. Making small data big! The Biodiversity Data Journal (BDJ). TDWG 2012, Beijing, 22-26 October. http://www.tdwg.org/fileadmin/2012conference/slides/Biodiversity_Data_Journal.pdf
        • Catalogue of Life. http://www.catalogueoflife.org/colwebsite/sites/default/files/2012_CoL-Standard_Dataset_v6_3.pdf
    27. Acknowledgements
        • Michael E. Irwin • F. Chris Thompson • Neal Evenhuis • Christine Lambkin • Shaun Winterton • Don Webb • Mark Metz • Martin Hauser • Kevin Holston • Steve Gaimari • J. Marie Metz • David Yeates • Amanda Buck • Brian Wiegmann • Evert Schlinger • John Pickering
        • FMWebschool • National Science Foundation • Schlinger Foundation • Illinois Natural History Survey • University of Illinois • Discover Life • Biodiversity Information Standards (TDWG)
        NSF Projects:
          – Therevid PEET: DEB-95-21925; 99-77958
          – Fiji Arthropod Survey: DEB-0425790
          – FLYTREE: EF-0334948
          – Tabanid PEET: DEB 07-31528
    28. ©2012 University of Illinois Board of Trustees. All rights reserved. For permission information, contact the Illinois Natural History Survey. References to commercial products are for informational purposes only and do not imply endorsement.
    29. Appendix
        Additional slides, jettisoned for time, for the curious
    30. Why Use A Database?
        • Flexibility
          – Finely parsed data may be pieced together for publication, labels
          – Scripting of often used functions
        • Reuse/repurposing of data
          – Sharing with GBIF, DiscoverLife.org, museums
        • Centralization of work environment
          – Workers can be anywhere, any time zone
          – Backup can be automated
        • Individual work environment
          – Choice with platforms not required to be online (although trade-off)
    31. Vision
        • "Taxonomy should fully embrace electronic media and informatics tools. Particularly, this step requires the development and widespread implementation of community data standards. The barriers to progress in these areas are not technological, but are primarily social. The community needs to see clear evidence of the value added through these changes in procedures and insist upon their use as standard practice."
          Johnson, N.F. 2011. A collaborative, integrated and electronic future for taxonomy. Invertebrate Systematics 25: 471.
    32. Any Database Can Record the Basics, but…
        • How the information is related is also key
          – defining taxonomic ranks as parent-child relationship
          – valid taxonomic entities related to their synonyms
          – types and specimens determined for a taxon
          – literature associated with a taxonomic name
          – collecting localities and collecting events
        • Readability: if a published work rather than raw database output
        • Format
          – Based on existing print models?
          – Print styles (italics, bold, centered, hanging indents)
          – Accented characters (for literature references, authority names, and localities)
          – Special characters (for ♂ and ♀ signs)
          – Notes kept with the taxon entry or as an appendix?
    33. Mandala Data Model
        • Not all of this is required for a traditional catalog, but these tables contain a wealth of vital, interrelated data.
        • Tables with rounded edges are authority files
    34. Use the Classification Hierarchy to Automate Searches
    35. Reason for Status Used for Formatting