Buffington ecn 2012


  • We must vet all Hymenoptera type names to make sure LSIDs do not already exist before generating new ones for the full set.
  • GN is the US component of the international Global Names Architecture (GNA). It will become a multi-agent, enterprise-level, names-based cyberinfrastructure that acts primarily behind the scenes, as the Domain Name System (DNS) does for the internet. It will provide quick, effective and persistent services that use scientific names as the key to a virtual layer interconnecting distributed information about any species, indexing and organizing that information and making it discoverable for reuse.
    We are also collaborators on the GN2 NSF proposal of Pyle, Patterson and Franz, whose aims are as follows. We will develop stable enterprise-level services in six areas:
    (1) Names discovery and recognition software to index documents, databases and other electronic sources, allowing discovery of and linking to myriad biodiversity resources.
    (2) Names reconciliation (our solution to the many-names-for-one-taxon problem) and resolution services that return the current name(s) for taxa. Reconciliation and resolution map alternative names for the same taxon together and keep the infrastructure up to date with current taxonomic knowledge. By standardizing names across multiple data sources, data become interconnectable, which can facilitate the transition (translation?) of the long tail of biology to the Big Data world.
    (3) Improved access to content through expansion of the Global Names Usage Bank (GNUB) indexing environment, its web services, and exposure of its metadata to the Linked Open Data Cloud.
    (4) Enhancement of ZooBank for nomenclaturalists and its extension to non-animals.
    (5) Increased engagement with taxonomy, integration of synonymy information for reconciliation, and management of taxonomic concept and classification provenance. We will offer a registry and clearing house for classifications and add classification integration based on the Euler and CleanTax software. We will expand the data model and the GNITE editing environment to cover more classes of taxon annotations and to be more fully interoperable with other taxonomic workbench software. We will integrate Filtered Push logic to improve synchrony among initiatives and to enhance data quality control.
    (6) Integration of insufficiently named taxa identified by reference to short gene sequences.
    A final component is our sustainability agenda, through which we will establish GNA as an integral part of an international infrastructure and develop mirrors to remove the risk of a single point of failure. We will identify elements of our efforts that can be monetized in a not-for-profit enterprise to sustain the core functions.
  • We had to modify our original spreadsheet (EndNote fields/format) to make it suitable for bulk upload to CiteBank. We will need to locate PDFs already available at BHL and can include those publication and page URLs in USNMHymTypes. Original citations that cannot be found as scans anywhere online will need to be scanned (by BHL staff at NMNH) and uploaded to BHL.
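The vetting step described in the note on Hymenoptera type names (check each name for an existing LSID before generating new ones) amounts to partitioning the name list. The sketch below is a minimal illustration under stated assumptions: `existing_lsids` is a hypothetical lookup table standing in for whatever ZooBank or Hymenoptera Name Server query is actually used, the genus names and LSID strings are invented placeholders, and matching names case- and whitespace-insensitively is our assumption, not a documented rule.

```python
# Hypothetical sketch: split type names into those that already have an LSID
# and those that still need one. The lookup table is a placeholder for a real
# ZooBank / Hymenoptera Name Server check; names and LSIDs below are invented.


def normalize(name: str) -> str:
    """Collapse internal whitespace and ignore case (an assumed matching rule)."""
    return " ".join(name.split()).lower()


def partition_by_lsid(type_names, existing_lsids):
    """Return (already_registered, needs_lsid).

    existing_lsids maps a normalized name to an LSID string.
    Names that normalize to the same key are reported only once.
    """
    already_registered = {}
    needs_lsid = []
    seen = set()
    for name in type_names:
        key = normalize(name)
        if key in seen:
            continue  # duplicate spelling of a name already handled
        seen.add(key)
        if key in existing_lsids:
            already_registered[name] = existing_lsids[key]
        else:
            needs_lsid.append(name)
    return already_registered, needs_lsid


# Usage with placeholder data:
existing = {"genus alpha": "urn:lsid:zoobank.org:act:PLACEHOLDER-1"}
registered, todo = partition_by_lsid(["Genus alpha", "Genus  beta", "genus beta"], existing)
print(registered)  # {'Genus alpha': 'urn:lsid:zoobank.org:act:PLACEHOLDER-1'}
print(todo)        # ['Genus  beta']
```

Only the names in the second list would go forward for new LSIDs; duplicates that differ only in spacing or case are collapsed so that no name is registered twice.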

    1. Digitization efforts and products from the USNM Hymenoptera Unit
       Matthew L. Buffington & Michael Gates
       Systematic Entomology Lab, ARS-USDA; NMNH, Smithsonian Institution
    2. Whole Drawer Imaging: Gigapanning the USNM gall wasps
       Type digitization in the Hymenoptera Unit
    3. USNM Hymenoptera Types Online Database: www.usnmhymtypes.com
       • 7,267 primary types included.
       • Four images per type, including labels (~3,200).
       • All types are being affixed with a unique matrix code label.
    4. Type database specimen views
    5. Value Added I
       • ZooBank: www.zoobank.org
       • Working with Rich Pyle to assign Life Science Identifiers (LSIDs) to all Hymenoptera type species in the USNM.
       • Cross-check in advance with the Hymenoptera Name Server.
       • Provide a link to each LSID on USNMHymTypes.
    6. Value Added II
       • Global Names Interface for Taxonomic Editing: www.gnite.org
       • Name-based cyberinfrastructure for interconnecting all online information about any species.
       • Ultimately, link all USNMHymTypes information.
    7. Value Added III
       • CiteBank: www.citebank.org
       • Our contractor and CiteBank staff:
         • Provide links on USNMHymTypes to each publication (click volume).
         • Provide a link to the page on which the species was originally described (click page).
       • All original citations are, or will be, available through the BHL portal.
    8. Whole Drawer Imaging: Gigapanning the USNM gall wasps
       • NMNH contains ca. 35 million specimens in 132,354 drawers, stored in 5,200 cabinets.
       • Specimen-level databasing is limited.
       • How can we quickly ‘digitize’ the collection?
       • How can we better serve the research community?
       • “Bugs on pins” Smithsonian Castle project.
    9. Whole Drawer Imaging: Gigapanning the USNM gall wasps
       • Cost-effective way to digitize whole drawers with 200+ megapixel images.
       • High-end setup: $5,000 USD; ‘budget’ setup: ca. $2,500 USD.
       • Setup: 30 sec; capture: 90 sec; stitching: 90-100 sec. Total time to capture: 3.5-4 min.
       • Metadata management (e.g., species names) is time-intensive: 5-10 min/drawer.
       • 14 min/drawer × 140 drawers = 1,960 min, or ca. 32.7 hours.
    10. The setup
    11. The results of our research
    12. The results of our research
    13. Gall wasp digitizing
    14. Gall wasp digitizing
    15. Data served via the Gigapan website
    16. Pollinators and invasive species of economic importance: Osmia species; Vespa crabro L.
    17. Documentation of collections of historical importance: the Alfred Russel Wallace collection
    18. Documentation of collections of historical importance: the Alfred Russel Wallace collection
    19. Any collection curated in drawers can be digitized with the Gigapan system; in this case, dinosaurs.
    20. Future Directions of the Gigapan Project
        • Identify cameras better designed for this application (e.g., battery compartment on the side; wireless file transfer).
        • Determine a better file management system.
        • Determine the best delivery system: will SI agree to have Gigapan ‘host’ the images?
        • Metadata management (e.g., species names) is time-intensive: 5-10 min/drawer; can this be improved?
        • 14 min/drawer × 140 drawers = ca. 32.7 hours; can this be improved?
    21. Acknowledgements
        • SI DigiComm committee, SI Women’s Committee, SI Collections Care Fund and Alma Solis (SEL), for helping to finance the USNMHymTypes and Gigapan projects.
        • Matt Bertone and Andy Deans (NCSU), for Gigapan inspiration, ideas and hints.
        • Patricia Gentille-Poole (SI Entomology), Gino Nearns (recent PhD!; our web designer), Jolearra Tshiteya (SI/RIT imaging intern) and Yaz Sarraj (SI Gigapan intern).
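The per-drawer time budget on slides 9 and 20 can be checked with a few lines of arithmetic. This sketch uses only the stage times quoted on the slides. Reading the 14 min/drawer figure as the rounded sum of the slowest imaging time and the slowest metadata time is our assumption. On that reading, imaging alone takes 3.5-3.7 min per drawer, and the 140-drawer total ranges from about 19.8 hours (fastest stages) to about 31.9 hours (slowest stages), against ca. 32.7 hours at a flat 14 min/drawer.

```python
# Worked time budget for whole-drawer Gigapan imaging.
# Stage times come from slides 9 and 20; treating 14 min/drawer as the
# rounded worst case (slowest imaging + slowest metadata entry) is an assumption.

SETUP_S = 30               # drawer/camera setup, seconds
CAPTURE_S = 90             # image capture, seconds
STITCH_S = (90, 100)       # stitching, seconds (fastest, slowest)
METADATA_MIN = (5, 10)     # metadata entry, minutes per drawer (fastest, slowest)
DRAWERS = 140              # number of drawers in the slides' estimate


def imaging_minutes(stitch_s: int) -> float:
    """Setup + capture + stitching for one drawer, in minutes."""
    return (SETUP_S + CAPTURE_S + stitch_s) / 60


def project_hours(drawers: int, minutes_per_drawer: float) -> float:
    """Total hands-on time for a batch of drawers, in hours."""
    return drawers * minutes_per_drawer / 60


fast = imaging_minutes(STITCH_S[0]) + METADATA_MIN[0]   # fastest stages, min/drawer
slow = imaging_minutes(STITCH_S[1]) + METADATA_MIN[1]   # slowest stages, min/drawer

print(f"imaging only: {imaging_minutes(STITCH_S[0]):.2f}-{imaging_minutes(STITCH_S[1]):.2f} min/drawer")
print(f"imaging + metadata: {fast:.2f}-{slow:.2f} min/drawer")
print(f"{DRAWERS} drawers: {project_hours(DRAWERS, fast):.1f}-{project_hours(DRAWERS, slow):.1f} h")
print(f"{DRAWERS} drawers at the slides' 14 min/drawer: {project_hours(DRAWERS, 14):.1f} h")
```

Because metadata entry is 5-10 of the roughly 8.5-13.7 minutes per drawer, this arithmetic suggests that any improvement asked about on slide 20 would come mainly from faster metadata entry rather than from faster imaging.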