The document discusses the process of digitizing the collection at the Lyman Entomological Museum, including challenges faced and benefits gained. About 10% of the collection has been digitized so far. Digitization is a time-consuming process that involves verifying specimen identification, adding unique identifiers, georeferencing localities, and cleaning data. Making the data openly accessible online through databases creates opportunities for research.
Amélie Grégoire Taillefer and Terry A. Wheeler
Dept. of Natural Resource Sciences, McGill University, Ste-Anne-de-Bellevue, QC, Canada
Databasing the Lyman Entomological Museum: challenges and opportunities
6 weeks sampling
Databases create opportunities
Challenges
Acknowledgments
Future work
1. Preparation
• Identify specimens to lowest taxonomic level possible
• Verify status of taxonomic name
• Add unique identifier to each specimen
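The identifier step above can be sketched in a few lines of Python. This is only an illustration: the pattern (an "LEM" prefix, an optional hyphen and seven digits) is inferred from the two identifiers shown on this poster, and the function names are hypothetical rather than part of any museum software.

```python
import re

# Assumed pattern: "LEM", an optional hyphen, then seven digits (e.g. LEM-0013538).
ID_PATTERN = re.compile(r"^LEM-?(\d{7})$")


def make_identifier(number: int) -> str:
    """Format a running number as a zero-padded specimen identifier."""
    if not 0 < number <= 9_999_999:
        raise ValueError("number out of range for a seven-digit identifier")
    return f"LEM-{number:07d}"


def find_duplicates(identifiers):
    """Return identifiers used more than once, ignoring the optional hyphen."""
    seen, duplicates = set(), set()
    for identifier in identifiers:
        match = ID_PATTERN.match(identifier)
        if match is None:
            raise ValueError(f"malformed identifier: {identifier!r}")
        key = match.group(1)
        if key in seen:
            duplicates.add(make_identifier(int(key)))
        seen.add(key)
    return sorted(duplicates)
```

Checking for duplicates before labels are printed helps keep each identifier tied to exactly one specimen.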
The Lyman Entomological Museum began as the private collection of Henry H. Lyman,
which was bequeathed in 1914 to McGill University. The largest university insect
collection in Canada, it holds specimens in all orders with a worldwide terrestrial
coverage collected from 1860 to the present. Since the mid 1990s much of the focus in
collection development has been in the Diptera, although ongoing research projects,
donations and exchanges continue to add material in all orders, particularly
Coleoptera. 70% of the Diptera specimens were collected from Canada.
Digitization – recording specimen collection labels in digital form – is a time-consuming
and laborious process. Retrospective digitization of large collections is a costly
undertaking, but the benefits in terms of data sharing and accessibility far outweigh the
costs. Canadensys (canadensys.net), the Canadian biodiversity open database,
compiles taxonomic, geographic, temporal, numerical, and historical information about
three megadiverse groups (plants, insects and fungi) from collections at 18 institutions
across Canada, which collectively hold several million specimens. About 1.3 million
specimen records are currently available on Canadensys; the Lyman Entomological
Museum accounts for 20% of that total.
Steps in digitization
Background and history
A digitized collection is a rich source of primary biodiversity data for a range of
applications in taxonomy, inventories, catalogs, and ecology. Data can be searched via
maps (as above) or in list format. Shared, open, accessible data creates opportunities
for building large datasets for analysis of large-scale patterns. Extraction of data on
Canadensys for a particular taxon, locality or set of samples is easy and rapid.
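As an illustration of this kind of extraction, the sketch below filters a downloaded occurrence file by taxon and locality. It assumes the file is a CSV with Darwin Core column names; the sample rows are made up for the example and `filter_records` is a hypothetical helper, not a Canadensys function.

```python
import csv
import io


def filter_records(rows, **criteria):
    """Keep rows whose fields equal every given criterion (case-insensitive)."""
    def matches(row):
        return all(
            row.get(field, "").strip().lower() == value.lower()
            for field, value in criteria.items()
        )
    return [row for row in rows if matches(row)]


# Made-up sample rows standing in for a downloaded occurrence file.
SAMPLE = """catalogNumber,order,family,stateProvince
X-1,Diptera,Muscidae,Quebec
X-2,Diptera,Syrphidae,Yukon
X-3,Araneae,Lycosidae,Quebec
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
quebec_diptera = filter_records(rows, order="Diptera", stateProvince="Quebec")
```

Any Darwin Core column present in the file can be used as a criterion in the same way.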
Collection databases have traditionally been used for curation, loan management or
taxonomic research. Digitization facilitates all these functions. However, because of the
extensive spatial, temporal and ecological data associated with specimen records,
these databases are also valuable resources for ecological and conservation research.
The dataset can easily be managed for loans and systematic research, and queried to
assess taxonomic coverage within an area for systematic, ecological or conservation
purposes. The databases provide baseline data, as well as evidence of change over
time, for regions or biotas that may have experienced habitat change.
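One simple way to look for change over time is to tally records by decade of collection. The sketch below assumes collection dates are stored as ISO-style strings beginning with a four-digit year (as in the Darwin Core `eventDate` term); the function name is hypothetical.

```python
from collections import Counter


def records_per_decade(event_dates):
    """Count records per decade, skipping dates without a four-digit year."""
    counts = Counter()
    for date in event_dates:
        year_text = date[:4]
        if len(year_text) == 4 and year_text.isdigit():
            counts[int(year_text) // 10 * 10] += 1
    return dict(sorted(counts.items()))
```

Such counts reflect collecting effort as much as abundance, so they are a starting point for analysis rather than a result.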
1. Implementing an efficient, standard data entry procedure
2. Old labels with minimal information
3. Georeferencing old specimen localities
4. Errors in coordinates or localities on labels
5. Misidentified specimens
6. Data cleaning, validation and correction
7. Training volunteers and staff for data search and new entries
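Some coordinate errors (challenge 4) can be flagged automatically during data cleaning (challenge 6). The following is a minimal sketch, not the procedure used at the Lyman Museum: the bounding box for Canada is a rough assumption, and the function and constant names are hypothetical.

```python
# Rough bounding box for Canada as (south, west, north, east); an approximation.
CANADA_BBOX = (41.0, -141.5, 84.0, -52.0)


def check_coordinates(lat, lon, bbox=None):
    """Return a list of problems found in a decimal-degree coordinate pair."""
    problems = []
    if not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if not -180 <= lon <= 180:
        problems.append("longitude out of range")
    if lat == 0 and lon == 0:
        problems.append("zero coordinates")
    if bbox is not None and not problems:
        south, west, north, east = bbox
        inside = south <= lat <= north and west <= lon <= east
        if not inside:
            # A point that fits the box once its longitude is negated
            # probably lost its minus sign during data entry.
            if south <= lat <= north and west <= -lon <= east:
                problems.append("longitude sign may be flipped")
            else:
                problems.append("outside expected region")
    return problems
```

Flagged records still need a person to compare the coordinates against the locality text on the label.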
Major funding for Canadensys was provided by the Canada Foundation for Innovation.
Canadensys coordinates ongoing open access to our database. We thank David
Shorthouse and Carole Sinou for all their help and advice in data cleaning and
formatting for publication on Canadensys.
No database is ever completed. Data checking and verification are an ongoing process
as taxonomic experts verify identifications or provide finer taxonomic resolution. New
specimens added to the collection require ongoing commitment by collection staff,
students or volunteers in data entry and publication. For example, more than 150,000
arctic Diptera and new accessions from other regions currently await digitization in the
Lyman Museum.
Example unique identifier label: Lyman Museum, LEM-0013538
Progress to date
Order                                   Geographic scope   Specimens databased
Diptera                                 Worldwide          240,000 +
Neuroptera                              Canada             2,700 +
Coleoptera (Buprestidae, Dermestidae)   Canada             2,600 +
Hymenoptera (Vespidae, Eumenidae)       Canada             2,900 +
Araneae                                 Canada             4,500 +
Source: Lyman Entomological Museum georeferenced records (253,061), Canadensys, Google Earth.
(accessed on 2013-10-11)
LEM0249541, from McGill University, http://dataset.canadensys.net/lemq-specimens
(accessed on 2013-10-11)
Biota 2-The Biodiversity Database Manager, R.K. Colwell, University of Connecticut,
http://viceroy.eeb.uconn.edu/Biota/biota, specimen and collection record tables.
About 10% (253,000 specimens) of the
Lyman collection has been databased with
Canadensys support. Our database is freely
hosted by Canadensys and shared
internationally via the Global Biodiversity
Information Facility (www.gbif.org).
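The rounded figures on this poster can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted here (about 253,000 databased specimens, about 10% of the collection, about 1.3 million Canadensys records); the implied collection size is an inference from those rounded values, not a reported count.

```python
databased = 253_000           # specimens databased, as quoted above
databased_fraction = 0.10     # "about 10%" of the Lyman collection
canadensys_total = 1_300_000  # "about 1.3 million" records on Canadensys

# Share of all Canadensys records contributed by the Lyman Museum.
share_of_canadensys = databased / canadensys_total

# Collection size implied by the 10% figure (an inference, not a census).
implied_collection_size = databased / databased_fraction

print(f"Share of Canadensys records: {share_of_canadensys:.1%}")
print(f"Implied collection size: {implied_collection_size:,.0f}")
```

The share comes out at roughly 19.5%, consistent with the 20% quoted for the museum's contribution, and the implied collection size is on the order of 2.5 million specimens.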
2. Databasing
• BIOTA 2 program used at Lyman
• Data entry requires frequent data verification
• Georeference records
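Georeferencing often involves converting label coordinates written in degrees, minutes and seconds into decimal degrees. The sketch below shows one way to do that; the accepted input format is an assumption, the function name is hypothetical, and real labels will need more tolerant parsing.

```python
import re

# Assumed input format: degrees, optional minutes, optional seconds, hemisphere.
DMS_PATTERN = re.compile(
    r"^\s*(\d{1,3})\s*°"                    # degrees
    r"\s*(?:(\d{1,2}(?:\.\d+)?)\s*['′])?"   # optional minutes
    r'\s*(?:(\d{1,2}(?:\.\d+)?)\s*["″])?'   # optional seconds
    r"\s*([NSEW])\s*$",                     # hemisphere letter
    re.IGNORECASE,
)


def dms_to_decimal(text):
    """Convert a string such as 45°24'30"N to signed decimal degrees."""
    match = DMS_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees = float(match.group(1))
    minutes = float(match.group(2) or 0)
    seconds = float(match.group(3) or 0)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes or seconds out of range: {text!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in decimal degrees.
    if match.group(4).upper() in ("S", "W"):
        value = -value
    return round(value, 5)
```

Raising an error on unrecognized input, rather than guessing, leaves ambiguous labels for a person to resolve.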
3. Data publication
• Export data as text file
• Add columns and formulas for accepted data format
• Convert database information into Darwin Core (internationally accepted biodiversity
information standard)
• Add collection metadata
• Serve data via Canadensys and GBIF
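The export and conversion steps above can be sketched as a column mapping. The Darwin Core terms on the right-hand side are real terms from the standard, but the local column names on the left, the institution code and the function names are assumptions made for this example; the actual Biota export fields may differ.

```python
import csv
import io

# Hypothetical local column names (left) mapped to Darwin Core terms (right).
FIELD_MAP = {
    "SpecimenCode": "catalogNumber",
    "Order": "order",
    "Family": "family",
    "Species": "scientificName",
    "Country": "country",
    "Province": "stateProvince",
    "Locality": "locality",
    "Latitude": "decimalLatitude",
    "Longitude": "decimalLongitude",
    "DateCollected": "eventDate",
    "Collector": "recordedBy",
}

# Constant values added to every record; the institution code is an assumption.
CONSTANTS = {"institutionCode": "LEM", "basisOfRecord": "PreservedSpecimen"}


def to_darwin_core(tab_delimited_text):
    """Read a tab-delimited export and rename its columns to Darwin Core terms."""
    reader = csv.DictReader(io.StringIO(tab_delimited_text), delimiter="\t")
    records = []
    for row in reader:
        record = {
            dwc: (row.get(local) or "").strip()
            for local, dwc in FIELD_MAP.items()
        }
        record.update(CONSTANTS)
        records.append(record)
    return records


def write_csv(records):
    """Serialize Darwin Core records as CSV text with a header row."""
    fieldnames = list(FIELD_MAP.values()) + list(CONSTANTS)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Keeping the mapping in a single dictionary means a renamed export column needs a change in only one place.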