This document describes the process of creating a world catalog of the fly family Therevidae using a database called Mandala. Key aspects included:
- Inputting specimen, literature, and taxonomic data into Mandala for storage and linking.
- Using scripts to automate tasks like sorting specimens by location and generating taxon summaries.
- Exporting formatted text from Mandala for final editing and publication of the catalog, while keeping data linked and reusable online.
- Collaboration between several experts over many years to compile the necessary information for the comprehensive catalog.
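The kind of automation described above (sorting specimens by location, generating taxon summaries) can be illustrated with a minimal sketch. This is a hypothetical illustration, not Mandala's actual scripting; the record fields (`taxon`, `country`, `locality`) are assumed for the example, not Mandala's schema.

```python
# Hypothetical sketch of the automation described above: sorting specimen
# records by location and generating per-taxon summaries. Field names are
# assumptions for illustration, not Mandala's actual schema.
from collections import defaultdict

specimens = [
    {"taxon": "Thereva nobilitata", "country": "Germany", "locality": "Bavaria"},
    {"taxon": "Thereva nobilitata", "country": "France", "locality": "Alsace"},
    {"taxon": "Ozodiceromyia sp.", "country": "Mexico", "locality": "Sonora"},
]

# Sort specimens by location, e.g. for processing loans or printing labels.
by_location = sorted(specimens, key=lambda s: (s["country"], s["locality"]))

# Generate a per-taxon summary: specimen count and countries recorded.
summaries = defaultdict(lambda: {"count": 0, "countries": set()})
for s in specimens:
    summaries[s["taxon"]]["count"] += 1
    summaries[s["taxon"]]["countries"].add(s["country"])

for taxon, info in sorted(summaries.items()):
    print(f"{taxon}: {info['count']} specimens from {sorted(info['countries'])}")
```

The same grouping logic scales to the real use case: once specimen records live in one database, location sorts and taxon summaries become queries rather than manual collation.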
Kampmeier ecn 2012
1. Catalog magic: Behind the Scenes of Creating a World Catalog of the Therevidae
Gail E. Kampmeier, Illinois Natural History Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, gkamp@illinois.edu
Irina Brake, Natural History Museum, London
Kristin Algmin, University of Illinois at Urbana-Champaign
2. Why is it so Difficult to get from Here… to… Here?
Therevidae
5. 1995 Freshmen of NSF PEET*
• Towards a World
Monograph of
the Therevidae
(Insecta: Diptera)
– 1995 – 2006
• Therevidae is a
medium-sized
family with (now)
– 4 subfamilies
– ~130 genera
– ~1150 species
*National Science Foundation's Partnerships for Enhancing Expertise in Taxonomy
6. Products
• Trained
– 9 dipterists, 7 through Ph.D.
– Scientific illustrator
– Dozens of students in
databasing
• Publications
– 71 publications during grant
– 20 more since & counting
• Digitization
– Mandala database 1995-
– Website
– Collaborations with
DiscoverLife.org & GBIF
…and the world is unlikely to run
out of flies to study!
7. Process: Specimens
• Collect, sort, curate,
label, sex, determine, &
database specimen
information
– Assign unique identifiers
where none exist
• Visit & borrow material
from museums
• Examine types
8. Is All that Work Worthwhile?
• "A taxonomic paper often
plants the very seeds of its own
obsolescence." (Johnson 2011)
• There is no getting around the
work required to produce a
catalog or any taxonomic
treatment.
• What we can do is make sure
that the information is
accessible and reusable.
• Is it time to ditch traditional
catalogs?
Henicomyia by J. Marie Metz
9. What Choices Do You Have?
• Last year's symposium on Arthropod Collections
databases explored some of your options, but
not all are suitable.
– Online collections database platforms (not suitable
for creating taxonomic catalogs that span material
from collections outside the database)
• Arctos
• Specify 6
– Online taxonomic database platforms – optimize
creation of species pages
• Species File – taxonomic authority files
• Scratchpads – community-oriented contributions
• 3I – online revisions of taxa
• Encyclopedia of Life – Expert LifeDesks
10. What Choices Do You Have?
• Last year's symposium on Arthropod Collections
databases explored some of your options, but not all
are suitable.
– Online platforms designed to parse or take parsed data &
repurpose it (incl. online taxonomic database platforms
above)
• GBIF's Integrated Publishing Toolkit (IPT) – not thought of as a
workbench-level tool
• LUCID – especially good for keys & descriptive data
• Biodiversity Data Journal – will take in parsed data from
Scratchpads and IPT & eventually databases (mechanism
unclear)
– Desktop or server-based platforms – usually in Filemaker
or 4D or MSAccess
• Mandala – http://www.inhs.illinois.edu/research/mandala/
• Biota - http://viceroy.eeb.uconn.edu/biota/
• Mantis - http://insects.oeb.harvard.edu/etypes/Downloads.htm
11. The Process: Decide on a Format
• It was decided to publish as a traditional Myia catalog
• Expectations about what is in a "traditional catalog" or taxonomic
treatment & how it should be formatted
– Print styles (italics, bold, centered, hanging indents)
– Accented characters (for literature references, authority names, and localities)
– Special characters (for ♂ and ♀ signs)
– Notes kept with the taxon entry or as an appendix?
• Use Mandala to achieve retrievability & formatting of output
12. General Workflow:
Therevid Mandala Database
• Input raw data: The Bulk of the Work is HERE!
• Link data in related tables
• Create fields for catalog output for:
– Taxa & their history
– Literature (including disambiguation of similar citations)
– List of countries (& selected states/provinces) by biogeographic region for valid taxa
• Create & number notes for listing in appendix
• Create a script that finds data to be exported
• Create scripts to format data including styles (bold, italics, codes for paragraph formatting)
• Export TaxonID & catalog output field only to Filemaker Pro to isolate output & preserve formatting including accented characters
[Workflow diagram: Mandala production db → Catalog output to new FMP db → Acrobat → MSWord → Catalog]
13. Things Can Get Messy
• Some operations require expert
eyes to determine fitness-for-use
• A database can find, sort, &
summarize, but ultimately does
not "see" anomalies unless
specifically programmed to do so
• Automation (scripting, creation
of calculated fields) requires
time, refinement, & expertise
• Parsed data are key to flexibility
14. Create Taxonomic Hierarchy
Use to automate searches & sort catalog output by classification hierarchy, rank, & alphabetically
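The parent-child hierarchy on this slide is what drives the catalog's sort order. A minimal Python sketch of the idea (illustrative only: Mandala itself is a FileMaker application, and the taxon records and schema below are made up for the example):

```python
# Hypothetical taxon records keyed by name, each with a rank and a
# parent link -- this is NOT Mandala's real schema, just an illustration.
taxa = {
    "Therevidae": {"rank": "family",    "parent": None},
    "Therevinae": {"rank": "subfamily", "parent": "Therevidae"},
    "Thereva":    {"rank": "genus",     "parent": "Therevinae"},
    "Henicomyia": {"rank": "genus",     "parent": "Therevinae"},
}

def lineage(name):
    """Walk parent links up to the root; return the path root -> taxon."""
    path = []
    while name is not None:
        path.append(name)
        name = taxa[name]["parent"]
    return list(reversed(path))

# Sorting by lineage groups each taxon under its parents and, within the
# same parent and rank, orders names alphabetically.
catalog_order = sorted(taxa, key=lineage)
# → ['Therevidae', 'Therevinae', 'Henicomyia', 'Thereva']
```

The same trick works at any depth: because the sort key is the full root-to-taxon path, a genus always sorts directly under its subfamily, and sibling genera fall into alphabetical order automatically.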
16. We Used the Specimen* Table to
Define our Distribution
*based on 105,889 specimens with valid names & parsed localities
17. Script to Find & Sort Specimens
• Once sorted, export a summary for each
taxon
18. • Summary can then be formatted in MSWord
• Bring back into Filemaker for final formatting
• Spot possible outliers
• Match TaxonID to import formatted information into production db
TaxonID x Biogeographic Region x Country x State/Province
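The per-taxon distribution summary (TaxonID x Biogeographic Region x Country x State/Province) can be sketched in Python as follows (illustrative only: the specimen rows and localities below are hypothetical, not Mandala's actual tables):

```python
from collections import defaultdict

# Hypothetical specimen rows: (TaxonID, biogeographic region, country,
# state/province). Mandala's real specimen table is far richer.
specimens = [
    (101, "Nearctic",     "USA",       "Illinois"),
    (101, "Nearctic",     "USA",       "California"),
    (101, "Neotropical",  "Mexico",    "Sonora"),
    (102, "Australasian", "Australia", "Queensland"),
]

def distribution_summary(rows):
    """Collapse specimen records into one distribution string per TaxonID."""
    by_taxon = defaultdict(lambda: defaultdict(set))
    for taxon_id, region, country, state in rows:
        by_taxon[taxon_id][(region, country)].add(state)
    out = {}
    for taxon_id, places in sorted(by_taxon.items()):
        parts = [f"{region}: {country} ({', '.join(sorted(states))})"
                 for (region, country), states in sorted(places.items())]
        out[taxon_id] = "; ".join(parts)
    return out

summary = distribution_summary(specimens)
# summary[101] → "Nearctic: USA (California, Illinois); Neotropical: Mexico (Sonora)"
```

Keeping the summary keyed by TaxonID is what makes the round trip on this slide possible: the formatted text can be matched back into the production database by ID after editing in Word.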
19. Filling in the Cracks
• All taxa, literature, and specimens to be included in the
catalog were marked by an expert with a code for easier
retrieval
• Communication about scripts & field calculations was
done in Google Docs
• Literature with the same authors and years had to be
disambiguated with letters following the year.
– Used in both the literature cited and text of the catalog
• After including the notes in the text flow, the authors
decided to number them and move them to an appendix.
– Finding & sorting of these could be automated
– Replacing with a serial number series allowed numbering of notes
– Awkward (but necessary) to renumber notes when new ones
were found to be needed
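The letter-suffix disambiguation of same-author, same-year citations can be sketched in Python (the citations below are invented placeholders, not the catalog's actual literature records):

```python
from collections import defaultdict
from string import ascii_lowercase

# Hypothetical citations: (author string, year, title). The real
# disambiguation was done inside Mandala's literature table.
refs = [
    ("Author & Coauthor", 1981, "First hypothetical paper"),
    ("Author & Coauthor", 1981, "Second hypothetical paper"),
    ("Kampmeier", 2009, "Mandala chapter"),
]

def disambiguate(citations):
    """Append a, b, c... to the year wherever author+year collide."""
    groups = defaultdict(list)
    for author, year, title in citations:
        groups[(author, year)].append(title)
    labels = {}
    for (author, year), titles in groups.items():
        if len(titles) == 1:
            labels[(author, year, titles[0])] = f"{author} {year}"
        else:
            for letter, title in zip(ascii_lowercase, sorted(titles)):
                labels[(author, year, title)] = f"{author} {year}{letter}"
    return labels
```

Because the suffixed labels are computed once and stored, the same "1981a"/"1981b" forms can be used consistently in both the literature cited and the body of the catalog, as the slide describes.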
20. General Workflow
• TaxonID is for reference only
• Resize the catalog output field
(in layout mode) to page size so
all contents are always visible
• Open in Preview to check
• Save as PDF
[Workflow diagram: Mandala production db → Catalog output to new FMP db]
21. General Workflow
• This step mainly
preserves catalog text styles &
accented characters out of FMP
• Save As MS Word document
after verifying expected results.
• Saving as Word will collapse
the formatting into giant
paragraphs
[Workflow diagram: Mandala production db → Catalog output to new FMP db → Acrobat]
22. General Workflow
[Workflow diagram: Mandala production db → Catalog output to new FMP db → Acrobat → MSWord → Catalog]
• Create styles in MSWord for
formatting text & paragraphs
• Search & replace special
characters (%%, $$, zzz, ||, //);
♂ and ♀ signs
• Clean up extra spaces,
paragraphs, & punctuation
• Using Google Docs is not (yet)
an option for a traditionally
published catalog as the
formatting tools aren't adequate
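The search & replace pass over the special character codes can be sketched in Python (the slide lists the codes %%, $$, zzz, and || without defining them, so the mappings below are purely hypothetical, not Mandala's actual convention):

```python
import re

# Hypothetical code-to-output mappings for illustration only; the
# real replacements were done with styles in MSWord.
REPLACEMENTS = [
    (r"%%", "<i>"),       # hypothetical: open italics
    (r"\$\$", "</i>"),    # hypothetical: close italics
    (r"zzz", "\u2642"),   # hypothetical: male sign ♂
    (r"\|\|", "\u2640"),  # hypothetical: female sign ♀
    (r" {2,}", " "),      # clean up runs of extra spaces
]

def apply_styles(text):
    """Run each pattern replacement over the exported catalog text."""
    for pattern, repl in REPLACEMENTS:
        text = re.sub(pattern, repl, text)
    return text

print(apply_styles("%%Thereva$$  zzz ||"))  # → "<i>Thereva</i> ♂ ♀"
```

Keeping the codes as plain ASCII placeholders until this final pass is what lets the styled output survive the export through FileMaker and Acrobat intact.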
24. Consensus!
• When the experts are happy,
we're done, right?
• Still have to update the
database & web output online
– complements printed
catalog as it is dynamic
• Push corrections to public
portals of data (own website,
DiscoverLife, GBIF, etc.)
• So "magic" is a relative, kind
of wishful term—the future is
more likely in platforms such
as those being coordinated by
Pensoft.
25. References, Resources
• Miller, J. et al. 2012. From taxonomic literature to cybertaxonomic
content. BMC Biology 10: 87.
http://www.biomedcentral.com/content/pdf/1741-7007-10-87.pdf
• Johnson, N.F. 2011. A collaborative, integrated and electronic future for
taxonomy. Invertebrate Systematics 25: 471–475.
http://www.publish.csiro.au/?act=view_file&file_id=IS11052.pdf
• Biodiversity Data Journal (publication debut Dec. 2012).
http://www.pensoft.net/journals/bdj
• Symposium: Arthropod Collections Databases. 2011 ECN
meeting, Reno, NV http://www.ecnweb.org/past/2011
• Darwin Core Standard http://rs.tdwg.org/dwc/
• Kampmeier, G. E. and M. E. Irwin. 2009. Meeting the interrelated
challenges of tracking specimen, nomenclature, and literature data in
Mandala. Chapter 15 in T. Pape, D. Bickel, and R. Meier (eds.) Diptera
Diversity: Status, Challenges and Tools. Leiden: Brill Academic
Publishers, pp. 407-437.
http://www.inhs.illinois.edu/research/mandala/Ch15_Mandala_DiptDiv2009.pdf
26. More Refs & Resources
• Kennedy, J., R. Hyam, R. Kukla, T. Paterson. 2006.
Standard data model representation for taxonomic
information. OMICS: A Journal of Integrative Biology 10(2):
220–230. http://www.hyam.net/publications/omi.2006.10.220.pdf
• Penev, L., T. Georgiev, P. Stoev, D. Roberts, V. Smith.
2012. Making small data big! The Biodiversity Data
Journal (BDJ). TDWG 2012, Beijing, 22–26 October.
http://www.tdwg.org/fileadmin/2012conference/slides/Biodiversity_Data_Journal.pdf
• Catalogue of Life.
http://www.catalogueoflife.org/colwebsite/sites/default/files/2012_CoL-Standard_Dataset_v6_3.pdf
27. Acknowledgements
• Michael E. Irwin
• F. Chris Thompson
• Neal Evenhuis
• Christine Lambkin
• Shaun Winterton
• Don Webb
• Mark Metz
• Martin Hauser
• Kevin Holston
• Steve Gaimari
• J. Marie Metz
• David Yeates
• Amanda Buck
• Brian Wiegmann
• Evert Schlinger
• John Pickering
• FMWebschool
• National Science
Foundation
• Schlinger Foundation
• Illinois Natural History
Survey
• University of Illinois
• Discover Life
• Biodiversity Information
Standards (TDWG)
NSF Projects:
• Therevid PEET: DEB-95-21925; 99-77958
• Fiji Arthropod Survey: DEB-0425790
• FLYTREE: EF-0334948
• Tabanid PEET: DEB 07-31528
30. Why Use A Database?
• Flexibility
– Finely parsed data may be
pieced together for
publication, labels
– Scripting of often used
functions
• Reuse/repurposing of data
– Sharing with GBIF,
DiscoverLife.org, museums
• Centralization of work
environment
– Workers can be anywhere,
any time zone
– Backup can be automated
• Individual work environment
– Desktop platforms need not be
online (although that is a trade-off)
31. Vision
• "Taxonomy should fully embrace
electronic media and informatics tools.
Particularly, this step requires the
development and widespread
implementation of community data
standards. The barriers to progress in
these areas are not technological, but are
primarily social. The community needs to
see clear evidence of the value added
through these changes in procedures and
insist upon their use as standard practice."
Johnson, N.F. 2011. A collaborative, integrated and electronic future for taxonomy.
Invertebrate Systematics 25: 471.
32. Any Database Can Record the
Basics, but…
• How the information is related is also key
– defining taxonomic ranks as parent-child relationship
– valid taxonomic entities related to their synonyms
– types and specimens determined for a taxon
– literature associated with a taxonomic name
– collecting localities and collecting events
• Readability – if a published work rather than raw database output
• Format
– Based on existing print models?
– Print styles (italics, bold, centered, hanging indents)
– Accented characters (for literature references, authority names, and
localities)
– Special characters (for ♂ and ♀ signs)
– Notes kept with the taxon entry or as an appendix?
33. Mandala Data Model
• Not all of this is
required for a
traditional
catalog, but
these tables
contain a
wealth of vital,
interrelated
data.
• Tables with
rounded edges
are authority
files
We were fortunate to have two rounds of funding for this project on a medium-sized family of flies. We trained dipterists who are contributing their expertise even today, continuing to work on the family Therevidae as well as other Diptera.
For better or worse, not yet.
The main part of the work, which has consumed many person-hours to enter and verify, is in the Mandala database devoted to the Therevidae.
Find all specimens with valid names and a localityID
You cannot create style sheets in Acrobat
Photo of Kevin,
A spreadsheet is not flexible; neither is a field notebook or index cards.
But it is not just the community: individuals also need to see and embrace this for themselves.
All this goes on in the background, once you have indicated which taxa you want to delimit.