Loaded term: what does « accessible » mean? « Able to be reached or entered »
• Homonyms: the same name for different taxa
• Synonyms: different names for the same taxon
• Variant representations: differences in orthography, spelling, or authority
Requires infrastructure, or at least the capacity, to reconcile scientific names and their surrogates (e.g., taxon concepts)
• Data for red-listed species in Canada (Species at Risk Act, SARA)
The project is an international collaboration between the National Centre for Text Mining (UK), Missouri Botanical Garden (US), Dalhousie University’s Big Data Analytics Institute (Canada) and Ryerson University’s Social Media Lab (Canada). NaCTeM was also a recipient of the 2011 Digging into Data call with the Integrated Social History Environment for Research (ISHER) project.
Started March 2014
What do accessible occurrence data and checklists tell us about species diversity in Canada?
David P. Shorthouse
Université de Montréal / Canadensys
• Legally unrestrictive
• Openly licensed
• Can be reused, reconstituted, recombined
« The Canadian Museum of Nature owns the copyright of almost all information (data and images) accessed through its databases. Data may not be transferred to another database for distribution to others without prior written permission from the Canadian Museum of Nature. »
• Establish provenance
• Track citations & metrics of use
• Build new networks of collaboration
• Play a significant role in « Big Data »
Innovation happens when you use someone else’s data
Are Occurrence and Checklist Data in Canada
Carabidae on Canadensys
Bembidion (Bracteon) punctatostriatum
Credit: Henri Goulet
Species at Risk on Canadensys
Cypripedium candidum Muhl. ex Willd.
Small White Lady’s-slipper
Antennaria flagellaris (A. Gray) A. Gray
Utility of « Accessible » Occurrence Data
Hjarding, A., Tolley, K. A., & Burgess, N. D. 2014. Red List assessments of East African chameleons: a case study of why we need experts. Oryx.
Photographer: H. Vannoy Davis
« 99.9% of GBIF records used outdated taxonomy and 20% had no locality »
What Can We Mine Now?
Canadensys new research initiatives workshop
Anne Bruneau, firstname.lastname@example.org
Data Publication Workshop K.W. Neatby Building:
January 13–14, 2015; co-located in Tallahassee, FL, with
James Macklin, email@example.com
David Shorthouse, firstname.lastname@example.org
• « Accessible » means computable & open
• Occurrence and checklist data can be
published, but they require curation
– Seek assistance from Canadensys & elsewhere
• There are significant challenges, but we can be
• What new, innovative research can we do with
What do accessible occurrence data and checklists say about us?