The Canadensys Network
Past Successes, Present Challenges, and a Look to the Future
David P. Shorthouse
A Successful Network
30 collections
Plants, insects and fungi
Open Communications
www.canadensys.net (news & documentation)
https://groups.google.com/d/forum/canadensys (113 members)
@canadensys (300 followers)
Goal
Mobilize 3 million
specimen records (20%)
Vascular Plants of Canada
32+ citations
1,500 pageviews / day
110 users / day
integration in rOpenSci
doi:10.3897/phytokeys.25.3100
« Once operational, the network will be available to any collection or researcher that is not part of the initial application. »
Recent Updates
DFO Maritimes Region Cetacean Sightings
associatedSequences
GenBank:JQ662503
www.ncbi.nlm.nih.gov/…/JQ662503
Contributions Made to GBIF
Code Development
Darwin Core Archive Validator
Narwhal Processor
IPT Customization
Consultations
Data Licensing
Digital Object Identifiers
Country Pages Requirements
Data license
Use without restriction
bit.ly/cc0-for-data
doi:10.3897/zookeys.283.4674
doi:10.5886/txsd3at3
doi:10.3897/zookeys.360.4742
doi:10.5886/998dbs2a
https://github.com/gbif/dwca-validator
$ java -jar dwca-validator.jar -s http://ipt.iobis.org/obiscanada/archive.do?r=dfo_canadianais
CORE : 1698481
->WARNING,RECORD_CONTENT:The value 79:48 [sic] is not a numerical value
->WARNING,RECORD_CONTENT:The value 44:29 [sic] is not a numerical value
CORE : 1698482
->WARNING,RECORD_CONTENT:The value 79:31 [sic] is not a numerical value
->WARNING,RECORD_CONTENT:The value 44:35 [sic] is not a numerical value
CORE : 1002205
->WARNING,RECORD_CONTENT:The value -123.333º is not a numerical value
CORE : null
->ERROR,FIELD_UNIQUENESS:The value null was already used for term coreId
CORE : null
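The validator output above reflects two kinds of check: whether a value that should be numeric actually parses as a number (79:48 and -123.333º do not), and whether the core identifier is unique (several records share a null coreId). A minimal Python sketch of those two checks, assuming records arrive as dictionaries keyed by Darwin Core term; check_numerical and check_unique are hypothetical names, and the messages only loosely mimic the output above. This is not the dwca-validator code, which is Java.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

Record = Dict[str, Optional[str]]

def check_numerical(record: Record, terms: Iterable[str]) -> List[str]:
    """Warn for each listed term whose non-blank value does not parse as a number."""
    warnings = []
    for term in terms:
        value = record.get(term)
        if not value:
            continue  # blank values belong to a separate check
        try:
            float(value)
        except ValueError:
            warnings.append(
                f"WARNING,RECORD_CONTENT:The value {value} is not a numerical value")
    return warnings

def check_unique(records: Iterable[Record], id_term: str = "coreId") -> List[str]:
    """Report each identifier value used more than once; a missing id counts as null."""
    counts = Counter(r.get(id_term) for r in records)
    errors = []
    for value, n in counts.items():
        if n > 1:
            shown = "null" if value is None else value
            errors.append(
                f"ERROR,FIELD_UNIQUENESS:The value {shown} was already used for term {id_term}")
    return errors
```

Unlike the real tool, this sketch reports a duplicated identifier once rather than once per repeat.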
Darwin Core Archive Validator
IPT Customization
Collaborations
Developers & International Projects
17 Canadensys code projects
20 « forks »
Working collaborations with GBIF, France, Colombia, Brazil, French Guiana
Present Challenges
Pieris spp.
Pieris japonica
Pieris rapae
Better Fidelity With Taxa
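Pieris is both a plant genus (Pieris japonica, Ericaceae) and a butterfly genus (Pieris rapae, Pieridae), so a name string alone cannot say which taxon a record refers to. A minimal sketch of why a higher rank such as kingdom is needed, assuming a tiny hand-built index; GENUS_INDEX and match are hypothetical and do not represent any real name-matching service.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, hand-built index: genus -> (kingdom, family) placements that use it.
GENUS_INDEX: Dict[str, List[Tuple[str, str]]] = {
    "Pieris": [("Plantae", "Ericaceae"), ("Animalia", "Pieridae")],
}

def match(scientific_name: str, kingdom: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return the single (kingdom, family) placement for a name, or None if the
    genus is unknown or still ambiguous after filtering by kingdom."""
    genus = scientific_name.split()[0]
    candidates = GENUS_INDEX.get(genus, [])
    if kingdom is not None:
        candidates = [c for c in candidates if c[0] == kingdom]
    return candidates[0] if len(candidates) == 1 else None
```

Without a kingdom, "Pieris rapae" stays ambiguous and match returns None; supplying kingdom="Animalia" resolves it to Pieridae.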
Better Fidelity With Media
Harvesting & Caching
Processing & Storing
OCR
Crowd-sourcing
Sharing
DwC-A Simple Media Extension
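In a Darwin Core Archive, media are linked to specimen records through an extension file whose rows carry the core record's identifier. A minimal sketch, assuming the GBIF Simple Multimedia extension and a small subset of its Dublin Core terms (identifier, type, format, license). The column names, the FORMATS lookup, the CC0 default, and the helper names are hypothetical. In a real archive, the meta.xml file maps columns to term URIs.

```python
import csv
import io
import posixpath
from urllib.parse import urlparse

# Assumed subset of Simple Multimedia terms, preceded by the id of the core
# (occurrence) record that each media row points back to.
COLUMNS = ["coreid", "identifier", "type", "format", "license"]

# Hypothetical lookup from file extension to MIME type.
FORMATS = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png",
           ".tif": "image/tiff", ".tiff": "image/tiff"}

CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"

def media_rows(links):
    """Turn (occurrence_id, image_url) pairs into extension rows, one per image."""
    rows = []
    for occurrence_id, url in links:
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        rows.append({
            "coreid": occurrence_id,
            "identifier": url,
            "type": "StillImage",            # assumes every link is a still image
            "format": FORMATS.get(ext, ""),  # blank when the extension is unknown
            "license": CC0,                  # assumes images are released under CC0
        })
    return rows

def write_extension(rows):
    """Serialize rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```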
Better Fidelity With Reporting
Citations of specimens, datasets, checklists
Data quality
• Collection dates
• Georeferencing
• Taxon names & concepts
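Two of the data quality areas listed above, collection dates and georeferencing, can be illustrated with simple record-level checks. A minimal sketch, assuming a full date given in separate year, month, and day fields and coordinates in decimal degrees. The function names are hypothetical, and real validation would also need rules for partial dates and for the coordinate formats seen in the validator output.

```python
from datetime import date
from typing import List, Optional

def check_collection_date(year, month, day, today: Optional[date] = None) -> List[str]:
    """Check a collection date split across year, month and day fields.
    Assumes a full date is expected; partial dates would need their own rule."""
    today = today or date.today()
    try:
        collected = date(int(year), int(month), int(day))
    except (TypeError, ValueError):
        return [f"not a valid calendar date: {year}-{month}-{day}"]
    if collected > today:
        return [f"collection date is in the future: {collected.isoformat()}"]
    return []

def check_coordinates(lat, lon) -> List[str]:
    """Check that decimal coordinates are numbers within the valid ranges."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return [f"coordinates are not numerical: {lat}, {lon}"]
    problems = []
    if not -90 <= lat <= 90:
        problems.append(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        problems.append(f"longitude out of range: {lon}")
    return problems
```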
A Look to the Future
« Developing the next generation of IPM tools – mobile applications for pest identification, monitoring and forecasting for sustainable and profitable crop production »
Dr. Barb Sharanowski
Mobile first
Occurrence observations
Real-time species distribution modelling
Identification keys
CFI Cyberinfrastructure Initiative
January 2015: Notice of intent (NOI). The CFI expects all institutions to identify collaboration with Compute Canada.
June 2015: Full proposal due
Canadensys Wish List
1. Formal representation of NA research projects in the GBIF Governance Model
2. Recognition of in-kind support
3. Shared development & hackathons
• accelerate delivery of generic solutions
• minimize duplication

2014.07.22 shorthouse

Editor's Notes

  • #7 What is Canadensys? History, funding, goals
  • #11 DwC-A harvester: semi-automated from any IPT
  • #18
    - To do: ensure that a field is present, not merely that it has content or is empty
    - To do: validate values that are split across fields (e.g. dates given in separate year, month, and day fields)
    - To do: evaluate a field based on the content of other fields (e.g. is a datum provided when there is a latitude and longitude?)
    - The architecture is an evaluation chain, which may have subchains
    - 500,000 records in 5 s
    - Application: as a website and as a service that can be extended and reused
    - The core of the codebase is format agnostic: although it was designed for Darwin Core Archives as input, nothing in the code precludes implementation elsewhere (e.g. integrated within the IPT and used before data publication as a mechanism to ensure data quality)
    - Checks: numerical values, invalid characters, blank values, unique values (occurrenceID), adherence to controlled vocabularies
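The notes describe the validator as an evaluation chain that may contain subchains, and list cross-field evaluation (a datum when latitude and longitude are present) as a to-do. A minimal Python sketch of that design, not the validator's actual Java code: each evaluator takes a record and returns messages, and a chain is itself an evaluator, so chains nest as subchains. The evaluator names and message text are hypothetical. occurrenceID, scientificName, decimalLatitude, decimalLongitude, and geodeticDatum are Darwin Core terms.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]
Evaluator = Callable[[Record], List[str]]

def blank(term: str) -> Evaluator:
    """Build an evaluator that flags a missing or empty value for `term`."""
    def evaluate(record: Record) -> List[str]:
        return [f"{term} is blank"] if not (record.get(term) or "").strip() else []
    return evaluate

def datum_when_coordinates(record: Record) -> List[str]:
    """Cross-field rule: a geodetic datum should accompany a latitude and longitude."""
    has_coords = record.get("decimalLatitude") and record.get("decimalLongitude")
    if has_coords and not record.get("geodeticDatum"):
        return ["geodeticDatum missing although coordinates are present"]
    return []

def chain(*evaluators: Evaluator) -> Evaluator:
    """Run evaluators in order and collect their messages. Because a chain is
    itself an Evaluator, it can be placed inside another chain as a subchain."""
    def evaluate(record: Record) -> List[str]:
        results: List[str] = []
        for ev in evaluators:
            results.extend(ev(record))
        return results
    return evaluate

georeference = chain(datum_when_coordinates)  # a subchain for spatial rules
validator = chain(blank("occurrenceID"), blank("scientificName"), georeference)
```

Composing evaluators this way keeps each rule small and testable, and new rules or subchains can be added without changing the code that runs the chain.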