The Canadensys Network
Past successes, Present Challenges, and
a Look to the Future
David P. Shorthouse
A Successful Network
30 collections
Plants, insects and fungi
Open Communications
www.canadensys.net (news & documentation)
https://groups.google.com/d/forum/canadensys (113 members)
@canadensys (300 followers)
Goal
Mobilize 3 million
specimen records (20%)
Vascular Plants of Canada
32+ citations
1,500 pageviews / day
110 users / day
integration in rOpenSci
doi:10.3897/phytokeys.25.3100
« Once operational, the network
will be available to any collection or
researcher that is not part of the initial
application. »
Recent Updates
DFO Maritimes Region Cetacean Sightings
Recent Updates
associatedSequences
GenBank:JQ662503
www.ncbi.nlm.nih.gov/…/JQ662503
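As a rough illustration of how software might read a value such as GenBank:JQ662503, the Python sketch below splits each entry into a database prefix and an accession. The function name is hypothetical, and the "|" separator for multiple entries is an assumption based on the usual Darwin Core convention for list-valued terms. No link is built, because the full URL is not given above.

```python
def parse_associated_sequences(value, separator="|"):
    """Split an associatedSequences string into (database, accession) pairs."""
    pairs = []
    for entry in value.split(separator):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces left by stray separators
        # Split on the first colon only: "GenBank:JQ662503" -> "GenBank", "JQ662503".
        database, colon, accession = entry.partition(":")
        if not colon:
            # No prefix present: treat the whole entry as the accession.
            database, accession = "", entry
        pairs.append((database.strip(), accession.strip()))
    return pairs


print(parse_associated_sequences("GenBank:JQ662503"))
```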
Contributions Made to GBIF
Darwin Core Archive Validator
Narwhal Processor
IPT Customization
Data Licensing
Digital Object Identifiers
Country Pages Requirements
Code Development
Consultations
Data license
Use without restriction
bit.ly/cc0-for-data
doi: 10.3897/zookeys.283.4674
doi: 10.5886/txsd3at3
doi: 10.3897/zookeys.360.4742
doi: 10.5886/998dbs2a
https://github.com/gbif/dwca-validator
$ java -jar dwca-validator.jar -s http://ipt.iobis.org/obiscanada/archive.do?r=dfo_canadianais
CORE : 1698481
->WARNING,RECORD_CONTENT:The value 79:48 [sic] is not a numerical value
->WARNING,RECORD_CONTENT:The value 44:29 [sic] is not a numerical value
CORE : 1698482
->WARNING,RECORD_CONTENT:The value 79:31 [sic] is not a numerical value
->WARNING,RECORD_CONTENT:The value 44:35 [sic] is not a numerical value
CORE : 1002205
->WARNING,RECORD_CONTENT:The value -123.333º is not a numerical value
CORE : null
->ERROR,FIELD_UNIQUENESS:The value null was already used for term coreId
CORE : null
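The warnings above concern values that cannot be read as plain numbers, such as 79:48 or a value with a trailing degree sign (-123.333º), and the error concerns a repeated core identifier. A minimal Python sketch of those two kinds of check follows. It is not the validator's own code: the function names are assumptions made for illustration, and the message wording only imitates the output shown.

```python
import math


def is_numerical(value):
    """Return True when the value parses as a finite decimal number."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        return False


def check_numerical(values):
    """Yield a warning for every value that is not a finite number."""
    for value in values:
        if not is_numerical(value):
            yield f"WARNING,RECORD_CONTENT:The value {value} is not a numerical value"


def check_unique(identifiers):
    """Yield an error for each identifier that was already seen."""
    seen = set()
    for identifier in identifiers:
        if identifier in seen:
            yield f"ERROR,FIELD_UNIQUENESS:The value {identifier} was already used"
        seen.add(identifier)


for message in check_numerical(["44.5", "79:48", "-123.333º"]):
    print(message)
```

A missing identifier shows up here as Python's None rather than null, so two records without an identifier trigger the uniqueness error in the same way as the last lines of the output above.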
Darwin Core Archive Validator
IPT Customization
Collaborations
Developers & International Projects
17 Canadensys code projects
20 « forks »
Working collaborations with
GBIF, France, Colombia, Brazil,
French Guiana
Present Challenges
Pieris spp.
Pieris japonica
Pieris rapae
Better Fidelity With Taxa
Better Fidelity With Media
Harvesting & Caching
Processing & Storing
OCR
Crowd-sourcing
Sharing
DwC-A Simple Media Extension
Better Fidelity With Reporting
Citations of specimens, datasets, checklists
Data quality
• Collection dates
• Georeferencing
• Taxon names & concepts
A Look to the Future
« Developing the next generation of
IPM tools – mobile applications for
pest identification, monitoring and
forecasting for sustainable and
profitable crop production »
Dr. Barb Sharanowski
Mobile first
Occurrence observations
Real-time species distribution modelling
Identification keys
CFI Cyberinfrastructure Initiative
January 2015: Notice of intent (NOI). The CFI
expects all institutions to identify
collaboration with Compute
Canada
June 2015: Full proposal due
Canadensys Wish List
1. Formal representation of NA research projects in GBIF
Governance Model
2. Recognition of in-kind support
3. Shared development & hackathons
• accelerate delivery of generic solutions
• minimize duplication

Editor's Notes

  1. What is Canadensys: history, funding, goals
  2. DwC-A harvester: semi-automated from any IPT
  3. - todo: ensure presence of a field, not merely that a field has content or is empty
     - todo: validate values that are split across fields (e.g. dates that are present in separate year, month, day fields)
     - todo: evaluate a field based on the content of other fields (e.g. is a datum provided when there is a latitude and longitude)
     - architecture is an evaluation chain, which may have subchains
     - 500,000 records in 5 s
     - application, as a web site, as a service that can be extended and reused
     - core of the codebase is format agnostic, i.e. although designed for use with Darwin Core Archives as input, there's nothing in the code that precludes implementation elsewhere (e.g. integrated within the IPT to be used prior to data publication as a mechanism to ensure data quality: numerical values, invalid characters, blank values, unique values (occurrenceID), adherence to controlled vocabularies)
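
The notes describe the validator's architecture as an evaluation chain and list cross-field checks still to be done. The Python sketch below shows one way such a chain could be organised. It is not the dwca-validator implementation: the rule functions and the chain runner are hypothetical, and records are assumed to be dictionaries of strings. The field names (decimalLatitude, decimalLongitude, geodeticDatum, year, month, day) are standard Darwin Core terms.

```python
import datetime


def datum_with_coordinates(record):
    """Flag records that give coordinates without a geodetic datum."""
    has_coords = record.get("decimalLatitude") and record.get("decimalLongitude")
    if has_coords and not record.get("geodeticDatum"):
        return ["geodeticDatum is missing although coordinates are present"]
    return []


def split_date_is_valid(record):
    """Check that separate year, month and day fields form a real calendar date."""
    parts = [record.get(name) for name in ("year", "month", "day")]
    if not all(parts):
        return []  # only evaluated when all three fields have content
    try:
        datetime.date(*(int(part) for part in parts))
    except ValueError:
        return [f"year/month/day {parts} is not a valid date"]
    return []


def evaluate(record, chain):
    """Run each rule in order and collect every message it returns."""
    messages = []
    for rule in chain:
        messages.extend(rule(record))
    return messages


# The chain is an ordered list of rules (assumed layout).
CHAIN = [datum_with_coordinates, split_date_is_valid]

record = {"decimalLatitude": "44.5", "decimalLongitude": "-79.5",
          "year": "2014", "month": "2", "day": "30"}
for message in evaluate(record, CHAIN):
    print(message)
```

In this layout a subchain is simply a rule that calls evaluate on its own list of rules, so chains can be nested without changing the runner.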