Brian Westra
 University of Oregon
bwestra@uoregon.edu
Data services needs assessment: 2009-2010

Interviewed 25 faculty:

Biology
Center for Advanced Materials Characterization at Oregon
Chemistry
Computer & Information Science
Geological Sciences
Human Physiology
Institute for a Sustainable Environment
Museum of Natural and Cultural History
Physics
Psychology
o   Connecting data sources to data viewing and
    usage
o   Data organization
o   Metadata/annotation of files
o   Recording workflow, procedures, provenance

Preservation, archiving and publishing data
were farther down the list
Clearly articulated need and opportunity;
also tie-in to data management plan
implementations

Logical extension of the role for libraries
beyond traditional services

Support for e-Science is a goal

Working in the data lifecycle/ecosystem is more
robust than 'just' archiving/preservation
Maintaining, preserving and adding value to
digital research data throughout its lifecycle.




http://www.dcc.ac.uk/digital-curation/what-digital-curation
File management tools: e.g., SharePoint

Best practices: naming conventions, version
control software
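
A naming convention is easiest to follow when it can be generated and checked automatically. The sketch below is only an illustration: the pattern `<project>_<YYYYMMDD>_<description>_v<NN>.<ext>` and the function names are assumptions made for this example, not a convention recommended in the interviews.

```python
import re
from datetime import date

# Hypothetical convention (an assumption for this sketch):
#   <project>_<YYYYMMDD>_<description>_v<NN>.<ext>
PATTERN = re.compile(r"^[a-z0-9]+_\d{8}_[a-z0-9-]+_v\d{2}\.[a-z0-9]+$")


def build_name(project, day, description, version, ext):
    """Build a file name that follows the hypothetical convention."""
    # Lowercase everything; collapse spaces and punctuation into hyphens.
    proj = re.sub(r"[^a-z0-9]+", "", project.lower())
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    suffix = ext.lower().lstrip(".")
    return f"{proj}_{day:%Y%m%d}_{slug}_v{version:02d}.{suffix}"


def follows_convention(name):
    """Return True if an existing file name matches the convention."""
    return PATTERN.match(name) is not None


name = build_name("ProjX", date(2012, 2, 9), "Gel image 3", 1, "TIFF")
print(name, follows_convention(name))
print(follows_convention("final data (2).xls"))
```

A check like `follows_convention` could be run over a shared folder to flag files that need renaming before they are deposited or handed off.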

Are there other solutions or services?
Going beyond file management systems to
embedded, more holistic tools/systems:

o   Electronic Lab Notebooks

o   Content/format-specific data management
    software
"…how a laboratory tracks and manages its
information resources, particularly the data
that represents the laboratory's product."
(Avery, McGee, & Falk, 2000)


"a data and sample management system that is
designed to improve the management of
laboratory workflow" ("Clinical LIMS," 2011)

Most basic function: sample handling and
reporting.
Data (create, store, share, organize, analyze)
                           +
                 information (notes)

May include: sample handling, storeroom inventory,
signatures, collaboration, protocols and SOPs,
embedded workflows, data analysis and
visualization

LIMS and ELN functions and features often overlap
Many of them! University of Wisconsin-Madison RFI
responses included these vendors:

 o Accelrys
 o Agilent
 o Amphora
 o Axiope
 o Contur
 o IDBS
 o Kinematik
 o Labtrack
 o Notebookmaker
 o Rescentris
 o Waters
Continuously changing field of vendors and
products

 o Nature article

 o Other options: open source, or a mix of basic tools,
   often used in open science
Some UO considerations:

o   Academic audience (vs. FDA compliance)
o   Cost – S/W, hardware, sys-admin, training
o   Interface and ease of use
o   Account management
o   Platform
o   Research domain integration*
o   Metadata support*
o   Data file management*

*curation characteristics
o   Research domain
    o Workflow integration with analytical tools, methods
    o Data capture from typical hardware/sources
    o Ontologies

o   Metadata
    o Capture/extraction
    o Representation, standards
    o Export with files

o   Data file management
    o   File format standards, transformations
    o   Export options
    o   Metadata
    o   Provenance, version control
    o   Archiving raw and derivatives
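
One way to picture "export with files" and basic provenance is a small metadata record that travels next to each data file. The sketch below is a minimal illustration using only the Python standard library; the field names (`derived_from`, `notes`) and the `.meta.json` sidecar suffix are assumptions for this example, not a standard or a feature of any particular ELN.

```python
import hashlib
import json
from pathlib import Path


def describe(path, derived_from=None, notes=""):
    """Collect minimal descriptive and fixity metadata for one file."""
    data = Path(path).read_bytes()
    return {
        "file": Path(path).name,
        "bytes": len(data),
        # The checksum lets a later reader confirm the file is unchanged.
        "sha256": hashlib.sha256(data).hexdigest(),
        # Name of the raw file this one was derived from, if any.
        "derived_from": derived_from,
        "notes": notes,
    }


def write_sidecar(path, **kwargs):
    """Write the metadata as JSON next to the data file; return its path."""
    record = describe(path, **kwargs)
    sidecar = Path(str(path) + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

Calling `write_sidecar("scan_crop.tif", derived_from="scan_raw.tif")` (file names hypothetical) would leave a `scan_crop.tif.meta.json` beside the derivative, so the link back to the raw file survives an export.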
Wisconsin-Madison RFI

o   Some highlights from an excellent list of
    considerations

o   Good process

o   Plan to field test with 60 participants
What might be your "make or break" issues?

How would you assign weights or ranking to
the metrics?
1. Costs
2. Platform
3. Product lock-in
4. etc.
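
One possible way to combine weights with "make or break" issues is a weighted average plus an elimination rule. The sketch below is only an illustration: the product names, the weights, the 0-5 scores, and the rule that a score of 0 on a must-have metric eliminates a product are all assumptions for this example, not results from any evaluation.

```python
def weighted_score(scores, weights):
    """Weighted average of a product's metric scores."""
    total_weight = sum(weights.values())
    return sum(weights[m] * scores[m] for m in weights) / total_weight


def rank(candidates, weights, must_have=()):
    """Rank products by weighted score, best first.

    A product scoring 0 on any must-have ("make or break") metric is
    dropped before ranking.
    """
    eligible = {
        name: scores
        for name, scores in candidates.items()
        if all(scores.get(m, 0) > 0 for m in must_have)
    }
    return sorted(
        eligible,
        key=lambda name: weighted_score(eligible[name], weights),
        reverse=True,
    )


# Hypothetical weights and 0-5 scores, for illustration only.
weights = {"cost": 3, "platform": 2, "lock_in": 1}
candidates = {
    "product_a": {"cost": 4, "platform": 5, "lock_in": 2},
    "product_b": {"cost": 5, "platform": 0, "lock_in": 5},
    "product_c": {"cost": 3, "platform": 3, "lock_in": 3},
}

print(rank(candidates, weights))
print(rank(candidates, weights, must_have=("platform",)))
```

With these made-up numbers, product_b ranks second on weighted score alone but disappears once platform is treated as make or break, which is the kind of difference worth settling before the pilot.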
'Ground truth' the
metrics and
values/comparators

The satellite or high-altitude
view (pre-pilot) might not
match conditions on the ground
(during the pilot)

                             http://www.seawead.org/index.php?option=com_content&view=article&id=29:ground-truthing&catid=9&Itemid=9
Have realistic team work load and timeline
expectations

It's progress! It may be difficult to apply
measures of curation capacity to an ELN

 o Archiving and preservation capacity
 o Exportable relational (semantic) representation
 o Publication of data
It may be more realistic to ask:

o   Will this help you (the PI) find and understand the
    data and notes this week/ next year/after the
    student is gone?

o   Can this improve your ability to do data
    management (and write a better plan for the next
    grant proposal)?

o   Is it simple enough that it will become part of the
    routine?
    i.e., folklore: info everyone knows but no one
    records
Example: publish direct to ChemSpider

Chemspider record

ELN data exchange project: Dial-a-molecule
A compelling reason for faculty to participate

Collaboration and coordination with
stakeholders (Office of Research, IT,
Libraries, research faculty, Tech Transfer)

Champion(s) – these are usually not easy or
inexpensive to implement, in the lab or with
limited budgets
What is the scope of a "pilot case"?
o Duration
o Number of participants
o Hardware capacity
o Level of training and support
o Evaluation criteria and roles
o Exit strategy – and dealing with success


Who's going to pay for this (right now)?

Also anticipate who is going to pay for this (if it
works well and goes to production)
"Data you enter in the ELN software will be stored in a secure
location, however; at the end of the pilot period, the data will
be removed and we cannot guarantee that it can be recovered
fully from the ELN. Therefore, we very strongly encourage you
to keep an additional copy of all data and notebook entries in
electronic and/or hard copy format during the pilot as a backup
measure and as a means of keeping a complete and continuous
record of your work during the pilot period."


https://academictech.doit.wisc.edu/informed-consent-electronic-lab-notebook-pilot
Many biology labs produce a lot of still images
and video




                 Cresko lab - UO
Open Microscopy Environment (OME)-developed
system for image file management
Embeds/supports curation:

o   Uses a metadata standard for description (OME
    XML)
o   Employs file format standards (import to tiff)
o   Can archive raw and derivative files
o   Provides intuitive organizational schema
o   Annotation and description support on multiple
    levels
o   Export of files with metadata
video
It's open source – what is the level of
support/installation base? Longevity/stability?

How well does it fit into the workflow of the lab?

Can it support the proprietary formats generated
in the labs?

What are the IT/systems requirements?
Finding a host and participants

Establishing realistic expectations
o Host obligations
o Project scope
DCXL: Digital Curation for Excel

Discussion: what other options are you
exploring?
Avery, G., McGee, C., & Falk, S. (2000). Product Review: Implementing LIMS: A "how-to" guide. Analytical
Chemistry, 72(1), 57 A-62 A. American Chemical Society. doi:10.1021/ac0027082

CIO Office, U. of W.-M. (n.d.). Charter 6.7: eLab Notebooks | CIO Office | UW-Madison. Retrieved February 9, 2012, from
http://www.cio.wisc.edu/plan-docs-Charter6-7.aspx

Clinical LIMS. (2011). Retrieved from
http://www.scientificcomputing.com/product-IN-Clinical-LIMS-072811.aspx?terms=LIMS

Giles, J. (2012). Going paperless: The digital lab. Nature, 481(7382), 430-1. doi:10.1038/481430a

PerkinElmer. (n.d.). PerkinElmer Informatics. Retrieved February 9, 2012, from http://www.cambridgesoft.com/?l=en

Rescentris. (n.d.). Rescentris | CERF Software. Retrieved February 9, 2012, from http://rescentris.com/cerf-software/

University of Dundee & Open Microscopy Environment. (n.d.). About OMERO — OME. Retrieved February 9, 2012, from
http://www.openmicroscopy.org/site/products/omero

University of Wisconsin-Madison. (2012). Informed Consent for Electronic Lab Notebook Pilot | Technology Solutions for
Teaching and Research. Retrieved February 9, 2012, from
https://academictech.doit.wisc.edu/informed-consent-electronic-lab-notebook-pilot

University of Wisconsin-Madison. (n.d.-a). Electronic Lab Notebooks | Technology Solutions for Teaching and Research.
Retrieved February 9, 2012, from http://academictech.doit.wisc.edu/ideas/electronic-lab-notebooks

University of Wisconsin-Madison. (n.d.-b). Electronic Lab Notebook Request for Information - University of
Wisconsin-Madison. Retrieved February 9, 2012, from https://academictech.doit.wisc.edu/files/115349rfi.pdf

Curation-Friendly Tools for the Scientific Researcher
