Data Library Services In The Data Stewardship Lifecycle

This presentation has been prepared for the 2009 TICER workshop at Tilburg University.





  • 1. Data Library Services in the Data Stewardship Lifecycle Charles (Chuck) Humphrey University of Alberta
  • 2. Outline
    • Canada as a case study for data library services: a twenty-year experiment
    • Lessons learned from Canada
      • General observations about forces shaping data library services
      • Data and other digital collections
      • The data “continuum of access” in collection development
      • Data reference and technical services
    • Planning levels of service for data libraries
      • Applying a data stewardship lifecycle model
  • 3. The Canadian experience
    • Timeline, 1970 to 2010: the “Modern” Census Era
    • Introduction of public use data products from the 1971 Census in digital format
    • A set of the 1981 Census data products cost ~$12,000
    • The cost of 1986 Census data products > $200,000
    • CARL Census Data Consortium was formed in 1989
  • 4. The Canadian experience
    • 1989 is a benchmark year in the development of data library services in Canada, which arose in response to a new Statistics Canada pricing policy mandated by the Conservative government then in power.
    • Timeline, 1970 to 2010: CARL Census Data Consortium was formed in 1989
  • 5. Data Library Context in 1989
    • 8 data libraries
    • 3 in the west
    • 5 in the east
    • 3 in libraries
    • 2 in academic computing centres
    • 2 in research centres
    • 1 library hybrid
  • 6. Data Library Context in 1989
    • 8 data libraries
    • 3 in the west
    • 5 in the east
    • 3 in libraries
    • 2 in academic computing centres
    • 2 in research centres
    • 1 library hybrid
    • The closest data library to my service was 1,200 km away.
  • 7. Data Library Context in 2009
    • 75 data library services
    • 25 in the west
    • 50 in the east
    • All 75 are located in libraries
  • 8. Changes between 1989 and 1998
    • CARL Census Data Consortium, 1989
    • CARL General Social Survey Microdata Consortium, 1991
    • Annual COPPUL Data Service Training Workshops, 1992
    • COPPUL/ICPSR Federation, 1993
    • CARL Data Consortium for the 1991 Census, 1994
    • Ontario-Quebec/ICPSR Federation, 1994
    • Data Liberation Initiative Pilot Launched, 1996
    • DLI Regional Training Workshops, 1997
  • 9. Changes between 1999 and 2009
    • Research Data Centre Network, 2000
    • National Data Archive Consultation, 2001-2002
    • DLI Train the Trainers Workshop, 2004
    • Consultation on Access to Scientific Research Data, 2005
    • Canadian Digital Information Strategy, 2007
    • Research Data Strategy Working Group, 2008
  • 10. General lessons from Canada
    • Collections were a driving force behind libraries introducing data library services.
    • Institutions working through cooperative arrangements helped introduce data as a library resource.
    • Collection development at the local level was largely driven by the general availability of data.
    • Training has been an ongoing factor in the continued participation of libraries in data collection initiatives.
  • 11. General lessons from Canada
    • Peer-to-peer training has been an effective method in DLI, using the general rule that as you learn, you teach others.
    • Training has allowed for differences in data needs and data cultures across institutions and regions in the country.
    • Training opportunities have been continuous through annual regional workshops. DLI workshops have become an expectation.
    • IASSIST conferences provide an immersion in data services and should be viewed as a training opportunity.
  • 12. General lessons from Canada
    • National consultations and international pressures have made data a well-discussed topic in this decade but have failed to make data a political priority.
    • While everyone seems to be talking about data, few are actually doing anything to address concerns about data access and preservation.
    • Part of the inability to mobilise a collective response to data access and preservation in Canada is the absence of a forum for data people to plan and coordinate work together. The Research Data Strategy Working Group is a first attempt at this.
  • 13. Collection development
    • Data collections are part of a growing number of digital collections being managed in today’s libraries.
    • Libraries face buying or leasing these collections, producing their own through digitisation projects, or serving as stewards for collections that are entrusted to them.
    • In Canada, data collections have tended to be leased. Most data licenses require that the producer’s data be destroyed once a lease is terminated. One result is that these leases have become long-term commitments by libraries.
  • 14. Collection development
    • With leased data, the role of a data librarian becomes one of managing the contractual relationship between data producers and her or his local institution.
    • Collection development consists of choosing data producers that have data corresponding to patron needs on campus. Often these omnibus collections have a mix of data that will support a variety of research interests. One can characterise these as “collections of access.”
    • One strategy in Canada has been to select data collections that support a continuum of access to products.
  • 15. Continuum of access
    • Open access: free access; published statistics
    • Conditional access: fees for access; anonymised data; aggregate databases
    • Restricted access: expensive access; confidential data
  • 16. Continuum of access
    • Open access (web access): free access; published statistics
    • Conditional access (web and in-person access): fees for access; anonymised data; aggregate databases
    • Restricted access (data enclave access): expensive access; confidential data
  • 17. Collection preservation
    • Unlike many other countries that have national data archives, Canada is without an institution providing stewardship for the long-term preservation of and access to data.
    • This institutional gap in Canada is now being addressed by proposals to establish trusted data repositories in some universities. The goal is to build a network of repositories nationally to preserve data collections.
    • Data libraries, to a limited extent, have helped fill the gap in the absence of a national data archive.
  • 18. Data reference
    • Data reference is dependent on the level of service being supported, which can include:
      • locating data that has been requested by title,
      • finding data to support a line of research enquiry,
      • interpreting data documentation,
      • extracting subsets of data and providing the data in a format directly useable by a patron,
      • merging and manipulating data files to produce new data products, and
      • providing advice to researchers throughout a project on metadata and data management.
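As an illustration of the "extracting subsets" level of service, here is a minimal sketch that keeps only selected columns and rows of a delimited file. The function name `extract_subset`, the column names and the sample values are all hypothetical and invented for the example; real extractions would also have to respect the producer's license terms.

```python
import csv
import io

def extract_subset(source, columns, keep_row):
    """Return CSV text with only the chosen columns of rows that pass keep_row."""
    reader = csv.DictReader(source)
    out = io.StringIO()
    # extrasaction="ignore" drops every field that is not listed in `columns`.
    writer = csv.DictWriter(out, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if keep_row(row):
            writer.writerow(row)
    return out.getvalue()

# Hypothetical survey extract (values invented for illustration only).
SAMPLE = "id,province,age,income\n1,AB,34,52000\n2,ON,51,61000\n3,AB,29,48000\n"

subset = extract_subset(io.StringIO(SAMPLE), ["id", "age"],
                        lambda row: row["province"] == "AB")
print(subset)
```

The same function accepts an open file handle in place of the in-memory sample, so a patron-ready file can be produced without loading the full dataset into memory.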
  • 19. Technical services
    • Metadata for data collections should include (i) a general item description and (ii) a detailed content description that documents the data for machine processing as well as human understanding.
    • A general item description in MARC format is typically produced for online catalogues and may be generated by the data producer or local bibliographic services.
    • The detailed content metadata is generated by the data producer and can be delivered in a variety of formats and in conventions that often are not based on standards.
  • 20. Technical services
    • The computing support will depend on the level of service being provided, just as collection development and data reference services depend on a defined level of service.
    • Web 2.0 services for data delivery are tempered by the license agreements with data producers. Typically, institutions are required to use IDs and passwords to access data holdings.
    • As federated authentication systems become more widely shared across institutions, redundant storage of data collections will lessen as institutions share the physical storage of data.
  • 21. Planning levels of service
    • The importance of levels of service has been mentioned repeatedly in the context of data collections, data reference and technical services.
    • What are the options for levels of service? How does one go about planning for levels of service? What co-dependencies must be included in data service plans?
    • These questions can be addressed using a new framework based on the data stewardship lifecycle.
  • 22. A framework for planning data services
    • The concept of data stewardship in combination with a lifecycle model of data provides a useful tool for planning data library services.
    • Data stewardship identifies the roles and responsibilities of all individuals and groups engaged in the production, access and preservation of data throughout its lifecycle.
    • A data service plan should clearly state the roles and responsibilities identified with the level of service to be supported.
  • 23. A framework for planning data services
    • The lifecycle of data is a representation of the various stages through which data flow from production to use to preservation to new uses.
    • Each stage consists of a set of related activities that culminate in a significant product, which is then passed to a subsequent stage.
    • By linking together a series of stages in logical sequence, the processes of data production and use are described.
  • 24. Lifecycle of data
    • As with any project management operation, the views of a project vary depending on the granularity at which activities are described.
    • Similarly, the stages in the data’s lifecycle can be aggregated or disaggregated into larger or smaller groupings, depending on the viewpoint one desires.
    • Keep these points in mind while examining a couple of lifecycle representations.
    • The first model is the widely circulated DCC curation lifecycle.
  • 25.
  • 26. Data Services
  • 27. Data Services
  • 28.
    • This table lists changes to the stages in the DCC model, re-aggregating activities in the lifecycle to create a data library viewpoint.
      • Data Lib: data production; DCC: create or receive
      • Data Lib: dissemination; DCC: appraisal and select
      • Data Lib: data repository; DCC: ingest, store, access and use
      • Data Lib: discovery; DCC: no corresponding stage listed
      • Data Lib: repurpose; DCC: transform
  • 29. Data lifecycle for data libraries
    • Data Production
    • Data Dissemination
    • Data Repository
    • Data Discovery
    • Data Repurposing
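The five stages named on this slide can be sketched as a small data structure, which may help when attaching roles and activities to each stage in a service plan. The ordering below follows the order in which the deck discusses the stages, and the loop from repurposing back to production is an assumption made for illustration, not something the slide states. The names `Stage` and `next_stage` are hypothetical.

```python
from enum import Enum

class Stage(Enum):
    # Order assumed from the sequence of the slides that follow.
    PRODUCTION = "Data Production"
    DISSEMINATION = "Data Dissemination"
    REPOSITORY = "Data Repository"
    DISCOVERY = "Data Discovery"
    REPURPOSING = "Data Repurposing"

def next_stage(stage):
    """Return the stage that follows `stage`, wrapping around at the end (assumed cycle)."""
    stages = list(Stage)
    return stages[(stages.index(stage) + 1) % len(stages)]

for stage in Stage:
    print(f"{stage.value} -> {next_stage(stage).value}")
```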
  • 30. Data production
    • Stewardship role:
      • Responsible for the terms of data use specified in the license of the data producer;
      • Serve as lifecycle advisor to local data producers.
    • Potential data services activities:
      • Help local project managers develop data plans incorporating a lifecycle perspective;
      • Provide researchers who are collecting data on human subjects with support statements for their ethics approval applications;
      • Provide researchers with support statements on data management in grant applications;
  • 31. Data production
      • Provide feedback to data producers about data in demand on campus;
      • Provide data producers with usage statistics on their data;
      • Assist with literature and data searches in the study design stage;
      • Consult with local data producers on metadata standards for data documentation;
      • Organise training on the DDI metadata standard;
      • Provide data preservation services throughout the data production stage.
  • 32. Data dissemination
    • Stewardship role:
      • Responsible for communicating the terms of the license with patrons;
      • Ensure the data products that are delivered are complete, documented and machine readable;
      • Ensure the appropriate level of security is maintained for the data.
    • Potential data services activities:
      • Monitor the release dates of data from producers;
      • Acquire data and metadata from data producers;
      • Prepare catalogue records for data titles;
  • 33. Data dissemination
      • Develop and maintain local access to data, providing formats appropriate for local needs;
      • Support infrastructure that provides online access to data;
      • Provide data dissemination services for researchers on campus;
      • Provide data anonymisation services for human-subject data collected on campus;
      • Coordinate the deposit of local research data with a data archive or repository.
  • 34. Data repository
    • Stewardship role:
      • Responsible for the data collection in the local repository;
      • Responsible for the service plan and operation of the local repository;
      • Responsible for metadata practices in the local repository.
    • Potential data services activities:
      • Prepare and implement a data collection development plan;
      • Acquire and ingest data from producers;
      • Ensure data product authenticity;
  • 35. Data repository
      • Appraise, select and ingest data originating on your campus into the repository;
      • Provide services to support the use of data, including help with data extractions, reformatting and subsetting;
      • Manage the collection of data and metadata, including refreshing digital media and migrating data to new digital media;
      • Coordinate activities with the local Institutional Repository;
      • Achieve and maintain “trusted” repository status.
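One common way to support the authenticity and media-migration activities listed above is a checksum (fixity) comparison: record a digest when a file is ingested and recompute it after every copy or migration. The sketch below is an illustration under that assumption, not a procedure described in the presentation; the function names `sha256_of` and `verify_copy` are hypothetical.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(original_path, copy_path):
    """Return True when the migrated copy is byte-identical to the original."""
    return sha256_of(original_path) == sha256_of(copy_path)
```

Storing the digest alongside the metadata record lets a later audit detect silent corruption even after the original medium has been retired.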
  • 36. Data discovery
    • Stewardship role:
      • Responsible for providing each patron with a comprehensive data reference interview or, when appropriate, for making an informed referral.
    • Potential data services activities:
      • Provide reference services to assist patrons in their search for data;
      • Produce and maintain metadata services to help find data (may involve metadata production, loading records into local OPAC, supporting Nesstar or Dataverse service, etc.);
  • 37. Data discovery
      • Conduct data literacy training for library colleagues and establish the grounds for informed referrals to data services;
      • Conduct data literacy training for patrons;
      • Promote data citation practices on your campus as part of the ethics of academic integrity;
      • Contribute to the development of new tools to exploit metadata.
  • 38. Data repurposing
    • Stewardship role:
      • Responsible for ensuring permissions are in place for repurposing data;
      • Ensure the deposit of new data in a repository.
    • Potential data services activities:
      • Help patrons mine metadata to discover data for repurposing;
      • Provide technical support to help patrons merge data from multiple sources for new purposes;
  • 39. Data repurposing
      • Participate in projects seeking new ways of exploiting metadata;
      • Integrate the production of metadata into data repurposing practices;
      • Engage in tools development to support data mining and data visualisation.
  • 40. Across all stages of the lifecycle
    • Work with data producers on campus to establish roles and responsibilities for the long-term access and preservation of data by clarifying stewardship roles.
    • Work to ensure that information gaps do not occur at any point in the lifecycle, such as files getting stranded on hard drives.
  • 41. Start a service appropriate for today
    • A data library does not have to begin with a large mandate but can be tailored to the level of support that can be maintained and staffed today.
    • Having said this, plan for expansion! The services offered by a data library have a way of generating new expectations that will require further resources, spanning staff, training and infrastructure.
    • The returns on the service will surprise you!