Data Management for Librarians: An Introduction
 


Slides from a training session on data management given to librarians. The session was intended to help librarians consider the challenges associated with maintaining research data and the steps that may be taken to address them. It was also used to discuss their role in supporting data management activities within LSHTM.

Usage Rights

© All Rights Reserved

  • Data may refer to physical and digital artefacts
  • http://www.flickr.com/photos/marc_smith/5943394090/
  • http://www.flickr.com/photos/johnhurn/2419971258/ http://www.flickr.com/photos/nonny/199568095/ http://www.flickr.com/photos/calotype46/6683293291/ http://www.flickr.com/photos/eq/4990131757/
  • 117 respondents
  • Provide first point of contact for library visitors
  • Strategy is format conversion, computer museum, emulation
  • A good file name is contextual, tailored to research needs

Presentation Transcript

  • Data Management for Librarians: An Introduction. February 19th 2013. Gareth Knight, Manager, RDM Support Service
  • What is Data?
    “Data are facts, observations or experiences on which an argument, theory or test is based. Data may be numerical, descriptive or visual. Data may be raw or analysed, experimental or observational.” http://research.unimelb.edu.au/integrity/conduct/data/review
    – May originate from various sources: primary and/or secondary
    – May contain different content: quantitative and/or qualitative
    – May be expressed in different forms: datasets, still images, audio-video, audio recordings, interactive resources
    – May be held in a number of variations: raw, cleaned, anonymised/pseudonymised, analysed
    – May be encoded in different formats: MS Excel, TIFF, MPEG2, STATA, FoxPro
    What type of data do you have at home?
  • Data in the Research Lifecycle
    [Diagram, built up over four slides: a cycle of Brainstorm, Develop Proposal, Plan Project, Perform Research, Write-up Results, and Finalise & submit. Later builds add Produce Data Management Plan alongside Develop Proposal; a data cycle within Perform Research (Create/Describe, Store, Analyse, Share, Reuse); and Share and Archive after Finalise & submit.]
  • What is Data Management?
    1. Plan
       – Determine requirements
       – Identify risks & opportunities
       – Decide approach
    2. Implement
    3. Monitor
       – Evaluate approach
       – Change approach/perform corrective action
    4. Evaluate
       – Is it fit for purpose?
       – What additional action is needed?
    ‘Benign neglect’ and poorly made short-term decisions will have long-term implications.
  • Short-term decisions with long-term implications: software products; file formats & standards; data organisation & labelling; quality controls
  • Why does data need to be managed?
    – Ensure data can be located
    – Enable analysis
    – Ability to understand for current and future need
    – Enable sharing & validation (“Interesting paper. Where’s the data?”)
    – Comply with Funder & School requirements
  • Researcher Challenges
    Issues/challenges encountered when creating, managing, and sharing research data (web survey results; response type: multiple choice checkbox + free text for other challenges) [graph not reproduced]
    Other challenges:
    – Database creation & management
    – Storage of physical questionnaires
    – Lack of time
    – Software instability (particularly NVivo)
    – Ability to enter & access data at different locations
  • Training Needs
    Interest in training on topics related to data management (web survey results) [graph not reproduced]
    Note: graph omits percentages for other responses (none, slight, moderate, no opinion)
  • RDM Support Service: Role of Library staff
    – Provide first point of contact
    – Help researchers to express requirements & needs
    – Direct to potential solution (staff, website)
    – Contribute to training activities
    – Incorporate data considerations into teaching
    [Diagram: location of Library staff within the RDM Support Service]
  • Data Access Over Time: digital vs. analogue
    “traditionally, preserving things meant keeping them unchanged; however … if we hold on to digital information without modifications, accessing the information will become increasingly more difficult, if not impossible.” Su-Shing Chen, 2001
    data + computer + OS + application = information content
  • Change in Process over Time
    [Diagram: hardware + operating system + software application → information content, shown for an Intel PC (2000), a Mac laptop (2006) and an x64 Ubuntu laptop (2010)]
  • Task
    Select two of the following problems when managing digital data:
    1. Difficulty locating data
    2. Difficulty accessing media
    3. Difficulty rendering data in an understandable form
    4. Difficulty recreating data as originally intended
    5. Difficulty understanding information content
    6. Uncertain provenance
    Consider the following questions:
    a. In what circumstances will the chosen problem occur?
    b. What consequences may follow if the problem occurs (e.g. financial implications)?
    c. How could you ensure that the problem doesn’t occur?
    d. What could you do to resolve the problem after it has occurred? (You can direct the researcher to someone for help.)
  • 1. Difficulty Locating Data
    Problem: “I created some data 5 years ago. Where is it?” “I’ve lost my original disk. Do I have the data elsewhere?”
    Scenarios & reasons: loss of storage media; lots of data stored in many locations; vague filenames make data difficult to locate
    (Potential) solutions:
    – Preventative: copy data to several storage devices to increase the likelihood of finding it
    – Post event: find better discovery software? Attempt to recreate content?
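When data may be spread across several storage devices, a small search script can act as basic "discovery software". The following is a minimal sketch, assuming the storage locations are mounted as ordinary folders; the function name `find_data_files`, the example paths and the filename pattern are hypothetical.

```python
import fnmatch
import os
from pathlib import Path


def find_data_files(roots, pattern):
    """Return every file under any of `roots` whose name matches `pattern`.

    Matching ignores case, so "*survey*.csv" also finds "Survey_2012.CSV".
    Roots that do not exist (e.g. an unplugged external drive) are skipped.
    """
    pattern = pattern.lower()
    matches = []
    for root in roots:
        root = Path(root).expanduser()
        if not root.is_dir():
            continue  # storage location missing or not mounted
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if fnmatch.fnmatch(name.lower(), pattern):
                    matches.append(Path(dirpath) / name)
    return sorted(matches)
```

For example, `find_data_files(["~/research", "/media/usb-backup"], "*survey*.csv")` would list matching files on both the (hypothetical) local folder and the backup drive. The search only helps if filenames are descriptive, which is why the filename conventions later in the session matter.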
  • 2. Difficulty Accessing Media
    Problem: “How do I access this old media?” “Why can’t I read this disk?”
    Scenarios & reasons: media obsolescence; physical deterioration & failure
    (Potential) solutions:
    – Preventative: copy data to several storage devices; transfer data to new storage media on obsolescence or every 3 years; deposit data into a data archive and/or copy to a server
    – Post event: data recovery software
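Copying data to several devices or to new media is only useful if each copy is intact. A common way to check this is to compare checksums of the original and the copy. The sketch below assumes SHA-256 as the checksum and plain folders as destinations; `copy_and_verify` and `sha256_of` are hypothetical helper names, not part of any particular service.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destinations):
    """Copy `source` into each destination folder and check each copy.

    Returns the source checksum so it can be recorded for later checks.
    Raises OSError if the checksum of any copy differs from the original.
    """
    source = Path(source)
    expected = sha256_of(source)
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / source.name
        shutil.copy2(source, target)  # copy2 also keeps the modification time
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch for {target}")
    return expected
```

Keeping the returned checksum alongside the data makes it possible to re-check the copies when media are next refreshed.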
  • Potential Storage Locations
    – Local machine & storage
        Pros: cheap, high-capacity storage, fast access
        Cons: lack of support; potential for theft, loss, or damage
    – Academic storage systems (recommended)
        Pros: automatic monitoring & backup, multiple redundancy, remote access, secure (if required)
        Cons: limited space allocation; not always accessible overseas
    – Third-party service providers
        Pros: automated backup; (usually) accessible in different countries
        Cons: security concerns; ownership concerns; services can close account at any time
    Image: http://www.flickr.com/photos/m0n0/4479450696/
  • 3. Difficulty Rendering Data
    Problem: “How can I view data?” “Where do I find software to access my data?”
    Scenarios & reasons: software obsolescence; new software uses a different decoding method
    (Potential) solutions:
    – Preventative: transform data to new formats (format conversion strategy); maintain the original machine and software to access content (computer museum)
    – Post event: track down the original software product; emulate the original environment (emulation/virtualisation)
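As one small illustration of the format conversion strategy, the sketch below migrates a tab-delimited text export saved in a legacy Windows-1252 encoding into a UTF-8 CSV file, which current software reads widely. The starting format, the `cp1252` default and the function name `migrate_tab_to_csv` are assumptions chosen for the example; real migrations (e.g. from a proprietary database) usually need dedicated tools.

```python
import csv


def migrate_tab_to_csv(source, target, source_encoding="cp1252"):
    """Rewrite a tab-delimited text export as a UTF-8, comma-separated file.

    Returns the number of rows written. Cell values are carried over as read;
    only the delimiter and the character encoding change.
    """
    rows = 0
    with open(source, newline="", encoding=source_encoding) as src, \
            open(target, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter="\t"):
            writer.writerow(row)
            rows += 1
    return rows
```

The csv module quotes any value that contains a comma, so text such as "Málaga, Spain" survives the change of delimiter.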
  • Choosing File Formats
    Stages: Creation, Preservation, Dissemination
    Content type: preferred format; acceptable alternatives
    – Documents: Rich Text Format; Microsoft DocX, Open Document Format
    – Still images: TIFF; PNG, JPEG 2000 (uncompressed), RAW
    – Audio: WAV; MP3, AIFF, FLAC
    – Audio-video: MPEG2, MPEG4
    When working with multiple copies, decide which is the master copy.
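The format table can be turned into a quick check that flags files which are not in a preferred format. This is a minimal sketch: the file extensions assigned to each format are assumptions (e.g. `.jp2` for JPEG 2000, `.mpg`/`.mp4` for MPEG2/MPEG4), camera RAW is omitted because its extensions vary by manufacturer, and `classify_format` is a hypothetical name.

```python
from pathlib import Path

# File extensions assumed for each format on the slide (illustrative only).
FORMAT_GUIDANCE = {
    "documents": {"preferred": {".rtf"}, "acceptable": {".docx", ".odt"}},
    "still images": {"preferred": {".tif", ".tiff"}, "acceptable": {".png", ".jp2"}},
    "audio": {"preferred": {".wav"}, "acceptable": {".mp3", ".aif", ".aiff", ".flac"}},
    "audio-video": {"preferred": {".mpg", ".mpeg", ".mp4"}, "acceptable": set()},
}


def classify_format(filename):
    """Return (content type, 'preferred' or 'acceptable'), or (None, 'not listed')."""
    ext = Path(filename).suffix.lower()
    for content_type, levels in FORMAT_GUIDANCE.items():
        if ext in levels["preferred"]:
            return content_type, "preferred"
        if ext in levels["acceptable"]:
            return content_type, "acceptable"
    return None, "not listed"
```

Run over a folder listing, this gives a rough list of files a researcher might want to convert before deposit; anything "not listed" is a prompt for a conversation rather than an automatic problem.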
  • 4. Difficulty Maintaining Authenticity
    Problem: “Why does my data look different?”
    Scenarios & reasons: a new version of the software application uses a different decoding method; a different software application is in use
    (Potential) solutions:
    – Preventative: determine the significant properties that should be maintained; maintain the original machine and software to access content (computer museum)
    – Post event: emulate the original environment (emulation/virtualisation)
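Once significant properties have been decided, they can be recorded before a conversion and compared afterwards. The sketch below assumes tabular data held in delimited text files, and assumes that column names, row count and cell values are the properties that matter; other content types (images, audio) would need a different list. `significant_properties` and `changed_properties` are hypothetical names.

```python
import csv


def significant_properties(path, delimiter=","):
    """Capture properties of a delimited text file that a conversion must keep.

    The chosen properties (column names, row count, cell values) are an
    assumption for tabular data, not a general definition.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter=delimiter))
    header = rows[0] if rows else []
    return {
        "columns": header,
        "row_count": max(len(rows) - 1, 0),
        "cell_values": rows[1:],
    }


def changed_properties(before, after):
    """List the names of properties whose values differ between two snapshots."""
    return [name for name, value in before.items() if after.get(name) != value]
```

An empty result from `changed_properties` suggests the conversion kept what was agreed to matter; a non-empty one names what to investigate.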
  • 5. Difficulty Understanding Content
    Problem: “Where was this information created?” “Why did the creator make this decision?” “What does this value mean?” “How does this data relate to other content?”
    Scenarios & reasons: memory fails (cannot remember decisions made); disorganised and poorly labelled data; lack of documentation
    (Potential) solutions:
    – Organise data (chronology, experiment type, location, content type)
    – Adopt labelling conventions
    – Documentation
    Does a Rosetta stone exist for your data?
  • Filename conventions
    – Consider the elements that will help you to organise and locate content, e.g. participant ID, site of data collection, date of data collection
    – Consider how data files and directories may be organised & sorted:
        001, 002, 003, 004 can be used for sequential files
        YYYY-MM-DD (2012-12-04) is useful for organising by date (put the year first)
    – Identify different versions of content in the filename (and in the content):
        creation date (YY-MM-DD)
        version/draft number
    – Consider how your filenames will look to others:
        avoid spaces: ‘My file.pdf’ becomes ‘My%20file.pdf’ on the web
        avoid capitalisation: it alters file sorting & CAUSES HEADACHES!
    Golden rule: be consistent
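One way to stay consistent is to generate filenames rather than type them. The sketch below applies the slide's advice: descriptive elements, a year-first date, a zero-padded version number, no spaces and no capitals. The element order (participant ID, site, date, version) and the function name `make_filename` are assumptions for illustration; a project should pick the elements that suit its own research.

```python
import re
from datetime import date


def _clean(text):
    """Lower-case text and replace spaces and other symbols with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_filename(participant_id, site, collected: date, version, extension):
    """Build a name such as 'p007_london-clinic_2012-12-04_v02.csv'.

    Elements: participant ID, site, collection date (YYYY-MM-DD) and a
    zero-padded version number, joined with underscores.
    """
    return "_".join([
        _clean(participant_id),
        _clean(site),
        collected.isoformat(),  # year first, so names sort by date
        f"v{version:02d}",      # v02 sorts before v10
    ]) + "." + _clean(extension)
```

For example, `make_filename("P007", "London Clinic", date(2012, 12, 4), 2, "CSV")` gives `p007_london-clinic_2012-12-04_v02.csv`.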
  • Data Documentation
    What would someone want to know if they were looking at your data for the first time?
    1. What is the context of creation?
       – Why did you create it? For what purpose?
       – What methodology did you use? What assumptions were made?
       – Who is the target audience?
    2. Collection and set of files:
       – What information does each file contain?
       – When was it created?
       – By whom?
       – What actions were performed?
       – How do the data in the collection relate to each other?
    3. Individual components:
       – What is the meaning of this word/column/row, etc.?
       – How are these items measured?
       – What are the boundaries of the measurement?
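The documentation questions can be handed to researchers as a ready-made template. The sketch below writes them into a plain-text README in the data folder; the file name `README.txt`, the layout and the function name `write_readme_template` are assumptions rather than a required standard.

```python
from pathlib import Path

# Questions from the "Data Documentation" slide, grouped as on the slide.
README_QUESTIONS = {
    "Context of creation": [
        "Why was the data created? For what purpose?",
        "What methodology was used? What assumptions were made?",
        "Who is the target audience?",
    ],
    "Collection and set of files": [
        "What information does each file contain?",
        "When was each file created, and by whom?",
        "What actions were performed on the data?",
        "How do the files in the collection relate to each other?",
    ],
    "Individual components": [
        "What does each word/column/row mean?",
        "How are these items measured?",
        "What are the boundaries of the measurement?",
    ],
}


def write_readme_template(folder, title):
    """Create README.txt in `folder` with one heading per group of questions.

    Refuses to overwrite an existing README so real documentation is not lost.
    """
    path = Path(folder) / "README.txt"
    if path.exists():
        raise FileExistsError(path)
    lines = [title, "=" * len(title), ""]
    for heading, questions in README_QUESTIONS.items():
        lines += [heading, "-" * len(heading)]
        lines += [f"* {question}\n  Answer:" for question in questions]
        lines.append("")
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

A plain-text file is used because it can be read without any particular software, which fits the session's concern about long-term access.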
  • 6. Uncertain Provenance
    Problem:
    1. “When was the data created and/or modified?”
    2. “Who created/modified the data?”
    3. “Why was it created and/or modified?”
    Scenarios & reasons: lack/loss of trust in information content; reluctance to use information content
    (Potential) solutions:
    – Preventative: limit updates to authorised users only; store change history; keep each version
    – Post event: locate the data creator & editor?
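"Store change history" can be as simple as an append-only log that answers the slide's three questions (when, who, why) each time a file changes, plus a checksum that pins down exactly which version was recorded. This is a minimal sketch assuming a JSON Lines log file kept next to the data; `record_change` and `change_history` are hypothetical names, and the sketch does not restrict who may edit (that is a matter for storage permissions).

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def record_change(log_path, data_path, who, why):
    """Append a provenance entry for the current state of `data_path`."""
    entry = {
        "file": str(data_path),
        "sha256": hashlib.sha256(Path(data_path).read_bytes()).hexdigest(),
        "who": who,
        "when": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "why": why,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")  # one JSON object per line
    return entry


def change_history(log_path, data_path):
    """Return all logged entries for `data_path`, oldest first."""
    with open(log_path, encoding="utf-8") as log:
        entries = [json.loads(line) for line in log if line.strip()]
    return [e for e in entries if e["file"] == str(data_path)]
```

Because each entry stores a checksum, a kept copy of any earlier version can later be matched to the log entry that describes it.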
  • Things to Recommend
    Advise researchers to:
    1. Choose an appropriate storage location and create backups
    2. Organise data in a consistent and logical manner
    3. Document the data and information content (as well as structure)
    4. Consider how you will ensure that information can be accessed in the long term
    5. Consider the potential for data sharing and ensure it is performed with consideration of ethics
  • A Few Good References
    – Digital Curation Centre: http://www.dcc.ac.uk/resources
    – MANTRA, Data Management training for PhD students: http://datalib.edina.ac.uk/mantra/
    – UK Data Archive, Managing and Sharing Data: http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
    – Cambridge University, RDM Guidance: http://www.lib.cam.ac.uk/dataman/index.html
    – Australian National Data Service: http://ands.org.au/resource/data-management-planning.html
    – LSHTM Research Data Management Support Service: http://blogs.lshtm.ac.uk/rdmss/