Introduction to Research Data Management


Presented by Stuart Macdonald at RDM Training, 7/11/2012, University of Edinburgh, School of Geosciences

  • 25 years ago disk storage was expensive. Researchers interested in working with data came together to petition the PLU and the University’s Library, wanting a university-wide provision for files that were too large to be stored on individual computing accounts. Early holdings were research data from the universities of Edinburgh, Glasgow and Strathclyde.
  • Primarily social sciences but not exclusively so: large-scale government surveys (micro data), macro-economic time series data (country-level data), election studies, geospatial data, financial datasets, population census data. Free on internet / subscription / through national data centres/archives / resource discovery portals. Registration / authorisation and authentication / special conditions / budget to pay for data. SPSS, STATS, SAS, R, ArcGIS – interpret documentation/codebooks, merge and match users’ data with other data (via look-up tables), subset data. Data Catalogue.
  • Training for postgraduates and early career researchers. These were the School of Divinity; School of History, Classics and Archaeology; School of Biomedical Sciences; School of Molecular and Clinical Medicine; School of Physics and Astronomy. Also, the School of Geosciences.
  • Digital Curation Centre, Data Library, Information Services Infrastructure, Research Computing, Library & Collections. Concern is both for the shorter term – ensuring competitive advantage through secure and easy-to-use access, and for the longer term – ensuring enduring access and usability to the research community into the future and compliance with legislation. 2 working groups: RDS working group and RDM working group.
  • Funded by JISC as part of its UK programme, Managing Research Data, to develop online learning materials to help researchers manage their digital assets. IAD – set up to deliver training and development for postgraduate students and staff – via online courses, Virtual Learning Environments, transferable skills training.
  • A set of multi- or cross-disciplinary online learning resources. FRUIT principles – Fun, Relevant, Useful, Interesting, Timely.
  • Shareable Content Object Reference Model – XML-based
  • JorumOpen - national OER repository
  • What about preserving?
  • Observational – sensor data, survey or sample data, neuroimages – e.g. ocean temperature, voters’ attitudes before an election, photographs of a supernova. Experimental – e.g. gene sequences, chromatograms, toroid magnetic field data, HPLC, gel electrophoresis, chemical reaction rates. Simulation – e.g. climate models, economic models, algorithms. Derived – e.g. text and data mining, compiled database, 3D models, maps. Reference – e.g. gene sequence databanks, chemical structures, spatial data portals.
  • BioData Blog: “Documenting data may seem like a tedious, wasteful step, but each researcher must think of its long-term benefits” – methodologies, workflows, procedures, recording conditions etc.

    1. 1. Introduction to Research Data Management
        Stuart Macdonald
        EDINA & Data Training, School of Geosciences, 7 November 2012
    2. 2. • Background
        • Data Library Services & Projects
        • Research Data MANTRA
        • What is RDM
          – Research Data Defined
          – Data Management Planning
          – Organising Data
          – File Formats & Transformations
          – Documentation & Metadata
          – Storage & Security
          – Data protection & Rights
          – Preservation & Sharing
    3. 3. Background
        EDINA and University Data Library (EDL) together are a division within Information Services of the University of Edinburgh.
        EDINA is a JISC-funded National Data Centre providing national online resources for education and research - url:
        Data Library assists Edinburgh University users in the discovery, access, use and management of research datasets - url:
    4. 4. Data Library Services and Projects
        • Data Library & Consultancy
        • Edinburgh DataShare
        • JISC-funded projects
          – DISC-UK DataShare (2007-2009)
          – Data Audit Framework Implementation (2008)
          – Research Data MANTRA (2010-2011)
    5. 5. Data Library & Consultancy
        • finding…
        • accessing…
        • using…
        • teaching…
        • managing
        Building relationships with researchers via PG teaching activities, research support projects, IS Skills workshops, Research Data Management training and through traditional reference interviews.
    6. 6. Edinburgh DataShare: url:
        Online institutional repository of multi-disciplinary research datasets produced by University researchers, hosted by the Data Library.
        Researchers producing research data associated with a publication, or which has re-use potential, can upload their dataset for sharing and safekeeping. A persistent identifier and suggested citation will be provided.
        DataShare is a customised DSpace instance with a selection of standards-compliant metadata fields to aid discovery through Google and other search engines via OAI-PMH.
    7. 7. Edinburgh Data Audit Framework (DAF) Implementation (May – Dec 2008)
        A JISC-funded pilot project produced 6 case studies from research units across the University, identifying research data assets and assessing their management using the DAF methodology developed by the Digital Curation Centre.
        4 main outcomes:
        • Develop online RDM guidance
        • Develop university research data management policy
        • Develop services & support for RDM (in partnership with IS)
        • Develop RDM training
    8. 8. Research Data Management Web Guidance
        Online suite of web pages for IS website developed in 2009 – recently rationalised and revamped (Oct. 2012)
        url:
    9. 9. University Research Data Management Policy
        In spring 2010, a review commenced at the University to address the issue of managing the rapidly expanding volume and complexity of data produced by Edinburgh researchers.
        The Review was overseen by the IT & Library Committee and had twin tracks to look at Data Storage, and Data Management, Curation and Preservation.
        The Review looked at current practice in the University, in peer universities & internationally.
        Championed by Vice-Principal & Chief Information Officer Prof. Jeff Haywood, the policy for management of research data was approved by the University Court on 16 May, 2011.
        One of the first RDM policies in a UK tertiary education institution.
    10. 10. IS RDM Roadmap
        Drivers: University research data management policy and EPSRC request that all institutions in receipt of their funding should develop a roadmap for research data management (to be implemented by May 1st 2015).
        Information Services (IS) has committed to an RDM Roadmap over an 18 month period (July 2012-Jan. 2014) across four strategic areas.
        The Roadmap will help to engage academic units and PIs in research data management and provide services to implement the University’s RDM Policy.
        The Roadmap is a cross-divisional goal of IS supported by: DCC, EDINA & Data Library, User Services, Library & Collections, IT Infrastructure.
    11. 11. Research Data MANTRA
    12. 12. Research Data MANTRA
        Partnership between:
        • Edinburgh University Data Library
        • Institute for Academic Development
        Funded by JISC Managing Research Data Programme (Sept. 2010 – Aug. 2011)
    13. 13. Why Manage Research Data?
        Data Deluge – exponential growth in the volume of digital research artifacts created within academia.
        Data management is one of the essential areas of responsible conduct of research.
    14. 14. Project Overview
        Grounded in three disciplinary contexts: social science, clinical psychology and geoscience.
        Aim was to develop online interactive open learning resources for PhD students and early career researchers that will:
        • Raise awareness of the key issues related to research data management & contribute to culture change.
        • Provide guidelines for good practice.
        Selling RDM as a transferable skill (voluntary participation).
    15. 15. Online Learning Module
        Eight units with activities, scenarios and videos:
        • Research data explained
        • Data management plans
        • Organising data
        • File formats and transformation
        • Documentation and metadata
        • Storage and security
        • Data protection, rights and access
        • Preservation, sharing and licensing
        Four data handling practicals: SPSS, NVivo, R, ArcGIS
        Video stories from researchers in variety of settings
        Xerte Online Toolkits – University of Nottingham
    16. 16. MANTRA & Research Data Lifecycle
        url:
    17. 17. Online Learning Module
        • Delivered online – self-paced, available ‘anytime, anyplace’
        • Emphasis on practical experience and active engagement via online activities
        • One hour per unit
        • Read and work through scenarios & activities (incl. videos etc)
        • CC licence to allow manipulation of content for re-use with attribution
        • Portable content in open standard formats (e.g. SCORM)
    18. 18. MANTRA Dissemination
        • Learning materials deposited with an open licence in JorumOpen & Xpert.
        • Learning materials to be embedded in three participating postgraduate programmes and made available through IAD programme for use by all postgraduate students and early career researchers.
        • Website:
        • Download/re-brand/re-purpose materials from JorumOpen in standards-compliant formats.
        • Software modules – data handling practicals (MS Word)
    19. 19. End of Part One!
        Questions?
    20. 20. What is Research Data Management?
        • An umbrella term to describe all aspects of planning, organising, documenting, storing and sharing research data.
        • It also takes into account issues such as documentation, data protection and confidentiality.
        • It provides a framework that supports researchers and their data throughout the course of their research and beyond.
    21. 21. Research Data Defined
        US Office of Management and Budget in its grants management circular A-110 defines research data as “the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”
        The KRDS2 study (Beagrie et al, 2009) defines research data as ‘collections of structured digital data from any disciplines or sources which can be used by academic researchers to undertake their research or provides an evidential record of their research.’
        RIN Classification*:
        • Observational – real-time, unique, usually irreplaceable
        • Experimental – from lab equipment, expensive, often reproducible
        • Simulation – generated from models – model & metadata more important than output data
        • Derived or compiled – reproducible but expensive
        • Reference – a (static or organic) collection of smaller (peer-reviewed) datasets, most probably published and curated
        * Research Information Network. “Stewardship of digital research data - principles and guidelines", 30 March 2007. Viewed 30 October 2012
    22. 22. Research Data Defined
        • Research data, unlike other information types, is collected, observed, or created, for purposes of analysis to produce original research results.
        • Research data can be generated for different purposes and through different processes in a multitude of digital formats.
    23. 23. Research data may include the following:
        • Documents (text, MS Word), spreadsheets
        • Lab books, field notes, diaries
        • Questionnaires, transcripts, codebooks
        • Audiotapes, videotapes, photographs, images
        • Slides, artefacts, specimens, samples
        • Collection of digital objects acquired & generated during the research process
        • Database contents (video, audio, text, images)
        • Models, algorithms, scripts
        • Contents of an application (input, output, logfiles for analysis software, schemas)
        • Methodologies, workflows
        • SOPs, protocols
    24. 24. By managing your data you will:
        • ensure scientific integrity of research and aid replication
        • ensure research data and records are accurate, complete, authentic and reliable
        • increase your research efficiency
        • save time, effort and resources in the long run
        • enhance data security and minimise the risk of data loss
        • prevent duplication of effort by enabling others to use your data
        • meet funding council grant requirements
        Note: It may also be important to manage research records (both digital & hardcopy) during and beyond the life of the project e.g. correspondence (emails); project files; grant applications; technical reports; research reports; consent forms; ethics applications.
    25. 25. Funders’ Policies
        url:
    26. 26. What Do Funders Want?
        • timely release of data - once patents are filed or on (acceptance for) publication.
        • open data sharing - minimal or no restrictions if possible.
        • preservation of data - typically 5-10+ years if of long-term value.
        See the RCUK Common Principles on data policy:
    27. 27. Data Management & Sharing Plans
        Five common questions asked by funders are:
        • What data will be created? (format, types, volumes etc)
        • What standards and methodologies will you use?
        • How will you manage ethics and Intellectual Property?
        • What are the plans for data sharing and access?
        • What is the strategy for long-term preservation?
        DCC’s DMP Online tool:
        How to write a DMP guide:
    28. 28. Data Management Plan. What is it?
        A DMP is a document which describes:
        • What research data will be created.
        • What policies (funding, institutional, legal) apply to the data.
        • What data management practices (backups, storage, access control, archiving) will be used.
        • What facilities and equipment will be required (hard-disk space, backup server, repository).
        • Who will own the copyright and have access to the data.
        • Who will be responsible for each aspect of the plan.
        • How its reuse will be enabled and long-term preservation ensured after the original research is completed.
        The data management plan must be continuously maintained and kept up-to-date throughout the course of research.
    29. 29. Why do we need one?
        It improves your research both now and later...
        • Data is often valuable for a long time!
        • Results of your research may outlast your degree.
        • Will you use your data throughout your career?
        • Loss of physical/digital data and records.
        • Loss of usefulness through records loss, media and software obsolescence.
        • Forgetting stuff!
        Good practice → Better research
    30. 30. Why do we need one?
        • Ensure research integrity (and repeatability) through keeping better records.
        • People can trace your outcomes from data collection, through research methodology, through to results.
        • Maximises usefulness of data to fellow researchers.
        • Highlights how data was collected, quality controls, how people can and should use it (access and licensing), how you then attribute people/projects.
        • Facilitates data use within collaboration.
        • Can help lead to subsequent research papers.
    31. 31. Getting started with a DMP
        • Gain an understanding of terminology & issues.
        • Gain understanding of your project/community
          – Supervisor and colleagues
          – People in your School, i.e. IT Officers, Graduate Research Coordinator...
        • Talk to your supervisor about data authorship, IP, licensing, policies.
        • Use a research data planning checklist.
        • Keep it practical and simple, don’t spend too much time. Where you don’t know something, leave a gap, investigate and fill it in later.
        • Remember it is never finished! Review it regularly through the course of your research.
    32. 32. Organising your data
        • Research data files and folders need to be labelled and organised in a systematic way so that they are both identifiable and accessible for current and future users.
        • Naming datasets according to agreed conventions should make file naming easier for colleagues because they will not have to ‘re-think’ the process each time.
        • One benefit of consistent research data file labelling is that files are not accidentally overwritten or deleted.
        • It is important to consistently identify and distinguish versions of data files. This ensures that a clear audit trail exists for tracking the development of a data file and identifying earlier versions when needed.
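A naming convention is easiest to keep when it is generated rather than typed by hand. The following is a minimal Python sketch, assuming a hypothetical convention of project_description_YYYYMMDD_vNN.ext; the function names and the convention itself are illustrative and are not taken from the slides.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lower-case the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def dataset_filename(project: str, description: str, version: int,
                     created: date, extension: str) -> str:
    """Build a name of the (assumed) form project_description_YYYYMMDD_vNN.ext."""
    return (f"{slug(project)}_{slug(description)}_"
            f"{created:%Y%m%d}_v{version:02d}.{extension.lstrip('.')}")


def next_version(existing_names: list) -> int:
    """Return one more than the highest _vNN number found in existing file names."""
    versions = []
    for name in existing_names:
        match = re.search(r"_v(\d+)\.", name)
        if match:
            versions.append(int(match.group(1)))
    return max(versions, default=0) + 1
```

Calling next_version on the names already in a folder before saving means a new version is written under a new name, so earlier versions stay available for the audit trail.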
    33. 33. File Formats & Transformation
        • A file format encodes information in a computer file, enabling another program to access data within it.
        • HTML and PDF are two examples of commonly used file formats and may be identified by their suffixes .html and .pdf.
        • Files are based on either text or binary encoding. The former is both machine- and human-readable and the latter only readable by means of appropriate software.
        • Thus text files are less likely to become obsolete. Examples of file name extensions for these files are .txt, .csv and .por.
        • If you convert or migrate your data files from one format to another, be aware of the potential risk of the loss or corruption of your data and take appropriate steps to avoid/minimise it.
    34. 34. File Formats & Transformation
        • When compressing your data files for storage, transportation or transmission, you encode the information using fewer bits than the original representation. Commonly used compression programs are Zip and Tar.
        • You may use the process of data normalisation. This means to convert data from one format (e.g. proprietary) into another for use or preservation (e.g. ASCII).
        • You may also need to compute new values from old in your data, a process which is called data transformation.
        • This may be necessary prior to analysing your data. Three techniques for doing this are aggregation, anonymisation and perturbation.
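The three transformation techniques named above can be illustrated on a toy dataset. This is a sketch only: the records, field names and noise range are invented for illustration, and removing a name column is rarely enough on its own to anonymise real data.

```python
import random
from collections import defaultdict
from statistics import mean

# Invented example records; not real data.
records = [
    {"name": "A. Smith", "region": "North", "income": 21000},
    {"name": "B. Jones", "region": "North", "income": 25000},
    {"name": "C. Brown", "region": "South", "income": 30000},
]


def anonymise(rows):
    """Drop the direct identifier and replace it with an arbitrary case number."""
    return [{"case": number, **{k: v for k, v in row.items() if k != "name"}}
            for number, row in enumerate(rows, start=1)]


def aggregate(rows):
    """Compute new values from old: mean income per region."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["region"]].append(row["income"])
    return {region: mean(values) for region, values in groups.items()}


def perturb(rows, spread=500, seed=0):
    """Add bounded random noise to each income so exact values are masked."""
    rng = random.Random(seed)
    return [{**row, "income": row["income"] + rng.randint(-spread, spread)}
            for row in rows]
```

Each function returns new rows and leaves the original records untouched, so the untransformed master copy is preserved alongside the derived versions.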
    35. 35. Documenting Data
        There are many reasons why you need to document your data:
        • To help you remember the details later
        • To help others understand your research
        • Verify your findings
        • Review your submitted publication
        • Replicate your results
        • Archive your data for access and re-use
        Some examples of data documentation are:
        • Laboratory notebooks
        • Field notes
        • Questionnaires
    36. 36. Documenting Data
        Laboratory or field notebooks, for example, play an important role in supporting claims relating to intellectual property developed by University researchers, and even defending claims against scientific fraud.
        Research data need to be documented at various levels:
        • Project level
        • File or database level
        • Variable or item level
        The term metadata (‘data about data’) is often used. The importance of metadata lies in the potential for machine-to-machine interoperability to assist location and access to data through search interfaces.
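The three documentation levels can be kept together in one machine-readable record. The sketch below assumes an invented structure and invented field names (it does not follow any particular metadata standard) and adds a small check that every variable in a data file has a codebook entry.

```python
import json

# Hypothetical metadata record covering the three levels named above.
metadata = {
    "project": {
        "title": "Example attitudes survey",
        "collected": "2012",
        "method": "postal questionnaire",
    },
    "file": {"name": "survey_wave1_v01.csv", "format": "CSV", "rows": 3},
    "variables": {
        "age": {"label": "Age of respondent in years", "type": "integer"},
        "q1": {"label": "Trust in local council",
               "codes": {"1": "low", "2": "medium", "3": "high"}},
    },
}


def undocumented_variables(columns, record):
    """Return the column names that have no entry at the variable level."""
    documented = record["variables"]
    return [column for column in columns if column not in documented]


# Serialising as JSON (a text format) keeps the record readable by people and programs.
as_json = json.dumps(metadata, indent=2)
```

Running undocumented_variables against the column headings of a file before archiving it flags cryptic variable names while their meaning can still be recalled.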
    37. 37. Secure data storage:
        For the purposes of integrity, efficiency and ease of replication it is important that research data is stored securely & backed up regularly via:
        • Networked drives
          – Fileservers managed by department / school / IS.
          – Stored in single, secure, accessible place – regular back-ups.
        • Personal computers / laptops
          – Convenient, temporary storage - should not be used for storing master copies.
          – Local drives may fail & laptops may get lost/stolen.
    38. 38. • External storage devices
          – Hard drives, USB sticks, CDs, DVDs – low cost & portable BUT not recommended for long term storage.
          – Longevity not guaranteed – degradation over time.
          – Easily damaged or misplaced.
          – Not big enough for all research data – need for use of multiple discs/drives.
          – May pose a security threat.
        If USB sticks, DVDs, CDs are used for working data or extra back-up then:
        • Choose high quality products from reputable manufacturers.
        • Conduct regular checks to ensure media is not failing.
        • Periodically refresh data (i.e. copy to a new disc or drive).
        • Ensure confidential data is password protected / encrypted.
    39. 39. • Remote or online back-up services - services that provide an online system for storing and backing-up computer files e.g. Dropbox, Mozy, Humyo, A-Drive
          – Allow users to store and sync data files online and between computers.
          – Employ cloud computing storage facilities (e.g. Amazon S3).
          – Business model – first few GBs free, pay for more space.
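One way to carry out the regular checks on storage media recommended above is to record a checksum for every file and compare it later. This is a minimal sketch using Python's standard hashlib module; the function names and the manifest layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map each file's path (relative to folder) to its checksum."""
    return {str(path.relative_to(folder)): sha256_of(path)
            for path in sorted(folder.rglob("*")) if path.is_file()}


def changed_files(folder: Path, manifest: dict) -> list:
    """List files from the manifest that are now missing or have different content."""
    current = build_manifest(folder)
    return [name for name, checksum in manifest.items()
            if current.get(name) != checksum]
```

The manifest would be stored separately from the media being checked. A non-empty result from changed_files suggests the copy has degraded or been altered and should be refreshed from another copy.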
    40. 40. Backing-up
        Considerations for back-up policy:
        • Whether all data (full back-up), or only changed data will be backed-up (incremental back-up)?
        • How often full and incremental back-ups will be made?
        • How much hard-drive space or DVDs will be required to maintain this schedule?
        • If working with sensitive data, how will it be secured (and destroyed)?
        • What back-up services are available that meet these needs?
        • Who will be responsible for ensuring back-ups are available?
        Recommendation: Keep at least 3 copies of your data (e.g. original, external/local, and external/remote) and put in place regular back-up procedure.
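The difference between a full and an incremental back-up can be sketched in a few lines. The example below is an illustration under simple assumptions (files are compared by modification time only, and the function name back_up is invented here); a managed back-up service would normally be preferable to a hand-written script.

```python
import shutil
from pathlib import Path
from typing import Optional


def back_up(source: Path, target: Path, since: Optional[float] = None) -> list:
    """Copy files from source to target and return the relative paths copied.

    since=None copies everything (full back-up). Otherwise only files modified
    after the timestamp `since` are copied (incremental back-up).
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        if since is not None and path.stat().st_mtime <= since:
            continue  # unchanged since the last back-up
        destination = target / path.relative_to(source)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination)  # copy2 also preserves timestamps
        copied.append(str(path.relative_to(source)))
    return copied
```

A schedule might run back_up with since=None once a week and with the time of the previous run on the other days. The target folder should sit outside the source folder, ideally on a different device.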
    41. 41. Data Security
        The means of ensuring that data is kept safe from corruption and that access to it is suitably controlled.
        It is important to consider data security to prevent:
        • Accidental or malicious damage / modification to data.
        • Theft of valuable or irreplaceable data.
        • Breach of confidentiality agreements and privacy laws.
        • Release of data before it has been checked for accuracy and authenticity.
    42. 42. Data Protection
        • The 1998 Data Protection Act regulates how personal data may be held and processed, and is aimed at organisations but also applies to individuals.
        • The Act recognises that personal data on its own or linked with other data, can reveal the identity of an actual living person.
        • You must comply with the Act from the moment you obtain personal data until the time when the data have been returned, destroyed, or perhaps transformed into a public use dataset for purposes of sharing.
        • Research exemption exists if you are able to process anonymised data instead of personal data for your research by destroying the “key” between the identifiers and the personally identifying information.
        • The Records Management Office has full guidance on its website.
    43. 43. Rights and access
        • Intellectual property rights (IPR) can be defined as rights acquired over any work created or invented with the intellectual effort of an individual.
        • Facts are not copyrightable but the structure of a database could be.
        • As a researcher, you should clarify ownership of and rights relating to research data before a project starts. This includes the right of access and the right to make copies.
        • Data licences determine the terms and conditions of use by another, and may accompany a purchase or subscription.
        • Open data licences attempt to “set data free” by minimising and standardising the terms and conditions of re-use. Conditions may include attribution, non-commercial use, no derivative works, or ‘share alike’.
    44. 44. Benefits of Sharing Data
        • Scientific integrity – publishing & citing data in published research papers can allow others to replicate, validate, or correct results, thus improving the scientific record.
        • Publicly funded research - there is a growing movement for making publicly funded research available to the public.
        • Funding mandates - UK research councils are increasingly mandating data sharing so as to avoid duplication of effort and save costs.
        • University of Edinburgh’s mission - "the creation, dissemination and curation of knowledge" implies transparency about the research that is conducted in its name.
        • Preserve research data for researchers’ own future use.
    45. 45. THANK YOU!
        Data Library services:
        data management guidance pages:
        University data policy:
        Data Audit Framework (DAF) Implementation:
        data MANTRA course:
    46. 46. Scenarios for Discussion
        At completion of a research project the data and records are boxed and stored in a departmental storeroom. A participant in a research project lodges a claim for compensation, alleging that he was not adequately informed about the effects of the study and does not recall giving consent. He finds that the storeroom has since been converted into a coffee shop. Where are the records?
    47. 47. Scenarios for Discussion
        Sometime after completion of a research project the researcher wishes to revisit her findings, applying a new statistical approach. She manages to read the floppy discs that the data were stored on, eventually gets the old software format imported into her current statistical package, only to find she cannot remember what many of the variable labels – each 8 digits in length – actually mean. Has she documented her data?
        You publish a paper based on your thesis and are surprised to find it has become a hot topic in your field. Suddenly people are writing to you asking for the underlying data. How much effort is required to give them a well-cleaned dataset and adequate documentation for re-use?