Managing and Sharing Research Data: Good practices for an ideal world...in the real world.
 


Slides from a talk given at the University of Sheffield on 19 January 2012.

Presentation Transcript

  • Managing and Sharing Research Data: Good Practices for an Ideal World… in the Real World
    Martin Donnelly, Digital Curation Centre, University of Edinburgh
    University of Sheffield, 19 January 2012
  • Running Order
    1. Introduction
    2. What is meant by managing research data?
    3. Research data management and research ethics/integrity
    4. Context and policy
    5. The Why
       Pt. 1 – It’s A Good Thing
       Pt. 2 – Carrots
       Pt. 3 – Sticks
    6. Practicalities and Moving Forward
    7. Sheffield Stories
    8. Last Words
    9. Q+A
  • Digital Curation Centre
    - Founded in 2004 to support research in UK higher and further education in the preservation, curation and management of digital resources
    - Major funder is JISC
    - Original focus on publications / biblio; now more emphasis on research data management
    - Support to JISC projects, especially the two Managing Research Data programmes... http://www.jisc.ac.uk/whatwedo/programmes/di_researchmanagement/managingresearchdata.aspx
    - Tools, training, guidance, consultancy, other resources/studies…
    - Three partner sites: Edinburgh (lead), Bath and Glasgow
  • What is meant by managing research data?
    Lots of strands…
    - Ensuring physical integrity of files and helping to preserve them
    - Ensuring safety of content (data protection, ethics, etc.)
    - Describing the data (via metadata) and recording its history
    - Providing or enabling appropriate access at the right time, or restricting access, as appropriate
    - Transferring custody at some point, and possibly destroying the data
    In short, RDM means meeting funder, institutional, disciplinary and other requirements/norms across various areas and at different times, in sympathy with the nature of the data itself, for the benefit of yourself, your institution, and the wider community, as appropriate.
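    One common way to check the "physical integrity of files" strand is to record a checksum for each file and compare it again later. The following is only a minimal sketch under that assumption: the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical and not part of any DCC tool, and the talk itself does not prescribe a particular technique.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict) -> list:
    """List manifest entries that are now missing or have a different checksum."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

    A manifest built when the data is first stored can be kept alongside it; an empty list from `changed_files` at a later date suggests the files are byte-for-byte unchanged.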
  • RDM and research ethics/integrity
    - RDM is increasingly seen as a core research competency, along with things like writing and referencing (see RCUK Common Principles >>)
  • Policy Streamlining
    RCUK Common Principles on Data Policy
    Key messages:
    1. Data are a public good
    2. Adherence to community standards and best practice
    3. Metadata for discoverability and access
    4. Recognise constraints on what data to release
    5. Permit embargo periods delaying data release
    6. Acknowledgement of / compliance with T&Cs
    7. Data management and sharing activities should be explicitly funded
    http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
  • RDM and research ethics/integrity
    - RDM is increasingly seen as a core research competency, along with things like writing and referencing (see RCUK principles >>)
    - Research outputs (which constitute the scientific record) are often based on the collection, analysis and processing of data / sources / information
    - Reproducibility and verifiability are fundamental principles in many disciplines. In other disciplines, including those where research cannot be replicated such as social and environmental sciences, the longevity of the data from which the findings are derived is equally crucial
    - Some data is unique and cannot be replaced if destroyed or lost, yet only by referring to trustworthy data can research be judged as sound
    - Therefore data must be accessible and comprehensible in order to back up claims, and enable third parties to reproduce (or validate) results
    - Additionally, there is increasing demand for public (or Open) access to publicly-funded research outputs, including data, but more on that later…
  • Institutional and funder perspectives
    - Research today is technology enabled and data intensive
    - Data as long-term asset; identify and preserve
    - The fragility and cost of digital data; curate to reuse and preserve
    - Data sharing: research pooling, cross-disciplinary and global partnering, new research from old, the wealth of knowledge
    - The cost of technology and human infrastructures
    - Pressure to show return on public investment of £3.5bn
    - Compliance with legislation and funder policies
    - The data deluge: volume and complexity, not just in HEIs
    - Financial and human consequences from lost data
    - The cost of administering unmanaged datasets
  • Context
    “For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open”
    Surfing the Tsunami, Science, 11 February 2011
  • Policy
  • Policy
    RCUK Policy and Code of Conduct on the Governance of Good Research Conduct, 2008 (updated October 2011)
    UNACCEPTABLE RESEARCH CONDUCT includes mismanagement or inadequate preservation of data and/or primary materials, including failure to:
    - keep clear and accurate records of the research procedures followed and the results obtained, including interim results;
    - hold records securely in paper or electronic form;
    - make relevant primary data and research evidence accessible to others for reasonable periods after the completion of the research: data should normally be preserved and accessible for 10 yrs (in some cases 20 yrs or longer);
    - manage data according to the research funder’s data policy and all relevant legislation;
    - wherever possible, deposit data permanently within a national collection.
    Responsibility for proper management and preservation of data and primary materials is shared between the researcher and the research organisation.
    EPSRC
    EPSRC expects all those institutions it funds to develop a roadmap that aligns their policies and processes with EPSRC’s expectations by 1st May 2012; to be fully compliant with these expectations by 1st May 2015.
    Compliance will be monitored and non-compliance investigated.
    Failure to share research data could result in the imposition of sanctions.
  • The Why (pt. 1)
    It’s A Good Thing
    – Data as a public good (see RCUK Common Principles)
    – Others can build upon your work (the Shoulders of Giants, Newton) and it may be useful in ways you did not foresee, beyond your discipline (‘fresh eyes and new techniques or approaches’)
    – Passing custody enables you to leave the preservation legwork to the specialists
    – You won’t be around forever, but your work might be
  • The Why (pt. 2)
    Incentives, or “Why Should I Spend Time On This When I Have Other Things To Worry About?”
    - Impact. Linking papers to data increases citation rates, see for example Henneken & Accomazzi, Smithsonian Astrophysical Observatory: http://arxiv.org/PS_cache/arxiv/pdf/1111/1111.3618v1.pdf (pre-print)
    - Warning! Some numbers follow…
  • Institutional cost saving
    Researcher career benefits
    Growing popularity of re-use
    Sharing as a catalyst for discovery
    http://www.dcc.ac.uk/resources/briefing-papers
  • Early results: public data archiving increases scientific contribution by one third
  • Impact
    - Making data accessible increases citation rates
    - Better for authors; better for publishers
    - Piwowar, Day & Fridsma (2007):
      - 45% of studies make data accessible
      - They receive 85% of citations
    - N.B. correlation is not causation…
    doi:10.1371/journal.pone.0000308
    Slide credit: Kevin Ashley, DCC, 4th DCC Roadshow, Oxford, 2011-09-14, CC-BY-SA
  • Key findings
    - 2.98 more publications per dataset if archived
    - 2.77 more if ‘informally shared’
    Or correct for some confounding factors…
    - 2.42 more if archived
    - 2.31 more if informally shared
    “The enduring value of social science research: The use and reuse of primary research data”, Amy M. Pienta, George Alter, Jared Lyle, http://hdl.handle.net/2027.42/78307
    Presented in Torino, April 2010: “Organisation, Economics and Policy of Scientific Research”
    [Bar chart: Archived, Shared and Not shared datasets compared, Raw and Corrected]
  • The Why (pt. 2)
    More incentives…
    - Increased citations help with the Research Excellence Framework
    - Research councils are increasingly rejecting submissions on the basis of poor data management plans
    - So you get more funding if you do this right…
  • The Why (pt. 3)
    Sticks…
    - Some funders require you to make your data available for many years after project funding has ceased. So laying adequate data preservation foundations should be near the top of your list when planning any new research project.
    - Funder rejections on basis of poor data management.
    - EPSRC roadmap requirement (N.B. It is likely that DMPs will form part of many institutional infrastructures) - the institution has overall responsibility for this, but everyone will need to play a part, and EPSRC is an important funder at Sheffield. Others may follow suit…
  • The Why (pt. 3)
    Government pressure on RCs…
    6.9 The Research Councils expect the researchers they fund to deposit published articles or conference proceedings in an open access repository at or around the time of publication. But this practice is unevenly enforced. Therefore, as an immediate step, we have asked the Research Councils to ensure the researchers they fund fulfil the current requirements. Additionally, the Research Councils have now agreed to invest £2 million in the development, by 2013, of a UK ‘Gateway to Research’. In the first instance this will allow ready access to Research Council funded research information and related data but it will be designed so that it can also include research funded by others in due course. The Research Councils will work with their partners and users to ensure information is presented in a readily reusable form, using common formats and open standards.
    http://www.bis.gov.uk/assets/biscore/innovation/docs/i/11-1387-innovation-and-research-strategy-for-growth.pdf
  • The Why (pt. 3)
    - In addition to funders and institutions, prestige journals like Science and Nature already have data policies in place, and the tendency is towards increasing requirements and scrutiny here as well as with the funders…
    Nature and Science data policies
    Nature
    Such material must be hosted on an accredited independent site (URL and accession numbers to be provided by the author), or sent to the Nature journal at submission, either uploaded via the journal’s online submission service, or if the files are too large or in an unsuitable format for this purpose, on CD/DVD (five copies). Such material cannot solely be hosted on an author’s personal or institutional web site.[4]
    Nature requires the reviewer to determine if all of the supplementary data and methods have been archived. The policy advises reviewers to consider several questions, including: "Should the authors be asked to provide supplementary methods or data to accompany the paper online? (Such data might include source code for modelling studies, detailed experimental protocols or mathematical derivations.)"[5]
    Science
    Database deposition policy – Science supports the efforts of databases that aggregate published data for the use of the scientific community. Therefore, before publication, large data sets (including microarray data, protein or DNA sequences, and atomic coordinates or electron microscopy maps for macromolecular structures) must be deposited in an approved database and an accession number provided for inclusion in the published paper.[6]
    Materials and methods – Science now requests that, in general, authors place the bulk of their description of materials and methods online as supporting material, providing only as much methods description in the print manuscript as is necessary to follow the logic of the text. (Obviously, this restriction will not apply if the paper is fundamentally a study of a new method or technique.)[7]
    REFERENCES
    [4] "Availability of Data and Materials: The Policy of Nature Magazine"
    [5] "Guide to Publication Policies of the Nature Journals," published March 14, 2007.
    [6] "General Policies of Science Magazine"
    [7] "Preparing Your Supporting Online Material"
    - Finally, a data management plan requirement is very likely to feature in EC FP8 (“Horizon
  • Practicalities …or, Areas Where The DCC Can Help
    - Assessing Need
    - Delivering Support
    - Developing Strategic Institutional Research Data Management Support
      - Policy
      - Advocacy
      - Planning
      - Tools
      - Training
    www.dcc.ac.uk
  • Three areas for thought
    1. Documentation and metadata
    2. Backup
    3. Depositing data for the long term
  • Documentation and Metadata
    - Could you, or someone else, make sense of your data five years from now? What about five minutes from now?
    - Metadata is ‘data about data’
    - Simple documentation (study level)
      – Use consistent file names and informative labels
      – Version control
      – E.g. ABC_Study4_output_2012-01-19_v1.xls
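    A file-naming convention like the example above is easier to keep consistent if names are generated and checked by a small script. The sketch below assumes the example name follows the pattern project_study_content_date_version.extension; that reading of the pattern, and the helper names `build_filename` and `parse_filename`, are illustrative assumptions rather than anything the slides define.

```python
import re
from datetime import date
from typing import Optional

# Assumed pattern: <project>_<study>_<content>_<YYYY-MM-DD>_v<version>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_(?P<study>[A-Za-z0-9]+)_(?P<content>[A-Za-z0-9]+)"
    r"_(?P<date>\d{4}-\d{2}-\d{2})_v(?P<version>\d+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def build_filename(project: str, study: str, content: str,
                   created: date, version: int, ext: str) -> str:
    """Assemble a file name that follows the assumed convention."""
    return f"{project}_{study}_{content}_{created.isoformat()}_v{version}.{ext}"


def parse_filename(name: str) -> Optional[dict]:
    """Split a conforming name into its parts, or return None if it does not conform."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["version"] = int(parts["version"])
    return parts


name = build_filename("ABC", "Study4", "output", date(2012, 1, 19), 1, "xls")
print(name)  # ABC_Study4_output_2012-01-19_v1.xls
```

    Running `parse_filename` over a project folder is a quick way to spot files that have drifted from the convention, since non-conforming names return `None`.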
  • Documentation and Metadata
    - You may wish to maintain a separate log of high level metadata about each dataset (text file, spreadsheet or database)
      - Research context (when, where, who)
      - Data history (preparation, processing)
      - Where and how to access the data
      - Access rights and permissions
      - Link to supplementary materials, related data, documents, publications
    - Wherever possible, use standardised vocabularies and metadata formats
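    The separate metadata log described above could be as simple as a CSV file with one row per dataset. This is a minimal sketch, assuming a flat record whose fields mirror the bullet points; the field names and the `DatasetRecord`, `append_record` and `read_log` helpers are hypothetical, and a standardised metadata schema would be preferable where your discipline has one.

```python
import csv
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class DatasetRecord:
    """One row of the log; field names are illustrative, not a standard."""
    dataset: str
    collected_when: str      # research context: when
    collected_where: str     # research context: where
    collected_by: str        # research context: who
    data_history: str        # preparation and processing steps
    access_location: str     # where and how to access the data
    access_rights: str       # access rights and permissions
    related_materials: str = ""  # links to publications, documents, related data


def append_record(log_path: Path, record: DatasetRecord) -> None:
    """Append a record to the CSV log, writing the header if the file is new."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=[f.name for f in fields(DatasetRecord)]
        )
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(record))


def read_log(log_path: Path) -> list:
    """Return every logged record as a dictionary keyed by field name."""
    with log_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

    A plain CSV keeps the log readable in a spreadsheet and easy to migrate later to a database or a repository's deposit form.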
  • Backup
    - What would happen to your data if there was a fire in your office tonight?
    - Automatic backup
      - Find out if this is available in your Department or School
      - Best practice is at least one automatic off-site backup
    - Manual backup
      - Set repeat reminders, e.g. via online calendar
    - N.B. Backup and archiving are not the same thing!
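    Where only manual backup is available, a short script can make each copy dated and verified rather than dragged and dropped. This is a sketch only: `backup_file` is a hypothetical helper, and it assumes `backup_dir` points at storage that is physically separate from the original (for example a mounted network share), since a copy on the same disk would not survive the office fire.

```python
import filecmp
import shutil
from datetime import date
from pathlib import Path


def backup_file(source: Path, backup_dir: Path, today: date) -> Path:
    """Copy source into backup_dir under a dated name and verify the copy."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{source.stem}_{today.isoformat()}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    # Compare contents byte by byte rather than trusting size and timestamp.
    if not filecmp.cmp(source, target, shallow=False):
        raise OSError(f"Backup of {source} does not match the original")
    return target
```

    Called as `backup_file(Path("results.csv"), Path("/mnt/backup"), date.today())`, where both paths are placeholders, it leaves one dated copy per day and raises an error if the copy is not identical.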
  • Depositing Data for the Long Term
    - Check copyright, consent and Data Protection status
    - Identify the appropriate archive / data centre
    - Submit form/sample data/supporting documentation for review
    - If accepted, sign Licence Agreement
    - Deposit data
    - Dissemination?
  • That’s a lot to remember…
    It is, but the DCC’s Checklist for a Data Management Plan provides a comprehensive list of issues you might need to consider…
    Not all of it will be relevant to your work. Start with the section headings, and use DMP Online to make your life easier…
  • www.dcc.ac.uk/dmponline
  • Moving Forward
  • Moving Forward
    There are lots of guidance resources available already, e.g.
    - www.lib.cam.ac.uk/preservation/incremental/
    - www.glasgow.ac.uk/datamanagement
    - Research Data MANTRA http://datalib.edina.ac.uk/mantra/
    … and Sheffield-focused resources are on the way.
  • Save our Soils (Prof Steve Banwart, Department of Civil and Structural Engineering). 20/01/2012 © The University of Sheffield
  • SoilTrEC (Banwart & Menon, Department of Civil and Structural Engineering)
  • SASI (Dr Bethan Thomas, Department of Geography)
  • HRI Digital (Humanities Research Institute)
  • Last Words
    - You may be in a small group with not much capacity for huge changes, but no one expects miracles
    - Starting with incremental changes now is better than burying your head in the sand and hitting a brick wall later
    - You’re not alone! There are lots of resources available, both institutionally and at a national level
  • Q+A
    FAQs pt. 1
    Q. I don’t have time for all of this.
    A. You should have: the RCUK councils explicitly state that data management activities should be included as part of funding applications, and institutions are bound to meet their obligations. It’s not necessary for every researcher to become an expert in all aspects of RDM, just to know what their role is in the bigger picture.
    Q. How are data management plans actually assessed?
    A. It varies from funder to funder. The AHRC has a technical review college, and ADS has internal guidance on what to look for when marking. All funders provide markers’ guidelines which probably say something about DMPs, but these tend not to be public documents. A notable exception is ESRC, where markers’ guidance is produced by the UK Data Archive. We’re hearing more and more stories of bids rejected on the basis of poor DMPs, so the review processes may soon become more transparent. Interestingly, the AHRC crops up in this context more often than the others.
  • Q+A
    FAQs pt. 2
    Q. Won’t sharing my data mean people can steal my work?
    A. No. Others might find things you didn’t (or weren’t looking for), but you should receive proper attribution. Additionally, most funders permit embargo periods to enable the original data collectors/creators to benefit from their work. The risk of plagiarism is the same as when publishing a paper.
    Q. How could I possibly share confidential data?
    A. If it’s confidential, you probably shouldn’t! Techniques such as anonymisation and aggregation can be applied in order to safeguard personal information, and data with commercial significance may also be protected. It depends on policies, consortium agreements and so on, which should be clearly communicated. ESRC/UKDA, for example, provide advice on ‘What to tell participants’ regarding confidentiality / anonymisation: http://www.data-archive.ac.uk/create-manage/consent-ethics/consent?index=7
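    To make the anonymisation and aggregation answer concrete, here is a deliberately simple sketch. Everything in it is an assumption for illustration: the field names (`name`, `email`, `age`), the ten-year age bands and the small-count threshold are hypothetical, and removing obvious identifiers like this is not on its own sufficient for real confidential data, where funder and data-archive guidance should be followed.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(records: list, identifiers: tuple = ("name", "email")) -> list:
    """Drop direct identifiers and coarsen age; the input records are not modified."""
    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in identifiers}
        row["age_band"] = age_band(row.pop("age"))
        cleaned.append(row)
    return cleaned


def aggregate(rows: list, key: str, min_count: int = 3) -> dict:
    """Count rows per category, suppressing (None) counts below min_count."""
    counts = Counter(row[key] for row in rows)
    return {k: (n if n >= min_count else None) for k, n in counts.items()}
```

    Suppressing small counts matters because a category containing only one or two people can identify them even after names have been removed.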
  • Thank you
    Martin Donnelly, Digital Curation Centre, University of Edinburgh
    www.dcc.ac.uk/dmponline
    martin.donnelly@ed.ac.uk
    Twitter: @mkdDCC
    This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
    Image credits: slide 12 - http://www.psdgraphics.com/3d/gold-pound-symbol/
    Slide credits: Kevin Ashley and Graham Pryor, DCC Edinburgh; Andrew McHugh, DCC Glasgow