Research Data Management Storage Requirements: University of Leeds
 


Research Data Management Storage Requirements Workshop, Mon 25 February, organised by Jisc, Janet and DCC. Presentation covers a research data survey, the RoaDMaP project, research data characteristics and potential storage requirements at the University of Leeds.

Usage Rights

CC Attribution License

  • Conducted a survey of research-active staff at Leeds and asked them a range of questions, including: how much research data do you typically generate in a year? Most respondents generate a relatively small amount of data, although some of the science disciplines generate significant volumes. The Faculty of Education, Social Sciences and Law seems most uncertain about what it generates.
  • Using this to understand volumes better. Still analysing and trying to estimate how much data is generated overall at Leeds. Also asked what researchers do with the data: where it is held while being used and where it is stored. Much of the data in the sciences, especially that generated through projects funded by research councils, is deposited back in council-run repositories. We are therefore trying to understand what proportion of the total data produced is not secured and needs to be handled by the institution.
  • This suggests there would not be a huge increase in data volumes for most researchers.
  • Researchers were asked to state what they saw as the challenges of research data management in three words. Storage and space are probably the same issue, although confidentiality and security also feature strongly. And no one has enough time for everything.
  • Negative behaviours. If researchers are charged directly to store by volume, they may choose not to store all relevant data. If storage is free, they may deposit everything without thought as to long-term value. Leeds also wants to capture data from projects which do not receive external funding, so there may be no cash available. Mixed model?
  • Environment
    • Engineering – common with experimentalists, some HPC
    • Experimentalists and HPC
    • Modellers and programmers
    • Network storage – access control
    • Staging
    • Local copy
  • Each project can have data in several categories
  • 4 types of data, 4 types of storage: local, distributed, institutional, cloud. Archive v repository. An archive will start as live data, with metadata captured at time of creation.
  • And shades of grey: NAS, RAID, mirrored, snapshot
  • At the top are the strategic activities that the institution needs to undertake: first, the development of an RDM policy and the articulation of a roadmap; second, the creation of a business plan which will ensure sustainability of the support service. At the bottom is the necessary support activity without which infrastructure development will not be effective: this includes the creation of guidance reference material, the provision of training, and front-office, user-focussed functions for the RDM service. At the centre of the diagram are those functions, both human and technical, that map onto the research data lifecycle. More granularity is possible here, but these are the key functions which should be available.

Presentation Transcript

  • RDM Data Storage Workshop, February 25th 2013. Brian Clifford, University of Leeds
  • The University of Leeds: Institutional Context
    • 1,500 researchers (plus postgrads)
    • £130m research income
    • 80% RCUK Funded
    • 9 Faculties
      – Devolved budgets
      – Faculty based support for researchers
    • Development of a Central RDM including The Library, Research and Innovation Office, IT Service, Staff Development supporting staff based in Faculties
    • Investigations being undertaken by the JISC funded RoaDMaP Project
  • How much research data do you typically generate in a year?
  • What % research data would you need to keep for others to validate your research findings?
  • RoaDMaP considering aspects of long term storage
    • Tested use of F5 systems for virtual storage
    • Archiving as a service, e.g. Arkivum
      – Currently working on proof of concept depositing / retrieving large files
    • Plan to investigate feasibility of integration with ePrints for retrieval of archived datasets
    • Pros and cons of outsourcing vs consortial options vs institutional options
    • Does outsourcing help direct cost recovery from grants?
    • Consortial options:
      – White Rose (DCC Institutional Engagement Project)
      – N8 (parallels with HPC model)?
  • Funding options
    • Considering three different models for the funding of the institutional research data management service
      – Top slice through RAM from Faculty income to pay for central service
      – Strategy Development Funding (one off!)
      – Recharge model
    • Investigating all three to ensure that the model chosen does not lead to negative behaviours
    • What can we afford, what do we need to store?
  • RDM Storage Requirements. Graham Blyth, JISC RoaDMaP Project, Engineering IT
  • Current estimate of required storage volume?
    • MAPS 1 PByte
    • Environment 1 PByte
    • M+H 0.3 PByte
    • FBS 0.25 PByte
    • Engineering 0.1 PByte*
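The faculty estimates on the slide above can be added up to give a rough institutional figure. The following is a minimal sketch, assuming the five figures are independent and simply additive; the names `ESTIMATES_PB`, `total_pb` and `shares` are illustrative and do not come from the presentation.

```python
# Faculty-level storage estimates in petabytes, copied from the slide.
# The Engineering figure carries an asterisk on the slide; its footnote
# is not present in the transcript, so it is used here as given.
ESTIMATES_PB = {
    "MAPS": 1.0,
    "Environment": 1.0,
    "M+H": 0.3,
    "FBS": 0.25,
    "Engineering": 0.1,
}


def total_pb(estimates):
    """Sum the per-faculty estimates (assumes they do not overlap)."""
    return sum(estimates.values())


def shares(estimates):
    """Return each faculty's fraction of the summed estimate."""
    total = total_pb(estimates)
    return {name: value / total for name, value in estimates.items()}


print(f"Total: {total_pb(ESTIMATES_PB):.2f} PByte")
for name, share in shares(ESTIMATES_PB).items():
    print(f"{name}: {share:.0%}")
```

On these figures the sum is about 2.65 PByte, with MAPS and Environment together accounting for roughly three quarters of it.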
  • Research Scenarios
    • Large volume – expensive – changing
    • Large volume – expensive – static
    • Large volume – cheap – static or changing
    • Small volume – expensive
    • Shared access
    • Rate of creation
    • Performance in use
  • Research Scenarios – Flame fronts
    • Raw data: high speed camera; large data, expensive experiment
    • Processed camera data: large data, moderately expensive process
    • Particle detection: moderate data, moderately expensive computation
    • Software development: small data, very expensive
  • Research Scenarios: raw camera data (characteristic: implication for storage)
    • Cost to reproduce very high: permanent long term storage
    • Shared access: access control
    • Very large volume of data: dedicated network storage
    • High speed access needed: local copy may be required
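The pairing of data characteristics with storage implications on the slide above amounts to a lookup table. A minimal sketch of that idea follows; the dictionary keys and the function name `storage_implications` are hypothetical labels chosen for illustration, and only the four pairings themselves come from the slide.

```python
# Characteristic -> implication for storage, as paired on the slide.
# The key names are illustrative labels, not terms from the presentation.
IMPLICATIONS = {
    "cost_to_reproduce_very_high": "permanent long term storage",
    "shared_access": "access control",
    "very_large_volume": "dedicated network storage",
    "high_speed_access_needed": "local copy may be required",
}


def storage_implications(characteristics):
    """Return the implications for the given characteristics, in order.

    Characteristics that are not in the table are skipped.
    """
    return [IMPLICATIONS[c] for c in characteristics if c in IMPLICATIONS]


# The raw camera data scenario has all four characteristics.
RAW_CAMERA_DATA = list(IMPLICATIONS)

for implication in storage_implications(RAW_CAMERA_DATA):
    print(implication)
```

Other scenarios, such as processed camera data or software development, would need their own characteristics assessed before the table could be applied to them.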
  • Types of Data (diagram; axes: Static / Changing and Cheap / Expensive; labels: Live/Archive, Published/Repository)
  • Storage – focus on value axis
    • Scratch: cheap static or changing data
    • Backed-up: traditional fully managed storage
    • Repository: discipline repositories and growing institutional or regional repositories
    • Archive: ?
  • With an Archive for this Scenario
    • Store raw camera data in archive
    • May keep local copy on scratch disk for performance
    • Simplified backup
    • Capture metadata at time of data creation
    • Common scenario: estimate 80% of expensive Engineering data
  • Components of research data management support services (diagram)
    • Top: RDM Policy and Roadmap; Business Plan and Sustainability
    • Centre: Data Management Planning; Managing Active Data; Processes for selection and retention; Deposit and Handover; Data Repositories/Catalogues
    • Bottom: Guidance, Training and Support