
Simon Hodson



I shall provide a summary of JISC work in the area of ‘Big Data’. My primary focus will be on how to manage the huge amount of research data produced in UK universities. I shall cover the history of JISC interventions to improve research data management and look at next steps. I shall also touch on other areas of work, such as ‘Digging into Data’ and web archiving, which also deal with ‘big data’.

Published in: Technology, Business

  1. Thursday 10 May 2012. Eduserv Symposium: Big Data. JISC and the Big (Research) Data Challenge. Simon Hodson, JISC Programme Manager, Managing Research Data
  2. Why is managing research data important? JISC considers it a priority to support universities in improving the way research data is managed and, where appropriate, made available for reuse. Research funder policies, legislative frameworks, good practice, open data agenda: the outputs of publicly funded research should be publicly available; the evidence underpinning research findings should be available for validation. Good data management is good for research: more efficient research process, avoidance of data loss, benefits of data reuse. Alignment with university missions: universities want to provide excellent research infrastructure; universities want to have better oversight of research outputs.
  3. Estimated Research Data Requirements. Two Russell Group universities: estimated current data holdings of c. 2PB (managed and unmanaged). Currently provide 800TB/300TB in a central storage facility, not all of which is used (but will be full in 12-18 months)… Significant amount of data in temporary storage, external drives etc… ‘The more groups we go to talk to, the more we’re hearing of significant data holdings on external hard drives and small RAID systems.’ 1994 Group university: no central research data provision. Faculties (medicine, business, humanities) have 20-30TB each. Engineering currently has a 170TB faculty system, with an urgent need to expand. But… one group, recently interviewed, currently has 250TB, only half in ‘managed storage’; will reach PB levels in the next few years.
  4. DUDs: the data centre under the desk (or in a backpack) is not adequate.
  5. Why manage research data? Not just about storage or avoiding data loss…! It’s about knowing what to keep and what to throw away… Important to extract maximum return on investment from publicly funded research. Access to underlying data is essential for verification and therefore research integrity. Opportunities to extract more knowledge from existing data, new analysis. It’s about making the most out of data created!
  6. Making Data Meaningful and Reusable
  7. JISC and Research Data: 1. Understanding the problem (pre-2007 to 2009); 2. Prototyping solutions (2009-11); 3. Hardening solutions and building institutional capacity (2011-13); 4. Developing elements of national infrastructure (2013+).
  8. 1: Understanding the Problem. Key JISC reports: Dealing with Data: e.j.lyon/reports/dealing_with_data_report-final.pdf; Keeping Research Data Safe: ents/publications/keepingresearchdatasafe0408.pdf; Skills, Role, Career Structure of Data Scientists and Curators: ents/programmes/digitalrepositories/dataskillscareersfinalreport.pdf. Other: UKRDS Scoping Study:
  9. Prototyping Solutions: First MRD Programme, 2009-11. RDM Infrastructure (guidance/support, systems); RDM Planning (DMPs, best practice, disciplinary challenges); RDM Training (targeted at disciplinary needs); challenges of data citation and publication. First JISC MRD Programme, 2009-11: MRD Outputs Page:
  10. Building Institutional Capacity: Second MRD Programme, 2011-13. RDM Infrastructure (policy, guidance/support, systems), 17 large projects; RDM Planning (DMPs, best practice, disciplinary challenges); RDM Training (disciplines and libraries/research support); innovative data publication. Second JISC MRD Programme, 2011-13: shortly to be announced for research data publication and developing RDM training materials:
  11. A holistic approach… Leadership and Policy Development; Publication, Citation and Discovery Mechanisms; Guidance and Training; Support for Data Management Planning; RDM Systems and Infrastructure.
  12. How to develop RDM services (in development!). Why develop services? Roles and responsibilities. Process of service development. The components / building blocks: • Policy • Data Management Planning • Storage • Data registry..... Getting started. Examples and case studies to develop into toolkit. Slide Credit: Sarah Jones and Martin Donnelly, DCC
  13. Next steps? Elements of a national infrastructure. Journals are increasingly implementing policies requiring availability of underlying data: Registry of Journal Data Policies to help researchers and research administrators understand the implications and changing landscape. Universities are developing catalogues of research data holdings: national registry of research data to facilitate discovery, reuse; better understanding of impact and research landscape.
  14. Thank You! First JISC MRD Programme, 2009-11: MRD Outputs Page: JISC MRD Programme, 2011-13: Blog: Project Blogs: #jiscmrd E-mail: Credits for slides, content: Carol Goble, Liz Lyon, Peter Murray-Rust, David Shotton, Martin Donnelly, Sarah Jones.
  15. From prototype to platform… DataFlow Project: Programme SaaS for RDM Projects:
  16. The JISC UMF DataFlow Project. DataStage is a file management system. A DataStage data package consists of selected data files accompanied by an RDF metadata manifest, with a SWORD v2 wrapper. DataBank is a generic repository, and can be used to store things other than research datasets, for example data management plans (DMPs). Diagram labels: Researchers; DataStage file system; SWORD deposit; DataBank repository; Researchers, other users.
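
The data package described on the last slide (selected data files plus an RDF metadata manifest) can be illustrated with a minimal sketch. This is not DataStage's actual package layout: the file name manifest.rdf, the function names build_manifest and build_package, and the choice of Dublin Core terms are all assumptions made for illustration, and the SWORD v2 deposit step is left out entirely.

```python
import hashlib
import io
import zipfile
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr

# Namespace URIs for RDF and Dublin Core terms (real vocabularies;
# their use here is an illustrative choice, not DataStage's schema).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"


def build_manifest(title, files):
    """Return an RDF/XML manifest for a package.

    `files` maps a relative file name to its content as bytes.
    """
    names = sorted(files)
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dcterms="{DCTERMS_NS}">',
        # rdf:about="" refers to the package itself.
        '  <rdf:Description rdf:about="">',
        f'    <dcterms:title>{escape(title)}</dcterms:title>',
    ]
    for name in names:
        lines.append(
            f'    <dcterms:hasPart rdf:resource={quoteattr(quote(name))}/>'
        )
    lines.append('  </rdf:Description>')
    # One description per file: size in bytes and a SHA-256 checksum.
    for name in names:
        data = files[name]
        digest = hashlib.sha256(data).hexdigest()
        lines += [
            f'  <rdf:Description rdf:about={quoteattr(quote(name))}>',
            f'    <dcterms:extent>{len(data)}</dcterms:extent>',
            f'    <dcterms:identifier>sha256:{digest}</dcterms:identifier>',
            '  </rdf:Description>',
        ]
    lines.append('</rdf:RDF>')
    return "\n".join(lines)


def build_package(title, files):
    """Zip the selected files together with manifest.rdf; return the bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.rdf", build_manifest(title, files))
        for name in sorted(files):
            archive.writestr(name, files[name])
    return buffer.getvalue()


if __name__ == "__main__":
    demo = {"data/readings.csv": b"t,v\n0,1\n", "README.txt": b"demo"}
    package = build_package("Demo dataset", demo)
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        print(archive.namelist())
```

Keeping the checksum and size in the manifest means a receiving repository could verify each file after transfer, which is one plausible reason to ship metadata alongside the data rather than separately.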