Research Data Management from 
a disciplinary perspective 
Sarah Jones 
Digital Curation Centre 
sarah.jones@glasgow.ac.uk 
Twitter: @sjDCC 
Stéphane Goldstein 
Research Information Network 
stephane.goldstein@researchinfonet.org 
Twitter: @stephgold7
Disclaimer 
Practice varies greatly by 
discipline and sub-discipline 
so it’s hard to generalise 
Apologies for any sweeping 
statements and groupings 
that don’t fit your model 
Image credit: Sweep by Judy Van der Velden CC-BY-NC-ND 
www.flickr.com/photos/judy-van-der-velden/6757403261
Case studies on disciplinary practice 
RIN Information Seeking and Sharing Behaviour 
www.rin.ac.uk/our-work/using-and-accessing-information-resources 
– Life sciences 
– Humanities 
– Physical sciences 
RIN Open Science Case studies 
www.rin.ac.uk/our-work/data-management-and-curation/open-science-case-studies 
SCARP case studies www.dcc.ac.uk/resources/case-studies/scarp 
Knowledge Exchange Incentives and motivations for sharing research 
data (forthcoming) 
RLUK research data typology (more from Stéphane)
Groups and disciplines 
Arts & Humanities 
– Creative arts, languages, philosophy, archaeology… 
Social Science 
– Economics, history, politics, business, psychology... 
Sciences & Engineering 
– Physics, astronomy, earth sciences, computing… 
Life Sciences 
– Biology, ecology, medical and veterinary science…
Arts & Humanities 
Outputs may not be termed 
‘data’ e.g. sketches, writing, 
performance, artefacts, ‘work’ 
Focus on literary outputs & 
manuscripts in some disciplines 
More use of standard tools e.g. 
Word, Excel – less likely to 
adapt technologies to fit 
Arguably lower awareness and 
uptake of RDM overall
Creative Arts 
Several RDM projects in the creative arts e.g. Kultivate, 
KAPTUR, VADS4R, CAiRO training... 
Resistance to term ‘data’ – too scientific 
Importance of personal websites for profile as work is 
also conducted outside of academia 
Visual Arts Data Service - www.vads.ac.uk 
Institutional repositories at arts schools accept a broader 
range of outputs and display content more visually to fill 
the void e.g. http://research.gold.ac.uk
Sonic Arts Research Unit 
Collaboration with IR as a 
result of losing data 
Tension between providing 
access in a visual / usable 
way and preserving data 
Still use SoundCloud and 
personal websites for 
access, but these link to 
‘master’ copy of data held 
in IR for preservation 
www.dcc.ac.uk/resources/developing-rdm-services/repository-radar
Digital Humanities 
Intentional creation of resources rather than just data as 
by-product of research process 
More use of standards e.g. XML & TEI in language 
resources, image standards and capture quality for 
digitisation, Dublin Core metadata… 
Often include technical experts in project team 
Links with cultural heritage collections 
Negotiating copyright often a major issue 
Sustainability a big challenge
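
As an illustration of the Dublin Core metadata mentioned above, the following is a minimal sketch of how a simple record might be serialised as XML. The wrapper element name `record`, the helper `dublin_core_record` and all field values are hypothetical placeholders, not taken from any project named in these slides; only the Dublin Core element namespace is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML record from (element name, value) pairs.

    The wrapper element name "record" is an assumption for this sketch;
    real repositories wrap Dublin Core in their own schemas.
    """
    root = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Placeholder values for a hypothetical digitised item.
record = dublin_core_record([
    ("title", "Example digitised manuscript"),
    ("type", "Image"),
    ("format", "image/tiff"),
    ("rights", "Copyright status to be negotiated"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the fields as plain pairs makes it easy to repeat an element, which Dublin Core allows.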
Mapping Edinburgh’s Social History 
Historical maps overlaid with all kinds of 
open data to chart how the town has changed 
through time 
Uses open source tools 
Allows you to overlay maps 
Picks up on common themes 
www.mesh.ed.ac.uk
Social Sciences 
• Greater awareness and acceptance 
of RDM by community 
• Methodology is as much a factor in 
determining difference as discipline 
• Nature of data often poses 
challenges for sharing 
• Lots of reuse of large survey data 
• Established metadata standards e.g. 
Data Documentation Initiative (DDI) 
• Strong international data centre 
infrastructure
Public health 
Ethics predominant concern 
– How to negotiate consent 
– How to store, transfer & handle data securely 
– How to anonymise and share data 
Data integration / linking and curation of longitudinal studies are 
major concerns as data is added to over decades 
Need for data havens to help control access to data – role for 
unis e.g. Grampian Data Safe Haven 
UK Data Service - http://ukdataservice.ac.uk
Twenty-07: Public health study 
Longitudinal study following 4510 people from West of Scotland 
over 20 years to investigate the reasons for differences in health 
Undertook interviews, questionnaires, physical measurements, 
blood samples etc 
Strict access controls and guidelines for data collection 
Data managed within the MRC Social and Public Health Sciences 
Unit and accessible under a data sharing agreement - 
http://2007study.sphsu.mrc.ac.uk/Revised-Data-Sharing-Policy-has-been-
Life Sciences 
• Funders arguably more demanding 
in terms of data sharing policy 
• Sharing can be problematic / resisted 
given the nature of the data, fear of 
misuse or loss of control over IPR 
• Data sharing agreements and access 
committees more common 
• Data integration & mining key 
drivers 
• Research is well-resourced so greater 
capacity to fund local solutions and 
tools for RDM during projects
Genetics 
Vast quantities of data and rapid growth 
– DNA sequence data is doubling every 6-8 months 
Well established public databases for gene sequences e.g. 
GenBank www.ncbi.nlm.nih.gov/genbank 
– However even this is on short-term project funding! 
Need accession number to publish so driver for sharing and 
established workflow 
European Data Infrastructure projects too e.g. ELIXIR
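
To make the growth rate above concrete, here is a small worked calculation, offered as a rough sketch only. It assumes steady exponential growth at the quoted doubling time of 6 to 8 months; the function names and the five-year horizon are illustrative choices, not figures from the slides.

```python
def annual_growth_factor(doubling_months):
    """Factor by which data volume grows in one year for a given doubling time."""
    return 2 ** (12 / doubling_months)


def growth_over(years, doubling_months):
    """Factor by which data volume grows over a number of years."""
    return 2 ** (12 * years / doubling_months)


# Compare both ends of the quoted 6-8 month doubling range.
for months in (6, 8):
    per_year = annual_growth_factor(months)
    five_years = growth_over(5, months)
    print(f"doubling every {months} months: "
          f"x{per_year:.2f} per year, x{five_years:.0f} over 5 years")
```

Under these assumptions a database would need roughly three to four times more storage each year, which helps explain why short-term project funding is a concern.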
Neuroscience 
• Large data volumes due to use of medical imaging 
• Moving towards larger cohort studies integrating wider range of data types, 
which strains the balance with ethical requirements around personal data 
• Costs of data gathering and advances in analysis technology are making field 
more data intensive - computational methods 
• Small interdisciplinary teams provide the human infrastructure for RDM, but 
historically low funder investment in data management at lab level 
• Disciplinary archives are immature, which has encouraged a tendency for labs to 
treat longitudinal datasets as intellectual capital
OMERO – Open Microscopy 
Environment 
Monash e-Research Centre 
helps groups to adopt (and if 
needed adapt) existing 
technological solutions 
Partnered a research group to 
implement OMERO, a secure 
central repository to help 
researchers organise, analyze 
and share images 
Resulting tool more 
sustainable as tailored to 
specific community need 
www.dcc.ac.uk/resources/developing-rdm-services/improving-rdm-monash
Science & Engineering 
• Large scale can mean RDM is built in 
as standard and sharing part of 
workflow e.g. facilities science 
• Often early adopters and advocates 
of new technologies e.g. the Grid, 
wikis & arXiv in particle physics 
• Archiving established in some cases 
as data can’t be recreated e.g. NERC 
data centres for Earth Sciences 
• Commercial sensitivities can place 
restrictions on sharing in some fields 
Mechanical Engineering 
Several RDM projects at Bath e.g. ERIM, REDm-MED 
Concept of repository well established in industrial engineering 
– Product Lifecycle Management (PLM) systems 
Preservation issues as data is challenging e.g. CAD files 
Less information sharing than other disciplines 
– Commercial sensitivities preclude sharing 
– Consultancy-style research can lead to internal-only results 
– Data generated from private systems, so less applicable to others
Crystallography 
X-ray examinations, images and videos of crystal structures, 
chemical crystallography diffraction images 
Established metadata standards e.g. Crystallographic 
Information Framework (CIF) 
Advocates of open science and use of related tools 
– UsefulChem - http://usefulchem.wikispaces.com 
– LabTrove - www.labtrove.org 
eCrystals Archive and Crystallography Open Database (COD) 
National Crystallography Service - www.ncs.ac.uk
Astronomy 
Established data standards (e.g. FITS and NOA) maintained by 
community 
Access to facilities requires the deposit of raw data, although 
this can be embargoed 
International data centres e.g. Sloan Digital Sky Survey - 
www.sdss.org 
Large volumes of data so transfer can be difficult 
Few IPR issues compared to other disciplines 
Data products are not always shared
Galaxy Zoo 
Citizen Science project started to 
classify a million galaxies imaged by 
the Sloan Digital Sky Survey 
Over 50 million classifications in the 
first year, contributed by more than 
150,000 people 
Classifications were as good as those 
from professional astronomers 
Further projects in astronomy, 
climatology, biology, humanities… www.galaxyzoo.org
Research data typology 
Commissioned by RLUK 
Aim: to help librarians improve their ability to 
engage with researchers on RDM matters; and 
to enable them to acquire a better 
understanding of the needs of researchers 
A resource structured around a suggested 
typology of research data, looking at different 
ways in which data might be categorised
Broad data types 
1. How do researchers generate and process data, and 
for what purpose? 
1.1 Method of creation and collection of research data: 
where the data comes from 
1.2 Readiness of research data: extent to which data 
has been processed 
1.3 Use of research data: researchers' main purpose for 
accessing and using data 
2. In what file formats, media and volumes do researchers 
generate data? 
2.1 Medium and format for research data: objects in which 
data is captured and recorded, electronic storage and file 
types 
2.2 Electronic data volumes: size of files (this is subjective, 
and based largely on the perception of researchers) 
3. How do researchers manage and store their data? 
3.1 Storage of research data: where and how data is kept 
3.2 Types of metadata: not an exhaustive list, but these are 
widely-recognised metadata standards 
3.3 Metadata standards 
3.4 Degree of openness: founded on Royal Society's 
categorisation of 'intelligent openness' 
3.5 Licensing of research data: legal rights appertaining to the 
use of the data
An expandable resource 
A scaffold onto which disciplinary examples can be 
hung 
Dynamic resource: community input (from librarians, 
but maybe others too?), crowdsourcing 
Turning it into an online interactive tool 
Refreshing, curating, adapting the resource 
Basic introduction at 
http://www.powtoon.com/show/fZDm1s0W6TI/research-data-typology-for-rluk-draft/
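
The idea of a scaffold onto which disciplinary examples can be hung could be sketched as a simple data structure. This is only an illustration under stated assumptions: the category labels come from the typology slide, the examples are standards named elsewhere in this presentation, and the names `TYPOLOGY`, `examples` and `add_example` are hypothetical rather than part of the RLUK resource.

```python
from collections import defaultdict

# Typology categories as listed on the "Broad data types" slide.
TYPOLOGY = {
    "1.1": "Method of creation and collection of research data",
    "1.2": "Readiness of research data",
    "1.3": "Use of research data",
    "2.1": "Medium and format for research data",
    "2.2": "Electronic data volumes",
    "3.1": "Storage of research data",
    "3.2": "Types of metadata",
    "3.3": "Metadata standards",
    "3.4": "Degree of openness",
    "3.5": "Licensing of research data",
}

# Disciplinary examples contributed by the community, keyed by category.
examples = defaultdict(list)


def add_example(category, discipline, note):
    """Hang a disciplinary example on an existing typology category."""
    if category not in TYPOLOGY:
        raise KeyError(f"Unknown typology category: {category}")
    examples[category].append((discipline, note))


# Examples drawn from the discipline slides in this presentation.
add_example("3.3", "Social sciences", "Data Documentation Initiative (DDI)")
add_example("3.3", "Crystallography", "Crystallographic Information Framework (CIF)")
add_example("3.3", "Astronomy", "FITS")

for discipline, note in examples["3.3"]:
    print(f"{TYPOLOGY['3.3']}: {discipline} - {note}")
```

Rejecting unknown categories keeps crowdsourced input attached to the agreed typology, while the list per category leaves room for as many disciplines as contributors supply.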
Conclusions 
Lots of work still to do! 
Domains different in all respects: data, methods, key 
RDM concerns, level of infrastructure and support… 
Differences exist at sub-discipline level 
Need to understand the area 
→ Developing and using RLUK’s typology
How to plug the gaps? 
Dozens of different repositories or databases 
specialising in sub-domains or data types, but still major 
gaps 
– Shared services? 
– Institutional services – specialising rather than generic? 
– Role of publishers and learned societies? 
– Funder calls for domain specific infrastructure? 
– Unis to support ground-up development of tools / services? 
• How can the sector help domain-specific solutions to 
mature and thrive?


Editor's Notes

  • #21 Flexible Image Transport System (FITS)