Gobinda Chowdhury


The research data management puzzle. Maredata. Final seminar, Valencia

  1. Managing Research Data for Open Science: the UK experience
     Professor Gobinda Chowdhury
     Chair, iSchool@Northumbria, Northumbria University, Newcastle, UK
     Chair elect, iSchools (
  2. Open Science
     • In 2015 European Commissioner Moedas identified three strategic priorities, described in Open innovation, Open science, Open to the world (the 3Os strategy)
     • Open Science aims at transforming science through ICT tools, networks and media, to make research more open, global, collaborative, creative and closer to society
     • Open science is about the way research is carried out, disseminated, deployed and transformed by digital tools, networks and media. It relies on the combined effects of technological development and cultural change towards collaboration and openness in research. single-market/en/news/open-innovation- open-science-open-world-vision-europe
  3. Open Science and Data Sharing: why?
     • Open science makes scientific processes more efficient, transparent and effective by offering new tools for scientific collaboration, experiments and analysis and by making scientific knowledge more easily accessible (
     • Societal benefits from making research data open are potentially very significant, including economic growth, increased resource efficiency, securing public support for research funding and increasing public trust in research ( )
     • The $13 billion in government spending on the Human Genome Project and its successors is estimated to have yielded a total economic benefit of about $1 trillion
     • A British study of its public economic and social research database found an economic return of £5.40 for every £1 invested by the government (The Data Harvest, 2014… An RDA Europe Report. https://rd-
  4. Open Research Data: Mandates
     • Stipulated under Article 29.3 of the Horizon 2020 Model Grant Agreement (including the creation of a Data Management Plan)
     • EPSRC, UK:
       • Research organisations will ensure that appropriately structured metadata describing the research data they hold is published (normally within 12 months of the data being generated) and made freely accessible on the internet
       • In each case the metadata must be sufficient to allow others to understand what research data exists, why, when and how it was generated, and how to access it
       • Where the research data referred to in the metadata is a digital object, it is expected that the metadata will include use of a robust digital object identifier (for example as available through the DataCite organisation -
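The EPSRC expectation on slide 4 (what data exists, why, when and how it was generated, how to access it, plus a DOI for digital objects) can be read as a completeness check on a metadata record. The following is a minimal sketch only: the field names are hypothetical labels chosen to mirror the slide's wording, and they are not the DataCite schema or any repository's API.

```python
from typing import Dict, List, Optional

# Hypothetical field names mirroring the wording on the slide;
# NOT the DataCite metadata schema or any real repository API.
REQUIRED_FIELDS = ("what", "why", "when", "how_generated", "how_to_access")


def missing_metadata(record: Dict[str, Optional[str]],
                     is_digital_object: bool = True) -> List[str]:
    """Return the expected fields that are absent or empty in the record."""
    required = list(REQUIRED_FIELDS)
    if is_digital_object:
        # The slide expects a digital object identifier for digital objects.
        required.append("doi")
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Illustrative record (invented for this sketch, not real data).
record = {
    "what": "Interview transcripts",
    "why": "Study of data sharing attitudes",
    "when": "2017",
    "how_generated": "Semi-structured interviews",
    "how_to_access": "On request via the institutional repository",
}
print(missing_metadata(record))
```

A check of this kind only tests that fields are present; whether the content is "sufficient to allow others to understand" the data still needs human review.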
  5. Open Research Data Management: EPSRC, UK Mandate for Universities
     • Research organisations will ensure that EPSRC-funded research data is securely preserved for a minimum of 10 years from the date that any researcher ‘privileged access’ period expires or,
     • If others have accessed the data, from the last date on which access to the data was requested by a third party;
     • All reasonable steps will be taken to ensure that publicly-funded data is not held in any jurisdiction where the available legal safeguards provide lower levels of protection than are available in the UK
     • Research organisations will ensure that effective data curation is provided throughout the full data lifecycle, with ‘data curation’ and ‘data lifecycle’ being as defined by the Digital Curation Centre. searchdatamanagement/
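The retention rule on slide 5 amounts to a small date calculation. The sketch below is one possible reading, not EPSRC's own procedure: it assumes the 10-year clock starts at the later of the two dates the slide mentions (end of the privileged access period, or the last third-party access request), and the function names are hypothetical.

```python
from datetime import date
from typing import Optional

RETENTION_YEARS = 10  # minimum preservation period stated on the slide


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February and the target year is not a leap year
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(privileged_access_ends: date,
                           last_third_party_request: Optional[date] = None) -> date:
    """Earliest date on which the minimum retention period has elapsed.

    Assumption: the clock restarts from the most recent third-party access
    request when that request is later than the end of privileged access.
    """
    start = privileged_access_ends
    if last_third_party_request is not None and last_third_party_request > start:
        start = last_third_party_request
    return add_years(start, RETENTION_YEARS)


# Illustrative dates only.
print(earliest_disposal_date(date(2018, 6, 30)))
print(earliest_disposal_date(date(2018, 6, 30), date(2021, 3, 1)))
```

Under this reading every new access request pushes the disposal date forward, so frequently reused datasets are effectively kept for as long as they are in demand.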
  6. What is Research Data
     • Data is the “glue of a collaboration” and the “lifeblood of research”
     • Data includes:
       • text, sound, still images, moving images, models, games, simulations ….
       • statistics, collections of digital images, sound recordings, transcripts of interviews, survey data and fieldwork observations with appropriate annotations, an interpretation, an artwork, archives, found objects, published texts or a manuscript (Concordat on Open Research Data,
       • various types of laboratory data including spectrographic, genomic sequencing, and electron microscopy data; observational data, such as remote sensing, geospatial, and socioeconomic data; numerical data and other forms of data either generated or compiled by humans or machines (Borgman, C.L. (2012). The conundrum of sharing research data. Journal of the American Society for Information Science and Technology, 63(6), 1059–1078. Borgman, C.L., Wallis, J.C., & Mayernik, M.S. (2012). Who’s got the data? Interdependencies in science and technology collaborations. Computer Supported Cooperative Work, 21(6), 485-523.)
  7. Research Data Management
     • Good data management is fundamental to all stages of the research process and should be established at the outset
     • “The careful management of data throughout the research process is crucial if the data arising from research projects is to be rendered openly discoverable, accessible, intelligible, assessable and usable.” (
     • FAIR (Findable, Accessible, Interoperable and Reusable) guidelines ( mgt_en.pdf)
     • A DMP should include a description of all types of data, a description of all types of metadata and policies used, plans for archiving and preservation, and a description of resources required for data management (Strasser, C. (2015). Research data management: a primer publication of the National Information Standards Organization. Baltimore, MD: NISO)
  8. RDM Challenges and Stakeholders
     • Good data management is fundamental to all stages of the research process and should be established at the outset (Researchers + Data Librarian + Institutions)
     • Data management for Open Science (Data Librarian + Researchers + Institutions)
     • Data curation (Data Librarian/Curator + Institution + Government/Funding Bodies)
     • Data Sharing Policies (Government, Funding Bodies, Institutions, Professional Bodies)
  9. RDM Technologies and Systems
     • National, e.g. ANDS (
     • In-house/Institutional, e.g. Research Data Oxford (; RDS Edinburgh University ( support/research-data-service)
     • Not-for-profit, e.g. DataCite ( )
     • Subject/Discipline, e.g. UK Data Archive (; GitHub ( ………..
     • Commercial, e.g. Figshare (
     • Aggregator portal: Jisc Research Data Discovery Service (
     Whichever option is chosen, RDM is resource-intensive and hence requires a sustainable business model and supporting policies
  10. A big question: Do researchers want to share data?
      • Does every researcher want to share data?
      • Do the researchers have the necessary awareness and data management skills?
      • Are there specific sharing practices and culture in specific disciplines?
      • Do the researchers have any concerns around data sharing?
      • What are the incentives for data sharing?
      • ....... And many more related questions
  11. RDM Training Policies
      • Support for the development of appropriate data skills is recognised as a responsibility for all stakeholders (Principle 9 of the Concordat on Open Research Data, 2016 (
      • Researchers:
        • For research institutions this should include the provision of researcher training opportunities provided in an organised and professional manner.
        • It is imperative also that funding organisations, alongside research institutions, support the provision of such training through appropriate funding routes.
        • Individual researchers must also ensure their own data skills are at a level sufficient to meet their own obligations whilst understanding the benefits to themselves of a higher level of understanding.
      • Data Scientists:
        • “The specialised skills of data scientists are crucial in supporting the data management needs of researchers and institutions
        • Research institutions and funders should work together to help build underpinning capacity and capability in this area, and to attract and retain such specialists by developing well designed and sustainable career paths for them”
  12. Key RDM Challenges
      • Technology
        • ICT infrastructure for storage, management, curation
        • Software, metadata, interoperability
        • Access and reuse
      • People
        • Researchers: culture, data literacy, training requirements
        • Data Scientists: data management, data curation, training
        • Users: researchers, businesses, governments, policy-makers, general public ….
      • Policy
        • Governments, Funding agencies, Institutions, Professional bodies ….
      • Resources
        • Financial, human, legal
  13. RDM: Technology Issues
      • Volume, variety & growth of data
      • Software dependence of data
      • Multiple file formats
      • Data curation
      • Retrieval issues
  14. Is Data Retrieval = Information Retrieval?
      • Most data retrieval services are based on the text retrieval paradigm
      • The key difference between information retrieval (IR) and data retrieval (DR) arises from the data elements
      • Using datasets often requires a number of associated files
      • Search output in DR is often very large
      • Search output in DR requires downloading before access
      • Very little research has been undertaken on data seeking behaviour
      • No reliable data seeking and retrieval model exists
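  15.
      | Discipline | Keywords | Data Retrieval Average File Size | Information Retrieval Average File Size |
      |---|---|---|---|
      | Arts & Humanities | art museums | 5.708 MB | 0.820 MB |
      | Arts & Humanities | nineteenth century | 2.537 MB | 1.042 MB |
      | Arts & Humanities | “world war” | 5.766 MB | 0.508 MB |
      | Arts & Humanities | medieval | 5.053 MB | 1.091 MB |
      | Arts & Humanities | popular music | 8.353 MB | 1.000 MB |
      | Social Sciences | unemployment | 3.059 MB | 0.455 MB |
      | Social Sciences | cognition | 11.681 MB | 1.612 MB |
      | Social Sciences | imprisonment | 1.837 MB | 0.503 MB |
      | Social Sciences | “labour law” | 1.667 MB | 0.410 MB |
      | Social Sciences | “trade union” | 2.073 MB | 0.748 MB |
      | Natural Sciences | marine life | 15.707 MB | 1.491 MB |
      | Natural Sciences | “climate change” | 1.655 MB | 2.497 MB |
      | Natural Sciences | “renewable energy” | 758.000 MB | 3.606 MB |
      | Natural Sciences | “ultraviolet light” | 495.900 MB | 1.991 MB |
      | Natural Sciences | “oxidative phosphorylation” | 40.242 MB | 1.895 MB |
      | Computer & Information Science | search behaviour | 656.000 MB | 0.731 MB |
      | Computer & Information Science | face recognition | 1.391 GB | 1.535 MB |
      | Computer & Information Science | computer vision | 1.330 GB | 2.782 MB |
      | Computer & Information Science | research data sharing | 1.014 MB | 0.521 MB |
      | Computer & Information Science | social media data | 16.329 MB | 1.078 MB |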
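To compare the two columns on slide 15 directly, the sizes need a common unit before a ratio can be taken. The sketch below does this for three rows copied from the slide; the conversion of 1 GB to 1024 MB is an assumption (the slide does not say which convention was used), and the helper names are hypothetical.

```python
# Assumption: 1 GB = 1024 MB; the slide does not state the convention.
UNIT_TO_MB = {"MB": 1.0, "GB": 1024.0}


def to_mb(size: str) -> float:
    """Convert a string such as '1.391 GB' or '0.455 MB' to megabytes."""
    value, unit = size.split()
    return float(value) * UNIT_TO_MB[unit]


def size_ratio(data_retrieval: str, information_retrieval: str) -> float:
    """How many times larger the average DR result is than the average IR result."""
    return to_mb(data_retrieval) / to_mb(information_retrieval)


# (keyword, DR average file size, IR average file size), copied from slide 15.
ROWS = [
    ("unemployment", "3.059 MB", "0.455 MB"),
    ("climate change", "1.655 MB", "2.497 MB"),
    ("face recognition", "1.391 GB", "1.535 MB"),
]

for keyword, dr, ir in ROWS:
    print(f"{keyword}: DR/IR size ratio = {size_ratio(dr, ir):.1f}")
```

A ratio above 1 means the data retrieval results were larger on average; "climate change" is the one keyword on the slide where the ratio falls below 1.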
  16. Metadata for RDM
      • Tools:
        • DCC Metadata for Research disciplines (
        • RDA ( group.html)
      • Key questions:
        • How much metadata is required?
        • Who will do the tagging?
        • Who will check for consistency and standards?
        • How will it be used?
  17. Data sharing: Researchers’ culture, awareness, concerns…
      • Findings from a study on researchers from three countries:
        • Nearly 80% of researchers do not want to share data with anyone
        • Fewer than 25% of researchers agree that their university encourages OA data sharing
        • Only 31% of researchers are familiar with the OA requirements of the funding bodies
        • Nearly 95% of researchers are either uncertain or do not know whether their university has a prescribed metadata set
        • The key concerns for OA and data sharing include: legal and ethical issues, misuse and misinterpretation of data, and fear of losing the scientific edge
        • Only a third of the researchers have a unique researcher ID
        • Over 70% of researchers did not have any formal training in DMP, metadata, consistent file naming and version control or data citation
  18. TULIP: Information Management Research to address RDM Challenges
      • Technology
        • Research data repository/services: Local vs. National repository services
        • Research data management: standards & practices -- ORCID, DOI, Metadata, Citation, Quality, Version Control…
        • Research data discovery & access -- from IR paradigm to DR paradigm: user-centred & discipline-specific design
        • Research data sharing/reuse: data quality metrics
      • Users: research culture, training
        • Data Literacy and RDM training and advocacy across all disciplines
      • Librarians
        • Education and training programmes for data librarians
      • Industries
        • New research data service industries; Public-private partnership; Sustainability
      • Policies
        • OA mandates; Incentives for researchers; Data quality; Ethics, Curation…
  19. Resources
      • Bugaje, M. and Chowdhury, G. (2018). Data Retrieval = Text Retrieval? iConference2018. In Chowdhury, G., McLeod, J., Gillet, V. and Willett, P. (eds). Transforming digital worlds: proceedings of the iConference2018. March 25-28, Sheffield, LNCS 10766, Springer, pp. 253-262.
      • Chowdhury, G., Boustany, J., Kurbanoglu, S., Unal, Y. and Walton, G. (2017). Preparedness for Research Data Sharing: A Study of University Researchers in Three European Countries, ICADL2017, Bangkok, 13-15 November, 2017, LNCS 10647, pp. 104-116.
      • DCC Checklist for DMP: ecklist_2013.pdf
      • DCC Curation Lifecycle model ( lifecycle-model)
  20. … and now
      • Thanks for listening, and …..