
Elsevier CWTS Open Data Report Presentation at RDA meeting in Barcelona



The Open Data report is a result of a year-long, co-conducted study between Elsevier and the Centre for Science and Technology Studies (CWTS), part of Leiden University, the Netherlands. The study is based on a complementary methods approach consisting of a quantitative analysis of bibliometric and publication data, a global survey of 1,200 researchers and three case studies including in-depth interviews with key individuals involved in data collection, analysis and deposition in the fields of soil science, human genetics and digital humanities.

Published in: Science


  1. RDA 9th Plenary Meeting, Barcelona, Spain, Friday 7 April, 14.00 – 16.00. OPEN RESEARCH DATA: A GAP BETWEEN PRACTICE AND POLICY? Launch event. Agenda: 14.00 – Welcome by Wouter Haak, Elsevier • 14.10 – Presentation of the report by Stephane Berghmans (Elsevier), Andrew Plume (Elsevier) and Clifford Tatum (CWTS) • 14.40 – Panel discussion moderated by Jean-Claude Burgelman, European Commission; panel members: Paolo Budroni (University of Vienna), Helena Cousijn (Elsevier), Mark Hahnel (Figshare), Ignasi Labastida (University of Barcelona), Ingeborg Meijer (CWTS) • 15.40 – Summary & conclusions by Jean-Claude Burgelman • 16.00 – Drinks
  2. Open Data • Collaboration • Reproducibility • Data Analysis • Transparency
  3. Research questions – the researcher’s perspective: 1. How are researchers actually sharing data? 2. Do researchers themselves actually want to share data and/or reuse shared data? 3. Why might researchers be reticent to share their own data openly? 4. What are the effects of new data-sharing practices and infrastructures on knowledge production processes and outcomes?
  4. Complementary methods approach: Quantitative (bibliometrics) • Global survey • Case studies
  5. Insights from bibliometric data: articles and their citations in data journals
  6. Insights from bibliometric data: citations to data journals in different fields of science
  7. Insights from bibliometric data: analysis of acknowledgment sections. 1.51 million research articles & review articles in 2014; of these, 0.93 million with funding information; of these, 29,737 articles (3.2%) matching data AND provide OR share
  8. Insights from bibliometric data – key findings: 1. The introduction of data journals is a recent development. Data journals are still a small-scale phenomenon, but their popularity is growing quite rapidly, as reflected in the strong growth of their citations over time. 2. Open data is largely driven by disciplinary culture, given the significant differences between scientific fields in the adoption of data journals. 3. The lack of consistency in how data sharing is reported in the acknowledgment sections of scientific articles highlights a lack of reporting standards.
  9. Insights from large-scale global survey: • How and why are researchers sharing data? • Why are researchers reticent to share their own data openly? • What is the role of research data management in data sharing? • How do researchers perceive reusability?
  10. A third of respondents do not publish research data. Q: Have you published the research data that you used or created as part of your last research project in any of the following ways?
  11. The benefits of sharing research data are clear… Q: To better understand your attitudes towards research data access, please think about the research data that typically is not published (e.g. not summary charts, tables or images), and indicate how much you agree or disagree with the following statements. Response scale: Strongly agree/Agree; Neither agree nor disagree/Don’t know; Strongly disagree/Disagree
  12. …but obstacles remain. Q: To better understand your attitudes towards research data access, please think about the research data that typically is not published (e.g. not summary charts, tables or images), and indicate how much you agree or disagree with the following statements. Response scale: Strongly agree/Agree; Neither agree nor disagree/Don’t know; Strongly disagree/Disagree
  13. Whose data is it anyway? Q: Who do you believe ‘owns’ the research data that you have made or will make available to others as part of your last research project?
  14. Who is responsible for acting on data management plans? Q: [Asked of respondents who indicated they are mandated to archive their research data and are provided with a research data management plan to follow.] Who is responsible for the execution of this research data management plan? Who is responsible for monitoring compliance with this research data management plan?
  15. Insights from large-scale global survey – key findings. Key finding 1: Dissemination of data is primarily contained within the current publishing system, even though one third of researchers do not publish their data at all. Key finding 2: Data management requires significant effort, and training and resources are required. Open data mandates from funders or publishers are not perceived as a driving force for improving data management training or planning. Key finding 3: Research data is perceived as personally owned, and decisions on sharing are driven by researchers, not by institutes or funders. It is important to be aware that the concept of open data speaks directly to basic questions of ownership, responsibility, and control. Key finding 4: Researchers have little awareness of reuse licenses and proper attribution, which makes it less rewarding to make data reusable.
  16. Insights from case studies • Open Data is generally operationalized as the sharing and reuse of data • Open Data is not yet very common among scholars (Borgman 2012) • A recent study (Costas et al. 2013): – data repositories as the basis for analyzing data sharing – scarcity of data available in repositories – wide variety of policies and associated infrastructures • Often overlooked: data practices in fields with a tradition of data sharing – these would not count as open data in the political sense – but they provide a close look at data practices in research at the grassroots level – this involves reconceptualizing the ‘open’ in open data to include sharing and reuse that occur in closed contexts. References: Borgman, Christine L. 2012. “The Conundrum of Sharing Research Data.” JASIST 63 (6): 1059–78. doi:10.1002/asi.22634. Costas, Rodrigo, Ingeborg Meijer, Zohreh Zahedi, and Paul Wouters. 2013. “The Value of Research Data: Metrics for Datasets from a Cultural and Technical Point of View.” A Knowledge Exchange Report.
  17. Case studies – analytical dimensions. Six dimensions adapted from Leonelli (2013): 1. data situated 2. pragmatics of sharing/reuse 3. incentives for sharing/reuse 4. governance/accountability 5. commodification 6. globalization. Reference: Leonelli, Sabina. 2013. “Why the Current Insistence on Open Access to Scientific Data? Big Data, Knowledge Production, and the Political Economy of Contemporary Biology.” Bulletin of Science, Technology & Society 33 (1–2): 6–11.
  18. Case selection • Soil mapping • Human genetics • Digital humanities ➡ 12 interviews (4 per case) ➡ ATLAS.ti coding for the dimensions. Actors of interest: • Data producer • Repository manager • Data users • Journal publisher • Research funder • Article author • Metrics researcher • Software developer • Database developer
  19. Case selection • Soil mapping: – an international center dedicated to gathering information on world soil – for decades, outside scientists’ willingness to share their data with the center has meant that it has accumulated a variety of data pertaining to soil properties of particular regions • Human genetics: – a research center organized into several co-located biomedical genetics labs – a centralized bioinformatics group provides data processing and analysis expertise to multiple labs in the research center, coordinating their activities with several projects • Digital humanities: – many digital humanities research projects in the Netherlands are linked through a national-level network – focused on researchers whose work straddles the traditional humanities and computational sciences
  20. Data situated • Data is quite often described as digital, structured, and in relation to databases. Observations or source materials become data upon deposit in a database, which renders data accessible for sharing and further processing. → So: sharing/reuse is embedded in the concept of data → a clear database orientation for both data analysis and sharing • Soil mapping: – I would define it as systematized observations … and what I mean by that is enough to know how that data came to be. Otherwise I don’t think you can really use it, you might say, “We have observations,” you don’t really have data… That’s why we speak of a database. It’s got structure. You know what every field is and what it stands for. That’s what I would call data, yes.
  21. Pragmatics of data sharing • In most cases the data undergoes sequential analyses in a semi-automated bundle of routines referred to as the ‘pipeline’. The database is thus integral to data analysis routines and to sharing among collaborators who participate in different stages of analysis. → Layers of metadata; pipeline → Local reuse; bounded sharing • Human genetics: – Just, yes, lots and lots of very small, sequential steps to come to an end product […] a list of variants, that is, annotated variants, that’s what this pipeline does. – [What] we do is that we store all of the variants in a big database […] it will only answer in frequencies, […] you cannot do any queries on the individual level, because asking, […] I could identify a person; but by just asking frequency information, I still don’t know anything except whether or not a variant is rare or frequent in a population.
  22. Incentives for sharing/reuse • The common themes are resisting openness and bounded sharing, characterized by asymmetrical incentives, collaborative modes of sharing, and evolving practices associated with new forms of collaboration. → Tensions in the distribution of labor and publications. → While sharing data is valued, the career benefits of doing so are uncertain. • Human genetics: – Everyone always thinks it’s a good idea, but when you say “Okay, now, come send your data, we’ll put it in this database.” Then people always have concerns. [We] always end up with long, long discussions why they can’t share it. • Digital humanities: – There is a natural selection to the kind of students we get in literary studies. Occasionally, there are students, I’ve got one of them now who says, “I want to do maths. I want to go to mathematical studies as well and learn statistics, so that I can do this kind of research.” That’s great.
  23. Governance and accountability • Publisher mandates matter, funder mandates don't – Human genetics: “funding agencies, they now start to impose this, but they do not control whether you’ve really... they do not check whether you’ve done it, right? So, there is still not a penalty for this.” – Soil mapping: “From the perspective of their own accountability as a center, a lack of consistent data citing practices means that accrediting committees are unable to evaluate the number of times the center’s data has been reused in publications.” • Security of data & privacy is leading – Human genetics: sharing of genetic data must comply with strict privacy measures. • Cross-disciplinary practices – Digital humanities: … the transfer of practices between disciplines and the utilization of resources common with collaborators rather than following typical repository-oriented resources associated with the broader open data movement. • Training related to open data was generally understood as beneficial and/or desired, but largely missing.
  24. Globalization • Negotiating terms of exchange • Privacy and security: – Soil mapping: strict privacy laws that prevent inclusion of geographical coordinate points (France) and restrictions on the scale of data that can be shown (China) • Financial: – Soil mapping: diverging expectations over whether monetary exchange should occur (Netherlands and United Kingdom) • Bureaucracy: – Soil mapping: bureaucratic practices that prolong and may prevent access to data (India) • Cultural objects: – Digital humanities: I was just in Japan which has a completely different idea about for instance museums as treasure holds. They protect the treasures of culture and they would never consider opening that up just freely for the public... There's one university library in Nagasaki that is digitizing their own photo albums, but that was mainly it.
  25. Commodification • Licensing and commercialization: – commercial funding – commercialization of tools – societal relevance, though commercialization is often still a ‘bad word’ • Digital humanities: – “A small company, a consultancy company who works on projects for publishers wanted to know if they could use our corpus, because they were trying to predict a best-seller. So, now we’re working on a new project in which we try to develop a scouting tool for publishers.”
  26. Key findings, case studies (1) 1. Consider open data as a situated activity – All three cases reveal ways in which the pragmatics of data sharing and reuse are embedded both in conceptions of data and in normal data processing work. – Observations or source materials become data upon deposit in a database, which renders data accessible for sharing and further processing. → Reflection on survey: note that ‘data’ in the survey is primarily defined as observations/results/source materials, rather than in relation to databases. 2. Freeing up data for reuse and sharing is hindered by national and regional differences with respect to data privacy and licensing. – The case study material illustrates potential globalization challenges regarding ‘late stage’ data sharing and reuse practices. – Friction from national differences was evident, including cultural, bureaucratic and financial assumptions. → Reflection on survey: privacy issues, proprietary aspects, and ethics seem to be common barriers.
  27. Key findings, case studies (2) 3. Data is only integrally configured for sharing and reuse in collaborative research projects, where incentives for sharing are embedded in the research design itself. → Reflection on survey: collaborative research can also be used as a driver for data sharing in non-data-intensive research fields. 4. Training related to open data was generally understood as beneficial and/or desired, but largely missing. → Reflection on survey: training in handling open data is also a big issue, as is the question of who should be responsible for it. The researcher? Implication: the key findings raise questions about the efficacy of policy that prescribes open data practices as an activity apart from situated contexts.
  28. Open Data scenarios: Intensive data-sharing • Restricted data-sharing
  29. Challenges • Opportunities
  30. Suggested questions for the panel • When and why should a researcher choose to publish data in data journals? Is it, for example, dependent on or independent of other publications? • How would you address the tension of researchers wanting to share but afraid of losing control over their data? • How can you make researchers see the benefits of Open Data before they see the problems? • How would you (re)formulate open data policy to enable bottom-up implementation? • What will be the tipping point(s) for Open Data? • What are concrete implementation steps of Open Data for researchers, for institutions and for funders?
  31. Project team: Stephane Berghmans, Helena Cousijn, Gemma Deakin, Ingeborg Meijer, Adrian Mulligan, Andrew Plume, Alex Rushforth, Sarah de Rijcke, Clifford Tatum, Stacey Tobin, Thed van Leeuwen, Ludo Waltman. Thank you
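The acknowledgment analysis in slide 7 can be checked with a short calculation. Using the rounded totals shown on the slide, the 3.2% share only works out if it is taken relative to the 0.93 million articles with funding information rather than to all 1.51 million articles; this reading of the denominator is an inference from the numbers, not something the slide states:

$$\frac{29{,}737}{930{,}000} \approx 0.0320 = 3.2\%, \qquad \frac{29{,}737}{1{,}510{,}000} \approx 0.0197 \approx 2.0\%, \qquad \frac{0.93}{1.51} \approx 0.62$$

So roughly 62% of the 2014 articles carried funding information, and about 3.2% of those matched the data-sharing query.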
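Slide 21 quotes a human genetics group that stores all variants in a big database which "will only answer in frequencies", so that no query can be made at the level of an individual. The following is a minimal Python sketch of that idea, not the group's actual system: the class `VariantFrequencyDB`, its methods, the variant IDs and the 1% "rare" cutoff are all hypothetical, and carrier frequency (the share of samples carrying a variant) stands in for the population allele frequency a real pipeline would report.

```python
# Hypothetical sketch: a variant store that answers only aggregate queries.
class VariantFrequencyDB:
    """Keeps per-sample variant calls internally but exposes only frequencies."""

    def __init__(self) -> None:
        # sample_id -> set of variant IDs; not returned by any query method
        # (the leading underscore is a Python convention, not access control)
        self._calls: dict[str, set[str]] = {}

    def add_sample(self, sample_id: str, variants: set[str]) -> None:
        """Deposit one sample's annotated variants (the pipeline's end product)."""
        self._calls[sample_id] = set(variants)

    def variant_frequency(self, variant_id: str) -> float:
        """Share of stored samples that carry the variant (0.0 if the DB is empty)."""
        if not self._calls:
            return 0.0
        carriers = sum(variant_id in calls for calls in self._calls.values())
        return carriers / len(self._calls)

    def is_rare(self, variant_id: str, cutoff: float = 0.01) -> bool:
        """Label a variant as rare if its frequency is below the assumed cutoff."""
        return self.variant_frequency(variant_id) < cutoff


if __name__ == "__main__":
    db = VariantFrequencyDB()
    db.add_sample("s1", {"chr1:12345A>G"})
    db.add_sample("s2", {"chr1:12345A>G", "chr2:999C>T"})
    db.add_sample("s3", set())
    db.add_sample("s4", set())
    print(db.variant_frequency("chr1:12345A>G"))  # 0.5
    print(db.is_rare("chr2:999C>T"))  # False (frequency 0.25)
```

The design mirrors the interviewee's point: because the only public queries return a frequency or a rare/frequent label, a user learns whether a variant is rare in the stored population without being able to ask which individual carries it.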