
The Challenges of Making Data Travel, by Sabina Leonelli


1st LEARN Workshop. Embedding Research Data as part of the research cycle. 29 Jan 2016. Presentation by Sabina Leonelli, Exeter Centre for the Study of Life Sciences (Egenis) & Department of Sociology, Philosophy and Anthropology, University of Exeter

Published in: Data & Analytics

  1. The Challenges of Making Data Travel
     Sabina Leonelli
     Exeter Centre for the Study of Life Sciences (Egenis) & Department of Sociology, Philosophy and Anthropology, University of Exeter
     @sabinaleonelli
     www.datastudies.eu
  2. Outline
     • The Potential of Open Data
     • Data Journeys:
       – Challenges of collection
       – Challenges of re-use
       – Challenges of openness
       – The Open Data divide
     • Conclusions
  3. Openness in Science
     Long history of openness as a key norm for science: public scrutiny, transparency and reproducibility of results define what science is, how it works, what counts as a research output
     Equally long history of reasons why it does not work in practice:
     • Trust system where scrutiny is delegated to specialists
     • Long paths from data generation to discovery
     • Strong incentives provided by commercialisation and competition, with associated intellectual property regimes around research results (and conflicting interests of research sponsors and institutions)
     • Practical difficulties in disseminating and reproducing data, software, techniques and materials, vis-à-vis research articles
     • Publication regime itself increasingly commercialised
  4. What makes Open Data valuable now?
     • Potential to improve
       – pathways to and quality of discoveries
       – uptake of new technologies
       – collaborative efforts across disciplines, nations and areas of expertise
       – research evaluation, debate and transparency
       – appropriate valuation of research components beyond papers and patents
       – fight against fraud, low quality and duplication of efforts
       – legitimacy of science and public trust
       – public understanding and participation
     • Open Data as a platform to debate what counts as science, scientific infrastructures and scientific governance, and how results should be credited and disseminated
     • Making data open means making data mobile and useful across sites, contexts and uses: major challenges to realising that potential
     • My concern: examining the conditions under which the potential of data as evidence for scientific claims can be realised sustainably in the long term
  5. Researching Data Journeys
     Investigating the conceptual/material/institutional labor involved in making data travel from sites of production to sites of (re-)use
     • Digital data infrastructures as sites for data movements and integration across a wide variety of sources and perspectives
     • Situations of data uptake and re-use in developed and developing world (ongoing studies in UK, USA, Kenya, South Africa)
     • Methods: history, philosophy and social studies of science
       – Archival research
       – Ethnographies and interviews on attitudes to openness, curation practices and re-use
       – Collaboration with researchers
     • Policy involvement:
       – Lead for Open Science working group of the Global Young Academy (e.g. Access to Open Software Survey – Nigeria, Ghana, Bangladesh)
       – Chair of ongoing Open Data consultation across European YAs
  6. Research Data Management Across Disciplines
     Scientific realms under investigation:
     • model organism research: data on different aspects of same organism
     • plant science: environmental, phenotypic and omics data
     • biomedicine: clinical, crowdsourced, biological data
     • oceanography: geological, geographical, meteorological, biological data
     • archaeology, particle physics, climate science, economics
     Parameters of comparison:
     • Subject matter (complex objects versus simplified models)
     • Data source (one or multiple disciplines)
     • Data production mode (centralised vs dispersed; highly automated vs system-specific)
     • Data types (ease of dissemination and analysis, size, relation to software)
     • Publication cultures and collaborative ethos
     • Geographical locations, types and sources of funding involved
     • Availability of relevant data (and other) infrastructures
     • Ethical concerns and regulation
  7. A simple case
  8. [CyVerse] Other DBs
  9. Challenges of Collection
     Data sharing needs to be extensive, comprehensive, global and long-term. This requires:
     • Habitual data donation: challenge to current credit systems and research practices, given considerable labor involved (NB: when adopted as community ethos, huge boost to research)
     • Adequate standards & guidelines for data formatting: problematic given large diversity of methods & terminologies
     • Well-organised databases: intelligent and labor-intensive curation to avoid ‘data dumps’
     • Sharing of related materials: reliable stock centres and collections, which are rarely available or well coordinated with databases
     • Diversity of data types: the current emphasis is on cheap and easy quantitative measurements
     • Sustainability in time:
       – commitment to data infrastructures beyond short term
       – continuous updates of data standards and classification to keep up with shifts in technology and knowledge
  10. Challenges of Re-Use
      • Qualitative results: very limited re-use*. Why?
      • Misalignment between IT solutions and research questions/needs/situations; problems with access to related software
      • Substantive disagreement over data management:
        – methods, terminologies, standards involved in data production and interpretation
        – what counts as data in the first place (data as a relational category)
      • Re-use often linked to participation in developing data infrastructures → rarely the case for busy practitioners, also gap in skills
      • Conflation of epistemic and economic value of data → wish to capitalise on past investments risks encouraging conservatism (building on old data instead of pursuing new [...])
  11. Challenges of Openness
      • Semantic ambiguity: openness means different things to different people, even in the same discipline (e.g. free of license, free of ownership, under CC-BY license, common good, good enough to share, unrestricted access and/or use, accessible without payment, unclear/open to interpretation ...); explicit debate is key
      • Problematic implementation: research ethos, career structures & incentives lag behind; strong disincentives in competitive fields; publication pressure leads to information control
      • IP: confusion around which modes of intellectual property apply, and to whom (individual researchers, labs, projects, networks, universities, funders)
      • Social & ethical concerns: data as tokens of personal identity
      • Universities and the state: confusion around Open Data policies and perceived tensions with metrics of excellence and impact (e.g. UK)
  12. The Open Data Divide
      High-resource bias: richer labs struggle to comply, poorer labs are left behind and/or choose not to participate
      • databases mostly display outputs of top English-speaking labs, which have funds to curate contents, visibility to determine dissemination formats/procedures, resources and confidence to build on data donated by others
      • involvement of poor/unfashionable labs, scientists in low- and middle-income countries, and non-scientists remains low & at the ‘receiving’ end
      • few provisions for situations of systematic disadvantage (e.g. lack of infrastructures and online access, funding, governmental support, expertise, materials; teaching demands; power cuts and transport delays) and vulnerability (e.g. where access to a resource/location is what gives competitive edge, as in archaeology, botany)
      • low-resourced researchers are reluctant to contribute, fearing it will undermine rather than increase their international credibility
  13. Conclusions
      1. OD is Not Quick Nor Cheap
      2. Open to What and When?
      3. Link between OD and Access to Software
      4. Estimating Prospective Value vs Preserving Open-Endedness
      Meanings of openness in Oxford English Dictionary:
      1. ‘free’ (of..)
      2. ‘accessible, exposed, unrestricted’
      3. ‘available, reusable’
      4. ‘flexible, unpredictable, uncertain, unsettled’
      Policy and scientific discourse centers on meanings 1-3, and yet meaning 4 is crucial to science
  14. Steps Forward: Researchers, Institutions, Funders and Learned Societies
      • Current data collections are very limited in scope and difficult to re-use by outsiders
      • Careful consideration needs to be given to what is disseminated, why, how and with which priority and time-line
      • Need to promote
        – data curation as integral part of research, since being involved in developing databases is key to effective data re-use
        – critical discussions about what counts as data and openness in each research community / centre / project, taking account of specific ethical, legal and political concerns
      • Crucial role of learned societies and funders in informing researchers as well as policy-makers of shifting needs, resources and constraints for each field
      • Beware of the term “sharing”: it suggests, but does not entail, reciprocity and common ground
  15. With thanks to the Exeter Data Studies Group: Brian Rappert, Louise Bezuidenhout, Ann Kelly, Niccolo Tempini, Gregor Halfmann, Rachel Ankeny
      Main reference: Leonelli, Sabina (2016, in press) Data-Centric Biology: A Philosophical Study. Chicago, IL: The University of Chicago Press.
      For other relevant publications, see www.datastudies.eu, @DataScienceFeed
      This research was funded by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement n° 335925; the UK Economic and Social Research Council (ESRC), grant number ES/F028180/1; and the Leverhulme Trust, grant award RPG-2013-153.
