“Bioscience has emerged as a data-rich discipline, in a transformation that is spreading as widely now as molecular biology in the twentieth century. We look forward to supporting new research careers, where data are valued and shared widely, where new software is a natural part of Biology, and where re-analysis and modelling are as creative as experimentation in understanding the rules of life and their applications.” Prof Andrew Millar FRS, chair Expert Group UKRI-BBSRC Review of data-intensive bioscience 2020.
Indeed - biomedical science is knowledge work and knowledge turning - the turning of observation and hypothesis through experimentation, comparison, and analysis into new, pooled knowledge. Turns depend on the FAIR and Open flow and availability of data and methods for automated processing and reproducible results, and on a society of scientists coordinating and collaborating.
For the past 25 years I have worked on the social and technical challenges in digital infrastructure to support scientific collaboration, data and method sharing, and automate scientific processing. Big ideas I have been instrumental in – sharing and publishing high quality computational workflows, semantic web technologies in bioscience, ecosystems of Research Objects as the currency of scholarly knowledge, FAIR data principles - preached revolution to inspire but need nudges* to get traction.
I’ll talk about making good on Andrew’s quote: what I’m doing to nudge and where we need to do more. I’ll also talk about my experiences as a woman in digital infrastructure and computer science over the past 40 years – and some nudging is needed there too.
*Thaler RH, Sunstein CR (2008) Nudge: Improving Decisions about Health, Wealth, and Happiness. Yale University Press. ISBN 978-0-14-311526-7. OCLC 791403664.
https://www.bsc.es/research-and-development/research-seminars/hybrid-bsc-rslife-sessionbioinfo4women-seminar-love-money-fame-nudge-enabling-data-intensive
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through Digital Infrastructure Nudging
1. Love, Money, Fame, Nudge:
Enabling Data-intensive BioScience through
Digital Infrastructure Nudging
Professor Carole Goble CBE FREng FBCS CITP
eScience Lab, Dept of Computer Science, The University of Manchester, UK
Joint Head of Node ELIXIR-UK, Digital lead EU-IBISBA
Software Sustainability Institute UK
carole.goble@manchester.ac.uk
https://esciencelab.org.uk/
BSC Research Seminar/BSC Life Session/Bioinfo4Women seminar, 21st July 2022
2. “Bioscience has emerged as a data-rich
discipline, in a transformation that is
spreading as widely now as molecular
biology in the twentieth century.
We look forward to supporting new
research careers, where data are valued
and shared widely, where new software is
a natural part of Biology, and where re-
analysis and modelling are as creative as
experimentation in understanding the
rules of life and their applications.”
Prof Andrew Millar FRS FRSE
chair Expert Group UKRI-BBSRC Review of data-intensive
bioscience 2020.
3. Data Intensive - Knowledge Turning
• Increase Flow of Information
• Scattered resources
• Diverse platforms
• Multiple Teams
• Data sovereignties
• Coordination
• Collaboration [original figure: Josh Sommer]
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013, ISBN: 978-3-642-37186-8
4. Open Research, Sharing (intra and inter Teams)
Nature 602, 558-559 (2022)
doi: https://doi.org/10.1038/d41586-022-00402-1
Publications open access
Datasets ‘open as possible, closed
as necessary’
Beyond Data management
• Digital: Software, algorithms, protocols,
workflows, models
• Physical: reagents, antibodies, hardware
Software is a first-class research
output.
5. Open not always feasible…
Wilkinson et al. The FAIR Guiding Principles for scientific data management and
stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
Source: https://www.technologynetworks.com/informatics/articles/repeatability-vs-reproducibility-317157
Open is not always enough…
6. Personally
Productive
Team
Science
The Fundamental Characteristics of a Translational Scientist
C. Taylor Gilliland, Julia White, Barry Gee, Rosan Kreeftmeijer-Vegter, Florence Bietrix, Anton E. Ussi, Marian Hajduch, Petr Kocis, Nobuyoshi Chiba, Ryutaro Hirasawa, Makoto
Suematsu, Justin Bryans, Stuart Newman, Matthew D. Hall, and Christopher P. Austin. ACS Pharmacology & Translational Science 2019 2 (3), 213-216 DOI: 10.1021/acsptsci.9b00022
10. Standards
Standards
Standards
Two+ decades of preaching “revolution”
for FAIR and Open Research …
• Releasing Research Objects
• FAIR Methods Commons
• Better Software Better Research
Nudging.
11. What if we changed science scholarship?
Or as a PDF +
supplementary
materials
Research
outcomes
more than
just
publications
and data
Publish each object in its own metadata
& repository
FAIR Digital Objects
Connected knowledge
Reproducible results
12. What if we released research objects?
All the related research objects needed to reuse & reproduce
results as an object that is a new currency of scholarship?
Living objects?
released research rather
than “published” it?
A new way of exchanging, archiving, reporting, citing research outcomes
A new way of actionable knowledge units
Metadata objects
Self-described context,
dependencies and
relationships between the
objects?
Virtual objects referencing
scattered resources
Scaled up and working
across all platforms?
Moving knowledge
between different teams.
Objects are
Actionable knowledge units
Digital twins??
13. Do Research
with objects
Plan and Assemble
Methods, Materials
Objects
Analyse
Results
Quality
Assessment
Track and Credit
Disseminate
Deposit &
Licence
Publish Research
objects
Share
Results
Manage
Results
Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015
Experiment
Observe
Simulate
Describe and release the data, software, workflows, research
as it's being created, updated and used
Treat ALL Products and ALL Research Like we treat Open Source Software
Mesirov, J. Accessible Reproducible Research. Science 327(5964), 415-416 (2010)
14. Concept in different forms
FAIR Digital Objects for Science: From Data Pieces to Actionable
Knowledge Units: https://doi.org/10.3390/publications8020021
European Open Science Cloud EOSC Interoperability Framework
https://op.europa.eu/en/publication-detail/-/publication/d787ea54-6a87-11eb-aeb5-01aa75ed71a1/language-en/format-PDF/source-190308283
“Turning the Internet into
a meaningful data space”
15. Concept in different forms
https://mobilizecbk.med.umich.edu/
https://nanopub.org
https://jupyter.org/
https://elifesciences.org/collections/d72819a9/executable-research-articles
Learning Health Systems Executable Notebooks
Nanopublications
Executable Papers
16. Standards
Standards
Standards
Knowledge Graphs & Linked Data
Object-oriented systems at web scale
Privacy preserving
Cloud, Fog and Edge Computing
Standardised Web
BIG DATA ANALYTICS
Federated and Privacy
Preserving Analytics
Distributed
Computing
Translational
Computer Science
Legacy processes
Embed, Scale, Sustain
Stakeholders?
Adoption
Migration
Opportunity Costs and Blockers
Reward system of science!
Infrastructure
and Tools
Text mining
data & software credit
AI/Machine learning/MLOps
pipelines, credit, auto-magic
documentation, ontology mapping,
provenance tracking, object trajectories,
object maintenance
blockchain
17. the Cameron Neylon
incentive equation
Incentive = (Interest / Friction) × Number of people benefitting
Cameron Neylon, BOSC 2013, http://cameronneylon.net/
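As a rough worked illustration of the equation above: the scores below are invented on an arbitrary scale and are not taken from Neylon's talk. The sketch only shows the shape of the argument, namely that cutting friction raises the incentive even when interest and audience stay the same.

```python
def incentive(interest: float, friction: float, people_benefitting: int) -> float:
    """Incentive = (interest / friction) * number of people benefitting."""
    if friction <= 0:
        raise ValueError("friction must be positive")
    return interest / friction * people_benefitting


# Hypothetical scores: same interest and audience, friction reduced by a nudge.
before_nudge = incentive(interest=2.0, friction=8.0, people_benefitting=10)
after_nudge = incentive(interest=2.0, friction=2.0, people_benefitting=10)

print(f"before nudge: {before_nudge}")  # 2.5
print(f"after nudge:  {after_nudge}")   # 10.0
```

With these made-up numbers, a fourfold drop in friction gives a fourfold rise in incentive, which is the case for nudging by reducing friction rather than by preaching.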
18. Incentives to change behaviour
incite interest and reduce friction
Side effects
Stealth
Tweaks
Ramps
Socio-technical Manipulations
Choice Architecture
Libertarian Paternalism
Richard H. Thaler, Cass R. Sunstein, 2008
19. Nudge examples
Make your data objects findable?
Sneak in schema.org and use a search engine!
Get data collection well annotated
with metadata ?
Smuggle ontologies into Excel!
Want to get folks to open up their
datasets? Give them a (sort of) choice…
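A minimal sketch of the "sneak in schema.org" nudge, assuming a dataset landing page where a small JSON-LD block can be embedded. The dataset name, description and URL are hypothetical placeholders; only the schema.org `Dataset` type and its `name`, `description`, `url`, `license` and `keywords` properties are real vocabulary.

```python
import json


def dataset_jsonld(name, description, url, license_url, keywords):
    """Describe a dataset with the schema.org Dataset type as a JSON-LD document."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "keywords": keywords,
    }


def as_script_tag(doc):
    """Wrap the JSON-LD in the script element that search engines look for."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical dataset, for illustration only.
example = dataset_jsonld(
    name="Example growth-curve measurements",
    description="Optical density readings for an illustrative growth experiment.",
    url="https://example.org/datasets/growth-curves",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["growth curve", "example"],
)
print(as_script_tag(example))
```

The nudge is that the researcher never has to learn a metadata standard: the repository template emits the markup, and an ordinary search engine does the finding.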
20. Research Object Nudge? Packaging
Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
Platform independent standards-based metadata framework for
bundling resources with context into citable reproducible packages
capsules
Archive
Platform specific
solutions
Linked data –
machine
processable
and at web
scale
21. Specific Nudge? Packaging over Distributed & Diverse
content, Honouring Legacy systems
Integrated view spanning over
fragmented resources using PIDs and
metadata
Linked data – machine
processable and at web scale
unbounded, self-describing, extensible
reference real and digital entities of any kind
22. Knowledge Graph in and out the box…
In a Research Object
Across Research Objects
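One way to picture the graph "in and out the box" is sketched below. It assumes each Research Object carries a JSON-LD style `@graph` list and is published at a base URL; the base URLs, the placeholder ORCID and the helper names `build_graph` and `shared_entities` are all hypothetical. Local identifiers are resolved against the object's base URL, so only globally identified entities (PIDs) join up across objects.

```python
from urllib.parse import urljoin


def build_graph(research_objects):
    """Merge the @graph lists of several research objects into one index keyed by absolute @id."""
    graph = {}
    for base_url, metadata in research_objects.items():
        for entity in metadata["@graph"]:
            absolute_id = urljoin(base_url, entity["@id"])
            node = graph.setdefault(absolute_id, {"properties": {}, "seen_in": set()})
            node["properties"].update({k: v for k, v in entity.items() if k != "@id"})
            node["seen_in"].add(base_url)
    return graph


def shared_entities(graph):
    """Identifiers that appear in more than one research object: the links 'out of the box'."""
    return sorted(i for i, node in graph.items() if len(node["seen_in"]) > 1)


# Two hypothetical research objects that share one author, identified by a placeholder ORCID.
author = {"@id": "https://orcid.org/0000-0000-0000-0000", "@type": "Person", "name": "A. Researcher"}
research_objects = {
    "https://example.org/ro-a/": {"@graph": [{"@id": "./", "@type": "Dataset", "name": "RO A"}, author]},
    "https://example.org/ro-b/": {"@graph": [{"@id": "./", "@type": "Dataset", "name": "RO B"}, author]},
}

graph = build_graph(research_objects)
print(shared_entities(graph))
```

This is a simplification: references nested inside properties are not resolved, and a real deployment would more likely load the metadata into an RDF store.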
23. Who do we have to
nudge? Developers!
For it is they who will adapt
repositories, fiddle with
journal infra and build the user
applications!
Killer Nudge? Developer Friendly Packaging
http://www.researchobject.org/ro-crate/
Structured archive to aggregate files and any
URI-addressable content, with contextual
information to aid decisions about re-use.
Soiland-Reyes, S et al. ‘Packaging Research Artefacts with RO-Crate’. 1 Jan. 2022 : 1 – 42.
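To give a feel for how little a developer has to write, here is a minimal sketch of an RO-Crate metadata document built with nothing but the Python standard library. It follows the general shape of RO-Crate 1.1 (a metadata descriptor, a root dataset and the files it aggregates); the crate name, file paths and the helper `minimal_ro_crate` are hypothetical, and a real crate would normally carry richer context such as authors and provenance.

```python
import json


def minimal_ro_crate(name, description, date_published, license_url, files):
    """Build the content of ro-crate-metadata.json for a folder of files.

    `files` is a list of (relative_path, human_readable_name) pairs.
    """
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": path} for path, _ in files],
    }
    file_entities = [{"@id": path, "@type": "File", "name": title} for path, title in files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }


# Hypothetical crate contents, for illustration only.
crate = minimal_ro_crate(
    name="Example analysis crate",
    description="Results and the script that produced them.",
    date_published="2022-07-21",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=[("data/results.csv", "Result table"), ("scripts/analysis.py", "Analysis script")],
)
print(json.dumps(crate, indent=2))
```

Saving this output as `ro-crate-metadata.json` next to the listed files is what turns an ordinary folder into a crate; that is the "just enough" part of the nudge.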
24. Developer Friendly
just enough just in time underware
Practical, lightweight robust
Web native, off-the-shelf, Lo-Tek
• Machine and human readable
• Search engine and developer friendly
• JSON-LD and Schema.org
Limited flexibility, Fewer features,
documentation, examples, libraries,
tools, community
Domain diversity and legacy
• Duck typing self-describing profiles,
extensible with additional metadata and
pre-existing ontologies
Peter Sefton
Semantic Web world vs Real World
Open Repositories & Dig Lib Community
• Simpler opinionated guide to best practices
25. An underware nudge goes a long way…
Describe boxes of entities…
• Exchange between repositories
• Exchange between services
• Transfer collections of secure
distributed datasets
• Describe and archive datasets
• Citation aggregation and tracking
• Reproducibility
• Provenance collection
• FAIR Digital Objects …. NIH Commons.
Soiland-Reyes, Sefton, et al (2022): Creating lightweight FAIR Digital Objects with RO-
Crate. Research Ideas and Outcomes, 1st Intl Conf on FAIR Digital Objects (submitted)
EOSC4Cancer
EuroScienceGateway
FAIR-IMPACT
26. Digital Twinning …
https://biodt.eu/
predict biodiversity
dynamics
… coupled with Methods
Hardisty et al. The Specimen Data Refinery: A Canonical Workflow Framework and FAIR Digital Object Approach to Speeding up Digital
Mobilisation of Natural History Collections. Data Intelligence 2022; 4 (2): 320–341. doi: https://doi.org/10.1162/dint_a_00134
30. Nudge needed? Reuse, Repurpose, Recycle
Computational know-how Compositional
Collaborative efforts
Variant development
Method research objects
Skilled work
Validation and verification
Optimisation
Maintenance
Community
Design for reuse
FAIR units
Multi-platform portability
Multiplier effects
Developers: Users Ratio
31. Sharing Workflow building blocks on WorkflowHub
BioExcel Building Blocks, a software library for interoperable biomolecular simulation workflows.
Pau Andrio, Adam Hospital, Javier Conejero, Luis Jordá, Marc Del Pino, Laia Codo, Stian Soiland-Reyes, Carole Goble, Daniele
Lezzi, Rosa M. Badia, Modesto Orozco & Josep Ll. Gelpi. Nature Scientific Data, 09/2019, Volume 6, Issue 1, p.169, (2019)
https://workflowhub.eu/programmes/2
Registration?
Couple with Github
repos
Offer access control
Metadata?
Mine workflows
Onboard WfMS
Sharing?
DOIs, credit tracking
Recognition? Policy
Promotion committees
Funder impact reviews
Journal mandates
32. Software-Intensive Bioscience
Better Software Better Research https://www.software.ac.uk/
56% UK researchers develop their own
research software or scripts
73% UK researchers have had no formal
software engineering training
**Survey 15 UK universities, 2014, 406 respondents
https://www.slideshare.net/carolegoble/better-software-better-research
92% UK researchers use software
33. RSECon2019
Nudge? A Name and a Mobilisation
2013
https://society-rse.org/
Professionalisation of Research Software
34. Nudge? Professionalisation
30 UK RSE groups
Research Software Engineers
SocRSE 610+ members
University of Manchester Research IT, 25 RSEs
6th RSE Conference (2022) 350 RSEs +
remote, from 13 different countries
https://society-rse.org/
35. Goldacre, B & Morley, J. (2022). Better, Broader, Safer: Using
health data for research and analysis. A review commissioned
by the Secretary of State for Health and Social Care.
Department of Health and Social Care.
Recommendation 9.
Recognise software development as a central
feature of all good work with data.
UKRI/ NIHR should provide open,
competitive, high status, standalone funding
for software projects and developers working
on health data.
Universities should embrace Research
Software Engineering (RSE) as an
intellectually and academically creative
collaborative discipline, especially in health,
with realistic salaries and recognition.
36. Data Intensive Bioscience -> RSE intensive Teams
Knowledge Turning through People
Crusoe MR et al, Methods included: standardizing computational reuse and
portability with the Common Workflow Language CACM 65(6): 54–63 (2022)
women 7
men 18
(39%)
37. Women in Informatics / Engineering
eScience Lab 7/18 39%
Computer Science faculty 18/79 23%
RSE fellowship awards 3/19 15%
Elected Fellows 122/1607 7.5%
Head of Node 4/23 17%
Deputy Head of Node 5/23 30%
38. Then ? … 1979
• Programmed ICL 2900 Series Mainframes at an
all-girls school.
• 1st intake software based computer science degree
in Manchester.
• Appointed to faculty as teaching staff @24yrs; full
professor @39yrs
• UK Women in Computing & schoolgirl workshops
late 1980s-early 1990s
• Spoke at 1st Grace Hopper Conference, 1994
Professors 0/4
Other Faculty 3/28
Tech support staff 0/31
Computer Operators 3/3
Operating Clerks 8/8
https://ghc.anitab.org/
39. And Now ? … 2017
European Open Science Cloud Symposium 2017
Man-panel of Science Drivers
41. EDI policies, practice, organisations ….
• EDI is not a women’s issue
• ISWC 2018 case study
Nudges?
• Get a mentor
• Join a support network
• Ask for help
• Gather people who are smart
• When asked to recommend/serve –
think/check. Institutional Community
Peers
Figure concept: Neylon, Knowledge Exchange Report: http://www.knowledge-exchange.info/event/ke-approach-open-scholarship
42. Most research done by small groups using bash
scripts, files stores and spreadsheets in modest,
stressful conditions in institutions that haven’t
caught up with team-based multi-platform research.
Institutional Community
Researchers
So, Data-intensive bioscience …
…. software intensive science
…. enabled by RSEs and
…. research objects
43. Universal panacea tech infrastructure?
Standards
Standards
Standards
Translational
Computer Science
Socio-Technical
Design
Often Low Tech
maybe not technical
44. Acknowledgements
WorkflowHubClub
ELIXIR-UK ELIXIR HDR-UK
RO-Crate
BioExcel
SSI
SocRSE
eScience Lab
Shoaib Sufi, Stian Soiland-Reyes, Stuart Owen, Nick Juty, Alan Williams, Aleks
Nenadic, Anita Banerji, Chris Child, Finn Bacall, Munazah Andrabi, Paul Brack,
Rachael Ainsworth, Doug Lowe, Gerard Capes, Oliver Woolland, Aitor Apaolza,
Ebtisam Alharbi, Meznah Aloqalaa, Yo Yehudi, Andrew Stewart
ELIXIR-Converge has received funding from the European Union’s
Horizon 2020 Research and Innovation programme under grant
agreement No 871075.
EOSC-Life has received funding from the European Union’s Horizon
2020 Research and Innovation programme under grant agreement
No 824087.
Barcelona
Pau Andrio, Adam Hospital, Javier
Conejero, Luis Jordá, Marc Del Pino, Laia
Codo, Daniele Lezzi, Rosa M. Badia,
Modesto Orozco, Josep Ll. Gelpi, José Mª
Fernández, Laura Rodriguez-Navas,
Miguel Vazquez, Mercè Crosas, Salvador
Capella-Gutierrez, Laura Portell Silva
Madrid
Daniel Garijo, Oscar Corcho, Mark
Wilkinson