Love, Money, Fame, Nudge:
Enabling Data-intensive BioScience through
Digital Infrastructure Nudging
Professor Carole Goble CBE FREng FBCS CITP
eScience Lab, Dept of Computer Science, The University of Manchester, UK
Joint Head of Node ELIXIR-UK, Digital lead EU-IBISBA
Software Sustainability Institute UK
carole.goble@manchester.ac.uk
https://esciencelab.org.uk/
BSC Research Seminar/BSC Life Session/Bioinfo4Women seminar, 21st July 2022
“Bioscience has emerged as a data-rich
discipline, in a transformation that is
spreading as widely now as molecular
biology in the twentieth century.
We look forward to supporting new
research careers, where data are valued
and shared widely, where new software is
a natural part of Biology, and where
re-analysis and modelling are as creative as
experimentation in understanding the
rules of life and their applications.”
Prof Andrew Millar FRS FRSE
chair Expert Group UKRI-BBSRC Review of data-intensive
bioscience 2020.
Data Intensive - Knowledge Turning
• Increase Flow of Information
• Scattered resources
• Diverse platforms
• Multiple Teams
• Data sovereignties
• Coordination
• Collaboration [original figure: Josh Sommer]
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013, ISBN: 978-3-642-37186-8
Open Research, Sharing (intra and inter Teams)
Nature 602, 558-559 (2022)
doi: https://doi.org/10.1038/d41586-022-00402-1
Publications open access
Datasets ‘open as possible, closed
as necessary’
Beyond Data management
• Digital: Software, algorithms, protocols,
workflows, models
• Physical: reagents, antibodies, hardware
Software is a first-class research
output.
Open not always feasible…
Wilkinson, et al. The FAIR Guiding Principles for scientific data management and
stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
Source: https://www.technologynetworks.com/informatics/articles/repeatability-vs-reproducibility-317157
Open is not always enough…
Personally
Productive
Team
Science
The Fundamental Characteristics of a Translational Scientist
C. Taylor Gilliland, Julia White, Barry Gee, Rosan Kreeftmeijer-Vegter, Florence Bietrix, Anton E. Ussi, Marian Hajduch, Petr Kocis, Nobuyoshi Chiba, Ryutaro Hirasawa, Makoto
Suematsu, Justin Bryans, Stuart Newman, Matthew D. Hall, and Christopher P. Austin. ACS Pharmacology & Translational Science 2019 2 (3), 213-216 DOI: 10.1021/acsptsci.9b00022
Computer
Science
Software
Engineering
Life Science
informatics
++
Digital Infra,
Scholarly
Comms
FAIR & Open
Science
People
Wrangling
Team Science
Translational
Computer
Science
Uber collaborator
Resource DevOps
D. Abramson and M. Parashar, "Translational Research in Computer Science" in Computer, vol. 52, no. 09, pp. 16-23, 2019. doi: 10.1109/MC.2019.2925650
Standards
Standards
Standards
Two+ decades of preaching “revolution”
for FAIR and Open Research …
• Releasing Research Objects
• FAIR Methods Commons
• Better Software Better Research
Nudging.
What if we changed science scholarship?
Or as a PDF +
supplementary
materials
Research
outcomes
more than
just
publications
and data
Publish each object in its own metadata
& repository
FAIR Digital Objects
Connected knowledge
Reproducible results
What if we released research objects?
All the related research objects needed to reuse & reproduce
results as an object that is a new currency of scholarship?
Living objects?
released research rather
than “published” it?
A new way of exchanging, archiving, reporting, citing research outcomes
A new way of actionable knowledge units
Metadata objects
Self-described context,
dependencies and
relationships between the
objects?
Virtual objects referencing
scattered resources
Scaled up and working
across all platforms?
Moving knowledge
between different teams.
Objects are
Actionable knowledge units
Digital twins??
Do Research
with objects
Plan and Assemble
Methods, Materials
Objects
Analyse
Results
Quality
Assessment
Track and Credit
Disseminate
Deposit &
Licence
Publish Research
objects
Share
Results
Manage
Results
Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015
Experiment
Observe
Simulate
Describe and release the data, software, workflows, research
as it is being created, updated and used
Treat ALL Products and ALL Research Like we treat Open Source Software
Mesirov, J. Accessible Reproducible Research. Science 327(5964), 415-416 (2010)
Concept in different forms
FAIR Digital Objects for Science: From Data Pieces to Actionable
Knowledge Units: https://doi.org/10.3390/publications8020021
European Open Science Cloud EOSC Interoperability Framework
https://op.europa.eu/en/publication-detail/-/publication/d787ea54-6a87-11eb-aeb5-01aa75ed71a1/language-en/format-PDF/source-190308283
“Turning the Internet into
a meaningful data space”
Concept in different forms
https://mobilizecbk.med.umich.edu/
https://nanopub.org
https://jupyter.org/
https://elifesciences.org/collections/d72819a9/executable-research-articles
Learning Health Systems Executable Notebooks
Nanopublications
Executable Papers
Standards
Standards
Standards
Knowledge Graphs & Linked Data
Object-oriented systems at web scale
Privacy preserving
Cloud, Fog and Edge Computing
Standardised Web
BIG DATA ANALYTICS
Federated and Privacy
Preserving Analytics
Distributed
Computing
Translational
Computer Science
Legacy processes
Embed, Scale, Sustain
Stakeholders?
Adoption
Migration
Opportunity Costs and Blockers
Reward system of science!
Infrastructure
and Tools
Text mining
data & software credit
AI/Machine learning/MLOps
pipelines, credit, auto-magic
documentation, ontology mapping,
provenance tracking, object trajectories,
object maintenance
blockchain
the Cameron Neylon
incentive equation
\[ \text{Incentive} = \frac{\text{Interest}}{\text{Friction}} \times \text{Number of people benefitting} \]
Cameron Neylon, BOSC 2013, http://cameronneylon.net/
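As a rough sketch of the incentive equation above, the following Python treats interest, friction and the number of people benefitting as plain numbers. The function name, the numeric scales and the example values are illustrative assumptions, not taken from Neylon's talk.

```python
def incentive(interest: float, friction: float, people_benefitting: int) -> float:
    """Incentive = (Interest / Friction) x Number of people benefitting."""
    if friction <= 0:
        raise ValueError("friction must be positive")
    return (interest / friction) * people_benefitting


# Hypothetical numbers: with interest and audience held fixed,
# halving the friction doubles the incentive.
before = incentive(interest=2.0, friction=4.0, people_benefitting=10)
after = incentive(interest=2.0, friction=2.0, people_benefitting=10)
print(before, after)
```

Read this way, a nudge is mostly about the denominator: it leaves interest alone and lowers friction.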
Incentives to change behaviour
incite interest and reduce friction
Side effects
Stealth
Tweaks
Ramps
Socio-technical Manipulations
Choice Architecture
Libertarian Paternalism
Richard H. Thaler, Cass R. Sunstein, 2008
Nudge examples
Make your data objects findable?
Sneak in schema.org and use a search engine!
Get data collection well annotated
with metadata ?
Smuggle ontologies into Excel!
Want to get folks to open up their
datasets? Give them a (sort of) choice…
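The "sneak in schema.org" nudge can be sketched as follows: describe a dataset with schema.org terms in JSON-LD and embed it in the dataset's landing page so that search engines can pick it up. Every value here (name, identifier, creator, the example.org URL) is a placeholder invented for illustration, and `as_script_tag` is a hypothetical helper.

```python
import json

# All values below are placeholders invented for illustration.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example metagenomics sample collection",
    "description": "Hypothetical dataset used to illustrate schema.org markup.",
    "identifier": "https://example.org/datasets/123",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["metagenomics", "example"],
    "creator": {"@type": "Person", "name": "A. Researcher"},
}


def as_script_tag(metadata: dict) -> str:
    """Wrap JSON-LD in the script element used to embed it in an HTML page."""
    body = json.dumps(metadata, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = as_script_tag(dataset)
print(snippet)
```

The researcher only fills in a handful of familiar fields. Findability comes as a side effect of the markup.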
Research Object Nudge? Packaging
Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
Platform independent standards-based metadata framework for
bundling resources with context into citable reproducible packages
capsules
Archive
Platform specific
solutions
Linked data –
machine
processable
and at web
scale
Specific Nudge? Packaging over Distributed & Diverse
content, Honouring Legacy systems
Integrated view spanning over
fragmented resources using PIDs and
metadata
Linked data – machine
processable and at web scale
unbounded, self-describing, extensible
reference real and digital entities of any kind
Knowledge Graph in and out the box…
In a Research Object
Across Research Objects
Who do we have to
nudge? Developers!
For it is they who will adapt
repositories, fiddle with
journal infra and build the user
applications!
Killer Nudge? Developer Friendly Packaging
http://www.researchobject.org/ro-crate/
Structured archive to aggregate files and any
URI-addressable content, with contextual
information to aid decisions about re-use.
Soiland-Reyes, S et al. ‘Packaging Research Artefacts with RO-Crate’. 1 Jan. 2022 : 1 – 42.
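To make the packaging idea concrete, here is a minimal sketch of a metadata file for such a structured archive, built with only the Python standard library. It assumes the RO-Crate 1.1 layout (a `ro-crate-metadata.json` descriptor pointing at a root `Dataset`). The helper name `minimal_ro_crate`, the file names and the example.org URL are hypothetical.

```python
import json


def minimal_ro_crate(name: str, description: str, parts: list[str]) -> dict:
    """Build an RO-Crate 1.1 style metadata document as a plain dict.

    `parts` may mix local file paths and absolute URIs, reflecting
    "files and any URI-addressable content".
    """
    part_entities = [{"@id": part, "@type": "File"} for part in parts]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Descriptor: says which entity is the root of the crate.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: the contextual information about the package.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "hasPart": [{"@id": e["@id"]} for e in part_entities],
            },
            *part_entities,
        ],
    }


# Hypothetical contents: one local file and one remote resource.
crate = minimal_ro_crate(
    name="Example analysis package",
    description="Illustrative crate bundling a script with remote data.",
    parts=["analysis.py", "https://example.org/remote/data.csv"],
)
print(json.dumps(crate, indent=2))
```

The result is ordinary JSON that a developer can read and write without a semantic web toolchain, while remaining valid JSON-LD.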
Developer Friendly
just enough just in time underware
Practical, lightweight robust
Web native, off-the-shelf, Lo-Tek
• Machine and human readable
• Search engine and developer friendly
• JSON-LD and Schema.org
Limited flexibility, Fewer features,
documentation, examples, libraries,
tools, community
Domain diversity and legacy
• Duck typing self-describing profiles,
extensible with additional metadata and
pre-existing ontologies
Peter Sefton
Semantic Web world vs Real World
Open Repositories & Dig Lib Community
• Simpler opinionated guide to best practices
An underware nudge goes a long way…
Describe boxes of entities…
• Exchange between repositories
• Exchange between services
• Transfer collections of secure
distributed datasets
• Describe and archive datasets
• Citation aggregation and tracking
• Reproducibility
• Provenance collection
• FAIR Digital Objects …. NIH Commons.
Soiland-Reyes, Sefton, et al (2022): Creating lightweight FAIR Digital Objects with RO-Crate.
Research Ideas and Outcomes, 1st Intl Conf on FAIR Digital Objects (submitted)
EOSC4Cancer
EuroScienceGateway
FAIR-IMPACT
Digital Twinning …
https://biodt.eu/
predict biodiversity
dynamics
… coupled with Methods
Hardisty et al The Specimen Data Refinery: A Canonical Workflow Framework and FAIR Digital Object Approach to Speeding up Digital
Mobilisation of Natural History Collections. Data Intelligence 2022; 4 (2): 320–341. doi: https://doi.org/10.1162/dint_a_00134
Data Intensive bioscience =
Software intensive bioscience
Computational R*
CryoEM Image Analysis
Metagenomic Pipelines
[Rob Finn]
[Carlos Oscar Sorzano Sanchez] [Romain Dallet]
High Throughput Sequencing
[Fabrice Allain]
Computational Workflows
Multi-step reproducible processing for
(federated) data analytics, data
processing pipelines, simulation
sweeps
Nudge – 300+ workflow systems,
executable notebook systems,
scripting platforms
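As a toy illustration of "multi-step reproducible processing", the sketch below runs named steps in order and records a digest of each step's input and output, so that a rerun can be compared against the recorded trace. It is not modelled on any particular workflow system. The step names, the `run_workflow` helper and the input string are all hypothetical.

```python
import hashlib
import json
from typing import Callable

Step = tuple[str, Callable[[str], str]]


def sha256(text: str) -> str:
    """Return the hex SHA-256 digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_workflow(steps: list[Step], data: str) -> tuple[str, list[dict]]:
    """Run steps in order, recording input/output digests for each one."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append(
            {"step": name, "input": sha256(data), "output": sha256(result)}
        )
        data = result
    return data, provenance


# Hypothetical toy steps standing in for real analysis tools.
steps: list[Step] = [
    ("uppercase", str.upper),
    ("remove_gaps", lambda s: s.replace("-", "")),
    ("reverse", lambda s: s[::-1]),
]

result, trace = run_workflow(steps, "ac-gt-ta")
print(result)
print(json.dumps(trace, indent=2))
```

Because each step's output digest is the next step's input digest, the trace doubles as a simple provenance chain.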
Data Intensive bioscience =
Workflow research
Computational R*
CryoEM Image Analysis
Metagenomic Pipelines
[Rob Finn]
[Carlos Oscar Sorzano Sanchez] [Romain Dallet]
High Throughput Sequencing
[Fabrice Allain]
Data Intensive bioscience =
Software intensive Research Objects
Registration
Execution
Benchmarking, Monitoring
Workflow
registry
Nudge needed? Reuse, Repurpose, Recycle
Computational know-how Compositional
Collaborative efforts
Variant development
Method research objects
Skilled work
Validation and verification
Optimisation
Maintenance
Community
Design for reuse
FAIR units
Multi-platform portability
Multiplier effects
Developers: Users Ratio
Sharing Workflow building blocks on WorkflowHub
BioExcel Building Blocks, a software library for interoperable biomolecular simulation workflows.
Pau Andrio, Adam Hospital, Javier Conejero, Luis Jordá, Marc Del Pino, Laia Codo, Stian Soiland-Reyes, Carole Goble, Daniele
Lezzi, Rosa M. Badia, Modesto Orozco & Josep Ll. Gelpi. Nature Scientific Data, 09/2019, Volume 6, Issue 1, p.169, (2019)
https://workflowhub.eu/programmes/2
Registration?
Couple with Github
repos
Offer access control
Metadata?
Mine workflows
Onboard WfMS
Sharing?
DOIs, credit tracking
Recognition? Policy
Promotion committees
Funder impact reviews
Journal mandates
Software-Intensive Bioscience
Better Software Better Research https://www.software.ac.uk/
56% UK researchers develop their own
research software or scripts
73% UK researchers have had no formal
software engineering training
**Survey 15 UK universities, 2014, 406 respondents
https://www.slideshare.net/carolegoble/better-software-better-research
92% UK researchers use software
RSECon2019
Nudge? A Name and a Mobilisation
2013
https://society-rse.org/
Professionalisation of Research Software
Nudge? Professionalisation
30 UK RSE groups
Research Software Engineers
SocRSE 610+ members
University of Manchester Research IT, 25 RSEs
6th RSE Conference (2022) 350 RSEs +
remote, from 13 different countries
https://society-rse.org/
Goldacre, B & Morley, J. (2022). Better, Broader, Safer: Using
health data for research and analysis. A review commissioned
by the Secretary of State for Health and Social Care.
Department of Health and Social Care.
Recommendation 9.
Recognise software development as a central
feature of all good work with data.
UKRI/ NIHR should provide open,
competitive, high status, standalone funding
for software projects and developers working
on health data.
Universities should embrace Research
Software Engineering (RSE) as an
intellectually and academically creative
collaborative discipline, especially in health,
with realistic salaries and recognition.
Data Intensive Bioscience -> RSE intensive Teams
Knowledge Turning through People
Crusoe MR et al, Methods included: standardizing computational reuse and
portability with the Common Workflow Language CACM 65(6): 54–63 (2022)
women 7
men 18
(39%)
Women in Informatics / Engineering
eScience Lab 7/18 39%
Computer Science faculty 18/79 23%
RSE fellowship awards 3/19 15%
Elected Fellows 122/1607 7.5%
Head of Node 4/23 17%
Deputy Head of Node 5/23 30%
Then ? … 1979
• Programmed ICL 2900 Series Mainframes at an all
girls school.
• 1st intake software based computer science degree
in Manchester.
• Appointed to faculty as teaching staff @24yrs; full
professor @39yrs
• UK Women in Computing & schoolgirl workshops
late 1980s-early 1990s
• Spoke at 1st Grace Hopper Conference, 1994
Professors 0/4
Other Faculty 3/28
Tech support staff 0/31
Computer Operators 3/3
Operating Clerks 8/8
https://ghc.anitab.org/
And Now ? … 2017
European Open Science Cloud Symposium 2017
Man-panel of Science Drivers
July 2022
Nice Girl
Syndrome
EDI policies, practice, organisations ….
• EDI is not a women’s issue
• ISWC 2018 case study
Nudges?
• Get a mentor
• Join a support network
• Ask for help
• Gather people who are smart
• When asked to recommend/serve –
think/check. Institutional Community
Peers
Figure concept: Neylon, Knowledge Exchange Report: http://www.knowledge-exchange.info/event/ke-approach-open-scholarship
Most research done by small groups using bash
scripts, files stores and spreadsheets in modest,
stressful conditions in institutions that haven’t
caught up with team-based multi-platform research.
Institutional Community
Researchers
So, Data-intensive bioscience …
…. software intensive science
…. enabled by RSEs and
…. research objects
Universal panacea tech infrastructure?
Standards
Standards
Standards
Translational
Computer Science
Socio-Technical
Design
Often Low Tech
maybe not technical
Acknowledgements
WorkflowHubClub
ELIXIR-UK ELIXIR HDR-UK
RO-Crate
BioExcel
SSI
SocRSE
eScience Lab
Shoaib Sufi, Stian Soiland-Reyes, Stuart Owen, Nick Juty, Alan Williams, Aleks
Nenadic, Anita Banerji, Chris Child, Finn Bacall, Munazah Andrabi, Paul Brack,
Rachael Ainsworth, Doug Lowe, Gerard Capes, Oliver Woolland, Aitor Apaolza,
Ebtisam Alharbi, Meznah Aloqalaa, Yo Yehudi, Andrew Stewart
ELIXIR-Converge has received funding from the European Union’s
Horizon 2020 Research and Innovation programme under grant
agreement No 871075.
EOSC-Life has received funding from the European Union’s Horizon
2020 Research and Innovation programme under grant agreement
No 824087.
Barcelona
Pau Andrio, Adam Hospital, Javier
Conejero, Luis Jordá, Marc Del Pino, Laia
Codo, Daniele Lezzi, Rosa M. Badia,
Modesto Orozco, Josep Ll. Gelpi, José Mª
Fernández, Laura Rodriguez-Navas,
Miguel Vazquez, Mercè Crosas, Salvador
Capella-Gutierrez, Laura Portell Silva
Madrid
Daniel Garijo, Oscar Corcho, Mark
Wilkinson

Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through Digital Infrastructure Nudging

  • 1. Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through Digital Infrastructure Nudging Professor Carole Goble CBE FREng FBCS CITP eScience Lab, Dept of Computer Science, The University of Manchester, UK Joint Head of Node ELIXIR-UK, Digital lead EU-IBISBA Software Sustainability Institute UK carole.goble@manchester.ac.uk https://esciencelab.org.u k/ BSC Research Seminar/BSC Life Session/Bioinfo4Women seminar, 21st July 2022
  • 2. “Bioscience has emerged as a data-rich discipline, in a transformation that is spreading as widely now as molecular biology in the twentieth century. We look forward to supporting new research careers, where data are valued and shared widely, where new software is a natural part of Biology, and where re- analysis and modelling are as creative as experimentation in understanding the rules of life and their applications.” Prof Andrew Millar FRS FRSE chair Expert Group UKRI-BBSRC Review of data-intensive bioscience 2020.
  • 3. Data Intensive - Knowledge Turning • Increase Flow of Information • Scattered resources • Diverse platforms • MultipleTeams • Data sovereignties • Coordination • Collaboration [original figure: Josh Sommer] Goble, De Roure, Bechhofer, Accelerating KnowledgeTurns, I3CK, 2013, isbn: 978-3-642-37186-8
  • 4. Open Research, Sharing (intra and inter Teams) Nature 602, 558-559 (2022) doi: https://doi.org/10.1038/d41586-022-00402-1 Publications open access Datasets ‘open as possible, closed as necessary’ Beyond Data management • Digital: Software, algorithms, protocols, workflows, models • Physical: reagents, antibodies, hardware Software is a first-class research output.
  • 5. Open not always feasible… Wilkinson, et al.The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18 Source: https://www.technologynetworks.com/informatics/articles/repeatability- vs-reproducibility-317157 Open is not always enough…
  • 6. Personally Productive Team Science The Fundamental Characteristics of aTranslational Scientist C.TaylorGilliland, JuliaWhite, BarryGee, Rosan Kreeftmeijer-Vegter, Florence Bietrix,Anton E. Ussi, Marian Hajduch, Petr Kocis, Nobuyoshi Chiba, Ryutaro Hirasawa, Makoto Suematsu, Justin Bryans, Stuart Newman, Matthew D. Hall, and Christopher P. Austin ACS Pharmacology &Translational Science 2019 2 (3), 213-216 DOI: 10.1021/acsptsci.9b00022
  • 7. Computer Science Software Engineering Life Science informatics ++ Digital Infra, Scholarly Comms FAIR & Open Science People Wrangling Team Science Translational Computer Science Uber collaborator Resource DevOps D. Abramson and M. Parashar, "Translational Research in Computer Science" in Computer, vol. 52, no. 09, pp. 16-23, 2019. doi: 10.1109/MC.2019.2925650
  • 8. Computer Science Software Engineering Life Science informatics ++ Digital Infra, Scholarly Comms FAIR & Open Science People Wrangling Team Science Translational Computer Science
  • 9. Computer Science Software Engineering Life Science informatics ++ Digital Infra, Scholarly Comms FAIR & Open Science People Wrangling Team Science Translational Computer Science
  • 10. Standards Standards Standards Two+ decades of preaching “revolution” for FAIR and Open Research … • Releasing ResearchObjects • FAIR Methods Commons • Better Software Better Research Nudging.
  • 11. What if we changed science scholarship? Or as a PDF + supplementary materials Research outcomes more than just publications and data Publish each object in its own metadata & repository FAIR Digital Objects Connected knowledge Reproducible results
  • 12. What if we released research objects? All the related research objects needed to reuse & reproduce results as a object that is a new currency of scholarship? Living objects? released research rather than “published” it? A new way of exchanging, archiving, reporting, citing research outcomes A new way of actionable knowledge units Metadata objects Self-described context, dependencies and relationships between the objects? Virtual objects referencing scattered resources Scaled up and working across all platforms? Moving knowledge between different teams. Objects are Actionable knowledge units Digital twins??
  • 13. Do Research with objects Plan and Assemble Methods, Materials Objects Analyse Results Quality Assessment Track and Credit Disseminate Deposit & Licence Publish Research objects Share Results Manage Results Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015 Experiment Observe Simulate Describe and release the data, software, workflows, research as its being created, updated and used Treat ALL Products and ALL Research Like we treat Open Source Software Mesirov,J. Accessible Reproducible Research Science 327(5964), 415-416 (2010)
  • 14. Concept in different forms FAIR DigitalObjects for Science: From Data Pieces to Actionable Knowledge Units: https://doi.org/10.3390/publications8020021 European Open Science Cloud EOSC Interoperability Framework https://op.europa.eu/en/publication-detail/-/publication/d787ea54-6a87-11eb-aeb5- 01aa75ed71a1/language-en/format-PDF/source-190308283 “Turning the Internet into a meaningful data space”
  • 15. Concept in different forms https://mobilizecbk.med.umich.edu/ https://nanopub.org https://jupyter.org/ https://elifesciences.org/collections/d7281 9a9/executable-research-articles Learning Health Systems Executable Notebooks Nanopublications Executable Papers
  • 16. Standards Standards Standards Knowledge Graphs & Linked Data Object-oriented systems at web scale Privacy preserving Cloud, Fog and Edge Computing Standardised Web BIG DATA ANALYTICS Federated and Privacy Preserving Analytics Distributed Computing Translational Computer Science Legacy processes Embed, Scale, Sustain Stakeholders? Adoption Migration Opportunity Costs and Blockers Reward system of science! Infrastructure andTools Text mining data & software credit AI/Machine learning/MLOps pipelines, credit, auto-magic documentation, ontology mapping, provenance tracking, object trajectories, object maintenance blockchain
  • 17. the Cameron Neylon incentive equation Incentive= Interest Friction x Number of people benefitting Cameron Neylon, BOSC 2013, http://cameronneylon.net/
  • 18. Incentives to change behaviour incite interest and reduce friction Side effects Stealth Tweaks Ramps Socio-technical Manipulations Choice Architecture Libertarian Paternalism Richard H.Thaler,Cass R. Sunstein, 2008
  • 19. Nudge examples Make your data objects findable? Sneak in schema.org and use a search engine! Get data collection well annotated with metadata ? Smuggle ontologies into Excel! Want to get folks to open up their datasets? Give them a (sort of) choice…
  • 20. Research Object Nudge? Packaging Bechhofer et al (2013)Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004 Bechhofer et al (2010) Research Objects:Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/ Platform independent standards-based metadata framework for bundling resources with context into citable reproducible packages capsules Archive Platform specific solutions Linked data – machine processable and at web scale
  • 21. Specific Nudge? Packaging over Distributed & Diverse content, Honouring Legacy systems Integrated view spanning over fragmented resources using PIDs and metadata Linked data – machine processable and at web scale unbounded, self-describing, extensible reference real and digital entities of any kind
  • 22. Knowledge Graph in and out the box… In a Research Object Across Research Objects
  • 23. Who do we have to nudge? Developers! For it is they who will adapt repositories, fiddle with journal infra and build the user applications! Killer Nudge? Developer Friendly Packaging http://www.researchobject.org/ro-crate/ Structured archive to aggregate files and any URI-addressable content, with contextual information to aid decisions about re-use. Soiland-Reyes, S et al. ‘Packaging Research Artefacts with RO-Crate’. 1 Jan. 2022 : 1 – 42.
  • 24. Developer Friendly just enough just in time underware Practical, lightweight robust Web native, off-the-shelf, Lo-Tek • Machine and human readable • Search engine and developer friendly • JSON-LD and Schema.org Limited flexibility, Fewer features, documentation, examples, libraries, tools, community Domain diversity and legacy • Duck typing self-describing profiles, extensible with additional metadata and pre-existing ontologies Peter Sefton Semantic Web world vs Real World Open Repositories & Dig Lib Community • Simpler opinionated guide to best practices
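To make "JSON-LD and Schema.org" concrete, here is a minimal sketch of the metadata file at the heart of an RO-Crate, assuming the RO-Crate 1.1 layout: a metadata descriptor entity that points at a root dataset, which in turn lists its files. The function name, file names and descriptive values are hypothetical; real crates usually carry richer context (authors, provenance, profiles).

```python
import json


def minimal_ro_crate(name: str, description: str, date_published: str,
                     license_url: str, files: list[str]) -> dict:
    """Build the content of ro-crate-metadata.json as a plain dict (a sketch)."""
    graph = [
        {   # Metadata descriptor: says which entity is the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root dataset: the "box" that aggregates everything else.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": f} for f in files],
        },
    ]
    # One entity per aggregated file, described with schema.org-based terms.
    graph += [{"@id": f, "@type": "File"} for f in files]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical crate contents, for illustration only.
crate = minimal_ro_crate(
    name="Example workflow run",
    description="Placeholder crate bundling a workflow and its results.",
    date_published="2022-07-21",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=["workflow.cwl", "results.csv"],
)
print(json.dumps(crate, indent=2))
```

Because the result is ordinary JSON, a developer can produce or consume it with off-the-shelf tooling, which is the "Lo-Tek" point of the slide.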
  • 25. An underware nudge goes a long way… Describe boxes of entities… • Exchange between repositories • Exchange between services • Transfer collections of secure distributed datasets • Describe and archive datasets • Citation aggregation and tracking • Reproducibility • Provenance collection • FAIR Digital Objects …. NIH Commons. Soiland-Reyes, Sefton, et al (2022): Creating lightweight FAIR Digital Objects with RO-Crate. Research Ideas and Outcomes, 1st Intl Conf on FAIR Digital Objects (submitted) EOSC4Cancer EuroScienceGateway FAIR-IMPACT
  • 26. Digital Twinning … https://biodt.eu/ predict biodiversity dynamics … coupled with Methods Hardisty et al The Specimen Data Refinery: A Canonical Workflow Framework and FAIR Digital Object Approach to Speeding up Digital Mobilisation of Natural History Collections. Data Intelligence 2022; 4 (2): 320–341. doi: https://doi.org/10.1162/dint_a_00134
  • 27. Data Intensive bioscience = Software intensive bioscience Computational R* CryoEM Image Analysis Metagenomic Pipelines [Rob Finn] [Carlos Oscar Sorzano Sanchez] [Romain Dallet] High Throughput Sequencing [Fabrice Allain] Computational Workflows Multi-step reproducible processing for (federated) data analytics, data processing pipelines, simulation sweeps Nudge – 300+ workflow systems, executable notebook systems, scripting platforms
  • 28. Data Intensive bioscience = Workflow research Computational R* CryoEM Image Analysis Metagenomic Pipelines [Rob Finn] [Carlos Oscar Sorzano Sanchez] [Romain Dallet] High Throughput Sequencing [Fabrice Allain]
  • 29. Data Intensive bioscience = Software intensive ResearchObjects Registration Execution Benchmarking, Monitoring Workflow registry
  • 30. Nudge needed? Reuse, Repurpose, Recycle Computational know-how Compositional Collaborative efforts Variant development Method research objects Skilled work Validation and verification Optimisation Maintenance Community Design for reuse FAIR units Multi-platform portability Multiplier effects Developers: Users Ratio
  • 31. Sharing Workflow building blocks on WorkflowHub BioExcel Building Blocks, a software library for interoperable biomolecular simulation workflows. Pau Andrio, Adam Hospital, Javier Conejero, Luis Jordá, Marc Del Pino, Laia Codo, Stian Soiland-Reyes, Carole Goble, Daniele Lezzi, Rosa M. Badia, Modesto Orozco & Josep Ll. Gelpi Nature Scientific Data, 09/2019, Volume 6, Issue 1, p.169, (2019) https://workflowhub.eu/programmes/2 Registration? Couple with GitHub repos Offer access control Metadata? Mine workflows Onboard WfMS Sharing? DOIs, credit tracking Recognition? Policy Promotion committees Funder impact reviews Journal mandates
  • 32. Software-Intensive Bioscience Better Software Better Research https://www.software.ac.uk/ 56% UK researchers develop their own research software or scripts 73% UK researchers have had no formal software engineering training **Survey 15 UK universities, 2014, 406 respondents https://www.slideshare.net/carolegoble/better-software-better-research 92% UK researchers use software
  • 33. RSECon2019 Nudge? A Name and a Mobilisation 2013 https://society-rse.org/ Professionalisation of Research Software
  • 34. Nudge? Professionalisation 30 UK RSE groups Research Software Engineers SocRSE 610+ members University of Manchester Research IT, 25 RSEs 6th RSE Conference (2022) 350 RSEs + remote, from 13 different countries https://society-rse.org/
  • 35. Goldacre, B & Morley, J. (2022). Better, Broader, Safer: Using health data for research and analysis. A review commissioned by the Secretary of State for Health and Social Care. Department of Health and Social Care. Recommendation 9. Recognise software development as a central feature of all good work with data. UKRI/ NIHR should provide open, competitive, high status, standalone funding for software projects and developers working on health data. Universities should embrace Research Software Engineering (RSE) as an intellectually and academically creative collaborative discipline, especially in health, with realistic salaries and recognition.
  • 36. Data Intensive Bioscience -> RSE intensive Teams Knowledge Turning through People Crusoe MR et al, Methods included: standardizing computational reuse and portability with the Common Workflow Language CACM 65(6): 54–63 (2022) women 7 men 18 (39%)
  • 37. Women in Informatics / Engineering: eScience Lab 7/18 (39%); Computer Science faculty 18/79 (23%); RSE fellowship awards 3/19 (15%); Elected Fellows 122/1607 (7.5%); Head of Node 4/23 (17%); Deputy Head of Node 5/23 (30%)
  • 38. Then? … 1979 • Programmed ICL 2900 Series Mainframes at an all girls school. • 1st intake software based computer science degree in Manchester. • Appointed to faculty as teaching staff @24yrs; full professor @39yrs • UK Women in Computing & schoolgirl workshops late 1980s-early 1990s • Spoke at 1st Grace Hopper Conference, 1994 Professors 0/4; Other Faculty 3/28; Tech support staff 0/31; Computer Operators 3/3; Operating Clerks 8/8 https://ghc.anitab.org/
  • 39. And Now ? … 2017 European Open Science Cloud Symposium 2017 Man-panel of Science Drivers
  • 41. EDI policies, practice, organisations …. • EDI is not a women’s issue • ISWC 2018 case study Nudges? • Get a mentor • Join a support network • Ask for help • Gather people who are smart • When asked to recommend/serve – think/check. Institutional Community Peers Figure concept: Neylon, Knowledge Exchange Report: http://www.knowledge-exchange.info/event/ke-approach-open-scholarship
  • 42. Most research done by small groups using bash scripts, file stores and spreadsheets in modest, stressful conditions in institutions that haven’t caught up with team-based multi-platform research. Institutional Community Researchers So, Data-intensive bioscience … …. software intensive science …. enabled by RSEs and …. research objects
  • 43. Universal panacea tech infrastructure? Standards Standards Standards Translational Computer Science Socio-Technical Design Often Low Tech maybe not technical
  • 44. Acknowledgements WorkflowHubClub ELIXIR-UK ELIXIR HDR-UK RO-Crate BioExcel SSI SocRSE eScience Lab Shoaib Sufi, Stian Soiland-Reyes, Stuart Owen, Nick Juty, Alan Williams, Aleks Nenadic, Anita Banerji, Chris Child, Finn Bacall, Munazah Andrabi, Paul Brack, Rachael Ainsworth, Doug Lowe, Gerard Capes, Oliver Woolland, Aitor Apaolza, Ebtisam Alharbi, Meznah Aloqalaa, Yo Yehudi, Andrew Stewart ELIXIR-Converge has received funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 871075. EOSC-Life has received funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 824087. Barcelona Pau Andrio, Adam Hospital, Javier Conejero, Luis Jordá, Marc Del Pino, Laia Codo, Daniele Lezzi, Rosa M. Badia, Modesto Orozco, Josep Ll. Gelpi, José Mª Fernández, Laura Rodriguez-Navas, Miguel Vazquez, Mercè Crosas, Salvador Capella-Gutierrez, Laura Portell Silva Madrid Daniel Garijo, Oscar Corcho, Mark Wilkinson