Building the FAIR Research Commons: A Data Driven Society of Scientists

Science is knowledge work. The scientific method and scholarly communication are about facilitating “knowledge turns” – that is, the turning of observation and hypothesis through experimentation, comparison, and analysis into new, pooled knowledge. Turns depend on the FAIR flow and availability of data, methods for automated processing, reproducible results and on a society of scientists coordinating and collaborating. We need to build a new form of Research Commons and I will present my steps towards this.
Presented at the symposium The Future of a Data-Driven Society, Maastricht University, 25 Jan 2018, which accompanied the 42nd Dies Natalis, where I was awarded an honorary doctorate.
Personal video:
https://www.youtube.com/watch?v=k5WN6KDDatU&index=4&list=PLzi-FBaZlOOagma5dCW7WSA5lv22tmNMD
Video of the symposium:
https://www.youtube.com/watch?v=JN9eMMtCHf8&t=19s&index=6&list=PLzi-FBaZlOOagma5dCW7WSA5lv22tmNMD

Published in: Science

  1. 1. Building the FAIR Research Commons: A Data Driven Society of Scientists Professor Carole Goble CBE FREng FBCS The University of Manchester, UK carole.goble@manchester.ac.uk FAIR Research Commons Symposium: The Future of a Data-Driven Society, Maastricht University, 25 Jan 2018
  2. 2. Data-Driven Science Simulations, data exploration, data processing, analytics, text mining, visual analytics, automated inference…. e-Science: enabling Data Driven Science e-Infrastructure: enabling e-Science Distributed computing Data management, Catalogues Virtual Research Environments Metadata & Semantic Web technologies Software Engineering Products and Services Collaboration, Sharing & Publishing Platforms
  3. 3. Open Science Open Data Reproducible Science Personally Productive Science
  4. 4. “The FAIR Guiding Principles for scientific data management and stewardship” Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18 Principles Metadata Identifiers Access policies Technical: Political Social Economic: A Flag, A Meme
  5. 5. The Future of a Data-Driven Society A Society of Scientists Do Data Driven Science Data Driven Scholarship Data contributors, curators, consumers Biodiversity Scientists + Research Infrastructure Techies Project Teams… Of Individuals Collaborating and Competing Simultaneously
  6. 6. Knowledge Turning Increase Flow of Information • Across scattered resources, platform, people • Coordination, collaboration • Cumulative, Dynamic [original figure: Josh Sommer] Cumulative Commons Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K, 2013, isbn: 978-3-642-37186-8
  7. 7. • Distributed, Fragmented, Siloed • No single entry point • Living software, models, data, catalogues, tools … What’s the Commons? Resources • collectively created • owned or shared • between or among a community Governance https://scholarlycommons.org/
  8. 8. Macro, Micro*, pooled • public resources • data centres • journals • dedicated projects • governance • majority of researchers • labs & universities • generators • my resources *Meso too – but too complicated for 20 minutes! See http://www.knowledge-exchange.info/event/ke-approach-open-scholarship
  9. 9. Some Data-driven Predictive Science in Ecological Niche Modelling predatory fish the grazer endemic alga [Obst, Leidenberger]
  10. 10. BioSTIF
  11. 11. Do Research Research Infrastructure Services Assemble Methods, Materials Experiment Observe Simulate Analyse Results Quality Assessment Track and Credit Disseminate Deposit & Licence Marketplace Services Publish Share Results Any research product Selected products Manage Results The Data-Driven Open Science Public + Personal Commons Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015
  12. 12. “The questions don’t change but the answers do” Dan Reed, Microsoft Salami Slicing, Scattering
  13. 13. 101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
  14. 14. Research Infrastructure Services Assemble Methods, Materials Experiment Observe Simulate Analyse Results Quality Assessment Track and Credit Disseminate Deposit & Licence Marketplace Services Share Results Manage Results Building a FAIR Research Commons Portable Automated Reproducible Methods Supporting Collaborations Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano DOI: 10.1045/january2015-assante Mesirov, J. Accessible Reproducible Research Science 327(5964), 415-416 (2010)
  15. 15. Clear steps Transparent Comprehensible Replicable Logged Accessible Provenance Standardised Harmonised Combined Method Materials Variations X N Repeat. Compare. Log & Track Provenance Scale Data-driven Science, Predictive Science is Software-driven, Method-Driven x
  16. 16. Data Science Analytics Machine learning Discovery, New algorithms Data stewardship Standardisation, Harmonisation, Annotation and enrichment, Maintaining access, preserving Software stewardship Updates, versions, porting Prep & Processing Data wrangling & curation Instrument pipelines Simulation sweeps
  17. 17. Method Commodities Workflows ASAP Automate, Scale, Abstract, Provenance Taverna 14th Anniversary
  18. 18. Methods techniques, algorithms, spec. of the steps, models, versions, robustness, statistical power … Materials datasets, parameters, thresholds, versions, algorithm seeds, reference datasets… Instruments tools, codes, services, scripts, underlying libraries, versions, workflows… Laboratory computational environment, High performance access, Operating system… Data Instruments -> Data Scopes Method Objects, fragile, updating …. Maintain for Running Document for Reading
  19. 19. Software is a first class member of Data-driven Science 56% of UK researchers develop their own research software or scripts 73% of UK researchers have had no formal software engineering training Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering representative range of funders, discipline and seniority. Goble, Better Software, Better Research IEEE Internet Computing doi: 10.1109/MIC.2014.88 De Roure, Goble, Software Design for Empowering Scientists IEEE Software doi: 10.1109/MS.2009.22 Research Software Engineers National Capability
  20. 20. 10th Anniversary Workflow Commons Groups Social collaboration, credit and citation around Research Objects Replicate - Reproduce - Remix - Repurpose Reuse – Repurpose – avoid Reinvent
  21. 21. FAIR Workflow Research Object Reproducibility, Portability, Repurpose Repair. Preservation, Executable Publishing Metadata Object metadata, ontologies, identifiers Manifest Provenance Dependencies Versions Checklists Annotations Container System researchobject.org Unbounded Objects
  22. 22. FAIR Methods, Different wflow systems Living Products
  23. 23. Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012 Don’t Publish, Release Analogous to software products and practices rather than data or articles Agile Data-driven Science Treat ALL Products and ALL Research Like Software “evolving manuscript” Sir Mark Walport, Times Higher Education Supplement, 14 May 2015
  24. 24. Context Relationships Credit Research Goods FAIR Exchange Governance Stewardship Credit Tracking Lifecycles Fixivity… Arxiv, my Lab myExperiment GitHub, Web Service myWebSite bioModels.org, openModeller PubMed Spreadsheet in figshare ArrayExpress, BioSamples, PRIDE, GBIF, my Lab, institutional repository Overlaying the Research Commons ecosystem Unbounded Composite Living Rots
  25. 25. Tracking, credit mining, comparison, auto-metadata, blockchain, boundary objects…. 1 3 2 A FAIR Knowledge Web of Research Objects Map across metadata Threaded publications Navigate, Pivot-Focus, Cite Self-describing
  26. 26. Unit for Reproducibility / Productivity, Portability, Preservation, Executable Publishing researchobject.org Bechhofer et al. (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004 Bechhofer et al. (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/ Linked Data / Semantic Web FAIR machine processable metadata Standards-based generic metadata framework Provenance Dependencies Versions Checklists Annotations
  27. 27. The time is right … Reproducible Document Stack project Social Technology Process Purpose Publishers, Research Infrastructures, Communities, Library services, Agencies …. Not Jo Public….
  28. 28. Research Infrastructure Services Assemble Methods, Materials Experiment Observe Simulate Analyse Results Quality Assessment Track and Credit Disseminate Deposit & Licence Marketplace Services Share Results Manage Results Building a FAIR Research Commons Portable Automated Reproducible Methods Supporting collaborations to make & exchange FAIR content
  29. 29. Systems Biology Projects • SME multi-disciplinary groups • Multi-site collaborations • Competing • Experimentalists, dry modellers • Self-deposit, no stewardship skills • Funder driven sharing modellers experimentalists Build a Project Commons!! • Foster stewardship • Stimulate sharing • Ensure retention • Respect global community, local project resources http://fair-dom.org Wolstencroft et al., Nucleic Acids Research, 2016, doi:10.1093/nar/gkw1032.
  30. 30. 3 Studies Model analysis, construction, validation 24 Assays/Analysis Simulations, characterisations 16 19 13 2 1 Structured organisation Retain context in one place, Release FAIR products Use and deposit in the fragmented resources [Penkler, Snoep]
  31. 31. FAIRDOMHub Systems Biology Commons http://fairdomhub.org Distributed Commons, Integrated View “During and within” publishing Simulate Compare Validate 10th Anniversary
  32. 32. What methods are being used to determine enzyme activity? What SOP was used for this sample? Where is the validation data for this model? Is there any group generating kinetic data? Is this data available? Track versions of my model What’s the relationship between the data and model? Which data belong to which publications? Self-controlled spaces • enclaves -> public Discover own assets One entry point • over external systems
  33. 33. Project Pals Post-docs, Postgrads, Data stewards Building the Commons so they Come The Programme Funders Stewardship Support
  34. 34. The Tragedy of the Commons? FAIR Play? Values of assets of reproducibility of metadata economics of infras. priorities Behaviours enclave sharing hoarding, flirting, voyeurism consumer-producer asymmetry playground rules Sweatshop collaborating but competing burden - time, skills short term, shortcuts principal investigators tools & templates seamless join-up automation, stewards reprod. debt is hard The last mile
  35. 35. Self Retention, Access Productivity Quick, Lightweight Simple Short Term Credit Trusted & Free Just Enough Skills? Service Sharing Reproducibility Accurate, Reusable Rich Long Term Credit Sustained Just in Case Stewards Pushing FAIR upstream
  36. 36. “Sloppy Science Wins” John Ioannidis, Stanford School of Medicine Open Science Fair, Athens 2017
  37. 37. Social Technology Process Professional Stewardship Ramps Defeating Cultural Inertia Overcoming The Tragedy of the Commons Paying for it
  38. 38. By Side Effect
  39. 39. By side effect – metadata for FAIR Universal tagging of Life Science datasets, tools, protocols, training materials Web scale knowledge graph Embedded ontologies and metadata templates Metadata harvesting by stealth https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/
  40. 40. Ask what can you and Data Science do for the FAIR Commons?
  41. 41. Building the FAIR Research Commons: A Data Driven Society of Scientists Release FAIR Research Objects Manage Datascopes FAIR play incentives FAIR Research Commons
  42. 42. All the members of the Wf4Ever team Colleagues in Manchester’s Information Management Group, ELIXIR-UK, Bioschemas http://www.researchobject.org http://www.myexperiment.org http://wf4ever.org http://www.fair-dom.org http://www.fairdomhub.org http://seek4science.org http://rightfield.org.uk http://www.bioschemas.org http://www.commonwl.org http://www.bioexcel.eu http://www.openphacts.org https://www.force11.org/ Mark Robinson Alan Williams Jo McEntyre Norman Morrison Stian Soiland-Reyes Paul Groth Tim Clark Alejandra Gonzalez-Beltran Philippe Rocca-Serra Ian Cottam Susanna Sansone Kristian Garza Daniel Garijo Catarina Martins Alasdair Gray Rafael Jimenez Iain Buchan Caroline Jay Michael Crusoe Katy Wolstencroft Barend Mons Sean Bechhofer Matthew Gamble Raul Palma Jun Zhao Josh Sommer Matthias Obst Jacky Snoep David Gavaghan Stuart Owen Finn Bacall Paolo Missier Phil Crouch Oscar Corcho Dan Katz Arfon Smith David De Roure Marco Roos Massimiliano Assante Paolo Manghi
  43. 43. EXTRAS HIDDEN SLIDES
