
Scott Edmunds: Using FAIR principles for more Open & Democratic Science



Scott Edmunds at the Dobzhansky Center, 24th August 2017: Using FAIR principles for more Open & Democratic Science

Published in: Science


  1. Using FAIR principles for more Open & Democratic Science. 'If I have seen further it is by standing on the shoulders of giants'. Scott Edmunds, Dobzhansky Center, 24th August 2017
  2. Scientists: need to convince public + politicians
  3. Scientists: what we are doing instead. Buckheit & Donoho: “Scholarly articles are merely advertisement of scholarship. The actual scholarly artifacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible.”
  4. Scientists: what we are doing instead. Not curing diseases
  5. Scientists: what we are doing instead. Focusing on unscientific, unreproducible metrics. Incentivising short-term citations
  6. JIFBAIT Network: more GWAS, GWAS. JIFBAIT NEWS: “Arsenic Life forms, will they take over the planet?” By Melba Ketchum, PhD. “Which Overhyped, Unreproducible Experiment Are You? Want rapid citations for 2 years only? Carry out this quiz.” “You got: STAP Cells. Of course dipping cells in coffee will make them pluripotent. Even if the research gets discredited, it’ll still get 100s of citations in two years.”
  7. Scientists: what we are doing instead. “Publish or impoverish: An investigation of the monetary reward system of science in China (1999-2016)”
  8. Scientists: what we are doing instead. Attempts to “game the peer-review system on an industrial scale”: companies offering authorship of papers made to order by “paper mills” [1]; common ghostwriting of medical papers by pharma [2]; guaranteed publication in JIF journal, often using fake referees, ID theft, etc.
  9.
  10. A more FAIR approach: Open Science?
  11. What is open science? 5 flavours (Benedikt Fecher and Sascha Friesike)
  12. Democratic:
  13. Biggest Challenge: Closed Access. WWW.RIGHTTORESEARCH.ORG
  14. The Solution: Open Access. “By ‘open access’ to [peer-reviewed research literature], we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.” (Budapest Open Access Initiative) • Maximizes reuse and access • Gives authors control over the integrity of their work and the right to be properly acknowledged and cited • “Real” OA asks for no restrictions/limitations = CC-BY
  15. The Solution: Open Data
  16. New incentives/credit. Credit where credit is overdue: “One option would be to provide researchers who release data to public repositories with a means of accreditation.” “An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.” Nature Biotechnology 27, 579 (2009). • Review • Data • Software • Models • Pipelines • Re-use… = Credit
  17. Pragmatic: Infrastructure:
  18.
  19. Lots of potential workflows…
  20. Rewarding the
  21. Rewarding open data. Launched July 2012. Publishes “Data Notes” for CC0 data. Uses ISA-Tab.
  22. Rewarding open data. APC covers storage in GigaDB
  23. Rewarding open data. Building tools (inc. JBrowse) on top of datasets…
  24. Pre-prints becoming the norm. Don’t be 2nd: claim priority
  25. GigaScience embraces / Now with bioRxiv integration
  26. Credit transparency/open peer review. Publons + AcademicKarma = credit for reviewers’ efforts
  27. Research Objects: a concept & model • Supporting publication of more than just PDFs, making data, code, & other resources first class citizens of scholarship. • Recognizing that there is often a need to publish collections of these resources together as one shareable, cite-able resource. • Enriching these resources and collections with any & all additional information required to make research reusable, & reproducible!
  28. Workflows. Reward Sharing of Workflows
  29. Virtual Machines/containers • Downloadable as virtual hard disk/available as Amazon Machine Image • Now publishing container (Docker) submissions
  30. Reward better handling of “wet” protocols… First journal with deep integration with. Launched 2nd June 2016 • Create, share, modify forkable protocols in repo. • Download & run on smartphone app. • Get discoverability, credit, DOIs for sharing methods. • Create your own, or let us set up & you claim.
  31. New Integration: Code Ocean. Cloud-based executable research platform. Browse, share & run code on AWS. Creates compute capsule: encapsulation of the data, code, and computation environment. Integration into the paper, share via DOIs. First examples just published in GigaScience. Integrated plugin into GigaDB. Share your code this way!
  32. After open: FAIR
  33. A mnemonic to remember: FAIR. Lots of models/standards/guidelines. Where does that leave us?
  34. A mnemonic to remember: FAIR (Findable, Accessible, Interoperable, Reusable)
  35. Beyond a mnemonic: FAIR ecosystems. FAIRifier tool
  36. More FAIR mnemonics: “BYODs”. DTL/ELIXIR-NL “Bring Your Own Data Party” vs. GigaScience/BGI HK Metabolomics ISA-TAB-athon
  37. FAIR Data in the wild. Taking a microscope to the publication process
  38.
  39. How FAIR can we get? Data sets. Analyses. Open-Paper. Open-Review. DOI:10.1186/2047-217X-1-18: >50,000 accesses & 885 citations. Open-Code. 7 reviewers tested data in ftp server & named reports published. DOI:10.5524/100044. Open-Pipelines. Open-Workflows. DOI:10.5524/100038. Open-Data. 78GB CC0 data. Code in sourceforge under GPLv3: >40,000 downloads. Enabled code to be picked apart by bloggers in wiki
  40. Can we reproduce results? SOAPdenovo2 S. aureus pipeline
  41. The SOAPdenovo2 Case study. Subject to and test with 3 models. Types of resources in an RO, with the models to describe each resource type: Data (ISA-TAB/ISA2OWL); Method/Experimental protocol (Wfdesc/ISA-TAB/ISA2OWL); Findings (Nanopublication)
  42. (1) While there are huge improvements to the quality of the resulting assemblies, other than the tables it was not stressed in the text that the speed of SOAPdenovo2 can be slightly slower than SOAPdenovo v1. (2) In the testing and assessment section (page 3), based on the correct results in table 2, where we say the scaffold N50 metric is an order of magnitude longer from SOAPdenovo2 versus SOAPdenovo1, this was actually 45 times longer. (3) Also in the testing and assessment section, based on the correct results in table 2, where we say SOAPdenovo2 produced a contig N50 1.53 times longer than ALL-PATHS, this should be 2.18 times longer. (4) Finally in this section, where we say the correct assembly length produced by SOAPdenovo2 was 3-80 fold longer than SOAPdenovo1, this should be 3-64 fold longer.
  43. Lessons Learned • Most published research findings are false. Or at least have errors • With enough effort it is possible to push button(s) & recreate a result from a paper with current tools • Being FAIR can be COSTLY. How much are you willing to spend? Who will build FAIR infrastructure? • Much easier to make things FAIR before rather than after publication. BYODs useful intermediate here
  44. Public:
  45. Public: Citizen Science. Galaxy Zoo: Zooniverse: >1M “Zooites” and counting
  46. Public: Games with a Purpose
  47. How do we get citizens involved in Genomics? How do we do this in Hong Kong?
  48. Community Genomics: The Inspiration
  49. The Bauhinia Mystery? HK Botanical & Afforestation Dept., 1903: "The mysterious origin of the tree & its magnificent flowers at once arrest the interest. So far, all efforts to identify them with any foreign species have failed"
  50. Courtesy of: Archives des Missions Etrangère de Paris
  51.
  52.
  53. As seen on…
  54. Education: (OER) teaching materials
  55. Education: reproducible research
  56. Education: sharing FAIR data
  57. Education: best practices for metadata
  58. Education: teaching people with the data
  59. Education: teaching people with the data. Student power (MSc @ CUHK)
  60. Education: teaching people with the data. Student power (MSc @ CUHK). Transcriptomes assembled & annotated by students. Looked at GO/KEGG & TCM compounds. Looked at parental links (diversity, maternal/paternal)
  61. TEDxCeption: Inspiration2
  62.
  63. Give us your data, papers & pipelines. Help GigaPanda make it happen! Contact us:
  64. Thanks to: Laurie Goodman, Editor in Chief; Nicole Nogoy, Editor; Hans Zauner, Assistant Editor; Peter Li, Lead Data Manager; Chris Hunter, Lead BioCurator; Xiao (Jesse) Si Zhe, Database Developer; Chen Qi, Shenzhen Office. All of BGI. Follow us: @GigaScience + Weibo & WeChat
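
Slide 42 lists corrections to N50 fold-change figures found while reproducing the SOAPdenovo2 results. As a rough illustration of how such a comparison is computed, the following is a minimal sketch only: the function name `n50` and the contig lengths are hypothetical and are not taken from the SOAPdenovo2 paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths for two assemblies (illustrative only).
assembly_a = [100, 80, 60, 40, 20]
assembly_b = [500, 300, 100, 50, 50]

# Fold change of the kind quoted on slide 42 ("N times longer").
fold_change = n50(assembly_b) / n50(assembly_a)
print(f"N50 A={n50(assembly_a)}, N50 B={n50(assembly_b)}, fold={fold_change}")
```

Recomputing a ratio like this directly from the published tables is the sort of check that surfaced the discrepancies on slide 42.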