
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability of Hong Kong University Research Experiment

Scott Edmunds' talk at CODATA 2019 on Quantifying how FAIR is Hong Kong: The Hong Kong Shareability of Hong Kong University Research Experiment. 19 September 2019, Beijing.


  1. 1. Scott Edmunds, GigaScience/HKU Quantifying how FAIR is Hong Kong: The Hong Kong Shareability of Hong Kong University Research Experiment
  2. 2. The Hong Kong experience. Asia’s Academic City? 8 Universities, many ranked top 50 worldwide 100K students (UG/PG/FT/PT) 1 major research funder (UGC/RGC) UGC Policy: “Realization of making Hong Kong Asia's world city is only possible if it is based upon the platform of a very strong education and higher education sector.” http://www.ugc.edu.hk/eng/ugc/policy/policy.htm
  3. 3. Research Data policies growing globally http://ec.europa.eu/research/openscience/index.cfm?section=monitor&pg=researchdata#1
  4. 4. http://dx.doi.org/10.17477/jcea.2018.17.2.200 …meanwhile in Hong Kong “This ambivalence was reflected by the chairman of the Research Grants Council, who stated in an interview that ‘there is no relationship between world-class research and release of data’, questioning whether anyone might be interested in the completeness of data. The chairman also saw a conflict between competitiveness and openness, arguing that the reputation of a researcher is built on publications, not on the underlying data.”
  5. 5. No policies, Mo’ problems
  6. 6. If Government doesn’t act, Universities need to lead the way http://www.rss.hku.hk/integrity/research-data-records-management
  7. 7. First CRIS in HK, built upon Scholars Hub http://hub.hku.hk/advanced-search?location=crisdataset (CRIS = current research information system)
  8. 8. First CRIS in HK, built upon Scholars Hub http://lib.hku.hk/researchdata/rpg.htm “Beginning with the September 2017 intake, all HKU research postgraduate (rpg) students have responsibility for 1) using a data management plan (DMP), where applicable, to describe the use of data in preparation for, or in the generation of their theses, and 2) depositing, where applicable, a dataset in the HKU Scholars Hub.”
  9. 9. Growing # of OA journals addressing this http://dx.doi.org/10.1371/journal.pmed.1001607
  10. 10. CAN WE QUANTIFY IF THIS IS WORKING?
  11. 11. http://reproducibility.cs.arizona.edu/ Arizona Repeatability in Computer Science Experiment • 2015 study examining the extent to which Computer Systems researchers share their research artifacts (code) • NSF policies on sharing code since 2005 • Examined 613 papers from ACM conferences & journals • Attempted to locate source code that backed up results • If found, tried to build the code.
  12. 12. http://reproducibility.cs.arizona.edu/ Arizona Repeatability in Computer Science Experiment • Manual curation/look for code that backed up results • If missing, emailed authors • Chased if no reply • If found, tried to build the code • Resolve issues • Survey results
  13. 13. http://reproducibility.cs.arizona.edu/ 613 papers tested 123 successful Reproductions (20%) Arizona Repeatability in Computer Science Experiment
  14. 14. Can we do something similar in HK? Teaching HKU MLIM students a module on data curation and management.
  15. 15. HKU Repeatability in HK Research Experiment • HKU policy on data sharing from 2015 • PLOS policy mandating sharing of supporting data since March 1, 2014 • HKU has published ≈400 PLOS ONE papers 2014 to date • Can we quantify reproducibility in a sample of these? • Compare with other less stringent journals (e.g. Springer Nature data policy ranked journals1) • Can we follow Arizona and harness crowdsourced (student) power? 1. https://www.springernature.com/gp/authors/research-data-policy/data-policy-types/12327096
  16. 16. HKU Repeatability in HK Research Experiment • Easy exercise in literature curation for HKU MLIM students • Set as a project for 59 students, 2017-2019 http://hub.hku.hk/simple-search?query=&location=publication&sort_by=score&order=desc&rpp=25&filter_field_1=journal&filter_type_1=equals&filter_value_1=plos+one&etal=0&filtername=dateIssued&filterquery=[2014+TO+2019]&filtertype=equals
  17. 17. https://scholarlykitchen.sspnet.org/2018/01/10/future-oa-megajournal/ NPG (Scientific Reports) copies the PLOS One model… Another question: Rise (and fall) of megajournals
  18. 18. HKU Repeatability in HK Research Experiment https://scholarlykitchen.sspnet.org/2016/01/06/plos-one-shrinks-by-11-percent/ Rise (and fall) of megajournals Driven by impact factor or “easier” data policies? “ Because data requirements are not uniform across all journals, PLOS has put itself at a disadvantage as far as attracting authors because other journals offer an easier path. If strictly enforced, this new policy is likely to result in a drop in submissions to PLOS journals. While no other mega-journal has been able to shake PLOS ONE’s hold on the market, this policy may provide an opening for competitors to gain on PLOS ONE and even overtake it.” Can we quantify this?
  19. 19. HKU Repeatability in HK Research Experiment • Students assigned 2 PLOS + 2 SciRep papers (268 total) • Quickly scan the paper looking for supporting data • If no data, go to the next paper • If it uses data, is it all associated with the paper? • If external data, is it available from a URL or accession? • If “data available on request”, are the authors contactable? • Spend up to about 10 minutes per article • Add data into a shared Google Doc; the teacher double-checks & marks students on accuracy Homework/Case study: literature curation exercise
  20. 20. HKU Repeatability in HK Research Experiment Alternative: webscraping option (code in GitHub)… https://github.com/jessesiu/hku_scholars_hub
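The project's actual scraping code lives in the GitHub repository linked above; the snippet below is only a minimal, independent sketch of the same idea, reusing the Scholars Hub simple-search parameters shown on slide 16. The CSS selector and the "start" paging parameter are assumptions about the site's markup, not the repository's implementation.

```python
# Minimal sketch (not the project's actual code): list PLOS ONE records
# indexed in the HKU Scholars Hub by scraping its simple-search pages.
# The endpoint and query parameters mirror the search URL on slide 16;
# the selector and the "start" offset are assumptions about the markup.
import requests
from bs4 import BeautifulSoup

BASE = "http://hub.hku.hk/simple-search"
PARAMS = {
    "query": "",
    "location": "publication",
    "filter_field_1": "journal",
    "filter_type_1": "equals",
    "filter_value_1": "plos one",
    "filtername": "dateIssued",
    "filterquery": "[2014 TO 2019]",
    "filtertype": "equals",
    "sort_by": "score",
    "order": "desc",
    "rpp": 25,  # results per page, as in the slide's URL
}

def fetch_page(start=0):
    """Fetch one page of search results as (title, handle URL) pairs."""
    resp = requests.get(BASE, params={**PARAMS, "start": start}, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Assumption: each hit links to a record page under /handle/10722/...
    return [(a.get_text(strip=True), a["href"])
            for a in soup.select('a[href*="/handle/10722/"]')]

if __name__ == "__main__":
    for title, handle in fetch_page():
        print(handle, title)
```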
  21. 21. HKU Repeatability in HK Research Experiment See protocols in protocols.io: http://dx.doi.org/10.17504/protocols.io.6x7hfrn Teachers’ protocol: http://dx.doi.org/10.17504/protocols.io.6x8hfrw Students’ protocol: http://dx.doi.org/10.17504/protocols.io.6yahfse
  22. 22. HKU Repeatability in HK Research Experiment Example http://hub.hku.hk/handle/10722/223364
  23. 23. HKU Repeatability in HK Research Experiment Is there data presented in the paper? – Yes Is there external data, and if so what is the link/accession? – No Is all the data in the paper available? – No Comments - Has questionnaire, but not data as says "minimal anonymized dataset will be made available upon request” Example
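As a rough illustration of how each student's answers could be captured before being collated in the shared spreadsheet, here is a minimal sketch: the field names mirror the questions on this slide, while the filename and helper function are hypothetical and not part of the actual exercise.

```python
# Sketch of a per-paper curation record for the student exercise.
# Field names follow the questions on this slide; the filename and
# helper function are illustrative only.
import csv
import os
from dataclasses import dataclass, asdict, fields

@dataclass
class CurationRecord:
    handle: str               # HKU Scholars Hub handle for the paper
    data_in_paper: str        # "Yes" / "No"
    external_data_link: str   # URL or accession number, or "No"
    all_data_available: str   # "Yes" / "No"
    comments: str             # free-text notes

def append_record(path: str, record: CurationRecord) -> None:
    """Append one record to a CSV file, writing the header if the file is new."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(CurationRecord)])
        if write_header:
            writer.writeheader()
        writer.writerow(asdict(record))

# The example record shown on this slide (paper from slide 22):
append_record("curation_records.csv", CurationRecord(
    handle="http://hub.hku.hk/handle/10722/223364",
    data_in_paper="Yes",
    external_data_link="No",
    all_data_available="No",
    comments='Has questionnaire; "minimal anonymized dataset will be made available upon request"',
))
```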
  24. 24. HKU Repeatability in HK Research Experiment If data “available on request”, do the authors respond if contacted? Example
  25. 25. Interesting examples http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0165978 Several examples of missing Infectious Disease data
  26. 26. Interesting examples Several examples of missing Infectious Disease data http://www.vox.com/2015/6/17/8796225/mers-virus-data-sharing http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966
  27. 27. Results
  28. 28. Data accessibility (flow diagram): 148 papers examined → 114 with data, 27 “data available on request”, 7 data missing; email follow-up: 7 responded, 5 bounced, 17 no response → 121 papers with accessible data (82%)
  29. 29. Data accessibility (flow diagram): 120 papers examined → 79 with data, 16 “data available on request”, 25 data missing; email follow-up: 8 responded, 8 no response → 87 papers with accessible data (72.5%)
  30. 30. External Data Sources • Growing number of papers hosting data via general-purpose open-access repositories: – figshare (12), Dryad (5), OSF (4), Zenodo (2), Dataverse (2), PANGAEA (2), DANS (1) – Since 2016 figshare use has been dropping & OSF/Zenodo increasing – Large numbers of government, IR & institutional websites – Other than one broken Dryad link, OA data repositories much more stable than other URLs (many broken) https://figshare.com/projects/HKU_Repeatability_in_HK_Research_Experiment/64118
  31. 31. Lessons Learned
  32. 32. Do not rely on handles Instability of older HKU Scholars Hub Identifiers & data • Going back to older papers (collected in early 2017), 3/49 (6%) handles have changed • Checking back over time, the number of 2016/2017/2018 PLOS/SR papers listed keeps increasing (we have had to update our results)
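One way to catch this kind of identifier drift is to periodically re-resolve every recorded handle or data URL and flag anything that breaks or lands somewhere new. The sketch below is hypothetical (not part of the project's workflow) and only uses standard HTTP requests.

```python
# Hypothetical link/handle checker (not the project's workflow):
# re-resolve previously recorded handles and data URLs and flag
# anything that no longer resolves or now redirects elsewhere.
import requests

def check_url(url: str) -> dict:
    """Follow redirects and report the final status and landing URL."""
    try:
        resp = requests.get(url, allow_redirects=True, timeout=30)
        return {"url": url, "status": resp.status_code, "resolved_to": resp.url}
    except requests.RequestException as exc:
        return {"url": url, "status": None, "error": str(exc)}

if __name__ == "__main__":
    recorded = [
        "http://hub.hku.hk/handle/10722/223364",  # example handle from slide 22
    ]
    for result in map(check_url, recorded):
        flag = "OK    " if result.get("status") == 200 else "BROKEN"
        print(flag, result["url"], "->", result.get("resolved_to") or result.get("error"))
```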
  33. 33. Do not rely on “data available from our website” http://bioinformatics.oxfordjournals.org/content/24/11/1381.long
  34. 34. Do not rely on “data available on request” https://doi.org/10.1101/633255
  35. 35. Do not rely on “data available from the government” HK Hospital Authority only shares data with researchers at UGC-funded universities in Hong Kong, with data access charges averaging HKD 35,700 per request1 1. https://www.accessinfo.hk/en/request/request_for_statistics_on_data_c 2. https://www.nature.com/articles/s41598-017-15579-z Emailing the authors for the data: “Thanks for your interest. I'm afraid we can't as the data came from our hospital authority which is highly strict in using of their data and would not allow us to use the data other the purposed we stated before.” So why say it was available upon request?
  36. 36. Do not rely on GitHub (or Google) https://dev.to/mjraadi/if-you-don-t-know-now-you-know-github-is-restricting-access-for-users-from-iran-and-a-few-other-embargoed-countries-5ga9
  37. 37. Lessons Learned: never trust “data on request” • “Data Available on Request” does not work (65% of requests failed after 2 attempts). • Hong Kong Government (esp. Hospital Authority) data access policies are incompatible with international journal policies • Email addresses not checked by journals: 5 bounced (one wasn’t even in a correct format); 1 example gave a postal address only. • Data Access Committee system not working: none of the DACs of the listed Consortia/Cohort projects responded to emails (Children of 1997, Guangzhou Biobank Cohort Study, JAGES, and China Research Center on Aging DACs). • Even if authors respond there are often problems: • T&Cs, e.g. MTAs or co-authorship requirements, or only a sample of the processed data (not the raw data) could be shared because publications were still being written. • Data missing, e.g. the raw sequencing data had been deleted. https://figshare.com/projects/HKU_Repeatability_in_HK_Research_Experiment/64118
  38. 38. Lessons Learned: problems with Scholars Hub • Unstable identifiers – 6% (3/49) examples changed in 2 years • Unstable indexing – numbers of historic publications keep increasing (self-reporting by authors?) • Unstable source of datasets: one example of data in a thesis that was blocked for a period • Inconsistent indexing/metadata – one example lacked a link/DOI to the paper, inconsistent keywords & tagging • Inconsistent authorship – multiple, unused ORCID IDs registered by HKU https://figshare.com/projects/HKU_Repeatability_in_HK_Research_Experiment/64118
  39. 39. Importance of FAIR snapshots Why GigaScience set up http://gigadb.org/
  40. 40. Importance of FAIR snapshots Why GigaScience set up https://doi.org/10.1093/database/baz016 Foundational Principles • Can’t trust “data available on request” – need independent, trusted broker • Follow FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for data stewardship & offer unlimited data hosting • Use globally unique and persistent (stable) identifiers, e.g. DataCite DOIs • Need to take unlimited sized snapshots of ”version of record” (data, code…) • Increase Reusability with Interoperable CC licensing (we use CC0) • Increase Findability & Reusability with rich open metadata (field specific, DataCite, schema.org) and wide indexing (DataCite, NIH datamed, DCI, etc.)
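To make the “rich open metadata” and persistent-identifier points concrete, here is a hedged sketch of a schema.org Dataset record of the kind a data snapshot's landing page could expose for indexing. The DOI, title and file details are invented placeholders, not an actual GigaDB entry; only the vocabulary (schema.org Dataset, a DataCite-style DOI, CC0 licensing) reflects the principles listed above.

```python
# Sketch of schema.org "Dataset" metadata (JSON-LD) for a data snapshot.
# The DOI, title and file are placeholders; only the vocabulary follows
# schema.org and the FAIR principles listed on this slide.
import json

dataset_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Supporting data for: an example study (placeholder)",
    "identifier": "https://doi.org/10.1234/example-dataset",  # placeholder DataCite-style DOI
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",  # CC0 waiver
    "creator": [{"@type": "Person", "name": "A. Author"}],
    "datePublished": "2019-09-19",
    "keywords": ["FAIR", "data availability", "Hong Kong"],
    "distribution": [{
        "@type": "DataDownload",
        "contentUrl": "https://example.org/datasets/example/data.csv",  # placeholder
        "encodingFormat": "text/csv",
    }],
}

# Embedding this JSON-LD in the dataset's landing page is what lets
# indexers (DataCite, dataset search engines, etc.) find and reuse it.
print(json.dumps(dataset_metadata, indent=2))
```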
  41. 41. Thanks to: Laurie Goodman, Editor in Chief Nicole Nogoy, Editor Hans Zauner, Assistant Editor Hongling Zhao, Assistant Editor Peter Li, Lead Data Manager Chris Hunter, Lead BioCurator Chris Armit, Data Scientist Mary Ann Tulli, Data Editor Xiao (Jesse) Si Zhe, Database Developer Chen Qi, Shenzhen Office. @GigaScience facebook.com/GigaScience http://gigasciencejournal.com/blog/ Follow us: www.gigasciencejournal.com www.gigadb.org + Weibo & WeChat + HKU MLIM students
