Scott Edmunds: Open Science Data = Open Data (a rant in e-minor), talk on Open Science, Hardware and Environment, 9th March 2015

  1. Open Science Data = Open Data (a rant in e-minor). Scott Edmunds (ORCID 0000-0001-6444-1436, @SCEdmunds), ODHK Open Science Working Group
  2. Open Data is about better decision making. Open Science Data = Open Data, as are Open Access, Open Hardware, Open Environmental Data, Open Scholarship… • To make decisions • You need good ideas • Which are based on relevant information • Supported by valuable data • Captured by accurate measures
  3. What is Open (Science) Data? • Something very, very, very geeky • Free and open access to data about the world around us o Searchable, findable o Machine-readable, app-makeable, Excel-usable o Without restrictions/limitations • This (examples)
  4. Open Science Data = Data Journalism
  5. Open Science Data = Transparency • Evidence-based policy making is needed on drugs, the environment, GMOs, etc. • Casualties of politics v science (advisors): UK Government v David Nutt; EU (Greenpeace) v Anne Glover • Sensitivity over air/water pollution data in China • Sensitivity over radiation data in Japan
  6. Why Open Science Data is the most important open data* *(I may be biased, though): climate change, global hunger, pollution, radioactivity, cancer, disease outbreaks…
  7. To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate, we would ask you to cite this dataset as: Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 In contrast to our example: To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
  8. 1.3 The power of intelligently open data. The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastrointestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin-producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.
  9. OKFn Open Science Working Group
  10. Open Science Survey (Index?)
  11. Growing number of government Open Access/Open Data (OA/OD) mandates
  12. Hong Kong: still some work to go with OA. …China, Singapore and India beat us
  13. How much does closed data cost us? More profitable than a gold mine. See:
  14. Hong Kong: still some work to go with OA. Hong Kong Code on Access to Information request on Elsevier spending: Q. How much are we spending on closed access? Reply: Dear Mr. Edmunds, Thank you for your email dated 27 April 2014. Please be informed that the requested information is not maintained in our database system. In addition, as the bulk of the University Grants Committee's recurrent grants are disbursed to institutions in the form of a block grant to provide institutions with flexibility in internal deployment, we do not possess the information on funding / spending for journal subscription. Since the requested information does not exist in our Department, you may wish to approach institutions directly on your request. Regards, University Grants Committee Secretariat
  15. Hong Kong: still some work to go with OA. Q. How much are we spending? What we do know: • The Hong Kong University Grants Committee (UGC) yearly budget for grants = 17.5 billion HKD (4% of government spending). • HKU library budget = ~200M HKD, with 78.4% of the acquisition budget spent on electronic journals. • In 2011-2012 the 8 funded institutions published 16,594 papers (including conference papers and non-refereed work). • HKU and PolyU have OA self-archiving policies, but no enforcement or open data policies (yet…)
  16. Hong Kong: still some work to go with OA. Q. How much are we wasting? What we do know: • OA boosts impact by 50%, and open data leads to a 9% citation boost. • Dryad estimates that spending $400,000 to archive 2,500 datasets per year contributes to more than 1,000 papers within 4 years. • Reproducibility crisis for published research: in >50% of cases, failure to reproduce is due to a lack of open data. • Ioannidis estimates that 85% of research funding is wasted because of this. Applied to the UGC's 17.5 billion HKD grant budget, that is ~15 billion HKD wasted.
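The ~15 billion HKD figure appears to come from applying the 85% waste estimate to the UGC's yearly 17.5 billion HKD grant budget from slide 15. The slide does not show the calculation, so the following is only a worked sketch assuming the two figures combine this way, with the Dryad archiving cost per dataset added for comparison:

```latex
\begin{aligned}
\text{funding wasted} &\approx 0.85 \times 17.5\ \text{billion HKD} = 14.875\ \text{billion HKD} \approx 15\ \text{billion HKD} \\
\text{Dryad archiving cost} &\approx \frac{\$400{,}000}{2{,}500\ \text{datasets}} = \$160\ \text{per dataset}
\end{aligned}
```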
  17. “Faked research is endemic in China”. If not Open Data, what are we focussing on instead? Sources: 475, 267 (2011); New Scientist, 17th Nov 2012; Nature, 29th September 2010; Science, 29th November 2013; Nature, 20th July 2011. “Wide distribution of information is key to scientific progress, yet traditionally, Chinese scientists have not systematically released data or research findings, even after publication.” “There have been widespread complaints from scientists inside and outside China about this lack of transparency.” “Usually incomplete and unsystematic, [what little supporting data released] are of little value to researchers and there is evidence that this drives down a paper's citation numbers.”
  18. Chinese Paper Mills: Attempts to “game the peer-review system on an industrial scale” • Companies offering authorship of papers made to order by “paper mills”¹ • Ghostwriting of medical papers by pharma is common² • Guaranteed publication in a Journal Impact Factor (JIF) journal, often using fake referees, ID theft, etc.
  19. Chinese Paper Mills: Attempts to “game the peer-review system on an industrial scale”
  20. What could we be doing with open science data? Mojave Solar Farms v Desert Tortoise
  21. What could we be doing with open science data?
  22. What could we be doing with open science data? Hong Kong-Zhuhai-Macau Bridge v Pink Dolphins
  23. ODHK Open Science Working Group