Final Project: Option 2
Using a relevant dataset (this can either be from the literature
curation exercise, a BYO dataset,...
Final Project: Option 3
Prepare a 30 minute data curation workshop that you could teach to
researchers that would provide ...
Looking ahead…
• Next class on Monday 27th March we’ll go
from open to FAIR data
• We’ll also go through the reflection & ...
HKU Data Curation MLIM7350 Class 7
Scott Edmunds slides from class 7 from the HKU Data Curation (module MLIM7350 from the Faculty of Education) course covering open data policy and practice, and the Hong Kong context.


  1. 1. Class 7…giant balancing 'if I have seen further it is by standing on the shoulders of giants'. Scott Edmunds, HKU Data Curation MLIM7350
  2. 2. Communicating in-class • Chat channel: • http://backchannelchat.com/chat/dw131 • Feel free to ask questions or to request that I speed up/slow down. Also feel free to email: scott@gigasciencejournal.com
  3. 3. About me: • Scott Edmunds • Molecular biology, sci editing & comms • Scientific journal & (big) data publishing • Reproducibility & open science • Open Data Hong Kong & Citizen Science. Journal, data-platform and database for large-scale biological data: www.gigasciencejournal.com
  4. 4. About me:
  5. 5. About my employer: • Formerly Beijing Genomics Institute • Founded in 1999 (1% of HGP) • China’s 1st citizen-managed not-for-profit research institute funded by commercial sequencing-as-a-service (BGI Tech) • Now largest genomic organization in the world • HQ in Shenzhen, international data production in BGI HK (Tai Po)
  6. 6. Open Data Hong Kong ExCom member for Open Science Open Science Working Group
  7. 7. WHY CURATE DATA?
  8. 8. WHY SHARE DATA?
  9. 9. WHY SHARE DATA? https://okfn.org/
  10. 10. WHAT EXACTLY IS “OPEN DATA"?
  11. 11. What is open data (公开数据)? http://opendefinition.org/od/2.0/en/
  12. 12. OKFN: 8 types of open data http://science.okfn.org/
  13. 13. Research Data ≈ Government Data Canada's Action Plan on Open Government 2014-16 http://open.canada.ca/en/content/canadas-action-plan-open-government-2014-16
  14. 14. Research Data policies growing globally http://ec.europa.eu/research/openscience/index.cfm?section=monitor&pg=researchdata#1
  15. 15. https://data.gov.hk HK has “Public Sector Information"
  16. 16. Why Licensing is Important for: http://dx.doi.org/10.1186/1756-0500-5-494 Placing restrictions on the reuse of scientific information, particularly data, slows down the pace of research. Furthermore, legal requirements for attribution ingrained in licenses such as CC-BY can prohibit future research across large collections of content – as commonly happens in data mining. Therefore, to eliminate legal impediments to integration and re-use of data, such as this stacking of attribution requirements in large collections of data, and to help enable long-term interoperability an appropriate license or waiver specific to data should be applied.
  17. 17. Panton Principles http://pantonprinciples.org/ = CC0 better than CC-BY for datasets to prevent “attribution stacking”
  18. 18. Levels of openness: 5★’s of open data http://5stardata.info
  19. 19. Levels of openness: 5★’s of open data http://5stardata.info ★ - make your stuff available on the Web (whatever format) under an open license ★★ - make it available as structured data (e.g., Excel instead of image scan of a table) ★★★ - make it available in a non-proprietary open format (e.g., CSV as well as Excel) ★★★★ - use URIs to denote things, so that people can point at your stuff ★★★★★ - link your data to other data to provide context
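The five levels above are cumulative, so a rating can be sketched as a short checklist walk. This is only an illustrative sketch: the `Dataset` class and its field names are hypothetical and are not part of 5stardata.info or any portal API.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative fields)."""
    on_web: bool = False        # reachable on the Web in any format
    open_license: bool = False  # published under an open license
    structured: bool = False    # machine-readable structure (e.g. a spreadsheet)
    open_format: bool = False   # non-proprietary format (e.g. CSV)
    uses_uris: bool = False     # things are denoted by URIs
    linked: bool = False        # linked to other data for context


def star_rating(d: Dataset) -> int:
    """Return 0-5 stars; each star requires every earlier star."""
    checks = [
        d.on_web and d.open_license,  # 1 star
        d.structured,                 # 2 stars
        d.open_format,                # 3 stars
        d.uses_uris,                  # 4 stars
        d.linked,                     # 5 stars
    ]
    stars = 0
    for passed in checks:
        if not passed:
            break
        stars += 1
    return stars


# A licensed CSV on a portal, without URIs or links, stops at three stars.
csv_on_portal = Dataset(on_web=True, open_license=True,
                        structured=True, open_format=True)
print(star_rating(csv_on_portal))
```

Under this reading, anything without an open license scores zero however good its format, which is why licensing comes first in the scheme.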
  20. 20. Levels of openness: 5★’s of open data Exercise: What star rating is this data? Example: Hong Kong: Dengue Mosquito Breeding Habitats http://www.fehd.gov.hk/english/safefood/dengue_fever/images/montlyOvitrap_2003-2016.pdf http://www.fehd.gov.hk/english/safefood/dengue_fever/ Static PDFs, images, not on data.gov.hk, no licensing information = ?
  21. 21. Levels of openness: 5★’s of open data http://5stardata.info Exercise: What star rating is this data? 1. HK FEHD: Distribution of the number of live pigs sold at different auction prices on the day https://data.gov.hk/en-data/dataset/hk-fehd-fehdsh-daily-auction 2. Singapore: Dengue Mosquito Breeding Habitats https://data.gov.sg/dataset/dengue-mosquito-breeding-habitats 3. Linked Drug-Drug Interactions (LIDDI) https://datahub.io/dataset/linked-drug-drug-interactions-liddi
  22. 22. Why closed data sucks? https://commons.wikimedia.org/wiki/File:Inner_door_in_forbidden_city.jpg
  23. 23. Hong Kong Edition https://data.gov.hk Gov't spend on open data platform = $1.2M Gov't spend on 20 rubbish apps = $20M https://www.hongkongfp.com/2015/09/14/public-finance-concern-group-raps-10-rubbish-govt-apps-one-has-only-10-downloads/ Why closed data sucks?
  24. 24. What the Gov't builds for $20M What open data can build for free http://gazetteer.hk/ Hong Kong Edition Why closed data sucks?
  25. 25. Open Data as a revenue stream... Hong Kong Edition Why closed data sucks?
  26. 26. Open Data as a revenue stream means can't share conservation data... Why closed data kills spoonbills?
  27. 27. Climate change, global hunger, pollution, cancer, disease outbreaks… http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966 Why closed data kills people?
  28. 28. Open Data as a revenue stream means can't share cancer data... https://www.change.org/p/mark-c-capone-ceo-of-myriad-genetics-myriad-genetics-give-us-our-damn-brca-data Why closed data kills women?
  29. 29. Open Data as a revenue (publishing) stream means nobody is sharing ethnic Chinese control data to enable pharmacogenomics to work on Chinese populations... Why closed data kills Chinese populations?
  30. 30. THE REPRODUCIBILITY CRISIS
  31. 31. How research is disseminated: 1812, 1665, 1869
  32. 32. Consequences of 351 year old incentive systems… Buckheit & Donoho: Scholarly articles are merely advertisement of scholarship. The actual scholarly artifacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible.
  33. 33. The consequences: growing replication gap 1. Ioannidis et al., (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 2. Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8) Out of 18 microarray papers, results from 10 could not be reproduced
  34. 34. 1. http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.1001747 The challenge: reproducibility
  35. 35. Replication rates as low as 11% http://www.nature.com/nature/journal/v483/n7391/full/483531a.html https://osf.io/e81xl/wiki/home/
  36. 36. Growing Issue: increasing number of retractions >15X increase in last decade Strong correlation of “retraction index” with higher impact factor 1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 2. Retracted Science and the Retraction Index http://iai.asm.org/content/79/10/3855.abstract?
  37. 37. Growing Issue: increasing number of retractions >15X increase in last decade Strong correlation of “retraction index” with higher impact factor At the current rate of increase, by 2045 as many papers will be retracted as published! 1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 2. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
  38. 38. Problem: growing replication gap 1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950 More retractions: >15X increase in last decade. At the current rate of increase, by 2045 as many papers will be retracted as published. Insufficient methods
  39. 39. The Cost of Scientific Retractions? A: $400,000 per paper https://elifesciences.org/content/3/e02956
  40. 40. Only policy that counts…IMPACT FACTOR
  41. 41. What is the journal Impact Factor (jIF)? • Citation Index concept first developed by Eugene Garfield in 1955 (Science) • Formed Institute for Scientific Information (ISI) in 1960 • Science Citation Index (SCI) launched in 1963. • Web version (Web of Science) launched in 1997. • ISI purchased by Thomson-Reuters in 1992. • Sold as part of their Intellectual Property & Science portfolio in July 2016 for $3.55B USD to private equity funds. https://commons.wikimedia.org/wiki/File:Eugene_Garfield_HD2007_Richard_J._Bolte_Sr._Award.TIF
  42. 42. How do you calculate the jIF? 1. Count the citations received in the IF year to papers published in the two preceding years. 2. Count the total number of papers published in those two years. 3. Divide the number of citations by the number of papers. 2015 IF = (# citations in 2015 to papers from 2013-2014) / (# papers published in 2013-2014)
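The three steps above reduce to a single division. The sketch below assumes the standard definition (citations received in the IF year to items published in the two preceding years); the function name and the example counts are hypothetical and are not taken from any real journal.

```python
def journal_impact_factor(citations_in_if_year: int, papers_prev_two_years: int) -> float:
    """Citations received in the IF year to papers from the two preceding
    years, divided by the number of papers published in those two years."""
    if papers_prev_two_years <= 0:
        raise ValueError("need at least one paper in the two-year window")
    return citations_in_if_year / papers_prev_two_years


# Hypothetical journal: 1200 citations in 2015 to its 400 papers from 2013-2014.
print(journal_impact_factor(1200, 400))  # 3.0
```

Because the result is a ratio, it moves just as much when the paper count in the denominator shrinks as when the citation count grows.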
  43. 43. 1. Count the citations received in the IF year to papers published in the two preceding years. 2. Count the total number of papers published in those two years. 3. Divide the number of citations by the number of papers. 2015 IF = (# citations in 2015 to papers from 2013-2014) / (# papers published in 2013-2014) TWO PROBLEMS
  44. 44. 1. Count the citations received in the IF year to papers published in the two preceding years. 2. Count the total number of papers published in those two years. 3. Divide the number of citations by the number of papers. 2015 IF = (# citations in 2015 to papers from 2013-2014) / (# papers published in 2013-2014) TWO PROBLEMS 1. Rewards/incentivizes short-term citations only
  45. 45. TWO PROBLEMS 1. Rewards/incentivizes short-term citations only. Impact factor driven science =
  46. 46. JIFBAIT Network more GWAS GWAS JIFBAIT NEWS Arsenic Life forms, will they take over the planet? By Melba Ketchum, PhD Which Overhyped, Unreproducible Experiment Are You? Want rapid citations for 2 years only? Carry out this quiz. You got: STAP Cells Of course dipping cells in coffee will make them pluripotent. Even if the research gets discredited, it’ll still get 100’s of citations in two years.
  47. 47. 1. Count the citations received in the IF year to papers published in the two preceding years. 2. Count the total number of papers published in those two years. 3. Divide the number of citations by the number of papers. 2015 IF = (# citations in 2015 to papers from 2013-2014) / (# papers published in 2013-2014) TWO PROBLEMS 2. How do you count the denominator? Negotiated.
  48. 48. https://quantixed.wordpress.com/2016/01/05/the-great-curve-ii-citation-distributions-and-reverse-engineering-the-jif/
  49. 49. http://bjoern.brembs.net/2016/01/even-without-retractions-top-journals-publish-the-least-reliable-science/
  50. 50. http://iai.asm.org/content/79/10/3855.full
  51. 51. Growing # of journals addressing this http://dx.doi.org/10.1371/journal.pmed.1001607
  52. 52. QUANTIFYING REPRODUCIBILITY
  53. 53. Data vs. code: same data + same code = Reproducible; different data + same code = Replicable; same data + different code = Robust; different data + different code = Generalisable. https://figshare.com/articles/Publishing_a_reproducible_paper/4720996
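The two-by-two grid on this slide can be read as a lookup on two yes/no questions. A minimal sketch follows; the function name is hypothetical, and the last label is spelled out as "Generalisable" on the assumption that the slide text was cut off.

```python
def classify_result(same_data: bool, same_code: bool) -> str:
    """Map the (data, code) combination to the term used in the grid."""
    grid = {
        (True, True): "Reproducible",     # same data, same code
        (False, True): "Replicable",      # different data, same code
        (True, False): "Robust",          # same data, different code
        (False, False): "Generalisable",  # different data, different code
    }
    return grid[(same_data, same_code)]


# Re-running the authors' own code on their own data tests reproducibility.
print(classify_result(same_data=True, same_code=True))
```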
  54. 54. http://reproducibility.cs.arizona.edu/ Arizona Repeatability in Computer Science Experiment • 2015 study examining the extent to which Computer Systems researchers share their research artifacts (code) • NSF policies on sharing code since 2005 • Examined 613 papers from ACM conferences & journals • Attempted to locate source code that backed up results • If found, tried to build the code.
  55. 55. http://reproducibility.cs.arizona.edu/ Arizona Repeatability in Computer Science Experiment • Manual curation/look for code that backed up results • If missing, emailed authors • Chased if no reply • If found, tried to build the code • Resolve issues • Survey results
  56. 56. http://reproducibility.cs.arizona.edu/ 613 papers tested 123 successful Reproductions (20%) Arizona Repeatability in Computer Science Experiment
  57. 57. Questions? | 15 minute break
  58. 58. The Hong Kong context http://web.archive.org/web/20131127073400/http://openaccess.hk/about.html
  59. 59. Asia’s Academic City? 8 Universities, many ranked top 50 worldwide 100K students (UG/PG/FT/PT) 1 major research funder (UGC/RGC) Grant budget = $17.5 BN HKD/yr ($2.3BN USD) UGC Policy: “Realization of making Hong Kong Asia's world city is only possible if it is based upon the platform of a very strong education and higher education sector.” http://www.ugc.edu.hk/eng/ugc/policy/policy.htm
  60. 60. Asia’s Academic City? 8 Universities, many ranked top 50 worldwide 100K students (UG/PG/FT/PT) 1 major research funder (UGC/RGC) Grant budget = $17.5 BN HKD/yr ($2.3BN USD) UGC Policy: “Realization of making Hong Kong Asia's world city is only possible if it is based upon the platform of a very strong education and higher education sector.” http://www.ugc.edu.hk/eng/ugc/policy/policy.htm
  61. 61. Data: WorldBank R&D spending in HK amongst lowest in Developed World
  62. 62. Hong Kong’s focus… “The plot earmarked for expansion of Hong Kong Science Park might now be used to build apartment blocks instead. Is the government backing down on its commitment to project Hong Kong as a major technology hub?” http://bit.ly/1TxCRj3
  63. 63. “The plot earmarked for expansion of Hong Kong Science Park might now be used to build apartment blocks instead. Is the government backing down on its commitment to project Hong Kong as a major technology hub?” http://bit.ly/1TxCRj3 Hong Kong’s focus…
  64. 64. https://osf.io/cgpzb/ Open Science (Open Access & Open Data) survey of Hong Kong Any comments?
  65. 65. Science & Technology players in HK Political forum Legislative Council (LegCo) Policy makers Government Advisory Committee on Innovation and Technology Innovation and Technology Bureau (ITB) Innovation and Technology Commission (ITC) Financing Government EB Private Sector ITC -> ITF Innov. & Tech. Venture Fund RGC UGC Operators Universities Public Technology Support Organizations Private Sector R&D Centres ASTRI Facilitators HKPC HKTDC HKSTPC Cyberport HKIB Commercialization Agents Business Enterprises New High Tech Ventures Multinational Corporations Researched policy, collected case studies, FOI, interviewed many key players (funders, libraries, administrators…)
  66. 66. HK: good with some parts of open… http://hub.hku.hk/
  67. 67. http://index.okfn.org/ HK: bad with the rest…
  68. 68. https://data.gov.hk HK: bad with the rest…
  69. 69. Signatories to Berlin OA Declaration
  70. 70. OA Policies in Hong Kong
  71. 71. Hidden at the back of RGC guidelines http://www.ugc.edu.hk/eng/doc/rgc/form/srfdp_sr2.pdf
  72. 72. IR: infrastructure is (mostly) there http://www.julac.org/?page_id=79
  73. 73. IR: infrastructure is (mostly) there http://repositories.webometrics.info/en/Asia/Hong%20Kong
  74. 74. IR: infrastructure is (mostly) there
  75. 75. No policies, Mo’ problems
  76. 76. Q: How much is spent on Open/Closed Access in HK? A: Nobody has any idea! https://lists.okfn.org/pipermail/open-access/2014-May/001888.html
  77. 77. In China publication + JIF = money = fraud. Attempts to “game the peer-review system on an industrial scale”. Companies offer authorship of papers made to order by “paper mills” [1]. Ghostwriting of medical papers by pharma is common [2]. Guaranteed publication in JIF journal, often using fake referees, ID theft, etc. 1. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/ 2. http://www.grassley.senate.gov/sites/default/files/about/upload/Senator-Grassley-Report.pdf
  78. 78. 1. http://dx.doi.org/10.1087/20110203 2. http://blog.thegrandlocus.com/2014/10/a-flurry-of-copycats-on-pubmed 3. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/ What is the cost of the jIF? JIF 2 = $10,000 USD JIF 5 = $20,000 USD Buy Sell C/N/S = $30,000 USD JIF 10 = $1,500 USD
  79. 79. 1. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic- incentives-curb-research Created by skewed incentive systems in China… “While we are rightly proud of Hong Kong’s highly regarded and ranked universities system, we are not immune to the same pressures. While funders in Europe have moved away from using citation based metrics such as JIF in their research assessments, the Hong Kong University Grants Committee states in their Research Assessment Exercise guidelines that they may informally use it.”
  80. 80. 1. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic- incentives-curb-research And this is now happening in Hong Kong too! JIF 2 = $8,000 USD JIF 5 = $15,000 USD Buy
  81. 81. How to fight back: Sign DORA. http://www.ascb.org/dora/
  82. 82. Political forum Legislative Council (LegCo) Policy makers Government Advisory Committee on Innovation and Technology Innovation and Technology Bureau (ITB) Innovation and Technology Commission (ITC) Financing Government EB Private Sector ITC -> ITF Innov. & Tech. Venture Fund RGC UGC Operators Universities Public Technology Support Organizations Private Sector R&D Centres ASTRI Facilitators HKPC HKTDC HKSTPC Cyberport HKIB Commercialization Agents Business Enterprises New High Tech Ventures Multinational Corporations Who needs to provide leadership? What new infrastructure do we need? Science & Technology players in HK
  83. 83. Who needs to provide leadership? RGC/UGC & new ITB What new infrastructure do we need? New “HK Data Service”, stewardship & platforms Science & Technology players in HK Political forum Legislative Council (LegCo) Policy makers Government Advisory Committee on Innovation and Technology Innovation and Technology Bureau (ITB) Innovation and Technology Commission (ITC) Financing Government EB Private Sector ITC -> ITF Innov. & Tech. Venture Fund RGC UGC Operators Universities Public Technology Support Organizations Private Sector R&D Centres ASTRI Data Curators & Stewards (Libraries, OGCIO, Data Studio@SP) Facilitators HKPC HKTDC HKSTPC Cyberport HKIB Data Disseminators (HARNET, data.gov.hk, "HK Data Service") Commercialization Agents Business Enterprises New High Tech Ventures Multinational Corporations Downstream Users (Researchers, Innovators, Citizens) Academic/commercial cloud
  84. 84. If Government doesn’t act, Universities need to lead way http://hub.hku.hk/advanced-search?location=crisdataset
  85. 85. If Government doesn’t act, Universities need to lead way http://www.rss.hku.hk/integrity/research-data-records-management
  86. 86. First CRIS in HK, built upon ScholarsHub http://hub.hku.hk/advanced-search?location=crisdataset
  87. 87. First CRIS in HK, built upon ScholarsHub http://lib.hku.hk/researchdata/rpg.htm “Beginning with the September 2017 intake, all HKU research postgraduate (rpg) students have responsibility for 1) using a data management plan (DMP), where applicable, to describe the use of data in preparation for, or in the generation of their theses, and 2) depositing, where applicable, a dataset in the HKU Scholars Hub.”
  88. 88. First CRIS in HK, built upon ScholarsHub http://hub.hku.hk/advanced-search?location=crisdataset
  89. 89. First CRIS in HK, built upon ScholarsHub http://hub.hku.hk/advanced-search?location=crisdataset
  90. 90. First CRIS in HK, built upon ScholarsHub http://hub.hku.hk/advanced-search?location=crisdataset CC-BY NC by default
  91. 91. First CRIS in HK, built upon ScholarsHub http://hub.hku.hk/advanced-search?location=crisdataset Licensing T&Cs
  92. 92. HK CRIS: Further reading/resources • RPg Students -- Instructions for Data: http://lib.hku.hk/researchdata/rpg.htm • Depositor's User Guide: http://lib.hku.hk/researchdata/deposit_page.htm • Seminar slides from HKU Library: http://www.rss.hku.hk/integrity/rcr/rcr-info/seminars • See also ReShare video guide: https://youtu.be/focv1z3lpPI
  93. 93. The cost to Hong Kong of not doing this? HK UGC grant budget = $17.5 billion HKD/yr (4% of Gov spending) • Taking the lowest reported reproducibility rate (11%) = >$15 billion wasted [1] • Estimated loss of citation impact from not being OA = 50% ($8.75B?) [2] • How much is the HK taxpayer losing through missing out on potential collaborations, wider engagement & unrepeatable work? 1. http://www.nature.com/nature/journal/v483/n7391/full/483531a.html 2. http://www.ecs.soton.ac.uk/~harnad/Temp/research-australia.doc
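The two headline figures on this slide appear to follow from simple arithmetic on the quoted budget. The sketch below reproduces them under two loudly stated assumptions that the slide itself does not spell out: that all non-replicable work counts as fully wasted, and that the 50% open-access figure is applied to the whole budget.

```python
UGC_BUDGET_HKD_BN = 17.5        # grant budget per year, from the slide
LOWEST_REPLICATION_RATE = 0.11  # lowest reported rate, from the slide
OA_IMPACT_LOSS = 0.50           # estimated impact lost by not being OA, from the slide


def wasted_on_unreplicable(budget_bn: float, replication_rate: float) -> float:
    """Assumption: every dollar behind non-replicable work is wasted."""
    return budget_bn * (1 - replication_rate)


def lost_to_closed_access(budget_bn: float, impact_loss: float) -> float:
    """Assumption: the impact loss applies to the entire budget."""
    return budget_bn * impact_loss


wasted_hkd_bn = wasted_on_unreplicable(UGC_BUDGET_HKD_BN, LOWEST_REPLICATION_RATE)
oa_loss_hkd_bn = lost_to_closed_access(UGC_BUDGET_HKD_BN, OA_IMPACT_LOSS)
print(f"Wasted on non-replicable work: about {wasted_hkd_bn:.2f} billion HKD")
print(f"Lost by not being open access: {oa_loss_hkd_bn:.2f} billion HKD")
```

Both numbers are upper-bound style estimates; the question mark after $8.75B on the slide signals the same caution.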
  94. 94. https://osf.io/cgpzb/ Open Science (Open Access & Open Data) survey of Hong Kong Reading/Reflection for next class Thoughts and ideas why Hong Kong is lagging behind US/EU? Any ideas what we need to do to move forward? Any feedback on the survey?
  95. 95. QUANTIFYING REPRODUCIBILITY IN HK
  96. 96. HKU Repeatability in HK Research Experiment • HKU policy on data sharing from 2015 • PLOS policy mandating sharing of supporting March 1, 2014 • HKU has published 267 PLOS ONE papers 2014-date • Can we quantify reproducibility in a sample of these? • Easy exercise in literature curation • 2016 HKU PLOS publications = 49 papers http://hub.hku.hk/simple- search?query=&location=publication&sort_by=bi_sort_2_sort&order=asc&rpp=25&filter_field_1=journal&filter_type_ 1=equals&filter_value_1=plos+one&filter_field_2=dateIssued&filter_type_2=equals&filter_value_2=[2014+TO+2017]& filter_field_3=dctype&filter_type_3=equals&filter_value_3=article&etal=0&filtername=dateIssued&filterquery=2016&f iltertype=equals
  97. 97. HKU Repeatability in HK Research Experiment Homework/Case study: literature curation exercise • Everyone is assigned five 2016 HKU PLOS papers • Quickly scan each paper looking for supporting data • If there is no data, ignore the paper • If it uses data, is it all associated with the paper? • If it uses external data, is it available from a URL or accession? • If “data available on request”, are the authors contactable? • Don’t spend more than 5 minutes per article • Add your results to the Google Doc, and we’ll go through results & feedback next class
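The checklist above is effectively a small decision procedure, and it can help to write it down before filling in the spreadsheet. The sketch below is one possible reading of it; the field names and category labels are hypothetical and are not the column names of the class spreadsheet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperCheck:
    """Answers recorded for one paper (field names are hypothetical)."""
    has_data: bool                               # does the paper present or use data?
    all_data_in_paper: bool = False              # is all of it with the paper/supplement?
    external_link_works: Optional[bool] = None   # None = no external data cited
    on_request: bool = False                     # "data available on request" statement
    authors_contactable: Optional[bool] = None   # None = contact not attempted


def triage(p: PaperCheck) -> str:
    """Walk the checklist in the order given on the slide."""
    if not p.has_data:
        return "ignore"
    if p.all_data_in_paper:
        return "available with paper"
    if p.external_link_works is not None:
        if p.external_link_works:
            return "available externally"
        return "external data unreachable"
    if p.on_request:
        if p.authors_contactable is None:
            return "on request (contact not checked)"
        if p.authors_contactable:
            return "on request (contactable)"
        return "on request (not contactable)"
    return "not available"


# A paper with data, no external data, and an "available upon request"
# statement, as in Example 1 on the later slide.
example = PaperCheck(has_data=True, on_request=True)
print(triage(example))
```

Keeping "not checked" separate from "not contactable" matters for the optional step of contacting authors: an unanswered question should not be counted as a failure to respond.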
  98. 98. HKU Repeatability in HK Research Experiment Example 1. https://docs.google.com/spreadsheets/d/15BszEhUodygyu4eGckR2b5p153nyeYmB3Uh4U23HX-o/edit?usp=sharing
  99. 99. HKU Repeatability in HK Research Experiment Example 1. • Is there data presented in the paper? – Yes • Is there external data, and if so what is the link/accession? – No • Is all the data in the paper available? – No • Comments – Has questionnaire, but not data, as it says “minimal anonymized dataset will be made available upon request” Enter data here: https://docs.google.com/spreadsheets/d/15BszEhUodygyu4eGckR2b5p153nyeYmB3Uh4U23HX-o/edit?usp=sharing
  100. 100. HKU Repeatability in HK Research Experiment Example 1. Optional: If data is missing, do the authors respond if contacted? Enter data here: https://docs.google.com/spreadsheets/d/15BszEhUodygyu4eGckR2b5p153nyeYmB3Uh4U23HX-o/edit?usp=sharing
  101. 101. Final Project • For the final project for this course, you can choose from 3 assignment options. • The assignment is due on the 15th May and it is worth 40% of your grade. • Time will be set aside for presenting a provisional draft of this during the final class on the 24th April.
  102. 102. Final Project: Option 1 Write an Annotated Bibliography about data curation practices in an academic discipline of your choosing. • Choose a discipline (sciences, social sciences, & humanities) OR choose the topic of “open data.” • Summarize data practices in your chosen discipline or topic. (5-7 sentences) • Find 7-10 sources that relate that discipline or topic to data creation, management, and/or curation. • Provide a citation for the source in APA style. • Write a short annotation that summarizes the content of the source. You may include quotes from the source sparingly, but the annotations should be mostly, if not entirely, in your own words. (3-5 sentences) • Explain the relevance of the source with relation to the data practices of your chosen discipline or topic. (1-2 sentences) • Find a few example public datasets to demonstrate the above points. Cite the data in the relevant places in the Bibliography according to the Data Citation Principles. • Refer to this guide for more information about annotated bibliographies: http://sites.umuc.edu/library/libhow/bibliography_tutorial.cfm. Your annotation should be in the “Descriptive” style.
  103. 103. Final Project: Option 2 Using a relevant dataset (this can be from the literature curation exercise, a BYO dataset, or one given to you), write a report that includes a description of the dataset, a Data Management Plan (DMP), and a guidelines document for the researcher(s). • Describe the dataset, explaining the form of the data and the academic discipline in which it was created. This paragraph should provide context for the DMP. (3-5 sentences) • 1-2 page Data Management Plan following the guidelines from HKU or a granting body such as NSF. • 1 page guidelines document that could be presented to the researcher(s), providing guidelines for their data (extant and forthcoming): – Preservation – Appraisal – Documentation • For the DMP and the guidelines document, you can extrapolate from your dataset to imagine additional details about the research practices that created the dataset and will create more data in the future. • Look for suitable data repositories that can host this data (institutional, general purpose, or subject specific). If there is a relevant one, publish the data if you have permission, and correctly cite the data in the relevant places in your report.
  104. 104. Final Project: Option 3 Prepare a 30 minute data curation workshop that you could teach to researchers, giving them the details they need to understand why data curation is relevant to them and the best practices they should follow. • Slide deck that introduces data curation for a researcher audience. (No more than 40 slides.) • Presenter outline that describes the important points for each slide. • Topics that might be addressed in your workshop: the value of data management, writing a data management plan, data repository options. You can assume your audience is researchers at HKU. • Make sure all of the content is copyright free, and share the final material openly (e.g. figshare, scholarhub, OER commons, etc.) with sufficient metadata to make it discoverable.
  105. 105. Looking ahead… • Next class on Monday 27th March we’ll go from open to FAIR data • We’ll also go through the reflection & curation case studies – Bring ideas & feedback, and we’ll look at the data • Final project due 10th May – Need to present preliminary version on 26th April to get feedback before completion
