Transcript of "Presentation of science 2.0 at European Astronomical Society"

  1. Science 2.0: discussing the best available evidence
     David Osimo, Open Evidence
     Based on a study for DG RTD
     24th January 2013
  2. Three stories
     • Galaxy Zoo: Galaxy Zoo let users classify galaxies – 150K volunteers had already classified more than 10 million images of galaxies, "as accurate as that done by astronomers". 25+ scientific articles by the Galaxy Zoo project (from 2009).
     • Synaptic Leap: a project to find an alternative drug treatment for schistosomiasis with fewer side effects. All data and experiments were published on an Electronic Lab Notebook and a social network was activated. About 30 people, half from industry, participated; they identified a new process and resolving agent.
     • Excel-gate: Reinhart & Rogoff, 2010: "as countries see debt/GDP going above 90%, growth slows dramatically". The paper was used as the main theoretical justification for austerity. In 2013, after obtaining the original Excel file, Herndon et al. discovered a coding error, data gaps and unconventional weighting.
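     To make the Excel-gate failure mode concrete, here is a minimal Python sketch with invented growth figures (this is not the Reinhart-Rogoff dataset) of how a spreadsheet range that silently drops the first rows, combined with equal per-country weighting, can flip the sign of a headline average:

```python
# Hypothetical growth rates (% per year) for country-years where
# debt/GDP exceeds 90%. All numbers are invented for illustration;
# this is NOT the Reinhart-Rogoff data.
growth = {
    "A": [3.0, 2.5, 2.0],
    "B": [2.0, 2.5],
    "C": [1.5, 2.0, 2.5],
    "D": [2.0],
    "E": [-7.0],  # a single catastrophic year
}

# Straightforward approach: average over every country-year.
all_years = [g for series in growth.values() for g in series]
print(f"all country-years: {sum(all_years) / len(all_years):+.1f}%")  # +1.3%

# Flawed approach: an Excel-style range error drops the first rows
# (countries A and B), and each remaining country is weighted equally
# regardless of how many years it contributes, so E's one bad year
# counts as much as C's three ordinary ones.
included = ["C", "D", "E"]
country_means = [sum(growth[c]) / len(growth[c]) for c in included]
print(f"flawed selection:  {sum(country_means) / len(country_means):+.1f}%")  # -1.0%
```

     The correction became possible only because the authors eventually shared the original spreadsheet, which is exactly the point of the story.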
  3. Science outside academia
     • Open data, article mining
     • Mass collaboration
     • Crowdsourcing
     • Sensors
  4. Science 2.0: much more than Open Access
     (diagram: "Open access" shown as only one element of a wider landscape)
  5. (diagram: the wider Science 2.0 landscape)
     • Data-intensive science
     • Citizen science
     • Open code
     • Open lab books / workflows
     • Pre-print
     • Open access
     • Open data
     • Alternative reputation systems
     • Open annotation
     • Scientific blogs
     • Collaborative bibliographies
  6. An emerging ecosystem of services and standards
     • Data-intensive science: Figshare.com
     • Citizen science: Sci-starter.com
     • Open code: Runmycode.org
     • Pre-print: ArXiv
     • Open lab books / workflows: Myexperiment.org
     • Open access: Roar.eprints.org
     • Open data: Datadryad.org
     • Alternative reputation systems: Altmetric.com
     • Open annotation: Openannotation.org
     • Scientific blogs: Researchgate.com
     • Collaborative bibliographies: Mendeley.com
  7. Growing at different speeds
     Trend              | Status         | Data
     Pre-print          | Mature         | 694,000 articles in arXiv
     Open access        | Fast growing   | Exponential growth of OA journals; 8-10% of scientific output is OA
     Data-intensive     | Fast growing   | 52% of science authors deal with datasets larger than 1 GB
     Citizen science    | Medium growth  | 650K Zooniverse users; 500 similar projects on SciStarter
     Open data          | Medium growth  | 20% of scientists share data; 15% of journals require data sharing
     Reference sharing  | Medium growth  | 2 million users of Mendeley reference-sharing tools
     Open code          | Sketchy growth | 21% of JASA articles make code available; 7% of journals require code
     Open notebook      | Sketchy growth | Isolated projects
     Natural sciences outrank social sciences across all trends.
  8. Where the data goes now: > 50 M papers; 2 M scientists; 2 M papers/year
     • A small portion of data (1-2%?) is stored in small, topic-focused repositories: PDB 88.3k, PetDB 1.5k, MiRB 25k, SedDB 0.6k, TAIR 72.1k
     • The majority of data (90%?) is stored on local hard drives
     • Some data (8%?) is stored in large, generic repositories: Dryad 7,631 files, Dataverse 0.6 M, Datacite 1.5 M
     Source: Anita De Waard, 2013
  9. Deep implications
     • New scientific outputs and players: nanopublications, data and code; vertical disintegration of the value chain
     • Greater role for inductive methods: everything becomes a Genome Project
     • Scaling serendipity: big linked data, collaborative annotation, social networking and knowledge mining detect unexpected correlations on a massive scale
     • Better science: reproducible and truly falsifiable research findings; earlier uncovering of mistakes
     • More productive science: reusing data and products, crowdsourcing work, reducing time-to-publication
  10. Europe can lead
      • European scientific publishers are leading experimentation with new kinds of open and data-intensive services, e.g. the "Article of the Future" project and the AppsForScience competition (Elsevier), and data integration at Thieme (a small German publisher)
      • Home to world-class Science 2.0 startups: Mendeley and ResearchGate are global players in social networking for scientists, and Digital Science recently acquired Figshare. Mendeley is used by about 2 million researchers and covers 65 million documents, versus 49 million in Thomson Reuters' commercial databases; Elsevier just bought Mendeley for 50 M euros
      • Home to top citizen science initiatives (Galaxy Zoo was launched in Oxford; the ExCiteS group and the Citizen Cyberscience Centre)
      • Funding agencies are active in new mandates on openness (e.g. Wellcome Trust, FP7) – open access, open data
  11. BUT the institutional framework is a bottleneck
      • Researchers are reluctant to share data and code [1], and to provide open peer review
      • Current career mechanisms are "publish or perish": there is no reward for sharing
      • Publishing data and code requires additional work
      • Publishing intermediate products can actually hinder publication/patenting: sharing is difficult in patent-intensive domains
      • Funding mechanisms are too rigid, roadmap-based, and evaluated on articles and patents
      [1] Wicherts et al., 2011; Research Information Network, 2008; Campbell, 2002
  12. Institutional failure and the case for public intervention
      • Contradictions emerge between individuals' and societal benefits
      • Research funders (and publishers) have high leverage on scientific institutions

      Benefits of                    | Researchers | Institutions | Business | Publishers | Society
      Open access                    | ++          | +            | +        | --         | ++
      Open data                      | --          | --           | --       | +          | ++
      Open code                      | --          | --           | --       | =          | ++
      Citizen science                | +           | =            | +        | =          | +
      Alternative reputation systems | +           | -            | +        | -          | +
      Data-intensive                 | +           | +            | +        | +          | ++
      Social media                   | +           | =            | =        | =          | +
  13. How to grasp this opportunity?
      • It's not about adding a Science 2.0 top-down, roadmap-based initiative to existing programmes
      • It's not about simply letting a thousand flowers flourish bottom-up
      • It's about nudging the right institutional rearrangement (Perez) and the right system of incentives for the scientific value chain
  14. Towards research policy 2.0 (recommendation → inspiring example)
      • Adopt more flexible reputation mechanisms for scientists → from 2013, NSF requires PIs to list research "products" rather than "publications"
      • Encourage sharing by regulation → Wellcome Trust mandatory data plan
      • Cover the costs of sharing intermediate outputs such as data → gold open access publication costs to be covered in Horizon 2020
      • Develop innovative infrastructure, alternative reputation systems, tools, methods and standards → Openannotation.org, Datadryad.org
      • Make IPR more flexible → Innocentive.com, Peertopatent.com
      • Increase open-ended funding → FET Open, UK Arts Council, inducement prizes
      • Collect better evidence → a dedicated data-gathering exercise (à la Pew)
  15. Thanks
      • Continue the discussion at science20study.wordpress.com
      • Collect evidence and cases at groups.diigo.com/group/science-20
      • Contact: david.osimo@tech4i2.com; katarzyna.szkuta@tech4i2.com; @osimod
  16. Backup
  17. Emerging impact: a) more productive science
      • Using the same data sets for multiple research projects: 50% of Hubble papers came from data re-users [1]
      • Crowdsourcing work: "thousands recruited in months versus years and billions of data points per person, potential novel discovery in the patterns of large data sets, and the possibility of near real-time testing and application of new medical findings" [2]
      • "Cut down the time it takes to go from lab to medicine by 10-15 years with Open Notebook Science"; "because of poor literature analysis tools 20-25% of the work done in his synthetic chemistry lab is unnecessary duplication or could be predicted to fail" [3]
      • Faster circulation of high-quality ideas: 70% of publications discussed in blogs are from high-impact journals
      • Open research solved one-third of a sample of problems that large, well-known R&D-intensive firms had been unsuccessful in solving internally [4]
      [1] http://archive.stsci.edu/hst/bibliography/pubstat.html
      [2] http://www.jmir.org/2012/2/e46/
      [3] http://science.okfn.org/category/pubs/
      [4] Lakhani et al., 2007
  18. b) Better science
      • Greater falsifiability (Popper): a move towards reproducible science thanks to publishing data + code in addition to the article
      • Rapidly uncover mistaken findings (Climategate 2009, or the microarray-based clinical trials underway at Duke University)
      • Data sharing is associated with greater robustness of findings [1]; sharing data and notes applies to failures as well as successes
      • Especially important for computational science: "Computational science cannot be elevated to a third branch of the scientific method until it generates routinely verifiable knowledge" [2]
      [1] Wicherts et al., 2011
      [2] Donoho, Stodden, et al., 2009
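      As an illustration of what publishing data + code alongside the article enables, here is a minimal sketch of a re-run script a reader could execute against a released dataset; the file name, column name and checksum are all hypothetical placeholders:

```python
import csv
import hashlib

# Hypothetical artefacts released alongside a paper.
DATA_FILE = "released_dataset.csv"
# Checksum published in the paper's supplement, so readers can confirm
# they are analysing exactly the data the authors used. The value here
# is a placeholder.
EXPECTED_SHA256 = "0" * 64

def verify(path: str, expected: str) -> None:
    """Fail loudly if the dataset differs from the published version."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != expected:
        raise ValueError("dataset differs from the published version")

def headline_mean(path: str, column: str) -> float:
    """Recompute the paper's headline statistic from the raw data."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return sum(values) / len(values)

if __name__ == "__main__":
    verify(DATA_FILE, EXPECTED_SHA256)
    print(headline_mean(DATA_FILE, "effect_size"))
```

      A mistaken finding surfaces the moment an independent re-run of the published code on the published data fails to reproduce the headline number.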
  19. c) Greater role of inductive methods
      • "The end of theory": "Here's the evidence, now what is the hypothesis?"
      • All science becomes computational: 38% of scientists spend more than 1/5 of their time developing software (Merali, 2010)
      • Greater availability of data collection and datasets increases the utility of inductive methods; the Genome Project as a new paradigm
  20. d) Scaling serendipity
      • From penicillin to the theory of relativity, serendipity has always been a core component of science
      • Big linked data, collaborative annotation and knowledge mining of OA articles make it possible to detect unexpected correlations on a massive scale. Mendeley manages the bibliographies of 2 million scientists and uses them to suggest further reading (see the sketch below)
      • Emerging evidence that, for scholars, recommendation is becoming more important than search for finding references; social networking and recommendation systems allow scientists to "stumble upon" new evidence
      • In open research, successful solvers solved problems at the boundary of, or outside, their fields of expertise [1]
      [1] Lakhani et al., 2007
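      A Mendeley-style "suggest further reading" feature can be reduced to co-occurrence over reference libraries. A toy sketch, with invented libraries and paper names (this is not Mendeley's actual algorithm):

```python
from collections import Counter

# Toy reference libraries: which papers each scientist has saved.
# All names are invented placeholders.
libraries = {
    "alice": {"paper-A", "paper-B", "paper-C"},
    "bob": {"paper-A", "paper-B", "paper-D"},
    "carol": {"paper-B", "paper-D", "paper-E"},
}

def recommend(user: str, k: int = 2) -> list[str]:
    """Suggest papers saved by scientists with overlapping libraries."""
    mine = libraries[user]
    counts = Counter()
    for other, theirs in libraries.items():
        if other == user or not mine & theirs:
            continue  # skip self and users with no shared papers
        # weight each candidate by how much the two libraries overlap
        counts.update({p: len(mine & theirs) for p in theirs - mine})
    return [p for p, _ in counts.most_common(k)]

print(recommend("alice"))  # ['paper-D', 'paper-E']
```

      The serendipity comes from the long tail: papers a scientist would never have searched for surface because colleagues with partially overlapping interests already saved them.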
  21. e) New outputs and players #beyondthepdf
      • Nanopublications, datasets, code
      • Integration of data and code with articles
      • Reproducible papers and books
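      For readers new to the term, a nanopublication packages a single machine-readable assertion together with its provenance and publication metadata (formally, as named RDF graphs). A simplified Python sketch, with every identifier a hypothetical placeholder:

```python
# A nanopublication bundles one atomic claim (the assertion) with
# provenance and publication metadata. This dict is an illustrative
# simplification of the formal RDF/TriG nanopublication model; all
# identifiers below are placeholders.
nanopub = {
    "assertion": {
        # one subject-predicate-object statement
        "subject": "compound:example-drug",
        "predicate": "treats",
        "object": "disease:schistosomiasis",
    },
    "provenance": {
        # where the claim comes from and how it was derived
        "derived_from": "doi:10.xxxx/placeholder",
        "method": "open-lab-notebook-experiment",
    },
    "publication_info": {
        # who published the nanopublication itself, and when
        "author": "orcid:0000-0000-0000-0000",
        "created": "2013-01-24",
    },
}

# Because each claim is a single triple with explicit provenance,
# thousands of such claims can be aggregated and mined by machines.
print(nanopub["assertion"])
```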
  22. Emerging policies
      • Funders and publishers have high leverage on researchers
      • Increasing push towards Open Access from funders
      • Journals and funding agencies increasingly require data submission and data management plans
      • From 14 January 2013, NSF grant forms require PIs to list research "products" rather than "publications"
      • Alternative metrics emerge, such as altmetrics and download statistics
  23. Towards research policy 2.0
      Features:
      • Simplified proposals
      • Rewarding solutions, not proposals
      • Multi-stage
      • Open priorities
      • Flexible and open-ended (allowing for serendipity)
      • Peer selection
      • Reputation-based (funding not the proposal but the person)
      • Multidisciplinarity by design
      • Flexible IPR
      • Short project time
      • Accepting failure
      • Transparency (open monitoring)
      • Based on social network analysis
      Examples:
      • Inducement prizes, e.g. http://www.heritagehealthprize.com
      • Seed capital: http://www.ibbt.be/en/istart/our-istart-toolbox/iventure
      • ERC: http://erc.europa.eu
      • SBIR: http://www.sbir.gov
      • FET OPEN: http://cordis.europa.eu/fp7/ict/fet-open/home_en.html
      • SME: http://cordis.europa.eu/fetch?CALLER=PROGLINK_PARTNERS&ACTION=D&DOC=1&CAT=PROG&QUERY=012e7c324da6:39b1:49a0957c&RCN=862
      • IBBT: www.ibbt.be
      • Arts Council: http://www.artscouncil.org.uk/funding/grants-arts
      • Banca dell'innovazione / Innovation Bank: http://italianvalley.wired.it/news/altri/perche-ci-serve-una-banca-nazionale-dell-
  24. • Last year, researchers at one biotech firm, Amgen, found they could reproduce just six of 53 "landmark" studies in cancer research.
      • Earlier, a group at Bayer, a drug company, managed to repeat just a quarter of 67 similarly important papers.
      • A leading computer scientist frets that three-quarters of papers in his subfield are bunk.
      • In 2000-10, roughly 80,000 patients took part in clinical trials based on research that was later retracted because of mistakes or improprieties.
  25. • Conversely, failures to prove a hypothesis are rarely even offered for publication, let alone accepted. "Negative results" now account for only 14% of published papers, down from 30% in 1990. Yet knowing what is false is as important to science as knowing what is true. The failure to report failures means that researchers waste money and effort exploring blind alleys already investigated by other scientists.
  26. • When a prominent medical journal ran research past other experts in the field, it found that most of the reviewers failed to spot mistakes it had deliberately inserted into papers, even after being told they were being tested.