The Open Access (OA) model for scientific publications has been examined for years by academics who have argued that it presents advantages in increasing accessibility and, consequently, in increasing the impact of papers.
This presentation examines the results of recent studies assessing the free availability of scholarly publications during different time periods and the proportion of Open Access Papers published in peer-reviewed journals at different levels. Different types of growth in freely available papers have been
identified and analyzed. It also addresses best practices for institutional repository management and examines opportunities and challenges faced by the OA model.
1. Measuring Open Access - Current State of the Art
by
Éric Archambault, D.Phil.
President and CEO, Science-Metrix and 1science
ESSS 2015 - Leuven
2. 2
The OA revolution is firmly in motion
Librarians can play a key role:
Traditional role – percolation
New role – diffusion
Researchers too – be fruitful and multiply
OA in academic publications: complex beast
Understanding the OA universe is key to
useful measurement
BACKGROUND
4. 4
Budapest Open Access Initiative (2002)
“The literature that should be freely accessible online is that which
scholars give to the world without expectation of payment. Primarily,
this category encompasses their peer-reviewed journal articles, but it
also includes any unreviewed preprints that they might wish to put
online for comment or to alert colleagues to important research
findings. There are many degrees and kinds of wider and easier access to
this literature. By "open access" to this literature, we mean its free
availability on the public internet, permitting any users to read,
download, copy, distribute, print, search, or link to the full texts of these
articles, crawl them for indexing, pass them as data to software, or use
them for any other lawful purpose, without financial, legal, or technical
barriers other than those inseparable from gaining access to the internet
itself. The only constraint on reproduction and distribution, and the only
role for copyright in this domain, should be to give authors control over
the integrity of their work and the right to be properly acknowledged
and cited.”
DEFINITIONS
5. 5
Green OA
The main idea behind Green is self-archiving
Archiving can be done in institutional and
thematic repositories
Gold OA
The main idea behind Gold is that journal
publishers make papers available
There are Gold journals (cover-to-cover) but
also Gold papers published in subscription-
based journals (a.k.a. “hybrid journals”)
DEFINITIONS
6. 6
Complexity of OA definition and
measurement notably due to
Embargoes
Transiency
Rights of all kind (to self-archive, to crawl, to
recombine, to use commercially, etc)
Discoverability
DEFINITIONS
9. 9
Free or open source repository software
DSpace
EPrints
Archimede, DAITSS, Dienst, Enterprise-Wide
Digital Repository and Archive, ETD-db,
eXtensible Text Framework, Fedora,
Greenstone, Invenio, IRPlus, Keystone Digital
Library Suite, MOAI, Omeka, OPUS, PubMan,
WEKO, PeerLibrary
Source: http://oad.simmons.edu/oadwiki/Free_and_open-source_repository_software
VANTAGE POINTS
10. 10
Key repositories
arXiv.org – the mothership
PubMed Central / Europe PubMed Central
Aggregators
OpenAIRE
BASE
CORE
A typical repository
hosted by the Umeå universitet Library
VANTAGE POINTS
11. 11
Despite, or perhaps because of, all the
sources of OA available, it is very difficult to
measure the availability of OA
Here, we are concerned about the
availability of peer-reviewed articles
published in scholarly journals
Why – this is what policies and mandates
are preoccupied with
BIBLIOMETRICS – PROPORTION OF OA
12. 12
Bottom-up measurement
One would have to harvest all the sources
available and de-duplicate results
The main problem is how to determine reliably
that items
(1) were published (as opposed to an un-
submitted manuscript)
(2) are peer-reviewed
(3) answer the “so what” question (you found
there were 14,325,678 papers, so what?)
BIBLIOMETRICS – PROPORTION OF OA
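The de-duplication step mentioned on this slide could be sketched as follows. This is a minimal illustration only, not the method used in the studies cited: the record fields (`doi`, `title`, `year`) and the matching rule (DOI first, otherwise normalised title plus year) are assumptions made for the example.

```python
import re
from typing import Iterable


def dedup_key(record: dict) -> tuple:
    """Build a comparison key for a harvested record (hypothetical fields)."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        # DOIs are case-insensitive, so compare them in lower case.
        return ("doi", doi)
    # Fallback: lower-cased title with punctuation collapsed, plus the year.
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return ("title", title, record.get("year"))


def deduplicate(records: Iterable[dict]) -> list:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up records standing in for items harvested from several sources.
harvested = [
    {"doi": "10.1000/ABC123", "title": "Measuring Open Access", "year": 2014},
    {"doi": "10.1000/abc123", "title": "Measuring open access.", "year": 2014},
    {"doi": None, "title": "A Different Paper", "year": 2012},
]
unique_records = deduplicate(harvested)
print(len(unique_records))
```

A copy with a DOI and a copy of the same paper without one would not be merged by this rule, and nothing here addresses whether an item was published or peer-reviewed, which is the harder problem raised on this slide.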
13. 13
Top-down measurement
One would have to find an exhaustive
bibliographic database of peer-reviewed articles
and verify the availability of all papers
Main problems:
(1) there is no such database
(2) extremely tedious to check all of them
(3) how do you actually do that
BIBLIOMETRICS – PROPORTION OF OA
14. 14
Top-down measurement - Sampling
Considering the enormous task at hand, most
authors have resorted to using sampling and
search engines
Harnad and team sampled articles from the
Web of Science
Björk and team sampled articles from Scopus
Archambault and team sampled articles from
Scopus and used multiple techniques as well as
search engines
BIBLIOMETRICS – PROPORTION OF OA
15. 15
Dealing with search engines
Use user-friendly meta-search engines such as
DuckDuckGo or DogPile
Try to stay below the radar using mainstream
search engines
Neither solution feels remotely comfortable
Another solution is to build a dedicated
infrastructure to facilitate OA discovery (this
is the solution used by 1science)
BIBLIOMETRICS – PROPORTION OF OA
16. 16
Divergence from the real measure depends on
Capacity to design an instrument that provides the true
value (a function of recall and retrieval precision)
Capacity to increase statistical significance
through large samples
SAMPLING AND METROLOGY
17. 17
A true positive (tp) in the present case is a paper known to be available in OA which
is found by the harvesting instrument developed for the current project. A true
negative (tn) is an article which is not available for free and is not found by the
instrument. False positives and negatives (fp and fn) are the converse of the latter.
Retrieval precision, also called positive predictive value, provides an estimation of
how frequently the instrument finds correct positive results and is calculated as
follows:
Retrieval Precision = $\frac{tp}{tp + fp}$
Recall, also called true positive rate or sensitivity, is the capacity to correctly
identify a large proportion of the positive records:
Recall = $\frac{tp}{tp + fn}$
Knowing the precise characteristics in terms of true and false positives and
negatives allows for the computation of an adjustment score, which can then be
applied to recalibrate the results to obtain a truer measure, one that corrects the
limits of the instrument. The adjustment made in the previous study is based on
the following formula:
Adjustment = $\frac{tp + fn}{tp + fp}$
SAMPLING AND METROLOGY
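The three quantities on this slide can be worked through on a small example. The sketch below assumes that the adjustment is the ratio of papers actually in OA to papers flagged by the instrument, (tp + fn) / (tp + fp), which equals retrieval precision divided by recall, and that it is applied as a multiplier to the measured OA share. The counts and the measured share are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a validation set checked by hand (hypothetical values)."""
    tp: int  # in OA, found by the instrument
    fp: int  # not in OA, but flagged as found
    fn: int  # in OA, but missed by the instrument
    tn: int  # not in OA, and not flagged


def retrieval_precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def adjustment(c: Confusion) -> float:
    # Assumed form: actual positives over retrieved positives.
    return (c.tp + c.fn) / (c.tp + c.fp)


c = Confusion(tp=80, fp=5, fn=20, tn=95)
measured_share = 0.40  # made-up measured proportion of OA papers
adjusted_share = measured_share * adjustment(c)  # assumed multiplicative use

print(f"precision={retrieval_precision(c):.3f}")
print(f"recall={recall(c):.3f}")
print(f"adjustment={adjustment(c):.3f}")
print(f"adjusted share={adjusted_share:.3f}")
```

With these counts the instrument misses more papers than it wrongly flags, so the adjustment is above 1 and the adjusted share is higher than the measured one.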
18. 18
Statistical precision can be assessed with the margin of
error (ME). For a proportion (p) where the population is
finite and known (which is the case here as the
population from which we are sampling is the Scopus
database), (N) is not systematically much larger than
the sample size (n), and in which the values are discrete
(for example, papers are discrete as one does not
publish one third of a paper), given a critical score Z
(set for a confidence level of 0.95 in the study, i.e. Z ≈ 1.96), ME is calculated
as follows:
$ME = Z \sqrt{\frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}} + \frac{0.5}{n}$
SAMPLING AND METROLOGY
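A minimal sketch of the margin-of-error calculation, assuming the formula combines the usual standard error of a proportion, a finite population correction (N - n) / (N - 1) and a continuity correction of 0.5 / n. The default Z of 1.96 is the critical value for a 95% confidence level; the sample and population sizes below are made up and are not the figures from the study.

```python
import math


def margin_of_error(p: float, n: int, N: int, z: float = 1.96) -> float:
    """Margin of error for a proportion p from a sample of n out of N items.

    z=1.96 assumes a two-sided 95% confidence level.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if N < 2 or not 0 < n <= N:
        raise ValueError("need 0 < n <= N and N >= 2")
    fpc = (N - n) / (N - 1)  # finite population correction
    return z * math.sqrt(p * (1 - p) / n * fpc) + 0.5 / n


# Hypothetical example: 10,000 papers sampled from a population of 1,000,000.
me = margin_of_error(p=0.5, n=10_000, N=1_000_000)
print(f"margin of error: {me:.4f}")
```

When the whole population is sampled (n = N) the square-root term vanishes and only the continuity correction remains, which is a quick sanity check on the finite population correction.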
19. 19
The harvesting engine developed by Science-Metrix
searches specific sites, including SciELO, PubMed
Central, ResearchGate and CiteSeerX
It also uses a locally hosted version of the metadata of
large-scale specialised repositories such as arXiv
It systematically harvests metadata from institutional
repositories listed in ROAR and OpenDOAR
Finally, a portion of the harvesting
engine works in the cloud and searches for freely
available papers
MEASURING THE % OF OA PAPERS
20. 20
For Gold Journal OA articles, an estimate of the
proportion of papers was made from the random
sample by matching the journals that were known to be
Gold to the year a paper was published
Journals were obtained from the Directory of Open
Access Journals (DOAJ) and the list of OA journals in
PubMed Central
This was done by matching journals’ ISSN, E-ISSN and
names from Scopus to the relevant records in the
sample
MEASURING THE % OF OA PAPERS
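The matching described on this slide could look roughly like the sketch below. It is an illustration under assumptions, not the actual Science-Metrix procedure: the record layout, the `gold_since` field (the first year a journal is treated as Gold) and the rule "try ISSN and E-ISSN first, then fall back to the journal name" are all hypothetical, and the ISSNs are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GoldJournal:
    name: str
    issns: frozenset   # print and electronic ISSNs
    gold_since: int    # hypothetical: first year the journal counts as Gold


@dataclass(frozen=True)
class Paper:
    journal_name: str
    issn: Optional[str]
    e_issn: Optional[str]
    year: int


def normalize_issn(issn: Optional[str]) -> Optional[str]:
    """Strip the hyphen and upper-case the check character ('x' -> 'X')."""
    if not issn:
        return None
    return issn.replace("-", "").strip().upper() or None


def normalize_name(name: str) -> str:
    return " ".join(name.lower().split())


def build_index(journals):
    by_issn, by_name = {}, {}
    for journal in journals:
        for issn in journal.issns:
            key = normalize_issn(issn)
            if key:
                by_issn[key] = journal
        by_name[normalize_name(journal.name)] = journal
    return by_issn, by_name


def is_gold(paper: Paper, by_issn: dict, by_name: dict) -> bool:
    """True if the paper's journal is in the Gold list for the paper's year."""
    journal = None
    for raw in (paper.issn, paper.e_issn):
        key = normalize_issn(raw)
        if key in by_issn:
            journal = by_issn[key]
            break
    if journal is None:
        journal = by_name.get(normalize_name(paper.journal_name))
    return journal is not None and paper.year >= journal.gold_since


# Made-up Gold list and sample records.
gold_list = [GoldJournal("Journal of Examples", frozenset({"1234-5678", "8765-4321"}), 2005)]
by_issn, by_name = build_index(gold_list)

sample = [
    Paper("J. Examples", "12345678", None, 2010),        # ISSN match, after 2005
    Paper("Journal of  Examples", None, None, 2003),     # name match, before 2005
    Paper("Some Subscription Journal", "1111-2222", None, 2010),
]
gold_share = sum(is_gold(p, by_issn, by_name) for p in sample) / len(sample)
print(f"Gold share in sample: {gold_share:.2f}")
```

Comparing the publication year with the year the journal became Gold avoids counting papers that appeared while the journal was still subscription-based.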
21. 21
Evolution of the proportion of OA scientific papers as
measured in April 2013 and April 2014, 1996–2013
RESULTS
[Figure: line chart of the % of papers available in OA (y-axis, 0% to 60%) by publication year (x-axis, 1994 to 2014). Series: Adjusted OA April 2014; Adjusted OA April 2013; Measured OA April 2014; Measured OA April 2013.]
22. 22
Translation of OA availability between April 2013 and
April 2014
RESULTS
[Figure: line chart of the % of papers available in OA (y-axis, 0% to 60%) by publication year (x-axis, 2003 to 2012). Series: Adjusted OA April 2014; Adjusted OA April 2013.]
Fitted exponential trend lines, in the order shown on the slide:
y = 2E-21·e^(0.0234x), R² = 0.976
y = 3E-17·e^(0.0186x), R² = 0.9473
23. 23
OA backfilling between April 2013 and April 2014 of
papers published in 1996–2011
RESULTS
[Figure: chart of the number of OA papers backfilled between April 2013 and April 2014 (y-axis, 0 to 120,000) by publication year (x-axis, 1994 to 2014).]
Fitted exponential trend line: y = 2E-112·e^(0.1335x), R² = 0.9976
24. 24
Growth of the number of papers available in OA as
measured in April 2014, 1996–2013
RESULTS
[Figure: chart of the number of papers in OA (y-axis, 0 to 1,000,000) by publication year (x-axis, 1994 to 2014). Series: Adjusted OA; Measured OA.]
Fitted exponential trend line: y = 2E-73·e^(0.09x), R² = 0.9971
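The exponent of the fitted curve on this slide (0.09 per year) can be turned into more intuitive figures. The short sketch below assumes the fit has the form y = a·e^(kx) with x in calendar years, so that the year-on-year growth is e^k - 1 and the doubling time is ln(2) / k. The coefficient a is not used because it is too heavily rounded on the slide to reproduce the curve.

```python
import math


def annual_growth_rate(k: float) -> float:
    """Year-on-year growth implied by an exponential rate k."""
    return math.exp(k) - 1


def doubling_time(k: float) -> float:
    """Years needed for the quantity to double at exponential rate k."""
    return math.log(2) / k


k = 0.09  # exponent reported for the number of papers available in OA
print(f"annual growth: {annual_growth_rate(k):.1%}")
print(f"doubling time: {doubling_time(k):.1f} years")
```

Under these assumptions the fit corresponds to growth of a little over 9% a year, or a doubling in slightly under eight years.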
25. 25
Scientific impact of OA and non-OA papers published in
1996–2011
RESULTS
[Figure: line chart of the average of relative citations (ARC, 1 = world average; y-axis, 0.0 to 1.8) by publication year (x-axis, 1996 to 2011). Series: OA; All Papers; Not OA.]
26. 26
Impact contest by OA type by field, 2009–2011
N.B. Here, Gold refers to full-Gold journals, not to Gold papers in hybrid journals
RESULTS
| Field | 1st place | ARC | 2nd place | ARC | 3rd place | ARC | Least impact | ARC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Agriculture, Fisheries & Forestry | Green OA | 1.57 | Other OA | 1.32 | Not OA | 0.88 | Gold OA | 0.51 |
| Biology | Other OA | 1.37 | Green OA | 1.30 | Not OA | 0.69 | Gold OA | 0.47 |
| Biomedical Research | Other OA | 1.23 | Green OA | 1.10 | Gold OA | 0.91 | Not OA | 0.65 |
| Built Environment & Design | Green OA | 1.56 | Other OA | 1.28 | Not OA | 0.86 | Gold OA | 0.19 |
| Chemistry | Other OA | 1.34 | Green OA | 1.28 | Not OA | 0.95 | Gold OA | 0.34 |
| Clinical Medicine | Other OA | 1.56 | Green OA | 1.08 | Gold OA | 0.64 | Not OA | 0.63 |
| Communication & Textual Studies | Other OA | 1.82 | Green OA | 1.51 | Not OA | 0.66 | Gold OA | 0.73 |
| Earth & Environmental Sciences | Green OA | 1.46 | Other OA | 1.26 | Gold OA | 0.98 | Not OA | 0.72 |
| Economics & Business | Green OA | 1.46 | Other OA | 1.30 | Not OA | 0.71 | Gold OA | 0.22 |
| Enabling & Strategic Technologies | Green OA | 1.68 | Other OA | 1.53 | Not OA | 0.83 | Gold OA | 0.52 |
| Engineering | Green OA | 1.84 | Other OA | 1.38 | Not OA | 0.83 | Gold OA | 0.55 |
| General Arts, Humanities & Social Sciences | Green OA | 1.74 | Other OA | 1.49 | Not OA | 0.73 | Gold OA | 0.13 |
| General Science & Technology | Green OA | 2.56 | Other OA | 2.24 | Gold OA | 0.69 | Not OA | 0.11 |
| Historical Studies | Green OA | 2.37 | Other OA | 1.61 | Not OA | 0.76 | Gold OA | 0.37 |
| Information & Communication Technology | Green OA | 1.62 | Other OA | 1.36 | Gold OA | 0.76 | Not OA | 0.69 |
| Mathematics & Statistics | Green OA | 1.35 | Other OA | 1.11 | Not OA | 0.75 | Gold OA | 0.67 |
| Philosophy & Theology | Green OA | 1.72 | Other OA | 1.63 | Gold OA | 0.86 | Not OA | 0.72 |
| Physics & Astronomy | Green OA | 1.43 | Gold OA | 1.18 | Other OA | 1.04 | Not OA | 0.73 |
| Psychology & Cognitive Sciences | Other OA | 1.35 | Green OA | 1.31 | Not OA | 0.66 | Gold OA | 0.59 |
| Public Health & Health Services | Other OA | 1.38 | Green OA | 1.30 | Not OA | 0.76 | Gold OA | 0.71 |
| Social Sciences | Green OA | 1.54 | Other OA | 1.44 | Not OA | 0.76 | Gold OA | 0.52 |
| Visual & Performing Arts | Green OA | 2.16 | Other OA | 1.86 | Not OA | 0.77 | Gold OA | 0.29 |
| Total | Green OA | 1.53 | Other OA | 1.36 | Not OA | 0.76 | Gold OA | 0.61 |
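As a quick cross-check of the table, the "Least impact" column can be tallied to see in how many fields each type comes last. The dictionary below is transcribed from that column for the 22 fields (the Total row is left out); the variable names are only for illustration.

```python
from collections import Counter

# "Least impact" type per field, transcribed from the table (Total row excluded).
least_impact = {
    "Agriculture, Fisheries & Forestry": "Gold OA",
    "Biology": "Gold OA",
    "Biomedical Research": "Not OA",
    "Built Environment & Design": "Gold OA",
    "Chemistry": "Gold OA",
    "Clinical Medicine": "Not OA",
    "Communication & Textual Studies": "Gold OA",
    "Earth & Environmental Sciences": "Not OA",
    "Economics & Business": "Gold OA",
    "Enabling & Strategic Technologies": "Gold OA",
    "Engineering": "Gold OA",
    "General Arts, Humanities & Social Sciences": "Gold OA",
    "General Science & Technology": "Not OA",
    "Historical Studies": "Gold OA",
    "Information & Communication Technology": "Not OA",
    "Mathematics & Statistics": "Gold OA",
    "Philosophy & Theology": "Not OA",
    "Physics & Astronomy": "Not OA",
    "Psychology & Cognitive Sciences": "Gold OA",
    "Public Health & Health Services": "Gold OA",
    "Social Sciences": "Gold OA",
    "Visual & Performing Arts": "Gold OA",
}

tally = Counter(least_impact.values())
not_oa_last = sorted(f for f, t in least_impact.items() if t == "Not OA")

print(tally["Not OA"], "fields where Not OA has the least impact")
print(tally["Gold OA"], "fields where Gold OA has the least impact")
```

The tally gives 7 fields in which non-OA papers rank last, which matches the figure quoted in the conclusion slides.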
27. 27
OA is a fast-moving phenomenon
It is also quite complex to understand and to
measure
Uptake of OA limited by heterogeneity and
challenges in discovery
CONCLUSION
28. 28
Growth of OA should be understood to
comprise two main aspects:
Organic growth as more publishers, researchers
and librarians increasingly make freshly
published papers freely available
“Backfilling” of already published papers by
researchers and librarians and dis-embargoing
of previously locked papers by publishers
contribute to a translation of the availability
curve
CONCLUSION
29. 29
On average, openly accessible papers have a decidedly
greater impact
In 7 fields, publishing in subscription-based journals and not
self-archiving is the worst possible strategy
In these fields, publishing in Gold journals yields greater impact than
publishing in subscription-based journals with no self-archiving, even though
these Gold journals are much younger and less established
No longer adequate to publish and forget papers
One has to actively market papers and think of post-
publishing communication strategies
Considering the high value of the knowledge contained
in papers, and their high public cost, working to
maximise diffusion and uptake is the least one can do
CONCLUSION
30. 30
Visit 1science to learn about our solution to radically
facilitate the discovery and use of peer-reviewed open
access papers
Visit Science-Metrix to learn about our evaluation and
measurement activities
THANK YOU