How Much Does $1.7 Billion Buy?
A Comparison of Scientific Journal
Articles to Their Pre-print Versions
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
2
Research Team
Peter Broadwell @peterbroadwell
Sharon E. Farb @farbthink
Todd R. Grappone @liber8er
Martin Klein @mart1nkle1n
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
3
• Global STM publishing market $25.2 billion
• 55% of STM comes from USA
• 40% of STM is for journals ($10 billion)
• 68-75% is coming out of libraries’
budgets
Explosion of data
intensive research
Increase in Open Access
USA is the largest contributor in terms of global output of research papers: 23%
Global Trends in Scientific Output
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
4
UC Publication Impact
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
5
What is the $ of Knowledge?
Source: ARL Statistics, Association of Research Libraries, Washington, D.C.
Prices Set by
Profit Maximizing
Publishers are
Determined NOT
by costs, but by
what the market
will bear.
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
6
OA by Disciplines
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
7
Pre-print v. Final Published?
arXiv
• The annual
budget for
arXiv is circa
$826,000 for
2013 - 2017
Final Published
• English language
STM journals:
$10 billion in 2013
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
8
Role of Publisher*
• Entrepreneur
• Copyediting
• Tagging
• Marketer
• Distributor
• E-Host
*STM Report 2015: An Overview of Scientific and Scholarly Publishing
Michael Ware and Michael Mabe
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
9
Working Assumptions
• If the publishers’ argument is valid, the text of a
pre-print paper should vary significantly from its
corresponding post-print version
• By applying standard similarity measures, we
should be able to detect and quantify such
differences.
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
10
Methodology
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
11
Data Gathering
Assembling a pre-print corpus
• Source: arXiv.org
• 1.1 million publication records
• metadata (typical DC, including DOI) obtained
via OAI-PMH interface
• PDF versions of articles available via Amazon’s
S3 service (using “requester pays” option)
• Latest version used if multiple available
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
12
Data Gathering
Finding a matching post-print corpus
• Extract DOIs from arXiv metadata
• 44.5% or articles have DOI
• CrossRef’s Metadata Search API
• Match by DOI: download XML marked up or
PDF full-text version of article
• Metadata included in XML
• Access based on UCLA’s serial subscriptions
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
13
Data Processing
Both pre-print and post-print corpora
• Convert PDF to XML where needed
• Extract sections from XML
• Title, authors, abstract, body, references, publication
date
• Occasionally, sections are missing
• Also try extraction from arXiv’s OAI-PMH interface
• Data provided by authors: abstracts, titles
• Texts often have markup, complicating comparisons
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
14
Text Comparison Methods
• Length ratio
• Levenshtein ratio
• Cosine similarity
• Simhash
• Jaccard coefficient
• Sorensen similarity
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
15
Text Comparison Methods 1/3
Length ratio:
Ratio of the shorter text’s length to the longer text’s length
Example
“Four score and seven”  “Four score and seven years ago“
Four score and seven (20 characters)
Four score and seven years ago (30 characters)
Length ratio: 20/30 = .667
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
16
Text Comparison Methods 2/3
Levenshtein edit distance:
Number of operations (insert, delete, substitute) needed to
transform one string into the other
Example
“The rose is red”  “The roose is rot”
• rose -> roose (1 insertion)
• red -> rod (1 substitution, ‘o’ for ‘e’)
• rod -> rot (1 substitution, ‘t’ for ‘d’)
Levenshtein distance: 1 + 1 + 1 = 3
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
17
Text Comparison Methods 2/3
Levenshtein edit distance:
Number of operations (insert, delete, substitute) needed to
transform one string into the other
Example
“The rose is red”  “The roose is rot” (edit distance: 3)
Levenshtein ratio:
length of text 1 + length of text 2 – edit distance
length of text 1 + length of text 2
15 + 16 – 3 = .903
15 + 16
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
18
Text Comparison Methods 3/3
Cosine similarity:
Two texts have a high similarity (max value: 1) if they share
many of the same words in the same proportions.
In practice, common words (“and,” “the,” “is”) are ignored, and
words that are more characteristic of a given text have
greater importance.
Example
“QCD sum rule approach for scalar mesons as four-quark states”
“QCD sum rule approach for the light scalar mesons as four-quark states”
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
19
Text Comparison Methods 3/3
Cosine similarity:
Two texts have a high similarity (max value: 1) if they share
many of the same words in the same proportions.
In practice, common words (“and,” “the,” “is”) are ignored,
and words that are more characteristic of a given text have
greater importance.
Example
qcd sum rule approach for scalar mesons as four quark states
qcd sum rule approach for the light scalar mesons as four quark states
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
20
Text Comparison Methods 3/3
Cosine similarity:
Two texts have a high similarity (max value: 1) if they share
many of the same words in the same proportions.
In practice, common words (“and,” “the,” “is”) are ignored,
and words that are more characteristic of a given text have
greater importance.
Example
qcd sum rule approach for scalar mesons as four quark states
qcd sum rule approach for the light scalar mesons as four quark states
Cosine similarity: 0.8955
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
21
Comparison of Sections
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
22
Comparison of Sections
Title
Abstract
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
23
Comparison of Sections
Body
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
24
Comparison of Sections
(Authors) References
(future
work)
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
25
Preliminary Findings
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
26
Data Gathering Results
Both pre-print and post-print corpora
• 11,017 full text articles matched by DOI
• Most papers within date range 2003 – 2015
• 96% of post-prints published by Elsevier
• “Physics Letters B” top journal
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
27
Number of Papers by Category (arXiv)
01000200030004000
hep−ph
m
ath
hep−th
astro−ph
gr−qc
hep−excond−m
at
nucl−th
physics
nucl−ex
m
ath−ph
cs
hep−lat
q−bio
quant−ph
nlin
stat
q−fin
solv−int
funct−an
q−algcom
p−gas
Physics
Non−Physics
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
28
Title Comparison
Browse findings at http://sologlo.library.ucla.edu/prepost
1 ... 0.9 0.9 ... 0.8 0.8 ... 0.7 0.7 ... 0.6 0.6 ... 0.5 0.5 ... 0.4 0.4 ... 0.3 0.3 ... 0.2 0.2 ... 0.1 0.1 ... 0
1100020003000400050006000700080009000
0102030405060708090100
Length
Levenshtein
Cosine
Percentage
Papers
Similarity (1 = most similar)
%ofallpapers
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
29
Abstract Comparison
Browse findings at http://sologlo.library.ucla.edu/prepost
1 ... 0.9 0.9 ... 0.8 0.8 ... 0.7 0.7 ... 0.6 0.6 ... 0.5 0.5 ... 0.4 0.4 ... 0.3 0.3 ... 0.2 0.2 ... 0.1 0.1 ... 0
1100020003000400050006000700080009000
0102030405060708090100
Length
Levenshtein
Cosine
Percentage
Papers
Similarity (1 = most similar)
%ofallpapers
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
30
Body Comparison
Browse findings at http://sologlo.library.ucla.edu/prepost
1 ... 0.9 0.9 ... 0.8 0.8 ... 0.7 0.7 ... 0.6 0.6 ... 0.5 0.5 ... 0.4 0.4 ... 0.3 0.3 ... 0.2 0.2 ... 0.1 0.1 ... 0
110002000300040005000600070008000
0102030405060708090100
Length
Levenshtein
Cosine
Percentage
Papers
Similarity (1 = most similar)
%ofallpapers
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
31
10.1016/j.physletb.2006.10.068
Physics Letters B
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
32
Publication Dates0500100015002000
1−10
11−20
21−30
31−40
41−50
51−60
61−70
71−80
81−90
91−100
101−200
201−300
301−400
401−500
>500
Days
Pre−print first
Post−print first
Papers
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
33
Author Similarity (very preliminary)
• Not trivial to determine due to
• Different name formatting
• Varying metadata quality
“Save” statement: 7,731/9,935 articles (78%)
have identical author lists (number and ranks of authors)
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
34
Next Steps
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
35
More to Come!
• Refine extraction/comparison of authors and
references
• Overlay with ISI Impact factor and usage statistics
• Expand to other disciplines/publishers
• Social Sciences
• Humanities
• Economy
• Linguistics
• Operate at scale
• Seeking collaboration
• Enable other institutions to conduct similar experiments
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
36
Discussion
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
37
Questions
• Collecting imperative
• New forms of scholarly communication
• Developing data collections
• UC faculty is contributing to 1/12 of Elsevier
publications, yet we license back 100%
• Open Access
• Not even APCs added to the calculations
• Revenue: $182 million in 2012, growing
30% per year
• What about preservation?
How Much Does $1.7 Billion Buy?
@farbthink, @liber8er, @peterbroadwell, @mart1nkle1n
#CNI15F, Washington, DC, 12/15/2015
38
How Much Does $1.7 Billion Buy?
A Comparison of Scientific Journal
Articles to Their Pre-print Versions
Sharon E. Farb @farbthink
Peter Broadwell @peterbroadwell
Martin Klein @mart1nkle1n
Todd R. Grappone @liber8er

How much does $1.7 billion buy?

  • 1.
    How Much Does$1.7 Billion Buy? A Comparison of Scientific Journal Articles to Their Pre-print Versions
  • 2.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 2 Research Team Peter Broadwell @peterbroadwell Sharon E. Farb @farbthink Todd R. Grappone @liber8er Martin Klein @mart1nkle1n
  • 3.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 3 • Global STM publishing market $25.2 billion • 55% of STM comes from USA • 40% of STM is for journals ($10 billion) • 68-75% is coming out of libraries’ budgets Explosion of data intensive research Increase in Open Access USA is the largest contributor in terms of global output of research papers: 23% Global Trends in Scientific Output
  • 4.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 4 UC Publication Impact
  • 5.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 5 What is the $ of Knowledge? Source: ARL Statistics, Association of Research Libraries, Washington, D.C. Prices Set by Profit Maximizing Publishers are Determined NOT by costs, but by what the market will bear.
  • 6.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 6 OA by Disciplines
  • 7.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 7 Pre-print v. Final Published? arXiv • The annual budget for arXiv is circa $826,000 for 2013 - 2017 Final Published • English language STM journals: $10 billion in 2013
  • 8.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 8 Role of Publisher* • Entrepreneur • Copyediting • Tagging • Marketer • Distributor • E-Host *STM Report 2015: An Overview of Scientific and Scholarly Publishing Michael Ware and Michael Mabe
  • 9.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 9 Working Assumptions • If the publishers’ argument is valid, the text of a pre-print paper should vary significantly from its corresponding post-print version • By applying standard similarity measures, we should be able to detect and quantify such differences.
  • 10.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 10 Methodology
  • 11.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 11 Data Gathering Assembling a pre-print corpus • Source: arXiv.org • 1.1 million publication records • metadata (typical DC, including DOI) obtained via OAI-PMH interface • PDF versions of articles available via Amazon’s S3 service (using “requester pays” option) • Latest version used if multiple available
  • 12.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 12 Data Gathering Finding a matching post-print corpus • Extract DOIs from arXiv metadata • 44.5% or articles have DOI • CrossRef’s Metadata Search API • Match by DOI: download XML marked up or PDF full-text version of article • Metadata included in XML • Access based on UCLA’s serial subscriptions
  • 13.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 13 Data Processing Both pre-print and post-print corpora • Convert PDF to XML where needed • Extract sections from XML • Title, authors, abstract, body, references, publication date • Occasionally, sections are missing • Also try extraction from arXiv’s OAI-PMH interface • Data provided by authors: abstracts, titles • Texts often have markup, complicating comparisons
  • 14.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 14 Text Comparison Methods • Length ratio • Levenshtein ratio • Cosine similarity • Simhash • Jaccard coefficient • Sorensen similarity
  • 15.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 15 Text Comparison Methods 1/3 Length ratio: Ratio of the shorter text’s length to the longer text’s length Example “Four score and seven”  “Four score and seven years ago“ Four score and seven (20 characters) Four score and seven years ago (30 characters) Length ratio: 20/30 = .667
  • 16.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 16 Text Comparison Methods 2/3 Levenshtein edit distance: Number of operations (insert, delete, substitute) needed to transform one string into the other Example “The rose is red”  “The roose is rot” • rose -> roose (1 insertion) • red -> rod (1 substitution, ‘o’ for ‘e’) • rod -> rot (1 substitution, ‘t’ for ‘d’) Levenshtein distance: 1 + 1 + 1 = 3
  • 17.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 17 Text Comparison Methods 2/3 Levenshtein edit distance: Number of operations (insert, delete, substitute) needed to transform one string into the other Example “The rose is red”  “The roose is rot” (edit distance: 3) Levenshtein ratio: length of text 1 + length of text 2 – edit distance length of text 1 + length of text 2 15 + 16 – 3 = .903 15 + 16
  • 18.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 18 Text Comparison Methods 3/3 Cosine similarity: Two texts have a high similarity (max value: 1) if they share many of the same words in the same proportions. In practice, common words (“and,” “the,” “is”) are ignored, and words that are more characteristic of a given text have greater importance. Example “QCD sum rule approach for scalar mesons as four-quark states” “QCD sum rule approach for the light scalar mesons as four-quark states”
  • 19.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 19 Text Comparison Methods 3/3 Cosine similarity: Two texts have a high similarity (max value: 1) if they share many of the same words in the same proportions. In practice, common words (“and,” “the,” “is”) are ignored, and words that are more characteristic of a given text have greater importance. Example qcd sum rule approach for scalar mesons as four quark states qcd sum rule approach for the light scalar mesons as four quark states
  • 20.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 20 Text Comparison Methods 3/3 Cosine similarity: Two texts have a high similarity (max value: 1) if they share many of the same words in the same proportions. In practice, common words (“and,” “the,” “is”) are ignored, and words that are more characteristic of a given text have greater importance. Example qcd sum rule approach for scalar mesons as four quark states qcd sum rule approach for the light scalar mesons as four quark states Cosine similarity: 0.8955
  • 21.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 21 Comparison of Sections
  • 22.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 22 Comparison of Sections Title Abstract
  • 23.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 23 Comparison of Sections Body
  • 24.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 24 Comparison of Sections (Authors) References (future work)
  • 25.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 25 Preliminary Findings
  • 26.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 26 Data Gathering Results Both pre-print and post-print corpora • 11,017 full text articles matched by DOI • Most papers within date range 2003 – 2015 • 96% of post-prints published by Elsevier • “Physics Letters B” top journal
  • 27.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 27 Number of Papers by Category (arXiv) 01000200030004000 hep−ph m ath hep−th astro−ph gr−qc hep−excond−m at nucl−th physics nucl−ex m ath−ph cs hep−lat q−bio quant−ph nlin stat q−fin solv−int funct−an q−algcom p−gas Physics Non−Physics
  • 28.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 28 Title Comparison Browse findings at http://sologlo.library.ucla.edu/prepost 1 ... 0.9 0.9 ... 0.8 0.8 ... 0.7 0.7 ... 0.6 0.6 ... 0.5 0.5 ... 0.4 0.4 ... 0.3 0.3 ... 0.2 0.2 ... 0.1 0.1 ... 0 1100020003000400050006000700080009000 0102030405060708090100 Length Levenshtein Cosine Percentage Papers Similarity (1 = most similar) %ofallpapers
  • 29.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 29 Abstract Comparison Browse findings at http://sologlo.library.ucla.edu/prepost 1 ... 0.9 0.9 ... 0.8 0.8 ... 0.7 0.7 ... 0.6 0.6 ... 0.5 0.5 ... 0.4 0.4 ... 0.3 0.3 ... 0.2 0.2 ... 0.1 0.1 ... 0 1100020003000400050006000700080009000 0102030405060708090100 Length Levenshtein Cosine Percentage Papers Similarity (1 = most similar) %ofallpapers
  • 30.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 30 Body Comparison Browse findings at http://sologlo.library.ucla.edu/prepost 1 ... 0.9 0.9 ... 0.8 0.8 ... 0.7 0.7 ... 0.6 0.6 ... 0.5 0.5 ... 0.4 0.4 ... 0.3 0.3 ... 0.2 0.2 ... 0.1 0.1 ... 0 110002000300040005000600070008000 0102030405060708090100 Length Levenshtein Cosine Percentage Papers Similarity (1 = most similar) %ofallpapers
  • 31.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 31 10.1016/j.physletb.2006.10.068 Physics Letters B
  • 32.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 32 Publication Dates0500100015002000 1−10 11−20 21−30 31−40 41−50 51−60 61−70 71−80 81−90 91−100 101−200 201−300 301−400 401−500 >500 Days Pre−print first Post−print first Papers
  • 33.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 33 Author Similarity (very preliminary) • Not trivial to determine due to • Different name formatting • Varying metadata quality “Save” statement: 7,731/9,935 articles (78%) have identical author lists (number and ranks of authors)
  • 34.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 34 Next Steps
  • 35.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 35 More to Come! • Refine extraction/comparison of authors and references • Overlay with ISI Impact factor and usage statistics • Expand to other disciplines/publishers • Social Sciences • Humanities • Economy • Linguistics • Operate at scale • Seeking collaboration • Enable other institutions to conduct similar experiments
  • 36.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 36 Discussion
  • 37.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 37 Questions • Collecting imperative • New forms of scholarly communication • Developing data collections • UC faculty is contributing to 1/12 of Elsevier publications, yet we license back 100% • Open Access • Not even APCs added to the calculations • Revenue: $182 million in 2012, growing 30% per year • What about preservation?
  • 38.
    How Much Does$1.7 Billion Buy? @farbthink, @liber8er, @peterbroadwell, @mart1nkle1n #CNI15F, Washington, DC, 12/15/2015 38 How Much Does $1.7 Billion Buy? A Comparison of Scientific Journal Articles to Their Pre-print Versions Sharon E. Farb @farbthink Peter Broadwell @peterbroadwell Martin Klein @mart1nkle1n Todd R. Grappone @liber8er