1. Technology Blitz: Making Sense of the Technology Landscape for Publishers
The State of Linking and Searching
Chuck Koscher
Director of Technology
SSP
Technology Blitz 1
2. In the beginning …
Tim Berners-Lee conceived the WWW to provide
universal access to documents on the network
• Uniquely identify everything
• A way to access them (Hyper Links)
To do this he invented 3 things …
1. A simple protocol to communicate over (HTTP)
2. A common document language (HTML)
3. An identification scheme (Universal Resource Identifier, URI; Uniform Resource Locator, URL)
3. URLs are Unreliable
Links to a location are fragile
Documents move around the Web and change over time (versions)
Available in multiple formats and multiple places (appropriate copy)
Spinellis, “The Decay and Failures of Web References”. Communications of
the ACM. 46(1), 71-77, 2003
Examined 4300 URL citations in articles from ACM and IEEE
After 4 years 40-50% of referenced URLs become inaccessible
Lawrence et al. “Persistence of Web References in Scientific Research”.
IEEE Computer, 34(2), 26-31, 2001
Examined 67,000 URL citations from 270,000 articles
23% from 1999 and 54% from 1994 were invalid
Brooks, “Broken Links: Just How Rapidly Do Science Education Hyperlinks
Go Extinct?”. University of Nebraska, 2003
Loss after 35 months:
‘.edu’: 40% ‘.com’ : 53% ‘.org’ : 25%
Link to an identifier, not to a location
4. Why link in the first place
‘Publish or perish’ corollary: ‘if it is not on-line, it doesn’t exist’
If it is on-line but not linked to, no one will find it
If it is on-line but does not link out, it is not as valuable
Linked references are expected
An actionable bibliography is no longer an enhancement
‘Added value’ features are becoming more sophisticated
(for example, cited-by links)
To positively influence academic standards
Many students are likely to use information found on search engines and
various Web sites as research material…and faculty often report concerns
about the number of URLs included in research paper bibliographies and the
decrease in citations from traditional scholarly sources. Pew Internet and
American Life Project College Students Survey. http://www.pewinternet.org/
5. Topics
What is a DOI
Why CrossRef
How CrossRef and DOIs work
New linking features of the DOI & at CrossRef
OpenURL & CrossRef
Local Link Servers & CrossRef
6. What is a DOI
A Digital Object Identifier (DOI) is a unique string created to identify a
piece of intellectual property in an online environment.
ANSI/NISO Z39.84-2000
The prefix is assigned by an IDF registration agency (CrossRef)
The suffix is an opaque identifier assigned by the publisher
Suffixes must be unique within a prefix
Some publishers use a transparent scheme
A DOI is intended to identify a ‘work’ not a manifestation
We’ll need multiple resolution to fully realize this goal
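The prefix/suffix structure described above can be illustrated with a minimal sketch. The function name `split_doi` is an assumption made for illustration; the example DOI is the one that appears later in these slides. The sketch only splits the string and does not check that the DOI is actually registered.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    The prefix is what the registration agency assigns; the suffix is
    the publisher's own (ideally opaque) identifier.
    """
    prefix, sep, suffix = doi.partition("/")
    # DOI prefixes start with "10."; anything else is rejected here.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return prefix, suffix


prefix, suffix = split_doi("10.1016/S0040-4039(01)80789-9")
print(prefix)  # 10.1016
print(suffix)  # S0040-4039(01)80789-9
```

Because the suffix is opaque, software should treat it as an uninterpreted string, even when a publisher happens to use a transparent scheme.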
7. What is CrossRef.org?
An independent, non-profit membership organization
A cross-publisher citation linking network based on the
DOI system
An official DOI registration agency
There are now five DOI registration agencies. CrossRef was the
first, and is the only one whose mission is to enable citation
linking and serve the scientific/scholarly community
8. CrossRef’s Mission
To serve as the complete citation linking backbone for all scholarly
literature in electronic form, focusing on services that are best
achieved through collective agreement by publishers
– Books
– Conference proceedings
– Dissertations
– Patents
– Gray literature
– Etc…
9. Key benefits of the CrossRef system
As a business infrastructure for linking
No bilateral linking agreements needed – an agreement with
CrossRef is a linking agreement with all CrossRef publishers
Publishers maintain their own business models while enhancing
value and functionality of their content
Collaborative foundation for future developments in information
access
As a technology service
A search engine to find DOIs based on an article’s metadata
(title / author / volume / issue / page / year)
Operates an OpenURL 1.0 resolver
As a technology provider
Standards and industry coordination
Linking expertise
10. The CrossRef / DOI Community
CERN
…and more!
• Gateway to the DOI world
• Develops and maintains the DOI standard
• Develops and maintains the Handle system upon which the DOI executes
11. How is CrossRef used
by Publishers …
1. Deposit the metadata for their articles in an XML file
(Journal / Authors / Article Title / Volume / Issue /Page / Year)
2. Parse their articles and extract the elements of each reference then query
CrossRef to find the DOIs
3. Make the reference link active by using http://dx.doi.org/<some-DOI>
May cache the DOIs for later use
Retrieve the DOIs as needed
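Step 3 of the publisher workflow, together with the caching option, might look roughly like the sketch below. Everything here is an assumption made for illustration: `doi_link`, `CachingDoiFinder`, the in-memory `FAKE_INDEX` standing in for a CrossRef query, and the made-up DOI `10.5555/example.870`. Only the resolver address comes from the slide.

```python
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # resolver address shown on the slide


def doi_link(doi):
    """Turn a DOI into an actionable link (step 3)."""
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return RESOLVER + quote(doi, safe="/")


class CachingDoiFinder:
    """Wraps a lookup function and remembers its answers (the cache option)."""

    def __init__(self, lookup):
        self._lookup = lookup  # callable: reference -> DOI or None
        self._cache = {}
        self.lookups = 0       # how many times the real lookup was called

    def find(self, reference):
        if reference not in self._cache:
            self.lookups += 1
            self._cache[reference] = self._lookup(reference)
        return self._cache[reference]


# Hypothetical stand-in for a CrossRef query: (journal, volume, page, year) -> DOI.
FAKE_INDEX = {("J. Example", "32", "870", "2002"): "10.5555/example.870"}

finder = CachingDoiFinder(FAKE_INDEX.get)
reference = ("J. Example", "32", "870", "2002")
doi = finder.find(reference)
finder.find(reference)   # second call is served from the cache
print(doi_link(doi))     # http://dx.doi.org/10.5555/example.870
print(finder.lookups)    # 1
```

Caching trades freshness for speed; retrieving DOIs as needed avoids stale entries at the cost of a query per page view.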
by Libraries …
Directly query CrossRef
(CERN is placing doi.crossref.org/resolve?<an-OpenURL> in their links)
Through local link servers to ‘reverse lookup’ the metadata for a DOI
by the public …
Indirectly every time they follow a (CrossRef) DOI enabled link
Find DOIs using the free lookup form at www.crossref.org
12. What the DOI does
Provides truly persistent hyper links to the identified entities
Makes linking more reliable
Enables locating the entity through the CrossRef database
Makes linking more accessible
13. How the DOI appears in use
15. How DOIs & CrossRef Work
HTTP://dx.doi.org/10.1016/S0040-4039(01)80789-9
HTTP://dx.doi.org/ is the constant address of the Resolver
10.1016/S0040-4039(01)80789-9 is the DOI used to ‘lookup’ the entity’s URL
[Diagram labels: Publisher of the target entity, Publisher of the referring entity, User, DOI Resolver, Referrer, Source, Referent, Service, Target; arrows numbered 1 to 6]
1. Deposit article meta-data to CrossRef with the DOI & URL
2. Query CrossRef for the DOI using meta-data
3. Present the referring article to the user with reference links active as DOIs
4. The user clicks on a link
5. Their browser sends the link to the DOI Resolver
6. The Resolver finds the URL and re-directs the user to the target document
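The six steps can be walked through with a toy, in-memory model. This is only a sketch: `ToyDoiSystem`, the `*.example` URLs, and the metadata tuple are hypothetical, and a single dictionary-backed class stands in for both the CrossRef database and the DOI resolver, which are separate services in practice.

```python
class ToyDoiSystem:
    """In-memory stand-in for the CrossRef database plus the DOI resolver."""

    def __init__(self):
        self._url_by_doi = {}
        self._doi_by_metadata = {}

    def deposit(self, doi, url, metadata):
        """Step 1: the target's publisher registers DOI, URL and metadata."""
        self._url_by_doi[doi] = url
        self._doi_by_metadata[metadata] = doi

    def query(self, metadata):
        """Step 2: the referring publisher finds a DOI from reference metadata."""
        return self._doi_by_metadata.get(metadata)

    def resolve(self, doi):
        """Steps 5 and 6: the resolver maps the DOI to its current URL."""
        return self._url_by_doi[doi]


system = ToyDoiSystem()
doi = "10.1016/S0040-4039(01)80789-9"               # DOI from the slide
metadata = ("Example Journal", "42", "1", "1901")   # hypothetical metadata
system.deposit(doi, "http://old-host.example/article", metadata)

found = system.query(metadata)
link = "http://dx.doi.org/" + found                 # step 3: link shown to the user
print(system.resolve(found))                        # http://old-host.example/article

# The article moves: the publisher re-deposits once, the published link is unchanged.
system.deposit(doi, "http://new-host.example/article", metadata)
print(link)                                         # http://dx.doi.org/10.1016/S0040-4039(01)80789-9
print(system.resolve(found))                        # http://new-host.example/article
```

The last three lines show why the indirection matters: the link embedded in the referring article never changes, only the registry entry behind it.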
16. CrossRef Growth is Steady
[Chart: number of members (axis 0 to 300) and number of journals (axis 0 to 10,000), by month]
17. DOI Clicks
[Chart: DOI clicks per month, from January through October of the following year; vertical axis 0 to 6,000,000]
18. Queries and Matches
[Chart: number of queries and number of matches per month, Jan-02 through Oct-03; vertical axis 0 to 16,000,000]
19. Deposits
[Chart: deposits per month, current articles and backfile articles; vertical axis 0 to 480,000]
20. Using & Getting DOIs is free
(there is a fee for creating DOIs)
Annual fee for libraries has been dropped, effective immediately
Lookup fee will be eliminated as of Jan 04
21. Why DOIs are a good thing
It’s more than just an identifier
A complete resolver system operated by CNRI and the RAs
A governing body (the International DOI Foundation) for
establishing policies and to guide the maintenance, promotion
and development of the entire system.
A tiered organization to provide support at multiple levels
[Diagram: tiers from top to bottom: IDF; RAWG and CNRI; RAs; Prefix Holders; DOI Users]
22. Parameter Passing (a DOI feature)
Parameter passing
Allows the link source to pass information through the DOI
resolver to the link target (very cool!)
Based on OpenURL 1.0
Source links:
With parameter passing: HTTP://dx.doi.org/10.1038/suffix?from=EBSCO
With parameter passing and a standard vocabulary: HTTP://dx.doi.org/10.1038/suffix?rfr_dat=cr_pub=EBSCO
Registered URL: HTTP://www.nature.com/articles?id=123
Target links:
Without parameter passing: HTTP://www.nature.com/articles?id=123
With parameter passing: HTTP://www.nature.com/articles?rft_dat=id=123&from=EBSCO
With parameter passing and a standard vocabulary: HTTP://www.nature.com/articles?rft_dat=id=123&rfr_dat=cr_pub=EBSCO
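A rough sketch of what the resolver does with passed parameters, modeled only on the URL patterns shown on this slide. The function name `pass_parameters` and the rewriting rule (carry the registered query as `rft_dat`, then append the source link's parameters unchanged) are assumptions inferred from the examples, not a description of the resolver's actual implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def pass_parameters(registered_url, source_link):
    """Build the target link from the registered URL and the clicked DOI link."""
    source_query = urlsplit(source_link).query
    if not source_query:
        # No parameters on the DOI link: plain redirect to the registered URL.
        return registered_url
    target = urlsplit(registered_url)
    # Slide pattern: the registered query travels as rft_dat, and the
    # parameters from the source link are appended after it.
    if target.query:
        new_query = f"rft_dat={target.query}&{source_query}"
    else:
        new_query = source_query
    return urlunsplit((target.scheme, target.netloc, target.path,
                       new_query, target.fragment))


registered = "http://www.nature.com/articles?id=123"
print(pass_parameters(registered, "http://dx.doi.org/10.1038/suffix"))
# http://www.nature.com/articles?id=123
print(pass_parameters(registered, "http://dx.doi.org/10.1038/suffix?from=EBSCO"))
# http://www.nature.com/articles?rft_dat=id=123&from=EBSCO
print(pass_parameters(registered, "http://dx.doi.org/10.1038/suffix?rfr_dat=cr_pub=EBSCO"))
# http://www.nature.com/articles?rft_dat=id=123&rfr_dat=cr_pub=EBSCO
```

The standard-vocabulary variant differs only in the parameter name the source uses, which is what lets a target interpret it without a private agreement.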
24. The appropriate copy problem &
Multiple Resolution
Truly a problem seeking a multiple-resolution solution
1. Publisher A produces and hosts a journal from 1999 through 2002
2. Publisher B acquires the journal in 2003 and hosts all back issues
Pre-2003 DOIs are transferred to publisher B and the URLs are reset
to publisher B’s
3. A customer subscribes to publisher A and wants to go there for pre-2003
issues.
Multiple resolution of DOIs will solve this problem in the future
Local link servers solve the problem today
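The scenario above can be modeled in a few lines to show the decision a local link server makes today. This is a simplified sketch: the `HostedCopy` record, the `appropriate_copy` function, and the `*.example` URLs are hypothetical, and a real link server consults a vendor-maintained knowledge base instead of a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostedCopy:
    host: str
    first_year: int
    last_year: int
    url: str


def appropriate_copy(copies, year, subscriptions):
    """Return the URL of the first copy that the customer may access, or None."""
    for copy in copies:
        if copy.host in subscriptions and copy.first_year <= year <= copy.last_year:
            return copy.url
    return None


copies = [
    # Publisher A hosted the journal from 1999 through 2002.
    HostedCopy("Publisher A", 1999, 2002, "http://publisher-a.example/journal"),
    # Publisher B acquired it in 2003 and also hosts all back issues.
    HostedCopy("Publisher B", 1999, 9999, "http://publisher-b.example/journal"),
]

print(appropriate_copy(copies, 2001, {"Publisher A"}))  # http://publisher-a.example/journal
print(appropriate_copy(copies, 2001, {"Publisher B"}))  # http://publisher-b.example/journal
print(appropriate_copy(copies, 2003, {"Publisher A"}))  # None
```

When no held copy matches, the link server would typically fall back to the plain DOI link, which leads to publisher B.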
25. Multiple Resolution (cont.)
Multiple Resolution
Infrastructure will be provided by the IDF
(browser script, server component, etc.)
Target owner can control what is displayed at the link
Update of multiple target information is done once in the DOI;
1000’s of links do not need to be touched
3rd parties will be able to supply services as one of the choices.
(e.g. Copyright Clearance Center’s RightsLink)
Early forms of multiple resolution are available now
(Content Directions)
26. Forward Linking
Cited-By Service Coming to CrossRef
CrossRef forward linking will …
− Allow members to retrieve the DOI and extended
meta-data for the articles that cite a given article
− Utilize the same types of interfaces used with the
current service (deposit and query)
− Be built easily on top of existing member commitments
Couple meta-data deposits with DOI queries so we can
identify a reference’s source document.
CrossRef forward linking is still under development …
− Indications are very encouraging; final approval is
expected in Q3
− Fees and policies are TBD, but…
… we expect forward linking to be part of the core
CrossRef service
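The idea of coupling deposits with reference queries amounts to inverting a "cites" relation into a "cited-by" relation. The sketch below shows only that inversion; the function name `build_cited_by` and the `10.5555/...` DOIs are made up for illustration, and since the service was still under development when these slides were written, none of this reflects its actual interface.

```python
from collections import defaultdict


def build_cited_by(reference_dois_by_article):
    """Invert {citing DOI: [cited DOIs]} into {cited DOI: [citing DOIs]}."""
    cited_by = defaultdict(list)
    for citing_doi, reference_dois in reference_dois_by_article.items():
        for cited_doi in reference_dois:
            cited_by[cited_doi].append(citing_doi)
    return dict(cited_by)


# Hypothetical deposits: each article arrives with the DOIs found for its references.
deposits = {
    "10.5555/article-b": ["10.5555/article-a"],
    "10.5555/article-c": ["10.5555/article-a", "10.5555/article-b"],
}

index = build_cited_by(deposits)
print(index["10.5555/article-a"])  # ['10.5555/article-b', '10.5555/article-c']
print(index["10.5555/article-b"])  # ['10.5555/article-c']
```

The inversion only works if each reference query is tied to the article it came from, which is why the deposit and the query have to be coupled.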
28. OpenURL and DOIs/CrossRef
OpenURL and DOI are complementary technologies
CrossRef helps solve the appropriate copy problem by
providing a ‘reverse’ DOI lookup (DOI in / meta-data out)
http://doi.crossref.org/servlet/query?id=10.1006/jmbi.2000.4282&pid=<USR>:<PWD>
CrossRef offers an OpenURL 1.0 compliant resolver
http://doi.crossref.org/resolve?pid=<USR>:<PWD>&aulast=Maas LRM
&title=JOURNAL OF PHYSICAL OCEANOGRAPHY&volume=32
&issue=3&spage=870&date=2002
(This resolver will redirect you to the target document)
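Both request URLs above can be assembled with the standard library so that spaces and other special characters are encoded properly. This sketch only builds the strings and makes no network call. The endpoint addresses and parameter names are copied from the slide and may not match the current service; the function names and the `user:password` credential are placeholders.

```python
from urllib.parse import urlencode

QUERY_BASE = "http://doi.crossref.org/servlet/query"   # as shown on the slide
RESOLVE_BASE = "http://doi.crossref.org/resolve"       # as shown on the slide


def reverse_lookup_url(doi, pid):
    """DOI in, meta-data out: build the 'reverse' lookup request."""
    return QUERY_BASE + "?" + urlencode({"id": doi, "pid": pid}, safe="/:")


def openurl_resolve_url(pid, **metadata):
    """Meta-data in, redirect out: build the resolver request."""
    params = {"pid": pid, **metadata}
    return RESOLVE_BASE + "?" + urlencode(params, safe=":")


print(reverse_lookup_url("10.1006/jmbi.2000.4282", "user:password"))
# http://doi.crossref.org/servlet/query?id=10.1006/jmbi.2000.4282&pid=user:password

print(openurl_resolve_url(
    "user:password",
    aulast="Maas LRM",
    title="JOURNAL OF PHYSICAL OCEANOGRAPHY",
    volume="32", issue="3", spage="870", date="2002",
))
# http://doi.crossref.org/resolve?pid=user:password&aulast=Maas+LRM&title=JOURNAL+OF+PHYSICAL+OCEANOGRAPHY&volume=32&issue=3&spage=870&date=2002
```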
29. What does an OpenURL look like
• Start with the BaseURL
http://resolver.example.org?
• Add the fixed elements
url_ver=z39.88-2003 (Version)
&url_ctx_fmt=ori:fmt:kev:mtx:ctx (ContextObject format)
• Add the referrer
&rfr_id=ori:rfr:publisher.com (Declare referrer: OpenURL “Referrer”, domain of referrer)
• Add the referent
&rft_id=ori:doi:10.1126/science.275.5304.1320 (Include identifiers for referent)
&rft_id=ori:pmid:9036860
• Add the metadata elements
&rft_val_fmt=ori:fmt:kev:mtx:journal (Declare the metadata format)
&rft.genre=article
&rft.atitle=Isolation of a common receptor for …
&rft.jtitle=Science
&rft.aulast=Bergelson
&rft.auinit=J
…
rft_val_fmt shows the referent (rft) format (fmt) is by value (val)
kev:mtx:journal indicates we are using key-encoded-values (kev) as defined in the “journal” matrix (mtx) in the registry
Substitute actual values for the item being referenced
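The assembly order on this slide (base URL, fixed elements, referrer, referent identifiers, metadata) can be sketched as a small builder. The key names and values are copied from the slide, which reflects the specification as it stood at the time, so they may differ from the finalized standard; the function name `build_openurl` is an assumption.

```python
from urllib.parse import urlencode


def build_openurl(base_url, referrer_id, referent_ids, metadata):
    """Assemble a key-encoded-value OpenURL in the order shown on the slide."""
    pairs = [
        ("url_ver", "z39.88-2003"),              # fixed: version
        ("url_ctx_fmt", "ori:fmt:kev:mtx:ctx"),  # fixed: ContextObject format
        ("rfr_id", referrer_id),                 # who is sending the link
    ]
    # A referent may carry several identifiers, so the key can repeat.
    pairs += [("rft_id", identifier) for identifier in referent_ids]
    pairs.append(("rft_val_fmt", "ori:fmt:kev:mtx:journal"))
    pairs += [("rft." + key, value) for key, value in metadata.items()]
    return base_url + "?" + urlencode(pairs, safe=":/")


url = build_openurl(
    "http://resolver.example.org",
    "ori:rfr:publisher.com",
    ["ori:doi:10.1126/science.275.5304.1320", "ori:pmid:9036860"],
    {"genre": "article", "jtitle": "Science", "aulast": "Bergelson", "auinit": "J"},
)
print(url)
```

A list of pairs is used instead of a dictionary because `rft_id` appears twice, which a dictionary could not represent.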
30. Why Use an OpenURL?
OpenURL is a sophisticated (and complex) subject, but it …
Establishes a uniform structure for the transport of meta-data and
objects
Supports context sensitive linking (local link servers)
Supports algorithmic linking (links based on meta-data)
OpenURL is only a specification; it is not a system
http://resolver.example.org?
A service must provide the resolver
CrossRef operates an OpenURL 1.0 resolver:
http://doi.crossref.org/resolve?pid=<USR>:<PWD>&aulast=Maas …
Local link servers (SFX, EBSCO’s LinkSource) are resolvers
31. Local Link Servers
Solve the appropriate copy problem
Localize control over resource linking
Integrate with local services
Centralize administration
Optimize use of licensed resources
Track resource utilization
Provide linking consistency for the end users
Are often based on OpenURL
Algorithmic linking driven by database lookups
Must subscribe to the vendor for database updates
32. A&I (Ovid) as link source
[Screenshots: Link Source, Link Menu, Link Target]
33. OpenURL Linking: SFX & CrossRef
[Diagram labels: References, DOI link, DOI Server, OpenURL Aware, OpenURL, Metadata]
http://dx.doi.org/
doi=10.1034/j.1399-0039.2000.560502.x
http://www.sfx.edu/?
doi=10.1034/j.1399-0039.2000.560502.x
34. CrossRef and Full Text Search
Part of CrossRef’s mission statement:
Facilitate collaborative action
Enable access for the scholar to the primary literature
Many CrossRef members are keenly interested in a
readily accessible broad full text search service
Two options are being considered; each has pros & cons
Initial focus is on an ‘open’ service (very low cost,
easier to implement, lower barriers)
Next focus is on a proprietary service that offers
more robust features specific to STM publications
Look for an announcement before year end.
35. www.CrossRef.org
the central source for reference linking
Linking Scholarly Communities Together
Chuck Koscher ckoscher@crossref.org
Editor's Notes
Publish or Perish: What's the origin of this phrase? Our patrons want to know, but the origin has been lost to the ages! F. A. Hetzel has this to say about "publish or perish":
Although no one seems to recall who coined the phrase Publish or Perish to describe the assertion that a university or even college teacher will not be promoted within the system of American higher education unless he conducts original research and proves his capabilities by publishing, the words have provided scholars and their publishers with an unparalleled opportunity to defend the faith. [pp. 101-102]
Hetzel, F.A. (1973). Publish or perish, and the competent manuscript. Scholarly Publishing, 4(2), 101-109.
JBC also took the references and submitted them to CrossRef. CrossRef checked the references against its metadata database of article information and returned DOIs that matched. JBC also has links to Medline, Infotrieve for document delivery and to other articles available on the HighWire platform. The CrossRef links go directly to the cited article. JBC didn’t need to have bilateral agreements, figure out what other publishers had available online, or figure out publishers’ algorithms. References were sent to CrossRef and DOIs were returned.
Clicking the first reference link takes you off to the cited journal on ScienceDirect. It used to be on IDEAL; the DOI seamlessly goes to the new location. Elsevier displays DOIs and has a help button to explain about using DOIs. User education is very important since this is so new.