1. Agenda
2:00 – 2:20 Status, Changes, Issues - Chuck Koscher
2:20 – 2:45 Metadata Quality - Chuck Koscher
2:45 – 3:00 System rewrite - Chuck Koscher
3:00 – 3:20 New Initiatives - Geoff Bilder
3:20 – 3:30 Coffee and Tea Break
3:30 – 5:00 Publisher system discussions
- PLOS - Richard Cave
- APA - Beverly Jamison
- J. Wiley & Sons - Matt Larson
CrossRef 2009 Annual Member Meeting - Boston
2. System status
Query response time (by load)
10/13/2009   1.900 sec (heavy load), 1.2 sec (moderate load), 0.680 sec (light load)
2007         0.500 sec
2005         0.300 sec
2003         0.625 sec
3. System status
Deposit times (2009)
                  June            July            August          Sept            October
Less than 5 min:  107888 (53 %)   141105 (83 %)   131661 (91 %)   83379 (57 %)    33546 (52 %)
Less than 1 hr:   35189 (17 %)    22389 (13 %)    10753 (7 %)     33829 (23 %)    18165 (28 %)
Less than 6 hr:   31666 (15 %)    3666 (2 %)      903 (0 %)       24201 (16 %)    8037 (12 %)
Less than 12 hr:  23482 (11 %)    181 (0 %)       0 (0 %)         2411 (1 %)      1855 (2 %)
Less than 18 hr:  4019 (1 %)      713 (0 %)       0 (0 %)         968 (0 %)       1950 (3 %)
Less than 24 hr:  0 (0 %)         3 (0 %)         0 (0 %)         0 (0 %)         0 (0 %)
More than 24 hr:  0 (0 %)         1 (0 %)         1 (0 %)         1 (0 %)         0 (0 %)
Total deposits:   203001          168058          143318          144790          63555
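The percentages in the table appear to be truncated rather than rounded (141105 of 168058 July deposits is just under 84 %, yet it is shown as 83 %). A minimal Python sketch that recomputes the June column under that assumption and adds a cumulative share; the function and variable names are illustrative and not part of any CrossRef tool:

```python
# Illustrative only: recompute the June 2009 column of the deposit-time table.
JUNE_COUNTS = [
    ("Less than 5 min", 107888),
    ("Less than 1 hr", 35189),
    ("Less than 6 hr", 31666),
    ("Less than 12 hr", 23482),
    ("Less than 18 hr", 4019),
    ("Less than 24 hr", 0),
    ("More than 24 hr", 0),
]
JUNE_TOTAL = 203001  # "Total deposits" as printed on the slide


def truncated_percent(count, total):
    """Whole-number percentage, truncated (not rounded); an assumed convention."""
    return count * 100 // total


def cumulative_shares(counts, total):
    """Yield (label, bucket %, cumulative %) for each bucket, in table order."""
    running = 0
    for label, count in counts:
        running += count
        yield label, truncated_percent(count, total), truncated_percent(running, total)


if __name__ == "__main__":
    for label, pct, cum in cumulative_shares(JUNE_COUNTS, JUNE_TOTAL):
        print(f"{label:<16} {pct:>3} %  (cumulative {cum:>3} %)")
```

The cumulative column makes the month-to-month comparison easier to read: it shows directly what share of deposits finished within an hour, within six hours, and so on.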
4. System status
Operations changes
Starting to use HAProxy for internal load balancing and redundancy
Using Alertra for external monitoring
VMware virtual servers
Now migrating Oracle from 9 to 11g (allows active read-only standby)
Using Jira for all support@crossref.org activities
Berkeley DB based service for OpenURL DOI queries (metadata lookups)
Testing a process for <unstructured_citations>
Two technologies being used
refXpress from Inera which parses a reference and breaks it into parts
CitationQueryEngine, internally developed Lucene based search
Trial run
Number of unstructured citations : 1,158,889
Number of DOIs processed : 3,150,525
Number of refXpress DOIs found : 47,165
Number of CQE DOIs found (score>2.2) : 139,721
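The trial counts a CitationQueryEngine hit only when its score exceeds 2.2. A minimal sketch of that acceptance step, assuming candidates arrive as (DOI, score) pairs; the data shape and the function name are hypothetical, since the actual CQE interface is not shown in these slides:

```python
from typing import Iterable, List, Tuple

SCORE_THRESHOLD = 2.2  # cut-off quoted in the trial run ("score>2.2")


def accept_candidates(
    candidates: Iterable[Tuple[str, float]],
    threshold: float = SCORE_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Keep candidates scoring strictly above the threshold, best first."""
    kept = [(doi, score) for doi, score in candidates if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder DOIs and made-up scores, not data from the trial.
    sample = [
        ("10.5555/example-a", 2.39),
        ("10.5555/example-b", 3.16),
        ("10.5555/example-c", 2.2),
        ("10.5555/example-d", 1.4),
    ]
    for doi, score in accept_candidates(sample):
        print(doi, score)
```

Raising the threshold trades recall for fewer false positives, which is the tension the sample citations on the following slides illustrate.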
5. Sample 1
<citation key="10.1016/S0736-0266(02)00040-2-BIB21">
<author>Valero-Cuevas</author>
<cYear>2000</cYear>
<unstructured_citation>
Applying principles of robotics to understand the biomechanics, neuromuscular control and clinical
rehabilitation of human digits. In: IEEE International Conference on Robotics and Automation, San
Francisco, CA, 2000.
</unstructured_citation>
</citation>
CQE: score 3.159, refXpress: unparsed, XMLquery: nomatch
9. Sample 5
<citation key="b64_1025">
<unstructured_citation>
Xu C, Taoka S, Crofts AR, Govindjee (1991) Kinetic characteristics of
formate/formic acid binding at the plastoquinone reductase site in spinach
thylakoids. Biochim Biophys Acta 1098: 32-40
</unstructured_citation>
</citation>
CQE: score 2.39, refXpress: semi-parsed, XMLquery: nomatch
10.1016/0167-4838(91)90582-K
<journal_title>
Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology
</journal_title>
<contributors>
<contributor sequence="first" contributor_role="author">
<given_name>C</given_name>
<surname>Xu</surname>
</contributor>
</contributors>
<volume>1098</volume>
<issue>1</issue>
<first_page>32</first_page>
<last_page>40</last_page>
<year media_type="print">1991</year>
<publication_type>full_text</publication_type>
<article_title>
Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids
</article_title>

10.1016/0005-2728(91)90006-A
<journal_title>
Biochimica et Biophysica Acta (BBA) - Bioenergetics
</journal_title>
<contributors>
<contributor sequence="first" contributor_role="author">
<given_name>C</given_name>
<surname>Xu</surname>
</contributor>
</contributors>
<volume>1098</volume>
<issue>1</issue>
<first_page>32</first_page>
<last_page>40</last_page>
<year media_type="print">1991</year>
<publication_type>full_text</publication_type>
<article_title>
Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids
</article_title>
10. Sample 6
<citation key="b53_366">
<unstructured_citation>
53. O.S. Gudmundsson, S.D.S. Jois, D.G. Vander Velde, T.J. Siahaan, B. Wang, and R.T.
Borchardt (1999 ) The effect of conformation on the membrane permeability of coumarinic acid-
and phenylpropionic acid-based cyclic prodrugs of opioid peptides.J. Pept. Res.53 , 383 -392 .
</unstructured_citation>
</citation>
CQE: score 3.41, refXpress: semi-parsed, XMLquery: nomatch

10.1034/j.1399-3011.1999.00077.x
<doi type="journal_article">10.1034/j.1399-3011.1999.00077.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
<contributor sequence="first" contributor_role="author">
<given_name>O.S.</given_name>
<surname>Gudmundsson</surname>
</contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>403</first_page>
<last_page>413</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title>
The effect of conformation of the acyloxyalkoxy-based cyclic prodrugs of opioid peptides on their membrane permeability
</article_title>

10.1034/j.1399-3011.1999.00076.x
<doi type="journal_article">10.1034/j.1399-3011.1999.00076.x</doi>
<issn type="print">1397-002X</issn>
<issn type="electronic">1399-3011</issn>
<journal_title>Journal of Peptide Research</journal_title>
<contributors>
<contributor sequence="first" contributor_role="author">
<given_name>O.S.</given_name>
<surname>Gudmundsson</surname>
</contributor>
</contributors>
<volume>53</volume>
<issue>4</issue>
<first_page>383</first_page>
<last_page>392</last_page>
<year media_type="print">1999</year>
<publication_type>full_text</publication_type>
<article_title>
The effect of conformation on the membrane permeation of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides
</article_title>
11. Changes (problems)
Notable software error in the past 12 months
URLs in Handle rewritten with an older value (affected some publishers who had deposited as-crawled URLs AND did URL mods via ownership transfer)
Medium-big changes
BookVolume-title/ author/ year rule: match on (only) Book title DOIs (sample2)
Added a false positive prevention rule
IF an (XML) query contains an article title and that title is not an exact match with the deposited title, DO NOT MATCH, except if author and first page are an EXACT match
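The false positive prevention rule can be read as a veto applied to a candidate match. A minimal sketch, assuming "exact match" means plain string equality and using a hypothetical `Record` type; the production matcher may normalize text in ways these slides do not describe:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical, simplified view of a query or of deposited metadata."""
    article_title: Optional[str] = None
    author: Optional[str] = None
    first_page: Optional[str] = None


def passes_false_positive_rule(query: Record, deposited: Record) -> bool:
    """Return False when the rule says DO NOT MATCH for this candidate."""
    if not query.article_title:
        return True  # the rule only applies when the query carries an article title
    if query.article_title == deposited.article_title:
        return True  # exact title match, nothing to veto
    # Title differs: allow only if author AND first page are both exact matches.
    return (
        query.author is not None
        and query.first_page is not None
        and query.author == deposited.author
        and query.first_page == deposited.first_page
    )


if __name__ == "__main__":
    deposited = Record("A study of X", "Brown", "163")
    print(passes_false_positive_rule(Record("A study of Y", "Brown", "200"), deposited))
    print(passes_false_positive_rule(Record("A study of Y", "Brown", "163"), deposited))
```

The function expresses only the veto: a candidate that passes it would still have to satisfy the other matching rules.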
Small-medium changes
Matching special characters in author names
Matching compound surnames
Removed ability to avoid conflicts
DOI character limits: "a-z", "A-Z", "0-9" and "-._;()/"
Title lock-down (ISSN check disallowing a deposit)
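The DOI character limit listed above can be checked with a single regular expression. A minimal sketch, assuming the limit applies to the whole DOI string (the slide does not say whether only the suffix is meant); the function names are illustrative:

```python
import re

# Character set as listed on the slide: a-z, A-Z, 0-9 and - . _ ; ( ) /
ALLOWED_DOI_CHARS = re.compile(r"[A-Za-z0-9\-._;()/]+")


def has_only_allowed_chars(doi: str) -> bool:
    """True when the DOI is non-empty and every character is in the listed set."""
    return bool(ALLOWED_DOI_CHARS.fullmatch(doi))


def disallowed_chars(doi: str) -> list:
    """Return the characters of `doi` that fall outside the listed set, in order."""
    return [ch for ch in doi if not ALLOWED_DOI_CHARS.fullmatch(ch)]


if __name__ == "__main__":
    for doi in ("10.1016/0275-5408(95)90568-M", "10.1234/abc<def>"):
        print(doi, has_only_allowed_chars(doi), disallowed_chars(doi))
```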
12. Issues
Ongoing ...
Too many alternative (publication) titles. Be CAREFUL! They can really mess up title fuzzy matching (we do have a Schematron monitor)
Deleting DOIs:
Change the publication title
Change the DOI's title (article title) to the DOI itself (new)
Remove optional metadata
Set publication date to the deletion date
Conflicts
Conflicts reduce matching rates!
Timestamps
DOIs are deposited with a timestamp to ensure the latest metadata gets
inserted. Timestamps are essential when we have to re-process deposits.
Problems occur when DOI ownership changes (e.g. what is the timestamp?)
Solution: CrossRef will provide a means to retrieve the current timestamp.
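The role of the timestamp can be sketched as a "newest wins" update. This is a minimal illustration with a hypothetical in-memory store, not the deposit system's code; treating an equal timestamp as stale is an assumption of the sketch:

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store: DOI -> (timestamp, metadata)
Store = Dict[str, Tuple[int, dict]]


def apply_deposit(store: Store, doi: str, timestamp: int, metadata: dict) -> bool:
    """Insert or update a record only if the deposit is newer than what is stored.

    Returns True when the deposit was applied, False when it was skipped.
    """
    current = store.get(doi)
    if current is not None and timestamp <= current[0]:
        return False  # stale (or equal) timestamp: keep the stored metadata
    store[doi] = (timestamp, metadata)
    return True


def current_timestamp(store: Store, doi: str) -> Optional[int]:
    """Return the stored timestamp for a DOI, or None if the DOI is unknown."""
    record = store.get(doi)
    return None if record is None else record[0]


if __name__ == "__main__":
    store: Store = {}
    doi = "10.5555/example"  # placeholder DOI
    print(apply_deposit(store, doi, 20091001, {"title": "first"}))
    print(apply_deposit(store, doi, 20090901, {"title": "older, replayed late"}))
    print(current_timestamp(store, doi))
```

Because stale deposits are skipped, replaying a batch of deposits in any order ends in the same state, which is what makes re-processing safe. A new owner who cannot see the previous owner's timestamps has no way to pick a larger value, hence the need for a lookup such as `current_timestamp`.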
13. Conflicts
===========================================
Created: 2006-04-04 04:10:03.0
ConfID: 263262
CauseID: 246648646
OtherID: 64341060,
JT: Ophthalmic and Physiological Optics
MD: Brown, 15 ,3,163,1995,Differences in visual acuity between the eyes: determination of normal limits in a clinical population
DOI: 10.1046/j.1475-1313.1995.9590568m.x(85579-R 263262-null )
DOI: 10.1016/0275-5408(95)90568-M
===========================================
<query key="MyKey1" enable-multiple-hits="false">
<journal_title>Ophthalmic and Physiological Optics</journal_title>
<author>Brown</author>
<volume>15</volume>
<first_page>163</first_page>
<year>1995</year>
</query>
Match Fails
<query key="MyKey1" status="unresolved" fl_count="0">
<journal_title>Ophthalmic and Physiological Optics</journal_title>
<author>Brown</author>
<volume>15</volume>
<first_page>163</first_page>
<year>1995</year>
</query>

enable-multiple-hits="true"
<query key="MyKey1" status="multiresolved" fl_count="2">
<doi type="journal_article">10.1046/j.1475-1313.1995.9590568m.x</doi>
<issn type="print">02755408</issn>
<issn type="electronic">14751313</issn>
<journal_title>Ophthalmic and Physiological Optics</journal_title>
<author>Brown</author>
<volume>15</volume>
<issue>3</issue>
<first_page>163</first_page>
<year>1995</year>
<publication_type>full_text</publication_type>
</query>
<query key="MyKey1" status="multiresolved" fl_count="0">
<doi type="journal_article">10.1016/0275-5408(95)90568-M</doi>
<issn type="print">02755408</issn>
<issn type="electronic">14751313</issn>
<journal_title>Ophthalmic and Physiological Optics</journal_title>
<author>Brown</author>
<volume>15</volume>
<issue>3</issue>
<first_page>163</first_page>
<year>1995</year>
<publication_type>full_text</publication_type>
</query>
14. Conflicts
15. Conflicts: What to do?
Wiley/Blackwell owns this journal
Resolve_conflict.txt
H:email=ckoscher@crossref.org;op=PRIMARY
10.1046/j.1475-1313.1995.9590568m.x
Process log (email)
<?xml version="1.0" encoding="UTF-8"?>
<doi_batch_diagnostic>
<submission_id>923604608</submission_id>
<record_diagnostic doi="10.1046/j.1475-1313.1995.9590568m.x">
<conflict status="Success" ids="85579,263262">
<msg>Marked as alias</msg>
<doi_list>
<doi>10.1016/0275-5408(95)90568-M</doi>
</doi_list>
</conflict>
</record_diagnostic>
</doi_batch_diagnostic>
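The resolution file and the process log shown above are both easy to handle programmatically. A minimal sketch, assuming the file is one header line followed by one DOI per line (only a single DOI and only `op=PRIMARY` appear on the slide, so anything beyond that is an assumption); the function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def build_conflict_resolution_file(email: str, dois: list, op: str = "PRIMARY") -> str:
    """Return the file text: one header line, then one DOI per line."""
    if not dois:
        raise ValueError("at least one DOI is required")
    header = f"H:email={email};op={op}"
    return "\n".join([header, *dois]) + "\n"


def conflict_outcomes(diagnostic_xml: bytes) -> list:
    """Extract (doi, status, message) for each record in a process log."""
    root = ET.fromstring(diagnostic_xml)
    outcomes = []
    for record in root.iter("record_diagnostic"):
        conflict = record.find("conflict")
        if conflict is None:
            continue  # record carries no conflict result
        outcomes.append(
            (record.get("doi"), conflict.get("status"), conflict.findtext("msg"))
        )
    return outcomes


if __name__ == "__main__":
    print(build_conflict_resolution_file(
        "ckoscher@crossref.org", ["10.1046/j.1475-1313.1995.9590568m.x"]
    ))
```

Checking the `status` attribute of each `<conflict>` element in the emailed log is a simple way to confirm that every DOI in the file was actually processed.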
17. Conflicts
What do YOU need to do
1. Go to http://www.crossref.org/06members/59conflict.html
2. Determine the nature of your conflicts
   1. If they only involve your own DOIs
      - Construct the necessary conflict resolution files and upload them using doi.crossref.org
      - Use the screens at the doi.crossref.org Metadata Admin tab to fix them
   2. If they involve someone else's DOIs
      - Construct the necessary conflict resolution files
      - Email them to support@crossref.org
Audits will be coming next year and unresolved conflicts may co$t you
18. Metadata Quality
Metadata quality is good enough for linking (besides conflict problems) ... but it is not good enough for other purposes (display).
No Volume          3,055,090
No Issue           5,582,856
No Page            1,359,241
No Author          3,807,396
No Article Title     988,764
One Author        16,139,751
No First Name      4,835,479
Initial Only      12,039,157
DOI Total         38,193,723
Schematron rules
Contributor checks
- Alert if only single author is present (not reported but recorded)
- Alert if only first initial is deposited
- Check for numbers in given name / surname
- Check for punctuation in given name / surname (currently checks for: _/*@()[])
- Check for ndash in name
- Check for Jr or Sr in surname
- Alert if all caps
- Alert if more than 3 spaces are present
- Alert if space in surname when no given name is present
- Alert if surname ends with jr,JR
- Alert if surname contains 'et. al.'
- Alert if surname/given name contains & or &# (malformed entity)
- Alert if multiple ??? are present
Page Ranges
- Alert for _ or - in first or last page
- Alert if first and last page are identical
Edition / Issue info
- Check for 'edition' in <edition>
- Check for 'issue' in <issue>
- Check for 'no' or 'number' in volume/issue/edition
Citation Checks
- All surname checks
- All page range check
- Year range check
Article Title
- Check for single word title
- Alert if all caps
- Alert if title name contains & or &# (malformed entity)
Other
- Alert for year beyond current year
- Alert if neither first page or author are present
- Alert if more than 2 alternate titles
- Alert if DOI contains character not in allowed
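Schematron expresses such rules as assertions over the deposited XML. For readers who want to pre-check their own metadata, a few of the contributor and page-range checks listed above can be approximated in Python. This is a rough sketch with hypothetical function names and alert wording, not the rules CrossRef actually runs:

```python
import re

PUNCTUATION = set("_/*@()[]")  # characters the slide says are currently checked


def name_alerts(given_name, surname):
    """Return alert strings for one contributor (given_name may be None)."""
    alerts = []
    for label, value in (("given name", given_name), ("surname", surname)):
        if not value:
            continue
        if any(ch.isdigit() for ch in value):
            alerts.append(f"number in {label}")
        if any(ch in PUNCTUATION for ch in value):
            alerts.append(f"punctuation in {label}")
        if "&" in value:
            alerts.append(f"possible malformed entity in {label}")
        if "??" in value:
            alerts.append(f"multiple ? in {label}")
        letters = [ch for ch in value if ch.isalpha()]
        if len(letters) > 1 and all(ch.isupper() for ch in letters):
            alerts.append(f"{label} is all caps")
    if surname and re.search(r"\bjr\.?$", surname.strip(), re.IGNORECASE):
        alerts.append("surname ends with Jr")
    if surname and not given_name and " " in surname.strip():
        alerts.append("space in surname with no given name")
    return alerts


def page_alerts(first_page, last_page):
    """Return alert strings for a page range (either value may be None)."""
    alerts = []
    for label, value in (("first page", first_page), ("last page", last_page)):
        if value and ("_" in value or "-" in value):
            alerts.append(f"_ or - in {label}")
    if first_page and first_page == last_page:
        alerts.append("first and last page identical")
    return alerts


if __name__ == "__main__":
    print(name_alerts(None, "SMITH JR"))
    print(page_alerts("305-306", None))
```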
19. Metadata Quality
What else is bad quality?
224,000 DOIs with bad page number (really affects matching)
<pages>
<first_page>305???306</first_page>
</pages>
DOI links that still work: 14,985 journals crawled in 2009
69.25% are confirmed good, 22.8% unconfirmed, 5% confirmed not good
sum(dois)            25,977,348
sum(checked)            361,514
sum(confirmed)          206,204
sum(semiconfirmed)       44,168
sum(nonconfirmed)        82,565
sum(bad)                  1,140
sum(login)               16,950
Western Journal of Medicine 10.1136/ewjm.172.6.364 http://www.pubmedcentral.nih.gov/
Western Journal of Medicine 10.1136/ewjm.172.2.84 http://www.pubmedcentral.nih.gov/
Western Journal of Medicine 10.1136/ewjm.172.1.43 http://www.pubmedcentral.nih.gov/
Western Journal of Medicine 10.1136/ewjm.172.1.61-a http://www.pubmedcentral.nih.gov/
Western Journal of Medicine 10.1136/ewjm.174.2.103 http://www.pubmedcentral.nih.gov/
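The quoted percentages can be reproduced from the crawl totals if "confirmed good" is read as confirmed plus semiconfirmed, and "confirmed not good" as bad plus login, each taken as a share of the checked DOIs. That reading is inferred from the arithmetic and is not stated on the slide. A short sketch:

```python
# Crawl totals as printed on the slide.
CRAWL = {
    "dois": 25_977_348,
    "checked": 361_514,
    "confirmed": 206_204,
    "semiconfirmed": 44_168,
    "nonconfirmed": 82_565,
    "bad": 1_140,
    "login": 16_950,
}


def share_of_checked(*keys):
    """Percentage of checked DOIs that fall into the given categories."""
    return 100 * sum(CRAWL[k] for k in keys) / CRAWL["checked"]


# Assumed groupings (an inference, see the note above).
good = share_of_checked("confirmed", "semiconfirmed")
unconfirmed = share_of_checked("nonconfirmed")
not_good = share_of_checked("bad", "login")

if __name__ == "__main__":
    print(f"good {good:.2f} %, unconfirmed {unconfirmed:.2f} %, not good {not_good:.2f} %")
```

Under this reading the three shares come to roughly 69.3 %, 22.8 % and 5.0 %. They do not add up to 100 % because the five categories together account for slightly fewer DOIs than the checked total.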
20. Metadata Quality
Schematron reports are run once a week.
From: <support@crossref.org>
Date: October 3, 2009 12:42:27 PM EDT
To: <jstark@crossref.org>, <pfeeney@crossref.org>
Subject: Schematron Report for prefix(es) 10.1109
g.grenier@ieee.org
The results of a weekly metadata quality check are listed below. The affected DOIs were deposited successfully but the metadata attached to the DOI may need some attention.
http://www.crossref.org/schematron/data/st_20091003_5431.xml
http://www.crossref.org/schematron/data/st_20091003_5347.xml
http://www.crossref.org/schematron/data/st_20091003_5430.xml
http://www.crossref.org/schematron/data/st_20091003_5348.xml
http://www.crossref.org/schematron/data/st_20091003_5411.xml
http://ftp.crossref.org/schematron/data/st_20091010_2004.xml
http://ftp.crossref.org/schematron/data/st_20091010_3553.xml
http://ftp.crossref.org/schematron/data/st_20091003_5411.xml
http://ftp.crossref.org/schematron/data/st_20091003_59687.xml
http://ftp.crossref.org/schematron/data/st_20091003_5837.xml
<person_name sequence="first" contributor_role="author">
<given_name>AdriËnne M.</given_name>
<surname>Mendrik $^*$</surname>
</person_name>
21. System rewrite
May 2008: Board endorses plan to address a significant rewrite/upgrade
June 2008 - Feb 2009: TWG subgroup (rewrite2) meets to define requirements and other project parameters
Oct: Scenario options documented and cost comparisons profiled; started negotiations with Atypon re: new contract
Nov: Report presented to board and to rewrite2 group for direction and validation
Dec 08 - May 09: Negotiations with Atypon
Oct 12, 09: New contract signed
Core Needs
• That CrossRef should ultimately own the intellectual property in the software at
the heart of its operations
• That CrossRef should not risk or jeopardize the reliability and throughput
offered by the existing system
• That CrossRef should remain free to develop further applications for other
purposes which need to interface to the reference-linking systems and/or its data
• Recognized that CrossRef is not likely to establish internal resources sufficient to independently manage the development and maintenance of a system of this magnitude.
22. System rewrite
[Timeline chart, 2009-2011]
Systems: Existing System (EDS); EDS mods to use NQS; New Query System (NQS); New Deposit System (NDS)
Legend: Both query and deposit transactions; Deposit transactions; Query transactions
NQS will make use of the existing Oracle database (minimal mods to the schema)
EDS will communicate with NQS via JMI (Java Message Interface)
May use the Spring framework; if not initially, more likely later on (NDS)
NDS will include significant data model and process changes:
- Title management
- Conflicts
- Oracle schema cleanup
NQS/NDS combined will allow integration of currently stand-alone functions (OAI-PMH)
After NQS/NDS: possibly augment/replace back end database (satellite DBs)
23. System rewrite: Current organization
[Diagram: current system organization]
Entry points: www.crossref.org (/openurl, /iPage, /query/xref.cgi, /*); doi.crossref.org/*; oai.crossref.org/OAIHandler
Front end: HAProxy; www (Apache)
Application servers: two System instances (resin, Tomcat), each serving openurl, iPage, SIGG, other
Processors: Deposit Processor (Java app); Stored Query Processor (Java app)
Data stores: Oracle (CMD); Oracle (prime); Oracle (passive-stndby); Berkeley DB (2); Lucene (2)
Replication: Daily; Constant
24. System rewrite: New organization
[Diagram: new system organization]
Entry points: www.crossref.org (/openurl, /iPage, /query/xref.cgi, /*); doi.crossref.org/*; oai.crossref.org/OAIHandler
Front end: HAProxy; HAProxy (standby)
Servers: NQS (Spring, Tomcat); www (Apache); EDS (resin)
Components: Deposit Processor (Java app); Query Dispatch; Metadata Access; Metadata Citation Lookup; Direct JDBC; Persistent Data Access
Data stores: Lucene; Berkeley DB; Oracle (CMD); Oracle (active-stndby); Oracle (prime)
Replication: Constant
Note on slide: Hisham?
25. New initiatives, technical perspectives.
… Geoff