CrossRef Technical Working Group

Presentation from Chuck Koscher at the 2009 Technical Working Group meeting in Cambridge, MA

Published in: Business, Technology

  1. Agenda
       2:00 – 2:20  Status, Changes, Issues - Chuck Koscher
       2:20 – 2:45  Metadata Quality - Chuck Koscher
       2:45 – 3:00  System rewrite - Chuck Koscher
       3:00 – 3:20  New Initiatives - Geoff Bilder
       3:20 – 3:30  Coffee and Tea Break
       3:30 – 5:00  Publisher system discussions
         • PLOS - Richard Cave
         • APA - Beverly Jamison
         • J. Wiley & Sons - Matt Larson
     CrossRef 2009 Annual Member Meeting - Boston
  2. System status
       • Query response time and load (chart, 2002 to 2009)
           2003:        0.625 sec
           2005:        0.300 sec
           2007:        0.500 sec
           10/13/2009:  1.900 sec (heavy load), 1.2 sec (moderate load), 0.680 sec (light load)
  3. System status
       • Deposit times (2009)

                            June            July            August          Sept            October
         Less than 5 min:   107888 (53 %)   141105 (83 %)   131661 (91 %)   83379 (57 %)    33546 (52 %)
         Less than 1 hr:     35189 (17 %)    22389 (13 %)    10753 (7 %)    33829 (23 %)    18165 (28 %)
         Less than 6 hr:     31666 (15 %)     3666 (2 %)       903 (0 %)    24201 (16 %)     8037 (12 %)
         Less than 12 hr:    23482 (11 %)      181 (0 %)         0 (0 %)     2411 (1 %)      1855 (2 %)
         Less than 18 hr:     4019 (1 %)       713 (0 %)         0 (0 %)      968 (0 %)      1950 (3 %)
         Less than 24 hr:        0 (0 %)         3 (0 %)         0 (0 %)        0 (0 %)         0 (0 %)
         More than 24 hr:        0 (0 %)         1 (0 %)         1 (0 %)        1 (0 %)         0 (0 %)
         Total deposits:    203001          168058          143318          144790           63555
  4. System status
       • Operations changes
           • Starting to use HAProxy for internal load balancing and redundancy
           • Using Alertra for external monitoring
           • VMware virtual servers
           • Now migrating Oracle from 9 to 11g (allows active read-only standby)
           • Using Jira for all support@crossref.org activities
           • Berkeley DB based service for OpenURL DOI queries (metadata lookups)
       • Testing a process for <unstructured_citations>
           • Two technologies being used
               • refXpress from Inera, which parses a reference and breaks it into parts
               • CitationQueryEngine (CQE), an internally developed Lucene-based search
           • Trial run
               Number of unstructured citations:         1,158,889
               Number of DOIs processed:                 3,150,525
               Number of refXpress DOIs found:              47,165
               Number of CQE DOIs found (score > 2.2):     139,721
  5. Sample 1

       <citation key="10.1016/S0736-0266(02)00040-2-BIB21">
         <author>Valero-Cuevas</author>
         <cYear>2000</cYear>
         <unstructured_citation>Applying principles of robotics to understand the biomechanics, neuromuscular control and clinical rehabilitation of human digits. In: IEEE International Conference on Robotics and Automation, San Francisco, CA, 2000.</unstructured_citation>
       </citation>

       • CQE: score 3.159, refXpress: unparsed, XMLquery: nomatch
  6. Sample 2

       <citation key="BIB14">
         <volume_title>Macromolecules 1995</volume_title>
         <author>Butler</author>
         <unstructured_citation>; ; ; ; ; Macromolecules 1995, 28: 6383.</unstructured_citation>
       </citation>

       • CQE: score 1.89, refXpress: parsed, XMLquery: nomatch
           10.1021/ma00123a001    10.1021/ma00123a001    -
  7. Sample 3

       <citation key="BIB3">
         <volume_title>Taurine 3: cellular and regulatory mechanisms</volume_title>
         <author>Chen</author>
         <first_page>397</first_page>
         <cYear>1998a</cYear>
         <unstructured_citation>1998a. Effect of taurine on human fetal neuron cells: proliferation and differentiation. In: editors. Taurine 3: cellular and regulatory mechanisms. New York: Kluwer Academic/Plenum Publishers. p 397-403.</unstructured_citation>
       </citation>

       • CQE: score 0.48, refXpress: unparsed, XMLquery: nomatch
           10.1007/s11626-009-9184-7    -    -
       • Springer has assigned DOIs to Taurine 4, 6 and 7!
  8. Sample 4

       <citation key="10.1021/js950353+-BIB19">
         <volume_title>Pharm. Res.</volume_title>
         <author>Shukla</author>
         <volume>8</volume>
         <first_page>1396</first_page>
         <cYear>1991</cYear>
         <unstructured_citation>; Pharm. Res. 1991, 8, 1396-1400.</unstructured_citation>
       </citation>

       • CQE: score 0.99, refXpress: parsed, XMLquery: nomatch
           10.1007/BF01067277    10.1023/A:1015801207091

       <query key="MyKey4">
         <issn>0724-8741</issn>
         <journal_title>Pharm. Res.</journal_title>
         <volume>8</volume>
         <first_page>1396</first_page>
         <year>1991</year>
       </query>

       <query key="10.1021/js950353+-BIB19">
         <volume_title>Pharm. Res.</volume_title>
         <author>Shukla</author>
         <volume>8</volume>
         <first_page>1396</first_page>
         <year>1991</year>
       </query>
  9. Sample 5

       <citation key="b64_1025">
         <unstructured_citation>Xu C, Taoka S, Crofts AR, Govindjee (1991) Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids. Biochim Biophys Acta 1098: 32-40</unstructured_citation>
       </citation>

       • CQE: score 2.39, refXpress: semi-parsed, XMLquery: nomatch
           10.1016/0167-4838(91)90582-K    10.1016/0005-2728(91)90006-A

       Metadata for 10.1016/0167-4838(91)90582-K:
         <journal_title>Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology</journal_title>
         <contributors>
           <contributor sequence="first" contributor_role="author">
             <given_name>C</given_name>
             <surname>Xu</surname>
           </contributor>
         </contributors>
         <volume>1098</volume>
         <issue>1</issue>
         <first_page>32</first_page>
         <last_page>40</last_page>
         <year media_type="print">1991</year>
         <publication_type>full_text</publication_type>
         <article_title>Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids</article_title>

       Metadata for 10.1016/0005-2728(91)90006-A:
         <journal_title>Biochimica et Biophysica Acta (BBA) - Bioenergetics</journal_title>
         <contributors>
           <contributor sequence="first" contributor_role="author">
             <given_name>C</given_name>
             <surname>Xu</surname>
           </contributor>
         </contributors>
         <volume>1098</volume>
         <issue>1</issue>
         <first_page>32</first_page>
         <last_page>40</last_page>
         <year media_type="print">1991</year>
         <publication_type>full_text</publication_type>
         <article_title>Kinetic characteristics of formate/formic acid binding at the plastoquinone reductase site in spinach thylakoids</article_title>
  10. Sample 6

       <citation key="b53_366">
         <unstructured_citation>53. O.S. Gudmundsson, S.D.S. Jois, D.G. Vander Velde, T.J. Siahaan, B. Wang, and R.T. Borchardt (1999 ) The effect of conformation on the membrane permeability of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides.J. Pept. Res.53 , 383 -392 .</unstructured_citation>
       </citation>

       • CQE: score 3.41, refXpress: semi-parsed, XMLquery: nomatch
           10.1034/j.1399-3011.1999.00077.x    10.1034/j.1399-3011.1999.00076.x

       Metadata for 10.1034/j.1399-3011.1999.00077.x:
         <doi type="journal_article">10.1034/j.1399-3011.1999.00077.x</doi>
         <issn type="print">1397-002X</issn>
         <issn type="electronic">1399-3011</issn>
         <journal_title>Journal of Peptide Research</journal_title>
         <contributors>
           <contributor sequence="first" contributor_role="author">
             <given_name>O.S.</given_name>
             <surname>Gudmundsson</surname>
           </contributor>
         </contributors>
         <volume>53</volume>
         <issue>4</issue>
         <first_page>403</first_page>
         <last_page>413</last_page>
         <year media_type="print">1999</year>
         <publication_type>full_text</publication_type>
         <article_title>The effect of conformation of the acyloxyalkoxy-based cyclic prodrugs of opioid peptides on their membrane permeability</article_title>

       Metadata for 10.1034/j.1399-3011.1999.00076.x:
         <doi type="journal_article">10.1034/j.1399-3011.1999.00076.x</doi>
         <issn type="print">1397-002X</issn>
         <issn type="electronic">1399-3011</issn>
         <journal_title>Journal of Peptide Research</journal_title>
         <contributors>
           <contributor sequence="first" contributor_role="author">
             <given_name>O.S.</given_name>
             <surname>Gudmundsson</surname>
           </contributor>
         </contributors>
         <volume>53</volume>
         <issue>4</issue>
         <first_page>383</first_page>
         <last_page>392</last_page>
         <year media_type="print">1999</year>
         <publication_type>full_text</publication_type>
         <article_title>The effect of conformation on the membrane permeation of coumarinic acid- and phenylpropionic acid-based cyclic prodrugs of opioid peptides</article_title>
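Slide 4 counts a CQE result as found only when its score exceeds 2.2. As a rough illustration, the sketch below applies that cut-off to the six sample scores printed on slides 5 to 10. The function name and data layout are assumptions made for this example, not CrossRef's implementation.

```python
# Hypothetical sketch: apply the CQE score cut-off (score > 2.2, from slide 4)
# to the CQE scores printed for samples 1-6 on slides 5-10.
CQE_THRESHOLD = 2.2

# (sample number, CQE score) exactly as shown on the slides
SAMPLE_SCORES = [(1, 3.159), (2, 1.89), (3, 0.48), (4, 0.99), (5, 2.39), (6, 3.41)]


def accepted_samples(scores, threshold=CQE_THRESHOLD):
    """Return the sample numbers whose CQE score is strictly above the threshold."""
    return [sample for sample, score in scores if score > threshold]


if __name__ == "__main__":
    print(accepted_samples(SAMPLE_SCORES))
```

Under this cut-off, samples 1, 5 and 6 would count as CQE hits and samples 2 to 4 would not.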
  11. Changes (problems)
       • Notable software error in the past 12 months
           • URLs in Handle rewritten with an older value (affected some publishers who had deposited as-crawled URLs AND did URL mods via ownership transfer)
       • Medium-big changes
           • Book volume-title / author / year rule: match on (only) book title DOIs (sample 2)
           • Added a false positive prevention rule:
               IF an (XML) query contains an article title and that title is not an exact match with the deposited title, DO NOT MATCH, except if author and first page are an EXACT match
       • Small-medium changes
           • Matching special characters in author names
           • Matching compound surnames
           • Removed ability to avoid conflicts
           • DOI character limits: "a-z", "A-Z", "0-9" and "-._;()/"
           • Title lock-down (ISSN check disallowing a deposit)
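The false positive prevention rule on this slide can be read as a small decision function. The sketch below is one possible reading, assuming that "exact match" means plain string equality and that a record is just a title, an author surname and a first page. The `Record` class and function name are hypothetical, not part of the CrossRef system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Minimal stand-in for query or deposited metadata (hypothetical shape)."""
    article_title: Optional[str]
    author: Optional[str]
    first_page: Optional[str]


def passes_false_positive_rule(query: Record, deposited: Record) -> bool:
    """Return False when the slide's rule says DO NOT MATCH, True otherwise."""
    if query.article_title is None:
        return True  # the rule only applies when the query carries an article title
    if query.article_title == deposited.article_title:
        return True  # exact title match, nothing to prevent
    # Title differs: allow the match only if author AND first page are both exact.
    return (
        query.author is not None
        and query.first_page is not None
        and query.author == deposited.author
        and query.first_page == deposited.first_page
    )
```

Applied to sample 6 (slide 10), a query carrying the cited title, author Gudmundsson and first page 383 would be rejected against the record that starts on page 403, while the record starting on page 383 would still be allowed even though its deposited title differs slightly.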
  12. Issues
       • Ongoing ...
           • Too many alternative (publication) titles. Be CAREFUL!!! This can really mess up title fuzzy matching (we do have a Schematron monitor)
           • Deleting DOIs:
               • Change the publication title
               • Change the DOI's title (article title) to the DOI itself
               • Remove optional metadata
               • Set publication date to the deletion date
           • Conflicts
               • Conflicts reduce matching rates!
       • New
           • Timestamps
               DOIs are deposited with a timestamp to ensure the latest metadata gets inserted. Timestamps are essential when we have to re-process deposits. Problems occur when DOI ownership changes (e.g. what is the timestamp?).
               Solution: CrossRef will provide a means to retrieve the current timestamp.
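The timestamp behaviour described on this slide amounts to a "newest timestamp wins" update. The sketch below shows the idea with an in-memory dictionary. The store layout, the function name, the example DOI and the choice to reject equal timestamps are all assumptions for illustration, since the slide only says that the latest metadata gets inserted.

```python
def apply_deposit(store, doi, timestamp, metadata):
    """Insert metadata only if its timestamp is newer than the stored one.

    `store` maps DOI -> (timestamp, metadata). Returns True if the store
    was updated, False if the deposit was ignored as stale.
    Assumption: a deposit with an equal timestamp is treated as stale.
    """
    current = store.get(doi)
    if current is not None and timestamp <= current[0]:
        return False
    store[doi] = (timestamp, metadata)
    return True


if __name__ == "__main__":
    store = {}
    doi = "10.5555/example"  # hypothetical DOI, for illustration only
    print(apply_deposit(store, doi, 20091013120000, {"title": "second version"}))
    # Re-processing an older deposit leaves the newer metadata in place.
    print(apply_deposit(store, doi, 20090101000000, {"title": "first version"}))
    print(store[doi][1]["title"])
```

This is also why a new owner needs to know the current timestamp: a deposit stamped earlier than the stored value would be silently ignored.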
  13. Conflicts

       ===========================================
       Created: 2006-04-04 04:10:03.0  ConfID: 263262  CauseID: 246648646  OtherID: 64341060,
       JT: Ophthalmic and Physiological Optics
       MD: Brown, 15 ,3,163,1995,Differences in visual acuity between the eyes: determination of normal limits in a clinical population
       DOI: 10.1046/j.1475-1313.1995.9590568m.x(85579-R 263262-null )
       DOI: 10.1016/0275-5408(95)90568-M
       ===========================================

       Query with enable-multiple-hits="false":
         <query key="MyKey1" enable-multiple-hits="false">
           <journal_title>Ophthalmic and Physiological Optics</journal_title>
           <author>Brown</author>
           <volume>15</volume>
           <first_page>163</first_page>
           <year>1995</year>
         </query>

       Match Fails:
         <query key="MyKey1" status="unresolved" fl_count="0">
           <journal_title>Ophthalmic and Physiological Optics</journal_title>
           <author>Brown</author>
           <volume>15</volume>
           <first_page>163</first_page>
           <year>1995</year>
         </query>

       Results with enable-multiple-hits="true":
         <query key="MyKey1" status="multiresolved" fl_count="2">
           <doi type="journal_article">10.1046/j.1475-1313.1995.9590568m.x</doi>
           <issn type="print">02755408</issn>
           <issn type="electronic">14751313</issn>
           <journal_title>Ophthalmic and Physiological Optics</journal_title>
           <author>Brown</author>
           <volume>15</volume>
           <issue>3</issue>
           <first_page>163</first_page>
           <year>1995</year>
           <publication_type>full_text</publication_type>
         </query>
         <query key="MyKey1" status="multiresolved" fl_count="0">
           <doi type="journal_article">10.1016/0275-5408(95)90568-M</doi>
           <issn type="print">02755408</issn>
           <issn type="electronic">14751313</issn>
           <journal_title>Ophthalmic and Physiological Optics</journal_title>
           <author>Brown</author>
           <volume>15</volume>
           <issue>3</issue>
           <first_page>163</first_page>
           <year>1995</year>
           <publication_type>full_text</publication_type>
         </query>
  14. Conflicts
  15. Conflicts
       • What to do? Wiley/Blackwell owns this journal

       Resolve_conflit.txt
         H:email=ckoscher@crossref.org;op=PRIMARY
         10.1046/j.1475-1313.1995.9590568m.x

       Process log (email)
         <?xml version="1.0" encoding="UTF-8"?>
         <doi_batch_diagnostic>
           <submission_id>923604608</submission_id>
           <record_diagnostic doi="10.1046/j.1475-1313.1995.9590568m.x">
             <conflict status="Success" ids="85579,263262">
               <msg>Marked as alias</msg>
               <doi_list>
                 <doi>10.1016/0275-5408(95)90568-M</doi>
               </doi_list>
             </conflict>
           </record_diagnostic>
         </doi_batch_diagnostic>
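A process log like the one on this slide is easy to check programmatically. The sketch below parses the diagnostic XML exactly as shown and pulls out the status, conflict IDs, message and aliased DOIs. It uses only Python's standard `xml.etree.ElementTree`; the function name and the returned dictionary layout are illustrative choices, and no fields beyond those visible on the slide are assumed.

```python
import xml.etree.ElementTree as ET

# The process log from the slide, reproduced as-is.
DIAGNOSTIC = b"""<?xml version="1.0" encoding="UTF-8"?>
<doi_batch_diagnostic>
  <submission_id>923604608</submission_id>
  <record_diagnostic doi="10.1046/j.1475-1313.1995.9590568m.x">
    <conflict status="Success" ids="85579,263262">
      <msg>Marked as alias</msg>
      <doi_list>
        <doi>10.1016/0275-5408(95)90568-M</doi>
      </doi_list>
    </conflict>
  </record_diagnostic>
</doi_batch_diagnostic>
"""


def summarize_conflicts(xml_bytes):
    """Return one summary dict per <record_diagnostic> that contains a <conflict>."""
    root = ET.fromstring(xml_bytes)
    results = []
    for record in root.findall("record_diagnostic"):
        conflict = record.find("conflict")
        if conflict is None:
            continue
        results.append({
            "doi": record.get("doi"),
            "status": conflict.get("status"),
            "conflict_ids": conflict.get("ids", "").split(","),
            "message": conflict.findtext("msg"),
            "other_dois": [d.text for d in conflict.findall("doi_list/doi")],
        })
    return results


if __name__ == "__main__":
    for summary in summarize_conflicts(DIAGNOSTIC):
        print(summary)
```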
  16. Conflicts

       <query key="MyKey1" enable-multiple-hits="false">
         <journal_title>Ophthalmic and Physiological Optics</journal_title>
         <author>Brown</author>
         <volume>15</volume>
         <first_page>163</first_page>
         <year>1995</year>
       </query>

       Match Succeeds !!!

       <body>
         <query key="MyKey1" status="resolved" fl_count="2">
           <doi type="journal_article">10.1046/j.1475-1313.1995.9590568m.x</doi>
           <issn type="print">02755408</issn>
           <issn type="electronic">14751313</issn>
           <journal_title match="exact">Ophthalmic and Physiological Optics</journal_title>
           <author match="exact">Brown</author>
           <volume match="exact">15</volume>
           <issue>3</issue>
           <first_page match="exact">163</first_page>
           <year match="exact">1995</year>
           <publication_type>full_text</publication_type>
         </query>
       </body>
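For readers who generate such queries in code, the sketch below builds the `<query>` element from this slide with Python's standard `xml.etree.ElementTree`. Only the element and attribute names visible on the slide are used. The enclosing batch document is not shown on the slides and is deliberately left out, and the helper name `build_query` is an assumption for this example.

```python
import xml.etree.ElementTree as ET


def build_query(key, fields, enable_multiple_hits=False):
    """Serialize one <query> element; `fields` is a list of (tag, text) pairs."""
    query = ET.Element("query", {
        "key": key,
        "enable-multiple-hits": "true" if enable_multiple_hits else "false",
    })
    for tag, text in fields:
        ET.SubElement(query, tag).text = text
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    xml_text = build_query("MyKey1", [
        ("journal_title", "Ophthalmic and Physiological Optics"),
        ("author", "Brown"),
        ("volume", "15"),
        ("first_page", "163"),
        ("year", "1995"),
    ])
    print(xml_text)
```

Passing the fields as an ordered list keeps the child elements in the same order as on the slide.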
  17. Conflicts
       • What do YOU need to do
           1. Go to http://www.crossref.org/06members/59conflict.html
           2. Determine the nature of your conflicts
               1. If they only involve your own DOIs
                    • Construct the necessary conflict resolution files and upload them using doi.crossref.org
                    • Use the screens at the doi.crossref.org Metadata Admin tab to fix them
               2. If they involve someone else's DOIs
                    • Construct the necessary conflict resolution files
                    • Email them to support@crossref.org
       Audits will be coming next year and unresolved conflicts may co$t you
  18. Metadata Quality
       • Metadata quality is good enough for linking (besides conflict problems) ... but it is not good enough for other purposes (display).

         No Article Title     3,055,090
         No Volume            5,582,856
         No Issue             1,359,241
         No Page              3,807,396
         No Author              988,764
         One Author          16,139,751
         No First Name        4,835,479
         Initial Only        12,039,157
         DOI Total           38,193,723

       • Schematron rules
           Contributor checks
             • Alert if only single author is present
             • Alert if only first initial is deposited
             • Check for numbers in given name / surname
             • Check for punctuation in given name / surname; currently checks for: _/*@()[]
             • Check for ndash in name
             • Check for Jr or Sr in surname
             • Alert if all caps
             • Alert if more than 3 spaces are present
             • Alert if space in surname when no given name is present
             • Alert if surname ends with jr, JR
             • Alert if surname contains 'et. al.
             • Alert if surname/given name contains &amp; or &amp;# (malformed entity)
             • Alert if multiple ??? are present
           Page Ranges
             • Alert for _ or - in first or last page
             • Alert if first and last page are identical
           Edition / Issue info (not reported but recorded)
             • Check for 'edition' in <edition>
             • Check for 'issue' in <issue>
             • Check for 'no' or 'number' in volume/issue/edition
           Citation Checks
             • All surname checks
             • All page range check
             • Year range check
           Article Title
             • Check for single word title
             • Alert if all caps
             • Alert if title name contains &amp; or &amp;# (malformed entity)
           Other
             • Alert for year beyond current year
             • Alert if neither first page nor author is present
             • Alert if more than 2 alternate titles
             • Alert if DOI contains character not in allowed
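A few of the contributor checks listed above are simple string tests. The real rules are written in Schematron, so the Python sketch below is only an approximation of their intent: the function names, the alert wording and the exact regular expressions are assumptions. The punctuation set comes from this slide and the allowed DOI characters from slide 11.

```python
import re

PUNCTUATION = set("_/*@()[]")  # characters named on the slide
# Anything outside slide 11's allowed set: a-z, A-Z, 0-9 and -._;()/
DOI_DISALLOWED = re.compile(r"[^a-zA-Z0-9\-._;()/]")


def check_contributor(given_name, surname):
    """Return a list of alert strings for one contributor (approximate rules)."""
    alerts = []
    given = given_name or ""
    both = given + surname
    if re.fullmatch(r"[A-Za-z]\.?", given):
        alerts.append("only first initial deposited")
    if any(ch.isdigit() for ch in both):
        alerts.append("number in name")
    if any(ch in PUNCTUATION for ch in both):
        alerts.append("punctuation in name")
    if re.search(r"\b(jr|sr)\b", surname, re.IGNORECASE):
        alerts.append("Jr or Sr in surname")
    if len(surname) > 1 and surname.isupper():
        alerts.append("surname all caps")
    if "&amp;" in both:
        alerts.append("malformed entity")
    return alerts


def doi_has_disallowed_chars(doi):
    """True if the DOI contains a character outside slide 11's allowed set."""
    return DOI_DISALLOWED.search(doi) is not None


if __name__ == "__main__":
    print(check_contributor("C", "Xu"))
    print(check_contributor("AdriËnne M.", "Mendrik $^*$"))
    print(doi_has_disallowed_chars("10.1016/0275-5408(95)90568-M"))
```

The two contributor examples are taken from slides 9 and 20; the second one trips the punctuation check because of the `*` in the surname.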
  19. Metadata Quality
       • What else is bad quality?
           • 224,000 DOIs with a bad page number (really affects matching)
               <pages>
                 <first_page>305???306</first_page>
               </pages>
           • DOI links that still work: 14,985 journals crawled in 2009
               69.25% are confirmed good, 22.8% unconfirmed, 5% confirmed not good

               sum(dois)             25,977,348
               sum(checked)             361,514
               sum(confirmed)           206,204
               sum(semiconfirmed)        44,168
               sum(nonconfirmed)         82,565
               sum(bad)                   1,140
               sum(login)                16,950

               Western Journal of Medicine   10.1136/ewjm.172.6.364    http://www.pubmedcentral.nih.gov/
               Western Journal of Medicine   10.1136/ewjm.172.2.84     http://www.pubmedcentral.nih.gov/
               Western Journal of Medicine   10.1136/ewjm.172.1.43     http://www.pubmedcentral.nih.gov/
               Western Journal of Medicine   10.1136/ewjm.172.1.61-a   http://www.pubmedcentral.nih.gov/
               Western Journal of Medicine   10.1136/ewjm.174.2.103    http://www.pubmedcentral.nih.gov/
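The slide does not say how the three percentages were derived from the crawl counts. One grouping that comes close to the quoted figures is sketched below: confirmed plus semiconfirmed as "confirmed good", nonconfirmed as "unconfirmed", and bad plus login as "confirmed not good", each divided by sum(checked). That grouping is an inference from the numbers, not something stated in the presentation.

```python
# Counts as printed on the slide.
COUNTS = {
    "checked": 361_514,
    "confirmed": 206_204,
    "semiconfirmed": 44_168,
    "nonconfirmed": 82_565,
    "bad": 1_140,
    "login": 16_950,
}


def link_check_percentages(counts):
    """Percentages of checked DOIs per category (grouping is an assumption)."""
    checked = counts["checked"]

    def pct(part):
        return 100.0 * part / checked

    return {
        "confirmed_good": pct(counts["confirmed"] + counts["semiconfirmed"]),
        "unconfirmed": pct(counts["nonconfirmed"]),
        "confirmed_not_good": pct(counts["bad"] + counts["login"]),
    }


if __name__ == "__main__":
    for name, value in link_check_percentages(COUNTS).items():
        print(f"{name}: {value:.2f}%")
```

With these counts the three values land within a few hundredths of a percentage point of the 69.25%, 22.8% and 5% quoted on the slide.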
  20. Metadata Quality
       • Schematron reports are run once a week.

       From: <support@crossref.org>
       Date: October 3, 2009 12:42:27 PM EDT
       To: <jstark@crossref.org>, <pfeeney@crossref.org>
       Subject: Schematron Report for prefix(es) 10.1109 g.grenier@ieee.org

       The results of a weekly metadata quality check are listed below. The affected DOIs were deposited successfully but the metadata attached to the DOI may need some attention.

       http://www.crossref.org/schematron/data/st_20091003_5431.xml
       http://www.crossref.org/schematron/data/st_20091003_5347.xml
       http://www.crossref.org/schematron/data/st_20091003_5430.xml
       http://www.crossref.org/schematron/data/st_20091003_5348.xml
       http://www.crossref.org/schematron/data/st_20091003_5411.xml
       http://ftp.crossref.org/schematron/data/st_20091010_2004.xml
       http://ftp.crossref.org/schematron/data/st_20091010_3553.xml
       http://ftp.crossref.org/schematron/data/st_20091003_5411.xml
       http://ftp.crossref.org/schematron/data/st_20091003_59687.xml
       http://ftp.crossref.org/schematron/data/st_20091003_5837.xml

       <person_name sequence="first" contributor_role="author">
         <given_name>AdriËnne M.</given_name>
         <surname>Mendrik $^*$</surname>
       </person_name>
  21. System rewrite
       • May 2008: Board endorses plan to address a significant rewrite/upgrade
       • June 2008 - Feb 2009: TWG subgroup (rewrite2) meets to define requirements and other project parameters
       • Oct: Scenario options documented and cost comparisons profiled; started negotiations with Atypon re: new contract
       • Nov: Report presented to board and to rewrite2 group for direction and validation
       • Dec 08 - May 09: Negotiations with Atypon
       • Oct 12, 09: New contract signed
       Core Needs
       • That CrossRef should ultimately own the intellectual property in the software at the heart of its operations
       • That CrossRef should not risk or jeopardize the reliability and throughput offered by the existing system
       • That CrossRef should remain free to develop further applications for other purposes which need to interface to the reference-linking systems and/or its data
       • Recognized that CrossRef is not likely to establish internal resources sufficient to manage independently the development and maintenance of a system of this magnitude.
  22. System rewrite
       Timeline chart (2009, 2010, 2011). Labels: Existing System (EDS); EDS mods to use NQS; New Query System (NQS); New Deposit System (NDS); Both query and deposit transactions; Deposit transactions; Query transactions.
       • NQS will make use of the existing Oracle database (minimal mods to the schema)
       • EDS will communicate with NQS via JMI (Java Message Interface)
       • May use the Spring framework; if not initially, more likely later on (NDS)
       • NDS will include significant data model and process changes
           • Title management
           • Conflicts
           • Oracle schema cleanup
       • NQS/NDS combined will allow integration of currently stand-alone functions (OAI-PMH)
       • After NQS/NDS: possibly augment/replace back end database (satellite DBs)
  23. System rewrite: Current organization (architecture diagram)
       Hosts and paths shown: www.crossref.org, doi.crossref.org, oai.crossref.org/OAIHandler; /openurl, /iPage, /query/xref.cgi, /*
       Components shown: HAProxy; www (Apache); System (resin), two instances; openurl, iPage, SIGG, other (Tomcat), two instances; Deposit Processor (Java app); Stored Query Processor (Java app); Oracle (CMD); Oracle (prime); Oracle (passive-stndby); BerkeleyDB (2); Lucene (2); Daily Replication; Constant Replication
  24. System rewrite: New organization (architecture diagram)
       Hosts and paths shown: www.crossref.org/openurl, doi.crossref.org/*, oai.crossref.org/OAIHandler; /iPage, /query/xref.cgi, /*
       Components shown: HAProxy; HAProxy (standby); NQS (Spring) (Tomcat); www (Apache); EDS (resin); Deposit Processor (Java app); Query Dispatch; Metadata Access; Metadata / Citation Lookup; Direct JDBC; Persistent Data Access; Lucene; BerkeleyDB; Oracle (CMD); Oracle (prime); Oracle (active-stndby); Constant Replication
       Note on slide: Hisham?
  25. New initiatives, technical perspectives. … Geoff
