System Update
AMM Workshop 2013

Chuck Koscher
Director of Technology
ckoscher@crossref.org

Overview
• 2013: many growing pains
  • Growth in data
  • Growth in usage
  • Growth in complexity
• 2013: a lot of time spent fixing problems
  • Query performance: working hard to maintain acceptable levels
  • Query effectiveness: keeping matching rates up
  • Deposit throughput: dealing with spikes in re-deposits
  • Deposit processing with <citations>: can generate a lot of message traffic

Overview
• 2013: improvements and some new services
  • FundRef
  • Schema changes
    • Allow MathML in article titles
    • Allow JATS abstracts in deposits
    • Support non-CrossRef DOIs as components
    • Support text and data mining (TDM)
  • Stand-alone deposits (easier than sending in all metadata again)
    • FundRef
    • CrossMark
  • Can query on ORCIDs

Overview
• New service: "Which RA" tool (reports which Registration Agency a DOI is registered with)
  doi.crossref.org/ra/10.5284/1000389
  [ { "DOI": "10.5284/1000389", "RA": "Data Cite" }]
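As a rough illustration of how a client might use the "Which RA" lookup, the sketch below builds the lookup URL for a DOI and reads the agency name out of a JSON reply shaped like the sample above. The `https://` scheme, the helper names `ra_lookup_url` and `parse_ra_response`, and leaving the HTTP request itself to the caller are assumptions for illustration, not a documented client.

```python
import json
from urllib.parse import quote

# Assumption: the service lives at doi.crossref.org/ra/<DOI>, as on the slide;
# the https scheme is assumed, not taken from the slide.
RA_BASE = "https://doi.crossref.org/ra/"


def ra_lookup_url(doi: str) -> str:
    """Build the (hypothetical helper's) lookup URL for a single DOI."""
    # Keep '/' unescaped so the prefix/suffix separator of the DOI survives.
    return RA_BASE + quote(doi, safe="/")


def parse_ra_response(body: str) -> dict:
    """Map each DOI in a JSON reply like the slide's sample to its RA name."""
    records = json.loads(body)  # the sample reply is a JSON array of objects
    return {rec["DOI"]: rec["RA"] for rec in records}


if __name__ == "__main__":
    sample = '[ { "DOI": "10.5284/1000389", "RA": "Data Cite" }]'
    print(ra_lookup_url("10.5284/1000389"))
    print(parse_ra_response(sample))
```

Keeping URL construction and parsing separate from the network call makes both easy to check against the sample reply without contacting the service.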

Roll over from 2013 into 2014
• Tweak the query logic to improve precision
  • Return a DOI even if there are conflicts: publishers often (mistakenly) deposit a second DOI for something they’ve already assigned a DOI to, which normally creates a conflict. When a query finds two or more DOIs from the same publisher for a given item, we could return one (the most recent).
• Reliability and scaling
[Figure: "Current" vs. "Goal" system architecture. Each panel shows a Deposit System and a Query System over a Data Management layer; storage components shown: Oracle, MySQL, Berkeley, Lucene, Other.]
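The "return the most recent DOI" idea from the query-logic bullet can be sketched as a small selection rule. This is a minimal sketch, assuming each matched DOI carries its publisher and deposit date (the `Candidate` record and `pick_doi` function are hypothetical) and reading "most recent" as the latest deposit date. Returning nothing when the DOIs come from different publishers is also an assumption, since the slide only covers the same-publisher case.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Candidate:
    """One DOI matched by a metadata query (hypothetical record shape)."""
    doi: str
    publisher: str   # depositing publisher/member
    deposited: date  # assumed meaning of "most recent": latest deposit date


def pick_doi(candidates: List[Candidate]) -> Optional[str]:
    """Return one DOI for a query, or None when no safe answer exists.

    - no candidates: nothing to return
    - all candidates from one publisher: return the most recently deposited
    - candidates from different publishers: treat as a real conflict (None)
    """
    if not candidates:
        return None
    publishers = {c.publisher for c in candidates}
    if len(publishers) > 1:
        return None  # cross-publisher conflict; left for follow-up
    newest = max(candidates, key=lambda c: c.deposited)
    return newest.doi
```

With two same-publisher candidates deposited in 2011 and 2013, `pick_doi` returns the 2013 DOI; with candidates from two different publishers it returns `None`.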

Some 2013 fun facts
• 504 internal tickets created, 154 still open
• 459 tickets closed so far in 2013 (some created in late 2012)
• ~400,000 lines of code
• 768,115,361 metadata queries so far in 2013
  • 348,386,170 matched
• 207,109,812 forward-link queries
• 3,578,469 new CY DOIs
• 2,320,151 new BY DOIs
• 17,735,351 updated DOIs
• 1,084,529,650 raw DOI ‘clicks’ (Dec 2012 through Oct 2013)
FundRef
doi.crossref.org/fundrefSearch

2014
• Re-design conflict processing
  • The current process requires too much labor following up and fixing
  • Conflicts should only be created inter-member (between different members)
  • A given publisher will be allowed to create multiple DOIs; the system will clean them up
  • Title locks should prevent nearly all journal-to-journal conflicts
  • Automatically clean up the existing backlog
• Consider alternatives to OAI-PMH for bulk data distribution
• Accept full JATS files and/or PDFs for deposits
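The proposed conflict rule (conflicts only between different members; same-member duplicates cleaned up by the system) can be sketched as a classification step. This is a minimal sketch under assumptions: the input is the list of (DOI, member) pairs describing one item, and the function name `classify_item` and its return shape are hypothetical. How the system would decide which duplicate to keep is not stated on the slide and is left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One registration for an item: (doi, member_id). Hypothetical shape.
Deposit = Tuple[str, str]


def classify_item(deposits: List[Deposit]) -> Dict[str, object]:
    """Classify the DOIs registered for one item under the proposed rule.

    Returns a dict with:
      "conflict": True only when the DOIs come from more than one member
      "cleanup":  {member: [dois]} for members that deposited more than
                  one DOI for the item (candidates for automatic cleanup)
    """
    by_member: Dict[str, List[str]] = defaultdict(list)
    for doi, member in deposits:
        by_member[member].append(doi)

    cleanup = {m: dois for m, dois in by_member.items() if len(dois) > 1}
    return {"conflict": len(by_member) > 1, "cleanup": cleanup}
```

Under this rule a member that deposits two DOIs for the same article produces a cleanup entry but no conflict; only registrations from two different members raise a conflict.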
Questions
