15. CrossRef
Association of scholarly publishers
15 years old this year
70,416,598 DOIs
not only links
- CrossCheck plagiarism detection
- CrossMark retraction notices
- an API
- metadata
  - titles
  - tables of contents
  - authors
  - ISSN
  - datasets
  - funding information
  - license information
  - full-text links
Joe Wass (CrossRef) 14 / 30
16. What's this got to do with TDM?
It's all about the links (and metadata).
Workflow for Text and Data Mining
1. Identify corpus
2. Somehow get hold of corpus
   1. Figure out the license for each document
   2. Figure out where to get the document
   3. Download it
3. Clever algorithms
   1. That's your problem
Repeat for very large numbers of documents.
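
A minimal sketch of steps 2.1 to 2.3 for a single document, assuming each work record is a dict carrying a `license` list and a `link` list whose entries have `URL` and `content-type` keys (the shape the CrossRef REST API is understood to return). The allow-list, the helper names, and the sample record are hypothetical and exist only for illustration.

```python
from typing import Optional

# Hypothetical allow-list: license URLs this project has decided it may mine.
ALLOWED_LICENSES = {
    "http://creativecommons.org/licenses/by/4.0/",
}


def license_urls(work: dict) -> list:
    """Step 2.1: collect the license URLs declared in a work record."""
    return [lic["URL"] for lic in work.get("license", []) if "URL" in lic]


def full_text_url(work: dict, content_type: str = "application/pdf") -> Optional[str]:
    """Step 2.2: return the first full-text link of the wanted content type."""
    for link in work.get("link", []):
        if link.get("content-type") == content_type:
            return link.get("URL")
    return None


def plan_download(work: dict, allowed=ALLOWED_LICENSES) -> Optional[str]:
    """Return the URL to fetch in step 2.3, or None if the work is skipped."""
    if not any(url in allowed for url in license_urls(work)):
        return None  # no acceptable license declared
    return full_text_url(work)


# Made-up record for illustration only; not real CrossRef data.
sample_work = {
    "DOI": "10.5555/example",
    "license": [{"URL": "http://creativecommons.org/licenses/by/4.0/"}],
    "link": [
        {"URL": "http://example.org/fulltext.pdf", "content-type": "application/pdf"}
    ],
}

print(plan_download(sample_work))
```

Keeping the license check separate from the link lookup lets the same loop run over very large numbers of records, with the allow-list as the only project-specific policy.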
17. CrossRef Metadata
DOIs + license information + full-text URLs = corpus
cross-publisher API
cross-publisher data schema
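
As a rough illustration of how the cross-publisher API could be asked for a candidate corpus, the sketch below builds a query against the `/works` route of http://api.crossref.org. The `has-license` and `has-full-text` filter names and the `message.items` response layout are assumptions taken from the REST API documentation linked at the end of these slides; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "http://api.crossref.org"  # API address given in the slides


def build_works_url(filters: dict, rows: int = 20) -> str:
    """Build a /works query URL from filter name/value pairs."""
    filter_value = ",".join(f"{name}:{value}" for name, value in filters.items())
    return f"{API_ROOT}/works?" + urlencode({"filter": filter_value, "rows": rows})


def fetch_works(url: str) -> list:
    """Fetch one page of work records (needs network access).

    Assumes the response JSON keeps its records under message.items.
    """
    with urlopen(url) as response:
        return json.load(response)["message"]["items"]


# Ask only for works that declare both a license and a full-text link.
url = build_works_url({"has-license": "true", "has-full-text": "true"}, rows=5)
print(url)
```

Calling `fetch_works(url)` would then return a list of records, each with its DOI, license information, and full-text URLs in the same schema regardless of publisher.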
30. More metadata
More than 1,100,000 articles and counting
11 million more coming soon
more publishers in the pipeline
- American Institute of Physics (AIP)
- American Physical Society (APS)
- Elsevier
- HighWire Press
- Institute of Physics (IoPP)
- Springer
- Taylor & Francis
- Walter de Gruyter
- Wiley
120,000 Creative Commons articles
31. Text and Data Mining with CrossRef
Joe Wass
www.crossref.org
jwass@crossref.org
@joewass
British Library, November 2014
http://www.crossref.org
http://tdmsupport.crossref.org
http://api.crossref.org
https://github.com/CrossRef/rest-api-doc/blob/master/rest_api_tour.md