2013 CrossRef Workshops Text Data Mining Geoffrey Bilder

2013 CrossRef Workshops presentation on Text and Data Mining by Geoffrey Bilder.

  1. Introducing CrossRef Prospect. Cambridge, MA, 2013. Geoffrey Bilder, Director of Strategic Initiatives
  2. Taking the tedium out of TDM…. Geoffrey Bilder, Director of Strategic Initiatives
  3. Text & Data Mining
  4. Gold Diamond Text & Data ?
  5. The Problem
  6. Institute * Channel View Publications, Ltd * Chartered Institution Of Building Service Engineers * Chattagram Maa-O-Shishu Hospital Medical College * Chelonian Conservation And Biology Journal * Chelonian Research Foundation * Chem-Bio Informatics Society * Chemical Engineering Diponegoro University * Chemical Science Transactions * Chemical Society Of Japan * Chiang Mai University * Children, Youth And Environments Center * Chimera Innova Group * China Agricultural University * China Communications Magazine, Co., Ltd. * China Journal Of Chinese Materia Medica * China Petroleum Industry Press * China Science Publishing & Media Ltd. * Chinese Astronomical Society * Chinese Birds * Chinese Birds (Press) * Chinese Civilisation Centre * Chinese Geoscience Union * Chinese Institute Of Automation Engineers (Ciae) * Chinese Journal Of Mechanical Engineering * Chinese Mathematical Society * Chinese Physical Society * Chinese Physiological Society * Chinese Society Of Theoretical And Applied Mechanics * Chonnam National University Medical School (Kamje) * Christ University Bangalore *
  7. • All parties would benefit from support of standard APIs and data representations in order to enable TDM across both open access and subscription-based publishers.
     • Subscription-based publishers find it impractical to negotiate multiple bilateral agreements with thousands of researchers and institutions in order to authorize TDM of subscribed content.
     • Researchers find it impractical to negotiate multiple bilateral agreements with hundreds of subscription-based publishers in order to authorize TDM of subscribed content.
  8. Common API
  9. DOI Content Negotiation
  10. http://dx.doi.org/10.5555/12345678 (Accept: text/html)
  11. http://dx.doi.org/10.5555/12345678 (Accept: application/bibjson+json)
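Slides 9 to 11 show DOI content negotiation: the same DOI URL is requested with different Accept headers to obtain different representations. The following is a minimal Python sketch of that idea, assuming the dx.doi.org resolver and the media types named on the slides. The helper name `build_cn_request` is hypothetical, and the requests are only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "http://dx.doi.org/"  # resolver host as shown on the slides


def build_cn_request(doi: str, accept: str) -> Request:
    """Build (but do not send) a content-negotiated request for a DOI.

    Hypothetical helper for illustration; not part of any CrossRef library.
    """
    # Keep the "/" between the DOI prefix and suffix; escape other unusual characters.
    url = DOI_RESOLVER + quote(doi, safe="/")
    return Request(url, headers={"Accept": accept})


# Same DOI, two representations, selected only by the Accept header.
html_req = build_cn_request("10.5555/515151", "text/html")
json_req = build_cn_request("10.5555/515151", "application/bibjson+json")

for req in (html_req, json_req):
    print(req.full_url, "->", req.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would send it and follow redirects, much as `curl -L` does.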
  12. New Metadata
  13. Full Text Link
  14. License Information
  15. Rate Limiting (Optional)
  16. Prospect HTTP Headers
      CR-Prospect-Rate-Limit: 1500 (the rate limit ceiling per window on Prospect requests)
      CR-Prospect-Rate-Limit-Remaining: 1387 (number of requests left for the current window)
      CR-Prospect-Rate-Limit-Reset: 1378072800 (the remaining time in UTC epoch seconds before the rate limit resets and a new window is started)
      * This is a technique used by many APIs, including Twitter’s
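A TDM client could read the three rate-limit headers from slide 16 to decide when to pause. The sketch below is illustrative only: the names `RateLimit` and `parse_rate_limit` are hypothetical, and it assumes the reset value is an absolute UTC epoch timestamp, which is how the sample value 1378072800 reads.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping


@dataclass(frozen=True)
class RateLimit:
    limit: int          # ceiling per window
    remaining: int      # requests left in the current window
    reset_at: datetime  # assumed: moment the window resets (UTC)

    def seconds_until_reset(self, now: datetime) -> float:
        """Seconds to wait before a new window starts (never negative)."""
        return max(0.0, (self.reset_at - now).total_seconds())


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimit:
    """Extract the CR-Prospect-Rate-Limit* headers from a response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    reset_epoch = int(lowered["cr-prospect-rate-limit-reset"])
    return RateLimit(
        limit=int(lowered["cr-prospect-rate-limit"]),
        remaining=int(lowered["cr-prospect-rate-limit-remaining"]),
        reset_at=datetime.fromtimestamp(reset_epoch, tz=timezone.utc),
    )


# Header values copied from the slide.
sample_headers = {
    "CR-Prospect-Rate-Limit": "1500",
    "CR-Prospect-Rate-Limit-Remaining": "1387",
    "CR-Prospect-Rate-Limit-Reset": "1378072800",
}
info = parse_rate_limit(sample_headers)
print(info.limit, info.remaining, info.reset_at.isoformat())
```

A client would typically keep issuing requests while `remaining` is above zero and otherwise sleep for `seconds_until_reset(datetime.now(timezone.utc))`.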
  17. Common API Summary
      • Content Negotiation (Required)
      • New Metadata (Required)
        • Full text URIs
        • License URIs
      • Rate Limiting Headers (optional)
  18. Stop here if
      • You are an open access publisher
      • You include TDM as a part of your subscription license/T&Cs.
  19. Click-Through License Service (Optional)
  20. Researcher queries DOI using CN + API token
      Publisher verifies API token with Prospect (frequency at publisher discretion)
      If token verified AND access control allows, publisher returns full text
  21. Researcher queries DOI using CN + API token
      curl -H "Accept: text/turtle" "http://dx.doi.org/10.5555/515151" -D - -L
  22. Link: <http://data.crossref.org/full-text/10.5555/515151>; rel="http://id.crossref.org/schema/full-text"; anchor="http://annalsofpsychoceramics.labs.crossref.org/fulltext/515151/515151.pdf"
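The Link header on slide 22 is what a TDM tool would need to parse to find full-text information. The sketch below is a simplified parser, not a complete implementation of the HTTP Link header grammar. The function names are hypothetical; the header value and the full-text `rel` URI are taken from the slide, with the line-break space in the anchor URL removed.

```python
import re
from typing import Dict, List, Tuple

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')

FULL_TEXT_REL = "http://id.crossref.org/schema/full-text"  # rel value from the slide


def parse_link_header(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split a Link header into (target, parameters) pairs."""
    links = []
    for target, raw_params in _LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def full_text_links(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Keep only the links whose rel marks them as full text."""
    return [(t, p) for t, p in parse_link_header(value) if p.get("rel") == FULL_TEXT_REL]


sample_link = (
    '<http://data.crossref.org/full-text/10.5555/515151>; '
    'rel="http://id.crossref.org/schema/full-text"; '
    'anchor="http://annalsofpsychoceramics.labs.crossref.org/fulltext/515151/515151.pdf"'
)
found = full_text_links(sample_link)
for target, params in found:
    print(target, params.get("anchor"))
```

The sketch returns both the link target and its `anchor` parameter without deciding which of the two a tool should fetch; the slides do not spell that out.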
  23. Publisher verifies API token with Prospect
      curl -H "CR-Prospect-Publisher-Token: MdvA59fGn8ukykYlSxJL6g" "https://prospect.crossref.org/licenses/hZqJDbcbKSSRgRG_PJxSBA" -D - -L
  24. { "result": "ok", "message": "licenses", "orcid": "0000-0002-1825-0097", "given_names": "Josiah", "family_name": "Carberry", "licenses": [ { "uri": "http://www.crossref.org/tdm_license", "status": "rejected", "reviewed_at": "2013-05-28T17:09:36+00:00" }, { "uri": "http://www.oxygenxml.com/", "status": "read", "reviewed_at": "2013-05-29T12:08:59+00:00" } ] }
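On the publisher side, the JSON on slide 24 has to be turned into a yes/no decision about a given license. The sketch below shows one way that check might look, using the sample response from the slide. The function names are hypothetical, and the status string `"accepted"` is an assumption: the slide only shows the values `"rejected"` and `"read"`.

```python
import json
from typing import Optional

# Sample verification response, copied from the slide.
SAMPLE_RESPONSE = """
{
  "result": "ok",
  "message": "licenses",
  "orcid": "0000-0002-1825-0097",
  "given_names": "Josiah",
  "family_name": "Carberry",
  "licenses": [
    {"uri": "http://www.crossref.org/tdm_license", "status": "rejected",
     "reviewed_at": "2013-05-28T17:09:36+00:00"},
    {"uri": "http://www.oxygenxml.com/", "status": "read",
     "reviewed_at": "2013-05-29T12:08:59+00:00"}
  ]
}
"""

ACCEPTED_STATUS = "accepted"  # ASSUMPTION: this value does not appear on the slides


def license_status(payload: dict, license_uri: str) -> Optional[str]:
    """Return the researcher's recorded status for a license URI, if any."""
    for entry in payload.get("licenses", []):
        if entry.get("uri") == license_uri:
            return entry.get("status")
    return None


def has_accepted(payload: dict, license_uri: str) -> bool:
    """True only if the lookup succeeded and the license is marked accepted."""
    return (
        payload.get("result") == "ok"
        and license_status(payload, license_uri) == ACCEPTED_STATUS
    )


payload = json.loads(SAMPLE_RESPONSE)
print(license_status(payload, "http://www.crossref.org/tdm_license"))
print(has_accepted(payload, "http://www.crossref.org/tdm_license"))
```

This covers only the license half of the rule on slide 20; the publisher's own access control check would still apply before returning full text.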
  25. Sustainability Model
  26. • New initiatives are always optional to our members. Members who do not participate in our new initiatives will not be charged for them.
      • We do not charge end-users (e.g. researchers, librarians) for access to metadata and APIs.
      • We sometimes charge intermediaries for access to our services (to cover the cost of administration, maintaining SLAs, etc.).
      • We do not charge our members for depositing extra metadata into our services.
      • We sometimes charge our members for the cost of administering our services, maintaining SLAs, development, etc.
      • We eschew charging mechanisms that involve complex administrative overhead. The cost of developing and running them generally negates the revenue raised by implementing them.
      • We try to tie any charges as directly as possible to where costs are incurred.
  27. Current State
  28. Prospect Working Group
      • AAAS: Walter Jones, Stewart Wills, Deborah Rivera-Wienhold
      • American Institute of Physics: Evan Owens
      • American Physical Society: Mark Doyle
      • Elsevier: Chris Shillum, Ale de Vries
      • HighWire: John Sack, Craig Jurney
      • Institute of Physics Publishing: Graham McCann, James Walker
      • Springer: Chinchu Ann Belarmin, Michiel van der Heyden
      • Taylor & Francis: Gillian Howcroft
      • Walter de Gruyter: Bettina de Keijzer
      • Wiley: Edward Wates, Alan Bacon
      • CrossRef: Geoffrey Bilder, Chuck Koscher, Ed Pentz, Carol Meyer, Kirsty Meddings
  29. CrossRef
      • DOI Content Negotiation: Exists
      • CrossRef support for recording links to full text: Exists
      • CrossRef metadata Search for Discovery: Exists
      • CrossRef metadata support for license URIs: Exists
      • Click-through TDM license registry: Exists
      • Prospect publisher API for verifying, managing tokens: Exists
      • Sample publisher code: Exists
      • Sample researcher code: Exists
      ✻ being extended to support mime-types
  30. We are using CrossRef's Prospect text mining API in the context of the Hiberlink project, which investigates reference rot in scholarly papers at a very large scale. The API is really straightforward and based on common technical approaches; it can easily be integrated in a broader workflow. In our case, we have a work bench that monitors newly published papers, obtains their XML version via the API, extracts all HTTP URIs, and then crawls and archives the referenced content. Currently, we can only access Elsevier papers via the API but as more publishers join Prospect, it will become a powerful, uniform one-stop-shop for text mining scholarly literature. --Martin Klein and Herbert Van de Sompel, Los Alamos National Laboratory
  31. I think this is a big step in the right direction and makes retrieving full text file a lot easier, I hope that publishers support it. --Maximilian Haeussler, UCSD
  32. What do I need to do?
  33. Publishers (required)
      • Register full-text URLs with CrossRef
      • Register <lic_ref> well-known license URIs with CrossRef
  34. Publishers (optional)
      • Register click-through proprietary licenses with Prospect click-through service
      • Adapt platform APIs to handle Prospect API tokens
  35. Researchers
      • Register with Prospect and accept/decline licenses
      • Modify TDM tools to look for <lic_ref> elements
      • Modify TDM tools to make use of Prospect API token
  36. kmeddings@crossref.org, gbilder@crossref.org
  37. Thank You. gbilder@crossref.org