ORCID Query API Phase 1

This document describes the application programmer interface (API) for querying and
searching the ORCID system. This API will allow third parties to integrate ORCID profiles into
their submission and evaluation systems.

Transcript

ORCID Phase 1 Query API

Authors: Michael Taylor (mi.taylor@elsevier.com), Geoffrey Bilder (gbilder@crossref.org)

V1: 4-JUL-2011
V2: 4-AUG-2011 (corrected mime types)
V3: 11-AUG-2011 (updated following API meeting August 9th)
V4: 26-AUG-2011 (some further updates based on August 9th meeting and emails)
V5: 20-SEP-2011 (correcting curl url strings to remove reference from content in header)
V6: 28-SEP-2011 (made public)
V7: 18-OCT-2011 (changed “Authorization: OAuth” to “Authorization: Bearer” as per spec)
V8: 26-OCT-2011 (added JSON access token details)
V9: 2-NOV-2011 (updated URLs, and changed accept header for search API)
V10: 17-JAN-2012 (removed option to authenticate using IP and API key)
V11: 15-FEB-2012 (removed x- from mime-types)
V12: 19-APR-2012 (added search API details)
V13: 24-APR-2012 (added email address as search field)
V14: 21-AUG-2012 (updated to match schema version 1.0.3)
V15: 29-AUG-2012 (updated to resolve terminology inconsistencies with other documents)

Introduction

This document describes the application programmer interface (API) for querying and searching the ORCID system. This API will allow third parties to integrate ORCID profiles into their submission and evaluation systems.

Separate documents will describe ORCID APIs for enabling third parties to batch deposit records into the system, as well as the API which third parties will need to support in order to allow ORCID users to import profile data from their systems.

Note: where this document refers to researchers, paper authors or contributors, it does so interchangeably. Our intention is to refer to those individuals who have made a scholarly contribution.

Warnings, Caveats and Weasel Words

This is preliminary documentation and has been produced as the first draft of a specification for a system that does not yet exist. In short, there is no ORCID API yet, so please don’t attempt to experiment with the system using this documentation; you will just be disappointed.

Having said that, our goal is to get this specification robust enough so that we can get a development team to create a working sandbox version of the API (aka “Mock API”) as soon as possible. This will allow third parties to start their integration efforts in parallel with the rest of ORCID’s phase 1 development. The proposed sandbox functionality is explained in appendix A.

Terminology

Private data
Profile data that the user has chosen to restrict access to. This data may be used in hashed, anonymized form for internal disambiguation purposes by the ORCID system, but it will not be made available to third parties or the public.

Protected data
Profile data that the user chooses to share with selected third parties (e.g. specific funding agencies, publishers, etc.) but which is not made available publicly.

Public data
Profile data that the user chooses to make available to the public. This data will be made available under a CC-zero waiver via APIs and via periodic data dumps.

Public API
The API made available to the general public and which can be used without any sort of authentication. This API will only return data marked by users as “public” and will come with no service level agreement (SLA). The API may be throttled at the IP / transaction level in order to discourage inadvertent overloading and/or deliberate abuse of the system.

Member API
The API made available to third parties requiring production-level integration of the ORCID service. The API will come with defined service level agreements and will allow authenticated third parties to retrieve “protected” profile data from those users who have authorized them to do so.

The Member API will be architecturally isolated from the Public API in order to allow separate scaling to meet SLA requirements.

Bio Record
The default result when querying the ORCID APIs will be to return “bio records”. Bio records only include biographical information relating to the ORCIDs (subject to the user’s privacy settings) and do not include activities (publications, patents, grants). This default behavior is designed to minimize the size of returned records for default queries and will work the same in both the Public and Member APIs.

Works
The list of “works” for this ORCID: published articles, books or other documents for which this person was recognized as a contributor. Privacy settings can apply to these relationships as well, so the response list can depend on the entity making the query.
Full Record
This is provided for convenience so that a single request includes *both* the biographical information and activities information (subject to the user’s privacy settings) related to the ORCIDs queried.

HTML
 ● Web pages will be semantically marked-up using schema.org microdata.
 ● They will be freely accessible webpages.

Overview

The ORCID query APIs will enable third parties to search the system and obtain data through a simple RESTful interface (see http://en.wikipedia.org/wiki/Representational_State_Transfer). The interface will support the following query types:

Bio
    Key: ORCID
    Returned: Profile metadata
    Description: Given a contributor, give me name and affiliation data.

Works
    Key: ORCID
    Returned: List of work metadata
    Description: Given a contributor, tell me what works they have contributed to.

Full
    Key: ORCID
    Returned: Profile metadata, activities metadata and ORCIDs
    Description: Given a contributor, tell me what activities they have contributed to, name and affiliation data.

Work
    Key: Work identifiers (e.g. DOIs)
    Returned: ORCIDs & associated metadata
    Description: Given a work, tell me what contributors are responsible for it.

Search
    Key: ORCID, Work identifiers, or profile metadata
    Returned: ORCIDs & associated metadata
    Description: Given whatever metadata I have, give me a ranked list of potential contributors identified by that metadata.

The Public API will require no authentication on the part of those querying it, while the Member API will require third parties to authenticate using the OAuth2 open standard for authorization between computer systems.

Using the Public ORCID Query API

The ORCID APIs will support returning ORCID records in various representations, including, to start with, HTML, XML and JSON. The preferred representation of the record can be specified using content negotiation. Examples of the XML representation of both bio and full responses can be found at:
http://orcid.github.com/ORCID-Parent/schemas/orcid-message/1.0.3/

Users of the API will be able to specify the particular version of a representation that they are expecting. This will enable ORCID to change the representations (e.g. by adding, moving or removing elements, etc.) without breaking third party applications which are expecting particular representations. Note carefully that the “version” refers to the version of the representation of the record returned, not the version of the record itself.

The following examples illustrate querying the Public ORCID Query APIs using the command line tool curl (http://curl.haxx.se/). Note that all of the queries below will only return data that users have marked as being “public.”

Note the inclusion of examples that specify the request type in the URL alone.

Example 0: Request the HTML representation of an ORCID profile

    curl -H "Accept: text/html" "http://api.orcid.org/{orcid}/orcid-bio" -D - -L

Example 1: Request an XML representation of the “bio record” for an ORCID

    curl -H "Accept: application/orcid+xml" "http://api.orcid.org/{orcid}/orcid-bio" -D - -L

Example 2: Request the JSON representation of the “bio record” for an ORCID

    curl -H "Accept: application/orcid+json" "http://api.orcid.org/{orcid}/orcid-bio" -D - -L

Example 3: Request XML representation version 1.1 of the “bio record” for an ORCID

    curl -H "Accept: application/orcid+xml; version=1.1" "http://api.orcid.org/{orcid}/orcid-bio" -D - -L

Example 4: Request XML representation version 1.1 of the “full record” for an ORCID

    curl -H "Accept: application/orcid+xml; version=1.1" "http://api.orcid.org/{orcid}/orcid-profile" -D - -L
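The curl examples in this section differ only in the Accept header (the representation and, optionally, its version) and in the final path segment (the record type). As a minimal sketch of that pattern, assuming the proposed base URL and mime types from this document, the Python below constructs such a request without sending it. The helper names `accept_header` and `build_request` and the placeholder ORCID are hypothetical; since the API is only a proposal at this stage, nothing here has been checked against a live service.

```python
import urllib.request

# Base URL and mime types are taken from the curl examples in this document.
BASE_URL = "http://api.orcid.org"
MIME_TYPES = {
    "html": "text/html",
    "xml": "application/orcid+xml",
    "json": "application/orcid+json",
}


def accept_header(fmt, version=None):
    """Build the Accept header value, optionally pinning a representation version."""
    mime = MIME_TYPES[fmt]
    return f"{mime}; version={version}" if version else mime


def build_request(orcid, record="orcid-bio", fmt="xml", version=None):
    """Construct (but do not send) a Public API request (hypothetical helper).

    `record` is the final path segment used in the examples,
    e.g. "orcid-bio" or "orcid-profile".
    """
    url = f"{BASE_URL}/{orcid}/{record}"
    return urllib.request.Request(url, headers={"Accept": accept_header(fmt, version)})


# Placeholder identifier, for illustration only.
request = build_request("0000-0000-0000-0000", record="orcid-bio", fmt="xml", version="1.1")
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; that step is left out here because it depends on a service being available at the proposed address.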
Example 5: Request an XML representation of the “works” for an ORCID

    curl -H "Accept: application/orcid+xml" "http://api.orcid.org/{orcid}/orcid-works" -D - -L

Searching using the ORCID Public API

The ORCID API will support searching a subset of ORCID metadata using the popular SOLR query syntax.

Query syntax

The ORCID search API will be based on SOLR, and will support all query syntaxes available in SOLR 3.6, including the following.
 ● Lucene with SOLR extensions
 ● DisMax
 ● Extended DisMax
The default syntax will be Lucene with SOLR extensions.

Search URL

The base URL for searching will be as follows. SOLR query parameters will be appended to this URL.

    http://api.orcid.org/search/orcid-bio/

Search results format

The results of the search will be a list of “bio records” in the same format returned by the REST API call for a single record, described above. Each bio in the list will have a relevancy score, as determined by SOLR.

The search API will use content negotiation to determine whether to return XML or JSON, in the same way as the REST API calls for a single record.

Search fields

Each entry gives the SOLR field name, the XPath for the corresponding profile data, and a description.

orcid
    XPath: //orcid-profile/orcid
    Description: The ORCID identifier for the researcher or contributor.

given-names
    XPath: //orcid-profile/orcid-bio/personal-details/given-names
    Description: The given names of the researcher or contributor.

family-name
    XPath: //orcid-profile/orcid-bio/personal-details/family-name
    Description: The family name of the researcher or contributor.

past-institution-affiliation-name
    XPath: //orcid-profile/orcid-bio/affiliations/affiliation[affiliation-type="past-institution"]/affiliation-name
    Description: The name of any past institution in the researcher or contributor’s profile.

current-primary-institution-affiliation-name
    XPath: //orcid-profile/orcid-bio/affiliations/affiliation[affiliation-type="current-primary-institution"]/affiliation-name
    Description: The name of the primary institution of the researcher or contributor.

current-institution-affiliation-name
    XPath: //orcid-profile/orcid-bio/affiliations/affiliation[affiliation-type="current-institution"]/affiliation-name
    Description: The name of non-primary institutions of the researcher or contributor.

credit-name
    XPath: //orcid-profile/orcid-bio/personal-details/credit-name
    Description: The name that normally appears on publications by the researcher or contributor.

other-names
    XPath: //orcid-profile/orcid-bio/personal-details/other-names
    Description: Alternative names that may have appeared on publications by the researcher or contributor.

email
    XPath: //orcid-profile/orcid-bio/contact-details/email
    Description: The email address of the researcher or contributor.

digital-object-ids
    XPath: //orcid-profile/orcid-activities/orcid-works/orcid-work/work-external-identifiers/work-external-identifier[work-external-identifier-type="doi"]/work-external-identifier-id
    Description: The DOI of any work in the researcher or contributor’s profile.

work-titles
    XPath: //orcid-profile/orcid-activities/orcid-works/orcid-work/work-title/(title|subtitle)
    Description: The titles of any work in the researcher or contributor’s profile.

grant-numbers
    XPath: //orcid-profile/orcid-activities/orcid-grants/orcid-grant/grant-number
    Description: The grant number of any grant associated with the researcher or contributor.

patent-numbers
    XPath: //orcid-profile/orcid-activities/orcid-patents/orcid-patent/patent-number
    Description: The patent numbers of any patent associated with the researcher or contributor.

keywords
    XPath: //orcid-profile/orcid-bio/keywords/keyword
    Description: Any keywords associated with the researcher or contributor.

text
    XPath: All the above data are combined into this field.
    Description: All the above fields. This is also the default field for Lucene syntax queries.

Example queries

Name: Example 1
Description: Search family names of all ORCID records for the name ‘Carberry’.
Syntax: Lucene
Paging: First 10 rows only
URL: http://api.orcid.org/search/orcid-bio/?q=family-name:Carberry&start=0&rows=10

Name: Example 2
Description: Search all searchable fields of all ORCID records for the word ‘Carberry’.
Syntax: Lucene
Paging: First 10 rows only
URL: http://api.orcid.org/search/orcid-bio/?q=text:Carberry&start=0&rows=10

Name: Example 3
Description: Search all ORCID records for the family name ‘Carberry’ and the keyword ‘Physics’. Only records containing both the family name and the keyword will be returned.
Syntax: Lucene
Paging: First 10 rows only
URL: http://api.orcid.org/search/orcid-bio/?q=family-name:Carberry%20AND%20keywords:Physics&start=0&rows=10

Name: Example 4
Description: Search given names and family names of all ORCID records for ‘Raymond’ but boost the family name. Records with a given name or a family name containing ‘Raymond’ will be returned, but those matching on family name will appear at the top of the list and will have a higher relevancy score.
Syntax: Extended DisMax
Paging: First 10 rows only
URL: http://api.orcid.org/search/orcid-bio/?defType=edismax&q=Raymond&qf=given-names^1.0%20family-name^2.0&start=0&rows=10

Name: Example 5
Description: Search given names and family names of all ORCID records for ‘Raymond’ but boost the family name. Records with a given name or a family name containing ‘Raymond’ will be returned, but those matching on family name will appear at the top of the list and will have a higher relevancy score. The two records with ORCIDs 1877-5816-0747-5659 and 6181-9093-3346-6284 will be excluded from the results.
Syntax: Extended DisMax
Paging: First 10 rows only
URL: http://api.orcid.org/search/orcid-bio/?defType=edismax&q=Raymond%20-orcid:%281877-5816-0747-5659%206181-9093-3346-6284%29&qf=given-names^1.0%20family-name^2.0&start=0&rows=10

The search API will support OpenSearch in version 1.1 of the API release.

Using the Member ORCID Query API

The Member ORCID Query API will allow authenticated third parties to retrieve “protected” data from the profiles of researchers who have explicitly agreed to share their data. In order to use the API to query for protected data, the third party will first have to authenticate using the OAuth protocol.

OAuth

OAuth is an open standard for authorization between computer systems. Technology built using OAuth allows users to share their private resources stored on one system with another one without having to hand over credentials (username, password, etc). Once a relationship between two systems has been approved, that relationship is remembered via a process of retaining exchanged secure tokens. Either side may revoke the relationship in the future.

For example, when installing a Facebook or Twitter application on a smartphone, you’ll often go through a simple mechanism of using your username and password to establish that relationship. After this, in your Facebook or Twitter application pages, you’ll see that relationship between phone and web service listed, detailed and revocable. The process feels just like a simple username / password login to the user, but the underlying protocol is far more complex and secure.

ORCID will utilize OAuth 2. At the time of writing the current specification version is v2-22, and can be found at: http://tools.ietf.org/html/draft-ietf-oauth-v2-22
Figure: Twitter settings page. The user has granted access to three applications to share data with Twitter via OAuth. Permission may be revoked either at Twitter or on the applications.

It is proposed that ORCID uses OAuth to establish relationships between itself and partners who have the authority to access and use the Member API data interface. Once the relationship is made, it is permanent until such time as it is revoked (for example at the end of a subscription period or contract, or following an infringement of the terms and conditions).

OAuth is used by Google, Microsoft, Facebook, Yahoo and Twitter to control access to their APIs.

An overview of the entire OAuth workflow can be seen in appendix B.

Obtaining an OAuth Consumer (API) Key

In order for a third party to query the Member ORCID Query API, they will first need to obtain a Consumer Key from the ORCID service. The ORCID system will provide a web interface which will allow authorized third parties to generate Consumer Keys for their applications.

For example, the Society of Psychoceramics, which wants to be able to integrate ORCID into its manuscript submission process, would go to a form on the ORCID site and fill in relevant information.
Upon submitting this information, the developers at the Society for Psychoceramics would be returned to a page listing all the relevant keys and API end-points needed in order to authenticate their users against the ORCID system.
Once the developer has generated the needed application Consumer Keys, they can start to integrate ORCID into their own systems.

In summary, the flow for obtaining a Consumer Key looks like this:
Retrieving Protected Data

The Member API allows third parties to query the ORCID API and retrieve protected data from the profiles of those ORCID users that have authorized them to do so. This means that third parties will need to support a process that allows ORCID users to explicitly authorize them to access protected profile data. Once an ORCID user has authorized a particular third party to have access to their protected data, the third party can query said data without having to go through the authorization process again. That is, unless the ORCID user (or ORCID itself) revokes the third party’s authorization.

The process of authorizing a third party to access protected data involves a simple workflow. For example, Josiah Carberry submits his first manuscript to the Journal of Psychoceramics. The manuscript submission system can offer to expedite the submission process by importing his ORCID profile information.

The manuscript tracking system will then redirect Josiah Carberry to the ORCID site where, if he hasn’t already, he is prompted to authenticate and log in.
Once Josiah has authenticated with ORCID, the ORCID system will ask him if he wants to share his protected profile information with the Journal of Psychoceramics Manuscript Tracking System. At this point, the Journal of Psychoceramics Manuscript Tracking System will be authorized to generate an “access token”. The response to the access token request will also include the user’s ORCID, allowing the system to verify the ORCID or to spare the user from having to supply it.

The JSON response for the access token request will be similar to:

    {
      "access_token": "5a7a4062-3d26-4b10-aa6d-3d48458535c5",
      "expires_in": 43199,
      "orcid": "4444-4444-4444-4",
      "refresh_token": "007e7701-769b-461d-bac4-ed8133003e49",
      "scope": "read",
      "token_type": "bearer"
    }

The access token will allow access to the protected data within the user’s ORCID profile.
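To illustrate how a client might consume the access token response, the sketch below parses the sample JSON from this section and prepares headers using the “Authorization: Bearer” scheme this document specifies for Member API requests. It is an illustration under assumptions, not a reference client: the function names `parse_token_response` and `member_api_headers` are hypothetical, and the check on `token_type` is a defensive choice made for this example rather than a stated requirement.

```python
import json

# Sample response body copied from this section.
SAMPLE_TOKEN_RESPONSE = """
{
  "access_token": "5a7a4062-3d26-4b10-aa6d-3d48458535c5",
  "expires_in": 43199,
  "orcid": "4444-4444-4444-4",
  "refresh_token": "007e7701-769b-461d-bac4-ed8133003e49",
  "scope": "read",
  "token_type": "bearer"
}
"""


def parse_token_response(body):
    """Extract the user's ORCID and access token (hypothetical helper)."""
    data = json.loads(body)
    # Assumption for this sketch: reject anything that is not a bearer token.
    if data.get("token_type", "").lower() != "bearer":
        raise ValueError("unexpected token_type: %r" % data.get("token_type"))
    return data["orcid"], data["access_token"]


def member_api_headers(access_token, mime="application/orcid+xml"):
    """Headers for a Member API query, using the Bearer authorization scheme."""
    return {"Accept": mime, "Authorization": f"Bearer {access_token}"}


orcid, token = parse_token_response(SAMPLE_TOKEN_RESPONSE)
headers = member_api_headers(token)
```

The returned ORCID can be stored alongside the token so that later queries do not require the user to type it in.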
A summary of the above workflow is:
Once Josiah grants permission for the Journal of Psychoceramics manuscript tracking system to query his protected profile, the manuscript tracking system will be able to query Josiah’s profile as often as it likes, without re-authenticating, until either Josiah or ORCID revokes the Journal’s permissions. So, once Josiah has authorized the Journal, the workflow simplifies to this:

This latter transaction would be accomplished using curl as follows:

    curl -H "Accept: application/orcid+xml" -H "Authorization: Bearer {YOUR_ACCESS_TOKEN}" "https://api.orcid.org/{orcid}" -D - -L

All such queries will honor the researcher’s privacy settings. That is, profile data that the researcher marks as “private” will not be shared with anybody. Profile data marked as “protected” will only be shared with third parties to whom the researcher has explicitly granted access. Profile data marked as “public” will be shared with everybody.

Error codes

The following error codes will be implemented in the ORCID API and in the sandbox.
 ● 400 - Bad Request (invalid search syntax)
 ● 401 - Unauthorized (the application is not authorized to access the resource)
 ● 406 - Not Acceptable (bad accept header, version, etc)
 ● 413 - Request Entity Too Large (search was overly broad and needs to be restricted; partial results may be returned)
 ● 416 - Requested Range Not Satisfiable (bad pagination parameters for search results or other large result lists)
 ● 426 - Upgrade Required (the version the client is requesting is too old)

Appendix A: Sandbox
NB: the “Sandbox API” is now known as the “Mock API” and is available on GitHub.

A query sandbox will be built at the earliest opportunity to allow partners to develop applications that interface with the ORCID system. The sandbox will:
 ● Parse query strings, as specified above, and respond with appropriate error messages.
 ● Determine the content preference (web, XML, JSON) and respond appropriately.
 ● Reply with a static payload once the query has been parsed.
 ● Simulate an OAuth transaction, either success or failure.

The sandbox will be maintained and versioned to always support the full set of query strings and response formats.

Sandbox: Response formats
 ● Web pages will be semantically marked-up using schema.org microdata.
 ● XML and JSON format detail to be discussed.
 ● Consideration will be given to the inclusion of human-readable messages in response payload files, possibly as a URL to the full message. This would make it possible to alert API clients (particularly public API clients for whom we don’t necessarily have any contact details) to changes in format or functionality, throttling alerts, etc.
 ● Search responses will be in the form of ORCID IDs plus a match score in the first version. Additional fields will be added in subsequent versions of the API.

Appendix B: Overview of ORCID/OAuth Flow