About the Webinar
Authority control is a bit of an alphabet soup of acronyms. ORCID (Open Researcher and Contributor ID) is a system to uniquely identify scientific and other academic authors; ISNI (International Standard Name Identifier) identifies the public identities of contributors to media content such as books, television programs, and newspaper articles; and VIAF (Virtual International Authority File), hosted by OCLC, combines multiple name authority files into a single authority service. All have their place when discussing identifiers for authority control.
Resolving identity issues and disambiguating authors, researchers, other content creators, and their institutional affiliations are crucial as we move into a world of linked data. In this webinar, presenters will cover the implications of and differences between ORCID, ISNI, and VIAF, the proper use of each, and some of the benefits that come with using authority files and making that information available on the Web.
Agenda
Introduction
Todd Carpenter, Executive Director, NISO
ORCID identifiers in research workflows
Simeon Warner, Director of Repository Development, Cornell University Library
ISNI: How It Works And What It Does
Laura Dawson, Product Manager, ProQuest
VIAF and its Relationships with Other Files
Thomas Hickey, Chief Scientist, OCLC
NISO Webinar Authority Control
1. NISO Webinar
Authority Control:
Are You Who We Say You Are?
Wednesday, February 11, 2015
Speakers:
Simeon Warner, Director of Repository Development, Cornell University Library
Laura Dawson, Product Manager, ProQuest
Thomas Hickey, Chief Scientist, OCLC
http://www.niso.org/news/events/2015/webinars/authority_control/
2. ORCID identifiers in research
workflows
Simeon Warner, Cornell University Library
with thanks to
Laure Haak, ORCID Executive Director and
Josh Brown, ORCID Regional Director, Europe
for slides and comments
NISO Webinar:
Authority Control: Are You Who We Say You Are?
February 11, 2015
3. “Use ORCID iDs in research
workflows to solve name
ambiguity and save everyone
a bunch of effort!”
4. ORCID background
• open - anyone can register, any organization with interest in
research and scholarly communications can join, iDs intended
for reuse, software open source
• non-profit - incorporated in USA, also ORCID EU
• community-driven - where community includes all sectors of
research process including publishers, funders, universities,
and the researchers themselves
two core functions:
1. a registry of unique identifiers and manage a record of
activities
2. APIs that support system-to-system communication and
authentication
see: http://orcid.org/content/initiative
5. ORCID status and adoption
A little over 2 years since launch, over 1.1M ids created,
over 190 members from all sectors and around the world.
[Chart: ORCID iDs by month, October 2012 to August 2014 (vertical axis 0 to 900,000), broken down by creation method: Creator, Website, Trusted Party]
[Chart: members by sector: Universities & Research Orgs 45%, Publishing 25%, Associations 12%, Repositories & Profile Sys 11%, Funders 7%]
[Chart: members by region: Americas 50%, EMEA 35%, AsiaPac 15%]
6. National integrations and membership
http://openaccess.blogg.kb.se/2013/01/30/slutrapport-fran-projekt-forfattarindentifikatorer/
http://www.jisc.ac.uk/whatwedo/programmes/di_researchmanagement/researchinformation/orcid.aspx
http://orcid.org/blog/2014/09/03/denmark-adopts-orcid-consortium-approach-orcid-implementation
http://orcidpilot.jiscinvolve.org/wp/
7. ORCID Scope
ORCID = Open RESEARCHER AND CONTRIBUTOR Identifier
o Research activities
o Living people
o There are fewer researchers than the scope of people and
personas covered by ISNI or VIAF
CONTRIBUTOR -- ORCID intended to be used for the spectrum of
actors in the research process, not just authors, and records roles.
o Already supports roles like translator, principal investigator
o 2012 Harvard Workshop
http://projects.iq.harvard.edu/attribution_workshop/home
o 2014 Project CRediT Workshop
http://www.eventbrite.ca/e/project-credit-workshop-tickets-10314211083
8. Researcher driven
Creation methods:
• integrations dominate
• website second
• institutional creation
Researcher must be involved to create or activate the ORCID iD,
and can control the privacy settings and/or add information.
Recommend institutions use the trusted party creation method
rather than direct record creation. Need to connect with and
educate users anyway. Can pre-populate registration fields.
[Chart: ORCID iDs by month, October 2012 to August 2014 (vertical axis 0 to 900,000), broken down by creation method: Creator, Website, Trusted Party]
9. Leveraging ISNI Organization IDs
ORCID uses Ringgold (an ISNI registrar) organization list to support
connection between individuals and education and employment
affiliations.
12. Publication round trip
ORCID iDs are intended to be integrated into research and
publication workflows, and become embedded in the
metadata. ORCID iDs will thus be associated with new
works at the time of publication.
[Diagram: round trip from the ORCID record through Manuscript Submission (verified ORCID, update permission), Review, Publication with DOI and ORCID(s), and CrossRef DOI assignment, back to the ORCID record and on to Readers]
13. Round trip process and implications
Publisher captures ORCID iD during manuscript submission
o Authenticated process, no mistyping, accurate
o User may grant permission to add works later
Publisher includes ORCID iD in metadata when minting DOI
o Will be available to support discovery
o Available in CrossRef search
Publisher/CrossRef writes metadata back to ORCID record
o Holder notified, can control visibility
o Saves effort updating record
o Information flow to other systems such as local profile (e.g.
I've linked my ORCID record with my VIVO profile)
Similar process for datasets, mediated by DataCite
ref: http://orcid.org/blog/2014/11/21/new-functionality-friday-auto-update-your-orcid-record
14. Funder workflow
• Use for applicants and reviewers
• Profile data reduces applicant/grantee form filling burden
• Improve reporting accuracy
• Pull publications, datasets and other works based on ORCID iD
ref: http://support.orcid.org/knowledgebase/articles/426596-orcid-funder-workflow
15. An ounce of ambiguity avoidance is worth a
pound of disambiguation
-- with apologies to Benjamin Franklin
• Workflow integration avoids name ambiguity at source
• Resulting data good for disambiguation of older data
• Resulting data good for compilation of authority records
17. Minimal record
Registration is really quick and
easy, 30 seconds perhaps
1. name
2. email
3. password
4. agree to privacy policy and
conditions
A minimal ORCID record that is
enough to get an iD and use it in
research workflows
18.
19. Helpful ORCID record
Reasons to add a little more information:
1. Provide enough information so that someone who follows a
link to your record, or searches for you, can understand which
"John Smith" you are
o alternate names
o education and employment information
o a few works. Everyone likes to show off their best work …
o opens the door for disambiguation of existing data
2. Provide other identifiers so that ORCID can act as a
switchboard to connect your identities in different systems.
o local profile id (e.g. my VIVO id at Cornell)
o Scopus Author ID, Researcher ID, ISNI
o (Using the search and link wizards that connect to these
other systems is also the easiest way to add works.)
20.
21. Expansive ORCID record
There are many import wizards which not only allow
o connection of an ORCID record to other identifiers
o also import of works, grants, etc..
o source is recorded and provides way to assess trust
ORCID registry has facilities for users to enter works themselves,
specify their roles, etc..
ORCID UI groups information about the same work from multiple
sources
o user may select preferred one to display
You may make your ORCID record a complete picture of your research
contributions if you choose. But a complete record isn't necessary
for ORCID to work.
23. ORCID is a hub
[Diagram: ORCID as a hub connecting Other Identifiers, Funders, Higher Education and Employers, Professional Associations, Repositories, and Publishers]
The ORCID identifier
connects researchers
with their works
(papers, grants,
datasets, and more),
organizations, and
other identifiers.
ORCID APIs enable data
exchange between
research information
systems.
[Example identifiers shown in the diagram: DOI, ISBN, Thesis ID, ISNI, Researcher ID, Scopus Author ID, internal identifiers, Member ID, Abstract ID, FundRef, GrantID]
32. What Is ISNI
• ISO Standard, published in 2012
• International Standard Name Identifier
• Numerical representation of a name
– 16 digits
– Assigned to public figures, contributors of content –
researchers, authors, musicians, actors, publishers,
research institutions – and subjects of that content (if
they are people or institutions).
– Example: 0000 0004 1029 5439
33. Who is ISNI
• Founding members
– IFRRO (International Federation of Reproduction
Rights Organizations)
– CISAC (International Confederation of Authors and
Composers Societies)
– SCAPR (Societies’ Council for the Collective
Management of Performers’ Rights)
– OCLC
– CENL (Conference of European National Librarians),
represented by the British Library and the National
Library of France
– ProQuest, represented by Bowker
34. ISNI Organizational Structure
[Diagram labels: Members, Quality Team, Board of Directors, Registration Agencies, ongoing assignments/general public]
35. How Does ISNI Registration Work
• Publisher submits names for assignment through a Registration
Agency
• RA works with the publisher to ensure the data feed is well-
formatted, and sends that feed to the Assignment Agency
• AA assigns as many ISNIs to the names in the feed as it can, using
complex algorithms and business rules that evolve with each feed
• AA returns a file of names with ISNIs attached to them
– This may not be the full file of names
– Ambiguous names are held for review by Quality Team
– QT assignments and other exceptions (assignments as a result
of improvements to the algorithm) are returned to RA quarterly
– Process is not instant. Assignment may be immediate if the
name and other information is unique, but frequently
assignments take a week or two.
36. Stage One
1. Customer submits data to Registration Agency
2. Registration Agency sends file to Assignment Agency
3. Assignment Agency assigns as many ISNIs to the names as it can
39. Display
• Only minimal metadata is displayed
• Not meant as a comprehensive profile
• ISNI is a tool for linking data sets, collocation, and
disambiguation
• Enhancements to the record can be made but not
required
44. How many names in the ISNI database?
• Over 8,000,000 assigned
• 10,112,931 provisional (awaiting a match from another
data set for corroboration)
• Your author names may well already have ISNIs.
http://www.isni.org/search.
50. Data Quality
• Based on matching names to existing records in
database (over 17 million names)
• Strict criteria for assigning ISNIs to names
• Quality team oversight (manual edits)
– British Library
– National Library of France
– OCLC
51. Assignment Criteria
• If on the common surname list:
– Birth date
– Death date
– ISBN(s)
– Title(s)
– Co-authors or institutional affiliation
• If not on the common surname list
– Title(s)
– Birth date
– Death date
– Any other distinguishing factors (“is not”)
• If unique
– Immediate assignment
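The criteria above can be read as a small lookup: which evidence fields are consulted depends on whether the surname is on the common surname list. The following is only a sketch of that reading. The field names, the `triage` function, and the tiny surname set are hypothetical, and the slide does not say how many fields must match, so no threshold is modelled.

```python
# Evidence fields listed on the slide, in the order given there.
# The field names themselves are hypothetical labels for illustration.
COMMON_SURNAME_EVIDENCE = (
    "birth_date", "death_date", "isbns", "titles", "co_authors_or_affiliation",
)
OTHER_SURNAME_EVIDENCE = (
    "titles", "birth_date", "death_date", "other_distinguishing_factors",
)


def evidence_fields(surname, common_surnames):
    """Pick the list of evidence fields that applies to this surname."""
    if surname.lower() in common_surnames:
        return COMMON_SURNAME_EVIDENCE
    return OTHER_SURNAME_EVIDENCE


def triage(record, common_surnames, is_unique):
    """Report which listed evidence a record actually carries.

    A unique name is assigned immediately; otherwise the available
    evidence is returned for comparison against existing records.
    """
    if is_unique:
        return {"decision": "assign immediately", "evidence": []}
    fields = evidence_fields(record["surname"], common_surnames)
    present = [field for field in fields if record.get(field)]
    return {"decision": "compare evidence", "evidence": present}


# Hypothetical surname list and record, for illustration only.
common = {"smith", "wang"}
record = {"surname": "Smith", "birth_date": "1950", "titles": ["A Title"]}
print(triage(record, common, is_unique=False))
```

The actual ISNI algorithms and business rules are more complex and evolve with each feed, so this shows only the shape of the decision.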
52. ISNI and ORCID
• ORCID numbers are a subset of the numbers in ISNI’s
database
• Working towards alignment, with ultimate goal of single
assignment
• There is ISNI representation on the ORCID Technical
Steering Group, and ORCID representation on the ISNI
Technical Committee
• A researcher may have both an ORCID and an ISNI
56. Virtual International Authority File
• Grew out of collaboration with national libraries
• Implemented and run by OCLC
• VIAF Council helps oversee it
• ~36 files, mainly from national authority files
• Everything libraries control other than topical
subject headings is in scope
– Personals, corporates, families
– Jurisdictionals, geographics
– Works, expressions
– Imaginary characters, etc.
62. Why multiple files?
• Different
– Information collected
• Private vs. public
• Identification vs. comprehensive
– Technologies and systems
• APIs
– Time scales
• Batch vs. interactive creation
• Historical vs. contemporary
– Business models
63. VIAF’s characteristics
• Origins: Library authorities
• What is being identified: Entities libraries control
• Who creates it: Library staff
• Range of entities: Very broad
• Priorities and control: Libraries
• What can be shared: Open
64. Relationship with ISNI
• Both systems run by OCLC
– VIAF helped get ISNI started
• Problems
– Each absorbs the other’s data
– Feedback loops!
• Who’s in charge?
– ISNI now indicates reviewed records
• Relationships treated as though from xA
• Can both merge and split VIAF clusters
70. Relationship with Wikipedia
• VIAF Harvests Wikipedia dumps monthly
• Pages about people that are in VIAF are added
• VIAFbot back loaded links into Wikipedia
– http://en.wikipedia.org/wiki/User:VIAFbot
71. Relationship with WorldCat
• One of the main uses of VIAF internally at
OCLC is controlling names
• Multilingual Bibliographic Structure project
• Generate ‘xR’ authority records
– Works
– Expressions
72. OCLC Production Services, External OCLC Research Systems, Internal OCLC Research Resources
[Diagram labels: enhanced WorldCat, Kindred Works, Classify, Identities, FictionFinder, Cookbook Finder, LCSH, FAST, VIAF, GMGPC, Linked Data Entities, WORKS, GSAFD, GTT, DDC, LCTGM, MeSH]
74. Unexpected interactions
• Drive towards comprehensiveness
– More information about entities
– More entities
• Importing other files
• Keeping up with updates
• Recognizing source of information
• What to trust
• How to leverage limited staff
76. NISO Webinar • February 11, 2015
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2015/webinars/authority_control/
NISO Webinar
Authority Control:
Are You Who We Say You Are?
77. Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU
Editor's Notes
I’m Simeon Warner and I work in the Cornell University Library. I also serve on the board of ORCID. I thank Laure Haak (ORCID Executive Director) and Josh Brown (ORCID Regional Director for Europe) for slides and images that I have reused. I’m delighted to have the opportunity to speak about ORCID as part of this NISO webinar.
Perhaps the subtitle for my talk should be “Use ORCID iDs in research workflows to solve name ambiguity and save everyone a bunch of effort!”. This may be a little cheeky, but this is the key message of my talk and I will spend some time explaining it. There are many aspects of ORCID that I could talk about, but this is an important one, and also a key differentiator between ORCID on the one hand and ISNI and VIAF on the other, which illustrates the situations where these different identifiers are best used.
A little background: ORCID is open, non-profit and community driven. It is designed to meet the specific need of solving the ambiguity problem with controlled scope and with policies that are acceptable to the range of stakeholders, including the researchers themselves. ORCID provides two core functions: a registry and APIs to interact with it.
In a little over 2 years since launch, ORCID has issued over 1.1M iDs and has over 190 members from all sectors. ORCID is younger than VIAF and ISNI. Significantly, the key driver for ORCID identifier creation is INTEGRATION with other systems. Note the “Trusted Party” parts of the bars: these represent integrations.
Out of the more than 190 memberships I’d like to highlight a few national level integrations. Back in 2011 Jisc in the UK recommended ORCID as the solution for researcher identifiers and there is an ongoing pilot Jisc-ARMA project for potential UK-wide adoption. Denmark has a national effort for integration with a goal of 80% registration and person-publication linkages for the national repository. In Sweden the use of ORCID iDs will be integrated with the grants process and there are efforts to link ORCIDs in national and university repositories. In 2014 ORCID developed a consortial model for national memberships and there are ongoing discussions with other countries.
ORCID = Open RESEARCHER AND CONTRIBUTOR Identifier. Scope is all research activities over all disciplines. ORCID deals only with living people at present because a small amount of researcher engagement is required. ISNI’s scope is broader, as it covers rights management contexts, including fictional characters or personas, and also identifiers for organizations. About 10% of ISNI assignments are to researchers (as of summer 2014).
ORCID is not just for authors, but covers CONTRIBUTORS more generally. There is not time to discuss details here, but see the workshop write-ups from 2012 and 2014. Work is ongoing with the community to understand appropriate roles and their description, but the ORCID registry already includes roles such as principal investigator and translator, which are not always well represented by the notion of “authorship”.
ORCID is researcher-driven per our privacy policy, and only living people can register or claim an ORCID iD. VIAF is an aggregation of librarian-created authority files from several national libraries. ISNI is librarian/algorithm-driven and may assign ISNIs to living and dead authors based on publicly available works. This is a key distinction between ORCID on the one hand and ISNI and VIAF on the other.
ORCID uses Ringgold (an ISNI registrar) organization list to support connection between individuals and education and employment affiliations
Auto-complete of funding agency information in ORCID is based on FundRef data. If one types “nsf” that will auto-complete for example.
Integration of ORCID iDs in research workflows
There is already significant adoption of ORCID within the publisher community. At time of submission, one or more authors are asked to authenticate with ORCID to provide a verified ORCID identifier. At this time the author may grant the publisher permission to update their record with the publication information when accepted and published. Review and correction may take some time, but when it comes time to publish, a CrossRef DOI is typically assigned and the publisher provides metadata, including ORCID identifiers, for the CrossRef database. As soon as the publication information is known, the round trip will be completed by the publisher (or CrossRef on behalf of the publisher) updating the author(s) record(s) with the new work. This last step will be implemented very soon and there are already many works in the CrossRef DB with ORCIDs associated with them.
To recap on this process and emphasize a few points: On submission ORCID authentication avoids mistyping or other errors, the identifier association will be accurate. ORCIDs are searchable in the CrossRef database and will be available to support discovery. Finally, if permission was granted the publisher or CrossRef can write metadata back to the ORCID record.
A similar process is being implemented for datasets by DataCite.
Another workflow to mention is integration with funders as part of submission and review processes. Perhaps the two most common key drivers are improved precision and transparency of assessment and tracking activities associated with funded researchers, and the desire to improve accuracy and save researcher time when “form filling”: information from their ORCID record can be used to pre-populate forms.
“An ounce of ambiguity avoidance is worth a pound of disambiguation”. ORCID changes the question from “how do you disambiguate authors?” to “how do we get everyone engaged at source to avoid name ambiguity in the first place?”.
How much information should my ORCID record have? This is a question I'm often asked by researchers when I discuss ORCID and the process of registering. … Do I have yet another profile to maintain? I want to talk about this not just because it is a frequently asked question, but also because it gets at some of the affordances and reasons for ORCID.
A minimal ORCID record is very quick and easy to create, 30 seconds. This is enough to get an iD and use it in research workflows
A minimal record will show nothing public except for name and identifier. This page clearly isn’t very useful because when we arrive here we are left with the question “Which John Smith?”. However, the owner of this iD is still able to log in as part of verified workflows and use his iD. He may wish not to add works to his profile, but should he choose to do so then it will be easy and automatic via the round trip described earlier, simply a matter of approving.
While a minimal ORCID record facilitates the use of an ORCID iD, there are reasons to add a little more information. Perhaps two key reasons are 1) Provide enough information so that someone who follows a link to your record, or searches for you, can understand which "John Smith" you are, and 2) Provide other identifiers so that ORCID can act as a switchboard to connect your identities in different systems.
With just a little more information added, we stand a very good chance of disambiguating this John Smith from another.
There are many import wizards which allow not only connection of an ORCID record to other identifiers, but also import of works, grants, etc. Users may also enter works themselves.
A key feature of the ORCID UI is grouping of duplicate work records based on work identifier (e.g. DOI). This allows multiple work records from different sources to be stored and updated without overwriting each other, and yet the user may select the preferred one to view.
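The grouping idea can be sketched in a few lines. This is an illustration only, not the ORCID registry's implementation: the record fields (`doi`, `source`, `title`), the function names, and the sample data are all hypothetical.

```python
from collections import defaultdict


def group_works(works):
    """Group work records that share a DOI, keeping every source's copy.

    DOIs are compared case-insensitively, so the grouping key is lowercased.
    """
    groups = defaultdict(list)
    for work in works:
        groups[work["doi"].lower()].append(work)
    return dict(groups)


def displayed_version(group, preferred_source=None):
    """Return the copy from the user's preferred source, else the first copy."""
    for work in group:
        if work["source"] == preferred_source:
            return work
    return group[0]


# Hypothetical records: two sources describe the same work.
works = [
    {"doi": "10.1234/example.1", "source": "CrossRef", "title": "A Study"},
    {"doi": "10.1234/EXAMPLE.1", "source": "Scopus", "title": "A study."},
    {"doi": "10.1234/example.2", "source": "CrossRef", "title": "Another Study"},
]

groups = group_works(works)
first_group = groups["10.1234/example.1"]
print(len(first_group))                                   # 2
print(displayed_version(first_group, "Scopus")["title"])  # A study.
```

Because each source's copy is stored separately, an update from one source never overwrites another source's version; the preference only changes which copy is shown.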
We all have many identifiers in many different systems and a key functionality provided by ORCID is that of a hub, a way to link these different identities together. It is also a hub for linking personal identity with identifiers for works, grants, patents, etc..
Here we see different segments of the ORCID community and some example identifiers that are important. ORCID APIs enable data exchange between research information systems across all of these communities.
ORCID supports a number of wizards that connect to other systems such as Scopus, ResearcherID and ISNI. Cornell Library developed an integration with VIVO that is supported in a number of installations including our own. Links to systems not currently recognized as person or name identifiers can be added as website links too – I choose to link my github page for example.
Of course, the information shown on the ORCID page is also available in machine-readable form, with easy ways to access it.
The Sloan Foundation funded ORCID integration with the VIVO open-source semantic-web research profiling system originally developed at Cornell and now under the DuraSpace umbrella. Here I show our VIVO instance at Cornell where I have linked my profile to my ORCID record. Such integration is now available “out-of-the box” as part of the core VIVO software.
One can link an ORCID with the Thomson Reuters ResearcherID system and import works. The ORCID is displayed also on the ResearcherID profile.
Similarly there is a Scopus import wizard which allows linking between an ORCID and a Scopus ID. The ORCID is displayed on the Scopus profile page.
With an ORCID on your mug it will always be traceable back to you. Thanks for listening!
A few pointers to key ORCID resources
16 digits – final one is a check digit so it is sometimes an X
In machine-to-machine communication, the ISNI is rendered without the spaces. We break it up into four sections just so it’s more human-readable on the web.
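The check digit and the two renderings can be illustrated with a short sketch. It assumes the check character follows ISO 7064 MOD 11-2, which is the scheme ORCID documents for its own 16-character iDs; that detail and the function names below are not stated in the slides.

```python
def mod11_2_check_char(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize(identifier):
    """Machine form: no spaces or hyphens, upper-case X."""
    return identifier.replace(" ", "").replace("-", "").upper()


def is_valid(identifier):
    """Check length, digit content, and the final check character."""
    compact = normalize(identifier)
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return mod11_2_check_char(base) == compact[15]


def display_form(identifier):
    """Human-readable form: four blocks of four characters."""
    compact = normalize(identifier)
    return " ".join(compact[i:i + 4] for i in range(0, 16, 4))


# The example ISNI from the slides.
print(normalize("0000 0004 1029 5439"))     # 0000000410295439
print(is_valid("0000 0004 1029 5439"))      # True
print(display_form("0000000410295439"))     # 0000 0004 1029 5439
```

A check character of this kind catches most single-digit typing errors, which is why the final position can be an X (standing for the value 10).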
As you can see, these are representatives from a wide variety of domains. IFRRO is primarily text and image-based. CISAC is concerned with music. SCAPR’s domain is film and video. OCLC, which is also the assignment agency, as well as CENL, are concerned with library usage of ISNI. ProQuest’s domain is around book publishing, web usage, scholarly research, and inclusion in semantic web ontologies.
ISNI Board of Directors is made up of representatives from each founding organization, as well as a representative from the collection of registration agencies.
The Assignment Agency – the part of ISNI that does the actual assigning of numbers to names – is OCLC. The Assignment Agency runs out of OCLC’s Leiden office in the Netherlands.
The Quality Team – comprised of librarians from the British Library and the French National Library – works with the Assignment Agency to ensure that ISNI information is accurate and that disambiguations and collocations are correct. Basically, the Quality Team handles errors in the data, and continuously works on refining the assignment algorithms.
Members of ISNI include the founding members, of course, as well as Macmillan (ISNIs are in use at Digital Science), MusicBrainz, VIAF, and numerous other organizations. These members send data through a Registration Agency, to the Assignment Agency, and ISNIs are assigned to the name records in that data.
Currently, there are two Registration Agencies – Ringgold, for institutions, and Bowker, for everything else.
VIAF served as the initial data set for ISNI.
Just to give a run-through of the methodology behind ISNI assignment – ISNI is not a self-claiming system. Individuals can apply for ISNIs through a Registration Agency, but ISNIs are also assigned on behalf of authors and other contributors by their publisher or another organization that’s distributing content. ISNI profiles are not meant to be comprehensive. The ISNI website displays the minimal amount of information required to disambiguate one contributor name from another. Behind the scenes, in the ISNI database, there may be more information – which is used for disambiguation and collocation. But ISNI takes privacy very seriously and does not display more than is absolutely necessary, unless a person would like to make more information available on their ISNI page.
Run through slide.
To recap – a publisher submits a data file to a registration agency. The RA packages up that file, working with the publisher and the assignment agency to ensure the file is in an easily-process-able format. The assignment agency then assigns ISNIs to as many names as it can.
Once those assignments are made, a file is sent to the registration agency. The registration agency shares the file with the publisher, who QA’s it and then uses it as they wish. Dealing with a registration agent – as opposed to many individual publishers or other institutions – simplifies the process for the assignment agency.
Given that not all names in any given file will receive an ISNI, how do updates work? The AA sends updates quarterly. The RA parses through these updates, and disperses the appropriate files to the publishers, who then each ingest their update.
As previously noted, and unlike ORCID, ISNI does not display a comprehensive profile. The number is a tool – it determines whether the Brian May who plays guitar for Queen is also Brian May the astrophysicist (he is), or Brian May the editor of an obscure photography book (he is). The number – the tool – determines that Fyodor Dostoevsky is the author of Crime and Punishment no matter how you spell his name or what character set you’re using.
And ISNI is a tool for linking data sets together. If two disparate databases – such as Books in Print and Musicbrainz – use ISNIs, then cross-domain linking is possible. This allows for a music professor, for example, to be unambiguously identified in his capacity as a session musician for Wynton Marsalis as well as in his capacity as the author of monographs on the evolution of jazz keyboard styles. An organization using both of these data sets would be able to link all the work produced by that professor regardless of whether it is audio or text.
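Cross-domain linking of this kind amounts to a join on the shared identifier. The sketch below is illustrative only: the dataset names echo the example above, but the records, field names, and ISNI values are made up, and `link_by_isni` is a hypothetical helper rather than any real service.

```python
from collections import defaultdict


def compact_isni(isni):
    """Machine form of an ISNI: spaces removed, X upper-cased."""
    return isni.replace(" ", "").upper()


def link_by_isni(datasets):
    """Map each ISNI to the (dataset, work) pairs that cite it."""
    linked = defaultdict(list)
    for dataset_name, records in datasets.items():
        for record in records:
            key = compact_isni(record["isni"])
            linked[key].append((dataset_name, record["work"]))
    return dict(linked)


# Made-up records: the same person appears under different name forms.
datasets = {
    "books": [
        {"isni": "0000 0000 0000 0001", "name": "Doe, Jane",
         "work": "Monograph on jazz keyboard styles"},
    ],
    "music": [
        {"isni": "0000000000000001", "name": "J. Doe",
         "work": "Session recording"},
        {"isni": "0000 0000 0000 001X", "name": "Doe, John",
         "work": "Another recording"},
    ],
}

for isni, works in link_by_isni(datasets).items():
    print(isni, works)
```

The name strings play no part in the join, so differences in spelling, word order, or character set between the two data sets do not matter.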
If a contributor wishes to enhance his or her ISNI record, that is of course possible. If an individual wishes more information to be displayed, or to correct information in the database, he or she can work with a registration agency and the ISNI Quality Team to ensure this happens.
So here’s an example of an ISNI record. You’ll notice a couple of things – the paucity of information, and the yellow box. Clicking on the yellow box leads a user to a basic online form where they can submit additional information or corrections for the record. That information is evaluated by the Quality Team, and implemented if it’s determined to be accurate. Submissions by the actual contributor are enormously welcome – because, of course, the contributor is the best possible source!
This is just an example of the data sets that are already using ISNI. You’ll note that ORCID is using ISNIs – this is to identify research institutions, which are regarded as contributors to research being done there. Wikipedia is using ISNIs as well, as is Scholar Universe and other ProQuest products. ISNI allows any one of these organizations to transmit contributor data to any other one of these organizations. It also allows a third party to combine data sets and link them through the common ISNI.
Just a rundown of some organizations and products already using ISNIs. We are currently piloting with Booknet Canada and the Authors Guild.
And an example of ISNI use in Wikipedia. Clicking on the ISNI takes you to the VIAF entry for that contributor – that’s how Wikipedia decided to use ISNI.
We have over 8 million ISNIs assigned to names in the ISNI database, with an additional 10 million awaiting a corroboratory match. Because the primary application of ISNI is in Linked Data, large data sets have served as the basis for the ISNI database. Recruiting one contributor at a time – given the large number of domains that exist for contributors – is not feasible for ISNI implementation; it would take far too long. So assignment is fairly automated, as we’ve discussed, and geared towards large data sets.
These are two different theologians. But Random House is publishing both of them. They must make sure that they are getting the appropriate royalties to the correct author. The subject matter is not enough to distinguish the two, and middle names are not always consistently listed – how do we know for certain that these are two different individuals? The ISNI helps the publisher definitively disambiguate the two, and pay correctly.
This is a directory of researchers at Xerox PARC. How can they be sure that research published under “Y. Wang” is credited to the right person? Even “Yu Wang” might lead to some mistakes. Using ISNIs definitively separates the research of Yu Wang from Yunda Wang.
At Arizona State, there are two faculty members named Michael White who both work in the legal area. Again, area of research is not necessarily an effective disambiguator – but a numerical distinction provides clarity.
Here we have the ISNI record for Brian May. As you can see, he’s done a few things – as an author, a creator, and a performer.
As you can see, the ISNI links together disparate types of data. Brian May, noted guitarist for the band Queen, is also an astrophysicist who has published his dissertation “A survey of radial velocities in the zodiacal dust cloud”. He has ALSO annotated a seminal collection of stereoscopic photographs. A true Renaissance man – and the ISNI allows for these works to be linked together under his unique, persistent identifier. So that if someone sees his dissertation or his photograph collection, and questions whether or not this is the same Brian May that played “Bohemian Rhapsody”, they can confirm that yes, this is the same (rather unusual) person.
Data quality is paramount for ISNI. The quality team work through over 17 million records. And, as I mentioned, the data quality team also solicits individual feedback – crowdsourcing, if you will, but with editorial oversight. With a data set this large, accuracy is an enormous challenge, so the role of the data quality team is critical.
This is just a list of criteria for how the assignments get made. ISNI has compiled a list of common surnames which dictates how much other data is necessary for assignment.
ISNI designated a set of a million numbers for ORCID’s use. This is so there won’t be any data integrity issues – particularly among data sets using both identifiers.
We are, in fact, working extremely closely with ORCID. Our mechanisms for assignment are different, but we do have the ultimate goal of having a single number serve both the ORCID and ISNI communities – similar to how an ISBN is also an EAN. There’s a lot of work we have to do to get there (issues around deprecation, redirection, and criteria for assignment – as ORCID only requires an email address), but we are on one another’s technical committees, and are in talks at the board level to create an alignment plan. We’ll have more information after an ORCID-ISNI joint board meeting later this month. In the meantime, a researcher may have both an ORCID and an ISNI.
Because ISNIs are assigned through library organizations, large databases, and other sources, it’s quite possible you might already have one. You can go to http://www.isni.org/search to check and see.
Currently VIAF has about 44M input records and 46 million equivalence relationships
Business models might be subscription, charging for services, advertising, public good
Currently running with monthly updates, but we expect it to gradually become closer to real-time updating
ISNI started with VIAF data plus links we had been able to establish between its initial files and VIAF
xA is an authority file used to split/join clusters in VIAF.
By relying on hand-reviewed records in ISNI, VIAF benefits from the work done by ISNI’s review teams at national libraries.
Seems to be one of the most important files that VIAF links
We started by importing Wikipedia people that match VIAF personas into VIAF from Wikipedia data dumps.
Then we created VIAF-Bot (Wikipedian in residence Max Klein) to push VIAF IDs into Wikipedia
Still working from English Wikipedia dumps, but expect to start using WikiData this year.
There is a page where people can list problems. Difficult for us to use.
WorldCat is the main OCLC database of nearly 400 million library curated records
GMGPC, LCTGM: Thesaurus for Graphic Materials
GSAFD: Genre Terms from Guidelines on Subject Access to Individual Works of Fiction, Drama, etc.
GTT: Gemeenschappelijke Trefwoorden Thesaurus, Dutch Joint Subject index
Everything has errors. Fixing them one-by-one with limited staff helps, but not enough.
Either need crowd-sourcing or find ways to leverage your limited staff