Who's the Author?
Identifier soup - ORCID, ISNI,
LC NACO and VIAF
Simeon Warner
http://orcid.org/0000-0002-7970-7855
CUL Career Development Days 2017, 2017-04-07
Promises and changes...
Identifiers, including ORCID, ISNI, LC NACO and
VIAF, are playing an increasing role in library
authority work.
We’ll describe changes to cataloging practices
to leverage identifiers.
We'll then tell a short story of the how and why
of ORCID identifiers for researchers, and
relationships with other person identifiers.
Finally, we'll discuss the use of identifiers as
part of moves toward linked data cataloging
being explored in Linked Data for Libraries
work (in the LD4L Labs and LD4P projects).
True
statemen
t
Sketchy
also LD4L
Labs part
This I
know well
Big win for users:
• All books by this Iain
Banks
• None by any other Iain
Banks (e.g. “Banks,
Iain, 1963-”)
A little additional
information
I do find 3 titles about this
Iain Banks
No links out
URIs for Persons (& other entities) in MARC
 Use $0 sub-field for URI of thing/RWO
o RWO = Rodents Wildly Oversize (larger than ROUS)?
o … or maybe Real World Object (cf. authority or description)
https://www.loc.gov/aba/pcc/documents/PoCo-2016/PCC_URI_TG_20161006_Report.pdf
https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html
~5k VIVO URIs in catalog
for committee members &
advisors of dissertations,
part of ETD process
Link to profile not
provided via catalog
user interface yet …
but information is
present
NACO
NACO = Name Authority Cooperative Program of the PCC (Program
for Cooperative Cataloging)
“Through this program, participants contribute authority records
for personal, corporate, and jurisdictional names; uniform titles;
and series headings to the LC/NACO Authority File. Membership
in NACO is open to any institution willing to support their staff
through a process of training, review, and direct contributions of
records to the LC/NACO Name Authority File.”
(http://www.loc.gov/aba/pcc/naco/)
 Collaborative construction of big centralized
name authority database that we use
Entity Types of Name Authorities in LC/NACO
6310631, 73%
183495,
2%
1444944, 17%
130208, 2%
526623, 6%
Persons
Conferences
Corporate bodies
Geographic names
Titles
Data as of 2013-06-01, from https://viaf.org/viaf/partnerpages/LC.html
PCC report http://www.loc.gov/aba/pcc/stats/FY2013/PCC-Contributions.pdf
says 4,102,550 contributions to end FY2013 => 47% of LC authorities
Will focus on persons
Data from Wikipedia
but no URI 
Linked (Open) Data and Identifiers
• Identifiers provide a layer of indirection that
authorized names do not
o Generally considered good practice to avoid
semantics in identifiers
o No change from “Banks, Iain, 1953-” to “Banks,
Iain, 1953-2013”
• Identifiers expressed and URLs/URIs make the web
work well
o Provide means to give access to human and
machine representations
http://5stardata.info/en/
VIAF: The Virtual International Authority File
“The VIAF® (Virtual International Authority File) combines
multiple name authority files into a single OCLC-hosted name
authority service. The goal of the service is to lower the cost and
increase the utility of library authority files by matching and
linking widely-used authority files and making that information
available on the Web.” (https://viaf.org/)
 Automated matching/clustering of authority data
from many sources (mostly national libraries) with
(resolvable) identifier
 Access to greater internationalization than LC/NACO
 More entries that LC/NACO, wider lookup
 Changing data may merge / split / delete entries –
not yet policies and mechanisms to handle this
Death date note added
everywhere
VIAF URI (“Permalink”)
and link to ISNI URI
ISNI: International Standard Name Identifier
“The mission of the ISNI International Authority (ISNI-IA) is to
assign to the public name(s) of a researcher, inventor, writer,
artist, performer, publisher, etc. a persistent unique identifying
number in order to resolve the problem of name ambiguity in
search and discovery; and diffuse each assigned ISNI across all
repertoires in the global supply chain so that every published
work can be unambiguously attributed to its creator wherever
that work is described.” (http://www.isni.org/)
 The key driver was the need for accurate identities
for communication and payment associated with
rights
ISNI in libraries?
The mission of ISNI has expanded beyond rights, for example data
has been imported to create ISNI identifiers for PhD thesis authors
and from VIAF.
Restrictive terms of use: “All content provided on this Site and
any materials you download are provided for informational, non-
commercial purposes and you shall have no right to modify,
reproduce or redistribute any such content or materials”
(http://www.isni.org/terms-conditions)
Both VIAF and ISNI operated by OCLC so one can expect good
interoperability between them
ISNI URI stable and
managed
Includes Wikipedia &
Wikidata links
ORCID: Open Researcher and Contributor ID
“ORCID’s vision is a world where all who participate in research,
scholarship, and innovation are uniquely identified and connected
to their contributions across disciplines, borders, and time.”
“ORCID provides an identifier for individuals to use with their
name as they engage in research, scholarship, and innovation
activities. We provide open tools that enable transparent and
trustworthy connections between researchers, their
contributions, and affiliations. We provide this service to help
people find information and to simplify reporting and analysis.”
(https://orcid.org/)
 Research and scholarship focus
 Expect use by individuals identified in workflows
ORCID by way of arXiv – the need
2000’s: Long running discussions between arXiv, SPIRES (now
Inspire) and NASA ADS about how to share author identities, and
how to connect with other systems... some experiments:
o arXiv author ids
o Inspire login
o Facebook integration
o LinkedIn integration
ORCID by way of arXiv – creation
• 2009: No non-proprietary solutions for name authority or
identifiers for journal articles
• Alignment of academic, publisher, funder and aggregator
interests – including CUL and arXiv
• Meetings starting in 2009
• ORCID Incorporated 2010-08-05 – CUL a founding member
Other ids and
links useful, but
the author-articles
graph is key
Why must ORCID be different?
• How many people should have ORCID iDs?
o UNESCO 2013 estimate: 7.8 million researchers
o OECD 2014 estimate: 25.5 million researchers
o Far more than person records in LC NACO, and rapidly
changing
• Around 2 million journal articles published per year
(https://arxiv.org/abs/1402.4578)
 Only partial overlapping with other kinds of
publication
 “Sort it all out with manual effort” solution not
practical
 Solve with researcher engagement and use in
publication workflows
Journal article round trip
ORCID iDs are intended to be integrated into research and
publication workflows, and become embedded in the metadata.
ORCID iDs will thus be associated with new works at the time of
publication – ambiguity avoidance rather than disambiguation!
ORCID
record
Manuscript
Submission
ORCID
record
ORCID
record
Review
Publication
w DOI &
ORCID iDs
CrossRef
DOI assignment
Verified ORCID iDs, update permission
Readers
Authors
Hub linking to other
identities – leverage
overlap that exists
Short biography under
my control
How is ORCID doing? – Very well!
• 653 member organizations, approaching break-even
• Worldwide use, many integrations, some mandates
• National consortia including Australia, Denmark, Germany,
Finland, Italy, The Netherlands, New Zealand, Sweden, UK
• Over 3 million ORCID iDs, steady growth
https://orcid.org/blog/2017/04/04/measuring-progress-orcid-2016-annual-report
LD4L Labs and LD4P (2016-2018)
Follow-ons from LD4L 2014-2016
LD4L Labs –Linked Data for Libraries Labs - https://ld4l.org/ld4l-labs/
(Cornell, Harvard, Stanford, Iowa)
o advance the use and usefulness of linked data in libraries
o create and assemble tools, ontologies, services
o pilot tools and services with a view to solutions in 3-5 years
o develop tools and provide support for LD4P projects
LD4P – Linked Data for Production – https://ld4l.org/ld4p
(Columbia, Cornell, Harvard, LC, Princeton, Stanford)
o pilot transition of technical services workflows to a linked
data environment
o “production” means creation of catalog records, not
production-ready
Hydra lookup/autocomplete
• Building modules to support easy integration with popular
repository framework
• Leverage established Linked Data authorities instead of just
locally configured terms
• Example: Corporate Name from OCLC FAST
But hey! Didn’t you
say we should use
identifiers?
Lookup/autocomplete with identifier (1)
• Instead of storing just the authorized name, store identifier as
well
• Lookup unchanged – for example, an Agrovoc
(http://aims.fao.org/standards/agrovoc) term as used in
AgriKnowledge+
Lookup/autocomplete with identifier (2)
• Having selected “hive products”, both that string and the
corresponding identifier
http://aims.fao.org/aos/agrovoc/c_3655 are recorded
Better for maintenance and opportunity to link out
• Matching terms in other
vocabularies
• Related terms
• Labels in many languages
• Admin metadata
VitroLib – A experimental cataloging tool
• Using Vitro (the platform behind VIVO), customized for library
metadata as an editing environment – VitroLib
• Usability studies to understand how to build systems that will
work for cataloging
 Early days, long journey!
“Let’s imagine you are creating
an entirely new record”
“But I never create an entirely
new record!”
“Oh, and BTW, we shouldn’t say
record”
Why move from authorities to identifiers?
• Reduce maintenance and copy/paste work in library
environment
• Bridge between library catalog world and rest-of-world
o Journal literature
o Popular web
o Profiling services
• Better services to users
o More links to related sources
o Better localization options (maybe some users don’t want
the romanized names?)
o Better discovery (leverage structured data from other
sources)
That’s all folks…
• Thanks to Huda Khan and Nick Cappadona for VitroLib
prototype screenshot
• Thanks to Lynette Rayle for screenshots of Hydra “Questioning
Authority” work
• Thanks to Christina Harlow for finding examples in the catalog!
• Thanks to Cornell LD4L Labs and LD4P teams, including: Dean
Krafft, Lynette Rayle, Rebecca Younes, Jim Blake, Muhammad
Javed, Huda Khan, Nick Cappadona, Sandy Payette, Chew Chiat
Naun, Steven Folsom, Jason Kovari and Christina Harlow

Who's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAF

  • 1.
    Who's the Author? Identifiersoup - ORCID, ISNI, LC NACO and VIAF Simeon Warner http://orcid.org/0000-0002-7970-7855 CUL Career Development Days 2017, 2017-04-07
  • 2.
    Promises and changes... Identifiers,including ORCID, ISNI, LC NACO and VIAF, are playing an increasing role in library authority work. We’ll describe changes to cataloging practices to leverage identifiers. We'll then tell a short story of the how and why of ORCID identifiers for researchers, and relationships with other person identifiers. Finally, we'll discuss the use of identifiers as part of moves toward linked data cataloging being explored in Linked Data for Libraries work (in the LD4L Labs and LD4P projects). True statemen t Sketchy also LD4L Labs part This I know well
  • 6.
    Big win forusers: • All books by this Iain Banks • None by any other Iain Banks (e.g. “Banks, Iain, 1963-”)
  • 7.
    A little additional information Ido find 3 titles about this Iain Banks No links out
  • 8.
    URIs for Persons(& other entities) in MARC  Use $0 sub-field for URI of thing/RWO o RWO = Rodents Wildly Oversize (larger than ROUS)? o … or maybe Real World Object (cf. authority or description) https://www.loc.gov/aba/pcc/documents/PoCo-2016/PCC_URI_TG_20161006_Report.pdf https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html
  • 9.
    ~5k VIVO URIsin catalog for committee members & advisors of dissertations, part of ETD process
  • 11.
    Link to profilenot provided via catalog user interface yet … but information is present
  • 12.
    NACO NACO = NameAuthority Cooperative Program of the PCC (Program for Cooperative Cataloging) “Through this program, participants contribute authority records for personal, corporate, and jurisdictional names; uniform titles; and series headings to the LC/NACO Authority File. Membership in NACO is open to any institution willing to support their staff through a process of training, review, and direct contributions of records to the LC/NACO Name Authority File.” (http://www.loc.gov/aba/pcc/naco/)  Collaborative construction of big centralized name authority database that we use
  • 13.
    Entity Types ofName Authorities in LC/NACO 6310631, 73% 183495, 2% 1444944, 17% 130208, 2% 526623, 6% Persons Conferences Corporate bodies Geographic names Titles Data as of 2013-06-01, from https://viaf.org/viaf/partnerpages/LC.html PCC report http://www.loc.gov/aba/pcc/stats/FY2013/PCC-Contributions.pdf says 4,102,550 contributions to end FY2013 => 47% of LC authorities Will focus on persons
  • 17.
  • 18.
    Linked (Open) Dataand Identifiers • Identifiers provide a layer of indirection that authorized names do not o Generally considered good practice to avoid semantics in identifiers o No change from “Banks, Iain, 1953-” to “Banks, Iain, 1953-2013” • Identifiers expressed and URLs/URIs make the web work well o Provide means to give access to human and machine representations http://5stardata.info/en/
  • 19.
    VIAF: The VirtualInternational Authority File “The VIAF® (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.” (https://viaf.org/)  Automated matching/clustering of authority data from many sources (mostly national libraries) with (resolvable) identifier  Access to greater internationalization than LC/NACO  More entries that LC/NACO, wider lookup  Changing data may merge / split / delete entries – not yet policies and mechanisms to handle this
  • 21.
    Death date noteadded everywhere VIAF URI (“Permalink”) and link to ISNI URI
  • 22.
    ISNI: International StandardName Identifier “The mission of the ISNI International Authority (ISNI-IA) is to assign to the public name(s) of a researcher, inventor, writer, artist, performer, publisher, etc. a persistent unique identifying number in order to resolve the problem of name ambiguity in search and discovery; and diffuse each assigned ISNI across all repertoires in the global supply chain so that every published work can be unambiguously attributed to its creator wherever that work is described.” (http://www.isni.org/)  The key driver was the need for accurate identities for communication and payment associated with rights
  • 23.
    ISNI in libraries? Themission of ISNI has expanded beyond rights, for example data has been imported to create ISNI identifiers for PhD thesis authors and from VIAF. Restrictive terms of use: “All content provided on this Site and any materials you download are provided for informational, non- commercial purposes and you shall have no right to modify, reproduce or redistribute any such content or materials” (http://www.isni.org/terms-conditions) Both VIAF and ISNI operated by OCLC so one can expect good interoperability between them
  • 24.
    ISNI URI stableand managed Includes Wikipedia & Wikidata links
  • 25.
    ORCID: Open Researcherand Contributor ID “ORCID’s vision is a world where all who participate in research, scholarship, and innovation are uniquely identified and connected to their contributions across disciplines, borders, and time.” “ORCID provides an identifier for individuals to use with their name as they engage in research, scholarship, and innovation activities. We provide open tools that enable transparent and trustworthy connections between researchers, their contributions, and affiliations. We provide this service to help people find information and to simplify reporting and analysis.” (https://orcid.org/)  Research and scholarship focus  Expect use by individuals identified in workflows
  • 26.
    ORCID by wayof arXiv – the need 2000’s: Long running discussions between arXiv, SPIRES (now Inspire) and NASA ADS about how to share author identities, and how to connect with other systems... some experiments: o arXiv author ids o Inspire login o Facebook integration o LinkedIn integration
  • 27.
    ORCID by wayof arXiv – creation • 2009: No non-proprietary solutions for name authority or identifiers for journal articles • Alignment of academic, publisher, funder and aggregator interests – including CUL and arXiv • Meetings starting in 2009 • ORCID Incorporated 2010-08-05 – CUL a founding member Other ids and links useful, but the author-articles graph is key
  • 28.
    Why must ORCIDbe different? • How many people should have ORCID iDs? o UNESCO 2013 estimate: 7.8 million researchers o OECD 2014 estimate: 25.5 million researchers o Far more than person records in LC NACO, and rapidly changing • Around 2 million journal articles published per year (https://arxiv.org/abs/1402.4578)  Only partial overlapping with other kinds of publication  “Sort it all out with manual effort” solution not practical  Solve with researcher engagement and use in publication workflows
  • 29.
    Journal article roundtrip ORCID iDs are intended to be integrated into research and publication workflows, and become embedded in the metadata. ORCID iDs will thus be associated with new works at the time of publication – ambiguity avoidance rather than disambiguation! ORCID record Manuscript Submission ORCID record ORCID record Review Publication w DOI & ORCID iDs CrossRef DOI assignment Verified ORCID iDs, update permission Readers Authors
  • 30.
    Hub linking toother identities – leverage overlap that exists Short biography under my control
  • 31.
    How is ORCIDdoing? – Very well! • 653 member organizations, approaching break-even • Worldwide use, many integrations, some mandates • National consortia including Australia, Denmark, Germany, Finland, Italy, The Netherlands, New Zealand, Sweden, UK • Over 3 million ORCID iDs, steady growth https://orcid.org/blog/2017/04/04/measuring-progress-orcid-2016-annual-report
  • 32.
    LD4L Labs andLD4P (2016-2018) Follow-ons from LD4L 2014-2016 LD4L Labs –Linked Data for Libraries Labs - https://ld4l.org/ld4l-labs/ (Cornell, Harvard, Stanford, Iowa) o advance the use and usefulness of linked data in libraries o create and assemble tools, ontologies, services o pilot tools and services with a view to solutions in 3-5 years o develop tools and provide support for LD4P projects LD4P – Linked Data for Production – https://ld4l.org/ld4p (Columbia, Cornell, Harvard, LC, Princeton, Stanford) o pilot transition of technical services workflows to a linked data environment o “production” means creation of catalog records, not production-ready
  • 33.
    Hydra lookup/autocomplete • Buildingmodules to support easy integration with popular repository framework • Leverage established Linked Data authorities instead of just locally configured terms • Example: Corporate Name from OCLC FAST But hey! Didn’t you say we should use identifiers?
  • 34.
    Lookup/autocomplete with identifier(1) • Instead of storing just the authorized name, store identifier as well • Lookup unchanged – for example, an Agrovoc (http://aims.fao.org/standards/agrovoc) term as used in AgriKnowledge+
  • 35.
    Lookup/autocomplete with identifier(2) • Having selected “hive products”, both that string and the corresponding identifier http://aims.fao.org/aos/agrovoc/c_3655 are recorded Better for maintenance and opportunity to link out
  • 36.
    • Matching termsin other vocabularies • Related terms • Labels in many languages • Admin metadata
  • 37.
    VitroLib – Aexperimental cataloging tool • Using Vitro (the platform behind VIVO), customized for library metadata as an editing environment – VitroLib • Usability studies to understand how to build systems that will work for cataloging  Early days, long journey! “Let’s imagine you are creating an entirely new record” “But I never create an entirely new record!” “Oh, and BTW, we shouldn’t say record”
  • 38.
    Why move fromauthorities to identifiers? • Reduce maintenance and copy/paste work in library environment • Bridge between library catalog world and rest-of-world o Journal literature o Popular web o Profiling services • Better services to users o More links to related sources o Better localization options (maybe some users don’t want the romanized names?) o Better discovery (leverage structured data from other sources)
  • 39.
    That’s all folks… •Thanks to Huda Khan and Nick Cappadona for VitroLib prototype screenshot • Thanks to Lynette Rayle for screenshots of Hydra “Questioning Authority” work • Thanks to Christina Harlow for finding examples in the catalog! • Thanks to Cornell LD4L Labs and LD4P teams, including: Dean Krafft, Lynette Rayle, Rebecca Younes, Jim Blake, Muhammad Javed, Huda Khan, Nick Cappadona, Sandy Payette, Chew Chiat Naun, Steven Folsom, Jason Kovari and Christina Harlow