Virtual International Authority File (VIAF) and International Standard Name Identifier (ISNI)
Tutorial presented at the COAR 4th Annual Meeting, 8 May 2013, Istanbul, Turkey

Usage Rights

CC Attribution License

  • The following slides give some background on two different initiatives relevant to researcher identifiers, VIAF and ISNI, and on how they relate to each other and to ORCID. The last few slides pay some attention to the role of name identifiers and URIs in discovering researchers on the Web.
  • To begin with, a slide to remind us of the basics of persistent identifiers (PIDs). The uniqueness and persistence attributed to PIDs are not absolute values; they only make sense in the context in which the identifier is used. Uniqueness holds within the space in which the identifier is used: a specific domain (e.g. ISBNs for books) or information system (e.g. URNs for the Web). Persistence holds for the communities for whom the identifier is useful, and only as long as there is an organisational and/or legal commitment to maintain it. In addition, the best practice of separating the identifier (the id of an object) from the address (the location of the object) is crucial for a well-behaved information system (e.g. a physical library or the Web). These basics are essential for implementing identifiers in repositories.
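The separation of identifier and address can be sketched as a small resolver table. This is only an illustration and not part of the tutorial: the `urn:example:` identifier, the URLs and the `Resolver` class are all hypothetical.

```python
class Resolver:
    """Maps persistent identifiers to their current locations (hypothetical sketch)."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        # Registering an existing identifier again only updates its address.
        self._locations[pid] = url

    def resolve(self, pid):
        # Unknown identifiers raise KeyError rather than guessing a location.
        return self._locations[pid]


resolver = Resolver()
resolver.register("urn:example:thesis:42", "https://repo.example.org/old/path/42")
# The object moves; the identifier that others have cited stays the same.
resolver.register("urn:example:thesis:42", "https://repo.example.org/items/42")
print(resolver.resolve("urn:example:thesis:42"))
```

Citations keep pointing at the identifier, so only the resolver entry changes when the repository reorganises its URLs.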
  • In the library world, authority control has been a best practice for years. It is a manual method for disambiguating the names of specific entities. As an example of the scale of this operation: the names of over 22 million people and 5 million organisations are “controlled” by the OCLC library community.
  • Library cataloguers assign a unique header (= name) to each author, who may appear in different publications under different name forms, scripts, languages, etc. Libraries that share the same authority file refer to these headers, which are unique across their systems. Authority files make it possible, for example, to search for an author and list all of that author’s works held in different libraries.
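How a shared header collocates works catalogued under different name forms can be sketched as follows. This is a toy model under stated assumptions: the authority file, the name variants chosen, the records and the library names are invented for illustration and do not reflect any real catalogue's data structure.

```python
# Hypothetical authority file: one authorised header per author, plus name variants.
AUTHORITY = {
    "Pamuk, Orhan": ["Orhan Pamuk", "Pamuk, O.", "Памук, Орхан"],
}

# Hypothetical bibliographic records, each citing the author under some name form.
RECORDS = [
    {"title": "Kar", "author": "Orhan Pamuk", "library": "Library A"},
    {"title": "Snow", "author": "Pamuk, Orhan", "library": "Library B"},
    {"title": "My Name Is Red", "author": "Pamuk, O.", "library": "Library C"},
]


def build_variant_index(authority):
    """Map every known name form (case-insensitively) to its authorised header."""
    index = {}
    for header, variants in authority.items():
        index[header.casefold()] = header
        for variant in variants:
            index[variant.casefold()] = header
    return index


def works_by(name, index, records):
    """Return the titles of all records whose author resolves to the same header."""
    header = index.get(name.casefold())
    if header is None:
        return []
    return sorted(
        r["title"] for r in records if index.get(r["author"].casefold()) == header
    )


INDEX = build_variant_index(AUTHORITY)
print(works_by("pamuk, o.", INDEX, RECORDS))
```

Searching under any variant returns the same set of works, which is the effect authority control aims for.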
  • This is an example of an authority record of an author. It contains all the name variants of the author, his identifiers (here you see the ISNI, ORCID and VIAF numbers for this author), some biographical data, etc.
  • Authority files for authors have been developed in the context of national programs, usually led by the national library of the country.
  • When bibliographic records using different national authority files are aggregated, as in the case of WorldCat, issues of duplication of authority control and ambiguity appear. As a result, author names no longer refer to one single header/identifier, and the bibliographic records relating to an author are not linked together via a single header.
  • To address these issues, VIAF was developed. VIAF merges authority files and brings different authorities about the same identity together. VIAF started as a research project. Now that it has reached sufficient critical mass, it is run as an operational service by OCLC. The cooperative VIAF programme is an ongoing effort, with additional national-level authority files being added as we speak. Here are some recent VIAF statistics. It is fair to say VIAF is more about ‘authors’ than ‘researchers’, but most people who have written a thesis are in VIAF.
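The idea of bringing records from different national files together under one cluster can be sketched as below. This is a deliberately naive illustration: the local ids, the second person and the match key (normalised name plus birth year) are assumptions made for the example, and VIAF's real matching is considerably more involved than this.

```python
from collections import defaultdict

# Hypothetical authority records from different national files (ids are invented).
RECORDS = [
    {"source": "LC", "local_id": "lc-001", "name": "Pamuk, Orhan", "birth": 1952},
    {"source": "DNB", "local_id": "dnb-77", "name": "PAMUK, Orhan", "birth": 1952},
    {"source": "BnF", "local_id": "bnf-5", "name": "Pamuk, Orhan", "birth": 1952},
    {"source": "LC", "local_id": "lc-002", "name": "Example, Ada", "birth": 1900},
]


def cluster(records):
    """Group records that share a naive match key and give each group one id."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["name"].casefold(), rec["birth"])  # toy match key, an assumption
        groups[key].append((rec["source"], rec["local_id"]))
    # Number the clusters in a deterministic order (sorted by match key).
    return {
        f"cluster-{i}": members
        for i, (_, members) in enumerate(sorted(groups.items()), start=1)
    }


CLUSTERS = cluster(RECORDS)
for cluster_id, members in CLUSTERS.items():
    print(cluster_id, members)
```

Each cluster id plays the role of the single shared identifier, while the national files keep their own local records.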
  • VIAF is released as linked data and any web service can re-use the data.
  • In the Web of linked data VIAF represents an extensive and trusted source of information about authors.
  • This is an example of a VIAF record. It contains the name variants of the author, as recorded in different national authority files. It shows the VIAF ID and the permalink, which functions as a URI. It also contains many links to related information, namely the publications of the author held by libraries across the world.
  • If you scroll down the page, the “About” tab gives you more personal information about the author and links to other identifiers: the ISNI and also to the Identity Page of the author.
  • Let us follow the link to the identity page of the author.
  • WorldCat Identities is an OCLC Research activity, which generates a human readable web page for every name.
  • This is the identity page for Orhan Pamuk. It contains all sorts of information and data about the author, his works and works about him, all collected from the WorldCat records via data mining techniques. If you scroll down the page...
  • You will see external links as well, such as a link to the Wikipedia article about this author.
  • This is the English Wikipedia article about Orhan Pamuk.
  • And if you scroll down the page you will find the authority control link to VIAF. In another OCLC Research project we have developed a VIAF bot which automatically inserts the correct VIAF link in Wikipedia article pages about authors. This example nicely shows the ongoing effort to populate the web of linked data and the role VIAF plays in this. Recently, OCLC Research also added ISNIs to the English Wikipedia version, which brings me to the next identifier I will talk about: ISNI.
  • ISNI has a different background and a different status. It is an initiative of the many players in the distribution chain of creative works, with a strong representation of the rights holders associations. It is an ISO standard and part of a family of international standard identifiers that includes identifiers of works, recordings, products and right holders in all repertoires, e.g. DOI, ISAN, ISBN, ISRC, ISSN, ISTC, and ISWC. ISNI can be assigned to all individuals and organisations that create, perform, produce, manage, distribute or feature in creative content including natural, legal, or fictional identities.
  • ISNI is governed by the ISNI International Agency, and the ISNI database is operated by OCLC, which acts as the official assignment agency for ISNI. ISNI has followed a two-tiered approach:
    1. Assigning ISNIs to names coming from controlled environments, such as the library authority files. In this way the ISNI database is being populated with a large amount of existing and known data, which gives ISNI its critical mass as a name identifier.
    2. Assigning ISNIs to new names via registration agencies.
  • The ISNI database is largely populated with VIAF data, which is the base file for ISNI. It is complemented with proprietary data from the databases of rights holders who are ISNI members.
  • ISNIs are also assigned to identities from external research-based sources, such as the British Library Theses, JISC Names (UK), Modern Language Association, ProQuest Theses, and Scholar Universe. More researcher-rich data is being added from OCLC theses and ZETOC, among other sources.
  • This is the ISNI presentation page for Orhan Pamuk, with all the associated data. The yellow box on the left-hand side shows how ISNI allows for crowdsourced additions and corrections.
  • And ISNI also allows individual registrations.
  • There is a close relationship between ORCID, VIAF and ISNI. They are different approaches and complementary systems. The scope of ISNI is much broader than that of ORCID: it collects information on all areas of creative activity. ORCID is an instance of ISNI: ISNI has allocated a range of identifiers for the exclusive use of ORCID, and interoperation is being developed whereby the ISNI database will be consulted during ORCID registration. ISNI focuses on ingesting and consolidating data from existing databases, and on establishing and then disseminating reliable identifiers and links. ORCID focuses on self-registration, so that researchers can submit their identifiers with their papers for publication. VIAF is a library utility and the base file for ISNI. There are plans to interlink ISNI and VIAF in such a way that each feeds the other. So, contrary to what many believe, there is no competition between the identifiers. Which of ORCID, ISNI or VIAF suits a repository best will depend on its workflows. If the workflow is researcher-driven, it might include registration with ORCID or ISNI as a step. If the workflow is library-driven, it might include registration in an authority file (and thereby in VIAF) as a step.
  • What is the point of using researcher/author ids in repositories? They are useful for the internal workflows, for the data flows between CRIS and repository systems, etc. But to what extent are identifiers also useful for the discoverability of authors who deposit their publications in repositories?
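Because ORCID iDs are drawn from a range of ISNI numbers, the two share one syntax: 16 characters, the last of which is a check character computed with ISO/IEC 7064 MOD 11-2. A repository could use this to catch mistyped identifiers at deposit time. The sketch below only checks well-formedness; it says nothing about whether an identifier has actually been assigned, and the function names are my own.

```python
def mod_11_2_check_char(base15: str) -> str:
    """Compute the ISO/IEC 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed(identifier: str) -> bool:
    """Check length, digits and check character of an ISNI or ORCID iD string."""
    compact = identifier.replace("-", "").replace(" ", "").upper()
    if len(compact) != 16:
        return False
    if not all(ch in "0123456789" for ch in compact[:15]):
        return False
    return mod_11_2_check_char(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_well_formed("0000-0002-1825-0097"))  # True
print(is_well_formed("0000-0002-1825-0098"))  # False: wrong check character
```

Hyphens and spaces are stripped first because ORCID iDs are usually written in hyphenated blocks and ISNIs in space-separated blocks.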
  • We have been talking mostly about identifiers from the researcher perspective (CRIS/IR) and from the library perspective (authority control), but what about the end-user perspective? End-users on the Web usually make use of the internet search engines as a first entry point for their discovery. They will type in an author’s name or the name of a researcher. Will the search engine relate this name with the ORCID, ISNI or VIAF identifier?
  • In fact, the Internet search engines have published their own schema: schema.org. Schema.org is a joint effort by Google, Bing, Yandex and Yahoo! to improve the web by creating a structured data markup schema supported by major search engines. On-page markup helps search engines understand the information on web pages and provide richer search results. So what does schema.org propose for person names? The entity Person has, as you can see, many possible properties.
  • Scrolling down the page, one sees many more attributes, including very personal ones and also very US-centric properties. The only identifiers listed are duns (the Dun & Bradstreet DUNS number for identifying a business person), taxID (the fiscal ID of the person) and vatID (the value-added tax ID). And the expected type for these elements is not even a URI, but Text.
  • Nevertheless, we can see how data encoded in schema.org is populating Google’s Knowledge Graph here, with typical schema.org elements like “siblings” and “awards”. OCLC Research is investing quite some effort in this area, trying to push library data upstream into the discovery layers of the Web. To that end, we are proposing extensions to Schema.org, via the W3C Schema Bib Extend Community Group.
  • In general, it will be increasingly important for repositories to be aware of the discovery layers where their end-users are searching for information. They will need to push their data to these layers. Identifiers and their corresponding URIs can play an important role in improving the quality of discovery.
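One way a repository might expose identifier URIs to search engines is to embed schema.org markup in its author pages. The sketch below builds a JSON-LD `Person` description that links to identifier URIs through the `sameAs` property, which current schema.org defines on every item but which the tutorial does not discuss. The person, the identifier values and the `person_jsonld` function are hypothetical, and the URI patterns are stated as I understand them; each should be checked against the service's own documentation.

```python
import json


def person_jsonld(name, viaf_id=None, isni=None, orcid=None):
    """Build a schema.org Person description linking to identifier URIs."""
    same_as = []
    if viaf_id:
        same_as.append(f"http://viaf.org/viaf/{viaf_id}")
    if isni:
        # ISNIs are often printed in blocks of four; the URI uses the compact form.
        same_as.append(f"https://isni.org/isni/{isni.replace(' ', '')}")
    if orcid:
        same_as.append(f"https://orcid.org/{orcid}")

    doc = {"@context": "https://schema.org", "@type": "Person", "name": name}
    if same_as:
        doc["sameAs"] = same_as
    return json.dumps(doc, indent=2)


# Placeholder values only; none of these identifiers refers to a real person.
MARKUP = person_jsonld(
    "Example Researcher",
    viaf_id="000000000",
    isni="0000 0000 0000 0000",
    orcid="0000-0000-0000-0000",
)
print(MARKUP)
```

The resulting JSON would typically be placed in a `<script type="application/ld+json">` element on the author's page in the repository.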
  • I am grateful to Thom Hickey, Janifer Gatenby and Karen Smith-Yoshimura for their input and pointers to reports and previous presentations on this subject.

Virtual International Authority File (VIAF) and International Standard Name Identifier (ISNI) Presentation Transcript

  • 1. The world’s libraries. Connected. VIAF & ISNI Virtual International Authority File (VIAF) & International Standard Name Identifier (ISNI) by Titia van der Werf Tutorial Wednesday, 8 May, 2013 COAR 4th Annual Meeting, Istanbul
  • 2. Persistent identifiers: basics
    Identifier properties:
    • Unique => dependency = system/namespace
    • Persistent => dependency = organisational commitment
    Best practice: identifiers should be independent of location/addresses:
    • In the library: call number & shelf number
    • On the web: URN & URL
  • 3. Authority control
    And many more names…
    • Collective authors
    • Pseudonyms
    • Imaginary characters
    • Deities, saints, angels
    • Whales, horses, dinosaurs
    • Buildings
    • Ships, telescopes, space ships, missiles
    • Kings, Popes, Presidents
    • Cities, lakes, mountains
  • 4. Authority record = name identifier
    For each name, the cataloguer assigns a unique term (header) which is used consistently, uniquely, and unambiguously within the library catalogue to describe all references to that same name
  • 5. Example of an authority record in MARC
  • 6. National Authority File Programs
    Participating institutions contribute authority records according to a common set of standards and guidelines.
    • USA: NACO, operated by LC
    • NL: NTA, operated by OCLC
    • DE: Personennamendatei, PND (incl. in the Gemeinsame Normdatei), operated by the DNB
    • FR: Authority Data for Persons, operated by BnF, and idRef, operated by ABES
    • etc.
  • 7. Issues when aggregating different authority files in one system
    In WorldCat:
    • duplication of authority control
    • differences in national/regional authority control practices
    => No unique and unambiguous referencing
    => Need to link up the different authority files
  • 8. VIAF
    What is VIAF?
    • Merge of 24+ national level authority files
    • Cooperative program run by OCLC with the VIAF Council
    • 29 million authority records
    • 112 million bibliographic records
    • Migrated from an OCLC Research project to an OCLC service in 2012
  • 9. VIAF
    Authorities in a connected world
    Beyond the catalogue
    Towards web-accessible identification of names
  • 10. From control in library systems to a trusted source in a world of linked data
    http://www.w3.org/DesignIssues/diagrams/lod/2010-color.png
  • 14. WorldCat Identities
    A web page for every name
  • 19. ISNI
    Founded in 2010 by:
    • International Confederation of Societies of Authors and Composers (CISAC)
    • The International Federation of Reproduction Rights Organisations (IFRRO)
    • International Performers Database Association (IPDA)
    • ProQuest
    • OCLC
    • The Conference of European National Librarians
  • 20. ISNI
    • Administered and governed by: ISNI International Agency
    • Operated by: OCLC (official assignment agency)
    Approach:
    1. Creating the initial ISNI database
    2. Establishing a system of registration agencies
  • 21. Populating the ISNI database
    • Allocation of ISNIs to the vast legacy of identities already managed in separate data silos in different domains of activity throughout the information industry
    • Base file for ISNI = VIAF (public domain/library data)
    • Proprietary databases of ISNI-IA members
  • 22. Populating the ISNI database
    ISNI stats March 2013
    • Total number of records: 16.4M
    • Assigned records: 6.4M
    • Next file to be loaded: ZETOC with 97M bibliographic records
  • 25. ORCID – VIAF – ISNI
    • ORCID is an instance of ISNI: targeted to researchers (ISNI also supports other types of creators)
    • The base source for ISNI is VIAF
    • VIAF is an effort in the background to clean up and link up all authority files of names maintained by libraries and to make this data available for re-use, e.g. for researcher identification.
  • 26. Syndication / discovery
  • 27. Syndication/discovery
    • CRIS/IR and authority control perspectives are researcher/university and library perspectives, not end-user perspectives
    • How will users find researchers in the places where they search for persons: in Google and social media networks?
    • How can researcher ids contribute to discovery in those environments?
  • 31. Syndication/discovery
    • Make sure upstream data sources are correct: VIAF feeds into FreeBase and Wikipedia.
    • Open up the data sources: data sources are mostly closed
    • Support semantic markup schemes used by Google, Facebook, etc. (schema.org; the open graph protocol...)
  • 32. Questions?
    Titia van der Werf
    titia.vanderwerf@oclc.org