VIAF and ISNI, IFLA, 2013-08-16


VIAF and ISNI; interoperation; cluster level maintenance
Linked data

  • Started with 1 million assigned in November 2011, mostly from VIAF clusters with 3+ sources; now there are 2.7 million assigned ISNIs with non-VIAF sources attached. More than 90% of ISNIs have VIAF as a source. 39% of VIAF clusters have an ISNI (10% in 2011). Explain this evolution: more ISNI sources and unique name assignment. Unique name assignment was possible because of rich metadata…
  • This slide only shows assigned ISNIs for the different VIAF sources, giving the totals, the number assigned, the percentage assigned and the number of unique names. BnF example: 1,067,572 assigned (ca. 62% of “complete” authority records). Ingestion of ISNI in the BnF catalogue. Importance of persistent identifiers in library metadata (ISNI is one of them).
  • This is a list of non-VIAF ISNI sources and the numbers of links from those sources to VIAF clusters. The number of links from ISNI sources to VIAF sources is over 7.3 million.
  • ISNI’s scope overlaps with VIAF’s scope but is not identical to it. For persons, ISNI includes all of VIAF (except sparse and undifferentiated records) plus many persons involved with music and research who are not present in VIAF. Also, unlike VIAF, ISNI includes private data that may be used for matching but is not displayed or diffused publicly. Such data includes dates of birth (actors in particular do not like their dates of birth publicized because it limits the parts that they are offered). Rights management associations are also not permitted to reveal the relationships between real persons and pseudonyms. Witness the recent case of JK Rowling publishing crime novels under a pseudonym and being irked that her cover was revealed by her lawyers.
  • ISNI’s role is different from VIAF’s. ISNI creates a permanent ID and is required to keep the ID as stable as possible; where it changes, ISNI must diffuse corrections. ISNI diffuses across domains: libraries, trade, rights management, professional societies, education.
  • ISNI includes an online request and maintenance capability. Improved data quality and confidence: anomaly reports (7,000 date anomalies, more than 50% of which represent real errors); merge, split and data error reports (c. 5,000). Matching improvements: dates, common surnames, longest name form, weightings, new elements. Detection of UNIMARC conversion errors: parallel main names, name variant conversion, related names conversion, missed data. Pseudonyms. Feedback, record links (c. 70,000). More widely diffused linked data. Proposal for inter-operation: joint notification, shared maintenance.
  • Web interface for error reporting, enriching and detecting duplicates, for data contributors. Web interface for the public. Client for full maintenance, including streamlined procedures*, for the Quality Team. Notifications to data contributors. Data sampling*. Data anomaly checks (dates, pseudonyms)*. Fixes to incoming data (pre and post load)*. Data enrichment to increase matching (Dewey)*.
  • The image at the top of the screen is the ISNI record, where the 3 VIAF records have been merged into a record also including data from British Library Sound Archive and MusicBrainz. The image at the bottom shows 3 VIAF clusters for the same identity. As VIAF ingests ISNI as a VIAF source, ISNI merges will be adopted by VIAF.
  • VIAF’s policy is to prevent cluster merges where there would be 2 records from the same source in the cluster. ISNI’s policy is to assign an ISNI where there are 3 or more VIAF sources (except where there is a possible match indicated). ISNI has marked just under 500,000 VIAF clusters as possible duplicates. Thus ISNI makes duplicate assignments where there are 2 or more VIAF clusters for the same identity, each having 3 or more VIAF sources. ISNI also makes duplicate assignments where an ISNI source matches a VIAF cluster that is not assigned but another VIAF cluster is assigned. For a short period, where there were multiple clusters for the same identity, VIAF sources could move between clusters; this too brought along the risk of ISNI making duplicate assignments. This problem at VIAF has since been resolved.
  • Example of a VIAF cluster with multiple identities. The main identity is an English playwright. 4 sources have caused the incorrect result. The problem is that these incorrect assertions can multiply, with libraries regarding each other as authoritative sources. It is important to make these corrections.
  • A search of Amazon for Peter Nichols leads to a page where Peter Nicholls writing about Italy and the Vatican is clearly differentiated from Peter Richard Nichols, 1927, the playwright.
  • ISNI and VIAF differ in the treatment of pseudonyms. Some VIAF sources treat pseudonyms as related identities (as is ISNI policy) and some treat them as name variants. Where ISNI identifies a pseudonym recorded as a name variant, it changes it to a related name.
  • A VIAF ISNI task force is proposed, with an initial agenda including a joint policy on pseudonyms, a study of notification workflows, and help with cluster sampling and anomaly detection.
  • Acknowledgements to Pauline Chougnet, BnF. This is one proposed workflow as developed by Pauline Chougnet of the BnF. To be studied by the proposed joint VIAF / ISNI task force.
  • Why become an ISNI member: cross-domain quality links (linking authors of theses, articles, books +++); importance in digitisation projects; full coverage of ISNI for national identities; bridge identifier, e.g. VIAF to ORCID via ISNI; participation in quality; access to full database; statistics & quality reports.
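
The assignment policy described in the notes (an ISNI is assigned where there are 3 or more VIAF sources, unless a possible match is indicated) and the assignment categories on slide 3 can be illustrated with a small sketch. This is not the ISNI system's actual logic: the `Candidate` fields, the order of the checks, and the idea that a possible match blocks every category are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one candidate identity (field names are illustrative)."""
    viaf_sources: int = 0          # number of VIAF sources in the cluster
    independent_sources: int = 0   # number of independent contributing sources
    unique_name: bool = False      # name judged unique among the loaded data
    possible_match: bool = False   # a possible duplicate has been flagged
    qt_approved: bool = False      # Quality Team approved single-source assignment


def assignment_status(c: Candidate) -> str:
    """Return an assignment status using the categories named on slide 3."""
    # Assumption: a flagged possible match blocks assignment in every category.
    if c.possible_match:
        return "provisional: possible"
    if c.viaf_sources >= 3 or c.independent_sources >= 2:
        return "assigned"
    if c.unique_name:
        return "assigned"
    # Single-source assignment needs Quality Team approval (slide 9).
    if c.independent_sources == 1 and c.qt_approved:
        return "assigned"
    return "provisional: unassigned"
```

Under these assumptions, two VIAF clusters for the same identity that each have 3 or more sources and no possible-match flag would both come back as "assigned", which is the duplicate assignment case described above.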

    1. The world’s libraries. Connected. VIAF and ISNI Interoperability. Janifer Gatenby, EMEA Program Manager Metadata, OCLC. VIAF Council Meeting, Singapore, 2013-08-16
    2. Libraries Text Rights Music Rights Trade Sources Encyclopaedias Researchers & Professional
    3. Assigned ISNIs, July 2013. Provisional: Unassigned 9,563,590. Provisional: Possible 580,738. Assigned 6.87 million.
       Category                                      Assigned ISNIs   Assigned ISNIs to VIAF
       2+ independent sources                             2,730,631                2,496,141
       3+ VIAF sources                                      656,976                  656,976
       Unique name                                        3,157,075                2,643,958
       Single source (JISC names, BOEK, Ringgold)           296,417                        0
       Total                                              6,636,916                5,797,075
    4. Assigned ISNIs by VIAF source
       Source    Assigned      Total   % assigned    Unique
       bav         181597     263382        68.95     41653
       bibsys       40634      70347        57.76     15633
       bne         221254     415630        53.23     46685
       bnf        1067572    1715817        62.22    213047
       dbc           1065       2263        47.06       524
       dnb        1403508    3205940        43.78    572443
       egaxa        14121      34688        40.71       107
       iccu         12736      36554        34.84      7154
       jpg          47783     181623        26.31       145
       lac         245604     509126        48.24     64647
       lc         3530966    6973060        50.64    778749
       ndl         366043     775719        47.19    226188
       nkc         358085     520334        68.82     94632
       nla         345220     647341        53.33     27482
       nli         293140     440566        66.54     29882
       nszl         13075      33673        38.83        25
       nta        1399918    2347967        59.62    314163
       nukat       828316    1143046        72.47    173921
       ptbnp       116038     286490        40.50     38100
       rero         81155     119523        67.90      4019
       rsl            293        586        50.00       118
       selibr       96930     157939        61.37     21861
       sudoc       727610    1002970        72.55    182138
       swnl         22738      38228        59.48      6553
       vlacc         3647       5132        71.06        49
       wkp         261673     326347        80.18     16395
       VIAF       5797075   14501337     39.97614   2643958
    5. Links from Current Non-VIAF sources to VIAF clusters. VIAF source links to ISNI: > 7.3 million
    6. VIAF and ISNI are Complementary. VIAF Scope: • Persons • Organisations • Works / uniform titles • Expressions • Meetings • Geographic • All public data. ISNI Scope: • Persons • + musicians, researchers • Organisations • (excluding sparse) • (excluding undifferentiated) • Includes private data
    7. VIAF and ISNI are Complementary. VIAF Role: • Ingest authority records from the world’s major national and research libraries • Make clusters • Expose and diffuse. ISNI Role: • Create permanent IDs • By batch • On demand • Diffuse those IDs • Libraries, trade, rights management, professional societies, education
    8. VIAF and ISNI are Complementary. VIAF System: • Harvester • Clustering mechanism • Web site (5 interface languages) • Download in multiple formats • Linked data & SRU • 1 million personal visitors p.a. ISNI System: • Batch load • Online request API • Web site (English only) • Allows end user input • Member input and correction • 16+ indexes • SRU; soon linked data • Quality Team monitoring & correcting • Diffusion, including corrections
    9. ISNI Quality Team • Samples data regularly • c. 2% VIAF clusters have mixed identities • Duplicate clusters are higher, nearer 5% • Makes corrections at cluster level • Merges, splits, error notifications • Access to cataloguing client / macros • Makes system recommendations • Gives approval for single source assignment • Responds to End User input
    10. Example record fixed by QT • 3 VIAF records merged • ISNI sources British Library Sound Archive, MusicBrainz • notice instruments, performances
    11. Another example of a merge in ISNI
    12. Duplicate clusters • Cause duplicate ISNI assignment • Where both clusters have more than 3 VIAF sources • Where an ISNI source matches with a single or 2 source VIAF record • (Where VIAF sources move between the clusters) • ISNI as a VIAF source will help VIAF merge clusters where ISNI QT has manually merged them (2,444) • ISNI has flagged 481,766 VIAF records as possible duplicates
    13. Titles of other identities: Vocabulaire anglais-français, français-anglais, de terminologie économique et juridique; Il Piemonte visto da un inglese; Italia, Italia; The politics of the Vatican; The Pope's divisions : the Roman Catholic Church today; La notte comincia ancora una volta
    14. Amazon is differentiating
    15. Undifferentiated Identities • Signalled by ISNI Quality Team • Most cases encountered now are due to source data • ISNI QT would like to notify VIAF sources directly • VIAF is currently notified by a field in the ISNI record; notifications indicate if a cluster error or a source error
    16. Public Identities versus Persons • ISNI is assigned to public identities. Pseudonyms = different identities; but related • VIAF sources: some treat as name variants, some as related names • ISNI suite of programs • Converts pseudonym name variants to related names • Flags records with dissimilar main names • Links and protects
    17. VIAF ISNI Task Force • Policy on pseudonyms • Study notification work flows • How to remove record protect flags • Participate in cluster sampling in VIAF and ISNI • Help define new anomaly detectors • ISNI has dissimilar main name / publishing before age 9, life span greater than 120 years
    18. NUKAT 99036027 record for Thomas Meier (1953-), with 2 erroneous titles, in VIAF cluster 267789223. ISNI 0000 0003 9867 7425: Thomas Meier (1953-). ISNI 0000 0004 0034 1112: Thomas Meier 1966-. Actions by ISNI (VIAF cluster 267789223): • ISNI QT to delete the titles from the ISNI record • Notification to VIAF and directly to the contributor (977). NUKAT deletes the 2 titles and creates a new authority record for Thomas Meier 1966. This new NUKAT record matches with the VIAF cluster for Thomas Meier 1966- on the 2 titles added by ISNI. Actions by ISNI (VIAF cluster 12431062, with ISNI in the cluster): • ISNI QT adds the 2 titles to this ISNI record • Notification to VIAF (977). New NUKAT authority record for Thomas Meier 1966. Without the 2 titles for Thomas Meier.
    19. VIAF sources • Flag undifferentiated records (e.g. those generated by programs comparing authority and bibliographic name strings) • Respond to ISNI notifications: correct “home” data. As ISNI Members: • Control “own identities” in VIAF and ISNI (check possible matches and suspect records on ISNI) • Use ISNI for direct maintenance of clusters (will generate notifications to VIAF and VIAF sources)
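
Slide 17 names two of ISNI's date anomaly detectors: publishing before age 9 and a life span greater than 120 years. A minimal sketch of such checks follows. The function name, its parameters and the use of whole years are assumptions for illustration; the slides do not describe how the ISNI programs are implemented.

```python
from typing import List, Optional

MIN_PUBLISHING_AGE = 9   # threshold from slide 17: publishing before age 9
MAX_LIFE_SPAN = 120      # threshold from slide 17: life span greater than 120 years


def date_anomalies(
    birth_year: Optional[int],
    death_year: Optional[int],
    publication_years: List[int],
) -> List[str]:
    """Return the names of the date anomalies found in one record (hypothetical helper)."""
    anomalies = []
    # Life span can only be checked when both dates are known.
    if birth_year is not None and death_year is not None:
        if death_year - birth_year > MAX_LIFE_SPAN:
            anomalies.append("life span greater than 120 years")
    # Any title dated before the person turned 9 is suspect.
    if birth_year is not None:
        if any(year - birth_year < MIN_PUBLISHING_AGE for year in publication_years):
            anomalies.append("publishing before age 9")
    return anomalies
```

A flagged record is only a candidate for review. According to the notes, a little over half of the reported date anomalies turned out to be real errors, which is why a sketch like this returns reasons for the Quality Team rather than correcting anything itself.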