Isni behind the scenes gatenby nadav manes harvard 201411


ISNI behind the scenes. Presentation at Harvard University Library, 2014-11-18. Covers the International Standard Name Identifier (ISNI, ISO 27729): CBS software features, searching the database, how the database is updated, and database utilities.


  1. Cross-domain, bridging domains: Libraries, Text Rights, Trade Sources, Music Rights, Encyclopaedias, Researchers & Professional, Granting organisations, Professional Societies, Article databases, Theses databases, Archives and Museums. Harvard University Library 2014-11-18
  2. ISNI Behind the scenes • ISNI’s CBS software • Performance • Searching • SRU enquiry API • Indexes • Linked data • Updating • Batch load • Matching • VIAF Update • Web Cat • AtomPub • WinIBW • Utilities • Hunting anomalies • Reports and statistics • QT / End user interface. ISNI at Harvard, 18 November 2014. Janifer Gatenby, Boaz Nadav Manes. OCLC EMEA
  3. CBS: Centraal Bibliotheek Systeem. ISNI's CBS software: Searching, Update, Utilities
  4. Tailoring and building • Loading • Export • Matching & merging • Web OPAC • Web cataloguing • Cataloguing client • SRU • SRU update (Atom Pub) • Hosting infrastructure • Data definition (based on VIAF MARC) • Input formats (tab and XML) • Indexes • Matching (based on VIAF) • Public / Private data mix • Statistics • QT / end user interoperability • Reports • Fix jobs (pseudonyms, reports ++)
  5. Enquiries: ISNI 14,000 per day, cf. GGC (Dutch Union Catalogue) 50 per second
  6. ISNI Data Definitions: http://www.isni.org/filedepot_download/140/390 • Name Use, Organisation: legalName, acronymn, nickname, assignedName, transliteratedName, disused name, commonForm (default) • Name Use, Person: Public, public and private, private, fictional character, Unknown
  7. Searching indexes. Examples • Your code and update date after December 2013: Cn: ams & upd: > 201312 • Your code and another’s code: Cn: jnam & cn: proq • Name keyword, not your code: Nw: trobe not cn: auvlu. Almost anything can be indexed. Also available by SRU API. See document ISNI search guidelines.doc http://www.isni.org/content/documents-related-database-enquiry
  8. Browse
  9. Search by SRU API. See document: ISNI SRU search API guidelines.doc. Example search by name keyword (pica.nw): http://isni.oclc.nl/sru/?query=pica.nw+%3D+%22maloy%2Brebecca%22&operation=searchRetrieve&recordSchema=isni-b This search is for any records containing both “Rebecca” and “Maloy” in the name. The response is in the XML enquiry response schema: ISNI enquiry response v2.xsd
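The example URL on this slide can be rebuilt programmatically. The following is a minimal sketch that only constructs the URL (it does not contact the service); the helper name `build_sru_url` is hypothetical, and joining the keywords with a literal `+` simply mirrors the slide's example rather than any documented rule.

```python
from urllib.parse import urlencode

# Endpoint exactly as shown on the slide.
SRU_BASE = "http://isni.oclc.nl/sru/"


def build_sru_url(name_keywords, record_schema="isni-b"):
    """Build a searchRetrieve URL for a pica.nw (name keyword) search.

    build_sru_url is a hypothetical helper, not part of any ISNI library.
    """
    # Keywords are joined with a literal '+', as in the slide's example query.
    query = 'pica.nw = "{}"'.format("+".join(name_keywords))
    params = {
        "query": query,
        "operation": "searchRetrieve",
        "recordSchema": record_schema,
    }
    # urlencode percent-encodes '=', '"' and '+' and turns spaces into '+'.
    return SRU_BASE + "?" + urlencode(params)


url = build_sru_url(["maloy", "rebecca"])
print(url)
```

Fetching the URL and parsing the XML response would be a separate step, governed by the enquiry response schema named on the slide.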
  10. SRU API Enquiry Response
  11. SRU API Enquiry Response
  12. Member View – see all data except private data
  13. Member view – additional data displayed (if not private) • Nationality • Gender • Keyword or key phrase • Dewey classification • Publisher • Dates active • Associated countries • Provisional records, including links to possible matches, if applicable
  14. Private Data • Dates • Personal Affiliations • Titles of works. Rights management societies may not reveal data because their legal contracts do not allow it, though they make the data available for matching. Trade organisations may choose not to reveal the titles they supply, to retain commercial advantage. Private data hides behind publicly available data. 14 of 30 sources are fully public and more than 90% of records contain public sources.
  15. Fully public sources in green. GENERAL SOURCES: Bowker Books in Print (BOWKER); ISNI, generated, adopted, made by QT (ISNI); The European Library, 48 national libraries (TEL); VIAF, 33 libraries (VIAF). RIGHTS MANAGEMENT: Access Copyright, Canada (ACCE); Authors’ Guild (AGLD); Authors’ Licensing and Collecting Society, UK (ALCS); Centrum Dienstverlening Auteurs- en aanverwante Rechten, Netherlands (CEDA); Centro Español de Derechos Reprográficos (CEDR); Irish Copyright Licensing Agency (ICLA); Prolitteris, Switzerland (PROL); VG WORT, Germany (VGWO). MUSIC: American Musicological Society (AMS); British Library Sound Archive (BLSA); International Performers’ Database Association (IPDA); MusicBrainz (MUBZ). RESEARCHERS AND PROFESSIONALS: American Musicological Society (AMS); British Library Theses (BRTH); Digital Author identifier, Netherlands (DAI); Jisc Names Project, UK (JNAM); La Trobe University (AU:VLU); Modern Languages Association (MLA); OCLC Theses (OCLCT); ORCID and DataCite Interoperability Network (ODIN); AuthorClaim and RePec (OPENL); Proquest Theses (PROQ); Scholar Universe, Proquest (SCHU); Electronic tables of content (ZETO). ORGANISATIONS: Boekenbank, Belgium (BOEK); Bowker Publishers (BOWP); Publishers Licensing Society, UK (PLS); Ringgold (RING)
  16. ISNI as linked data
  17. Linked data ontologies • @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . • @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . • @prefix owl: <http://www.w3.org/2002/07/owl#> . • @prefix skos: <http://www.w3.org/2004/02/skos/core#> . • @prefix isni: <http://isni.org/ontology#> . • @prefix rdaGr2: <http://rdvocab.info/ElementsGr2/> . • @prefix dc: <http://purl.org/dc/elements/1.1/> . • @prefix foaf: <http://xmlns.com/foaf/0.1/> . • @prefix dcterms: <http://purl.org/dc/terms/> .
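To illustrate how these prefix declarations are used, here is a minimal sketch that expands a prefixed name such as `foaf:name` into a full URI. The `PREFIXES` table and the `expand` helper are illustrative names only, and the `dc` namespace is written in its standard `http://purl.org/dc/elements/1.1/` form. Which properties ISNI actually emits is not stated on the slide, so the sample terms are assumptions.

```python
# Namespace table taken from the slide's @prefix declarations.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "isni": "http://isni.org/ontology#",
    "rdaGr2": "http://rdvocab.info/ElementsGr2/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(prefixed_name):
    """Expand 'prefix:local' into a full URI (hypothetical helper)."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError("unknown prefix in %r" % prefixed_name)
    return PREFIXES[prefix] + local


# Sample terms chosen for illustration only.
print(expand("foaf:name"))
print(expand("skos:prefLabel"))
```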
  18. Documents relating to enquiry • ISNI search guidelines • ISNI SRU search API guidelines • ISNI SRU search API guidelines - public.doc • ISNI XML enquiry response schema • ISNI Access Comparison Public and Member • Getting started with PSI queries
  19. Scalable Quality Ecosystem • ISNI Database • Harvested, Batch loaded; Online contributions • Algorithms • Notifications • Data fixing • Sampling • Data Policy • Enrichment • Correction • Curation • Crowd sourcing • Data contributors
  20. Assigned ISNIs November 2014 • Assigned: 8.69 million • Provisional, possible: 700,815 • Provisional, unassigned: 9,287,278. Assigned ISNIs by basis (scale from + % confidence to - % confidence): VIAF + non VIAF sources 4,870,099; 3+ VIAF sources 428,988; 2+ sources (not VIAF) 315,915; Unique name 2,735,449; Trusted single source (JISC, BOEK, RING) 342,231; Total 8,692,683. 8.24 million persons, 446,258 organisations. Authoritative, Unique, Trustful, Persistent
  21. Confidence. The two main problems for maintaining persistence are • duplicates needing to be merged • undifferentiated identities needing to be split. ISNI errs on the side of making duplicates rather than mixed identities. Thus the batch load process (usually) makes a provisional record • where there is no match (for fear of making a duplicate assignment) • where there is a low confidence match (for fear of making a mixed identity or a duplicate assignment) • where a matching record already has another local ID for the same source, regardless of the strength of the match (for fear of making a mixed identity)
  22. ISNI Assignment: Batch loading • Independent matching sources • 3 VIAF sources
  23. ISNI Assignment: Batch loading • Unique name • Single source
  24. ISNI Matching • Name • Title • Partial title • Rare title word • Date • Publisher • Personal affiliation • Organisation affiliation • ISBN, ISWC, ISAN, DOI + • Other name identifier e.g. IPI, VIAF, IPD • Instrument • Linked entities • Dewey classification. Scores are collected from each judge. An overall score is computed and lowered where there are • common surnames • common titles • not much on which to match. Score > .85 = match. Score > .6 but < .85 = possible match
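The two thresholds on this slide can be read as a simple classification of the overall score. The sketch below assumes the overall score is already computed; the slide does not say how the judges' scores are combined or how a score of exactly .85 or .6 is treated, so the boundary handling and the `classify` function name are assumptions.

```python
MATCH_THRESHOLD = 0.85     # "Score > .85 = match"
POSSIBLE_THRESHOLD = 0.6   # "Score > .6 but < .85 = possible match"


def classify(overall_score):
    """Map an overall matching score to an outcome (hypothetical helper).

    Assumption: a score of exactly 0.85 falls into 'possible match' and a
    score of exactly 0.6 into 'no match'; the slide leaves both open.
    """
    if overall_score > MATCH_THRESHOLD:
        return "match"
    if overall_score > POSSIBLE_THRESHOLD:
        return "possible match"
    return "no match"


for score in (0.92, 0.7, 0.4):
    print(score, classify(score))
```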
  25. Building similarity vectors. Trying alternatives to traditional rules-based matching. Working with Article First data. Will load high confidence data to ISNI with traditional matching
  26. VIAF import • VIAF issues a full file monthly • Compare with previous full file • Deletes / additions / sources changed / contents changed • Deletes: delete only if not assigned; remove VIAF and mark for re-import; if VIAF is the only source, change source to ISNI • Cluster movement reports
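The comparison of one monthly full file with the previous one can be sketched as a dictionary diff. This is an illustration only, not the actual import program: the record layout (a cluster ID mapped to `sources` and `content`) and all sample data are invented for the example.

```python
def diff_full_files(previous, current):
    """Compare two full files keyed by cluster ID (hypothetical layout).

    Each value is a dict with a 'sources' set and a 'content' string.
    Returns the four change categories named on the slide.
    """
    prev_ids, curr_ids = set(previous), set(current)
    report = {
        "deleted": sorted(prev_ids - curr_ids),
        "added": sorted(curr_ids - prev_ids),
        "sources_changed": [],
        "contents_changed": [],
    }
    # Clusters present in both files may differ in sources, content, or both.
    for cluster_id in sorted(prev_ids & curr_ids):
        old, new = previous[cluster_id], current[cluster_id]
        if old["sources"] != new["sources"]:
            report["sources_changed"].append(cluster_id)
        if old["content"] != new["content"]:
            report["contents_changed"].append(cluster_id)
    return report


# Invented sample data for two consecutive monthly files.
october = {
    "v1": {"sources": {"LC", "DNB"}, "content": "Smith, John"},
    "v2": {"sources": {"BNF"}, "content": "Doe, Jane"},
    "v3": {"sources": {"NTA"}, "content": "Jansen, Piet"},
}
november = {
    "v1": {"sources": {"LC"}, "content": "Smith, John"},
    "v3": {"sources": {"NTA"}, "content": "Jansen, Piet, 1950-"},
    "v4": {"sources": {"LC"}, "content": "Brown, Ann"},
}
report = diff_full_files(october, november)
print(report)
```

The delete handling described on the slide (delete only if not assigned, otherwise remove the VIAF source and mark for re-import) would then be applied to the `deleted` list.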
  27. Documentation: Data Submission. Documents relating to data submission • ISNI tab delimited format • ISNI tab delimited format organisations • ISNI data element values • ISNI XML request schema • ISNI XML request schema document • ISNI Atom Pub interactive request requirements • ISNI Data contributors usage guidelines • ISNI database source profiles RAG information • ISNI bulk load submission. Documents relating to data submission output • ISNI XML response schema • ISNI XML response schema document • ISNI XML notification schema • bulk load assigned ISNIs.xsd • bulk load ISNI not assigned.xsd • bulk load too many matches.xsd • ISNI Data contributors reports and notifications guidelines
  28. Procedures for maximizing assignment • Refinement of matching algorithms • E.g. introduced rare title word; • Now ignoring date of birth 1900 • Re-import program • Rematch with new rules • Rematch after new data added • ISNI Quality Team: Data sampling • assessing impact of single source • Recommendations for program changes • New criteria • Assessing uncommon surname assignment • Rules for online rich assignment
  29. Online: Guarantee assignment – Personal Name. ISNIs will be automatically assigned where there are no possible matches in these cases: • There are matches with a database record with a different source • A personal name is unique and includes a surname and forename • The request includes an “isNot” statement • The metadata supplied is considered rich as per these cases: (a) full date of birth and death supplied; (b) year of birth + 1 title or instrument + 1 related name (co-author or affiliated institution); (c) 1 title or instrument + 1 external URL link of type encyclopaedia, home page (not social network page) + 1 related name (co-author or affiliated institution) • The request is resolving a possible match by including a PPN
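The three "rich metadata" cases for personal names can be expressed as a small predicate. This is a sketch under stated assumptions: the `PersonRequest` fields, the date format and the link type labels are all hypothetical and do not come from the ISNI XML request schema.

```python
from dataclasses import dataclass, field


@dataclass
class PersonRequest:
    """Hypothetical, simplified view of an online request for a person."""
    birth_date: str = ""   # assumed "YYYY-MM-DD" or just "YYYY"
    death_date: str = ""
    titles: list = field(default_factory=list)
    instruments: list = field(default_factory=list)
    related_names: list = field(default_factory=list)    # co-authors, institutions
    external_links: list = field(default_factory=list)   # (link_type, url) pairs


# Link types that count towards richness; social network pages do not.
RICH_LINK_TYPES = {"encyclopaedia", "home page"}


def is_full_date(value):
    parts = value.split("-")
    return len(parts) == 3 and all(p.isdigit() for p in parts)


def has_year(value):
    return len(value) >= 4 and value[:4].isdigit()


def is_rich(req):
    """Return True if the request meets one of the three rich-metadata cases."""
    has_work = bool(req.titles or req.instruments)
    has_related = bool(req.related_names)
    has_link = any(kind in RICH_LINK_TYPES for kind, _ in req.external_links)
    # Case 1: full date of birth and death supplied.
    if is_full_date(req.birth_date) and is_full_date(req.death_date):
        return True
    # Case 2: year of birth + 1 title or instrument + 1 related name.
    if has_year(req.birth_date) and has_work and has_related:
        return True
    # Case 3: 1 title or instrument + 1 qualifying URL + 1 related name.
    return has_work and has_link and has_related


print(is_rich(PersonRequest(birth_date="1950", titles=["A title"],
                            related_names=["A co-author"])))
```

Richness is only one of the listed routes to assignment; uniqueness of the name, an “isNot” statement or a supplied PPN are separate conditions not modelled here.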
  30. Online: Guarantee assignment – Organisation Name. ISNIs will be automatically assigned where there are no possible matches in these cases: • There are matches with a database record with a different source • An organisation name is unique and does not consist only of abbreviations • The metadata supplied is considered rich, i.e. it includes LOCODE, organisation type and organisation URL • The request is resolving a possible match by including a PPN
  31. Maximizing assignment • Enter a request record online (Web page or via API) • Batch loaded records, passive method: Quality Team manual fixes; OCLC periodic re-match runs; matches from later batch loading & online activity. Assigned (% assigned), May 2012 to Oct 2014: ALCS 41,523 (63.86%) to 49,157 (76.66%); PROL 2,205 (35.24%) to 4,143 (66.18%); PROQ 65,122 (12.89%) to 243,481 (48.19%) • Batch loaded records, active method: resolve possible matches found by the system; search the database for candidate records for merging; enrich a record with URLs to external sources such as author’s web pages, Wikipedia, IMDB, MusicBrainz, Discogs, etc. Assigned (% assigned), May 2012 to Oct 2014: AUVLU 0 (0%) to 1,716 (48.28%); ICLA 0 (0%) to 2,208 (97.61%)
  32. Resolving Possible Matches
  33. Compare Screen
  34. Adding a new record – Michel Calame
  35. Adding a new record
  36. Adding your source to an existing record
  37. Atom Pub API (Machine to machine) • Requests and replacements (you can replace your existing data citing local identifier) • Request • Atom Pub Header • Content = Request in the ISNI XML Request schema • Documentation • ISNI Atom Pub API guidlines.doc • ISNI request.xsd (XML schema) • ISNI request schema.doc (describes the schema) • ISNI response.xsd (XML schema) • ISNI response schema.doc (describes the schema)
  38. WinIBW – QT Tool • Sees whole records • Can edit and delete all data • Can force merge • Macros with VB scripts • Download records or selected fields • E.g. identify a set of records for re-import
  39. Hunting anomalies • Strange source combinations • Lived > 120 years; published before age 10 • Mismatching main names • Browse index and Hitrange • DOB 1900- • Theses & dead < 1950 • Matching failures, e.g. TEL, Bowker, VIAF
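Two of these checks are plain date arithmetic and can be sketched as follows. The function name and the year-based inputs are hypothetical, and reading "published before 10 years" as "first publication before the person's tenth birthday" is an interpretation rather than something the slide spells out.

```python
def find_date_anomalies(birth_year, death_year, first_publication_year):
    """Flag implausible date combinations on a record (hypothetical helper).

    Any argument may be None when the record lacks that date.
    """
    flags = []
    if birth_year is not None and death_year is not None:
        if death_year - birth_year > 120:
            flags.append("lived > 120 years")
    if birth_year is not None and first_publication_year is not None:
        # Assumed reading: a publication dated before the person turned 10.
        if first_publication_year - birth_year < 10:
            flags.append("published before age 10")
    return flags


print(find_date_anomalies(1800, 1930, 1850))
print(find_date_anomalies(1950, None, 1955))
```

A flagged record is only a candidate for review: such patterns typically point to mixed identities or bad source dates, which still need a human decision.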
  40. Utilities • Re-import a set of records • Delete source from a set of records • Pseudonym fix • Move from name variant to related name • Link related name to other record • Generate other record if it doesn’t exist
  41. End User Note. Dear Sir / Madam, The ISNI 0000000117488848 refers to "Marco Antonio Casanova", Professor at the Catholic University of Rio de Janeiro. I am not the author of "Fragmentos póstumos. - Nietzsche uma introdução filosófica" or "Segunda consideração intempestiva da utilidade e desvantagem da história para a vida". The author of these works is "Marco Antonio dos Santos Casa Nova". You may confirm this information by consulting our CVs at the Brazilian Research Council: Marco Antonio Casanova (me): http://lattes.cnpq.br/0400232298849115; Marco Antonio dos Santos Casa Nova (the other author): http://lattes.cnpq.br/3409704326617178
  42. Reports and Notifications • Bulk reports • Basic • Enriched • Notifications • Ad hoc reports • Report generator • WinIBW download • Statistics. See document ISNI Data contributors reports and notifications guidelines.doc http://www.isni.org/content/documents-related-data-submission-output
  43. Statistics • Basic statistics • Cross matches • VIAF matches
