www.in2n.de
IN2N: Cross-institutional Authority
Collaboration
Alexander Haffner (DNB)
The IN2N Project
 research project, executed in cooperation with:
 the German National Library and
 the German Film Institute
 duration
 December 2012 - December 2014
 financially supported by the German Research Foundation
IN2N: Cross-institutional Authority Collaboration | DC-2013 | 4. September 2013
Authority Collaboration in the German-speaking
Library Community
 collaboratively maintaining and linking authority data is an essential
component of descriptive and subject cataloguing
 Integrated Authority File (Gemeinsame Normdatei, GND)
 more than 10 million authority entries
 describing persons, corporate bodies, conferences and events, places or
geographic names, topics, and works
 aligned to VIAF, German Wikipedia etc.
 accessible as Linked Open Data
 BUT
 data exchange based on harvesting strategies
 data model and data formats are very library specific
 non-library organizations are almost excluded
Cross-institutional Authority Collaboration
 assumption: authority data from libraries can support the organization of
data from other major players
 arising questions from the library perspective:
1. Are there stakeholders that do the same work as libraries, and maybe even
better?
2. How can the work be shared?
3. What collaboration models have to be established for partners from new
domains to be able to participate in the authority maintenance process of
libraries?
4. Are we already fulfilling all the technical and organizational requirements for
successful collaboration?
IN2N Objectives
 initial alignment and linking of the existing authority entries for persons in
filmportal.de (180,000) and GND (2.9 million)
 establishment of an organizational and technical web-based
infrastructure for data exchange across differing
 storage systems,
 data formats, and
 data models
 development of a generalized collaboration model for working with
further non-library cooperation partners
 use of Linked Open Data
Major Phases for an Active Collaboration
1. initial data match between the partner's data set and the GND data, followed
by a bi-directional import of the information missing from the
respective data stock
2. cataloging routine via a web interface to perform GND queries in real-time
and to update GND entries by transmitting differences to the currently
stored data entry
Phase 1: Initial Match & Merge
[Diagram labels: GND, filmportal, DNB DIF, Initial Match Module, Intellectual Consolidation, GND Merge Module, filmportal Merge Module]
Phase 1: Initial Match
 match characteristics
 entityType, name, dateOfBirth, dateOfDeath, gender, placeOfBirth,
placeOfDeath, occupation
 match results can be divided into:
1. Exact equivalence between two persons,
2. Potential equivalences between persons, or
3. No equivalence to the corresponding dataset.
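The three result classes can be illustrated with a minimal sketch. It assumes a simple weighted comparison of the match characteristics listed above; the weights, the thresholds, and the function names `match_score` and `classify` are hypothetical and are not the project's actual match configuration, which was tuned iteratively.

```python
# Match characteristics named on the slide (entityType is checked separately).
MATCH_FIELDS = ("name", "dateOfBirth", "dateOfDeath", "gender",
                "placeOfBirth", "placeOfDeath", "occupation")

# Hypothetical integer weights (sum = 100) and thresholds, for illustration only.
WEIGHTS = {"name": 40, "dateOfBirth": 25, "dateOfDeath": 10, "gender": 5,
           "placeOfBirth": 10, "placeOfDeath": 5, "occupation": 5}
EXACT_THRESHOLD = 90
POTENTIAL_THRESHOLD = 40


def normalize(value) -> str:
    """Compare values case-insensitively and without surrounding whitespace."""
    return str(value).strip().lower()


def match_score(a: dict, b: dict) -> int:
    """Sum the weights of all characteristics present and equal in both records."""
    score = 0
    for field in MATCH_FIELDS:
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # a missing value contributes nothing to the score
        if normalize(va) == normalize(vb):
            score += WEIGHTS[field]
    return score


def classify(a: dict, b: dict) -> str:
    """Return 'exact', 'potential', or 'none' for a pair of records."""
    if a.get("entityType") != b.get("entityType"):
        return "none"
    score = match_score(a, b)
    if score >= EXACT_THRESHOLD:
        return "exact"
    if score >= POTENTIAL_THRESHOLD:
        return "potential"
    return "none"
```

In this sketch, pairs classified as "potential" would be the ones handed to intellectual consolidation. A real matcher would presumably also treat conflicting values (for example two different dates of birth) as negative evidence, which this sketch does not do.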
[Figure: match result classes 1, 2, and 3; identify criteria for class membership]
Phase 1: Improvement of the Match Process
 match process evaluation
 iterative configuration of the match algorithm
 enriching the GND dataset with third party information
 German Wikipedia as discovery aid
 comprises 250,000 GND references
 provides persons' filmographic information
 executed an equivalence check between filmportal.de information and
Wikipedia's person templates as well as article texts
 discovered more than 10,000 GND matches
 Culturegraph's Metafacture Framework
http://github.com/culturegraph/metafacture-core/wiki
 powerful tool suite for metadata processing
Phase 1: Intellectual Consolidation
 easy-to-use user interface for making quick equivalence decisions
 Web UI with a person's main characteristics and match score assignments
 links for further research
 e.g. filmportal.de, GND, Wikipedia, VIAF
 re-use of EntityFacts
Phase 1: Initial Data Import
 merge characteristics complementing the match characteristics
 titleOfNobility, academicDegree, periodOfActivity, affiliation,
geographicAreaCode, biographicalOrHistoricalInformation, homepage,
contributedWork, externalIdentifier
 partners have different needs with regard to the data ingest
 e.g. deviations in cataloging rules, controlled values
 consequence: institutional responsibility for data ingest
 DNB as responsible party for the GND has to define access restrictions
 it is allowed to overwrite a date of birth but not to delete an existing one
 if the place of birth contains a link to a geographic entry, it is not allowed to
replace it with a literal value
 …
Phase 2: Cataloging Routine via the Web
 goal
 lower the threshold for the implementation of non-library editorial systems
accessing the GND as their authority reference system
 providing a simple and efficient GND search as well as update interface
 data format
 limited data set (approx. 25 elements for persons)
 self-explanatory element names
 property-based search functionality on GND data
 use of widely applied standards
 updates without knowledge of the complete corresponding GND record
 incremental approach vs. record based approach
Phase 2: Use Case
[Sequence diagram between the filmportal editorial system and the GND data stock]
1. local person search by editor without success
2. remote search via the person's name
3. result set transmission
4. editor selects entry from result set
5. partial data ingest into local database
6. data adaptation by the editor
7. transmission of changes
Phase 2: Applicable Data Formats
EAC-CPF/XML
GND-MARC-Format
BIBFRAME for Authorities
RDA Vocabularies
GND/RDF
?
Phase 2: Update Interface on Property Level
 REST-Interface with a JSON transmission format
 operations
 add, change, and delete
PUT uri="http://d-nb.info/gnd/129952788"
add name(name.forename="Wolke A."; name.surname="Hegenbarth")
add dateOfBirth(dateOfBirth.year="1980"; dateOfBirth.month="05"; dateOfBirth.day="06")
add placeOfBirth="Meerbusch, Deutschland"
PUT uri="http://d-nb.info/gnd/129952788"
change
  name(name.forename="Wolke A."; name.surname="Hegenbarth")
  name(name.forename="Wolke Alma"; name.surname="Hegenbarth")
change
  placeOfBirth="Meerbusch, Deutschland"
  placeOfBirthUri="http://d-nb.info/gnd/2029013-5"
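The property-level approach can be sketched as follows: the partner system compares the record it last received with the editor's version and transmits only the differences as add, change, and delete operations. The slides state only that the interface is REST-based with a JSON transmission format; the JSON layout and the names `diff_operations`, `build_request`, `op`, `property`, `value`, `old`, and `new` are assumptions made for illustration and not the project's actual wire format.

```python
import json


def diff_operations(stored: dict, edited: dict) -> list:
    """Derive add/change/delete operations from two versions of a record."""
    operations = []
    for prop in sorted(set(stored) | set(edited)):
        old, new = stored.get(prop), edited.get(prop)
        if old == new:
            continue  # unchanged properties are not transmitted
        if old is None:
            operations.append({"op": "add", "property": prop, "value": new})
        elif new is None:
            operations.append({"op": "delete", "property": prop, "value": old})
        else:
            # Sending the old value lets the receiver detect that the
            # stored entry has changed in the meantime.
            operations.append({"op": "change", "property": prop,
                               "old": old, "new": new})
    return operations


def build_request(uri: str, stored: dict, edited: dict) -> str:
    """Serialize the operations for one GND entry as a JSON document."""
    body = {"uri": uri, "operations": diff_operations(stored, edited)}
    return json.dumps(body, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    stored = {"name": {"forename": "Wolke A.", "surname": "Hegenbarth"}}
    edited = {"name": {"forename": "Wolke Alma", "surname": "Hegenbarth"},
              "dateOfBirth": {"year": "1980", "month": "05", "day": "06"}}
    print(build_request("http://d-nb.info/gnd/129952788", stored, edited))
```

In this sketch, replacing the literal `placeOfBirth` with a `placeOfBirthUri` would appear as a delete plus an add rather than as a single change; the notation on the slide expresses it as one change operation.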
Cross-Dataset Search
 Culturegraph.org
 acts as a data hub for searching and browsing
 analyzes and crosslinks data from major bibliographic catalogs to make
equivalences and relationships available
 authority data as access points
 IN2N will support the platform in order to benefit from its developments
 providing authority data, bibliographic and filmographic data
 re-use of Culturegraph's REST interface for search
 dynamic integration of results into the local catalog's presentation
 usability evaluation
 find the right balance between local and remote information
 increase the user's search success
Timeline
 2013
 implementation of match environment
 implementation of property-based update interface
 GND Change Notifier
 1st Quarter 2014
 Web-UI for intellectual consolidation
 initial startup of the extended filmportal.de editorial system
 RDF representation for entries from filmportal.de
 3rd Quarter 2014
 cross-dataset search
 acquiring additional partners
Conclusion
 cross-institutional collaboration on independent and different database
systems and data formats is possible
 inconsistent data models can cause substantial problems
 customization is sometimes necessary
 highly granular data supports the collaboration
 decisions to be made by libraries
 How valuable is “library rules compliant data”?
 Are we prepared to compromise?
 Are we already open-minded enough to let external partners touch our data?
Thank you very much!
discussion is welcome…
www.in2n.de
Alexander Haffner
German National Library
a.haffner@dnb.de
