Improving Library Services with
Semantic Web Technology
 - in the realm of Repository Systems
Dr. Timo Borst

Head of IT Development
German National Library for Economics /
Leibniz-Information Centre Economics
Kiel/Hamburg, Germany

ICDK 2011
14th – 16th February, Gurgaon/India

                                          ZBW is a member of the Leibniz Association
Overview
1. Current situation: Distributed (meta-)data management in library
   applications

2. Popular approaches towards aggregation and homogeneity of
   metadata

3. Our approach: Integration and aggregation of authority values
   with Semantic Web technology
         a) General idea
         b) Use case: Indexing
         c) Use case: Retrieving

4. “Lightweight” integration into existing repository systems and
   service providers

5. Conclusion

Current situation

•   The rise of repository systems for academic publishing…



•   …has led to a landscape of distributed systems, each of them
    holding its own metadata…



•   …which is harvested and aggregated by service providers




Popular approaches towards aggregation and
homogeneity of metadata
•   Normalization in advance (before harvesting) requires

      •   a mandatory metadata scheme to be applied by the local repositories
      •   a set of controlled vocabularies (e.g. for publication types)
      •   an automatic validation of the harvested metadata

•   Normalization afterwards (after harvesting) requires

      •   the definition of a minimum set of metadata fields
      •   the definition of a basic intermediate metadata scheme for normalizing
          the heterogeneous metadata records,
      •   optionally data cleansing strategies like name disambiguation and
          automatic indexing on the basis of thesauri


Both approaches are problematic and reveal ambiguities at the aggregation level!
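
The second approach, normalization after harvesting, can be sketched roughly as follows. This is only an illustration: the local field names, the mapping table and the required fields are hypothetical and not taken from any particular service provider.

```python
# Hypothetical mapping from local field names to a minimal intermediate scheme.
FIELD_MAP = {
    "dc:creator": "author",
    "author_name": "author",
    "dc:title": "title",
    "titel": "title",
    "dc:subject": "subject",
    "keywords": "subject",
}
REQUIRED_FIELDS = ("author", "title")  # assumed minimum set of metadata fields


def normalize_record(record):
    """Map a harvested record onto the intermediate scheme."""
    normalized = {}
    for field, value in record.items():
        target = FIELD_MAP.get(field)
        if target is None:
            continue  # unknown local fields are dropped
        values = value if isinstance(value, list) else [value]
        # Collapse whitespace and skip empty strings.
        cleaned = [" ".join(v.split()) for v in values if v.strip()]
        normalized.setdefault(target, []).extend(cleaned)
    return normalized


def missing_fields(normalized):
    """List the required fields that the normalized record does not fill."""
    return [f for f in REQUIRED_FIELDS if not normalized.get(f)]
```

Cleanup of this kind works on strings only, so two spellings of the same author or subject still end up as two different values after aggregation.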



Current situation

•   …sounds easy and straightforward, but implies
    severe problems, especially with regard to the
    ambiguity of
     • author names
     • subject headings




Current situation
“The major difficulty we have found is with DSpace’s handling of
metadata. While we feel that the number of fields in Dublin Core is
adequate for most if not all uses (DCMI Usage Board 2006), we are
troubled by the lack of authority control when completing its fields.
Without some control over uniform titles, authors and subjects
accessing the items in the future will [be] very problematic.”
S. Chabot (http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-project-analysis/)
      “Neither the standards nor the software underlying
      institutional repositories anticipated performing naming
      authority control on widely disparate metadata from
      highly unreliable sources.”
      D. Salo (http://minds.wisconsin.edu/handle/1793/31735)


Our approach: Integration of authority values with
Semantic Web technology

•   General idea: “Provide a framework for integrating authority
    data, which is both normative and flexible enough to tolerate
    local idiosyncrasies on a string level.”
•   Approach: Concept modelling based on Semantic Web / SKOS
    standards
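
A minimal sketch of this idea, assuming a SKOS-like model in which each concept has one URI, one preferred label and any number of alternative labels. The class and function names are hypothetical. The URI is the one used in the example queries of this talk; the variant labels are assumed sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS-like concept: one URI, one preferred label, many variants."""
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)


def normalize(label):
    # Case and whitespace differences are treated as local idiosyncrasies.
    return " ".join(label.lower().split())


def build_index(concepts):
    """Map every normalized label (preferred or alternative) to its concept URI."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[normalize(label)] = concept.uri
    return index


# Illustrative data: the variant labels below are assumptions.
CRISIS = Concept(
    uri="http://zbw.eu/stw/descriptor/19664-4",
    pref_label="Finanzkrise",
    alt_labels=["financial crisis", "Financial Crises"],
)
INDEX = build_index([CRISIS])
```

The URI is the normative part; the strings that point to it may vary locally without breaking the link.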




Our approach: Integration of authority values with
Semantic Web technology – Web service
Example queries (for concepts):




http://zbw.eu/beta/stw-ws/suggest?query=finanzkr
…delivers all terms beginning with “finanzkr”

http://zbw.eu/beta/stw-ws/stw-ws-wrapper.php?service=labels&concept=http://zbw.eu/stw/descriptor/19664-4&lang=en
…delivers all English synonyms of the German “Finanzkrise”
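
The two example queries can be assembled programmatically. The sketch below only builds the URLs and sends no request. The helper names are hypothetical, and the parameters are limited to those visible in the examples above. Note that `urlencode` percent-encodes the concept URI; whether the beta service accepts the encoded form is an assumption.

```python
from urllib.parse import urlencode

BASE = "http://zbw.eu/beta/stw-ws"  # beta endpoint as shown in the examples


def suggest_url(prefix):
    """URL that asks for all terms beginning with the given prefix."""
    return f"{BASE}/suggest?{urlencode({'query': prefix})}"


def labels_url(concept_uri, lang):
    """URL that asks for the labels of one concept in one language."""
    params = {"service": "labels", "concept": concept_uri, "lang": lang}
    return f"{BASE}/stw-ws-wrapper.php?{urlencode(params)}"
```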

Use case: (Self-)Indexing
•   One of the most prominent use cases, especially for librarians, but also
    for scientists and active users not familiar with subject-specific
    vocabularies
•   Main goals:
     •    Support the process of indexing in order to achieve a classification
          of documents which is both coherent and flexible in the sense that
          it permits local idiosyncrasies related to authority terms
     •    Align different vocabularies in the sense that indexing in one
          vocabulary is automatically linked to another vocabulary
•   Implementation: Extension of the submission interface of our repository by
    integrating the terminology web service as an autosuggest function
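
What such an autosuggest function does can be sketched locally as a prefix match over vocabulary labels. This is not the repository's actual implementation. The vocabulary below is illustrative: only the first URI appears in this talk, and the other entries are placeholders.

```python
# Illustrative vocabulary: label -> concept URI.
VOCABULARY = {
    "Finanzkrise": "http://zbw.eu/stw/descriptor/19664-4",
    "Finanzkredit": "http://example.org/concept/1",  # placeholder
    "Finanzmarkt": "http://example.org/concept/2",   # placeholder
}


def suggest(prefix, vocabulary=VOCABULARY, limit=10):
    """Return (label, uri) pairs whose label starts with the typed prefix."""
    needle = prefix.casefold()
    hits = [
        (label, uri)
        for label, uri in sorted(vocabulary.items())
        if label.casefold().startswith(needle)
    ]
    return hits[:limit]
```

When the user picks a suggestion, the submission form can store the concept URI alongside the chosen label, which keeps the record linked to the vocabulary.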



Use case: (Self-)Indexing

Submission form https://econstor.eu




Use case: Retrieving
•   To be considered as the most important use case

•   Often leading into the classical dilemma of precision and
    recall
•   Main goal:
     • Support the process of retrieving, so users can find the
       relevant set of documents

•   Implementation: Automatic expansion of the original query with
    synonyms, narrower and related terms
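
Query expansion of this kind can be sketched as follows. The thesaurus fragment is assumed sample data, and the quoted OR syntax is a generic convention rather than the syntax of a specific search engine.

```python
# Illustrative thesaurus fragment; all labels are assumed sample data.
THESAURUS = {
    "financial crisis": {
        "synonyms": ["financial crises"],
        "narrower": ["banking crisis"],
        "related": ["financial market"],
    },
}


def expand_query(term, thesaurus=THESAURUS):
    """Build an OR query from the term plus synonyms, narrower and related terms."""
    entry = thesaurus.get(term.lower(), {})
    terms = [term]
    for relation in ("synonyms", "narrower", "related"):
        for label in entry.get(relation, []):
            if label not in terms:
                terms.append(label)
    return " OR ".join(f'"{t}"' for t in terms)
```

Each added term can raise recall and lower precision, so in practice one might expand with synonyms only, or let the user switch individual relations on and off.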




Use case: Retrieving

Expanded search for „financial crisis“ http://econstor.eu




Use case 2: Search

“Lightweight” integration into existing repository systems
and service providers
Benefits
• “Lightweight” extension of legacy systems
• Strategy of “least intrusion”: no update or migration needed
• No changes to the core system; only some changes to the data model
  may be required:
  • Additional column for storing the URI of the authority key
  • Export or harvesting of the authority as a resource must be possible
      (-> OAI-ORE)
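
The data model change can be illustrated with an in-memory SQLite database. The table and column names are hypothetical and do not reflect the schema of any particular repository system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical legacy table that stores subject terms as plain strings.
conn.execute("CREATE TABLE subject (item_id INTEGER, term TEXT)")
conn.execute("INSERT INTO subject VALUES (1, 'Finanzkrise')")

# The only schema change: one extra column for the authority URI.
conn.execute("ALTER TABLE subject ADD COLUMN authority_uri TEXT")
conn.execute(
    "UPDATE subject SET authority_uri = ? WHERE term = ?",
    ("http://zbw.eu/stw/descriptor/19664-4", "Finanzkrise"),
)

row = conn.execute(
    "SELECT item_id, term, authority_uri FROM subject"
).fetchone()
```

Existing rows keep their string values, so the legacy application continues to work while new code can follow the URI.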

• Other types of library applications suitable for these adaptations:
  •   catalogues
  •   portals (e.g. to generate publication lists from an identified author or
      thematic issues)
  •   any collaborative system with an annotation component

Summary and conclusion
• Library applications each create and manage their own
  idiosyncratic data sets.
• This complicates the maintenance, exchange, aggregation and
  homogenization of (meta-)data for extended services.
• Upstream web services, as part of an overarching authority data
  infrastructure, can contribute to the homogenization of metadata at an
  early stage (while still allowing for localization).
• If such web services emerge and are used widely, there is a chance of
  further interlinking locally maintained metadata while at the same time
  improving data-based services.
• The option of “lightweight integration” is an offer to operators of
  library applications to integrate these web services into their
  applications with as little effort as possible.
Thank you!


Dr. Timo Borst
German National Library for Economics /
Leibniz-Information Centre Economics (ZBW)

t.borst@zbw.eu

Use case 3: Recording authors


  • The normal case in catalogues, but so far the exception in other
    data entry systems
  • User groups: librarians + researchers (?) + library users (?)
  • Process: entering author names
  • Goal: improve the process of recording authors with the help of
    authority data provided by web services

Use case 3: Recording authors
• Entry form at http://87.106.250.18/beta/econstor/

Previous approaches to aggregation and
homogenization
  • Metadata search via aggregators
      • Parallel querying of remote, distributed systems
      • Return and presentation of the search result as a
        merged hit list
  • Harvesting
      • Regular collection of remote, distributed
        metadata
      • Homogenization ex ante or ex post
  • Federated search
  • …

References
   • [1] http://wiki.dspace.org/index.php/Authority_Control_of_Metadata_Values
   • [2] http://minds.wisconsin.edu/handle/1793/31735
   • [3] http://dsug09.ub.gu.se/index.php/dsug/dsug09/paper/view/22/3
   • [4] http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-project-analysis/
   • [5] http://code.google.com/p/dspace-agrisap/wiki/ThesaurusAddOn
   • [6] http://edoc.hu-berlin.de/conferences/dc-2008/subirats-imma-199/PDF/subirats.pdf
   • [7] http://www.jisc.ac.uk/media/documents/programmes/sharedservices/names-phase-one-final-report,.pdf
   • [8] http://idea.library.drexel.edu/bitstream/1860/3173/1/20070051011.pdf
   • [9] http://ptsefton.com/blog/2006/06/06/the_affiliation_issue_in_institutional_repository_software/
   • [10] http://library.ust.hk/info/nac/nac-technical.html
   • [11] http://www.seco.tkk.fi/publications/2009/kurki-hyvonen-onki-people-2009.pdf
   • [12] http://journals.sfu.ca/archivar/index.php/archivaria/article/download/11883/12836
   • [13] http://www.dini.de/fileadmin/workshops/oa-netzwerk-juni2009/vernetzungstage_2009_malitz.pdf
