Regal - a Repository for Electronic Documents and Bibliographic Data


  1. Regal - a Repository for Electronic Documents and Bibliographic Data. Felix Ostrowski (graphthinking, @literarymachine), Jan Schnasse (hbz, @InspektorHicks). ELAG, June 11th 2014, University of Bath
  2. Rationale: A new foundation for Edoweb
     ● A system to gather, describe and archive deposit copies of electronic publications and websites on behalf of the State Library Center of Rhineland-Palatinate (LBZ)
     ● Operated by the North Rhine-Westphalian Library Service Center (hbz) since 2002
     ● Technical evolution: OPUS – Digitool – regal
  3. The current system and its shortcomings: Digitool
     ● Digitool end-of-life is coming
     ● Unwanted/unexpected dependencies on other projects hosted on the same Digitool instance
     ● Performance issues (we have millions of objects in Digitool)
     ● No easily configurable search indexes or OAI-PMH interfaces for single collections
     ● No out-of-the-box support for regional requirements (e.g. metadata delivery to the German National Library); extra money/developer hours needed
  4. The current system and its shortcomings: Homemade
     ● Mix of self-developed and Ex Libris components
     ● Vicious circle
       – introduction of workarounds
       – unpredictable migration costs
       – decision to stay on obsolete version
       – running out of support
       – introduction of workarounds
     ● Administrative responsibilities spread across different hbz working groups
  5. Altogether, this leads to an expensive, hard-to-maintain and outdated system that doesn't satisfy our needs or our clients' needs.
  6. The following aspects are mandatory to achieve our goals
     ● Increase the overall performance
     ● Provide an up-to-date, modern user interface
     ● Use open source software (Fedora, Elasticsearch, Drupal)
     ● Seamlessly import (meta-)data from Digitool and potentially other (repository) systems
     ● Integrate the system with the emerging Linked-Open-Data ecosystem, especially authority data
     ● Loosen the tight integration with Ex Libris Aleph
     ● Expose (meta-)data for easy discovery & re-use by others
  7. Overview of the new architecture (diagram components: regal (backend), Fedora, Elasticsearch, regal-drupal (frontend), Ex Libris Aleph, lobid API)
  8. Data model
     ● Simple hierarchical data model consisting of nodes associated via hasPart and partOf relations
     ● Each node is identified by a namespace combined with a Universally Unique Identifier (UUID)
     ● Each node can have a bitstream and a metadata stream
     ● Metadata canonically stored as RDF N-Triples
     ● Bitstream can contain arbitrary data
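
The data model on slide 8 can be pictured with a minimal Python sketch. It is not regal's actual code (the backend is written in Java); the predicate URIs, the base URI and the `edoweb` namespace in the usage note are assumptions made for illustration only.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical URIs: the slides name the relations only as hasPart / partOf.
HAS_PART = "http://example.org/vocab/hasPart"
PART_OF = "http://example.org/vocab/partOf"
BASE = "http://example.org/resource/"  # assumed base URI for node identifiers


def new_id(namespace: str) -> str:
    """Combine a namespace with a freshly generated UUID."""
    return f"{namespace}:{uuid.uuid4()}"


@dataclass
class Node:
    id: str
    parent: Optional[str] = None                        # id of the parent node, if any
    children: List[str] = field(default_factory=list)   # ids of child nodes
    bitstream: bytes = b""                               # arbitrary data
    metadata: List[str] = field(default_factory=list)   # further N-Triples lines


def add_child(parent: Node, child: Node) -> None:
    """Record the relation on both sides (hasPart and partOf)."""
    parent.children.append(child.id)
    child.parent = parent.id


def to_ntriples(node: Node) -> List[str]:
    """Serialize the structural relations plus stored metadata as N-Triples lines."""
    subject = f"<{BASE}{node.id}>"
    lines = [f"{subject} <{HAS_PART}> <{BASE}{c}> ." for c in node.children]
    if node.parent is not None:
        lines.append(f"{subject} <{PART_OF}> <{BASE}{node.parent}> .")
    return lines + list(node.metadata)
```

For example, `add_child(monograph, pdf)` followed by `to_ntriples(monograph)` yields one hasPart triple pointing at the file node, where `monograph = Node(new_id("edoweb"))` and `pdf = Node(new_id("edoweb"))`.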
  10. Fedora (3.7.1)
     ● Mainly used to organize and associate multiple datastreams and their versions
     ● Provides long-term accessible data storage
     ● Proai used as the OAI-PMH solution
  11. Elasticsearch (1.1.0)
     ● Used to provide performant lookup (for metadata and full-text)
     ● Stores compacted JSON-LD
     ● Faceting can be used to browse the collection
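
To make the Elasticsearch slide more concrete, here is a hedged Python sketch of what a compacted JSON-LD document and a facet-style request body might look like. The document fields, the context and the identifier are invented for illustration. The slides do not say which mechanism regal uses for faceting; this sketch uses a terms aggregation, which Elasticsearch 1.x offers alongside its older facets API.

```python
import json

# Hypothetical compacted JSON-LD document; context, id and fields are assumptions.
doc = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "@id": "http://example.org/resource/edoweb:1234",
    "dc:title": "An example title",
    "dc:type": "monograph",
}


def terms_aggregation(field_name: str, size: int = 10) -> dict:
    """Build a search body that counts documents per distinct value of a field."""
    return {
        "size": 0,  # return only the bucket counts, no hits
        "aggs": {"by_value": {"terms": {"field": field_name, "size": size}}},
    }


body = terms_aggregation("dc:type")
print(json.dumps(body, indent=2))
```

A body like this would be sent to an index's `_search` endpoint; each returned bucket then corresponds to one facet value and its document count.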
  12. Backend / API
     ● Java Web API (RESTful) implemented with Jersey
     ● Abstracts access to storage & indexing, transparently updates Fedora and different Elasticsearch indexes
     ● Provides resources as OAI-ORE aggregations
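
The idea of one API that transparently keeps storage and several indexes in sync can be sketched as a small facade. The real backend is a Java/Jersey web API; the Python classes below are hypothetical in-memory stand-ins, not regal's interfaces.

```python
class InMemoryStore:
    """Stand-in for the storage layer (Fedora in the talk)."""

    def __init__(self):
        self.objects = {}

    def put(self, object_id, record):
        self.objects[object_id] = record

    def delete(self, object_id):
        self.objects.pop(object_id, None)


class InMemoryIndex:
    """Stand-in for a single search index (Elasticsearch in the talk)."""

    def __init__(self):
        self.docs = {}

    def index(self, object_id, doc):
        self.docs[object_id] = doc

    def remove(self, object_id):
        self.docs.pop(object_id, None)


class Repository:
    """Facade: callers talk to one object; storage and indexes are updated together."""

    def __init__(self, store, indexes):
        self.store = store
        self.indexes = list(indexes)

    def save(self, object_id, record):
        self.store.put(object_id, record)
        for idx in self.indexes:
            idx.index(object_id, record)

    def delete(self, object_id):
        self.store.delete(object_id)
        for idx in self.indexes:
            idx.remove(object_id)
```

With this shape, a frontend never needs to know how many indexes exist; adding another index only changes how `Repository` is constructed.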
  13. Drupal Frontend
     ● Re-use of common features
       – User management
       – Template system
       – Field API
       – RDF Mappings
       – HTML Form API
     ● Extended with custom modules for
       – Storage Backend
       – Linked Data Fields
       – JavaScript UI enhancements
  14. No big surprises for plain text input...
  15. Catalinking
  16. Simple lookup widget with configurable data sources (currently only the lobid API is implemented)
  18. Additional linked data is integrated on-the-fly
  20. Client-side sorting (and soon also searching) of linked data
  21. Exposing data
  24. Importing data
  25. This is simply a shortcut, any linked data URI can be used.
  26. Tada!
  28. Managing structure
  29. Possible child nodes: for a monograph these are only files. Journals have more complex structures (volumes, issues, articles).
  31. Basic technical metadata added by the backend.
  32. Move an object by setting its new parent.
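
Moving an object by setting its new parent amounts to updating both directions of the relation. The following Python sketch shows one way this could work on a plain dictionary of nodes; the dictionary layout, the function name and the cycle check are assumptions, not regal's implementation.

```python
def move(nodes: dict, child_id: str, new_parent_id: str) -> None:
    """Re-parent a node.

    `nodes` maps a node id to {"parent": id or None, "children": [ids]}.
    """
    # Walk up from the new parent; meeting the child means a cycle would result.
    cursor = new_parent_id
    while cursor is not None:
        if cursor == child_id:
            raise ValueError("cannot move a node below itself")
        cursor = nodes[cursor]["parent"]

    # Detach from the old parent (hasPart side), if there is one.
    old_parent_id = nodes[child_id]["parent"]
    if old_parent_id is not None:
        nodes[old_parent_id]["children"].remove(child_id)

    # Attach to the new parent and update the partOf side.
    nodes[new_parent_id]["children"].append(child_id)
    nodes[child_id]["parent"] = new_parent_id
```

For a journal with two volumes, `move(nodes, "issue1", "volume2")` removes the issue from the first volume's children and lists it under the second.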
  33. Faceted search, brought to us by Elasticsearch
  34. Facets can be added and removed individually.
  36. Anybody can say anything about anything...
  37. Local views on remote resources, e.g. authors and classifications.
  38. Obstacles encountered / lessons learned: Drupal
     ● is designed to be standalone, so we basically have two backends
     ● its HTML Form API can be awkward to work with if you don't want to do things the "Drupal-way"
     ● a pure JavaScript / HTML5 frontend might replace Drupal in upcoming versions
  39. Obstacles encountered / lessons learned: Fedora
     ● is more of an infrastructure than a storage system
     ● because of its complexity, we consider authorization via XACML a big disadvantage
     ● OAI-PMH is also not supported very well
     ● we are still looking for a more lightweight solution
     ● perhaps as lightweight as simply using the file system for both bitstreams and metadata
  40. Obstacles encountered / lessons learned: Elasticsearch
     ● works very well with JSON-LD in general
     ● but needs some care to create proper mappings
     ● and could use a more generic notion of relations than only parent/child
  41. Further regal applications
     ● Migrate further Digitool and non-Digitool repositories
     ● Frontend: Prototype of an OER World Map
  42. Good news: Linked Data Works!
     ● regal / Edoweb is not a research project,
     ● it is integrated into the hbz IT landscape,
     ● it is on the web,
     ● it does not require expertise in Linked Data,
     ● and real librarians will use it to create real catalog entries.
  43. Thank you! Questions? Now or later to