
Putting Historical Data in Context: how to use DSpace-GLAM




  2. 2. We will talk about… 1. Theoretical and methodological foundations of the DSpace-GLAM project 2. Managing digital objects with DSpace 3. Extending the DSpace data model with DSpace-GLAM 4. Integrating DSpace and DSpace-GLAM entities 5. Access to and sharing of digital cultural resources with add-ons 6. Dataset analysis with CKAN 7. Conclusions
  3. 3. The BIG DATA age • For several years the term "Big Data" has been bursting into the world of Information Technology • It promises potential tied to a new generation of technologies and architectures able to extract value from the enormous amount of data continuously produced in the most diverse fields
  4. 4. In the science domain "Big Data" is seen as an even bigger opportunity The "data deluge" will make obsolete some of the fundamental concepts on which the scientific method has been based so far A new scientific paradigm?
  5. 5. No more theories? No more hypotheses? No more models? Numbers speak for themselves? A new scientific paradigm?
  6. 6. Certainly new opportunities… Source: • Being able to manipulate and analyze massive amounts of data represents important progress for science • It won’t abolish the need to build, refine and verify theories • It will make it possible to formulate hypotheses and test them far more rapidly and on a far larger sample than in the past
  7. 7. …also for humanities No data deluge, but…growing amount of data • Databases • Electronic journals • Digitization • Tools for data extraction • …
  8. 8. A variety of multidisciplinary data are related to Cultural Heritage and History Different in: Typology Format Structure Scale
  9. 9. More and more complexity In the humanities most of the data are created or collected by people (not measured by instruments) They are affected by individuals, place, time They are fragmentary, partial, biased Source:
  10. 10. Putting data in context Digital Cultural Data have to be analyzed together with all contextual information, digital and not digital, needed to answer research questions, such as: • (cultural, social, economic, technological…) production context of a document/monument • formation processes of an archaeological record • contextual associations at different levels and scales (according to the different dimensions of variations) Source:
  11. 11. A Digital Humanities approach is fundamental… Such an approach, with its focus on relationships, can help in identifying the important dimensions of variation (the CONTEXT) It can help in analyzing primary sources as evidence of a network of heterogeneous systems, which can be studied through those sources by means of a global (holistic) and multidimensional analysis Technological Environmental Social Cultural Economic Source: Hodder I. 2016, Studies in Human-Thing Entanglement, p. 28
  12. 12. …within a Digital Library Management System To move such an approach from theory to practice we need infrastructures and tools for integration, analysis and storage of digital data and resources. Today most cultural digital resources and data are held in Digital Libraries or Repositories. To enter the daily workflow of historians, archaeologists and humanities scholars, it is Digital Libraries and Repositories that must provide tools for: • modeling, visualising and analysing information, both qualitatively and quantitatively, as well as working on it collaboratively • highlighting the relationships between data at different scales • explaining interpretations about the important dimensions of variation and about the network of contextual relations in which historical sources are involved
  13. 13. Why DSpace? To achieve the outlined goals and build a state-of-the-art Digital Library Management System, open source software is preferable. Open source development offers an effective way to create Digital Library Management Systems with a small financial investment. With sustainability in mind, among the most widely used open source Digital Library Management Systems we chose DSpace.
  14. 14. Why DSpace? Out of the box, DSpace allows you to: • capture and describe digital material using a submission workflow module, or a variety of batch ingest options • distribute digital assets over the web through a search and retrieval system • preserve digital assets over the long term
  15. 15. Why DSpace? The system is based on the specifications of the OAIS (Open Archival Information System) for Long Term Preservation and is able to manage the whole "life-cycle" of a digital object in terms of "Digital Curation", by means of: • metadata creation according to different standards • SIP (Submission Information Package) import and validation • AIP (Archival Information Package) creation • AIP export • storage management • digital resources dissemination (also by means of OAI-PMH) • digital object history management and integrity check
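As a rough illustration of the OAI-PMH dissemination mentioned in the slide above, the following Python sketch builds harvesting request URLs. The base URL is a hypothetical placeholder (an assumption, not a real endpoint); the verbs and the `oai_dc` metadata prefix are the ones defined by the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode

# Hypothetical endpoint (assumption): replace with the OAI-PMH base URL of your repository.
BASE_URL = "https://repository.example.org/oai/request"


def oai_request_url(verb, base_url=BASE_URL, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    # Skip arguments passed as None so they do not appear in the URL.
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


# Describe the repository, then list Dublin Core records.
print(oai_request_url("Identify"))
print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
```

A harvester would fetch these URLs and follow the resumption tokens returned by the server; that part is left out of the sketch.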
  16. 16. Why DSpace? There are over 2200 digital repositories and libraries worldwide using the DSpace application for a variety of digital archiving and dissemination needs. DSpace is often used as an institutional repository to provide access to research outputs, scholarly publications, library collections, educational material and more. It is also used as a digital library to store, preserve and disseminate digital cultural heritage. A fairly large part of the world's cultural and scientific heritage is already managed, accessed and preserved using DSpace. It makes sense to enhance a system already widely used rather than propose migrating data to new platforms.
  17. 17. DSpace Data Model
  18. 18. Communities & Collections • Communities and collections are entities useful to aggregate DSpace items by: • Provenance and responsibility >>> Communities • Metadata, workflow, curation >>> Collections • They both aggregate the items but they are conceptually different things!
  19. 19. Communities Create your Community
  20. 20. Collections Create your Collection
  21. 21. Collections
  22. 22. Collections
  23. 23. Collections
  24. 24. Workflow
  25. 25. Curating items
  26. 26. User management E-People and Groups are the way DSpace identifies application users for the purpose of granting privileges
  27. 27. DSpace metadata Out of the box, DSpace can support multiple flat metadata schemas You can configure multiple schemas by means of the “Metadata Schema Registry” and select metadata fields from a mix of configured schemas to describe your items Communities and collections have some simple descriptive metadata (a name, and some descriptive prose)
  28. 28. The submission process
  29. 29. Defining the submission form Configure the submission form by means of the input-form.xml file You can configure different forms for different collections You can create internal vocabularies for the fields
  30. 30. input-form.xml
  31. 31. input-form.xml dc-schema (Required): Name of metadata schema employed, e.g. dc for Dublin Core. This value must match the value of the schema element defined in the Metadata Schema Registry dc-element (Required): Name of the element dc-qualifier: Qualifier of the element entered, e.g. when the field is contributor.advisor the value of this element would be advisor. Leaving this out means the input is for an unqualified element. repeatable: Value is true when multiple values of this field are allowed, false otherwise. When you mark a field repeatable, the UI servlet will add a control to let the user ask for more fields to enter additional values. label (Required): Text to display as the label of this field, describing what to enter, e.g. "Your Advisor's Name". input-type (Required): Defines the kind of interactive widget to put in the form to collect the Dublin Core value.
  32. 32. input-form.xml hint (Required): Content is the text that will appear as a "hint", or instructions, next to the input fields. Can be left empty, but it must be present. required: When this element is included with any content, it marks the field as a required input. If the user tries to leave the page without entering a value for this field, that text is displayed as a warning message. For example, <required>You must enter a title.</required> Note that leaving the required element empty will not mark a field as required, e.g. <required></required>
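The element names listed in the two slides above can be put together as in the following Python sketch, which assembles one `<field>` definition with the standard library. The helper `build_field`, the sample hint text and the `onebox` input type used in the example call are illustrative assumptions; only the child element names and their required/optional status come from the slides.

```python
import xml.etree.ElementTree as ET


def build_field(schema, element, label, input_type, hint="",
                qualifier=None, repeatable=False, required_msg=None):
    """Return a <field> element built from the child elements described above."""
    field = ET.Element("field")
    ET.SubElement(field, "dc-schema").text = schema
    ET.SubElement(field, "dc-element").text = element
    if qualifier:  # left out for an unqualified element
        ET.SubElement(field, "dc-qualifier").text = qualifier
    ET.SubElement(field, "repeatable").text = "true" if repeatable else "false"
    ET.SubElement(field, "label").text = label
    ET.SubElement(field, "input-type").text = input_type
    ET.SubElement(field, "hint").text = hint  # must be present, may be empty
    if required_msg:  # an empty <required> would not mark the field as required
        ET.SubElement(field, "required").text = required_msg
    return field


# Example: a repeatable, optional advisor field ("onebox" is an assumed input type).
advisor = build_field("dc", "contributor", "Your Advisor's Name", "onebox",
                      hint="Enter the name of your advisor.",
                      qualifier="advisor", repeatable=True)
print(ET.tostring(advisor, encoding="unicode"))
```

Generating the fragment programmatically is only one option; the same structure can of course be typed by hand into the form definition file.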
  33. 33. input-form.xml – dropdown menus To create an internal flat vocabulary you have to: • use the «dropdown», «qualdrop» or «list» value within the <input-type> element • populate the <value-pairs> element
  34. 34. Hierarchical Taxonomies and Controlled Vocabularies DSpace also offers a way of structuring and managing more complex, hierarchical controlled vocabularies • Managed in a separate file • Taxonomies are described in XML • Vocabularies are invoked from input-form.xml, using the <vocabulary> tag within the related <field>
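To make the idea of an XML taxonomy more concrete, here is a minimal Python sketch that walks a small hierarchical vocabulary and prints the full path of each term. The sample taxonomy, its ids and labels are invented for illustration, and the nested `<node>` / `<isComposedBy>` layout and the `::` path separator are assumptions based on the vocabulary files shipped with DSpace; check them against your own installation.

```python
import xml.etree.ElementTree as ET

# Invented sample taxonomy (assumption): node ids and labels are illustrative only.
SAMPLE = """
<node id="periods" label="Periods">
  <isComposedBy>
    <node id="p1" label="Middle Ages"/>
    <node id="p2" label="Renaissance">
      <isComposedBy>
        <node id="p2a" label="Quattrocento"/>
      </isComposedBy>
    </node>
  </isComposedBy>
</node>
""".strip()


def paths(node, prefix=()):
    """Yield the full label path of every node, depth first."""
    current = prefix + (node.get("label"),)
    yield "::".join(current)
    for child in node.findall("isComposedBy/node"):
        yield from paths(child, current)


for path in paths(ET.fromstring(SAMPLE)):
    print(path)
```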
  35. 35. Batch submission process Requires the creation of a DSpace Simple Archive: • A directory for each item to import, containing: • the files that make up the item • an XML file where each metadata element has its own entry within a <dcvalue> tagset. There are currently three attributes available on the <dcvalue> tag: • element - the Dublin Core element • qualifier - the element's qualifier • language - (optional) ISO language code for the element • A “contents” file listing the files • An (optional) collection file with the information about the collection(s) the item belongs to <dublin_core> <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue> <dcvalue element="date" qualifier="issued">1990</dcvalue> <dcvalue element="title" qualifier="alternative" language="fr">J'aime les Printemps</dcvalue> </dublin_core>
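The directory layout described above can be produced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper `write_item`, the directory name `item_000` and the placeholder PDF bytes are invented for the example, and the metadata file is named `dublin_core.xml` as in the DSpace Simple Archive Format documentation.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item(archive_dir, name, metadata, files):
    """Create one item directory with dublin_core.xml, a contents file and the files.

    metadata: list of (element, qualifier, language, value); qualifier and language may be None.
    files: dict mapping file name -> bytes.
    """
    item_dir = Path(archive_dir) / name
    item_dir.mkdir(parents=True)
    root = ET.Element("dublin_core")
    for element, qualifier, language, value in metadata:
        attrs = {"element": element, "qualifier": qualifier or "none"}
        if language:
            attrs["language"] = language
        ET.SubElement(root, "dcvalue", attrs).text = value
    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)
    for file_name, data in files.items():
        (item_dir / file_name).write_bytes(data)
    # The contents file lists one file name per line.
    (item_dir / "contents").write_text("\n".join(files) + "\n", encoding="utf-8")
    return item_dir


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    item = write_item(tmp, "item_000",
                      [("title", None, None, "A Tale of Two Cities"),
                       ("date", "issued", None, "1990")],
                      {"book.pdf": b"%PDF-1.4 placeholder"})
    print(sorted(p.name for p in item.iterdir()))
```

Calling `write_item` once per item inside a common parent directory yields an archive that can then be zipped for the UI batch import.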
  36. 36. UI Batch Import You have to: • Compress the item directories into a zip file • Place the zip file at a publicly accessible URL, such as Dropbox, Google Drive or any other location you can upload to • Then log in as Administrator and fill in the form
  37. 37. UI Batch Import
  38. 38. Batch metadata editing DSpace provides a batch metadata editing tool. The tool lets the user perform the following: • Batch editing of metadata by means of a comma-delimited (CSV) file • Batch additions of metadata (e.g. add an abstract to a set of items) • Batch find and replace of metadata values (e.g. correct a misspelled surname across several records) • Mass move of items between collections • Mass deletion, withdrawal, or reinstatement of items • Batch addition of new items (without bitstreams) via a CSV file • Re-ordering of the values in a list (e.g. authors)
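A CSV file for the batch editing tool can be prepared as in the following Python sketch. It assumes the column layout used by the DSpace batch metadata editing tool (an `id` column, a `collection` column, one column per metadata field, `||` between multiple values, and `+` as the id of a new item); the helper name, the collection handle and the sample record are invented for illustration.

```python
import csv
import io


def metadata_csv(rows, fields):
    """Serialize rows to CSV text; list values are joined with '||'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "collection"] + fields)
    for row in rows:
        cells = [row["id"], row["collection"]]
        for field in fields:
            value = row.get(field, "")
            cells.append("||".join(value) if isinstance(value, list) else value)
        writer.writerow(cells)
    return buffer.getvalue()


rows = [
    # "+" asks for a new item; the handle and the names are made-up examples.
    {"id": "+", "collection": "123456789/2",
     "dc.title": "Letters from Ferrara",
     "dc.contributor.author": ["Rossi, Mario", "Bianchi, Anna"]},
]
print(metadata_csv(rows, ["dc.title", "dc.contributor.author"]))
```

The `csv` module quotes the author cell automatically because the names contain commas, which keeps the file readable by any CSV parser.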
  39. 39. Batch metadata editing
  40. 40. Extending DSpace Cultural Institutions in the «Big Data Age» ask for: • Complex and multidimensional metadata structures • Complex data models • Relationships management between different entities • Tools for digital data and resources visualization, analysis and interpretation Why not use an “extended” version of DSpace to meet these relevant needs?
  41. 41. DSpace-GLAM (Galleries, Libraries, Archives, Museums) Built by 4Science on top of DSpace to meet the needs of Cultural Heritage institutions Flexible and extensible data model inherited from DSpace-CRIS (our RIMS) to manage relevant metadata standards and specific conceptual models With dedicated add-ons for curating, accessing and sharing digital objects Also an add-on for datasets visualization and analysis
  42. 42. DSpace-GLAM (Galleries, Libraries, Archives, Museums) DSpace-GLAM is free, open source, compliant with open standards Add-ons are mainly distributed following a new business model (crowdsourcing) Provides institutions with a sustainable and effective tool to manage and analyze Cultural Heritage Information
  43. 43. Weaknesses of DSpace metadata management • Flat metadata model • Weak support for technical and structural metadata • All information is stored as strings at the database level with minimal support (and validation) for data entry in the UI • DSpace-GLAM improves the metadata at the item level by providing: - Additional input types for data entry (number, year and regex validation) - Partial support for nested metadata - Support for technical and structural metadata
  44. 44. DSpace-GLAM: interoperability • Connect to VIAF records and Getty Vocabularies for precise identification of persons, artists and places • It has been reported to work nicely with «plain» DSpace, with the authority implementation. There are plans to include it out of the box in DSpace 7
  45. 45. Extending the DSpace Data Model DSpace-GLAM can manage all the entities important to contextualize digital cultural heritage: • Persons • Families • Fonds • Events • Places • Concepts • ………….. Entities can be created to integrate different metadata standards and conceptual models
  46. 46. Extending the Data Model
  47. 47. Pre-defined entities • Persons • Projects • Organizations are pre-defined entities inherited from DSpace-CRIS … but you are not required to use (all of) them. • You can define additional entities • You can define your own relationships between entities, including the ones that you have defined
  48. 48. Defining other entities
  49. 49. Entities components Tabs Box Fields
  50. 50. Entities components: tabs
  51. 51. Entities components: boxes
  52. 52. Entities components: fields
  53. 53. Data model configuration DSpace-GLAM visibility and security • Each DSpace-GLAM entity instance has a status flag • Public: the details page is visible to anyone and it will be linked where appropriate. The record is included in public search results • Private: only administrators can access the details page. The entity is indexed only for use as authority entry • Each property/attribute value has an edit mode: • Editable • Visibility flag only • Only Administrators • Read only • A field becomes visible when included in a publicly visible tab/box
  54. 54. Data model configuration DSpace-GLAM visibility and security • Visibility of a tab or box can be restricted to • System administrators • Only RP owner • Admins and RP owner • Specific users and groups related to the entity instance • To restrict the visibility of a box or tab to specific groups or users, indicate one or more properties containing the users and/or groups that have access to the protected box/tab
  55. 55. Data model configuration • It can be performed via the UI and exported to XLS • It can be imported from XLS files
  56. 56. Data model configuration Creating entities relationships
  57. 57. Data model configuration Creating entities relationships
  58. 58. Data model configuration Creating inverse relationships between entities DSpace-CRIS can use the SOLR indexes to reverse a relation • Documents are linked to the person • But you can also list the documents under a specific person Relations are defined in the Spring configuration file cris-relationpreference.xml and are characterized by • A name • The target entity (a CRIS Entity or a DSpace Item) • The SOLR query with {0}, {1} placeholders to be replaced with the CRIS-ID or the uuid of the source CRIS instance
  59. 59. Data model configuration Creating inverse relationships between entities (cris-relationpreference.xml) <bean id="relationINTERPRETATIONVSEVENTSConfiguration" class=""> <property name="relationName" value="" /> <property name="relationClass" value="" /> <property name="type" value="crisevents" /> <property name="query"> <value>crisevents.eventsrelatedinterpretation_authority:{0}</value> </property> </bean> Name Target entity Solr query
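The placeholder mechanism in the query property can be illustrated with a short Python sketch. The template string is copied from the bean above; the helper name and the identifier `interpretation00001` are invented for the example, and mapping `{0}` to the CRIS-ID and `{1}` to the uuid is an assumption drawn from the order in which the slides mention them. DSpace-CRIS performs this substitution itself, so the code only mimics the idea.

```python
# Query template copied from the bean above.
QUERY_TEMPLATE = "crisevents.eventsrelatedinterpretation_authority:{0}"


def inverse_relation_query(template, cris_id, uuid=""):
    """Fill the {0} (assumed CRIS-ID) and {1} (assumed uuid) placeholders."""
    return template.format(cris_id, uuid)


# "interpretation00001" is a made-up identifier used only for illustration.
print(inverse_relation_query(QUERY_TEMPLATE, "interpretation00001"))
```

The resulting string is the Solr query that selects every event whose authority-controlled field points back to the given interpretation, which is how the relation gets "reversed".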
  60. 60. Data model configuration Creating inverse relationships between entities • Inverse relations can be • Visualized • Used to show aggregated statistics • To be visualized, relations are embedded in components (see cris-components.xml)
  61. 61. Data model configuration Creating inverse relationships between entities (cris-components.xml) <!-- Dynamic object component --> <bean id="doComponentsService" class=""> <property name="components"> <map> <entry key="journalspublications" value-ref="publicationlistforjournals" /> <entry key="eventsdocuments" value-ref="publicationlistforevents" /> <entry key="placesevents" value-ref="eventlistforplaces" /> <entry key="eventsperson" value-ref="personlistforevents" /> <entry key="fondschild" value-ref="fondschildforfonds" /> <entry key="fondspublications" value-ref="publicationlistforfonds" /> <entry key="conceptdocuments" value-ref="publicationlistforconcept"/> <entry key="conceptperson" value-ref="personlistforconcept"/> </map> </property> </bean> Name of the related box for visualizing data
  62. 62. Data model configuration Creating inverse relationships between entities (cris-components.xml) <!-- Person list for Events dynamic entity --> <bean id="personlistforevents" class=""> <property name="relationConfiguration" ref="relationEVENTSVSRPConfiguration" /> <property name="commonFilter"> <util:constant static-field="" /> </property> <property name="target" value="" /> <property name="facets" ref="facetsRPforComponentConfiguration" /> <property name="types"> <map> <entry key="all" value-ref="allObjectsComponent" /> </map> </property> </bean>
  63. 63. Data model configuration Creating inverse relationships
  64. 64. Data model configuration Integrating DSpace and DSpace-GLAM (dspace.cfg) • All the GLAM’s entities can be linked with DSpace Items and used as authorities for item’s metadata • This can be done adding some code to dspace.cfg file ##### Authority Control Settings ##### = = RPAuthority, org.dspace.content.authority.ItemAuthority = PublicationAuthority, org.dspace.content.authority.ItemAuthority = DataSetAuthority, = EVENTAuthority, = FONDSAuthority, = CONCEPTAuthority, = INTERPRETATIONAuthority,
  65. 65. Data model configuration Integrating DSpace and DSpace-GLAM (dspace.cfg) choices.plugin.dc.relation.conference = EVENTAuthority choices.presentation.dc.relation.conference = suggest authority.controlled.dc.relation.conference = true cris.DOAuthority.dc_relation_conference.filter = resourcetype_authority:events = events ItemCrisRefDisplayStrategy.publicpath.dc.relation.conference = events choices.plugin.dc.relation.concept = CONCEPTAuthority choices.presentation.dc.relation.concept = suggest authority.controlled.dc.relation.concept = true cris.DOAuthority.dc.relation_concept.filter = resourcetype_authority:concept = concept ItemCrisRefDisplayStrategy.publicpath.dc.relation.concept = concept choices.plugin.dc.relation.fond = FONDSAuthority choices.presentation.dc.relation.fond = suggest authority.controlled.dc.relation.fond = true cris.DOAuthority.dc_relation_fond.filter = resourcetype_authority:crisfonds AND crisfonds.fondsleaf:true ItemCrisRefDisplayStrategy.publicpath.dc.relation.fond = fonds Authority name Display mode For authority values Origin for authority values Entity to populate with new values Authority has its own ID Path to use to link the entity
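Because each authority-controlled field needs several matching keys, a small consistency check can catch omissions before a restart. The Python sketch below parses `key = value` lines and reports fields for which one of the three key families visible in the listing above (`choices.plugin.`, `choices.presentation.`, `authority.controlled.`) is missing. Treating exactly these three as mandatory is an assumption made for the example, and the sample text deliberately leaves one key out.

```python
REQUIRED_PREFIXES = ("choices.plugin.", "choices.presentation.", "authority.controlled.")


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_authority_keys(props):
    """Return {field: [missing prefixes]} for fields that are only partly configured."""
    fields = {key[len(p):] for key in props for p in REQUIRED_PREFIXES if key.startswith(p)}
    report = {}
    for field in sorted(fields):
        missing = [p for p in REQUIRED_PREFIXES if p + field not in props]
        if missing:
            report[field] = missing
    return report


# Sample fragment; the second field intentionally lacks authority.controlled.
SAMPLE = """
choices.plugin.dc.relation.conference = EVENTAuthority
choices.presentation.dc.relation.conference = suggest
authority.controlled.dc.relation.conference = true
choices.plugin.dc.relation.concept = CONCEPTAuthority
choices.presentation.dc.relation.concept = suggest
"""
print(missing_authority_keys(parse_properties(SAMPLE)))
```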
  66. 66. Data model configuration Integrating DSpace and DSpace-GLAM
  67. 67. Data model configuration Creating inverse relationships between items and entities
  68. 68. Data model configuration Creating inverse relationships between items and entities
  69. 69. Data model configuration Clustering of related objects Component implementations are available out of the box to allow configurable rendering of inverse relations for each entity (DSpace items or DSpace-GLAM entities) It is possible • to configure which facets to show in the component • to apply filters to the relation • to enable clustering using custom categories defined by facet queries It is aware of the preference expressed for the relationships
  70. 70. Managing hierarchical archival structures Extending the data model makes the system able to manage the hierarchical metadata structure required by archival standards such as ISAD(G) and EAD DSpace-GLAM can also manage the production and preservation context of the archive required by ISAAR(CPF), EAC-CPF and ISDIAH
  71. 71. Creating and managing Archival Fonds at different levels
  72. 72. Relating an Archival Unit (Item) to a Fond
  73. 73. Visualizing hierarchical archival structures
  74. 74. Overview of the DSpace-GLAM data model
  75. 75. Overview of the DSpace-GLAM data model
  76. 76. Pointing out Social Networks The system is able to draw graphs based on relationships between Persons using data from the different entities and from the DSpace Items In particular it draws relationships between persons who: • Co-authored the same items • Participated in the same event(s) • Participated in event(s) in the same place(s) • Are related to the same concept(s)
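The four criteria above all reduce to the same operation: two persons are connected when they share an object (an item, an event, a place or a concept). The following Python sketch shows that operation on invented data; it is not the DSpace-GLAM network plugin, only an illustration of how such edges can be derived, and the function name and sample labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_edges(groups):
    """Count, for every pair of persons, how many shared objects link them.

    groups: mapping of a shared object (item, event, place, concept)
            to the persons related to it.
    """
    edges = Counter()
    for persons in groups.values():
        # Sort and deduplicate so each unordered pair is counted once per object.
        for a, b in combinations(sorted(set(persons)), 2):
            edges[(a, b)] += 1
    return edges


# Invented sample data: which persons took part in which event.
events = {
    "event-1": ["Person A", "Person B", "Person C"],
    "event-2": ["Person B", "Person C"],
}
for (a, b), weight in sorted(co_occurrence_edges(events).items()):
    print(f"{a} -- {b}: {weight}")
```

The edge weight (number of shared objects) is a natural candidate for line thickness when the graph is drawn.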
  77. 77. Visualizing relationships between historical figures
  78. 78. Network configuration (network.cfg) Networks are implemented by plugins You can write your own implementation, typically starting from the default ones You can configure the network layout (colors, number of nodes, levels)
  79. 79. Formalizing and analysing interpretations Interpretations are logical processes which start from data and/or assumptions and, through logical reasoning and by connecting persons, events, documents, etc., arrive at one or more conclusions Often, in the humanities, such processes are merged and hidden within natural language narratives To make such processes explicit, we have to decompose them into different components and atomic propositions and display such elements
  80. 80. Formalizing and analysing interpretations
  81. 81. Linking interpretations to entities With DSpace-GLAM you can link an interpretation to the items, the events and the persons it is related to Moreover, you can link different interpretations to the same entity
  82. 82. Contextualizing historical data Painting: The flagellation Painter: Piero della Francesca Event: Council of Ferrara (AD 1438) Event: Council of Mantua (AD 1459) Place: Ferrara Concept: Renaissance Concept: Humanism Concept: Neoplatonism Person: Emperor John VIII Palaiologos Place: Mantua Interpretation: Ronchey
  83. 83. Ready for Linked Open Data
  84. 84. Ready for Linked Open Data By linking and relating the created entities with other authorities, the institution is ready to be part of the Linked Data Graph We are now working to include the additional entities in the DSpace RDF management features
  85. 85. Navigation Global search across the whole Digital Library
  86. 86. Infographics Global search across the whole Digital Library
  87. 87. Top objects using several criteria Global search across the whole Digital Library
  88. 88. Faceted Search Facets
  89. 89. Customizable Browse indexes
  90. 90. Customizable Browse indexes
  91. 91. DSpace-GLAM use cases Cultural Heritage image files (digitized manuscripts, paintings, monuments, archaeological finds, rare books, etc.) need to be consulted online, discussed and commented / annotated IIIF protocols and formats allow you to meet these requirements in a standard way that is understandable for both humans and machines
  92. 92. DSpace-GLAM use cases High-quality scanned books typically have images of over 100MB for each page The structure of image sequences is complex and relevant (sequences of pages, of the phases of a historical event, of a cycle of frescoes, etc.)
  93. 93. DSpace-GLAM use cases The same requirements apply to audio and video content - Streaming - Internal structure - Annotation / commenting / transcript Adopt an open standard: the MPEG-DASH format allows adaptive streaming to a simple HTML client with full support for multiple tracks, ToC, subtitles
  94. 94. 4Science IIIF Image Viewer Addon IIIF Compliant 1. Presentation API 2. Image API 3. Search API 4. Authentication API (soon)
  95. 95. DSpace item with “see online option”
  96. 96. Offering an integrated Universal Viewer player
  97. 97. IIIF Image API allows a smooth interaction with the image files
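The smooth interaction comes from the fact that a viewer requests only the region and size it needs, using the URL pattern defined by the IIIF Image API: `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The Python sketch below composes such URLs; the server address and the identifier are hypothetical, and the default size keyword `max` assumes Image API 2.1 or later.

```python
from urllib.parse import quote

# Hypothetical image server (assumption), used only for the example.
BASE = "https://iiif.example.org/image"


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Compose {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    # The identifier is percent-encoded so that slashes inside it do not break the path.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the info.json document that describes the image (sizes, tiles, profile)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# A 600-pixel-wide rendering of the whole image, then a 1024x1024 detail at full resolution.
print(iiif_image_url(BASE, "ms 12/f1", size="600,"))
print(iiif_image_url(BASE, "ms 12/f1", region="0,0,1024,1024"))
print(iiif_info_url(BASE, "ms 12/f1"))
```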
  98. 98. IIIF Presentation API generated on the fly using the metadata of the item and the bitstreams
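Generating a manifest "on the fly" amounts to mapping the item metadata to a label and each image bitstream to a canvas. The Python sketch below builds a minimal manifest in the shape defined by IIIF Presentation API 2.x. It is not the 4Science add-on code: the function name, the URL scheme for manifest and canvas ids, and the sample item are assumptions made for illustration.

```python
import json


def build_manifest(base, item_id, title, pages):
    """Build a minimal IIIF Presentation 2.x manifest as a dict.

    pages: list of (label, image_service_url, width, height), one per page image.
    """
    canvases = []
    for index, (label, service, width, height) in enumerate(pages, start=1):
        canvas_id = f"{base}/{item_id}/canvas/{index}"
        canvases.append({
            "@id": canvas_id, "@type": "sc:Canvas", "label": label,
            "width": width, "height": height,
            "images": [{
                "@type": "oa:Annotation", "motivation": "sc:painting", "on": canvas_id,
                "resource": {
                    "@id": f"{service}/full/full/0/default.jpg",
                    "@type": "dctypes:Image", "width": width, "height": height,
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": service,
                        "profile": "http://iiif.io/api/image/2/level1.json",
                    },
                },
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{base}/{item_id}/manifest",
        "@type": "sc:Manifest",
        "label": title,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


# Hypothetical item with two page images.
manifest = build_manifest(
    "https://iiif.example.org/presentation", "item-42", "Sample manuscript",
    [("f. 1r", "https://iiif.example.org/image/item-42-1", 3000, 4000),
     ("f. 1v", "https://iiif.example.org/image/item-42-2", 3000, 4000)])
print(json.dumps(manifest, indent=2)[:300])
```

Any IIIF-compliant viewer that is given the manifest URL can then display the page sequence without knowing anything about the repository behind it.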
  99. 99. Bitstreams metadata Hierarchical ToC An example from a PDF document offered as a complex package of page images
  100. 100. Link images with their textual transcription / OCR Indexing standard format (hOCR) in a webannotation server to supply IIIF Search API
  101. 101. Side by side – image vs text using an additional OCR panel
  102. 102. An example in Arabic characters
  103. 103. IIIF Image Viewer: share and reuse Share images with other scholars/users without waiving proper attribution, e.g. using the «manifest» JSON file: https://dspace- manifest in another IIIF Image Viewer:
  104. 104. Audio/Video streaming Full open source stack: 1. Transcoding 2. Adaptive streaming 3. MPEG-DASH standard
  105. 105. Audio/Video streaming https://dspace- /7&provider=video-streaming Allows the transcode of the audio/video formats in a format and encoding appropriate to the adopted media server (adaptive video streaming) Using the DASH standard protocol allows sharing video with other scholars/users without waiving proper attribution, e.g. using the «manifest» XML file: stream/1841/ch/0/29/94/83/manifest.mpd in another DASH client es/dash-if-reference-player/index.html
  106. 106. Visualizing and analysing datasets 4Science has released a free and open source integration with CKAN, the world's leading open-source data management platform Using an extensible viewer framework you can now offer data discovery, exploration, preview, sampling and visualization from your DSpace repository CKAN makes open webservices for tabular data available:
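For tabular data, the web services in question are exposed through CKAN's Action API, whose `datastore_search` action returns rows as JSON. The Python sketch below builds such a request URL and extracts the records from a response body. The CKAN address, the resource id and the sample response are invented for the example, and no network call is made.

```python
import json
from urllib.parse import urlencode


def datastore_search_url(ckan_base, resource_id, limit=5, q=None):
    """Build a CKAN DataStore search URL for a tabular resource."""
    params = {"resource_id": resource_id, "limit": limit}
    if q:
        params["q"] = q  # optional full-text filter
    return f"{ckan_base.rstrip('/')}/api/3/action/datastore_search?{urlencode(params)}"


def records_from_response(body):
    """Extract the list of records from a datastore_search JSON response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return payload["result"]["records"]


# Hypothetical CKAN instance and resource id.
print(datastore_search_url("https://ckan.example.org/", "abc-123", limit=2, q="amphora"))

# Invented response body in the shape returned by the Action API.
sample = '{"success": true, "result": {"records": [{"_id": 1, "type": "amphora", "count": 12}]}}'
print(records_from_response(sample))
```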
  107. 107. Visualizing and analysing datasets We look at DSpace-GLAM not only as a tool for management and preservation, but also for analysis Our integration with CKAN allows the visualization and analysis of repertoires and inventories by means of grids, graphs or maps Datasets can also be related to items and other entities https://dspace- Geolocation of archaeological finds
  108. 108. Visualizing and analysing datasets Pottery distribution
  109. 109. Why do I need DSpace-GLAM? • DSpace-GLAM is a powerful extension of DSpace created by 4Science to meet the needs of Galleries, Libraries, Archives and Museums • to be able to manage, analyze and preserve digital objects • together with historical, archaeological or other cultural datasets, • relating them with other entities such as persons, events, places, concepts, etc. • to describe the context of cultural objects and data, according to different granularity levels, and to different interpretations • using worldwide adopted, cutting-edge, open-source software and open standards
  110. 110. How do I get DSpace-GLAM? • Every institution can install DSpace-GLAM or upgrade its DSpace installation to DSpace-GLAM, extending document management by creating new entities • Your publications will be safely managed as before, with the added advantage of linking them to relevant information such as authors, datasets, events, concepts, networks and much more
  111. 111. When can I move to DSpace-GLAM? • Now: any moment is a good one to enhance your Digital Library, to better support research activities and make your service more relevant • Upgrading from DSpace to DSpace-GLAM or installing a brand-new “extended” DLMS does not take much extra effort, and that effort is largely rewarded by the results you can get • As extra security, if you already have a DSpace repository, DSpace-GLAM does not alter the structure of the current objects managed by DSpace, so you can go back from DSpace-GLAM to DSpace at any time just by dropping (a lot of) extra tables… but we are confident that you will not want to do that
  112. 112. Data Science in a Digital Humanities Framework • Our goal is to provide an environment for integrating the traditional hermeneutic and interpretative work of historical sciences with data visualization and analysis • In this way, we hope, there may be a fundamental change in the way digital cultural heritage is experienced, analyzed and contributed to by the whole scientific community
  113. 113. Thanks for your attention Andrea Bollini <> mobile: +39 333 934 1808 skype: a.bollini orcid: 0000-0002-9029-1854 Claudio Cortese <> mobile: +39 333 9340846 skype: claudio.cortese74 orcid: 0000-0003-4572-9711