DSpace: Technical Basics


Iryna Kuchma
Open Access Programme Manager
Open Access and the Evolving Scholarly Communication
Environment workshop, July 11, 2012, Makerere University

www.eifl.net Attribution 3.0 Unported

Application Architecture

The DSpace system is organised into three tiers
which consist of a number of components

Each layer only invokes the layer below it, i.e. the
application layer may not use the storage layer
directly
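To make the layering rule concrete, here is a minimal Python sketch. The three class names and their methods are hypothetical and do not correspond to DSpace's real Java classes. The point is only that each object holds a reference to the layer directly beneath it, so the application tier cannot reach storage except through the business logic tier.

```python
class StorageLayer:
    """Hypothetical storage tier: keeps item metadata in memory."""

    def __init__(self):
        self._metadata = {}  # item id -> metadata dict

    def save(self, item_id, metadata):
        self._metadata[item_id] = dict(metadata)

    def load(self, item_id):
        return dict(self._metadata[item_id])


class BusinessLogicLayer:
    """Hypothetical business logic tier: applies rules, then calls storage."""

    def __init__(self, storage):
        self._storage = storage  # the only layer this tier talks to

    def deposit(self, item_id, metadata):
        if not metadata.get("dc.title"):
            raise ValueError("an item needs a title")
        self._storage.save(item_id, metadata)

    def fetch(self, item_id):
        return self._storage.load(item_id)


class ApplicationLayer:
    """Hypothetical application tier (e.g. a web UI); it never sees storage."""

    def __init__(self, logic):
        self._logic = logic  # no reference to StorageLayer is kept here

    def submit(self, item_id, title):
        self._logic.deposit(item_id, {"dc.title": title})

    def render(self, item_id):
        return f"<h1>{self._logic.fetch(item_id)['dc.title']}</h1>"


app = ApplicationLayer(BusinessLogicLayer(StorageLayer()))
app.submit("1", "Sample thesis")
print(app.render("1"))  # <h1>Sample thesis</h1>
```

Because the validation lives in the middle tier, every front end built on top of it gets the same rules for free.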

The Storage Layer

The storage layer is responsible for physical
storage of metadata and content

DSpace uses a relational database to store all
information about the organization of content,
metadata about the content, information about e-
people and authorization, and the state of
currently-running workflows.

The Business Logic Layer

The business logic layer deals with managing
the content of the archive, users of the archive
(e-people), authorization, and workflow

The Application Layer

The application layer contains components
that communicate with the world outside of the
individual DSpace installation, for example the
Web user interface and the Open Archives
Initiative Protocol for Metadata Harvesting
(OAI-PMH) service

The DSpace Web UI is the largest and
most-used component in the application layer.
It comes in two versions:
1. JSPUI: Built on Java Servlet and JavaServer Pages
technology
2. XMLUI (Manakin): Built on XML and Cocoon technology

Server Architecture

Diagram: User Interface and Web Application Server

These systems may reside on a single server or
be hosted separately on dedicated servers

Structural Overview

DSpace is split into three directory trees:
– Source Directory [dspace-src]
• Surprisingly, this is where the source code resides
– Install Directory [dspace]
• Populated during install and during normal operation
• Contains configuration files, command line tools,
libraries and the DSpace archive (depending on configuration)
– Web Deployment Directory [tomcat]/webapps/dspace
• Contains the JSPs and Java classes and libraries
necessary to run DSpace

Persistent Identifiers

The use of location-based identifiers such as the
Uniform Resource Locator (URL) often leads to
problems accessing resources over time
When following a hyperlink, users often receive a
“404 - page not found” error
Persistent identifiers are an attempt at solving the
issues surrounding resource identification and
long-term preservation
A persistent identifier allows the resource to be
uniquely identified in a way that will not change if
the resource is renamed or relocated


This means that a resource can be reliably
referenced for future access by humans and
software

Caveat: persistence is heavily dependent on
organisational policy, i.e. an identifier only stays
persistent if an organisation maintains and
manages it
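The idea can be illustrated with a toy resolver in Python: references store an unchanging identifier, and a lookup table maps it to wherever the resource currently lives. The class, the identifier example/1 and the example.org URLs are hypothetical. The relocate step is exactly the maintenance work the caveat above refers to: if nobody performs it, the identifier still points at the old location.

```python
class Resolver:
    """Toy resolver mapping persistent identifiers to current URLs."""

    def __init__(self):
        self._locations = {}  # identifier -> current URL

    def register(self, identifier, url):
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already registered")
        self._locations[identifier] = url

    def relocate(self, identifier, new_url):
        # The identifier never changes; only the location it points to does.
        # If nobody performs this update after a move, the link breaks anyway.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        return self._locations[identifier]


resolver = Resolver()
resolver.register("example/1", "http://old.example.org/papers/1.pdf")
resolver.relocate("example/1", "http://new.example.org/items/1")
print(resolver.resolve("example/1"))  # http://new.example.org/items/1
```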

Several systems are in use for persistent identifiers:
– Persistent Uniform Resource Locators (PURLs)
– Digital Object Identifiers (DOIs)
– Handle (used by DSpace)

The Handle

In the Handle System, a resource is identified by a
unique handle assigned by a common registration service

http://hdl.handle.net/2160/568

– Registration service: http://hdl.handle.net
– Handle prefix: 2160
– Local identifier: 568
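The same breakdown can be done programmatically. The Python sketch below uses a hypothetical helper, split_handle_url (not part of DSpace), that splits the path of a handle URL at its first slash, which is where the prefix ends; everything after that slash is treated as the local identifier. The service part here is http://hdl.handle.net, the Handle System's web resolver.

```python
from urllib.parse import urlparse


def split_handle_url(url):
    """Split a handle URL into (service, prefix, local identifier)."""
    parsed = urlparse(url)
    service = f"{parsed.scheme}://{parsed.netloc}"
    handle = parsed.path.lstrip("/")
    # The prefix ends at the first "/"; everything after it is the local part.
    prefix, sep, local_id = handle.partition("/")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefix/local-id handle URL: {url!r}")
    return service, prefix, local_id


print(split_handle_url("http://hdl.handle.net/2160/568"))
# ('http://hdl.handle.net', '2160', '568')
```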

Practical: Using a Handle

– Navigate to Aberystwyth’s DSpace repository, Cadair
– Select an item from a collection and note the handle
address
– Open this address in a new browser window
– The handle will resolve and redirect back to your original
item

Configuring the Handles service

Out of the box, a DSpace installation uses the
placeholder handle prefix:
hdl:123456789
These aren't really Handles, since the global
Handle system doesn't actually know about them

Handle configuration takes three steps


To use handles in DSpace, you must register for
a prefix with the Corporation for National
Research Initiatives (CNRI)

To register with CNRI:
– Complete the registration form on the CNRI website
– Create and upload the sitebndl.zip to CNRI
– Pay a small annual fee

http://www.handle.net/service_agreement.html

Generating the sitebndl.zip

The Site Bundle is an archive which contains
information about your DSpace installation and is
used by CNRI to register your handle prefix
To generate the sitebndl.zip, run the command:
[dspace]/bin/dsrun net.handle.server.SimpleSetup [dspace]/handle-server
You will be asked to answer a series of
questions
Once finished, the sitebndl.zip can be found at:
[dspace]/handle-server/sitebndl.zip
Complete the registration and upload the
sitebndl.zip

Configuring the Handle Server

Once registration is complete, CNRI should assign
you a handle prefix
Edit [dspace]/handle-server/config.dct to
include the following lines in the “server_config” clause:
"storage_type" = "CUSTOM"
"storage_class" = "org.dspace.handle.HandlePlugin"

Update all references to YOUR_NAMING_AUTHORITY to
your assigned handle prefix, for example:
300:0.NA/YOUR_NAMING_AUTHORITY -> 300:0.NA/2097
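These edits can also be scripted. The helper names and the .bak backup convention below are hypothetical. The sketch does a plain text substitution and a regular-expression check rather than parsing config.dct's nested structure, so it is worth reviewing the file by hand afterwards.

```python
import re
from pathlib import Path

PLACEHOLDER = "YOUR_NAMING_AUTHORITY"
# Matches the storage_class line from the slide, tolerating extra spaces.
STORAGE_CLASS = re.compile(r'"storage_class"\s*=\s*"org\.dspace\.handle\.HandlePlugin"')


def set_naming_authority(config_text, prefix):
    """Replace every YOUR_NAMING_AUTHORITY placeholder with the assigned prefix."""
    if not prefix or "/" in prefix:
        raise ValueError(f"invalid handle prefix: {prefix!r}")
    return config_text.replace(PLACEHOLDER, prefix)


def uses_dspace_storage(config_text):
    """True if the HandlePlugin storage_class line is present."""
    return STORAGE_CLASS.search(config_text) is not None


def update_config_file(path, prefix):
    """Rewrite config.dct in place, keeping a .bak copy of the original."""
    path = Path(path)
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(set_naming_authority(original, prefix))


print(set_naming_authority("300:0.NA/YOUR_NAMING_AUTHORITY", "2097"))
# 300:0.NA/2097
```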

Updating the Handle Prefix

Edit [dspace]/config/dspace.cfg and update the
handle prefix (the handle.prefix property)

A restart of Tomcat will be required
If items have already been deposited into DSpace,
their handles will need updating:
[dspace]/bin/update-handle-prefix 123456789 YourHandle
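Conceptually, the renaming rule is simple: every handle under the old placeholder prefix keeps its local identifier and gets the new prefix. The Python function below is a hypothetical model of that rule on plain strings; the real update-handle-prefix script works against the DSpace database and is what you should actually run.

```python
def update_handle_prefix(handles, old_prefix, new_prefix):
    """Return the handles with old_prefix swapped for new_prefix.

    Handles under any other prefix are returned unchanged.
    """
    updated = []
    for handle in handles:
        # Compare the whole prefix, so "1234567890/5" is not mistaken
        # for a handle under "123456789".
        prefix, sep, local_id = handle.partition("/")
        if sep and prefix == old_prefix:
            updated.append(f"{new_prefix}/{local_id}")
        else:
            updated.append(handle)
    return updated


print(update_handle_prefix(["123456789/1", "123456789/42", "2160/568"], "123456789", "2097"))
# ['2097/1', '2097/42', '2160/568']
```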

Starting the Handle Server

Finally, start the handle server:
[dspace]/bin/start-handle-server

A startup script will be needed to start the
handle server automatically when the server boots

Once configured, handles should resolve as
demonstrated in the practical earlier in this module

Workflow scenarios
Scenario 1: Head of research

I want to be able to see everything
my researchers deposit for quality
control purposes

Scenario 2: Repository manager

I want to approve everything that
goes into the repository to make
sure there are no copyright issues or
bad metadata

Scenario 3: Cataloguer

I want to be able to see everything
my researchers deposit for quality
control purposes

The three workflows
DSpace has three workflow steps:
1. Accept/Reject Step
2. Accept/Reject/Edit Metadata Step
3. Edit Metadata Step

You can use any combination of the three;
steps are worked through in order
Which might be used in each of the
previous scenarios?
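One way to reason about the question above is to model the steps as data. The Python sketch below is a hypothetical model based on a reading of the step names: steps 1 and 2 can reject an item, step 3 can only edit metadata. Enabled steps always run in the order 1, 2, 3. The configuration shown for Scenario 2 is one possible answer, not the only one.

```python
# Capabilities of the three DSpace workflow steps, as read from their names:
# steps 1 and 2 may reject an item, steps 2 and 3 may edit its metadata.
STEPS = {
    1: {"name": "Accept/Reject", "can_reject": True, "can_edit": False},
    2: {"name": "Accept/Reject/Edit Metadata", "can_reject": True, "can_edit": True},
    3: {"name": "Edit Metadata", "can_reject": False, "can_edit": True},
}


def run_workflow(enabled_steps, reject_at=None):
    """Pass a submission through the enabled steps, always in order 1, 2, 3.

    reject_at is the step at which a reviewer rejects the item (or None).
    Returns (names of the steps visited, final outcome).
    """
    visited = []
    for step in sorted(enabled_steps):
        visited.append(STEPS[step]["name"])
        if step == reject_at:
            if not STEPS[step]["can_reject"]:
                raise ValueError(f"step {step} cannot reject items")
            return visited, "rejected"
    return visited, "archived"


# One possible answer for Scenario 2: a single Accept/Reject/Edit Metadata step.
print(run_workflow({2}))
# (['Accept/Reject/Edit Metadata'], 'archived')
print(run_workflow({1, 3}, reject_at=1))
# (['Accept/Reject'], 'rejected')
```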

RSS feeds
RSS feeds are available at three levels:
– Site level (all new items)
– Community level (new items in all contained
collections)
– Collection level (new items in that collection)
Can be read in modern web browsers
Can be subscribed to in news reader
software
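Because the feeds are ordinary XML, they are also easy to consume from a script. The sketch below assumes an RSS 2.0 feed, whose item elements carry no namespace; RSS 1.0 and Atom feeds use namespaced elements and would need different lookups. The feed_items helper and the sample feed are hypothetical, and no particular DSpace feed URL is assumed.

```python
import xml.etree.ElementTree as ET


def feed_items(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


# A hand-written sample feed; a real feed would be fetched from the repository.
SAMPLE = """<rss version="2.0"><channel>
  <title>Example collection</title>
  <item><title>First new item</title><link>http://hdl.handle.net/2160/568</link></item>
</channel></rss>"""

for title, link in feed_items(SAMPLE):
    print(title, link)
# First new item http://hdl.handle.net/2160/568
```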

Alerts
Email alerts:
– Set up by users for a collection
– Sent each day, listing new items
– Require a script to run daily:
• [dspace]/bin/sub-daily
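The logic behind a daily alert run can be modelled in a few lines. This is a hypothetical sketch of the idea, not how sub-daily is implemented: it assumes subscriptions are kept as collection names per email address and that each item records the date it was archived. Actually sending the emails is left out.

```python
from datetime import date, timedelta


def daily_digest(subscriptions, items, today):
    """Group the previous day's new items by subscriber.

    subscriptions: dict mapping an email address to a set of collection names
    items: list of (collection name, item title, date archived) tuples
    """
    yesterday = today - timedelta(days=1)
    new_items = [(coll, title) for coll, title, archived in items if archived == yesterday]
    digest = {}
    for email, collections in subscriptions.items():
        titles = [title for coll, title in new_items if coll in collections]
        if titles:  # this sketch skips subscribers with nothing new
            digest[email] = titles
    return digest


subs = {"a@example.org": {"Theses"}, "b@example.org": {"Articles"}}
items = [
    ("Theses", "Thesis A", date(2012, 7, 10)),
    ("Articles", "Old article", date(2012, 7, 1)),
]
print(daily_digest(subs, items, today=date(2012, 7, 11)))
# {'a@example.org': ['Thesis A']}
```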

DSpace statistics
DSpace statistics are:
– Collated from DSpace log files
– Reports generated daily (daily and monthly
reports)
– http://dspace.example.com/dspace/statistics
• Or via the Administer menu
– Can be private (must be logged in) or public
• In dspace.cfg:
– report.public = [true|false]
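A script can check which mode an installation is in by reading the setting from dspace.cfg. The sketch below is hypothetical: it handles only simple key = value lines (no line continuations or variable substitution), and it treats a missing report.public as private, which may not match DSpace's own default.

```python
def read_cfg(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def reports_are_public(cfg_text):
    """True only if report.public is set to true (case-insensitive)."""
    # This sketch treats a missing setting as private.
    return read_cfg(cfg_text).get("report.public", "false").lower() == "true"


print(reports_are_public("# statistics\nreport.public = true"))  # True
```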

Statistics collected
The following statistics are collected:
– General overview (e.g. number of items
archived / number of item views / user logins)
– Archive Information (numbers of each type of
item)
– Item view counts
– Actions performed
– Search terms used

Google Analytics
Google Analytics allows a richer and more
detailed suite of statistics, for example:
• Time visitors spent on the site
• Where they came from
• Terms they used in search engines to find items
• The geographic location of visitors
• How many pages they looked at
• Which pages they started and ended their visit on
– JSPUI requires a small code change; Manakin (XMLUI)
has a configurable option

Credits
These slides have been produced re-using
The DSpace Course by:
– Stuart Lewis & Chris Yates
– Repository Support Project
http://www.rsp.ac.uk/

– Part of the RepositoryNet
– Funded by JISC
http://www.jisc.ac.uk/
