DSpace: Technical Basics - Identifiers; User management and authentication options; Item Submission Workflows; Import and Export; RSS Feeds, Alerts and News; DSpace Statistics and Google Analytics; SWORD Basics.
DSpace: Technical Basics
Iryna Kuchma, Open Access Programme Manager
Open Access and the Evolving Scholarly Communication Environment workshop, July 11, 2012, Makerere University
www.eifl.net
Attribution 3.0 Unported
Application Architecture
– The DSpace system is organised into three tiers, each consisting of a number of components
– Each layer only invokes the layer below it, i.e. the application layer may not use the storage layer directly
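The layering rule can be illustrated with a minimal Python sketch. This is not DSpace code (DSpace itself is written in Java); the class and method names below are hypothetical and exist only to show that each tier holds a reference to the tier directly beneath it and nothing else.

```python
class StorageLayer:
    """Hypothetical stand-in for the database and content store."""

    def __init__(self):
        self._items = {}

    def save(self, item_id, metadata):
        self._items[item_id] = metadata

    def load(self, item_id):
        return self._items[item_id]


class BusinessLogicLayer:
    """Hypothetical stand-in for content, e-people, authorization, workflow."""

    def __init__(self, storage):
        self._storage = storage  # the only layer this tier talks to

    def archive_item(self, user, item_id, metadata):
        # Illustrative authorization rule, not DSpace's real policy model.
        if user != "admin":
            raise PermissionError("only admin may archive in this sketch")
        self._storage.save(item_id, metadata)

    def get_item(self, item_id):
        return self._storage.load(item_id)


class ApplicationLayer:
    """Hypothetical stand-in for the Web UI or the OAI-PMH service."""

    def __init__(self, logic):
        self._logic = logic  # no reference to StorageLayer is held here

    def show_item(self, item_id):
        return "Title: " + self._logic.get_item(item_id)["title"]


storage = StorageLayer()
logic = BusinessLogicLayer(storage)
app = ApplicationLayer(logic)
logic.archive_item("admin", "2160/568", {"title": "Example item"})
print(app.show_item("2160/568"))
```

Because ApplicationLayer is only ever given a BusinessLogicLayer, it has no way to bypass authorization and write to storage directly, which is the point of the rule on the slide.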
The Storage Layer
– The storage layer is responsible for the physical storage of metadata and content
– DSpace uses a relational database to store all information about the organization of content, metadata about the content, information about e-people and authorization, and the state of currently-running workflows
The Business Logic Layer
– The business logic layer deals with managing the content of the archive, users of the archive (e-people), authorization, and workflow
The Application Layer
– The application layer contains components that communicate with the world outside of the individual DSpace installation, for example the Web user interface and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) service
– The DSpace Web UI is the largest and most-used component in the application layer. Two versions:
1. JSPUI: Built on Java Servlet and JavaServer Pages technology
2. XMLUI (Manakin): Built on XML and Cocoon technology
Server Architecture
[Diagram labels: User Interface, Web Application Server]
– These systems may reside on a single server or be hosted separately on dedicated servers
Structural Overview
DSpace is split into three directory trees:
– Source Directory [dspace-src]
  • Surprisingly, this is where the source code resides
– Install Directory [dspace]
  • Populated during install and during normal operation
  • Contains: configuration files, command line tools, libraries, DSpace archive (depending on configuration)
– Web Deployment Directory [tomcat]/webapps/dspace
  • Contains the JSPs and Java classes and libraries necessary to run DSpace
Persistent Identifiers
– The use of location-based identifiers such as the Uniform Resource Locator (URL) often leads to problems accessing resources over time
– Often when accessing a resource via a hyperlink, users receive a “404 - page not found” error
– Persistent identifiers are an attempt at solving the issues surrounding resource identification and long-term preservation
– A persistent identifier allows the resource to be uniquely identified in a way that will not change if the resource is renamed or relocated
Persistent Identifiers
– This means that a resource can be reliably referenced for future access by humans and software
– Caveat: Persistence is heavily dependent on organisation policy, i.e. persistence of an object is only effective if an organisation maintains and manages this persistence
– Different systems in use for persistent identifiers:
  • Persistent Uniform Resource Locators (PURLs)
  • Digital Object Identifiers (DOI)
  • Handle (used by DSpace)
The Handle
– In a handle system, a resource address is identified by a unique handle assigned by a common registration service
– Example: http://hdl.handle.net/2160/568

Registration Service     Handle Prefix   Local Identifier
http://hdl.handle.net    2160            568
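The three parts of the example handle can be pulled apart programmatically. The following is a small Python sketch using only the standard library; the function name split_handle is an illustrative choice, not part of DSpace or the Handle software.

```python
from urllib.parse import urlparse


def split_handle(handle_url):
    """Split a handle URL into (registration service, prefix, local identifier)."""
    parsed = urlparse(handle_url)
    service = f"{parsed.scheme}://{parsed.netloc}"
    # The path looks like "/2160/568": prefix, then the local identifier.
    prefix, _, local_id = parsed.path.lstrip("/").partition("/")
    if not prefix or not local_id:
        raise ValueError(f"not a handle URL: {handle_url!r}")
    return service, prefix, local_id


print(split_handle("http://hdl.handle.net/2160/568"))
# ('http://hdl.handle.net', '2160', '568')
```

Only the first slash in the path separates the prefix from the local identifier, so the sketch splits once and keeps the remainder intact.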
Practical: Using a Handle
– Navigate to Aberystwyth’s DSpace repository, Cadair
– Select an item from a collection and note the handle address
– Open this address in a new browser window
– The handle will resolve and redirect back to your original item
Configuring the Handles service
– Out of the box, a DSpace installation will use the handle: hdl:123456789
– These aren’t really Handles, since the global Handle system doesn’t actually know about them
– 3 steps to handle configuration
Configuring the Handles service
– In order to use handles in DSpace, registration for a prefix with the Corporation for National Research Initiatives (CNRI) is required
– How to register with CNRI?
  • Complete the registration form on the CNRI website
  • Create and upload the sitebndl.zip to CNRI
  • Pay a small annual fee
– http://www.handle.net/service_agreement.html
Generating the sitebndl.zip
– The Site Bundle is an archive which contains information about your DSpace installation and is used to generate your handle
– To generate the sitebndl.zip run the command:
  [dspace]/bin/dsrun net.handle.server.SimpleSetup [dspace]/handle-server
– You will be required to answer a series of questions
– Once completed the sitebndl.zip can be found at:
  [dspace]/handle-server/sitebndl.zip
– Complete the registration and upload the sitebndl.zip
Configuring the Handle Server
– Once registration is complete, a handle prefix should be returned from CNRI
– Edit [dspace]/handle-server/config.dct to include the following lines in the “server_config” clause:
  "storage_type" = "CUSTOM"
  "storage_class" = "org.dspace.handle.HandlePlugin"
– Update all references to YOUR_NAMING_AUTHORITY to your assigned handle prefix:
  300:0.NA/YOUR_NAMING_AUTHORITY -> 300:0.NA/2097
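Replacing every YOUR_NAMING_AUTHORITY reference by hand is easy to get wrong, so the substitution can be scripted. The Python sketch below is one possible approach, not a tool shipped with DSpace; the function names are hypothetical, and the prefix 2097 is just the example value from the slide. Back up config.dct before running anything like this against a real installation.

```python
from pathlib import Path

PLACEHOLDER = "YOUR_NAMING_AUTHORITY"


def apply_prefix(config_text, prefix):
    """Return (updated text, number of placeholders replaced)."""
    count = config_text.count(PLACEHOLDER)
    return config_text.replace(PLACEHOLDER, prefix), count


def update_config_file(path, prefix):
    """Rewrite the file at `path` in place and report how many edits were made."""
    path = Path(path)
    new_text, count = apply_prefix(path.read_text(), prefix)
    path.write_text(new_text)
    return count


sample = '"300:0.NA/YOUR_NAMING_AUTHORITY"'
print(apply_prefix(sample, "2097"))
# ('"300:0.NA/2097"', 1)
```

Returning the replacement count gives a quick sanity check: a count of zero suggests the file was already edited or is not the expected config.dct.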
Updating the Handle Prefix
– Edit [dspace]/config/dspace.cfg and update the handle prefix
– A restart of Tomcat will be required
– If items have already been deposited into DSpace, their handles will need updating:
  [dspace]/bin/update-handle-prefix 123456789 YourHandle
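The dspace.cfg edit amounts to changing one key = value line. The Python sketch below shows that edit on an in-memory sample; it assumes the property is named handle.prefix (check the name in your own dspace.cfg), and set_property is a hypothetical helper rather than a DSpace utility.

```python
import re


def set_property(cfg_text, key, value):
    """Replace the value of an uncommented `key = value` line in cfg_text."""
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    # A function replacement avoids any backslash handling in `value`.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, cfg_text)
    if count == 0:
        raise KeyError(f"{key} not found")
    return new_text


# Made-up excerpt; a real dspace.cfg contains many more settings.
sample_cfg = "dspace.name = Demo Repository\nhandle.prefix = 123456789\n"
print(set_property(sample_cfg, "handle.prefix", "2097"))
```

Lines beginning with # are left untouched, so a commented-out default is not mistaken for the active setting.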
Starting the Handle Server
– Finally, start the handle server:
  [dspace]/bin/start-handle-server
– A script will be required to automate the starting of the handle server when the server boots
– Once configured, the handles should resolve as demonstrated in the practical earlier in this module
Workflow scenarios
– Scenario 1: Head of research
  • I want to be able to see everything my researchers deposit for quality control purposes
– Scenario 2: Repository manager
  • I want to approve everything that goes into the repository to make sure there are no copyright issues or bad metadata
– Scenario 3: Cataloguer
  • I want to be able to see everything my researchers deposit for quality control purposes
The three workflows
DSpace has three workflow steps:
1. Accept/Reject Step
2. Accept/Reject/Edit Metadata Step
3. Edit Metadata Step
– You can use any combination of the three
– Steps are worked through in order
– Which might be used in each of the previous scenarios?
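The ordering rule can be sketched as a small Python model: whichever steps are enabled for a collection, they are always visited in the fixed order 1, 2, 3. This is an illustration only, not DSpace's workflow implementation; the Step class, run_workflow function and the shape of the decisions dictionary are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    can_reject: bool
    can_edit: bool


# Fixed order in which enabled steps are worked through.
STEPS = [
    Step("Accept/Reject", can_reject=True, can_edit=False),
    Step("Accept/Reject/Edit Metadata", can_reject=True, can_edit=True),
    Step("Edit Metadata", can_reject=False, can_edit=True),
]


def run_workflow(metadata, enabled, decisions):
    """Pass a submission through the enabled steps.

    decisions maps a step name to {"reject": bool, "edits": dict};
    both keys are optional. Returns (status, final metadata).
    """
    metadata = dict(metadata)
    for step in STEPS:
        if step.name not in enabled:
            continue
        decision = decisions.get(step.name, {})
        if step.can_reject and decision.get("reject", False):
            return "rejected at " + step.name, metadata
        if step.can_edit:
            metadata.update(decision.get("edits", {}))
    return "archived", metadata


status, final = run_workflow(
    {"title": "draft titel"},
    enabled={"Accept/Reject", "Edit Metadata"},
    decisions={"Edit Metadata": {"edits": {"title": "Draft title"}}},
)
print(status, final)
```

In this model a reviewer at step 1 cannot change metadata and a reviewer at step 3 cannot reject, which mirrors the step names on the slide.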
RSS feeds
RSS feeds:
– Site level (all new items)
– Community level (new items in all contained collections)
– Collection level (new items in that collection)
Can be read in modern web browsers
Can be subscribed to in news reader software
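A feed can also be read by a short script rather than a browser or news reader. The Python sketch below parses an RSS 2.0 document with the standard library; the sample feed is made up for illustration (titles and handles are not real items), and in practice the XML would be fetched from the repository's feed URL.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real one would come from the repository.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example repository</title>
    <item>
      <title>First example item</title>
      <link>http://hdl.handle.net/123456789/1</link>
    </item>
    <item>
      <title>Second example item</title>
      <link>http://hdl.handle.net/123456789/2</link>
    </item>
  </channel>
</rss>"""


def list_new_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in list_new_items(SAMPLE_FEED):
    print(title, "->", link)
```

The same function works for site, community and collection feeds, since they differ only in which items they contain.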
Alerts
Alerts:
– Created by users
– Created for a collection
– Emails sent each day for new items
– Script must run daily:
  • [dspace]/bin/sub-daily
DSpace statistics
DSpace statistics:
– Collated from DSpace log files
– Reports generated daily (daily and monthly reports)
– http://dspace.example.com/dspace/statistics
  • Or via the Administer menu
– Can be private (must be logged in) or public
  • In dspace.cfg:
    – report.public = [true|false]
Statistics collected
The following statistics are collected:
– General overview (e.g. number of items archived / number of item views / user logins)
– Archive Information (numbers of each type of item)
– Item view counts
– Actions performed
– Search terms used
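The idea of collating statistics from log files can be sketched in a few lines of Python. The log format below is invented for illustration and is much simpler than real DSpace log lines; the user names, action names and the summarise function are all assumptions. The sketch only shows how item views, actions and search terms can be tallied from log entries.

```python
from collections import Counter

# Invented, simplified log lines: "date time level user:action:key=value".
SAMPLE_LOG = [
    "2012-07-11 10:15:02 INFO anonymous:view_item:handle=2160/568",
    "2012-07-11 10:16:40 INFO anonymous:search:query=open access",
    "2012-07-11 10:17:05 INFO jsmith:login:type=explicit",
    "2012-07-11 10:18:30 INFO jsmith:view_item:handle=2160/568",
]


def summarise(lines):
    """Count actions, item views per handle, and search terms."""
    actions, item_views, search_terms = Counter(), Counter(), Counter()
    for line in lines:
        parts = line.split(" ", 3)  # date, time, level, remainder
        if len(parts) < 4:
            continue  # skip lines that do not fit the assumed format
        fields = parts[3].split(":", 2)  # user, action, detail
        if len(fields) < 3:
            continue
        _user, action, detail = fields
        actions[action] += 1
        key, _, value = detail.partition("=")
        if action == "view_item" and key == "handle":
            item_views[value] += 1
        elif action == "search" and key == "query":
            search_terms[value] += 1
    return actions, item_views, search_terms


actions, item_views, search_terms = summarise(SAMPLE_LOG)
print(actions["view_item"], item_views["2160/568"], search_terms["open access"])
# 2 2 1
```

The three counters correspond to the "Actions performed", "Item view counts" and "Search terms used" categories listed above.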
Google Analytics
Google Analytics allows a richer and more detailed suite of statistics:
  • Time visitors spent on the site
  • Where they came from
  • Terms they used in search engines to find items
  • The geographic location of visitors
  • How many pages they looked at
  • Which pages they started and ended their visit on
– JSPUI requires a small code change; Manakin has a configurable option
Credits
These slides have been produced re-using The DSpace Course by:
– Stuart Lewis & Chris Yates
– Repository Support Project http://www.rsp.ac.uk/
– Part of the RepositoryNet
– Funded by JISC http://www.jisc.ac.uk/