Implementing Durham Etheses - Sebastian Palucha (Pecha Kucha)

Pecha Kucha slides on Durham University's experience of implementing their Etheses system, presented by Sebastian Palucha, on Friday 2nd August 2013 at Repository Fringe 2013.

  • In the last decade a custom has become established among the tribes of higher education: to have a repository. At Durham we are very proud of being different, so in following the other tribe members we decided to have three different repositories: Durham Research Online, Durham E-Theses, and Archive and Special Collection. This is the story of how we implemented Durham E-Theses.
  • We started our run with limited peripheral vision. We concentrated on being ready, on a short timescale, for the first deposit at the beginning of the 2009 teaching year. Our goal was a deposit process as simple as possible: a single PDF file. We also had a strong inclination to be compliant with the EThOS project. We chose EPrints because we had some experience of it from Durham Research Online. However, because of how DRO was implemented, it could not be used to handle theses, so we opted for a separate EPrints installation.
    Talk outline: Durham E-Theses background; timeframe of the work; changes made to the EPrints plugin; issues recognised in OAI-PMH metadata exchange; harvesting digitised data from EThOS; download Groovy script developed (code snippets and basic use example; action: should I develop a GUI/GitHub version?); en masse load and metadata correction; changes to OAI-PMH to filter duplication; further digitised material upload; comments about EThOS harvesting procedures when metadata elements are exchanged (e.g. embargo introduced due to take-down procedures).
  • Over time, however, we changed or corrected our initial expectations. Life brought us new ideas, and implementing them was as challenging as running up a steep hill, but certainly very enjoyable once achieved.
  • One of the very first tasks was branding our new repository. This might seem to be just a matter of merging two CSS style sheets, one from EPrints and one from the University CMS. Unfortunately, it required a lot of time spent working around browser issues, most notably in IE. Imagine our excitement when the University recently modified its branding.
  • We also scaled down the EPrints data model to support only the thesis type, and removed unnecessary functionality. The goal was a very simple deposit process. Our first deposit screen, “Details”, collects the thesis metadata. User, year and department are obtained from the user database. We redesigned the embargo functionality to underpin the University's five-year model.
    Metadata: title, abstract, award, full-text status; predefined from LDAP: year, department; not required: keywords. Moving the embargo to the first stage came at a huge coding expense.
  • EPrints comes with a highly customizable LaTeX-based cover page. The initial template is rather uninteresting. We added to our cover pages information on how to cite the thesis, as well as the use policy. On some occasions we had character encoding problems, so we needed to modify the plugin and the LaTeX template to avoid this issue.
  • Like everybody, we love stats. We use Google Analytics to monitor the number of full-text downloads. This was implemented in two stages. The core code was modified to produce a link that triggers GA counting when clicked, and a special profile was set up in GA. The PDF URL contains department information, so it is easy to produce stats for an individual department.
  • It was our aspiration to be a member of the EThOS project. We registered our service very early to be harvested via the OAI-PMH protocol. We also modified the out-of-the-box UKETD-DC plugin to provide additional information, most notably the thesis embargo. Recently we participated in the EThOS workshop to share our knowledge and experience.
  • Through the British Library EThOS project, more than 1000 theses have been digitised on demand by users. We wanted to upload those PDF files back to our service, so we developed a simple command-line client which talks to the EThOS WS API. Subsequently all the files were mass-uploaded to Durham E-Theses.
  • Example requests:
    http://etheses.dur.ac.uk/cgi/oai2?verb=ListRecords&from=2012-07-05T22:16:45Z&metadataPrefix=uketd_dc
    http://etheses.dur.ac.uk/cgi/oai2?verb=ListRecords&from=2012-07-05T22:16:45Z&metadataPrefix=oai_dc (will display Epid:1530)
    Modifications include: adding new fields to the EPrints data model (eprint_fields.pl, workflow, phrases); adding a conditional OAI metadata format filter (metadataPrefix=uketd_dc) in the oai.pl configuration file; adding metadata format filter support in the /cgi/oai2 script.
    To proactively avoid record duplication through EThOS reharvesting, we extended the editorial data model, adding the EThOS persistent ID and a simple control for whether the record should be excluded from the EThOS/UKETD-DC harvesting process. We have modified the OAI-PMH core plugin to filter EThOS records.
  • One of our major technical concerns was that the wrong character encoding could occasionally be introduced into the metadata. This looks rather unpleasant and can spoil the user experience. Recently we adopted a radical solution: we converted the whole internal database to be UTF-8 compliant and updated the EPrints database connector.
  • In our default implementation we assumed that copyright in the theses would be retained by the user. This was an unnecessarily conservative approach which did not encourage our students to explore CC licences. So we were very glad when students approached us to implement a specific CC licence with a particular jurisdiction. We updated the wording and the CC version to support three jurisdictions: England & Wales, world and USA.
  • And then we realized that once a work is licensed under CC, we need to state this clearly not only on the abstract page but also in the automated harvest and on the cover pages.
  • Durham E-Theses is a younger sister of the Durham Research Online service. One of our later goals was to unify the two repositories by providing a single search box for external users. We use Google Custom Search with bespoke search results. Durham E-Theses results are presented under the Theses tab.
  • Last year we started a local project to digitise our stock of 10k paper theses. Some of those theses come with supplementary material, which is also being digitised. Like the BL, we implemented a robust policy for take-down at the author's request.
  • Currently we are in the process of implementing the EU cookie law. It is a rather murky area, with no clear interpretation of which cookies are essential and thus required by the service. With the additional Google services, we feel like we are running at night without a light or a clear course to follow.
  • We also introduced permanent “dark” storage based on the deletion state. We had not anticipated that users would want to deposit encrypted PDFs or multimedia files.
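
The harvest filtering described in the notes above can be sketched roughly as follows. This is not the EPrints Perl code from oai.pl or /cgi/oai2; it is a minimal Python illustration of the idea, and the record field name `exclude_from_ethos`, the helper names, and the second record ID (1531) are assumptions made up for the example. Only the base URL, the `from` timestamp, the two metadata prefixes and record 1530 come from the notes.

```python
from urllib.parse import urlencode

# Base URL of the OAI-PMH endpoint, as given in the notes.
BASE_URL = "http://etheses.dur.ac.uk/cgi/oai2"


def list_records_url(metadata_prefix, from_timestamp):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode(
        {
            "verb": "ListRecords",
            "from": from_timestamp,
            "metadataPrefix": metadata_prefix,
        },
        safe=":",  # keep the colons in the timestamp readable
    )
    return f"{BASE_URL}?{query}"


def visible_records(records, metadata_prefix):
    """Hide records flagged for exclusion, but only for uketd_dc requests.

    `exclude_from_ethos` is a hypothetical name for the editorial control
    mentioned in the notes; other metadata formats see every record.
    """
    if metadata_prefix != "uketd_dc":
        return list(records)
    return [r for r in records if not r.get("exclude_from_ethos")]


if __name__ == "__main__":
    records = [
        {"eprintid": 1530, "exclude_from_ethos": True},
        {"eprintid": 1531, "exclude_from_ethos": False},  # made-up record
    ]
    for prefix in ("uketd_dc", "oai_dc"):
        ids = [r["eprintid"] for r in visible_records(records, prefix)]
        print(list_records_url(prefix, "2012-07-05T22:16:45Z"), ids)
```

Filtering on the metadata format rather than on the harvester's identity keeps the record available to every other OAI-PMH client, which matches the note that the oai_dc request still displays record 1530.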

Presentation Transcript

  • Implementing Durham E-Theses • Presented by Sebastian Palucha • #rfringe13 • Image: CC BY jitze http://www.flickr.com/photos/jitze1942/3521700792
  • Durham E-Theses • Initial project spring/summer 2009 • First deposit September 2009 • ~300 research theses per year • Simple deposit, single PDF • EThOS interoperability • EPrints 3.1.3 (born 2009) • Image: CC BY didbygraham http://www.flickr.com/photos/didbygraham/5646920685/
  • Key milestones • Registered: EThOS, Driver, OCL Digital Gateway (2010 spr.) • EThOS harvest in operation (2010 sum.) • Google Analytics stats (2010 dec.) • EThOS digitised theses loaded (2011 sum.) • Google Custom Search (aut. 2011) • Collaboration with The BL to improve EThOS services (aut. 2011 – spr. 2012) • Local digitisation project, 10k (2012 spr. – ) • Creative Commons Licences introduced (2012 aut.) • MySQL migrated to UTF-8 (2013 spring) • EU/ICO Cookie Law support (2013 sum.) • Image: CC BY AlishaV http://www.flickr.com/photos/alishav/3156574283
  • Branding: uniform user experience • Issues: browsers, branding changes • Durham University CMS CSS • EPrints 3 CSS
  • Simple single-PDF deposit • Details > Upload > Deposit • LDAP integration + user field population • Embargo implemented in first screen • Image: CC BY Pink Sherbet Photography http://www.flickr.com/photos/pinksherbet/236299644
  • Cover pages • Highly customized LaTeX code • Issues with UTF-8 in both LaTeX and plugin • Issues with dynamic if/else
  • Google Analytics: full-text downloads • Two steps: 1. PDF download link (core code) 2. special GA profile • URL structure includes department codes ?DDD32 • Internal code modification
  • EThOS interoperability through OAI-PMH harvest • Issues with the out-of-the-box plug-in; changes to the XML schema needed • uketdterms:qualificationlevel not defined in the EPrints data model • Embargo date not included. The plugin assumes an embargo at the record level, whereas EPrints sets it at the document level! • Added department names • Occasional issues with UTF-8 encoding
  • EThOS download WS • Script for mass download: https://github.com/paluchas/ethos-bl • groovy EthosDownloadClient.groovy -i 238830 -m download
  • EThOS: avoiding duplication • We store EThOS persistent IDs • We modified the /cgi/oai2 script to conditionally exclude EThOS records • A modified record can be exposed to the EThOS harvest in future
  • UTF-8 issues • Unknown copy/paste issues seen in: OAI-PMH, cover pages (LaTeX), abstract pages • Solution: code modification; whole MySQL database migration to UTF-8 (fortunately, double encoding) • Image: CC BY familymwr http://www.flickr.com/photos/familymwr/5548057120//
  • Creative Commons Licences • Approached by a student: specific query about a particular CC licence to be used • A lot of redefinition in code
  • CC outreach
  • Better search, DRO integration • Google Custom Search with modified search results
  • Retrospective digitisation project • 10k paper theses being digitised by a local company • Mass upload with metadata in an XML file and digitised material in PDF files (web and archive versions). A lot of metadata and quality issues • Interesting samples of other materials: big prints, DVDs, CDs, cassette tapes, microfilms, small datasets and research software.
  • EU/ICO Cookies Law • Image: CC BY USAG-Humphreys http://www.flickr.com/photos/31687107@N07/6206906748
  • Repository versus real life • Users would like to deposit files other than PDFs • “Dark” storage requested • Encrypted PDFs • Take-down requests and web-cached content: how far should we liaise with the external world? • Some students are not aware of the consequences of web deposits: third-party copyright, sensitive data not embargoed, etc. • Disciplinary differences, not only humanities vs. sciences • External users requesting contact with authors or supervisors
  • Sustainability • Operational: virtualization, operating system support, database • Customization: bespoke changes and technology deficit • Support: hard to coordinate across the University departments • Image: CC BY Rennett Stowe http://www.flickr.com/photos/tomsaint/4515448425
  • Future plans • Review process: be paper free, include pass list, extend workflow to exam board • Actively encourage students to use CC licences by demonstrating their benefits • Encourage deposit of key data sets and explore data visualization • Migrate to a new repository framework • Integration with Durham University RIS • Google Analytics live stats, integration with IRUS-UK • Image: CC BY Boston Public Library http://www.flickr.com/photos/boston_public_library/8902381985/
  • Repository of the future • Image: CC BY Keoni Cabral http://www.flickr.com/photos/52193570@N04/7069578953
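
The "UTF-8 issues" slide notes that the damaged text was, fortunately, double encoded, which is the one kind of encoding damage that can usually be undone mechanically. The sketch below shows that repair in Python as an illustration only; the function name is made up, and it assumes the intermediate mis-decoding was ISO-8859-1 (Latin-1). A real MySQL migration may involve a different intermediate character set and would need the matching codec.

```python
def repair_double_encoding(text):
    """Undo one round of UTF-8 bytes having been re-read as Latin-1.

    Assumption: the damage was UTF-8 -> (misread as Latin-1) -> UTF-8.
    Text that does not fit that pattern is returned unchanged.
    """
    try:
        # Recover the original UTF-8 bytes, then decode them properly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double encoded (or not recoverable this way): leave as is.
        return text


if __name__ == "__main__":
    damaged = "caf\u00c3\u00a9"  # "café" after double encoding
    print(repair_double_encoding(damaged))
```

Returning the input unchanged on failure means already-correct values pass through safely, so the same function can be run over a whole column of mixed clean and damaged metadata.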