PhD Thesis Digitisation Project

Presentation by Gavin Willshaw at University of Edinburgh Open Knowledge Network event, 28th April 2017

  1. PhD Thesis Digitisation Project
     Gavin Willshaw, Digital Curator, Library & University Collections (@gwillshaw)
  2. Project background
     • 27,000 PhD theses dating from the early 1600s to the present day
     • 10,000 already digitised or in digital format
     • 2005: submission of a digital thesis became a requirement
     • Several small-scale digitisation projects
  3. The collection
     • Largely standardised
     • Yet lots of diversity:
       • Latin / handwritten
       • Awkward fold-outs
       • Varying size
       • Some theses damaged / dirty
       • Biological specimens…
  4. Project aims
     • Provide global, unhindered access to unique Edinburgh research
     • Obtain equipment, software and expertise for future mass digitisation projects
     • Digitise 17,000 PhD theses and have them online by the end of 2018
     • Create basic MARC records for 4,000 uncatalogued theses
     • Undertake conservation work on 2,000 damaged theses
  5. Destructive scanning
     • 10,000 theses scanned destructively in-house
     • Boards and spines removed
     • Pages fed through a Kodak i4250 document scanner
  6. Non-destructive scanning
     • 3,000 unique theses scanned non-destructively in-house
     • Two i2s Copibook Cobalt scanners
     • Angled support allows scanning of items with tight bindings
     • 4,000 unique theses outsourced
  7. LIMB Server batch processing software
     • Deskew, sharpen, remove signatures / addresses, OCR, QA
     • Output: keyword-searchable 300 DPI PDFs
  8. Copyright / Licensing
     • Made available open access through the Edinburgh Research Archive (ERA)
     • However, copyright is still held by the authors, not UoE
     • 2039 rule: all unpublished works (including PhD theses) remain in copyright until 2039, even if the author died centuries ago
     • UoE has no right to openly license the theses
     • Low risk, managed through a take-down policy
  9. Why this approach?
     • Gain expertise in mass digitisation
     • Obtain equipment / software at project end for future digitisation initiatives
     • More control over fragile material / workflows
     • Frees up 500 linear metres of shelf space
  10. Timeline
      • Feb 2016: Funding confirmed
      • May 2016: Equipment and staff in place; scanning work begins
      • Jun 2016: First batch of digitised theses online
      • Nov 2016: Conservation work begins
      • Mar 2017: Procurement partner confirmed and outsourcing begins
      • Jul 2017: Conservation work complete
      • May 2018: All in-house scanning and processing complete
      • Dec 2018: All outsourced theses returned
      • Dec 2018: All theses available online
  11. Progress to date
      • 5,646 scanned in-house
        • 4,132 duplicate items
        • 1,514 unique items
      • 4,898 processed in-house
      • 4,434 online
      • On track to complete the in-house element within the timeframe
  12. Not just text…
  13. Some notable authors
  14. Beyond scanning
      • Linking theses to Wikipedia
      • Wikisource
      • Looking to explore advanced research techniques (e.g. text mining / data visualisation)
  15. Find out more
      • libraryblogs.is.ed.ac.uk/phddigitisation
      • era.lib.ed.ac.uk
      • facebook.com/crc.edinburgh
      • @CRC_EdUni
      • @gwillshaw
  16. Attributions
      • Gordon Brown: By Copyright World Economic Forum (www.weforum.org), swiss-image.ch/Photo by Remy Steinegger [CC BY-SA 2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons
      • Arthur Conan Doyle: By Arnold Genthe. Public domain image from http://www.sru.edu/depts/cisba/compsci/dailey/217students/sgm8660/Final/, which took it from http://www.lib.utexas.edu/photodraw/portraits/, where the source was given as Current History of the War v.I (December 1914 - March 1915), New York: New York Times Company. Public Domain, https://commons.wikimedia.org/w/index.php?curid=240887
      • Alexander McCall Smith: By TimDuncan (Own work) [CC BY 3.0 (http://creativecommons.org/licenses/by/3.0)], via Wikimedia Commons
      • Honor Fell: See page for author [CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)], via Wikimedia Commons
      • Isabel Emslie Hutton: By Post of Serbia (http://www.wnsstamps.post/en/stamps/RS060.15) [Public domain], via Wikimedia Commons
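The thesis counts quoted on slides 2, 4 to 6 and 11 can be cross-checked with a few lines of arithmetic. The sketch below is a hypothetical helper, not part of the project's tooling. It assumes the in-house target is the 10,000 destructive plus 3,000 non-destructive theses, a total the slides do not state directly.

```python
# Hypothetical cross-check of the figures quoted in the slides.
# Assumption: in-house target = destructive (10,000) + non-destructive (3,000);
# the slides do not state this combined total themselves.

TOTAL_THESES = 27_000      # slide 2
ALREADY_DIGITAL = 10_000   # slide 2
TARGET = 17_000            # slide 4

DESTRUCTIVE_IN_HOUSE = 10_000     # slide 5
NON_DESTRUCTIVE_IN_HOUSE = 3_000  # slide 6
OUTSOURCED = 4_000                # slide 6

SCANNED_DUPLICATE = 4_132   # slide 11
SCANNED_UNIQUE = 1_514      # slide 11
SCANNED_IN_HOUSE = 5_646    # slide 11
PROCESSED_IN_HOUSE = 4_898  # slide 11
ONLINE = 4_434              # slide 11


def check_counts() -> dict[str, float]:
    """Verify that the slide figures are consistent and return progress ratios."""
    # Theses still to digitise (slide 2) should match the project target (slide 4).
    assert TOTAL_THESES - ALREADY_DIGITAL == TARGET
    # The three scanning routes (slides 5 and 6) together should cover the target.
    assert DESTRUCTIVE_IN_HOUSE + NON_DESTRUCTIVE_IN_HOUSE + OUTSOURCED == TARGET
    # Duplicate and unique items should make up the scanned total (slide 11).
    assert SCANNED_DUPLICATE + SCANNED_UNIQUE == SCANNED_IN_HOUSE

    in_house_target = DESTRUCTIVE_IN_HOUSE + NON_DESTRUCTIVE_IN_HOUSE
    return {
        "scanned_share": SCANNED_IN_HOUSE / in_house_target,
        "processed_share": PROCESSED_IN_HOUSE / SCANNED_IN_HOUSE,
        "online_share": ONLINE / PROCESSED_IN_HOUSE,
    }


if __name__ == "__main__":
    for name, share in check_counts().items():
        print(f"{name}: {share:.1%}")
```

Run as a script, it prints the three ratios: about 43.4% of the assumed in-house target scanned, 86.8% of scanned items processed, and 90.5% of processed items online.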
