UCD Digital Library: Creating online access to historical and contemporary collections - opportunities and challenges


Presentation given by Julia Barrett, UCD Library Research Services Manager, at the Academic & Special Libraries Annual Seminar, 1st March 2013, Dublin, Ireland

Published in: Education
  • A number of people assisted me with this presentation and so I’d like to acknowledge Peter Clarke, Orna Roche, John Howard and particularly Audrey Drohan who is here in the audience….
  • IVRLA = Irish Virtual Research Library & Archive
  • This is the background. Funded by the HEA. The IVRLA project developed a proof of concept for creating a body of digitised content. What was important was testing a variety of different types of materials within a range of collections, and creating a repository that would be scalable into the future. Partnerships and relationships with repositories were very important, to get buy-in from both academics and repositories; this is something that has been actively continued and built upon. The key repositories and partnerships were ….
  • UCD Mícheál Ó Cléirigh Institute. These are examples of collections.
  • Qualitative datasets (survey data; media-based documentation of testimonies from field informants; textual/tabular data). Research Collections survey the holdings of particular thematic areas from a variety of repositories, create lists and then digitise a selection of the material. E.g. the Towards 2016 project surveyed UCD’s holdings of material relating to the 1916 Rising in order to digitise a selection of this primary source material, arrange it, comment on it and make it available on the IVRLA. So the IVRLA is what preceded the DL. Before I move specifically to the DL I’m going to look at some of the key questions around DL development and at the overall framework we are using.
  • A digital library is something that is dynamic and that puts the user at the centre. We need to consider: Why are we digitising? To provide easier access? To enable new research? To add value by linking collections? To facilitate the preservation of fragile materials? What are we digitising? Do we have a collection development policy? Can we digitise everything we want to digitise? Copyright issues? Sensitive materials and ethical issues? For whom? Who are our end users? Identified researcher cohorts? What about the opportunities to build relationships with our repositories and Schools? How will this be of benefit? How will we know? How will we provide access and support? Then the questions around HOW: metadata, workflows, infrastructures, long-term preservation etc. And what overall models will we use?...[OAIS…next slide]
  • An Open Archival Information System (or OAIS) consists of an organisation of people and systems that has accepted the responsibility to preserve information and make information / collections available for a Designated Community. The model advocates adherence to best practices and standards, and attention to data modelling and workflows. Visually it looks like this….[next slide]
  • This model is one on which the DL is largely based. SIP: in a zipped folder we receive the datastreams (i.e. each content item is a datastream and will consist of the image and its metadata). This is then ingested (i.e. uploaded, plus associated processes) into the DL, where it is archived and managed. From the store the access policies are developed (e.g. levels of access), thereby allowing users access following their querying. The AIP is something that we are looking at now; we are not quite there yet, and it will depend on using the PREMIS metadata schema for preservation purposes. So, where are we now with the DL?
  • All the IVRLA collections are now in the DL. So…last year we implemented a new infrastructure, moving from Fedora 2.2 to 3.5. This offers improved efficiency and stability. The Solr search engine offers hit highlighting, faceted search, dynamic clustering, etc. It’s a powerful indexing and retrieval tool. Move to JPEG2000: we’ve moved away from delivering data in the sometimes problematic DjVu format to JPEG2000. Because of the way the compression engine works, JPEG2000 supplies a higher quality final image even when zoomed in. This version of Fedora also allows for geospatial searching; we started to look at that last year, and this year we are developing it further (I will talk about this later on). We have also increased the number and range of collections…..
  • See AD’s listing. 19th Century Social History Pamphlets: include a variety of themes (education, health, famine, business etc.). UCDscholarcast: downloadable lectures from the School of English and elsewhere. Thomas Hardy’s The Return of the Native: manuscript copy of Hardy’s work.
  • 2 Georgian buildings collections in ARTstor. Digital file = one photo, one page of a book (i.e. one scan), one audio file etc.
  • Top right: Schools Manuscript Collection, My Home District (Sligo). This consists of a series of selected essays by schoolchildren, each a topographical description of their own locality which they were encouraged to write. Early 20th century. Beranger watercolours: beautiful. Joyce and unidentified man. National Folklore Collection photo. Text of a speech by Terence MacSwiney. Many of the DL’s users are not researchers but members of the public who are exploring the collections for personal perspectives. Other materials are rare and pre-date the founding of the Irish state.
  • Other examples: UCD has a partnership with the Franciscans and as a result the DL has some important collections, e.g. the Luke Wadding papers, a monograph collection, and material culture collections of the mendicant order: really beautiful chalices and other religious objects, with images from multiple perspectives in great detail. This one is the William Ferris Chalice. Other examples here: School games collection (hopscotch); sculpture trail; Austin Clarke collection… At the moment and over the next few months we are looking at….[next slide]
  • Over the next year…in addition to more collections….
  • Increase in diversity of collections: we are currently using the old IVRLA workflow. Different scanning procedures apply depending on the type of collection, e.g. images, coins. A book will have one descriptive record with multiple scans. A photo will have one record and one image. The sculptures in the UCD Sculpture Trail collection will each have a record but will have many different views. So we need to draw up different workflows (some more complex than others; maps are complicated) to which different standards will be applied. There are approx. 30 main metadata standards and many more smaller ones.
  • Linked data is a method of publishing structured data so that it can be interlinked and become more useful. In order to link data successfully, names of people, places etc. must be standardised through the use of an authority list, e.g. VIAF (the Virtual International Authority File). So what happens is that the names authority and the value are automatically added to the metadata. Resources are expressed in the form of a “triple” consisting of subject, relationship, object, e.g. “Dan Brown wrote The Da Vinci Code”, using a standardised form of Dan Brown’s name from VIAF. We are still working on this on the backend….not yet available through the DL front page.
  • We are evaluating how to implement full-text searching this year. Solr (the search engine we are using on Fedora) has an in-built OCR (optical character recognition) engine, but we need to develop this. Not everything is OCR-able, e.g. handwriting.
  • We are also looking at enhancing the user experience, and specifically at evaluating the Hydra framework. Hydra is a repository solution that is being used by institutions on both sides of the North Atlantic to provide access to their digital content. Hydra provides a versatile and feature-rich environment for end-users and repository administrators alike. Example: Northwestern University Library’s Digital Image Library. We are looking at security issues around accessing restricted resources; we need to implement more sophisticated access control mechanisms. We need to extend our storage and backup capacity. To underpin all of this, what software, hardware etc. are we using…..next few slides
  • i2s SupraScan Quartz A1 HD book scanner – book scanner up to A1 – high definition
  • ISO 19115: geospatial. NUDS: Numismatic Description Standard. Authorities / Ontologies (or “Vocabularies”). LCTGM: LC Thesaurus for Graphic Materials. Use of authorities is important in realising linked data. Whatever standards we are using, we are trying to ensure adherence to best practice. I think the number of languages, formats, schemas etc. does highlight one of the challenges…that of complexity….[next slide]
  • I’ve been working in this area for just over a year – it has taken me some time to get used to new language, concepts, acronyms – this is a wordle created from 3 recent emails…
  • Understand the complexity. It’s not just about scanning. Communicate the complexity. This is just the workflow to get the item ready for ingest (i.e. from analog to when it is ingested; ingest = upload plus associated processes). Assessment and selection will link in to our collection development policy. Copyright clearance. Digitisation / image processing etc.
  • This is a diagram showing how we implement content access. So following ingest to the Fedora server (green)… we then make the collection available (via our service components) to our end users in a variety of different ways. So creating digital libraries needs a range of skills; there are lots of stages.
  • Range of skills needed, including IT skills. We need therefore to pull together teams of people from areas not only in the Library. These are specialised areas, usually with not many people in each, and so vulnerable to loss of expertise. There are real challenges where one member of staff leaving can throw the whole system. Knowledge management therefore is important. So how does this work in UCD Library….[next slide]
  • The IVRLA was an externally funded project. The Digital Library is not, and so the different areas of activity are centred in the current library structure as part of the ongoing work of the library. Within that we are looking at “mainstreaming” from Research & Innovation to other library units. Programmer: about to recruit. CS: just another library collection. IR: managing the same sorts of infrastructure, workflow management etc.
  • This is a slide showing the Republic of Letters, which describes scholarly communities and networks of knowledge in the 18th century. The Iberian Short Title Catalogue (ISTC) is a growing catalogue/dataset of books printed on the Iberian Peninsula up to 1650. It is a tool for: location of early print materials; research in the history of printing. School of History: when we showed our academic this he was exceedingly interested; he too could do something similar utilising this type of data visualisation. Among the many advantages of the Fedora platform is the potential for the augmentation and reuse of the data. Geospatial coordinates and other data can be embedded into the underlying data for both publishers/printers and the physical location of the items. This could be repurposed into a data visualisation tool to show timelines of publications, patterns of use, the most influential publishers, etc. But we must manage how to whet people’s appetite with what is realistic, timewise and with our existing resources. Concentrating on the path rather than a possible end helps to manage expectations. The last section of my presentation looks at the opportunities…[next slide]
  • Libs collaborating with other libraries?
  • Numismatics project: School of Classics. IB: School of History. 1916 photo: UCD Archives. Thomas Hardy manuscript of The Return of the Native: Special Collections. Irish Nurses Journal: School of Nursing and the INMO. Photos from the Irish Folklore Commission. School of Art History: Georgian photos.
  • Library role in relation to auditing existing digital collections on campus: could be developed. What collections? Links to anniversaries and other defined priorities; adhere to an overall collection development policy.
  • OSi scanned the maps in 2006-2008 but they were never made available. Because of retirements and redeployment only one member of OSi staff remains who worked on that project, so there is a danger of loss of knowledge and loss of collections. Georectification will mean being able to overlay the old map on a current map and to make comparisons. With the implementation of our mapping framework we’ll be able to include images of buildings etc. on these also. Paul Ferguson gave us 600 maps from the Military Series, 1:25,000 scale (1940s). We scanned approx. 340 25" OSi mid 20th century maps for TCD.
  • Availability of primary source material (students don’t usually see these until 3rd / 4th level) – therefore pedagogic advantage
  • Nursing. Also: social history; labour history.
  • Opportunities through the implementation of newer technologies… If you are teaching about the history of the development of Dublin City and are interested in Georgian architecture, there are 2 collections with very detailed views, interior and exterior: building materials, architecture, plasterwork, ornamentation, a whole range of things. This is the Custom House.
  • Lord Howth’s house: one of the 18th century John Rocque maps of County Dublin, from UCD’s Special Collections. More about maps…..
  • Use of geospatial technologies: as already mentioned, many resources in the UCD Digital Library have a geographic dimension. We are experimenting with geocoding these references, and providing links to external sources where additional information is available, such as geonames.org and OpenStreetMap.
  • Final 2 opportunities…. Develop new skills in an exciting growth area. Possibilities for libraries collaborating with each other? E.g. metadata policies (but all using the same standards?)
  • A repository of UCD’s digital cultural heritage materials…diversity. A repository of data of various kinds that provides a framework for resource discovery. A platform for new forms of digital publications. A platform for the dissemination of the outcomes of UCD research and creativity. A platform for innovation in library services, teaching & learning, and research. A proactive way of partnering with our extended community.
  • Air….
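The notes on the OAIS model describe a SIP arriving as a zipped folder in which each content item consists of an image and its metadata. As a rough illustration only, the Python sketch below checks such a package before ingest. The folder layout, the file suffixes and the function names are assumptions made for this example; they are not taken from the UCD workflow.

```python
import io
import zipfile

# Assumed layout (hypothetical): one folder per content item,
# holding a TIFF image and an XML metadata file.
REQUIRED_SUFFIXES = (".tif", ".xml")


def build_sample_sip() -> bytes:
    """Create a small zipped SIP in memory for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("item001/item001.tif", b"fake image bytes")
        archive.writestr("item001/item001.xml", "<mods/>")
        # item002 deliberately has no metadata file.
        archive.writestr("item002/item002.tif", b"fake image bytes")
    return buffer.getvalue()


def check_sip(sip_bytes: bytes) -> dict:
    """Return, for each content item, the required suffixes that are missing."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(sip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            item, _, filename = name.partition("/")
            suffix = "." + filename.rsplit(".", 1)[-1].lower()
            found.setdefault(item, set()).add(suffix)
    return {
        item: [s for s in REQUIRED_SUFFIXES if s not in suffixes]
        for item, suffixes in sorted(found.items())
    }


report = check_sip(build_sample_sip())
print(report)
```

An item with an empty list is complete; anything else would be returned to the submitter before ingest.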

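The linked data notes describe resources expressed as subject, relationship, object triples, with names standardised against an authority file such as VIAF. A minimal sketch of that idea in plain Python follows, written out in N-Triples syntax. The work URI and the VIAF identifier are placeholders invented for this example (no real VIAF ID is implied). The choice of the Dublin Core `creator` and `title` properties is also an assumption; it states the relationship from the work to its author rather than "author wrote work".

```python
# Real Dublin Core property URIs.
DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical identifiers, for illustration only.
WORK_URI = "http://example.org/work/da-vinci-code"
AUTHOR_URI = "http://viaf.org/viaf/0000000"  # placeholder, not a real VIAF ID


def uri(value: str) -> str:
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Each triple is (subject, relationship, object).
triples = [
    (uri(WORK_URI), uri(DC_TITLE), literal("The Da Vinci Code")),
    (uri(WORK_URI), uri(DC_CREATOR), uri(AUTHOR_URI)),
]


def to_ntriples(rows) -> str:
    """Serialise triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


ntriples = to_ntriples(triples)
print(ntriples)
```

Because the author is referenced by an authority URI rather than a name string, any other dataset using the same URI can be linked to this record without name matching.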
    1. Academic & Special Libraries Section Annual Seminar. UCD Digital Library: Creating online access to historical & contemporary collections - opportunities and challenges. Julia Barrett, Research Services Manager, UCD Library. Julia.barrett@ucd.ie
    2. Outline
       • UCD Digital Library
         – IVRLA background
         – where we are now
         – where next?
       • Opportunities
       • Challenges
    3. 5 year pilot project, funded by the Irish State and the EU, 2005-2010. IVRLA’s main goals were access and preservation.
       • Proof of concept
       • Body of digitised content
       • Functioning repository prototype
       • Humanities-based source material located in 7 physical locations on campus: UCD Ó Cléirigh Institute; Irish Dialect Archive; Special Collections; Art History; UCD Archives; Geological Sciences; National Folklore Collection
    4. Sample IVRLA Collections
       UCD Archives
       • Papers of William Frazer
       • Papers of Eugene O’Curry
       • Papers of James Meenan
       • Boehm/Casement Papers
       Special Collections
       • Beranger Watercolours
       • Historic Maps Collection
       • Ó Lochlainn Collection: Ballads
       • 19th Century Pamphlet Collection
       • UCD Letters
       • The Beckett Country Collection
       • Curran Collection: Photographs
       • O’Donovan/Reeves correspondence
       National Folklore Collection
       • Folklore Photograph Collection
       • Questionnaire: Emigration to America
       • Questionnaire: Tinkers [Travellers]
       • Urban Folklore Project (Dublin)
       • Schools’ Manuscript Collection – Carna and Ballinasloe, Co. Galway
       Irish Dialect Archive
       • Manuscript Collection
       • Card Collection
       UCD Mícheál Ó Cléirigh
       • UCD MOCI Monograph Collection
    5. Content Diversity
       • Variety of content types
         – Text: letters, books, pamphlets, ephemera, manuscripts, diaries, ballads, essays
         – Audio: sound recordings
         – Video: video interviews
         – Images: photographs, slides, paintings
         – Cartographic: maps
         – Datasets: database
       • 30 Core Collections online
       • 17 Research Collections – which show how research can be done using existing digital resources and how this can generate even further research resources in digital humanities
    6. Static Repository v. Digital Library
       • Why are we digitising?
       • What are we digitising?
       • For whom?
       • How will this be of benefit?
       • How will we know?
       • How will we provide support?
       • How will we provide access? Different levels of access?
       • What infrastructure?
       • What metadata standards / policies will we use?
       • What workflows will we use?
       • What about long-term preservation?
       • What models will we use?
    7. OAIS Model: Open Archival Information System
       • OAIS principles (ISO 14721:2003)
         – Organisation (people) and systems
         – Provides services to identified communities
         – Sustainability
       • Preservation orientation
         – Durability and usability
       • Adherence to best practices/standards
       • Pay attention to data modelling & workflow
    8. OAIS Functional Model (ISO 14721)
       SIP = Submission Information Package
       AIP = Archival Information Package
       DIP = Dissemination Information Package
    9. UCD Digital Library. Launched in 2012
    10. Infrastructure
       • Fedora 2.2 to 3.5
       • Utilisation of Solr
         – Open source platform to enable flexible and configurable indexing and searching
         – http://lucene.apache.org/solr/
       • JPEG2000
       • Geospatial capabilities
    11. Collections
       • 54 online, incl. 3 new
         – 19th Century Social History Pamphlets
         – UCDscholarcast
         – Thomas Hardy’s The Return of the Native
       • 1 in final stage
         – Desmond FitzGerald Photographs
       • 6 waiting to begin
       • 4 planning stage
       • 4 proposals
    12. Collections
       • External
         – 15 collections in Europeana
         – 2 collections in ARTstor
       Approximately 190,000 digital files in total…and counting
    13. Examples
    14. Examples
    15. In addition to more collections…
       • Workflow and metadata policies
       • Linked data
       • Extension of geospatial capabilities
       • Full-text searching
       • Implementation of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
       • Access policies and user accounts for restricted content, etc.
       • Extension of storage and backup capacity
       • Investigation of Hydra http://projecthydra.org/
    16. Workflow and Metadata Policies
       • Different workflows for different collection types
       • Different metadata policies for different collection types
         – MODS (Metadata Object Description Schema); DC (Dublin Core); ESE (Europeana); EAD (Encoded Archival Description) – electronic finding aid
         – VRA (Visual Resources Association)
         – Geospatial (ISO 19115 Geographic Information – Metadata, etc.)
    17. Linked Data
       • Exposure of metadata using semantic web technologies
         – Makes metadata actionable as data
         – Uses RDF (Resource Description Framework) data model
           • Expressions about resources in the form of SUBJECT-RELATIONSHIP-OBJECT (“triple”)
    18. Geospatial: New Web Mapping Framework
       • Geographic dimension to many resources
       • New mapping framework implemented to better expose this geospatial information to its users
       • Framework provides new tools for finding resources by geospatial criteria
         – makes use of the geospatial indexing capabilities of its search engine Solr
         – can visualise georeferenced information on a map
       • These enhancements improve user experience but also lay the foundation for additional geospatial data and information services planned for the UCD Digital Library, such as the display of georectified historic maps.
    19. Full-text Searching
       • “I was trying to search for a person (my aunt, who worked in Leeson St and was one of the people who moved the hospital to Nutley Lane). However, if I understand it correctly, the files are not searchable. Is this planned and what system would be used?”
    20. Implementation of OAI-PMH
       • The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a mechanism for repository interoperability
       • Implementation would allow for automatic harvesting of records to library catalogue
    21. Hydra http://projecthydra.org/
       Hydra provides a versatile and feature rich environment for end-users and repository administrators alike
       • Manipulation of images (e.g. crop, rotation)
       • Manage workflows
    22. UCD Digital Library
       Software: oXygen; FileMaker Pro; Adobe Photoshop; Adobe Bridge; Komodo; HeidiSQL; Cygwin; NX Client; Fedora-Commons; Apache Tomcat
       Hardware: i2s SupraScan Quartz A1 HD book scanner; Kodak iQsmart2 flatbed scanners x 2; Scanning workstations; External hard drives; VM servers
    23. UCD Digital Library
       Languages: XML; XSLT; PHP; HTML; PERL; Java; JavaScript; Ruby-on-Rails; SPARQL; JSON; etc
       File Formats: TIFF; JPEG2000; JPEG; PDF; Word; Excel; CSV; Shapefiles; WAV; MP3; MPEG4
    24. UCD Digital Library
       Metadata Schemas: MODS; DC; EAD; ISO 19115; VRA; NUDS; ESE; EDM; METS; MIX; PREMIS
       Authorities/Ontologies: LCSH; Art & Architecture Thesaurus; MARC country & geographic area codes; LCTGM; Logainm - Placenames Database of Ireland; VIAF; DBPEDIA; GeoNames; OpenStreetMap
    25. Challenges
       • Complexity
       • Range of skills needed
       • Mainstreaming
       • Managing expectations
    26. Complexity
    27. Simplified Workflow [diagram; recoverable box labels: Analog; Assessment / Selection; Rights Clearance; Digitisation; Post Processing; Metadata Creation; Ingest Workflows; Digital Library; Publish; Backup Storage; Digital Preservation] © University College Dublin
    28. Implementation of Content Access
    29. Skills
       • Project management
       • Collections management: policy, identification, acquisition
       • Relationship building – repositories, Schools, external organisations
       • Systems architecture and administration (includes security)
       • Systems analysis, systems integration
       • Workflow analysis
       • Digitisation
       • Metadata
       • Digital preservation
       • Intellectual property/Digital rights management
       • Marketing & promotion
       • Analysing use
       Second column of the slide:
       • Library
       • Research IT
       • Corporate & Legal
       • UCD Communications
       • Schools, Repositories etc.
       • Communities of Practice e.g. Fedora Community
    30. Mainstreaming
       • Research & Innovation
         – Research Services Manager
         – Digital Initiatives SLA
         – Research Services LA
         – Metadata Librarian
         – Programmer
       • Collection Services
         – Collection Development & Description Librarians (metadata)
         – Collection development policy – integration
         – Cataloguing – metadata
         – Institutional Repository
       • Client Services
         – College Liaison Librarians – Presentations, advocacy
         – Desk staff; discoverability through library catalogue
         – Assistance with aspects of the digitisation project e.g. image processing
       • Outreach
         – Promotions, global communications
       • Build on existing skills-base
    31. Managing Expectations www.stanford.edu/group/toolingup/rplviz/rplviz.swf
    32. Opportunities
       • Working / collaborating with Schools and Repositories
       • Working / collaborating with external organisations
       • Identifying and using existing scanned collections
       • Facilitating availability, accessibility and usability
       • Using technologies to enhance the user experience
       • Staff development
       • Library collaboration (?)
    33. Working with Schools and Repositories
    34. • Builds relationships – Schools, Institutes, Repositories, Buildings & Services, etc.
       • Collections become more visible and discoverable
       • Library role in relation to auditing existing digital collections on Campus
       • What collections? Links to anniversaries and other defined priorities
    35. Collaborating with external organisations
       • OSi 19th century 5’ and 10’ town plans
       • TCD’s 1:25,000 military series and UCD’s 25” maps to fill their gaps
    36. Existing scanned collections
       • May need to be “rescued”
       • May need more work
       • May help fill mutual gaps
       • Collaborate – fewer staff, sharing of skills (e.g. metadata, geo-rectification)
       • Be clear about who does what and what the outcomes will be (MOU)
         – OSi: 1,000 scans of 5’ and 10’ 19th century towns / cities (2006-2008); UCD: metadata and geo-rectification
         – TCD & UCD map swap
    37. Availability and Accessibility
       • Availability of primary source material
       • Easy accessibility to such materials
       “The main advantage to having the journal digitised, for me, was that it greatly increased how accessible it was. Prior to the journal being made available in digital format the only copies (most especially of the older editions) were those in the National Library or those stored in the INMOs own (excellent) library. As you know, researchers consulting library copies are somewhat constrained by library opening hours and by the fact that another researcher may be using the material on the same day - rendering it unavailable to you”.
    38. • Enables the promotion of a collection to multiple related disciplines
    39. Technologies
       • Use of JPEG2000 to facilitate zooming without loss of detail
    40. Staff Development and Library Collaboration
       • Build on staff experience
       • Develop new skills in a growth area
       • Possibilities for libraries collaborating with each other?
    41. Most Popular Collections (past 5 months)
       1. Schools Manuscript Collection - My Home District
       2. The Irish Nursing Journals Collection
       3. Folklore Photograph Collection
       4. Schools’ Manuscript Collection - Carna & Ballinasloe, Co. Galway
       5. Folklore Schools 1937-38
       6. Tierney/MacNeill Photographs
       7. Questionnaire: Irish Famine (1845-1852)
       8. Historic Maps Collection
       9. Folk music
       10. UCDscholarcast
    42. The Broader Vision … The UCD Digital Library is ….
       • A repository of UCD’s digital cultural heritage materials…diversity
       • A repository of data of various kinds
       • A resource discovery framework
       • A platform for scholarly interaction with digital content
       • A platform for new forms of digital publications
       • A platform for the dissemination of the outcomes of UCD research and creativity
       • A key component of UCD research infrastructure
       • A platform for innovation in library services, teaching & learning, and research
       • A catalyst for new partnerships between UCD Library and its extended community
       • A node in an emergent global environment of linked data
    43. digital.ucd.ie
       • Sé fáth mo bhuartha
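Slide 18 mentions finding resources by geospatial criteria using Solr's geospatial indexing. One way such a search could be expressed is with Solr's `geofilt` filter, which keeps documents within a given distance (in kilometres) of a point. The sketch below only builds the query string. The field name `location` is an assumption for this example, and the coordinates are an approximate point in central Dublin; the actual index fields and queries used by the UCD Digital Library are not given in the slides.

```python
from urllib.parse import parse_qs, urlencode


def build_geo_query(lat: float, lon: float, radius_km: float,
                    text: str = "*:*", field: str = "location") -> str:
    """Build Solr query parameters for a radius search around a point.

    `field` is a hypothetical name for a spatial field in the index.
    """
    params = {
        "q": text,
        # {!geofilt} keeps documents within d kilometres of pt.
        "fq": f"{{!geofilt sfield={field} pt={lat},{lon} d={radius_km}}}",
        "wt": "json",
    }
    return urlencode(params)


# Approximate coordinates for central Dublin, 5 km radius.
query_string = build_geo_query(53.3498, -6.2603, 5)
print(query_string)
```

The resulting string would be appended to a Solr core's `select` endpoint; the filter query narrows results by location while `q` can still carry an ordinary text search.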
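Slide 20 describes OAI-PMH as a mechanism that would allow records to be harvested automatically into the library catalogue. As a hedged illustration of what a harvester does, the sketch below builds a `ListRecords` request and reads record identifiers out of a response. The endpoint URL and the sample response are invented for this example; only the `verb`, `metadataPrefix` and `set` parameters and the OAI-PMH XML namespace come from the protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(metadata_prefix: str = "oai_dc", set_spec: str = "") -> str:
    """Build a ListRecords request URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{BASE_URL}?{urlencode(params)}"


# Invented, heavily trimmed response used instead of a live request.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""


def record_identifiers(xml_text: str) -> list:
    """Extract the identifier of every record header in a response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


url = list_records_url(set_spec="maps")
identifiers = record_identifiers(SAMPLE_RESPONSE)
print(url)
print(identifiers)
```

A real harvester would also follow the `resumptionToken` that the protocol returns for large result sets; that step is left out here to keep the sketch short.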