Islandora Webinar: Highlighting CUHK Chinese Digital Collections


The webinar will feature a presentation and Q&A session with Jeff Liu, Digital Services Librarian and Louisa Lam, Head, Research Support and Digital Initiatives at the CUHK Library.

The CUHK Library has curated a collection of over five million digital objects in the past 20 years. It features Chinese literature, culture, arts, politics, society and religion. Until recently, the collection was stored in a broad range of different systems, complicating the discovery of these precious digital assets.

In 2015, librarians at CUHK embarked on a project to find a permanent, single platform for digital content. Objectives of the project included enhanced discoverability, multi-language support (Chinese, Japanese & Korean) and custom development capability to modify display and viewing features that would showcase Chinese literature in its true form.

Islandora met all the functional requirements and more, including support for digital humanities projects and access to a user-driven open source software community.

The CUHK Library was also attracted to the vendor services and support available through discoverygarden. We provided advice, support and custom development assistance, contributing to the launch of the digital repository every step of the way.

The repository officially launched in February 2016, making the CUHK Library digital initiatives pioneers in Hong Kong.

Published in: Software

  1. ABOUT US • Louisa Lam • Head of Research Support & Digital Initiatives, a new team set up in July 2015 • Previously Head of the Information Technology & Planning Team, responsible for library systems, infrastructure and digitization projects • Jeff Liu • Digital Services Librarian since July 2015 • Previously a Systems Librarian managing the ILS, EZproxy, website development, technical support for all e-resources and application systems, plus digitization • No deep understanding of or prior experience with metadata, MARC, MODS, Solr or Fedora, which are all core components of Islandora
  2. ABOUT CUHK • Established in 1963 • A comprehensive research university with a 137.3-hectare campus overlooking Tolo Harbor at Shatin, New Territories • Comprises 9 colleges, 8 faculties and a Graduate School • 18,698 undergraduates and postgraduates, 7,157 teaching staff
  3. ABOUT CUHK LIBRARY • Comprises 7 libraries: • University Library • Lee Quo Wei Law Library • Chung Chi College Elizabeth Luce Moore Library • Architecture Library • New Asia College Ch’ien Mu Library • United College Wu Chung Multimedia Library • Li Ping Medical Library
  5. OVERVIEW OF CUHK DIGITAL COLLECTIONS • 333,263 objects currently in the system • ~98% are book or manuscript images; ~11,000 records are ETDs • There are also images and photos • 95% are in Chinese • Over 3 million image objects not yet migrated; some require special handling (Hong Kong Literature Database, Rulan Chao Pian Music Collection) • Continuous development of new collections
  6. MOVING INTO ISLANDORA • Before 2012, over 5 million digital objects were stored in a Tamino XML database with no user interface • Difficulties: • Developing a new interface, schema and workflow for each new collection was time-consuming • Every upload and every metadata update required a request to the technical team • Not flexible enough for fast-growing collections and large-scale implementation • Staff mindset not yet ready for open source, collaborative program development
  7. MOVING INTO ISLANDORA • Interim solution (2013 – 2016): • Developed a new portal on Drupal CMS to provide a standard interface • Long-term solution (2014 onward): • Seek a change of mindset and an alternative system • Reorganize the staff and team structure before reorganizing the content • Open source instead of a proprietary system • Standardization instead of isolated development
  8. PROJECT TIMELINE
     Time Frame | Action
     Summer 2014 | Identified Islandora
     Aug – Oct 2014 | Local installation by our own technical team; identified the need for more support
     Oct 2014 | Contracted with Discovery Garden
     Nov 2014 – Mar 2015 | Installation, theming and customization of the CUHK instance by Discovery Garden
     Jun 2015 | Ingested around 100 Daoist books into the repository, without a deep understanding of Islandora or structured metadata, to meet an urgent faculty request (this turned out to be a very good trial-and-error exercise)
     Jul 2015 | New team set up, dedicated to digitization and digital repository development (1 Digital Services Librarian and 1 web programmer)
     Aug 2015 – Jan 2016 | Worked with the Digital Initiatives Group to tweak the theme to match the new library website, developed on Drupal by the same web programmer
  9. PROJECT TIMELINE
     Time Frame | Action
     Aug 2015 onward | Studied and implemented the metadata standard, XSLT, user interface and functionality, after learning a lot from the Islandora Conference in Aug 2015
     Mid-Oct 2015 onward | Started re-ingesting stitch-bound classic Chinese books into the platform as the basis for the Daoist Texts Collection, after experimenting with the system for months; developed a new workflow for migration; developed tools and bug fixes to prepare the migration of legacy collections
     12 Feb 2016 | Soft launch of the repository
     17 Mar 2016 | Official launch of the repository with more collections migrated; the Digital Scholarship Lab opened on the same day
     Mar 2016 onward | Continuing to migrate legacy collections into the repository
  10. CHINESE RARE BOOK DIGITAL COLLECTION • Before Islandora • First collection to migrate to the new portal built on Drupal CMS • Drupal Views used for search and retrieval; e-books linked to an external e-book reader (in Flash format) • Problems: • Flash Player is being phased out • The system and servers could not handle the large backlog
  11. CHINESE RARE BOOK DIGITAL COLLECTION • With Islandora • Book Solution Pack widely used • All objects are on a single platform for searching and viewing • Book metadata is exported from the Innovative ILS and converted with the Library of Congress’s MARCXML to MODS XSLT, with a few local changes including flipping MARC tag 880 (PinYin for Chinese characters) and adding local note and TOC fields
  12. CHINESE RARE BOOK DIGITAL COLLECTION • With Islandora • Customized the Internet Archive Book Reader page progression so that classic Chinese books display and flip correctly (the direction differs from western books) • All book objects, packaged as zip files, are ingested with a Drush command (the page progression parameter is entered during ingestion)
  13. CHINESE RARE BOOK DIGITAL COLLECTION • With Islandora • Customized sorting for the display of Chinese titles, using the proper titleInfo and the numeric value of partNumber from MODS (with default sorting, v. 2 would appear after v. 19 rather than after v. 1)
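The volume-ordering problem on this slide can be illustrated with a minimal Python sketch. It is not the CUHK or Islandora implementation: the record layout, the sample title, and the helper names `part_number_value` and `sort_volumes` are all assumptions made for illustration. It only shows why sorting on the numeric value of a part number gives the expected order when plain string sorting does not.

```python
import re


def part_number_value(part_number: str) -> int:
    """Return the first run of digits in a part number such as 'v. 19', or 0 if none."""
    match = re.search(r"\d+", part_number)
    return int(match.group()) if match else 0


def sort_volumes(records):
    """Sort by title first, then by the numeric value of the part number."""
    return sorted(
        records,
        key=lambda r: (r["title"], part_number_value(r["part_number"])),
    )


# Hypothetical sample records; the title is a placeholder, not taken from the collection.
records = [
    {"title": "Sample Title", "part_number": f"v. {n}"} for n in (19, 2, 1, 10)
]

# Plain string sorting compares character by character, so "v. 2" lands after "v. 19".
string_order = sorted(r["part_number"] for r in records)

# Numeric sorting on the extracted value restores the natural volume order.
numeric_order = [r["part_number"] for r in sort_volumes(records)]

print(string_order)
print(numeric_order)
```

In a search index the same idea is usually applied by storing the numeric value in its own sortable field, rather than sorting in application code.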
  14. CHINESE RARE BOOK DIGITAL COLLECTION • With Islandora • With the OpenSeadragon viewer, images display clearly on the huge digital display wall in the new Digital Scholarship Lab • The wall is built from twelve 55-inch high-resolution LED TV screens, giving a combined resolution of 24,883,200 pixels (7,680 × 3,240 pixels)
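The pixel count quoted for the display wall can be checked with a few lines of arithmetic. The sketch below assumes each screen is a 1,920 × 1,080 (Full HD) panel and that the wall is a 4 × 3 grid. The slides do not state the panel resolution or the layout; both are inferred from the twelve screens and the 7,680 × 3,240 total.

```python
# Figures taken from the slide.
WALL_WIDTH = 7680
WALL_HEIGHT = 3240
SCREEN_COUNT = 12

# Assumption: each panel is Full HD. The slides do not state this.
PANEL_WIDTH = 1920
PANEL_HEIGHT = 1080

columns = WALL_WIDTH // PANEL_WIDTH    # panels side by side
rows = WALL_HEIGHT // PANEL_HEIGHT     # panels stacked vertically
total_pixels = WALL_WIDTH * WALL_HEIGHT

print(f"{columns} x {rows} panels, {total_pixels:,} pixels in total")
```

Under that assumption the grid accounts for exactly twelve panels, and the product of the wall dimensions matches the 24,883,200 pixels given on the slide.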
  15. OTHER CHINESE RARE BOOKS COLLECTIONS • To date, over 320,000 page objects from 1,800 rare books have been ingested into the system • The Chinese Rare Books Digital Collection provided concrete experience for developing a feasible migration strategy for other rare book collections • Book objects are ingested according to their subject area and theme • The Daoist Texts Collection was set up with book titles selected by the Department of Culture and Religion Studies and the Centre for Studies of Daoist Culture • A new Chinese Medicine Collection will be created • The migration will take more than a year; priority will be given to recently digitized items that are not yet accessible, followed by B&W images digitized in the last 10 years
  16. ELECTRONIC THESES & DISSERTATIONS COLLECTION • Before Islandora: • Launched in 2014 with more than 10,000 ETD records • One of the most heavily used digital collections • Search and retrieval system built on the existing Tamino XML database • Our Drupal CMS portal uses an iframe to display the search and browse pages • PDF files are housed on another server • Full-text search is not available by default due to system limitations • Each data load goes through a MARC→Excel→XML conversion • No OAI-PMH support for harvesting
  17. ELECTRONIC THESES & DISSERTATIONS COLLECTION • With Islandora • The collection had run on the locally developed platform for less than 2 years, but we decided to migrate it to Islandora because of its significance, popularity and well-structured records • Relies on the Islandora Scholar Solution Pack for display and the Islandora Solr Facet Pages module for browsing • All ingested data comes from the Innovative ILS and is converted to XML for ingestion • Challenge: display of non-English characters in the Islandora Solr Facet Pages, due to a limitation of the Solr facet prefix • Preparing the data for ingestion took more than 2 weeks, but the migration itself was so straightforward that it took fewer than 3 working days • This collection will be launched in May 2016
  18. Oracle Bones Collection • Before Islandora: • A small but important collection • Contains images of oracle bones dating back 3,500 years (the physical items are kept at our main library) • Previously displayed on a static HTML web page; later moved to Drupal views and nodes for storage and display in our portal
  19. Oracle Bones Collection • With Islandora • An expert from Academia Sinica, Taiwan, provides proper metadata for each physical bone • The first project to share metadata with ArchivesSpace • OpenRefine is used to massage the metadata to follow the MODS schema for Islandora • The OpenSeadragon viewer displays and magnifies the images clearly for users • Future: as some 3D oracle bone images are available, we may explore how they can be displayed in Islandora
  20. Sheng XuanHuai Archive • A collaborative project with the Arts Museum of CUHK • ~10,000 letters and manuscripts of Mr. Sheng, a very influential entrepreneur in the late Qing Dynasty • Vertically transcribed Chinese text will be displayed side by side with the page image • A transcription viewer is not new in Islandora, but a vertical display that fits the traditional writing direction of Chinese characters enriches this collection • This collection will be launched around Dec 2016
  21. SEARCH OF CHINESE CHARACTERS • Enable cross-searching of Traditional Chinese, Simplified Chinese and Variant Chinese characters (TSVCC), e.g. 台灣 (Taiwan) U+53F0 vs 臺灣 (Taiwan) U+81FA • Discovery Garden helped to apply Hong Kong’s TSVCC mapping table in Solr • Now testing ways to improve precise word and phrase searching of Chinese characters
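The idea behind TSVCC cross-searching can be sketched in Python as character folding: both the query and the indexed text are mapped to one canonical form before comparison. This is only an illustration, not the Solr configuration used at CUHK. The two-entry mapping below is a made-up excerpt (the real TSVCC table is far larger), and `fold_variants` and `matches` are hypothetical helper names.

```python
# Hypothetical two-entry excerpt of a variant mapping table.
# Each variant or simplified character is folded to one canonical character.
VARIANT_TO_CANONICAL = {
    "臺": "台",  # U+81FA -> U+53F0 (variant pair from the slide)
    "湾": "灣",  # U+6E7E -> U+7063 (simplified -> traditional)
}

FOLD_TABLE = str.maketrans(VARIANT_TO_CANONICAL)


def fold_variants(text: str) -> str:
    """Map every known variant character to its canonical form."""
    return text.translate(FOLD_TABLE)


def matches(query: str, indexed_text: str) -> bool:
    """Compare query and indexed text after folding both sides the same way."""
    return fold_variants(query) in fold_variants(indexed_text)


# Three spellings of "Taiwan" fold to the same string.
print(fold_variants("臺灣"), fold_variants("台灣"), fold_variants("台湾"))

# A simplified-character query finds a title written with the variant character.
print(matches("台湾", "臺灣文學"))
```

The key design point is that the same folding is applied on both the indexing side and the query side, so a user never needs to know which form a record was catalogued in.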
  22. LESSONS LEARNED • Steep learning curve • Support from the community is essential for continuous development • Migrating metadata is a real challenge • Two sources, the ILS and ArchivesSpace, with different structures and levels of detail • Developing a standard, automated method will save much time otherwise spent massaging the data in Islandora • Much time was spent developing a single workflow for the team, to save time and effort in migration; Islandora can handle a single workflow for different collections and media types on one standard platform • Dedication, focus and concentration help to execute the project! • The repository is an important component of the whole suite of services supporting the university's research and digital scholarship activities
  23. THANK YOU • Repository URL: • Contact: