Networked digital library through harvesting


  1. Networked Digital Library through Harvesting: The Future of Digital Archiving. Barnali Roy Choudhury and Dr. Parthasarathi Mukhopadhyay, Department of Library and Information Science, The University of Burdwan, Burdwan – 713 104
  2. DIGITAL LIBRARY: A digital library is a library in which collections are stored in digital formats (as opposed to print, microform, or other media) and are accessible by computers. The digital content may be stored locally or accessed remotely via computer networks (Wikipedia). The DELOS Digital Library Reference Model defines a digital library as "an organization, which might be virtual, that comprehensively collects, manages and preserves for the long term rich digital content, and offers to its user communities specialized functionality on that content, of measurable quality and according to codified policies."
  3. No traditional library is self-sufficient, and no digital library is self-sufficient either.
  4. Networked Digital Library: an entity that collects metadata from selected digital libraries (DLs) in a central place in order to provide centralized searching.
  5. OBJECTIVES: • To harvest metadata from different OAI/PMH repositories related to LIS into a single window (a centralized search facility); • To design a union catalogue of scholarly objects through harvesting (using the OAI/PMH protocol and the PKP open source harvesting software on a LAMP architecture); and • To provide comprehensive search facilities to end users in the LIS domain for accessing scholarly objects (search metadata locally, access full text globally).
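The idea of "search metadata locally, access full text globally" can be illustrated with a minimal sketch. It assumes a hypothetical `Record` shape and an in-memory `UnionCatalogue` class; neither is part of the PKP harvester, which keeps its harvested records in a MySQL database. Only metadata is held locally, and each search hit carries a link back to the full text at its source repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """Hypothetical minimal shape of one harvested metadata record."""
    oai_identifier: str        # unique OAI item identifier, e.g. "oai:repo-a:1"
    source: str                # repository the record was harvested from
    title: str
    creators: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    full_text_url: str = ""    # link to the full text at the source repository


class UnionCatalogue:
    """In-memory single-window store for metadata harvested from many repositories."""

    def __init__(self) -> None:
        self._records = {}     # oai_identifier -> Record

    def add(self, record: Record) -> None:
        # Keyed by OAI identifier: re-harvesting an item overwrites the older
        # copy rather than creating a duplicate catalogue entry.
        self._records[record.oai_identifier] = record

    def __len__(self) -> int:
        return len(self._records)

    def search(self, term: str) -> List[Record]:
        """Case-insensitive substring search over title, creators and subjects."""
        needle = term.casefold()
        return [
            rec for rec in self._records.values()
            if any(needle in text.casefold()
                   for text in (rec.title, *rec.creators, *rec.subjects))
        ]


if __name__ == "__main__":
    catalogue = UnionCatalogue()
    catalogue.add(Record("oai:repo-a:1", "Repository A", "Metadata harvesting in libraries",
                         subjects=["OAI-PMH"], full_text_url="https://repo-a.example.org/1"))
    catalogue.add(Record("oai:repo-b:7", "Repository B", "Digital preservation policies",
                         full_text_url="https://repo-b.example.org/7"))
    # Metadata is searched locally; the full text is fetched from the source.
    for hit in catalogue.search("oai-pmh"):
        print(hit.source, hit.full_text_url)
```

Keying on the OAI identifier is a design choice for the sketch: repeated harvests of the same repository then update the catalogue instead of inflating it.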
  6. CRITERIA for DL selection: • selection of a particular domain; • selection of the most efficient and effective dataset; • whether the selected data are OAI/PMH compatible.
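The three selection criteria can be read as a filter followed by a ranking. The sketch below assumes a hypothetical `Candidate` entry per repository (for instance, transcribed by hand from a directory listing) and, as an assumption not stated on the slide, uses record count as a stand-in for "most efficient and effective dataset".

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Candidate:
    """A repository as a directory listing might describe it (hypothetical fields)."""
    name: str
    subjects: FrozenSet[str]
    record_count: int
    oai_base_url: Optional[str]   # None when no OAI-PMH endpoint is advertised


def select_repositories(candidates: List[Candidate], domain: str) -> List[Candidate]:
    """Keep in-domain, OAI-PMH compatible repositories, largest first."""
    eligible = [
        c for c in candidates
        if domain in c.subjects          # criterion 1: the chosen domain
        and c.oai_base_url               # criterion 3: OAI/PMH compatible
    ]
    # Criterion 2 (most efficient and effective dataset) is approximated here,
    # as an assumption, by the number of records each repository holds.
    return sorted(eligible, key=lambda c: c.record_count, reverse=True)
```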
  7. Open Access Institutional Digital Repository: Institutional Digital Repositories (IDRs) are digital collections that organize, preserve, and make accessible the intellectual output of a single institution or a group of related institutions (Crow, 2002). A typical IDR has the following attributes: • it is open access; • it allows authors/rights holders to deposit their articles; • it may accept preprints (pre-publication manuscripts); • it normally accepts post-prints (peer-reviewed and published articles); • most reputed academic publishers allow authors to deposit some version of their articles in such repositories (
  8. OpenDOAR
  9. ROAR
  10. IDRs in LIS domain: The Directory of Open Access Repositories (OpenDOAR) lists around 51 open access repositories; of these, 43 are in English, 24 are exclusively LIS and IT related, and 18 are OAI/PMH compatible. Among the English-language repositories, ELIS holds the highest number of records (9,565). The Registry of Open Access Repositories (ROAR) lists around 6 institutional repositories, of which 5 are OAI/PMH compatible. Both directories allow users to search and list open access repositories by subject, country and content type.
  11. Cross Collection Interoperability: These repositories accept submission of scholarly materials globally (i.e., cross-institutionally) and rely extensively on two interoperability standards: Z39.50, a protocol for distributed search services, and OAI/PMH, which deals with metadata harvesting.
  12. What is OAI/PMH? 1. OAI/PMH is a lightweight standard protocol for harvesting metadata records from ‘data providers’ to ‘service providers’. 2. It provides rules for harvesting the metadata of a repository, not its full content. 3. The full content has to be retrieved from the source repository; the protocol allows a ‘service provider’ to say ‘give me some or all of your metadata records’. 4. It is based on HTTP and XML. 5. It simply carries metadata. 6. It mandates simple Dublin Core (oai_dc) as the minimum record format, but is extensible to any XML format, e.g., IEEE LOM, ONIX, MARC, METS, MPEG-21.
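Because OAI/PMH carries metadata as XML and every repository must offer simple Dublin Core, the core of a harvester's parsing step is pulling the `oai_dc` fields out of each `<record>`. The sketch below does this with Python's standard `xml.etree.ElementTree`. The namespace URIs are the ones defined by OAI-PMH 2.0 and Dublin Core; the sample record and the flat `dict` output are illustrative assumptions, not how the PKP harvester stores records.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by OAI-PMH 2.0 and simple (unqualified) Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The 15 elements of simple Dublin Core.
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")


def parse_oai_dc_record(record: ET.Element) -> dict:
    """Flatten one OAI-PMH <record> carrying oai_dc metadata into a dict."""
    header = record.find("oai:header", NS)
    result = {
        "oai_identifier": header.findtext("oai:identifier", default="", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", default="", namespaces=NS),
        # Deleted records keep their header but have no <metadata> part.
        "deleted": header.get("status") == "deleted",
    }
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    for name in DC_ELEMENTS:
        # Every simple DC element is optional and repeatable, hence a list.
        elements = [] if dc is None else dc.findall(f"dc:{name}", NS)
        result[name] = [(el.text or "").strip() for el in elements]
    return result


# A made-up record for illustration only.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repo.example.org:42</identifier>
    <datestamp>2010-01-15</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example LIS article</dc:title>
      <dc:creator>Author, A.</dc:creator>
      <dc:creator>Author, B.</dc:creator>
      <dc:identifier>https://repo.example.org/42</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""

if __name__ == "__main__":
    parsed = parse_oai_dc_record(ET.fromstring(SAMPLE))
    print(parsed["oai_identifier"], parsed["creator"])
```

The `dc:identifier` values usually hold the link to the item at the source repository, which is how a service provider can point users to the full text it does not store.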
  13. HOW OAI WORKS? OAI “verbs”: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord. The HARVESTER sends an HTTP request (carrying an OAI verb) to the OAI REPOSITORY, which answers with an HTTP response (valid XML).
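This request/response cycle can be sketched with only the Python standard library. `oai_url` builds a GET request from a base URL, a verb and its arguments; `list_records` issues `ListRecords` repeatedly, following the `resumptionToken` that OAI-PMH 2.0 uses to split large result sets into pages. The injectable `fetch` parameter is a design choice for this sketch so the paging logic can be exercised without a live repository; it is not part of the protocol or of the PKP harvester.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# The six request types defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_url(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request: base URL, the verb, then its arguments."""
    if verb not in VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


def http_get(url: str) -> bytes:
    """Fetch one response body over HTTP."""
    with urlopen(url, timeout=60) as response:
        return response.read()


def list_records(base_url: str, metadata_prefix: str = "oai_dc",
                 fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> returned by ListRecords, following resumptionTokens."""
    url = oai_url(base_url, "ListRecords", metadataPrefix=metadata_prefix)
    while True:
        root = ET.fromstring(fetch(url))
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result set is not a failure
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=OAI_NS)
        if not token or not token.strip():
            return  # a missing or empty token marks the last page
        # A resumed request carries only the verb and the token (an exclusive argument).
        url = oai_url(base_url, "ListRecords", resumptionToken=token.strip())
```

For example, `for rec in list_records("https://repo.example.org/oai"): ...` would walk every record of a (hypothetical) repository. A real harvester would typically also pass the protocol's `from`/`until` arguments to harvest only records changed since the previous run.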
  14. METHODOLOGY OF DESIGNING: • LAMP-related activities • Harvester-related activities • Repository-related activities • Development of repositories
  15. LAMP related activities: The prototype harvesting framework developed at the Department of LIS, The University of Burdwan, named UniLIS, is based on open source software and open standards. It uses the LAMP architecture as its base: • Linux (Ubuntu 9.10) as operating system; • Apache (2.2.8) as Web server; • MySQL (5.0.0) as RDBMS; and • PHP (version 5.x) for the harvesting tool. This stage also involves linking PHP with Apache and MySQL.
  16. Harvester related activities: The requirements of the PKP harvester are as follows: • PHP >= 4.2.x (including PHP 5.x); Microsoft IIS requires PHP 5.x • MySQL >= 3.23.23 (including MySQL 4.x/5.x) • Apache >= 1.3.2x or >= 2.0.4x or 2.0.5x, or Microsoft IIS 5.x or 6.x • Operating system: any OS that supports the above software, including Linux, BSD, Solaris, Mac OS X and Windows (preferably NT-based Windows flavors)
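One quick way to check a server against these minimums is to compare version tuples. The sketch below encodes only the PHP and MySQL minimums from this slide and checks them against the UniLIS stack listed on the LAMP slide. Two assumptions are labeled in the code: "PHP 5.X" is read as at least 5.0, and Apache is omitted because its "1.3.2x or 2.0.4x" ranges do not reduce to a single minimum.

```python
from typing import Dict, List, Tuple

# Minimum versions taken from the PKP harvester requirements.
# Apache is deliberately left out: "1.3.2x or 2.0.4x" is not one minimum.
MINIMUMS: Dict[str, Tuple[int, ...]] = {"PHP": (4, 2), "MySQL": (3, 23, 23)}

# UniLIS stack from the LAMP slide; "PHP 5.X" is assumed to mean at least 5.0.
UNILIS_STACK = {"PHP": "5.0", "MySQL": "5.0.0"}


def parse_version(text: str) -> Tuple[int, ...]:
    """'5.0.0' -> (5, 0, 0); parsing stops at the first non-numeric part, e.g. 'x'."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def unmet_requirements(installed: Dict[str, str]) -> List[str]:
    """Names of components that are missing or older than the required minimum."""
    return [
        name for name, minimum in MINIMUMS.items()
        if name not in installed or parse_version(installed[name]) < minimum
    ]


if __name__ == "__main__":
    print(unmet_requirements(UNILIS_STACK) or "all listed requirements met")
```

Python compares tuples element by element, so `(5, 0, 0) < (3, 23, 23)` is false because the first components already differ.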
  17. Harvester related activities: This group includes two major tasks: i) Installation of the PKP harvester, which requires a) a login name and password for the system administrator (root user) and b) database details (name of the MySQL database, the database user, and that user's password).
  18. Harvester related activities: ii) Configuration of the PKP harvester: a) site management (configuration of site-specific details, language, crosswalks, plug-ins and reading tools); b) archives (creating archives and managing created archives); and c) other administrative functions (layout, customization, etc.).
  19. UniLIS, Department of LIS, The University of Burdwan
  20. UniLIS, Department of LIS, The University of Burdwan
  21. Site Administration
  22. IDRs related requirements: • Name of open access repository: LDL (Librarians Digital Library) • Sponsoring institute: Documentation Research and Training Centre (DRTC), Indian Statistical Institute (ISI), Bangalore Centre, India • No. of records: 249 items (2009-03-13) • Software in use: DSpace • URL of the repository / OAI/PMH base URL: est • Document types: Articles; Conferences; Theses; Multimedia • Languages: English, Hindi, Kannada
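Before an archive is added, it is worth confirming that the recorded OAI/PMH base URL really answers the `Identify` verb; the response would be fetched from the base URL with `?verb=Identify` appended. The sketch below is an assumption about how such a check could look, not a description of the PKP harvester's own "Add Archive" form: `ArchiveEntry` is a hypothetical container for the details listed above, and the function compares only `protocolVersion` and `baseURL` from the Identify response.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


@dataclass
class ArchiveEntry:
    """Hypothetical container for the details gathered before adding an archive."""
    name: str
    sponsor: str
    software: str
    oai_base_url: str


def check_identify(entry: ArchiveEntry, identify_xml: bytes) -> List[str]:
    """Compare an Identify response with the recorded archive details.

    Returns a list of problems; an empty list means the endpoint looks usable.
    """
    root = ET.fromstring(identify_xml)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        return ["response contains no <Identify> element"]
    problems = []
    version = identify.findtext("oai:protocolVersion", default="", namespaces=OAI_NS).strip()
    if version != "2.0":
        problems.append(f"unexpected protocolVersion {version!r}")
    base_url = identify.findtext("oai:baseURL", default="", namespaces=OAI_NS).strip()
    # Ignore a trailing slash when comparing the advertised and recorded base URLs.
    if base_url.rstrip("/") != entry.oai_base_url.rstrip("/"):
        problems.append(f"baseURL mismatch: repository reports {base_url!r}")
    return problems
```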
  23. ADD ARCHIVE
  24. ARCHIVES
  26. BROWSING
  27. BROWSING
  28. SEARCHING
  29. Search result
  30. View Record
  31. View Original
  32. UniLIS repository: At present it includes 5 large-scale open access repositories in the LIS domain. In future it will also include LIS-specific open access journals, ETDs and other open access repositories, with the aim of developing a comprehensive local search service for open access resources in the LIS domain.
  33. THANK YOU