Smita Chandra, Librarian, Indian Institute of Geomagnetism firstname.lastname@example.org
What is a Repository?
• An open access digital archive running on open source software
• A managed, persistent way of making research, learning and teaching content with continuing value both discoverable and accessible
• Repositories can be subject-based or institutional in their focus
• Putting content into an institutional repository enables staff and institutions to manage and preserve it, and therefore derive maximum value from it
• A repository can support research, learning, and administrative processes
• Repositories are commonly used for open access research outputs
What is an institutional repository?
Clifford Lynch, Executive Director, Coalition for Networked Information, stated:
“In my view, a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”
Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age. ARL: A Bimonthly Report, no. 226 (February 2003) http://www.arl.org/resources/pubs/br/br226/br226ir.shtml
Open Access Institutional Repositories
What is open access (OA)?
There are many definitions. A 2006 report from the Joint Information Systems Committee (JISC) in the UK stated:
The Open Access research literature is composed of free, online copies of peer-reviewed journal articles and conference papers as well as technical reports, theses and working papers. In most cases there are no licensing restrictions on their use by readers. They can therefore be used freely for research, teaching and other purposes. (http://www.jisc.ac.uk/publications/publications/pub_openaccess_v2.aspx)
An open access institutional repository is a repository whose contents are freely available for use.
What OA is not
There are various misunderstandings about Open Access. It is not self-publishing, nor a way to bypass peer-review and publication, nor is it a kind of second-class, cut-price publishing route. It is simply a means to make research results freely available online to the whole research community. http://www.jisc.ac.uk/publications/briefingpapers/2006/pub_openaccess_v2.aspx
Gold and Green OA publishing
• Gold OA: uses a funding model that does not charge readers or their institutions for access, e.g. Ariadne, D-Lib Magazine and First Monday
• Green OA: authors publish papers in one of the 25,000 or so refereed journals in all disciplines and then self-archive these papers in open access/digital/institutional repositories
Institutional Repositories:
• Are centered around a university (or other academic institution) and contain items which are the scholarly output of that institution
• Are a collection of (digital) objects, in a variety of formats
• Include works of various degrees of scholarly authority and from various stages in the process of scholarly inquiry. In addition to published works, an IR may include preprints, theses & dissertations, images, data sets, working papers, course material, or anything else a contributor deposits
• Are typically motivated by a commitment to open access
Institutional Repositories
Institutions are logical implementers of repositories because they can take responsibility for:
• Centralising a distributed activity
• Framework and infrastructure
• Permanence that can sustain changes
• Stewardship of digital assets
• Preservation policy for long-term access
• Providing a central digital showcase for the research, teaching and scholarship of the institution
IRs & Digital Libraries
Institutional Repositories:
• Are organized around a particular institutional community
• Often are dependent upon the voluntary contribution of materials by scholars for the content in their collection
• Are mainly repositories and therefore may only offer limited user services
Digital Libraries:
• May be built around any number of organizing principles (often topic, subject, or discipline)
• Are the product of a deliberate collection development policy
• Typically include an important service aspect (reference and research assistance, interpretive content, or special resources)
How does IR content differ from other digital collections?
• Content is deposited in a repository by the content creator, owner, etc.
• Repository architecture manages the content and the metadata
• Repository software offers a minimum set of basic services: put, get, search
• Repository must be sustainable, trusted, well-supported and well-managed
Heery, R. and Anderson, S. (2005) Digital Repositories Review. UKOLN and AHDS. Available at: http://www.jisc.ac.uk/uploaded_documents/digital-repositories-review-2005.pdf
Origins & Development
• Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
• Digital Library
Why? – university view
• An institutional repository is a tangible indicator of the research output of a university, thus increasing its visibility, prestige and public value
• Repository content is readily searchable, both locally and globally
• Can be used as a marketing tool for the institution
• Allows an institution to manage its Intellectual Property Rights appropriately
Why? – funder’s view
• Funders see improved access to, and wider dissemination of, research
• For example, in the UK the eight research councils have adopted policies mandating that results from their taxpayer-funded research be ‘open’, available and accessible to all via IRs or similar subject repositories, e.g. Economic and Social Research Council http://www.esrc.ac.uk/_images/Full_text_decision_tree_tcm8-4138.pdf/
IRs can be used for:
• Scholarly communication
• Storing learning materials and coursework
• Managing collections of research documents
• Preserving digital materials for the long term
• Knowledge management
• Electronic publishing
• Research assessment exercises
• Collaboration
Benefits of setting up an institutional repository
For researchers:
• Showcases your institute’s output
• Increases citations for authors
• 24-hour access through any web-enabled device
• Life’s work in one location
• Satisfies funders’ mandates
• Persistent URLs
For librarians:
• Provides new ways of archiving & preserving valuable work
• Time-saving and cost-effective
• Helps to identify trends
• Reduces duplication of records
More Benefits
For the university:
• An effective marketing tool
• Increases visibility, reputation and prestige
• Greater interdisciplinary research
• Enhanced funding
• Facilitates gathering data, such as publications, for assessments
For the global community:
• Free access to scholarly information
• Taxpayers fund a large amount of scientific research
• Developing countries
• Increases public knowledge
• Gives access to a wide variety of materials
Publication and Deposition
• Author writes paper
• Submits to journal
• Deposits in e-print repository
• Paper is refereed
• Revised by author
• Author submits final version
What type of content can be deposited in an Institutional Repository?
Faculty:
• Pre-prints, post-prints, research findings, working papers, technical reports, conference papers
• Multimedia, videos, teaching materials, learning objects
• Data sets (scientific, demographic, etc.) and other ancillary research material
• Web-based presentations, exhibits, etc.
Students:
• Theses and dissertations
• Projects and portfolios
• Awarded research
• Performances and recitals
Starting & Maintaining an IR
Steps to Building an IR
1. Justify the relevance to the institution and contributors
2. Develop a policy framework. How will we find this content and what will we do with it?
3. Build the infrastructure
Bonus: Get institutional support and a mandate.
Starting & Maintaining an IR
IR Technology
• IR software (open source or commercial)
• OAI-PMH harvesting protocol/software (free)
• Intel/Pentium servers for the IR
• Linux/Red Hat OS, MySQL/PostgreSQL DBMS, Apache/Tomcat web server, Perl/Java (free)
Starting & Maintaining an IR
Core issues
• Policy Decisions
• Organizational Issues
• Cultural Issues
Starting & Maintaining an IR
Policy decisions
• Scope: Reinforce the repository’s active support for the institution’s mission, values and goals
  - Identify/build a context in which the repository is necessary
  - Multidiscipline / single subject / entire research output / database for each functional unit
• Types of documents: a separate database for each document type or a single one for all
• Software: OSS like DSpace or GNU EPrints, or develop own
• Research deposit types: theses, journal articles, preprints, reports, conference papers, book chapters, etc.
• Resources: human, IT, funding
• Stakeholders: library, each department, institute as a whole
• Services: focus on building services, not collections
Starting & Maintaining an IR
Management and Organizational Issues
• Deposit options
  - Researcher self-deposit and/or assisted deposit
  - Metadata quality: ensuring high-quality, rich metadata is labour intensive
• Digitization: born digital / scanning
• File formats: accept all, only PDF and/or other, conversion
• Only full-text database and/or bibliographic
• Copyright: RoMEO publishers’ copyright policies
• Quality assurance: peer review, editing
• Deposit Agreement and Use Agreement
  - Depositor’s declaration: non-exclusive license
  - Copyright/patent/trademarks
  - Repository’s rights and responsibilities: distribute, store, migrate, copy, rearrange, remove
  - Use Agreement: copy, distribute, display, share, author credit
Starting & Maintaining an IR
Cultural Issues
• Advocacy
  - Sensitive to organizational culture and background
  - Community size
  - Strategy: stakeholders, management committees
• Copyright
  - Concern of researchers, legal department
• Positioning
  - Library/institute website
Obstacles to building a repository in-house
• Open source institutional repository software is free to acquire but expensive to implement
• Delays due to slow response times from over-burdened IT services
• Lack of personnel with the correct skills
• Projects often go on for much longer than necessary
• Other priorities can crop up unexpectedly and divert resources away from the repository project
Four Widely Used Systems
• Digital Commons: Produced by Berkeley Electronic Press (bepress), focused on maintaining scholarly output. Not open source.
• EPrints: Developed at the University of Southampton (UK). Widely considered to be the least complex of the major repository software platforms.
• Fedora: Developed at Cornell and the University of Virginia. Based on a framework known as the Flexible Extensible Digital Object Repository Architecture.
• DSpace: Designed by MIT and Hewlett-Packard to manage the intellectual output of research institutions and provide for long-term preservation.
Subject/Discipline Based Repositories
Definition: Subject repositories are archives which collect and manage material relating to one or more related subject areas. A number currently exist, mainly within science subjects.
Subject repositories are often managed by an individual for a group.
Subject/Discipline Based Repositories
• Rely on peer interaction, with no mandate
• Individual agreements have to be struck
• No definitive boundaries
• Quality control issues
• Sustainability issues
• Transitory: collection at risk
• Responsibility for preservation
• Issues over the return on the money and effort invested
Subject/Discipline Based Repositories
Significant subject repositories include many using EPrints or DSpace software:
• ArXiv - http://www.arxiv.cornell.edu/ (physics, mathematics, non-linear science and computer science)
• Cogprints - http://cogprints.ecs.soton.ac.uk/ (cognitive sciences including psychology, neuroscience, linguistics and other related areas)
• CiteSeer - http://citeseer.nj.nec.com/cs (computer science)
• HTP Prints - http://htpprints.yorku.ca/ (history and theory of psychology)
• PubMedCentral - http://www.pubmedcentral.nih.gov/ (US National Library of Medicine’s digital archive of life sciences journal literature)
• PhilSci Archive - http://philsci-archive.pitt.edu/ (philosophy of science)
• E-LIS - http://eprints.rclis.org/ (library and information science)
• RePEc (Research Papers in Economics)
How Does an IR Work?
The Open Archival Information System (OAIS)
How Does an IR Work?
• Submission and ingestion: contributor, metadata, formatting, copyright
• Post-submission: quality metadata (DC), intellectual property issues
• User query
• Ongoing workflows: preservation, administration, data management, system customization
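The metadata check at submission can be illustrated with a small sketch. It assumes a deposit arrives as a dictionary keyed by simple Dublin Core (DC) element names. The 15 element names are those of simple Dublin Core; the list of required elements and the function name `check_submission` are hypothetical and stand in for whatever a repository's own policy demands.

```python
# The 15 elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Assumption: a local policy choice, not something Dublin Core requires.
REQUIRED = ("title", "creator", "date", "rights")


def check_submission(metadata):
    """Return a list of problems found in a deposit's metadata (empty if none)."""
    problems = []
    # Reject element names that are not part of simple Dublin Core.
    for name in metadata:
        if name not in DC_ELEMENTS:
            problems.append(f"unknown element: {name}")
    # Flag required elements that are absent or empty.
    for name in REQUIRED:
        if not metadata.get(name):
            problems.append(f"missing required element: {name}")
    return problems
```

An empty list means the deposit can move on to the post-submission quality review; anything else would be sent back to the contributor.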
OpenDOAR – Directory of Open Access Repositories
• The OpenDOAR service provides a quality-assured listing of open access repositories around the world.
• OpenDOAR staff harvest and assign metadata to allow categorisation and analysis to assist the wider use and exploitation of repositories.
• Each of the repositories has been visited by OpenDOAR staff to ensure a high degree of quality and consistency in the information provided.
• OpenDOAR is maintained by SHERPA consortium staff at the University of Nottingham, UK http://www.opendoar.org/about.html
Benefits of depositing material
• Increase in citations, impact and usage (useful for research evaluations such as the planned Research Excellence Framework in the UK in 2013)
• Increase in public research profile, both for the individual and for the institution
• Preservation of research outputs from the institution
ROAR - Registry of Open Access Repositories
• Aims to monitor overall growth in the number of eprint archives and to maintain a list of GNU EPrints sites (http://roar.eprints.org)
• Available from Southampton University, UK
• Data gathered automatically via OAI-PMH
• Also ROAR Materials Archiving Policies (ROARMAP): 163 institutional repositories, including Rourkela National Institute of Technology and Bharathidasan University in India (http://roarmap.eprints.org)
Other ‘overviews’ of IRs
• Repository66: a mash-up by Stuart Lewis, formerly of Aberystwyth, now at Auckland University, New Zealand, based on OpenDOAR and ROAR (http://maps.repository66.org/)
• World ranking of institutional repositories (http://repositories.webometrics.info/about_rank.html)
Searching Across Multiple IRs The use of OAI-PMH compliant metadata permits “one stop shopping”
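A minimal sketch of the "one stop shopping" idea: once metadata has been collected from several repositories in a common form, it can be merged into a single index and searched in one place. The record fields, repository names and function names here are hypothetical, and a real aggregator would index far more than titles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarvestedRecord:
    repository: str   # which repository the record came from
    identifier: str   # the record's identifier in that repository
    title: str


def build_index(harvests):
    """Merge harvests (repository name -> list of (identifier, title)) into one list."""
    return [
        HarvestedRecord(repository, identifier, title)
        for repository, items in harvests.items()
        for identifier, title in items
    ]


def search(index, term):
    """Case-insensitive title search across every harvested repository."""
    term = term.lower()
    return [record for record in index if term in record.title.lower()]
```

A single call to `search` then returns matches regardless of which repository holds the item, with the `repository` field telling the user where to fetch it.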
Repository architecture
• Largely institutional focus, though with some exceptions: arXiv, COGPRINTS, etc.
• Interoperability through centralized aggregators (national and global)
  - Search services (OAIster, Intute, …)
  - Registries (DOAR, ROAR, …)
• Harvesting metadata about content using OAI-PMH (metadata = simple Dublin Core)
• Content = PDF
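Harvesting with OAI-PMH can be sketched as two steps: build a `ListRecords` request asking for the `oai_dc` (simple Dublin Core) format, then read titles and creators out of the XML that comes back. The verb, the `metadataPrefix` argument and the XML namespaces are those of OAI-PMH 2.0; the base URL, the sample response and the function names are made up for illustration, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses carrying simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
        })
    return records


# Hypothetical response from a hypothetical repository, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:123</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample geomagnetic field study</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""
```

A production harvester would also need to follow resumption tokens for large result sets and handle deleted records, both of which this sketch leaves out.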
Constraints of IR
• Absence of a well-defined institutional policy
• Lack of IR expertise in India
• Insufficient funds for IT infrastructure and manpower
• Apathy of authors towards a time-consuming and lengthy deposition procedure
• Ignorance of users in the absence of an appropriate literacy program
Constraints of IR (Contd…)
• Publishers’ rigid attitude towards copyright policy
• Customization of open source software is a bottleneck
• Nature of content: classified/restricted and unclassified/open
• Diversity of content and of the languages used in the full texts
• Reliance on unproven methods for long-term digital preservation
Digital Preservation in IRs
Importance of Digital Information Preservation
• 1975: Two Viking space probes were sent to Mars by the USA.
• The data generated by this unrepeatable mission cost $1 billion.
• The data, recorded on magnetic tapes, was corrupted or unidentifiable after two decades despite being kept in a climate-controlled environment.
• Scientists could not access the data and were unable to decode the formats used.
Importance of Digital Information Preservation
• The original format developers were no longer alive.
• Finally, old printouts were tracked down and retyped.
• NASA is therefore the biggest supporter of digital preservation projects.
• This illustrates the wide gap between information generation and its management.
Threats
• Media decay and failure
  - Massive storage failures, outdated media
• Access component obsolescence
  - Outdated formats, applications & systems
• Human and software errors & external events
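One common safeguard against silent media decay is a fixity check: store a checksum for every file at deposit time and recompute it periodically. The sketch below uses SHA-256 from Python's standard library; the function names and the idea of keeping the manifest as a plain dictionary are assumptions for illustration, not a description of any particular repository platform.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """Build a manifest (path -> checksum) at deposit time."""
    return {str(path): sha256_of(path) for path in paths}


def verify_fixity(manifest):
    """Return the paths that are missing or whose content has changed."""
    damaged = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            damaged.append(path)
    return damaged
```

A fixity check only detects damage; recovering from it still depends on having intact copies elsewhere.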
Information Deluge
Present & Future Projections
• Yawning gap between:
  - Our ability to create digital information
  - Our infrastructure and capacity to manage and preserve it over time
• Cumulative effect foreseen as future “digital dark ages”
Need for Digital Preservation
• Preserving natural/cultural heritage
• Promoting academic research
• Enabling public access to legacy collections
IRs and Digital Preservation
• An IR is a model for a preservation system
• It requires “most essentially an organizational commitment to the stewardship of … digital materials, including long-term preservation where appropriate, as well as organization and access or distribution”
Attributes of a “Trusted Digital Repository”
“…an organisation that has responsibility for the long-term maintenance of digital resources, as well as making them available [through time and across changing technologies] to communities agreed on by the depositor and the repository.” Research Libraries Group http://www.rlg.org/longterm/attributes01.pdf
Definition: Digital Preservation
The maintenance of digital materials over the long term with a view to ensuring their continued accessibility. It ensures that digital resources are stored correctly and maintained adequately in the online world, such that they are available consistently for use over time.
“Long-term” includes timescales of decades or even centuries.
Preservation Strategies
• Technology preservation: keep the hardware alive
• Technology emulation: create an environment able to run the existing software
• Data migration: convert data to new formats to run in new applications
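Of the three strategies, data migration is the easiest to sketch in code: the repository keeps a table of formats it considers at risk and the format each should be converted to, then works out which files need converting. The table below is entirely hypothetical (a real one would come from the repository's preservation policy), and the sketch only plans the work; the conversion itself is left to external tools.

```python
from pathlib import PurePosixPath

# Hypothetical policy table: at-risk file extension -> migration target.
MIGRATION_TARGETS = {
    ".wpd": ".pdf",
    ".doc": ".pdf",
    ".bmp": ".tiff",
}


def plan_migration(filenames):
    """Return (source, target) pairs for files whose format is listed as at risk."""
    plan = []
    for name in filenames:
        suffix = PurePosixPath(name).suffix.lower()
        target_suffix = MIGRATION_TARGETS.get(suffix)
        if target_suffix:
            # Keep the file name, change only the extension.
            plan.append((name, str(PurePosixPath(name).with_suffix(target_suffix))))
    return plan
```

Files in formats not listed in the table are left alone, so the plan stays small and reviewable before any conversion is run.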
Open Archival Information System (OAIS)
• SIP = Submission Information Package
• AIP = Archival Information Package
• DIP = Dissemination Information Package
• Published by the Consultative Committee for Space Data Systems (CCSDS) in 2002; ISO 14721:2003 standard
• An archive consists of an organization of people and systems with responsibility to preserve information and make it available to users.
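The relationship between the three package types can be sketched as a pair of transformations: ingest turns a SIP into an AIP, and access turns an AIP into a DIP. OAIS is a reference model and does not prescribe fields or code, so every class field and function name below is an assumption chosen for illustration; the checksum and provenance note simply show the kind of preservation information an AIP might add.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class SIP:
    """What the producer submits."""
    producer: str
    content: bytes
    descriptive_metadata: dict


@dataclass
class AIP:
    """What the archive stores: content plus preservation information."""
    identifier: str
    content: bytes
    descriptive_metadata: dict
    fixity_sha256: str
    provenance: list = field(default_factory=list)


@dataclass
class DIP:
    """What a consumer receives in response to a request."""
    identifier: str
    content: bytes
    descriptive_metadata: dict


def ingest(sip, identifier):
    """Turn a SIP into an AIP, adding fixity and provenance information."""
    return AIP(
        identifier=identifier,
        content=sip.content,
        descriptive_metadata=dict(sip.descriptive_metadata),
        fixity_sha256=hashlib.sha256(sip.content).hexdigest(),
        provenance=[f"ingested from producer {sip.producer}"],
    )


def disseminate(aip):
    """Turn an AIP into a DIP after confirming the content is unchanged."""
    if hashlib.sha256(aip.content).hexdigest() != aip.fixity_sha256:
        raise ValueError("fixity check failed for " + aip.identifier)
    return DIP(aip.identifier, aip.content, dict(aip.descriptive_metadata))
```

The DIP deliberately omits the fixity and provenance fields: in this sketch they are internal to the archive, though a real repository may choose to expose them.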
OAIS: Definitions
To define an Open Archival Information System:
• The term open means that the document was developed in an open way, and does not imply that access to any OAIS should be unrestricted
• An archive is defined as an "organization that intends to preserve information for access and use by a designated community." (p. 1-8)
• While an OAIS itself need not be permanent, the information being maintained has been deemed to need "Long Term Preservation"
• Long term = long enough for there to be a concern about the impact of changing technologies
OAIS: Purpose and Scope
• Primary focus on digital information
• Specific aims include:
  - A framework for the understanding and awareness of the archival concepts needed for long-term preservation (access)
  - Terminology and concepts for describing and comparing architectures and operations, preservation strategies and techniques, and data models
  - Consensus on elements and processes for long-term preservation
  - A foundation for other standards
OAIS: Applicability
• Applicability:
  - Applicable to any archive, but mainly focused on organisations with responsibility for making information available for the long term
  - Of interest to those who create information
• Conformance:
  - An OAIS must support the information model, but the standard does not specify any particular method of implementation
  - Mandatory responsibilities (section 3.1)
Implementing OAIS
Summing up the fundamentals:
• OAIS is a reference model (conceptual framework), NOT a blueprint for system design
• It informs the design of system architectures and the development of systems and components
• It provides common definitions of terms, a common language and a means of making comparisons
• But it does NOT ensure consistency or interoperability between implementations
Summing Up: OAIS
• The OAIS model is a foundation stone for current and future digital preservation efforts
• It is already widely used to inform the development of preservation tools and repositories
• It could be used in the future as a basis for conformance
Research Objectives
1. To design an institutional repository using DSpace that is both sustainable and viable and can fulfill the long-term digital preservation of materials deposited into it
2. To map the Open Archival Information System (OAIS) Reference Model onto the in situ institutional repository, weigh the benefits of OAIS features against institutional repository usability, and identify the institutional repository challenges to the relevant features of the OAIS
3. To assess the applicability of products developed by projects employing the OAIS model to small and medium-sized institutional repositories, using the IIG institutional repository as a test bed
4. To ensure that the required policies, guidelines, strategies, procedures and agreements exist while implementing the OAIS model, so that digital preservation is embedded into IIG’s workflow
Conclusion from the study
This research was able to identify all the components necessary for the implementation of the OAIS model for a geoscience domain-specific institutional repository.