Institutional Repositories


  1. 1. Institutional Repositories & Discipline Based Repositories Pauline Simpson National Oceanography Centre, Southampton GRADE Kick Off Meeting 28 Sep 2005
  2. 2. Outline <ul><li>Geospatial data = </li></ul><ul><li>Institutional and Subject Repositories </li></ul><ul><li>Repository choices </li></ul><ul><li>Data Centres </li></ul><ul><li>Possible solutions </li></ul>
  3. 3. Geospatial Data <ul><li>Scope within GRADE </li></ul><ul><ul><li>Numerical data, raw and analyzed </li></ul></ul><ul><ul><li>Information Products </li></ul></ul><ul><ul><ul><li>Publications </li></ul></ul></ul><ul><ul><ul><li>CD-ROMs, DVDs </li></ul></ul></ul><ul><ul><ul><li>Learning Objects </li></ul></ul></ul>
  4. 4. Repositories are spreading because … <ul><li>Supplementary to traditional publication </li></ul><ul><li>Do not affect current research publication processes </li></ul><ul><li>Give easy access </li></ul><ul><li>Give rapid access </li></ul><ul><li>Give long-term access </li></ul><ul><li>Increase readership and use of material </li></ul><ul><li>They offer advantages to institutions </li></ul><ul><li>They offer advantages to research funders </li></ul><ul><li>They offer new ways for information to be linked and used </li></ul>
  5. 5. Subject/Discipline Based Repositories <ul><li>Rely on peer interaction – no mandate </li></ul><ul><li>Individual agreements have to be struck </li></ul><ul><li>No definitive boundaries </li></ul><ul><li>Quality control issues </li></ul><ul><li>Sustainability issues </li></ul><ul><li>Transitory – collection at risk </li></ul><ul><li>Responsibility for preservation </li></ul><ul><li>Issues over the return on the money and effort invested </li></ul><ul><li>? A trusted repository? Supported by …. </li></ul>Subject repositories are often managed by an individual for a group.
  6. 6. Subject repositories are archives which collect and manage material relating to one or more related subject areas. A number currently exist, mainly within science subjects. <ul><li>Significant subject repositories, many of them using e-Prints or DSpace software, include: </li></ul><ul><li>arXiv - http:// / (physics, mathematics, non-linear science and computer science) </li></ul><ul><li>Cogprints - http:// / (cognitive sciences including psychology, neuroscience, linguistics and other related areas) </li></ul><ul><li>CiteSeer - http:// (computer science) </li></ul><ul><li>HTP Prints - http:// / (history and theory of psychology) </li></ul><ul><li>PubMedCentral - http:// / (US National Library of Medicine's digital archive of life sciences journal literature) </li></ul><ul><li>PhilSci Archive - / (philosophy of science) </li></ul><ul><li>E-LIS - http:// / (library and information science) </li></ul><ul><li>RePEc (Research Papers in Economics) </li></ul>
  7. 7. Institutional Repositories <ul><li>Freely accessible web-based databases providing access to the full text of scholarly material produced by members of an institution. </li></ul><ul><li>Digital collections that capture and preserve the intellectual output of the communities. </li></ul><ul><li>What are the essential elements? </li></ul><ul><li>Institutionally defined: content generated by the community </li></ul><ul><li>Scholarly content: published articles, books, book sections, preprints and working papers, conference papers, enduring teaching materials, student theses, data-sets, etc. </li></ul><ul><li>Cumulative & perpetual: preserve ongoing access to material </li></ul><ul><li>Interoperable & open access: free, online, global, utilising standards: OAI, Dublin Core, etc. </li></ul>
  8. 8. Institutional Repositories <ul><li>Institutions are logical implementers of repositories because they can take responsibility for: </li></ul><ul><li>– Centralising a distributed activity </li></ul><ul><li>– Framework and Infrastructure </li></ul><ul><li>– Permanence that can sustain changes </li></ul><ul><li>– Stewardship of Digital assets </li></ul><ul><li>– Preservation policy for long term access </li></ul><ul><li>– Providing a central digital showcase for the research, teaching and scholarship of the institution </li></ul><ul><li>“a trusted repository” supported by the Information Community </li></ul>
  9. 9. Institutional Repository Software for geo data <ul><li>OSI Directory of Institutional Repository Software V.3 http:// / </li></ul><ul><li>E-Prints (GNU) [ http:// / ]. Open-source OAI-compliant software developed at the University of Southampton to enable anyone to set up their own Open Archives-compliant institutional archive. Originally programmed for subject repositories but now re-engineered for IR. Does not identify treatment of datasets, though it can cover bibliographic description. </li></ul><ul><li>DSpace: Durable Digital Depository [ http:// / ]. Open-source software developed at MIT for their own repository; released as open source software in Nov. 2002. Overtly identifies datasets. Offers opportunity to explore the issues surrounding the incorporation of different metadata standards within one system…. Different disciplines have adopted different sets of metadata standards to accommodate their particular data needs. Two examples are the CSDGM standard for geospatial data and the DICOM standard for digital imaging in medicine. … develop more general standards, such as Dublin Core, which proposes a basic set of common elements that can be used across many different disciplines and document types. </li></ul><ul><li>(DC and MARC are norms) </li></ul>
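The idea of a basic set of common elements can be made concrete with a small sketch. The Python below builds an unqualified Dublin Core record in the `oai_dc` XML container used with OAI. The two namespace URIs are the published ones; the helper name `build_dc_record` and every field value are invented for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Namespaces for unqualified Dublin Core and its OAI container format.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def build_dc_record(fields):
    """Return an oai_dc XML string from a dict of element name -> list of values."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description; all values are made up for this sketch.
record = build_dc_record({
    "title": ["Example cruise temperature profiles"],
    "creator": ["Example, A."],
    "type": ["Dataset"],
    "coverage": ["North Atlantic"],
    "date": ["2005-01-01"],
})
print(record)
```

The same five elements would describe a paper, a CD-ROM or a learning object equally well, which is the appeal of a general standard; the cost is that discipline-specific detail (for example the spatial reference information in CSDGM) has no dedicated place.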
  10. 10. need to register to search
  11. 11. - information products
  12. 12. Repository Choices <ul><li>Subject - arXiv, Cogprints, RePEc </li></ul><ul><li>Institutional – Southampton, Glasgow, Nottingham (SHERPA), MBA UK </li></ul><ul><li>National - DARE (all universities in the Netherlands), Scotland, British Library (proposal) </li></ul><ul><li>National / Subject - ODINPubAfrica </li></ul><ul><li>International - Internet Archive ‘Universal’, OAIster </li></ul><ul><li>Regional - White Rose UK </li></ul><ul><li>Consortia - SHERPA-LEAP (London E-prints Access Project) </li></ul><ul><li>Funding Agency – NIH (PubMed), Wellcome Trust (UK PubMed), NERC </li></ul><ul><li>Project - Public Knowledge Project EPrint Archive </li></ul><ul><li>Conference - 11th Joint Symposium on Neural Computation, May 15 2004 </li></ul><ul><li>Personal – peer to peer, web pages etc </li></ul><ul><li>Media Type - VCILT Learning Objects Repository, NTDL (Theses) </li></ul><ul><li>Publisher – journal archives </li></ul><ul><li>Data Repositories/Archives - NODC, BODC, DOD, JODC, BADC etc </li></ul><ul><ul><li>Science, particularly Environmental Science, is well served </li></ul></ul><ul><ul><li>Logical host for numeric datasets </li></ul></ul>
  13. 13. Data Centres / Archives / Repositories <ul><li>Within organisational infrastructures but not defined by them </li></ul><ul><li>National responsibilities </li></ul><ul><li>Subject and Technical Specialists, quality control of content </li></ul><ul><li>Secure storage and migration policies </li></ul><ul><li>Well developed Metadata schema & Standards </li></ul><ul><ul><li>DIF – Directory Interchange Format, FGDC etc </li></ul></ul><ul><ul><li>ISO 19115 </li></ul></ul><ul><ul><ul><ul><li>the minimum set of metadata required to serve the full range of metadata applications (data discovery, determining data fitness for use, data access, data transfer, and use of digital data); </li></ul></ul></ul></ul><ul><ul><ul><ul><li>optional metadata elements - to allow for a more extensive standard description of geographic data, if required; </li></ul></ul></ul></ul><ul><ul><ul><ul><li>a method for extending metadata to fit specialized needs. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Though ISO 19115:2003 is applicable to digital data, its principles can be extended to many other forms of geographic data such as maps, charts, and textual documents as well as non-geographic data. </li></ul></ul></ul></ul><ul><li>“a trusted repository” supported by the Data Management Community </li></ul>
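The three-part structure listed for ISO 19115 (a minimum set, optional elements, and an extension method) can be sketched as a simple record type. This is only an illustration of the layering: the class `GeoMetadata` and all field names are hypothetical stand-ins and are not the element names defined by the standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GeoMetadata:
    # "Core" fields: illustrative stand-ins for a minimum required set,
    # NOT the actual ISO 19115 element names.
    title: str
    reference_date: str
    abstract: str
    # Optional fields for a more extensive description.
    bounding_box: Optional[tuple] = None  # (west, south, east, north) in degrees
    lineage: Optional[str] = None
    # Extension point for community-specific needs.
    extensions: dict = field(default_factory=dict)

    def missing_core(self):
        """Return the names of core fields that are empty."""
        core = ("title", "reference_date", "abstract")
        return [name for name in core if not getattr(self, name).strip()]


# Hypothetical record with an empty abstract and one community extension.
record = GeoMetadata(
    title="Example bathymetry grid",
    reference_date="2005-01-01",
    abstract="",
    extensions={"instrument": "multibeam echosounder"},
)
print(record.missing_core())  # ['abstract']
```

A data centre's quality control step could refuse any record for which `missing_core()` is non-empty, while still accepting whatever optional and extension fields a discipline chooses to supply.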
  14. 14. <ul><li>ARCHIMEDE: A Canadian software solution for institutional repositories [ http:// ]. OAI-compliant software developed by Laval University Library. Archimede has been developed from a multilingual perspective, with internationalization as a focus. The text (or content) of the interface is independent and not embedded in the code, making it relatively easy to develop an interface in a specific language without having to work on the code itself. English, French and Spanish interfaces are already offered in Archimede. That feature also allows users to switch easily from language to language at any point during search and retrieval. </li></ul><ul><li>Berkeley Electronic Press [ http:// ]. Commercial OAI-compliant software used by the University of California’s eScholarship Repository. </li></ul><ul><li>CERN Document Server Software (CDSware) [ http:// / ]. OAI-compliant software developed by, maintained by, and used at, the CERN Document Server. </li></ul><ul><li>Project Tapir [ ]: Tapir provides additional functionality to the digital asset management software DSpace, primarily designed for Electronic Theses and Dissertations supervision, submission and dissemination. See Queen's University Project. </li></ul><ul><li>Fedora™ Project: An Open-Source Digital Repository Management System [ http:// / ]. Jointly developed by the University of Virginia and Cornell University, Fedora is a general-purpose digital object repository system that can be used in whole or part to support a variety of use cases including: institutional repositories, digital libraries, content management, digital asset management, scholarly publishing, and digital preservation. </li></ul><ul><li>Greenstone [ = p&p =home ]. Suite of open-source multilingual software for building and distributing digital library collections. Produced by the New Zealand Digital Library Project at the University of Waikato, and developed and distributed (since 2000) in cooperation with UNESCO and the Human Info NGO. Presently in limited use at New Zealand Digital Library Project and some other sites. </li></ul><ul><li>OCLC Research Software [ http:// ]. A list of open source software developed by the Online Computer Library Center (OCLC) to build a repository and harvest data according to OAI-PMH standards. </li></ul><ul><li>FIGARO, i-TOR, etc </li></ul>
  15. 15. Dilemma for Researcher <ul><li>Mandates from major funding agencies now require grantees to deposit research output in a ‘designated repository’ or ‘any’ </li></ul><ul><ul><li>Wellcome Trust (UK PubMed) - £400 million producing 3500 papers per year </li></ul></ul><ul><ul><li>RCUK </li></ul></ul><ul><li>Where should the full text of their research be deposited? </li></ul><ul><li>Researcher wants to enter metadata and deposit only once, and perhaps deposit all related material in one place? </li></ul><ul><li>Situation at present </li></ul><ul><ul><li>Harvesting, but harvester is not the choice of the depositor </li></ul></ul><ul><ul><li>Duplicate keying of metadata into repositories of choice </li></ul></ul><ul><ul><li>Cannot target multiple repositories with one exercise </li></ul></ul><ul><li>Does it matter where it is deposited, since Google Scholar, Yahoo, Scopus will pick it up wherever it is? </li></ul>
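Harvesting in this setting usually means OAI-PMH, where the harvester issues HTTP requests against a repository's base URL. As a rough sketch, the Python below only constructs a `ListRecords` request; `verb`, `metadataPrefix`, `set` and `from` are standard OAI-PMH arguments, while the function name and the endpoint `https://repository.example.org/oai` are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one set
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; real repositories publish their own base URL.
url = list_records_url("https://repository.example.org/oai", from_date="2005-01-01")
print(url)
```

Note that nothing in the request involves the depositor: whoever runs the harvester decides which repositories are polled, which is the limitation the slide points to.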
  16. 16. Repositories taking over the world? <ul><li>Turf War </li></ul><ul><ul><li>Not between Institutional and Subject Repositories – complementary and should coexist </li></ul></ul><ul><ul><li>Possibly between Text based and Numeric based repositories </li></ul></ul><ul><ul><ul><li>Repositories of whatever flavour v. Data Centres </li></ul></ul></ul><ul><ul><ul><ul><li>Are both spilling over into each other's territory? </li></ul></ul></ul></ul><ul><li>The Cavalry: JISC Digital Repositories Programme </li></ul><ul><ul><ul><ul><li>Strand: Linking Text and Data </li></ul></ul></ul></ul>
  17. 17. [Diagram relating the following elements: Repositories (institutional, e-prints, subject, data, learning objects); Research & e-Science workflows; Learning & Teaching workflows; Data creation / capture / gathering (laboratory experiments, Grids, fieldwork, surveys, media); Data analysis, transformation, mining, modelling; Deposit / self-archiving; Peer-reviewed publications (journals, conference proceedings); Publication; Validation; Quality assurance bodies; Learning object creation, re-use; Aggregator services; Harvesting metadata; Searching, harvesting, embedding; Resource discovery, linking, embedding; Institutional presentation services (portals, Learning Management Systems, u/g, p/g courses, modules); Presentation services (subject, media-specific, data, commercial portals).] From: Lyon: CNI - JISC - SURF Conference, May 2005
  18. 18. CLADDIER Project ** (Citation, Location And Deposition in Discipline and Institutional Repositories) <ul><li>The CLADDIER system will be a step on the road to a situation where (in this case, environmental) scientists will be able to move seamlessly from information discovery (location), through acquisition, to deposition of new material, with all the digital objects correctly identified and cited. The lessons learned will be applicable to the relationships between other discipline based repositories and institutional repositories. </li></ul><ul><li>**JISC Digital Repositories Programme 2005 - </li></ul>
  19. 19. <ul><li>Persistent identifiers </li></ul><ul><li>semantically transparent </li></ul><ul><li>Versioning </li></ul><ul><li>Dataset Citations </li></ul><ul><li>Publishing practice </li></ul><ul><li>Automated Linking both ways </li></ul>
  20. 20. Where to Deposit <ul><li>One outcome of CLADDIER Project </li></ul><ul><li>‘pull’ = Harvesting </li></ul><ul><li>‘push’ = CLADDIER outcome </li></ul><ul><ul><li>Enable researcher to deposit in one repository and choose to upload (push) the metadata to another repository of choice. </li></ul></ul><ul><ul><li>Logical to ‘push’ from IR to Subject? </li></ul></ul><ul><ul><li>Redundancy of records? </li></ul></ul>
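The difference between ‘pull’ and ‘push’ can be shown with a toy model. The sketch below is not the CLADDIER design, which the slides do not describe in detail; it is a minimal in-memory illustration, with a hypothetical `Repository` class, of who chooses the destination in each case.

```python
class Repository:
    """Toy in-memory repository holding metadata records keyed by identifier."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def deposit(self, identifier, metadata, push_to=()):
        """Store a record locally, then push a copy to each repository the depositor chose."""
        self.records[identifier] = metadata
        for target in push_to:
            target.records[identifier] = dict(metadata, source=self.name)

    def harvest(self, source):
        """Pull every record from a source repository; the harvester decides, not the depositor."""
        for identifier, metadata in source.records.items():
            self.records.setdefault(identifier, dict(metadata, source=source.name))


institutional = Repository("institutional")
subject = Repository("subject")
aggregator = Repository("aggregator")

# Push: the researcher deposits once and names the second repository.
institutional.deposit("item-1", {"title": "Example paper"}, push_to=[subject])
# Pull: an aggregator harvests whatever the institutional repository exposes.
aggregator.harvest(institutional)
```

After this runs, the same record exists in three places, each copy tagged with its source. That is exactly the redundancy question raised on the slide: a shared persistent identifier is what lets the copies be recognised as one item.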
  21. 21. Thank You <ul><li>Pauline Simpson ( [email_address] ) </li></ul>
  22. 22. Data Centres <ul><li>Discovery metadata - What data sets hold the sort of data I am interested in? This enables organisations to know and publicise what data holdings they have. </li></ul><ul><li>Exploration metadata - Do the identified data sets contain sufficient information to enable a sensible analysis to be made for my purposes? This is documentation to be provided with the data to ensure that others use the data correctly and wisely. </li></ul><ul><li>Exploitation metadata - What is the process of obtaining and using the data that are required? This helps end users and provider organisations to effectively store, reuse, maintain and archive their data holdings. </li></ul>