Create your own digital repository

  • I am the Digital Projects Coordinator for the Getty Research Institute, and am heavily involved in new digital initiatives at the Research Institute. In my career I have administered library systems, collections management systems, digital asset management systems and repository systems, so I am here to give you a basic survey of the repository landscape, just enough that you can recognize most of the issues that will be important if you implement a digital repository of your own.
  • A. Terminology and Scenarios
    Digital Library, Digital Repository, Institutional Repository, Trusted Digital Repository, Digital Asset Management, Content Management System, Digital Commons... are they all the same thing?
    Is the underlying purpose of the repository for preservation, access, or data management?
  • Geoff Payne of ARROW defines an IR as “a managed collection of digital objects, institutional in scope, with consistent data and metadata structures for similar objects, enabling resource discovery by the ‘Communities of Practice’ for whom the objects are of interest”
  • In virtually every system description you will hear about OAIS compliance.
    The Open Archival Information System…spells out a model for sharing digital objects: establishes a common framework of terms and concepts, identifies the basic functions (ingest, data management, archival storage, administration, access, preservation planning), and defines the information model and types of metadata needed.
    An important thing to remember is that it is a conceptual model – it isn’t a specific architecture
  • 2) Scenarios - No one solution for all situations; continuum of access and preservation
    While I would hesitate to say that there are very many implementations that are purely Access and Access only, it is possible, for instance, to create a repository of materials that you don’t own, and therefore have no stake in preserving – as opposed to a Trusted Digital Repository that might be a “dark archive” and have virtually no access component (except if you can’t get at it, what is the point of preserving it?) Elsewhere managers of institutional information systems, approaching this area from a different perspective, look to content management systems, or digital asset management systems to fulfill many of the roles of an institutional repository. Anyway – these placements along the continuum could be argued, but the point is that you could choose to place your repository anywhere along this continuum.
  • As you begin to plan for your repository, you must ask yourself what your goals are. For instance, are you replacing printed image collections with digital image collections to support curriculum? Are you creating a depository for your community of practice? Are you creating collections of document surrogates, with item level description, to reduce the need for handling fragile originals? Are you digitizing collections of images for which you will never provide item level description? Are you creating catalog records of objects and images, pointing users to repositories and originals? Are you licensing databases and making them available? Or are you building some combination of the above?
  • For whom are you building your repository?
    How much do you understand about your users?
    Do your users want unmediated access to large collections of digital objects?
    Or do they want interpreted objects in the context of a broader story?
    You need to pay attention to identifying audiences.
    Knowing for whom and why is crucial in developing a plan, since every image carries a price.
    Must pay attention to the long-term implications of choices made today.
    At least in the near term, it is unlikely that you can equally serve all user groups. Ideally, you would build custom interfaces for different user groups – all accessing the same digital objects.
  • File formats supported - Text, images (including complex image objects), datasets, video, audio, learning objects, etc.
  • Free for users, open source software does generate concern regarding support and maintainability
    Proprietary: Pay for the software and consulting or subscription fees – maintenance fees. The vendor owns and maintains the code
    Open Source: free downloadable software that can be customized and enhanced
    Examples: DSpace, EPrints, Fedora, Greenstone
    Software Service Model: Vendor owns and distributes software – hosts and manages your data for you.
  • Metadata costs can be half the total cost of a digitization project.
    The creation of appropriate metadata depends on anticipating needs of the user.
    Dublin Core
    METS (Metadata Encoding and Transmission Standard) is the most suitable existing metadata format for creating these distributed complex objects, and it will likely contain other metadata such as MIX (Metadata for Images in XML Schema) and MODS (Metadata Object Description Schema, which is based on MARC 21). There are also other, domain-specific metadata schemas.
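    To make the simplest of these formats concrete, here is a minimal sketch of a simple Dublin Core record built with Python's standard library. The two namespace URIs are the published Dublin Core and OAI ones; the helper name `build_dc_record` and all field values are hypothetical placeholders, not data from any real collection.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for simple Dublin Core and its OAI container.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def build_dc_record(fields):
    """Return an oai_dc:dc element with one dc:* child per (name, value) pair."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical sample values, for illustration only.
record = build_dc_record([
    ("title", "Sample album"),
    ("type", "Image"),
    ("format", "image/tiff"),
    ("identifier", "example-0001"),
])

dc_xml = ET.tostring(record, encoding="unicode")
print(dc_xml)
```

    Every Dublin Core element is optional and repeatable, which is why the sketch takes a list of pairs rather than a fixed set of arguments.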
  • “Metadata” is often used interchangeably (and confusingly) with “data”
    “Metadata” is often used to refer to meta tags on HTML pages on the web
    Definition: “data about data”
    Information about any aspect of a resource - size, location, attributes, topic, origin, use, audience, creator, quality… the list is endless
  • Metadata aids in the discovery, identification, assessment, and management of described entities
    Impact of metadata on collection access
    Without metadata there is no service to users
    Metadata provides the means for resource discovery, grouping, filtering, matching user needs
    Keyword searching works only for resources that are text-based - excludes photographs, data sets, objects, maps, audio, video…
    Metadata itself as valuable content
    Item descriptions, Finding aids, Reviews
  • Administrative: for managing and administering information resources (e.g. location information, version control)
    Descriptive: for the description or identification of information resources (e.g. specialized indexes, finding aids, individual object records)
    Preservation: for the preservation management of information resources (e.g. documentation of data “refreshing” and migration)
    Technical: related to how a system functions or how metadata behaves (e.g. hardware and software documentation, tracking of system response times)
    Structural?
    Use: (e.g. use and user tracking, usability studies)
    Descriptive – What is it?
    Description of the item
    Discovery – How can I find it?
    Access points
    Administrative – When was it created, and can I use it?
    Viewing, maintenance, IPR
    Structural – What files comprise it?
    Navigation and organization
    Preservation – which key characteristics of the resource need to be maintained?
    Identifiers – How can I get to it?
  • While a repository will obviously support the institution’s management of its assets, the real power is in the combined network of repositories. Hence OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), a harvesting protocol for sharing metadata about digital objects; Dublin Core is the standard metadata format.
    Some implementations report that the simple DC default specified in OAI-PMH is not adequate to describe repository resources effectively.
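    As a rough illustration of what harvesting looks like on the wire, the sketch below builds an OAI-PMH `ListRecords` request URL. The argument names (`verb`, `metadataPrefix`, `set`, `from`) and the `oai_dc` prefix come from the protocol itself; the base URL and the function name `list_records_url` are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; a real repository publishes its own OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict the harvest to one set
    if from_date:
        params["from"] = from_date    # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, from_date="2007-10-01")
print(url)

# Parse the query string back out to show which arguments were sent.
query = parse_qs(urlsplit(url).query)
print(query)
```

    A repository that finds simple DC too thin can expose a richer format under a different `metadataPrefix`, provided the harvester asks for it.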
  • METS (Metadata Encoding & Transmission Standard)
    METS is an XML schema designed for creating XML document instances that express the hierarchical structure of digital objects, the names and locations of the files that comprise those objects, and the associated metadata.
    What is METS used for?
    To package metadata with digital objects in XML syntax
    For retrieving, storing, preserving, and delivering digital resources
    For interchange/contribution of digital objects with their associated metadata
    As an information package in a digital repository
    Who is responsible?
    METS is an initiative of the Digital Library Federation (DLF)
    The principal author is Jerry McDonough (New York University)
    The Library of Congress is the maintenance agency http://www.loc.gov/standards/mets/
    The METS Editorial Board is responsible for schema content
    Characteristics
    METS is:
    an open standard
    non-proprietary
    developed by the library community
    (relatively) simple
    extensible
    modular
  • A Chinese Album, presented in the METS viewer built into RLG Cultural Materials. Note that it displays some very basic descriptive metadata at the top of the page.
  • A descriptive metadata standard
    Why MODS?
    XML (Extensible Markup Language) is (or will be) the markup language for the web
    LC is investigating XML as a new more flexible syntax for MARC element set
    There is a need for rich descriptive metadata in XML, but simpler than full MARC, especially for complex digital library objects
    Potential Uses
    As a rich (but not too rich) XML metadata format for emerging initiatives
    Z39.50 Next Generation specified format
    to represent metadata for harvesting (OAI)
    As an interoperable core for convergence between MARC and non-MARC XML descriptions
    Acceptable format for RLG Cultural Materials repository
    Advantages
    The metadata element set is richer than Dublin Core
    The MODS element set is more compatible with existing descriptive records (MARC) than Dublin Core
    MODS is a rich descriptive element set that works well with hierarchical METS digital objects
    Features
    Uses language-based tags (unlike MARC)
    Elements generally inherit semantics of MARC
    Elements particularly applicable to digital resources
    MODS does not assume the use of any specific rules for description (Is this good or bad?)
    Use of XML schema allows for flexibility and availability of freely available tools
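    For a sense of what “language-based tags” means in practice, here is a minimal sketch of a MODS record built with Python's standard library. The element names (`titleInfo`, `name`, `typeOfResource`, `originInfo`) and the namespace are from the MODS schema; the helper names and all sample values are hypothetical, and the output is not validated against the schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def mods(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods(title, creator, resource_type, date_created):
    """Return a small MODS record using four of the top-level elements."""
    root = ET.Element(mods("mods"))
    title_info = ET.SubElement(root, mods("titleInfo"))
    ET.SubElement(title_info, mods("title")).text = title
    name = ET.SubElement(root, mods("name"))
    ET.SubElement(name, mods("namePart")).text = creator
    ET.SubElement(root, mods("typeOfResource")).text = resource_type
    origin = ET.SubElement(root, mods("originInfo"))
    ET.SubElement(origin, mods("dateCreated")).text = date_created
    return root


# Hypothetical sample values, for illustration only.
mods_record = build_mods("Sample album", "Unknown artist", "still image", "1800")
print(ET.tostring(mods_record, encoding="unicode"))
```

    Compare the readable `titleInfo` and `namePart` tags with the numeric fields a MARC record would use for the same information.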
  • Diagram of a Digital Object Expressed in METS
    Here’s an attempt at a visualization of how all these parts of the METS object play together.
    Header (metsHdr)
    Metadata on the METS Object itself
    Descriptive Metadata (dmdSec)
    Metadata on the physical artifact
    Content Files (fileSec)
    Declares all datafiles comprising the METS Object
    Structural Map (structMap)
    Hierarchical arrangement of the datafiles
    Administrative Metadata (amdSec)
    Technical
    Metadata on the digital file
    NISO Metadata for Images in XML Standard (MIX) http://www.loc.gov/standards/mix/
    Rights
    Metadata specifying legal access
    Source
    Metadata on the source of digitization
    Digital Provenance Information
    Metadata on the life-cycle of the file
    Behaviors (behaviorSec)
    Executable behaviors associated with a METS Object
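    The parts listed above can be sketched as a skeleton METS document. This is only an illustration: the element and attribute names come from the METS schema, but the function `build_mets`, the IDs, and the file URLs are hypothetical. The metadata sections are left empty, the optional behaviorSec is omitted, and the result is not validated against the schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(file_urls):
    """Return a skeleton METS object with one page division per file."""
    root = ET.Element(m("mets"))
    ET.SubElement(root, m("metsHdr"))             # metadata on the METS object itself
    ET.SubElement(root, m("dmdSec"), ID="DMD1")   # descriptive metadata (e.g. MODS) goes here
    ET.SubElement(root, m("amdSec"), ID="AMD1")   # technical, rights, source, provenance
    file_sec = ET.SubElement(root, m("fileSec"))  # declares every data file
    file_grp = ET.SubElement(file_sec, m("fileGrp"), USE="master")
    struct_map = ET.SubElement(root, m("structMap"), TYPE="physical")
    top_div = ET.SubElement(struct_map, m("div"), TYPE="album", DMDID="DMD1")

    for order, url in enumerate(file_urls, start=1):
        file_id = f"FILE{order}"
        file_el = ET.SubElement(file_grp, m("file"), ID=file_id, ADMID="AMD1")
        ET.SubElement(file_el, m("FLocat"),
                      {f"{{{XLINK_NS}}}href": url, "LOCTYPE": "URL"})
        # The structural map points back at the file through its ID.
        page = ET.SubElement(top_div, m("div"), TYPE="page", ORDER=str(order))
        ET.SubElement(page, m("fptr"), FILEID=file_id)
    return root


# Hypothetical file locations, for illustration only.
mets_doc = build_mets([
    "https://repository.example.org/files/page1.tif",
    "https://repository.example.org/files/page2.tif",
])
section_names = [child.tag.split("}")[1] for child in mets_doc]
print(section_names)
```

    The point to notice is the cross-referencing by ID: the structural map never names a file directly, it points to entries in the file section, which in turn point to their administrative metadata.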
  • How LC is using MODS
    To describe electronic resources
    AV project, web archiving
    Incorporation with XML resources
    METS projects for digital resources
    OAI collections
    LC offers MODS, MARCXML, DC simple for OAI harvesting
  • Repository in context of other digital systems in institution
    A high priority should be to try to ensure that efforts are coordinated, and to follow common standards and architectures where possible.
    What are standards and why do you need them?
    mutually accepted guidelines that promote the consistent encoding and recording of data
    fundamental to the efficient exchange of information
    essential for meaningful search-and-retrieval of information
    to improve the quality and consistency of information
    to improve compatibility of information structures
    to protect the long-term value of data
    to facilitate information retrieval
    to facilitate information exchange
  • The tension between protecting intellectual property rights and broadening access has become a serious issue for the selection process.
  • 1) What constitutes success?
    2) Web 2.0 and other futures?
  • Create your own digital repository

    1. Create Your Own Digital Repository. Leah Prescott, Getty Research Institute, October 18, 2007. Master of St. Bartholomew, The Meeting of the Three Kings (detail), ca. 1480, J. Paul Getty Museum. DIGITIZING IN A MATERIAL WORLD
    2. What is a Digital Repository? Digital Library, Digital Repository, Institutional Repository, Trusted Digital Repository, Digital Asset Management, Content Management System, Digital Commons... Are they all the same thing?
    3. What is a Digital Repository? “A managed collection of digital objects... with consistent data and metadata structures for similar objects, enabling resource discovery by the ‘Communities of Practice’ for whom the objects are of interest” Geoff Payne
    4. What is a Digital Repository? OAIS Reference Model: Open Archival Information System
    5. What is a Digital Repository? Access, Preservation; Institutional Repository, Asset Management System, Collections Management System, Trusted Digital Repository
    6. Emblem (“Premitur, non opprimitur”) from Claude Paradin, The heroicall devises of M. Claudius Paradin (London: W. Kearney, 1591) http://emblem.libraries.psu.edu/parad176.htm Planning for a Repository
    7. Planning for a Repository: What is the mission? What kinds of content will you provide? Who are the key users? Who are the key stakeholders? What can you afford? What are your top service priorities?
    13. Planning for a Repository: What Users Want from Digital Image Collections ◊ Users tend to be impatient with long waits ◊ Users need both image and text versions ◊ Users are generally content with current image quality ◊ Completeness and legibility of pages with minimal scrolling is the primary user requirement for text-based documents ◊ Users want multiple views to support their different research needs and some are interested in tonal and color fidelity
    16. Planning for a Repository: What Users Want from Digital Image Collections ◊ Large format graphical images such as maps are difficult to display and fully comprehend online – need tools that provide zoom, pan, and peripheral-view capabilities ◊ Users prefer a simple interface ◊ Users want to navigate the structural and intellectual content of image collections – backward and forward, jump to specific pages, table of contents, index
    25. Planning for a Repository: What Users Want from Digital Image Collections ◊ Users want to manipulate graphic images: zoom to view detail; examine two or more images side by side; save search result sets; sort search result sets; export images into other software; produce high-quality printouts; annotate images with comments and save them to a notebook; use image-editing tools
    28. Planning for a Repository: What Users Want from Digital Image Collections ◊ Users need a variety of search functions ◊ Preferences for access points vary according to the collection and the type of image: some perform searches for known items; others pose queries that contain a preponderance of subject terms ◊ Search engines, query formulation and database indexing affect search results
    29. Planning for a Repository: System Requirements – Technical. File formats supported; Metadata standards; Interoperability: OAI compliance, Z39.50, etc.; Persistent URL; Search/Browse of metadata; Full-text search; Workflow, submission for content approval; User authentication; Customization features
    30. Planning for a Repository: System Requirements – Technical. Free vs. commercial software (licence, subscription fees); Open Source vs. Proprietary; Technical support available for fee vs. free: by phone, by email, via online forms
    31. Planning for a Repository: System Requirements – Metadata
    32. Planning for a Repository: System Requirements – Metadata. What is it? “data about data” – data categories; data describing a discrete data object or objects; cataloging or indexing information created to arrange, describe, and otherwise enhance access to an information object
    33. Why is metadata important? For enhanced accessibility; for retention of context; for expanding use; for multi-versioning; for legal issues; for preservation of data
    34. Types of Metadata: Administrative, Descriptive, Preservation, Technical, Structural, Use
    35. Dublin Core: http://dublincore.org/
    36. METS: A Metadata “Wrapper” for Digital Information Objects
    37. Sample METS Object
    38. MODS (Metadata Object Description Schema): An initiative of the Network Development and MARC Standards Office at the Library of Congress, http://www.loc.gov/standards/mods/ Uses XML schema; originally designed for library applications, but may be used for others; a derivative of MARC
    39. MODS high-level elements: Title Info, Name, Type of resource, Genre, Origin Info, Language, Physical description, Abstract, Table of contents, Target audience, Note, Subject, Classification, Related item, Identifier, Location, Access conditions, Extension, Record Info
    40. Why do you need standards?
    41. DATA STANDARDS: A BRIEF TYPOLOGY. Data structure standards (metadata element sets): MARC, EAD, Dublin Core, METS, CDWA, VRA Core. Data content standards (cataloging rules): AACR (RDA), ISBD, CCO, DA:CS. Data value standards (vocabularies): LCSH, LCNAF, TGM, AAT, TGN, ULAN. Data format standards (standards expressed in machine-readable form): MARC, MARCXML, EAD, CDWA Lite XML, Dublin Core Simple XML schema, DC Qualified XML schema, VRA Core XML schema
    42. Implementing a digital repository: Content Selection. Does the item or collection have sufficient value to and demand from a current audience to justify digitization? Does the proposed item or collection have active current users? Is there greater demand than can be served by the original or a traditional type of copy? Does it support high priority activities such as teaching of core courses that have large enrollments? Is it marketable to a group of specialists widely dispersed who all need access?
    43. Implementing a digital repository: Content Selection. Do limitations on handling of fragile or valuable originals create a source of demand for high quality surrogates? How does it fit with other materials on the same subject? Does it help build a distributed online collection? Do we have the legal right to create a digital version? Do we have the legal right to disseminate it?
    44. Implementing a digital repository: Content Selection. Can the materials be digitized successfully? Does or can digitization add something beyond simply creating a copy? Can and should images be manipulated to make them more legible than the original items? OCRing for searchable text. Is the cost appropriate?
    45. Issues with the web... Are your collections “reachable” by commercial search engines? If yes, how will you “contextualize” individual collection objects? If not, what is your strategy to lead web users to your search page?
    46. The “Visible web” versus the “Deep web”: The Visible web is what you see in the results pages from general web search engines & subject directories (static web pages). The Invisible or Deep web consists of data from dynamically searchable databases that cannot be indexed by search engines, because they aren’t “stored” anywhere.
    47. Preservation: Enduring Care; Bitstream Copying; Durable, Persistent Media; Migration; Emulation; Encapsulation; Technology Preservation; Reinterpretation
    48. Evaluation: Use metadata and usability analysis should be a routine part of digital library work. Study end-user behavior (including your own)
    49. Joseph Ducreux, Yawning (Self-Portrait), before 1783, J. Paul Getty Museum. Thank you for your attention! lprescott@getty.edu
