the UPS protoproto project

    the UPS protoproto project - Presentation Transcript

    1. herbert van de sompel, michael nelson, thomas krichel the UPS protoproto project UPS 1 Meeting Santa Fe - October 21st 1999
    2. project: description • demo: the UPS protoproto • dex: the data exchange framework
    3. project why a protoproto? •  UPS: enable cross-archive end-user services •  protoproto: –  facilitate discussions –  identify issues involved in creating cross-archive services –  experiment with digital object concepts for archive material –  does not claim to be a solution •  protoproto is multi-disciplinary –  a special instance of cross-archive –  there is a market –  promotional value
    4. project who? •  coordination: herbert van de sompel, michael nelson, thomas krichel •  involvement of: – Old Dominion U & NASA Langley – U of Surrey – U of Ghent – Los Alamos National Laboratory - Library – Russian Academy of Science - Siberian branch
    5. project sponsors •  Los Alamos National Laboratory - Research Library •  JISC eLib WoPEc project
    6. project datasets –  metadata only –  full text remains at archives –  static dumps obtained ca. July 99

       archive       objects   full-text   !organization
       the arXiv      85,223      85,223          17,983
       CogPrints         742         659              14
       NACA            3,036       3,036             100
       NCSTRL         29,184       9,084              93
       NDLTD           1,590         951               1
       RePEc          73,367      13,582           2,453
       Total         193,142     112,535
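The totals on the datasets slide can be checked with a few lines of Python. The figures are copied from the slide; the variable names are illustrative only.

```python
# (objects, full-text) counts per archive, copied from the datasets slide.
datasets = {
    "arXiv": (85_223, 85_223),
    "CogPrints": (742, 659),
    "NACA": (3_036, 3_036),
    "NCSTRL": (29_184, 9_084),
    "NDLTD": (1_590, 951),
    "RePEc": (73_367, 13_582),
}

total_objects = sum(objects for objects, _ in datasets.values())
total_full_text = sum(full_text for _, full_text in datasets.values())

print(total_objects)    # 193142
print(total_full_text)  # 112535
```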
    7. project metadata formats

       archive      format
       the arXiv    internal
       CogPrints    internal
       NACA         Refer
       NCSTRL       RFC1807
       NDLTD        MARC
       RePEc        ReDIF
    8. project metadata extraction •  Getting metadata out of archives –  not all archives support metadata extraction •  some archives have undocumented metadata extraction procedures –  not all archives support rich criteria for extraction •  single dump concept only •  Intellectual property and use rights not always clear
    9. project metadata quality •  Metadata has problems with: –  record duplication –  crucial missing fields –  internal errors –  ambiguous references to people, places and publications
    10. project metadata conversion •  all datasets converted to ReDIF: •  essential to have a single format for the creation of services •  supply by archives in a single format was not realistic •  no downgrading of data •  data enhancements: •  creation of unique identifier •  addition of raw subject-classification •  normalization of publication types
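As a rough illustration of the conversion step, the sketch below turns one Refer-style record into a ReDIF-style template. It is not the project's converter: the tag mapping, the `X-` field names, the publication-type table and the `UPS:archive:number` handle pattern are all assumptions made for this example.

```python
# Hypothetical mapping from Refer tags to ReDIF-style field names.
REFER_TO_REDIF = {"%A": "Author-Name", "%T": "Title", "%D": "Creation-Date"}

# Hypothetical normalization table for publication types.
PUBLICATION_TYPES = {
    "tech report": "report",
    "technical report": "report",
    "preprint": "preprint",
}


def refer_to_redif(refer_record: str, archive: str, serial: int, pub_type: str) -> str:
    """Convert one Refer-style record into a ReDIF-style template (sketch)."""
    lines = ["Template-Type: ReDIF-Paper 1.0"]
    for raw in refer_record.strip().splitlines():
        tag, _, value = raw.strip().partition(" ")
        field = REFER_TO_REDIF.get(tag)
        if field is None:
            # No downgrading of data: keep unmapped fields under a made-up X- name.
            field = "X-Refer-" + tag.lstrip("%")
        lines.append(f"{field}: {value.strip()}")
    # Data enhancements: normalized publication type and a unique identifier.
    normalized = PUBLICATION_TYPES.get(pub_type.lower(), "other")
    lines.append(f"X-Publication-Type: {normalized}")
    lines.append(f"Handle: UPS:{archive}:{serial:06d}")
    return "\n".join(lines)


record = "%A Jane Doe\n%T An Example Report\n%D 1999\n%K buckets"
print(refer_to_redif(record, "naca", 42, "Tech Report"))
```

Unmapped tags are carried over rather than dropped, which mirrors the "no downgrading of data" goal on the slide.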
    11. project re-creation of archives •  creation of archives for ReDIF-ed metadata •  using intelligent digital objects : “buckets” RePEc arXiv NCSTRL
    12. project buckets •  Buckets were chosen to study the implications of using rich, intelligent objects in UPS •  Buckets are: –  DL protocol / system independent –  self-contained and mobile –  handle their own display, enforcement of terms and conditions, and dissemination of their contents –  designed for bundling multiple data representations and data instance types •  The aggregative nature of buckets is well suited for adding value-added services at the object level
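The bucket idea can be pictured with a toy class. This is only a sketch of the properties listed on the slide, not the real bucket implementation; the class name, method names and the simple "restricted" flag are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Toy stand-in for a bucket: metadata plus several bundled representations."""
    identifier: str
    metadata: dict
    # Maps a representation name (e.g. "pdf") to its content or a pointer to it.
    elements: dict = field(default_factory=dict)
    # Names of representations that only approved users may retrieve.
    restricted: set = field(default_factory=set)

    def add_element(self, name, content, restricted=False):
        """Bundle another representation into the bucket."""
        self.elements[name] = content
        if restricted:
            self.restricted.add(name)

    def display(self):
        """The bucket renders itself instead of relying on the archive."""
        title = self.metadata.get("title", "(untitled)")
        return f"{self.identifier}: {title} [{', '.join(sorted(self.elements))}]"

    def disseminate(self, name, user_is_approved=False):
        """Enforce the bucket's own terms and conditions before handing out content."""
        if name in self.restricted and not user_is_approved:
            raise PermissionError(f"{name} is restricted for {self.identifier}")
        return self.elements[name]


bucket = Bucket("ups:0001", {"title": "Example"})
bucket.add_element("abstract", "A short abstract.")
bucket.add_element("pdf", "http://example.org/paper.pdf", restricted=True)
print(bucket.display())
```

Because display and access control live on the object, the same bucket behaves identically in whichever archive holds it.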
    13. project creation of end-user service •  NCSTRL+ digital library service •  indexing buckets in archives by requesting their metadata •  enhanced user-interface •  NCSTRL+ search results point at buckets •  buckets auto-display •  buckets provide link to full-text in native archive
    14. project scaling problems •  UPS contains 193K objects –  using buckets consumed inodes (~60 inodes per bucket) •  filesystem reformatted with a more generous number of inodes –  Solaris and Dienst conflict •  Dienst wants each object in a publishing authority to be in a single directory •  Solaris has a hard limit of 32K objects in a directory •  resolution: use many (100+) authorities for UPS
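The scale of the problem follows from simple arithmetic. The sketch below works through it and shows one possible way of spreading objects over many authorities. The slide does not say how objects were actually assigned, so the hashing scheme, the `ups000`-style names and the default of 128 authorities are assumptions, as is reading "32K" as 32,768.

```python
import math
import zlib

OBJECTS = 193_142            # total objects, from the datasets slide
INODES_PER_BUCKET = 60       # approximate figure from the slide
DIRECTORY_LIMIT = 32 * 1024  # "32K" read as 32,768; an assumption

inodes_needed = OBJECTS * INODES_PER_BUCKET
min_authorities = math.ceil(OBJECTS / DIRECTORY_LIMIT)


def authority_for(identifier: str, authorities: int = 128) -> str:
    """Assign an object to one of several authorities by hashing its identifier."""
    index = zlib.crc32(identifier.encode("utf-8")) % authorities
    return f"ups{index:03d}"


print(inodes_needed)    # 11588520
print(min_authorities)  # 6
```

Six directories would be the bare minimum under these assumptions; using 100+ authorities, as the project did, keeps every directory far below the limit.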
    15. project addition of linking service •  integrate the archives with the traditional communication mechanism •  context-sensitive linking to deliver extended services via SFX technology
    16. project SFX linking service [Diagram. Labels: system A, system B, metadata, evaluate metadata, extended services]
    17. project SFX linking database
    18. project addition of linking service •  buckets for arXiv, NCSTRL and RePEc are SFX-aware •  CogPrints, NACA, NDLTD not SFX-aware •  SLAC/SPIRES is SFX-aware •  linking services for preprint metadata + for published version
    19. demo the UPS protoproto •  will be available from the beginning of November •  UPS list will be notified •  disclaimer “not a production system” http://ups.cs.odu.edu:8000/ http://ups.cs.odu.edu
    20. dex some issues (I) • data exchange framework • data provision vs. data implementation • central searching, distributed archives •  need for a framework by which archives can describe themselves: •  content •  terms and conditions •  protocols, criteria supported to extract (meta)data •  metadata scheme, subject classification scheme, material-type scheme, ...
    21. dex some issues (II) •  need for an identifier scheme for archives and archive objects • (cf. ISSN, ISBN, DOI) •  metadata quality obstructs the creation of services •  desirable to extend metadata with citation information •  smart objects •  archived objects that are active, not passive
    22. dex providing vs. implementing data •  Providing data: –  publishing into an archive –  providing methods for metadata “harvesting” •  provide non-technical context for sharing information also •  Implementing Data: –  harvest metadata from providers –  implement user interface to data •  Even if provided by the same DL, these are distinct functions
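The split between providing and implementing data can be sketched as two separate roles in code. The class and method names below (`Provider`, `ListProvider`, `Implementor`, `harvest`, `search`) are hypothetical and only illustrate the separation; they do not correspond to any interface defined by the project.

```python
from typing import Iterable, Protocol


class Provider(Protocol):
    """What an archive offers: a way to harvest its metadata."""
    def harvest(self) -> Iterable[dict]: ...


class ListProvider:
    """Hypothetical provider backed by an in-memory list of records."""
    def __init__(self, records):
        self._records = list(records)

    def harvest(self):
        return iter(self._records)


class Implementor:
    """Harvests metadata from several providers and offers a user-facing search."""
    def __init__(self, providers: Iterable[Provider]):
        self._index = [record for p in providers for record in p.harvest()]

    def search(self, term: str):
        term = term.lower()
        return [r for r in self._index if term in r.get("title", "").lower()]


physics = ListProvider([{"title": "Quantum Widgets"}])
economics = ListProvider([{"title": "Widget Markets"}, {"title": "Labour Supply"}])
service = Implementor([physics, economics])
print(service.search("widget"))
```

The implementor never touches a provider's storage or its native user interface; it only relies on the harvesting method, which is the distinction the slide draws.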
    23. dex providing vs. implementing data [Diagram: two providers side by side. One has only an input interface and a native end-user interface: no machine based way to extract metadata… The other also has a native harvesting interface: machine and user interfaces for extracting metadata….]
    24. dex providing vs. implementing data [Diagram: an implementor with an end-user interface and optional native input and harvesting interfaces, connected to two providers. Each provider has an input interface, a native harvesting interface and a native end-user interface that is optional (e.g., RePEc).]
    25. dex self-describing archives •  Much of the learning about the constituent UPS archives occurred out of band… •  Given an unknown archive, we should be able to algorithmically determine the archive’s metadata... •  Where possible, the harvesting interface should provide the same criteria as the end-user interface [Diagram: a provider with an input interface, a native harvesting interface and a native end-user interface]
    26. dex self-describing archives •  Recommended criteria for metadata extraction: –  subject classification –  accession date –  publication date •  Criteria for archive description –  metadata formats employed –  contact information for archive –  publication type scheme –  identifier scheme –  subject classification scheme
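One way to picture a self-describing archive is a small machine-readable record plus an extraction function that honours the recommended criteria. The sketch below is an assumption throughout: the field names, the `extract` helper and the "on or after" reading of the date criteria are not taken from the slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchiveDescription:
    """Hypothetical machine-readable description of an archive."""
    metadata_formats: tuple
    contact: str
    publication_type_scheme: str
    identifier_scheme: str
    subject_classification_scheme: str
    extraction_criteria: tuple = ("subject", "accession_date", "publication_date")


def extract(records, description, **criteria):
    """Filter records by supported criteria; date criteria mean "on or after"."""
    unsupported = set(criteria) - set(description.extraction_criteria)
    if unsupported:
        raise ValueError(f"unsupported criteria: {sorted(unsupported)}")
    selected = []
    for record in records:
        matches = True
        for name, wanted in criteria.items():
            value = record.get(name)
            if name.endswith("_date"):
                matches = value is not None and value >= wanted
            else:
                matches = value == wanted
            if not matches:
                break
        if matches:
            selected.append(record)
    return selected


description = ArchiveDescription(("ReDIF",), "admin@example.org", "local", "local", "local")
records = [
    {"subject": "physics", "accession_date": date(1999, 7, 1)},
    {"subject": "economics", "accession_date": date(1999, 5, 1)},
]
print(extract(records, description, subject="physics"))
```

A harvester that reads the description first can tell which criteria it may use before sending any request, instead of learning this out of band.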
    27. dex identifiers •  Useful in: –  reference linking –  can be used in citations –  resolving duplications •  UPS duplications were removed by hand –  tracking publication lifecycle •  Need the ability for an object to have multiple unique identifiers –  organization, discipline, etc.
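In UPS the duplicates were removed by hand. Purely as an illustration of how multiple identifiers per object could support automatic duplicate resolution, the sketch below groups records that share at least one identifier. The record layout and the identifier strings are made up for the example.

```python
def merge_duplicates(records):
    """Group records that share at least one identifier (sketch)."""
    groups = []  # list of (identifier set, list of records)
    for record in records:
        ids = set(record["identifiers"])
        merged_ids, merged_records, remaining = set(ids), [], []
        for group_ids, group_records in groups:
            if group_ids & ids:
                # Shares an identifier with this record: fold the group in.
                merged_ids |= group_ids
                merged_records.extend(group_records)
            else:
                remaining.append((group_ids, group_records))
        merged_records.append(record)
        remaining.append((merged_ids, merged_records))
        groups = remaining
    return groups


records = [
    {"title": "A", "identifiers": ["arXiv:x1"]},
    {"title": "A (report)", "identifiers": ["arXiv:x1", "org:y"]},
    {"title": "B", "identifiers": ["RePEc:z"]},
    {"title": "A (published)", "identifiers": ["org:y"]},
]
for ids, members in merge_duplicates(records):
    print(sorted(ids), [m["title"] for m in members])
```

The first and last records share no identifier directly, yet end up in one group through the second record, which is why an object needs to carry several identifiers at once.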
    28. dex smart objects •  Premise: Objects are more important than the archives that hold them •  SODA: Smart Objects, Dumb Archives •  Objects should be the canonical authority for •  metadata •  contents •  use •  Objects should be able to grow and change •  correct metadata •  add new formats •  add new services •  reflect the lifecycle of the object
    29. dex smart objects •  It would be beneficial if the archived objects could be heterogeneous: •  with their own “look-and-feel” •  unique functionality / services –  e.g., the data archiving needs of an atmospheric scientist can be different from those of a computer scientist, engineer or medical researcher •  yet maintain a standard API for: •  extracting metadata •  content retrieval •  resource discovery on the object •  terms and conditions
    30. dex lessons learned •  A strong distinction between the provision of data, and the implementation of data –  also, a socio-legal context for sharing metadata •  Open, “self-describing” archives •  A universal, unique identifier name space •  Archived objects with more intelligence and flexibility

    Herbert Van de Sompel, 6 months ago

    CC Attribution-NonCommercial-ShareAlike License
