Invited Demo: Mediapedia: Managing the Identification of Media Carriers

Nicholas del Pozo, Douglas Elford, David Pearson
Digital Preservation, National Library of Australia
Parkes Place, ACT 2600, Australia

ABSTRACT
All digital information is stored on physical carriers. Given the variations in carrier types, the quantity produced and in circulation, along with the potential importance of the content being stored on them, not taking any steps to document and preserve the characteristics of different carrier types will make it much more difficult, and eventually impossible, to extract content even in the short term.

The Mediapedia is intended to provide a sustainable way of facilitating carrier type identification as well as documenting their technical requirements and general preservation information. By enabling a community of specialist individuals and organizations to collaborate in the documentation of these carriers, it will hopefully create a sustainable body of knowledge which can be centrally and persistently accessed via the web. From a preservation and risk management perspective, we can either approach this problem as a community or ignore it at our individual peril.

Keywords
Digital preservation, media carriers, National Library of Australia, obsolescence, open source software, Prometheus.

1. INTRODUCTION
Anyone who has material stored upon obscure and older proprietary media carriers, and even more common carriers such as audio and video materials or floppy disks and CDs, will eventually encounter problems accessing this content. Although accessing current common carrier types may currently appear self-evident, this may not always be the case. Over time, if individuals or organizations are not proactive, there is a risk of losing access to the content stored on these carriers. In some cases, by the time an organization realizes there is a problem, it may already be too late to retrieve this content.

Even if there is an acceptance that access to a piece of media is 'at risk' of being lost, anyone trying to access its content will still be faced with the following issues:
• the carrier type may need to be identified;
• assuming the carrier type is known, then the technology (and associated dependencies for accessing those technologies) must be ascertained; and
• even when the above issues are resolved, accessing the media may still be problematic: technology (or parts of it) may not be readily available, or the carrier itself may have degraded so that it is no longer readable.

For example, 5¼ inch floppy disks are now a problematic carrier type to access (accurately or otherwise), though this was not always assumed to be the case. This is due to a number of factors such as short-term deterioration of the physical materials, possible corruption of the data content on them, and the loss in the availability of hardware (e.g., drives, cables and motherboard) and software (e.g., drivers and operating systems) which are required to load, recognize and read the physical disk. These factors are not necessarily mutually exclusive. Hardware, software, and file format obsolescence can occur independently from each other. In addition, not only can hardware become obsolete, but it can also be susceptible to chemical or physical degradation. For example, magnetic tape might start to delaminate after a certain amount of time or after a certain amount of usage. Moreover, these problems are not only applicable to obsolete or older materials. Brand new carriers which are not yet in common usage may be just as inaccessible as older carriers that are no longer in use (e.g., HD-DVD).

Therefore, all carriers should be considered a temporary storage medium only. Ironically, in many cases these carriers have been perceived or marketed as long-term storage options. However, both the life-cycle of the carrier and the knowledge about it are dynamic. In the case of carrier specifications and documentation, their often ephemeral and proprietary nature means that while information may initially be readily available, it can easily disappear within a short period of time due to changing markets or business conditions. Information about older materials that pre-date the web is usually even more difficult to locate.

Because the problem is so diverse and complex, and the nature of carriers so dynamic, there is no single or simple solution. Therefore, there are implications to the types of carriers that content is stored on (both in the short and long term) that may not be immediately evident.

This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported license. You are free to share this work (copy, distribute and transmit) under the following conditions: attribution, non-commercial, and no derivative works. To view a copy of this license, visit nd/3.0/.
DigCCurr2009, April 1-3, 2009, Chapel Hill, NC, USA
2. MEDIAPEDIA
In order to assist in managing these risks to carriers, we need to know a range of information about them. For example: when and by whom the carrier was created; when it was used; the advertised shelf-life versus the actual shelf-life; the requirements to access a specific carrier type. Mediapedia was designed to be an open, trusted and sustainable mechanism for documenting, retaining and disseminating this kind of knowledge [1]. The prototype of this web-based resource is intended to enable the identification of various types of carriers and their associated dependencies. Basic information which allows the identification of carrier types is provided, along with more detailed technical information about the carrier itself, and mechanisms that are needed to provide ongoing access. Future versions could include information about carrier requirements, community-based risk assessments and information about potential migration paths.

Unlike the Wikipedia [2], the Mediapedia does not require prior knowledge of a carrier's name for discovery. It was specifically designed so that carrier types can be identified in a number of different ways. A user can search across other physical characteristics or descriptive details such as manufacturer, product code and other specific identifying markings that can be found on the carrier. For more advanced users, carriers can be identified through the use of a detailed and systematic classification system that was developed from several common standards, including Dublin Core Type Vocabulary [3] and the RDA/ONIX Framework for Resource Categorization [4]. The primary function of this classification system is to organize the carriers into meaningful and flexible taxonomical groupings or categories, and to make them discoverable to different audiences (see Figure 1). As such, the user can also search by Carrier or Process types within different Genres [5].

Figure 1 Example of the Mediapedia Classification System.

The Mediapedia doesn't just store descriptive information about carriers, but also contains information about their dependencies and genre-specific technical knowledge, and is designed to be both a human and machine harvestable resource. The data is intended to be curated and sustained by a base of trusted sources across a range of media genres. High quality images of each carrier type can be used to quickly confirm or refine search results (Figure 2). They can also be used as the basis for conducting a visual survey. This combination of a detailed classification system and the ability to search across multiple identification characteristics, attributes, descriptive text or images allows human users to quickly identify carriers. Machine users can harvest carrier information via persistent identifiers associated with each carrier.

Figure 2 Mediapedia – Media Carrier Variant

3. CONCLUSION
Given the variations in carrier types, the number of units produced and currently in circulation, and given the potential importance of the information being stored on them, not taking any steps to document and preserve this content is not an acceptable option. By identifying and knowing their characteristics and dependencies, we can more proactively manage the risk for these carriers, and therefore for the content that they contain. It is hoped that the creation and use of the Mediapedia provides a sustainable way of facilitating carrier type identification as well as documenting technical and preservation information. Enabling a community of specialist individuals and organizations to collaborate in the documentation of these carriers will hopefully create a sustainable body of knowledge which can be centrally and persistently located. In addition, as this type of web-based service is not only human readable, but eventually also machine harvestable, it could potentially be re-used by other services and systems. From a preservation and risk management perspective, we can approach this problem as a community, or ignore it at our individual peril.

4. REFERENCES
[1] Mediapedia Prototype, at
[2] Wikipedia Home Page, at
[3] Dublin Core Type Vocabulary, at
[4] RDA/ONIX Framework for Resource Categorization, at vocabulary/index.shtml
[5] Mediapedia Classification information, at
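
As an illustration of the discovery approach described in Section 2 (identifying a carrier from its physical characteristics without knowing its name, and retrieving a record via a persistent identifier), the following is a minimal Python sketch. It is not the Mediapedia implementation or its data model: the `CarrierRecord` fields, the `find_carriers` and `find_by_identifier` helpers, the `example:carrier/...` identifier scheme, and every sample value (manufacturers, product codes, markings) are hypothetical and were invented for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class CarrierRecord:
    """Hypothetical record layout; the real Mediapedia schema is not given in the paper."""
    identifier: str            # assumed persistent identifier for machine harvesting
    name: str
    genre: str                 # classification facets named in the paper: Genre, Carrier type
    carrier_type: str
    manufacturer: str
    product_code: str
    markings: Tuple[str, ...] = ()


def _field_matches(value, wanted: str) -> bool:
    """Case-insensitive substring match against a string or a tuple of strings."""
    wanted = wanted.lower()
    if isinstance(value, tuple):
        return any(wanted in item.lower() for item in value)
    return wanted in value.lower()


def find_carriers(records: Iterable[CarrierRecord], **criteria: str) -> List[CarrierRecord]:
    """Return the records that satisfy every supplied criterion.

    No carrier name is required: any combination of fields can be used,
    for example manufacturer plus a marking read off the carrier itself.
    """
    return [
        record
        for record in records
        if all(_field_matches(getattr(record, name), wanted)
               for name, wanted in criteria.items())
    ]


def find_by_identifier(records: Iterable[CarrierRecord], identifier: str) -> Optional[CarrierRecord]:
    """Look up one record by its (assumed) persistent identifier, as a harvester might."""
    for record in records:
        if record.identifier == identifier:
            return record
    return None


# Invented sample data, for illustration only.
SAMPLE_RECORDS = [
    CarrierRecord("example:carrier/0001", "5.25 inch floppy disk", "computer data",
                  "magnetic disk", "ExampleCo", "EX-525",
                  ("double sided", "double density")),
    CarrierRecord("example:carrier/0002", "HD DVD disc", "video",
                  "optical disc", "Sample Media Ltd", "SM-HD15",
                  ("HD DVD logo",)),
]

if __name__ == "__main__":
    # A user holding an unlabeled disk searches by what is printed on it.
    for hit in find_carriers(SAMPLE_RECORDS, manufacturer="exampleco", markings="double density"):
        print(hit.identifier, hit.name)
```

Matching on any subset of fields mirrors the paper's point that discovery should not depend on knowing a carrier's name. A production service would more plausibly use an indexed search backend than a linear scan.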