Managing Digital Content Over Time: Identify and Select



Presented by Sarah Grimm (Wisconsin Historical Society) and Emily Pfotenhauer (WiLS) for the WiLSWorld conference, Madison, Wisconsin, July 24, 2013. Content based on Modules 1 & 2 of the Digital Preservation Outreach and Education (DPOE) Baseline Digital Preservation Curriculum developed by the Library of Congress.

Published in: Technology, Education
  • Just to start out, we will talk a bit about the upcoming presentation. The curriculum in this program was developed by the Library of Congress as part of its Digital Preservation Outreach and Education (DPOE) program. DPOE is part of a national effort to encourage individuals and organizations to actively preserve their digital content. DPOE is training individuals in each region across the US to help people such as yourselves who are dealing daily with ever-increasing amounts of digital content. The first 4-day national DPOE Train-the-Trainer workshop was held at the Library of Congress in September 2011 with people from each region in the US. Since then, additional programs have been held in Indiana, Illinois and Alaska. The DPOE National Trainer Network has 63 participants in Train-the-Trainer programs across the US.
  • The trainings are built around these 6 modules. The modules are not necessarily designed to provide specific technological solutions; they are designed to make sure the right questions are being asked at each stage.
    ◦ Identify: covers the creation of a scalable inventory. You need to understand what you have now and what you expect to have in order to plan for the long term.
    ◦ Select: not all digital content you currently have should be kept.
    ◦ Store: looks at long-term storage requirements and possible options for meeting those requirements. How can you use that information to develop long-term storage management policies?
    ◦ Protect: looks at content protection over time (physical degradation of the bits and bytes) as well as physical protection of the systems themselves (disaster recovery, server access).
    ◦ Manage: focuses on preservation planning (policy development, planning, training, funding).
    ◦ Provide: access policies, intellectual property issues, and planning for technological access to the items.
  • Anything we may encounter in a digital form is going to fall under digital content. This is going to encompass anything that comes our way and that we are going to have to think about preserving for a long period of time. There are really two sources of this digital material: items created from physical sources that we turn into digital items (digitization of maps and documents), and items that are created digitally (items on our computers, digital photos). A lot of times we are charged with not only holding and preserving that digital content, but also making it available to the public. But this ever-increasing digital world is presenting us with new challenges. And what are those problems?
  • Who: is it me? Doesn’t someone in IT do that? Too complex: I don’t get digital objects, there are too many types and formats. Too daunting: I’m the only one doing this and I only do this part time. How am I going to manage and organize all of this stuff? Too technical: this is too complex and uses equipment I don’t understand and don’t have time to learn. But all of this needs to be dealt with, because to not do anything would mean potentially losing things forever.
  • We are looking to digital preservation for an answer because we realize that being in digital form is not the same as being digitally preserved. Digital preservation is active management of digital content over the long term with access as its ultimate goal. With books or documents, we can read the item, put it on the shelf, and continue to open it and read it for decades with proper handling. However, once something is digitized, we can’t expect to set it aside and then open it in 10 years, much less 50, without active management. We must find ways to ensure that the digital item is accessible. In order to determine how we are going to preserve something, we must first have an understanding of what we have. We must IDENTIFY it.
  • As stated earlier, the volume and kinds of digital materials we create or inherit are growing. Much of it is useful and even necessary for our work, but much of it is not. Think of the string of e-mails created as people go back and forth discussing a topic. Or different drafts of a document. Or various copies of a digitized object as we try to get it “just right”. How much of that is really worth saving for posterity? The Identify stage helps us figure out what content we have, so we can determine what needs to be kept. Good digital preservation requires an explicit commitment of resources, which, for most organizations, means planning ahead. If you don’t know the extent of the problem, you don’t know what resources you need. The first step in planning for digital preservation is to know where you stand with regard to your digital assets.
  • And if so, who would you need to get that from?
  • Ask the audience: who has an inventory?
  • Ask if anyone currently has an inventory and what software is being used
  • Refer to the handout here. Nature and location: is all the information onsite, or would you need to travel to multiple locations to capture everything? Resources: how many people can you get to help? Is it just you, a small staff, volunteers? Timeframe: give yourself a time frame for this. Keep in mind this is never “done”. Can have the audience pull out the inventory here.
  • What: work at the collection level, not the item level. What is the familiar title for the collection? Description: provide a brief description of what is in the collection. You are collecting information about items that are known and may be in your catalog, items that have come in your door and are waiting to be dealt with, items that are being created (digitization projects), and things you may not even know about yet.
  • Creator – so that you can go back to them with any issues
  • It’s a good idea to note the format of your digital media, or what the digital content is stored on, since some format types last longer than others. Digital content on more fragile media (such as a floppy disc) might be a higher priority.
  • You should make sure to specify the location of digital content in your inventory. Some things you will want to consider: How will you specify whether content is located online (meaning on your computer hard drive or a network server) or offline (meaning stored on some removable piece of media, like a CD or flash drive)? What is its location in the storage system? Keep in mind that you will need to update the inventory whenever the content moves. If you get too specific, you might spend all your time updating file locations. Ask the audience: what other fields are not included that would be helpful?
  • After you’ve compiled your inventory, it can be easy to get overwhelmed. You know you’ve got lots of digital content, but how much of it is really your organization’s responsibility to preserve? Meanwhile, you’ve still got more logs (more new digital content) coming in down the river. One of the goals of selecting content to preserve is to help get your logs moving again: start setting priorities and pick a few things to tackle first so everything can start flowing more efficiently.
  • Not all of the content you’re dealing with may in fact be appropriate or necessary for you to preserve, and you don’t want to commit resources to preserving materials you don’t have to. You may hear people argue that storage is cheap, so we should keep everything. Unfortunately, that perspective is rather short-sighted. Storage may be cheap, but preserving the quality of content over the long term is not. There are periodic migration costs for moving the digital materials into the systems where you will preserve them. There is the cost of monitoring files for corruption and change. [Have you lost bits? Are the files degrading?] Not to mention maintaining access to the files, which means updating your discovery and dissemination services every time hardware and software change. [An ongoing, recurring cost.] The idea behind long-term preservation is that you will be making this content available in the future. It isn’t enough just to save the content if you can’t access it any more. [Quality] Even if we could keep everything forever, would we want to? Is that manageable given the type of content that you hold? Not all digital content may be preservation quality: if you have high-resolution scans of your photos, do you also need to preserve the lower-quality versions of these scans? And not all will be significant enough to warrant preservation. [That string of emails about organizing the staff Christmas party…] Does the digital content we take in match our mission and scope of collections? Quite often materials find their way to us that have little or nothing to do with our mission, yet we give them a home and expend our resources on maintaining them. Maybe there is a better, more logical home for that content? [Maybe you could partner with another organization that is better placed to hold and preserve that content.] The selection process for digital content is closely analogous to the selection process for non-digital materials: you don’t collect materials for your archive that don’t match your mission, and you should keep that same principle in mind when selecting digital content.
  • The basic steps for selection require you to:
    ◦ Review your potential digital content. Start with the outcomes of your inventory; look over what you have and think you might have coming in. Understand the implications.
    ◦ Define and then apply criteria for what you will select to preserve. It’s the best way to ensure consistency (across an organization, over time and through staffing changes).
    ◦ Document (and preserve) selection decisions. [Why are you keeping things? What is your rationale? You, your staff and your successors need to understand why you chose to keep that particular content. Don’t assume it will be obvious to everyone.]
    ◦ Implement your decisions, and stick to your criteria! Don’t take in or keep content that is not in your defined scope of preservation.
    ◦ Review your selection criteria regularly to ensure they meet your needs. They are there to ensure consistency and can also be a helpful tool in controlling what content comes your way (an argument in your arsenal for those times when you need to say ‘no’ to someone).
  • When you’re first getting started, it’s helpful to treat selection as a managed, structured project in order to plan and coordinate the process [and plan for the future]. The selection criteria you choose will be specific to your situation, your organization and its mission. So where can you go for guidance to begin this project of defining your selection criteria? Look inside your organization first: are there mission-related documents that might give you clues? Existing manuals and policies, such as records retention schedules? Or collecting policies? Also look outside your organization: are there legal restrictions and/or ethical requirements that will guide your choices? On the question of uniqueness, you may not want to include anything that is preserved elsewhere. You may want to focus only on what meets the needs of your primary audience. And the value of materials, determined by a variety of factors, must be assessed in light of your own situation, the materials themselves, and their place in their wider context, whatever that may be. Taking this wider view will enable you to make intelligent choices regarding your selection. Once you have clarified the ideal of what you WANT to preserve, you’re ready to consider what you are actually ABLE to preserve.
  • Even if something fits your desired criteria, it still might not be reasonable for you to select it. You can use a decision tree or a list of questions to help you decide what’s practical to preserve. You’ve already considered the content in view of your selection criteria, and you should already have answered ‘yes’ to both of these questions to continue considering the materials you hold: Does the content have long-term value? Does it fit your scope and mission? Next you need to consider technical issues: is it feasible for you to preserve the content? [Is it a “digital time bomb”? Some formats are a challenge to preserve, such as video/time-based media. Some may be too damaged to preserve. Do you have the skills and resources, either to undertake the preservation yourself or to buy the skills in?] Some types of material may require far more expertise and resources than you have available. And access: even if we’re not making it public, how useful is a server full of digital content that is safe, but that we can’t access? We need to ask: is it possible to make the content available over time? Are you the only holder of this content? [Duplication] If it is not feasible to preserve the content, and not possible to make it available and usable, then it probably shouldn’t be included in your selection, especially if you know you are not the only holder of this digital content.
  • Once you have your selection criteria, it may not be possible to review and select everything at once, so how might you sequence the process? Again, the answer will be different for each organization. Think about which content is:
    ◦ Most significant to your organization?
    ◦ Most extensive? (and therefore a more coherent body of material to manage)
    ◦ Most requested/used?
    ◦ Easiest to tackle? (e.g. most familiar, most ready for ingest; a quick win for your digital preservation process, which is very helpful when you have to prove the value of your efforts to a reluctant administration)
    ◦ Oldest? (possible historical importance)
    ◦ Newest? (possible immediate interest)
    ◦ Mandated? (via local policies, legislation, etc.)
    ◦ At risk? If it were no longer available, what digital files would be the hardest to replace? Some formats become obsolete a lot faster than other formats. PDFs are viable for a really long time; video files, however, get old very quickly.
  • Because digital preservation is a long-term commitment, it’s important to establish solid, ongoing relationships with the creators of your digital content. How many of you are managing digital content created by people outside of your library or archives? Other departments or maybe even other institutions? Communication is key, particularly when the content is from external creators. You’ll need to agree on terms for the transfer and retention of digital content to your library (and even where it’s from others within your library). Ideally, you’d want to review the content with the creators to determine which of their material is really important to be preserved, and ensure that what they’re giving you meets your selection criteria. Be aware that most content creators don’t have a clue as to what an archival format is, or how to create content that is likely to be manageable for long-term access. Education of content creators is very important. Working with them at the outset can save you many headaches later. The other important point here is that this doesn’t need to be just YOUR project: connecting with content creators means you can share the love a bit and put some of the onus on THEM to help YOU.
  • Remember that you need to document your selection process. Start out by adding information to your inventory for material that you plan to preserve over the long term. Supplement your inventory with information about use: What is the ‘lifespan’ of the content? Does its value or use change over time? When will the content no longer be active? [Retention period: how long will you retain it?]
  • The outcome of going through the work of selection is to gain a sense of control over what you have to deal with, what your scope is, and what your policies and priorities are for selection. This is critical to developing a sustainable program for support of long-term preservation and access. By applying your selection criteria to your inventory, you will have more detailed information to work with in your planning. This documentation can also inform your work with creators of digital content. This might include the creation of submission agreements or other policies so that the content coming in to your organization fits your selection criteria for long-term support. The selection process puts you on the path to a sustainable program. Selecting content is ultimately not a one-time project but a long-term, ongoing process, so formalizing it through policies, schedules and other documented criteria will help you avoid more log jams in the future.
  • As you are going through the inventory and selection process, you will find things in many places and named in many different ways depending on who worked on the item. Digital items are psychologically so much easier for people to save. 100 items on your hard drive don’t take up as much visual space as 100 items in your office. A file that is 1 KB looks pretty much like one that is 1 MB or 1 GB. There also tend to be more copies of digital items: everyone keeps a draft, or it gets attached to an email and sent to 10 people, or it gets filed in two places. Everybody keeps their own items; project documentation is rarely one person managing the group’s information anymore. It’s multiplied by the number of people working on the project. As a result, EVERYTHING IS SAVED “just in case”, and it’s often saved more than once.
  • And then add the other computers and storage drives in your organization… and you get the physical rendering of your organization’s digital items. So we are going to take a few minutes to talk about ways to get some control over this digital mess and talk a bit about file naming and file management. Hopefully, these will help you as you sort through your digital content and determine how you will approach the long-term management of the items you are working with.
  • Accidental overwriting: for example, photos from a digital camera, or meeting minutes and agendas. Finding: were the minutes saved as April Minutes, 04 minutes, Board minutes, recent minutes, etc.? Generally speaking, avoid special characters in file names. While your system may accept them now, there is no guarantee these characters will carry over to a new system should a move be required.
  • This slide contains links to both the web version and the YouTube version of 4 videos created by the State Library of North Carolina about file naming procedures. They total about 10 minutes and provide some great tips.
  • As you are creating your inventory, you are likely to discover a lot of really simple places where you can clean up the files you are reviewing. Co-locate: it’s OK to move things around if it makes sense to do so. Bury: if you have several layers to hunt through, it can be really hard to find anything. Shallow is better. Purge: get rid of the easy-to-purge items unless there is a really good business reason for keeping them.
  • File backups: for example, speeches had multiple drafts, plus the final version and copies in several different font sizes. Supplementary files: a folder of images that were used in a PowerPoint. Files you can’t open: corrupted. Formats: you may receive both Word and PDF and may not want to keep both. Breadcrumbs: it’s OK to leave “sticky notes” (AKA “READ ME” files) in folders. They can give a brief description of the contents, the retention schedule, and any naming conventions. Don’t know: unknown file formats, files on old media (floppies), password-protected files… Identify these, and then come up with a plan to deal with these items.
  • Once you’ve decided how you want to handle file naming issues and have made file management decisions, document it. It doesn’t have to be long. You can distribute it in your organization: post it on an intranet or place it in a procedures manual. WHY: You will not be the only keeper of the information. (You weren’t here to ask.) It will help others who may be helping you with the inventory. You can hand it out to organizations and departments you receive information from: “In order to better manage our files, we will accept these file types and formats, and they will be named this way. Do not give us password-protected documents.” You don’t have to organize and fix everything, but you do need to give other people the tools to help you.
  • Key parts of the DPOE ongoing effort are the training calendar and the DPOE ListServ
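
The file naming advice in the notes above (avoid special characters so names survive a move to a new system) can be checked with a short script. The following is a minimal sketch in Python, not part of the DPOE curriculum: the function name `naming_problems` is hypothetical, and the character set is an assumption based on the list shown on the File Naming slide, so adjust it to your own documented naming rules.

```python
# Characters the File Naming slide advises against in file and folder titles.
# This exact set is an assumption taken from that slide; adjust it locally.
DISCOURAGED = set('^"<>|?/:@\'*&\u201d\u2019')


def naming_problems(filename: str) -> list[str]:
    """Return a list of naming problems found in a single file name."""
    # Split off the extension at the last period; everything before it is the name.
    stem, dot, _extension = filename.rpartition(".")
    if not dot:
        stem = filename  # no extension at all

    problems = []
    bad = sorted(ch for ch in set(stem) if ch in DISCOURAGED)
    if bad:
        problems.append("special characters: " + " ".join(bad))
    if "." in stem:
        # The slide also lists the period; here only extra periods are flagged.
        problems.append("extra period in name")
    return problems


if __name__ == "__main__":
    for name in ["board_minutes_2013-04.docx", "Board minutes: April?.docx"]:
        print(name, "->", naming_problems(name) or "ok")
```

A script like this only reports problems; renaming is left as a deliberate, documented decision, in keeping with the advice to write down your file naming choices.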
  • Managing Digital Content Over Time: Identify and Select

    1. 1. Managing Digital Content Over Time Sarah Grimm, WHS Emily Pfotenhauer, WiLS Slides and handouts: Supported by WHRAB
    2. 2. Managing Digital Content Over Time: Identifying Content Supported by WHRAB
    3. 3. DPOE Mission The mission of the Digital Preservation Outreach and Education (DPOE) program of the Library of Congress is to encourage individuals and organizations to actively preserve their digital content, building on a collaborative network of instructors, contributors, and institutional partners.
    4. 4. Six Training Modules  Identify - what digital content do you have?  Select - what portion of that content is your responsibility to preserve?  Store - how should your content be stored for the long term?  Protect - what steps are needed to protect your digital content?  Manage - what provisions are needed for long-term management?  Provide - how should your content be made available over time?
    5. 5. What is Digital Content?  Digital content is any content that is published or distributed in a digital form, including text, data, sound recordings, photographs and images, motion pictures, and software. ◦ Digital materials created from analog sources ◦ Born-digital content  Digital materials you currently have – or expect to acquire or create – that you want to preserve.
    6. 6. What’s the Problem?  Increasing amounts of digital assets are arriving on our doorstep or being created by us  The digital assets arrive in all formats and on all formats  Time sensitivity - the longer we wait or the longer our donors wait the increased chance that something will be unreadable
    7. 7. Digital Reality in 2013  Everyone is ◦ creating digital content ◦ distributing digital content ◦ using digital content  And we are responsible for managing digital content now or expecting to in the near future
    8. 8. What are the Challenges? Who takes the lead? What can I do? Where do I start? The impediments Too complex (I don’t understand...) Too daunting (I don’t have time...) Too technical, etc. (Computers scare me...)
    9. 9. What Could Possibly Go Wrong?
    10. 10. Digital Preservation Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time. Working group on Defining Digital Preservation, ALA Annual Conference, 6/24/2007
    11. 11. Why Do We Identify Content?  Not all digital content can or should be preserved  Preservation requires an explicit commitment of resources  Good preservation decisions are based on an understanding of the possible content to be preserved
    12. 12. First Steps • Identifying content is a first step to planning for current and future preservation needs • Ask: what content do I have, will I have, might I have, must I have? An inventory is the best way to identify what content you have now – and raise awareness in your institution.
    13. 13. Does your institution have an inventory of your digital content?
    14. 14. If not, do you need permission to begin an inventory project?
    15. 15. Inventory Considerations  Inventory content more important than style and format  Inventory results should be: ◦ Documented: an inventory should actually exist ◦ Usable: use a simple format to sort, list, etc. ◦ Available: accessible to others ◦ Scalable: content will be added during Select ◦ Current: update periodically
    16. 16. Inventory Tips  Don’t let implementing the software become the focus.  Use software you know and have available  Stick with a single format; don't change once you've decided on it.  Be consistent, comprehensive, and concise
    17. 17. How Much Detail to Include  Inventories can be general to detailed  Determine appropriate level of detail for you  Factors in determining level of detail: ◦ Extent of content to be inventoried ◦ Nature & location of content ◦ Resources available to complete inventory ◦ Timeframe & deadlines for completion
    18. 18. What Do You Have?  Identify collections of digital materials.  Provide a brief title and description  Estimated growth over time ***
    19. 19. Who Manages It?  Department – currently managing the collection/digital content  Staff – primary people responsible  Creator (Internal or External) – who created the digital content
    20. 20. What does it consist of? • Medium (6 CDs, 1 hard drive) • Extent = Format + Amount (600 .pdfs, 30 .doc) • File Size – (MB, GB, TB)
    21. 21. Date Considerations Inventories should note: • Date of inventory and updates to it • Dates associated with the content (1872-1901) • Date of files – created or modified (2009) • Date received – if relevant / possible (2011)
    22. 22. Content Location Locations of content are important : • List primary locations (Network drive location, Hard drive on Bob’s shelf) • List locations of all backups/copies (CDs in the storage room, weekly backup tapes) Must remember to change locations as content moves
    23. 23. Analyze the Results When the inventory is complete, ask yourselves what digital content ◦ do we have that we didn’t know about? ◦ should we be keeping that we aren’t now? ◦ will we create or likely acquire in the future? ◦ are we required to keep? ◦ do we need to review?
    24. 24. Goals  Identify potential digital content you may need to preserve  Treat the inventory as a management tool that grows as your preservation program grows  Use it as a planning tool – e.g., to prepare staff, training, annual growth  Use as a basis for acquiring content, defining submission agreements, plans
    25. 25. Managing Digital Content Over Time: Selecting Content to Preserve Supported by WHRAB
    26. 26. Six Training Modules  Identify - what digital content do you have?  Select - what portion of that content will be preserved?  Store - how should your content be stored for the long term?  Protect - what steps are needed to protect your digital content?  Manage - what provisions are needed for long-term management?  Provide - how should your content be made available over time?
    27. 27. Why select content to preserve? Log jam on the St. Croix River, 1886 Wisconsin Historical Society WHi-2364
    28. 28. ● Cost: storage may be cheap, management is not…especially over time ● Discovery and dissemination services: scale, scope, performance, sustainability ● Quality of content may be variable ● Matching mission to content Why select content to preserve?
    29. 29. Basic Steps  Review your potential digital content (go back to inventory)  Define - then apply - selection criteria  Document (and preserve) selection decisions  Implement your decisions (Store, Protect, Manage, and Provide modules) Picking fruit Wisconsin Historical Society WHi-67733
    30. 30. What criteria should be used to select digital content for preservation? Postal workers sorting mail, 1955 Wisconsin Historical Society WHi-36392
    31. 31. Selection Criteria  Mission: Scope of Collections, Collecting Policies  Records retention manuals/policies (internal or externally mandated)  Legal & ethical requirements (professional bodies; your stakeholders; future users)  Uniqueness (only source or preserved elsewhere? Avoid duplication)  Value (historical, evidential, can’t reproduce?)
    32. 32. Practical Considerations Stop if or when the answer is NO ● Content – Does the content have long term value? – Does it fit your scope and mission? ● Technical – Is it feasible for you to preserve the content? ● Access – Is it possible to make the content available? – Are you the only holder of this content?
    33. 33. Setting Priorities Ask yourself which digital content is ● most significant to your organization? ● most extensive? ● most requested/used? ● easiest? ● oldest? ● newest? ● mandated? ● at risk?
    34. 34. Include Creators in the Process ● Communication is key, particularly when content comes from external creators ● Keep content creators in the conversation ● Arrange a convenient time for them to talk about your preservation plans ● Identify list of materials to review with them ● Document the results and send them a copy
    35. 35. Selection Documentation Supplement your inventory with more detailed information about the material you plan to preserve over the long term.  Use ◦ What’s the lifespan of the content? ◦ Will its value/use change over time? ◦ Retention period
    36. 36. Access and rights  Access ◦ How will the public access the content? ◦ Is access restricted? How? For how long?  Rights ◦ Who owns the rights to preserve and disseminate?
    37. 37. Prioritizing  Data criticality ◦ Is it only in digital form? Do we hold the only copy?  Business/mission criticality ◦ If we lose it, what’s the damage to our reputation? How will it impact our function or services?
    38. 38. Selection Exercise Postal workers sorting mail, 1955 Wisconsin Historical Society WHi-36392
    39. 39. Goals/Outcomes • Expanded inventory of content to preserve …and what you can delete (gray areas identified) • Agreements with content creators e.g. submission agreements, retention schedules • Well-defined and documented selection criteria, policies and procedures • Better understanding of content for future planning and growth Greater knowledge = greater control!
    40. 40. File Naming • Why is this important? ◦ To prevent accidental overwriting ◦ To help you find it again • Don’t use special characters in your file/folder titles (^”<>|? / : @’* &.) Just because you CAN doesn’t mean you SHOULD….. Train Wreck Image ID: WHi-2011
    41. 41. Resources  State Library of North Carolina – ◦ Web es ◦ YouTube
    42. 42. File Management  Store similar digital items together ◦ Co-locate in a central location  Don’t bury items in multiple levels  Get rid of easy-to-purge items ◦ Rescued or recovered documents ◦ Empty file folders ◦ ~.tmp files
    43. 43. File Management  Make decisions about what NOT to keep ◦ File backups/copies/drafts ◦ Supplementary files that provide no additional long-term value ◦ Corrupted files ◦ File Formats  Leave breadcrumbs  Determine what you don’t know
    44. 44. Document Your Decisions….
    45. 45. DPOE Resources  Training calendar: cation/courses/index.html  DPOE listserv: cation/join.html  DPOE survey: FS8
    46. 46. Questions? Sarah Grimm (WHS) Emily Pfotenhauer (WiLS) Slides and handouts: 2013
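
The inventory slides (What Do You Have?, Who Manages It?, What does it consist of?, Date Considerations, Content Location) describe a collection-level record kept in a simple, sortable format. As one possible illustration, here is a minimal Python sketch that writes such records to CSV, which any spreadsheet program can open. The class name, field names and sample values are assumptions made for this example, not a DPOE standard; the slides stress that inventory content matters more than style and format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CollectionEntry:
    """One collection-level inventory row (illustrative field names)."""
    title: str              # familiar title for the collection
    description: str        # brief description of what is in it
    department: str         # who currently manages the content
    staff: str              # primary people responsible
    creator: str            # internal or external creator
    medium: str             # what the content is stored on
    extent: str             # format + amount
    file_size: str          # MB, GB, TB
    content_dates: str      # dates associated with the content
    file_dates: str         # dates files were created or modified
    primary_location: str   # where the content lives now
    backup_locations: str   # locations of all backups and copies
    inventory_date: str     # date of inventory or latest update
    estimated_growth: str = ""
    date_received: str = ""


def inventory_to_csv(entries):
    """Serialize entries to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(CollectionEntry)]
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Sample values are made up for illustration only.
example = CollectionEntry(
    title="Board meeting minutes",
    description="Minutes and agendas from monthly board meetings",
    department="Administration",
    staff="Records coordinator",
    creator="Internal",
    medium="6 CDs, 1 hard drive",
    extent="600 .pdf, 30 .doc",
    file_size="2 GB",
    content_dates="2005-2013",
    file_dates="2009",
    primary_location="Network drive",
    backup_locations="CDs in the storage room; weekly backup tapes",
    inventory_date="2013-07-24",
)

print(inventory_to_csv([example]))
```

Because each row is one collection rather than one file, the sheet stays small enough to update whenever content moves, and more columns (use, retention period, access, rights) can be added during the Select stage.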