Digitizing a newspaper clippings collection: a case study in small-scale digital projects

Presented at the Medical Library Association's 2009 Annual Meeting in Honolulu, HI.

  • Thanks, Patricia. So, in talking about my case study today
  • While no documentation exists, the original process of building this collection was probably similar to what we do today. Starting in the 1930s, library members would skim the local daily papers for any mention of LSU School of Medicine and its faculty, staff, or students.
  • When an article was discovered, it was clipped and dated, and the name of the paper was noted. The articles were then glued to standard typing paper, usually several to a page, roughly in date order. Each page was assigned a simple call number. A librarian would read the articles, underline named entities, and assign a subject heading, which was recorded in a small local card catalog. The pages of clippings were placed in folders by year and put into filing cabinets. This continued for 50 years.
  • The clip file was still sitting in filing cabinets when I assumed responsibility for it in 2002. The subject catalog was still intact. However, there were some problems: filing-cabinet storage had not been kind to the typing paper, which curled heavily. The newsprint showed signs of age. Rust marks appeared where staples and paperclips had once connected pages. Some chunks of clippings were missing. And the only way to look up anything before 1985 was the card catalog in tech services.
  • On a side note: in 1998 the library began indexing articles in the Newspaper Clippings file, back to 1985, using ProCite and Reference Manager (local database software). This continues today, though we have migrated the content from ProCite to RefWorks, another bibliographic management system. (You can ask me later about how the data migration between ProCite and RefWorks went.) The other 47 years of the collection remained in technical services, with very limited access.
  • So basically, what we had was a unique local news collection spanning most of the 20th century, stored using questionable archival methods, with limited access to documents before 1985. What were we going to do about it? So I started thinking: maybe we could write a grant for this…
  • So I wrote an AMIGOS Library Services grant proposal to investigate using Greenstone digital library software, an open-source UNESCO product, to digitize the newspaper clippings. Though the application was rejected, the process provided a catalyst for action: the administration was impressed enough with the proposal's digitization plan that it funded a scanner, software, and travel to a continuing education class on digital projects. However, later that year we discovered to our dismay that the Greenstone software would not work properly on our secure intranet. In addition, the quality of images from our original scanner was poor. Then Hurricane Katrina struck. Everything in the library was OK, but it was all moved to remote storage for half a year. During the ensuing hiatus, in 2006, staff took several continuing education classes on digitization. In 2007, an opportunity opened for us to join the LOUISiana Digital Library, the state digital library consortium, which gave us access to OCLC’s CONTENTdm platform, previously out of our price range. After several months of collection planning and software training, we were able to begin our own collection. To date we have over 1600 newspaper clippings spanning the years 1933 to 1953. Continuing education courses: “Digital Imaging of Library Materials,” SOLINET course, Baton Rouge, LA, 23 July 2007; “Digitization Fundamentals,” Illinois Digitization Institute, University of Illinois at Urbana-Champaign, IL, 27 February–17 March 2006; “Putting the Digital Puzzle Together,” ALCTS 2004 Pre-Conference, Orlando, FL, 25 June 2004.
  • Here is the simplified workflow for our digital projects. Library staff scan the original sheet of paper and save it as a TIF file. Using PSP, we digitally manipulate the original scan to create a single TIF file for each clipping. If they are not visible on the clipping itself, the call number, date, and newspaper name from the original sheet are copied and pasted onto the now-isolated clipping. The item is also processed for alignment and picture quality. Next, library staff load the clipping into the CDM Project Client, enter cursory metadata (title, journal, and technical information), and record their progress in a Scanning Log. The librarian performs optical character recognition (OCR) on the clipping to create an excerpted text field (this field is keyword searchable in the digital library) and assigns subject headings. OCR takes a bit of time, but it is a good way to review the article and assign the proper subject heading. After a final quality check, the item is approved and uploaded to the digital library. In addition to CONTENTdm's automatic archiving, a backup is burned to archival-quality CD-ROMs after the item is added to the collection, and a copy is saved on the server.
  • Cataloguing and metadata are another important part of this project. Our consortium already had metadata standards in place, requiring collections to use Dublin Core plus a few additional administrative metadata elements. In addition, CONTENTdm allows users to build their own controlled vocabulary. We used the collection's card catalog as the basis for our own institutional controlled vocabulary, which also served to verify the names and spellings of physicians. However, other subjects are sometimes necessary. When applicable, we consult the MeSH Browser for subjects: for example, the MeSH term “Congresses as Topic” is used when an article discusses conferences, and “Publications” when an article discusses a new book or journal article. MeSH is not always useful, especially for topics like campus expansion or university events; in these cases the librarian assigns a subject heading. To further open the collection, keyword searching is enabled for the excerpted text field. Items can also be browsed by year, subject, creator, or title on the collection’s description page.
  • Some considerations for this project: The collection's deteriorating condition made storage a huge priority. Physical files were transferred to flat archival storage boxes and acid-free folders. But digital storage is an issue as well: server space and data backups are critical. Organizations must also consider the digital mortgage: how will you transfer old files to new formats as software and hardware change? Here you can see some of the collection standards we set for images, metadata, hardware, and software. Regarding training, everyone needs to learn Photoshop. The project manager should take at least one class on managing digital projects; regional and state library groups are good sources. Our consortium takes a ‘train the trainer’ approach to CONTENTdm, so I was later responsible for training local staff on the software. Staffing: we currently have one librarian and two staff members working on this project. Staff are asked to scan about 60 items per week, about 8–10 hours of work. To address scheduling for the busy librarian, my boss had a great idea: set aside one day each week for the project. Friday has since become Digitization Day, which has kept the backlog of items awaiting cataloging to a reasonable size. Finally, documentation: note what you do each day. Create a local policy and stick to it. One thing we do, suggested by our consortium, is keep a scanning log: an Excel file that tracks the file name, the date scanned, and the file size, as well as the locations where the data is currently stored and whether it has been backed up. It’s a convenient way to track the size and progress of a collection.
  • My general question is:
  • So: challenges.
      • Buy-in: Support from your institution from the start is critical. Administration has to be on board to provide funding and act as a liaison to other resources, e.g., a legal department if you ever have copyright questions. You will also need IT support. Getting our IT department to support open-source library software was a challenge. One of the benefits of consortial membership is that the consortium provides tech support.
      • Sustainability: One of the first challenges we encountered was software sustainability. The Greenstone digital library software, while free, did not work within our intranet and required higher-level technical skills than we possessed. Problems with our old scanner resulted in poor-quality images that had to be redone; we’ve since upgraded our hardware. CONTENTdm is currently being upgraded to a new version, which has required more training. Another issue we’ve seen in our statewide digital library is the growth of “orphan collections”: collections that have been abandoned by their creators.
      • Copyright: Our collection is unusual in that it collects clippings from many sources. All materials were published after 1923, so they may still be protected by copyright: under US law, a renewed copyright on a work published in 1923 runs through 2018, and on later works correspondingly longer. Our solution: images of the newspaper clippings can only be viewed on campus or with an off-campus login, while the metadata is viewable by anyone. This way, any user can search our collection, and if they are not from our campus we can work with them to get the information.
      • Funding: This is the final challenge. Consortial membership in the digital library is about $2,000/year, and our hardware and software ran about $1,500 in startup costs. I suggest looking for grants and scholarships: a scholarship from the SCC region helped me attend an ALCTS continuing education class on metadata, and we have applied for an IMLS “Connecting to Collections” Bookshelf grant to get more books on digital and physical preservation.
  • In conclusion:
      • Searchable historic archive: We now have over 10 years (1600 items) of institutional history available online in a searchable, cataloged database. Personally, I find it a lasting tribute to the 8 decades of persistent work on this collection at LSU.
      • Increased visibility: Thanks to OCLC’s CONTENTdm indexing, results appear in Google (and soon WorldCat). We’ve received several inquiries from around the US (e.g., from a med student and a mother). The collection also gives us excellent ideas for our library’s blog.
      • Catalyst for change: The collection has also acted as a change agent, inspiring our staff to organize our rare books room, research archival storage methods, and apply for a small preservation bookshelf grant.
      • Mentoring: One of the things I’m proudest of is the mentoring opportunity this created. Keith Pickett, a staff member who helped start this project, completed his library degree and is now Digital Initiatives Librarian at the University of New Orleans.
      • Workflow in place for future projects: Finally, we now have a workflow and the experience for future projects. Because of this project, our dental school has started a photograph collection. Our future projects include a collection of photos and ephemera on LSU and Charity Hospital history, to coincide with the 70-year anniversary of our institution.
  • How do you get from this
  • Here’s what I’ll be discussing today about our newspaper clippings collection case study: collection description, timeline, workflow, cataloguing, considerations, challenges, and results.
  • The newspaper clippings file has been collected since 1933 and is still collected today. The clippings come mostly from local and regional newspapers; about 45 publications are currently indexed. Content-wise, the collection is a 70-year snapshot of the development of the health sciences in Louisiana.
  • Articles include topics such as: people, places & events associated with LSU School of Medicine,
  • the development of health infrastructure in Southeast Louisiana and New Orleans,
  • and the development of 20th-century health sciences education in Louisiana.
  • The reason why this project actually came about was the condition of the physical clippings collection.
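The workflow described in the notes is a fixed sequence of stages, each owned by a particular role. A minimal Python sketch of that sequence follows; the stage labels and the names `Role`, `next_stage`, and `owner` are hypothetical, chosen for illustration, and are not features of CONTENTdm or of any other tool mentioned in the talk.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    """Who performs a workflow step."""
    STAFF = "library staff"
    LIBRARIAN = "librarian"
    SYSTEM = "CONTENTdm"


# Ordered (stage, role) pairs following the basic image-processing workflow.
# Stage names are hypothetical labels, not CONTENTdm terminology.
WORKFLOW = [
    ("scan original", Role.STAFF),
    ("digital manipulation in PSP", Role.STAFF),
    ("add to CDM Project Client", Role.STAFF),
    ("OCR and cataloging", Role.LIBRARIAN),
    ("approve and add to digital library", Role.LIBRARIAN),
    ("archive", Role.SYSTEM),
]
STAGES = [stage for stage, _ in WORKFLOW]
OWNERS = dict(WORKFLOW)


def next_stage(current: str) -> Optional[str]:
    """Return the stage after `current`, or None once the item is archived.

    Raises ValueError if `current` is not a known stage.
    """
    i = STAGES.index(current)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None


def owner(stage: str) -> Role:
    """Return the role responsible for `stage`."""
    return OWNERS[stage]


if __name__ == "__main__":
    # Walk one item through every stage, printing who handles each step.
    stage: Optional[str] = STAGES[0]
    while stage is not None:
        print(f"{stage}: {owner(stage).value}")
        stage = next_stage(stage)
```

Keeping the stages in one ordered list makes it easy to report, for a scanning log, how many items are waiting at each step.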
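The consortium's metadata requirement (Dublin Core, with subjects drawn from a local controlled vocabulary and selected MeSH headings) can be expressed as a simple validation check. This sketch assumes a record is a Python dict keyed by the 15 Dublin Core element names. The allowed-subjects list and sample values are hypothetical, and mapping the newspaper name to `source` is an assumption rather than the consortium's actual mapping.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC15 = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Hypothetical allowed-subjects list: two MeSH headings mentioned in the talk
# plus one invented local heading for topics MeSH does not cover.
ALLOWED_SUBJECTS = frozenset({"Congresses as Topic", "Publications", "Campus expansion"})


def build_record(fields: dict, allowed_subjects: frozenset = ALLOWED_SUBJECTS) -> dict:
    """Validate a clipping's metadata and return a copy of it.

    Every key must be a Dublin Core element, and every entry in the
    "subject" list must come from the controlled vocabulary.
    """
    unknown = sorted(set(fields) - DC15)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    bad = [s for s in fields.get("subject", []) if s not in allowed_subjects]
    if bad:
        raise ValueError(f"subjects not in controlled vocabulary: {bad}")
    return dict(fields)


if __name__ == "__main__":
    record = build_record({
        "title": "Example clipping title",   # hypothetical values throughout
        "source": "Example newspaper",       # assumed mapping for the paper's name
        "date": "1948-05-12",
        "subject": ["Congresses as Topic"],
        "format": "image/tiff",
    })
    print(record["subject"])
```

Rejecting unknown subjects at entry time is one way to get the name-and-spelling verification the card-catalog vocabulary provided.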
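The scanning log described in the notes tracks the file name, date scanned, file size, storage locations, and backup status. A sketch of the same log kept as a CSV file (which Excel opens directly) might look like this. The `LogEntry` fields, the `;` separator for multiple locations, and the sample file names are assumptions, not the consortium's actual template.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

FIELDS = ["file_name", "date_scanned", "size_mb", "locations", "backed_up"]


@dataclass
class LogEntry:
    """One row of the scanning log (field names are hypothetical)."""
    file_name: str
    date_scanned: date
    size_mb: float
    locations: list  # where copies live, e.g. a server folder or CD label
    backed_up: bool


def write_log(entries, stream):
    """Write entries as CSV with a header row."""
    writer = csv.writer(stream)
    writer.writerow(FIELDS)
    for e in entries:
        writer.writerow([
            e.file_name,
            e.date_scanned.isoformat(),
            e.size_mb,
            ";".join(e.locations),  # several locations in one cell
            "yes" if e.backed_up else "no",
        ])


def read_log(stream):
    """Parse a log written by write_log back into LogEntry objects."""
    return [
        LogEntry(
            file_name=row["file_name"],
            date_scanned=date.fromisoformat(row["date_scanned"]),
            size_mb=float(row["size_mb"]),
            locations=row["locations"].split(";") if row["locations"] else [],
            backed_up=row["backed_up"] == "yes",
        )
        for row in csv.DictReader(stream)
    ]


def summarize(entries):
    """Return (total size in MB, names of files not yet backed up)."""
    total = sum(e.size_mb for e in entries)
    pending = [e.file_name for e in entries if not e.backed_up]
    return total, pending


if __name__ == "__main__":
    log = [  # hypothetical sample rows
        LogEntry("clip_0001.tif", date(2008, 1, 11), 30.0, ["server", "CD-001"], True),
        LogEntry("clip_0002.tif", date(2008, 1, 11), 20.5, ["server"], False),
    ]
    buffer = io.StringIO()
    write_log(log, buffer)
    buffer.seek(0)
    print(summarize(read_log(buffer)))
```

The `summarize` helper answers the two questions the log exists for: how big the collection has grown, and which files still need a backup.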
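The copyright reasoning in the notes rests on simple date arithmetic. Under current US law, a work first published between 1923 and 1963 whose copyright was renewed is protected for 95 years from publication, through the end of that calendar year. Without renewal, protection ended after the original 28-year term. The following is a simplified sketch of that arithmetic only; it ignores notice requirements, foreign works, and other factors, and the function name is hypothetical.

```python
def last_protected_year(pub_year: int, renewed: bool) -> int:
    """Last calendar year of US copyright for a work first published 1923-1963.

    Simplified model: a renewed copyright lasts 95 years from publication and
    runs through December 31 of its final year; an unrenewed one ended after
    the original 28-year term.
    """
    if not 1923 <= pub_year <= 1963:
        raise ValueError("this sketch only models works published 1923-1963")
    return pub_year + (95 if renewed else 28)


if __name__ == "__main__":
    # The collection's online span is 1933-1953; 1923 is the talk's cutoff year.
    for year in (1923, 1933, 1953):
        print(year, last_protected_year(year, renewed=True))
```

Under this model, 2018 is the end date only for 1923 publications; renewed items from the collection's 1933–1953 span would remain protected through 2028–2048.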

    1. Digitizing a newspaper clippings collection: a case study in small-scale digital projects | Maureen “Molly” Knapp | LSUHSC, New Orleans, LA
    2. Question
    3. How do you get from this
    4. To this?
    5. Objectives • Collection description • Timeline • Workflow • Cataloguing • Considerations • Challenges • Results
    6. Collection Description • Collected since 1933, still collected today • Mostly local/regional papers • Traces local history of health sciences
    7. Articles include topics such as: People, places & events associated with LSU School of Medicine,
    8. the development of health infrastructure in Southeast Louisiana and New Orleans,
    9. and the development of 20th-century health sciences education in Louisiana.
    10. The collection’s condition
    15. Condition • Questionable archival collection/storage methods • Paper condition deteriorating • Very limited access to documents pre-1985
    16. Timeline
        • Spring 2004: Library funds digital project
        • Summer 2004–2005: Staff training & planning; Greenstone fails; explore other free software options
        • Fall 2005: Projects on hold; remote storage
        • 2006–2007: Continuing education during library displacement
        • Fall 2007: Library joins state digital library consortium
        • Jan 2008: Continuing education; project begins
        • May 2009: 1400+ objects available on the state digital library; more projects on the way
    17. Image processing: basic workflow • Scan original (library staff) • Digital manipulation in PSP (library staff) • Object added to CDM Project Client (library staff) • OCR, cataloging (librarian) • Item approved & added to Digital Library (librarian) • Archiving (CONTENTdm)
    18. Cataloguing • Consortium metadata standards (DC) • Existing card catalogue • Incorporation of MeSH • Creation of institutional controlled vocabulary • Keyword searching enabled
    19. Considerations
        • Storage
            • Physical: transfer to flat storage
            • Digital: how much space will we need on a server?
            • Archives: how will you back up digital data? CD/DVDs, external hard drives
            • “The Digital Mortgage”
        • Standards
            • Images: TIF, 600 dpi, 8-bit grayscale
            • Metadata: DC15, consortium standards, collection needs
            • Hardware: HP ScanJet 8390 scanner, 21” monitor, Dell computer
            • Software: Photoshop, ABBYY OCR software, CONTENTdm
        • Training/Staffing
            • Training: everyone needs to know PSP
            • Train the trainer
            • Continuing education: regional library groups, etc.
            • Staffing: how many hours per week will be dedicated?
            • Time management: allocate one day per week for the project
        • Documentation
            • Note what you do each day
            • Create a local digitization manual
            • Incorporate standards in your practice & stick to them
            • Keep a scanning log: track files, size of project, progress, locations where data is stored
    20. Challenges
        • Buy-in
            • Library administration
            • IT support
        • Sustainability
            • Software
            • Upgrades
            • Orphan collections
        • Access/©
            • Copyright clearance
            • IP restrictions (TIFs)
            • Metadata openly available
        • Funding
            • Scholarships
            • Regional CE funding, regional library groups
            • Grants: IMLS
            • Institutional budget: write budget proposal
    21. Results/Observations • Searchable historic archive (1933-1953) • Increased visibility • Catalyst for change • Mentoring • Work flow in place for future projects
    22. So to answer the question
    23. How to get from this
    24. To this,
    25. Careful planning
    26. Research and education
    27. Persistence
    28. Hard work
    29. http://www.Louisianadigitallibrary.org | Maureen “Molly” Knapp | mknapp@lsuhsc.edu | THANK YOU!
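Slide 19 asks how much server space the project will need and sets an image standard of TIF at 600 dpi, 8-bit grayscale. An uncompressed 8-bit grayscale image stores one byte per pixel, so a rough estimate is width × height in pixels. The sketch below assumes US letter-size (8.5 × 11 in) sheets, which “standard typing paper” suggests but the talk does not state. It ignores TIFF header overhead and any compression, and the function names are hypothetical.

```python
import math


def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Approximate pixel-data size of an uncompressed scan (no header overhead).

    Each row is rounded up to a whole number of bytes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return math.ceil(width_px * bits_per_pixel / 8) * height_px


def estimate_storage_gb(n_images: int, width_in: float, height_in: float,
                        dpi: int = 600, bits_per_pixel: int = 8, copies: int = 1) -> float:
    """Total decimal gigabytes for n_images scans, each kept in `copies` places."""
    per_image = uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel)
    return n_images * copies * per_image / 1e9


if __name__ == "__main__":
    page = uncompressed_bytes(8.5, 11, 600, 8)  # one assumed letter-size sheet
    print(f"{page / 1e6:.1f} MB per page")
    # Hypothetical: 100 sheets, stored on the server plus one backup copy.
    print(f"{estimate_storage_gb(100, 8.5, 11, copies=2):.2f} GB")
```

At these settings one full sheet is 5100 × 6600 pixels, about 33.7 MB, so a 700 MB CD-R holds only roughly 20 uncompressed page scans. Cropped single clippings are smaller, and lossless TIFF compression, if used, would reduce the figure further.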
