
Crowdsourcing Digitization: Harnessing Workflows to Increase Output



Are the highly selective models of digital content creation satisfying user demands for
increasing access to our vast collection holdings? In an era of decreasing library
budgets and increasing responsibilities, is such a level of staffing possible at any but the
best-funded libraries? A recent article in the New York Times estimated that it would
take 1,800 years for the National Archives to digitize its text holdings at the current rate
of digitization. Since November 2005, the University of Maryland Libraries have engaged in
another model for digitization: a workflow model that harnesses the digitization already
being done by archivists and other staff in response to patron requests. By “crowdsourcing”
selection decisions in this way, the Libraries have built a collection of over 5,000 objects
from the holdings of the University Archives and Historical Manuscripts. This model is
based on two main principles:
· Selection: As one part of a programmatic approach to digitization, selections are
based on user requests and added to the publicly accessible digital repository.
· Image capture: Digitization proceeds on the premise that creating useful
surrogates is more important than digital reformatting.
The path to a successful workflow is fraught with perils, though.
The presenters will discuss the practices that have proven most effective, and the
problems that have proven most difficult, in the large-scale digitization workflow in
place at UM. They will highlight the technical requirements chosen for images, metadata,
and quality control, and speak about how they were, or in some cases were not, able to
meet them. In bringing these issues to light, we hope to continue an ongoing conversation
(most recently articulated at OCLC’s “Digitization Matters” forum) about the purpose of
digital collections and the standards of digital surrogate creation, especially in the age
of mass digitization projects. We hope to explore the need to harness all of the library’s
expertise and resources where they can best be deployed.

Published in: Education, Technology


  1. Crowdsourcing Digitization: Harnessing Workflows to Increase Output. Gretchen Gueguen, East Carolina University; Ann Hanlon, Marquette University. LITA National Forum, 2008, Cincinnati, Ohio
  2. What is crowdsourcing? <ul><li>Jeff Howe, Wired Magazine, 2006 </li></ul><ul><ul><li>“distributed labor networks are using the Internet to exploit the spare processing power of millions of human brains” – best example, Wikipedia… </li></ul></ul><ul><ul><li>Any end achieved by harnessing the wisdom and labor of crowds </li></ul></ul><ul><ul><li>Distributing the burden of a large endeavor </li></ul></ul>Howe, Jeff. “The Rise of Crowdsourcing”, Wired Magazine, Issue 14.06, June 2006
  3. Crowdsourcing Digitization <ul><li>Crowd? </li></ul><ul><ul><li>Patrons and Co-workers </li></ul></ul><ul><li>Capturing digitization for patron request </li></ul><ul><ul><li>Selection is driven by patron request </li></ul></ul><ul><li>Centralized and Decentralized staffing for digitization </li></ul><ul><li>Object: Build robust digital collections </li></ul><ul><li>Online collections dense enough for systematic research (not just showcases) </li></ul>
  4. Crowdsourcing Digitization <ul><li>The Wisdom of Crowds </li></ul><ul><ul><li>How the project was conceived and developed: success story </li></ul></ul><ul><li>The Madness of Crowds </li></ul><ul><ul><li>How and why the project failed: bringing it back from the brink </li></ul></ul><ul><li>Crowd Control </li></ul><ul><ul><li>Methods used and lessons learned </li></ul></ul><ul><li>Attracting a Crowd </li></ul><ul><ul><li>Critical mass for the masses: why we digitize </li></ul></ul>
  5. The Wisdom of Crowds
  6. The Wisdom of Crowds <ul><li>Project Background: Archives and Special Collections </li></ul><ul><ul><li>Digital image management for archives and special collections </li></ul></ul><ul><ul><li>Reducing redundancy – many items requested for digitization more than once, why not track them? </li></ul></ul><ul><li>Digital Collections and Research (DCR) </li></ul><ul><ul><li>New office to coordinate digitization efforts established </li></ul></ul><ul><ul><li>Establishing a digital repository </li></ul></ul><ul><ul><li>More ambitious than just image management </li></ul></ul>Image management = capturing patron scanning workflow to populate the new repository
  7. The Wisdom of Crowds <ul><li>Coordination between Archives and Digital Collections: </li></ul><ul><ul><li>New metadata schema </li></ul></ul><ul><ul><li>New best practice guidelines </li></ul></ul><ul><li>Developing Repository </li></ul><ul><ul><li>Fedora required development </li></ul></ul>Meanwhile, patron scanning continues to grow…
  8. The Wisdom of Crowds <ul><li>Answer: Scanning Database </li></ul><ul><ul><li>Microsoft Access database: “stop-gap measure” while digital repository was being built </li></ul></ul><ul><ul><li>Corresponded to newly created XML schema and metadata requirements for repository </li></ul></ul>
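The slide above describes an Access database whose fields corresponded to the repository's XML metadata schema, so each patron-scan record could later be migrated. A minimal sketch of that hand-off is below; the field names (`title`, `date`, `collection`, `filename`) and the `<record>` element are hypothetical illustrations, since the talk does not give the actual schema used at Maryland.

```python
# Sketch: serialize one row of a patron-request "scanning database"
# as a simple XML metadata record. All field names are hypothetical.
import xml.etree.ElementTree as ET

def row_to_record(row: dict) -> str:
    """Serialize one scanning-database row as an XML <record> string."""
    record = ET.Element("record")
    for field in ("title", "date", "collection", "filename"):
        elem = ET.SubElement(record, field)
        elem.text = row.get(field, "")  # emit an empty element if unfilled
    return ET.tostring(record, encoding="unicode")

record_xml = row_to_record({
    "title": "Testudo statue, McKeldin Mall",
    "date": "1965",
    "collection": "University Archives",
    "filename": "univarch-001234.tif",
})
print(record_xml)
```

Keeping the database columns aligned with the target schema is what made the "stop-gap" records reusable once the Fedora repository was ready.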
  9. The Wisdom of Crowds
  10. The Wisdom of Crowds <ul><li>Biggest beneficiary: University Archives </li></ul><ul><ul><li>Receives the most scanning requests from patrons </li></ul></ul><ul><ul><li>Capture patron requests, as well as items scanned prior to implementation of Scanning Database </li></ul></ul><ul><ul><li>University celebrating 150th anniversary </li></ul></ul><ul><ul><ul><li>Documentary </li></ul></ul></ul><ul><ul><ul><li>“Coffee table” book </li></ul></ul></ul><ul><ul><ul><li>Departmental histories </li></ul></ul></ul><ul><ul><ul><li>Nostalgic alumni </li></ul></ul></ul>
  11. The Wisdom of Crowds <ul><li>Collections created by crowdsourcing digitization: </li></ul><ul><ul><li>University AlbUM </li></ul></ul><ul><ul><li>National Trust for Historic Preservation Postcard Collection </li></ul></ul>
  12. The Madness of Crowds
  13. The Madness of Crowds <ul><li>Evolution </li></ul><ul><ul><li>Evolving standards for both metadata and imaging </li></ul></ul><ul><li>Training/Quality </li></ul><ul><li>(dis)Organization </li></ul><ul><li>Backlog </li></ul>
  14. The Madness of Crowds <ul><li>Evolution </li></ul><ul><ul><li>Quality of legacy scans </li></ul></ul><ul><ul><ul><li>file types </li></ul></ul></ul><ul><ul><ul><li>spatial resolutions </li></ul></ul></ul><ul><ul><ul><li>color profiles </li></ul></ul></ul><ul><ul><ul><li>clipping, noise, and other “problems” </li></ul></ul></ul><ul><ul><ul><li>flawed equipment </li></ul></ul></ul><ul><li>Training/Procedures </li></ul><ul><li>(dis)Organization </li></ul><ul><li>Backlog </li></ul>
  15. The Madness of Crowds (slide of examples of inconsistent legacy scans: rotated 90º, rotated 180º, 24-bit color 300 dpi TIFF, 8-bit 600 dpi TIFF, 48-bit color 600 dpi TIFF, bitonal EPS, 16-bit 300 dpi JPEG, indexed color 72 dpi GIF, PDF???)
  16. The Madness of Crowds
  17. <ul><li>Evolution </li></ul><ul><ul><li>Metadata Quality </li></ul></ul><ul><ul><ul><li>Lack of experience with controlled vocabularies and input standards </li></ul></ul></ul><ul><ul><ul><li>Changing metadata requirements </li></ul></ul></ul><ul><li>Training/Procedures </li></ul><ul><li>(dis)Organization </li></ul><ul><li>Backlog </li></ul>The Madness of Crowds. It’s not quite wrong… but it’s not quite right
  18. <ul><li>Evolution </li></ul><ul><li>Training/Procedures </li></ul><ul><ul><li>No standard guidelines for scanning procedures </li></ul></ul><ul><ul><li>No quality control procedures for images or metadata </li></ul></ul><ul><ul><li>No one to set them up anyway… </li></ul></ul><ul><li>(dis)Organization </li></ul><ul><li>Backlog </li></ul>The Madness of Crowds
  19. The Madness of Crowds
  20. The Madness of Crowds
  21. The Madness of Crowds <ul><li>Evolution </li></ul><ul><li>Training/Procedures </li></ul><ul><li>(dis)Organization </li></ul><ul><ul><li>Does everything fit in a “collection”? </li></ul></ul><ul><li>Backlog </li></ul>
  22. The Madness of Crowds <ul><li>Evolution </li></ul><ul><li>Training/Procedures </li></ul><ul><li>(dis)Organization </li></ul><ul><li>Backlog </li></ul><ul><ul><li>Robust metadata standard to enable repurposing and “shareability” </li></ul></ul><ul><ul><li>Metadata could take 10x more time than scanning </li></ul></ul><ul><ul><li>Volume of scanning didn’t leave much time for metadata </li></ul></ul>
  23. The Madness of Crowds
  24. Crowd Control
  25. <ul><li>Create Documentation </li></ul><ul><li>“Teachable” standard </li></ul><ul><li>Responsibility </li></ul><ul><li>Quality </li></ul><ul><li>Divide and Conquer?!? </li></ul>Crowd Control
  26. Crowd Control <ul><li>Create Documentation </li></ul><ul><li>TEACH it </li></ul><ul><li>Responsibility </li></ul><ul><li>Quality: Live it, Learn it, Love it </li></ul><ul><li>Divide and Conquer </li></ul><ul><ul><li>1. resolution </li></ul></ul><ul><ul><li>2. color </li></ul></ul><ul><ul><li>3. straightness and placement </li></ul></ul><ul><ul><li>4. reference points (targets) </li></ul></ul><ul><ul><li>5. noise </li></ul></ul><ul><ul><li>6. file format </li></ul></ul>
  27. Crowd Control (diagram after Puglia, 2007, on imaging environments and defined image states: RAW; Input Referred, which looks toward the sensor; Output Referred, which is prepped for a specific output and looks toward output; and Original Referred, with a defined relationship between the original and the digital version. Current practice requires more technical metadata; emerging practice should be able to get by with less.)
  28. <ul><li>Create documentation </li></ul><ul><li>TEACH it! </li></ul><ul><li>Quality: Live it, Learn it, Love it </li></ul><ul><ul><li>Have curatorial staff check for accuracy and completeness </li></ul></ul><ul><ul><li>DCR staff follow up with a check of a statistically significant portion for style and consistency </li></ul></ul><ul><ul><li>Eventually, give curatorial staff the ability to make corrections as they find them using the web-based administrative form </li></ul></ul><ul><li>Responsibility </li></ul><ul><li>Divide and conquer?!? </li></ul>Crowd Control
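The follow-up check of a "statistically significant portion" can be sketched as a simple random spot-check over a batch of new records. The 10% fraction, 20-record floor, and `qc_sample` name below are illustrative assumptions, not the actual thresholds or tooling used at Maryland.

```python
# Sketch: pick a random subset of newly described records for manual
# QC review. Thresholds (10%, minimum of 20) are hypothetical.
import random

def qc_sample(record_ids, fraction=0.10, minimum=20, seed=None):
    """Return a sorted random subset of record IDs for spot-checking."""
    rng = random.Random(seed)
    size = min(len(record_ids), max(minimum, int(len(record_ids) * fraction)))
    return sorted(rng.sample(record_ids, size))

batch = list(range(1, 501))        # a batch of 500 new records
to_review = qc_sample(batch, seed=42)
print(len(to_review))              # 50 records (10% of 500)
```

Sampling keeps the centralized DCR check cheap while still catching systematic style and consistency problems introduced by distributed staff.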
  29. <ul><li>Documentation </li></ul><ul><li>“Teachable” standard </li></ul><ul><li>Quality: Live it, Learn it, Love it </li></ul><ul><li>Responsibility </li></ul><ul><ul><li>Someone has to have some </li></ul></ul><ul><ul><li>But it doesn’t have to be an entire job </li></ul></ul><ul><li>Divide and Conquer </li></ul>Crowd Control
  30. <ul><li>Create documentation </li></ul><ul><li>TEACH it! </li></ul><ul><li>Quality: Live it, Learn it, Love it </li></ul><ul><li>Responsibility </li></ul><ul><li>Divide and conquer?!? </li></ul><ul><ul><li>Stub record created at request time; Cataloging enhances </li></ul></ul>Crowd Control
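The stub-record hand-off above ("created at request time; Cataloging enhances") amounts to overlaying a cataloger's fuller description onto the minimal record captured from the patron request form. A minimal sketch follows; the field names and the `enhance` helper are hypothetical illustrations of the division of labor.

```python
# Sketch: Cataloging enhances a request-time stub record.
# Field names are hypothetical; empty enhancement values are ignored
# so the stub's data survives unless a cataloger supplies something.
def enhance(stub: dict, enhancements: dict) -> dict:
    """Overlay non-empty cataloger-supplied fields onto the stub."""
    merged = dict(stub)
    merged.update({k: v for k, v in enhancements.items() if v})
    return merged

stub = {"title": "football photo",
        "collection": "University Archives",
        "subjects": []}
full = enhance(stub, {"title": "Byrd Stadium, homecoming game",
                      "subjects": ["Football", "Stadiums"]})
print(full["title"])   # the cataloger's fuller title replaces the stub's
```

Splitting record creation this way lets scanning proceed immediately while the slower, expert metadata work happens later without blocking the queue.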
  31. Crowd Control <ul><li>Create documentation </li></ul><ul><li>TEACH it! </li></ul><ul><li>Quality: Live it, Learn it, Love it </li></ul><ul><li>Responsibility </li></ul><ul><li>Divide and conquer </li></ul><ul><li>Give up </li></ul><ul><ul><li>Less control, more power </li></ul></ul>
  32. Crowd Control <ul><li>Would you want to try this? </li></ul><ul><ul><li>Give yourself room to evolve and change through the project </li></ul></ul><ul><ul><li>Don’t feel like every image is a precious snowflake </li></ul></ul><ul><ul><li>More than any single technique, it’s the philosophy of crowdsourcing that’s important </li></ul></ul>
  33. Crowd Control <ul><li>Would you want to try this? </li></ul><ul><ul><li>Don’t feel like every image is a precious snowflake </li></ul></ul>Access to a low-quality scan… … is still better than no access at all.
  34. <ul><li>Would you want to try this? </li></ul><ul><ul><li>More than any single technique, it’s the philosophy of crowdsourcing that’s important </li></ul></ul>
  35. Crowd Control
  37. Attracting a Crowd
  38. Attracting a Crowd <ul><li>Letting Go </li></ul><ul><ul><li>“Letting go” creates efficiencies </li></ul></ul><ul><ul><li>Looking at expertise across the Libraries </li></ul></ul><ul><ul><li>Distribute the burden </li></ul></ul><ul><li>Move away from “trophy” collections toward online Research Collections </li></ul>
  39. Attracting a Crowd <ul><li>Distributed Problem-solving </li></ul><ul><ul><li>Ideas from Archives: </li></ul></ul><ul><ul><ul><li>Organizing repository by subject rather than by collection </li></ul></ul></ul><ul><ul><ul><li>Dabbling in folder-level description (and digitization) rather than just item-level </li></ul></ul></ul><ul><li>Neutral Collection-building </li></ul><ul><ul><li>Erway, Ricky and Jennifer Schaffner. 2007. “Gearing Up to Get Into the Flow.” Report produced by OCLC Programs and Research (formerly RLG) </li></ul></ul>
  40. Attracting a Crowd <ul><li>Distributed Problem-solving </li></ul><ul><ul><li>Ideas from Archives: </li></ul></ul><ul><ul><ul><li>Using “stub records” from patron request forms </li></ul></ul></ul><ul><ul><ul><li>Dabbling in folder-level description (and digitization) rather than just item-level </li></ul></ul></ul><ul><li>“Neutral” Collection-building </li></ul><ul><ul><li>Wikipedia-style collection-building </li></ul></ul><ul><ul><li>Building a collection with wide range </li></ul></ul>
  41. Attracting a Crowd <ul><li>Mass digitization </li></ul><ul><ul><li>Google projects: </li></ul></ul><ul><ul><ul><li>Books </li></ul></ul></ul><ul><ul><ul><li>Newspapers </li></ul></ul></ul><ul><li>Mass decision-making </li></ul><ul><ul><li>Instead of item-level decision-making </li></ul></ul>
  42. Attracting a Crowd <ul><li>Making Digitization a Core Function of the Library </li></ul><ul><ul><li>Mission Statements come to life! </li></ul></ul><ul><ul><li>Organizing around digitization – very little has really been done yet </li></ul></ul>Why? For researchers <ul><li>“Fringe activities” need to become core investments </li></ul><ul><ul><li>Metadata creation </li></ul></ul><ul><ul><li>Digitization </li></ul></ul>Council on Library and Information Resources (CLIR). No Brief Candle: Reconceiving Research Libraries for the 21st Century, 2008.
  43. Crowdsourcing Digitization <ul><li>THANKS! </li></ul><ul><li>Access these slides at: </li></ul><ul><li> </li></ul><ul><li>Or: </li></ul><ul><li> </li></ul>Gretchen Gueguen [email_address] East Carolina University Greenville, North Carolina Ann Hanlon [email_address] Marquette University Milwaukee, Wisconsin