Integrating Unique Materials into the Global Discovery Network


Integrating Unique Materials into the Global Discovery Network : a presentation from OCLC Research and the RLG Partnership

  • First of all, I’d like to say how sorry I am that I’m unable to join you in person today. However, I am glad that we’ve found a way to leverage technology so that I can be with you virtually, which is hopefully a step up from someone else giving my presentation. I want to talk to you today about a significant program of work that we’ve launched on behalf of the RLG Partnership within the last year, focusing on making unique materials more discoverable and usable by scholars and citizens everywhere. Our work agenda in this area seems to me to be very US-centric in many ways, although I think it is of interest to a broader audience as well. So I first want to apologize for my rather parochial perspective. Second, I want to appeal to you directly to help broaden our projects and help us understand needs and impact in your own environment.
  • For libraries and other cultural organizations prepared to invest resources and staff expertise in collaboratively designing innovative programs and future services, the RLG Partnership is a global alliance of like-minded institutions that focuses on making operational processes more efficient and shaping new scholarly services by directly engaging senior managers. Today’s research libraries have to manage their collections differently, organize what they deliver more effectively, and create new infrastructures to sustain a new service array in order to be a valued part of the scholarly process. Confronted with disruptive technologies and challenging economics, they must genuinely transform how they make their collections available to their users, a feat that is difficult for a single institution to tackle alone. Unlike regional, trade or issue-driven groups in the community, the RLG Partnership is supported by the full capacities of OCLC Research, benefits from an international, system-wide perspective, and connects to the broad array of OCLC products and services.
  • Changed circumstances. From the time of RLG’s foundation in the mid-1970s until the advent of the Web some twenty years later, libraries were rightly considered destinations. Photo of Sterling Memorial Library, Yale University (G. Waibel).
  • Having said that, I’ll launch in by talking about the American context that has motivated our work, and about some trends we see in our environment. Money is always an important (and motivating!) factor. One trend is that agencies are offering funding to institutions to directly tackle what has been called the hidden collections problem. In the US, this problem is viewed as acute, and has risen to the point where funding agencies are creating programs specifically to address collections that are held in libraries and archives but are unprocessed and/or uncataloged, and so effectively remain hidden from scholars. I want to put special emphasis on the interests of the Mellon Foundation and their partnership with the Council on Library and Information Resources to create a new funding stream for making previously “hidden” collections more accessible. The Mellon Foundation has long funded projects in this area, but it’s important to note the establishment of a new funding program. Additionally, two of our national funding agencies, NEH and NHPRC, also have new or retooled funding programs with a similar slant. Another trend is a lot of awareness of the problem in our environment, which is seen in a number of reports and activities. The Library of Congress’s On the Record report from 2008 had recommendations not only for the Library of Congress, but also for the larger library community. Some of those high-level recommendations include: 2.1.1 Make the discovery of rare, unique and other special hidden materials a high priority; 2.1.2 Streamline cataloging for rare, unique and other special hidden materials, emphasizing greater coverage and broader access; 2.1.3 Integrate access to rare, unique and other special hidden materials with other library materials; 2.1.4 Encourage digitization to allow broader access; 2.1.5 Share access to rare, unique and other special hidden materials. The Association of Research Libraries has a very active special collections working group, which issued a discussion report in March with a major focus on increasing the discoverability of and access to special collections. And finally, the report More Product, Less Process, written by Mark Greene and Dennis Meissner in 2005, has caused American archivists to think innovatively about ways to decrease processing backlogs and give access to more collections. So we have an active community of practitioners poised for change.
  • Additionally, on the environmental side, we are seeing various statements about the role of special collections in the research library. Our own “Information Context” document, issued in 2007, predicted increased prominence and importance for special collections, and more recently a similar statement about special collections and archives was issued as part of the Taiga Provocative Statements earlier this year.
  • I’d now like to turn to our work, which can be broken down I think into three different categories. The first group of projects are oriented around understanding the current landscape, and includes data analysis, synthesis work, surveys, and reports. The second group of projects looks at what we already have in terms of data and asks how we can make the best of what we already have. The third group of projects is more forward looking – how can we change processes, policies and attitudes to be more efficient and effective, and to bring the special collections – particularly that which is invisible – more directly into the light.
  • A signal work item we are undertaking in terms of establishing where we are now will be updating and expanding a key piece of work that was conducted just over 10 years ago. In 1996, the Association of Research Libraries undertook a major survey of special collections among its members. The survey included questions about size and scope of collections, as well as management of collections and staffing to support collections. This landmark survey resulted in a 2001 ARL report that coined the phrase “hidden collections,” and touched off an explosion of work. In many ways the suggestions and findings in the ARL report were similar to those made in the 1999 UKOLN report, Full Disclosure. This fall, we are redoing and expanding the survey, both in terms of the questions we are asking and also in terms of the population. We will be surveying a broader range of institutions, including the Independent Research Libraries Association, the Oberlin Group (composed of small liberal arts colleges) and members of the Canadian Association of Research Libraries, or CARL. The survey will return important longitudinal data on collection size, and on how much progress institutions have made in reducing cataloging and processing backlogs. We are also adding questions that were not asked 10 years ago, such as questions about digitization, born-digital records, descriptive practice, and archival management systems. The survey will be conducted this fall, and you can expect to see a report in the spring or summer of 2010. Lead: Jackie Dooley.
  • This is a project that will in the first instance analyze MARC descriptions of archival records to determine content of maximum value, as well as that of lesser value. We are using a sophisticated data mining tool developed in-house by our software engineers. We are starting this project by looking at MARC records for archival materials, but could potentially expand it to include other types of archival description, such as EAD finding aids. Once we have a handle on the data that we have, we can make recommendations to the cataloging and descriptive community. For example, if data elements are rarely used (either used by a handful of institutions, or represented in a handful of records), they have little or no value in an aggregated or networked space, so the effort spent on them may be misplaced. We can also look at data elements that are universally represented and see if we can leverage these fields for relevance ranking or for discovery. Jackie Dooley is in the lead on this project.
  • Another landscape project, recently completed by my colleague Jennifer Schaffner, was to synthesize user studies on archival information systems and publish her findings. Step one: read everything. Report on findings of user studies (tons, since the early 80s): how do people search for special collections, and what do they expect? No surprises; we already know most of what we need to know. Researchers start with Google, do keyword searching, and expect ranked results like a search engine.
  • Step two: synthesize the evidence. No dearth of solid work (despite frequent whining). No surprises as the lit review piled up the various conclusions from user studies. Now the impact of discovery of archives and special collections resides in the metadata (link to Jackie’s work). Resulting report: The Metadata Is the Interface, published earlier this year. Findings are that users: don’t do mediated research and don’t use our websites; search by subject, using keywords; expect relevance ranking; expect comprehensive coverage; scan and scroll long records and finding aids just fine; want to trust what they find and need reliable information. We need: evidence beyond anecdotes and impressions; statistics and search logs.
  • Our next project in this area, again led by my colleague Jennifer, will be to look at search logs for archival information systems. She will be looking at the types of search terms that are used, and then turning back to the description analysis and asking questions. Are data elements present that would in essence answer the questions that are being asked? Do you have search logs, and can you share them for this project? Jennifer would like to hear from you. We need evidence beyond anecdotes and impressions.
  • I’m now transitioning to projects that relate to what we already have, before turning to projects that will guide us into the future.
  • Relating back to the two projects I just described, I think there are possibilities for data remediation, based on what we’ve learned from the user studies synthesis and what we will learn from the search log analysis…
  • We are working toward a project for quick-and-easy conversion of “hidden” finding aids. This project will capitalize on the tremendous investment in intellectual labor that has already been made to describe unique materials and collections. We have surveyed the RLG Partnership to see if there is a need for this type of project, and indeed there is. We are lining up likely funding to develop a model methodology for converting paper finding aids, in order to integrate exposure and discovery of collections. This could be something like converting paper finding aids to HTML or PDF and then making them available on the web for discovery. Ideally, we’d optimize these documents for discovery, based on what we’ve learned through user studies and through the log analysis, and combine this with what we know about search engine optimization strategies. The main output of this project would be to develop and test ways of doing this with minimum cost structures, and to make the methodologies broadly available.
  • Finally, I’d like to talk about some projects that will help point to new directions.
  • The first project in this area is to develop a list of available tools and methodologies for assessing a repository’s overall collection holdings, including backlogs. This work is being carried out by a working group led by Merrilee Proffitt. A report will be out by the end of this year. Collections assessment activities include gathering information about the status of description, physical condition (preservation/conservation needs), research interest, state of processing (or processing needs), and how “research ready” or “scan worthy” the collections are (for example, are there copyright concerns, privacy or confidentiality issues?). We noticed that institutions were beginning to pursue and secure funding for backlog assessment projects. Institutions are increasingly interested in addressing processing backlogs but want to be able to weigh and assess collections against one another in order to prioritize what is often a daunting multi-year course of work. Because institutions do not generally have information about collections on hand to make these decisions out of the gate, they are applying for and receiving funding to evaluate archival backlogs or to evaluate their archival collections as a whole. Generally speaking, there is not a broadly shared methodology or set of methodologies available for collections assessment activities, and this project will document available methodologies and highlight features and outcomes. We anticipate that our report will ease the burden on institutions needing to develop their own methods and hopefully will encourage standardization on a few approaches. We are examining about 10 different surveys, including one developed by the Philadelphia Area Consortium of Special Collections Libraries, or PACSCL, and also the Logjam project developed by the Northwest Regional Archives Council.
  • Work led by Dennis Massie. Builds on previous work, started in 2002 with a forum called Sharing the Wealth. Some interest then, more interest now. The project is being carried out with the help of a working group, populated by ILL practitioners and special collections curators, in some cases pairs from the same institution. From preliminary discussions and throughout a May 28 webinar we sponsored on the topic, called "Treasures on Trucks and Other Taboos: Rethinking the Sharing of Special Collections," two main areas of work for the advisory group seem to have bubbled to the top: 1) streamlining the handling of external requests for special collections materials, and 2) considering what sorts of information and communication are necessary to establish trust between two institutions sufficient for the physical loan of special collections items. We lack UK representation on this group, so if there’s interest in this room, please do get in touch.
  • Related to the previous project. Users of archives and special collections often want copies of original materials after discovering metadata describing relevant content or seeing items in the reading room. Preparation of digital scans, photocopies and photographs can, however, consume a great deal of staff time. Policies and practices vary widely across institutions; these local practices can be both frustrating and confusing for users to understand. In recent years a particular flash point has been the unwillingness of some institutions to allow users to use digital cameras in the reading room in order both to facilitate immediate acquisition of reproductions and to reduce the costs associated with conducting research. In some cases, libraries and archives continue to charge for copies in order to offset lost revenue from duplication services. This project is being tackled by a working group that is addressing workflow and policy issues arising from digitizing (and copying) materials from special collections. Specific focus is on scan-on-demand workflow in reading rooms, integration of patron-initiated scans with large-scale digitization and digital library workflow, recommendations for minimum levels of scanning and metadata, and policies for hand-held cameras in reading rooms. This project is being run by my colleague Jennifer Schaffner who sent me with a message. We need you. The large and active working group needs representation from UK institutions. Please contact me, John, or Jennifer if you are interested in being involved, or in having someone from your institution involved.
  • Two years ago, we published a paper called Shifting Gears that looked at how to ramp up for large-scale digitization of special collections. We continue our interest in encouraging institutions to digitize collections at scale, and are now focusing on balancing fears about rights, especially when digitizing unpublished material at scale. Librarians and archivists often make extremely conservative judgments regarding the risk involved in copying unpublished collections. Many institutions have time-consuming, overly cautious procedures to ensure vigorous compliance with copyright law, sometimes without a full understanding of the law or of the negative impact their procedures have on achieving their mission. If access is the goal, then any unnecessary restriction is counterproductive. The digital age has induced yet more caution, creating the ironic situation where, just when users ought to be getting improved services, they're not even getting as good a service as they could through interlibrary loan, in-person visits, and analog copying. The processes themselves are very costly, not just to the library or archive, but to society, in terms of what may be prevented from entering the scholarly record. Streamlined rights procedures will maximize use of increasingly limited staff and financial resources and will increase service to researchers. Developing a community of practice will establish a baseline that can be followed with some degree of confidence, improving visibility of and access to special collections. This activity will examine strategies for analyzing and developing acceptable risk behaviors and recommend practices for libraries and archives. We anticipate holding an invitational meeting, broadcast in real time, that collects imaginative thinking by experts from archives, special collections, time-based media and the law. Ricky Erway is in the lead on this project.
  • The loss of materials held in libraries and archives worldwide is a concern not only for owning institutions, but also for the international antiquarian book trade and global law enforcement. Centralized, highly visible exposure of "missing materials" is needed to help identify stolen materials, recover missing items and deter future crimes. In order to deter thieves, prevent inadvertent purchases and recover valuable stolen cultural materials, OCLC Research, the RLG Partnership, the RBMS Security Committee and the ABAA convened members of the cultural heritage collecting community to explore strategies for sharing reliable information about missing rare books and other materials. The goal was to surface current policies and procedures and discuss what's lacking in current practice for dissemination of information about missing materials. The WorldCat bibliographic database was suggested as the center for collecting and broadcasting this information. The group quickly agreed that widespread support and community participation will be essential to the success of such a program. In order to centralize information about stolen and missing rare books and special collections, this working group developed a procedure to “tag” records. The tagged records are then automatically fed to a blog. Simultaneously, holdings are set in WorldCat, in order to alert prospective buyers and sellers. Please check it out, and if you are interested in more details, contact my colleague Jennifer.
  • Most of us have a 1960s workflow in the reading room. In the TETs: what are we not going to do anymore? Are we going to have staff stand at the copier? How can we re-use the copies we make? Are we gatekeepers for copyright law? “If it makes me uncomfortable, we need to talk about it.” Outcomes: Best practices for copies and scans, whether made for readers or by readers (any interest?); working group forming, Jennifer in the lead. Sharing Special Collections (ILL): “Treasures on Trucks” (Dennis Massie in the lead). An event about rights (winter 2010): collective move to liberal practices? Ricky Erway in the lead; follow-up to projects on large-scale digitization of special collections; focus on unpublished materials (what was ruled out for Digitization Matters and Shifting Gears); lightening up; assuming responsibility for articulating the law; Google Books agreement; working with campus counsel and risk (“do you know any archivists in jail?”); time-based media. Steering group, scope the projects: Jen in lead with Dennis Massie (our resource-sharing expert). [We had to ditch the paper call slips question: too local and idiosyncratic.]
  • Look at barriers to EAD implementation: working group underway, led by Merrilee Proffitt. Report forthcoming in fall 2009, titled “Over, Under, Around and Through: Dealing with the Complexity of EAD.” In a recent survey (by AT), 47% of institutions don’t do EAD. This holds for all sizes of institutions (even big universities) and all types, including archives with IT and without.
  • Integrating Unique Materials into the Global Discovery Network

    1. 1. Merrilee Proffitt, Senior Program Officer, OCLC Research, 23 September 2009: Integrating Unique Materials into the Global Discovery Network
    2. 2. The RLG Partnership
       • Comes together to collaboratively design future services and programs
       • Focuses on increasing operational efficiencies, adapting services to new network flows
       • Its influence and impact are:
          • … system-wide
          • … trans-national
          • … broadly applicable to libraries, archives & museums
       • It is supported by OCLC Research
    3. 3. RLG Partnership: Diverse Perspectives About 130 partner institutions in North America, Europe, the Middle East and in the Asia-Pacific region. National libraries, universities, museums, and independent research libraries
    4. 4. Changed Circumstances
       • Then: Users built workflow around libraries
       • Now: Library must build services around user workflow
       • Discovery happens elsewhere
       • Disclosure
    5. 5. Motivations that focus our attention
       • Funding
          • Council on Library and Information Resources (CLIR)
          • Mellon Foundation
          • National Historical Publications and Records Commission (NHPRC); National Endowment for the Humanities (NEH)
       • Reports, activities
          • Library of Congress On the Record recommendations
          • ARL Special Collections Working Group
          • Greene/Meissner report (“More Product, Less Process”)
    6. 6. Heeding call to action…
       • “Libraries will continue to provide direct access to physical materials but this will be very much focused on the special demands of their local constituencies. ‘Comprehensive’ research collection building will be done by a very small number of institutions while special collections of the special or unique materials of research will be maintained and featured at many institutions.” RLG Information Context, 2007
       • “In five years… The only collection development activities involving librarians will be competition over special collections and archives.” Taiga Provocative Statements, 2009
    7. 7. From the darkness to the light: effective “disclosure” of archives and special collections
       • Understanding the lay of the land
          • Data analysis, synthesis, reports
       • Making the most of what we already have
          • New tricks for old data?
       • Making smart and prudent decisions about the work ahead
          • Working groups, events
    8. 8. Understanding the Lay of the Land
    9. 9. Characterize the Current State of “Hidden Collections”
       • Revisit and update the 2001 Association of Research Libraries survey of special collections libraries
          • Expand survey population: ARL, IRLA, Oberlin Group, CARL
          • Obtain longitudinal data on collection size, backlogs, online access, and others
          • Add questions to address new concerns: digitization, born-digital records, descriptive practice, archival management systems, and others
    10. 10. Analyze Archival Descriptive Practice
       • OCLC holds one million MARC records for archival materials; 48K EAD records; 29K HTML records
       • Evaluate actual practice by analyzing these records
       • Identify data patterns that reveal unnecessary effort, including those rarely used
       • Determine which data seems most useful for discovery
       • Recommend ways to increase relevance ranking
    11. 11. Analyze Discovery Environments to Optimize User Success
    12. 12. Analyze Discovery Environments to Optimize User Success
    13. 13. Analyze Discovery Environments to Optimize User Success
       • Next steps…
       • Collect logs of successful searches that lead to archival collections (“find logs”)
       • Compare and contrast with the results of data mining MARC records for archival materials in WorldCat
       • Combine analysis to make recommendations to optimize metadata creation for discovery
       • Are there possibilities for data remediation?
    14. 14. Making the most of what we have
    15. 15. Possible outcomes of data analysis/user studies/log analysis
       • If names are important … find ways to leverage uncontrolled names
       • If collection size is important … find ways to normalize extent statement syntax
       • If titles are important … make them more user friendly
       • If subjects are important … find ways to derive subject terms from narrative descriptions, consider enabling end-user tagging
       • Consider relationship between MARC record and finding aid to reduce data duplication, increase efficiency
    16. 16. “Offline” descriptions
       • Project to convert paper-based collection descriptions to electronic form to enable network level discovery
       • Capitalize on previous intellectual effort
       • Develop and test minimal cost structures
       • Not a “best” solution, but a good enough solution
    17. 17. Moving ahead…
    18. 18. Develop Model Methodology for Archival Collections Assessment
       • We need to know what we have in order to “expose” what is “hidden”
       • … and to be able to plan for preservation, digitization, space management…
       • Lack of standard methodology for surveying repository holdings inhibits this
       • Sample tools being examined: PACSCL (Philadelphia consortium), Logjam project
    19. 19. Sharing Special Collections
       • Working group: ILL practitioners and curators
       • Will develop recommended best practices for streamlining the handling of ILL requests for special collections materials
       • Document what is necessary to build sufficient trust between two institutions interested in the physical loan of special collections items
       • We need you!
    20. 20. Streamlining Photography and Scanning in Special Collections
       • Working group (we need you!)
       • Addressing workflow and policy issues arising from digitizing materials from special collections.
          • Scan-on-demand workflow
          • Integration of patron-initiated scans with large-scale digitization / digital library workflow
          • Recommendations for minimum levels of scanning and metadata
          • Policies for hand-held cameras in reading rooms
    21. 21. Balance in Rights Management
       • Rights issues are a serious barrier to risk-averse institutions digitizing unpublished materials
       • How to encourage institutions to take modest, appropriate risks in order to facilitate greater access to collections?
       • Invitational meeting (Spring 2010?), webcast
       • Working group formed to help shape event, devise strawman document
    22. 22. Missing Materials
       • Loss of materials a broadly recognized problem with many stakeholders
       • Working group established a set of procedures to “tag” records in to trigger listing
       • Easy (and free) to participate
       • In order to work, institutions must be willing to disclose information about theft
    23. 23. Now it’s time for you to shed some light
       • Questions and answers?
       • Comments?
       • Your thoughts?
       • Thank you!
       • [email_address]
    24. 24. Evaluate Current Delivery Practices
       • Streamline Photography and Scanning
          • cameras in the reading room?
          • scan-on-demand?
          • images flow directly into the digital library?
       • Share Special Collections
          • “Treasures on Trucks” web seminar May 28
       • Introduce Balance in Rights Management
       • Tapping Users' Expertise
          • Sharing and Aggregating Social Metadata
    25. 25. Identify Barriers to EAD Creation
       • Choices, choices, choices!
          • Archival Management Software: A Report for the Council on Library and Information Resources (Lisa Spiro, CLIR, 2009)
       • What are the chief barriers to EAD implementation?
          • Limited time & human resources
          • Not enough support at higher levels
          • I can make finding aids, but how to publish them?
          • Lack of technological expertise/support
          • Data/system migration issues
          • Help! I don’t even know where to begin!
       • Have we sufficiently articulated the value of using EAD to underpin our archival discovery systems?
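
As a rough illustration of the descriptive-practice analysis described for slide 10 (finding data elements that are represented in only a handful of records or used by only a handful of institutions), the idea can be sketched in a few lines of Python. This is not the in-house data mining tool mentioned in the notes. The function names, the thresholds and the sample records are all invented for illustration, and real MARC records would first have to be parsed into (institution, field tags) pairs.

```python
from collections import defaultdict


def field_usage(records):
    """Summarize how widely each field tag is used.

    `records` is an iterable of (institution, tags) pairs, where `tags`
    is the collection of field tags present in one description.
    Returns {tag: (share_of_records, number_of_institutions)}.
    """
    record_counts = defaultdict(int)
    institutions = defaultdict(set)
    total = 0
    for institution, tags in records:
        total += 1
        for tag in set(tags):  # count each tag once per record
            record_counts[tag] += 1
            institutions[tag].add(institution)
    return {
        tag: (record_counts[tag] / total, len(institutions[tag]))
        for tag in record_counts
    }


def rarely_used(usage, max_share=0.05, max_institutions=2):
    """Tags found in few records OR used by few institutions (assumed thresholds)."""
    return sorted(
        tag
        for tag, (share, n_institutions) in usage.items()
        if share <= max_share or n_institutions <= max_institutions
    )


# Invented sample data: three hypothetical institutions, four records.
SAMPLE = [
    ("A", {"245", "300", "520"}),
    ("A", {"245", "300"}),
    ("B", {"245", "300", "545"}),
    ("C", {"245", "351"}),
]

usage = field_usage(SAMPLE)
print(rarely_used(usage, max_share=0.25, max_institutions=1))
```

The same summary also shows which tags appear in nearly every record, which is the other half of the question raised in the notes: fields that are universally present are the candidates for relevance ranking and discovery.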