Calhoun future of metadata japanese librarians4

Reports on the future of metadata in academic libraries and national research information infrastructures. A shorter version of this presentation was given at a September 8 post-conference of the OCLC Asia Pacific Regional Conference, Sept. 6-6, 2010, at Waseda University.

Published in: Education

  • This presentation begins with an overview of changes in scholarship and in the use of scholarly information. I will also speak about the changing expectations of today’s students. With changes in research, teaching, and learning comes the need for change in how libraries serve their communities.
  • These changes in the communities that libraries serve also drive significant change in cataloging and metadata work. In libraries, cataloging and metadata have described library collections. To understand the future of library cataloging and metadata, we first need to understand how the collections are changing. In this part of the presentation I will speak about the use of the collections housed in library buildings, the traditional library collections. I will also speak about online articles and journals; the library’s special collections, especially those that have been digitized; and open access repositories, such as the 129 Japanese institutional repositories indexed in JAIRO, an initiative launched by the NII in April 2009.
  • This slide shows a quote from a report that I was asked to prepare for the Library of Congress in 2006 about the future of the online catalog. In the course of the study I found that the collections of interest to our communities are greater in scope than what we have traditionally made available in our library buildings, or even what we have licensed for online use. Research now depends on digitized text and images, data sets, e-print services, archival materials, learning objects, and more. Some libraries, especially national libraries and national-level information institutes, are beginning to gather and build these new kinds of collections.
  • This is a chart showing the kinds of materials described by the catalog records in WorldCat over time. Descriptions of books consistently make up 84% of the database; for now, ebooks are a very small proportion of this 84%. WorldCat is an indicator of the makeup of many individual library catalogs.
  • In the United States, research libraries have responded to this trend by spending more of their materials budgets on e-resources. The 2008 statistical survey of libraries belonging to the Association of Research Libraries (ARL) reported that these libraries now spend an average of 51% of their collections budgets on e-resources.
  • When I say “digital library collections,” what do I mean? This slide is my attempt to illustrate the digital collections I talk about and the categories they fall into. Some of the collections are digitized; that is, they began as physical objects of some kind, such as books, photographs, or graphical images. Digital libraries are also about materials that may be derived from others by digitization, but that may also have been created in digital form, that is, “born digital,” like digital sound files, moving images, digital texts, or Web sites.
  • This beautiful image, digitized as part of a project at the National Diet Library, is an example of what I mean by “digitized” material.
  • There are indications that digitized collections are attracting a great deal of attention. This is a chart from Alexa.com, a Web traffic analysis service, showing Web traffic to the sites of the National Library of France,, and the Library of Congress, Alexa provides data about where users go once they are on a site. In the case of those who visit, 30% visit the expositions pages, a virtual gallery of curated exhibits around the collections. More than 50% of the traffic is split between the BnF library catalog and Gallica, the digital library of France. Over 40% of the visitors to the Library of Congress Web site go to American Memory, which LC describes as a digital record of American history and creativity.
  • There is a good deal of support in Japan for searching for articles in academic journals and in the National Diet Library’s Japanese Periodicals database. CiNii (pronounced “sigh-knee,” I’m told) provides access to 12 million articles, of which 3.2 million are freely available for download. This article about Ukiyoe paintings, in the Architectural Institute of Japan’s Journal of Architecture and Planning, is one of them.
  • Increasingly, such openly available Japanese online journals can be searched and accessed from, as shown here. The URL links the searcher to the full text of the journal.
  • … as shown here. From this point, the searcher can query the journal for keywords in the title of the article on moon landscapes in Ukiyoe paintings. In this case, WorldCat is linking to the full text of the journal in the J-Stage service.
  • This sort of linking relies on a knowledge base. A knowledge base stores the metadata about e-journals and e-books that is needed to direct users to the places where they can access online full-text content.
  • The typical library environment for knowledge bases is very complex. In general, there is a separate knowledge base for each e-resource service in the library: one supporting federated search, another supporting the link resolver, another the e-resource management system, and another the e-resource A-to-Z list. Maintaining all these knowledge bases is labor-intensive and error-prone.
  • For the past few years, OCLC has been developing a very large knowledge base. This year, OCLC is making a large investment in a WorldCat knowledge base “in the cloud” that should make it possible for members to use the same knowledge base to support multiple e-resource services. Soon, OCLC will be encouraging its members to register their electronic holdings of ebooks and ejournals in the knowledge base. Registering e-holdings there will allow members to take advantage of the new knowledge base services that OCLC is building to help both librarians and end users have a better experience managing and using e-resources.
  • There is already some coverage of Japanese e-journals in the WorldCat knowledge base. This table shows that the knowledge base covers about 8,500 unique e-journal titles from Japanese providers like the NII, J-Stage, and others.
  • I’d like to move now to another increasingly important means of supporting scholarly communication: institutional repositories. The growth of institutional repositories is strong in Japan. The repository shown here, from Kyoto University, is aggregated in the JAIRO service that NII launched in 2009, which I mentioned earlier. It is ranked #38 among the top 400 open access repositories in the world by the Cybermetrics Lab, based on factors like traffic and visibility.
  • Open access repositories, especially discipline-based ones, are also gaining in visibility and impact on scholarship and the dissemination of new research findings. This chart, also from, tracks traffic in 2009 and 2010 to three of the top open access repositories, as ranked by the Cybermetrics Lab. The chart compares traffic to the Kyoto University repository I just mentioned (the blue line) with traffic to, a discipline-based repository for physics and computer science, and to another discipline-based repository, the Social Science Research Network.
  • Before leaving the subject of open access repositories, I would like to discuss another innovation in WorldCat, called OAIster. About eight Japanese institutional repositories are indexed in OAIster. These two screen shots show an example of an article harvested from the Waseda University institutional repository into WorldCat.
  • OAIster was hosted by the University of Michigan until about a year ago, when the University asked OCLC to take over responsibility for supporting it. It is an aggregation, or union catalog, of articles and other content harvested from a variety of open access collections, both digitized and born digital. OAIster contains 25 million records from 1,100 contributing institutions. All of these are now available from, OCLC’s freely available discovery service on the Web. As I mentioned, there are at least 8 contributing institutions from Japan. OCLC is broadening and improving the OAIster harvesting service to attract more contributions. The revised service is called the Digital Collections Gateway.
  • The Digital Collections Gateway is compatible with all OAI-compliant repositories and it is freely available for use by both members and non-members of OCLC.
  • I think you will agree with me that there are many new types of collections beyond printed books and serials that are of interest to the communities that libraries serve. There is no way that all the metadata for these new types of collections can be produced by traditional cataloging methods. We are entering a new era in library practice. In this new era, metadata necessarily comes from many, many sources, not only catalogers.
  • This slide shares my thoughts about where metadata comes from and will increasingly come from in the future. In this talk, I’ve provided or will provide examples of metadata from communities that produce metadata professionally, like librarians, but also publishers and indexers. In addition, we are beginning to see a good deal of author- and user-contributed metadata; some call this “crowdsourcing” of metadata. On top of that, metadata is being produced through large-scale metadata mining and manipulation. Inside the library community, aggregations like WorldCat are good examples: their data is mined to produce FRBR work sets and new services like WorldCat Identities. I will talk more about WorldCat Identities in a moment.
  • I’d now like to talk about going beyond bibliographic description, beyond the MARC record. I’d like to talk about a data mining approach that uses bibliographic and authority metadata to create new metadata describing people. In this section, I’ll use as an example the Nobel Prize-winning Japanese author Kenzaburo Oe.
  • This is Oe’s entry in the Virtual International Authority File, or VIAF, to which the NII has contributed authority data. VIAF is hosted and supported by OCLC: OCLC provides the data mining and programming, and the participants supply the authority data. Often, participants also supply related bibliographic data that enriches the VIAF records. More than 15 organizations around the world contribute to VIAF. This slide shows that the preferred form of Oe’s name is the same (with dates) for the National Libraries of Israel (Latin character set) and the Czech Republic, LC/NACO, and the National Libraries of Germany, Australia, Sweden, Poland, France, and Spain. The NII provides the kanji form as the preferred form in Japan; the Biblioteca Alexandrina in Egypt provides the Arabic script form; and the National Library of Israel provides the Hebrew script form.
  • This graphic shows the matches between the NII authority record and those provided by the other 13 sources of authority records for Ōe. The goal of this project is to facilitate research across languages anywhere in the world by making authorities truly international. OCLC is conducting this research because we have proven software for matching and linking authority records for personal names in different languages.
  • This is the list of 27 alternate forms of name from the 14 authority record sources. The NII authority file provided 12 of these forms (highlighted in yellow), including the Russian and Korean forms not represented in the other source files. Once the existing authority records are linked, users will be able to see names displayed in the most appropriate language. For example, German users will be able to see a name displayed in the form established by the DNB, and American users will see the name as established by LC. Users will also be able to view name records as established by other nations, making the authorities truly international and facilitating research across languages anywhere in the world.
  • VIAF is a project of the OCLC Office of Research. WorldCat Identities is a production service of OCLC and is available from WorldCat. This slide and the next two show the WorldCat Identities entry for Oe. WorldCat Identities mines data from both bibliographic and authority records to create a view of a person who is the creator or subject of works described in WorldCat. The publication timeline shown here indicates the dates of works both by and about Oe. This information is not compiled manually; it is automatically harvested and created by the WorldCat Identities software.
  • This part of Oe’s WorldCat Identities page shows works by and about him.
  • And this last section shows a tag cloud not created by users, but automatically compiled by extracting facets and frequency counts from assigned subject headings in the bibliographic records.
  • By now I expect I may have convinced some of you that the library metadata of the future comes from a lot of places, and that there is a flood of it.
  • Here is some advice for coping with this sea of metadata from different places. First, commit to collective action. Our traditional approaches to creating and managing library metadata for physical collections do not scale to what libraries need to do to serve researchers and students today. Many libraries are succeeding by cooperating at local, regional, national, and global levels.
  • I’d like to finish up by talking more about the value of cooperation in creating and managing metadata in the new era.
  • All over the world, libraries are working to capture as much attention as they can on the Web outside their own systems. I’ve tried to illustrate this visually for the NLZ. The NLZ pushed a large digitized photo collection out to the Flickr Commons; they push their content out into the NZ library catalogue; and so on. I’ve called this “outward integration” of the collections into the Web: collections data is synchronized with other aggregations and syndicated in other Web environments. As you know, one of the places that the NLZ is outwardly integrating its collections is—and thus their collections show up in places like Google Book Search as well.
  • Here is one example that I’m familiar with that uses tools and standards to tie independent aggregator systems together. CBS systems aggregate bibliographic data and holdings in union catalogs and resource sharing services. Libraries Australia, for example, uses a CBS system. Australian libraries aggregate up to the union catalog on CBS, and through SRU Update, the Australian union catalog is within 5 seconds of synchronization with WorldCat. Later this fiscal year we will add SRU data exchange back to Australia so that record contributions and enhancements made in WorldCat can be sent automatically.
  • Allow me to provide an example of how this works with WorldCat. Here is the Google Book Search entry for Oe’s most widely held book, A Personal Matter. OCLC has an agreement with Google to display a “find in a library” link on Google Book Search pages. Imagine I am an English-speaking student and I have started my search on Google Book Search. If I click the “find in a library” link…
  • I am taken through, OCLC’s freely available discovery service, to a list of libraries that own this book and that are close to me. I have borrowing rights at the Worthington Public Library, so I click on that link… That link takes me into the catalog at the Worthington library, where I can request that the title be held for me to pick up. In this way, WorldCat acts as a kind of giant switch, allowing searchers to begin on the big Web sites but pulling them into local library services, wherever the searcher happens to be in the world.
  • This “switching” service relies on having information about which libraries hold, or provide access to, which titles around the world. OCLC has been working with national libraries for many years, but cooperation with national libraries has grown quickly over the last three years. Since 2007, an additional 13 national libraries around the world have seen value for their citizens, students, and scholars in making their collections more visible by sending their bibliographic data to OCLC for loading into the database.
  • Because of all that loading, the WorldCat database now provides a much better multilingual foundation than previously, when WorldCat was a union catalog of mainly US library collections. Today, more than half of the materials described in WorldCat are in languages other than English (the orange part of this pie chart).
  • As of the end of last week, over 2.5 million of these records described materials published in Japan (the green bars on this chart). The orange bars show how many of these records contain vernacular script. Most of the records are for Japanese books, but some describe other types of materials (the smaller green and orange bars on the right).
  • The statistics I have just shown you do NOT count the records from the National Diet Library. This slide describes a few facts about the project in progress to load the approximately 4.2 million NDL bibliographic records into WorldCat. Because the OCLC team thinks that at least half of the records describe titles that are new to WorldCat, we can predict that the holdings of Japanese-language publications in WorldCat will be doubled by the NDL loading project.
  • Here is what one of the test records from NDL looks like. I am told that it describes a directory of Japanese dealers of religious ceremonial furniture and altar fittings. The project has many steps and requires good teamwork, but it is going well. OCLC is honored to have the opportunity to reveal the riches of NDL’s collections to the many people in the world who are interested in Japan.
  • I could not resist closing with this picture of the Waseda University bear mascot. Thanks again to Waseda and Kinokuniya for hosting this event.
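The notes on knowledge bases say that one store of e-journal and e-book metadata could drive several e-resource services. The sketch below illustrates that idea under stated assumptions. The `Holding` and `KnowledgeBase` classes, the coverage-year fields, and the sample ISSNs and URLs are hypothetical placeholders, not OCLC's data model or API. One in-memory store answers two kinds of request: a link-resolver question (which full-text targets cover a cited year?) and an A-to-Z list request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Holding:
    """One e-journal entry in a hypothetical, simplified knowledge base."""
    title: str
    issn: str
    provider: str
    first_year: int
    last_year: Optional[int]  # None means coverage runs to the present
    url: str


class KnowledgeBase:
    """A single store of e-holdings shared by several e-resource services."""

    def __init__(self, holdings):
        self._holdings = list(holdings)
        self._by_issn = {}
        for h in self._holdings:
            self._by_issn.setdefault(h.issn, []).append(h)

    def resolve(self, issn, year):
        """Link-resolver use: full-text URLs whose coverage includes the cited year."""
        return [
            h.url
            for h in self._by_issn.get(issn, [])
            if h.first_year <= year and (h.last_year is None or year <= h.last_year)
        ]

    def a_to_z(self):
        """A-to-Z list use: unique journal titles in alphabetical order."""
        return sorted({h.title for h in self._holdings})


# Hypothetical sample data: ISSNs, years, and URLs are placeholders.
kb = KnowledgeBase([
    Holding("Journal of Architecture and Planning", "0000-0001", "J-STAGE",
            2000, None, "https://example.org/jstage/jap"),
    Holding("Another Open Journal", "0000-0002", "NII",
            1990, 2005, "https://example.org/nii/aoj"),
])

print(kb.resolve("0000-0001", 2008))  # ['https://example.org/jstage/jap']
print(kb.resolve("0000-0002", 2008))  # [] because 2008 is outside 1990-2005
print(kb.a_to_z())
```

Keeping both services on one store is the point made in the notes. A correction to a coverage range is made once, rather than once in each separate knowledge base.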
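The Digital Collections Gateway is described as compatible with any OAI-compliant repository. For readers unfamiliar with that protocol, here is a minimal sketch of the harvester side of OAI-PMH 2.0. It builds `ListRecords` requests with `metadataPrefix=oai_dc`, then follows `resumptionToken` values from page to page. The verb, parameter names, and namespaces come from the OAI-PMH and Dublin Core specifications. The repository URL and sample record are hypothetical, and this is not how OCLC's gateway is implemented. To stay self-contained, the sketch parses a canned response instead of making a network call.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; follow-up pages send only the token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) from one ListRecords response page.

    An empty or missing resumptionToken means the list is complete (None).
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, (token or None)


# Hypothetical response page from a hypothetical repository.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.ac.jp:00001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_PAGE)
print(records)
print(list_records_url("https://repository.example.ac.jp/oai", token))
```

A real harvester would also need to do the following, which this sketch ignores:

- fetch each URL (for example with `urllib.request.urlopen`);
- loop until `parse_list_records` returns `None` for the token;
- handle OAI-PMH error responses and deleted-record headers.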
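The VIAF notes describe linking authority records from many sources for the same person, then showing each user the form established by that user's own national source. Here is a minimal sketch of the display step, assuming a cluster has already been built. The headings below are illustrative rather than copied from the live VIAF record. `match_key` is a deliberately crude normalization used only to collapse duplicate forms; it is not VIAF's matching algorithm.

```python
import unicodedata

# Illustrative cluster mapping a source code to its preferred heading.
# These forms are examples for the sketch, not the live VIAF data.
OE_CLUSTER = {
    "LC": "Ōe, Kenzaburō, 1935-",
    "DNB": "Ōe, Kenzaburō, 1935-",
    "NII": "大江, 健三郎",
}


def match_key(heading):
    """Crude comparison key: drop diacritics, case, and punctuation.

    An assumption for illustration only, not VIAF's matching algorithm.
    """
    decomposed = unicodedata.normalize("NFKD", heading)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_marks.casefold() if c.isalnum())


def display_name(cluster, preferred_sources, fallback="LC"):
    """Return the heading from the first preferred source present in the cluster."""
    for source in preferred_sources:
        if source in cluster:
            return cluster[source]
    return cluster[fallback]


def distinct_forms(cluster):
    """Collapse headings that share a match key, keeping the first one seen."""
    seen = {}
    for heading in cluster.values():
        seen.setdefault(match_key(heading), heading)
    return list(seen.values())


print(display_name(OE_CLUSTER, ["DNB", "LC"]))  # a German user's view
print(display_name(OE_CLUSTER, ["NII"]))        # a Japanese user's view
print(distinct_forms(OE_CLUSTER))
```

The preferred-source order is what lets a German user see the DNB form and an American user the LC form from the same linked cluster.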
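The WorldCat Identities notes describe two derived views:

- a publication timeline of works by and about a person;
- a tag cloud compiled from facets and frequency counts of assigned subject headings.

Here is a rough sketch of that kind of mining. It assumes simplified records with a year, a by/about role, and subject heading strings whose subdivisions are separated by ` -- `. The records are hypothetical. The linear size scaling is one common choice, not necessarily what the Identities software does.

```python
from collections import Counter

# Hypothetical, simplified bibliographic records for one person.
RECORDS = [
    {"year": 1964, "role": "by", "subjects": ["Japanese fiction -- 20th century"]},
    {"year": 1994, "role": "about",
     "subjects": ["Japanese fiction -- 20th century -- History and criticism",
                  "Nobel prizes"]},
    {"year": 1995, "role": "about",
     "subjects": ["Japanese fiction -- 20th century -- History and criticism"]},
]


def publication_timeline(records):
    """Count works per (year, 'by' or 'about') for a timeline display."""
    return Counter((r["year"], r["role"]) for r in records)


def subject_facets(records):
    """Split each assigned heading on ' -- ' and count every facet."""
    counts = Counter()
    for r in records:
        for heading in r["subjects"]:
            for facet in heading.split(" -- "):
                counts[facet.strip()] += 1
    return counts


def tag_sizes(counts, levels=5):
    """Scale facet counts linearly onto font-size levels 1..levels."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {tag: 1 + round((n - low) * (levels - 1) / span)
            for tag, n in counts.items()}


print(publication_timeline(RECORDS))
print(tag_sizes(subject_facets(RECORDS)))
```

Because the cloud is derived from headings catalogers already assigned, it needs no user tagging. This matches the notes' point that the cloud is compiled automatically.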
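The “find in a library” switching step in the notes depends on two pieces of information: which libraries hold a title, and which of them are near the searcher. One simple way to rank holders is by great-circle (haversine) distance from the user. In this sketch the library names, coordinates, and catalog URLs are hypothetical. OCLC's actual ranking method is not described in these notes.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_holders(holders, user_lat, user_lon, limit=3):
    """Sort libraries holding a title by distance from the user; keep the closest."""
    return sorted(
        holders,
        key=lambda h: haversine_km(user_lat, user_lon, h["lat"], h["lon"]),
    )[:limit]


# Hypothetical holders of one title; names, coordinates, and URLs are placeholders.
HOLDERS = [
    {"name": "Library A", "lat": 0.0, "lon": 0.0, "catalog": "https://a.example.org"},
    {"name": "Library B", "lat": 0.0, "lon": 1.0, "catalog": "https://b.example.org"},
    {"name": "Library C", "lat": 0.0, "lon": 2.0, "catalog": "https://c.example.org"},
]

for h in nearest_holders(HOLDERS, 0.0, 0.9, limit=2):
    print(h["name"], h["catalog"])  # the link that hands the user to a local catalog
```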