Reuse of Digital Heritage in the Library,
Today and Tomorrow. The Case of the KB.
Lotte Wilms, 08 November 2016
KB in a nutshell
Founded
1798
7 million items = 115 km of library materials
10,800 current periodicals;
500 licensed databases and e-journals;
In 2015 the collection grew by:
49,000 books;
42,000 issues of periodicals;
6.5 million digital items;
2,700 e-books;
3,800 websites.
Mission
• The KB brings people and information together;
• The KB makes the library collection of the Netherlands visible, preservable and usable;
• The KB holds a central position in the library network;
• The KB helps people to become more skilled, smart and creative.
versus
00100
11001
00011
00111
What happens with our digital content?
How do people use our digital content?
What role do we play within the reuse community?
How do we keep our digital content accessible?
Scholarly community – Digital Humanities
Scholarly community – Computer scientists
Commercial reuse
Non-commercial reuse
Dataservices & KB ResearchLab
Delivery of content
Can we have your data?
Sure! It is available for research!
Thank you!
The power of our network
We’re not alone!
How do people use our data?
What types of problems do they run into?
How can we shape our services for them?
What do researchers actually think of us?!
You and your collection are part of a bigger whole
Be transparent about your digitisation process. Document what you do and why
Use standards to facilitate the combination of existing collections
Be persistent in what you do and offer persistent access to your data
Work together with your users to ensure a good fit of your services and to learn from them
Join networks to learn and share about your work and to promote the use of your collection
Thank you. Any questions?
lotte.wilms@kb.nl
@lottewilms


Editor's Notes

  • #3 The KB, where I work, was founded in 1798. Currently, we hold about 7 million items, which amounts to about 115 kilometres of books, newspapers, journals and manuscripts. Our collection grows every year; last year alone we added almost 50,000 books and 42,000 issues of periodicals.
  • #4 A library such as ours has the task of collecting and preserving the written heritage of the Netherlands. We ensure that our storage rooms, which take up 75% of our building, meet the highest preservation standards and that our material is stored in the best possible conditions. The air is not too humid, the temperature is just right and precious or fragile material is kept in acid-free boxes. Our curators know what is in each collection, research these works and are responsible for acquiring new material that fits the profile of their collection. Everything is catalogued down to the smallest detail to give our audience the best possible access to the material. We know what we have, where it is and how we can keep it safe for the future. This is our core business: this is what we do and what we’re good at.
  • #5 But alongside our physical collection, we also pride ourselves on a rich digital collection. Over the past 15 years or so, we have digitised 320,000 books, 11 million newspaper pages, 1.5 million journal pages, 1.5 million radio bulletins, almost 500,000 parliamentary papers, almost 20,000 images and five thesauri containing information about our collection, its authors and its subjects. In addition, we host the Digital Library of Dutch Literature, which contains 13,000 titles in manually produced full text, 14,000 scanned titles and 2,500 e-books. Together, this results in a very rich digital collection with documents ranging from the Middle Ages to the present day. Much of this data is available for reuse without any restrictions, and for material that is still in copyright we have negotiated the rights for scientific use.
  • #6 But this is new to us. We have been collecting and keeping physical material since 1798, but have only been keeping a digital collection for 15 years. How do we, as a library, ensure that this digitised content is just as accessible to our users as our physical content?
  • #7 What actually happens with this digital content? How are people using it, and what role do we as a library play in this? These are questions we are faced with while making our data available for reuse. During my talk I will give you some insight into how the KB deals with reuse, but also into what we consider important for enabling reuse in the future.
  • #8 So, what types of reuse do we encounter? On the one hand, there is the scholarly community, which uses our data to answer research questions such as ‘How is the concept of Europe depicted in newspapers?’, a question currently being asked by our KB Fellow, Dr. Joris van Eijnatten. Each year, we invite a renowned researcher to join the library for a six-month research sabbatical. He or she uses digital methods and our digital collection to answer a research question. The tools produced for this research are made available to other researchers.
  • #10 But it is not only humanities scholars who find our content interesting; computer scientists also see the great potential of our collection. The Center for Math and Informatics, for example, is working with the search logs generated by the users of our online platform, Delpher.nl. Here, people can access the bulk of our collection through keyword searches, and these keywords, together with users’ subsequent navigation through the website, give very good insight into how people use search systems. From what we’ve gathered, no comparable data collection is available to researchers in this community, and our anonymised logs have proven to be a beautiful source for this research group.
  • #11 On the other hand, there is commercial reuse. If our content is out of copyright or can be made available under a CC0 license, anyone can use the data to build a service or create a product. This has happened, for example, with the parliamentary papers we have digitised. All reports from our government from 1814 to 1995 have been collected on one website, but much more legal information is available on various other websites and in books. The company Legal Intelligence has taken all this information and combined it in its legal search engine, which is available by subscription.
  • #12 Finally, there is non-commercial reuse, of which Wikipedia is a fantastic example. So far we have provided 3,915 images to Wikipedia for use in its articles. All these images are either in the public domain or CC0 and can thus also be incorporated into other websites or services. Many of these images were added to Wikipedia during the stay of our Wikipedian-in-residence, who spent nine months enriching Wikipedia with content from our collection.
  • #13 So how do we fit into these activities? Yes, we have the collection that these people use, but we do not see ourselves only as a data supplier. We want to offer the whole package: supporting and facilitating research, something that comes naturally to a library. The KB finds this particularly important, even though the user group is relatively small. Our strategy for 2015-2018 already identified the Digital Humanities as one of our focus areas. To achieve this, the KB has a Digital Humanities team, consisting of a Digital Collections curator, two Digital Scholarship advisors and two Research Programmers. Together we work on projects related to all types of reuse, and we also research how we can promote the use of our (digital) collection and how we can encourage other libraries to engage in reuse and with the academic community in their own countries.
  • #14 As I said, working with this digital content is relatively new to us. A number of years ago, people who found out about a digitised collection and wanted to use it for any type of work contacted us to get a copy of the data. They then came to us with a hard drive, we copied our data onto it, and that was that. In most cases we never heard what happened to it, and we didn’t have the network in place to follow up on their work. We provided access to the digitised collections via several websites, each with its own team and its own contact person. Researchers would have to talk to at least four people to get our digitised newspapers, parliamentary papers, books and the associated catalogue. Each of these people would then have to find out whether they could provide the requested data and how to give access to it. I think you know where I’m going with this: this is not sustainable. It costs too much time, and the same questions are asked over and over for different material types. So, how can reuse be part of your future? What does an institution need in order to facilitate reuse, keep it sustainable and promote the use of its digital collection?
  • #15 As is often the case in professional relationships, your network is key. The KB has stated this in its strategic plan as “The power of our network”. Networks are powerful, and I believe that your network, on several levels, can play a crucial role in the reuse of your digital collection and in keeping that reuse sustainable.
  • #16 For the physical part of our collection, it is important that we have everything in order in our own building: that what we collect is what needs to be collected for our community, and that we keep it safe for the future. However, when it comes to digital content, we are no longer alone. The digital world offers many opportunities for working together, and not only we but also our users see this. It has become very easy to combine digital material into a set that meets your exact needs. We should therefore not base decisions about our digital collections only on what we feel is right, but see ourselves as part of a larger whole.
  • #17 A great example of this is the International Image Interoperability Framework (IIIF), which makes it possible for manuscript pages that have been scattered all over the world to be viewed alongside their sibling pages held in other libraries. But which steps do you follow to reach a state where your content is not only digital, but also ideally set up to enable reuse on a larger scale?
  • #18 If your organisation is still working on digitisation or wants to improve the digital content it already has, it can be very valuable to join a consortium such as the IMPACT Centre of Competence. There, you become part of a community that looks at the digitisation of historical material from a broader perspective. It is not only libraries that share their experiences here: you can also connect with service providers and technical experts, and find information about all areas of digitisation. Ask questions, learn what is possible and find out what best fits your material.
  • #19 Once you have a digital collection, it is very important to know who your users are and what exactly they want and need. You can do this by sending out surveys, offering an inbox for general questions, or providing webcare via Twitter or other social media. However, I think these channels only give you the information that you consider relevant at the time; many questions that you don’t ask never get answered, even though they can be crucial for your user community. For example, the OCR of our newspapers is not of the highest quality, for several reasons related to the quality of the source material. However, at a conference where work based on our newspapers was presented, we learned that the OCR of the 1920s newspapers was actually better to work with than that of the 1930s, something we would not have expected. The researchers told us that they had therefore targeted different parts of our digital collections to ensure the best results. This shows that being involved in your user community gives you insight not only into how your data is used, but also into the data itself. We can now inform interested parties about the quality difference between the two decades, which might lead them to adjust a research question to obtain more reliable results.
  • #20 An even better way of finding out what users do, need and want is to actually work with them. If you run a project together, you learn a great deal about the pitfalls of your collection, your communication and the reputation of your organisation. We asked ourselves: “How do people use our data?”, “What types of problems do they run into?”, “How can we shape our services so that they are most helpful to researchers?”, but also “How do researchers view us as an institution?” and “Are we the research partner we think we are, and if not, how can we change this?”. The KB has worked with researchers on small projects for several years now, and since 2014 we have had a Researcher-in-Residence program. This program was set up specifically to answer the questions we had regarding the reuse of our collection by scholarly communities.
  • #21 After three years and seven projects of six months each, we have learned invaluable lessons that we would not have learned without engaging with our user community. We know that people are interested in more information than we could have imagined, such as the search logs I mentioned earlier. We know that researchers want to know exactly what is available in a specific data collection, and that it is very important for us to be transparent about how that collection was formed. We even learned that the ‘Image’ facet on our online portal, Delpher.nl, does not show all images in the newspaper collection, but only those published as a separate article. This sounds very basic, but if even the KB DH team didn’t know this, how could users? When a user finds something like this, it hurts our reputation as a reliable digitiser, because a decision made in the digitisation process for practical reasons turns out to be a barrier to the correct use of our collection. This shows that transparency is essential to enable scholarly reuse.
  • #22 Don’t be the black box where physical items go in and digital items come out without any information about what happens in between. Documentation is key: if you have documented your work, decisions and actions properly, this not only provides an invaluable resource for the academic community, but also protects your organisation against losing crucial knowledge when key people leave. Being part of a network of your users is very important for understanding how your data is used, but it can also be very rewarding to work with colleagues: other libraries or cultural institutions in a similar situation. You don’t have to do everything yourself. Working with partners can lighten the burden of thinking through strategic issues that are relevant to the whole community, and it provides insight into how other organisations, and other countries, work.
  • #23 For example, the KB is a forerunner in the area of Digital Humanities in Europe. We have been working on DH for more than three years now and have quite a number of services available. However, we have not done this alone, but found inspiration and discussion partners in other national libraries, such as the British Library. We wanted to formalise this arrangement and also give others the same opportunities we had. We have therefore set up a working group within LIBER, the organisation for European research libraries. Here, we can share experiences regarding reuse, discuss what DH entails for a library organisation, and learn how others solve issues, provide services and engage with their communities, but also encourage others to take up the gauntlet and start sharing. It is a platform where we can learn, inform, share, promote and lobby.
  • #24 As part of this international network, we also gain access to an even larger audience of researchers by working with European infrastructures such as DARIAH and CLARIN. These communities represent a vast number of European researchers, all with a digital interest and all potential users of your collection. And so much is happening in Europe. By joining the European community, you not only open up your collection to new users, but also open up your library to interesting projects, possibly with European funding.
  • #25 If you don’t know where to start, Europeana is a great first step. This is the European aggregator of digital cultural heritage. Here, you can showcase your collection and reach a large audience by opening up your data via their services. Europeana has a great PR team and organises many activities around the collections it makes available to the European community. Joining means your collection will become part of these activities, and it gives you a very valuable way into the European network of cultural heritage institutions, all potential partners to work with and learn from. You also become part of a strong lobby with regard to reuse, as can be seen here: http://jamdots.nl/view/204/Europeana---Recommendations-for-Research?divFallback=1. And do not think only of other libraries: in Europeana, you can find museums, galleries, archives and research projects.
  • #26 GLAMs, or galleries, libraries, archives and museums, often have similar questions when it comes to reuse. In the Netherlands we are trying to answer these questions together. Your network is key when deciding what to digitise and how, how to make the resulting digital collection available, and when joining the discussion on restrictions regarding data sets. And when looking at the future and providing a sustainable digital collection that will enable reuse for years to come, being part of a network allows you to look for answers together and to ensure that the solutions you implement match those of your national colleagues. This gives you not only a network where you can work together, but also a network where your digital collections can work together.
  • #27 In the Netherlands, this network is called National Digital Heritage, or NDE, and consists of five national hubs: the National Library, the Netherlands Institute for Sound and Vision, the Cultural Heritage Agency, the Royal Dutch Academy of Sciences and the National Archives. As a sixth hub, twelve ICT companies have joined forces as the Heritage Software Suppliers to bring a Creative Industry viewpoint into the national strategy. Together, we work on three levels: making cultural heritage visible, usable and sustainable. Within these layers we can think about the future of our respective collections, but also about how they should interact. Together, we hold the bulk of the Dutch digital collection, and much of the reuse of these sets combines material from several institutions, a trend that will only grow in the future. The digital revolution in the humanities now makes it possible to ask much larger questions of much larger datasets, and making sure your collection can be part of the answer not only increases the use of your data, but also enriches your collection with relevant information from other institutions. So what is happening in this NDE setup?
  • #28 As I mentioned before, knowing what your users need is highly important. It allows you to set up your structure and services so that they meet users’ demands and also give you the ideal setup for reuse. To this end, the Visible layer has set up discussion groups per sector to talk about the research people are currently doing and where they run into issues. Another example of the Visible layer’s work is one of the projects funded by a recent call for proposals for the innovative reuse of cultural heritage material. This project is now developing a Chrome add-on that shows you a random image from the Europeana collections of Dutch institutions whenever you open a new browser tab. You can click on the image to see more information about it and then follow it to the website of the cultural heritage institution that hosts it. Besides showcasing the data, it is also very important to keep institutions up to speed with the latest developments. The Open Culture Data organisation is therefore building competence within the cultural heritage community by organising Open Culture Data Labs, where heritage professionals can learn more about policy making for cultural data, but also about the valorisation of data under the resulting policy.
  • #29 The second layer is the Usable layer. The people in this layer are concerned with how the digital sets can be combined so that they work together. They are writing several white papers, for example about linked data, and are also looking at existing thesauri and how these can provide extra information for relevant collections. In addition, they are responsible for the Digital Collections project, which makes Dutch digital heritage available on a searchable platform by aggregating existing content.
  • #30 Finally, the last layer is one of the most important: the Sustainable layer. Here, the question is how to keep the digital collection available to our users in the long term. Think of questions such as which standards we should use, how we can implement persistent identifiers in our digital collection, but also ‘who collects what?’. This group is working on a sustainability policy that provides building blocks for institutions that want to embed sustainability in their organisation.
  • #31 These three layers address the whole process of digitising and giving access to digital material, and at each step it is important to keep the future in mind when making decisions. Libraries and other cultural heritage institutions are seen as the reliable party for the safekeeping of books, paintings, archives and other heritage works, and we pride ourselves on providing the best possible environments for these physical works. Let’s take our role as digitiser of this collection just as seriously, and be as reliable in safekeeping our digital collection as we are with our physical one. Keep in mind that your digital collection is part of a bigger whole. See yourself as part of that whole, and open up not only your data, but also opportunities for reuse, research projects, funding and a whole new user community that you never even thought of. So what is most important when thinking about reuse?