Imagine you are interested in the Dutch biologist Hugo de Vries and his teaching methods. You search for his name in the portal, and several hits from different Dutch universities are returned: among them a rough description of his archive, a collection of his educational botanical wall charts, the Dutch translation of Darwin’s On the Origin of Species, a painted portrait of Hugo de Vries, letters written to him by a colleague in Tokyo, and a collection of papers relating to the Botanical Garden in Amsterdam. Provided the various collection managers have registered and uploaded their materials, these treasures will appear on your screen and give you insight into Hugo de Vries as a person, his times, and the context in which he worked. This is our project in a nutshell.
The ultimate goal of our project was to make the academic collections of the Dutch universities visible and to present them in their historical and cultural context in a collaborative portal of the Dutch universities. As a development project, however, we focused on what it takes from the stakeholders to realize this. The contribution of the partners was essential. Research has shown that expertise, and especially technical knowledge, is the most important factor in deciding whether to participate in a portal project. To attract participants, a project therefore has to offer expertise, resources, or a combination of both. In our view, this expertise and these resources have to be offered in two ways: on the one hand top-down, by making technical decisions on behalf of the partner institutions and by taking tasks off their hands; on the other hand bottom-up, by offering hands-on help, consultancy and training.
To briefly refresh your memory, we try to reach that goal by:
- Cross-sectoral cooperation between 2 museums and 3 libraries. In fact, this is a so-called LAM (library, archive, museum) project, since we will incorporate museum and library material as well as complete archives.
- Making collection and archival descriptions (the context) in EAD format, an XML standard for archival descriptions.
- Using international metadata standards in order to facilitate international exchange of content: Europeana for item descriptions and images, and ArchiveGrid, OCLC’s database of collection descriptions, for the abstracts of the collection descriptions.
- Building an “easy” infrastructure: easy to use for collection managers and easy to maintain. The system will be built by the Digital Production Centre of Amsterdam University Library, which will also host the aggregation in the future.
Here you see the flow chart, which we also gave you as a handout for reference. It illustrates how the project is structured. The Digital Production Centre (DPC) uses open international standards for its infrastructure.
Here is a rather symbolic representation of our project. It has not been easy, and yet we do enjoy it! The project is all about balancing, timing, leverage and a flexible strategy:
Strong commitment, but differences in interests and priorities: differences in focus, benefit or profit, starting point, and levels of knowledge and resources, both within each organisation and among the partner institutions. The uniting factors are the funding and the common goal of increasing our visibility. But of course opinions diverge on how to reach that goal ☺.
An “easy” to use infrastructure is needed to encourage collection keepers to upload to the portal. Instant visibility of their uploads and an attractive, well-functioning website are further prerequisites. Such a website presupposes at least a part-time webmaster as well as sufficient, regularly uploaded content. Besides being easy to use, the infrastructure must be easy to maintain, to cut costs and keep up motivation.
Another interesting challenge is harvesting. At the outset it was decided that the portal would deliver its content to Europeana. This is already possible, since the Rijksdienst Cultureel Erfgoed (RCE), the Dutch national aggregator for museum material, has harvested the images and metadata of our items via the OAI-PMH protocol. For this, the RCE used a harvesting tool and an impressive mapping tool for Europeana (http://vimeo.com/19291418). Both tools are open source. With this, the technical part of the harvesting is done. The contracts between the Dutch Heritage Foundation, which commissioned our project, and Europeana and the RCE will be signed at the end of the project. Our partner institutions do need more information about the new Europeana contract, because Europeana is promoting a new data agreement under which metadata delivered to Europeana will be released under the Creative Commons CC0 Public Domain Dedication. With this, Europeana drops the non-commercial clause of its former agreement in order to stimulate open re-use of metadata. So much for harvesting from the portal. Harvesting from the partner institutions, however, is not part of the project: for now, the portal has to be filled with periodic exports of metadata and images by the partner institutions. We hope to replace these exports with a harvesting process in the future.
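OAI-PMH harvesting is essentially a loop of ListRecords requests that follows resumption tokens until the repository signals the end of the list. The Python sketch below illustrates that loop. It is not the RCE’s open-source harvester; the endpoint URL is a placeholder, and the HTTP fetch is passed in as a function so the logic can be tried without a live repository.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a follow-up request carries only the verb and the
    resumptionToken (the token is an exclusive argument).
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of the list
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest(base_url, metadata_prefix, fetch):
    """Follow resumption tokens until the list is complete.

    `fetch` is any callable that takes a URL and returns the response body
    (str or bytes), e.g. a small wrapper around urllib.request.urlopen.
    """
    url = list_records_url(base_url, metadata_prefix=metadata_prefix)
    identifiers = []
    while True:
        ids, token = parse_page(fetch(url))
        identifiers.extend(ids)
        if token is None:
            return identifiers
        url = list_records_url(base_url, resumption_token=token)
```

Called as `harvest("https://example.org/oai", "oai_dc", fetch)`, this would collect the identifiers of all records exposed in Dublin Core. A real harvester would also have to handle OAI-PMH error responses, deleted records and HTTP retries.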
How do we turn a project into a well-oiled service?
Organisation
- Portal owner: SAE, the Academic Heritage Foundation
- Webmaster: the coordinator of the Academic Heritage Foundation
- The Digital Production Centre of the Amsterdam University Library will host and maintain the aggregation
- A contract between SAE and DPC
Financial expectations
- SAE pays DPC a yearly amount as a hosting service for regular uploads and maintenance of the infrastructure
- The webmaster is paid by the SAE
- Precondition: the SAE money involved should justify the portal and show its benefits to all SAE partner institutions
- No money is involved yet in the RCE’s harvesting of the aggregation for Europeana
Operational processes (to be discussed by Saskia, who will also deal in depth with the metadata challenge)
- how-to manuals
- personal and organisational guidance and incentives
- periodic uploads in a single format, with harvesting in the future
- direct, two-way input modules for stories and collection descriptions (with the XML invisible to the user)
Process
What was our approach? I will discuss three parts of the workflow:
- Standards
- Descriptive metadata
- Conversion and mapping
Standards: collections
The project does not impose standards but tries to convince the partners by illustrating their benefits, with one exception: before the project started, it had already been decided that the EAD standard would be used to describe museum, archival and library collections alike. Because few people are used to working with EAD XML, we developed a Word template based on the documentation already in use in Leiden. Training was organised for the partners to acquaint them with the template and with the EAD standard. The DPC has made available a specially designed input module that simplifies the process: it transforms the delivered content (i.e. the Word document) into EAD XML. Institutions are given a login to use the input tool over the Internet. Universities already working with EAD, like Leiden and Amsterdam, can supply the DPC with their own generated XML and do not need the input module. The museums in the project are content with the EAD format and have already described their collections this way. Some even produced far more descriptions than the project needed, which posed a dilemma for us: what is more important, sticking to the clear-cut project deliverables or rewarding the effort at the risk of delay? We chose to take the risk :)
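The internals of the DPC input module are not described here. As a rough illustration of what turning the template into EAD XML involves, the following Python sketch builds a minimal EAD 2002 document from a handful of template fields. The field names (`eadid`, `title`, `date`, `repository`, `abstract`, `scope`) are hypothetical stand-ins for the sections of the Word template, and reading the Word file itself is left out.

```python
import xml.etree.ElementTree as ET

EAD_NS = "urn:isbn:1-931666-22-9"  # EAD 2002 namespace


def _sub(parent, tag, text=None, **attrs):
    """Append a child element with optional text and attributes."""
    el = ET.SubElement(parent, tag, attrs)
    if text is not None:
        el.text = text
    return el


def collection_to_ead(fields):
    """Build a minimal EAD 2002 collection description.

    `fields` uses hypothetical keys standing in for the Word template sections.
    """
    # Declaring the default namespace as a plain attribute keeps tags unprefixed
    ead = ET.Element("ead", {"xmlns": EAD_NS})
    header = _sub(ead, "eadheader")
    _sub(header, "eadid", fields["eadid"])
    titlestmt = _sub(_sub(header, "filedesc"), "titlestmt")
    _sub(titlestmt, "titleproper", fields["title"])
    archdesc = _sub(ead, "archdesc", level="collection")
    did = _sub(archdesc, "did")
    _sub(did, "unittitle", fields["title"])
    _sub(did, "unitdate", fields["date"])
    _sub(_sub(did, "repository"), "corpname", fields["repository"])
    _sub(did, "abstract", fields["abstract"])
    _sub(_sub(archdesc, "scopecontent"), "p", fields["scope"])
    return ET.tostring(ead, encoding="unicode")
```

The `did/abstract` element is a natural source for the collection abstracts that are passed on to ArchiveGrid. A production converter would also validate the result against the EAD schema before publishing it.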
In the portal we work with identifiers. For the collections we use the identifiers delivered by the partner institution, combined with the ISIL code that the Dutch National Archive assigns to each cultural heritage institution. This way, the collections are ready for further exchange. For the items we use the URN standard, with the URN namespace applied by the Dutch SURF Federation.
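How the two identifier schemes might be composed can be sketched in a few lines of Python, under several assumptions. The separator between ISIL and local identifier is our own choice: `_` is used because ISO 15511 does not allow it inside an ISIL, so the two parts can always be split again. The ISIL check is only a rough plausibility test (at most 16 characters; a prefix, a hyphen, then letters, digits, `/`, `:` or `-`). The URN namespace prefix is a parameter because the actual SURF namespace string is not given here; `urn:example:` (reserved for examples) stands in for it.

```python
import re
import urllib.parse

# Rough ISO 15511 plausibility check: 1-4 character prefix, hyphen, unit identifier
ISIL_PATTERN = re.compile(r"[A-Za-z0-9]{1,4}-[A-Za-z0-9/:\-]+")


def collection_identifier(isil, local_id, separator="_"):
    """Combine an institution's ISIL with its own collection identifier.

    The separator is an illustrative assumption, not a project convention.
    """
    if len(isil) > 16 or not ISIL_PATTERN.fullmatch(isil):
        raise ValueError(f"not a plausible ISIL: {isil!r}")
    return f"{isil}{separator}{local_id}"


def item_urn(namespace_prefix, local_id):
    """Append a percent-encoded local item id to a URN namespace prefix."""
    # Keep characters that are unproblematic in a URN namespace-specific string
    nss = urllib.parse.quote(local_id, safe="-._~:/")
    return f"{namespace_prefix}:{nss}"
```

With a hypothetical ISIL, `collection_identifier("NL-XYZ", "HdV-archive")` gives `NL-XYZ_HdV-archive`, and `item_urn("urn:example:portal", "HdV 0042")` gives `urn:example:portal:HdV%200042`.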
Standards: items
The standard for the items had to be chosen within the project. It had to fit a wide variety of objects: two- and three-dimensional objects, art works, archival material and letters, scientific equipment and taxidermy specimens. After analysing the metadata records, we found that the best-suited content standard for our purposes was Cataloging Cultural Objects (CCO) by the Visual Resources Association, used within the structure standard CDWA Lite.
Here:
- the structure standard defines which fields you have to use;
- the content standard defines how you have to fill in these fields.
All metadata of the partner institutions is converted or mapped to this format, the native DPC format for the portal. We did, however, have to add a language field to this format, because language is not used as a distinguishing criterion in the museum world, whereas it is essential in the library and archive domains.
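To make the split between structure and content concrete, the Python sketch below builds a bare CDWA Lite item record whose display field carries CCO-style text, and optionally appends a language element. It is a simplified sketch: several elements the CDWA Lite schema requires (such as the creator, creation date and administrative sections) are omitted, so the result will not validate. The language element lives in a hypothetical `urn:example:portal` extension namespace because the name and placement of the project’s actual language field are not documented here.

```python
import xml.etree.ElementTree as ET

CDWA = "http://www.getty.edu/CDWA/CDWALite"  # CDWA Lite namespace
EXT = "urn:example:portal"  # hypothetical namespace for the added language field
ET.register_namespace("cdwalite", CDWA)
ET.register_namespace("ext", EXT)


def _el(parent, ns, tag, text=None):
    """Append a namespaced child element, optionally with text content."""
    el = ET.SubElement(parent, f"{{{ns}}}{tag}")
    if text is not None:
        el.text = text
    return el


def item_record(work_type, title, materials_tech, language=None):
    """Build a simplified (not schema-valid) CDWA Lite item record."""
    wrap = ET.Element(f"{{{CDWA}}}cdwaliteWrap")
    desc = _el(_el(wrap, CDWA, "cdwalite"), CDWA, "descriptiveMetadata")
    _el(_el(desc, CDWA, "objectWorkTypeWrap"), CDWA, "objectWorkType", work_type)
    title_set = _el(_el(desc, CDWA, "titleWrap"), CDWA, "titleSet")
    _el(title_set, CDWA, "title", title)
    # Free-text display field, filled according to CCO conventions
    _el(desc, CDWA, "displayMaterialsTech", materials_tech)
    if language is not None:
        # The project's added field; element name and placement are assumptions
        _el(desc, EXT, "language", language)
    return ET.tostring(wrap, encoding="unicode")
```

A museum drawing would typically be created without `language`, while a letter or book from a library or archive would pass a language code, which is exactly the distinction that made the extra field necessary.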
Descriptive metadata
At the start of the project we made a questionnaire for all partners, to find out:
- which database system was used;
- how the images were stored;
- what their publication possibilities were, such as an online database;
- where the metadata were stored, and in which format;
- how the relationship between images and metadata was established;
- what kind of identifiers were used;
- and, very important, which standards were used.
The partners were also asked to send a representative set of records. The answers were rather diffuse, and we realised that we had to visit the partner institutions to look at their databases in situ. This gave us a much better view of the actual work in the institutions: not just which database system was used, but also how it was used. The main problem we encountered was that over time various people had worked on the database without sufficient knowledge of databases and/or a good manual. Manual improvement of the database had been started several times, but due to lack of manpower it was never finished. After this, we wrote the conversion specifications and the test version of the website was developed. Our goal was to make one conversion per partner, but given the variety in objects and metadata this was not possible, so in the end we made one conversion per collection. Interestingly, only when the participating institutions were asked to test their records did they realise what the state of their metadata meant in practice. The abstract project had suddenly become a real product. Every partner had to ask itself: ‘This is the way our metadata will be published on the website; are we happy with this result?’
Most partners had no experience with controlled vocabularies. As a result, the test website generated very confusing and unusable keyword lists. Here you see the list of all drawings in the CCO category WorkType. How did we solve this?
- By raising awareness of the importance of controlled vocabulary at collection level.
- By organising training in the use of the controlled vocabularies AAT and NBC. AAT is the Art & Architecture Thesaurus of the Getty Research Institute; NBC is a Dutch standard for classifying disciplines.
- With a metadata clean-up.
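Part of such a clean-up can be automated once a lookup table from local terms to controlled terms exists. The Python sketch below normalises free-text WorkType values (case, whitespace, trailing punctuation), maps them to preferred terms, and reports whatever it cannot map for manual review. The lookup table is hypothetical; its target labels are meant as AAT-style preferred terms and should be checked against the AAT itself.

```python
import re
from collections import Counter

# Hypothetical local-term -> controlled-term table (labels to be verified in the AAT)
WORK_TYPE_TERMS = {
    "tekening": "drawings (visual works)",
    "drawing": "drawings (visual works)",
    "drawings": "drawings (visual works)",
    "brief": "letters (correspondence)",
    "letter": "letters (correspondence)",
}


def normalise(value):
    """Lower-case, collapse whitespace and strip trailing punctuation."""
    value = re.sub(r"\s+", " ", value).strip().lower()
    return value.rstrip(".,;:").strip()


def map_work_types(values, table=WORK_TYPE_TERMS):
    """Return (controlled term or None per value, Counter of unmapped normalised values)."""
    mapped, unmapped = [], Counter()
    for value in values:
        key = normalise(value)
        term = table.get(key)
        mapped.append(term)
        if term is None:
            unmapped[key] += 1
    return mapped, unmapped
```

Sorting the unmapped values by frequency (`unmapped.most_common()`) gives collection managers a worklist that starts with the variants affecting the most records.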
Conversion and mapping
During conversion we could partly normalise and enrich the metadata within the project, but this of course does not mean that the original data in the source databases have been cleaned up. Curating the data manually would be a hopeless task.
Moreover, there are more efficient methods for the semi-automatic editing of datasets. With applications like Google Refine and the Europeana SIP Creator, datasets can be curated and enriched much faster and more cheaply. This will be our advice, but it is not a deliverable of the project, so we do not yet know what is going to happen. One possibility is that the SAE will take it upon itself to organise this data curation. Ideally, everything that could not be curated semi-automatically will be resolved within the final mapping (only one!), but of course we would prefer to limit this as much as possible. There will be several definitive mappings: one to CDWA Lite for each partner institution, and one general mapping from the CDWA Lite aggregation to Dublin Core for harvesting by Europeana.
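The general mapping from the CDWA Lite aggregation to Dublin Core is essentially a crosswalk. The Python sketch below applies an illustrative crosswalk to a flattened record (field name mapped to a list of values) and writes an `oai_dc` record. The field pairs in `CROSSWALK` are our own assumption for illustration, not the project’s definitive mapping or Europeana’s requirements.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Illustrative crosswalk from (flattened) CDWA Lite fields to Dublin Core elements
CROSSWALK = {
    "title": "title",
    "displayCreator": "creator",
    "displayCreationDate": "date",
    "objectWorkType": "type",
    "displayMaterialsTech": "format",
    "recordID": "identifier",
    "language": "language",
}


def to_oai_dc(record):
    """Map a dict of CDWA Lite field -> list of values onto an oai_dc record."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, dc_element in CROSSWALK.items():
        for value in record.get(field, []):
            ET.SubElement(root, f"{{{DC}}}{dc_element}").text = value
    return ET.tostring(root, encoding="unicode")
```

Fields without a crosswalk entry are silently dropped here, which is precisely where a review step belongs; a production record would also carry the `xsi:schemaLocation` of the oai_dc schema.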
Concluding: pitfalls and lessons learned
These pitfalls and lessons learned are taken from our actual practice, but when we searched the internet we discovered that they are much more universal than we first thought (they probably look very familiar to our moderator Anra Kennedy as well).
1. Make sure the success of the project is important for all partners. More specifically, try to find (temporary) advantages of the portal for each of its contributors. In our project the museums benefit directly from the portal, since they do not yet have facilities to publish selections from their online databases. All partners profit from the knowledge gained and from the lifting of sector barriers. Don’t we all want to present our collections in context: the university library that owns the books of a famous scholar, the archive that holds his written legacy and the museum that owns his scientific equipment? Digitally, we can present our cultural heritage in its authentic coherence. In these times of economic hardship this might turn out to be essential.
2. No interface or data set is perfect. Don’t wait to put your metadata online until it is perfect, because that will never happen.
3. Don’t operate over the heads of the partner institutions; work together with them, even though this might be time-consuming because of differing levels of knowledge. We have not always kept to this rule ourselves, and the risk is that you lose partners because they no longer feel the project is about them.
4. Take away the barriers to participation: “Everything should be made as simple as possible, but not simpler” (Einstein).
5. Take a personal approach: ‘humanise’ your language and interface, and give tailored advice and mappings.
6. Form is good, content is better. During the project it was very tempting to focus on the glossy, tangible website. Understandable, because it is a real, visible product.
But the sustainability of the project results lies not in the form but in the content: the technical infrastructure, the standards and, most important, the increased knowledge of the partners.
Although for the last two years we have been working on a development project in which building a portal was not the main goal, we DO get a beautiful website! For this we have to thank ABC media, who were responsible for the design and who, by the way, are also present at DISH. The final version of the website will be available to the public in a couple of months, and we invite you all to come and have a look at: www.academischecollecties.nl Thank you!
www.academischecollecties.nl: portal to Dutch academic heritage
Saskia van Bergen & Henriette Reerink, commissioned by the Dutch Heritage Foundation (SAE)
[email_address] [email_address]
‘Everything should be made as simple as possible, but not simpler’ (Einstein)
Liber 2011
Collection: Hugo de Vries
The collection consists of two Sammelbände (bound volumes of collected offprints) containing a total of 48 offprints that can be identified as coming from De Vries’s personal library. One volume has an index in his handwriting.
DISH 2011
Standards: items - CCO and CDWA Lite
Content standard: CCO
<cdwalite:displayMaterialsTech>pen and sepia ink on laid paper; watermark: star in circle with cross (Briquet 6088)</cdwalite:displayMaterialsTech>
Data structure standard: CDWA Lite
Standards: items - descriptive metadata
Test site
‘This is the way our metadata will be published on the website. Are we happy with this result?’