Digital Archiving at the Meertens Institute


The Meertens Institute, part of the Royal Netherlands Academy of Arts and Sciences, is also a memory institution, where records are digitally preserved and curated. This talk will give an overview of the different types of records currently digitally curated at the Meertens Institute. We highlight our recent projects, such as the Sailing Letters project, where we use crowd sourcing to transcribe centuries-old handwritten letters, or the Radical Political Representation project, where we crowd source the analysis of political cartoons. These are all exemplary Digital Humanities cases, and we show our approach to the digital archiving of these materials, from creation to (re-)use.

Published in: Education
  • First, let me introduce the Meertens Institute.
  • The Meertens Institute is very active in the e-Humanities, in particular in providing the research infrastructure for Humanities research. This is not surprising, because almost all of us have a background in the Humanities, including the developers. The technical development department is led by Marc Kemps-Snijders (MKS) and consists of 9 people.
  • I should introduce myself. I am Junte. My background is in information science, and my skills mostly come from language technologies. I specialize in digital libraries and information retrieval. A year ago, I obtained my doctorate on this subject with my PhD thesis “System Evaluation of Archival Description and Access”. At the Meertens Institute, I still work on digital libraries (which include digital archives), primarily on search technology.
  • So what will this talk be about? I will present an overview of the digital archiving of records at the Meertens Institute by looking at projects here that are interesting (in my opinion), some of them still ongoing. By looking at these projects, I try to make our team's vision of digital archiving clear. For archivists, the immediate question is: what is the record here? This is the smallest logical piece of information, and it depends on whether it is analogue or digital. If it is in paper form, it can be a letter. If it is digital, it can be a file, and this is a very fluid definition, because a part of a file can become a file itself. In my opinion, this only opens up interesting cases for digital archiving. I will also highlight access, use and reuse in this context.
  • When you search for digital archiving on Wikipedia, you get redirected to “Document or record management system” or . Here, a document or record is used in a very generic sense. A document or record can be a book, a page in a book, or a paragraph, or it can be another document genre, such as a video clip or an MP3 file. The essence is that records and their contexts are preserved for long-term use. So digital archiving is about appraising, arranging and describing digital assets. This means that you do not preserve everything. With archiving, the selection of what to archive is also essential.
  • So digital archiving can be described as a document or record management system. Here, we should see the system not as an application, but as a model. NASA in the US, through the Consultative Committee for Space Data Systems (CCSDS), developed the Open Archival Information System (OAIS) reference model. (explain) This is a formal and generic model, and much of digital archiving is based on it.
  • This model has been used by digital archiving software, such as … There are also different types of archives. We have the traditional archives (explain about governments, corporations, persons), audio-visual archives, and personal archives. At the Meertens we have them all, in one form or another. And we have developed our own digital archiving approach, also based on the OAIS model, with our own tools. I should stress that at the Meertens, we only digitally archive unique records, and on demand.
  • To illustrate our vision of digital archiving, this model of the DCC is particularly helpful. Digital archiving is also digital curation. The lifecycle model shows that it never really ends.
  • We developed an approach to digital archiving.
  • We can also use real volunteers. Then we speak of crowdsourcing.
  • Part of using and reusing data is authorization and authentication. At the Meertens Institute, we use Surfnet for our online digital archives. Different technologies are used to provide access to the digital archives. The individual projects have different search engines, such as the MIMORE search engine (developed by colleague Jan Pieter Kunst), which provides access to 3 different datasets on dialects. Another example is the Dutch Song Database, which searches across different sources of data. The common denominator of these search engines is the technology used: the data stored in MySQL is exported using SQL with PHP scripts. But can we combine the different data -- and encourage reuse -- using a single search engine? What advanced search technologies can we use? Can we combine the different existing search engines? These are some of the questions that I am trying to address.
  • We have transformed the data, and we are reusing it. Before that is possible, we again perform an ingest and store archiving action. In practice, this means we harvest the federated metadata and index it.
  • Finally, the reuse of the data. This is done with this search engine, which we have named the CMDI MI Search Engine.
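
The harvest-and-index step mentioned in the notes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Meertens implementation: it assumes an OAI-PMH `ListRecords` response carrying simple Dublin Core rather than CMDI, it uses a small in-memory inverted index where the slides name SOLR, and the sample identifiers, titles and the function names `parse_records`, `build_index` and `search` are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace URIs as defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical harvested response; all identifiers and texts are invented.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:songs-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Dutch song collection</dc:title>
          <dc:description>Descriptions of Dutch songs</dc:description>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:dialects-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Dialect recordings</dc:title>
          <dc:description>Audio recordings of Dutch dialects</dc:description>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def parse_records(xml_text):
    """Extract identifier, title and description from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue  # skip records without a metadata payload
        records.append({
            "id": identifier.strip(),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "description": dc.findtext("dc:description", default="", namespaces=NS),
        })
    return records


def build_index(records):
    """Build an inverted index that maps each term to a set of record ids."""
    index = defaultdict(set)
    for rec in records:
        for term in tokenize(rec["title"] + " " + rec["description"]):
            index[term].add(rec["id"])
    return index


def search(index, query):
    """Return the sorted ids of records that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(parse_records(SAMPLE_RESPONSE))
    print(search(index, "dutch songs"))
```

Because both sample collections end up in the same index, a single query can reach records that would otherwise sit behind separate project search engines, which is the kind of reuse the notes are asking about.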
  • Digital Archiving at the Meertens Institute

    1. 1. Digital Archiving at the Meertens Institute. Martine de Bruin, Jan Pieter Kunst, Maarten van der Peet, Marc Kemps-Snijders, Douwe Zeldenrust, Rob Zeeman, Junte Zhang. Meertens Institute, Royal Netherlands Academy of Arts and Sciences
    2. 2. The Meertens Institute• An institute of the Royal Netherlands Academy of Arts and Sciences (KNAW)• Studies Dutch language and culture – Variation in language – Ethnography• Increasingly active in the e-Humanities, providing the infrastructure.
    3. 3. Technical Development Department of the Meertens Institute• Led by Marc Kemps-Snijders• Web developers• User interface expert• Java developers• Technical application manager• …and someone who does search engines
    4. 4. About me• Name is Junte Zhang• Information scientist.• Specialized in digital libraries and IR• PhD thesis: “System Evaluation of Archival Description and Access”• …and someone who works on search engines at the Meertens Institute.
    5. 5. Talk is about…• Overview of different approaches used for digital archiving of records at the Meertens Institute – Interesting (finished) projects! – Our vision on digital archiving• Highlighting access and (re-)use.
    6. 6. Digital Archiving
    7. 7. Archival Information Systems (1)
    8. 8. Archival Information Systems (2)• Archon or ICA-AtoM or Adlib or Fedora or DSpace or Alfresco or ….?• Traditional archives, audio-visual archives, personal archives...? – At the Meertens we have them all
    9. 9. Digital Curation
    10. 10. Data (1)• Audio-visual records – Audio archive of the Meertens (only accessible at Meertens) – Photo archives on singers of Dutch songs or Pilgrimages• Textual records – The Dutch Song Database is a website with descriptions of more than 150,000 Dutch songs. – Transcriptions of spoken Dutch (~1.4 million sentences).• Many more unique records!
    11. 11. Digital Archiving Actions• Difference between digital archiving of analogue records and digital born records – Designing Data (*) – Creating Data – Deciding What Data to Keep (*) – Ingesting Data – Storing Data – Using and Reusing Data
    12. 12. CLARIN Metadata Infrastructure
    13. 13. Designing Data• Done by researchers ad hoc (can be research itself)• Using existing markup standards• Considerable use of FileMaker• Custom editors to provide tailored support
    14. 14. Creating Data• Data at the Meertens Institute is research data and stored memory• Paper materials (analogue archives) to digital records ( = digitization) – Conversion – Surrogates• Born digital records (digital archives) – Mostly analogue archives at the Meertens Institute but with digital archiving
    15. 15. Alfalabs GEOreferencer (1)
    16. 16. Alfalabs GEOreferencer (2)
    17. 17. Alfalabs GEOreferencer (3)
    18. 18. Pilgrimages Online project
    19. 19. Example: Speelmuziek
    20. 20. Example: the Sailing Letters
    21. 21. The Sailing Letters (2)
    22. 22. The Sailing Letters (3)
    23. 23. The Sailing Letters (4)
    24. 24. The Sailing Letters (5)
    25. 25. CLARIN Pilgrimages Narratives (1)
    26. 26. CLARIN Pilgrimages Narratives (2)
    27. 27. Example: Radical Political Representation• A joint project of the NIOD institute (historical research) and Meertens Institute (technology)• Develop a framework to describe cartoons systematically → Enhance understanding of crowdsourcing• Gain insight into war-time propaganda and the development of political culture using political cartoons → Make political cartoons accessible and explorable• URL:
    28. 28. Appraise and Select• Coordinator research collections and management (gatekeeper, editor?)• Pragmatic appraisal: acquisition follows the research needs
    29. 29. Ingest Data• Content checks before ingest• Unique identifiers by SARA
    30. 30. Store Data• Managed by IT staff at the Computerization and Automation (Informatisering & Automatisering) department of the KNAW• Intention: long term archiving to specialized archival repositories like DANS and the TLA
    31. 31. Using and Reusing Data (1)• Authorization and authentication – Surfnet• Different roads to Rome – Different technologies• Individual search engines of projects – MIMORE: – NLB: – …• Transform data for Unified search engine: – CMDI MI Search Engine
    32. 32. Transforming and Reusing Data (2)• Using and building CLARIN (Common Language Resources and Technology Infrastructure) in the Netherlands• Diverse collections diversely described – (Transformed to) Federated metadata – Different views• Metadata is the pivot in CLARIN – Each resource always has metadata (context)• Semantic, serendipity and focused access
    33. 33. Using and Reusing Data (3)• Connecting to content search engines, not only access to metadata, whenever possible• Integration of more collections of Meertens, Netherlands, and (hopefully) Europe• Continuous life cycle and digital archiving – Searching in virtual research environments – Open issues: e.g. Authorization
    34. 34. Re: Ingest and Store Data [Diagram; labels as extracted: Component registry, OAI-PMH, ISOcat, Schema, database, service, Meertens CMDI dump, Indexing, SOLR, CLARIN-EU OAI harvester, Nederlab/CLARIN (envisaged), Meertens search, CLARIN search]
    35. 35. Open Issues• Dispose?• Version control?• Quality checks?• Formalizing and automating our approach to digital archiving?• …?
    36. 36. Conclusion• Presented an overview and our vision of Digital Archiving at the Meertens Institute• Highlighted using and reusing data
    37. 37. The end• Questions, remarks, criticism, etc. welcome.
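
The "Ingest Data" slide above lists content checks before ingest and unique identifiers. A minimal sketch of what such a step could look like is given below. Everything in it is an assumption made for illustration: the specific checks (non-empty content, an allowed file suffix), the SHA-256 checksum kept for later fixity verification, and the random `urn:uuid:` identifiers, which merely stand in for the identifiers that the slides say are provided by SARA. The names `check_content`, `ingest` and `verify_fixity` are hypothetical.

```python
import hashlib
import uuid
from dataclasses import dataclass

# Hypothetical whitelist of file suffixes accepted at ingest.
ALLOWED_SUFFIXES = (".xml", ".txt", ".wav", ".tif")


@dataclass(frozen=True)
class IngestedRecord:
    identifier: str  # unique identifier assigned at ingest
    filename: str
    sha256: str      # checksum stored for later fixity checks
    size: int        # size in bytes


def check_content(filename, data):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not data:
        problems.append("file is empty")
    if not filename.lower().endswith(ALLOWED_SUFFIXES):
        problems.append("file type is not accepted")
    return problems


def ingest(filename, data):
    """Run the content checks, then assign an identifier and record a checksum."""
    problems = check_content(filename, data)
    if problems:
        raise ValueError(f"cannot ingest {filename}: " + "; ".join(problems))
    return IngestedRecord(
        identifier=f"urn:uuid:{uuid.uuid4()}",  # stand-in for an externally issued identifier
        filename=filename,
        sha256=hashlib.sha256(data).hexdigest(),
        size=len(data),
    )


def verify_fixity(record, data):
    """Check that stored bytes still match the checksum taken at ingest."""
    return hashlib.sha256(data).hexdigest() == record.sha256


if __name__ == "__main__":
    content = b"<letter>transcription of a handwritten letter</letter>"
    record = ingest("letter-001.xml", content)
    print(record.identifier, verify_fixity(record, content))
```

Keeping a checksum next to the identifier also touches the "Quality checks?" item on the Open Issues slide, since the same comparison can be repeated at any later point in the lifecycle.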